<div align="center" id="top">
<img src="https://socialify.git.ci/julep-ai/julep/image?description=1&descriptionEditable=Serverless%20AI%20Workflows%20for%20Data%20%26%20ML%20Teams&font=Source%20Code%20Pro&logo=https%3A%2F%2Fraw.githubusercontent.com%2Fjulep-ai%2Fjulep%2Fdev%2F.github%2Fjulep-logo.svg&owner=1&forks=1&pattern=Solid&stargazers=1&theme=Auto" alt="julep" />

<br>
  <p>
    <a href="https://dashboard.julep.ai">
      <img src="https://img.shields.io/badge/Get_API_Key-FF5733?style=logo=" alt="Get API Key" height="28">
    </a>
    <span>&nbsp;</span>
    <a href="https://docs.julep.ai">
      <img src="https://img.shields.io/badge/Documentation-4B32C3?style=logo=gitbook&logoColor=white" alt="Documentation" height="28">
    </a>
  </p>
  <p>
   <a href="https://www.npmjs.com/package/@julep/sdk"><img src="https://img.shields.io/npm/v/%40julep%2Fsdk?style=social&amp;logo=npm&amp;link=https%3A%2F%2Fwww.npmjs.com%2Fpackage%2F%40julep%2Fsdk" alt="NPM Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://pypi.org/project/julep"><img src="https://img.shields.io/pypi/v/julep?style=social&amp;logo=python&amp;label=PyPI&amp;link=https%3A%2F%2Fpypi.org%2Fproject%2Fjulep" alt="PyPI - Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://hub.docker.com/u/julepai"><img src="https://img.shields.io/docker/v/julepai/agents-api?sort=semver&amp;style=social&amp;logo=docker&amp;link=https%3A%2F%2Fhub.docker.com%2Fu%2Fjulepai" alt="Docker Image Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://choosealicense.com/licenses/apache/"><img src="https://img.shields.io/github/license/julep-ai/julep" alt="GitHub License" height="28"></a>
  </p>
  
  <h3>
    <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>
    ·
    <a href="https://x.com/julep_ai" rel="dofollow">𝕏</a>
    ·
    <a href="https://www.linkedin.com/company/julep-ai" rel="dofollow">LinkedIn</a>
  </h3>
</div>

# Task Definition: Chatbot for a Website

### Overview

This task implements an automated system to index and process a website. The system crawls the website, extracts relevant information, and creates a searchable knowledge base that can be queried programmatically. Finally, it creates a chatbot that can answer questions about the website using the RAG knowledge base.

### Task Tools:

- **get_page**: Web crawler component for systematically traversing the target website using Jina's API
- **create_agent_doc**: Document processor for converting web content into indexed, searchable documents

### Task Input:

Required parameters:
- **url**: Entry point URL for the crawler (e.g., "https://en.wikipedia.org/wiki/Artificial_intelligence")
- **reducing_strength**: Number of sentences to group together, which reduces the number of chunks
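
Both inputs can be checked locally before an execution is created. The sketch below is only an illustration: `validate_task_input` is a hypothetical helper that is not part of the Julep SDK, and the rule that `reducing_strength` must be at least 1 is an assumption.

```python
from urllib.parse import urlparse


def validate_task_input(task_input: dict) -> dict:
    """Hypothetical helper: check the task input before creating an execution."""
    url = task_input.get("url")
    strength = task_input.get("reducing_strength")

    # The URL must be an absolute http(s) URL so the crawler can fetch it.
    parsed = urlparse(url if isinstance(url, str) else "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")

    # Assumption: a group must contain at least one sentence.
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(strength, bool) or not isinstance(strength, int) or strength < 1:
        raise ValueError(f"reducing_strength must be an integer >= 1, got {strength!r}")

    return {"url": url, "reducing_strength": strength}


checked = validate_task_input(
    {"url": "https://en.wikipedia.org/wiki/Artificial_intelligence", "reducing_strength": 5}
)
```

The returned dictionary has the same shape as the `input` argument passed when an execution is created.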

### Task Output:

- Indexed knowledge base containing the processed website content
- Query interface for programmatic access to the documentation

### Task Flow

1. Create an agent with the get_page tool
2. Get the get_page tool's API key 
3. Crawl website using get_page tool:
    - Specify URL and page limit
    - Enable smart mode, proxy, readability
    - Filter out images and SVGs
    - Use sentence-based chunking (15 sentences per chunk)
4. For each crawled page:
    - Extract content
    - Break into chunks
    - Generate succinct context for each chunk
    - Combine chunk with context
5. Create agent documents:
    - Store processed chunks as documents
    - Add metadata like source
    - Index for search/retrieval
6. Finally, chat with the agent in a session, which retrieves the documents that are relevant to the user's query

```plaintext
+----------------+     +----------------+     +----------------+     +---------------+
|  Jina AI       |     |  Extract       |     |  Process       |     |  Create       |
|  Crawler       | --> |  Content       | --> |  Content       | --> |  Agent Docs   |
|  (URL Entry)   |     |  (Web Pages)   |     |  (Chunks)      |     |  (Index)      |
+----------------+     +----------------+     +----------------+     +---------------+
                                                                             |
+----------------+     +----------------+     +----------------+     +---------------+
|  Query         |     |  Search        |     |  Retrieve      |     |  Chat with    |
|  Interface     | <-- |  Index         | <-- |  Documents     | <-- |  Agent        |
|  (User Input)  |     |  (Knowledge)   |     |  (Context)     |     |  (Session)    |
+----------------+     +----------------+     +----------------+     +---------------+
```
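
Step 4 of the flow (take each chunk, generate a short context, combine the two) can be sketched in plain Python. This is an illustration only: `describe_chunk` is a hypothetical stand-in for the LLM prompt step, and in the actual task these steps run inside Julep.

```python
from typing import Callable, List


def contextualize(chunks: List[str], describe_chunk: Callable[[str], str]) -> List[str]:
    """Prefix each chunk with a short context line, separated by a newline."""
    return ["\n".join([describe_chunk(chunk), chunk.strip()]) for chunk in chunks]


def describe_chunk(chunk: str) -> str:
    # Hypothetical stand-in: the task asks the model for this summary instead.
    return f"Context: excerpt starting with '{chunk.strip()[:20]}'"


final_chunks = contextualize(
    ["  AI is a field of study.  ", "It has many uses."],
    describe_chunk,
)
```

Each entry in `final_chunks` is what would be stored as one document: the context line first, then the original chunk text.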

## Implementation

To recreate the notebook and see the code implementation for this task, you can access the Google Colab notebook using the link below:

<a target="_blank" href="https://colab.research.google.com/github/julep-ai/julep/blob/dev/cookbooks/08-rag-chatbot.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Additional Information

For more details about the task or if you have any questions, please don't hesitate to contact the author:

**Author:** Julep AI  
**Contact:** [hey@julep.ai](mailto:hey@julep.ai) or  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>

## Installing the Julep Client

In [None]:
!pip install --upgrade julep --quiet

#### NOTE:

- UUIDs are generated for both the agent and task to uniquely identify them within the system.
- Once created, these UUIDs should remain unchanged for simplicity.
- Altering a UUID will result in the system treating it as a new agent or task.
- If a UUID is changed, the original agent or task will continue to exist in the system alongside the new one.
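
One way to keep the UUIDs unchanged across notebook restarts is to store them in a local file the first time they are generated. This is a minimal sketch, not part of the original notebook; `load_or_create_ids` and the file name `julep_ids.json` are hypothetical.

```python
import json
import uuid
from pathlib import Path


def load_or_create_ids(path: Path) -> dict:
    """Return stored agent/task UUIDs, generating and saving them on first use."""
    if path.exists():
        stored = json.loads(path.read_text())
        return {name: uuid.UUID(value) for name, value in stored.items()}

    ids = {"agent_id": uuid.uuid4(), "task_id": uuid.uuid4()}
    path.write_text(json.dumps({name: str(value) for name, value in ids.items()}))
    return ids


# Hypothetical file name; any writable location works.
ids = load_or_create_ids(Path("julep_ids.json"))
```

With this approach, `ids["agent_id"]` and `ids["task_id"]` refer to the same agent and task on every run, so no duplicates are created.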

In [1]:
# Generate UUIDs once for the agent and the task
import uuid

# Unique identifiers for the agent and the task (these are not API keys)
AGENT_ID = uuid.uuid4()
TASK_ID = uuid.uuid4()

## Creating Julep Client with the API Key

Get your API key from [here](https://dashboard.julep.ai/)

In [2]:
from julep import Client

JULEP_API_KEY = "julep_api_key_here"

# Create a Julep client
client = Client(api_key=JULEP_API_KEY, environment="production")

### Creating an "agent"

An agent is the object to which LLM settings (such as the model and temperature) and tools are scoped.

To learn more about the agent, please refer to the Agent section in [Julep Concepts](https://docs.julep.ai/docs/concepts/agents).

In [3]:
# Create agent
agent = client.agents.create_or_update(
    agent_id=AGENT_ID,
    name="Website Crawler",
    about="An AI assistant that can crawl any website and create a knowledge base.",
    model="gpt-4o-mini",
)

In [4]:
from pprint import pprint

# List the documents currently stored for the agent
docs = client.agents.docs.list(agent_id=agent.id, limit=100).items
print(f"Number of documents in the agent's document store: {len(docs)}")
pprint(docs)

Number of documents in the agent's document store: 0
[]


### Defining a Task

Tasks in Julep are GitHub Actions-style workflows that define long-running, multi-step actions.

You can use them to conduct complex actions by defining them step-by-step.

To learn more about tasks, please refer to the `Tasks` section in [Julep Concepts](https://docs.julep.ai/docs/concepts/tasks).

In [5]:
import yaml

JINA_API_KEY = "your_jina_api_key_here"

task_def = yaml.safe_load(f'''
# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/schemas/create_task_request.json                 
name: Crawl a website and create an agent document
description: This task crawls a website and creates an agent document
                          
########################################################
####################### INPUT SCHEMA ###################
########################################################
input_schema:
  type: object
  properties:
    url:
      type: string
      description: The URL of the website to crawl
    reducing_strength:
      type: integer
      description: The number of sentences to group together
                                        
########################################################
####################### TOOLS ##########################
########################################################
                          
# Define the tools that the agent will use in this workflow
tools:
- name: create_agent_doc
  description: Create an agent doc
  type: system
  system:
    resource: agent
    subresource: doc
    operation: create

- name: get_page
  type: api_call
  api_call:
    method: GET
    url: https://r.jina.ai/
    headers:
      accept: application/json
      x-return-format: markdown
      x-with-images-summary: "true"
      x-with-links-summary: "true"
      x-retain-images: "none"
      x-no-cache: "true"
      Authorization: "Bearer {JINA_API_KEY}"

########################################################
####################### SUB WORKFLOW ###################
########################################################

index_page:

# Step 0: Evaluate the content
- evaluate:
    document: $ _.document
    chunks: |
      $ [" ".join(_.content[i:i + max(_.reducing_strength, len(_.content) // 9)]) 
        for i in range(0, len(_.content), max(_.reducing_strength, len(_.content) // 9))]
  label: docs

# Step 1: Generate a succinct context for each chunk
- over: $ [(steps[0].input.document, chunk.strip()) for chunk in _.chunks]
  parallelism: 3
  map:
    prompt: 
    - role: user
      content: >-
        $ f"""
        <document>
        {{_[0]}}  
        </document>

        Here is the chunk we want to situate within the whole document
        <chunk>
        {{_[1]}}
        </chunk>

        Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. 
        Answer only with the succinct context and nothing else.
        """
    unwrap: true
    settings:
      max_tokens: 16000

# Step 2: Evaluate the final chunks
- evaluate:
    final_chunks: |
      $ [
        NEWLINE.join([succinct, chunk.strip()]) for chunk, succinct in zip(steps['docs'].output.chunks, _)
      ]

# Step 3: Create a new document and add it to the agent docs store
- over: $ _.final_chunks
  parallelism: 3
  map:
    tool: create_agent_doc
    arguments:
      agent_id: $ str(agent.id)
      data:
        metadata:
          source: jina_crawler
        title: Website Document
        content: $ _

########################################################
####################### MAIN WORKFLOW ##################
########################################################

main:
# Step 0: Get the content of the page at the input URL
- tool: get_page
  arguments:
    url: $ "https://r.jina.ai/" + steps[0].input.url
  
# Step 1: Chunk the content
- evaluate:
    result: $ chunk_doc(_.json.data.content.strip())

# Step 2: Run the index_page sub-workflow on the chunks
- workflow: index_page
  arguments:
    content: $ _.result
    document: $ steps[0].output.json.data.content.strip()
    reducing_strength: $ steps[0].input.reducing_strength
''')

# Create the task
task = client.tasks.create_or_update(
    agent_id=AGENT_ID,
    task_id=TASK_ID,
    **task_def
)

<span style="color:olive;">Notes:</span>
- The `unwrap: true` setting in the prompt step unwraps the model output, so the step returns `choices[0].message.content` directly.
- The `$` sign marks a value as a Python expression rather than a plain string.
- The `_` refers to the output of the previous step.
- The `steps[index].input` refers to the input of the step at `index`.
- The `steps[index].output` refers to the output of the step at `index`.
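
The `chunks` expression in the `index_page` sub-workflow is ordinary Python, so its arithmetic can be tried outside Julep. The sketch below reproduces it with a list of made-up sentences; `group_sentences` is a hypothetical name, and `content` is assumed to be a list of strings like the one passed to the sub-workflow.

```python
from typing import List


def group_sentences(content: List[str], reducing_strength: int) -> List[str]:
    """Join consecutive sentences into groups, as the `chunks` expression does."""
    # Group size is the larger of reducing_strength and one ninth of the input.
    size = max(reducing_strength, len(content) // 9)
    return [" ".join(content[i:i + size]) for i in range(0, len(content), size)]


sentences = [f"Sentence {n}." for n in range(1, 101)]
groups = group_sentences(sentences, reducing_strength=5)
print(len(groups))  # 10
```

Because of the `len(_.content) // 9` term, a long page ends up in roughly nine or ten groups regardless of `reducing_strength`; the parameter only sets the group size for shorter inputs.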

### Creating an Execution

An execution is a single run of a task. It is a way to run a task with a specific set of inputs.

To learn more about executions, please refer to the `Executions` section in [Julep Concepts](https://docs.julep.ai/docs/concepts/execution).

In [6]:
execution = client.executions.create(
    task_id=task.id,
    input={
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "reducing_strength": 5  # number of sentences to group together, which reduces the number of chunks
    }
)

## Checking execution details and output

There are multiple ways to get the execution details and the output:

1. **Get Execution Details**: This method retrieves the details of the execution, including the output of the last transition that took place.

2. **List Transitions**: This method lists all the task steps that have been executed so far. The list is in reverse chronological order, so the output of a successful execution is the output of the first transition in the list, which should have a type of `finish`.


<span style="color:olive;">Note: You need to wait for a few seconds for the execution to complete before you can get the final output, so feel free to run the following cells multiple times until you get the final output.</span>
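
Instead of re-running the cells by hand, the status can be polled in a loop. This is a minimal sketch: `wait_for_status` is a hypothetical helper, and the set of terminal status names (`succeeded`, `failed`, `cancelled`) is an assumption that should be checked against the Julep documentation.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    terminal: Iterable[str] = ("succeeded", "failed", "cancelled"),
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Usage with the client from this notebook (not run here):
# final = wait_for_status(lambda: client.executions.get(execution_id=execution.id).status)
```

Passing the status lookup as a callable keeps the helper independent of the SDK, so it can be tested with a fake status sequence.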

In [8]:
status = client.executions.get(execution_id=execution.id).status

print("Execution status: ", status)

Execution status:  starting


In [9]:
import json
execution_transitions = client.executions.transitions.list(
    execution_id=execution.id).items

for index, transition in enumerate(execution_transitions):
    print("Index: ", len(execution_transitions) - index, " Type: ", transition.type)
    print("output: ", json.dumps(transition.output, indent=2))
    print("-" * 100)

Index:  50  Type:  finish
output:  [
  {
    "id": "067ae624-9790-7969-8000-6416fdb20d41",
    "title": "Website Document",
    "content": [
      "The chunk is an introductory excerpt that outlines the definition of artificial intelligence (AI), its major applications, goals, and historical development within the larger context of AI research. It establishes foundational concepts regarding AI's capabilities, challenges in simulating human-like intelligence, and the evolution of AI technologies, setting the stage for subsequent discussions on detailed goals, techniques, applications, ethics, and future prospects of AI in the document.\nJump to content\nMain menu\nSearch\nAppearance\nDonate\nCreate account\nLog in\nPersonal tools\n\t\tPhotograph your local culture, help Wikipedia and win!\n Toggle the table of contents\nArtificial intelligence\n163 languages\nArticle\nTalk\nRead\nView source\nView history\nTools\nFrom Wikipedia, the free encyclopedia\n\"AI\" redirects here. For other us

In [12]:
crawled_pages = execution_transitions[-2].output

print("Crawled pages: ")
crawled_pages

Crawled pages: 


{'json': {'code': 200,
  'data': {'url': 'https://en.wikipedia.org/wiki/Artificial_intelligence',
   'links': {'': 'https://en.wikipedia.org/wiki/Artificial_intelligence?action=edit',
    '1': 'https://en.wikipedia.org/wiki/GPT-1',
    '2': 'https://en.wikipedia.org/wiki/GPT-2',
    '3': 'https://en.wikipedia.org/wiki/GPT-3',
    '4': 'https://en.wikipedia.org/wiki/GPT-4',
    'J': 'https://en.wikipedia.org/wiki/GPT-J',
    '^': 'https://en.wikipedia.org/wiki/Artificial_intelligence#cite_ref-FOOTNOTEGalvan1997_451-0',
    'b': 'https://en.wikipedia.org/wiki/Artificial_intelligence#cite_ref-Kateman-2023_436-1',
    'c': 'https://en.wikipedia.org/wiki/Artificial_intelligence#cite_ref-Thomson-2022_435-2',
    'd': 'https://en.wikipedia.org/wiki/Artificial_intelligence#cite_ref-FOOTNOTEUNESCO2021_340-3',
    'e': 'https://en.wikipedia.org/wiki/Special:EditPage/Template:Glossaries_of_science_and_engineering',
    'f': 'https://en.wikipedia.org/wiki/Artificial_intelligence#cite_ref-FOOTNOTER

## Listing the Document Store for the Agent

The document store is where the agent stores the documents it has created. Each document has a `title`, `content`, `id`, `metadata`, `created_at`, and an associated vector embedding. The embedding is used to retrieve the document when the agent is queried.

In [10]:
docs = client.agents.docs.list(agent_id=agent.id, limit=100).items
num_docs = len(docs)
print("Number of documents in the document store: ", num_docs)

Number of documents in the document store:  10


In [72]:
# # # UNCOMMENT THIS TO DELETE ALL THE AGENT'S DOCUMENTS

# for doc in client.agents.docs.list(agent_id=agent.id, limit=1000):
#     client.agents.docs.delete(agent_id=agent.id, doc_id=doc.id)

## Creating a Session

A session is used to interact with the agent: you send messages to the agent and receive responses through it.
The situation is the initial message sent to the agent to set the context for the conversation. Here you can add more information about the agent and the task it is performing to help the agent answer better. You can also set `recall_options` (for example `mode`, `confidence`, `alpha`, and `limit`) to control how documents are retrieved from the document store when the agent is queried.
More information about the session can be found [here](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#session).
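
To build intuition for what a hybrid retrieval mode with an `alpha` weight usually means, the sketch below blends a text-search score and a vector-search score with a weighted sum. This is a generic illustration and an assumption about hybrid search in general; it is not Julep's scoring code, and which side `alpha` favors in Julep should be checked in its documentation. All names and scores are made up.

```python
from typing import Dict, List, Tuple


def blend_scores(
    text_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    alpha: float = 0.5,
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Rank documents by alpha * vector score + (1 - alpha) * text score."""
    doc_ids = set(text_scores) | set(vector_scores)
    blended = {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * text_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest blended score first, truncated to `limit` results.
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)[:limit]


ranked = blend_scores({"a": 0.9, "b": 0.2}, {"b": 0.8, "c": 0.4}, alpha=0.5)
```

In this toy example, document `b` ranks first because it scores in both searches, which is the behavior a hybrid mode is meant to reward.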

In [11]:
# Custom system template for this particular session to help the agent answer better
system_template = """
You are an AI agent designed to assist users with their queries about a certain website.
Your goal is to provide clear and detailed responses.

**Guidelines**:
1. Assume the user is unfamiliar with the company and products.
2. Thoroughly read and comprehend the user's question.
3. Use the provided context documents to find relevant information.
4. Craft a detailed response based on the context and your understanding of the company and products.
5. Include links to specific website pages for further information when applicable.

**Response format**:
- Use simple, clear language.
- Include relevant website links.

**Important**:
- For questions related to the business, only use the information that is explicitly given in the documents above.
- If the user asks about the business, and it's not given in the documents above, respond with an answer that states that you don't know.
- Use the most recent and relevant data from context documents.
- Be proactive in helping users find solutions.
- Ask for clarification if the query is unclear.
- Inform users if their query is unrelated to the given website.
- Avoid using the following in your response: Based on the provided documents, based on the provided information, based on the documentation... etc.

{%- if docs -%}
**Relevant documents**:{{NEWLINE}}
  {%- for doc in docs -%}
    {{doc.title}}{{NEWLINE}}
    {%- if doc.content is string -%}
      {{doc.content}}{{NEWLINE}}
    {%- else -%}
      {%- for snippet in doc.content -%}
        {{snippet}}{{NEWLINE}}
      {%- endfor -%}
    {%- endif -%}
    {{"---"}}
  {%- endfor -%}

{%- else -%}
There are no documents available for this query.
{%- endif -%}

"""

print(f"Agent created with ID: {agent.id}")

# Create a session for interaction
session = client.sessions.create(
    system_template=system_template,
    agent=agent.id,
    recall_options={
        "mode": "hybrid",
        "num_search_messages": 1,
        "max_query_length": 800,
        "confidence": -0.9,
        "alpha": 0.5,
        "limit": 10,
        "mmr_strength": 0.5,
    },
)

print(f"Session created with ID: {session.id}")

Agent created with ID: 7a5431c3-aec9-4fc7-9f53-fe336f3f4bd4
Session created with ID: 067ae632-26d2-7d7c-8000-cb9786309817


## Chatting with the Agent

The chat method is used to send messages to the agent and receive responses. The messages are sent as a list of dictionaries with the `role` and `content` keys.
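
The message format can be illustrated with a small builder. This is a sketch only: `make_message` is a hypothetical helper, and the set of allowed roles is an assumption based on the common chat roles rather than something stated in this notebook.

```python
from typing import Dict, List

# Assumption: the usual chat roles; check the Julep documentation for the full set.
ALLOWED_ROLES = {"system", "user", "assistant"}


def make_message(role: str, content: str) -> Dict[str, str]:
    """Build one chat message dictionary with `role` and `content` keys."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if not content.strip():
        raise ValueError("content must not be empty")
    return {"role": role, "content": content.strip()}


messages: List[Dict[str, str]] = [
    make_message("user", "tell me about artificial intelligence"),
]
```

The resulting `messages` list has the shape passed as the `messages` argument of the chat call.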

In [13]:
%%time
user_question = "tell me about artificial intelligence"

response = client.sessions.chat(
    session_id=session.id,
    messages=[
        {
            "role": "user",
            "content": user_question,
        }
    ],
    recall=True,
)

print(response.choices[0].message.content)

Artificial intelligence (AI) refers to the ability of machines, particularly computer systems, to perform tasks that typically require human intelligence. These tasks can include learning, reasoning, problem-solving, perception, and language understanding.

### Key Aspects of Artificial Intelligence:

1. **Definition**: AI is defined as intelligence exhibited by machines and involves the study and development of algorithms that allow computers to perceive their environment and learn from it. The goal is for these machines to act in ways that maximize their chances of achieving specific objectives.

2. **Applications**:
   - **Search Engines**: AI powers search engines like Google to provide relevant results quickly.
   - **Recommendation Systems**: Services like Netflix or Amazon use AI to suggest content or products based on user behavior.
   - **Virtual Assistants**: AI-driven assistants such as Siri and Alexa help users with various tasks through voice commands.
   - **Autonomous Ve

### Check the matched documents

In [14]:
print("Matched docs:\n\n")

for index, doc in enumerate(response.docs):
    print(f"Doc {index + 1}:")
    print(f"Title: {doc.title}")
    print(f"Snippet content:\n{doc.snippet.content}")
    print("-" * 100)

Matched docs:


Doc 1:
Title: Website Document
Snippet content:
The chunk is an introductory excerpt that outlines the definition of artificial intelligence (AI), its major applications, goals, and historical development within the larger context of AI research. It establishes foundational concepts regarding AI's capabilities, challenges in simulating human-like intelligence, and the evolution of AI technologies, setting the stage for subsequent discussions on detailed goals, techniques, applications, ethics, and future prospects of AI in the document.
Jump to content
Main menu
Search
Appearance
Donate
Create account
Log in
Personal tools
		Photograph your local culture, help Wikipedia and win!
 Toggle the table of contents
Artificial intelligence
163 languages
Article
Talk
Read
View source
View history
Tools
From Wikipedia, the free encyclopedia
"AI" redirects here. For other uses, see AI (disambiguation) and Artificial intelligence (disambiguation).
 Part of a series on
Artificial in