<div align="center" id="top">
 <img src="https://socialify.git.ci/julep-ai/julep/image?description=1&descriptionEditable=Rapidly%20build%20AI%20workflows%20and%20agents&font=Source%20Code%20Pro&logo=https%3A%2F%2Fraw.githubusercontent.com%2Fjulep-ai%2Fjulep%2Fdev%2F.github%2Fjulep-logo.svg&owner=1&forks=1&pattern=Solid&stargazers=1&theme=Auto" alt="julep" width="640" height="320" />
</div>

<p align="center">
  <br />
  <a href="https://docs.julep.ai" rel="dofollow">Explore Docs (wip)</a>
  ·
  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>
  ·
  <a href="https://x.com/julep_ai" rel="dofollow">𝕏</a>
  ·
  <a href="https://www.linkedin.com/company/julep-ai" rel="dofollow">LinkedIn</a>
</p>

<p align="center">
    <a href="https://www.npmjs.com/package/@julep/sdk"><img src="https://img.shields.io/npm/v/%40julep%2Fsdk?style=social&amp;logo=npm&amp;link=https%3A%2F%2Fwww.npmjs.com%2Fpackage%2F%40julep%2Fsdk" alt="NPM Version"></a>
    <span>&nbsp;</span>
    <a href="https://pypi.org/project/julep"><img src="https://img.shields.io/pypi/v/julep?style=social&amp;logo=python&amp;label=PyPI&amp;link=https%3A%2F%2Fpypi.org%2Fproject%2Fjulep" alt="PyPI - Version"></a>
    <span>&nbsp;</span>
    <a href="https://hub.docker.com/u/julepai"><img src="https://img.shields.io/docker/v/julepai/agents-api?sort=semver&amp;style=social&amp;logo=docker&amp;link=https%3A%2F%2Fhub.docker.com%2Fu%2Fjulepai" alt="Docker Image Version"></a>
    <span>&nbsp;</span>
    <a href="https://choosealicense.com/licenses/apache/"><img src="https://img.shields.io/github/license/julep-ai/julep" alt="GitHub License"></a>
</p>

# Task Definition: Customer Support Chatbot

### Overview

This task implements an automated system to index and process QuickBlox's website. The system crawls the QuickBlox website, extracts relevant information, and creates a searchable knowledge base that can be queried programmatically. Finally, it creates a chatbot that can answer questions about the website using the RAG knowledge base.

### Task Tools:

- **spider_crawler**: Web crawler component for systematically traversing the QuickBlox website
- **create_agent_doc**: Document processor for converting web content into indexed, searchable documents

### Task Input:

Required parameter:
- **url**: Entry point URL for the crawler (e.g., "https://quickblox.com/")

### Task Output:

- Indexed knowledge base containing processed QuickBlox documentation
- Query interface for programmatic access to the documentation

### Task Flow

1. Create an agent with the spider_crawler tool
2. Get the Spider tool's API key 
3. Crawl website using spider_crawler tool:
    - Specify URL and page limit
    - Enable smart mode, proxy, readability
    - Filter out images and SVGs
    - Use sentence-based chunking (15 sentences per chunk)
4. For each crawled page:
    - Extract content
    - Break into chunks
    - Generate succinct context for each chunk
    - Combine chunk with context
5. Create agent documents:
    - Store processed chunks as documents
    - Add metadata like source
    - Index for search/retrieval
6. Finally chat with the agent in the session and retrieve the documents that are relevant to the user's query

```plaintext
+----------------+     +----------------+     +----------------+     +----------------+
|  Spider        |     |  Extract       |     |  Process       |     |  Create        |
|  Crawler       | --> |  Content       | --> |  Content       | --> |  Agent Docs    |
|  (URL Entry)   |     |  (Web Pages)   |     |  (Chunks)      |     |  (Index)       |
+----------------+     +----------------+     +----------------+     +----------------+
                                                                             |
+----------------+     +----------------+     +----------------+     +----------------+
|  Query         |     |  Search        |     |  Retrieve      |     |  Chat with     |
|  Interface     | <-- |  Index         | <-- |  Documents     | <-- |  Agent         |
|  (User Input)  |     |  (Knowledge)   |     |  (Context)     |     |  (Session)     |
+----------------+     +----------------+     +----------------+     +----------------+
```
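
Steps 3 to 5 of the flow can be pictured with a small, library-free sketch. It is an illustration only: `chunk_by_sentences` and `add_context` are hypothetical helpers, the regex-based sentence splitter is a rough stand-in for Spider's sentence-based chunking (whose exact rules are not described here), and in the real task the context string comes from a model prompt rather than a fixed argument.

```python
import re


def chunk_by_sentences(text: str, sentences_per_chunk: int = 15) -> list[str]:
    """Split text into chunks of at most `sentences_per_chunk` sentences.

    Rough approximation: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def add_context(chunk: str, context: str) -> str:
    """Join a chunk with its context, chunk first, separated by a newline."""
    return "\n".join([chunk, context])


# Made-up page text and context, for illustration only.
page = "QuickBlox offers chat SDKs. Pricing starts with a free plan. Hosting can be HIPAA compliant."
chunks = chunk_by_sentences(page, sentences_per_chunk=2)
docs = [add_context(c, "Context: overview of QuickBlox products.") for c in chunks]
print(len(docs))
```

Each string in `docs` corresponds to what would be stored as the `content` of one agent document.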

## Implementation

To recreate the notebook and see the code implementation for this task, you can access the Google Colab notebook using the link below:

<a target="_blank" href="https://colab.research.google.com/github/julep-ai/julep/blob/dev/cookbooks/08_customer_support_chatbot.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Additional Information

For more details about the task or if you have any questions, please don't hesitate to contact the author:

**Author:** Julep AI  
**Contact:** [hey@julep.ai](mailto:hey@julep.ai) or  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>

In [3]:
# Fixed UUIDs are defined for the agent, task, and session
from dotenv import load_dotenv
from julep import Client
import os
import uuid

load_dotenv()

# NOTE: fixed UUIDs are used so that the `create_or_update` methods can be used instead of
# the `create` methods, which avoids creating new resources every time a cell is run.
AGENT_UUID = "123e4567-e89b-12d3-a456-426614174000"
TASK_UUID = "123e4567-e89b-12d3-a456-426614174001"
SESSION_UUID = "123e4567-e89b-12d3-a456-426614174002"
JULEP_API_KEY = os.getenv("JULEP_API_KEY")
SPIDER_API_KEY = os.getenv("SPIDER_API_KEY")


# Create a Julep client
client = Client(api_key=JULEP_API_KEY, environment="dev")

## Creating an "agent"

An agent is the object to which LLM settings (such as the model and temperature) and tools are scoped.

To learn more about the agent, please refer to the [documentation](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#agent).

In [4]:
# Create agent
agent = client.agents.create_or_update(
    agent_id=AGENT_UUID,
    name="Quickblox Navigator",
    about="An AI assistant that can navigate the Quickblox website and assist you in finding the information you need to make the most of our services.",
    model="gpt-4o",
)

In [5]:
num_docs = len(client.agents.docs.list(agent_id=AGENT_UUID, limit=1000).items)
print(f"Number of documents in the agent's document store: {num_docs}")

Number of documents in the agent's document store: 0


## Defining a Task

Tasks in Julep are GitHub Actions-style workflows that define long-running, multi-step actions.

You can use them to conduct complex actions by defining them step-by-step.

To learn more about tasks, please refer to the `Tasks` section in [Julep Concepts](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#tasks).

In [6]:
import yaml

task_def = yaml.safe_load(f"""
name: Crawl a website and create an agent document

# Define the tools that the agent will use in this workflow
tools:
- name: spider_crawler
  type: integration
  integration:
    provider: spider
    method: crawl
    setup:
      spider_api_key: "{SPIDER_API_KEY}"

- name: create_agent_doc
  description: Create an agent doc
  type: system
  system:
    resource: agent
    subresource: doc
    operation: create

index_page:

- evaluate:
    documents: _['content']

- over: "[(_0.content, chunk) for chunk in _['documents']]"
  parallelism: 3
  map:
    prompt: 
    - role: user
      content: >-
        <document>
        {{{{_[0]}}}}
        </document>

        Here is the chunk we want to situate within the whole document
        <chunk>
        {{{{_[1]}}}}
        </chunk>

        Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. 
        Answer only with the succinct context and nothing else. 
    
    unwrap: true
    settings:
      max_tokens: 16000

- evaluate:
    final_chunks: |
      [
        NEWLINE.join([chunk, succinct]) for chunk, succinct in zip(_1.documents, _)
      ]

# Create a new document and add it to the agent docs store
- over: _['final_chunks']
  parallelism: 3
  map:
    tool: create_agent_doc
    arguments:
      agent_id: "'{agent.id}'"
      data:
        metadata:
          source: "'spider_crawler'"

        title: "'Quickblox Website'"
        content: _

# Define the steps of the workflow
main:

# Define a tool call step that calls the spider_crawler tool with the url input
- tool: spider_crawler
  arguments:
    url: "_['url']" # You can also use 'inputs[0]['url']'
    params:
      request: "'smart_mode'"
      limit: _['pages_limit'] # <--- This is the number of pages to crawl (taken from the input of the task)
      return_format: "'markdown'"
      proxy_enabled: "True"
      filter_output_images: "True" # <--- This is to exclude images from the output
      filter_output_svg: "True" # <--- This is to exclude SVGs from the output
      # filter_output_main_only: "True"
      readability: "True" # <--- This is to make the output more readable
      sitemap: "True" # <--- This is to crawl the sitemap
      chunking_alg: # <--- Using spider's bysentence algorithm to chunk the output
        type: "'bysentence'"
        value: "15" # <--- This is the number of sentences per chunk

# For each crawled page, run the index_page workflow on its content
- foreach:
    in: _['result']
    do:
      workflow: index_page
      arguments:
        content: _.content
""")


<span style="color:olive;">Notes:</span>
- The Jinja template uses quadruple curly braces `{{{{}}}}` because the task definition is a Python f-string: the f-string pass turns each doubled brace into a single one, leaving the `{{}}` that Jinja expects. [More information here](https://stackoverflow.com/questions/64493332/jinja-templating-in-airflow-along-with-formatted-text)
- The `unwrap: True` in the prompt step is used to unwrap the output of the prompt step (to unwrap the `choices[0].message.content` from the output of the model).
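
The brace escaping can be checked in isolation. The sketch below uses a made-up key value and a shortened template, not the real task definition. It shows that Python's f-string pass collapses `{{{{` to `{{` while still substituting single-brace names, so the Jinja expression survives for later rendering.

```python
# Placeholder value for illustration only; not a real key.
SPIDER_API_KEY = "example-key"

# Single braces are substituted by the f-string; quadruple braces
# collapse to the double braces that a Jinja template needs.
template = f"""
setup:
  spider_api_key: "{SPIDER_API_KEY}"
content: "{{{{_[0]}}}}"
"""

print(template)
```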


## Creating a task

In [7]:
# Create the task
task = client.tasks.create_or_update(
    task_id=TASK_UUID,
    agent_id=AGENT_UUID,
    **task_def
)

## Creating an Execution (Starting from the Homepage)

An execution is a single run of a task. It is a way to run a task with a specific set of inputs.

In [8]:
execution_homepage = client.executions.create(
    task_id=TASK_UUID,
    input={
        "url": "https://quickblox.com/",
        "pages_limit": 5,
    }
)

## Checking execution details and output

There are multiple ways to get the execution details and the output:

1. **Get Execution Details**: This method retrieves the details of the execution, including the output of the last transition that took place.

2. **List Transitions**: This method lists all the task steps that have been executed so far. The list is in reverse chronological order, so for a successful execution the final output is in the first transition in the list, which should have the type `finish`.


<span style="color:olive;">Note: You need to wait for a few seconds for the execution to complete before you can get the final output, so feel free to run the following cells multiple times until you get the final output.</span>
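
Instead of re-running cells by hand, the wait can be wrapped in a small polling loop. The sketch below is a generic helper, not part of the Julep SDK: `wait_for_status` is a hypothetical name, it takes any callable that returns the current status, and the set of terminal statuses is an assumption (only `succeeded` appears in this notebook's output). A fake status source stands in for the real API call so the example runs on its own.

```python
import time
from typing import Callable

# Assumed terminal states; only "succeeded" appears in this notebook's output.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_status(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Call `get_status` until it returns a terminal status or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Execution still '{status}' after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a real status lookup: reports "running" twice, then "succeeded".
fake_statuses = iter(["running", "running", "succeeded"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_interval=0.01)
print("Execution status: ", final_status)
```

With the client from this notebook, the callable could be `lambda: client.executions.get(execution_id=execution_homepage.id).status`, which is the same lookup used in the next cell.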


In [10]:
status = client.executions.get(execution_id=execution_homepage.id).status

print("Execution status: ", status)

Execution status:  succeeded


In [11]:
import json
execution_transitions = client.executions.transitions.list(
    execution_id=execution_homepage.id, limit=2000).items

for index, transition in enumerate(execution_transitions):
    print("Index: ", index, "Type: ", transition.type)
    print("output: ", json.dumps(transition.output, indent=2))
    print("-" * 100)

Index:  0 Type:  finish
output:  [
  [
    {
      "created_at": "2024-12-16T20:09:56.144855Z",
      "id": "df682ff9-595c-403e-853e-a292a30d8c1b",
      "jobs": [
        "53a783b5-e5fd-4412-a63c-8d4d9327587e"
      ]
    },
    {
      "created_at": "2024-12-16T20:09:56.121164Z",
      "id": "caf19cfa-3969-4ba3-bc42-4190c8ef5a09",
      "jobs": [
        "83dcbbee-cf29-47ad-9991-412927946f7f"
      ]
    },
    {
      "created_at": "2024-12-16T20:09:56.192716Z",
      "id": "ecf13b56-070c-4deb-8ca2-333891d14d7f",
      "jobs": [
        "8ffda58e-b473-4b9f-b59b-aafddba12b30"
      ]
    },
    {
      "created_at": "2024-12-16T20:09:59.306138Z",
      "id": "4f86452c-8c52-4f5e-814d-4c79d43ce093",
      "jobs": [
        "4e9c6028-c35c-4529-b420-5898f0bee811"
      ]
    },
    {
      "created_at": "2024-12-16T20:09:59.288487Z",
      "id": "f535903f-f4f9-4195-86e6-e121d8d2df4f",
      "jobs": [
        "44b556d7-07ca-41f5-b0b6-492a473da0a7"
      ]
    }
  ],
  [
    {
      "creat

In [12]:
crawled_pages = [r['url'] for r in execution_transitions[-2].output['result']]

print("Crawled pages: ")
crawled_pages

Crawled pages: 


['https://quickblox.com/',
 'https://quickblox.com/privacy-policy/',
 'https://quickblox.com/hosting/hipaa-compliant-hosting/',
 'https://quickblox.com/hosting/on-premise/',
 'https://quickblox.com/products/q-consultation/']
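
Indexing with `[-2]` depends on the exact number and order of transitions. A slightly more defensive variant, sketched below with fake transition objects, walks the list from oldest to newest and picks the first output that carries a `result` key. The `SimpleNamespace` stand-ins and the `find_crawl_result` helper are hypothetical, and the output shapes are simplified guesses based on the cells above.

```python
from types import SimpleNamespace


def find_crawl_result(transitions):
    """Return the crawler's `result` list from a newest-first transition list.

    Walks from the oldest transition forward and returns the first output
    that is a dict with a "result" key; returns [] when none is found.
    """
    for transition in reversed(transitions):
        output = transition.output
        if isinstance(output, dict) and "result" in output:
            return output["result"]
    return []


# Fake transitions, newest first, mimicking the reverse chronological ordering.
fake_transitions = [
    SimpleNamespace(type="finish", output=[[{"id": "doc-1"}]]),
    SimpleNamespace(type="step", output={"result": [{"url": "https://quickblox.com/"}]}),
    SimpleNamespace(type="init", output={"url": "https://quickblox.com/", "pages_limit": 5}),
]

crawled_pages = [page["url"] for page in find_crawl_result(fake_transitions)]
print(crawled_pages)
```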

## Running another Execution (starting from the Pricing Page)

In [9]:
execution_pricing = client.executions.create(
    task_id=TASK_UUID,
    input={
        "url": "https://quickblox.com/pricing/",
        "pages_limit": 5,
    }
)

In [13]:
status = client.executions.get(execution_id=execution_pricing.id).status

print("Execution status: ", status)

Execution status:  succeeded


In [14]:
import json
execution_transitions = client.executions.transitions.list(
    execution_id=execution_pricing.id, limit=2000).items

for index, transition in enumerate(execution_transitions):
    print("Index: ", index, "Type: ", transition.type)
    print("output: ", json.dumps(transition.output, indent=2))
    print("-" * 100)

Index:  0 Type:  finish
output:  [
  [
    {
      "created_at": "2024-12-16T20:10:22.334162Z",
      "id": "fc9c87d0-0361-4327-a677-7a93dd259286",
      "jobs": [
        "2ee72270-367e-4156-b769-f13494fd5b72"
      ]
    },
    {
      "created_at": "2024-12-16T20:10:22.224113Z",
      "id": "0c911e5c-88e6-40ca-bd71-b152dd157638",
      "jobs": [
        "0d9c58a6-2eeb-4b58-ab7e-8b04ccbed2f6"
      ]
    },
    {
      "created_at": "2024-12-16T20:10:22.163004Z",
      "id": "0821a429-f587-4a7d-a554-85922b504d8b",
      "jobs": [
        "f941c3ae-b948-494c-b0e6-15c134e1502a"
      ]
    },
    {
      "created_at": "2024-12-16T20:10:23.907048Z",
      "id": "ebde7f7a-d9eb-42d9-9d31-588bd38b833a",
      "jobs": [
        "60df1e15-6ab5-496f-9639-1abc31af7b1c"
      ]
    }
  ],
  [
    {
      "created_at": "2024-12-16T20:10:31.112575Z",
      "id": "0c012c84-045d-4669-8a29-f705efb5fcb4",
      "jobs": [
        "768a66f6-d3de-4018-a121-5b4d2c5e4148"
      ]
    },
    {
      "creat

In [15]:
crawled_pages = [r['url'] for r in execution_transitions[-2].output['result']]

print("Crawled pages: ")
crawled_pages

Crawled pages: 


['https://quickblox.com/pricing/',
 'https://quickblox.com/hosting/',
 'https://quickblox.com/',
 'https://quickblox.com/hosting/hipaa-compliant-hosting/',
 'https://quickblox.com/products/video-conferencing/']

## Listing the Document Store for the Agent

The document store is where the agent stores the documents it has created. Each document has a `title`, `content`, `id`, `metadata`, `created_at`, and an associated vector embedding, which is used to retrieve relevant documents when the agent is queried.

In [16]:
docs = client.agents.docs.list(agent_id=AGENT_UUID, limit=1000).items
num_docs = len(docs)
print("Number of documents in the document store: ", num_docs)

Number of documents in the document store:  59


In [2]:
# # # UNCOMMENT THIS TO DELETE ALL THE AGENT'S DOCUMENTS

# for doc in client.agents.docs.list(agent_id=AGENT_UUID, limit=1000):
#     client.agents.docs.delete(agent_id=AGENT_UUID, doc_id=doc.id)

## Creating a Session

A session is used to interact with the agent. It is used to send messages to the agent and receive responses.
The situation is the initial message sent to the agent to set the context for the conversation. You can use it to add more information about the agent and the task it is performing, which helps the agent answer better. You can also set `recall_options` (such as `mode`, `confidence`, `alpha`, and `limit` in the cell below) to control how documents are retrieved from the document store when the agent is queried.
More information about the session can be found [here](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#session).

In [17]:
situation = """
You are an AI agent designed to assist users with their queries about Quickblox company and products.
Your goal is to provide clear and detailed responses.

**Guidelines**:
1. Assume the user is unfamiliar with the company and products.
2. Thoroughly read and comprehend the user's question.
3. Use the provided context documents to find relevant information.
4. Craft a detailed response based on the context and your understanding of the company and products.
5. Include links to specific Quickblox pages for further information when applicable.

**Response format**:
- Use simple, clear language.
- Include relevant website links.

**Important**:
- For questions related to the business, only use the information that is explicitly given in the documents above.
- If the user asks about the business, and it's not given in the documents above, respond with an answer that states that you don't know.
- Use the most recent and relevant data from context documents.
- Be proactive in helping users find solutions.
- Ask for clarification if the query is unclear.
- Inform users if their query is unrelated to Quickblox.
- Avoid using the following in your response: Based on the provided documents, based on the provided information, based on the documentation... etc.

{%- if docs -%}
**Relevant documents**:{{NEWLINE}}
  {%- for doc in docs -%}
    {{doc.title}}{{NEWLINE}}
    {%- if doc.content is string -%}
      {{doc.content}}{{NEWLINE}}
    {%- else -%}
      {%- for snippet in doc.content -%}
        {{snippet}}{{NEWLINE}}
      {%- endfor -%}
    {%- endif -%}
    {{"---"}}
  {%- endfor -%}

{%- else -%}
There are no documents available for this query.
{%- endif -%}

"""

print(f"Agent created with ID: {agent.id}")

# Create a session for interaction
session = client.sessions.create(
    # session_id=SESSION_UUID,
    situation=situation,
    agent=AGENT_UUID,
    recall_options={
        "mode": "hybrid",
        "num_search_messages": 1,
        # "max_query_length": 800,
        "confidence": 0.5,
        "alpha": 0.5,
        "limit": 10,
        # "mmr_strength": 0.5,
    },
)

print(f"Session created with ID: {session.id}")

Agent created with ID: 123e4567-e89b-12d3-a456-426614174000
Session created with ID: c0733829-d44c-4eb8-9629-0b881d6d5350
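
To build intuition for the `alpha`, `confidence`, and `limit` values passed in `recall_options`, here is a toy hybrid ranking in plain Python. It is not Julep's retrieval code: the weighted-sum formula, the reading of `alpha` as the weight on the vector score, and the use of `confidence` as a minimum score are all assumptions made for illustration, and the scores are invented.

```python
def hybrid_rank(candidates, alpha=0.5, confidence=0.5, limit=10):
    """Blend text and vector scores, drop weak matches, keep the top `limit`.

    Each candidate is (title, text_score, vector_score) with scores in [0, 1].
    Assumed formula: alpha * vector_score + (1 - alpha) * text_score.
    """
    scored = [
        (title, alpha * vector_score + (1 - alpha) * text_score)
        for title, text_score, vector_score in candidates
    ]
    kept = [item for item in scored if item[1] >= confidence]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Invented scores for three hypothetical chunks.
candidates = [
    ("pricing chunk", 0.9, 0.8),
    ("hosting chunk", 0.4, 0.7),
    ("privacy chunk", 0.1, 0.2),
]

for title, score in hybrid_rank(candidates):
    print(f"{title}: {score:.2f}")
```

Under these assumptions, raising `confidence` trims weaker matches, while `alpha` shifts the balance between keyword overlap and embedding similarity.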


## Chatting with the Agent

The chat method is used to send messages to the agent and receive responses. The messages are sent as a list of dictionaries with the `role` and `content` keys.

In [18]:
user_question = "What are the pricing plans for Quickblox?"

response = client.sessions.chat(
    session_id=session.id,
    messages=[
        {
            "role": "user",
            "content": user_question,
        }
    ],
    recall=True,
    model="claude-3-5-sonnet-20241022",
)

print(response.choices[0].message.content)

Let me outline QuickBlox's pricing plans:

1. Basic Plan (Free)
- 500 total users
- 1 month data retention
- 10 MB file size limit
- Core features including SDKs, 1-1 chat, group chat, online presence, etc.
- 1 AI extension
- 1 SmartChat Assistant bot (30-day free trial)
- Community support

2. Starter Plan ($107/month)
- 10,000 total users
- 3 months data retention
- 25 MB file size limit
- All core and advanced features
- 2 AI extensions
- 1 SmartChat Assistant bot
- Ticketing system support

3. Growth Plan ($269/month)
- 25,000 total users
- 6 months data retention
- 50 MB file size limit
- All core and advanced features
- 3 AI extensions
- 2 SmartChat Assistant bots
- Ticketing system support

4. HIPAA Cloud Plan ($430/month)
- 20,000 total users
- Custom data retention
- 50 MB file size limit
- All core and advanced features
- 3 AI extensions
- 2 SmartChat Assistant bots
- HIPAA compliance
- Ticketing system support

5. Enterprise Plan (Starting from $647)
- Custom number of users

### Check the matched documents

In [19]:
print("Matched docs:\n\n")
for index, doc in enumerate(response.docs):
    print(f"Doc {index + 1}:")
    print(f"Title: {doc.title}")
    print(f"Snippet content:\n{doc.snippet.content}")
    print("-" * 100)

Matched docs:


Doc 1:
Title: Quickblox Website
Snippet content:
[New Product: Chat UI Kits for every platform!](https://quickblox.com/ui-kit/)
* [Blog](https://quickblox.com/blog/)
* [Contacts](https://quickblox.com/contacts/)
* [Support](https://help.quickblox.com/)
* [Log in](https://admin.quickblox.com/signin?_ga=2.85423902.661672082.1589381590-557719420.1579854846)
[](https://quickblox.com)
* Products
* ### Communication Tools
* [SDKs and APIs](https://quickblox.com/sdk/)
Reliable & robust software tools to add communication features to any app or website
* [iOS SDK](https://quickblox.com/sdk/ios-chat-sdk/)
* [Android SDK](https://quickblox.com/sdk/android-chat-sdk/)
* [JavaScript SDK](https://quickblox.com/sdk/javascript-chat-sdk/)
* [React Native SDK](https://quickblox.com/sdk/react-native-chat-sdk/)
* [Flutter SDK](https://quickblox.com/sdk/flutter-chat-sdk/)
* [Chat API](https://quickblox.com/sdk/chat-api/)
* [Chat UI Kits](https://quickblox.com/ui-kit/)
Build your own messeng