<div align="center" id="top">
<img src="https://socialify.git.ci/julep-ai/julep/image?description=1&descriptionEditable=Serverless%20AI%20Workflows%20for%20Data%20%26%20ML%20Teams&font=Source%20Code%20Pro&logo=https%3A%2F%2Fraw.githubusercontent.com%2Fjulep-ai%2Fjulep%2Fdev%2F.github%2Fjulep-logo.svg&owner=1&forks=1&pattern=Solid&stargazers=1&theme=Auto" alt="julep" />

<br>
  <p>
    <a href="https://dashboard.julep.ai">
      <img src="https://img.shields.io/badge/Get_API_Key-FF5733?style=logo=" alt="Get API Key" height="28">
    </a>
    <span>&nbsp;</span>
    <a href="https://docs.julep.ai">
      <img src="https://img.shields.io/badge/Documentation-4B32C3?style=logo=gitbook&logoColor=white" alt="Documentation" height="28">
    </a>
  </p>
  <p>
   <a href="https://www.npmjs.com/package/@julep/sdk"><img src="https://img.shields.io/npm/v/%40julep%2Fsdk?style=social&amp;logo=npm&amp;link=https%3A%2F%2Fwww.npmjs.com%2Fpackage%2F%40julep%2Fsdk" alt="NPM Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://pypi.org/project/julep"><img src="https://img.shields.io/pypi/v/julep?style=social&amp;logo=python&amp;label=PyPI&amp;link=https%3A%2F%2Fpypi.org%2Fproject%2Fjulep" alt="PyPI - Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://hub.docker.com/u/julepai"><img src="https://img.shields.io/docker/v/julepai/agents-api?sort=semver&amp;style=social&amp;logo=docker&amp;link=https%3A%2F%2Fhub.docker.com%2Fu%2Fjulepai" alt="Docker Image Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://choosealicense.com/licenses/apache/"><img src="https://img.shields.io/github/license/julep-ai/julep" alt="GitHub License" height="28"></a>
  </p>
  
  <h3>
    <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>
    ·
    <a href="https://x.com/julep_ai" rel="dofollow">𝕏</a>
    ·
    <a href="https://www.linkedin.com/company/julep-ai" rel="dofollow">LinkedIn</a>
  </h3>
</div>

# Task Definition: Chatbot for a Website

### Overview

This task implements an automated system to index and process a website. The system crawls the website, extracts relevant information, and creates a searchable knowledge base that can be queried programmatically. Finally, it creates a chatbot that can answer questions about the website using the RAG knowledge base.

### Task Tools:

- **get_page**: Web crawler component for scraping a website's pages using Jina's API
- **create_agent_doc**: Document processor for converting web content into indexed, searchable documents

### Task Input:

Required parameters:
- **url**: Entry point URL for the crawler (e.g., "https://en.wikipedia.org/wiki/Artificial_intelligence", or "https://julep.ai")
- **reducing_strength**: Number of sentences to club together to reduce the size of the chunks list
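
To make `reducing_strength` concrete, here is a minimal sketch of the grouping logic in plain Python. It mirrors the `chunks` expression used in the task definition later in this notebook; the function name `group_sentences` and the sample sentences are illustrative assumptions.

```python
def group_sentences(sentences, reducing_strength):
    """Join consecutive sentences into larger chunks.

    The group size is at least `reducing_strength` and grows with the
    input length (len // 9), so long pages still yield a small number
    of chunks.
    """
    size = max(reducing_strength, len(sentences) // 9)
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


sentences = [f"Sentence {n}." for n in range(1, 8)]
chunks = group_sentences(sentences, reducing_strength=2)
for chunk in chunks:
    print(chunk)
```

A larger `reducing_strength` produces fewer, longer chunks, which means fewer LLM calls for context generation but coarser retrieval units.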

### Task Output:

- Indexed knowledge base containing processed documentation about the given website links
- Query interface for programmatic access to the documentation

### Task Flow

1. Create an agent with the get_page tool
2. Get the get_page tool's API key 
3. Crawl website using get_page tool:
    - Specify URL and page limit
    - Enable smart mode, proxy, readability
    - Filter out images and SVGs
    - Use sentence-based chunking (15 sentences per chunk)
4. For each crawled page:
    - Extract content
    - Break into chunks
    - Generate succinct context for each chunk
    - Combine chunk with context
5. Create agent documents:
    - Store processed chunks as documents
    - Add metadata like source
    - Index for search/retrieval
6. Finally, chat with the agent in a session and retrieve the documents that are relevant to the user's query

```plaintext
+----------------+     +----------------+     +----------------+     +---------------+
|  Jina AI       |     |  Extract       |     |  Process       |     |  Create       |
|  Crawler       | --> |  Content       | --> |  Content       | --> |  Agent Docs   |
|  (URL Entry)   |     |  (Web Pages)   |     |  (Chunks)      |     |  (Index)      |
+----------------+     +----------------+     +----------------+     +---------------+
                                                                             |
+----------------+     +----------------+     +----------------+     +---------------+
|  Query         |     |  Search        |     |  Retrieve      |     |  Chat with    |
|  Interface     | <-- |  Index         | <-- |  Documents     | <-- |  Agent        |
|  (User Input)  |     |  (Knowledge)   |     |  (Context)     |     |  (Session)    |
+----------------+     +----------------+     +----------------+     +---------------+
```
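
Steps 4 and 5 of the flow can be pictured with the short sketch below. It is only an illustration in plain Python: `describe_chunk` is a hypothetical stand-in for the LLM call that generates the context, and the payload mirrors the `title`, `content`, and `metadata` fields used for agent documents in this notebook.

```python
def describe_chunk(document, chunk):
    """Hypothetical stand-in for the LLM call that situates a chunk in its document."""
    return f"Context: part of a {len(document.split())}-word page."


def build_doc_payloads(document, chunks, source="jina_crawler"):
    """Pair each chunk with its context and wrap it as a document payload."""
    payloads = []
    for chunk in chunks:
        context = describe_chunk(document, chunk)
        payloads.append(
            {
                "title": "Website Document",
                "content": "\n".join([chunk, context]),
                "metadata": {"source": source},
            }
        )
    return payloads


page = "Julep runs workflows. Tasks are defined in YAML."
payloads = build_doc_payloads(page, ["Julep runs workflows.", "Tasks are defined in YAML."])
print(payloads[0]["content"])
```

Storing the generated context next to the chunk text means both are indexed together, which is the point of step 4.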

## Implementation

To recreate the notebook and see the code implementation for this task, you can access the Google Colab notebook using the link below:

<a target="_blank" href="https://colab.research.google.com/github/julep-ai/julep/blob/dev/cookbooks/08-rag-customer-support-chatbot.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Additional Information

For more details about the task or if you have any questions, please don't hesitate to contact the author:

**Author:** Julep AI  
**Contact:** [hey@julep.ai](mailto:hey@julep.ai) or  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>

## Installing the Julep Client

In [1]:
!pip install julep -U --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


#### NOTE:

- UUIDs are generated for both the agent and task to uniquely identify them within the system.
- Once created, these UUIDs should remain unchanged for simplicity.
- Altering a UUID will result in the system treating it as a new agent or task.
- If a UUID is changed, the original agent or task will continue to exist in the system alongside the new one.
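
If you want the same identifiers to survive a kernel restart, one option is to save them to disk and reload them on the next run. The sketch below is an assumption-laden illustration, not part of the notebook: the helper `load_or_create_uuids` and the file name `julep_ids.json` are hypothetical.

```python
import json
import uuid
from pathlib import Path


def load_or_create_uuids(path, names):
    """Reuse UUIDs saved in `path`, generating any that are missing."""
    path = Path(path)
    saved = json.loads(path.read_text()) if path.exists() else {}
    for name in names:
        saved.setdefault(name, str(uuid.uuid4()))
    path.write_text(json.dumps(saved, indent=2))
    return {name: uuid.UUID(value) for name, value in saved.items()}


# Hypothetical file name; the first call creates it, later calls reuse it.
ids = load_or_create_uuids("julep_ids.json", ["agent", "task", "session"])
print(ids["agent"])
```

Calling the helper again returns the same values, so `create_or_update` keeps targeting the existing agent and task rather than creating new ones.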

In [2]:
# Global UUID is generated for agent and task
import uuid

# NOTE: these UUIDs let us call the `create_or_update` methods instead of the
# `create` methods, which avoids creating new resources every time a cell is run.
AGENT_UUID = uuid.uuid4()
TASK_UUID = uuid.uuid4()
SESSION_UUID = uuid.uuid4()

## Creating Julep Client with the API Key

Get your API key from [here](https://dashboard.julep.ai/)

In [3]:
from julep import Client
import os
from dotenv import load_dotenv

load_dotenv(override=True)

JULEP_API_KEY = os.environ.get("JULEP_API_KEY")

# Create a Julep client
client = Client(api_key=JULEP_API_KEY, environment="production")


### Creating an "agent"

An agent is the object to which LLM settings, such as the model and temperature, and tools are scoped.

To learn more about the agent, please refer to the Agent section in [Julep Concepts](https://docs.julep.ai/docs/concepts/agents).

In [4]:
# Create agent
agent = client.agents.create_or_update(
    agent_id=AGENT_UUID,
    name="Website Navigator",
    about="An AI assistant that can navigate a company's website and assist you in finding the information you need to make the most of our services.",
    model="gpt-4o",
)

In [5]:
docs = [*client.agents.docs.list(agent_id=AGENT_UUID)]
print(f"Number of documents in the agent's document store: {len(docs)}")

Number of documents in the agent's document store: 0


### Defining a Task

Tasks in Julep are GitHub Actions-style workflows that define long-running, multi-step actions.

You can use them to conduct complex actions by defining them step-by-step.

To learn more about tasks, please refer to the `Tasks` section in [Julep Concepts](https://docs.julep.ai/docs/concepts/tasks).

In [6]:
import yaml

# Placeholder: replace with your own Jina API key
jina_api_key = "jina_api_key"

task_def = yaml.safe_load('''
# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/schemas/create_task_request.json
name: Julep Jina Crawler Task
description: A Julep agent that can crawl a website and store the content in the document store.

########################################################
################### INPUT SCHEMA #######################
########################################################

input_schema:
  type: object
  properties:
    url:
      type: string
    reducing_strength:
      type: integer
      
########################################################
################### TOOLS ##############################
########################################################

tools:
- name: get_page
  type: api_call
  api_call:
    method: GET
    url: https://r.jina.ai/
    headers:
      accept: application/json
      x-return-format: markdown
      x-with-images-summary: "true"
      x-with-links-summary: "true"
      x-retain-images: "none"
      x-no-cache: "true"
      Authorization: "Bearer {jina_api_key}"

- name : create_agent_doc
  description: Create an agent doc
  type: system
  system:
    resource: agent
    subresource: doc
    operation: create

########################################################
################### INDEX PAGE SUBWORKFLOW ##############
########################################################

index_page:

# Step #0 - Evaluate the content
- evaluate:
    document: $ _.document
    chunks: |
      $ [" ".join(_.content[i:i + max(_.reducing_strength, len(_.content) // 9)]) 
        for i in range(0, len(_.content), max(_.reducing_strength, len(_.content) // 9))]
    label: docs

# Step #1 - Process each content chunk in parallel
- over: "$ [(steps[0].input.content, chunk) for chunk in _['chunks']]"
  parallelism: 3
  map:
    prompt: 
    - role: user
      content: >-
        $ f"""
        <document>
        {_[0]}
        </document>

        Here is the chunk we want to situate within the whole document
        <chunk>
        {_[1]}
        </chunk>

        Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. 
        Answer only with the succinct context and nothing else."""
    
    unwrap: true
    settings:
      max_tokens: 16000

# Step #2 - Combine each chunk with its generated context
- evaluate:
    final_chunks: |
      $ [
        NEWLINE.join([chunk, succinct]) for chunk, succinct in zip(steps[1].input.chunks, _)
      ]

# Step #3 - Create a new document and add it to the agent docs store
- over: $ _['final_chunks']
  parallelism: 3
  map:
    tool: create_agent_doc
    arguments:
      agent_id: "$ str(agent.id)" # <--- This is the agent id of the agent you want to add the document to
      data:
        metadata:
          source: "jina_crawler"

        title: "Website Document"
        content: $ _

########################################################
################### MAIN WORKFLOW ######################
########################################################

main:

# Step 0: Get the content of the product page
- tool: get_page
  arguments:
    url: $ "https://r.jina.ai/" + steps[0].input.url

# Step 1: Chunk the content
- evaluate:
    result: $ chunk_doc(_.json.data.content.strip())

# Step 2: Run the index_page subworkflow on the chunks
- workflow: index_page
  arguments:
    content: $ _.result
    document: $ steps[0].output.json.data.content.strip()
    reducing_strength: $ steps[0].input.reducing_strength
'''.replace("{jina_api_key}", jina_api_key))  # substitute the key into the Authorization header

<span style="color:olive;">Notes:</span>
- `unwrap: true` in the prompt step returns only the model's text (`choices[0].message.content`) instead of the full model response.
- A leading `$` marks a value as a Python expression to be evaluated; values without it are treated as plain strings.
- The `_` refers to the output of the previous step.
- The `steps[index].input` refers to the input of the step at `index`.
- The `steps[index].output` refers to the output of the step at `index`.
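
The following toy model, written in plain Python, may help build intuition for `_` and `steps[index]`. It is not how Julep executes tasks internally; `run_workflow` and the lambda steps are hypothetical and only mimic the data flow described in the notes above.

```python
def run_workflow(workflow_input, step_functions):
    """Run steps in order, recording each step's input and output."""
    steps = []
    previous = workflow_input  # plays the role of `_`
    for function in step_functions:
        record = {"input": previous, "output": None}
        steps.append(record)
        record["output"] = function(previous, steps)
        previous = record["output"]
    return previous, steps


workflow = [
    # Step 0: `_` is the workflow input.
    lambda _, steps: {"result": _["text"].split(". ")},
    # Step 1: `_` is the output of step 0; steps[0]["input"] is still reachable.
    lambda _, steps: {"count": len(_["result"]), "source": steps[0]["input"]["url"]},
]

final, steps = run_workflow({"url": "https://julep.ai/", "text": "One. Two. Three"}, workflow)
print(final)
print(steps[1]["input"])
```

In this model, the input of step 0 is the workflow input, and the input of every later step is the output of the step before it.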

In [7]:
# Create the task
task = client.tasks.create_or_update(
    task_id=TASK_UUID,
    agent_id=AGENT_UUID,
    **task_def
)

### Creating an Execution

An execution is a single run of a task. It is a way to run a task with a specific set of inputs.

To learn more about executions, please refer to the `Executions` section in [Julep Concepts](https://docs.julep.ai/docs/concepts/execution).

In [14]:
execution_homepage = client.executions.create(
    task_id=TASK_UUID,
    input={
        "url": "https://julep.ai/",
        "reducing_strength": 2
    }
)

## Monitoring and analyzing execution

There are multiple ways to track and analyze the execution:

1. **Stream Execution Status**: This method streams real-time status updates from the execution, allowing you to monitor its progress as it happens. Each event contains the current status of the execution (e.g., `running`, `succeeded`, `failed`).

2. **List Transitions**: This method lists all the task steps that have been executed so far. Transitions are returned in reverse chronological order, so the output of a successful execution is the output of the first transition in the list, which should have the type `finish`.

<span style="color:olive;">Note: The streaming status updates will continue until the execution completes. You can see the transitions after the execution has completed to analyze the detailed outputs of each step.</span>
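
As a small illustration of the second method, the sketch below picks the final output out of a transition list. It uses stand-in objects instead of real API responses, so it runs offline; the helper name `final_output` and the sample data are assumptions, while the `type` and `output` attributes and the reverse chronological order follow the description above.

```python
from types import SimpleNamespace


def final_output(transitions):
    """Return the output of the `finish` transition, or None if there is none yet."""
    for transition in transitions:  # newest first
        if transition.type == "finish":
            return transition.output
    return None


# Stand-in data in reverse chronological order (newest first).
transitions = [
    SimpleNamespace(type="finish", output={"docs_created": 4}),
    SimpleNamespace(type="step", output={"result": ["chunk one", "chunk two"]}),
    SimpleNamespace(type="init", output={"url": "https://julep.ai/", "reducing_strength": 2}),
]

print(final_output(transitions))
print(final_output(transitions[1:]))  # no finish transition yet
```

Returning `None` when no `finish` transition exists makes it easy to tell a running or failed execution apart from a completed one.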

In [None]:
for event in client.executions.status.stream(execution_id=execution_homepage.id):
    print(event.status)

running
running
running
running
running
running
running
running
running
succeeded


In [17]:
import json
execution_transitions = client.executions.transitions.list(
    execution_id=execution_homepage.id, limit=2000).items

for index, transition in enumerate(reversed(execution_transitions)):
    print("Index: ", index, "Type: ", transition.type)
    print("output: ", json.dumps(transition.output, indent=2))
    print("-" * 100)

Index:  0 Type:  init
output:  {
  "url": "https://julep.ai/",
  "reducing_strength": 2
}
----------------------------------------------------------------------------------------------------
Index:  1 Type:  step
output:  {
  "json": {
    "code": 200,
    "data": {
      "url": "https://julep.ai/",
      "title": "Julep AI",
      "usage": {
        "tokens": 4630
      },
      "description": "Backend for building AI agent workflows. A new DSL for building tasks and a server for running them. \nAllows companie so build and deploy AI pipelines in minues."
    },
    "meta": {
      "usage": {
        "tokens": 4630
      }
    },
    "status": 20000
  },
  "content": "eyJjb2RlIjoyMDAsInN0YXR1cyI6MjAwMDAsImRhdGEiOnsidGl0bGUiOiJKdWxlcCBBSSIsImRlc2NyaXB0aW9uIjoiQmFja2VuZCBmb3IgYnVpbGRpbmcgQUkgYWdlbnQgd29ya2Zsb3dzLiBBIG5ldyBEU0wgZm9yIGJ1aWxkaW5nIHRhc2tzIGFuZCBhIHNlcnZlciBmb3IgcnVubmluZyB0aGVtLiBcbkFsbG93cyBjb21wYW5pZSBzbyBidWlsZCBhbmQgZGVwbG95IEFJIHBpcGVsaW5lcyBpbiBtaW51ZXMuIiwidXJsIjoiaH

## Listing the Document Store for the Agent

The document store is where the agent stores the documents it has created. Each document has a `title`, `content`, `id`, `metadata`, `created_at`, and an associated vector embedding. These are used to retrieve relevant documents when the agent is queried.

In [4]:
docs = [*client.agents.docs.list(agent_id=AGENT_UUID)]
print("Number of documents in the document store: ", len(docs))

Number of documents in the document store:  4


In [2]:
# # # UNCOMMENT THIS TO DELETE ALL THE AGENT'S DOCUMENTS

# for doc in client.agents.docs.list(agent_id=AGENT_UUID, limit=1000):
#     client.agents.docs.delete(agent_id=AGENT_UUID, doc_id=doc.id)

## Creating a Session

A session is used to interact with the agent: you send messages through it and receive the agent's responses.
The `situation` is the initial message that is sent to the agent to set the context for the conversation. Here you can add more information about the agent and the task it performs to help it answer better. You can also set `recall_options` (such as `mode`, `confidence`, `alpha`, and `limit`) to control how documents are retrieved from the document store when the agent is queried.
More information about the session can be found [here](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#session).
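
To give a feel for what options such as `alpha`, `confidence`, and `limit` in the `recall_options` below might control, here is a toy ranking function. It assumes a linear blend of a keyword score and a vector score, followed by a cutoff and a top-k selection. This is an illustrative assumption, not Julep's actual retrieval algorithm, and all scores below are made up.

```python
def hybrid_rank(candidates, alpha=0.5, confidence=0.5, limit=10):
    """Blend two scores per document, drop weak matches, keep the best few.

    `candidates` maps a document title to (keyword_score, vector_score),
    both assumed to lie in [0, 1].
    """
    blended = {
        title: (1 - alpha) * keyword + alpha * vector
        for title, (keyword, vector) in candidates.items()
    }
    kept = [(title, score) for title, score in blended.items() if score >= confidence]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]


# Made-up scores for three hypothetical documents.
candidates = {
    "Pricing": (0.9, 0.5),
    "Testimonials": (0.2, 0.6),
    "Careers": (0.1, 0.2),
}
for title, score in hybrid_rank(candidates, alpha=0.5, confidence=0.35, limit=2):
    print(f"{title}: {score:.2f}")
```

In this toy model, raising `confidence` returns fewer but stronger matches, while `limit` caps how many documents reach the prompt.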

In [None]:
situation = """
You are an AI agent designed to assist users with their queries about our company and products.
Your goal is to provide clear and detailed responses.

**Guidelines**:
1. Assume the user is unfamiliar with the company and products.
2. Thoroughly read and comprehend the user's question.
3. Use the provided context documents to find relevant information.
4. Craft a detailed response based on the context and your understanding of the company and products.
5. Include links to specific company pages for further information when applicable.

**Response format**:
- Use simple, clear language.
- Include relevant website links.

**Important**:
- For questions related to the business, only use the information that is explicitly given in the documents above.
- If the user asks about the business, and it's not given in the documents above, respond with an answer that states that you don't know.
- Use the most recent and relevant data from context documents.
- Be proactive in helping users find solutions.
- Ask for clarification if the query is unclear.
- Avoid using the following in your response: Based on the provided documents, based on the provided information, based on the documentation... etc.

{%- if docs -%}
**Relevant documents**:{{NEWLINE}}
  {%- for doc in docs -%}
    {{doc.title}}{{NEWLINE}}
    {%- if doc.content is string -%}
      {{doc.content}}{{NEWLINE}}
    {%- else -%}
      {%- for snippet in doc.content -%}
        {{snippet}}{{NEWLINE}}
      {%- endfor -%}
    {%- endif -%}
    {{"---"}}
  {%- endfor -%}

{%- else -%}
There are no documents available for this query.
{%- endif -%}
"""

# Create a session for interaction
session = client.sessions.create(
    situation=situation,
    agent=str(AGENT_UUID),
    recall_options={
        "mode": "hybrid",
        "num_search_messages": 1,
        # "max_query_length": 800,
        "confidence": 0.5,
        "alpha": 0.5,
        "limit": 10,
        # "mmr_strength": 0.5,
    },
)
print(f"Agent created with ID: {agent.id}")
print(f"Session created with ID: {session.id}")

Agent created with ID: cd2bc3ee-fe18-4ddc-8f8a-d547dbf183d7
Session created with ID: 0684842b-fa18-7840-8000-eba0bbbca948


## Chatting with the Agent

The chat method is used to send messages to the agent and receive responses. The messages are sent as a list of dictionaries with the `role` and `content` keys.

In [23]:
def chat_with_agent(user_question):
    response = client.sessions.chat(
        session_id=session.id,
        messages=[
            {
                "role": "user",
                "content": user_question,
            }
        ],
        recall=True,
    )

    return response

In [20]:
# Example usage
user_question = "How do I build with julep?"
response = chat_with_agent(user_question)

print(response.choices[0].message.content)

Building with Julep involves a few straightforward steps to create, deploy, and manage AI workflows. Here's a step-by-step guide:

1. **Create an Agent**: Define agents that can interact with users within a session. Agents are the core of your AI workflows.

2. **Add Tools**: Equip your agents with necessary tools such as web search, API calls, or custom integrations. This allows your agent to access and retrieve data efficiently.

3. **Define Your Tasks**: Use YAML to define multi-step processes. You can incorporate decision trees, loops, and parallel execution to handle complex workflows.

4. **Deploy**: Once you have configured your agent and defined its tasks, you can execute production-grade workflows with a single command.

Here's a basic example of creating an agent with Julep:

```python
agent = julep.agents.create(
    name="Spiderman",
    about="AI that can crawl the web and extract data",
    model="gpt-4o-mini",
    default_settings={
        "temperature": 0.75,
        "

In [24]:
# Example usage
user_question = "What customer testimonials has julep received?"
response = chat_with_agent(user_question)

print(response.choices[0].message.content)

Julep AI has received several positive testimonials from customers, showcasing its transformative impact on various businesses. Here are some highlights:

1. **Suryansh Tibarewal, CEO of EssentiallySports**: He mentioned that Julep sits at the intersection of what Zapier did for simplifying workflows and what Vercel did for simplifying shipping, expressing excitement about the endless possibilities with Julep's technology.

2. **Vedant Maheshwari, CEO of Vidyo.ai**: Noted Julep as a "GAME CHANGER," stating that Vidyo was able to ship six months' worth of product development in just six hours.

3. **Madhavan, CEO of Reclaim Protocol**: Shared that their AI stack development time was significantly reduced from months to a couple of weeks, largely due to Julep handling most of the infrastructure orchestration, allowing them to focus on refining prompts.

4. **Akshay Pruthi, CEO of Calm Sleep**: Described the experience as akin to reviewing designs on Figma, where he could brainstorm, test

### Check the matched documents

In [25]:
print("Matched docs:\n\n")
for index, doc in enumerate(response.docs):
    print(f"Doc {index + 1}:")
    print(f"Title: {doc.title}")
    print(f"Snippet content:\n{doc.snippet.content}")
    print("-" * 100)

Matched docs:


Doc 1:
Title: Website Document
Snippet content:
[Image 15](https://framerusercontent.com/images/iLE57X7WyaBHXDO0aySJzS6Q0.png?lossless=1)   
*   ! [Image 16](https://framerusercontent.com/images/nMzDfyarSbUk5On2jln2FPr0KE.png?lossless=1)   
*   ! [Image 17](https://framerusercontent.com/images/Nfxk7UAuqT0uV8GtgL0VT3kfnOk.png?lossless=1)   
*   ! [Image 18](https://framerusercontent.com/images/GFOoxnAxwzUg52KtlUHpKhw.png?lossless=1)   
*   ! [Image 19](https://framerusercontent.com/images/rJrZQ6elomObJLKjXuDKNtcS18.png?lossless=1)   
*   ! [Image 20](https://framerusercontent.com/images/hOHRHiTvMQHs5nyJhWAmCcK6eQ.png?lossless=1)   
*   ! [Image 21](https://framerusercontent.com/images/iLE57X7WyaBHXDO0aySJzS6Q0.png?lossless=1)   
*   ! [Image 22](https://framerusercontent.com/images/nMzDfyarSbUk5On2jln2FPr0KE.png?lossless=1)   
*   ! [Image 23](https://framerusercontent.com/images/Nfxk7UAuqT0uV8GtgL0VT3kfnOk.png?lossless=1)   
*   ! [Image 24](https://framerusercontent.co