<div align="center" id="top">
<img src="https://socialify.git.ci/julep-ai/julep/image?description=1&descriptionEditable=Serverless%20AI%20Workflows%20for%20Data%20%26%20ML%20Teams&font=Source%20Code%20Pro&logo=https%3A%2F%2Fraw.githubusercontent.com%2Fjulep-ai%2Fjulep%2Fdev%2F.github%2Fjulep-logo.svg&owner=1&forks=1&pattern=Solid&stargazers=1&theme=Auto" alt="julep" />

<br>
  <p>
    <a href="https://dashboard.julep.ai">
      <img src="https://img.shields.io/badge/Get_API_Key-FF5733?style=logo=" alt="Get API Key" height="28">
    </a>
    <span>&nbsp;</span>
    <a href="https://docs.julep.ai">
      <img src="https://img.shields.io/badge/Documentation-4B32C3?style=logo=gitbook&logoColor=white" alt="Documentation" height="28">
    </a>
  </p>
  <p>
   <a href="https://www.npmjs.com/package/@julep/sdk"><img src="https://img.shields.io/npm/v/%40julep%2Fsdk?style=social&amp;logo=npm&amp;link=https%3A%2F%2Fwww.npmjs.com%2Fpackage%2F%40julep%2Fsdk" alt="NPM Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://pypi.org/project/julep"><img src="https://img.shields.io/pypi/v/julep?style=social&amp;logo=python&amp;label=PyPI&amp;link=https%3A%2F%2Fpypi.org%2Fproject%2Fjulep" alt="PyPI - Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://hub.docker.com/u/julepai"><img src="https://img.shields.io/docker/v/julepai/agents-api?sort=semver&amp;style=social&amp;logo=docker&amp;link=https%3A%2F%2Fhub.docker.com%2Fu%2Fjulepai" alt="Docker Image Version" height="28"></a>
    <span>&nbsp;</span>
    <a href="https://choosealicense.com/licenses/apache/"><img src="https://img.shields.io/github/license/julep-ai/julep" alt="GitHub License" height="28"></a>
  </p>
  
  <h3>
    <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>
    ·
    <a href="https://x.com/julep_ai" rel="dofollow">𝕏</a>
    ·
    <a href="https://www.linkedin.com/company/julep-ai" rel="dofollow">LinkedIn</a>
  </h3>
</div>

## Task Definition: Spider Crawler Integration

### Overview

This task uses the Spider `integration` tool together with a prompt step: it crawls the website at a given URL and then summarizes the crawled content.

### Task Tools:

**Spider Crawler**: An `integration` type tool that can crawl the web and extract data from a given URL.

### Task Input:

**url**: The URL of the website to crawl.

### Task Output:

**output**: A short text summary of the crawled pages. An intermediate step also produces a dictionary with a `documents` key that holds the data extracted from the given URL. Check the transition outputs below for the detailed shape of each step's output.

### Task Flow

1. **Input**: The user provides a URL to crawl.

2. **Spider Tool Integration**: The `spider_crawler` tool is called to crawl the web and extract data from the given URL.

3. **Prompt Step**: The prompt step is used to create a summary of the results from the spider tool.

4. **Output**: The final output is the summary of the results from the spider tool.

```plaintext
+----------+     +-------------+     +------------+     +-----------+
|  User    |     |   Spider    |     |   Prompt   |     |   Final   |
|  Input   | --> |   Crawler   | --> |   Step     | --> |   Output  |
| (URL)    |     |             |     |            |     |           |
+----------+     +-------------+     +------------+     +-----------+
      |                 |                  |                  |
      v                 v                  v                  v
"https://spider.cloud"    Extract data      Create summary    "Here are the
                          from URL          of results        results from the
                                                              spider tool"
```


## Implementation

To recreate the notebook and see the code implementation for this task, you can access the Google Colab notebook using the link below:

<a target="_blank" href="https://colab.research.google.com/github/julep-ai/julep/blob/dev/cookbooks/01-website-crawler.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Additional Information

For more details about the task or if you have any questions, please don't hesitate to contact the author:

**Author:** Julep AI  
**Contact:** [hey@julep.ai](mailto:hey@julep.ai) or  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>

## Installing the Julep Client

In [31]:
!pip install --upgrade julep --quiet

#### NOTE:

- UUIDs are generated for both the agent and task to uniquely identify them within the system.
- Once created, these UUIDs should remain unchanged for simplicity.
- Altering a UUID will result in the system treating it as a new agent or task.
- If a UUID is changed, the original agent or task will continue to exist in the system alongside the new one.

In [32]:
import uuid

# NOTE: these UUIDs let us call the `create_or_update` methods instead of the
# `create` methods, so that re-running a cell does not create new resources.
AGENT_UUID = uuid.uuid4()
TASK_UUID = uuid.uuid4()
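
Because `uuid.uuid4()` returns a new random value on every call, re-running the cell above yields new IDs. One possible way to keep them stable across runs, shown here only as a sketch and not as part of the original notebook, is to derive them from fixed labels with `uuid.uuid5`. The label strings and the `STABLE_*` names are hypothetical.

```python
import uuid

# Hypothetical labels; any strings that stay the same between runs would work.
AGENT_LABEL = "cookbook-website-crawler/agent"
TASK_LABEL = "cookbook-website-crawler/task"

# uuid5 hashes a namespace plus a name, so the same label always maps to the same UUID.
STABLE_AGENT_UUID = uuid.uuid5(uuid.NAMESPACE_URL, AGENT_LABEL)
STABLE_TASK_UUID = uuid.uuid5(uuid.NAMESPACE_URL, TASK_LABEL)

print(STABLE_AGENT_UUID)
print(STABLE_TASK_UUID)
```

Hard-coding two UUID strings would have the same effect; the point is only that the values do not change between runs.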

## Creating Julep Client with the API Key

Get your API key from [here](https://dashboard.julep.ai/)

In [33]:
from julep import Client
import os

JULEP_API_KEY = os.environ['JULEP_API_KEY']

# Create a Julep client
client = Client(api_key=JULEP_API_KEY, environment="production")
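
`os.environ['JULEP_API_KEY']` raises a bare `KeyError` when the variable is not set. A small guard like the one below gives a clearer message. It is a sketch: `read_api_key` is a hypothetical helper, not part of the Julep SDK.

```python
import os

def read_api_key(env=os.environ, name="JULEP_API_KEY"):
    """Return the API key from `env`, or raise with a hint if it is missing or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return key

# A plain dict stands in for the real environment in this demonstration.
print(read_api_key({"JULEP_API_KEY": "example-key"}))
```

In the notebook, the result of `read_api_key()` could be passed as `api_key` when constructing `Client`.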

### Creating an "agent"

An agent is the object to which LLM settings, such as the model and temperature, as well as tools are scoped.

To learn more about the agent, please refer to the Agent section in [Julep Concepts](https://docs.julep.ai/docs/concepts/agents).

In [34]:
# Create agent
agent = client.agents.create_or_update(
    agent_id=AGENT_UUID,
    name="Spiderman",
    about="AI that can crawl the web and extract data",
    model="gpt-4o",
)

### Defining a Task

Tasks in Julep are GitHub Actions-style workflows that define long-running, multi-step actions.

You can use them to conduct complex actions by defining them step-by-step.

To learn more about tasks, please refer to the `Tasks` section in [Julep Concepts](https://docs.julep.ai/docs/concepts/tasks).

In [99]:
import yaml

SPIDER_API_KEY = "YOUR_SPIDER_API_KEY"

# Define the task
task_def = yaml.safe_load(f"""       
# yaml-language-server: $schema=https://raw.githubusercontent.com/julep-ai/julep/refs/heads/dev/schemas/create_task_request.json                
name: Julep Crawling Task
description: Crawl a website and create a summary of the results

########################################################
####################### INPUT SCHEMA ###################
########################################################

input_schema:
  type: object
  properties:
    url:
      type: string
      description: The URL of the website to crawl

########################################################
####################### TOOLS ##########################
########################################################

# Define the tools that the task will use in this workflow
tools:
- name: spider_crawler
  type: integration
  integration:
    provider: spider
    setup:
      spider_api_key: "{SPIDER_API_KEY}"

########################################################
####################### MAIN WORKFLOW ##################
########################################################

main:

# Step 0: Call the spider_crawler tool with the url input
- tool: spider_crawler
  arguments:
    url: $ _['url'] # You can also use 'steps[0].input.url'
    params:
      request: smart_mode
      limit: $ 2 # limit to 2 pages
      return_format: markdown
      proxy_enabled: $ True
      filter_output_images: $ True
      filter_output_svg: $ True
      readability: $ True

# Step 1: Evaluate step to join the content of all crawled pages into a single string
- evaluate:
    documents: |
      $ " ".join(
      list(
        page['content'] for page in _['result']
        )
      )
      
# Step 2: Prompt step to create a summary of the results
- prompt: |
    $ f'''
    You are {{agent.about}}
    I have given you this url: {{steps[0].input.url}}
    And you have crawled that website. Here are the results you found:
    {{_['documents']}}
    I want you to create a short summary (no longer than 100 words) of the results you found while crawling that website.
    '''
  unwrap: true
""")
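
The task definition above is written as a Python f-string, so two kinds of braces appear in it. Single braces such as `{SPIDER_API_KEY}` are filled in by Python before the YAML is parsed, while doubled braces such as `{{agent.about}}` come out as literal single braces and are left for the task's own `$ f'''...'''` expression. The fragment below is a reduced illustration of that escaping, not the full task:

```python
SPIDER_API_KEY = "YOUR_SPIDER_API_KEY"  # placeholder, as in the cell above

# Single braces are substituted now; doubled braces become literal single braces.
snippet = f"""
setup:
  spider_api_key: "{SPIDER_API_KEY}"
prompt: "You are {{agent.about}}"
"""

print(snippet)
```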

<span style="color:olive;">Notes:</span>
- `unwrap: true` in the prompt step returns only `choices[0].message.content` from the model's response instead of the full response object.
- The `$` sign marks a value as a Python expression to be evaluated rather than a literal string.
- The `_` refers to the output of the previous step.
- The `steps[index].input` refers to the input of the step at `index`.
- The `steps[index].output` refers to the output of the step at `index`.
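
To make the evaluate step concrete, here is the same join expression run in plain Python outside Julep. The sample data is made up; it only mimics the shape of the crawler output shown later in this notebook (a `result` list whose entries have a `content` field).

```python
# Made-up stand-in for the crawler step's output.
previous_output = {
    "result": [
        {"url": "https://example.com/", "content": "Home page text."},
        {"url": "https://example.com/about", "content": "About page text."},
    ]
}

# Inside the task, `_` is the previous step's output.
_ = previous_output
documents = " ".join(page["content"] for page in _["result"])

print(documents)
```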

In [100]:
# creating the task object
task = client.tasks.create_or_update(
    task_id=TASK_UUID,
    agent_id=AGENT_UUID,
    **task_def
)

### Creating an Execution

An execution is a single run of a task. It is a way to run a task with a specific set of inputs.

To learn more about executions, please refer to the `Executions` section in [Julep Concepts](https://docs.julep.ai/docs/concepts/execution).

In [101]:
# creating an execution object
execution = client.executions.create(
    task_id=TASK_UUID,
    input={
        "url": "https://docs.julep.ai"
    }
)

## Checking execution details and output

There are multiple ways to get the execution details and the output:

1. **Get Execution Details**: This method retrieves the details of the execution, including the output of the last transition that took place.

2. **List Transitions**: This method lists every task step executed so far. Transitions are returned in reverse chronological order, so for a successful execution the final output is the output of the first transition in the list, which should have the type `finish`.


<span style="color:olive;">Note: You need to wait for a few seconds for the execution to complete before you can get the final output, so feel free to run the following cells multiple times until you get the final output.</span>


In [111]:
# Get execution details
execution = client.executions.get(execution.id)
# Print the output
print(execution.output)

Julep is a platform for creating sophisticated AI agents that manage complex tasks using long-term memory and multi-step processes. The documentation offers guidance on getting started, using the Julep CLI, understanding core concepts like agents and tasks, and integrating external tools and APIs. Key features include persistent AI agents, stateful sessions, and support for Retrieval-Augmented Generation (RAG). Users can explore tutorials, API references, and SDKs, with additional resources for developers, enterprises, and researchers. The site emphasizes robust task management, easy integration, and extensive support through community links and detailed guides.


In [112]:
# Lists all the task steps that have been executed up to this point in time
transitions = client.executions.transitions.list(execution_id=execution.id).items

# Transitions are retrieved in reverse chronological order
for transition in reversed(transitions):
    print("Transition type: ", transition.type)
    print("Transition output: ", transition.output)
    print("-"*50)

Transition type:  init
Transition output:  {'url': 'https://docs.julep.ai'}
--------------------------------------------------
Transition type:  step
Transition output:  {'result': [{'url': 'https://docs.julep.ai/', 'costs': {'ai_cost': 0, 'file_cost': 0.0002, 'total_cost': 0.0004, 'compute_cost': 0.0001, 'transform_cost': 0.0001, 'bytes_transferred_cost': 0}, 'error': None, 'status': 200, 'content': '[Julephome page](https://docs.julep.ai/)v1\nSearch Julep docs\n* [Community](https://discord.gg/p4c7ehs4vD)\n* [Changelog](https://github.com/julep-ai/julep/blob/dev/CHANGELOG.md)\n##### Get Started\n* [\nWelcome to Julep\n](https://docs.julep.ai/introduction/julep)\n* [\nInstallation\n](https://docs.julep.ai/introduction/install)\n* [\nQuick Start\n](https://docs.julep.ai/introduction/quickstart)\n##### Julep CLI\n* [\nGetting Started\n](https://docs.julep.ai/julepcli/introduction)\n* [\nCommand Reference\n](https://docs.julep.ai/julepcli/commands)\n##### Core Concepts\n* [\nAgents\n](ht
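
As a sketch of the second approach, the helper below picks the final output out of a transitions list by looking for the `finish` type instead of relying on position. `final_output` is a hypothetical helper, and the sample objects are stand-ins that only carry the `type` and `output` attributes used above.

```python
from types import SimpleNamespace

def final_output(transitions):
    """Return the output of the `finish` transition, or None if there is none yet."""
    for transition in transitions:
        if transition.type == "finish":
            return transition.output
    return None

# Stand-in transitions, newest first, mirroring the order described above.
sample = [
    SimpleNamespace(type="finish", output="A short summary."),
    SimpleNamespace(type="step", output={"documents": "..."}),
    SimpleNamespace(type="step", output={"result": []}),
    SimpleNamespace(type="init", output={"url": "https://docs.julep.ai"}),
]

print(final_output(sample))
```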

### Running the same task with a different URL

We reuse the same task and create a new execution with a different URL.

In [129]:
execution = client.executions.create(
    task_id=TASK_UUID,
    input={
        "url": "https://www.producthunt.com/"
    }
)

In [130]:
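import time

execution = client.executions.get(execution.id)

# Poll until the execution reaches a terminal state. Checking only for
# "succeeded" would loop forever if the execution failed or was cancelled.
while execution.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(5)
    execution = client.executions.get(execution.id)
    print("Execution status: ", execution.status)
    print("-"*50)

if execution.status != "succeeded":
    raise RuntimeError(f"Execution ended with status: {execution.status}")

# Print the summary one sentence per line
print("\n".join(execution.output.split(". ")))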

Execution status:  running
--------------------------------------------------
Execution status:  succeeded
--------------------------------------------------
Product Hunt is a platform to discover and launch new tech products
It features daily, weekly, and monthly top product lists, showcasing innovations in AI, productivity, developer tools, and more
Today's top product is "Boxo," a tool for deploying AI features in mobile apps
Other highlights include AI-powered email and research tools like Mailgo and Claude Research
Promoted products include discounts on Intercom and tools like Sider for AI research
The site also offers community forums, newsletters, and a marketplace for the latest tech launches and discussions in various tech categories.
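
A polling loop can also be given an upper time limit so that a stuck execution does not block the notebook indefinitely. The sketch below is generic: `wait_for_terminal` is a hypothetical helper, and the set of terminal statuses is an assumption about the values an execution can end in.

```python
import time

# Assumed terminal statuses.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_terminal(get_status, interval=5.0, timeout=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call `get_status` until it returns a terminal status or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(interval)

# Demonstration with canned statuses and no real sleeping.
canned = iter(["running", "running", "succeeded"])
print(wait_for_terminal(lambda: next(canned), sleep=lambda seconds: None))
```

With the client from this notebook, the status callable could be `lambda: client.executions.get(execution.id).status`.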


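<span style="color:olive;">Note: you can get the crawled content by reading the output of the corresponding transition in the transitions list. Transitions are listed newest first, so index 1 in the example below is the evaluate step that produced `documents`.</span>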

Example:

In [131]:
transitions = client.executions.transitions.list(execution_id=execution.id).items

transitions[1].output

{'documents': '[](https://www.producthunt.com/)\n[Subscribe](https://www.producthunt.com/newsletters?ref=header_nav&amp;campaign=weekly_newsletter&amp;source=header_nav)\nSign in\nWelcome to Product Hunt!\nThe place to launch and discover new tech products.Take a tour.\n# Top Products Launching Today\n[](https://www.producthunt.com/posts/boxo-2)\n[1. Boxo](https://www.producthunt.com/posts/boxo-2)[Quickly deploy AI-generated features in your mobile app](https://www.producthunt.com/posts/boxo-2)\n[Productivity](https://www.producthunt.com/topics/productivity)•[Developer Tools](https://www.producthunt.com/topics/developer-tools)•[Artificial Intelligence](https://www.producthunt.com/topics/artificial-intelligence)\n56\n254\n[](https://www.producthunt.com/posts/mailgo-3)\n[2. Mailgo](https://www.producthunt.com/posts/mailgo-3)[AI-powered cold email platform that boosts deliverability](https://www.producthunt.com/posts/mailgo-3)\n[Email](https://www.producthunt.com/topics/email)•[Sales](htt
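
Indexing with `transitions[1]` depends on the number and order of steps in the task. A less position-dependent alternative, sketched here with a hypothetical helper and stand-in objects, is to search for the transition whose output contains the key of interest:

```python
from types import SimpleNamespace

def find_output_with_key(transitions, key):
    """Return the first dict output that contains `key`, or None if no transition has it."""
    for transition in transitions:
        if isinstance(transition.output, dict) and key in transition.output:
            return transition.output
    return None

# Stand-in transitions, newest first; real ones come from the transitions list call above.
sample = [
    SimpleNamespace(type="finish", output="A short summary."),
    SimpleNamespace(type="step", output={"documents": "joined page content"}),
    SimpleNamespace(type="step", output={"result": [{"content": "joined page content"}]}),
    SimpleNamespace(type="init", output={"url": "https://www.producthunt.com/"}),
]

print(find_output_with_key(sample, "documents")["documents"])
```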

## Related Concepts

- [Agents](/concepts/agents)
- [Tasks](/concepts/tasks)
- [Tools](/concepts/tools)
