<div align="center" id="top">
 <img src="https://socialify.git.ci/julep-ai/julep/image?description=1&descriptionEditable=API%20for%20AI%20agents%20and%20multi-step%20tasks&forks=1&name=1&owner=1&pattern=Solid&stargazers=1&font=Source%20Code%20Pro&logo=https%3A%2F%2Fraw.githubusercontent.com%2Fjulep-ai%2Fjulep%2Fdev%2F.github%2Fjulep-logo.svg&theme=Auto" alt="julep" width="640" height="320" />
</div>

<p align="center">
  <br />
  <a href="https://docs.julep.ai" rel="dofollow">Explore Docs (wip)</a>
  ·
  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>
  ·
  <a href="https://x.com/julep_ai" rel="dofollow">𝕏</a>
  ·
  <a href="https://www.linkedin.com/company/julep-ai" rel="dofollow">LinkedIn</a>
</p>

<p align="center">
    <a href="https://www.npmjs.com/package/@julep/sdk"><img src="https://img.shields.io/npm/v/%40julep%2Fsdk?style=social&amp;logo=npm&amp;link=https%3A%2F%2Fwww.npmjs.com%2Fpackage%2F%40julep%2Fsdk" alt="NPM Version"></a>
    <span>&nbsp;</span>
    <a href="https://pypi.org/project/julep"><img src="https://img.shields.io/pypi/v/julep?style=social&amp;logo=python&amp;label=PyPI&amp;link=https%3A%2F%2Fpypi.org%2Fproject%2Fjulep" alt="PyPI - Version"></a>
    <span>&nbsp;</span>
    <a href="https://hub.docker.com/u/julepai"><img src="https://img.shields.io/docker/v/julepai/agents-api?sort=semver&amp;style=social&amp;logo=docker&amp;link=https%3A%2F%2Fhub.docker.com%2Fu%2Fjulepai" alt="Docker Image Version"></a>
    <span>&nbsp;</span>
    <a href="https://choosealicense.com/licenses/apache/"><img src="https://img.shields.io/github/license/julep-ai/julep" alt="GitHub License"></a>
</p>

## Task Definition: Video Processing with Natural Language

### Overview

This task leverages the Cloudinary `integration` tool and combines it with a prompt step to convert natural language instructions into transformations and apply them to a given video.

### Task Tools:

**Cloudinary**: An `integration` type tool that can upload and transform media files.

### Task Input:

**video_url**: The URL of the video to transform.

**public_id**: The public id of the video to transform.

**transformation_prompt**: The natural language instructions to apply to the video.

### Task Output:

**transformed_video_url**: The URL of the transformed video.

### Task Flow

1. **Input**: The user provides a URL to a video and transformation instructions (in natural language) to apply to the video.

2. **Cloudinary Tool Integration**: The `cloudinary_upload` tool is called to upload the video to Cloudinary.

3. **Prompt Step**: The prompt step converts the natural language instructions into a JSON list of transformation instructions compatible with Cloudinary's API. In this step, `gemini-1.5-pro` is used as the model due to its ability to read video files.

4. **Cloudinary Tool Integration**: The `cloudinary_upload` tool is called again to apply the transformation instructions to the video.

5. **Output**: The final output is the URL of the transformed video.

```plaintext
+----------+     +------------+     +------------+     +-----------+
|  Video   |     | Cloudinary |     |  Prompt    |     | Processed |
|  Input   | --> |  Upload    | --> |  Step      | --> |  Video    |
|          |     |            |     | (Gemini)   |     |           |
+----------+     +------------+     +------------+     +-----------+
      |                |                  |                  |
      |                |                  |                  |
      v                v                  v                  v
"video.mp4 +    Upload video    Generate JSON     "video.mp4 with
NL prompt"                    transformations     blur & overlay"
```
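To make step 3 more concrete, here is a sketch of the kind of transformation list the prompt step might return for a request like the one in the diagram (blur, fades, and a logo overlay). The specific effect values (`blur:300`, `fade:3000`, `fade:-3000`) are assumptions based on Cloudinary's effect syntax, not output captured from this task. The overlay URL is the one used in the notebook's sample prompt.

```python
import json

# Hypothetical model output for the sample request; the values are illustrative only.
example_transformation = [
    {"overlay": {"url": "https://res.cloudinary.com/demo/image/upload/logos/cloudinary_icon_white.png"}},
    {"flags": "layer_apply"},
    {"effect": "blur:300"},
    {"effect": "fade:3000"},   # assumed: positive value fades in over 3000 ms
    {"effect": "fade:-3000"},  # assumed: negative value fades out over 3000 ms
]

# The task passes a list shaped like this as `upload_params.transformation`.
serialized = json.dumps(example_transformation, indent=2)
print(serialized)
```

The list must be valid JSON (no trailing commas or comments), because the task parses the model's reply before handing it to the second upload call.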

## Implementation

To recreate the notebook and see the code implementation for this task, you can access the Google Colab notebook using the link below:

<a target="_blank" href="https://colab.research.google.com/github/julep-ai/julep/blob/dev/cookbooks/05-video-processing-with-natural-language.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Additional Information

For more details about the task or if you have any questions, please don't hesitate to contact the author:

**Author:** Julep AI  
**Contact:** [hey@julep.ai](mailto:hey@julep.ai) or  <a href="https://discord.com/invite/JTSBGRZrzj" rel="dofollow">Discord</a>

### Installing the Julep Client

In [1]:
!pip install --upgrade julep --quiet

In [2]:
import uuid

# NOTE: these UUIDs let us use the `create_or_update` methods instead of
# the `create` methods, so that re-running a cell does not create new resources.
AGENT_UUID = uuid.uuid4()
TASK_UUID = uuid.uuid4()
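Note that `uuid.uuid4()` returns a new random value on every call, so re-running the cell above (or restarting the kernel) yields new IDs. If IDs that survive restarts are wanted, one option is to generate them once and save them to a small file. The sketch below is not part of the original notebook; the helper name `load_or_create_uuid` and the file name in the usage comment are hypothetical.

```python
import json
import uuid
from pathlib import Path


def load_or_create_uuid(name: str, store: Path) -> uuid.UUID:
    """Return the UUID saved under `name` in `store`, creating and saving one if absent."""
    ids = json.loads(store.read_text()) if store.exists() else {}
    if name not in ids:
        ids[name] = str(uuid.uuid4())
        store.write_text(json.dumps(ids, indent=2))
    return uuid.UUID(ids[name])


# Possible usage (hypothetical file name):
# AGENT_UUID = load_or_create_uuid("agent", Path("julep_ids.json"))
# TASK_UUID = load_or_create_uuid("task", Path("julep_ids.json"))
```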

### Creating Julep Client with the API Key

In [3]:
from julep import Client
import os

api_key = os.getenv("JULEP_API_KEY")

# Create a Julep client
client = Client(api_key=api_key, environment="dev")
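`os.getenv` returns `None` when a variable is unset, and an f-string renders `None` as the literal text `None`, which matters for the Cloudinary credentials interpolated into the task definition later. A minimal sketch of a stricter lookup follows; `require_env` is a hypothetical helper, not part of the Julep SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, raising if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage:
# api_key = require_env("JULEP_API_KEY")
# cloudinary_api_key = require_env("CLOUDINARY_API_KEY")
```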

## Creating an "agent"

An agent is the object to which LLM settings, such as the model and temperature, and tools are scoped.

To learn more about the agent, please refer to the [documentation](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#agent).

In [4]:
# Create agent
agent = client.agents.create_or_update(
    agent_id=AGENT_UUID,
    name="Spiderman",
    about="AI that can crawl the web and extract data",
    model="gpt-4o",
)

## Defining a Task

Tasks in Julep are GitHub Actions-style workflows that define long-running, multi-step actions.

You can use them to conduct complex actions by defining them step-by-step.

To learn more about tasks, please refer to the `Tasks` section in [Julep Concepts](https://github.com/julep-ai/julep/blob/dev/docs/julep-concepts.md#tasks).

In [17]:
import yaml

cloudinary_api_key = os.getenv("CLOUDINARY_API_KEY")
cloudinary_api_secret = os.getenv("CLOUDINARY_API_SECRET")
cloudinary_cloud_name = os.getenv("CLOUDINARY_CLOUD_NAME")

# Define the task
task_def = yaml.safe_load(f"""
name: Video Processing With Natural Language

input_schema:
  type: object
  properties:
    video_url:
      type: string
      description: The url of the file to upload
    public_id:
      type: string
      description: The public id of the file to upload
    transformation_prompt:
      type: string
      description: The prompt for the transformations to apply to the file

tools:
- name: cloudinary_upload
  type: integration
  integration:
    provider: cloudinary
    method: media_upload
    setup:
      cloudinary_api_key: "{cloudinary_api_key}"
      cloudinary_api_secret: "{cloudinary_api_secret}"
      cloudinary_cloud_name: "{cloudinary_cloud_name}"

main:
- tool: cloudinary_upload
  arguments:
    file: '_0.video_url'
    public_id: '_0.public_id'
    upload_params:
      resource_type: "'video'"

- prompt:
  - role: user
    content:

      - type: text
        text: |-
          You are a Cloudinary expert. You are given a media url. It might be an image or a video.
          You need to come up with a json of transformations to apply to the given media.
          Overall the json could have multiple transformation json objects.
          Each transformation json object can have multiple key value pairs.
          Each key value pair should have the key as the transformation name like "aspect_ratio", "crop", "width" etc and the value as the transformation parameter value.
          Given below is an example of a transformation json list. Don't provide explanations and/or comments in the json.
          ```json
          [
            {{
              "aspect_ratio": "1.0",
              "width": 250
            }},
            {{
              "fetch_format": "auto"
            }},
            {{
              "overlay":
              {{
                "url": "<image_url>"
              }}
            }},
            {{
              "flags": "layer_apply"
            }}
          ]
          ```
      - type: image_url
        image_url:
          url: "{{{{_.url}}}}"

      - type: text
        text: |-
          Hey, check the video above, I need to apply the following transformations using cloudinary.
          {{{{_0.transformation_prompt}}}}

  unwrap: true
  settings:
    model: gemini/gemini-1.5-pro

# Extract the json from the model's response
- evaluate:
    model_transformation: load_json(
        _[_.find("```json")+7:][:_[_.find("```json")+7:].find("```")])

- tool: cloudinary_upload
  arguments:
    file: '_0.video_url'
    public_id: '_0.public_id'
    upload_params:
      transformation: '_.model_transformation'
      resource_type: "'video'"

- evaluate:
    transformed_video_url: '_.url'
""")

<span style="color:olive;">Notes:</span>
- The quadruple curly braces `{{{{}}}}` around the Jinja templates are needed because the task definition is a Python f-string: each `{{` is rendered as a single `{`, so `{{{{_.url}}}}` reaches Julep as `{{_.url}}`. [More information here](https://stackoverflow.com/questions/64493332/jinja-templating-in-airflow-along-with-formatted-text)
- `unwrap: true` in the prompt step unwraps the model output, returning `choices[0].message.content` directly instead of the full response object.
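Both notes can be checked locally. The sketch below first shows how the f-string collapses the quadruple braces, then mirrors the slicing done in the task's `evaluate` step on an unwrapped reply. It is an approximation under stated assumptions: `json.loads` stands in for the task's `load_json`, `extract_json_block` and `sample_reply` are hypothetical, and unlike the task expression it raises when no fenced block is found.

```python
import json

# 1) Brace escaping: inside an f-string, "{{" renders as "{", so four braces
#    leave the two that the template engine expects.
rendered = f'url: "{{{{_.url}}}}"'
print(rendered)  # url: "{{_.url}}"

# 2) Extracting the fenced JSON from the unwrapped model reply.
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def extract_json_block(reply: str):
    """Return the parsed contents of the first json-fenced block in `reply`."""
    marker = FENCE + "json"
    start = reply.find(marker)
    if start == -1:
        raise ValueError("no json-fenced block found in reply")
    body = reply[start + len(marker):]
    end = body.find(FENCE)
    if end == -1:
        raise ValueError("json-fenced block is not closed")
    return json.loads(body[:end])


# Hypothetical reply, shaped like the example given in the prompt.
sample_reply = f'Here you go:\n{FENCE}json\n[{{"effect": "blur:300"}}]\n{FENCE}\n'
print(extract_json_block(sample_reply))  # [{'effect': 'blur:300'}]
```

Raising on a missing fence is a deliberate difference: with plain slicing, `find` returning `-1` would silently cut the reply at the wrong offset.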


## Creating/Updating a task

In [18]:
# creating the task object
task = client.tasks.create_or_update(
    task_id=TASK_UUID,
    agent_id=AGENT_UUID,
    **task_def
)

## Creating an Execution

An execution is a single run of a task. It is a way to run a task with a specific set of inputs.

In [43]:
# creating an execution object
transformation_prompt = """
1- I want to add an overlay an the following image to the video, and apply a layer apply flag also. Here's the image url:
https://res.cloudinary.com/demo/image/upload/logos/cloudinary_icon_white.png

2- I also want you to to blur the video, and add a fade in and fade out effect to the video with a duration of 3 seconds each.
"""

input_video_url = "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4"

execution = client.executions.create(
    task_id=TASK_UUID,
    input={
        "video_url":  input_video_url,
        "public_id": "video_test2",
        "transformation_prompt": transformation_prompt,
    }
)
execution.id

'1021a8d6-5050-48c3-b23e-6d96578d1026'

## Checking execution details and output

There are multiple ways to get the execution details and the output:

1. **Get Execution Details**: This method retrieves the details of the execution, including the output of the last transition that took place.

2. **List Transitions**: This method lists all the task steps that have been executed so far. Transitions are returned in reverse chronological order, so for a successful execution the final output is the output of the first transition in the list, which should have the type `finish`.
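The second approach can be wrapped in a small helper that returns the output of the `finish` transition, or `None` while the run is still in progress. This is a sketch: `final_output` is a hypothetical name, and the stand-in objects only mimic the `.type` and `.output` attributes used in this notebook.

```python
from types import SimpleNamespace


def final_output(transitions):
    """Return the output of the `finish` transition, or None if there is none yet."""
    for transition in transitions:
        if transition.type == "finish":
            return transition.output
    return None


# Stand-in transitions in reverse chronological order (illustrative values).
fake_transitions = [
    SimpleNamespace(type="finish", output={"transformed_video_url": "https://example.com/out.mp4"}),
    SimpleNamespace(type="step", output={}),
    SimpleNamespace(type="init", output={}),
]
print(final_output(fake_transitions)["transformed_video_url"])
```

Searching by type instead of indexing position `0` avoids reading a `step` output by mistake while the run is still in progress.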


<span style="color:olive;">Note: You need to wait for a few seconds for the execution to complete before you can get the final output, so feel free to run the following cells multiple times until you get the final output.</span>
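Instead of re-running cells by hand, the wait can be automated with a small polling loop. The sketch below is generic and hedged: it assumes the object returned by the fetch callable has a `.status` attribute, and the terminal status names are assumptions to check against the Julep documentation.

```python
import time

# Assumed names of terminal statuses; verify against the Julep docs.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_execution(fetch, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Call `fetch()` until the result's `.status` is terminal or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        execution = fetch()
        if execution.status in TERMINAL_STATUSES:
            return execution
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still {execution.status!r} after {timeout}s")
        sleep(interval)


# Possible usage with the client created earlier:
# execution = wait_for_execution(lambda: client.executions.get(execution.id))
```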


In [37]:
# Get execution details
execution = client.executions.get(execution.id)
# Print the output
print(execution.output)

{'base64': None, 'meta_data': {'api_key': '518455844981529', 'asset_folder': '', 'asset_id': '069d70c84e88ee9293a1d753b3ca3898', 'audio': {'bit_rate': '191999', 'channel_layout': 'stereo', 'channels': 2, 'codec': 'aac', 'frequency': 44100}, 'bit_rate': 1197518, 'bytes': 2252313, 'created_at': '2024-11-20T08:55:44Z', 'display_name': 'video_test2', 'duration': 15.046531, 'etag': '9e87559cba36e6d07f7435c2a8081b3a', 'format': 'mp4', 'frame_rate': 24.0, 'height': 720, 'is_audio': False, 'nb_frames': 361, 'original_filename': 'ForBiggerMeltdowns', 'overwritten': True, 'pages': 0, 'placeholder': False, 'playback_url': 'https://res.cloudinary.com/dpnjjk8mb/video/upload/sp_auto/v1732200108/video_test2.m3u8', 'resource_type': 'video', 'rotation': 0, 'signature': 'e65a0d39a282fc8bd4dd3288d7d35dfffdadc2a8', 'tags': [], 'type': 'upload', 'url': 'http://res.cloudinary.com/dpnjjk8mb/video/upload/v1732200108/video_test2.mp4', 'version': 1732200108, 'version_id': 'a9fccdf7b310edb3c2ff5ffe9335f7f8', 'vi

In [45]:
# Lists all the task steps that have been executed up to this point in time
transitions = client.executions.transitions.list(execution_id=execution.id).items

# Transitions are retrieved in reverse chronological order
for transition in reversed(transitions):
    print("Transition type: ", transition.type)
    print("Transition output: ", transition.output)
    print("-"*50)

if transitions[0].type == "finish":
    transformed_video_url = transitions[0].output['transformed_video_url']

Transition type:  init
Transition output:  {'public_id': 'video_test2', 'transformation_prompt': "\n1- I want to add an overlay an the following image to the video, and apply a layer apply flag also. Here's the image url:\nhttps://res.cloudinary.com/demo/image/upload/logos/cloudinary_icon_white.png\n\n2- I also want you to to blur the video, and add a fade in and fade out effect to the video with a duration of 3 seconds each.\n", 'video_url': 'http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4'}
--------------------------------------------------
Transition type:  step
Transition output:  {'base64': None, 'meta_data': {'api_key': '518455844981529', 'asset_folder': '', 'asset_id': '069d70c84e88ee9293a1d753b3ca3898', 'audio': {'bit_rate': '191999', 'channel_layout': 'stereo', 'channels': 2, 'codec': 'aac', 'frequency': 44100}, 'bit_rate': 1197518, 'bytes': 2252313, 'created_at': '2024-11-20T08:55:44Z', 'display_name': 'video_test2', 'duration': 15.0465

## Video Before Transformation

In [46]:
from IPython.display import Video

Video(input_video_url)

## Video After Transformation

In [47]:
from IPython.display import Video

Video(transformed_video_url)