# Customizing VSS Blueprint

In this exercise, we will create an agent to assist with bridge inspections. The agent will identify issues from drone footage that shows the condition of the bridge. It should differentiate cosmetic issues from structural ones, and also report vandalism and landscaping issues.

**Remember:** For a quick refresher, go back and check [Intro_To_VSS.ipynb](Intro_To_VSS.ipynb).



## Environment Setup

But first, we load the standard imports, settings, and helpers from the last exercise.

In [1]:
# python imports
from IPython.display import Markdown, Video
from pathlib import Path
import requests
from typing import Literal, Any
from urllib.parse import urljoin

import wizards

In [2]:
# paths and locations
VSS_URL = "http://via-server:8100"
BRIDGE_VIDEO = Path("./bridge.mp4")

In [3]:
# helper functions for talking to the API
def check_response(response):
    """Print the status code, raise on HTTP errors, and return the parsed body."""
    print(f"Response Code: {response.status_code}")
    response.raise_for_status()
    try:
        return response.json()
    except requests.exceptions.JSONDecodeError:
        return response.text

def vss_api_call(
        target: str,
        params: None | dict[str, str] = None,
        verb: Literal["get", "post"] = "get",
        json: Any = None,
        files: Any = None,
        data: Any = None,
    ) -> dict[str, Any] | str:
    """Query the VIA Agent API and handle the response."""
    # determine the full url
    url = urljoin(VSS_URL, target)

    # query the api
    if verb == "get":
        response = requests.get(url, params=params)
    elif verb == "post":
        response = requests.post(url, params=params, json=json, data=data, files=files)
    else:
        raise ValueError("Verb must be either get or post.")

    return check_response(response)
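
`vss_api_call` builds its URL with `urljoin`, whose resolution rules are worth knowing before changing `VSS_URL`. The sketch below uses only the standard library. The `/api/v1` base path is a hypothetical example and is not part of this workshop's server.

```python
from urllib.parse import urljoin

# Base URL from this notebook: it has no path, so the target is appended.
print(urljoin("http://via-server:8100", "models"))
# http://via-server:8100/models

# Hypothetical base with a path and no trailing slash:
# the last path segment is replaced by the target.
print(urljoin("http://via-server:8100/api/v1", "models"))
# http://via-server:8100/api/models

# With a trailing slash, the target is appended under the base path.
print(urljoin("http://via-server:8100/api/v1/", "models"))
# http://via-server:8100/api/v1/models
```

If the base URL ever gains a path prefix, keeping a trailing slash on it and passing targets without a leading slash avoids silently dropping that prefix.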

## Inspect drone footage

In [4]:
Video(BRIDGE_VIDEO, width=350)

## Query the available models

Create a variable called `models_response` that holds the list of available models.

<details>
    <summary><b>💢 Stuck?</b></summary>

```python
models_response = vss_api_call("models")
```
    
</details>

In [5]:
# askme
models_response = vss_api_call("models")

Response Code: 200
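
The request body built later in this notebook reads the model name from `models_response["data"][0]["id"]`. As a minimal sketch, assuming the response is a dictionary with a `data` list whose entries each carry an `id`, the available IDs could be listed as shown below. The helper `list_model_ids`, the sample payload, and the model name are made up for illustration.

```python
from typing import Any


def list_model_ids(response: dict[str, Any]) -> list[str]:
    """Collect the `id` of every entry under the `data` key."""
    return [entry["id"] for entry in response.get("data", [])]


# Hypothetical payload that only mirrors the assumed response shape.
sample_response = {"data": [{"id": "example-vlm"}]}
print(list_model_ids(sample_response))
```

In the notebook, the same helper could be called on `models_response` to confirm which model the first entry refers to.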


## Upload the video file

For the blueprint to access the video file, we must first upload it. This is a simple POST request to the `files` endpoint.

In [6]:
with BRIDGE_VIDEO.open("rb") as file:
    # setup query data
    files = {"file": (BRIDGE_VIDEO.name, file)}
    data = {"purpose":"vision", "media_type":"video"}

    # query the vss api
    upload_response = vss_api_call(
        "files",
        verb="post",
        files=files,
        data=data
    )

upload_response

Response Code: 200


{'id': '899f58fe-739f-43c0-b2b9-1d191f073bb7',
 'bytes': 112950948,
 'filename': 'bridge.mp4',
 'purpose': 'vision',
 'media_type': 'video'}
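
The upload response includes `filename` and `bytes` fields. A quick sanity check, sketched below, compares them with the local file. It assumes that `bytes` reports the size of the stored file, which the output above suggests but does not confirm. The helper name `upload_matches_local` is hypothetical.

```python
from pathlib import Path
from typing import Any


def upload_matches_local(path: Path, response: dict[str, Any]) -> bool:
    """Compare the local file's name and size with the upload response."""
    return (
        response.get("filename") == path.name
        and response.get("bytes") == path.stat().st_size
    )
```

In the notebook this could be called as `upload_matches_local(BRIDGE_VIDEO, upload_response)` right after the upload cell.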

## Customize Prompts

We will also need tailored prompts. 
First, write a `prompt` that instructs the VLM what to look for.

Remember, these usually have three parts:
1. Persona
2. Details
3. Format

<details>
    <summary><b>💢 Stuck?</b></summary>

```python
prompt = (
    "You are a bridge inspector. "
    "Describe any concerns you have with this bridge in detail. "
    "Your concerns may be: "
    "cosmetic, structural, landscaping, or vandalism in nature. "
    "Start each event description with a start and end time stamp of the event."
)
```
    
</details>

In [7]:
# askme
prompt = (
    "You are a bridge inspector. "
    "Describe any concerns you have with this bridge in detail. "
    "Your concerns may be: "
    "cosmetic, structural, landscaping, or vandalism in nature. "
    "Start each event description with a start and end time stamp of the event."
)
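
The three-part structure (persona, details, format) can also be made explicit in code. The sketch below is one possible way to do it: `build_prompt` is a hypothetical helper, not part of the VSS API, and it joins the parts with single spaces so that sentences never run together.

```python
def build_prompt(persona: str, details: str, output_format: str) -> str:
    """Join the three prompt parts with single spaces, skipping empty parts."""
    parts = (persona, details, output_format)
    return " ".join(part.strip() for part in parts if part.strip())


prompt_sketch = build_prompt(
    persona="You are a bridge inspector.",
    details=(
        "Describe any concerns you have with this bridge in detail. "
        "Your concerns may be: "
        "cosmetic, structural, landscaping, or vandalism in nature."
    ),
    output_format=(
        "Start each event description with a start and end time stamp of the event."
    ),
)
print(prompt_sketch)
```

Keeping the parts separate makes it easier to change one of them, for example the output format, while leaving the others fixed between test runs.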

Next, we write the `caption_summarization_prompt`. This prompt rarely changes, so we can start with the default.

In [8]:
caption_summarization_prompt = (
    "If any descriptions have the same meaning and are sequential "
    "then combine them under one sentence and merge the time stamps "
    "to a range. Format the timestamps as 'mm:ss'"
)
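
The prompt above asks for timestamps in `'mm:ss'`. When comparing model output against raw second offsets, a small converter can help. This is only an illustrative sketch. The helper names `to_mm_ss` and `range_to_mm_ss` are hypothetical, and fractions of a second are truncated.

```python
def to_mm_ss(seconds: float) -> str:
    """Format a duration in seconds as 'mm:ss', truncating fractions."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def range_to_mm_ss(start: float, end: float) -> str:
    """Format a start/end pair of second offsets as 'mm:ss-mm:ss'."""
    return f"{to_mm_ss(start)}-{to_mm_ss(end)}"


print(range_to_mm_ss(150.0, 177.0))  # 02:30-02:57
```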

Finally, we write the `summary_aggregation_prompt`, which is used to generate the final summary. Best practice is for this prompt to reiterate which details the summary must include, along with any formatting requirements.

<details>
    <summary><b>💢 Stuck?</b></summary>

```python
summary_aggregation_prompt = (
    "Based on the available information, generate a report "
    "that describes the condition of the bridge. The report "
    "should be organized into cosmetic, structural, vegetation "
    "overgrowth, and vandalism. The report must include timestamps. "
    "This should be a concise, yet descriptive summary of all "
    "the important events. The format should be intuitive and "
    "easy for a user to understand what happened. Format the "
    "output in Markdown so it can be displayed nicely."
)
```
    
</details>

In [9]:
# askme
summary_aggregation_prompt = (
    "Based on the available information, generate a report "
    "that describes the condition of the bridge. The report "
    "should be organized into cosmetic, structural, vegetation "
    "overgrowth, and vandalism. The report must include timestamps. "
    "This should be a concise, yet descriptive summary of all "
    "the important events. The format should be intuitive and "
    "easy for a user to understand what happened. Format the "
    "output in Markdown so it can be displayed nicely."
)

Everything is now in place for some basic testing. Simple starting parameters are good enough to test the prompts.

In [10]:
body = {
    "id": upload_response.get("id"),
    "prompt": prompt,
    "caption_summarization_prompt": caption_summarization_prompt,
    "summary_aggregation_prompt": summary_aggregation_prompt,
    "model": models_response["data"][0]["id"],
    "max_tokens": 1024,
    "temperature": 0.4,
    "top_p": 1,
    "seed": 1,
    "num_frames_per_chunk": 10,
    "chunk_duration": 30,
    "chunk_overlap_duration": 0,
    "enable_chat": False
}

In [11]:
summarize_response = vss_api_call("summarize", verb="post", json=body)
print(summarize_response["choices"][0]["message"]["content"])

Response Code: 200
**Bridge Condition Report**

### Cosmetic

* 30.0-57.0: The bridge has a lot of rust on it.
* 60.0-87.0: Rusting, graffiti, and stains on the concrete are present.
* 120.0-147.0: Rust on the steel beam, graffiti, and stains on the concrete are present.
* 150.0-177.0: Rust on the metal beam, dirt and debris on the concrete surface, and graffiti on the right side are present.

### Structural

* 60.0-87.0: No visible signs of structural damage or significant wear.
* 120.0-147.0: No visible signs of structural damage or significant wear.
* 150.0-177.0: No visible signs of structural damage or significant wear.

### Vegetation Overgrowth

* 60.0-87.0: Vegetation growth is present.
* 120.0-147.0: Vegetation growth is present.
* 150.0-177.0: Vegetation growth on the left side is present.

### Vandalism

* 60.0-87.0: Graffiti is present.
* 120.0-147.0: Graffiti is present.
* 150.0-177.0: Graffiti on the right side is present.

### Disrepair

* 90.0-117.0: The bridge is in a 

How was that response? Does it include the right kind of information? If so, continue on to tuning additional parameters. If not, revisit your prompts and try again.

## Chunking Parameters

The chunking parameters are very important to tune for each use case.

![Chunking Diagram](chunk_duration.png)

- `num_frames_per_chunk` controls how many still frames will be selected from each chunk for analysis. These frames are the input to the VLM for building an understanding of the video.

- `chunk_overlap_duration` can also be added to the summarization request to configure overlap between chunks. This increases accuracy at chunk boundaries.

- `chunk_duration` (also known as chunk size) is very important to tune based on the use case. The chunk size determines the temporal granularity at which the VLM will view the video.


Longer chunks result in fewer frames being sent to the VLM for analysis, which reduces the granularity of your summaries. However, longer chunks increase the odds that important events happen away from chunk boundaries. They also decrease compute time.
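
To make the trade-off concrete, the arithmetic can be sketched in a few lines. This is a simplified model, not the blueprint's actual chunking code. It assumes each chunk starts `chunk_duration - chunk_overlap_duration` seconds after the previous one and that the last chunk is clipped at the end of the video. The 180-second video length is also an assumption used only for illustration.

```python
def chunk_windows(
    video_seconds: int,
    chunk_duration: int,
    chunk_overlap_duration: int,
) -> list[tuple[int, int]]:
    """Return (start, end) windows under the assumed chunking model."""
    if chunk_overlap_duration >= chunk_duration:
        raise ValueError("Overlap must be smaller than the chunk duration.")
    stride = chunk_duration - chunk_overlap_duration
    windows = []
    start = 0
    while start < video_seconds:
        end = min(start + chunk_duration, video_seconds)
        windows.append((start, end))
        if end >= video_seconds:
            break
        start += stride
    return windows


def total_frames(windows: list[tuple[int, int]], num_frames_per_chunk: int) -> int:
    """Frames sent to the VLM: one fixed-size sample per chunk."""
    return len(windows) * num_frames_per_chunk


VIDEO_SECONDS = 180  # assumed length, for illustration only

for duration, overlap in [(30, 0), (60, 0), (35, 5)]:
    windows = chunk_windows(VIDEO_SECONDS, duration, overlap)
    print(duration, overlap, len(windows), total_frames(windows, 10))
```

Under these assumptions, doubling `chunk_duration` from 30 to 60 halves the number of frames analyzed, while adding overlap and lengthening the chunk by the same amount keeps the frame count unchanged.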

To understand how these parameters affect the final output, change the values in this widget. Find some that look good and run the next cell. How were the results impacted?

In [12]:
(num_frames_per_chunk, chunk_duration, chunk_overlap_duration) = \
    wizards.tune_chunks(body)

HBox(children=(VBox(children=(IntSlider(value=10, description='num_frames_per_chunk', layout=Layout(width='500…

In [13]:
# create a summary with selected values
summarize_response = vss_api_call("summarize", verb="post", json=body)
print(summarize_response["choices"][0]["message"]["content"])

Response Code: 200
**Bridge Condition Report**

### Cosmetic

* 30.0-57.0: The bridge has a lot of rust on it.
* 60.0-87.0: Rusting, paint chipping, and graffiti are present.
* 90.0-117.0: The bridge has cracked concrete, peeling paint, and graffiti.
* 120.0-147.0: Rust on the metal beam, graffiti, and concrete surface cracks and stains are visible.
* 150.00-177.00: Rust on the metal beam, dirt and debris on the concrete surface, and graffiti on the right side are present.

### Structural

* 60.0-87.0: No visible signs of structural damage or significant wear.
* 120.0-147.0: No visible signs of structural damage or significant wear.

### Vegetation Overgrowth

* 60.0-87.0: Vegetation growth is present.
* 90.0-117.0: Overgrown vegetation is visible.
* 120.0-147.0: Vegetation growth is present.
* 150.00-177.00: Vegetation growth is present on the left side.

### Vandalism

* 60.0-87.0: Graffiti is present.
* 90.0-117.0: Graffiti is visible.
* 120.0-147.0: Graffiti is present.
* 150.00-17

Is that better or worse? The decision is up to you. If you want some advice, or just want to compare notes, check out these **hints**.

<details>
    <summary><b>HINTS</b></summary>

    Using the recommended prompts and the starting settings, the fidelity of the results is already acceptable. However, the input video has hard breaks every 30 seconds, which produces very siloed responses. To accommodate this without increasing the fidelity more than necessary:
    
    - increase `chunk_overlap_duration` to 5 to overlap video boundaries
    - increase `chunk_duration` to 35 to maintain 60 total frames
</details>

## Inference Time Model Tuning

The summarize API also accepts parameters to control the LLM during summary generation. The important ones to note are:

- `max_tokens` controls the maximum length of the generated summary. This is a hard cutoff: output that reaches the limit is truncated, and the model does not adapt its response to fit. Typically, `1024` is acceptable, but smaller values could result in lower costs.
- `temperature` and `top_p` parameters influence the probabilities when choosing the next output token. Generally, lower values tend towards more consistent, but less insightful, responses.
- `seed` helps control the repeatability of outputs by steering random number generation. Pinning this to a specific value helps us test the changes of other parameters.
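
To build intuition for `temperature` and `top_p`, the sketch below applies both to a toy set of three token scores. It is a textbook illustration of temperature scaling and nucleus (top-p) filtering, not the code used by the model server. The function names and the example scores are made up.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; temperature must be > 0."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept = []
    cumulative = 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {index: prob / total for index, prob in kept}


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for temperature in (0.4, 0.9):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

print(top_p_filter([0.5, 0.3, 0.2], top_p=0.7))
```

A lower temperature concentrates probability on the top-scoring token, and a lower `top_p` removes unlikely tokens from consideration altogether. Both push the output toward more repeatable responses.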

In [14]:
body["max_tokens"] = 1024 
body["temperature"] = 0.9  # typical range: 0 -> 1
body["top_p"] = 0.9  # typical range: 0 -> 1
body["seed"] = 1  # pinned to a constant value

In [15]:
# create a summary with selected values
summarize_response = vss_api_call("summarize", verb="post", json=body)
print(summarize_response["choices"][0]["message"]["content"])

Response Code: 200
**Bridge Condition Report**

**Cosmetic Concerns**
--------------------

* 0:30-0:57: Rust on the bridge
* 1:00-1:27: Rusting, paint chipping, vegetation growth, graffiti, lack of proper lighting
* 2:00-2:27: Rust on the steel beam, graffiti, vegetation growth, stains on the concrete
* 2:30-2:57: Rust on the metal beam, graffiti, vegetation growth on the bridge, dirt and debris on the surface

**Structural Condition**
-----------------------

* 1:00-1:27: No visible signs of structural damage or significant wear
* 2:00-2:27: No visible signs of structural damage or significant wear
* 2:30-2:57: No visible signs of structural damage or significant wear

**Vegetation Overgrowth**
-------------------------

* 1:00-1:27: Vegetation growth
* 1:30-1:57: Overgrown vegetation
* 2:00-2:27: Vegetation growth
* 2:30-2:57: Vegetation growth on the bridge

**Vandalism**
-------------

* 1:00-1:27: Graffiti
* 1:30-1:57: Graffiti
* 2:00-2:27: Graffiti
* 2:30-2:57: Graffiti

**Other

When the model consistently returns desirable results, the blueprint has been fine-tuned!

# 🥳 Complete

The VSS Blueprint has been fully customized. Feel free to save and close this file and return to the workshop instructions.