# Customizing VSS Blueprint

In this exercise, we will create an agent to assist with bridge inspections. This agent will identify issues from drone footage detailing the condition of the bridge. Cosmetic issues should be differentiated from structural issues. Vandalism and landscaping issues should also be reported.

**Remember:** For a quick refresher, go back and check [Intro_To_VSS.ipynb](Intro_To_VSS.ipynb).



## Environment Setup

But first, the standard imports, settings, and helpers from the last exercise. To save space, we will import the helper functions from [helpers.py](helpers.py).

In [1]:
# python imports
from IPython.display import Markdown, Video
from pathlib import Path
import requests
from typing import Literal, Any
from urllib.parse import urljoin

import wizards

In [2]:
# paths and locations
VSS_URL = "http://via-server:8100"
BRIDGE_VIDEO = Path("./assets/bridge.mp4")

In [17]:
# helper functions for talking to the API
from helpers import check_response, vss_api_call, Chat

## Inspect drone footage

In [4]:
Video(BRIDGE_VIDEO, width=350)

## Query the available models

Create a variable called `models_response` that holds the list of available models.

<details>
    <summary><b>💢 Stuck?</b></summary>

```python
models_response = vss_api_call(VSS_URL, "models")
```
    
</details>

In [5]:
models_response = ...

Response Code: 200
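
Later cells read the model name from `models_response["data"][0]["id"]`, so the response appears to be a dictionary with a `data` list of entries that each carry an `id`. Here is a minimal sketch for listing every available ID, assuming that shape. The function name `list_model_ids` and the sample payload are hypothetical and only illustrate the idea.

```python
def list_model_ids(models_response: dict) -> list[str]:
    """Return the `id` of every entry under the response's `data` key."""
    return [entry["id"] for entry in models_response.get("data", [])]


# Hypothetical payload, used only to illustrate the assumed shape.
sample_response = {"data": [{"id": "example-vlm"}]}
print(list_model_ids(sample_response))
```

In the notebook you would call `list_model_ids(models_response)` instead of using the sample payload.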


## Upload the video file

For the blueprint to access the video file, we must first upload it. This is a simple POST to the files API.

In [6]:
with BRIDGE_VIDEO.open("rb") as file:
    # setup query data
    files = {"file": (BRIDGE_VIDEO.name, file)}
    data = {"purpose":"vision", "media_type":"video"}

    # query the vss api
    upload_response = vss_api_call(
        VSS_URL,
        "files",
        verb="post",
        files=files,
        data=data
    )

upload_response

Response Code: 200


{'id': '6cbd65cd-0f3e-4a55-b544-965bdcd9b2ff',
 'bytes': 112950948,
 'filename': 'bridge.mp4',
 'purpose': 'vision',
 'media_type': 'video'}
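
The response echoes the `filename` and the size in `bytes`, which gives a cheap way to confirm that the whole file arrived. Here is a minimal sketch of such a check, assuming those two keys are always present. The function name `upload_matches_file` is hypothetical, and the demo uses a throwaway file rather than the real video.

```python
import tempfile
from pathlib import Path


def upload_matches_file(upload_response: dict, path: Path) -> bool:
    """Check that the reported filename and byte count match the local file."""
    return (
        upload_response.get("filename") == path.name
        and upload_response.get("bytes") == path.stat().st_size
    )


# Demo with a throwaway file and hypothetical responses.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.mp4"
    demo_file.write_bytes(b"12345")
    good = upload_matches_file({"filename": "demo.mp4", "bytes": 5}, demo_file)
    bad = upload_matches_file({"filename": "demo.mp4", "bytes": 4}, demo_file)

print(good, bad)
```

In the notebook, the equivalent call would be `upload_matches_file(upload_response, BRIDGE_VIDEO)`.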

## Customize Prompts

We will also need tailored prompts. 
First, write a `prompt` that instructs the VLM what to look for.

Remember, these usually have three parts:
1. Persona
2. Details
3. Format

<details>
    <summary><b>💢 Stuck?</b></summary>

```python
prompt = (
    "You are a bridge inspector. "
    "Describe any concerns you have with this bridge in detail. "
    "Your concerns may be: "
    "cosmetic, structural, landscaping, or vandalism in nature. "
    "Start each event description with a start and end time stamp of the event."
)
```
    
</details>

In [7]:
prompt = ...
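
Python concatenates adjacent string literals without adding a separator, so it is easy to lose the space between two sentences when a prompt is split across lines. One way to avoid that is to join the three parts explicitly. This is only a sketch: `build_prompt` is a hypothetical helper, not part of `helpers.py`, and the example text is deliberately brief.

```python
def build_prompt(persona: str, details: str, output_format: str) -> str:
    """Join persona, details, and format with single spaces, skipping empty parts."""
    parts = (persona, details, output_format)
    return " ".join(part.strip() for part in parts if part.strip())


example_prompt = build_prompt(
    "You are a bridge inspector.",
    "Describe any concerns you have with this bridge.",
    "Start each description with a start and end timestamp.",
)
print(example_prompt)
```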

Next we write the `caption_summarization_prompt`. This is the one that rarely changes, so we can start with the default.

In [8]:
caption_summarization_prompt = (
    "If any descriptions have the same meaning and are sequential "
    "then combine them under one sentence and merge the time stamps "
    "to a range. Format the timestamps as 'mm:ss'"
)
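
The model does not always honor the requested `mm:ss` format and may report raw seconds instead. If you need to post-process such values, a conversion can be sketched as follows, assuming each value is an offset in seconds from the start of the video. The names `format_mmss` and `format_range` are hypothetical.

```python
def format_mmss(seconds: float) -> str:
    """Convert an offset in seconds to a zero-padded 'mm:ss' string."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_range(start_seconds: float, end_seconds: float) -> str:
    """Format a start/end pair as 'mm:ss-mm:ss'."""
    return f"{format_mmss(start_seconds)}-{format_mmss(end_seconds)}"


print(format_range(150.0, 177.0))  # prints 02:30-02:57
```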

Finally, write the `summary_aggregation_prompt`, which is used to generate the final summary. Best practice is for this prompt to reiterate which details need to be included in the summary, along with any formatting requirements.

<details>
    <summary><b>💢 Stuck?</b></summary>

```python
summary_aggregation_prompt = (
    "Based on the available information, generate a report "
    "that describes the condition of the bridge. The report "
    "should be organized into cosmetic, structural, vegetation "
    "overgrowth, and vandalism. The report must include timestamps. "
    "This should be a concise, yet descriptive summary of all "
    "the important events. The format should be intuitive and "
    "easy for a user to understand what happened. Format the "
    "output in Markdown so it can be displayed nicely."
)
```
    
</details>

In [None]:
# askme
summary_aggregation_prompt = ...

Everything is now in place to start some basic testing. Starting with basic parameters will be good enough to test the prompts.

In [10]:
body = {
    "id": upload_response.get("id"),
    "prompt": prompt,
    "caption_summarization_prompt": caption_summarization_prompt,
    "summary_aggregation_prompt": summary_aggregation_prompt,
    "model": models_response["data"][0]["id"],
    "max_tokens": 1024,
    "temperature": 0.4,
    "top_p": 1,
    "seed": 1,
    "num_frames_per_chunk": 10,
    "chunk_duration": 30,
    "chunk_overlap_duration": 0,
    "enable_chat": False
}
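
The exercise cells leave `...` placeholders for you to fill in, and `upload_response.get("id")` returns `None` if the upload failed. Before sending the request, it can help to check that nothing in the body is still unset. This is a sketch only: `unfilled_keys` is a hypothetical helper and the demo body is made up.

```python
def unfilled_keys(request_body: dict) -> list[str]:
    """Return the keys whose value is still the `...` placeholder or None."""
    return [
        key
        for key, value in request_body.items()
        if value is Ellipsis or value is None
    ]


# Hypothetical body with two values that were never filled in.
demo_body = {"id": None, "prompt": ..., "max_tokens": 1024}
print(unfilled_keys(demo_body))  # prints ['id', 'prompt']
```

Calling `unfilled_keys(body)` in the notebook should return an empty list once every prompt has been written.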

In [11]:
summarize_response = vss_api_call(VSS_URL, "summarize", verb="post", json=body)
print(summarize_response["choices"][0]["message"]["content"])

Response Code: 200
**Bridge Condition Report**

**Cosmetic Concerns**
--------------------

* **Rust and Chipping Paint**: 0:30-0:57, 1:00-1:27, 2:00-2:27
	+ The bridge's metal components show signs of rust and chipping paint.
* **Graffiti**: 1:00-1:27, 1:30-1:57, 2:00-2:27, 150:00-177:00
	+ Graffiti is present on the bridge's walls and surfaces.
* **Vegetation Growth**: 1:00-1:27, 1:30-1:57, 2:00-2:27, 150:00-177:00
	+ Vegetation is overgrown on the bridge, particularly on the left side.
* **Stains on Concrete**: 2:00-2:27
	+ Stains are visible on the concrete surface of the bridge.
* **Dirt and Debris**: 150:00-177:00
	+ Dirt and debris are present on the concrete surface of the bridge.

**Structural Condition**
------------------------

* **No Visible Signs of Damage**: 1:00-1:27, 2:00-2:27, 150:00-177:00
	+ The bridge appears to be in good condition overall, with no visible signs of structural damage or significant wear.
* **Concrete Crumbling**: 1:30-1:57
	+ The concrete is crumbl

How was that response? Does it include the right kind of information? If so, then continue on to tuning additional parameters. If not, revisit your prompts and try again.

## Chunking Parameters

The chunking strategies are very important to tune for each use case.

![Chunking Diagram](./assets/chunk_duration.png)

- `num_frames_per_chunk` controls how many still frames will be selected from each chunk for analysis. These frames are the input to the VLM for building an understanding of the video.

- `chunk_overlap_duration` can also be added to the summarization request to configure overlap between chunks. This increases accuracy at chunk boundaries.

- `chunk_duration` (also known as chunk size) is very important to tune based on the use case. The chunk size determines the temporal granularity at which the VLM will view the video.


Longer chunks result in fewer frames being sent to the VLM for analysis. This reduces the granularity of your summaries. However, longer chunks increase the odds that important events happen away from chunk boundaries. They also decrease compute time.
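
To get a feel for the trade-off, the number of chunks and the total frames sent to the VLM can be estimated offline. The sketch below rests on two assumptions that are not confirmed by the blueprint: consecutive chunks start `chunk_duration - chunk_overlap_duration` seconds apart, and the video is roughly 180 seconds long. The function name `chunk_windows` is hypothetical.

```python
def chunk_windows(video_seconds, chunk_duration, chunk_overlap_duration=0):
    """List (start, end) windows in seconds under the assumed stride rule."""
    if video_seconds <= 0 or chunk_duration <= 0:
        raise ValueError("video_seconds and chunk_duration must be positive")
    if not 0 <= chunk_overlap_duration < chunk_duration:
        raise ValueError("overlap must be >= 0 and smaller than chunk_duration")

    # Assumption: each chunk starts one stride after the previous one.
    stride = chunk_duration - chunk_overlap_duration
    windows = []
    start = 0
    while True:
        end = min(start + chunk_duration, video_seconds)
        windows.append((start, end))
        if end >= video_seconds:
            break
        start += stride
    return windows


VIDEO_SECONDS = 180  # assumption, not read from the file
NUM_FRAMES_PER_CHUNK = 10

for duration, overlap in [(30, 0), (60, 0), (35, 5)]:
    windows = chunk_windows(VIDEO_SECONDS, duration, overlap)
    total_frames = len(windows) * NUM_FRAMES_PER_CHUNK
    print(f"duration={duration} overlap={overlap} "
          f"chunks={len(windows)} frames={total_frames}")
```

Under these assumptions, doubling `chunk_duration` from 30 to 60 halves the frame count, while adding a 5 second overlap to 35 second chunks keeps it unchanged.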

To understand how these parameters affect the final output, change the values in this widget. Find some that look good and run the next cell. How were the results impacted?

In [12]:
(num_frames_per_chunk, chunk_duration, chunk_overlap_duration) = \
    wizards.tune_chunks(body)

HBox(children=(VBox(children=(IntSlider(value=10, description='num_frames_per_chunk', layout=Layout(width='500…

In [13]:
# create a summary with selected values
summarize_response = vss_api_call(VSS_URL, "summarize", verb="post", json=body)
print(summarize_response["choices"][0]["message"]["content"])

Response Code: 200
**Bridge Condition Report**

### Cosmetic

* **Rust and Paint Chipping**: Notable rust on metal components, particularly on cross beams (1:00-1:27, 1:30-2:27)
* **Graffiti**: Visible graffiti on the bridge, detracting from its appearance (1:00-1:27, 1:30-2:27)
* **Dirt and Debris**: Presence of dirt and debris on the surface (150:00-177:00)

### Structural

* **Concrete Damage**: Crumbling, cracked concrete (1:30-2:27)
* **Missing Support Beams**: Some support beams are missing (1:30-2:27)
* **Rusted Metal**: Rusted metal components, potentially weakening the structure (1:00-1:27, 1:30-2:27)

### Vegetation Overgrowth

* **Vegetation Growth**: Vegetation growing on the bridge, potentially causing damage (1:00-1:27, 1:30-2:27)
* **Overgrown Vegetation**: Surrounding vegetation is overgrown (1:30-2:27)

### Vandalism

* **Graffiti**: Visible graffiti on the bridge, detracting from its appearance (1:00-1:27, 1:30-2:27)

**Timestamps**

* 0:00-0:27: Bridge not visible
* 

Is that better or worse? The decision is up to you. If you want some advice, or just want to compare notes, check out these **hints**.

<details>
    <summary><b>HINTS</b></summary>

    Using the recommended prompts and the starting settings, the fidelity of the results is already acceptable. However, the input video has hard breaks every 30 seconds. This causes very siloed responses. To accommodate this, while not increasing the fidelity more than necessary:
    
    - increase `chunk_overlap_duration` to 5 to overlap video boundaries
    - increase `chunk_duration` to 35 to maintain 60 total frames
</details>

## Inference Time Model Tuning

The summarize API also accepts parameters to control the LLM during summary generation. The important ones to note are:

- `max_tokens` controls the maximum length of summary generation. This is a hard cutoff and does not change model output. Typically, `1024` is acceptable, but smaller values could result in lower costs.
- `temperature` and `top_p` parameters influence the probabilities when choosing the next output token. Generally, lower values tend towards more consistent, but less insightful, responses.
- `seed` helps control the repeatability of outputs by steering random number generation. Pinning this to a specific value helps us test the changes of other parameters.
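
To see why lower values give more consistent output, here is a generic sketch of temperature scaling and top-p (nucleus) filtering over a toy distribution. It illustrates the general technique only and is not the blueprint's implementation. The function names and logits are made up, and `temperature` must be greater than 0 in this sketch.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (temperature > 0)."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
cold = apply_temperature(toy_logits, 0.4)
warm = apply_temperature(toy_logits, 0.9)
print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
print(top_p_filter([0.7, 0.2, 0.1], 0.5))
```

The lower temperature concentrates more probability on the top token, and a small `top_p` removes unlikely tokens from consideration entirely.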

In [14]:
body["max_tokens"] = 1024 
body["temperature"] = 0.9  # typical range: 0 -> 1
body["top_p"] = 0.9  # typical range: 0 -> 1
body["seed"] = 1  # pinned to a constant value

In [15]:
# create a summary with selected values
summarize_response = vss_api_call(VSS_URL, "summarize", verb="post", json=body)
print(summarize_response["choices"][0]["message"]["content"])

Response Code: 200
**Bridge Condition Report**

**Cosmetic Concerns**
--------------------

* **Rust**: Visible rust on the bridge's metal beam (30.0-57.0, 120.0-147.0)
* **Paint Chipping**: Peeling paint on the bridge's surface (90.0-117.0)
* **Graffiti**: Visible graffiti on the bridge's surface (60.0-87.0, 90.0-117.0, 150.00-177.00)
* **Vegetation Growth**: Overgrown vegetation on the bridge's surface (60.0-87.0, 90.0-117.0, 150.00-177.00)
* **Dirt and Debris**: Dirt and debris on the concrete surface (150.00-177.00)
* **Missing Guardrail**: A missing guardrail on the bridge (120.0-147.0)

**Structural Concerns**
----------------------

* **Cracked Concrete**: Cracked concrete on the bridge's surface (90.0-117.0)

**Vegetation Overgrowth**
------------------------

* **Vegetation Growth**: Overgrown vegetation on the bridge's surface (60.0-87.0, 90.0-117.0, 150.00-177.00)

**Vandalism**
-------------

* **Graffiti**: Visible graffiti on the bridge's surface (60.0-87.0, 90.0-117.0, 1

## Enable Chat

When the model is consistently returning desirable results, the blueprint has been fine tuned. You can now start building your custom application.

Before we do that, though, we need to do one last summarization to enable chat and build the RAG databases.

In [61]:
body["enable_chat"] = True
summarize_response = vss_api_call(VSS_URL, "summarize", verb="post", json=body)

Response Code: 200
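
The cell above changes `body` in place. If you would rather keep the tuned request untouched for later comparisons, one option is to build a modified copy instead. This is a sketch only: `with_overrides` is a hypothetical helper and the demo body is made up.

```python
def with_overrides(request_body: dict, **overrides) -> dict:
    """Return a shallow copy of the request body with some keys replaced."""
    return {**request_body, **overrides}


# Hypothetical body, used only for illustration.
tuned_body = {"temperature": 0.9, "enable_chat": False}
chat_body = with_overrides(tuned_body, enable_chat=True)
print(tuned_body["enable_chat"], chat_body["enable_chat"])  # prints False True
```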


# 🥳 Complete

The VSS Blueprint has been fully customized. Feel free to save and close this file and return to the workshop instructions.


---

# 🤓 Extra Credit

It's not necessary... yet... but if you wanted to chat with this video, here is a playground. This *might* be useful soon.

In [63]:
chat_client = Chat(
    VSS_URL, 
    video_id=upload_response.get("id"),
    model_id=models_response["data"][0]["id"]
)

In [66]:
chat_client.query("Are you ready?")

Response Code: 200


'Hello there! How can I assist you today?'