## Video Testing

In [None]:
# =========================== Add your Token here ===========================

%env API_TOKEN=

# ===========================================================================

In [4]:
import sys
import yt_dlp
import torch
import clip
import os
from openai import OpenAI

import base64
import requests
from IPython.display import Image as DisplayImage  # aliased so PIL's Image below does not shadow it
from PIL import Image
from pprint import pprint
import pandas as pd
import numpy as np


device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)



Connect to OpenAI

In [5]:

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("API_TOKEN"),
)

In [6]:
#test to make sure you're connected to the AI
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
)
print(chat_completion.choices[0].message.content)

This is a test.


# Video Testing

### Create Batches and Pull Features from video clips as preprocessing


In [7]:
#read-in the images

#Make sure you sort the images correctly when reading them in
def sort_directory_files(directory):
    # Ensure that the provided path ends with a directory separator if needed
    if not directory.endswith(os.sep):
        directory += os.sep
    
    files = sorted(filter(os.path.isfile, (os.path.join(directory, name) for name in os.listdir(directory))))  # Get only file paths and sort them

    return files

# read in the images and make sure they're all there
clip_directory = './dashcam/rgb-images/GRMN0105/'
# clip_directory = './cartoon_test/'
image_files = sort_directory_files(clip_directory)
print("Image Count:", len(image_files))
# for file in image_files:
#     print("Found image:", file)
print(image_files[0:10])

Image Count: 1800
['./dashcam/rgb-images/GRMN0105/00001.jpg', './dashcam/rgb-images/GRMN0105/00002.jpg', './dashcam/rgb-images/GRMN0105/00003.jpg', './dashcam/rgb-images/GRMN0105/00004.jpg', './dashcam/rgb-images/GRMN0105/00005.jpg', './dashcam/rgb-images/GRMN0105/00006.jpg', './dashcam/rgb-images/GRMN0105/00007.jpg', './dashcam/rgb-images/GRMN0105/00008.jpg', './dashcam/rgb-images/GRMN0105/00009.jpg', './dashcam/rgb-images/GRMN0105/00010.jpg']
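The plain `sorted` call works here because the frame names are zero-padded (`00001.jpg`, `00002.jpg`, and so on). If frames were named without padding (an assumption, not the case in this dataset), a numeric sort key would keep them in temporal order. This is a minimal sketch; the helper name `numeric_sort` and the example paths are hypothetical.

```python
import os
import re

def numeric_sort(paths):
    """Sort file paths by the first integer found in the file name."""
    def key(path):
        name = os.path.basename(path)
        match = re.search(r"\d+", name)
        # names without digits sort first; ties fall back to the name itself
        return (int(match.group()) if match else -1, name)
    return sorted(paths, key=key)

# hypothetical unpadded names, where plain sorted() would put "10" before "2"
example = ["frames/10.jpg", "frames/2.jpg", "frames/1.jpg"]
print(numeric_sort(example))
```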


#### Batches

In [9]:
#set your batch size
batch_size = 30
total_images = len(image_files)
indices = np.arange(total_images)

# yield batch_size-length chunks of the frame indices for the images in image_files
def create_batches(indices, batch_size):
    for i in range(0,len(indices), batch_size):
        yield indices[i:i + batch_size]
batches = list(create_batches(indices, batch_size))


# create a list of lists of the batch size length for the images in image_files
# batches = [image_files[i:i + batch_size] for i in range(0, len(image_files), batch_size)]


print("number of batches:", len(batches))
print(batches[0])



number of batches: 60
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]
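Since the clip has about 1800 frames over one minute, a batch of 30 frame indices covers roughly one second of video. The sketch below maps a batch back to the time span it covers. The value `FPS = 30` is inferred from those numbers rather than read from the video file, and `batch_time_range` is a hypothetical helper.

```python
import numpy as np

FPS = 30  # assumed: about 1800 frames over a 1 minute clip

def batch_time_range(batch, fps=FPS):
    """Return (start_second, end_second) covered by a batch of frame indices."""
    start = batch[0] / fps
    end = (batch[-1] + 1) / fps
    return float(start), float(end)

indices = np.arange(1800)
batches = [indices[i:i + 30] for i in range(0, len(indices), 30)]
print(batch_time_range(batches[0]))
print(batch_time_range(batches[7]))
```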


#### Implement the model

Some information about OpenAI's gpt-4o model:\
*"The model is best at answering general questions about what is present in the images. While it does understand the relationship between objects in images, it is not yet optimized to answer detailed questions about the location of certain objects in an image. For example, you can ask it what color a car is or what some ideas for dinner might be based on what is in you fridge, but if you show it an image of a room and ask it where the chair is, it may not answer the question correctly."* - [openai](https://platform.openai.com/docs/guides/vision)


OpenAI essentially tells us that object detection is still in its early stages for LMMs, which coincides with our hypothesis. The goal here is to test how well the model actually performs at object detection in videos. The 1-minute video has about 1800 frames to be processed (roughly 30 frames per second). There are various ways to deal with the frames, but here we want to try to run all 1800 through the model because of the nature of our data.

Dashcam data, although noisy, changes in every frame because the camera vehicle is constantly moving (unless it is stopped at a traffic light). Testing as many frames as we can is therefore important for getting the LMM to detect the smallest details possible.
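For comparison, one of the other ways to deal with the frames is to keep only every n-th frame of each batch. This notebook does not take that route, so the sketch below is only an illustration of the trade-off; `subsample` and the stride value are hypothetical.

```python
def subsample(batch, stride):
    """Keep every `stride`-th frame index of a batch."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(batch[::stride])

# a one-second batch of 30 frame indices reduced to 6 frames
one_second = list(range(30))
print(subsample(one_second, 5))
```

A larger stride means fewer images per request, at the cost of possibly missing vehicles that are only visible for a few frames.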

##### First LMM

Run the video frame batches through the LMM to get a list of summaries.

----

Processing the 1800 frames brought about 

In [11]:
## Set the API key and model name
MODEL="gpt-4o"
# read the same API_TOKEN variable that is set at the top of the notebook
client = OpenAI(api_key=os.environ.get("API_TOKEN", "<your OpenAI API key if not set as an env var>"))

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
    
def gpt_frame_summaries(video_frames, image_files):
    ret  = []
    for frame in video_frames:
        ret.append(
            {
                "type": "image_url", "image_url": {
                    # the frames are .jpg files, so declare them as JPEG in the data URL
                    "url": f"data:image/jpeg;base64,{encode_image(image_files[frame])}"
                }
            })
    return ret


### there are too many video frames here. You need to find a way to get them all working

# here we turned the model into a function so we can run chunks through it later
def run_lmm(video_files, prompt1):

    answers = []
    #turn this into a dataframe with second/summary

    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": prompt1,},
            {"role": "user", "content": gpt_frame_summaries(video_files, image_files)},
            # {"role": "system", "content": transcript},
            # {"role": "system", "content": prompt2},
        ],
        temperature=0.0,
    )
    # print(response.choices[0].message.content)
    answers.append(response.choices[0].message.content)
    return answers



## Get Video Summaries from batches

def vid_batch_summary(frame_batches, prompt1):

    # collect one list of answers per batch
    summaries = []
    for i, batch in enumerate(frame_batches):
        print("********************************************* Batch:", i, "*********************************************")
        print(list(batch))
        # pass the frame indices of this batch (not the batch number) to the LMM
        answers = run_lmm(batch, prompt1)
        summaries.append(answers)
    return summaries






prompt_obfuscation ="Each image represents a new image frame in a video. These image frames represent one second's time in the video. From these image frames: identify the number of unique cars shown and describe them. Also provide the second in the video to mark the time. Then see if any of the unique cars shown are temporarily hidden by other cars or objects within the video."
prompt_unknown = "Each image represents a new image frame in a video. These image frames represent one second's time in the video. From these image frames:identify the number of unique cars shown and describe them. Particularly note if there any cars of vehicle type objects that appear."
summary_list = []
for batch in batches[0:8]:
    print("List with image batches:", batch)
    answers = run_lmm(batch, prompt_unknown)
    print(answers)
    summary_list.append(answers)
    


List with image batches: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]
['In the provided sequence of images, there are several unique cars and vehicle-type objects visible. Here is a summary:\n\n1. **Yellow Taxi**: A yellow car, likely a taxi, is consistently visible in the center lane ahead of the camera.\n2. **White Van**: A white van is parked on the right side of the road.\n3. **Yellow Car**: Another yellow car is parked on the right side of the road.\n4. **Dark-colored Car**: A dark-colored car is parked on the right side of the road.\n5. **White Car**: A white car is parked on the left side of the road.\n6. **Tanker Truck**: A large tanker truck is visible on the left side of the road.\n7. **Tuk-tuk**: A three-wheeled vehicle (tuk-tuk) is visible on the left side of the road.\n\nThese vehicles appear consistently across the sequence of images, indicating they are stationary or moving slowly relative to the camera.']
List with image b

In [12]:
#save the summaries to csv just in case
summary_df = pd.DataFrame(summary_list)
# summary_df.to_csv('lmm_unconventionalV_summaries.csv')
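To keep the time information alongside each summary when saving, one option is to write one row per (second, summary) pair. The sketch below uses only the standard library `csv` module rather than pandas, and it assumes each batch corresponds to one second of video. `summaries_to_csv` and the example strings are hypothetical.

```python
import csv
import io

def summaries_to_csv(summary_list):
    """Return CSV text with one row per (second, summary) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["second", "summary"])
    for second, answers in enumerate(summary_list):
        for text in answers:
            # the csv writer quotes fields that contain newlines or commas
            writer.writerow([second, text])
    return buffer.getvalue()

# placeholder summaries standing in for the model output
example = [["first batch\nsummary"], ["second batch summary"]]
csv_text = summaries_to_csv(example)
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```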

##### Second LMM
Now we run the summaries through the LMM a second time to see whether it produces a good sequential description of the whole clip.


In [13]:
# convert the list of summaries to a single string so it can be passed to the second LMM

llm_content = ""
for summary in summary_list:
    llm_content = " ".join([llm_content, str(summary)])

print(llm_content.lstrip())

['In the provided sequence of images, there are several unique cars and vehicle-type objects visible. Here is a summary:\n\n1. **Yellow Taxi**: A yellow car, likely a taxi, is consistently visible in the center lane ahead of the camera.\n2. **White Van**: A white van is parked on the right side of the road.\n3. **Yellow Car**: Another yellow car is parked on the right side of the road.\n4. **Dark-colored Car**: A dark-colored car is parked on the right side of the road.\n5. **White Car**: A white car is parked on the left side of the road.\n6. **Tanker Truck**: A large tanker truck is visible on the left side of the road.\n7. **Tuk-tuk**: A three-wheeled vehicle (tuk-tuk) is visible on the left side of the road.\n\nThese vehicles appear consistently across the sequence of images, indicating they are stationary or moving slowly relative to the camera.'] ['In the provided sequence of images, there are several unique vehicles visible:\n\n1. A yellow car in the center lane, which appears c
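Because the goal of the second pass is a sequential description, it may help to label each summary with the second it covers before joining. This is a sketch of one way to do that, assuming each entry of `summary_list` is a list of strings for one second of video. `build_second_pass_input` is a hypothetical helper and is not what produced the output above.

```python
def build_second_pass_input(summary_list):
    """Join per-batch summaries into one string, labelled by second."""
    parts = []
    for second, answers in enumerate(summary_list):
        text = " ".join(answers)
        parts.append(f"Second {second}: {text}")
    return "\n\n".join(parts)

# placeholder summaries standing in for the model output
example = [["a yellow taxi ahead"], ["a white van", "parked on the right"]]
print(build_second_pass_input(example))
```

Joining the strings directly also avoids the list brackets and escaped `\n` sequences that `str(summary)` puts into the prompt.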

In [14]:

#implement model on summaries to get descriptions
#also get summary of description


response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Please summarize in detail the description of these summaries about dashcam footage and unconventional or unknown vehicles."},
        # {
            # "type": "csv",
            # "csv": {summaries}, }
        {"role": "user", "content": llm_content},
        # {"role": "system", "content": transcript},
        # {"role": "system", "content": "Please provide a summary of the video."},
    ],
    temperature=0.0,
    )
print(response.choices[0].message.content)



The summaries describe a sequence of images showing various unique vehicles and vehicle-type objects. Here is a detailed synthesis of the descriptions:

1. **Yellow Car/Sedan/Taxi**: A yellow vehicle, often identified as a sedan or taxi, is consistently visible in the center lane ahead of the camera in all frames. This vehicle is a prominent feature across the sequences.

2. **White Van**: A white van is frequently mentioned as being parked on the right side of the road. In some descriptions, a second white van is also noted further down the road.

3. **White Pickup Truck**: This vehicle is commonly seen parked on the right side of the road, near the white van. In some summaries, it is also mentioned moving in the opposite direction on the left side of the road.

4. **Dark-colored Car**: A dark-colored car is often noted as being parked on the right side of the road. In some instances, it is described as moving in the opposite direction on the left side of the road.

5. **Yellow Car (P

Two problem overview ...

### Obfuscation
The obfuscation summaries show some understanding that cars are hidden or covered by other objects in the video, but on reading the outputs, the model does not capture the relevant details consistently enough to produce structured data.

The first prompt to process the first 3 seconds of the video was:
 *Each image represents a new image frame in a video. These image frames represent one second's time in the video. From these image frames: identify the number of unique cars shown and describe them. Also provide the second in the video to mark the time. Then see if any of the unique cars shown are temporarily hidden by other cars or objects within the video.* 

The response output was:

'In the provided video frames, there are several unique cars visible. Here is a breakdown of the unique cars and their visibility:
1. **Yellow Taxi (in front of the camera)**\n   - Visible in all frames (0s to 1s).\n\n
2. **White Van (parked on the right side)**\n   - Visible in all frames (0s to 1s).\n\n
3. **Yellow Car (parked on the right side)**\n   - Visible in all frames (0s to 1s).\n\n
4. **Dark-colored Car (parked on the right side)**\n   - Visible in all frames (0s to 1s).\n\n
5. **White Car (parked on the left side)**\n   - Visible in all frames (0s to 1s).\n\n
6. **Dark-colored Car (driving on the left side)**\n   - Visible in all frames (0s to 1s).\n\n
7. **Tanker Truck (on the left side)**\n   - Visible in all frames (0s to 1s).\n\n
8. **Three-wheeler (on the left side)**\n   - Visible in frames 0s to 0.5s, then hidden by the tanker truck from 0.5s to 1s.\n\n

In summary, there are eight unique vehicles visible in the video frames, with the three-wheeler being temporarily hidden by the tanker truck from 0.5s to 1s.'

'In the provided sequence of images, there are several unique cars visible. Here is a breakdown of the unique cars and their visibility:\n\n
1. **Yellow Car (Taxi) in the center lane:**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
2. **White Car on the left side of the road:**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
3. **Gray Car on the right side of the road:**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
4. **White Van on the right side of the road:**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
5. **White Truck on the right side of the road:**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
6. **Yellow Car on the left side of the road (near the white car):**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
7. **Black Car in the left lane:**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
8. **White Car in the left lane (behind the black car):**\n   - Visible in all frames.\n   - Not hidden by any other cars or objects.\n\n
All the unique cars mentioned above are consistently visible throughout the sequence of images and are not temporarily hidden by other cars or objects'

'In the provided video frames, there are several unique cars visible. Here is the breakdown:\n\n
1. **Yellow Sedan (in front of the camera)**\n   - Visible in all frames.\n   - Time: 11:14:13 to 11:14:14\n\n
2. **White Van (parked on the right side)**\n   - Visible in all frames.\n   - Time: 11:14:13 to 11:14:14\n\n
3. **Yellow Sedan (on the left side of the road)**\n   - Visible in all frames.\n   - Time: 11:14:13 to 11:14:14\n\n
4. **White Van (parked on the left side)**\n   - Visible in all frames.\n   - Time: 11:14:13 to 11:14:14\n\n
5. **White Truck (parked on the right side)**\n   - Visible in all frames.\n   - Time: 11:14:13 to 11:14:14\n\n
6. **Yellow Sedan (moving towards the camera on the left side)**\n   - Visible in all frames.\n   - Time: 11:14:13 to 11:14:14\n\n
7. **White Sedan (moving towards the camera on the left side)**\n   - Visible in all frames.\n   - Time: 11:14:13 to 11:14:14\n\nNone of the unique cars are temporarily hidden by other cars or objects within the video frames provided.'

--


A second pass, running the text summaries for the first 3 seconds of the video through an LLM, shows that gpt-4o could not fully catch these details.

The prompt was *"Please summarize in detail the description of these summaries about dashcam footage and vehicle obfuscation."*

The response output was:

The summaries describe dashcam footage capturing a street scene with various vehicles identified and their visibility analyzed. Here is a detailed synthesis of the descriptions:

1. **Common Elements Across Summaries:**
   - **Timestamp and Coordinates:** The footage is consistently timestamped at 03/01/2023 11:14:12, with the vehicle traveling at 37 MPH.
   - **Visibility of Vehicles:** All summaries identify multiple vehicles, noting their positions and visibility in the frames.

2. **Unique Cars Identified:**
   - **Yellow Car (Taxi) in Front:** All summaries mention a yellow car, likely a taxi, consistently visible in the center lane ahead of the camera vehicle.
   - **White Van on the Right Side:** A white van parked on the right side of the road is visible in all frames.
   - **Additional Vehicles on the Right Side:** Various summaries mention additional vehicles on the right side, including a white minibus, a white truck, and another yellow car.
   - **Vehicles on the Left Side:** A white car and a tuk-tuk (auto rickshaw) are noted on the left side, with the tuk-tuk initially hidden but becoming visible later. Other summaries mention a black car and another yellow car on the left side, with temporary obstructions.

3. **Temporary Obstructions:**
   - **Tuk-tuk (Auto Rickshaw):** Initially hidden behind a white car on the left side, becoming visible as the camera vehicle moves forward.
   - **Other Vehicles:** Some vehicles on the left side are temporarily hidden by a white van or the camera vehicle's frame and sun glare.

4. **Summary of Unique Cars:**
   - **Total Unique Cars Identified:** The number of unique cars identified varies slightly across summaries, ranging from 6 to 10. This discrepancy is due to different interpretations of visibility and temporary obstructions.

In conclusion, the dashcam footage captures a street scene with multiple vehicles identified, including a consistently visible yellow taxi in the center lane and various parked vehicles on both sides of the road. Temporary obstructions are noted, particularly for vehicles on the left side, due to other vehicles or environmental factors like sun glare.
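The first-pass responses above share a loose pattern: a numbered list where each vehicle name is in bold, followed by indented notes. As a rough sketch of how such text could be turned into structured records, the parser below extracts each bolded vehicle and flags it as hidden when a note mentions "hidden" without "not hidden". It assumes the response string contains real newlines, and the format is not guaranteed by the model, so this is a best-effort heuristic. `parse_vehicles` and the sample text are hypothetical.

```python
import re

# matches lines such as "1. **Yellow Taxi (in front of the camera)**"
HEADER = re.compile(r"^\s*\d+\.\s+\*\*(.+?)\*\*")

def parse_vehicles(response):
    """Extract vehicle names and a crude 'hidden' flag from one response."""
    records = []
    current = None
    for line in response.splitlines():
        match = HEADER.match(line)
        if match:
            name = match.group(1).rstrip(":").strip()
            current = {"vehicle": name, "hidden": False}
            records.append(current)
        elif not line.strip():
            current = None  # a blank line ends the current item
        elif current is not None:
            lowered = line.lower()
            if "hidden" in lowered and "not hidden" not in lowered:
                current["hidden"] = True
    return records

sample = (
    "Here is a breakdown:\n\n"
    "1. **Tanker Truck (on the left side)**\n"
    "   - Visible in all frames (0s to 1s).\n\n"
    "2. **Three-wheeler (on the left side)**\n"
    "   - Visible in frames 0s to 0.5s, then hidden by the tanker truck.\n\n"
    "In summary, the three-wheeler is temporarily hidden."
)
print(parse_vehicles(sample))
```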


### Unknown/Unconventional Summaries

The results for identifying unknown or less conventional vehicles, such as tuk-tuks or trucks of various sizes, are shown below. We explored the first 8 seconds of the video to cover a wider range of potentially unconventional vehicle types.

The prompt used was *"Each image represents a new image frame in a video. These image frames represent one second's time in the video. From these image frames:identify the number of unique cars shown and describe them. Particularly note if there any cars of vehicle type objects that appear."* 

The response output gave us:


['In the provided sequence of images, there are several unique cars and vehicle-type objects visible. Here is a summary:\n\n
1. **Yellow Taxi**: A yellow car, likely a taxi, is consistently visible in the center lane ahead of the camera.\n
2. **White Van**: A white van is parked on the right side of the road.\n
3. **Yellow Car**: Another yellow car is parked on the right side of the road.\n
4. **Dark-colored Car**: A dark-colored car is parked on the right side of the road.\n
5. **White Car**: A white car is parked on the left side of the road.\n
6. **Tanker Truck**: A large tanker truck is visible on the left side of the road.\n
7. **Tuk-tuk**: A three-wheeled vehicle (tuk-tuk) is visible on the left side of the road.\n\nThese vehicles appear consistently across the sequence of images, indicating they are stationary or moving slowly relative to the camera.']

['In the provided sequence of images, there are several unique vehicles visible:\n\n
1. A yellow car in the center lane, which appears consistently in all frames.\n
2. A white van on the left side of the road, visible in multiple frames.\n
3. A white pickup truck on the right side of the road, visible in multiple frames.\n
4. A dark-colored car on the right side of the road, visible in multiple frames.\n
5. A yellow car on the left side of the road, visible in multiple frames.\n
6. A white car on the left side of the road, visible in multiple frames.\n
7. A dark-colored vehicle in the center lane, visible in multiple frames.\n\n

These vehicles appear consistently across the sequence of images, indicating they are present throughout the duration of the video segment.']


['In the provided sequence of images, there are several unique vehicles visible. Here is a summary of the vehicles observed:\n\n
1. **Yellow Sedan**: This car is consistently seen in the middle lane, moving forward.\n
2. **White Van**: Parked on the right side of the road.\n
3. **White Pickup Truck**: Parked on the right side of the road, near the white van.\n
4. **Yellow Sedan (Second)**: Parked on the right side of the road, further down from the white van and pickup truck.\n
5. **White Minibus**: Moving in the left lane, visible in the background.\n
6. **Yellow Sedan (Third)**: Moving in the left lane, visible in the background.\n
7. **White Van (Second)**: Parked on the right side of the road, further down from the first white van and pickup truck.\n\n
In total, there are seven unique vehicles observed in the images.']


['In the provided image frames, there are several unique vehicles visible. Here is a summary of the vehicles observed:\n\n
1. **Yellow Sedan**: This car is directly in front of the camera in all frames.\n
2. **White Van**: This vehicle is visible on the left side of the road in the first few frames.\n
3. **White Pickup Truck**: This vehicle is parked on the right side of the road in all frames.\n
4. **Yellow Sedan (Parked)**: This vehicle is parked on the right side of the road in all frames.\n
5. **White Van (Parked)**: This vehicle is parked on the right side of the road in all frames.\n
6. **Another Yellow Sedan (Parked)**: This vehicle is parked on the right side of the road in all frames.\n\n
In total, there are six unique vehicles visible in the frames.']


['In the provided image frames, there are several unique cars and vehicle-type objects visible. Here is a summary of the vehicles observed:\n\n
1. **Yellow Car**: A yellow car is prominently visible in the center of the road in all frames.\n
2. **White Van**: A white van is parked on the right side of the road near the buildings.\n
3. **Black Car**: A black car is parked on the right side of the road near the buildings.\n
4. **Three-Wheeler**: A three-wheeler vehicle is visible on the left side of the road, moving forward.\n
5. **Other Cars**: There are several other cars visible in the distance on both sides of the road.\n\n
In total, there are at least five unique vehicle types observed in the frames.']


['In the provided sequence of images, there are several unique cars visible. Here is a description of the cars and vehicle-type objects observed:\n\n
1. **Yellow Car**: This car is consistently visible in the center lane, moving forward.\n
2. **White Van**: Parked on the right side of the road, near the sidewalk.\n
3. **Dark Red Car**: Parked on the right side of the road, near the sidewalk.\n
4. **Another Yellow Car**: Parked on the right side of the road, near the sidewalk.\n
5. **White Car**: Parked on the left side of the road, near the median.\n
6. **Dark Car**: Moving in the opposite direction on the left side of the road.\n\n
These vehicles are visible throughout the sequence of images, with some moving and others parked.']


['In the sequence of images provided, there are several unique cars and vehicle-type objects visible:\n\n
1. **Yellow Sedan**: This car is consistently visible in the lane ahead of the camera vehicle.\n
2. **White Van**: Parked on the right side of the road.\n
3. **Yellow Taxi**: Parked on the right side of the road.\n
4. **Silver Sedan**: Moving in the opposite direction on the left side of the road.\n
5. **Motorcycle**: Visible in the later frames, moving in the same direction as the camera vehicle.\n\n
These vehicles appear consistently across the frames, indicating their presence in the scene.']

['In the sequence of images provided, there are several unique cars visible. Here is a summary of the vehicles observed:\n\n
1. **Yellow Sedan**: This car is prominently visible in the center of the road, moving forward.\n
2. **White Pickup Truck**: This vehicle is seen on the left side of the road, moving in the opposite direction.\n
3. **Multiple Yellow Sedans**: Parked along the right side of the road, there are several yellow sedans visible.\n
4. **Silver Van**: Parked on the right side of the road, near the buildings.\n\n
These vehicles appear consistently across the frames, indicating their presence in the scene.']


---

A second pass, running the text summaries for the first 8 seconds of the video through an LLM, shows that gpt-4o could not fully catch all of the details, but it does identify unconventional vehicles.


The response output was:

The summaries describe a sequence of images showing various unique vehicles and vehicle-type objects. Here is a detailed synthesis of the descriptions:

1. **Yellow Car/Sedan/Taxi**: A yellow vehicle, often identified as a sedan or taxi, is consistently visible in the center lane ahead of the camera in all frames. This vehicle is a prominent feature across the sequences.

2. **White Van**: A white van is frequently mentioned as being parked on the right side of the road. In some descriptions, a second white van is also noted further down the road.

3. **White Pickup Truck**: This vehicle is commonly seen parked on the right side of the road, near the white van. In some summaries, it is also mentioned moving in the opposite direction on the left side of the road.

4. **Dark-colored Car**: A dark-colored car is often noted as being parked on the right side of the road. In some instances, it is described as moving in the opposite direction on the left side of the road.

5. **Yellow Car (Parked)**: Another yellow car, sometimes referred to as a sedan, is parked on the right side of the road. Multiple yellow sedans are mentioned in some summaries.

6. **White Car**: A white car is mentioned as being parked on the left side of the road, near the median.

7. **Tanker Truck**: A large tanker truck is visible on the left side of the road in one summary.

8. **Tuk-tuk/Three-Wheeler**: A three-wheeled vehicle, often referred to as a tuk-tuk, is visible on the left side of the road, moving forward.

9. **White Minibus**: Mentioned in one summary as moving in the left lane, visible in the background.

10. **Silver Sedan**: A silver sedan is noted as moving in the opposite direction on the left side of the road in some summaries.

11. **Motorcycle**: A motorcycle is visible in the later frames, moving in the same direction as the camera vehicle.

12. **Dark Red Car**: A dark red car is mentioned as being parked on the right side of the road near the sidewalk in one summary.

13. **Black Car**: A black car is noted as being parked on the right side of the road near the buildings in one summary.

14. **Other Cars**: Several other cars are visible in the distance on both sides of the road in some summaries.

Overall, the vehicles appear consistently across the sequence of images, indicating their presence throughout the duration of the video segment. Some vehicles are stationary, while others are moving, providing a dynamic scene.
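The synthesis notes that some vehicles appear in only one summary. A simple way to check this against the first-pass outputs is to count how many per-second summaries mention each unconventional vehicle type. The keyword list below is an assumption drawn from the vehicle names in the responses above, and `count_mentions` is a hypothetical helper; plain substring matching will miss paraphrases such as "auto rickshaw".

```python
from collections import Counter

# assumed keywords, taken from vehicle names seen in the responses
KEYWORDS = ["tuk-tuk", "three-wheeler", "tanker truck", "minibus", "motorcycle"]

def count_mentions(summaries, keywords=KEYWORDS):
    """Count how many summaries mention each keyword (case-insensitive)."""
    counts = Counter()
    for text in summaries:
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts

# placeholder summaries standing in for the model output
example = [
    "A Tuk-tuk and a tanker truck are on the left.",
    "A three-wheeler vehicle is moving forward.",
    "A motorcycle and a tuk-tuk are visible.",
]
print(count_mentions(example))
```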
