# Video Question Answering Using the Gemini Pro Vision Model

Note: Only video frames are supported at the moment. You can ask questions about the video, and the model answers based on its frames alone. Audio is not supported.

For now, videos must be uploaded to Google Cloud Storage and be reachable through a publicly available link.

In [1]:
%cd ..

/Users/isham993/Desktop/Programming-Tutorials/2023-Data-Science/google-gemini-project/getting-started-with-gemini-models-demo




### Importing Necessary Libraries

In [2]:
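# utils is a local module in this project. Since nothing else is imported,
# it must provide GenerativeModel, Part, GEMINI_PRO_VISION and
# authenticate_google_service_account_credentials, which are used below.
from utils import *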

### Authenticate Google Service Account Credentials

In [3]:
authenticate_google_service_account_credentials()

Successfully authenticated with Google service account.


### Model Instantiation

In [4]:
multimodal_model = GenerativeModel(GEMINI_PRO_VISION)

Example of a video URL in Google Cloud Storage:

- `file_path = "github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"`
- `video_url = f"https://storage.googleapis.com/{file_path}"`

At this point the model accepts videos only as a Cloud Storage URI:

- `video_uri = f"gs://{file_path}"`

So `https://storage.googleapis.com/` is replaced with `gs://` when the video is passed to the `Part.from_uri` method.
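The replacement described above can be sketched as a small helper. This is only an illustration: `https_url_to_gs_uri` and `GCS_HTTPS_HOST` are hypothetical names that are not part of the notebook or of any SDK, and the sketch assumes the URL has the `https://storage.googleapis.com/<bucket>/<object>` shape shown above.

```python
from urllib.parse import urlparse

# Host used by public Cloud Storage links in this notebook.
GCS_HTTPS_HOST = "storage.googleapis.com"


def https_url_to_gs_uri(video_url: str) -> str:
    """Turn https://storage.googleapis.com/<bucket>/<object> into gs://<bucket>/<object>."""
    parsed = urlparse(video_url)
    if parsed.scheme != "https" or parsed.netloc != GCS_HTTPS_HOST:
        raise ValueError(f"Not a storage.googleapis.com URL: {video_url}")
    # The path starts with "/", followed by the bucket name and the object path.
    return "gs://" + parsed.path.lstrip("/")


video_url = "https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
video_uri = https_url_to_gs_uri(video_url)
print(video_uri)  # gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
```

The resulting `video_uri` is the value that would be passed as `uri=` to `Part.from_uri` in the cells that follow.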

### Getting JSON Response

Let's ask some questions about the following video, a Pixel 8 commercial.

<video width="500" height="500" controls>
  <source src="https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4" type="video/mp4">
</video>

In [5]:
prompt = """
Answer the following questions using the video only:
What is the profession of the main person?
What are the main features of the phone highlighted?
Which city was this recorded in?
Provide the answer JSON.
"""
video = Part.from_uri(
    uri="gs://github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4",
    mime_type="video/mp4",
)
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)


print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
 ```json
{
  "profession": "photographer",
  "features": "Night Sight, Video Boost",
  "city": "Tokyo"
}
```
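The response above arrives as text wrapped in a Markdown `json` fence, so it cannot be passed to `json.loads` directly. A minimal sketch of one way to unwrap it follows. The helper name `parse_json_response` is hypothetical, and the sketch assumes the streamed chunks have already been joined into a single string and that the reply contains at most one fenced block.

```python
import json
import re

# Three backticks, built programmatically so this example stays readable in Markdown.
FENCE = "`" * 3


def parse_json_response(text: str) -> dict:
    """Parse JSON from a model reply, removing an optional Markdown json fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    # Fall back to the raw text when the model did not add a fence.
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)


# Rebuild a reply shaped like the one printed above (leading space included).
sample = {
    "profession": "photographer",
    "features": "Night Sight, Video Boost",
    "city": "Tokyo",
}
raw = " " + FENCE + "json\n" + json.dumps(sample, indent=2) + "\n" + FENCE

answer = parse_json_response(raw)
print(answer["city"])  # Tokyo
```

The model is not guaranteed to return valid JSON, so in practice the `json.loads` call may need a `try`/`except json.JSONDecodeError` around it.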

### Extracting tags of objects throughout the video

The Gemini Pro Vision model can also extract tags from a video. Let's see how it does on the following video.

<video width="500" height="500" controls>
  <source src="https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/photography.mp4" type="video/mp4">
</video>

In [6]:
prompt = """
Answer the following questions using the video only:
- What is in the video?
- What is the action in the video?
- Provide 10 best tags for this video?
"""
video = Part.from_uri(
    uri="gs://github-repo/img/gemini/multimodality_usecases_overview/photography.mp4",
    mime_type="video/mp4",
)
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
 - The video shows a man in a hat taking pictures of some artifacts on a table.
- The man is taking pictures of the artifacts.
- #photography, #art, #travel, #vacation, #beach, #sun, #sand, #water, #nature, #explore
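To use the tags programmatically, they first have to be pulled out of the free-text reply. The sketch below is one simple way to do that, assuming the model formats tags as `#word` as it did above. `extract_tags` is a hypothetical helper, not part of the notebook.

```python
import re


def extract_tags(text: str) -> list[str]:
    """Return every #hashtag found in the text, without the leading '#'."""
    return re.findall(r"#(\w+)", text)


# The tag line from the response above.
reply = "- #photography, #art, #travel, #vacation, #beach, #sun, #sand, #water, #nature, #explore"
tags = extract_tags(reply)
print(tags)
# ['photography', 'art', 'travel', 'vacation', 'beach', 'sun', 'sand', 'water', 'nature', 'explore']
```

Because the pattern relies on the `#` prefix, it returns an empty list if the model answers with a plain comma-separated list instead.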

### Retrieving extra information beyond the video

This is a video of the O-Train Confederation Line. Let's see how to retrieve information that goes beyond what is shown in the video passed to the model.

<video width="500" height="500" controls>
  <source src="https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/ottawatrain3.mp4" type="video/mp4">
</video>

In [7]:
prompt = """
Which line is this?
where does it go?
What are the stations/stops?
"""
video = Part.from_uri(
    uri="gs://github-repo/img/gemini/multimodality_usecases_overview/ottawatrain3.mp4",
    mime_type="video/mp4",
)
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")


-------Response--------
 This is the O-Train Confederation Line. It is a light rail line that runs from Tunney's Pasture to Blair. The stations on the line are:

* Tunney's Pasture
* Bayview
* Pimisi
* Dominion
* Carleton
* Confederation
* City Hall
* Parliament
* Rideau
* uOttawa
* Lees
* Hurdman
* Cyrville
* St. Laurent
* Blair
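Every cell in this notebook repeats the same pattern: call `generate_content` with `stream=True`, then print the `text` of each chunk. If the full answer is needed as one string, for example to parse it afterwards, the loop could be wrapped as sketched below. `stream_text` is a hypothetical helper, and `FakeModel` is a stand-in used only so the sketch runs without credentials. With the real SDK, `multimodal_model` would be passed instead.

```python
from types import SimpleNamespace


def stream_text(model, contents) -> str:
    """Request a streamed response and join the text of all chunks."""
    chunks = model.generate_content(contents, stream=True)
    return "".join(chunk.text for chunk in chunks)


class FakeModel:
    """Stand-in that mimics only the generate_content call used in this notebook."""

    def generate_content(self, contents, stream=False):
        # Yield two chunks, each exposing a .text attribute like the real chunks.
        return iter(
            [
                SimpleNamespace(text="This is the O-Train "),
                SimpleNamespace(text="Confederation Line."),
            ]
        )


full_text = stream_text(FakeModel(), ["Which line is this?"])
print(full_text)  # This is the O-Train Confederation Line.
```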