<a href="https://colab.research.google.com/github/prisar/ai_notebooks/blob/main/nb_097.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Summarize the video located at "gs://mrc-quant-ml-video-analysis/videoplayback.mp4" using a temporal-aware summarization method involving video chunking, Whisper transcription, vector encoding, and a memory-augmented LLM, and evaluate the faithfulness, temporal coherence, and semantic coverage of the resulting summary.
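
The chunking step named in the task can be sketched as follows. This is a minimal illustration, not the notebook's implementation (the notebook does not reach this step). The function name `chunk_windows`, the 60-second chunk length, and the 5-second overlap are all assumptions chosen for the example. Keeping the start and end time of every chunk is what lets later stages reason about temporal order.

```python
def chunk_windows(duration_s, chunk_s=60.0, overlap_s=5.0):
    """Return ordered (start, end) pairs in seconds that cover [0, duration_s].

    chunk_s and overlap_s are illustrative defaults, not values from the task.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    duration_s = float(duration_s)
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the video
        # Step back by the overlap so speech cut at a boundary appears in both chunks.
        start = end - overlap_s
    return windows
```

Each window could then be cut from the video and transcribed separately, with the `(start, end)` pair stored next to the transcript text.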

## Setup environment

### Subtask:
Install necessary libraries (Whisper, LlamaIndex, etc.) and set up authentication for accessing the video from GCS.


**Reasoning**:
Install the necessary libraries using pip.



In [1]:
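# OpenAI's Whisper is published on PyPI as `openai-whisper`. The bare name
# `whisper` resolves to an unrelated package, which is what the recorded
# output below (whisper-1.1.10) shows being fetched.
!pip install openai-whisper llama-index google-cloud-storage moviepy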

Collecting whisper
  Downloading whisper-1.1.10.tar.gz (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 kB 2.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting llama-index
  Downloading llama_index-0.13.0-py3-none-any.whl.metadata (12 kB)
Collecting llama-index-cli<0.6,>=0.5.0 (from llama-index)
  Downloading llama_index_cli-0.5.0-py3-none-any.whl.metadata (1.4 kB)
Collecting llama-index-core<0.14,>=0.13.0 (from llama-index)
  Downloading llama_index_core-0.13.0-py3-none-any.whl.metadata (2.5 kB)
Collecting llama-index-embeddings-openai<0.6,>=0.5.0 (from llama-index)
  Downloading llama_index_embeddings_openai-0.5.0-py3-none-any.whl.metadata (400 bytes)
Collecting llama-index-indices-managed-llama-cloud>=0.4.0 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-

**Reasoning**:
Create a Google Cloud Storage client using the environment's default credentials (Application Default Credentials). This assumes the runtime exposes usable credentials; creating the client does not by itself confirm that they work.



In [1]:
from google.cloud import storage

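# Initialize a client using the default credentials.
# Creating the client does not issue an authenticated request, so the message
# below only confirms that the client object was constructed.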
storage_client = storage.Client()

print("Google Cloud Storage authentication successful.")

Google Cloud Storage authentication successful.
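
Because the client is created without exercising the credentials, a separate check can surface authentication problems before any download is attempted. The sketch below is one way to do that, assuming the `google-auth` package that ships with `google-cloud-storage`; the helper name `check_default_credentials` is hypothetical.

```python
def check_default_credentials():
    """Return (ok, detail) after forcing a token refresh on the default credentials."""
    # Imported lazily so the helper can be defined before the packages are installed.
    import google.auth
    from google.auth import exceptions
    from google.auth.transport.requests import Request

    try:
        credentials, project = google.auth.default()
        # refresh() performs a real token request, so broken credentials fail here.
        credentials.refresh(Request())
    except (exceptions.DefaultCredentialsError, exceptions.RefreshError) as err:
        return False, f"{type(err).__name__}: {err}"
    return True, f"credentials valid for project {project!r}"
```

Calling `check_default_credentials()` right after creating the client would report a `RefreshError` at this point rather than in the download cell.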


## Download video

### Subtask:
Download the video from the specified GCS path to the Colab environment.


**Reasoning**:
Define the GCS bucket and blob names, the destination file name, and then download the blob to the specified destination file using the storage client.
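
The bucket and blob names follow directly from the `gs://` URI given in the task. A small helper can derive them instead of hard-coding both values; `split_gcs_uri` is a hypothetical name used only for this sketch.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/blob' into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the blob name.
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name or not blob_name:
        raise ValueError(f"URI must contain a bucket and a blob name: {uri!r}")
    return bucket_name, blob_name


bucket_name, source_blob_name = split_gcs_uri(
    "gs://mrc-quant-ml-video-analysis/videoplayback.mp4"
)
```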



In [2]:
# Define the GCS bucket name and source blob name
bucket_name = "mrc-quant-ml-video-analysis"
source_blob_name = "videoplayback.mp4"

# Define the destination file name in the Colab environment
destination_file_name = "downloaded_video.mp4"

# Get the bucket object
bucket = storage_client.get_bucket(bucket_name)

# Get the blob object (the video file) from the bucket
blob = bucket.blob(source_blob_name)

# Download the blob to the specified destination file
blob.download_to_filename(destination_file_name)

# Print a confirmation message
print(f"Video downloaded successfully to {destination_file_name}")

RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7849e41d7590>)

## Download video retry

### Subtask:
Retry downloading the video from the specified GCS path to the Colab environment, addressing potential authentication issues.


**Reasoning**:
Retry the download with the same storage client, wrapping the calls in a `try`/`except` block so that a failure is reported with its error details instead of stopping the cell. This does not change how credentials are obtained, so the retry can only succeed if the earlier failure was transient.



In [3]:
# Define the GCS bucket name and source blob name
bucket_name = "mrc-quant-ml-video-analysis"
source_blob_name = "videoplayback.mp4"

# Define the destination file name in the Colab environment
destination_file_name = "downloaded_video.mp4"

try:
    # Get the bucket object
    bucket = storage_client.get_bucket(bucket_name)

    # Get the blob object (the video file) from the bucket
    blob = bucket.blob(source_blob_name)

    # Download the blob to the specified destination file
    blob.download_to_filename(destination_file_name)

    # Print a confirmation message
    print(f"Video downloaded successfully to {destination_file_name}")

except Exception as e:
    print(f"Error downloading video: {e}")
    print("Please ensure your Colab environment is properly authenticated to access the GCS bucket.")


Error downloading video: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7849e4333a90>)
Please ensure your Colab environment is properly authenticated to access the GCS bucket.


## Summary:

### Data Analysis Key Findings

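*   The `pip install` command for `whisper`, `llama-index`, `google-cloud-storage`, and `moviepy` ran, but the PyPI package named `whisper` (version 1.1.10 in the install log) is not OpenAI's speech-to-text model, which is published as `openai-whisper`; the transcription dependency therefore still needs to be installed.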
*   An initial attempt to download the video from Google Cloud Storage (GCS) failed with an authentication error (`RefreshError`) raised while the client tried to obtain credentials in the Colab environment.
*   A retry of the video download failed with the same error. The 404 came from the Compute Engine metadata server's service-account endpoint rather than from GCS, which suggests the Colab runtime has no default service account to draw credentials from, so the request never reached the bucket.

### Insights or Next Steps

*   The task of summarizing the video cannot proceed without successfully downloading the video.
*   The next step is to resolve authentication in the Colab environment by supplying credentials explicitly, for example by signing in with a user account via `google.colab.auth.authenticate_user()` or by loading a service account key that has read access to the bucket, and then rerunning the download.
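
A minimal sketch of explicit authentication, under stated assumptions: `make_storage_client`, the `project_id` value, and the key file path are hypothetical, and the notebook does not confirm which identity has access to the bucket. The service account branch uses `storage.Client.from_service_account_json`; the other branch uses the interactive Colab login and only works inside a Colab runtime.

```python
def make_storage_client(project_id, key_path=None):
    """Build a GCS client with explicit credentials instead of the metadata server."""
    # Imported lazily so the helper can be defined before the packages are installed.
    from google.cloud import storage

    if key_path is not None:
        # Service account key file (hypothetical path) with read access to the bucket.
        return storage.Client.from_service_account_json(key_path, project=project_id)

    # Interactive user login; available only inside a Colab runtime.
    from google.colab import auth
    auth.authenticate_user()
    return storage.Client(project=project_id)
```

With a client built this way, the download cell can be rerun unchanged by assigning the result to `storage_client`, for example `storage_client = make_storage_client("my-project-id")`, where the project ID is a placeholder.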
