<a href="https://colab.research.google.com/github/pattichis/AI4All-Med/blob/main/GEMINI_Example_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Video Transcription and Summarization Example

The example provides a template for generating transcriptions from videos and summarizing them.

It uses a speech-to-text model (Whisper) and the GEMINI large language model.

You can modify the prompts to improve the results.

To run the example, you will need to set up a Google Cloud account. As of December 2025, Google gives new accounts a $300 credit that you can use to run the example.



# Installation instructions

We suggest using Gemini in Colab to help you with the following steps and to make sure they are executed correctly.

1. After opening the file in Google Colab, use <tt>"Save a copy in Drive"</tt> to save it to your own Google Drive.
2. The example used in class was from [Doctor / Patient C-Section Discussion (w/ subtitles) | My Birth Matters](https://www.youtube.com/watch?v=6KPx1SlCWAo)
3. To download YouTube videos, you need to install <tt>yt-dlp</tt>. On macOS, using Homebrew, open a terminal and type:
```
   brew install yt-dlp
```
On Windows, download <tt>yt-dlp.exe</tt> from
   [GitHub](https://github.com/yt-dlp/yt-dlp/releases/tag/2025.12.08)
Once you have installed <tt>yt-dlp</tt>, download the example presented in class by executing the following at the command prompt:
```
yt-dlp --cookies-from-browser chrome "https://www.youtube.com/watch?v=6KPx1SlCWAo"
```
Then rename the file to <tt>video_example.mp4</tt>.

4. After you run Google Colab, upload the file <tt>video_example.mp4</tt>.

5. You will need to set up an account on [Google Cloud](https://cloud.google.com/).

6. Set up an API key using [Google's AI studio](https://aistudio.google.com/api-keys). Name this key: 'GEMINI_API_KEY'.

7. Click the key icon in the left sidebar of Google Colab and add a secret named 'GEMINI_API_KEY'. Then, in [Google's AI studio](https://aistudio.google.com/api-keys), click <tt>copy</tt> and paste the value into the secret you just added in Google Colab.
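
Step 3 above downloads the video and then renames it by hand. As a minimal sketch, the download command can also be built so that the file is saved as <tt>video_example.mp4</tt> directly. The helper name `build_ytdlp_command` is hypothetical, and the sketch assumes that `ffmpeg` is installed, since `--remux-video` relies on it. The block only builds and prints the command; it does not download anything.

```python
import shlex


def build_ytdlp_command(url, output_stem="video_example", browser="chrome"):
    """Build a yt-dlp command that saves the download as <output_stem>.mp4."""
    return [
        "yt-dlp",
        "--cookies-from-browser", browser,  # same flag as in step 3
        "--remux-video", "mp4",             # repackage as MP4 if the download uses another container
        "-o", f"{output_stem}.%(ext)s",     # output template; yt-dlp fills in %(ext)s
        url,
    ]


command = build_ytdlp_command("https://www.youtube.com/watch?v=6KPx1SlCWAo")
print(shlex.join(command))
# To run the command from Python instead of a terminal:
#   import subprocess; subprocess.run(command, check=True)
```

Pasting the printed command into a terminal should then leave a file that needs no manual renaming before the upload in step 4.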


## Whisper
Whisper is a Transformer sequence-to-sequence model trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.

This model was developed by OpenAI. [[GitHub](https://github.com/openai/whisper)]

In this notebook, we are going to use faster-whisper, a reimplementation of OpenAI's Whisper built on CTranslate2, a fast C++ inference engine for Transformer models. [[GitHub](https://github.com/SYSTRAN/faster-whisper)]

In [39]:
#@title 1. Install Dependencies
!pip install -qU pytube yt-dlp faster-whisper

In [41]:
#@title 2. Upload the video
from google.colab import files

print("BEFORE UPLOADING, RENAME your video file to: video_example.mp4")
uploaded = files.upload()

for filename in uploaded.keys():
    print(f"Uploaded file: {filename}")
    # Assuming the user uploads only one video file for transcription
    # If multiple files are uploaded, you'll need to specify which one to use
    # For now, we'll take the first one.
    uploaded_video_filename = filename
    break

print(f"Please note the uploaded filename: {uploaded_video_filename}")


Saving video_example.mp4 to video_example (3).mp4
Uploaded file: video_example (3).mp4
Please note the uploaded filename: video_example (3).mp4


In [42]:
#@title 3. Set-up Whisper
import torch
from faster_whisper import WhisperModel

model_size = "large-v3"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

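# Use FP16 on GPU; fall back to INT8 on CPU, where FP16 is typically not supported
compute_type = "float16" if device == "cuda" else "int8"
whisper = WhisperModel(model_size, device=device, compute_type=compute_type)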

In [43]:
#@title 4. Transcribe the audio/video
import os
from tqdm import tqdm

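# Use the name reported by the upload cell; Colab renames duplicates, e.g. "video_example (3).mp4"
video_path = uploaded_video_filename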
main_out_folder = './'

segments, info = whisper.transcribe(video_path)

output_filename = f"{main_out_folder}/transcription.txt"
with open(output_filename, "w") as f:
    for segment in tqdm(segments):
        f.write("[%.2fs -> %.2fs] %s\n" % (segment.start, segment.end, segment.text))

print(f"Transcription saved to {output_filename}")

45it [00:38,  1.18it/s]

Transcription saved to .//transcription.txt
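
Each line of <tt>transcription.txt</tt> follows the `[%.2fs -> %.2fs] %s` format used in the cell above. As a minimal sketch, the lines can be parsed back into `(start, end, text)` tuples for further processing. The helper name `parse_transcript` and the two sample lines are made up for illustration; they are not taken from the actual transcript.

```python
import re

# Matches lines written with the "[%.2fs -> %.2fs] %s" format
SEGMENT_PATTERN = re.compile(r"^\[(\d+\.\d+)s -> (\d+\.\d+)s\] ?(.*)$")


def parse_transcript(lines):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    segments = []
    for line in lines:
        match = SEGMENT_PATTERN.match(line.rstrip("\n"))
        if match:  # skip blank or malformed lines
            start, end, text = match.groups()
            segments.append((float(start), float(end), text.strip()))
    return segments


# Made-up sample lines in the same format as transcription.txt
sample_lines = [
    "[0.00s -> 3.20s]  Hello, Mary.\n",
    "[3.20s -> 7.85s]  Hi, doctor.\n",
]
segments = parse_transcript(sample_lines)
print(segments)
```

In the notebook, the same function could be applied to the saved file with `parse_transcript(open(output_filename))`.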





## GEMINI
GEMINI is a large language model developed by Google DeepMind.

To access it, create an account and an API key in Google's AI Studio [[link](https://aistudio.google.com/app/apikey)].

In [57]:
#@title 1. Setup
import google.generativeai as genai
from google.colab import userdata

# Read the API key from Colab's Secrets panel (the key icon in the left sidebar)
try:
    GOOGLE_API_KEY = userdata.get('GEMINI_API_KEY')  # The secret name must match exactly
    genai.configure(api_key=GOOGLE_API_KEY)
except Exception as e:
    print(f"Could not load the API key ({e}). Ensure you have a secret 'GEMINI_API_KEY' stored in Colab and that it is a valid key from https://aistudio.google.com/app/apikey")
    raise

model = genai.GenerativeModel('gemini-2.5-flash')


In [58]:
#@title 2. Upload the file to Google's servers
# This works for TXT, PDF, CSV, and even Audio/Video files
print("Uploading file...")
transcript_file = genai.upload_file(
    path=output_filename,
    display_name="Video Transcript"
    )

Uploading file...


In [59]:
#@title 3. Wait for file processing (Good practice, though usually instant for text)
import time

while transcript_file.state.name == "PROCESSING":
    print("Processing file...")
    time.sleep(2)
    transcript_file = genai.get_file(transcript_file.name)

if transcript_file.state.name == "FAILED":
    raise ValueError(f"File processing failed for {transcript_file.name}")

print(f"File uploaded successfully: {transcript_file.uri}")

File uploaded successfully: https://generativelanguage.googleapis.com/v1beta/files/tcwtx89ngefq
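
The polling loop above waits indefinitely if the file never leaves the `PROCESSING` state. A minimal sketch of the same loop with a timeout is shown below. The helper name `wait_for_state` and its parameters are hypothetical, and the state sequence is simulated so that the block runs without calling the API.

```python
import time


def wait_for_state(fetch_state, pending="PROCESSING", timeout_s=60.0,
                   interval_s=2.0, sleep=time.sleep):
    """Poll fetch_state() until it stops returning `pending` or the timeout expires."""
    waited = 0.0
    state = fetch_state()
    while state == pending:
        if waited >= timeout_s:
            raise TimeoutError(f"Still {pending} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        state = fetch_state()
    return state


# Simulated states standing in for genai.get_file(...).state.name
fake_states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_for_state(lambda: next(fake_states), sleep=lambda seconds: None)
print(final_state)
```

In the notebook, the callable could be `lambda: genai.get_file(transcript_file.name).state.name`, with the default `time.sleep` left in place.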


In [60]:
#@title 4. Define the Prompt
# Note: We do NOT paste the text here. We just give instructions.

prompt = """
Analyze the provided file and generate TWO distinct outputs based on the instructions below.

### OUTPUT 1: The "Poor" Summary
Summarize the conversation between the doctor and patient about C-section
**Label:** [POOR RESULT]

### OUTPUT 2: The "Expert" Analysis
**Instruction:** Act as a Board-Certified OB/GYN. Analyze the transcript for clinical decision points regarding C-sections.
**Requirements:**
1. Extract key specific medical arguments made.
2. Provide a balanced "Pros vs. Cons" analysis for each argument based on current medical guidelines.
3. Conclude with a clinical assessment.
**Format:** Strictly use this schema:
# CLINICAL MOMENTS
================================================
Argument Raised: "text from transcript"
  Medical Evaluation: "your expert analysis"
  Pros: "list of benefits"
  Cons: "list of risks"
================================================
Expert Summary: "Your detailed clinical synthesis"

**Label:** [EXPERT RESULT]
"""
print(prompt)


Analyze the provided file and generate TWO distinct outputs based on the instructions below.

### OUTPUT 1: The "Poor" Summary
Summarize the conversation between the doctor and patient about C-section
**Label:** [POOR RESULT]

### OUTPUT 2: The "Expert" Analysis
**Instruction:** Act as a Board-Certified OB/GYN. Analyze the transcript for clinical decision points regarding C-sections.
**Requirements:**
1. Extract key specific medical arguments made.
2. Provide a balanced "Pros vs. Cons" analysis for each argument based on current medical guidelines.
3. Conclude with a clinical assessment.
**Format:** Strictly use this schema:
# CLINICAL MOMENTS
Argument Raised: "text from transcript"
  Medical Evaluation: "your expert analysis"
  Pros: "list of benefits"
  Cons: "list of risks"
Expert Summary: "Your detailed clinical synthesis"

**Label:** [EXPERT RESULT]
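
The "Expert" half of the prompt above follows a fixed pattern: a role, a task, and a numbered list of requirements. As a minimal sketch, that pattern can be wrapped in a small function so that prompt variants are easier to try. The function name `build_expert_prompt` and its parameters are hypothetical and not part of the notebook.

```python
def build_expert_prompt(role, topic, requirements):
    """Assemble an instruction block with a role, a topic, and numbered requirements."""
    numbered = "\n".join(
        f"{index}. {requirement}"
        for index, requirement in enumerate(requirements, start=1)
    )
    return (
        f"**Instruction:** Act as a {role}. "
        f"Analyze the transcript for clinical decision points regarding {topic}.\n"
        f"**Requirements:**\n{numbered}\n"
    )


expert_prompt = build_expert_prompt(
    role="Board-Certified OB/GYN",
    topic="C-sections",
    requirements=[
        "Extract key specific medical arguments made.",
        "Conclude with a clinical assessment.",
    ],
)
print(expert_prompt)
```

Changing the role or the requirement list then produces a new prompt without editing the long string by hand.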



In [61]:
# --- 5. Generate Content ---
# Pass a list containing both the prompt text and the uploaded file object
response = model.generate_content([prompt, transcript_file])

# --- 6. Display Results ---
print("--- Result ---")
print(response.text)

--- Result ---
[POOR RESULT]
The conversation is between a doctor and a patient, Mary, about C-sections. Mary is confused about whether C-sections are sometimes done unnecessarily. The doctor explains that while C-sections are vital in some cases to save lives, they have become too common and many could be avoided. Reasons given for this include conflicting information from electronic fetal monitoring leading to false alarms, the perception that C-sections are "no big deal" because they are common, and that their risks like infection and blood loss are often overlooked. The doctor also mentions that a first C-section greatly increases the chance of future C-sections, with risks increasing each time. To avoid an unnecessary C-section, the doctor advises maintaining a healthy weight and activity, taking birth classes, checking hospital/doctor C-section rates, having a skilled support person like a doula, and communicating birth preferences to the medical team.

---

[EXPERT RESULT]
# CLI

In [62]:
#@title 7. Cleanup (Optional but recommended)
# Files are stored for 48 hours by default, but you can delete them manually
genai.delete_file(transcript_file.name)
print("\n(Temporary file deleted from cloud storage)")


(Temporary file deleted from cloud storage)


In [63]:
#@title 8. Post-process and Save answer. (Optional but recommended)
import re

full_response = response.text

# Extract Poor Result
poor_result_match = re.search(r'\[POOR RESULT\]\n(.*?)(?=\n\[EXPERT RESULT\]|$)', full_response, re.DOTALL)
poor_result = poor_result_match.group(1).strip() if poor_result_match else "Poor Result Not Found"

# Extract Expert Result (everything after the [EXPERT RESULT] label)
expert_result_match = re.search(r'\[EXPERT RESULT\]\n(.*)', full_response, re.DOTALL)
expert_result = expert_result_match.group(1).strip() if expert_result_match else "Expert Result Not Found"

# Save Poor Result to a file
with open("poor_result.txt", "w") as f:
    f.write(poor_result)
print("Poor Result saved to poor_result.txt")

# Save Expert Result to a file
with open("expert_result.txt", "w") as f:
    f.write(expert_result)
print("Expert Result saved to expert_result.txt")

Poor Result saved to poor_result.txt
Expert Result saved to expert_result.txt
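
The regular expressions above expect a newline right after each label and leave the `---` separator at the end of the first section. A minimal sketch of a more tolerant splitter is shown below. The function name `split_labeled_sections` is hypothetical, and the sample response is made up to mirror the layout of the output shown earlier.

```python
import re


def split_labeled_sections(text, labels):
    """Return {label: body} for each '[LABEL]' marker found in text."""
    pattern = "|".join(re.escape(f"[{label}]") for label in labels)
    # With a capture group, re.split yields [preamble, marker, body, marker, body, ...]
    parts = re.split(f"({pattern})", text)
    sections = {}
    for marker, body in zip(parts[1::2], parts[2::2]):
        cleaned = body.strip()
        # Drop a trailing Markdown rule ("---") that separates the sections
        cleaned = re.sub(r"\n\s*-{3,}\s*$", "", cleaned).strip()
        sections[marker.strip("[]")] = cleaned
    return sections


# Made-up response with the same layout as the model output
sample_response = (
    "[POOR RESULT]\nShort summary.\n\n---\n\n"
    "[EXPERT RESULT]\n# CLINICAL MOMENTS\nArgument Raised: \"example\""
)
sections = split_labeled_sections(sample_response, ["POOR RESULT", "EXPERT RESULT"])
print(sections["POOR RESULT"])
```

In the notebook, `split_labeled_sections(response.text, ["POOR RESULT", "EXPERT RESULT"])` would return both sections in one dictionary, and a missing label simply produces no entry.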
