# 📸 YouTube Transcript Summarization

This notebook demonstrates how to extract metadata and transcripts from a YouTube video, and then summarize the content using a language model. The example video used in this notebook is [https://www.youtube.com/watch?v=UY76TqYyuVw](https://www.youtube.com/watch?v=UY76TqYyuVw).


## Step 1: Get Video Metadata


In [1]:
import yt_dlp

def extract_youtube_metadata(video_url):
    """
    Extract metadata from a YouTube video using yt_dlp.

    Args:
        video_url (str): The URL of the YouTube video.

    Returns:
        dict: A dictionary containing metadata such as title, description, views, like count, upload date, duration, uploader, and thumbnail URL.
    """
    # YouTubeDL options
    ydl_opts = {
        'extractor_args': {'youtube': {'player_client': ['ios']}}
    }

    try:
        # Extract video metadata
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            info = ydl.extract_info(video_url, download=False)

        # Rank thumbnails by yt_dlp's preference score, highest first
        thumbnails = sorted(
            info.get("thumbnails") or [],
            key=lambda t: t.get("preference", 0),
            reverse=True,
        )
        # Use the second-ranked thumbnail when there is more than one, otherwise the only one
        if thumbnails:
            thumbnail_url = thumbnails[1 if len(thumbnails) > 1 else 0].get("url")
        else:
            thumbnail_url = None

        # Organize metadata
        metadata = {
            "title": info.get("title"),
            "description": info.get("description"),
            "views": info.get("view_count"),
            "like_count": info.get("like_count"),
            "upload_date": info.get("upload_date"),  # String in YYYYMMDD format
            "duration": info.get("duration"),  # Duration in seconds
            "uploader": info.get("uploader"),
            "thumbnail": thumbnail_url,
        }

        return metadata

    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Example usage
video_url = "https://www.youtube.com/watch?v=UY76TqYyuVw"
metadata = extract_youtube_metadata(video_url)

if metadata:
    for key, value in metadata.items():
        print(f"{key}: {value}")


[youtube] Extracting URL: https://www.youtube.com/watch?v=UY76TqYyuVw
[youtube] UY76TqYyuVw: Downloading webpage
[youtube] UY76TqYyuVw: Downloading ios player API JSON
[youtube] UY76TqYyuVw: Downloading m3u8 information
title: History of Spices
description: The history of spices presented by Chef Johnna Gale at the Arizona Vegetarian Festival in January 2016

Chef Johnna tells you where individual spices originated and how they traveled around the world to end up on your plate. In this talk she discusses cinnamon, nutmeg, clove, and saffron and tells you which spice was once used as currency, why saffron is so expensive, and how there are actually two different types of cinnamon, as well as lots of other interesting facts.

Visit these links to learn more about the individual spices:
http://www.kitchenshaman.com/following-the-spice-cinnamon/
http://www.kitchenshaman.com/following-the-spice-nutmeg-and-mace/
http://www.kitchenshaman.com/following-the-spice-clove/
http://www.kitchenshaman
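
yt_dlp reports `upload_date` as a `YYYYMMDD` string and `duration` in seconds, so the raw values are not very readable when printed or placed in a prompt. A minimal sketch of a formatting helper follows. `format_metadata_for_display` is a hypothetical name that is not part of the original notebook, and the sample values in the usage line are made up for illustration.

```python
from datetime import datetime, timedelta


def format_metadata_for_display(metadata):
    """Return a copy of the metadata dict with a readable date and duration.

    Hypothetical helper (not part of the original notebook). It assumes
    `upload_date` is a 'YYYYMMDD' string and `duration` is a number of
    seconds, and it leaves missing (None) values untouched.
    """
    formatted = dict(metadata)  # shallow copy so the caller's dict is not modified

    upload_date = metadata.get("upload_date")
    if upload_date:
        # '20160131' -> '2016-01-31'
        formatted["upload_date"] = datetime.strptime(upload_date, "%Y%m%d").date().isoformat()

    duration = metadata.get("duration")
    if duration is not None:
        # 3725 -> '1:02:05'
        formatted["duration"] = str(timedelta(seconds=int(duration)))

    return formatted


# Illustrative values only, not taken from the example video
sample = {"title": "Demo", "upload_date": "20160131", "duration": 3725}
print(format_metadata_for_display(sample))
```

Returning a copy keeps the original dictionary intact, so the raw values remain available for any later step that needs them.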

## Step 2: Get Video Transcript


In [2]:
from youtube_transcript_api import (
    CouldNotRetrieveTranscript,
    TranscriptsDisabled,
    YouTubeTranscriptApi,
)
from youtube_transcript_api.formatters import TextFormatter, JSONFormatter
import re

def fetch_all_transcripts(video_url, formatter_type="text"):
    """
    Fetch all transcripts (manual and auto-generated) for a YouTube video.

    Parameters:
    - video_url (str): The URL of the YouTube video.
    - formatter_type (str): The type of formatter ("text" or "json").

    Returns:
    - dict: A dictionary of all transcripts with language codes as keys.
            Auto-generated transcripts are suffixed with '_auto'.
            Returns None if the transcripts could not be retrieved.
    """
    try:
        # Extract video ID using a regex to support both standard and shortened YouTube URLs
        match = re.search(r"(?:v=|\/)([0-9A-Za-z_-]{11})", video_url)
        if not match:
            raise ValueError("Invalid YouTube URL. Could not extract video ID.")
        video_id = match.group(1)

        # Get all available transcripts
        transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)

        # Choose formatter
        formatter = JSONFormatter() if formatter_type == "json" else TextFormatter()

        transcripts = {}
        for transcript in transcript_list:
            # Build the dictionary key first so the error message below can always use it
            suffix = "_auto" if transcript.is_generated else ""
            key = f"{transcript.language_code}{suffix}"
            try:
                fetched_transcript = transcript.fetch()
                transcripts[key] = formatter.format_transcript(fetched_transcript)
            except Exception as e:
                print(f"Error fetching transcript for {key}: {e}")

        return transcripts

    # The exception classes live in the youtube_transcript_api package, not on the
    # YouTubeTranscriptApi class. TranscriptsDisabled is a subclass of
    # CouldNotRetrieveTranscript, so it must be handled first.
    except TranscriptsDisabled as e:
        print(f"Transcripts are disabled for this video: {e}")
    except CouldNotRetrieveTranscript as e:
        print(f"Could not retrieve transcripts for the video: {e}")
    except ValueError as e:
        print(e)
    except Exception as e:
        print(f"An unexpected error occurred while fetching transcripts: {e}")

    return None


def sort_transcripts(transcripts, preferred_languages=("en",)):
    """
    Sort transcripts based on language preferences.

    Parameters:
    - transcripts (dict): A dictionary of all transcripts with language codes as keys.
    - preferred_languages (sequence): Preferred language codes in priority order.

    Returns:
    - dict: A new dictionary ordered by preference. The input dictionary is not modified.
    """
    sorted_transcripts = {}

    # Preferred languages first: the manual transcript, then its auto-generated counterpart
    for lang in preferred_languages:
        for key in (lang, f"{lang}_auto"):
            if key in transcripts:
                sorted_transcripts[key] = transcripts[key]

    # Remaining manual and auto transcripts keep their original order
    for lang_code, transcript in transcripts.items():
        sorted_transcripts.setdefault(lang_code, transcript)

    return sorted_transcripts


# Main execution
video_url = "https://www.youtube.com/watch?v=UY76TqYyuVw"  # Replace with your video link
preferred_languages = ["en"]  # Set your preferred languages in priority order
formatter_type = "text"  # Choose "text" or "json"

# Step 1: Fetch all transcripts
all_transcripts = fetch_all_transcripts(video_url, formatter_type)

if all_transcripts:
    # Step 2: Sort transcripts by preference
    all_transcripts = sort_transcripts(all_transcripts, preferred_languages)

    # Use the highest-priority transcript (the first key after sorting)
    transcript_lang = next(iter(all_transcripts))
    transcript = all_transcripts[transcript_lang]
    print(transcript)
else:
    print("Transcripts not available.")


our first speaker um is uh Chef Joe jna
Gail who is uh the also known as the kit
kitchen Shaman founder and executive
chef and CEO of uh kitchen Shaman uh
LLC uh Janna has spent over 20 years
working in professional kitchens in
Arizona New Mexico and even Reon in
Kansas she has u a website and brings uh
so you can look her up after this uh
talk today if you're interested she
brings a lifelong passion for food and
the joys of cooking to her career as a
cooking instructor coach and food writer
John it's great to have you
here
[Applause]
hi everyone thanks for coming out in the
cold wow it's really cold for
Arizona I welcome to the Arizona
vegetarian food festival and um I'm
kicking it off with a talk on the
history and uses of
spices and I think I I started writing
about the history of food because I had
a chef who kept coming up to me going do
you know what this is um yeah do you
know where it's
no okay cool let's let's find out about
this so it all started with the
potato seriously whe
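
The regex in `fetch_all_transcripts` accepts any 11-character run that follows `v=` or `/`, so it can also match path segments that are not video IDs (for example, `/hello_world` on an unrelated site). A stricter alternative, sketched here with the standard library's `urllib.parse`, checks the host and path explicitly. `extract_video_id` is a hypothetical helper, and the URL shapes it handles (`watch?v=`, `youtu.be/`, `/embed/`, `/shorts/`) are an assumption about which links need to be supported.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from this alphabet
VIDEO_ID_PATTERN = re.compile(r"[0-9A-Za-z_-]{11}")


def extract_video_id(video_url):
    """Return the 11-character video ID from a YouTube URL, or None if not found.

    Hypothetical helper (not part of the original notebook).
    """
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Embed and Shorts links: /embed/<id> or /shorts/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in {"embed", "shorts"}:
                candidate = parts[1]

    if candidate and VIDEO_ID_PATTERN.fullmatch(candidate):
        return candidate
    return None


print(extract_video_id("https://www.youtube.com/watch?v=UY76TqYyuVw"))
```

Returning `None` instead of raising lets the caller decide how to report an unsupported link; the result could be passed to `YouTubeTranscriptApi.list_transcripts` in place of the regex match.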

## Step 3: Create Summarization Prompt


In [3]:
summarization_prompt = f"""
<metadata>
**Video Title**: {metadata['title']}
**Description**: 
{metadata['description']}
**Language**: {transcript_lang}
</metadata>

---

## Transcript

{ transcript }

---

## Task Instructions

### 1. Extract Key Information
- Identify the main ideas, critical points, and essential details from the transcript.

### 2. Incorporate Context from Description
- Use the description to provide additional background or context when applicable.

### 3. Use Flexible Markdown Structure
- Format the information using Markdown elements (e.g., headings, bullet points, blockquotes) for clarity and effective presentation.

### 4. Highlight Notable Insights
- Use Markdown syntax to emphasize impactful statements, themes, or quotes.

---

### Additional Notes:
- **Relevance of Description**: Integrate relevant details from the description into the extracted content, such as background, event context, or purpose.
- **Logical Organization**: Group related points or themes for better readability.
- **Markdown Best Practices**: Use elements such as headings, subheadings, bullet points, and blockquotes to enhance structure and clarity.

---

### Example Markdown Structure (adapt as needed):

```markdown
# {{ title }}

## Context
- Description provides context about [specific background or purpose].
- Additional details from description: [Key points or supplementary information].

---

## Key Topics
1. **[Main Topic 1]**
   - Key insight or summary point 1.
   - Additional details or sub-points.
2. **[Main Topic 2]**
   - Key insight or summary point 1.
   - Additional details or sub-points.
...continue as needed for more topics.

---

## Examples or Case Studies
- **Example 1**: Key information or story from the video.
- **Example 2**: Additional example with insights.

---

## Key Takeaways and Insights
### [Insightful Topic or Section Heading]
- Observation or takeaway 1.
- Observation or takeaway 2.
...add more as needed.

---

## Notable Quotes
> "Include key quote or impactful statement from the video."
> "Another important quote emphasizing a key point."

---

## Conclusion
- Summary of the video’s purpose or main message.
- Final actionable insights or reflections.
```

"""

# Print the assembled prompt to inspect it before sending it to the model

print(summarization_prompt)



<metadata>
**Video Title**: History of Spices
**Description**: 
The history of spices presented by Chef Johnna Gale at the Arizona Vegetarian Festival in January 2016

Chef Johnna tells you where individual spices originated and how they traveled around the world to end up on your plate. In this talk she discusses cinnamon, nutmeg, clove, and saffron and tells you which spice was once used as currency, why saffron is so expensive, and how there are actually two different types of cinnamon, as well as lots of other interesting facts.

Visit these links to learn more about the individual spices:
http://www.kitchenshaman.com/following-the-spice-cinnamon/
http://www.kitchenshaman.com/following-the-spice-nutmeg-and-mace/
http://www.kitchenshaman.com/following-the-spice-clove/
http://www.kitchenshaman.com/following-the-spice-peppercorns/
http://www.kitchenshaman.com/spices-saffron/

Or check out the recipe book mentioned in the video, Delectable Vegan Soups at http://amzn.to/1QomIeV

Subscr
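
The prompt embeds the full transcript, which for long videos may exceed the context window or token budget of the chosen model. One simple guard, sketched below, is to cap the transcript at a character budget and cut on a line boundary, since the text formatter emits one caption snippet per line. `truncate_transcript`, the `max_chars` default, and the truncation marker are all illustrative assumptions rather than part of the original notebook, and a character count is only a rough proxy for tokens.

```python
def truncate_transcript(transcript, max_chars=20000):
    """Cap a transcript at roughly max_chars characters, cutting on a line boundary.

    Hypothetical helper (not part of the original notebook). The returned text
    holds at most max_chars characters of transcript plus a short marker line
    that tells the model the input was cut.
    """
    if len(transcript) <= max_chars:
        return transcript

    # Find the last newline inside the budget so no caption line is split
    cut = transcript.rfind("\n", 0, max_chars)
    if cut <= 0:
        # No usable line break within the budget: fall back to a hard cut
        cut = max_chars

    return transcript[:cut].rstrip() + "\n[transcript truncated]"


# Illustrative input only
demo = "line one\nline two\nline three"
print(truncate_transcript(demo, max_chars=12))
```

Applying this to `transcript` before building `summarization_prompt` keeps the prompt size bounded; a suitable budget depends on the model and is left as a tuning choice.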

## Step 4: Call an LLM API (Gemini Here) with the Summarization Prompt


In [4]:
import google.generativeai as genai

genai.configure(api_key="your-api-key") # Replace with your API key
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(summarization_prompt)
print(response.text)

# A Culinary Journey Through Spice History: Key Insights from Chef Johnna Gale's Talk

## Context

This transcript captures a presentation by Chef Johnna Gale, founder of Kitchen Shaman, at the Arizona Vegetarian Festival in January 2016.  Her talk, "History of Spices," explores the origins and global journeys of various spices, focusing on cinnamon, nutmeg, clove, saffron, and peppercorns.  The description highlights her discussion of the historical significance of spices, including their use as currency and medicinal properties.  Chef Gale also references her website, Kitchen Shaman, and her cookbook, *Delectable Vegan Soups*, offering additional resources for viewers.

---

## Key Topics

1. **The Spice Islands and the Discovery Years:** Chef Gale begins by setting the historical stage around 500 years ago, focusing on the Spice Islands (modern-day Indonesia) and the European powers' quest for sea routes to these islands during the Age of Exploration.  She mentions Columbus, Vasco d