# YouTube Video Metadata Extractor from Markdown

Workflow:
- Install dependencies
- Load API key from `.env`
- Extract YouTube video IDs from Markdown
- Fetch metadata via YouTube Data API
- Cache results
- Prepare data for Svelte rendering

## 🛠️ Install Required Packages

In [None]:
# Run this once to install dependencies
!pip install requests isodate python-dotenv

## 🔐 Load Environment Variables (e.g., API Key)

In [3]:
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

YOUTUBE_API_KEY = os.getenv("SECRET__GOOGLE__YOUTUBE__API_KEY")

if not YOUTUBE_API_KEY:
    raise EnvironmentError("SECRET__GOOGLE__YOUTUBE__API_KEY is not set (check your .env file)")

print("YouTube API key loaded successfully.")

YouTube API key loaded successfully.


## Step 1: Extract YouTube Video IDs from Markdown

In [5]:
import re
import requests
import json
from datetime import datetime, timedelta
from typing import List, Dict, Optional
import isodate

def extract_youtube_urls_from_markdown(markdown: str) -> List[str]:
    """
    Extract all unique YouTube video IDs from Markdown text, in order of first appearance.
    Supports: youtu.be/... and youtube.com/watch?v=... (v must be the first query parameter)
    """
    youtube_regex = r"(?:https?://)?(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([\w-]{11})"
    matches = re.findall(youtube_regex, markdown)
    return list(dict.fromkeys(matches))  # Remove duplicates, keep first-seen order

# Example Markdown
markdown_content = '''
Summarize & analyze this video:
https://www.youtube.com/watch?v=9S_ETzbAMfg
'''

video_ids = extract_youtube_urls_from_markdown(markdown_content)
print("Extracted Video IDs:", video_ids)

Extracted Video IDs: ['9S_ETzbAMfg']
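
The regex above only matches when `v=` directly follows the `?`, so a link such as `watch?feature=share&v=...` is missed, and `/shorts/` or `/embed/` links are not covered at all. A minimal sketch of a per-URL parser built on `urllib.parse` follows. The helper name `extract_video_id` is hypothetical, and the set of hosts and path prefixes it accepts is an assumption about which URL shapes are worth supporting, not something the notebook specifies.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption (same as the regex above): IDs are 11 characters from [A-Za-z0-9_-]
_ID_RE = re.compile(r"[\w-]{11}", re.ASCII)

def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a single YouTube URL, or None if none is found."""
    if "://" not in url:
        url = "https://" + url  # urlparse only fills netloc when a scheme is present
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # parse_qs finds v= regardless of its position in the query string
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None

print(extract_video_id("https://www.youtube.com/watch?feature=share&v=9S_ETzbAMfg"))
```

This handles one URL at a time, so it would still need a pass that pulls candidate links out of the Markdown before calling it.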


## Step 2: Parse ISO 8601 Duration (e.g., PT4M30S → 4:30)

In [10]:
def parse_iso_duration(iso_duration: str) -> str:
    """
    Convert ISO 8601 duration (e.g., PT5M20S) to M:SS format.
    Minutes are not capped at 59, so PT1H5M becomes 65:00.
    """
    try:
        duration = isodate.parse_duration(iso_duration)
        total_seconds = int(duration.total_seconds())
        minutes = total_seconds // 60
        seconds = total_seconds % 60
        return f"{minutes}:{seconds:02d}"
    except Exception:
        # Malformed or unsupported duration string: return a placeholder
        return "?:??"

# Test
print(parse_iso_duration("PT3M45S"))

3:45
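
For videos of an hour or longer, `parse_iso_duration` keeps counting minutes past 59. If an `H:MM:SS` display is preferred, the formatting step could be sketched as below. `format_duration` is a hypothetical helper, not part of the notebook; it takes total seconds, so it could be fed `int(duration.total_seconds())` from the function above.

```python
def format_duration(total_seconds: int) -> str:
    """Format seconds as M:SS, switching to H:MM:SS at one hour or more."""
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"

print(format_duration(225))   # 3:45
print(format_duration(3900))  # 1:05:00
```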


## Step 3: Fetch Metadata Using YouTube API (with Cache)

In [15]:
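from datetime import timezone

# Global in-memory cache: video_id -> {'data': metadata, 'expires': datetime}
cache = {}
CACHE_TTL_HOURS = 24  # Cache for 24 hours

def fetch_youtube_metadata(video_id: str) -> Optional[Dict]:
    # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
    now = datetime.now(timezone.utc)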

    # Check cache
    if video_id in cache:
        cached = cache[video_id]
        if now < cached['expires']:
            print(f"[Cache] Found metadata for {video_id}")
            return cached['data']

    print(f"[API] Fetching metadata for {video_id}")
    
    url = "https://www.googleapis.com/youtube/v3/videos"
    params = {
        'key': YOUTUBE_API_KEY,
        'id': video_id,
        'part': 'snippet,contentDetails'
    }

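    try:
        # A timeout keeps the notebook from hanging on a stalled connection
        response = requests.get(url, params=params, timeout=10)
    except requests.RequestException as exc:
        print(f"Request failed for {video_id}: {exc}")
        return None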
    
    if response.status_code != 200:
        print(f"Error {response.status_code}: {response.text}")
        return None

    data = response.json()
    
    if not data.get('items'):
        print(f"No video found for ID: {video_id}")
        return None

    item = data['items'][0]
    snippet = item['snippet']
    details = item['contentDetails']

    metadata = {
        'video_id': video_id,
        'title': snippet['title'],
        'thumbnail_url': snippet['thumbnails']['high']['url'],
        'duration': parse_iso_duration(details['duration']),
        'published_at': snippet['publishedAt']
    }

    # Store in cache
    expires = now + timedelta(hours=CACHE_TTL_HOURS)
    cache[video_id] = {
        'data': metadata,
        'expires': expires
    }

    return metadata

# Test
metadata = fetch_youtube_metadata("dQw4w9WgXcQ")
if metadata:
    print(json.dumps(metadata, indent=2))

[API] Fetching metadata for dQw4w9WgXcQ
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
  "thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg",
  "duration": "3:34",
  "published_at": "2009-10-25T06:57:33Z"
}
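
The `cache` dictionary lives in the notebook kernel's memory, so every restart triggers fresh API calls. One way to keep entries across restarts is to store them in a JSON file, with the expiry saved as a Unix timestamp so it serializes cleanly. This is a sketch under assumptions: the helper names (`load_cache`, `cache_get`, `cache_put`) and the file layout are hypothetical, and the whole file is rewritten on each update, which is only reasonable for a small cache.

```python
import json
import time
from pathlib import Path
from typing import Dict, Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # same 24-hour TTL as the in-memory cache

def load_cache(path: Path) -> Dict[str, dict]:
    """Read the cache file; fall back to an empty cache if it is missing or corrupt."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def cache_get(cache: Dict[str, dict], video_id: str,
              now: Optional[float] = None) -> Optional[dict]:
    """Return cached metadata if present and not yet expired, else None."""
    now = time.time() if now is None else now
    entry = cache.get(video_id)
    if entry and now < entry["expires"]:
        return entry["data"]
    return None

def cache_put(cache: Dict[str, dict], path: Path, video_id: str, data: dict,
              now: Optional[float] = None) -> None:
    """Store metadata with an expiry timestamp and write the cache back to disk."""
    now = time.time() if now is None else now
    cache[video_id] = {"data": data, "expires": now + CACHE_TTL_SECONDS}
    path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
```

In the notebook, `fetch_youtube_metadata` could call `cache_get` before the request and `cache_put` after a successful response, with a path such as `Path("youtube_cache.json")` (a hypothetical file name).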


## Step 4: Process All Videos in Markdown

In [12]:
def process_markdown_videos(markdown: str) -> List[Dict]:
    video_ids = extract_youtube_urls_from_markdown(markdown)
    results = []
    for vid in video_ids:
        meta = fetch_youtube_metadata(vid)
        if meta:
            results.append(meta)
    return results

video_cards = process_markdown_videos(markdown_content)

print("Video Cards for Svelte Rendering:")
for card in video_cards:
    print(f"- {card['title']} [{card['duration']}] → {card['thumbnail_url']}")

[API] Fetching metadata for 9S_ETzbAMfg
Video Cards for Svelte Rendering:
- China Reveals New Moon Lander [9:42] → https://i.ytimg.com/vi/9S_ETzbAMfg/hqdefault.jpg
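
`process_markdown_videos` silently drops any video whose fetch returns `None`, so a failed lookup leaves no trace in the rendered page. If a placeholder card is preferable, it could be sketched as follows. `fallback_card` and `cards_with_fallback` are hypothetical names, the thumbnail URL pattern is inferred from the sample output above and is an assumption, and the fetch function is passed in as a parameter so the sketch stays self-contained.

```python
from typing import Callable, Dict, List, Optional

def fallback_card(video_id: str) -> Dict:
    """Build a placeholder card with the same keys as a real metadata card."""
    return {
        "video_id": video_id,
        "title": "Video unavailable",
        # Assumption: this URL pattern matches the thumbnails seen in the API output
        "thumbnail_url": f"https://i.ytimg.com/vi/{video_id}/hqdefault.jpg",
        "duration": "?:??",
        "published_at": None,
    }

def cards_with_fallback(video_ids: List[str],
                        fetch: Callable[[str], Optional[Dict]]) -> List[Dict]:
    """Fetch each video once; substitute a placeholder when the fetch returns None."""
    return [fetch(vid) or fallback_card(vid) for vid in video_ids]
```

With the notebook's own function this would be called as `cards_with_fallback(video_ids, fetch_youtube_metadata)`.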


## Step 5: Output for Svelte Component

In [13]:
# Final data structure to pass to Svelte
svelte_data = {
    "youtubeCards": video_cards
}

print(json.dumps(svelte_data, indent=2))

{
  "youtubeCards": [
    {
      "video_id": "9S_ETzbAMfg",
      "title": "China Reveals New Moon Lander",
      "thumbnail_url": "https://i.ytimg.com/vi/9S_ETzbAMfg/hqdefault.jpg",
      "duration": "9:42",
      "published_at": "2025-08-13T19:00:08Z"
    }
  ]
}


## 🧩 Next Steps

- Move logic to SvelteKit server route (`+server.ts`)
- Use `$env/static/private` to access `YOUTUBE_API_KEY` on the server
- In Svelte: `{#each youtubeCards as card}` to render previews
- Add error fallbacks and loading states
- Consider batch fetching for performance
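
On the batch-fetching point: the `videos` endpoint used above accepts a comma-separated list in its `id` parameter, so several videos can share one request. The sketch below assumes a limit of 50 IDs per request, which should be checked against the current API documentation. `chunked` and `fetch_metadata_batch` are hypothetical names, and the HTTP call is injected as a `get_json` callable (also hypothetical) so the chunking logic can be tried without network access.

```python
from typing import Callable, Dict, Iterator, List

API_URL = "https://www.googleapis.com/youtube/v3/videos"
MAX_IDS_PER_REQUEST = 50  # assumption: per-request ID limit, verify against the API docs

def chunked(ids: List[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_metadata_batch(
    video_ids: List[str],
    api_key: str,
    get_json: Callable[[str, Dict[str, str]], Dict],
) -> Dict[str, Dict]:
    """Fetch raw API items for many IDs, one request per chunk, keyed by video ID."""
    results: Dict[str, Dict] = {}
    for chunk in chunked(video_ids):
        params = {
            "key": api_key,
            "id": ",".join(chunk),
            "part": "snippet,contentDetails",
        }
        payload = get_json(API_URL, params)
        # IDs that do not resolve are simply absent from the returned items
        for item in payload.get("items", []):
            results[item["id"]] = item
    return results
```

In the notebook, `get_json` could be `lambda url, params: requests.get(url, params=params, timeout=10).json()`, and each returned item would still need the same field extraction that `fetch_youtube_metadata` performs.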