# YouTube Comments Analysis with LangSmith Insights

This example demonstrates how to use LangSmith Insights to analyze YouTube comments. We'll:
1. Fetch recent comments from YouTube videos matching a search query
2. Store the comments in a CSV file for reference
3. Use LangSmith Insights to automatically cluster comments by theme and sentiment

## Prerequisites

You will need:
1. A YouTube Data API key

<details>
<summary><b>How to get your YouTube Data API Key</b> (click to expand)</summary>

1. Go to [Google Cloud Console](https://console.cloud.google.com)
2. Create a new project or select an existing one from the dropdown
3. Enable the YouTube Data API v3:
   - Navigate to "APIs & Services" → "Library"
   - Search for "YouTube Data API v3" and click "Enable"
4. Create credentials:
   - Go to "APIs & Services" → "Credentials"
   - Click "Create Credentials" → "API Key"
   - Copy your new API key
5. (Optional but recommended) Restrict your key:
   - Click on your new key → "API restrictions"
   - Select "Restrict key" and choose "YouTube Data API v3"
   - Save changes

</details>

2. A LangSmith API key

<details>
<summary><b>How to get your LangSmith API key</b> (click to expand)</summary>

1. Go to [LangSmith Settings](https://smith.langchain.com/settings)
2. Click "Create API Key"
3. Copy your new API key

</details>


## Setup

Before running the notebook, set your API keys as environment variables in the terminal session that launches Jupyter, so the kernel inherits them:
```bash
export YOUTUBE_API_KEY=your-youtube-api-key-here
export LANGSMITH_API_KEY=your-langsmith-api-key
```
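
To confirm that both keys are visible to the notebook kernel before running the remaining cells, a quick check along these lines can help. This is a minimal sketch; `missing_env_vars` is a hypothetical helper introduced here for illustration, not part of the YouTube or LangSmith APIs.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(["YOUTUBE_API_KEY", "LANGSMITH_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("Both API keys are set")
```

If a key is reported missing, export it in the terminal and restart the kernel so the new value is picked up.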

In [None]:
import csv
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path

import requests
from langsmith import Client

In [None]:
DATA_DIR = Path("data")
DATA_DIR.mkdir(parents=True, exist_ok=True)
CSV_PATH = DATA_DIR / "youtube_comments.csv"

# TODO: Configure your YouTube search parameters below
YOUTUBE_QUERY = "LangChain"  # Your search term
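YOUTUBE_LOOKBACK_DAYS = 21  # Only include videos published in the last N days
YOUTUBE_MAX_VIDEOS = 50  # Videos to search; a single request returns at most 50
YOUTUBE_MAX_COMMENTS = 60  # Total comments to collect across all videos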
YOUTUBE_SEARCH_ORDER = "date"  # Options: date, relevance, viewCount, rating

print(
    f"Configuration: query='{YOUTUBE_QUERY}', "
    f"lookback={YOUTUBE_LOOKBACK_DAYS} days, "
    f"max_videos={YOUTUBE_MAX_VIDEOS}, "
    f"max_comments={YOUTUBE_MAX_COMMENTS}"
)

## Fetch YouTube Comments

First, we'll search for recent videos matching our query, then fetch comments from those videos.

In [None]:
YOUTUBE_API_KEY = os.getenv("YOUTUBE_API_KEY")
if not YOUTUBE_API_KEY:
    raise RuntimeError(
        "YOUTUBE_API_KEY environment variable is required. "
        "Get an API key at https://console.cloud.google.com/apis/credentials"
    )

YOUTUBE_SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"
YOUTUBE_COMMENTS_URL = "https://www.googleapis.com/youtube/v3/commentThreads"

# Search for videos
published_after = (
    datetime.now(timezone.utc) - timedelta(days=YOUTUBE_LOOKBACK_DAYS)
).isoformat()

video_params = {
    "q": YOUTUBE_QUERY,
    "part": "snippet",
    "type": "video",
    "order": YOUTUBE_SEARCH_ORDER,
    "publishedAfter": published_after,
    "maxResults": min(50, YOUTUBE_MAX_VIDEOS),
    "key": YOUTUBE_API_KEY,
}

video_resp = requests.get(YOUTUBE_SEARCH_URL, params=video_params, timeout=30)
video_resp.raise_for_status()
videos = [
    item["id"]["videoId"]
    for item in video_resp.json().get("items", [])
][:YOUTUBE_MAX_VIDEOS]

print(f"Found {len(videos)} videos")

In [None]:
# Fetch comments from videos
comment_rows = []

for video_id in videos:
    remaining = YOUTUBE_MAX_COMMENTS - len(comment_rows)
    if remaining <= 0:
        break

    comment_params = {
        "part": "snippet",
        "videoId": video_id,
        "order": "relevance",
        "textFormat": "plainText",
        "maxResults": min(100, remaining),
        "key": YOUTUBE_API_KEY,
    }

    try:
        comment_resp = requests.get(
            YOUTUBE_COMMENTS_URL, params=comment_params, timeout=30
        )
        comment_resp.raise_for_status()
        items = comment_resp.json().get("items", [])

        for item in items[:remaining]:
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comment_text = snippet.get("textDisplay", "").strip()
            if comment_text:
                comment_rows.append({"text": comment_text})
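    except requests.RequestException as e:
        # Covers HTTP errors (e.g. 403 when comments are disabled) as well as
        # timeouts and connection failures, so one bad video doesn't stop the run.
        print(f"Skipping video {video_id}: {e}")
        continue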

print(f"Collected {len(comment_rows)} comments (searched up to {len(videos)} videos)")
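
The requests above each read a single page of results, so the search is capped at 50 videos and each video at 100 comment threads. YouTube Data API list responses include a `nextPageToken` field when more results exist, and passing it back as the `pageToken` parameter returns the next page. The sketch below shows one way to follow those tokens. `collect_paginated` and `fetch_page` are hypothetical names, and the canned pages stand in for real API responses.

```python
def collect_paginated(fetch_page, limit):
    """Collect up to `limit` items by following `nextPageToken` across pages.

    `fetch_page(page_token)` is a hypothetical callable that performs one
    request and returns the decoded JSON response as a dict.
    """
    items = []
    page_token = None
    while len(items) < limit:
        payload = fetch_page(page_token)
        items.extend(payload.get("items", []))
        page_token = payload.get("nextPageToken")
        if not page_token:
            break  # the API reported no further pages
    return items[:limit]


# Demo with canned responses standing in for real API pages
fake_pages = {
    None: {"items": [1, 2], "nextPageToken": "p2"},
    "p2": {"items": [3, 4], "nextPageToken": "p3"},
    "p3": {"items": [5]},
}
print(collect_paginated(fake_pages.get, limit=4))  # [1, 2, 3, 4]
```

In the notebook, `fetch_page` could wrap the same `requests.get` call used above, adding `pageToken` to the parameters when a token is present. Each additional page is another API request and counts against your quota.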

## Save Comments to CSV

Save the comments locally for reference.

In [None]:
if comment_rows:
    with CSV_PATH.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["text"])
        writer.writeheader()
        writer.writerows(comment_rows)
    print(f"Saved {len(comment_rows)} comments to {CSV_PATH}")
else:
    print("No comments collected")
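
To reuse the saved comments in a later session without calling the YouTube API again, the CSV can be read back into a list of strings. This is a minimal sketch; `load_comments` is a hypothetical helper, and it assumes the single `text` column written above.

```python
import csv
from pathlib import Path


def load_comments(csv_path):
    """Read the `text` column back into a list of strings."""
    # newline="" lets the csv module handle line breaks inside quoted comments
    with Path(csv_path).open(newline="", encoding="utf-8") as fh:
        return [row["text"] for row in csv.DictReader(fh)]


csv_path = Path("data") / "youtube_comments.csv"
if csv_path.exists():
    comments = load_comments(csv_path)
    print(f"Loaded {len(comments)} comments from {csv_path}")
```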

## Analyze with LangSmith Insights

Now we'll send the comments to LangSmith Insights for automatic clustering and analysis. The Insights API will:
- Identify common themes across comments
- Group similar comments together
- Generate summaries for each cluster
- Provide a visual interface to explore the results

In [None]:
# Format comments for the Insights API
# chat_histories expects a list of text data (comments, conversations, support tickets, user feedback, etc.)
chat_histories = [row["text"] for row in comment_rows]

print(f"Prepared {len(chat_histories)} comments for analysis")
if chat_histories:
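    preview = chat_histories[0]
    # Only show the ellipsis when the comment was actually truncated
    print(f"\nExample comment:\n{preview[:500]}{'...' if len(preview) > 500 else ''}")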
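
If the collected comments contain exact duplicates or very short entries, a light cleaning pass before analysis is one option. The sketch below is an assumption about what might be useful rather than something Insights requires. `clean_comments` and its `min_chars` threshold are hypothetical.

```python
def clean_comments(comments, min_chars=3):
    """Drop case-insensitive duplicates and very short comments, keeping order."""
    seen = set()
    cleaned = []
    for text in comments:
        # Collapse runs of whitespace (including newlines) into single spaces
        normalized = " ".join(text.split())
        key = normalized.casefold()
        if len(normalized) < min_chars or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


print(clean_comments(["Great video!", "great   video!", "ok", "How do I stream?"]))
# ['Great video!', 'How do I stream?']
```

Note that this collapses line breaks inside comments, which may or may not be desirable for your analysis.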

In [None]:
client = Client()

report = client.generate_insights(
    chat_histories=chat_histories,
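    # Build the instructions from YOUTUBE_QUERY so they stay accurate
    # if the search term is changed in the configuration cell.
    instructions=(
        f"These are comments from YouTube videos about {YOUTUBE_QUERY}. "
        "I want to understand user sentiment, common questions, "
        f"feature requests, and pain points related to {YOUTUBE_QUERY}."
    ),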
    name=f"youtube-{YOUTUBE_QUERY.lower()}-{datetime.now(timezone.utc):%Y%m%d-%H%M}",
)