## Part 1: Data Collection
### Purpose
This notebook collects **trending YouTube videos in Kenya** using the **YouTube Data API v3**.  
For each video it retrieves the title, description, publish date, channel, category, tags, duration, and the view, like, and comment counts.

### Steps in This Notebook
1. **Load the API key** from a `.env` file so it is not hard-coded in the notebook.
2. **Fetch trending videos** using the `fetch_trending_videos()` function.
3. **Paginate** through the results, since each request returns at most 50 videos.
4. **Save raw data** to `data/raw/trending_videos_ke.csv`.

### Functions Overview
- **`get_category_mapping(api_key, region_code)`**  
  Builds a lookup from category ID to category name for the region.

- **`fetch_trending_videos(api_key, region_code, max_results)`**  
  Fetches trending videos and their details from the YouTube API.

- **`save_videos_to_csv(videos, filename)`**  
  Saves the collected data to a CSV file.

- **`run_data_collection()`**  
  Orchestrates the data collection process.

### Output
- **CSV file:** `data/raw/trending_videos_ke.csv`  
  Contains the raw video data, which will be used in the next stage: **Data Cleaning**.

In [1]:
# Imports
import os
import pandas as pd
from googleapiclient.discovery import build
from dotenv import load_dotenv

In [2]:
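# Load the API key from the .env file
load_dotenv()
API_KEY = os.getenv("YOUTUBE_API_KEY")
# Fail early with a clear message instead of an opaque API error later
if not API_KEY:
    raise RuntimeError("YOUTUBE_API_KEY is not set; add it to your .env file")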

In [3]:
def get_category_mapping(api_key, region_code='KE'):
    """Return a dict mapping category ID (string) to category title for region_code."""
    youtube = build('youtube', 'v3', developerKey=api_key)
    request = youtube.videoCategories().list(
        part='snippet',
        regionCode=region_code
    )
    response = request.execute()

    # Category IDs are strings, matching snippet.categoryId on each video
    return {item['id']: item['snippet']['title'] for item in response['items']}
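
To see what the mapping looks like without calling the API, the same step can be run on a hand-written response. This is a minimal sketch: the sample payload is an illustrative assumption shaped like a `videoCategories.list` response, and `build_category_mapping` is a hypothetical helper that is not part of this notebook.

```python
def build_category_mapping(response):
    """Map category ID (a string) to its title, as get_category_mapping does."""
    return {item['id']: item['snippet']['title'] for item in response['items']}

# Hand-written sample for illustration, not real API output
sample_response = {
    'items': [
        {'id': '10', 'snippet': {'title': 'Music'}},
        {'id': '24', 'snippet': {'title': 'Entertainment'}},
    ]
}

mapping = build_category_mapping(sample_response)
print(mapping.get('10', 'Unknown'))  # Music
print(mapping.get('99', 'Unknown'))  # Unknown
print(mapping.get(10))               # None, because the keys are strings
```

The last line shows why `fetch_trending_videos()` looks up the category with the string ID taken directly from the video's snippet: an integer key would never match.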


In [8]:
def fetch_trending_videos(api_key, region_code='KE', max_results=500):
    """Return up to max_results trending videos for region_code as a list of dicts."""
    category_mapping = get_category_mapping(api_key, region_code)
    youtube = build('youtube', 'v3', developerKey=api_key)
    videos = []

    # The API allows at most 50 items per page, so always request full pages
    request = youtube.videos().list(
        part='snippet,contentDetails,statistics',
        chart='mostPopular',
        regionCode=region_code,
        maxResults=50
    )

    # list_next() returns None once there are no more pages
    while request and len(videos) < max_results:
        response = request.execute()

        for item in response['items']:
            category_id = item['snippet']['categoryId']
            category_name = category_mapping.get(category_id, "Unknown")

            # Optional fields use .get() because the API omits them for some
            # videos (for example when likes are hidden or comments are disabled)
            video_details = {
                'video_id': item['id'],
                'title': item['snippet']['title'],
                'description': item['snippet']['description'],
                'published_at': item['snippet']['publishedAt'],
                'channel_id': item['snippet']['channelId'],
                'channel_title': item['snippet']['channelTitle'],
                'category_id': category_id,
                'category_name': category_name,
                'tags': item['snippet'].get('tags', []),
                'duration': item['contentDetails']['duration'],
                'definition': item['contentDetails']['definition'],
                'caption': item['contentDetails'].get('caption', 'false'),
                'view_count': item['statistics'].get('viewCount', 0),
                'like_count': item['statistics'].get('likeCount', 0),
                'favorite_count': item['statistics'].get('favoriteCount', 0),
                'comment_count': item['statistics'].get('commentCount', 0)
            }
            videos.append(video_details)

        request = youtube.videos().list_next(request, response)

    # The last page may overshoot max_results, so trim the list
    return videos[:max_results]
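
The stopping rule of the pagination loop can be checked offline. The sketch below mirrors it (stop when the pages run out or when `max_results` is reached, then trim) against an in-memory list of pages. `collect_pages` and the fake page data are hypothetical stand-ins for `request.execute()` and `list_next()`; they are not part of the googleapiclient API.

```python
def collect_pages(pages, max_results):
    """Collect items page by page until pages run out or max_results is reached."""
    items = []
    page_index = 0  # stands in for the request object in the real loop
    while page_index < len(pages) and len(items) < max_results:
        items.extend(pages[page_index])
        page_index += 1
    # A full page may overshoot the limit, so trim at the end
    return items[:max_results]

# Three fake pages of 50, 50, and 20 items
fake_pages = [list(range(0, 50)), list(range(50, 100)), list(range(100, 120))]

print(len(collect_pages(fake_pages, max_results=500)))  # 120
print(len(collect_pages(fake_pages, max_results=75)))   # 75
```

With `max_results=500` the loop ends because the pages are exhausted, so the function returns fewer items than requested. The same can happen with the real API when fewer trending videos are available than `max_results`.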

In [9]:
# Save results to CSV
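def save_videos_to_csv(videos, filename):
    df = pd.DataFrame(videos)
    # Create the parent directory only if the path has one;
    # os.makedirs('') raises an error for a bare filename
    directory = os.path.dirname(filename)
    if directory:
        os.makedirs(directory, exist_ok=True)
    df.to_csv(filename, index=False)
    print(f"Youtube Data saved to {filename}")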
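
Because `tags` is a list, it is written to the CSV as text and comes back as a string when the file is read. One possible way to make the round trip explicit, shown here as a sketch and not something this notebook does, is to encode the list as JSON before saving and decode it after loading. `encode_tags` and `decode_tags` are hypothetical helper names.

```python
import json

def encode_tags(tags):
    """Serialize a list of tags to a JSON string that fits in one CSV cell."""
    return json.dumps(tags, ensure_ascii=False)

def decode_tags(cell):
    """Restore the list of tags from a JSON string read back from the CSV."""
    return json.loads(cell)

tags = ['kenya', 'music video', 'nairobi']
cell = encode_tags(tags)
print(cell)               # ["kenya", "music video", "nairobi"]
print(decode_tags(cell))  # ['kenya', 'music video', 'nairobi']
```

If this approach were adopted, the encoding would be applied to the `tags` field of each video before building the DataFrame, and the decoding in the cleaning stage after reading the file.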


In [11]:
# Main function
def run_data_collection():
    trending_videos = fetch_trending_videos(API_KEY)
    save_videos_to_csv(trending_videos, "data/raw/trending_videos_ke.csv")

In [12]:
# Run the main function
run_data_collection()

Youtube Data saved to data/raw/trending_videos_ke.csv
