# YouTube Comment Scraping for Smartphone Comparison

## Introduction
As part of this project on customer sentiment analysis of modern smartphones—**iPhone 15, Samsung S24, and Google Pixel 7**—we begin by collecting user-generated feedback from YouTube. YouTube is a widely used platform where users actively share their opinions in the form of video comments. These comments serve as valuable data points for understanding public sentiment toward each device.

To automate this process, we wrote a Python function using the `youtube-comment-downloader`, `langdetect`, and `pandas` libraries. The function takes a YouTube video URL and collects up to 1000 comments that `langdetect` identifies as English; comments in other languages, and comments whose language cannot be detected, are skipped. The comments collected for each video are saved to a CSV file with a single `Comment` column for later sentiment analysis and comparison across the three smartphone models.

This approach produces a structured, English-only dataset that serves as the basis for opinion mining and sentiment visualization. Because `langdetect` is probabilistic and less reliable on very short comments, some comments may be misclassified.



In [1]:
from youtube_comment_downloader import YoutubeCommentDownloader  # Import the comment downloader
from langdetect import detect  # For detecting the language of the comment
from langdetect.lang_detect_exception import LangDetectException  # Raised when no language can be detected
import pandas as pd  # For working with DataFrames and saving to CSV

def scrape_english_youtube_comments(video_url, output_filename, max_comments=1000):
    """
    Scrape English comments from a YouTube video and save them to a CSV file.

    Parameters:
    - video_url: str : URL of the YouTube video
    - output_filename: str : Filename to save the comments CSV
    - max_comments: int : Max number of English comments to save (default is 1000)
    """
    downloader = YoutubeCommentDownloader()  # Create an instance of the comment downloader
    comments = downloader.get_comments_from_url(video_url)  # Generator: comments are fetched as we iterate

    filtered_comments = []  # List to store only English comments

    for comment in comments:
        text = comment.get('text', '')  # Get the text of the comment (default to empty string if not found)
        try:
            lang = detect(text)  # Detect the language of the comment
        except LangDetectException:
            continue  # Skip comments with no detectable language (e.g., empty, emoji or symbols only)

        if lang == 'en':  # Keep only comments detected as English
            filtered_comments.append(text)

        if len(filtered_comments) >= max_comments:  # Stop once the desired count is reached
            break

    # Save filtered comments to a CSV file
    df = pd.DataFrame(filtered_comments, columns=["Comment"])  # Create DataFrame with a single column
    df.to_csv(output_filename, index=False)  # Save DataFrame to CSV without the index column
    print(f"✅ Saved {len(filtered_comments)} English comments to '{output_filename}'.")  # Print completion message
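
The filter-and-cap loop above can also be written as a small pure function, which makes it testable without network access. The following is a minimal sketch, not the notebook's implementation: `english_only`, `take_english`, and the `detect_fn` parameter are hypothetical names introduced here, and `toy_detect` is a stand-in used only so the example runs with the standard library. Detection failures are assumed to surface as exceptions.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def english_only(comments: Iterable[dict], detect_fn: Callable[[str], str]) -> Iterator[str]:
    """Lazily yield the text of comments that detect_fn labels as 'en'."""
    for comment in comments:
        text = comment.get("text", "")
        try:
            lang = detect_fn(text)
        except Exception:
            # Assumption: the detector raises on undetectable input
            # (langdetect raises LangDetectException in that case).
            continue
        if lang == "en":
            yield text


def take_english(comments: Iterable[dict], detect_fn: Callable[[str], str],
                 max_comments: int = 1000) -> List[str]:
    """Collect at most max_comments English comments, stopping early."""
    return list(islice(english_only(comments, detect_fn), max_comments))


def toy_detect(text: str) -> str:
    """Stand-in detector for demonstration only; not a real language model."""
    if not text.strip():
        raise ValueError("no features in text")
    return "en" if text.isascii() else "other"


sample = [
    {"text": "Great camera"},
    {"text": ""},                       # undetectable: skipped
    {"text": "muy bueno ñ"},            # labeled 'other' by the toy detector
    {"text": "Battery life is solid"},
    {"text": "Too expensive"},          # never reached when max_comments=2
]
result = take_english(sample, toy_detect, max_comments=2)
print(result)
```

In the notebook, the same function could presumably be called as `take_english(downloader.get_comments_from_url(video_url), detect)`. The `langdetect` README notes that detection is non-deterministic on short or ambiguous text and suggests setting `DetectorFactory.seed = 0` before the first call to get consistent results across runs.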

In [3]:
# iPhone 15 video comments
scrape_english_youtube_comments(
    'https://www.youtube.com/watch?v=0X0Jm8QValY&t=6s',  # iPhone 15 video URL
    'youtube_english_comments_iphone15.csv'  # Output filename
)

✅ Saved 1000 English comments to 'youtube_english_comments_iphone15.csv'.
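
The URL passed above carries a `t=6s` timestamp parameter, which only sets the playback position. The run succeeded with it present, but keeping the URLs clean makes the inputs easier to compare and log. As a hedged sketch using only the standard library, the video ID can be extracted and a plain watch URL rebuilt. `extract_video_id` and `canonical_watch_url` are hypothetical helpers, and they assume the `watch?v=` URL form.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Return the value of the 'v' query parameter from a watch URL."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    if not values:
        raise ValueError(f"No 'v' parameter found in URL: {url}")
    return values[0]


def canonical_watch_url(url: str) -> str:
    """Rebuild a watch URL that keeps only the video ID."""
    return f"https://www.youtube.com/watch?v={extract_video_id(url)}"


print(canonical_watch_url("https://www.youtube.com/watch?v=0X0Jm8QValY&t=6s"))
# prints: https://www.youtube.com/watch?v=0X0Jm8QValY
```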


In [5]:
# Samsung S24 video comments
scrape_english_youtube_comments(
    'https://www.youtube.com/watch?v=uqHM3utsvfE',  # S24 video URL
    'youtube_english_comments_s24.csv'  # Output filename
)

✅ Saved 429 English comments to 'youtube_english_comments_s24.csv'.


In [7]:
# Google Pixel 7 video comments
scrape_english_youtube_comments(
    'https://www.youtube.com/watch?v=TtoYxgRcWR8&t=1s',  # Pixel 7 video URL
    'youtube_english_comments_pixel7.csv'  # Output filename
)

✅ Saved 376 English comments to 'youtube_english_comments_pixel7.csv'.
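
The three cells above differ only in the URL and the device suffix of the filename, so they could be driven from a single mapping. The sketch below is one possible arrangement, not part of the original notebook: `VIDEOS`, `build_jobs`, and `run_all` are hypothetical names, and the scraper is passed in as a parameter so the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple

# URLs copied from the cells above; the dictionary keys are illustrative labels.
VIDEOS: Dict[str, str] = {
    "iphone15": "https://www.youtube.com/watch?v=0X0Jm8QValY&t=6s",
    "s24": "https://www.youtube.com/watch?v=uqHM3utsvfE",
    "pixel7": "https://www.youtube.com/watch?v=TtoYxgRcWR8&t=1s",
}


def build_jobs(videos: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each video URL with the output filename pattern used above."""
    return [
        (url, f"youtube_english_comments_{device}.csv")
        for device, url in videos.items()
    ]


def run_all(videos: Dict[str, str], scrape: Callable[[str, str], None]) -> None:
    """Call scrape(url, filename) once per device, in insertion order."""
    for url, filename in build_jobs(videos):
        scrape(url, filename)


# Dry run with a stand-in that only records what would be scraped.
calls: List[Tuple[str, str]] = []
run_all(VIDEOS, lambda url, filename: calls.append((url, filename)))
for url, filename in calls:
    print(f"{filename} <- {url}")
```

In the notebook, `run_all(VIDEOS, scrape_english_youtube_comments)` would then perform the same three scrapes in one cell.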
