This repository contains the dataset and Python scripts used to collect, clean, analyze, and visualize YouTube comments from the video:
"2025’s Most Important Career Podcast – Make Money Using AI | The Ranveer Show"
Video Link: https://youtu.be/YG1sW00jwLY
The dataset was extracted using the YouTube Data API and then cleaned using Python. The repository contains the dataset both before and after preprocessing:
- `youtube_comments.csv` (raw comments as fetched)
- `youtube_comments_cleaned.csv` (after text cleaning)
This project uses five Python files, each handling a different stage of the NLP workflow.
- Uses the YouTube Data API (`googleapiclient`)
- Fetches all top-level comments from the video
- Stores them in `youtube_comments.csv`
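The fetching step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names, the `comment` column header, and the `YOUTUBE_API_KEY` environment variable are assumptions. It relies on the `commentThreads().list` endpoint of the YouTube Data API v3 and follows `nextPageToken` until every page has been read.

```python
import csv
import os


def fetch_top_level_comments(youtube, video_id):
    """Return the text of every top-level comment, following pagination."""
    comments = []
    page_token = None
    while True:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,  # largest page size the API allows
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        for item in response.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(snippet["textDisplay"])
        page_token = response.get("nextPageToken")
        if not page_token:
            return comments


def save_comments(comments, path="youtube_comments.csv"):
    """Write one comment per row; the 'comment' header is an assumed name."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["comment"])
        writer.writerows([c] for c in comments)


def main():
    # Imported here so the helpers above can be used without the client library.
    from googleapiclient.discovery import build

    # YOUTUBE_API_KEY is a hypothetical environment variable name.
    youtube = build("youtube", "v3", developerKey=os.environ["YOUTUBE_API_KEY"])
    save_comments(fetch_top_level_comments(youtube, "YG1sW00jwLY"))
```

Calling `main()` with a valid API key would write the raw comments to `youtube_comments.csv`. Passing the client in as an argument keeps the pagination logic easy to test with a stand-in object.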
- Loads the CSV file
- Renames columns properly for consistent processing
- Converts text to lowercase
- Removes punctuation
- Tokenizes words
- Removes English stopwords
- Creates a new column, `cleaned`, in the DataFrame
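The loading and cleaning steps listed above might look like the sketch below. Several details are assumptions made for illustration: the small inline `STOPWORDS` set stands in for whatever stopword list the project uses (a fuller list such as NLTK's is likely), the first CSV column is assumed to hold the comment text and is renamed to `comment`, whitespace splitting stands in for the tokenizer, and the `cleaned` column is stored as a space-joined string.

```python
import string

# Tiny illustrative stopword set (assumption, not the project's actual list).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "to", "of", "in",
             "this", "it", "for", "on"}

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    text = str(text).lower()
    text = text.translate(_PUNCT_TABLE)
    tokens = text.split()
    return " ".join(t for t in tokens if t not in stopwords)


def load_and_clean(path="youtube_comments.csv"):
    """Load the CSV, normalize the column name, and add a 'cleaned' column."""
    import pandas as pd  # imported lazily so clean_text works without pandas

    df = pd.read_csv(path)
    # Assumes the first column holds the comment text.
    df = df.rename(columns={df.columns[0]: "comment"})
    df["cleaned"] = df["comment"].fillna("").apply(clean_text)
    return df
```

For example, `clean_text("This is AMAZING, and the AI tips are great!!")` returns `"amazing ai tips great"`. Note that `string.punctuation` covers ASCII punctuation only, so typographic quotes or emoji would pass through unchanged.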
Contains:
- Word Frequency Calculation
- Bar Plot of Top 20 Words
- Sentiment Analysis using TextBlob
- Sentiment Distribution Graph
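A rough sketch of these analysis steps is shown here. The function names and the polarity thresholds (above zero is positive, below zero is negative, exactly zero is neutral) are assumptions rather than the project's confirmed choices. It uses `collections.Counter` for word counts, `TextBlob(text).sentiment.polarity` for the sentiment score, and a matplotlib bar chart for the plot.

```python
from collections import Counter


def word_frequencies(cleaned_texts, top_n=20):
    """Count words across all cleaned comments and return the top_n pairs."""
    counts = Counter()
    for text in cleaned_texts:
        counts.update(text.split())
    return counts.most_common(top_n)


def label_sentiment(polarity):
    """Map a polarity score in [-1, 1] to a label (thresholds are assumed)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def polarity_of(text):
    """Polarity score from TextBlob's default analyzer."""
    from textblob import TextBlob  # imported lazily; requires textblob

    return TextBlob(text).sentiment.polarity


def sentiment_distribution(polarities):
    """Number of comments per sentiment label."""
    return dict(Counter(label_sentiment(p) for p in polarities))


def plot_counts(pairs, title):
    """Bar plot for (label, count) pairs, e.g. the top 20 words."""
    import matplotlib.pyplot as plt  # imported lazily; requires matplotlib

    if not pairs:
        return
    labels, counts = zip(*pairs)
    plt.bar(labels, counts)
    plt.xticks(rotation=45, ha="right")
    plt.title(title)
    plt.tight_layout()
    plt.show()
```

With a DataFrame `df` that has a `cleaned` column, `plot_counts(word_frequencies(df["cleaned"]), "Top 20 Words")` would draw the frequency chart, and `sentiment_distribution(df["cleaned"].apply(polarity_of))` would give the counts for the sentiment graph.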
Runs the entire workflow:
- Load dataset
- Clean text
- Show word frequencies
- Plot frequency graph
- Perform sentiment analysis
- Plot sentiment graph