This repository contains all the code I wrote to support a case study published on my blog:

The study aims to evaluate media bias through sentiment analysis of the video titles that some prominent American TV channels publish on their YouTube accounts.


A bit of setting up is required before you can run this code.

Google API key

First, you need to get an API key from Google by following the steps described here:

This key will be used for two services:

  • Google Cloud Natural Language API
  • YouTube Data API v3

Once you've acquired a key, save it into a file named google-api-key.txt at the root of this repository.
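As an illustration of how the saved key can be read back, here is a minimal sketch. The helper name load_api_key is hypothetical and is not part of this repository; only the file name google-api-key.txt comes from the instructions above.

```python
from pathlib import Path


def load_api_key(path="google-api-key.txt"):
    """Return the API key stored in `path`, stripped of surrounding whitespace.

    Hypothetical helper for illustration; the repository may read the key differently.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your Google API key into it")
    return key
```

Stripping whitespace guards against a trailing newline, which most editors add when saving the file.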

Python environment

The following Python packages need to be installed in your Python environment:


Acquiring data

Four types of datasets must be generated: channels, topics, videos and sentiment scores.


Channels

Create a channels.csv file using the structure detailed in this example:

import pandas

# One record per channel; 'url' is left empty in this example.
channels = pandas.DataFrame.from_records([
    {'title': 'Fox News', 'slug': 'fox-news', 'youtube_id': 'UCXIJgqnII2ZOINSWNOGFThA', 'playlist_id': 'UUXIJgqnII2ZOINSWNOGFThA', 'url': '', 'color': '#5975a4'},
    {'title': 'CNN', 'slug': 'cnn', 'youtube_id': 'UCupvZG-5ko_eiXAupbDfxWw', 'playlist_id': 'UUupvZG-5ko_eiXAupbDfxWw', 'url': '', 'color': '#b55d60'},
    {'title': 'MSNBC', 'slug': 'msnbc', 'youtube_id': 'UCaXkIU1QidjPwiAYu6GcHjg', 'playlist_id': 'UUaXkIU1QidjPwiAYu6GcHjg', 'url': '', 'color': '#5f9e6e'},
    {'title': 'CBS News', 'slug': 'cbs-news', 'youtube_id': 'UC8p1vwvWtl6T73JiExfWs1g', 'playlist_id': 'UU8p1vwvWtl6T73JiExfWs1g', 'url': '', 'color': '#666666'},
])

channels.to_csv('channels.csv', index=False, encoding='utf-8')

The youtube_id is the channel's unique YouTube ID. Finding a channel's ID takes a few steps:

  • Go to the channel's page (e.g.
  • View the HTML source of the page.
  • Look for "data-channel-external-id" in the HTML source. Its value is the channel's YouTube ID.
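The last step above can be sketched in code. This assumes the page source has already been saved to a string, and that the attribute still appears in YouTube's markup, which may have changed since this was written. The function name is hypothetical.

```python
import re

# Attribute name taken from the steps above; YouTube's markup may differ today.
CHANNEL_ID_PATTERN = re.compile(r'data-channel-external-id="([^"]+)"')


def extract_channel_id(html):
    """Return the first channel ID found in `html`, or None if the attribute is absent."""
    match = CHANNEL_ID_PATTERN.search(html)
    return match.group(1) if match else None
```

For example, calling extract_channel_id on a snippet such as '<button data-channel-external-id="UCupvZG-5ko_eiXAupbDfxWw">' returns the CNN ID used in the channels example.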

The playlist_id corresponds to a channel's default playlist where all its videos are published. To retrieve a channel's playlist_id:
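One pattern is visible in the channels example above: each playlist_id is the youtube_id with its leading "UC" replaced by "UU". The sketch below assumes that pattern holds for your channel; the helper name is hypothetical and not part of this repository. If in doubt, the YouTube Data API's channels.list method with part=contentDetails reports the uploads playlist under contentDetails.relatedPlaylists.uploads.

```python
def uploads_playlist_id(channel_id):
    """Derive the uploads playlist ID from a channel ID.

    Assumes the 'UC...' -> 'UU...' pattern seen in the channels example.
    """
    if not channel_id.startswith("UC"):
        raise ValueError(f"unexpected channel ID format: {channel_id!r}")
    return "UU" + channel_id[2:]
```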


Topics

Create a topics.csv file using the structure detailed in this example:

import pandas

# One record per topic; variant1 and variant2 are the terms searched for in video titles.
topics = pandas.DataFrame.from_records([
    {'title': 'Obama', 'slug': 'obama', 'variant1': 'Obama', 'variant2': 'Obamas'},
    {'title': 'Clinton', 'slug': 'clinton', 'variant1': 'Clinton', 'variant2': 'Clintons'},
    {'title': 'Trump', 'slug': 'trump', 'variant1': 'Trump', 'variant2': 'Trumps'},
    {'title': 'Democrats', 'slug': 'democrats', 'variant1': 'Democrat', 'variant2': 'Democrats'},
    {'title': 'Republicans', 'slug': 'republicans', 'variant1': 'Republican', 'variant2': 'Republicans'},
    {'title': 'Liberals', 'slug': 'liberals', 'variant1': 'Liberal', 'variant2': 'Liberals'},
    {'title': 'Conservatives', 'slug': 'conservatives', 'variant1': 'Conservative', 'variant2': 'Conservatives'},
])

topics.to_csv('topics.csv', index=False, encoding='utf-8')

The variants are the different terms that will be searched for in the video titles in order to match videos with your topics of choice.
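To make the matching idea concrete, here is a sketch of one way variants could be matched against a title: whole-word, case-insensitive search. This is an assumption for illustration and not necessarily how create_topic_columns in this repository works. The function name and the sample titles in the usage note are made up.

```python
import re


def title_mentions(title, variants):
    """Return True if any variant occurs in `title` as a whole word (case-insensitive)."""
    # re.escape protects variants containing regex metacharacters.
    pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None
```

With the variants 'Trump' and 'Trumps', a title like "Trump's first 100 days" matches, while "A trumpet recital" does not, because the word boundary stops partial-word hits. Case-insensitive matching would also catch phrases like "trump card"; drop the flag if that matters for your topics.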


Videos

Run the following snippets, in sequence, to download all the video metadata from YouTube for your channels of choice.

First, download all video information into a separate CSV file for each channel (e.g. videos-cnn.csv):

from code.youtube_api import download_channels_videos
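For orientation, listing a channel's videos through the YouTube Data API v3 generally means paging through its uploads playlist with playlistItems.list. The sketch below shows that pagination loop. It is not the repository's download_channels_videos; the function names are hypothetical, and the fetch parameter exists only so the loop can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/youtube/v3/playlistItems"


def fetch_page(playlist_id, api_key, page_token=None):
    """Fetch one page (up to 50 items) of a playlist from the YouTube Data API."""
    params = {"part": "snippet", "playlistId": playlist_id,
              "maxResults": 50, "key": api_key}
    if page_token:
        params["pageToken"] = page_token
    with urlopen(f"{API_URL}?{urlencode(params)}") as response:
        return json.load(response)


def iter_playlist_videos(playlist_id, api_key, fetch=fetch_page):
    """Yield (video_id, title, published_at) for every item in the playlist."""
    page_token = None
    while True:
        page = fetch(playlist_id, api_key, page_token)
        for item in page.get("items", []):
            snippet = item["snippet"]
            yield (snippet["resourceId"]["videoId"],
                   snippet["title"],
                   snippet["publishedAt"])
        # The API omits nextPageToken on the last page.
        page_token = page.get("nextPageToken")
        if not page_token:
            break
```

Each request counts against the API's daily quota, so large channels may take more than one day's allowance.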


Second, this will merge all the CSV files generated above into a single videos-MERGED.csv file.

from code.youtube_api import merge_channel_videos
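Conceptually, the merge step concatenates the per-channel files. A rough equivalent in plain pandas is sketched below, assuming the per-channel files share the same columns and follow the videos-<slug>.csv naming shown above. The function name is hypothetical and this is not the repository's merge_channel_videos.

```python
import glob

import pandas as pd


def merge_csv_files(pattern="videos-*.csv", output="videos-MERGED.csv"):
    """Concatenate every CSV matching `pattern` and write the result to `output`."""
    # Skip the output file itself, since it also matches the default pattern.
    paths = sorted(p for p in glob.glob(pattern) if p != output)
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    merged = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
    merged.to_csv(output, index=False, encoding="utf-8")
    return merged
```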

Lastly, this will create extra columns for each topic:

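import pandas as pd
from code.utils import create_topic_columns

videos = pd.read_csv('videos-MERGED.csv')
topics = pd.read_csv('topics.csv')  # the topics file created earlier
create_topic_columns(videos, topics)  # adds one set of columns per topic to videos
videos.to_csv('videos.csv', index=False, encoding='utf-8')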

You now have a videos.csv file containing all the video metadata for all channels.

Sentiment scores

The last step is to download sentiment scores from the Google Cloud Natural Language API. Note that this API is not free: check its pricing page first so you can budget for the number of titles you plan to score.

Run the following:

from code.language_api import download_sentiments


You now have a sentiments.csv file containing the sentiment scores for all relevant videos.
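For reference, scoring one title with the Natural Language API's REST endpoint documents:analyzeSentiment could look like the sketch below. It is not the repository's download_sentiments; the function names are hypothetical, and the request is split from the parsing so each part can be checked without calling the paid API.

```python
import json
from urllib.request import Request, urlopen

ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text, api_key):
    """Build the POST request that asks the API to score `text`."""
    body = {"document": {"type": "PLAIN_TEXT", "content": text},
            "encodingType": "UTF8"}
    return Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sentiment(response_json):
    """Return (score, magnitude) from an analyzeSentiment response."""
    sentiment = response_json["documentSentiment"]
    return sentiment["score"], sentiment["magnitude"]


def analyze_title(title, api_key):
    """Send one title to the API (billable call) and return (score, magnitude)."""
    with urlopen(build_sentiment_request(title, api_key)) as response:
        return parse_sentiment(json.load(response))
```

The score ranges from -1.0 (negative) to 1.0 (positive), and the magnitude is a non-negative measure of overall emotional strength.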

Exploring and analysing the data

Check out my blog post for some inspiration on how to explore and analyze the data:


Identifying bias in the media with sentiment analysis: a case study.



