# Emotional Consistency among Political Ideologies: An Approach to Address Polarization on YouTube

Group 5:
- Chance Landis (ChancL), Hanna Lee (Lee10), Jason Sun (YongXs), Andy Wong (WongA22)

## Credit Listing
- Hanna: Democratic-leaning Channels Data Collection
- Chance: Republican-leaning Channels Data Collection
- Andy: Exploratory Data Analysis
- Jason: Text Documentation

## Problem Statement
We want to determine whether current political polarization is associated with the emotional values expressed by each party. In the politically charged environment of our country, appearing to sympathize with a value that is not associated with one's own political party can provoke backlash. Fear of that backlash can create a “false” polarized environment, one that is an extension of the fear itself. The question that arises is whether these boundaries are reinforced by the people themselves, by external factors such as social media, or by both.

## Research Question
1. Do political parties exhibit similar emotional responses to differing ideologies?

## Data Collection
To investigate this topic, we will analyze content from the five most-subscribed YouTube channels associated with each of the Democratic and Republican viewpoints. The channels selected for the study are:

- **Democratic-leaning Channels**: Vice, Vox, MSNBC, The Daily Show, The Young Turks
- **Republican-leaning Channels**: Fox News, Ben Shapiro, Steven Crowder, The Daily Mail, The Daily Wire

We identified eight key ideologies for this analysis to understand whether there are emotional differences in how the political parties discuss these topics. For each ideology, we established a set of keywords to guide data scraping:

- **ISIS**: Terrorism, Extremism, Radical
- **Guns**: Shootings, School shooting, Firearms, Gun control, NRA, Second Amendment
- **Immigration**: Border control, Mexico, Visa / Citizenship, Asylum, Deportation, Refugee
- **Economy**: Budget deficit, Unemployment, Inflation, Interest rate, Federal Reserve, Market, Employment
- **Health care**: Medicaid, Covid, Obamacare, Public health, Insurance
- **Socio-economic**: Rich / poor, Income inequality, Poverty, Wealth distribution
- **Abortion**: Pregnancy, Unwanted Pregnancy, Roe, Wade, Abortion, Pro-life, Rape, Incest, Life of mother, Religion
- **Climate change**: Global Warming, Carbon, Alternative Energy, Climate, Methane, Emissions, Gas, Greenhouse

### Sources of Information
- **AllSides**: A media bias tool that provides a rating based on "multi-partisan Editorial Reviews by trained experts and Blind Bias Surveys™ in which participants rate content without knowing the source." We used this tool to determine how we should classify the most popular (based on subscriber count) YouTube channels we found. (Source: https://www.allsides.com/media-bias/media-bias-rating-methods)
- **HypeAuditor**: A company that uses a data-driven approach to influencer marketing. In the process, they collated lists of YouTube channels by category, subscriber count, and country. This allowed us to find the YouTube channels focused on news and politics with the most subscribers. (Source: https://hypeauditor.com/about/company/, https://hypeauditor.com/top-youtube-news-politics-united-states/)
- **Pew Research Center**: A nonpartisan, nonprofit organization that conducts research on public opinion, demographic trends, and social issues. It provides data-driven insights into various aspects of social science issues, explicitly stating they do not take a stance on political issues. For our research, we relied on their studies on political ideologies and alignment with political parties as a reference. (Source: https://www.pewresearch.org/about/, https://www.pewresearch.org/politics/2016/06/22/5-views-of-parties-positions-on-issues-ideologies/)
- **YouTube**: As a group, we've chosen to expand our collection of YouTube videos by selecting additional keywords associated with the ideology we're studying. Our focus will be on gathering comments from these videos to conduct our research.
    - We used a combination of Andy and Hanna's code to get the comments from YouTube channels.

### Top 5 Democratic YouTube Channels
Vice, Vox, MSNBC, The Daily Show, The Young Turks

#### Setup

In [None]:
!pip install --upgrade google-api-python-client --quiet
!pip install nltk --quiet
!pip install nrclex --quiet

In [1]:
# imports
import json
import pandas as pd

import nltk

import googleapiclient
import googleapiclient.discovery
import googleapiclient.errors
from googleapiclient.errors import HttpError

import re
from datetime import datetime
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.tokenize import casual

from nrclex import NRCLex

#### Define API / Lists / Dictionaries

In [123]:
# API call
# Read the API key from an environment variable instead of hard-coding it.
# Several keys were used across channels during collection; none are stored in the notebook.
# (YOUTUBE_API_KEY is an assumed variable name.)
import os

API_KEY = os.environ["YOUTUBE_API_KEY"]

youtube = googleapiclient.discovery.build("youtube", "v3", developerKey=API_KEY)

In [3]:
# Define channels
channels = ["Vice", "Vox", "msnbc", "thedailyshow", "TheYoungTurks"]

In [63]:
# Establish keyword dictionary
keyword_lists = {
    "isis": ["ISIS", "Radicalist", "Islamic State", "Jihadist", "Syria conflict", "Iraq insurgency", "Al-Qaeda", "Radical Islam", "Suicide bombings", "Mosul"],
    "guns": ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"],
    "immigration": ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"],
    "economy": ["Economy", "Budget deficit", "Unemployment rate", "Inflation", "Interest rate", "Federal reserve", "Recession", "GDP", "Consumer Price Index", "Trade Balance", "Stock Exchange", "Central bank", "Consumer spending", "NASDAQ", "Dow Jones", "S&P", "currency exchange", "Financial crisis", "Investment strategies", "Credit rating", "Commodities", "Real estate market", "Banking sector"],
    "healthcare": ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance", "Universal healthcare", "Private healthcare", "Medicare", "Patient rights", "Vaccination", "Pandemics"], 
    "socioeco": ["Socio-economic", "Rich", "Poor", "Income inequality", "Poverty", "Wealth distribution", "Minimum Wage", "Financial Insecurity", "Welfare", "Homelessness", "Financial Literacy"],
    "abortion": ["Abortion", "Pregnancy", "Unwanted Pregnancy", "Roe", "Wade", "Pro-life", "Planned Parenthood", "Fetal rights", "Life of mother", "Reproductive", "Women's health", "Gestational", "Late-term abortion", "Post-abortion syndrome", "Safe haven laws", "Mifepristone", "Misoprostol", "Dobbs", "Pro-choice", "Anti-abortion"],
    "climate": ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse", "Renewable energy", "Fossil fuels", "Deforestation", "Carbon footprint"]
}
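
Each collected video record stores only the matched keyword, not the category it came from, so a reverse lookup from keyword to category can be useful when grouping comments by ideology later. The following is a minimal sketch and not part of the original pipeline: `build_keyword_index` is a hypothetical helper, and `sample_keyword_lists` is a shortened stand-in for `keyword_lists`. A keyword listed under more than one category would keep only the last category seen.

```python
# Shortened stand-in for keyword_lists (assumption: same shape, fewer entries)
sample_keyword_lists = {
    "guns": ["Gun", "Shooting", "NRA"],
    "climate": ["Climate change", "Carbon", "Methane"],
}

def build_keyword_index(keyword_lists):
    """Map each lower-cased keyword to the category it was listed under."""
    index = {}
    for category, keywords in keyword_lists.items():
        for keyword in keywords:
            index[keyword.lower()] = category
    return index

keyword_index = build_keyword_index(sample_keyword_lists)
print(keyword_index["nra"])             # guns
print(keyword_index["climate change"])  # climate
```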

In [81]:
# Establish keyword dictionary
keyword_lists2 = {
    "guns": ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"],
    "immigration": ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"],
    "healthcare": ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance", "Universal healthcare", "Private healthcare", "Medicare", "Patient rights", "Vaccination", "Pandemics"], 
    "climate": ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse", "Renewable energy", "Fossil fuels", "Deforestation", "Carbon footprint"]
}

In [87]:
# Define keywords
keyword_isis = {
    "isis": ["ISIS", "Radicalist", "Islamic State", "Jihadist", "Syria conflict", "Iraq insurgency", "Al-Qaeda", "Radical Islam", "Suicide bombings", "Mosul"]
}

In [None]:
keyword_guns = {
    "guns": ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"]
}

In [None]:
keyword_immigration = {
    "immigration": ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"]
}

In [88]:
keyword_economy = {
    "economy": ["Economy", "Budget deficit", "Unemployment rate", "Inflation", "Interest rate", "Federal reserve", "Recession", "GDP", "Consumer Price Index", "Trade Balance", "Stock Exchange", "Central bank", "Consumer spending", "NASDAQ", "Dow Jones", "S&P", "currency exchange", "Financial crisis", "Investment strategies", "Credit rating", "Commodities", "Real estate market", "Banking sector"]
}

In [None]:
keyword_healthcare = {
    "healthcare": ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance", "Universal healthcare", "Private healthcare", "Medicare", "Patient rights", "Vaccination", "Pandemics"] 
}

In [89]:
keyword_socioeco = {
    "socioeco": ["Socio-economic", "Rich", "Poor", "Income inequality", "Poverty", "Wealth distribution", "Minimum Wage", "Financial Insecurity", "Welfare", "Homelessness", "Financial Literacy"],
}

In [None]:
keyword_abortion = {
    "abortion": ["Abortion", "Pregnancy", "Unwanted Pregnancy", "Roe", "Wade", "Pro-life", "Planned Parenthood", "Fetal rights", "Life of mother", "Reproductive", "Women's health", "Gestational", "Late-term abortion", "Post-abortion syndrome", "Safe haven laws", "Mifepristone", "Misoprostol", "Dobbs", "Pro-choice", "Anti-abortion"]
}

In [94]:
keyword_climate = {
    "climate": ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse", "Renewable energy", "Fossil fuels", "Deforestation", "Carbon footprint"]
}

#### Functions

In [5]:
# Function for getting channel id based on name
def get_channel_id(channel):  
    channel_id = youtube.search().list(
        part="snippet",
        type="channel",
        q=channel
    )

    res_channel = channel_id.execute()
    chan_id = res_channel["items"][0]["id"]["channelId"]

    return chan_id

In [6]:
# Function for retrieving the upload playlist id using channel id
def get_upload_id(channel):
    request = youtube.channels().list(
        part="contentDetails",
        id=channel
    )

    res = request.execute()
    uploads_playlist_id = res["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]

    return uploads_playlist_id

#### Get Channel IDs

In [10]:
up_id = []

for channel in channels:
    print(channel)
    chan_id = get_channel_id(channel)
    upload_id = get_upload_id(chan_id)
    up_id.append(upload_id)

Vice
Vox
msnbc
thedailyshow
TheYoungTurks


In [11]:
up_id

['UUn8zNIfYAQNdrFRrr8oibKw',
 'UULXo7UDZvByw2ixzpQCufnA',
 'UUaXkIU1QidjPwiAYu6GcHjg',
 'UUwWhs_6x42TyRM4Wstoq8HA',
 'UU1yBKRuGpC1tSM73A0ZjYjQ']

In [12]:
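# Initialize PorterStemmer
ps = PorterStemmer()

# Function to check if a video title contains any of the keywords
def contains_keyword(title, keywords):
    # Tokenize and stem the lower-cased title
    stemmed_words = [ps.stem(word) for word in word_tokenize(title.lower())]

    for keyword in keywords:
        # Stem each word of the keyword separately so that multi-word
        # keywords (e.g. "School shooting") can match as a phrase
        keyword_stems = [ps.stem(word) for word in word_tokenize(keyword.lower())]
        n = len(keyword_stems)
        if n == 0:
            continue
        # Look for the keyword's stems as a consecutive run in the title
        for i in range(len(stemmed_words) - n + 1):
            if stemmed_words[i:i + n] == keyword_stems:
                return keyword
    return None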
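
Multi-word keywords have to be compared word by word against the title: stemming a whole phrase such as "school shooting" as one string returns a single string, which can never equal an individual title token. The sketch below illustrates the phrase-matching idea using only the standard library. `crude_stem` is a deliberately simplistic, hypothetical stand-in for `PorterStemmer` (it only strips a few suffixes), and `find_keyword` is an illustrative name rather than the notebook's function.

```python
import re

def crude_stem(word):
    """Very rough stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ings", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def find_keyword(title, keywords):
    """Return the first keyword whose stemmed words occur consecutively in the title."""
    title_stems = [crude_stem(t) for t in tokenize(title)]
    for keyword in keywords:
        key_stems = [crude_stem(t) for t in tokenize(keyword)]
        n = len(key_stems)
        if n == 0:
            continue
        for i in range(len(title_stems) - n + 1):
            if title_stems[i:i + n] == key_stems:
                return keyword
    return None

# Made-up titles for illustration
print(find_keyword("Why school shootings keep happening", ["NRA", "School shooting"]))  # School shooting
print(find_keyword("How marketers target your nose", ["Gun control"]))                  # None
```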

In [31]:
# function to fetch videos from a playlist and get title with keywords
def keyword_videos(playlist_id, keyword_list, channel_name, limit):
    videos_info = []
    next_page_token = None

    while True:
        # Make the next API request using the nextPageToken
        request = youtube.playlistItems().list(
            part="snippet",
            playlistId=playlist_id,
            pageToken=next_page_token
        ) 
        res = request.execute()

        # Process the response and save video info
        for v in res["items"]:
            video_title = v["snippet"]["title"]
            detected_word = contains_keyword(video_title, keyword_list)
            if detected_word:
                # Separate Resource Call to retrieve video views
                views = youtube.videos().list(id=v['snippet']['resourceId']['videoId'], part="snippet,contentDetails,statistics")
                view_temp = views.execute()
                video_views = view_temp['items'][0]['statistics'].get('viewCount', 'Not Available')

                # Append video information with views to videos_info list
                videos_info.append({
                    "channel": channel_name,
                    "id": v["snippet"]["resourceId"]["videoId"],
                    "title": video_title,
                    "keyword": detected_word,
                    "published_at": v["snippet"]["publishedAt"],
                    "VideoViews": video_views
                })
        # Update the nextPageToken for the next iteration
        next_page_token = res.get('nextPageToken')

        if not next_page_token or (len(videos_info) > limit):
            break
    return videos_info
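
The loop above checks the limit only after a whole page has been processed, and it uses `>` rather than `>=`, so a call with `limit=10` can return 11 or more videos. If an exact cap is wanted, the check can move inside the page loop. The sketch below shows that pattern against a fake paginated source so that it runs without the API; `FAKE_PAGES`, `fetch_page`, and `collect_matches` are hypothetical names introduced for illustration.

```python
# Fake paginated source: maps a page token to (items, next_page_token).
# It stands in for youtube.playlistItems().list(...).execute().
FAKE_PAGES = {
    None: (["gun law", "weather", "gun show"], "p2"),
    "p2": (["gun range", "gun debate", "sports"], "p3"),
    "p3": (["gun sales"], None),
}

def fetch_page(page_token):
    """Return one page of titles and the token of the following page."""
    return FAKE_PAGES[page_token]

def collect_matches(is_match, limit):
    """Collect at most `limit` matching items, stopping mid-page if needed."""
    matches = []
    page_token = None
    while True:
        items, page_token = fetch_page(page_token)
        for item in items:
            if is_match(item):
                matches.append(item)
                if len(matches) >= limit:
                    return matches
        if page_token is None:
            return matches

print(collect_matches(lambda title: "gun" in title, 3))
# ['gun law', 'gun show', 'gun range']
```

Stopping inside the page loop also avoids the extra per-video statistics request for matches that would be discarded anyway.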

In [131]:
# Function for getting up to `limit` of the most relevant top-level comments for one video
def get_vid_comments(vid_id, limit):
    vids_final = []
    next_page_token = None

    try:
        while True:
            # Retrieve one page (up to 50 comment threads), most relevant first
            request = youtube.commentThreads().list(
                videoId=vid_id,
                part='id,snippet,replies',
                textFormat='plainText',
                order='relevance',
                maxResults=50,
                pageToken=next_page_token)
            res = request.execute()

            # Extract the top-level comment of each thread and add it to the final list
            for v in res["items"]:
                top_comment = v['snippet']['topLevelComment']['snippet']
                vids_final.append({
                    'VideoId': vid_id,
                    'CommentId': v['id'],
                    'CommentTitle': top_comment['textOriginal'],
                    'CommentCreationTime': top_comment['publishedAt'],
                    'CommentLikes': top_comment['likeCount']
                })

                # Stop as soon as the requested number of comments is reached
                if len(vids_final) >= limit:
                    return vids_final

            # Continue with the next page of comments if there is one
            next_page_token = res.get('nextPageToken')
            if not next_page_token:
                break

    # Error handling, e.g. for videos with disabled comments
    except HttpError as e:
        if e.resp.status == 403:
            print(f"Comments are disabled for the video with videoId: {vid_id}")
        else:
            print("An HTTP error occurred:", e)

    return vids_final
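
Each item returned by `commentThreads().list` nests the top-level comment two levels deep. As a rough illustration of the flattening step, the sketch below applies the same field lookups to a hand-written dictionary shaped like one response item. The sample values are made up, and `flatten_thread` is a hypothetical helper name.

```python
def flatten_thread(thread, vid_id):
    """Turn one comment-thread item into the flat record used for analysis."""
    top_comment = thread["snippet"]["topLevelComment"]["snippet"]
    return {
        "VideoId": vid_id,
        "CommentId": thread["id"],
        "CommentTitle": top_comment["textOriginal"],
        "CommentCreationTime": top_comment["publishedAt"],
        "CommentLikes": top_comment["likeCount"],
    }

# Made-up item that mimics the structure accessed in the function above
sample_thread = {
    "id": "example-comment-id",
    "snippet": {
        "topLevelComment": {
            "snippet": {
                "textOriginal": "Example comment text",
                "publishedAt": "2020-01-01T00:00:00Z",
                "likeCount": 3,
            }
        }
    },
}

record = flatten_thread(sample_thread, "example-video-id")
print(record["CommentTitle"], record["CommentLikes"])
```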

### Vice

In [65]:
# Run function to get information about relevant videos
vice_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUn8zNIfYAQNdrFRrr8oibKw", keywords, "Vice", 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    vice_vid_info.append(vid_info)

Fetching videos for category: isis
Found 11 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 4 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 7 videos for category abortion
Fetching videos for category: climate
Found 12 videos for category climate


In [137]:
vice_vid_info

[[{'channel': 'Vice',
   'id': 'SwoRx3tstxY',
   'title': 'We Uncovered an ISIS Mass Grave | Super Users',
   'keyword': 'ISIS',
   'published_at': '2022-04-11T15:00:12Z',
   'VideoViews': '349357'},
  {'channel': 'Vice',
   'id': 'LttCr8rudxQ',
   'title': 'How ISIS Makes Millions From Stolen Antiques | The Business of Crime',
   'keyword': 'ISIS',
   'published_at': '2022-02-24T16:00:17Z',
   'VideoViews': '404048'},
  {'channel': 'Vice',
   'id': 'm54L1jusS5w',
   'title': 'Safer at War than at Home | Diary of a Combat Medic Fighting ISIS (Part 3/3)',
   'keyword': 'ISIS',
   'published_at': '2019-04-05T16:00:09Z',
   'VideoViews': '330452'},
  {'channel': 'Vice',
   'id': '_xO5dsGEL_E',
   'title': 'Interrogating Enemy Fighters | Diary of a Combat Medic Fighting ISIS (Part 2/3)',
   'keyword': 'ISIS',
   'published_at': '2019-04-04T16:00:03Z',
   'VideoViews': '240228'},
  {'channel': 'Vice',
   'id': 'PdJJxwP8NaU',
   'title': 'What to Pack for War in Syria | Diary of a Combat Med

In [99]:
# Get the video IDs of the relevant videos
vice_vid_ids = set()

# Extracting all ids
for sublist in vice_vid_info:
    for video in sublist:
        vice_vid_ids.add(video['id'])
        
len(vice_vid_ids)

78

In [135]:
vice_vid_ids

{'-1A9v5bQDqk',
 '07lsXkWmpz8',
 '0rlaeRHgUCk',
 '18_KBggvIZM',
 '2RZoGOc3VCs',
 '388wlVWxGz0',
 '3JFihjEWU3c',
 '4PfZlxhvdkM',
 '4qePOEBm9Aw',
 '5I2HA1MIeeo',
 '5cm9ELLR8ds',
 '61trVLZv1-w',
 '6S9oUu0R3sY',
 '7fJpRa7o_fQ',
 '9Okkpmbdn_o',
 'ARQJlzZ2qu8',
 'D669qcb7GGI',
 'Dkk1rMARYOY',
 'EEIvWNhuL8U',
 'EaPnv82Hs3g',
 'FDYqe5I35KA',
 'FHlI6Vjc0tI',
 'GMdymyLNC1s',
 'IT8XsE0If0g',
 'Imj5EGZzrwg',
 'Jj-3kBi49eg',
 'Jp0nqJ1yrrg',
 'KL8CIZej19o',
 'Kae-nng77yE',
 'LttCr8rudxQ',
 'M4BOdBJFkgA',
 'N9SR09eC_z0',
 'PdJJxwP8NaU',
 'PfbNY2G64G8',
 'QX3M8Ka9vUA',
 'RpJDwQSvXNs',
 'SwoRx3tstxY',
 'TgVwCJ2J6pQ',
 'UnETVMI4tY8',
 'V3KGKQd_4tk',
 'X3ySrcI2mEA',
 'XnpbVRg1-qc',
 'Zh2e8nY8VJ0',
 '_278oKkyf48',
 '_kqiGAswtKE',
 '_xO5dsGEL_E',
 'cdZy4balvB8',
 'd9K96fZGY64',
 'dckjk1V-KRM',
 'ee5JCDzPB0I',
 'efSRjjzde54',
 'egEbJ2gukxU',
 'fD93qXcMwGA',
 'fU3C8o8I6GQ',
 'gXBR5ZrKdws',
 'gXTeg5LAvN8',
 'gdhdAktIHtg',
 'gs26R56d3ww',
 'gt7HlHDmc_4',
 'hxdwbZ3Oeyc',
 'hyk5YXnag9E',
 'jUFOECOZ1fg',
 'juAxCz

In [156]:
# Get the top 30 relevant comments of each video
vice_comments = []

for ids in vice_vid_ids:
    vice_comm = get_vid_comments(ids, 30)
    vice_comments.append(vice_comm)

Comments are disabled for the video with videoId: EEIvWNhuL8U


In [134]:
vice_comments

[[{'VideoId': 'qX_aaRepdIM',
   'CommentId': 'UgyT8lzHq4KG18pbEAx4AaABAg',
   'CommentTitle': 'A lot of businesses are struggling. The ones that aren’t struggling as much are banks and grocery stores.',
   'CommentCreationTime': '2020-09-23T19:56:07Z',
   'CommentLikes': 75},
  {'VideoId': 'qX_aaRepdIM',
   'CommentId': 'UgwWciuwhjWJtn5p2B54AaABAg',
   'CommentTitle': 'Damn the World really sucks now.\nLooking forward to better days for everyone.',
   'CommentCreationTime': '2020-09-23T20:42:39Z',
   'CommentLikes': 89},
  {'VideoId': 'qX_aaRepdIM',
   'CommentId': 'UgxAbD9h5oBPSxaRgvN4AaABAg',
   'CommentTitle': 'Sadly I think at least 50% of clubs, bars and restaurants worldwide will eventually go under. This with have a ripple effect in the supply chains and vendors who keep these businesses stocked with products as well.',
   'CommentCreationTime': '2020-09-24T08:16:30Z',
   'CommentLikes': 45},
  {'VideoId': 'qX_aaRepdIM',
   'CommentId': 'Ugx5ivt655zG1cctmVJ4AaABAg',
   'CommentT

In [157]:
# Count comments per videoId to ensure there are enough comments

from collections import defaultdict

comment_count = defaultdict(int)

for video_comments in vice_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        comment_count[video_id] += 1

print("Number of comments per videoId:")
for video_id, count in comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: qX_aaRepdIM, Number of Comments: 30
Video ID: Jj-3kBi49eg, Number of Comments: 30
Video ID: nogb7DboO64, Number of Comments: 23
Video ID: D669qcb7GGI, Number of Comments: 30
Video ID: 5cm9ELLR8ds, Number of Comments: 30
Video ID: 61trVLZv1-w, Number of Comments: 30
Video ID: fU3C8o8I6GQ, Number of Comments: 30
Video ID: V3KGKQd_4tk, Number of Comments: 30
Video ID: _kqiGAswtKE, Number of Comments: 30
Video ID: LttCr8rudxQ, Number of Comments: 30
Video ID: xTnJgpF5LYg, Number of Comments: 20
Video ID: Imj5EGZzrwg, Number of Comments: 30
Video ID: lFIro2Dnfj8, Number of Comments: 30
Video ID: QX3M8Ka9vUA, Number of Comments: 30
Video ID: jUFOECOZ1fg, Number of Comments: 2
Video ID: 5I2HA1MIeeo, Number of Comments: 6
Video ID: KL8CIZej19o, Number of Comments: 30
Video ID: hxdwbZ3Oeyc, Number of Comments: 30
Video ID: FDYqe5I35KA, Number of Comments: 30
Video ID: 07lsXkWmpz8, Number of Comments: 30
Video ID: dckjk1V-KRM, Number of Comments: 30
Vide
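
The same tally can be written with `collections.Counter`, which also makes it easy to flag videos that fell short of the requested number of comments. This is a small sketch on made-up data; `count_comments` and `short_videos` are illustrative names, and the threshold of 30 mirrors the limit passed to `get_vid_comments` above.

```python
from collections import Counter

def count_comments(comments_per_video):
    """Count collected comments per VideoId across a list of comment lists."""
    return Counter(
        comment["VideoId"]
        for video_comments in comments_per_video
        for comment in video_comments
    )

def short_videos(counts, minimum):
    """Return the video ids that have fewer than `minimum` comments."""
    return sorted(vid for vid, n in counts.items() if n < minimum)

# Made-up input with the same nesting as vice_comments
sample = [
    [{"VideoId": "vidA"}] * 30,
    [{"VideoId": "vidB"}] * 2,
    [],  # a video whose comments were disabled yields an empty list
]
counts = count_comments(sample)
print(counts["vidA"], counts["vidB"])  # 30 2
print(short_videos(counts, 30))        # ['vidB']
```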

In [158]:
# Create a dict to map video ids to their corresponding details
vid_details = {vid['id']: vid for sublist in vice_vid_info for vid in sublist}

# Combine vid detail with comments
vice_result = []

for sublist in vice_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in vid_details:
            details = vid_details[vid_id].copy()
            details.update(item)
            vice_result.append(details)
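
The same join is needed for every channel, so it could be factored into one helper. The sketch below is one way to do that, assuming the inputs keep the shapes used above (a list of per-category video lists and a list of per-video comment lists). `attach_video_details` is a hypothetical name and the sample data is made up. Comments whose video id has no matching details are dropped, as in the loop above.

```python
def attach_video_details(vid_info, comments):
    """Merge each comment record with the details of the video it belongs to."""
    details_by_id = {vid["id"]: vid for sublist in vid_info for vid in sublist}
    merged = []
    for video_comments in comments:
        for comment in video_comments:
            details = details_by_id.get(comment["VideoId"])
            if details is not None:
                # Comment fields are added on top of a copy of the video details
                merged.append({**details, **comment})
    return merged

# Made-up sample data in the same nested shape
sample_vid_info = [[{"id": "vidA", "title": "Example title", "keyword": "Gun"}]]
sample_comments = [
    [{"VideoId": "vidA", "CommentId": "c1", "CommentLikes": 5}],
    [{"VideoId": "vidZ", "CommentId": "c2", "CommentLikes": 1}],  # no details
]

rows = attach_video_details(sample_vid_info, sample_comments)
print(len(rows))         # 1
print(rows[0]["title"])  # Example title
```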

In [159]:
len(vice_result)

1938

In [143]:
vice_result

[{'channel': 'Vice',
  'id': 'qX_aaRepdIM',
  'title': 'Running a Nightclub During a Pandemic',
  'keyword': 'Pandemics',
  'published_at': '2020-09-23T19:00:18Z',
  'VideoViews': '137070',
  'VideoId': 'qX_aaRepdIM',
  'CommentId': 'UgyT8lzHq4KG18pbEAx4AaABAg',
  'CommentTitle': 'A lot of businesses are struggling. The ones that aren’t struggling as much are banks and grocery stores.',
  'CommentCreationTime': '2020-09-23T19:56:07Z',
  'CommentLikes': 75},
 {'channel': 'Vice',
  'id': 'qX_aaRepdIM',
  'title': 'Running a Nightclub During a Pandemic',
  'keyword': 'Pandemics',
  'published_at': '2020-09-23T19:00:18Z',
  'VideoViews': '137070',
  'VideoId': 'qX_aaRepdIM',
  'CommentId': 'UgwWciuwhjWJtn5p2B54AaABAg',
  'CommentTitle': 'Damn the World really sucks now.\nLooking forward to better days for everyone.',
  'CommentCreationTime': '2020-09-23T20:42:39Z',
  'CommentLikes': 89},
 {'channel': 'Vice',
  'id': 'qX_aaRepdIM',
  'title': 'Running a Nightclub During a Pandemic',
  'keyw

In [160]:
vice_comments_df = pd.DataFrame(vice_result)

vice_comments_df.head()
vice_comments_df.shape

(1938, 11)

### Vox

In [70]:
vox_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UULXo7UDZvByw2ixzpQCufnA", keywords, "Vox", 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    vox_vid_info.append(vid_info)

Fetching videos for category: isis
Found 9 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 4 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 9 videos for category socioeco
Fetching videos for category: abortion
Found 10 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [46]:
vox_vid_info[3]

[{'channel': 'Vox',
  'id': '9kaSKoBb7ew',
  'title': 'High female employment rates in Europe mean more babies #shorts',
  'keyword': 'Employment',
  'published_at': '2023-08-02T16:31:01Z',
  'VideoViews': '243267'},
 {'channel': 'Vox',
  'id': 'ualUPur6iks',
  'title': "Why it's so hard to get unemployment benefits",
  'keyword': 'Unemployed',
  'published_at': '2020-06-10T12:00:33Z',
  'VideoViews': '550201'},
 {'channel': 'Vox',
  'id': 'P81i66_tLlU',
  'title': 'How marketers target your nose',
  'keyword': 'Market',
  'published_at': '2018-09-26T12:00:03Z',
  'VideoViews': '493366'},
 {'channel': 'Vox',
  'id': 'GWH5vyi3lTk',
  'title': 'How the economy shapes our love lives',
  'keyword': 'Economy',
  'published_at': '2018-02-22T13:00:01Z',
  'VideoViews': '836379'},
 {'channel': 'Vox',
  'id': 'Cjzvvgmg1NU',
  'title': 'Why the market for skin whitening is growing',
  'keyword': 'Market',
  'published_at': '2018-01-04T13:00:02Z',
  'VideoViews': '1709622'},
 {'channel': 'Vox',
 

In [147]:
vox_vid_ids = set()

# Extracting all ids
for sublist in vox_vid_info:
    for video in sublist:
        vox_vid_ids.add(video['id'])
        
len(vox_vid_ids)

74

In [148]:
vox_comments = []

for ids in vox_vid_ids:
    vox_comm = get_vid_comments(ids, 30)
    vox_comments.append(vox_comm)

In [155]:
vox_comment_count = defaultdict(int)

for video_comments in vox_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        vox_comment_count[video_id] += 1

for video_id, count in vox_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Video ID: 4Ltr7x8nO2M, Number of Comments: 30
Video ID: _e0VofLJTIk, Number of Comments: 30
Video ID: ssSIUVPjDns, Number of Comments: 30
Video ID: Z9gQLELtbhg, Number of Comments: 30
Video ID: HJ034SvB16E, Number of Comments: 30
Video ID: tC0IMn8lsdc, Number of Comments: 30
Video ID: yeaQUhAOdtk, Number of Comments: 30
Video ID: LJjo1kJW6To, Number of Comments: 30
Video ID: qcJeOphUtek, Number of Comments: 30
Video ID: Al0rBxHuVk4, Number of Comments: 30
Video ID: nUnJQWO4YJY, Number of Comments: 30
Video ID: BB3qNWRaxGE, Number of Comments: 30
Video ID: qdjArlHB8k8, Number of Comments: 30
Video ID: H7tUEWNL7lg, Number of Comments: 30
Video ID: sv0dQfRRrEQ, Number of Comments: 30
Video ID: yzDjjUAt3zc, Number of Comments: 30
Video ID: 3bIvqS7gnQo, Number of Comments: 30
Video ID: pTwPHuE_HrU, Number of Comments: 30
Video ID: -S_f-huz-EU, Number of Comments: 30
Video ID: t6V9i8fFADI, Number of Comments: 30
Video ID: iKHl__BEsD0, Number of Comments: 30
Video ID: k1vE_LVBx4s, Number of C

In [165]:
# Create a dict to map video ids to their corresponding details
vox_vid_details = {vid['id']: vid for sublist in vox_vid_info for vid in sublist}

# Combine vid detail with comments
vox_result = []

for sublist in vox_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in vox_vid_details:
            details = vox_vid_details[vid_id].copy()
            details.update(item)
            vox_result.append(details)

In [166]:
len(vox_result)

2180

In [167]:
vox_comments_df = pd.DataFrame(vox_result)

vox_comments_df.head()
vox_comments_df.shape

(2180, 11)

### MSNBC

In [82]:
msnbc_vid_info = []

In [86]:
for category, keywords in keyword_lists2.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: climate
Found 11 videos for category climate


In [90]:
for category, keywords in keyword_isis.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", 24)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: isis
Found 6 videos for category isis


In [91]:
for category, keywords in keyword_economy.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", 26)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: economy
Found 27 videos for category economy


In [92]:
for category, keywords in keyword_socioeco.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", 13)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: socioeco
Found 14 videos for category socioeco


In [95]:
for category, keywords in keyword_abortion.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUaXkIU1QidjPwiAYu6GcHjg", keywords, "MSNBC", 16)
    print(f"Found {len(vid_info)} videos for category {category}")
    msnbc_vid_info.append(vid_info)

Fetching videos for category: climate
Found 17 videos for category climate


In [168]:
msnbc_vid_ids = set()

# Extracting all ids
for sublist in msnbc_vid_info:
    for video in sublist:
        msnbc_vid_ids.add(video['id'])
        
len(msnbc_vid_ids)

96

In [169]:
msnbc_comments = []

for ids in msnbc_vid_ids:
    msnbc_comm = get_vid_comments(ids, 30)
    msnbc_comments.append(msnbc_comm)

In [170]:
msnbc_comment_count = defaultdict(int)

for video_comments in msnbc_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        msnbc_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in msnbc_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: SVxPp2POkCY, Number of Comments: 30
Video ID: L3ogH6j7CaI, Number of Comments: 30
Video ID: 1ltzfeFeiP4, Number of Comments: 30
Video ID: mUutMIoBuco, Number of Comments: 30
Video ID: FPK35AVH8BQ, Number of Comments: 30
Video ID: jesO90gunFs, Number of Comments: 30
Video ID: yTFGRCL_ZTc, Number of Comments: 30
Video ID: colxgYIy4mo, Number of Comments: 30
Video ID: BEanahjUQhI, Number of Comments: 30
Video ID: lgRP_w4CgBw, Number of Comments: 30
Video ID: SU9avQSjJrM, Number of Comments: 30
Video ID: 3qiJp65MDBI, Number of Comments: 30
Video ID: LZRgMwS7B8o, Number of Comments: 30
Video ID: DnpgzhZRwPE, Number of Comments: 30
Video ID: 2TeUrMjuEa0, Number of Comments: 30
Video ID: hpIhbJTW42M, Number of Comments: 30
Video ID: ehyFN4JCOTY, Number of Comments: 30
Video ID: RZUBLTph5uw, Number of Comments: 30
Video ID: HqMl3HfR8nk, Number of Comments: 30
Video ID: u0wKRykb5HY, Number of Comments: 30
Video ID: YBdZ3QtLrHI, Number of Comments: 30
Vi

In [171]:
# Create a dict to map video ids to their corresponding details
msnbc_vid_details = {vid['id']: vid for sublist in msnbc_vid_info for vid in sublist}

# Combine video details with comments
msnbc_result = []

for sublist in msnbc_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in msnbc_vid_details:
            details = msnbc_vid_details[vid_id].copy()
            details.update(item)
            msnbc_result.append(details)

In [172]:
len(msnbc_result)

2839

In [173]:
msnbc_comments_df = pd.DataFrame(msnbc_result)

msnbc_comments_df.head()
msnbc_comments_df.shape

(2839, 11)

### The Daily Show

In [72]:
daily_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UUwWhs_6x42TyRM4Wstoq8HA", keywords, "Daily Show", 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    daily_vid_info.append(vid_info)

Fetching videos for category: isis
Found 6 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 10 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [174]:
daily_vid_ids = set()

# Extracting all ids
for sublist in daily_vid_info:
    for video in sublist:
        daily_vid_ids.add(video['id'])
        
len(daily_vid_ids)

79

In [175]:
daily_comments = []

for ids in daily_vid_ids:
    daily_comm = get_vid_comments(ids, 30)
    daily_comments.append(daily_comm)

In [176]:
daily_comment_count = defaultdict(int)

for video_comments in daily_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        daily_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in daily_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: 8uZhdZlY6qY, Number of Comments: 30
Video ID: TtziF8sgZ0I, Number of Comments: 30
Video ID: cV0vH_IIQo4, Number of Comments: 30
Video ID: 2Qv0I76zZcE, Number of Comments: 30
Video ID: 6J55FgWN85o, Number of Comments: 30
Video ID: rVQUxB7M0nA, Number of Comments: 30
Video ID: UlQC7SfwlwA, Number of Comments: 28
Video ID: 0e1hpD7ForE, Number of Comments: 30
Video ID: coL1uedveZE, Number of Comments: 30
Video ID: 4YMPEK1pwtQ, Number of Comments: 30
Video ID: 9QZy8dV7q1s, Number of Comments: 30
Video ID: 3JUk5QXpW4Y, Number of Comments: 30
Video ID: B5t12TDa618, Number of Comments: 30
Video ID: nnPuCJqRn4U, Number of Comments: 30
Video ID: v2bIyik6JUI, Number of Comments: 30
Video ID: hqgFwO2D0NU, Number of Comments: 30
Video ID: ndl97Kt9ERQ, Number of Comments: 30
Video ID: UZw5EbpZOoY, Number of Comments: 30
Video ID: sPtvQupLUyU, Number of Comments: 30
Video ID: hESoqv2AwWA, Number of Comments: 30
Video ID: F_O1_SXIdlA, Number of Comments: 30
...

In [177]:
# Create a dict to map video ids to their corresponding details
daily_vid_details = {vid['id']: vid for sublist in daily_vid_info for vid in sublist}

# Combine video details with comments
daily_result = []

for sublist in daily_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in daily_vid_details:
            details = daily_vid_details[vid_id].copy()
            details.update(item)
            daily_result.append(details)

In [178]:
len(daily_result)

2364

In [179]:
daily_comments_df = pd.DataFrame(daily_result)

daily_comments_df.head()
daily_comments_df.shape

(2364, 11)

### The Young Turks

In [74]:
yturk_vid_info = []

for category, keywords in keyword_lists.items():
    print(f"Fetching videos for category: {category}")
    vid_info = keyword_videos("UU1yBKRuGpC1tSM73A0ZjYjQ", keywords, "Young Turks", 10)
    print(f"Found {len(vid_info)} videos for category {category}")
    yturk_vid_info.append(vid_info)

Fetching videos for category: isis
Found 5 videos for category isis
Fetching videos for category: guns
Found 11 videos for category guns
Fetching videos for category: immigration
Found 11 videos for category immigration
Fetching videos for category: economy
Found 11 videos for category economy
Fetching videos for category: healthcare
Found 11 videos for category healthcare
Fetching videos for category: socioeco
Found 11 videos for category socioeco
Fetching videos for category: abortion
Found 11 videos for category abortion
Fetching videos for category: climate
Found 11 videos for category climate


In [180]:
yturk_vid_ids = set()

# Extracting all ids
for sublist in yturk_vid_info:
    for video in sublist:
        yturk_vid_ids.add(video['id'])
        
len(yturk_vid_ids)

81

In [182]:
yturk_comments = []

for ids in yturk_vid_ids:
    yturk_comm = get_vid_comments(ids, 30)
    yturk_comments.append(yturk_comm)

In [183]:
yturk_comment_count = defaultdict(int)

for video_comments in yturk_comments:
    for comment in video_comments:
        video_id = comment['VideoId']
        yturk_comment_count[video_id] += 1

print("Number of comments per videoId:")

for video_id, count in yturk_comment_count.items():
    print(f"Video ID: {video_id}, Number of Comments: {count}")

Number of comments per videoId:
Video ID: MEc2gfWcQEc, Number of Comments: 30
Video ID: ymF88jc2oPo, Number of Comments: 30
Video ID: _7nnyMHUUys, Number of Comments: 30
Video ID: H_gbq0pdYp0, Number of Comments: 30
Video ID: 4vw4sCou7wU, Number of Comments: 30
Video ID: fjBtngQR4WY, Number of Comments: 30
Video ID: ZRGEDoixz_8, Number of Comments: 30
Video ID: q4bBZEEL71s, Number of Comments: 30
Video ID: 1RWcMnxUZVc, Number of Comments: 30
Video ID: tZu91Y_l8dc, Number of Comments: 30
Video ID: SkFJTsEsMVY, Number of Comments: 30
Video ID: 2Eb2kPjP60c, Number of Comments: 1
Video ID: JiPtY6stLB4, Number of Comments: 30
Video ID: yWMmnfjTT8s, Number of Comments: 30
Video ID: Q8lGHAPfSis, Number of Comments: 30
Video ID: _dMtdutY2f4, Number of Comments: 30
Video ID: I5YvqhWYSuM, Number of Comments: 30
Video ID: vfSZ5krN2Hc, Number of Comments: 30
Video ID: szPhH-7TOvY, Number of Comments: 30
Video ID: 5SlR-UHguKY, Number of Comments: 30
Video ID: tTJMvR-NcO8, Number of Comments: 30
...

In [184]:
# Create a dict to map video ids to their corresponding details
yturk_vid_details = {vid['id']: vid for sublist in yturk_vid_info for vid in sublist}

# Combine video details with comments
yturk_result = []

for sublist in yturk_comments:
    for item in sublist:
        vid_id = item['VideoId']
        if vid_id in yturk_vid_details:
            details = yturk_vid_details[vid_id].copy()
            details.update(item)
            yturk_result.append(details)

In [185]:
len(yturk_result)

2341

In [186]:
yturk_comments_df = pd.DataFrame(yturk_result)

yturk_comments_df.head()
yturk_comments_df.shape

(2341, 11)
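
The map-then-merge cells above are repeated for every channel. As a minimal sketch, the shared logic could live in one helper; `merge_details_with_comments` and the toy data (including the `Comment` key) are hypothetical names used only for illustration, not part of the notebook.

```python
def merge_details_with_comments(vid_info, comments):
    """Attach each comment to a copy of its video's metadata.

    vid_info: list of lists of video dicts, each with an 'id' key.
    comments: list of lists of comment dicts, each with a 'VideoId' key.
    """
    # Map video ids to their details (a later duplicate id overwrites an earlier one)
    details_by_id = {vid['id']: vid for sublist in vid_info for vid in sublist}

    merged = []
    for sublist in comments:
        for item in sublist:
            details = details_by_id.get(item['VideoId'])
            if details is None:
                continue  # skip comments for videos we have no metadata for
            row = details.copy()
            row.update(item)
            merged.append(row)
    return merged


# Toy data standing in for the API results
toy_vid_info = [[{'id': 'a1', 'title': 'First'}], [{'id': 'b2', 'title': 'Second'}]]
toy_comments = [[{'VideoId': 'a1', 'Comment': 'hi'}], [{'VideoId': 'zz', 'Comment': 'orphan'}]]
toy_rows = merge_details_with_comments(toy_vid_info, toy_comments)
print(toy_rows)
```

With such a helper, each channel section would reduce to a single call, for example `msnbc_result = merge_details_with_comments(msnbc_vid_info, msnbc_comments)`.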

### Combine ALL DF and Save as CSV

In [187]:
democrat_comment_df = pd.concat([vice_comments_df, vox_comments_df, msnbc_comments_df, daily_comments_df, yturk_comments_df], ignore_index=True)

In [188]:
democrat_comment_df.shape

(11662, 11)

In [189]:
# Save df to a CSV file
democrat_comment_df.to_csv("democrat_comments.csv", index=False)

## Another Method to Grab Channel CSV (outdated)

#### Vice

In [None]:
vice_comments = get_video_comments('Vice', 'UUn8zNIfYAQNdrFRrr8oibKw', keyword_lists, limit=30)

In [None]:
len(vice_comments)

In [None]:
# Check output, commented out for viewing purposes
# vice_comments[:10]

In [None]:
# Change to DF
vice_comments_df = pd.DataFrame(vice_comments)

In [None]:
# Check output, commented out for viewing purposes
# vice_comments_df.head()

In [None]:
# Save df to a CSV file
vice_comments_df.to_csv("vice_comments.csv", index=False)

#### Vox

In [None]:
vox_comments = get_video_comments('Vox', 'UULXo7UDZvByw2ixzpQCufnA', keyword_lists, limit=30)

In [None]:
len(vox_comments)

In [None]:
# Check output, commented out for viewing purposes
# vox_comments[:10]

In [None]:
# Change to DF
vox_comments_df = pd.DataFrame(vox_comments)

# Check output, commented out for viewing purposes
# vox_comments_df.head()

In [None]:
# Save df to a CSV file
vox_comments_df.to_csv("vox_comments.csv", index=False)

#### MSNBC

In [None]:
msnbc_comments = get_video_comments('MSNBC', 'UUaXkIU1QidjPwiAYu6GcHjg', keyword_lists, limit=20)

In [None]:
len(msnbc_comments)

In [None]:
# Check output, commented out for viewing purposes
# msnbc_comments[:10]

In [None]:
# Change to DF
msnbc_comments_df = pd.DataFrame(msnbc_comments)

# Check output, commented out for viewing purposes
# msnbc_comments_df.head()

In [None]:
# Save df to a CSV file
msnbc_comments_df.to_csv("msnbc_comments.csv", index=False)

#### The Daily Show

In [None]:
dailyshow_comments = get_video_comments('The Daily Show', 'UUwWhs_6x42TyRM4Wstoq8HA', keyword_lists, limit=30)

In [None]:
len(dailyshow_comments)

In [None]:
# Check output, commented out for viewing purposes
# dailyshow_comments[:10]

In [None]:
# Change to DF
dailyshow_comments_df = pd.DataFrame(dailyshow_comments)

# Check output, commented out for viewing purposes
# dailyshow_comments_df.head()

In [None]:
# Save df to a CSV file
dailyshow_comments_df.to_csv("dailyshow_comments.csv", index=False)

#### Young Turks

In [None]:
yturk_comments = get_video_comments('The Young Turks', 'UU1yBKRuGpC1tSM73A0ZjYjQ', keyword_lists, limit=30)

In [None]:
len(yturk_comments)

In [None]:
# Check output, commented out for viewing purposes
# yturk_comments[:10]

In [None]:
# Change to DF
yturk_comments_df = pd.DataFrame(yturk_comments)

# Check output, commented out for viewing purposes
# yturk_comments_df.head()

In [None]:
# Save df to a CSV file
yturk_comments_df.to_csv("yturk_comments.csv", index=False)

#### Combine all DFs

In [None]:
yturk_comments_df = pd.read_csv('yturk_comments.csv')

In [None]:
combine_democ_comments = pd.concat([vice_comments_df, vox_comments_df, yturk_comments_df])

combine_democ_comments.reset_index(drop=True, inplace=True)

In [None]:
combine_democ_comments.shape

In [None]:
# Save df to a CSV file
combine_democ_comments.to_csv("combine_democ_comments.csv", index=False)

### Top 5 Republican YouTube Channels
Fox News, Ben Shapiro, Steven Crowder, The Daily Mail, The Daily Wire

In [None]:
# Function to fetch videos from an uploads playlist and keep those whose title contains a keyword.
# Matches are appended to dict_list (shared across channels) and also returned for this playlist.
def keyword_videos_right(playlist_id, channel_name, dict_list):
    videos_info = []
    next_page_token = None

    while True:
        # Request the next page of playlist items (pageToken is None on the first call)
        request = youtube.playlistItems().list(
            part="snippet",
            playlistId=playlist_id,
            pageToken=next_page_token
        )
        res = request.execute()

        # Process the response and save video info
        for v in res["items"]:
            video_title = v["snippet"]["title"]
            video_id = v["snippet"]["resourceId"]["videoId"]
            for keyword_name, keywords in keyword_lists.items():
                detected_word = contains_keyword(video_title, keywords)
                if detected_word:
                    # Separate resource call to retrieve video views
                    views = youtube.videos().list(id=video_id, part="snippet,contentDetails,statistics")
                    view_temp = views.execute()
                    video_views = view_temp['items'][0]['statistics']['viewCount']

                    # Record the video with its views, both for this playlist and in the shared list
                    video_entry = {
                        "id": video_id,
                        "channel_name": v['snippet']['channelTitle'],
                        "title": video_title,
                        "keyword": detected_word,
                        "published_at": v["snippet"]["publishedAt"],
                        "VideoViews": video_views
                    }
                    videos_info.append(video_entry)
                    dict_list.append(video_entry)

        # Update the nextPageToken for the next iteration
        next_page_token = res.get('nextPageToken')

        # Stop when there are no more pages or enough videos were collected for this playlist
        if not next_page_token or (len(videos_info) > 60):
            break
    return videos_info
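
The function above issues one `videos().list` call per matching video. To my understanding the `id` parameter of that endpoint accepts a comma-separated list of up to 50 ids, so view counts could instead be fetched in batches once the matching ids are known. The sketch below assumes that limit; `chunked` and `fetch_view_counts` are hypothetical helpers, and `youtube` is assumed to be the client built earlier in the notebook.

```python
def chunked(items, size=50):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_view_counts(youtube, video_ids):
    """Return {video_id: viewCount} using one videos().list call per batch.

    `youtube` is assumed to be an already-built YouTube Data API client.
    """
    views = {}
    for batch in chunked(list(video_ids)):
        res = youtube.videos().list(id=",".join(batch), part="statistics").execute()
        for item in res.get("items", []):
            # .get() guards against a missing viewCount field
            views[item["id"]] = item["statistics"].get("viewCount")
    return views
```

This trades one request per video for one request per 50 videos, which may matter under a daily API quota.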

In [None]:
# Define channels
channels_right = ["BenShapiro", "StevenCrowder", "FoxNews", "DailyWirePlus", "dailymail"]

In [None]:
# Gets list of Right channels playlist id for uploads
right_up_id = []
for channel in channels_right:
    chan_id = get_channel_id(channel)
    upload_id = get_upload_id(chan_id)
    right_up_id.append(upload_id)

In [None]:
right_up_id

In [None]:
for keyword_name, keywords in keyword_lists.items():
    print(keywords)

## Code for collecting titles

In [None]:
# Collects video titles for each of the given channels that contain keywords given
right_video_titles = []
for channel, upload_id in zip(channels_right, right_up_id):
    print(channel)
    videos_info = keyword_videos_right(upload_id, channel, right_video_titles)
    #right_video_titles.append(videos_info)

In [None]:
right_df = pd.DataFrame(right_video_titles)

In [None]:
right_df.shape

In [None]:
right_df.head(200)

In [None]:
# Save DF as CSV
right_df.to_csv('Project_yt_titles.csv')

In [None]:
right_video_titles[1]

## Code to get Comments

In [None]:
# Function for getting up to `limit` top-level comments (most relevant first) for a given video
def get_vid_comments_right(vid, limit):
    vids_final = []
    next_page_token = None

    while True:
        # Request the next page of comment threads (pageToken is None on the first call)
        request = youtube.commentThreads().list(
            videoId=vid['id'],
            part='id,snippet,replies',
            textFormat='plainText',
            order='relevance',
            maxResults=100,
            pageToken=next_page_token
        )
        res = request.execute()

        # Iterate through each comment on this page
        for v in res["items"]:
            top_comment = v['snippet']['topLevelComment']['snippet']

            # Copy the video's dictionary so that each comment row also carries the video data
            vid_temp = vid.copy()
            vid_temp.update({'CommentId': v['id']})
            vid_temp.update({'CommentTitle': top_comment['textOriginal']})
            vid_temp.update({'CommentCreationTime': top_comment['publishedAt']})
            vid_temp.update({'CommentLikes': top_comment['likeCount']})
            vids_final.append(vid_temp)

            # Once the number of saved comments reaches the limit, return them
            if len(vids_final) >= limit:
                return vids_final

        # Stop when there are no more pages
        next_page_token = res.get('nextPageToken')
        if not next_page_token:
            break

    return vids_final
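
The page-by-page collection with a cap that `get_vid_comments_right` is meant to perform can be checked without touching the API. The following is a minimal sketch with a stubbed page source; `collect_until_limit`, `fake_fetch`, and the fake page tokens are illustrative names, not part of the notebook or the YouTube client.

```python
def collect_until_limit(fetch_page, limit):
    """Gather items page by page until `limit` items are saved or no pages remain.

    fetch_page(token) must return (items, next_token); next_token is None
    on the last page.
    """
    collected = []
    token = None
    while True:
        items, token = fetch_page(token)
        for item in items:
            collected.append(item)
            if len(collected) >= limit:
                return collected
        if token is None:
            return collected


# Stand-in for the API: three pages of fake comment ids
fake_pages = {
    None: (["c1", "c2"], "p2"),
    "p2": (["c3", "c4"], "p3"),
    "p3": (["c5"], None),
}


def fake_fetch(token):
    return fake_pages[token]


print(collect_until_limit(fake_fetch, 3))   # stops partway through the second page
print(collect_until_limit(fake_fetch, 10))  # runs out of pages before the limit
```

Checking the limit inside the item loop keeps the result from overshooting the cap when a page is larger than the remaining budget.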

In [None]:
# Runs code to get comments for each title in right_video_titles
right_comments_dict_list = []
for title in right_video_titles:
    result = get_vid_comments_right(title, 100)
    right_comments_dict_list.append(result)

In [None]:
len(right_comments_dict_list)

In [None]:
# Converts list of list of dictionaries to a flat list
flat_list_of_dicts = [item for sublist in right_comments_dict_list for item in sublist]

In [None]:
# Converts list of dictionaries to dataframe
right_comments_df = pd.DataFrame(flat_list_of_dicts)

In [None]:
right_comments_df.tail(2)

In [None]:
right_comments_df.shape

In [None]:
# Save DF as CSV
right_comments_df.to_csv('Project_yt_comments.csv')

## Andy's Section

In [None]:
# Function for retrieving the upload playlist id of a channel
def get_upload_id(channel):
    request = youtube.channels().list(part='contentDetails', forUsername=channel)
    res = request.execute()
    return res["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]

# Function for retrieving vids within the upload playlist of a channel whose title contains a keyword, stopping once a limit INT has been reached
def get_vids(channel, limit, keywords, ideology):

    # Output list
    vid_lst = []

    # Look up the uploads playlist once rather than on every page
    upload_id = get_upload_id(channel)
    nextPageToken = None

    # Iterate until no more next page
    while True:
        request = youtube.playlistItems().list(part='snippet', playlistId=upload_id, maxResults=50, pageToken=nextPageToken)
        res = request.execute()

        # Iterate through each video on this page of the playlist
        for v in res["items"]:

            # Normalization of video title to check for keywords
            title = v['snippet']['title']
            title = title.lower()
            title = re.sub(r'[^\w\s]', '', title)

            # Flag the video only if at least one word of the title is a keyword.
            # Note: titles are compared word by word, so multi-word keywords (e.g. 'gun control') never match here.
            if not any(word in keywords for word in title.split()):
                continue

            # Create temp dictionary per video, and add video-specific information to dictionary
            vid_dict = {}
            vid_dict['ChannelName'] = v['snippet']['channelTitle']
            vid_dict['VideoId'] = v['snippet']['resourceId']['videoId']
            vid_dict['VideoTitle'] = v['snippet']['title']
            vid_dict['Ideology'] = ideology
            # Needed later by df_clean_process
            vid_dict['VideoPublishedDate'] = v['snippet']['publishedAt']

            # Separate Resource Call to retrieve video views
            views = youtube.videos().list(id=v['snippet']['resourceId']['videoId'], part="snippet,contentDetails,statistics")
            view_temp = views.execute()
            vid_dict['VideoViews'] = view_temp['items'][0]['statistics']['viewCount']

            # Append dictionary to greater list
            vid_lst.append(vid_dict)

            # Once the number of saved videos reaches the self-defined limit, return the list of videos
            if len(vid_lst) >= limit:
                return vid_lst

        # Redefine next page token to check @ next iteration; stop when there is none
        nextPageToken = res.get('nextPageToken')
        if not nextPageToken:
            break

    # Always return a list, even when fewer than `limit` videos matched
    return vid_lst

# Function for getting the most relevant comments for a list of videos.
# `limit` caps the total number of comments collected across the whole list.
def get_vid_comments(vid_lst, limit):
    vids_final = []

    # Iterate through each video in the video list
    for vid in vid_lst:
        nextPageToken = None

        # Iterate until no more next page for this video
        while True:
            request = youtube.commentThreads().list(videoId=vid['VideoId'], part='id,snippet,replies', textFormat='plainText', order='relevance', maxResults=50, pageToken=nextPageToken)
            res = request.execute()

            # Iterate through each comment
            for v in res["items"]:
                top_comment = v['snippet']['topLevelComment']['snippet']

                # Copy the video's dictionary so that each comment row also carries the video data
                vid_temp = copy.copy(vid)
                vid_temp.update({'CommentId': v['id']})
                vid_temp.update({'CommentTitle': top_comment['textOriginal']})
                vid_temp.update({'CommentCreationTime': top_comment['publishedAt']})
                vid_temp.update({'CommentLikes': top_comment['likeCount']})
                vids_final.append(vid_temp)

            # Once the number of saved comments reaches the self-defined limit, return everything collected so far
            if len(vids_final) >= limit:
                return vids_final

            # Move on to the next video when there are no more pages
            nextPageToken = res.get('nextPageToken')
            if not nextPageToken:
                break

    return vids_final

# from Lab9
def textcleaner(row):
    row = str(row)
    row = row.lower()
    # remove urls (done before punctuation is stripped, while the url is still intact)
    row = re.sub(r'http\S+', '', row)
    # remove mentions (done before punctuation is stripped, while '@' is still present)
    row = re.sub(r"(?<![@\w])@(\w{1,25})", '', row)
    # remove hashtags (done before punctuation is stripped, while '#' is still present)
    row = re.sub(r"(?<![#\w])#(\w{1,25})", '', row)
    # remove punctuation
    row = re.sub(r'[^\w\s]', '', row)
    # remove other special characters
    row = re.sub(r'[^A-Za-z .-]+', '', row)
    # remove digits
    row = re.sub(r'\d+', '', row)
    row = row.strip(" ")
    row = re.sub(r'\s+', ' ', row)
    return row
    
stopeng = set(stopwords.words('english'))
def remove_stop(text):
    try:
        words = text.split(' ')
        valid = [x for x in words if x not in stopeng]
        return(' '.join(valid))
    except AttributeError:
        return('')

def df_clean_process(df):

    # Change datetime to date
    df['VideoPublishedDate'] = df['VideoPublishedDate'].apply(lambda x: datetime.strptime(x[0:10], '%Y-%m-%d').date())
    df['CommentCreationTime'] = df['CommentCreationTime'].apply(lambda x: datetime.strptime(x[0:10], '%Y-%m-%d').date())

    # Check NaN, if < 10% of total dataset, drop NaN
    if df.isnull().values.any():
        if len(df[df.isna().any(axis=1)]) < len(df) * 0.1:
            df = df.dropna()

    # Split into separate df for computational load reduction (copies avoid SettingWithCopyWarning on the column assignments below)
    title_df = df[['ChannelName', 'VideoTitle', 'VideoPublishedDate', 'VideoViews', 'Ideology']].drop_duplicates().copy()
    comment_df = df[['ChannelName', 'VideoViews', 'CommentTitle', 'CommentCreationTime', 'CommentLikes', 'Ideology']].copy()

    # tokenize (one tokenizer instance is reused for every row)
    tokenizer = casual.TweetTokenizer()
    title_df['TweetToken'] = title_df['VideoTitle'].apply(tokenizer.tokenize)
    comment_df['TweetToken'] = comment_df['CommentTitle'].apply(tokenizer.tokenize)

    # clean
    title_df['Cleaned'] = title_df['TweetToken'].apply(lambda x: remove_stop(textcleaner(x)))
    comment_df['Cleaned'] = comment_df['TweetToken'].apply(lambda x: remove_stop(textcleaner(x)))

    return (title_df, comment_df)

    # Sentiment analysis
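
Because `get_vids` compares titles word by word, multi-word keywords such as 'gun control' or 'interest rate' cannot match. One possible alternative, sketched below under that observation, is to match each keyword as a whole word or phrase with a word-boundary regex. `find_keyword` is a hypothetical helper, not the notebook's `contains_keyword`, whose definition is not shown in this section.

```python
import re


def _normalize(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def find_keyword(title, keywords):
    """Return the first keyword found in `title` as a whole word or phrase, else None."""
    normalized_title = _normalize(title)
    for keyword in keywords:
        phrase = _normalize(keyword)
        # \b keeps 'rich' from matching inside 'richard'
        if phrase and re.search(r"\b" + re.escape(phrase) + r"\b", normalized_title):
            return keyword
    return None


print(find_keyword("New Gun Control Bill Passes!", ["shooting", "gun control"]))
print(find_keyword("Richard's speech", ["rich"]))
```

Normalizing both the title and the keyword the same way also lets hyphenated entries such as 'pro-life' match titles written with or without the hyphen.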

In [None]:
# define channels
channels_left = ['VICE', 'Vox', 'MSNBC', 'The Daily Show', 'TheYoungTurks']
channels_right = ['Fox News', 'Ben Shapiro', 'StevenCrowder', 'Daily Mail', 'DailyWire+']

# define key ideologies/associated keywords to look for in title
isis_keywords = ['terrorism', 'terrorist', 'extremism', 'radicalist', 'radicalism']
guns_keywords = ['shooting', 'shootings', 'school shooting', 'school shootings', 'firearms', 'firearm', 'gun', 'gun control', 'guns', 'nra', 'second amendment']
immigration_keywords = ['border control', 'mexico', 'visa', 'citizenship', 'asylum', 'deportation', 'refugee']
economy_keywords = ['budget', 'budget deficit', 'unemployed', 'inflation', 'interest rate', 'federal reserve', 'market', 'employment']
health_care_keywords = ['medicaid', 'covid', 'obamacare', 'public health', 'insurance']
socioeconomic_keywords = ['rich', 'poor', 'income inequality', 'poverty', 'wealth distribution']
abortion_keywords = ['pregnancy', 'unwanted pregnancy', 'roe', 'wade', 'abortion', 'pro-life', 'rape', 'incest', 'life of mother', 'religion']
climate_change_keywords = ['global warming', 'carbon', 'alternative energy', 'climate', 'methane', 'emissions','gas','greenhouse']

# Define for iteration
keywords = [isis_keywords, guns_keywords, immigration_keywords, economy_keywords, health_care_keywords, socioeconomic_keywords, abortion_keywords, climate_change_keywords]

# Pre-define empty df
left_df = pd.DataFrame(columns=['ChannelName', 'VideoId', 'VideoTitle', 'Ideology', 'VideoPublishedDate', 'VideoViews', 'CommentId', 'CommentTitle', 'CommentCreationTime', 'CommentLikes'])

# Loop through all left channels
for channel in channels_left:

    # Loop through all keywords/ideologies
    for keyword, ideology in zip(keywords, ['ISIS', 'GUNS', 'IMMIGRATION', 'ECONOMY', 'HEALTH CARE', 'SOCIOECONOMIC', 'ABORTION', 'CLIMATE CHANGE']):

        # Return temp df for one ideology for one channel
        temp_df = pd.DataFrame(get_vid_comments(get_vids(channel, 50, keyword, ideology)[0:50], 150))

        # Append temp df to master df
        left_df = pd.concat([left_df,temp_df])

# Pre-define empty df
right_df = pd.DataFrame(columns=['ChannelName', 'VideoId', 'VideoTitle', 'Ideology', 'VideoPublishedDate', 'VideoViews', 'CommentId', 'CommentTitle', 'CommentCreationTime', 'CommentLikes'])
for channel in channels_right:

    # Loop through all keywords/ideologies
    for keyword, ideology in zip(keywords, ['ISIS', 'GUNS', 'IMMIGRATION', 'ECONOMY', 'HEALTH CARE', 'SOCIOECONOMIC', 'ABORTION', 'CLIMATE CHANGE']):

        # Return temp df for one ideology for one channel
        temp_df = pd.DataFrame(get_vid_comments(get_vids(channel, 50, keyword, ideology)[0:50], 150))

        # Append temp df to master df
        right_df = pd.concat([right_df,temp_df])

(left_title_df, left_comment_df) = df_clean_process(left_df)
(right_title_df, right_comment_df) = df_clean_process(right_df)