# Emotional Consistency among Political Ideologies: An Approach to Address Polarization on YouTube

Group 5:
- Chance Landis (ChancL), Hanna Lee (Lee10), Jason Sun (YongXs), Andy Wong (WongA22)

## Data Collection

### Sources of Information
- **AllSides**: A media bias tool that provides a rating based on "multi-partisan Editorial Reviews by trained experts and Blind Bias Surveys™ in which participants rate content without knowing the source." We used this tool to determine how we should classify the most popular (based on subscriber count) YouTube channels we found. (Source: https://www.allsides.com/media-bias/media-bias-rating-methods)
- **HypeAuditor**: A company that takes a data-driven approach to influencer marketing. As part of that work, they compile lists of YouTube channels by category, subscriber count, and country. This allowed us to find the most-subscribed YouTube channels that focus on news and politics. (Source: https://hypeauditor.com/about/company/, https://hypeauditor.com/top-youtube-news-politics-united-states/)
- **Pew Research Center**: A nonpartisan, nonprofit organization that conducts research on public opinion, demographic trends, and social issues. It provides data-driven insights into various aspects of social science issues, explicitly stating they do not take a stance on political issues. For our research, we relied on their studies on political ideologies and alignment with political parties as a reference. (Source: https://www.pewresearch.org/about/, https://www.pewresearch.org/politics/2016/06/22/5-views-of-parties-positions-on-issues-ideologies/)
- **YouTube**: As a group, we chose to expand our collection of YouTube videos by selecting additional keywords associated with the ideology we are studying. Our focus is on gathering the comments on these videos to conduct our research.
    - We used a combination of Andy's and Hanna's code to retrieve the comments from the YouTube channels.

### Top 5 Democratic YouTube Channels
Vice, Vox, MSNBC, The Daily Show, The Young Turks

In [1]:
pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


In [2]:
!pip install --upgrade google-api-python-client --quiet

In [3]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /Users/hlee/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [4]:
# imports
import json

import googleapiclient
import googleapiclient.discovery
import googleapiclient.errors

from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

In [5]:
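# API call
# Read the key from the environment rather than hard-coding it in the notebook.
# The variable name YOUTUBE_API_KEY is an assumption; use whatever name you export.
import os

API_KEY = os.environ["YOUTUBE_API_KEY"]

youtube = googleapiclient.discovery.build("youtube", "v3", developerKey=API_KEY)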

In [6]:
# Define channels
channels = ["Vice", "Vox", "msnbc", "thedailyshow", "TheYoungTurks"]

In [None]:
# Define keywords
isis_keywords = ["ISIS", "Terrorism", "Extremism", "Radicalist"]

guns_keywords = ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"]

immigration_keywords = ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"]

economy_keywords = ["Economy", "Budget deficit", "Unemployed", "Inflation", "Interest rate", "Federal reserve", "Market", "Employment"]

healthcare_keywords = ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance",]

socioeco_keywords = ["Socio-economic", "Rich", "Poor", "Income inequality", "Poverty", "Wealth distribution",]

abortion_keywords = ["Abortion", "Pregnancy", "Unwanted Pregnancy", "Roe", "Wade", "Pro-life", "Rape", "Incest", "Life of mother", "Religion"]

climate_keywords = ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse"]

In [8]:
keyword_lists = {
    "isis": ["ISIS", "Terrorism", "Extremism", "Radicalist"],
    "guns": ["Gun", "Shooting", "School shooting", "Firearm", "Gun control", "NRA", "Second Amendment"],
    "immigration": ["Immigration", "Border control", "Mexico", "Visa", "Citizenship", "Asylum", "Deportation", "Refugee"],
    "economy": ["Economy", "Budget deficit", "Unemployed", "Inflation", "Interest rate", "Federal reserve", "Market", "Employment"],
    "healthcare": ["Health care", "Medicaid", "Covid", "Obamacare", "Public health", "Insurance"],
    "socioeco": ["Socio-economic", "Rich", "Poor", "Income inequality", "Poverty", "Wealth distribution"],
    "abortion": ["Abortion", "Pregnancy", "Unwanted Pregnancy", "Roe", "Wade", "Pro-life", "Rape", "Incest", "Life of mother", "Religion"],
    "climate": ["Climate change", "Global Warming", "Carbon", "Alternative Energy", "Climate", "Methane", "Emissions", "Gas", "Greenhouse"]
}
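
The collection loops iterate over `keyword_lists.items()` but store only the matched keyword, not the topic it came from. A minimal sketch of a reverse lookup, assuming one wants to tag each matched video with its topic. The helper name `build_topic_lookup` is hypothetical, and the dictionary is a shortened copy used only for illustration:

```python
# Shortened copy of keyword_lists, for illustration only
sample_keyword_lists = {
    "isis": ["ISIS", "Terrorism"],
    "guns": ["Gun", "Gun control"],
    "abortion": ["Abortion", "Roe"],
}

def build_topic_lookup(keyword_lists):
    """Map each lowercased keyword to the topic it belongs to (hypothetical helper)."""
    lookup = {}
    for topic, keywords in keyword_lists.items():
        for keyword in keywords:
            # setdefault keeps the first topic if a keyword appears in two lists
            lookup.setdefault(keyword.lower(), topic)
    return lookup

topic_of_keyword = build_topic_lookup(sample_keyword_lists)
print(topic_of_keyword["roe"])
```

With such a lookup, the `keyword` field saved for each video could be translated back to its topic when the data is analysed.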

In [9]:
# Function for getting channel id based on name
def get_channel_id(channel):  
    channel_id = youtube.search().list(
        part="snippet",
        type="channel",
        q=channel
    )

    res_channel = channel_id.execute()
    chan_id = res_channel["items"][0]["id"]["channelId"]

    return chan_id

In [10]:
# Function for retrieving the upload playlist id using channel id
def get_upload_id(channel):
    request = youtube.channels().list(
        part="contentDetails",
        id=channel
    )

    res = request.execute()
    uploads_playlist_id = res["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]

    return uploads_playlist_id

In [11]:
up_id = []

for channel in channels:
    chan_id = get_channel_id(channel)
    upload_id = get_upload_id(chan_id)
    up_id.append(upload_id)

In [12]:
up_id

['UUn8zNIfYAQNdrFRrr8oibKw',
 'UULXo7UDZvByw2ixzpQCufnA',
 'UUaXkIU1QidjPwiAYu6GcHjg',
 'UUwWhs_6x42TyRM4Wstoq8HA',
 'UU1yBKRuGpC1tSM73A0ZjYjQ']
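
YouTube channel IDs typically start with `UC`, and the uploads playlist ID is commonly the same string with `UU` in place of `UC`. This is a widely observed convention rather than a documented guarantee, so the `channels().list` lookup used above remains the safer route. A small sketch of the shortcut, with a hypothetical helper name and a placeholder ID:

```python
def uploads_playlist_from_channel_id(channel_id):
    """Guess the uploads playlist ID from a channel ID (hypothetical helper).

    Relies on the observed "UC" -> "UU" prefix convention and returns None
    when the ID does not look like a standard channel ID.
    """
    if not channel_id.startswith("UC"):
        return None
    return "UU" + channel_id[2:]

# Placeholder ID, not a real channel
print(uploads_playlist_from_channel_id("UC0123456789abcdefghijkl"))
```

Used this way, the shortcut would save one API request per channel, at the cost of depending on an undocumented pattern.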

In [13]:
# Initialize PorterStemmer
ps = PorterStemmer()

def stem_tokens(text):
    """Lowercase, tokenize, and stem a string."""
    return [ps.stem(word) for word in word_tokenize(text.lower())]

# Function to check if a video title contains any of the keywords
def contains_keyword(title, keywords):
    stemmed_words = stem_tokens(title)

    for keyword in keywords:
        # Stem the keyword token by token so that multi-word keywords
        # (e.g. "School shooting") can match as a contiguous phrase
        keyword_stemmed = stem_tokens(keyword)
        n = len(keyword_stemmed)
        if any(stemmed_words[i:i + n] == keyword_stemmed
               for i in range(len(stemmed_words) - n + 1)):
            return keyword
    return None

In [16]:
# Function to fetch videos from a playlist and keep those whose title contains a keyword
def keyword_videos(playlist_id, keywords, channel_name):
    videos_info = []
    next_page_token = None

    while True:
        # Make the next API request using the nextPageToken
        request = youtube.playlistItems().list(
            part="snippet",
            playlistId=playlist_id,
            pageToken=next_page_token
        ) 
        res = request.execute()

        # Process the response and save video info
        for v in res["items"]:
            video_title = v["snippet"]["title"]
            detected_word = contains_keyword(video_title, keywords)
            if detected_word:
                # Separate Resource Call to retrieve video views
                views = youtube.videos().list(id=v['snippet']['resourceId']['videoId'], part="snippet,contentDetails,statistics")
                view_temp = views.execute()
                video_views = view_temp['items'][0]['statistics']['viewCount']

                # Append video information with views to videos_info list
                videos_info.append({
                    "id": v["snippet"]["resourceId"]["videoId"],
                    "title": video_title,
                    "keyword": detected_word,
                    "published_at": v["snippet"]["publishedAt"],
                    "VideoViews": video_views
                })
        # Update the nextPageToken for the next iteration
        next_page_token = res.get('nextPageToken')

        if not next_page_token or (len(videos_info) > 60):
            break
    return videos_info
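
`keyword_videos` issues one `videos().list` call per matching title. The `id` parameter of `videos.list` accepts a comma-separated list of video IDs, so view counts could instead be requested in batches. The sketch below covers only the batching step. The helper `chunked` is hypothetical, the IDs are placeholders, and the batch size of 50 is an assumption based on the commonly cited per-request limit:

```python
def chunked(items, size=50):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder IDs for illustration
video_ids = [f"video{i}" for i in range(120)]
batches = [",".join(batch) for batch in chunked(video_ids)]
print(len(batches))  # number of requests needed

# Each batch string could then be passed as the `id` argument, e.g.
# youtube.videos().list(id=batches[0], part="statistics").execute()
```

Batching this way would trade one request per video for one request per group of videos, which matters when the daily API quota is tight.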

In [17]:
for channel, upload_id in zip(channels, up_id):
    print(channel)
    for keyword_name, keywords in keyword_lists.items():
        videos_info = keyword_videos(upload_id, keywords, channel)
        print(videos_info)

Vice
[{'id': 'hoFN8_6I0s0', 'title': 'This Illegal Climb Got Me Arrested #shorts #freeclimber #extreme #theshard', 'keyword': 'Extremism', 'published_at': '2023-08-10T15:00:36Z', 'VideoViews': '98244'}, {'id': 'wmR2h8jAklg', 'title': 'A New Brand of Hindu Extremism is Going Global | Decade of Hate', 'keyword': 'Extremism', 'published_at': '2023-06-24T15:00:43Z', 'VideoViews': '892599'}, {'id': 'vDcZQKXg3Xk', 'title': 'The (Extreme) San Francisco Hill Bombing Skating Challenge | KING OF THE ROAD (S1 E6)', 'keyword': 'Extremism', 'published_at': '2023-06-17T14:00:41Z', 'VideoViews': '782694'}, {'id': 'SwoRx3tstxY', 'title': 'We Uncovered an ISIS Mass Grave | Super Users', 'keyword': 'ISIS', 'published_at': '2022-04-11T15:00:12Z', 'VideoViews': '349296'}, {'id': 'LttCr8rudxQ', 'title': 'How ISIS Makes Millions From Stolen Antiques | The Business of Crime', 'keyword': 'ISIS', 'published_at': '2022-02-24T16:00:17Z', 'VideoViews': '403961'}, {'id': 'iZloCDhexeM', 'title': 'The Deadly Tigers 

KeyboardInterrupt: 

In [18]:
def get_video_comments(channels, up_id, keyword_lists, limit=30):
    # Function to fetch videos from a playlist and keep those whose title contains a keyword
    def keyword_videos(playlist_id, keywords, channel_name):
        videos_info = []
        next_page_token = None

        while True:
            # Make the next API request using the nextPageToken
            request = youtube.playlistItems().list(
                part="snippet",
                playlistId=playlist_id,
                pageToken=next_page_token
            ) 
            res = request.execute()

            # Process the response and save video info
            for v in res["items"]:
                video_title = v["snippet"]["title"]
                detected_word = contains_keyword(video_title, keywords)
                if detected_word:
                    videos_info.append(
                    {
                        "channel": channel_name,
                        "video_id": v["snippet"]["resourceId"]["videoId"],
                        "title": video_title,
                        "keyword": detected_word,
                        "published_at": v["snippet"]["publishedAt"]
                    }
                    )

            # Update the nextPageToken for the next iteration
            next_page_token = res.get('nextPageToken')

            if not next_page_token or (len(videos_info) > 60):
                break
        return videos_info

    # Function for getting the most relevant comments for a list of videos,
    # stopping once `limit` comments have been collected across the list
    def get_vid_comments(vid_lst, limit):
        vids_final = []

        # Iterate through each video in the video list
        for vid in vid_lst:
            next_page_token = None

            while True:
                try:
                    request = youtube.commentThreads().list(
                        videoId=vid['video_id'],
                        part='id,snippet,replies',
                        textFormat='plainText',
                        order='relevance',
                        maxResults=50,
                        pageToken=next_page_token
                    )
                    res = request.execute()
                except googleapiclient.errors.HttpError:
                    # The request failed (e.g. comments are disabled for this video); skip it
                    break

                # Iterate through each comment
                for v in res["items"]:
                    top = v['snippet']['topLevelComment']['snippet']
                    # Copy the current video's dictionary so that each comment row also carries the video data
                    vid_temp = vid.copy()
                    vid_temp.update({
                        'CommentId': v['id'],
                        'CommentTitle': top['textOriginal'],
                        'CommentCreationTime': top['publishedAt'],
                        'CommentLikes': top['likeCount']
                    })
                    vids_final.append(vid_temp)

                    # Once the number of saved comments reaches the limit, return them
                    if len(vids_final) >= limit:
                        return vids_final

                # Update the page token; stop when there are no more pages
                next_page_token = res.get('nextPageToken')
                if not next_page_token:
                    break

        return vids_final
    
    all_comments = []
    for channel, upload_id in zip(channels, up_id):
        for keyword_name, keywords in keyword_lists.items():
            videos_info = keyword_videos(upload_id, keywords, channel)
            video_comments = get_vid_comments(videos_info, limit)
            all_comments.extend(video_comments)
    
    return all_comments

In [None]:
get_video_comments(channels, up_id, keyword_lists, limit=30)
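
`get_video_comments` returns a list of flat dictionaries, one per comment, with the video fields copied into each row. A sketch of writing such rows to CSV with the standard library, assuming the rows should be persisted for later analysis. The helper `comments_to_csv` and the sample row are hypothetical and do not come from the collected data:

```python
import csv
import io

# Column order matches the keys set in get_video_comments
FIELDS = ["channel", "video_id", "title", "keyword", "published_at",
          "CommentId", "CommentTitle", "CommentCreationTime", "CommentLikes"]

def comments_to_csv(rows, handle):
    """Write comment rows to a file-like object as CSV (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Made-up row with the same keys as the collected data
sample_rows = [{
    "channel": "ExampleChannel", "video_id": "abc123", "title": "Example title",
    "keyword": "Economy", "published_at": "2023-01-01T00:00:00Z",
    "CommentId": "c1", "CommentTitle": "An example comment, with a comma",
    "CommentCreationTime": "2023-01-02T00:00:00Z", "CommentLikes": 3,
}]

buffer = io.StringIO()
comments_to_csv(sample_rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the same function can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`.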