# Reddit Scraping

This notebook scrapes Reddit comments related to the Ukraine-Russia war for use in our project.

## Setup

Begin by curating the results of a Reddit search for "russia ukraine war" (https://www.reddit.com/search/?q=russia%20ukraine%20war) so that the selected posts cover a diverse range of sentiment toward the conflict (positive, neutral, and negative) and each has a sufficient number of comments. This curation identified the 22 posts linked below for scraping.

- https://www.reddit.com/r/MapPorn/comments/11rxpnv/where_every_country_stands_on_the_russiaukraine/
- https://www.reddit.com/r/TooAfraidToAsk/comments/10ltzhs/currently_confused_on_the_whole_russiaukraine_war/
- https://www.reddit.com/r/ThatsInsane/comments/116zlt6/mine_from_russiaukraine_war_blows_up_on_batumi/
- https://www.reddit.com/r/JordanPeterson/comments/115wetc/the_west_is_escalating_the_russiaukraine_war_make/
- https://www.reddit.com/r/Anarcho_Capitalism/comments/11zk8a4/the_libertarian_position_on_the_russiaukraine_war/
- https://www.reddit.com/r/conspiracy/comments/11dqgru/its_hard_to_know_who_to_root_for_in_the_russia/
- https://www.reddit.com/r/inthenews/comments/11r9jf1/desantis_calls_russiaukraine_war_a_territorial/
- https://www.reddit.com/r/worldnews/comments/1012wys/india_again_expresses_grave_concern_over/
- https://www.reddit.com/r/worldnews/comments/115pq8k/netherlands_orders_russian_diplomats_to_leave/
- https://www.reddit.com/r/france/comments/11atjnp/what_does_average_french_think_of_russia_ukraine/
- https://www.reddit.com/r/anime_titties/comments/1042gn1/russiaukraine_war_live_putin_calls_for_36hour/
- https://www.reddit.com/r/Africa/comments/11knizh/is_africa_still_neutral_a_year_into_the_ukraine/
- https://www.reddit.com/r/worldnews/comments/zq7dln/russiaukraine_war_vladimir_putin_targets_traitors/
- https://www.reddit.com/r/AskARussian/comments/u6e7sv/war_in_ukraine_the_megathread_part_3/
- https://www.reddit.com/r/news/comments/t003vl/russia_declares_war_on_ukraine_reports_of/
- https://www.reddit.com/r/UkraineAnxiety/comments/u3ny3p/ukrainerelated_anxiety_megathread_reassurance/
- https://www.reddit.com/r/UkrainianConflict/comments/11zdigz/desantis_brands_putin_a_war_criminal_who_should/
- https://www.reddit.com/r/interestingasfuck/comments/t4cdik/in_1996_ukraine_handed_over_nuclear_weapons_to/
- https://www.reddit.com/r/nextfuckinglevel/comments/t5lp3g/antiwar_protest_in_st_petersburg_russia_march_2/
- https://www.reddit.com/r/worldnews/comments/117526s/zelensky_if_china_allies_itself_with_russia_there/
- https://www.reddit.com/r/worldnews/comments/11ad8p2/india_abstains_as_un_calls_for_russia_to_leave/
- https://www.reddit.com/r/worldnews/comments/1183r5w/putin_falsely_claims_it_was_west_that_started_the/

In [1]:
urls = ['https://www.reddit.com/r/MapPorn/comments/11rxpnv/where_every_country_stands_on_the_russiaukraine/',
'https://www.reddit.com/r/TooAfraidToAsk/comments/10ltzhs/currently_confused_on_the_whole_russiaukraine_war/',
'https://www.reddit.com/r/ThatsInsane/comments/116zlt6/mine_from_russiaukraine_war_blows_up_on_batumi/',
'https://www.reddit.com/r/JordanPeterson/comments/115wetc/the_west_is_escalating_the_russiaukraine_war_make/',
'https://www.reddit.com/r/Anarcho_Capitalism/comments/11zk8a4/the_libertarian_position_on_the_russiaukraine_war/',
'https://www.reddit.com/r/conspiracy/comments/11dqgru/its_hard_to_know_who_to_root_for_in_the_russia/',
'https://www.reddit.com/r/inthenews/comments/11r9jf1/desantis_calls_russiaukraine_war_a_territorial/',
'https://www.reddit.com/r/worldnews/comments/1012wys/india_again_expresses_grave_concern_over/',
'https://www.reddit.com/r/worldnews/comments/115pq8k/netherlands_orders_russian_diplomats_to_leave/',
'https://www.reddit.com/r/france/comments/11atjnp/what_does_average_french_think_of_russia_ukraine/',
'https://www.reddit.com/r/anime_titties/comments/1042gn1/russiaukraine_war_live_putin_calls_for_36hour/',
'https://www.reddit.com/r/Africa/comments/11knizh/is_africa_still_neutral_a_year_into_the_ukraine/',
'https://www.reddit.com/r/worldnews/comments/zq7dln/russiaukraine_war_vladimir_putin_targets_traitors/',
'https://www.reddit.com/r/AskARussian/comments/u6e7sv/war_in_ukraine_the_megathread_part_3/',
'https://www.reddit.com/r/news/comments/t003vl/russia_declares_war_on_ukraine_reports_of/',
'https://www.reddit.com/r/UkraineAnxiety/comments/u3ny3p/ukrainerelated_anxiety_megathread_reassurance/',
'https://www.reddit.com/r/UkrainianConflict/comments/11zdigz/desantis_brands_putin_a_war_criminal_who_should/',
'https://www.reddit.com/r/interestingasfuck/comments/t4cdik/in_1996_ukraine_handed_over_nuclear_weapons_to/',
'https://www.reddit.com/r/nextfuckinglevel/comments/t5lp3g/antiwar_protest_in_st_petersburg_russia_march_2/',
'https://www.reddit.com/r/worldnews/comments/117526s/zelensky_if_china_allies_itself_with_russia_there/',
'https://www.reddit.com/r/worldnews/comments/11ad8p2/india_abstains_as_un_calls_for_russia_to_leave/',
'https://www.reddit.com/r/worldnews/comments/1183r5w/putin_falsely_claims_it_was_west_that_started_the/']
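Before scraping, it can help to sanity-check the curated list. The sketch below is not part of the original notebook. It assumes every entry follows the `/r/<subreddit>/comments/<id>/<slug>/` layout seen above, and the helper name `summarize_urls` is hypothetical. It counts posts per subreddit and reports any submission ID that appears more than once.

```python
import re
from collections import Counter

# Assumed layout of a submission URL: /r/<subreddit>/comments/<id>/<slug>/
SUBMISSION_URL = re.compile(
    r"^https://www\.reddit\.com/r/(?P<subreddit>[^/]+)/comments/(?P<post_id>[a-z0-9]+)/"
)


def summarize_urls(urls):
    """Return (posts per subreddit, sorted list of duplicated submission IDs)."""
    posts = []
    for url in urls:
        match = SUBMISSION_URL.match(url)
        if match is None:
            raise ValueError(f"Not a submission URL: {url}")
        posts.append((match["subreddit"], match["post_id"]))

    id_counts = Counter(post_id for _, post_id in posts)
    duplicates = sorted(post_id for post_id, n in id_counts.items() if n > 1)
    per_subreddit = Counter(subreddit for subreddit, _ in posts)
    return per_subreddit, duplicates


# Three entries taken from the curated list, used here only as sample input.
sample_urls = [
    "https://www.reddit.com/r/MapPorn/comments/11rxpnv/where_every_country_stands_on_the_russiaukraine/",
    "https://www.reddit.com/r/worldnews/comments/1012wys/india_again_expresses_grave_concern_over/",
    "https://www.reddit.com/r/worldnews/comments/115pq8k/netherlands_orders_russian_diplomats_to_leave/",
]

per_subreddit, duplicates = summarize_urls(sample_urls)
print(per_subreddit.most_common())
print("Duplicated IDs:", duplicates)
```

Running the same helper on the full `urls` list shows how heavily the sample leans on any one subreddit, which matters if the goal is a diverse set of viewpoints.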

## Scraping

Use [PRAW](https://praw.readthedocs.io/en/stable/) to scrape the comments from each Reddit post in `urls`.
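PRAW needs a client ID, client secret, and user agent for the Reddit API. One common way to keep these values out of a shared notebook is to read them from environment variables. The sketch below is an optional alternative, not the notebook's own setup. The variable names (`REDDIT_CLIENT_ID` and so on) and the helper `load_reddit_credentials` are hypothetical.

```python
import os


def load_reddit_credentials(env=None):
    """Build the keyword arguments for praw.Reddit from environment variables.

    The environment variable names are illustrative assumptions;
    any naming scheme works as long as it is used consistently.
    """
    env = os.environ if env is None else env
    required = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a clear message if anything is unset or empty.
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in required.items()}
```

With the variables set, the client could then be created as `praw.Reddit(**load_reddit_credentials())`, since `client_id`, `client_secret`, and `user_agent` are the same keyword arguments the notebook passes directly.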

In [2]:
!pip install praw



In [3]:
import praw
import copy
# Refer to https://towardsdatascience.com/scraping-reddit-data-1c0af3040768 for setup steps
reddit = praw.Reddit(client_id='XJRwklwrK6R4VOOn5ZxJHg', client_secret='uIqVxOFNJyGIDIdloNN-zWxbSV6kTQ', user_agent='G14Bot/0.0.1')

results = []

for url in urls:
    data = {}
    submission = reddit.submission(url=url)
    
    # Iterate over comments of post
    submission.comments.replace_more(limit=0)
    for comment in submission.comments.list():
        # Begin by adding the post-specific fields to keep with all comments of the post
        data['Post - Author'] = submission.author
        data['Post - Date'] = submission.created_utc
        data['Post - Is Distinguished'] = submission.distinguished
        data['Post - Is Edited'] = submission.edited
        data['Post - Is Original Content'] = submission.is_original_content
        data['Post - Is Locked'] = submission.locked
        data['Post - Name'] = submission.name
        data['Post - num_comments'] = submission.num_comments
        data['Post - over_18'] = submission.over_18
        data['Post - Permalink'] = "http://www.reddit.com" + submission.permalink
        data['Post - Score'] = submission.score
        data['Post - Is Spoiler'] = submission.spoiler
        data['Post - Is Stickied'] = submission.stickied
        data['Post - Subreddit'] = submission.subreddit
        data['Post - Title'] = submission.title
        data['Post - Upvote Ratio'] = submission.upvote_ratio
        
        # Add the comment-specific fields
        data["ID"] = comment.id  # Store the ID string rather than the Comment object
        data["Author"] = comment.author
        data["Date"] = comment.created_utc
        data["Parent ID Prefix"] = str(comment.parent_id).split("_")[0] # Tier of comment
        data["Parent ID"] = str(comment.parent_id).split("_")[1]
        data["Is Distinguished"] = comment.distinguished
        data["Is Edited"] = comment.edited
        data["Is Stickied"] = comment.stickied
        data["Permalink"] = "http://www.reddit.com" + comment.permalink
        data["Score"] = comment.score
        data["Body"] = comment.body
        
        # Append data to results 
        results.append(copy.deepcopy(data))
    
    print("Added", url)  

Added https://www.reddit.com/r/MapPorn/comments/11rxpnv/where_every_country_stands_on_the_russiaukraine/
Added https://www.reddit.com/r/TooAfraidToAsk/comments/10ltzhs/currently_confused_on_the_whole_russiaukraine_war/
Added https://www.reddit.com/r/ThatsInsane/comments/116zlt6/mine_from_russiaukraine_war_blows_up_on_batumi/
Added https://www.reddit.com/r/JordanPeterson/comments/115wetc/the_west_is_escalating_the_russiaukraine_war_make/
Added https://www.reddit.com/r/Anarcho_Capitalism/comments/11zk8a4/the_libertarian_position_on_the_russiaukraine_war/
Added https://www.reddit.com/r/conspiracy/comments/11dqgru/its_hard_to_know_who_to_root_for_in_the_russia/
Added https://www.reddit.com/r/inthenews/comments/11r9jf1/desantis_calls_russiaukraine_war_a_territorial/
Added https://www.reddit.com/r/worldnews/comments/1012wys/india_again_expresses_grave_concern_over/
Added https://www.reddit.com/r/worldnews/comments/115pq8k/netherlands_orders_russian_diplomats_to_leave/
Added https://www.reddi
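The `Parent ID Prefix` column relies on Reddit's "fullname" convention, in which a type prefix precedes the base-36 ID: `t1_` marks a comment and `t3_` marks a submission. A comment whose parent is the post itself (`t3`) is therefore top-level, while a `t1` parent means it is a reply to another comment. The sketch below illustrates the split; the helper name `describe_parent` is hypothetical and the second ID is made up for the example.

```python
# Reddit fullname type prefixes relevant to comment trees.
KIND_BY_PREFIX = {
    "t1": "comment",
    "t3": "submission",
}


def describe_parent(parent_id):
    """Split a parent fullname such as 't3_11rxpnv' into its parts."""
    prefix, _, base36_id = parent_id.partition("_")
    return {
        "prefix": prefix,
        "id": base36_id,
        "kind": KIND_BY_PREFIX.get(prefix, "unknown"),
        "is_top_level": prefix == "t3",  # parent is the post itself
    }


print(describe_parent("t3_11rxpnv"))  # parent is the MapPorn post
print(describe_parent("t1_abc123"))   # hypothetical reply to another comment
```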

In [4]:
import pandas as pd

df = pd.DataFrame.from_records(results)

# Convert dates from UNIX time
df['Post - Date'] = pd.to_datetime(df['Post - Date'], unit='s')
df['Date'] = pd.to_datetime(df['Date'], unit='s')
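Reddit's `created_utc` values are Unix timestamps: seconds since 1970-01-01 00:00:00 UTC. The conversion above yields timezone-naive pandas timestamps that should be read as UTC. For reference, the same conversion on a single value can be sketched with the standard library alone. The helper name `utc_from_epoch` is hypothetical, and the example value is built from the post date shown in the first output row.

```python
from datetime import datetime, timezone


def utc_from_epoch(seconds):
    """Convert Unix seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Round trip: start from a known UTC datetime, convert to epoch seconds, and back.
example = datetime(2023, 3, 15, 14, 16, 41, tzinfo=timezone.utc)
epoch_seconds = example.timestamp()

print(epoch_seconds)
print(utc_from_epoch(epoch_seconds).isoformat())
```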

In [5]:
df.head()

| | Post - Author | Post - Date | Post - Is Distinguished | Post - Is Edited | Post - Is Original Content | Post - Is Locked | Post - Name | Post - num_comments | Post - over_18 | Post - Permalink | ... | Author | Date | Parent ID Prefix | Parent ID | Is Distinguished | Is Edited | Is Stickied | Permalink | Score | Body |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | flyingcatwithhorns | 2023-03-15 14:16:41 | | False | False | False | t3_11rxpnv | 392 | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | ... | iwsfutcmd | 2023-03-15 19:15:22 | t3 | 11rxpnv | | False | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | 75 | Myanmar?\n\nwell that's a surprise |
| 1 | flyingcatwithhorns | 2023-03-15 14:16:41 | | False | False | False | t3_11rxpnv | 392 | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | ... | snowday784 | 2023-03-15 17:25:13 | t3 | 11rxpnv | | False | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | 316 | Bolivia what is you doing bby |
| 2 | flyingcatwithhorns | 2023-03-15 14:16:41 | | False | False | False | t3_11rxpnv | 392 | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | ... | micahsaurus | 2023-03-15 15:32:45 | t3 | 11rxpnv | | False | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | 202 | Kind of misleading.\n\nIt should read, “Which ... |
| 3 | flyingcatwithhorns | 2023-03-15 14:16:41 | | False | False | False | t3_11rxpnv | 392 | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | ... | grisioco | 2023-03-15 14:28:30 | t3 | 11rxpnv | | False | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | 371 | you know youre in the right when the only coun... |
| 4 | flyingcatwithhorns | 2023-03-15 14:16:41 | | False | False | False | t3_11rxpnv | 392 | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | ... | Foreign_Phone59 | 2023-03-15 22:18:48 | t3 | 11rxpnv | | False | False | http://www.reddit.com/r/MapPorn/comments/11rxp... | 50 | Not Afganistan and Myanmar being beacons of re... |


In [6]:
print("The shape of the entire dataframe is", df.shape)

The shape of the entire dataframe is (7852, 27)
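The row count will generally be lower than the sum of each post's `num_comments`. Calling `replace_more(limit=0)` removes the collapsed "load more comments" placeholders without fetching them, so the comments behind those placeholders are never scraped. A rough per-post coverage check can be sketched as follows. It works on records shaped like the entries in `results`; the helper name `coverage_by_post` and the sample records are hypothetical.

```python
from collections import Counter


def coverage_by_post(records):
    """Map each post name to (comments scraped, num_comments reported by Reddit)."""
    scraped = Counter(record["Post - Name"] for record in records)
    reported = {
        record["Post - Name"]: record["Post - num_comments"] for record in records
    }
    return {name: (scraped[name], reported[name]) for name in scraped}


# Made-up records using the same keys as the scraped data.
sample_records = (
    [{"Post - Name": "t3_aaaaaa", "Post - num_comments": 5}] * 3
    + [{"Post - Name": "t3_bbbbbb", "Post - num_comments": 2}] * 2
)

for name, (scraped, reported) in coverage_by_post(sample_records).items():
    print(f"{name}: scraped {scraped} of {reported} reported comments")
```

A large gap for a given post suggests that most of its discussion sat behind collapsed threads, which may be worth keeping in mind when interpreting per-post results.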


In [7]:
# Export results to csv
df.to_csv('russia_ukraine_reddit_comments.csv', index=False)
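Comment bodies often contain commas, quotes, and line breaks, so the exported file depends on CSV quoting to stay parseable. The sketch below uses the standard-library `csv` module on an in-memory buffer rather than pandas, to illustrate that such text survives a write and read round trip when fields are quoted. The first body comes from the output shown earlier; the second row is made up.

```python
import csv
import io

rows = [
    {"Score": "75", "Body": "Myanmar?\n\nwell that's a surprise"},
    {"Score": "202", "Body": 'Kind of misleading, "sort of".'},  # hypothetical text
]

# newline="" leaves line endings untouched, as the csv module recommends for files.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["Score", "Body"])
writer.writeheader()
writer.writerows(rows)

# Read the same text back and compare with the original rows.
buffer.seek(0)
restored = list(csv.DictReader(buffer))

print(restored == rows)
```

Note that every value comes back as a string, so numeric columns such as `Score` need to be converted again after loading.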