## Intro
In this notebook I'm going to pull in Reddit data from two subreddits: Astrophysics and QuantumComputing. I will then save each pull as a .csv file for later use.

In [1]:
#importing standard libraries
import pandas as pd
import numpy as np

#API
import requests

#Automating
import time
import datetime
import warnings
import sys

In [2]:
#set base URL (the subreddit itself is supplied through params below)
url = 'https://api.pushshift.io/reddit/search/submission/'

#set parameters in a params dictionary
params = {
    'subreddit': 'wow',
    'size':50,
    'lang':True,
    'before': 1617223748    #epoch timestamp
}

res = requests.get(url,params)
res

<Response [200]>

Seeing <Response [200]> is a great sight!
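
For reference, the `params` dictionary is sent to the API as the URL's query string. Here is a minimal sketch, using only the standard library rather than the internals of `requests`, of roughly what that encoded string looks like for the values used above:

```python
from urllib.parse import urlencode

# Same values as the params dictionary above.
params = {
    'subreddit': 'wow',
    'size': 50,
    'lang': True,
    'before': 1617223748,   # epoch timestamp
}

# urlencode turns each key/value pair into key=value and joins them with '&'.
query_string = urlencode(params)
print(query_string)
```

Printing the string like this is a quick way to check what a request is actually asking for before blaming the API for an odd response.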

In [3]:
#it can also be viewed as .text or .json()
#try as .json
res.json()

{'data': [{'all_awardings': [],
   'allow_live_comments': False,
   'author': 'Xemro',
   'author_flair_css_class': None,
   'author_flair_richtext': [],
   'author_flair_text': None,
   'author_flair_type': 'text',
   'author_fullname': 't2_44tc57ja',
   'author_patreon_flair': False,
   'author_premium': False,
   'awarders': [],
   'can_mod_post': False,
   'contest_mode': False,
   'created_utc': 1617223717,
   'domain': 'self.wow',
   'full_link': 'https://www.reddit.com/r/wow/comments/mhfd13/mount_farming/',
   'gildings': {},
   'id': 'mhfd13',
   'is_crosspostable': True,
   'is_meta': False,
   'is_original_content': False,
   'is_reddit_media_domain': False,
   'is_robot_indexable': True,
   'is_self': True,
   'is_video': False,
   'link_flair_background_color': '',
   'link_flair_css_class': 'question',
   'link_flair_richtext': [{'e': 'text', 't': 'Question'}],
   'link_flair_template_id': 'a8efbf86-494b-11ea-9ee5-0ea9890373cb',
   'link_flair_text': 'Question',
   'link_f

In [4]:
#convert to a DataFrame
initial_df = pd.DataFrame(res.json()['data'])
initial_df.head(3).T

Unnamed: 0,0,1,2
all_awardings,[],[],[]
allow_live_comments,False,False,False
author,Xemro,topazviper,makintrash
author_flair_css_class,,,
author_flair_richtext,[],[],[]
...,...,...,...
author_flair_text_color,,,
removed_by_category,,,
author_flair_background_color,,,
crosspost_parent,,,


In [5]:
#more EDA
initial_df.columns

Index(['all_awardings', 'allow_live_comments', 'author',
       'author_flair_css_class', 'author_flair_richtext', 'author_flair_text',
       'author_flair_type', 'author_fullname', 'author_patreon_flair',
       'author_premium', 'awarders', 'can_mod_post', 'contest_mode',
       'created_utc', 'domain', 'full_link', 'gildings', 'id',
       'is_crosspostable', 'is_meta', 'is_original_content',
       'is_reddit_media_domain', 'is_robot_indexable', 'is_self', 'is_video',
       'link_flair_background_color', 'link_flair_css_class',
       'link_flair_richtext', 'link_flair_template_id', 'link_flair_text',
       'link_flair_text_color', 'link_flair_type', 'locked', 'media_only',
       'no_follow', 'num_comments', 'num_crossposts', 'over_18',
       'parent_whitelist_status', 'permalink', 'pinned', 'pwls',
       'retrieved_on', 'score', 'selftext', 'send_replies', 'spoiler',
       'stickied', 'subreddit', 'subreddit_id', 'subreddit_subscribers',
       'subreddit_type', 'thumbnail'

In [6]:
initial_df = initial_df.loc[:, ['title',
                                'created_utc',
                                'selftext',
                                'subreddit',
                                'author',
                                'media_only',
                                'permalink',
                                'num_comments']]

initial_df.head()

Unnamed: 0,title,created_utc,selftext,subreddit,author,media_only,permalink,num_comments
0,Mount farming,1617223717,"So i’ve been wondering, does going for a old c...",wow,Xemro,False,/r/wow/comments/mhfd13/mount_farming/,0
1,Trading an item that dropped for you in the ra...,1617223699,I sometimes run raids in pugs to build out my ...,wow,topazviper,False,/r/wow/comments/mhfcte/trading_an_item_that_dr...,0
2,I witnessed the secret message from Blizz whil...,1617223683,&amp;#x200B;\n\nhttps://preview.redd.it/ptre2o...,wow,makintrash,False,/r/wow/comments/mhfclr/i_witnessed_the_secret_...,0
3,LFG in American Seever,1617223184,"Hi, I’m from Costa Rica 🇨🇷 . With good English...",wow,chrisque2,False,/r/wow/comments/mhf6f5/lfg_in_american_seever/,0
4,"Ohhhhh , that's what I was doing....",1617222904,,wow,Boars89,False,/r/wow/comments/mhf31e/ohhhhh_thats_what_i_was...,0


So now that I've proven to myself that I can pull *some* data off of Reddit, I'm going to specifically target my chosen subreddits and pull quite a bit more.
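
The `created_utc` values and the `before` parameter are Unix epoch seconds, which are hard to read at a glance. A small sketch for converting between epoch seconds and readable UTC datetimes (the helper names `epoch_to_utc` and `utc_to_epoch` are my own, not part of any library):

```python
from datetime import datetime, timezone

def epoch_to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def utc_to_epoch(dt):
    """Convert a timezone-aware datetime back to whole epoch seconds."""
    return int(dt.timestamp())

cutoff = epoch_to_utc(1617223748)     # the 'before' value used above
print(cutoff.isoformat())
```

The `before` value used in this notebook falls late on 31 March 2021 (UTC), so every pull works backwards from that moment.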

## Building the API function

This section is heavily influenced by a video recorded for a different DSI cohort, led by **Sara Soueidan**:
https://generalassembly.zoom.us/rec/play/IEeFJ50KMX_1d4d6ACRj9caeqz_W3V9C3RP4XIOzn8ynAE83APpwbxF3ylJnSJXMFSiNmPo1oHw35Kpl.D1XBopdtlQNWilJ9?continueMode=true&_x_zm_rtaid=2ShimnfWRSqreUf7iKyVRg.1616787450820.4c6921ec8a66ba664a818cf81df2e461&_x_zm_rhtaid=107

In [8]:
# this is a function to grab POSTS from Reddit (that is, not comments)
def get_posts(subreddit, n_iter, epoch_right_now):    # subreddit name, number of requests to make, starting epoch
    
    # store base url variable (the subreddit itself is passed through params)
    base_url = 'https://api.pushshift.io/reddit/search/submission/'
    
    df_list = []                                  # instantiate empty list
    
    current_time = epoch_right_now                # save starting epoch, used to iterate in reverse through time
        
    for _ in range(n_iter):                       # one request per iteration
        res = requests.get(                       # send the get request
            base_url,                              # requests.get() takes base_url and params
            params = {                             # parameters for get request
                'subreddit' : subreddit,           # specify subreddit
                'size' : 100,                      # specify number of posts to pull
                'lang' : True,                     # purpose unclear; kept because the request works with it
                'before' : current_time            # pull everything from current time backwards
            }
        )
        res.raise_for_status()                     # stop with a clear error on a 4xx/5xx response
        
        df = pd.DataFrame(res.json()['data'])      # take data from most recent request and store as DataFrame
        df = df.loc[:, ['title',                   # pull specific columns from DataFrame for analysis
                        'created_utc',
                        'selftext',
                        'subreddit',
                        'author',
                        'permalink',
                        'num_comments']]
        
        df_list.append(df)                         # append to the list of DataFrames
        
        time.sleep(.5)                             # add wait time between requests
        
        current_time = int(df['created_utc'].min())   # move the cutoff back to the oldest post just grabbed
        
    return pd.concat(df_list,axis=0)
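
The loop in `get_posts` pages backwards through time by moving the `before` cutoff to the oldest `created_utc` it has seen. The sketch below shows the same idea without touching the network. `paginate_backwards`, `fake_fetch` and `FAKE_POSTS` are hypothetical stand-ins of mine, and the sketch assumes that `before` means strictly older than the cutoff, which I have not verified against the real API.

```python
def paginate_backwards(fetch_page, start_epoch, n_iter):
    """Collect rows by repeatedly asking for items older than a moving cutoff.

    fetch_page(before) is a hypothetical callable returning a list of dicts
    that each carry a 'created_utc' key, newest first.
    """
    rows = []
    cutoff = start_epoch
    for _ in range(n_iter):
        page = fetch_page(cutoff)
        if not page:                 # nothing older than the cutoff is left
            break
        rows.extend(page)
        # Move the cutoff to the oldest item seen so the next page is older.
        cutoff = min(item['created_utc'] for item in page)
    return rows


# Stand-in data source: ten fake posts with timestamps 100, 90, ..., 10.
FAKE_POSTS = [{'id': i, 'created_utc': 10 * i} for i in range(10, 0, -1)]

def fake_fetch(before, size=4):
    """Return up to `size` fake posts strictly older than `before`."""
    older = [p for p in FAKE_POSTS if p['created_utc'] < before]
    return older[:size]

collected = paginate_backwards(fake_fetch, start_epoch=101, n_iter=5)
print(len(collected), collected[-1]['created_utc'])
```

Stopping as soon as a page comes back empty means the loop can be given a generous `n_iter` without failing once the data runs out.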

### Note on pulling multiple requests
I've broken each pull down into several smaller batches: instead of asking the function for 100 iterations in one call, I ask for 20, use the `created_utc` of the oldest entry in that pull to set the starting epoch for the next call, ask for another 20, and so on.

I'm not entirely sure why, but trying to pull everything in one call keeps returning errors.
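
If those errors are transient (rate limiting or a busy server, which is only my guess), retrying a failed call after a growing pause is a common alternative to splitting the work by hand. A minimal sketch, where `call_with_retries` and `flaky_pull` are hypothetical names used purely for illustration:

```python
import time

def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Waits base_delay, then 2 * base_delay, and so on between failed tries.
    The `sleep` argument exists only so the wait can be swapped out in tests.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))


# Stand-in for a flaky request: fails twice, then succeeds.
calls = {'count': 0}

def flaky_pull():
    calls['count'] += 1
    if calls['count'] < 3:
        raise RuntimeError('simulated failed request')
    return 'ok'

waits = []
result = call_with_retries(flaky_pull, attempts=5, base_delay=0.5, sleep=waits.append)
print(result, waits)
```

With a real pull, `func` could be something like `lambda: get_posts('astrophysics', 20, 1617223748)`. A retry only helps if the failure is temporary; it does nothing if there is simply no more data.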

#### Start with the "Astrophysics" subreddit

In [9]:
astro_posts1 = get_posts('astrophysics',20,1617223748)

astro_posts1

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Msc in astrophy after bsc in biophysics?,1617218341,[removed],astrophysics,Previous-Ad-8137,/r/astrophysics/comments/mhdi69/msc_in_astroph...,0
1,Msc. Astrophy after Bsc. Biophysics?,1617217140,[removed],astrophysics,Previous-Ad-8137,/r/astrophysics/comments/mhd2ax/msc_astrophy_a...,0
2,"Not an astrophysicist. But, say I have a pole/...",1617200843,I hope this makes sense and I’m not sounding s...,astrophysics,BlueJ5,/r/astrophysics/comments/mh7a20/not_an_astroph...,19
3,4 Tiny Missions Answering the Biggest Question...,1617199287,,astrophysics,NiklasFiedler,/r/astrophysics/comments/mh6r0q/4_tiny_mission...,3
4,Light bending around a black hole,1617133336,I've been trying to figure out how to use pyth...,astrophysics,Saashiv01,/r/astrophysics/comments/mgo75n/light_bending_...,13
...,...,...,...,...,...,...,...
95,Why does the matter circling around the black ...,1586869993,[removed],astrophysics,charlotte_fns,/r/astrophysics/comments/g15ai7/why_does_the_m...,0
96,Can a moon have visible rings like Saturn? If ...,1586863457,,astrophysics,albin123z123,/r/astrophysics/comments/g13scy/can_a_moon_hav...,6
97,what would a star being consummed from a black...,1586856286,,astrophysics,Uniquelypotatos,/r/astrophysics/comments/g12eks/what_would_a_s...,7
98,About to finish my degree.,1586851031,I am about to enter my last year in school and...,astrophysics,Turkish_Delight98,/r/astrophysics/comments/g11fk1/about_to_finis...,9


In [10]:
astro_posts2 = get_posts('astrophysics',20,1586802508)

astro_posts2

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,PhD Application Help,1586802404,[removed],astrophysics,Asshole_Landlord92,/r/astrophysics/comments/g0omu1/phd_applicatio...,0
1,Cosmic anisotropy has been detected in the X-r...,1586796261,Hello everyone! This is my first serious post ...,astrophysics,AstronomerInDisguise,/r/astrophysics/comments/g0mls5/cosmic_anisotr...,9
2,Penn State vs CU Boulder vs Md,1586788828,Our son has narrowed choice to these 3 for ast...,astrophysics,Curious_pa_mom,/r/astrophysics/comments/g0k760/penn_state_vs_...,13
3,"If you're bored in lockdown, you can help thes...",1586785302,,astrophysics,erwin500,/r/astrophysics/comments/g0j5p0/if_youre_bored...,0
4,An interesting competition: the IAAC!,1586734394,An interesting competition for all of you stuc...,astrophysics,TrackCalc,/r/astrophysics/comments/g07ad9/an_interesting...,0
...,...,...,...,...,...,...,...
95,"If there are multiple universes, must they be ...",1520542860,I find it impossible to accept that our singul...,astrophysics,GeorgeIX,/r/astrophysics/comments/830ymm/if_there_are_m...,17
96,What programming language,1520523146,I've seen that programming is suggested or eve...,astrophysics,QueenOfShadows1991,/r/astrophysics/comments/82ybkv/what_programmi...,5
97,Magnetic Pole Reverse,1520290437,"I know this is more geophysics, but scientists...",astrophysics,analytical-atheist,/r/astrophysics/comments/82a737/magnetic_pole_...,3
98,Rosette Nebula - is a large spherical H II reg...,1520252074,,astrophysics,eleonora1319,/r/astrophysics/comments/825iia/rosette_nebula...,0


In [11]:
astro_posts3 = get_posts('astrophysics',20,1520226107)

astro_posts3

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Alien civilization,1520166436,How likely is that there exist an alien civili...,astrophysics,Pranavwalker,/r/astrophysics/comments/81xe9o/alien_civiliza...,6
1,Antimatter?,1520150460,"Since antimatter feels gravity in reverse, doe...",astrophysics,MeanFunkymonky,/r/astrophysics/comments/81wgcg/antimatter/,7
2,How accurate are Monty Python's Galaxy Song ly...,1520063964,Simply that,astrophysics,Iphinassa,/r/astrophysics/comments/81oepi/how_accurate_a...,1
3,Habitable Body Around a Neutron Star Companion,1520027667,The question I’m asking here doesn’t relate to...,astrophysics,astrographer,/r/astrophysics/comments/81iso4/habitable_body...,2
4,Question about how we can see quasars and radi...,1519917198,So if we can see a quasar 12-13 billion light ...,astrophysics,dfasano,/r/astrophysics/comments/815xlq/question_about...,4
...,...,...,...,...,...,...,...
95,"Saw Lawrence Krauss talk last night, have some...",1330098698,It was a pretty popular talk at the University...,astrophysics,woodycanuck,/r/astrophysics/comments/q45lp/saw_lawrence_kr...,20
96,Where can I a list of dim satellite flyovers?,1330060876,I know that although this is a subreddit for a...,astrophysics,JunCTionS,/r/astrophysics/comments/q3ooc/where_can_i_a_l...,3
97,Do mathematical models for a white hole model ...,1330011349,"Okay, I understand this may be a frequently as...",astrophysics,[deleted],/r/astrophysics/comments/q2kus/do_mathematical...,5
98,Need some help for Astrophysics/Astronomy Home...,1329944617,"I have very little knowledge on Physics, Astro...",astrophysics,[deleted],/r/astrophysics/comments/q1fsp/need_some_help_...,6


In [19]:
#Combine the 3 DataFrames above into one megaframe
astro_posts_all = pd.concat([astro_posts1,astro_posts2,astro_posts3])
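
Because each batch starts from a hand-picked epoch, it is worth checking that no post ended up in two batches. The sketch below uses two tiny made-up DataFrames (`batch1` and `batch2` are stand-ins, not the real pulls) to show one way of combining batches, renumbering the index and dropping repeats by `permalink`. Whether the real batches overlap at all is something I have not checked.

```python
import pandas as pd

# Two tiny stand-in batches; the row with permalink '/b' appears in both.
batch1 = pd.DataFrame({'permalink': ['/a', '/b'], 'created_utc': [30, 20]})
batch2 = pd.DataFrame({'permalink': ['/b', '/c'], 'created_utc': [20, 10]})

combined = (
    pd.concat([batch1, batch2], ignore_index=True)   # fresh 0..n-1 index
      .drop_duplicates(subset='permalink')           # keep the first copy of each post
      .reset_index(drop=True)                        # renumber after dropping rows
)
print(combined)
```

Comparing `len()` before and after `drop_duplicates` gives a quick count of how many rows were repeated.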

In [51]:
#save my combined DataFrame as a .csv file
astro_posts_all.to_csv('./datasets/astro_posts.csv',index=False)
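
Note that `to_csv` fails if the `./datasets/` folder does not exist yet. A small sketch of creating the folder first (`ensure_parent_dir` is a hypothetical helper, demonstrated here inside a temporary folder):

```python
from pathlib import Path
import tempfile

def ensure_parent_dir(file_path):
    """Create the folder that will hold file_path if it is missing."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

# Demonstrated inside a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(Path(tmp) / 'datasets' / 'astro_posts.csv')
    folder_exists = target.parent.is_dir()

print(folder_exists)
```

In this notebook the equivalent call would be `ensure_parent_dir('./datasets/astro_posts.csv')` just before saving.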

#### Repeat for the "QuantumComputing" subreddit

In [21]:
quantum_posts1 = get_posts('QuantumComputing',20,1617223748)

quantum_posts1

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Sandia National Laboratories : Rare open-acces...,1617215626,,QuantumComputing,Chipdoc,/r/QuantumComputing/comments/mhciwe/sandia_nat...,0
1,Some facts about Quantum physics,1617201726,,QuantumComputing,factSciGuy,/r/QuantumComputing/comments/mh7kxm/some_facts...,0
2,Reasoning under uncertainty with a near-term q...,1617140595,,QuantumComputing,zctppe5,/r/QuantumComputing/comments/mgqsd3/reasoning_...,1
3,Quantum Computing Resources,1617129736,Hey everyone — just learned about the space la...,QuantumComputing,grams4days,/r/QuantumComputing/comments/mgmw55/quantum_co...,3
4,What's the complexity of 3-sat algorithm on qu...,1617124577,I am new in this field and I would like to kno...,QuantumComputing,asm-us,/r/QuantumComputing/comments/mgkzxx/whats_the_...,3
...,...,...,...,...,...,...,...
95,"For Qauntum Computing startup Quandela, being ...",1575641254,,QuantumComputing,TheQuantumDaily,/r/QuantumComputing/comments/e6z78v/for_qauntu...,0
96,Pi and Quantum Algorithm,1575611097,Pi and Quantum Algorithm\n\n#pi #quantum #quan...,QuantumComputing,aiforworld2,/r/QuantumComputing/comments/e6ugdg/pi_and_qua...,1
97,Headed to Practical Quantum Computing Conferen...,1575586545,"Hey friends,\n\nAnyone else headed to Q2B's [P...",QuantumComputing,nabil-,/r/QuantumComputing/comments/e6p93j/headed_to_...,23
98,In two years — between 2017 and 2018 — private...,1575570727,,QuantumComputing,TheQuantumDaily,/r/QuantumComputing/comments/e6lcin/in_two_yea...,0


In [22]:
quantum_posts2 = get_posts('QuantumComputing',20,1575563242)

quantum_posts2

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,quantum computing is intelligence and physics/...,1575557002,,QuantumComputing,disciplined_trade,/r/QuantumComputing/comments/e6i3nt/quantum_co...,0
1,The Quantum.Tech conference and exhibition is ...,1575553652,,QuantumComputing,TheQuantumDaily,/r/QuantumComputing/comments/e6hef7/the_quantu...,0
2,Is Amazon Braket a Nothing Burger?,1575519711,In the words of a certain very frequent poster...,QuantumComputing,rrtucci,/r/QuantumComputing/comments/e6buoe/is_amazon_...,5
3,Video from Caltech (Quantum Supremacy in my ti...,1575514245,,QuantumComputing,1ethanhansen,/r/QuantumComputing/comments/e6ap7u/video_from...,0
4,IBM committed to moving quantum computers out ...,1575492836,,QuantumComputing,TheQuantumDaily,/r/QuantumComputing/comments/e65pnq/ibm_commit...,0
...,...,...,...,...,...,...,...
95,Lighting the Way to Miniature devices,1473875395,,QuantumComputing,talius,/r/QuantumComputing/comments/52rqm7/lighting_t...,0
96,Quantum information encoded in spinning black ...,1473874086,,QuantumComputing,ovidiu69,/r/QuantumComputing/comments/52rmjx/quantum_in...,3
97,CQuIC is hiring 4-5 theory postdocs!,1473740860,,QuantumComputing,i2000s,/r/QuantumComputing/comments/52ioz8/cquic_is_h...,0
98,Frame of Essence - You don't know how Quantum ...,1473710590,,QuantumComputing,Strilanc,/r/QuantumComputing/comments/52gcow/frame_of_e...,6


In [25]:
#lowering the function repeats from 20 to 5 because of errors
quantum_posts3 = get_posts('QuantumComputing',5,1473614581)

quantum_posts3

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Intuition for solving problems with quantum co...,1473564465,"Hello. My background is in programming, not ph...",QuantumComputing,cymbalblade,/r/QuantumComputing/comments/526r6o/intuition_...,5
1,University of Waterloo gets $76 million for qu...,1473208165,,QuantumComputing,whitewhim,/r/QuantumComputing/comments/51ige0/university...,1
2,Is the No-communication Theorem flawed?,1473125374,The premise of it (as to my understanding) is ...,QuantumComputing,Ecchii,/r/QuantumComputing/comments/51cscb/is_the_noc...,6
3,D-Wave Founder Eric Ladizinsky - The Coming Qu...,1473114103,,QuantumComputing,5points,/r/QuantumComputing/comments/51by8z/dwave_foun...,0
4,Google’s plan for quantum computer supremacy,1472838077,,QuantumComputing,Strilanc,/r/QuantumComputing/comments/50ugm6/googles_pl...,1
...,...,...,...,...,...,...,...
95,Questions for a beginner in quantum computing,1388432361,"Hi,\n\nI will be taking a couple of courses on...",QuantumComputing,jb_1988,/r/QuantumComputing/comments/1u1b39/questions_...,13
96,How does one observe the ourput of a QC if obs...,1388291097,,QuantumComputing,gravitypushes,/r/QuantumComputing/comments/1tx6un/how_does_o...,13
97,"Quick stupid question, struggling with the con...",1386737748,So if a qubit is just a super position that is...,QuantumComputing,Quantuum,/r/QuantumComputing/comments/1sm1nt/quick_stup...,6
98,Quantum Computing Game,1386721124,"Play this game. (link in bottom), and help a t...",QuantumComputing,akiel123,/r/QuantumComputing/comments/1slc4q/quantum_co...,0


At this point the QuantumComputing subreddit keeps returning errors if I try to pull any more. It could be that there just *aren't any more posts*, which would make sense given that QC is a much newer field than astrophysics and one with a higher barrier to entry.
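
One possible explanation, which I have not verified against the API: once a request returns an empty `data` list, the resulting DataFrame has no columns, so selecting `'title'`, `'created_utc'` and the rest raises a `KeyError`. The sketch below shows a tolerant way to pick fields out of a response. `extract_rows` and both payloads are made up for illustration.

```python
def extract_rows(payload, columns):
    """Pick `columns` out of each item in payload['data'].

    Returns an empty list when there is no data, and fills any field an
    item lacks with None instead of raising.
    """
    items = payload.get('data', [])
    return [{col: item.get(col) for col in columns} for item in items]


wanted = ['title', 'created_utc', 'permalink']

# Hypothetical payloads: one ordinary page and one empty page.
full_page = {'data': [{'title': 'Example', 'created_utc': 1386721124}]}
empty_page = {'data': []}

print(extract_rows(full_page, wanted))
print(extract_rows(empty_page, wanted))
```

An empty result can then be treated as "no more posts" rather than as a failure.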

In [27]:
#Combine the 3 DataFrames above into one megaframe
quantum_posts_all = pd.concat([quantum_posts1,quantum_posts2,quantum_posts3])
quantum_posts_all

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Sandia National Laboratories : Rare open-acces...,1617215626,,QuantumComputing,Chipdoc,/r/QuantumComputing/comments/mhciwe/sandia_nat...,0
1,Some facts about Quantum physics,1617201726,,QuantumComputing,factSciGuy,/r/QuantumComputing/comments/mh7kxm/some_facts...,0
2,Reasoning under uncertainty with a near-term q...,1617140595,,QuantumComputing,zctppe5,/r/QuantumComputing/comments/mgqsd3/reasoning_...,1
3,Quantum Computing Resources,1617129736,Hey everyone — just learned about the space la...,QuantumComputing,grams4days,/r/QuantumComputing/comments/mgmw55/quantum_co...,3
4,What's the complexity of 3-sat algorithm on qu...,1617124577,I am new in this field and I would like to kno...,QuantumComputing,asm-us,/r/QuantumComputing/comments/mgkzxx/whats_the_...,3
...,...,...,...,...,...,...,...
95,Questions for a beginner in quantum computing,1388432361,"Hi,\n\nI will be taking a couple of courses on...",QuantumComputing,jb_1988,/r/QuantumComputing/comments/1u1b39/questions_...,13
96,How does one observe the ourput of a QC if obs...,1388291097,,QuantumComputing,gravitypushes,/r/QuantumComputing/comments/1tx6un/how_does_o...,13
97,"Quick stupid question, struggling with the con...",1386737748,So if a qubit is just a super position that is...,QuantumComputing,Quantuum,/r/QuantumComputing/comments/1sm1nt/quick_stup...,6
98,Quantum Computing Game,1386721124,"Play this game. (link in bottom), and help a t...",QuantumComputing,akiel123,/r/QuantumComputing/comments/1slc4q/quantum_co...,0


In [50]:
#save my combined DataFrame as a .csv file
quantum_posts_all.to_csv('./datasets/quantum_posts.csv',index=False)

## Grabbing comments
I'm going to run exactly the same steps as above, but this time pulling *comments* instead of *posts*. This requires a small change to the URL: the "submission" segment becomes "comment".

I also need to remove ['title','selftext','num_comments'] from the columns that I want to pull, and instead add ['body'] for the text body of each comment.
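
Since the two pulls differ only in the URL segment and the column list, those differences can be written down in one place. A sketch of that idea (`BASE`, `COLUMNS` and `endpoint_for` are names I made up; the URL and column lists are the ones used in this notebook):

```python
BASE = 'https://api.pushshift.io/reddit/search'

# Columns kept for each kind of pull, as listed in this notebook.
COLUMNS = {
    'submission': ['title', 'created_utc', 'selftext', 'subreddit',
                   'author', 'permalink', 'num_comments'],
    'comment': ['body', 'created_utc', 'subreddit', 'author', 'permalink'],
}

def endpoint_for(kind):
    """Return the search URL for 'submission' or 'comment'."""
    if kind not in COLUMNS:
        raise ValueError(f'unknown kind: {kind!r}')
    return f'{BASE}/{kind}/'

print(endpoint_for('comment'))
print(COLUMNS['comment'])
```

A single pulling function could take `kind` as an argument and look up both values, instead of keeping two nearly identical functions in sync.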

### Setting up the new function get_comments()

In [32]:
# this is a function to grab COMMENTS from Reddit (that is, not posts)
def get_comments(subreddit, n_iter, epoch_right_now):    # subreddit name, number of requests to make, starting epoch
    
    # store base url variable (the subreddit itself is passed through params)
    base_url = 'https://api.pushshift.io/reddit/search/comment/'
    
    df_list = []                                  # instantiate empty list
    
    current_time = epoch_right_now                # save starting epoch, used to iterate in reverse through time
        
    for _ in range(n_iter):                       # one request per iteration
        res = requests.get(                       # send the get request
            base_url,                              # requests.get() takes base_url and params
            params = {                             # parameters for get request
                'subreddit' : subreddit,           # specify subreddit
                'size' : 100,                      # specify number of comments to pull
                'lang' : True,                     # purpose unclear; kept because the request works with it
                'before' : current_time            # pull everything from current time backwards
            }
        )
        res.raise_for_status()                     # stop with a clear error on a 4xx/5xx response
        
        df = pd.DataFrame(res.json()['data'])      # take data from most recent request and store as DataFrame
        df = df.loc[:, ['body',                    # pull specific columns from DataFrame for analysis
                        'created_utc',
                        'subreddit',
                        'author',
                        'permalink']]
        
        df_list.append(df)                         # append to the list of DataFrames
        
        time.sleep(.5)                             # add wait time between requests
        
        current_time = int(df['created_utc'].min())   # move the cutoff back to the oldest comment just grabbed
        
    return pd.concat(df_list,axis=0)

In [33]:
astro_comments1 = get_comments('astrophysics',20,1617223748)

astro_comments1

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Your mum says your pole is more like 2 inches ...,1617222727,astrophysics,moon-worshiper,/r/astrophysics/comments/mh7a20/not_an_astroph...
1,"&gt; 760 mph\n\nJust to clarify, thats the spe...",1617220516,astrophysics,Lewri,/r/astrophysics/comments/mh7a20/not_an_astroph...
2,Threw this git repo together [https://github.c...,1617219871,astrophysics,physmathastro,/r/astrophysics/comments/mgo75n/light_bending_...
3,oh okay thanks for clearing that.,1617219522,astrophysics,AryanPandey,/r/astrophysics/comments/mh7a20/not_an_astroph...
4,"again, the movement needs to propagate through...",1617219442,astrophysics,Lewri,/r/astrophysics/comments/mh7a20/not_an_astroph...
...,...,...,...,...,...
95,"I've cited this book in other comments here, b...",1608302901,astrophysics,Cricket_Proud,/r/astrophysics/comments/kfm92e/sources_for_le...
96,&gt;This means ~7 years of college on average....,1608299661,astrophysics,astro-temp,/r/astrophysics/comments/kfbc61/thinking_about...
97,Accept that we are a tiny part of the universe...,1608294599,astrophysics,NedHasWares,/r/astrophysics/comments/kfel0r/smallness/gg8t...
98,I didn't take any chem or bio after high schoo...,1608290144,astrophysics,whiteraven4,/r/astrophysics/comments/keexn1/should_i_take_...


In [34]:
astro_comments2 = get_comments('astrophysics',20,1608290040)

astro_comments2

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,"I would say go for it! In Europe (or at least,...",1608287954,astrophysics,sight19,/r/astrophysics/comments/kfbc61/thinking_about...
1,"Everything is relative, you may be tiny relati...",1608287615,astrophysics,Donauhist,/r/astrophysics/comments/kfel0r/smallness/gg8m...
2,"think about this, it is very likely there are ...",1608285234,astrophysics,Kyce_es,/r/astrophysics/comments/kfel0r/smallness/gg8k...
3,"trust me, you can have it worse, i am desensit...",1608285114,astrophysics,Kyce_es,/r/astrophysics/comments/kfel0r/smallness/gg8k...
4,"Im in the same situation your in, ive been tol...",1608270661,astrophysics,Bankai_77,/r/astrophysics/comments/kfbc61/thinking_about...
...,...,...,...,...,...
95,[deleted],1600497912,astrophysics,[deleted],/r/astrophysics/comments/ivghxo/i_am_not_a_sci...
96,[removed],1600495559,astrophysics,[deleted],/r/astrophysics/comments/iveumg/a_dream_i_want...
97,Veritasium said something about the universe e...,1600495356,astrophysics,SnakeGnim123,/r/astrophysics/comments/ivghxo/i_am_not_a_sci...
98,"I remember when I was a teen, me and a buddy w...",1600491085,astrophysics,traveladdikt,/r/astrophysics/comments/ivghxo/i_am_not_a_sci...


In [35]:
astro_comments3 = get_comments('astrophysics',20,1600486554)

astro_comments3

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Yes.,1600485144,astrophysics,ilovelamp3303,/r/astrophysics/comments/ivghxo/i_am_not_a_sci...
1,"Given enough time and separation then yes, it'...",1600483563,astrophysics,FireNIceFly,/r/astrophysics/comments/iv3496/colonisation_o...
2,[removed],1600483114,astrophysics,[deleted],/r/astrophysics/comments/ivghxo/i_am_not_a_sci...
3,[removed],1600481863,astrophysics,[deleted],/r/astrophysics/comments/iveumg/a_dream_i_want...
4,Here is one citizen science project using NASA...,1600481497,astrophysics,ChrisARippel,/r/astrophysics/comments/is9y8s/project_ideas/...
...,...,...,...,...,...
95,Well gravity is like a property of the singula...,1593812325,astrophysics,plantgamer63,/r/astrophysics/comments/hki270/do_black_holes...
96,Consider this: we are 3 dimensional beings in ...,1593810711,astrophysics,alexambruby,/r/astrophysics/comments/hkmzm0/is_there_a_pos...
97,If we are incapable of detecting it then we wo...,1593810519,astrophysics,Thomas-Burrell,/r/astrophysics/comments/hkmzm0/is_there_a_pos...
98,STOP BRO look either you a boy or a pot head l...,1593809861,astrophysics,mojindu464,/r/astrophysics/comments/hkmzm0/is_there_a_pos...


In [36]:
astro_comments4 = get_comments('astrophysics',20,1593809648)

astro_comments4

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,[removed],1593809634,astrophysics,[deleted],/r/astrophysics/comments/hkmzm0/is_there_a_pos...
1,"One of the reasons is that astronomy, and to s...",1593807355,astrophysics,sight19,/r/astrophysics/comments/hkim8c/can_someone_ju...
2,It's possible all this was taken into account ...,1593807088,astrophysics,mojindu464,/r/astrophysics/comments/hk817y/what_happens_t...
3,"Ah, I misunderstood/misread...",1593805959,astrophysics,StarPerfect,/r/astrophysics/comments/hkmzm0/is_there_a_pos...
4,"To be clear, I'm not implying that Schrödinger...",1593805876,astrophysics,bear_of_the_woods,/r/astrophysics/comments/hkmzm0/is_there_a_pos...
...,...,...,...,...,...
95,Have you studied maths? A surprising number of...,1587644025,astrophysics,_cosmicomics_,/r/astrophysics/comments/g6kibj/im_a_17_year_o...
96,That's what I was afraid of. I really enjoy ph...,1587641066,astrophysics,YourMumsAVirgin69,/r/astrophysics/comments/g6kibj/im_a_17_year_o...
97,There may possibly be some universities which ...,1587640389,astrophysics,Lewri,/r/astrophysics/comments/g6kibj/im_a_17_year_o...
98,Maybe a diploma course in physics could help,1587640255,astrophysics,duraninx,/r/astrophysics/comments/g6kibj/im_a_17_year_o...


In [37]:
astro_comments5 = get_comments('astrophysics',20,1587639663)

astro_comments5

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,"Steven Hawking book are very interesting, I' m...",1587635178,astrophysics,trisomia_21,/r/astrophysics/comments/g68ifz/17_soon_18_yea...
1,I can't recommend the Feynman lectures enough ...,1587630138,astrophysics,datLasse,/r/astrophysics/comments/g68tuw/does_anyone_ha...
2,I would say the Feynman lectures are great way...,1587627978,astrophysics,datLasse,/r/astrophysics/comments/g68ifz/17_soon_18_yea...
3,I have to be honest and say I'm not quite sure...,1587623755,astrophysics,alcmay76,/r/astrophysics/comments/g6b384/what_are_some_...
4,"I am starting on a bachelor degree in 1,5 year...",1587620429,astrophysics,I-Regret-This-Name,/r/astrophysics/comments/g68ifz/17_soon_18_yea...
...,...,...,...,...,...
95,You can make similar conclusions about redshif...,1579272207,astrophysics,Tremongulous_Derf,/r/astrophysics/comments/epyias/how_do_we_diff...
96,Trust me -- you can usually busk it. Depends o...,1579261416,astrophysics,curiousscribbler,/r/astrophysics/comments/epttky/questions_abou...
97,Light has a frequency that can be measured. Wh...,1579256693,astrophysics,MrMakeItAllUp,/r/astrophysics/comments/epyias/how_do_we_diff...
98,The interplay between story details and the ph...,1579244509,astrophysics,ketarax,/r/astrophysics/comments/epttky/questions_abou...


In [38]:
# concatenate into one big DataFrame
astro_comments_all = pd.concat([astro_comments1,astro_comments2,astro_comments3,astro_comments4,astro_comments5])
astro_comments_all

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Your mum says your pole is more like 2 inches ...,1617222727,astrophysics,moon-worshiper,/r/astrophysics/comments/mh7a20/not_an_astroph...
1,"&gt; 760 mph\n\nJust to clarify, thats the spe...",1617220516,astrophysics,Lewri,/r/astrophysics/comments/mh7a20/not_an_astroph...
2,Threw this git repo together [https://github.c...,1617219871,astrophysics,physmathastro,/r/astrophysics/comments/mgo75n/light_bending_...
3,oh okay thanks for clearing that.,1617219522,astrophysics,AryanPandey,/r/astrophysics/comments/mh7a20/not_an_astroph...
4,"again, the movement needs to propagate through...",1617219442,astrophysics,Lewri,/r/astrophysics/comments/mh7a20/not_an_astroph...
...,...,...,...,...,...
95,You can make similar conclusions about redshif...,1579272207,astrophysics,Tremongulous_Derf,/r/astrophysics/comments/epyias/how_do_we_diff...
96,Trust me -- you can usually busk it. Depends o...,1579261416,astrophysics,curiousscribbler,/r/astrophysics/comments/epttky/questions_abou...
97,Light has a frequency that can be measured. Wh...,1579256693,astrophysics,MrMakeItAllUp,/r/astrophysics/comments/epyias/how_do_we_diff...
98,The interplay between story details and the ph...,1579244509,astrophysics,ketarax,/r/astrophysics/comments/epttky/questions_abou...


In [49]:
astro_comments_all.to_csv('./datasets/astro_comments.csv',index=False)

### Rinse and repeat for QuantumComputing comments

In [40]:
quantum_comments1 = get_comments('QuantumComputing',20,1617223748)

quantum_comments1

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Related question: there are cloud-based quantu...,1617218427,QuantumComputing,1E4rth,/r/QuantumComputing/comments/mez40l/is_there_a...
1,"3-SAT is NP-complete, and we don't expect to b...",1617210972,QuantumComputing,crispyleprechaun,/r/QuantumComputing/comments/mgkzxx/whats_the_...
2,It is currently conjectured that quantum compu...,1617210541,QuantumComputing,cirosantilli,/r/QuantumComputing/comments/mgkzxx/whats_the_...
3,[removed],1617176687,QuantumComputing,[deleted],/r/QuantumComputing/comments/mgkzxx/whats_the_...
4,"serious question, since i know nothing about t...",1617175565,QuantumComputing,maexx80,/r/QuantumComputing/comments/mez40l/is_there_a...
...,...,...,...,...,...
95,Note that what China has is not a universal qu...,1607188722,QuantumComputing,quantum_steve,/r/QuantumComputing/comments/k7a2h7/quantum_en...
96,You could always use Grover's algorithm to sol...,1607187780,QuantumComputing,Jonathcraft,/r/QuantumComputing/comments/k77ujg/suggestion...
97,"Sure, simulations seems promising but to what ...",1607187388,QuantumComputing,k3npac2,/r/QuantumComputing/comments/k6vdvo/will_quant...
98,"Ok, thank you",1607185154,QuantumComputing,a_khalid1999,/r/QuantumComputing/comments/k77ujg/suggestion...


In [41]:
quantum_comments2 = get_comments('QuantumComputing',20,1607184312)

quantum_comments2

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,[removed],1607178006,QuantumComputing,[deleted],/r/QuantumComputing/comments/k77ujg/suggestion...
1,"Un-related but I want to know what are the ""pr...",1607177646,QuantumComputing,a_khalid1999,/r/QuantumComputing/comments/k77ujg/suggestion...
2,"DWave sells, I guess. But that is for Cloud ba...",1607177093,QuantumComputing,theimposter2000,/r/QuantumComputing/comments/k3a98m/trying_to_...
3,My best guess is early 2030s. \n\n\nCovid-19...,1607176978,QuantumComputing,theimposter2000,/r/QuantumComputing/comments/k53vhp/quantum_co...
4,[Quantum.country](http://quantum.country)\n\nI...,1607172188,QuantumComputing,roo_sado,/r/QuantumComputing/comments/k6qn2j/for_someon...
...,...,...,...,...,...
95,"I'm a bit late here, but it's possible to crea...",1599083908,QuantumComputing,frraaank,/r/QuantumComputing/comments/ieuk7o/q_when_is_...
96,"Agreed, Loceff's book is a really good jumping...",1599076719,QuantumComputing,Abstract-Abacus,/r/QuantumComputing/comments/ikh511/i_have_jus...
97,"""*While the quantum computers presently sittin...",1599076091,QuantumComputing,Abstract-Abacus,/r/QuantumComputing/comments/ikkwha/quantum_co...
98,[deleted],1599073671,QuantumComputing,[deleted],/r/QuantumComputing/comments/ikkwha/quantum_co...


In [42]:
quantum_comments3 = get_comments('QuantumComputing',20,1599073006)

quantum_comments3

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Totally agree. Jay Gambetta is already sharpen...,1599066500,QuantumComputing,rrtucci,/r/QuantumComputing/comments/il6tqx/xanadu_lau...
1,I thought quantum dot based qubits would play ...,1599065841,QuantumComputing,bigbossperson,/r/QuantumComputing/comments/il0hl4/quantum_si...
2,"Like all other quantum cloud services, This wi...",1599064936,QuantumComputing,thermolizard,/r/QuantumComputing/comments/il6tqx/xanadu_lau...
3,QC as a business is all a hoax. These people ...,1599064842,QuantumComputing,thermolizard,/r/QuantumComputing/comments/ik6b0r/robert_smi...
4,Classical methods are far superior to anything...,1599064664,QuantumComputing,thermolizard,/r/QuantumComputing/comments/ikizhc/practical_...
...,...,...,...,...,...
95,Two that I know of are [meQuanics](http://www....,1590688378,QuantumComputing,prolynx,/r/QuantumComputing/comments/gsagwt/any_good_p...
96,"Again, there is no known complexity scaling fo...",1590685220,QuantumComputing,powerofshower,/r/QuantumComputing/comments/goi45f/what_the_h...
97,"Ohh sounds good, I will try it, thanks :D",1590657421,QuantumComputing,Yoyotown2000,/r/QuantumComputing/comments/gprprx/quantum_co...
98,I haven't been through the brilliant course bu...,1590656901,QuantumComputing,RedditHG,/r/QuantumComputing/comments/gprprx/quantum_co...


In [43]:
quantum_comments4 = get_comments('QuantumComputing',20,1590656715)

quantum_comments4

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,There is the official Quantum Katas from Micro...,1590654418,QuantumComputing,RedditHG,/r/QuantumComputing/comments/gprprx/quantum_co...
1,I said that phase estimation based quantum alg...,1590651430,QuantumComputing,YuvalRishu,/r/QuantumComputing/comments/goi45f/what_the_h...
2,"This is not a rule, but I don’t know how one c...",1590640117,QuantumComputing,rodrigonader,/r/QuantumComputing/comments/grd54f/understand...
3,The computational complexity of physical probl...,1590636148,QuantumComputing,powerofshower,/r/QuantumComputing/comments/goi45f/what_the_h...
4,I never had a clear idea of the goal posts. If...,1590632968,QuantumComputing,YuvalRishu,/r/QuantumComputing/comments/goi45f/what_the_h...
...,...,...,...,...,...
95,&gt;pytket is a python module for interfacing ...,1581829814,QuantumComputing,Melodious_Thunk,/r/QuantumComputing/comments/f28dp4/ibm_invest...
96,I think Microsoft should use a photo of this m...,1581823644,QuantumComputing,rrtucci,/r/QuantumComputing/comments/f4do7y/the_hitchh...
97,No with quantum data.,1581822259,QuantumComputing,-TheBoyWhoLived,/r/QuantumComputing/comments/f4dtu3/research_t...
98,You should write a blog post about this\nhttps...,1581815391,QuantumComputing,rrtucci,/r/QuantumComputing/comments/f4do7y/the_hitchh...


In [44]:
quantum_comments5 = get_comments('QuantumComputing',20,1581813697)

quantum_comments5

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Q# is like a Rube-Goldberg machine. Very crypt...,1581804751,QuantumComputing,rrtucci,/r/QuantumComputing/comments/f4do7y/the_hitchh...
1,&gt;ML models used in quantum computing\n\nYou...,1581798318,QuantumComputing,vtomole,/r/QuantumComputing/comments/f4dtu3/research_t...
2,I was thinking like how can we apply ML on qua...,1581794171,QuantumComputing,-TheBoyWhoLived,/r/QuantumComputing/comments/f4dtu3/research_t...
3,"I'm a bot, *bleep*, *bloop*. Someone has linke...",1581793927,QuantumComputing,TotesMessenger,/r/QuantumComputing/comments/f4dtu3/research_t...
4,"Machine learning models with quantum data, or ...",1581793842,QuantumComputing,CarbonIsYummy,/r/QuantumComputing/comments/f4dtu3/research_t...
...,...,...,...,...,...
95,[https://www.lanl.gov/projects/national-securi...,1573330198,QuantumComputing,WorriedPurpose,/r/QuantumComputing/comments/dtt4g0/how_to_get...
96,"Yeah, just thought that post was particularly ...",1573323705,QuantumComputing,SaltKick2,/r/QuantumComputing/comments/dterxx/i_found_th...
97,‘Internships’ are rarely advertised. Find prof...,1573317570,QuantumComputing,youngeverest,/r/QuantumComputing/comments/dtt4g0/how_to_get...
98,I bet the quantum internet people at TUDelf an...,1573316528,QuantumComputing,rrtucci,/r/QuantumComputing/comments/dtt4g0/how_to_get...
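
The five manual calls above could also be written as a loop. The sketch below is one possible way to do it, not the method used in this notebook: `collect_pages` and `fake_fetch` are hypothetical names, and taking the oldest `created_utc` of each page as the next `before` value is an assumption (the cells above pass hand-picked timestamps). The fetch function is passed in as an argument so the example runs offline.

```python
import pandas as pd


def collect_pages(fetch_page, subreddit, before, n_pages=5):
    """Call fetch_page repeatedly, moving `before` back to the oldest row seen."""
    pages = []
    for _ in range(n_pages):
        page = fetch_page(subreddit, before)
        if page.empty:
            break  # nothing older is available
        pages.append(page)
        # the next request asks only for rows older than this page's oldest row
        before = int(page['created_utc'].min())
    if not pages:
        return pd.DataFrame()
    return pd.concat(pages, ignore_index=True)


# Stand-in for the network call so the sketch runs offline (made-up rows).
def fake_fetch(subreddit, before, page_size=3):
    stamps = [before - 10 * (i + 1) for i in range(page_size)]
    return pd.DataFrame({'created_utc': stamps, 'subreddit': subreddit})


demo = collect_pages(fake_fetch, 'QuantumComputing', before=1617223748, n_pages=2)
print(len(demo))
```

In this notebook the loop might be called as `collect_pages(lambda sub, before: get_comments(sub, 20, before), 'QuantumComputing', before=1617223748)`, keeping the second argument of `get_comments` exactly as in the cells above.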


In [46]:
#combine the five pages into one DataFrame (each page keeps its own row index, so index labels repeat)
quantum_comments_all = pd.concat([quantum_comments1,quantum_comments2,quantum_comments3,quantum_comments4,quantum_comments5])
quantum_comments_all

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Related question: there are cloud-based quantu...,1617218427,QuantumComputing,1E4rth,/r/QuantumComputing/comments/mez40l/is_there_a...
1,"3-SAT is NP-complete, and we don't expect to b...",1617210972,QuantumComputing,crispyleprechaun,/r/QuantumComputing/comments/mgkzxx/whats_the_...
2,It is currently conjectured that quantum compu...,1617210541,QuantumComputing,cirosantilli,/r/QuantumComputing/comments/mgkzxx/whats_the_...
3,[removed],1617176687,QuantumComputing,[deleted],/r/QuantumComputing/comments/mgkzxx/whats_the_...
4,"serious question, since i know nothing about t...",1617175565,QuantumComputing,maexx80,/r/QuantumComputing/comments/mez40l/is_there_a...
...,...,...,...,...,...
95,[https://www.lanl.gov/projects/national-securi...,1573330198,QuantumComputing,WorriedPurpose,/r/QuantumComputing/comments/dtt4g0/how_to_get...
96,"Yeah, just thought that post was particularly ...",1573323705,QuantumComputing,SaltKick2,/r/QuantumComputing/comments/dterxx/i_found_th...
97,‘Internships’ are rarely advertised. Find prof...,1573317570,QuantumComputing,youngeverest,/r/QuantumComputing/comments/dtt4g0/how_to_get...
98,I bet the quantum internet people at TUDelf an...,1573316528,QuantumComputing,rrtucci,/r/QuantumComputing/comments/dtt4g0/how_to_get...
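
The combined output still ends at index 98 because `pd.concat` keeps each page's own index. A small sketch, assuming made-up stand-in pages and an assumed choice of key columns, shows how the index can be renumbered and any rows that appear on two pages dropped.

```python
import pandas as pd

# Two tiny stand-in pages (made-up rows); 'b' appears on both pages
page1 = pd.DataFrame({'body': ['a', 'b'], 'created_utc': [30, 20], 'author': ['x', 'y']})
page2 = pd.DataFrame({'body': ['b', 'c'], 'created_utc': [20, 10], 'author': ['y', 'z']})

combined = (
    pd.concat([page1, page2], ignore_index=True)  # renumber rows 0..n-1
      .drop_duplicates(subset=['body', 'created_utc', 'author'])  # assumed key columns
      .reset_index(drop=True)  # close the gaps left by dropped rows
)
print(combined)
```

Because the CSV is written with `index=False`, the repeated index labels do not reach the saved file either way; renumbering mainly matters when rows are selected by label with `.loc`.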


In [48]:
#save the combined comments; index=False leaves the row index out of the file
quantum_comments_all.to_csv('./datasets/quantum_comments.csv',index=False)
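
`to_csv` does not create missing folders, so the write above fails if `./datasets` does not exist yet. The sketch below shows one way to guard against that and to confirm the file reads back; it uses a temporary folder and a one-row stand-in frame (both assumptions for the demo) rather than the notebook's real data.

```python
from pathlib import Path
import tempfile

import pandas as pd

# One-row stand-in for quantum_comments_all (values taken from the output above)
frame = pd.DataFrame({'body': ['Ok, thank you'], 'created_utc': [1607185154]})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / 'datasets'
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    out_path = out_dir / 'quantum_comments.csv'
    frame.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)  # read back to confirm the round trip

print(reloaded.shape)
```

In the notebook itself, `Path('./datasets').mkdir(parents=True, exist_ok=True)` before the `to_csv` call would serve the same purpose.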