## Intro
In this notebook I'm going to pull in Reddit data from two subreddits, r/sysadmin and r/programming, through the Pushshift API. I will then save each pull as a .csv file for later use.

In [1]:
#importing standard libraries
import pandas as pd
import numpy as np

#API
import requests

#Automating
import time
import datetime
import warnings
import sys

In [2]:
#set base URL (the subreddit is passed through params, so no query string is needed here)
url = 'https://api.pushshift.io/reddit/search/submission/'

#set parameters in a params dictionary
params = {
    'subreddit': 'wow',
    'size':50,
    'lang':True,
    'before': 1649908442    #epoch timestamp
}

res = requests.get(url,params)
res

<Response [200]>

Seeing <Response [200]> is a great sight: a 200 status code means the request succeeded.
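The `before` value in the params dictionary is a Unix epoch timestamp in seconds. As a small aside, here is a sketch of converting between epochs and readable UTC datetimes with the standard library. The helper names `epoch_to_utc` and `utc_to_epoch` are made up for illustration and are not used elsewhere in this notebook.

```python
from datetime import datetime, timezone

def epoch_to_utc(epoch_seconds):
    """Convert a Unix epoch (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def utc_to_epoch(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC calendar date and time to a Unix epoch in whole seconds."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())

# the 'before' value used in the request above
print(epoch_to_utc(1649908442).isoformat())   # 2022-04-14T03:54:02+00:00
```

Going the other way, `utc_to_epoch(2022, 4, 14, 3, 54, 2)` gives back `1649908442`, which is handy for picking a starting point by date.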

In [3]:
#it can also be viewed as .text or .json()
#try as .json
res.json()

{'data': [{'all_awardings': [],
   'allow_live_comments': False,
   'author': 'TheMidget177',
   'author_flair_css_class': None,
   'author_flair_richtext': [],
   'author_flair_text': None,
   'author_flair_type': 'text',
   'author_fullname': 't2_64d44k4t',
   'author_is_blocked': False,
   'author_patreon_flair': False,
   'author_premium': False,
   'awarders': [],
   'can_mod_post': False,
   'contest_mode': False,
   'created_utc': 1649907897,
   'domain': 'self.wow',
   'full_link': 'https://www.reddit.com/r/wow/comments/u38bo5/i_wanna_get_into_wow_but_im_not_sure_how/',
   'gildings': {},
   'id': 'u38bo5',
   'is_created_from_ads_ui': False,
   'is_crosspostable': True,
   'is_meta': False,
   'is_original_content': False,
   'is_reddit_media_domain': False,
   'is_robot_indexable': True,
   'is_self': True,
   'is_video': False,
   'link_flair_background_color': '',
   'link_flair_css_class': 'question',
   'link_flair_richtext': [{'e': 'text', 't': 'Question'}],
   'link_fla

In [4]:
#convert to a DataFrame
initial_df = pd.DataFrame(res.json()['data'])
initial_df.head(3).T

Unnamed: 0,0,1,2
all_awardings,[],[],[]
allow_live_comments,False,False,False
author,TheMidget177,kbheezy,Born_Illustrator3213
author_flair_css_class,,,
author_flair_richtext,[],[],[]
...,...,...,...
author_flair_background_color,,,
media,,,
media_embed,,,
secure_media,,,


In [5]:
#more EDA
initial_df.columns

Index(['all_awardings', 'allow_live_comments', 'author',
       'author_flair_css_class', 'author_flair_richtext', 'author_flair_text',
       'author_flair_type', 'author_fullname', 'author_is_blocked',
       'author_patreon_flair', 'author_premium', 'awarders', 'can_mod_post',
       'contest_mode', 'created_utc', 'domain', 'full_link', 'gildings', 'id',
       'is_created_from_ads_ui', 'is_crosspostable', 'is_meta',
       'is_original_content', 'is_reddit_media_domain', 'is_robot_indexable',
       'is_self', 'is_video', 'link_flair_background_color',
       'link_flair_css_class', 'link_flair_richtext', 'link_flair_template_id',
       'link_flair_text', 'link_flair_text_color', 'link_flair_type', 'locked',
       'media_only', 'no_follow', 'num_comments', 'num_crossposts', 'over_18',
       'parent_whitelist_status', 'permalink', 'pinned', 'pwls',
       'retrieved_on', 'score', 'selftext', 'send_replies', 'spoiler',
       'stickied', 'subreddit', 'subreddit_id', 'subreddit_sub

In [6]:
initial_df = initial_df.loc[:, ['title',
                        'created_utc',
                       'selftext',
                       'subreddit',
                       'author',
                       'media_only',
                       'permalink',
                       'num_comments']]

initial_df.head()

Unnamed: 0,title,created_utc,selftext,subreddit,author,media_only,permalink,num_comments
0,I wanna get into wow but im not sure how,1649907897,"Hey everyone, I played wow for the 1st time to...",wow,TheMidget177,False,/r/wow/comments/u38bo5/i_wanna_get_into_wow_bu...,0
1,Attempting to come back after a almost 11 year...,1649907818,"Hey beautiful community, I quit wow back in ca...",wow,kbheezy,False,/r/wow/comments/u38awb/attempting_to_come_back...,0
2,Russian sailors evacuate warship withinside th...,1649906197,,wow,Born_Illustrator3213,False,/r/wow/comments/u37toq/russian_sailors_evacuat...,0
3,Leak shows famous hacker group gives bonuses a...,1649906183,,wow,Born_Illustrator3213,False,/r/wow/comments/u37tjk/leak_shows_famous_hacke...,0
4,"whether you like or dislike Shadowlands, Y'all...",1649904578,,wow,Mr_Stach,False,/r/wow/comments/u37c6b/whether_you_like_or_dis...,0


So now that I've proven to myself that I can pull *some* data off of Reddit, I'm going to specifically target my chosen subreddits and pull quite a bit more.

## Building the API function

This section is heavily influenced by a video recorded for a different DSI cohort, led by **Sara Soueidan**:
https://generalassembly.zoom.us/rec/play/IEeFJ50KMX_1d4d6ACRj9caeqz_W3V9C3RP4XIOzn8ynAE83APpwbxF3ylJnSJXMFSiNmPo1oHw35Kpl.D1XBopdtlQNWilJ9?continueMode=true&_x_zm_rtaid=2ShimnfWRSqreUf7iKyVRg.1616787450820.4c6921ec8a66ba664a818cf81df2e461&_x_zm_rhtaid=107

In [7]:
# this is a function to grab POSTS from Reddit (that is, not comments)
def get_posts(subreddit, n_iter, epoch_right_now):    #subreddit name, number of requests to make, epoch to start from

    #store base url variable (the subreddit is passed through params, so no query string here)
    base_url = 'https://api.pushshift.io/reddit/search/submission/'

    df_list = []                                  #instantiate empty list

    current_time = epoch_right_now                # save current epoch, used to iterate in reverse through time

    for _ in range(n_iter):                       # one request per iteration
        res = requests.get(                       #send the get request
            base_url,                              #requests.get() takes base_url and params
            params = {                             #parameters for get request
                'subreddit' : subreddit,           #specify subreddit
                'size' : 100,                      #specify number of posts to pull
                'lang' : True,                     # not sure what this does, but the request works with it
                'before' : current_time            # pull everything from current time backwards
            }
        )
        res.raise_for_status()                     # fail loudly on a non-200 response

        df = pd.DataFrame(res.json()['data'])      # take data from most recent request and store as DataFrame
        if df.empty:                               # nothing older than current_time, so stop early
            break
        df = df.loc[:, ['title',                   # pull specific columns from DataFrame for analysis
                        'created_utc',
                        'selftext',
                        'subreddit',
                        'author',
                        'permalink',
                        'num_comments']]

        df_list.append(df)                         # append to the list of DataFrames

        time.sleep(.5)                             # add wait time between requests

        current_time = df['created_utc'].min()     # move the time counter back to the oldest post in this pull

    return pd.concat(df_list,axis=0)

### Note on pulling multiple requests
I've broken each pull into several smaller grabs: instead of running the loop 100 times in one call, I run it 20 times, use the `created_utc` at the end of that pull to set the starting epoch for the next call, then run 20 times again, and so on.
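The hand-off between batches can also be automated. The following is a minimal sketch, not the code used in this notebook: `pull_in_batches` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real request so the example runs without network access. Each batch starts from the smallest `created_utc` of the previous one.

```python
def pull_in_batches(fetch, start_epoch, n_batches):
    """Call fetch(before) repeatedly, moving backwards through time.

    fetch is any callable returning a list of dicts with a 'created_utc' key,
    holding only rows older than `before` (an assumption of this sketch).
    """
    rows = []
    before = start_epoch
    for _ in range(n_batches):
        batch = fetch(before)
        if not batch:                      # nothing older is left
            break
        rows.extend(batch)
        before = min(row['created_utc'] for row in batch)   # oldest row in this batch
    return rows

# stand-in for the API: pretend one post exists every 10 seconds
def fake_fetch(before, size=3):
    newest = (before - 1) // 10 * 10       # latest multiple of 10 strictly below `before`
    return [{'created_utc': t} for t in range(newest, newest - size * 10, -10) if t > 0]

rows = pull_in_batches(fake_fetch, start_epoch=100, n_batches=2)
print([row['created_utc'] for row in rows])   # [90, 80, 70, 60, 50, 40]
```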

I'm not entirely sure of the reasons, but trying to pull everything in one call keeps returning errors.
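One common cause of intermittent errors with public APIs is rate limiting, though that is only a guess for this case. A sketch of a retry wrapper with exponential backoff follows; `with_retries` and `flaky` are illustrative names, and the delay values are arbitrary.

```python
import time

def with_retries(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... between attempts (exponential backoff).
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:    # out of attempts, surface the error
                raise
            sleep(base_delay * 2 ** attempt)

# demo with a function that fails twice before succeeding
calls = {'n': 0}
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('temporary failure')
    return 'ok'

waits = []
result = with_retries(flaky, sleep=waits.append)   # record the waits instead of sleeping
print(result, waits)   # ok [0.5, 1.0]
```

In this notebook the wrapper could surround a single pull, for example `with_retries(lambda: get_posts('sysadmin', 20, 1649908442))`.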

#### Start with the "sysadmin" subreddit

In [8]:
sysadmin_posts1 = get_posts('sysadmin',20,1649908442)

sysadmin_posts1

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,UPDATE: How to make VMWare Remote Console Work...,1649907326,We got it working!\n\n[Here is what we were wo...,sysadmin,HanSolo71,/r/sysadmin/comments/u385rb/update_how_to_make...,0
1,Windows Server License Terms,1649907297,Hi Everyone \nMy question is really short. \...,sysadmin,Pyme93,/r/sysadmin/comments/u385fz/windows_server_lic...,0
2,Help with a Freeradius3,1649906743,"Hi all, I am looking for some help with a free...",sysadmin,Nvious81,/r/sysadmin/comments/u37zom/help_with_a_freera...,0
3,If our servers crash this Friday,1649906368,"Will they come back to live on Sunday, fixing ...",sysadmin,bjngjie,/r/sysadmin/comments/u37vkh/if_our_servers_cra...,0
4,Is it me or is it the job?,1649905702,So I’m coming up on my 7th year at working at ...,sysadmin,Snorlax_420,/r/sysadmin/comments/u37od5/is_it_me_or_is_it_...,0
...,...,...,...,...,...,...,...
95,Vendor login session recording.,1648489581,Hello folks!\n\nIn need of some help or sugges...,sysadmin,Austronaut1403,/r/sysadmin/comments/tqey3c/vendor_login_sessi...,0
96,Updating Edge Remotly,1648489152,In our envirnment our users use Google Chrome ...,sysadmin,gaz2600,/r/sysadmin/comments/tqesdo/updating_edge_remo...,0
97,Automated Application Updating in Windows via ...,1648488943,"Hi all,\n\n&amp;#x200B;\n\nHas anybody succesf...",sysadmin,cquick00,/r/sysadmin/comments/tqephm/automated_applicat...,0
98,PSA: Don’t put 1L Desktops in a warehouse,1648488130,I just opened up the desktops that the last gu...,sysadmin,DoorDelicious8395,/r/sysadmin/comments/tqee98/psa_dont_put_1l_de...,0


In [10]:
sysadmin_posts2 = get_posts('sysadmin',20,1648487935)

sysadmin_posts2

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Default printer changes on RDS when disconnecting,1648487609,I have a weird issue that Ive spent weeks tryi...,sysadmin,Noztra_,/r/sysadmin/comments/tqe79f/default_printer_ch...,0
1,Demoting AD Domain Controller. Is there a way ...,1648487266,Would like to be sure nothing is still using t...,sysadmin,AnonymousRockwell,/r/sysadmin/comments/tqe2kq/demoting_ad_domain...,0
2,Need some networking advice,1648487056,"Bear with me on this people, because I know en...",sysadmin,rhutanium,/r/sysadmin/comments/tqdzkx/need_some_networki...,0
3,Need optinions - RDS,1648487051,"Hey fellows,\n\nI have a customer with softwar...",sysadmin,iholu,/r/sysadmin/comments/tqdzi4/need_optinions_rds/,0
4,Cloud Managed Printers?,1648486848,Is there any cheap home office/ small business...,sysadmin,arrecebx,/r/sysadmin/comments/tqdwpb/cloud_managed_prin...,0
...,...,...,...,...,...,...,...
95,server room diagram - best software to use?,1646978760,"Hello,\nI am wondering what you all have used,...",sysadmin,TangoYankeyIT,/r/sysadmin/comments/tbjg5g/server_room_diagra...,0
96,outlook web issue,1646977238,outlook web issue when user search for old ema...,sysadmin,souf_teck,/r/sysadmin/comments/tbj1kj/outlook_web_issue/,0
97,KVM at home - monitor takes forever to display!,1646975044,"I have a dual KVM to Display Port for a 32"" mo...",sysadmin,red_shrike,/r/sysadmin/comments/tbigi6/kvm_at_home_monito...,0
98,1st IT Job: Ready to quit,1646969944,"Hi, so I had started with this cloud company t...",sysadmin,iHayden,/r/sysadmin/comments/tbgyp2/1st_it_job_ready_t...,0


In [11]:
sysadmin_posts3 = get_posts('sysadmin',20,1646969894)

sysadmin_posts3

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,"US Government ""remote"" jobs",1646969217,Browsing remote jobs for US government positio...,sysadmin,ChanklaChucker,/r/sysadmin/comments/tbgqwa/us_government_remo...,0
1,PSADT - uninstall all adobe apps exclude the o...,1646969140,I have already include the deploy-Application....,sysadmin,Past_Special_7306,/r/sysadmin/comments/tbgq2f/psadt_uninstall_al...,0
2,What's your MFA and why'd you choose it over o...,1646967074,My org just setup MFA since we are migrating t...,sysadmin,Grateful4Today,/r/sysadmin/comments/tbg2mk/whats_your_mfa_and...,0
3,I think I found something worse then personal ...,1646966835,"A couple weeks ago, help desk installs a new d...",sysadmin,TestUser12358,/r/sysadmin/comments/tbfzze/i_think_i_found_so...,0
4,Tried: The Galaxy S22 Ultra's 45W charging is ...,1646965595,[removed],sysadmin,techylog,/r/sysadmin/comments/tbfm8e/tried_the_galaxy_s...,0
...,...,...,...,...,...,...,...
95,"Century link static, need some help",1645551474,"Been on hold for a while, figured I would get ...",sysadmin,nickcasa,/r/sysadmin/comments/sytc42/century_link_stati...,0
96,Issues installing Print Management Role,1645550911,I am trying to install print management role o...,sysadmin,ITDerm,/r/sysadmin/comments/syt460/issues_installing_...,0
97,How can I deploy Computer Policies to a comput...,1645550772,"Windows 10 Pro, OpenVPN connection.\n\nWe're s...",sysadmin,segagamer,/r/sysadmin/comments/syt27n/how_can_i_deploy_c...,0
98,Group just for Basic Admin Rights,1645550692,Is there any built in groups for Basic admin r...,sysadmin,Simpuhl,/r/sysadmin/comments/syt139/group_just_for_bas...,0


In [12]:
#Combine the 3 DataFrames above into one megaframe
sysadmin_posts_all = pd.concat([sysadmin_posts1,sysadmin_posts2,sysadmin_posts3])

In [13]:
#save my combined DataFrame as a .csv file
sysadmin_posts_all.to_csv('./datasets/sysadmin_posts.csv',index=False)
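Because `pd.concat` keeps each batch's own 0-based index by default, the combined frame ends up with repeated index labels, and batches can overlap if their time windows touch. A possible clean-up, sketched on toy data rather than the real pulls, resets the index and drops rows that share a `permalink`. Treating `permalink` as a unique key is an assumption.

```python
import pandas as pd

# toy stand-ins for two batches returned by get_posts
batch1 = pd.DataFrame({'title': ['a', 'b'], 'permalink': ['/r/x/1', '/r/x/2']})
batch2 = pd.DataFrame({'title': ['b', 'c'], 'permalink': ['/r/x/2', '/r/x/3']})

combined = (
    pd.concat([batch1, batch2], ignore_index=True)   # fresh 0..n-1 index
      .drop_duplicates(subset='permalink')           # keep the first copy of each post
      .reset_index(drop=True)                        # close the gap left by the dropped row
)
print(combined)
```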

#### Repeat for the "programming" subreddit

In [14]:
programming_posts1 = get_posts('programming',20,1649908442)

programming_posts1

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Algorithm Analysis,1649905566,,programming,xalg0rd,/r/programming/comments/u37mvo/algorithm_analy...,0
1,Deploy Seeker search augmented conversational ...,1649901284,,programming,louis030195,/r/programming/comments/u36b0m/deploy_seeker_s...,0
2,Algorithm Analysis – Data and File Structures,1649898352,,programming,xalg0rd,/r/programming/comments/u35dt7/algorithm_analy...,0
3,Is there an online API for Baking Textures?,1649896730,,programming,USMANHEART,/r/programming/comments/u34utm/is_there_an_onl...,0
4,Github billing bug displays multi-million doll...,1649894660,,programming,AnonymousSeeker5,/r/programming/comments/u3474m/github_billing_...,0
...,...,...,...,...,...,...,...
95,Advanced TypeScript: Type-Level Nested Object ...,1647941737,,programming,mauroerta,/r/programming/comments/tjz5d7/advanced_typesc...,0
96,Refinements in ruby,1647940797,,programming,lukrzrk,/r/programming/comments/tjyy3g/refinements_in_...,0
97,Top Programming Languages 2022 | Certification...,1647940784,,programming,No-Guess5763,/r/programming/comments/tjyy05/top_programming...,0
98,Learn How To Write A C++ App To Solve A 7 Diag...,1647939847,,programming,yimmasabi,/r/programming/comments/tjyqgf/learn_how_to_wr...,0


In [16]:
programming_posts2 = get_posts('programming',20,1647939223)

programming_posts2

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Building your Engineering Management Craft in ...,1647938981,,programming,cheerfulboy,/r/programming/comments/tjyj8y/building_your_e...,0
1,What is React js ? Why Hire React js developers,1647937182,,programming,Sifars-web-dev,/r/programming/comments/tjy52b/what_is_react_j...,0
2,PCSX2 - New Website,1647936737,,programming,RedDevilus,/r/programming/comments/tjy1ob/pcsx2_new_website/,0
3,Aggression,1647936196,,programming,Prestigious-Prize848,/r/programming/comments/tjxxf8/aggression/,0
4,The Evolution of AWS from a Cloud-Native Devel...,1647934883,,programming,puuut,/r/programming/comments/tjxmnd/the_evolution_o...,0
...,...,...,...,...,...,...,...
95,Check for number of available paths in an Obst...,1645824471,,programming,credoxyz,/r/programming/comments/t1f0sn/check_for_numbe...,0
96,GPU Programming in Fortran : Building a conser...,1645822892,,programming,fluid_numerics,/r/programming/comments/t1efkv/gpu_programming...,0
97,Curse Word Translater,1645822796,,programming,Opposite_Signature67,/r/programming/comments/t1edtq/curse_word_tran...,0
98,NFT,1645822700,,programming,Pleasant_System8231,/r/programming/comments/t1ecm3/nft/,0


In [19]:
programming_posts3 = get_posts('programming',20,1645820958)

programming_posts3

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Russia Sanctions May Spark Escalating Cyber Co...,1645818391,,programming,feross,/r/programming/comments/t1cnz0/russia_sanction...,0
1,"Python strings are immutable, but only sometimes",1645818239,,programming,pmz,/r/programming/comments/t1clxl/python_strings_...,0
2,Deploy Docker/Compose using Woodpecker CI,1645817611,,programming,Zaiden-Rhys1,/r/programming/comments/t1cdet/deploy_dockerco...,0
3,Dashboard design in wpf Tutorial,1645816571,,programming,IllustrationExpo,/r/programming/comments/t1bz91/dashboard_desig...,0
4,Genuine 300 Free Guest Posting Websites,1645815490,,programming,LatestFaq,/r/programming/comments/t1bjtv/genuine_300_fre...,0
...,...,...,...,...,...,...,...
95,Javascript Tetris,1643768052,,programming,thepan73,/r/programming/comments/sicn7y/javascript_tetris/,0
96,Learn How DoorDash built a No Code platform fo...,1643764603,,programming,kishore-guruswamy,/r/programming/comments/sibebu/learn_how_doord...,0
97,SPA WITHOUT FRAMEWORKS,1643761874,,programming,oppai_silverman,/r/programming/comments/siadjp/spa_without_fra...,0
98,A plain English description of monads without ...,1643760401,,programming,Chrisdone2,/r/programming/comments/si9tyy/a_plain_english...,0


In [20]:
#Combine the 3 DataFrames above into one megaframe
programming_posts_all = pd.concat([programming_posts1,programming_posts2,programming_posts3])
programming_posts_all

Unnamed: 0,title,created_utc,selftext,subreddit,author,permalink,num_comments
0,Algorithm Analysis,1649905566,,programming,xalg0rd,/r/programming/comments/u37mvo/algorithm_analy...,0
1,Deploy Seeker search augmented conversational ...,1649901284,,programming,louis030195,/r/programming/comments/u36b0m/deploy_seeker_s...,0
2,Algorithm Analysis – Data and File Structures,1649898352,,programming,xalg0rd,/r/programming/comments/u35dt7/algorithm_analy...,0
3,Is there an online API for Baking Textures?,1649896730,,programming,USMANHEART,/r/programming/comments/u34utm/is_there_an_onl...,0
4,Github billing bug displays multi-million doll...,1649894660,,programming,AnonymousSeeker5,/r/programming/comments/u3474m/github_billing_...,0
...,...,...,...,...,...,...,...
95,Javascript Tetris,1643768052,,programming,thepan73,/r/programming/comments/sicn7y/javascript_tetris/,0
96,Learn How DoorDash built a No Code platform fo...,1643764603,,programming,kishore-guruswamy,/r/programming/comments/sibebu/learn_how_doord...,0
97,SPA WITHOUT FRAMEWORKS,1643761874,,programming,oppai_silverman,/r/programming/comments/siadjp/spa_without_fra...,0
98,A plain English description of monads without ...,1643760401,,programming,Chrisdone2,/r/programming/comments/si9tyy/a_plain_english...,0


In [21]:
#save my combined DataFrame as a .csv file
programming_posts_all.to_csv('./datasets/programming_posts.csv',index=False)

## Grabbing comments
I'm going to run the same steps as above, this time pulling *comments* instead of *posts*. This requires a small change to the URL: the "submission" segment becomes "comment".

I also need to remove ['title','selftext','num_comments'] from the columns that I want to pull, and instead introduce ['body'] for the text body of each comment.
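Since the posts and comments pulls differ only in the endpoint and the columns kept, that difference can be captured in a small lookup. This is a sketch of one way to do it: `BASE`, `ENDPOINTS` and `request_spec` are hypothetical names, while the URL pattern (without the trailing `?subreddit=` query string) and the column lists come from this notebook.

```python
BASE = 'https://api.pushshift.io/reddit/search/{kind}/'

# columns kept for each kind of record
ENDPOINTS = {
    'submission': ['title', 'created_utc', 'selftext', 'subreddit',
                   'author', 'permalink', 'num_comments'],
    'comment': ['body', 'created_utc', 'subreddit', 'author', 'permalink'],
}

def request_spec(kind, subreddit, before, size=100):
    """Return (url, params, columns) for a 'submission' or 'comment' pull."""
    if kind not in ENDPOINTS:
        raise ValueError(f'kind must be one of {sorted(ENDPOINTS)}')
    params = {
        'subreddit': subreddit,
        'size': size,
        'lang': True,          # kept only because the notebook's requests send it
        'before': before,
    }
    return BASE.format(kind=kind), params, ENDPOINTS[kind]

url, params, columns = request_spec('comment', 'sysadmin', 1649908442)
print(url)
print(columns)
```

With a helper like this, a single function could take `kind` as an argument instead of keeping two near-identical copies.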

### Setting up the new function get_comments()

In [22]:
# this is a function to grab COMMENTS from Reddit (that is, not posts)
def get_comments(subreddit, n_iter, epoch_right_now):    #subreddit name, number of requests to make, epoch to start from

    #store base url variable (the subreddit is passed through params, so no query string here)
    base_url = 'https://api.pushshift.io/reddit/search/comment/'

    df_list = []                                  #instantiate empty list

    current_time = epoch_right_now                # save current epoch, used to iterate in reverse through time

    for _ in range(n_iter):                       # one request per iteration
        res = requests.get(                       #send the get request
            base_url,                              #requests.get() takes base_url and params
            params = {                             #parameters for get request
                'subreddit' : subreddit,           #specify subreddit
                'size' : 100,                      #specify number of comments to pull
                'lang' : True,                     # not sure what this does, but the request works with it
                'before' : current_time            # pull everything from current time backwards
            }
        )
        res.raise_for_status()                     # fail loudly on a non-200 response

        df = pd.DataFrame(res.json()['data'])      # take data from most recent request and store as DataFrame
        if df.empty:                               # nothing older than current_time, so stop early
            break
        df = df.loc[:, ['body',                    # pull specific columns from DataFrame for analysis
                        'created_utc',
                        'subreddit',
                        'author',
                        'permalink']]

        df_list.append(df)                         # append to the list of DataFrames

        time.sleep(.5)                             # add wait time between requests

        current_time = df['created_utc'].min()     # move the time counter back to the oldest comment in this pull

    return pd.concat(df_list,axis=0)

In [23]:
sysadmin_comments1 = get_comments('sysadmin',20,1649908442)

sysadmin_comments1

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,On my 15th (and hopefully final) year in the s...,1649908394,sysadmin,hatchikyu,/r/sysadmin/comments/u1yxbt/job_descriptions_t...
1,This is why a lot of us dont mention when we s...,1649908319,sysadmin,Aggravating_Refuse89,/r/sysadmin/comments/u2pr07/ceo_has_recently_s...
2,Yeah I’ve been through that - delayed start wa...,1649908315,sysadmin,M_Keating,/r/sysadmin/comments/u1q5q5/patch_tuesday_mega...
3,From a guy that started in finance and moved t...,1649908220,sysadmin,civbat,/r/sysadmin/comments/u36i7z/im_never_seeing_th...
4,In my experience a TAM is an account manager w...,1649908182,sysadmin,mrhoopers,/r/sysadmin/comments/u3524r/whelp_i_did_it/i4n...
...,...,...,...,...,...
95,"Late to the party here, but the CIS benchmarks...",1649864445,sysadmin,DP3Kevin,/r/sysadmin/comments/c9tu2c/google_chrome_gpo_...
96,You tell them you've picked up the ticket from...,1649864440,sysadmin,wakamoleo,/r/sysadmin/comments/u2ry8l/manager_on_my_case...
97,I do pretty much everything you just described...,1649864436,sysadmin,Fallingdamage,/r/sysadmin/comments/u2q06c/how_do_you_stage_u...
98,"""You're right, you don't need us, I will just ...",1649864418,sysadmin,codifier,/r/sysadmin/comments/u2pr07/ceo_has_recently_s...


In [24]:
sysadmin_comments2 = get_comments('sysadmin',20,1649864406)

sysadmin_comments2

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,ah it referenced the douchebag part my bad. I'...,1649864398,sysadmin,Realistic-Specific27,/r/sysadmin/comments/u2pr07/ceo_has_recently_s...
1,Our IT desk clearly works on how fast they clo...,1649864397,sysadmin,MintReach,/r/sysadmin/comments/u2pr07/ceo_has_recently_s...
2,And Chrome configuration is now in the Adminis...,1649864397,sysadmin,LucidAce,/r/sysadmin/comments/u2mkcf/hybrid_azure_ad_or...
3,And I can 100% guarantee you that CEO doesnt g...,1649864389,sysadmin,crankylinuxuser,/r/sysadmin/comments/u2pr07/ceo_has_recently_s...
4,Very possible. Just strange I can find pretty ...,1649864387,sysadmin,CyberoEXE,/r/sysadmin/comments/u2rquz/what_is_this_file_...
...,...,...,...,...,...
95,"Anything with ""Rockstar"" in the description. Y...",1649812167,sysadmin,c4ctus,/r/sysadmin/comments/u1yxbt/job_descriptions_t...
96,"Yes, you need to license the users for MFA. \n...",1649812128,sysadmin,bakerds,/r/sysadmin/comments/u2d3u4/azure_mfa_with_glo...
97,Automox is about the easiest patch management ...,1649812121,sysadmin,SubbiesForLife,/r/sysadmin/comments/u2cbr8/automox_resources/...
98,this is the plan yes,1649812119,sysadmin,haventmetyou,/r/sysadmin/comments/u2d3u4/azure_mfa_with_glo...


In [25]:
sysadmin_comments3 = get_comments('sysadmin',20,1649812105)

sysadmin_comments3

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Why? That won't fix anything...,1649812102,sysadmin,ccatlett1984,/r/sysadmin/comments/u2chfw/the_trust_relation...
1,I assume you're using Azure AD as a SAML provi...,1649812081,sysadmin,The_Tolkien_BlackGuy,/r/sysadmin/comments/u2d3u4/azure_mfa_with_glo...
2,What are some green flags?,1649812076,sysadmin,zuckerberghandjob,/r/sysadmin/comments/u1yxbt/job_descriptions_t...
3,Only need cached LOCAL admin creds.....\n\nA s...,1649812062,sysadmin,ccatlett1984,/r/sysadmin/comments/u2chfw/the_trust_relation...
4,Define - attain. It is menial to get a hashed ...,1649812009,sysadmin,DarkEmblem5736,/r/sysadmin/comments/u29ys9/does_anyone_actual...
...,...,...,...,...,...
95,that sounds like an HR issue to me,1649776211,sysadmin,CommadorVic20,/r/sysadmin/comments/u1yxbt/job_descriptions_t...
96,"My band's tip jar alternates between:\n\n""Tipp...",1649776209,sysadmin,Recalcitrant-wino,/r/sysadmin/comments/u1w4sl/looking_for_onelin...
97,How do you think this will impact part-time cr...,1649776194,sysadmin,ragglefrag,/r/sysadmin/comments/u1yf3b/windows_11_will_ne...
98,that sounds like an HR issue to me,1649776191,sysadmin,CommadorVic20,/r/sysadmin/comments/u1yxbt/job_descriptions_t...


In [26]:
sysadmin_comments4 = get_comments('sysadmin',20,1649776158)

sysadmin_comments4

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Not quite sure if it would meet your needs pro...,1649776156,sysadmin,Dimensional_Dragon,/r/sysadmin/comments/u216vb/teams_rooms_video_...
1,Android Enterprise has been wonky for me since...,1649776155,sysadmin,lemetatron,/r/sysadmin/comments/u1xu1i/android_enterprise...
2,Unlimited vacation = no vacation because you'l...,1649776129,sysadmin,courtarro,/r/sysadmin/comments/u1yxbt/job_descriptions_t...
3,Many Kyocera printers have ceramic drums which...,1649776124,sysadmin,bigbearandy,/r/sysadmin/comments/u1yjpi/a_printer_that_jus...
4,"Got it, we have no need for sites. I'm glad it...",1649776094,sysadmin,TheMerovingian,/r/sysadmin/comments/u20ix1/customer_phones_di...
...,...,...,...,...,...
95,Call them out on Twitter. Easy to do and may g...,1649717110,sysadmin,bitslammer,/r/sysadmin/comments/u1j9t3/any_contacts_at_gr...
96,I'm on confluence cloud...it's a mess. There's...,1649717105,sysadmin,colddream40,/r/sysadmin/comments/u01uia/atlassian_is_still...
97,Californian here. What state are you in? So I ...,1649717091,sysadmin,Martian9576,/r/sysadmin/comments/u15l5m/fortune_says_remot...
98,I can use Where-Object and regex to filter dow...,1649717089,sysadmin,world_gone_nuts,/r/sysadmin/comments/u1jc10/removing_trailing_...


In [27]:
sysadmin_comments5 = get_comments('sysadmin',20,1649717057)

sysadmin_comments5

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,Probably a script that had something like:\n\n...,1649717049,sysadmin,imral,/r/sysadmin/comments/u14qqq/atlassian_just_gav...
1,That is the tricky part. I'd start by taking n...,1649717045,sysadmin,NotYourNanny,/r/sysadmin/comments/u1i7m4/personal_hygiene_w...
2,&gt;This article is sponsored by people with a...,1649716905,sysadmin,slopos,/r/sysadmin/comments/u15l5m/fortune_says_remot...
3,An acquaintance worked for Novell at that time...,1649716867,sysadmin,superlativedave,/r/sysadmin/comments/u15l5m/fortune_says_remot...
4,On prem Exchange or Exchange Online? No to the...,1649716839,sysadmin,ThePirate417,/r/sysadmin/comments/u1i3ap/looking_for_a_way_...
...,...,...,...,...,...
95,Early-days Atlassian had a strong appeal - the...,1649684953,sysadmin,ShillionaireMorty,/r/sysadmin/comments/u14qqq/atlassian_just_gav...
96,For your comment on the disconnected RDP sessi...,1649684950,sysadmin,Pirated_Freeware,/r/sysadmin/comments/u15k3i/security_cadence_k...
97,"Once your body adjusts to caffeine, coke, mdma...",1649684901,sysadmin,Isord,/r/sysadmin/comments/u14qqq/atlassian_just_gav...
98,I wonder if the stats for salary exempt and sa...,1649684900,sysadmin,Hangikjot,/r/sysadmin/comments/u0xdxw/entitled_users_on_...


In [28]:
# concatenate into one big DataFrame
sysadmin_comments_all = pd.concat([sysadmin_comments1,sysadmin_comments2,sysadmin_comments3,sysadmin_comments4,sysadmin_comments5])
sysadmin_comments_all

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,On my 15th (and hopefully final) year in the s...,1649908394,sysadmin,hatchikyu,/r/sysadmin/comments/u1yxbt/job_descriptions_t...
1,This is why a lot of us dont mention when we s...,1649908319,sysadmin,Aggravating_Refuse89,/r/sysadmin/comments/u2pr07/ceo_has_recently_s...
2,Yeah I’ve been through that - delayed start wa...,1649908315,sysadmin,M_Keating,/r/sysadmin/comments/u1q5q5/patch_tuesday_mega...
3,From a guy that started in finance and moved t...,1649908220,sysadmin,civbat,/r/sysadmin/comments/u36i7z/im_never_seeing_th...
4,In my experience a TAM is an account manager w...,1649908182,sysadmin,mrhoopers,/r/sysadmin/comments/u3524r/whelp_i_did_it/i4n...
...,...,...,...,...,...
95,Early-days Atlassian had a strong appeal - the...,1649684953,sysadmin,ShillionaireMorty,/r/sysadmin/comments/u14qqq/atlassian_just_gav...
96,For your comment on the disconnected RDP sessi...,1649684950,sysadmin,Pirated_Freeware,/r/sysadmin/comments/u15k3i/security_cadence_k...
97,"Once your body adjusts to caffeine, coke, mdma...",1649684901,sysadmin,Isord,/r/sysadmin/comments/u14qqq/atlassian_just_gav...
98,I wonder if the stats for salary exempt and sa...,1649684900,sysadmin,Hangikjot,/r/sysadmin/comments/u0xdxw/entitled_users_on_...


In [29]:
sysadmin_comments_all.to_csv('./datasets/sysadmin_comments.csv',index=False)

### Rinse and repeat for r/programming comments

In [30]:
programming_comments1 = get_comments('programming',20,1649908442)

programming_comments1

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,"From my experience, the more managers use the ...",1649908434,programming,d4rkwing,/r/programming/comments/u304hd/agile_and_the_l...
1,Big agree. I've been really struggling at my l...,1649908360,programming,Neurotrace,/r/programming/comments/u304hd/agile_and_the_l...
2,My Jenkins docker image running with two docke...,1649908048,programming,HorseRadish98,/r/programming/comments/u3474m/github_billing_...
3,"Ah that’s part of the issue, no matter what I ...",1649907978,programming,aloha2436,/r/programming/comments/u304hd/agile_and_the_l...
4,"Jenkins recommends two hosts, a jenkins master...",1649907973,programming,bastardoperator,/r/programming/comments/u3474m/github_billing_...
...,...,...,...,...,...
94,Life lesson: if faced with the choice of delet...,1649777974,programming,ambientocclusion,/r/programming/comments/u1qzku/at_last_atlassi...
95,Thanks. AMP is cancer.,1649777973,programming,AttackOfTheThumbs,/r/programming/comments/u1qzku/at_last_atlassi...
96,Can you at least export names into a text and ...,1649777962,programming,zxr7,/r/programming/comments/u1xb0b/do_you_use_back...
97,Exactly.,1649777949,programming,ScientificBeastMode,/r/programming/comments/u182io/there_should_be...


In [31]:
programming_comments2 = get_comments('programming',20,1649777934)

programming_comments2

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,"Yeah, any type though practically always with ...",1649777877,programming,Dealiner,/r/programming/comments/u1xb0b/do_you_use_back...
1,So tell me any other backend/language then tha...,1649777803,programming,KaiAusBerlin,/r/programming/comments/u1o5km/rise_in_npm_pro...
2,I don’t think anyone is arguing in favor of 50...,1649777760,programming,ScientificBeastMode,/r/programming/comments/u1kk70/in_defense_of_s...
3,deja-roo reads like an newly-intermediate deve...,1649777720,programming,topological_rabbit,/r/programming/comments/u1qzku/at_last_atlassi...
4,"It is not too bad now, but still has issues. I...",1649777712,programming,AttackOfTheThumbs,/r/programming/comments/u1qzku/at_last_atlassi...
...,...,...,...,...,...
95,"Whoa, they want a diploma?!",1649603413,programming,cleeder,/r/programming/comments/u01jf1/github_can_now_...
96,Alternatively: _Just don't do code reviews if ...,1649603402,programming,User-Not-Found-Here,/r/programming/comments/u0c48n/legacy_is_where...
97,I've built my career as a consultant on the op...,1649603341,programming,turudd,/r/programming/comments/u0c48n/legacy_is_where...
98,"Fuck yea brother, let's send it down to O'Blan...",1649603288,programming,darn42,/r/programming/comments/tzopru/solid_principle...


In [32]:
programming_comments3 = get_comments('programming',20,1649603281)

programming_comments3

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,In the real world 2 years is practically brand...,1649603275,programming,PancAshAsh,/r/programming/comments/u01jf1/github_can_now_...
1,What is missing is explaining WHY engineers ch...,1649603235,programming,Stoomba,/r/programming/comments/u0c48n/legacy_is_where...
2,&gt; You can't do the same for individual serv...,1649603192,programming,BenjiSponge,/r/programming/comments/u01jf1/github_can_now_...
3,Is there any description of the issue? There h...,1649603084,programming,josefx,/r/programming/comments/u0gyv1/chrome_c_lock_a...
4,And you're arguing for an unproven silver bull...,1649603005,programming,darn42,/r/programming/comments/tzopru/solid_principle...
...,...,...,...,...,...
94,Nooo GPL bad because corpo can't use it.,1649425271,programming,Beneficial_Topic_667,/r/programming/comments/tyvzy2/modified_agplv3...
95,...as opposed to have their hobby projects get...,1649425239,programming,Beneficial_Topic_667,/r/programming/comments/tyvzy2/modified_agplv3...
96,"Well, you only need to be as pessimistic about...",1649425176,programming,okgofigure85,/r/programming/comments/tyenxe/we_struggled_wi...
97,"This is how it works in the liberal arts, too....",1649425139,programming,rwhitisissle,/r/programming/comments/tyr21g/you_should_be_r...


In [33]:
programming_comments4 = get_comments('programming',20,1649425126)

programming_comments4

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,If you're complaining others use copyleft lice...,1649425122,programming,Beneficial_Topic_667,/r/programming/comments/tyvzy2/modified_agplv3...
1,The one thing these articles mis is why I shou...,1649425121,programming,The_Krambambulist,/r/programming/comments/tyr21g/you_should_be_r...
2,Does it not run on Linux?,1649425070,programming,Funny_Willingness433,/r/programming/comments/tyz4jh/zas_editor/i3wd...
3,They're not.\n\n(Unless you mean ulimit -- whi...,1649424959,programming,case-o-nuts,/r/programming/comments/txqvx1/jep_425_virtual...
4,It would let you get rid of the hacks involved...,1649424897,programming,okgofigure85,/r/programming/comments/tyenxe/we_struggled_wi...
...,...,...,...,...,...
95,C is better than both.,1649222748,programming,PuzzleheadedWeb9876,/r/programming/comments/twtcst/comparing_go_vs...
96,"Alright, I get _that_ reference.\n\nFacepalmed...",1649222132,programming,omegafivethreefive,/r/programming/comments/tx1jj9/github_can_now_...
97,"fun fact, keys are already base64-ed in framew...",1649222119,programming,boxonpox,/r/programming/comments/tx1jj9/github_can_now_...
98,Or nobody thought it was very funny.,1649222076,programming,Practical_Cartoonist,/r/programming/comments/tx1jj9/github_can_now_...


In [34]:
programming_comments5 = get_comments('programming',20,1649221674)

programming_comments5

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,I could do about two minutes of that.,1649221270,programming,Zaemz,/r/programming/comments/tx1jj9/github_can_now_...
1,You’re not building two ways to do anything un...,1649221096,programming,Shivalicious,/r/programming/comments/tw3spd/make_beautifull...
2,"&gt; Similarly, the HTML solution only works w...",1649220977,programming,Shivalicious,/r/programming/comments/tw3spd/make_beautifull...
3,"It's not, it's a reference to [this](https://a...",1649220835,programming,noratat,/r/programming/comments/tx1jj9/github_can_now_...
4,"it's a form of illegal hacking, like clicking ...",1649220265,programming,nickcash,/r/programming/comments/tx1jj9/github_can_now_...
...,...,...,...,...,...
95,Mainly how to evaluate programming books. I wo...,1649005450,programming,Ralumier,/r/programming/comments/tv6hh4/whats_the_gener...
96,"Joke's on you, I'm gonna start a band called ""...",1649005321,programming,ilovetacos,/r/programming/comments/tv9atw/horrible_edge_c...
97,The idea that a developer can ever fully under...,1649005239,programming,chillermane,/r/programming/comments/tvcn38/bad_developers_...
98,&gt; Do people not get the word sometimes? \n\...,1649005027,programming,alcohol_enthusiast_,/r/programming/comments/tuzr2r/wordle_is_nphar...
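
The five cells above repeat the same step by hand: fetch a batch, read off an older timestamp, and pass it to the next call. A loop could do the same thing. The sketch below is only an illustration under assumptions: `collect_batches` and `fake_fetch` are hypothetical names that are not part of this notebook, and `fake_fetch` stands in for the real API call so the example runs offline. It also assumes each batch is a list of dicts with a `created_utc` key.

```python
import time

def collect_batches(fetch, start_before, n_batches, pause=0):
    """Call fetch(before) up to n_batches times, moving `before` back to
    the oldest created_utc seen in each batch."""
    batches = []
    before = start_before
    for _ in range(n_batches):
        batch = fetch(before)      # list of dicts, each with 'created_utc'
        if not batch:
            break                  # nothing older came back, so stop early
        batches.append(batch)
        before = min(row['created_utc'] for row in batch)
        time.sleep(pause)          # optional wait between requests
    return batches


# Hypothetical stand-in for the API: returns up to 3 fake comments
# with timestamps strictly older than `before`.
def fake_fetch(before):
    return [{'created_utc': t} for t in range(before - 1, max(before - 4, 0), -1)]


batches = collect_batches(fake_fetch, start_before=10, n_batches=5)
```

In this notebook the `fetch` argument might be something like `lambda before: get_comments('programming', 20, before).to_dict('records')`, assuming the third argument of `get_comments` is the `before` cutoff.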


In [35]:
#combine into a mega-DF
programming_comments_all = pd.concat([programming_comments1,programming_comments2,programming_comments3,
                                      programming_comments4,programming_comments5])
programming_comments_all

Unnamed: 0,body,created_utc,subreddit,author,permalink
0,"From my experience, the more managers use the ...",1649908434,programming,d4rkwing,/r/programming/comments/u304hd/agile_and_the_l...
1,Big agree. I've been really struggling at my l...,1649908360,programming,Neurotrace,/r/programming/comments/u304hd/agile_and_the_l...
2,My Jenkins docker image running with two docke...,1649908048,programming,HorseRadish98,/r/programming/comments/u3474m/github_billing_...
3,"Ah that’s part of the issue, no matter what I ...",1649907978,programming,aloha2436,/r/programming/comments/u304hd/agile_and_the_l...
4,"Jenkins recommends two hosts, a jenkins master...",1649907973,programming,bastardoperator,/r/programming/comments/u3474m/github_billing_...
...,...,...,...,...,...
95,Mainly how to evaluate programming books. I wo...,1649005450,programming,Ralumier,/r/programming/comments/tv6hh4/whats_the_gener...
96,"Joke's on you, I'm gonna start a band called ""...",1649005321,programming,ilovetacos,/r/programming/comments/tv9atw/horrible_edge_c...
97,The idea that a developer can ever fully under...,1649005239,programming,chillermane,/r/programming/comments/tvcn38/bad_developers_...
98,&gt; Do people not get the word sometimes? \n\...,1649005027,programming,alcohol_enthusiast_,/r/programming/comments/tuzr2r/wordle_is_nphar...
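
The combined output above starts at row label 0 and ends at 98, so each batch kept its own labels and they repeat across the five batches. That does not affect a .csv written with `index=False`, but it can surprise later code that selects rows by label. The sketch below shows one possible way to get a fresh index and drop repeated comments. The two small DataFrames are made-up stand-ins, and using `permalink` as the unique key is my assumption, not something the notebook checks.

```python
import pandas as pd

# Two tiny stand-in batches; the second repeats one comment from the first.
batch1 = pd.DataFrame({'permalink': ['/a', '/b'], 'created_utc': [30, 20]})
batch2 = pd.DataFrame({'permalink': ['/b', '/c'], 'created_utc': [20, 10]})

combined = (
    pd.concat([batch1, batch2], ignore_index=True)  # fresh 0..n-1 row labels
      .drop_duplicates(subset='permalink')          # keep the first copy of each comment
      .reset_index(drop=True)                       # close the gaps left by dropped rows
)
```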


In [36]:
#save the combined comments to a .csv file for later use
programming_comments_all.to_csv('./datasets/programming_comments.csv',index=False)
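
`to_csv` does not create missing folders, so the cell above only works if `./datasets` already exists. Below is a minimal sketch of creating the folder first and reading the file back as a quick check. It writes a one-row stand-in DataFrame to a temporary folder rather than to the real project path, so both the data and the location are placeholders.

```python
import os
import tempfile
import pandas as pd

# One-row stand-in for the combined comments DataFrame.
df = pd.DataFrame({'body': ['example'], 'created_utc': [1649908434]})

base = tempfile.mkdtemp()                    # placeholder for the project folder
out_dir = os.path.join(base, 'datasets')
os.makedirs(out_dir, exist_ok=True)          # no error if the folder already exists

out_path = os.path.join(out_dir, 'programming_comments.csv')
df.to_csv(out_path, index=False)

# Read the file back to confirm the rows and columns survived the round trip.
check = pd.read_csv(out_path)
```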