## The Global Warming Issue and the Narratives Around It
### Part 1: Pulling data from a web API using Pushshift

In this notebook, I import posts from the following subreddits:

1. [GlobalWarming](https://www.reddit.com/r/GlobalWarming/): 6.2k subscribers (created on May 28, 2008).
2. [ConspiracyTheory](https://www.reddit.com/r/ConspiracyTheory/): 3.7k subscribers (created on Dec 11, 2009).
    
- **Problem statement:** Users in the **[GlobalWarming]** subreddit are generally concerned about global warming, while users in the **[ConspiracyTheory]** subreddit are more interested in raising conspiracy theories. These contrasting viewpoints may make a good binary classification target for training an NLP model.

 - The imported posts are converted to DataFrames and then saved into the `../datasets` folder for further processing.

Importing the required libraries:

- I built a general function for pulling Reddit posts, `get_reddit_posts`, stored in `get_reddit_posts.py`.<br>
    (The Pushshift API is used to pull the Reddit posts.)


In [1]:
#imports
import requests
import pandas as pd
import time

# Import the custom helper function; its folder must first be added to the module search path
# Inspiration: https://stackoverflow.com/questions/4383571/importing-files-from-different-folder

import sys
# Insert the assets directory into the module search path
sys.path.insert(1, '../assets')

from get_reddit_posts import get_reddit_posts

import pickle
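
The helper `get_reddit_posts` lives in `../assets/get_reddit_posts.py` and is not reproduced in this notebook. To give a rough idea of how such a helper could work, here is a minimal sketch, assuming the Pushshift submission search endpoint with its `subreddit`, `before`, and `size` parameters. The names `fetch_page` and `fetch_posts_sketch`, the `fetch` argument, and the loop itself are illustrative assumptions, not the actual implementation. The HTTP call uses only the standard library so the sketch runs without extra dependencies.

```python
import json
import time
import urllib.parse
import urllib.request

# Pushshift submission search endpoint (assumed to be the one the helper queries)
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission"


def fetch_page(subreddit, before, size):
    """Request one page of posts created before `before` (epoch seconds)."""
    query = urllib.parse.urlencode(
        {"subreddit": subreddit, "before": before, "size": size}
    )
    with urllib.request.urlopen(f"{PUSHSHIFT_URL}?{query}", timeout=30) as res:
        return json.load(res)["data"]


def fetch_posts_sketch(subreddit, post_num, before, limit=100, wait=1, fetch=fetch_page):
    """Hypothetical paginated pull: walk backwards in time, `limit` posts per request."""
    posts = []
    while len(posts) < post_num:
        page = fetch(subreddit, before, min(limit, post_num - len(posts)))
        if not page:  # no older posts are available
            break
        posts.extend(page)
        # The next request starts just before the oldest post seen so far
        before = min(post["created_utc"] for post in page)
        time.sleep(wait)  # pause between requests to stay within the API limits
    return posts
```

The returned list of dictionaries could then be turned into a DataFrame with `pd.DataFrame(posts)`. Passing the page-fetching function as an argument is a design choice of this sketch: it lets the pagination logic be tried out without touching the network.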

---

### Part 1.1. Pulling the GlobalWarming posts

Calling the helper function to pull the GlobalWarming data:

In [2]:

# Define the initial parameters for the API pull:

par = {"subreddit": "GlobalWarming",  # The subreddit name
       "post_num": 4000,  # Number of posts to pull
       "time_1": int(time.mktime(time.strptime('1 July, 2020', '%d %B, %Y'))),  # Pull posts created before this time (epoch seconds)
       "API_limit": 100,  # Maximum number of posts per API request
       "API_wait": 1  # Wait time (seconds) before the next request
      }



climate_change = get_reddit_posts(par["subreddit"], par["post_num"], par["time_1"], par["API_limit"], par["API_wait"])

100 posts downloaded, oldest post:2020-04-28 16:58:48 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2020-03-03 03:02:47 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2020-01-23 13:16:14 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2020-01-02 05:23:33 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-11-26 18:31:35 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-10-20 17:46:56 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-09-24 10:26:43 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-08-30 12:20:26 - status code: 200, now waiting 1 seconds before next pull. Pati
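
A note on `time_1`: `time.mktime` interprets the parsed date in the local time zone of the machine running the notebook, so the cutoff can shift by a few hours from one machine to another. If a reproducible cutoff matters, one option is to build the epoch value in UTC with `calendar.timegm`. A minimal sketch (the variable names are illustrative and not used elsewhere in this notebook):

```python
import calendar
import time

cutoff = time.strptime("1 July, 2020", "%d %B, %Y")

# Local-time interpretation, as used in this notebook (depends on the machine's time zone)
local_epoch = int(time.mktime(cutoff))

# UTC interpretation, identical on every machine
utc_epoch = calendar.timegm(cutoff)

print(local_epoch, utc_epoch)
```

The two values differ by the local UTC offset, so the choice only affects which posts fall right at the edge of the cutoff.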

In [3]:
climate_change.head(5)

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,...,author_id,brand_safe,approved_at_utc,banned_at_utc,suggested_sort,view_count,author_created_utc,distinguished,mod_reports,user_reports
0,[],False,Kafka15,,[],,text,t2_2hd4z3br,False,False,...,,,,,,,,,,
1,[],False,karan_negiiiii,,[],,text,t2_5e3k31xp,False,False,...,,,,,,,,,,
2,[],False,Hildavardr,,[],,text,t2_73p53o6w,False,False,...,,,,,,,,,,
3,[],False,pEppapiGistfuhrer,,[],,text,t2_41l09klf,False,False,...,,,,,,,,,,
4,[],False,BrexitBlaze,,[],,text,t2_2v56rgmf,False,True,...,,,,,,,,,,


In [4]:
climate_change.shape

(3934, 92)

Saving the DataFrame to a CSV file:

In [5]:
file_path = "../datasets/" + par["subreddit"] + "_raw" + ".csv"
climate_change.to_csv(file_path)
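
By default, `to_csv` also writes the DataFrame index as an unnamed first column, which appears as `Unnamed: 0` when the file is read back with `read_csv`. The sketch below uses a toy DataFrame (made-up data, for illustration only) to show the effect and how `index=False` avoids it, in case the later notebooks prefer not to carry that extra column:

```python
import io

import pandas as pd

# Toy stand-in for the pulled posts (made-up rows)
toy = pd.DataFrame({"author": ["a", "b"], "title": ["t1", "t2"]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_default = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_no_index = list(pd.read_csv(without_index).columns)

print(cols_default)
print(cols_no_index)
```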

---

### Part 1.2. Pulling the ConspiracyTheory posts

Calling the helper function to pull the ConspiracyTheory data:

In [6]:
# Define the initial parameters for the API pull:

par = {"subreddit": "ConspiracyTheory",  # The subreddit name
       "post_num": 900,  # Number of posts to pull (this subreddit has a limited number of posts)
       "time_1": int(time.mktime(time.strptime('1 July, 2020', '%d %B, %Y'))),  # Pull posts created before this time (epoch seconds)
       "API_limit": 100,  # Maximum number of posts per API request
       "API_wait": 1  # Wait time (seconds) before the next request
      }



cons_theory = get_reddit_posts(par["subreddit"], par["post_num"], par["time_1"], par["API_limit"], par["API_wait"])

100 posts downloaded, oldest post:2019-11-12 15:09:47 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-09-02 20:00:45 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-07-14 17:57:30 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-05-28 07:49:16 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2019-03-16 17:29:32 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2018-08-10 14:35:41 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2017-11-10 06:12:43 - status code: 200, now waiting 1 seconds before next pull. Patience...
100 posts downloaded, oldest post:2015-08-25 11:30:25 - status code: 200, now waiting 1 seconds before next pull. Pati

In [7]:
cons_theory.head(5)

Unnamed: 0,all_awardings,allow_live_comments,author,author_flair_css_class,author_flair_richtext,author_flair_text,author_flair_type,author_fullname,author_patreon_flair,author_premium,...,banned_at_utc,parent_whitelist_status,suggested_sort,view_count,whitelist_status,author_created_utc,banned_by,mod_reports,user_reports,distinguished
0,[],False,LonelyHampster,,[],,text,t2_30uaspqv,False,False,...,,,,,,,,,,
1,[],False,Switchkillengaged,,[],,text,t2_wgxq1f6,False,False,...,,,,,,,,,,
2,[],False,makiababi,,[],,text,t2_1getuotr,False,False,...,,,,,,,,,,
3,[],False,Raven9nine9,,[],,text,t2_1g7arfuy,False,False,...,,,,,,,,,,
4,[],False,finnagains,,[],,text,t2_14267pk,False,False,...,,,,,,,,,,


In [8]:
cons_theory.shape

(894, 86)
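
The two pulls do not have the same number of columns (92 for GlobalWarming, 86 for ConspiracyTheory). When the two sets are later combined for the binary classification described in the problem statement, one possible approach is to keep only the columns they share and add a label column. The sketch below uses toy DataFrames with made-up rows; the column name `target` and the 1/0 encoding are assumptions for illustration, not choices made in this notebook:

```python
import pandas as pd

# Toy stand-ins for climate_change and cons_theory (made-up rows and columns)
climate_toy = pd.DataFrame(
    {"title": ["t1", "t2"], "selftext": ["a", "b"], "view_count": [None, None]}
)
cons_toy = pd.DataFrame({"title": ["t3"], "selftext": ["c"], "banned_by": [None]})

# Keep only the columns present in both pulls
shared = climate_toy.columns.intersection(cons_toy.columns)

# Label each source and stack the rows into one DataFrame
combined = pd.concat(
    [climate_toy[shared].assign(target=1), cons_toy[shared].assign(target=0)],
    ignore_index=True,
)

print(combined.shape)
```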

Saving the DataFrame to a CSV file:

In [9]:
file_path = "../datasets/" + par["subreddit"] + "_raw" + ".csv"
cons_theory.to_csv(file_path)

Now the data is ready for the next steps.