# PART 1 - Reddit Data Collection
This notebook collects data from 'r/india' through the Reddit API, using the PRAW (Python Reddit API Wrapper) library in Python. The fields to collect were chosen by the author and are listed in section f.

## a) Importing Required Libraries
The following libraries were used to scrape data from Reddit:

In [3]:
import praw
import pandas as pd
import csv
import os

## b) Extracting Credentials
1. The code below reads the required credentials from a file of environment variables (.env).
2. The credentials are necessary to create a Reddit object.
3. It uses the `dotenv` library (the python-dotenv package) to load the environment variables.
4. This step is a precaution against misuse of the credentials: they stay in the .env file, which is not visible to the reader.
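
As a minimal sketch of the precaution described above, the check below confirms that every expected variable is actually set before the Reddit object is built. The helper name `missing_credentials` is hypothetical and not part of the original notebook; the variable names match those the notebook reads. It assumes a `.env` file is plain text with one `NAME=value` pair per line, already loaded into the environment.

```python
import os

# Names of the variables the notebook reads from the .env file.
REQUIRED_VARS = ["CLIENT_ID", "CLIENT_SECRET", "USER_AGENT", "USER_NAME", "PASSWORD"]


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name)]


# Example with a stand-in mapping instead of the real environment.
example_env = {"CLIENT_ID": "abc", "CLIENT_SECRET": "", "USER_AGENT": "demo"}
print(missing_credentials(example_env))  # ['CLIENT_SECRET', 'USER_NAME', 'PASSWORD']
```

An empty list means all credentials are present; anything else names the entries to add to the .env file.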

In [4]:
os.chdir('/Users/varishgrover/desktop')  # change the working directory to where the .env file lives
from dotenv import load_dotenv
load_dotenv()  # load the variables from .env into the environment

CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
USER_AGENT = os.getenv("USER_AGENT")
USER_NAME = os.getenv("USER_NAME")
PASSWORD = os.getenv("PASSWORD")

## c) Creating a Reddit Object
Create a Reddit object from the extracted credentials; it is used later to access data from Reddit.

In [5]:
# praw.Reddit expects the keyword `username` (not `user_name`) for password-based login
redditObject = praw.Reddit(client_id=CLIENT_ID, client_secret=CLIENT_SECRET, user_agent=USER_AGENT,
                           username=USER_NAME, password=PASSWORD)

## d) Creating a Subreddit Object
1. Create a Subreddit object from the Reddit object.
2. The subreddit used is 'r/india'.

In [6]:
subredditObject = redditObject.subreddit("india")  # scraping data from subreddit --> r/india
newSubredditObject = subredditObject.new(limit=60)
# Scraping the 60 latest entries from the "new" section of r/india. The limit was chosen by the author
# and can be modified easily.
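
The object returned by `new()` is a lazy listing that fetches posts as it is iterated and, like other Python iterators, is expected to be walkable only once. The sketch below uses a plain Python generator as a stand-in (no Reddit access; `fake_listing` is a hypothetical name) to show the consequence: any item pulled with `next()` is no longer available to a later `for` loop.

```python
def fake_listing(limit):
    """Stand-in for a lazy listing: yields post labels one at a time."""
    for n in range(limit):
        yield f"post-{n}"


listing = fake_listing(5)

first = next(listing)        # consumes the first item
remaining = list(listing)    # only the items that are left

print(first)           # post-0
print(len(remaining))  # 4
print(list(listing))   # [] (the generator is now exhausted)
```

If the listing is needed more than once, either request it again from the subreddit or store the results with `list(...)` first.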

## e) Optional - Accessing Available Parameters for the Subreddit

In [7]:
# To see which attributes can be scraped from a submission, dir() lists everything available on it.
# A separate one-item listing is requested here so that no entry of newSubredditObject is consumed.

dir(next(subredditObject.new(limit=1)))

['STR_FIELD',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_chunk',
 '_comments_by_id',
 '_fetch',
 '_fetch_data',
 '_fetch_info',
 '_fetched',
 '_kind',
 '_reddit',
 '_reset_attributes',
 '_safely_add_arguments',
 '_url_parts',
 '_vote',
 'all_awardings',
 'allow_live_comments',
 'approved_at_utc',
 'approved_by',
 'archived',
 'author',
 'author_flair_background_color',
 'author_flair_css_class',
 'author_flair_richtext',
 'author_flair_template_id',
 'author_flair_text',
 'author_flair_text_color',
 'author_flair_type',
 'author_fullname',
 'author_patreon_flair',
 'author_premium',
 'awarders',
 'banned_at_utc',
 'banned_by',
 'can_gild',
 'can_

## f) Reading from Reddit and Writing to CSV

##### The data scraped from Reddit is written to a .csv file, "karish.csv".


#### The following parameters were chosen to be scraped:

1. 'Num Comments' - The number of comments on the entry (post)
2. 'Ups' - The number of upvotes on the entry
3. 'Downs' - The number of downvotes on the entry
4. 'Upvote Ratio' - The share of upvotes among all votes on the entry
5. 'Self' - Whether the entry is a self (text) post rather than a shared link
6. 'Link Flair Text' - The flair assigned to the post
7. 'Link Flair Text Color' - The flair text color (dark/light)
8. 'Link Flair Background Color' - The background color of the flair (hex code)
9. 'Thumbnail Height' - The height of the thumbnail picture
10. 'Author Flair Text Color' - The text color of the author's flair
11. 'TITLE' - The title of the entry
12. 'URL' - The URL of the entry
13. 'ID' - The ID of the entry
14. 'AUTHOR' - The author of the entry
15. 'Author Flair Text' - The text of the author's flair

*** More data can easily be scraped from Reddit by including more parameters.

In [8]:
os.chdir('/Users/varishgrover/Desktop')
with open('karish.csv', 'w', newline='') as f:
    myWriter = csv.writer(f)
    myWriter.writerow(['TITLE', 'URL', 'ID', 'AUTHOR', 'Num Comments', 'Ups', 'Downs', 'Link Flair Text',
                       'Link Flair Text Color', 'Link Flair Background Color', 'Author Flair Text',
                       'Author Flair Text Color', 'Thumbnail Height', 'Self', 'Upvote Ratio'])
    for i in newSubredditObject:
        myWriter.writerow([i.title, i.url, i.id, i.author, i.num_comments, i.ups, i.downs, i.link_flair_text,
                           i.link_flair_text_color, i.link_flair_background_color, i.author_flair_text,
                           i.author_flair_text_color, i.thumbnail_height, i.is_self, i.upvote_ratio])
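
One way to make adding parameters easy is to keep a single mapping from CSV column name to submission attribute and build both the header and each row from it. This is a sketch rather than the notebook's own method: the `COLUMNS` mapping, the `write_posts` helper, and the `SimpleNamespace` stand-ins for submissions are all assumptions for illustration.

```python
import csv
import io
from types import SimpleNamespace

# Column header -> attribute name on the submission object (subset shown).
COLUMNS = {
    "TITLE": "title",
    "ID": "id",
    "Num Comments": "num_comments",
    "Upvote Ratio": "upvote_ratio",
}


def write_posts(posts, handle, columns=COLUMNS):
    """Write one header row, then one row per post; return rows written."""
    writer = csv.writer(handle)
    writer.writerow(columns.keys())
    count = 0
    for post in posts:
        # getattr with a default keeps the row aligned if an attribute is absent.
        writer.writerow([getattr(post, attr, None) for attr in columns.values()])
        count += 1
    return count


# Stand-in objects; real code would pass the PRAW listing instead.
sample = [
    SimpleNamespace(title="First post", id="a1", num_comments=3, upvote_ratio=0.9),
    SimpleNamespace(title="Second, with comma", id="b2", num_comments=0),
]
buffer = io.StringIO()
written = write_posts(sample, buffer)
print(buffer.getvalue())
```

With this layout, scraping an extra field means adding one entry to `COLUMNS`; the header and the rows cannot drift out of order.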

## g) Loading into a pandas DataFrame
The CSV file created above is now read into a pandas DataFrame.

In [9]:
work = pd.read_csv('karish.csv')
work.head()  # first 5 entries of the scraped data

Unnamed: 0,TITLE,URL,ID,AUTHOR,Num Comments,Ups,Downs,Link Flair Text,Link Flair Text Color,Link Flair Background Color,Author Flair Text,Author Flair Text Color,Thumbnail Height,Self,Upvote Ratio
0,Ayesha Takia and husband Farhan Azmi turn thei...,https://www.indiatoday.in/movies/celebrities/s...,g2ya0v,ReallyRedditLover,0,1,0,Coronavirus,dark,,,,78.0,False,1.0
1,"Coronaviru Update: Migrants In Mumbai, Stuck B...",https://www.ndtv.com/mumbai-news/coronaviru-up...,g2y9or,ReallyRedditLover,0,2,0,Coronavirus,dark,,,,78.0,False,1.0
2,COVID-19: Pune cops made lockdown violators si...,https://timesofindia.indiatimes.com/videos/cit...,g2y8xh,ReallyRedditLover,0,1,0,Coronavirus,dark,,,,78.0,False,1.0
3,"Jharkhand HC Grants Bail To Former MP, 5 Other...",https://www.livelaw.in/top-stories/jharkhand-h...,g2y8p9,StorySpiral,0,3,0,Non-Political,dark,#5093d6,,,73.0,False,1.0
4,"No social distancing, no masks! HD Kumaraswamy...",https://economictimes.indiatimes.com/magazines...,g2y7yh,Themistokles_7,0,4,0,Politics,dark,#ddbd37,,,75.0,False,1.0
