## Reddit Scraper Overview
A walkthrough of the features of the reddit_scraper package.

You can read more about the process [here](https://towardsdatascience.com/predicting-reddit-flairs-using-machine-learning-and-deploying-the-model-on-heroku-part-1-574b69098d9a).

In [None]:
# Install the praw library 
!pip install praw

In [None]:
# Import the libraries
import reddit_scraper as rs
import praw
import pandas as pd

### Authenticating client

In [None]:
# Credentials generated from the Reddit developer applications page.
# Mine are left blank here; add your own.
my_client_id = ''
my_client_secret = ''
user = ''  # passed to PRAW as the user agent
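
Hard-coded secrets are easy to commit by accident. One alternative, sketched here, is to read the values from environment variables. The variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) and the `load_credentials` helper are assumptions made for illustration; they are not part of `reddit_scraper` or `praw`.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret, user_agent) read from environment variables.

    The variable names are hypothetical; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    names = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of authenticating with blanks
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)
```

With the variables exported in your shell, the three values can then be filled in with `my_client_id, my_client_secret, user = load_credentials()`.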

In [None]:
# Authenticate the Reddit instance
reddit = rs.reddit_auth(my_client_id, my_client_secret, user)

In [None]:
# NOTE: rs.reddit_auth is currently not authenticating properly.
# If this prints None, authenticate conventionally (see the next cell).
print(reddit)

In [None]:
# Conventional authentication (uncomment if the cell above printed None)
#reddit = praw.Reddit(client_id=my_client_id, client_secret=my_client_secret, user_agent=user)

Replace the empty strings above with your own credentials; mine are left out.

## Scraping data without specifying flairs

In [None]:
# These are the predefined features and will be set by default 
features = [
    'ID', 
    'is_Original', 
    'Flair',
    'num_comments', 
    'Title',
    'Subreddit', 
    'Body', 
    'URL', 
    'Upvotes',
    'created_on', 
    'Comments'
]

In [None]:
# Set the desired subreddit 
subreddit = "depression"

The parameters are:
1. The Reddit instance
2. sub_name: name of the subreddit
3. features: the columns to collect (the predefined list above)
4. num_posts: number of posts to collect
5. comments: True to collect all the comments, False to collect only the top comment. Defaults to True, which is the preferred setting.
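
To make the `comments` flag concrete, here is a minimal sketch of the behaviour described above. It is not the package's code: `select_comments` is a hypothetical helper, and it assumes the comment bodies arrive as a list already ordered with the top comment first.

```python
def select_comments(comment_bodies, comments=True):
    """Keep every comment when `comments` is True, otherwise only the first (top) one."""
    if comments:
        return list(comment_bodies)
    # Slicing keeps this safe for posts that have no comments at all
    return list(comment_bodies[:1])


bodies = ["top comment", "second comment", "third comment"]
print(select_comments(bodies, comments=True))   # all three bodies
print(select_comments(bodies, comments=False))  # ['top comment']
```

Collecting only the top comment keeps each row small, which is why the later cells pass `comments=False` for a quick run.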

In [None]:
# Collect data in a dataframe
data = rs.scrape_without_flairs(
    reddit,
    sub_name=subreddit,
    features=features,
    num_posts=100,
    comments=False,
)

In [None]:
data.head()

In [None]:
# Save Data 
data.to_csv('depression_reddit_data.csv')

## Scraping data using Reddit flairs

### List the Reddit flairs

In [None]:
# Get a list of the unique flairs associated with a subreddit.
flair_list = rs.get_unique_flairs(reddit, sub_name='India', num_posts=100)

In [None]:
print(flair_list)
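
One way such a list could be built, shown as a sketch rather than the package's actual implementation, is to walk over the posts and keep each distinct flair. `unique_flairs` is a hypothetical helper; it assumes each post exposes its flair as `link_flair_text` (the attribute PRAW uses on submissions) and that posts without a flair carry `None`.

```python
from types import SimpleNamespace


def unique_flairs(posts):
    """Return the distinct flairs found in `posts`, in first-seen order."""
    seen = {}
    for post in posts:
        flair = getattr(post, "link_flair_text", None)
        if flair is not None:
            seen.setdefault(flair, None)  # dicts preserve insertion order
    return list(seen)


# Stand-in objects so the sketch runs without a Reddit connection
sample_posts = [
    SimpleNamespace(link_flair_text="Politics"),
    SimpleNamespace(link_flair_text=None),
    SimpleNamespace(link_flair_text="Sports"),
    SimpleNamespace(link_flair_text="Politics"),
]
print(unique_flairs(sample_posts))  # ['Politics', 'Sports']
```

With a live connection, the posts could come from a PRAW listing such as `reddit.subreddit('India').hot(limit=100)`; which listing the package itself reads is not stated here.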

### Get data based on flairs

In [None]:
# Scrape data with a list of flairs
data = rs.scrape_with_flairs(
    reddit,
    sub_name='India',
    flairs=flair_list,
    num_per_flair=5,
    features=features,
    comments=False,
)