# **1. Install PRAW**

In [None]:
# Uncomment the next line if PRAW is not installed yet
#!pip install praw
import praw

# **2. Create a Reddit App**

To access the Reddit API, you'll need to create an application on Reddit and obtain your API credentials. Follow these steps:

- Go to the Reddit website (https://www.reddit.com/) and log in to your account. Feel free to create a throwaway account for this project!
- Navigate to the Reddit Apps page (https://www.reddit.com/prefs/apps).
- Click the "are you a developer? create an app..." button in the top left.
- Provide a name for your app (e.g., "PRAW"), select the app type ('script'), and optionally add a description. Use http://localhost:8080 as your redirect URI.
- After submitting the form, you will reach a page that looks like the following image. You'll see your application's details, including the client ID and client secret. Keep these credentials handy for the next step.


![Praw](https://www.honchosearch.com/hubfs/Imported_Blog_Media/Client-ID-Client-Secret.png)

# **3. Initialize PRAW**

In [None]:
reddit = praw.Reddit(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    user_agent='YOUR_USER_AGENT',
    username='YOUR_REDDIT_USERNAME',
    password='YOUR_REDDIT_PASSWORD'
)

Replace 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', 'YOUR_USER_AGENT', 'YOUR_REDDIT_USERNAME', and 'YOUR_REDDIT_PASSWORD' with your actual Reddit API credentials.

Your user agent is a string Reddit uses to identify the source of requests. You can make it whatever you want, but choose something descriptive and unique; including your Reddit username is recommended.
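As a small illustration of a descriptive user agent, the sketch below builds one in the `<platform>:<app ID>:<version> (by u/<username>)` pattern suggested in the PRAW documentation. The helper name `build_user_agent` and all the values passed to it are made up for this example.

```python
def build_user_agent(platform: str, app_id: str, version: str, username: str) -> str:
    """Return a user agent in the '<platform>:<app ID>:<version> (by u/<username>)' pattern."""
    return f"{platform}:{app_id}:{version} (by u/{username})"


# Placeholder values; substitute your own app name and Reddit username.
user_agent = build_user_agent("python", "subreddit-classifier", "v0.1", "your_username")
print(user_agent)
```

The resulting string can be passed as the user_agent argument when initializing PRAW.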

**I have removed my own credentials from this workbook. We can show you how to hide your credentials before submitting the project! The following code will need your own credentials in order to successfully work.**
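One common way to keep credentials out of a notebook is to read them from environment variables. The sketch below is only one possible approach, not necessarily the one we will demonstrate: the variable names (`REDDIT_CLIENT_ID` and so on) and the helper `load_reddit_credentials` are assumptions made for this example.

```python
import os

# Hypothetical environment variable names mapped to praw.Reddit keyword arguments.
# Any names work, as long as you set them in your shell before starting Jupyter.
ENV_TO_KWARG = {
    "REDDIT_CLIENT_ID": "client_id",
    "REDDIT_CLIENT_SECRET": "client_secret",
    "REDDIT_USER_AGENT": "user_agent",
    "REDDIT_USERNAME": "username",
    "REDDIT_PASSWORD": "password",
}


def load_reddit_credentials(env=os.environ):
    """Collect the credentials from env, failing loudly if any are missing."""
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}


# Usage, once the variables are set and PRAW is installed:
# reddit = praw.Reddit(**load_reddit_credentials())
```

With this pattern the notebook itself never contains a secret, so it is safe to commit or submit.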

# 4. Take a look at the documentation [here](https://praw.readthedocs.io/)!

In [None]:
# Below is JUST an example of how you can use PRAW

# Choose your subreddit
subreddit = reddit.subreddit('boardgames')

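# Adjust the limit as needed -- Note that this will grab the 25 most recent posts.
# The result is a lazy generator: posts are fetched as you iterate, and it can only be looped over once.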
posts = subreddit.new(limit=25)

## NOTE
- Reddit API Limitations: Reddit returns at most 100 posts per request. If you set the limit parameter to a value greater than 100, PRAW will make multiple requests behind the scenes to fetch the desired number of posts. Also note that a single listing (such as new) generally stops returning results after roughly 1,000 posts, however high you set the limit.
- Rate Limiting: The Reddit API also enforces rate limits to prevent abuse and ensure fair usage. The exact limits depend on how you authenticate and have changed over time; an authenticated script app is typically allowed on the order of 60 to 100 requests per minute. PRAW reads the rate-limit information Reddit returns and pauses between requests when needed, so you rarely have to manage this yourself. If you do exceed the limit, you may receive an error response until the limit resets.
- TIP: You can use the created_utc attribute of a post to keep track of the timestamp and ensure non-overlapping pulls. The created_utc attribute holds the post's creation time as a Unix timestamp (seconds since January 1, 1970, UTC).
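To make the created_utc tip concrete, here is a minimal sketch that converts a timestamp to a readable UTC datetime and keeps only the rows newer than the latest post from a previous pull. The helper names and the sample rows are invented for illustration; real rows would come from PRAW.

```python
from datetime import datetime, timezone


def to_utc_datetime(created_utc: float) -> datetime:
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc)


def keep_newer(rows, last_seen_utc):
    """Keep only rows whose created_utc is strictly later than last_seen_utc."""
    return [row for row in rows if row["created_utc"] > last_seen_utc]


# Made-up rows standing in for two pulls that overlap on post "B".
previous_pull = [
    {"title": "A", "created_utc": 1700000000.0},
    {"title": "B", "created_utc": 1700000600.0},
]
latest_pull = [
    {"title": "B", "created_utc": 1700000600.0},
    {"title": "C", "created_utc": 1700001200.0},
]

last_seen = max(row["created_utc"] for row in previous_pull)
fresh = keep_newer(latest_pull, last_seen)

print([row["title"] for row in fresh])
print(to_utc_datetime(last_seen).isoformat())
```

Filtering on a strict "later than" comparison can drop a post that shares the exact same second as the last one seen, so combining it with a check on the post id is a reasonable safeguard.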

In [None]:
import pandas as pd

data = []
for post in posts:
    # post.subreddit is a Subreddit object; display_name stores its name as plain text
    data.append([post.created_utc, post.title, post.selftext, post.subreddit.display_name])

# Turn into a dataframe
boardgames = pd.DataFrame(data, columns=['created_utc', 'title', 'self_text', 'subreddit'])
boardgames.head()

Remember, you will need to pull *at least* 1000 posts from each subreddit, not just 25. As mentioned above, you can use the created_utc attribute of a post to keep track of the timestamp and ensure non-overlapping pulls. We will leave this work for you to complete.
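One way to build up to 1000 posts over several pulls is to store rows in a dictionary keyed by post id, so repeated posts are ignored. The sketch below is a starting point under stated assumptions, not a complete solution: `merge_pull` and `fake_post` are hypothetical helpers, and the stand-in objects only mimic the PRAW attributes used earlier (id, created_utc, title, selftext, subreddit).

```python
from types import SimpleNamespace


def merge_pull(store, posts):
    """Add posts to store (a dict keyed by post id) and return how many were new."""
    added = 0
    for post in posts:
        if post.id not in store:
            store[post.id] = {
                "created_utc": post.created_utc,
                "title": post.title,
                "self_text": post.selftext,
                "subreddit": str(post.subreddit),
            }
            added += 1
    return added


def fake_post(post_id, created_utc):
    """Build a stand-in object with the same attribute names as a PRAW submission."""
    return SimpleNamespace(
        id=post_id,
        created_utc=created_utc,
        title=f"Post {post_id}",
        selftext="",
        subreddit="boardgames",
    )


store = {}
first = merge_pull(store, [fake_post("a1", 100.0), fake_post("a2", 200.0)])
second = merge_pull(store, [fake_post("a2", 200.0), fake_post("a3", 300.0)])
print(first, second, len(store))
```

With real data, you could call merge_pull(store, subreddit.new(limit=None)) on each run, save the results between runs, and build the dataframe with pd.DataFrame(list(store.values())) once the store holds enough posts.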

Once you have at least 1000 posts from each subreddit, you can do some EDA (for example, finding the most common words in each subreddit). Eventually, you will want to combine your two dataframes for modeling.
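As one possible first pass at the "most common words" idea, the sketch below counts words with the standard library. The stop-word list is deliberately tiny and the sample titles are invented; treat both as placeholders for your own data and a fuller list.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def most_common_words(texts, n=10):
    """Return the n most frequent non-stop-words across an iterable of strings."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


# Made-up titles standing in for a column such as boardgames['title'].
sample_titles = [
    "Best board games for two players",
    "Looking for cooperative board games",
    "Is Catan the best gateway game?",
]

top_words = most_common_words(sample_titles, n=3)
print(top_words)
```

When you are ready to model, the two dataframes can be stacked with pd.concat([first_df, second_df], ignore_index=True), where first_df and second_df are placeholder names for your own dataframes.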

### Hopefully this is enough of a tutorial to help get you started! If you have any questions, let us know!

### Note: Rather than working in this template notebook, make a brand new "scraping" notebook (or script), with your own comments, so you can use this project in a portfolio!