# Introduction to Downloading Data from Reddit

Since downloading tweets has become harder and is no longer free, I thought we could at least explore another social media platform for downloading user-generated content. Let's explore Reddit posts and comments!

You can also explore YouTube comments or traditional web scraping!

# 1: Importing Necessary Libraries

In this cell, we import only the libraries needed to download the data. PRAW (the Python Reddit API Wrapper) connects to the Reddit API and handles the downloading; pandas stores the results in a table.

In [2]:
# Setting up Python: Install PRAW
# pip install praw

In [3]:
import praw
import pandas as pd

# Downloading data from Reddit

Reddit is a great source of user-generated content (UGC). You can use the PRAW library in Python to access Reddit data.

### Step 1: Apply for a Developer Account
1. Go to [Reddit Apps](https://www.reddit.com/prefs/apps).
2. Click on "Create App" and fill out the form to get your credentials.

### Step 2: Create an App
When creating an app on Reddit, you need to fill out several fields. For many of these fields, you can use placeholder values if you are not planning to deploy a web app. Here’s how you can fill out the form:

1. **Name**: Choose a name for your app. This can be anything that helps you identify the app, like "MyRedditApp".

2. **App type**: Select "script" if you're creating a script to run locally on your machine.

3. **Description**: Provide a brief description of what your app does. This is optional but can be helpful for your own records.

4. **About URL**: This can be a placeholder URL like http://example.com. It’s not necessary for a script.

5. **Redirect URI**: For scripts, you can typically use http://localhost:8080. This is used during the OAuth authentication process.

6. **Developer's terms URL**: This can be a placeholder URL like http://example.com. It’s optional.

7. **Save**: Click on the "Create app" button.

**Note**: Approval might take some time, depending on the details provided.

### Step 3: Locating Credentials 

1. **Client ID**: On the [Reddit Apps](https://www.reddit.com/prefs/apps) page, find your app. The Client ID is the short string of characters displayed just below the app's name, under "personal use script".

2. **Client Secret**: This is labeled "secret" on the app details page. It is a longer string; copy it and keep it secure.

3. **User Agent**: This is a string you create yourself. It should be unique and descriptive. For example, MyRedditApp by /u/YourRedditUsername.
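Reddit's API rules ask for a unique, descriptive User-Agent that names the platform, an app identifier, a version, and your username, in the form `<platform>:<app ID>:<version string> (by /u/<reddit username>)`. The helper below is a minimal sketch that assembles a string in that form; the function name `build_user_agent` and the example values are hypothetical.

```python
def build_user_agent(platform: str, app_id: str, version: str, username: str) -> str:
    """Assemble a User-Agent in the form Reddit's API rules recommend:
    <platform>:<app ID>:<version string> (by /u/<reddit username>)
    """
    # Strip an accidental "/u/" or "u/" prefix so it is not doubled
    username = username.strip()
    for prefix in ("/u/", "u/"):
        if username.startswith(prefix):
            username = username[len(prefix):]
            break
    # Colons inside a field would make the platform:app:version parts ambiguous
    for field_name, value in (("platform", platform), ("app_id", app_id), ("version", version)):
        if ":" in value:
            raise ValueError(f"{field_name} must not contain ':'")
    return f"{platform}:{app_id}:{version} (by /u/{username})"


print(build_user_agent("python", "MyRedditApp", "v0.1", "YourRedditUsername"))
# prints: python:MyRedditApp:v0.1 (by /u/YourRedditUsername)
```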

### Important Considerations:
- Keep your keys and tokens private and secure.
- Respect Reddit's Developer Agreement and use the API in accordance with the rules and guidelines.
- Monitor your usage to stay within the API rate limits.
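To get a feel for the rate limits, here is a back-of-the-envelope estimate of how many requests a fetch like the one in step 3 might need and how long it could take. It is a sketch built on assumptions you should check against Reddit's current documentation: a listing returns up to 100 items per request, loading each post's comments costs one extra request, and the free tier allows roughly 100 requests per minute per OAuth client. PRAW is designed to wait automatically when Reddit's rate-limit headers tell it to, so this only helps you plan; it enforces nothing. The function name and defaults are hypothetical.

```python
import math


def estimate_fetch_minutes(
    num_posts: int,
    items_per_listing_page: int = 100,   # assumption: max items per listing request
    requests_per_post: int = 1,          # assumption: one request to load each post's comments
    requests_per_minute: float = 100.0,  # assumption: free-tier budget per OAuth client
) -> tuple[int, float]:
    """Return (estimated request count, estimated minutes) for fetching posts plus comments."""
    listing_requests = math.ceil(num_posts / items_per_listing_page)
    comment_requests = num_posts * requests_per_post
    total_requests = listing_requests + comment_requests
    return total_requests, total_requests / requests_per_minute


requests_needed, minutes = estimate_fetch_minutes(1000)
print(f"~{requests_needed} requests, ~{minutes:.1f} minutes")
# prints: ~1010 requests, ~10.1 minutes
```

With `replace_more(limit=None)`, every "load more comments" placeholder costs additional requests, so the real count can be much higher for busy threads.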

With your Reddit App set up, you are now ready to explore and analyze Reddit data using various tools and libraries. Happy coding!

# 2: Set Up Reddit API Credentials

In [None]:
# Replace these with your own credentials
client_id = 'YOUR_CLIENT_ID'  # The Client ID shown under your app's name
client_secret = 'YOUR_CLIENT_SECRET'  # Keep this private: remove it before sharing your notebook or code
user_agent = 'MyRedditApp by /u/YourRedditUsername'  # A unique, descriptive user agent

# Authenticate with the Reddit API (read-only access; no username or password needed)
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    user_agent=user_agent
)
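Instead of pasting the secret into the notebook, where it is easy to leak when you share it, you can read the credentials from environment variables. A minimal sketch follows; the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` and the helper `load_reddit_credentials` are my own choices, not something PRAW requires.

```python
import os


def load_reddit_credentials(prefix: str = "REDDIT_") -> dict:
    """Read Reddit API credentials from environment variables (hypothetical names).

    Expects <prefix>CLIENT_ID, <prefix>CLIENT_SECRET and <prefix>USER_AGENT,
    e.g. REDDIT_CLIENT_ID. Raises KeyError naming any variable that is missing or empty.
    """
    keys = ("client_id", "client_secret", "user_agent")
    missing = [prefix + k.upper() for k in keys if not os.environ.get(prefix + k.upper())]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {k: os.environ[prefix + k.upper()] for k in keys}


# Usage in the notebook (requires praw and real credentials):
# import praw
# reddit = praw.Reddit(**load_reddit_credentials())
```

Set the variables in your shell before starting Jupyter so they never appear in the notebook itself.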

# 3: Fetch Data from Reddit

In [7]:
# Fetch posts from a subreddit
# Replace with your target subreddit; I chose climate change, but you could literally choose anything from gaming to the latest gadget on the market
subreddit = reddit.subreddit('climatechange')

posts = subreddit.top(limit=1000)  # Adjust as needed; Reddit listings return at most about 1,000 items

data = []
for post in posts:
    # limit=0 removes the "load more comments" placeholders WITHOUT fetching them,
    # so only the comments loaded with the post are kept. Use limit=None to fetch
    # every comment (much slower: each placeholder costs extra requests).
    post.comments.replace_more(limit=0)
    comments = [comment.body for comment in post.comments.list()]
    data.append([post.title, post.selftext, post.score, post.id,
                 post.subreddit.display_name,  # store the name, not the Subreddit object
                 post.url, post.num_comments, post.created_utc,
                 str(post.author) if post.author else None,  # None for deleted accounts
                 comments])

# Save to DataFrame
df = pd.DataFrame(data, columns=['title', 'body', 'score', 'id', 'subreddit', 'url', 'num_comments', 'created', 'author', 'comments'])

# Display the DataFrame
df.head()

| | title | body | score | id | subreddit | url | num_comments | created | author | comments |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Billions of crabs went missing around Alaska. ... | | 4428 | 17byrco | climatechange | https://www.yahoo.com/news/billions-crabs-went... | 738 | 1697764000.0 | | [So basically the deep ocean water got warm en... |
| 1 | $266 Trillion in Climate Spending Is a No-Brai... | | 4144 | 17v43h4 | climatechange | https://www.bloomberg.com/opinion/articles/202... | 677 | 1699974000.0 | Splenda | [Back to Math \n\nThe earth has 7 billion huma... |
| 2 | Gray whales have been mysteriously washing up ... | | 2443 | 179qtvw | climatechange | https://www.businessinsider.nl/gray-whales-hav... | 371 | 1697520000.0 | shallah | [Starvation due to climate change, There is le... |
| 3 | Corn Could Grow in the Canadian Arctic Within ... | | 2371 | 17caxnf | climatechange | https://cleanenergyrevolution.co/2023/10/20/co... | 448 | 1697807000.0 | Fickle-Flamingo1922 | [“We’ll be growing oranges in Alaska”\n\n“God ... |
| 4 | Scientists say climate extremes of 2023 point ... | | 2274 | 17jzvwf | climatechange | https://www.newsweek.com/scientists-say-climat... | 537 | 1698690000.0 | Splenda | ["Hey guys, just in case you missed the last 1... |
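The `created` column holds `created_utc`, a Unix timestamp in seconds (UTC). Converting it to datetimes makes it easier to sort or group posts by date. A minimal sketch using pandas' `to_datetime`; the helper `add_created_datetime` and the column name `created_dt` are hypothetical.

```python
import pandas as pd


def add_created_datetime(posts: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of posts with a timezone-aware 'created_dt' column."""
    out = posts.copy()
    # created_utc is seconds since the Unix epoch, in UTC
    out["created_dt"] = pd.to_datetime(out["created"], unit="s", utc=True)
    return out


# Tiny stand-in for the df built above (sample values like those in the 'created' column)
sample = pd.DataFrame({"created": [1697764000.0, 1699974000.0]})
print(add_created_datetime(sample)["created_dt"].iloc[0])
# prints: 2023-10-20 01:06:40+00:00
```

In the notebook you would apply it as `df = add_created_datetime(df)`.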


In [8]:
df.shape

(999, 10)
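Each row holds one post, with all of its comments packed into a list. For comment-level analysis (sentiment, topic modelling and so on) it is often handier to have one row per comment. A minimal sketch using `DataFrame.explode`; the helper `comments_long` and the column name `comment` are hypothetical.

```python
import pandas as pd


def comments_long(posts: pd.DataFrame) -> pd.DataFrame:
    """Return one row per comment, keeping the post's id and title alongside it."""
    return (
        posts[["id", "title", "comments"]]
        .explode("comments")                    # one row per list element; empty lists become NaN
        .rename(columns={"comments": "comment"})
        .dropna(subset=["comment"])             # drop posts that had no comments
        .reset_index(drop=True)
    )


# Tiny stand-in for the df built above
sample = pd.DataFrame({
    "id": ["a1", "b2"],
    "title": ["First post", "Second post"],
    "comments": [["great", "agreed"], []],
})
print(comments_long(sample))
```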

In [None]:
df.to_csv('reddit_posts.csv', index=False)  # index=False avoids an extra 'Unnamed: 0' column when the file is read back
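`to_csv` stores each comments list as its printed text (e.g. `"['first', 'second']"`), so reading the CSV back gives you strings, not lists. If you want to keep the lists intact, one option is JSON Lines. A minimal sketch; the file name `reddit_posts.jsonl` is just an example, and `default_handler=str` converts any value JSON cannot represent (such as a PRAW object left in the DataFrame) to its string form.

```python
import io

import pandas as pd

# Tiny stand-in for the df built above
sample = pd.DataFrame({
    "id": ["a1", "b2"],
    "comments": [["great", "agreed"], []],
})

# orient="records", lines=True writes one JSON object per line
jsonl_text = sample.to_json(orient="records", lines=True, default_handler=str)

# In the notebook you would write to and read from a file instead:
# df.to_json("reddit_posts.jsonl", orient="records", lines=True, default_handler=str)
# df = pd.read_json("reddit_posts.jsonl", lines=True)

restored = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(restored["comments"].iloc[0])
# prints: ['great', 'agreed']
```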