# Create an API scraper for Reddit
This script scrapes subreddits of your choice from Reddit. It only collects the title and body of each post.
It doesn't collect usernames or comments. You can change which listing it reads from, such as hot, new, or top.
It also skips the first three posts in the listing, since they are often moderator-stickied posts. I don't know how to ignore those directly, so skipping them was easier. This means some stickied posts may still be scraped if there are more than three, and regular posts get skipped if there are fewer. You can set the number of posts and change where the file is saved and in which format. As written, the script saves a CSV file in a `Mental_data` folder in the working directory.
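One possible way to ignore stickied posts directly, rather than skipping a fixed count, is to check the `stickied` attribute that PRAW exposes on submissions. The sketch below also shows how the listing type could be made a parameter. The helper name `iter_posts` and the `ALLOWED_SORTS` tuple are hypothetical and are not part of the script in this notebook.

```python
# Listing names that PRAW exposes as methods on a subreddit object.
# ALLOWED_SORTS and iter_posts are hypothetical names used for illustration.
ALLOWED_SORTS = ("hot", "new", "top")


def iter_posts(subreddit, sort="hot", limit=100):
    """Yield title/body dicts from the chosen listing, skipping stickied posts."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f"sort must be one of {ALLOWED_SORTS}, got {sort!r}")

    # Look up subreddit.hot / subreddit.new / subreddit.top by name.
    listing = getattr(subreddit, sort)(limit=limit)

    for submission in listing:
        # Assumption: stickied submissions carry stickied=True.
        if getattr(submission, "stickied", False):
            continue
        yield {"Title": submission.title, "Body": submission.selftext}
```

With an initialized PRAW client named `reddit`, this could be called as `list(iter_posts(reddit.subreddit("python"), sort="new", limit=50))`. Because stickied posts count toward `limit`, the result may contain slightly fewer posts than requested.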

### Step 1: Find Reddit's API tool
To define your Reddit API credentials, create an "app" from this developer link: https://old.reddit.com/prefs/apps
### Step 2: Name your app
Under Redirect URI, put in: http://localhost (required, even if not used in this script)

Optional: add a description.
### Step 3: Retrieve credentials
After creating the app, you’ll see your new app listed. Take note of the following:

   * Client ID: Located just below your app name (a string of characters).
   * Client Secret: Found next to the "secret" label.
   * User Agent: A string describing your application. Use something like "MyRedditScraper/1.0 by [your_reddit_username]".
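If you would rather not paste these three values into the notebook, one option is to read them from environment variables. This is a minimal sketch; the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, and `REDDIT_USER_AGENT` and the helper `load_credentials` are assumptions made for illustration, not something Reddit or this script defines.

```python
import os


def load_credentials(env=None):
    """Return the three credential values as keyword arguments for praw.Reddit."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env

    # Hypothetical variable names; set them in your shell before starting the notebook.
    required = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    return {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }
```

The returned dictionary uses the same keyword names that `praw.Reddit` accepts, so it could be passed as `praw.Reddit(**load_credentials())`.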
    

In [None]:
import praw
import pandas as pd
import os

In [10]:
# Put in the credentials from the app you created through the Reddit API
CLIENT_ID = 'your client id here'
CLIENT_SECRET = 'your client secret here'
USER_AGENT = 'MyRedditScraper/1.0 by your_reddit_username'

# Initialize the Reddit API client
reddit = praw.Reddit(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    user_agent=USER_AGENT
)

def scrape_subreddit(subreddit_name, limit=100):
    """
    Scrapes the titles and text bodies of posts from a subreddit's hot listing.

    Args:
        subreddit_name (str): Name of the subreddit to scrape.
        limit (int): Maximum number of posts to fetch; the first 3 are then skipped.

    Returns:
        pd.DataFrame: One row per post with its listing index, title, and body.
    """
    subreddit = reddit.subreddit(subreddit_name)
    posts = []

    for idx, submission in enumerate(subreddit.hot(limit=limit)):
        if idx < 3:  # Skip the first 3 posts
            continue
        post_data = {
            'Index': idx + 1,
            'Title': submission.title,
            'Body': submission.selftext
        }
        posts.append(post_data)

    # Convert to DataFrame
    return pd.DataFrame(posts)

# Define Subreddit to scrape from
if __name__ == "__main__":
    subreddit_name = "schizophrenia"  # Replace with your desired subreddit
    post_limit = 50  # Number of posts to scrape

    data = scrape_subreddit(subreddit_name, limit=post_limit)

    # Specify the folder to save the file
    output_folder = "Mental_data"
    os.makedirs(output_folder, exist_ok=True)
   
    # Save the data to a CSV file
    output_file = os.path.join(output_folder, f"{subreddit_name}_posts.csv")
    data.to_csv(output_file, index=False)
    print(f"Data saved to {output_file}")


Data saved to Mental_data/schizophrenia_posts.csv
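
The script above always writes CSV. If you want to switch the output format, a small helper along these lines might work. It is a sketch only: `save_posts` and the `WRITERS` table are hypothetical names, and it assumes `data` is a pandas DataFrame, whose `to_csv` and `to_json` methods both accept a file path.

```python
import os

# Hypothetical table mapping a format name to the DataFrame writer method
# and the keyword arguments to pass to it.
WRITERS = {
    "csv": ("to_csv", {"index": False}),
    "json": ("to_json", {"orient": "records"}),
}


def save_posts(data, subreddit_name, output_folder="Mental_data", fmt="csv"):
    """Write the scraped posts to <output_folder>/<subreddit_name>_posts.<fmt>."""
    if fmt not in WRITERS:
        raise ValueError(f"fmt must be one of {sorted(WRITERS)}, got {fmt!r}")

    method_name, kwargs = WRITERS[fmt]

    # Create the folder if it does not exist yet, as the original script does.
    os.makedirs(output_folder, exist_ok=True)
    output_file = os.path.join(output_folder, f"{subreddit_name}_posts.{fmt}")

    # Call data.to_csv(...) or data.to_json(...) depending on fmt.
    getattr(data, method_name)(output_file, **kwargs)
    return output_file
```

For example, `save_posts(data, subreddit_name, fmt="json")` would write a JSON file with one record per post into the same folder.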
