**Sampling caveat**  
Posts were fetched with `subreddit.top(limit=1000)`, i.e. in **score (rank) order**.  
Therefore each subreddit’s 1 000 rows represent its highest-scoring content, not its most recent timeline.  
Large date gaps (e.g. r/worldnews has no posts after 2023-03) most likely mean that newer posts have not yet accumulated the score needed to enter the all-time top list.  
Time-series analysis should therefore be restricted to the visible date ranges; if chronological coverage is required, recollect the data with `subreddit.new()`.
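
If chronological coverage is needed, the recollection could look roughly like the sketch below. It assumes a PRAW `Subreddit` object is passed in (anything exposing `.new(limit=...)` and `.display_name` will do). The helper name `get_new_posts` and the extra `created_iso` field are illustrative and not part of the original notebook. Reddit listings are generally capped at about 1,000 items, so `new()` only reaches back as far as the most recent posts within that cap.

```python
from datetime import datetime, timezone


def get_new_posts(subreddit, limit=1000):
    """Collect posts newest-first from a PRAW-style subreddit object.

    `subreddit` is assumed to expose `.new(limit=...)` and `.display_name`,
    as praw.models.Subreddit does. Returns a list of dicts.
    """
    rows = []
    for post in subreddit.new(limit=limit):
        rows.append({
            "title": post.title,
            "score": post.score,
            "url": post.url,
            "created_utc": post.created_utc,
            # Human-readable UTC timestamp (illustrative extra column)
            "created_iso": datetime.fromtimestamp(
                post.created_utc, tz=timezone.utc
            ).isoformat(),
            "num_comments": post.num_comments,
            "subreddit": subreddit.display_name,
        })
    return rows
```

With the `reddit` client from the setup cell, `pd.DataFrame(get_new_posts(reddit.subreddit("worldnews")))` should give a frame with the same columns as `get_top_posts`, plus `created_iso`.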

### Setup & Imports

In [1]:
import praw
import pandas as pd
import os
from dotenv import load_dotenv

load_dotenv()

reddit = praw.Reddit(
    client_id=os.getenv('REDDIT_CLIENT_ID'),
    client_secret=os.getenv('REDDIT_CLIENT_SECRET'),
    user_agent=os.getenv('REDDIT_USER_AGENT')
)

print("Authenticated as:", reddit.user.me() or "anonymous")


Authenticated as: anonymous
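
The `anonymous` result is expected here: no username or password is supplied, so PRAW runs in read-only mode. Since `os.getenv` returns `None` for an unset variable, a missing `.env` entry tends to surface as a less obvious error later on. A small pre-flight check, sketched below, can make that easier to spot. The helper name `missing_env_vars` is hypothetical; the variable names are the ones used in the cell above.

```python
import os

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names in `names` that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Report any credentials that were not loaded before building the client
missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

The optional `environ` argument exists only so the check can be exercised with a plain dict instead of the real environment.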


### Data Collection

In [2]:
# Define a function to extract a subreddit's top posts
def get_top_posts(subreddit_name, limit=100):
    """Return up to `limit` top-scoring posts from a subreddit as a DataFrame."""
    subreddit = reddit.subreddit(subreddit_name)
    posts_data = []

    # top() defaults to the all-time ranking, so posts arrive in score order
    for post in subreddit.top(limit=limit):
        posts_data.append({
            "title": post.title,
            "score": post.score,
            "url": post.url,
            "created_utc": post.created_utc,  # epoch seconds (UTC)
            "num_comments": post.num_comments,
            "subreddit": subreddit_name
        })

    return pd.DataFrame(posts_data)

# Collect data from 6 subreddits
news_df = get_top_posts("news", limit=500)
worldnews_df = get_top_posts("worldnews", limit=500)
politics_df = get_top_posts("politics", limit=500)
technology_df = get_top_posts("technology", limit=500) 
worldpolitics_df = get_top_posts("worldpolitics", limit=500)
TrueReddit_df = get_top_posts("TrueReddit", limit=500)

# Save to CSV files
news_df.to_csv("news_data.csv", index=False)
worldnews_df.to_csv("worldnews_data.csv", index=False)
politics_df.to_csv("politics_data.csv", index=False)
technology_df.to_csv("technology_data.csv", index=False)
worldpolitics_df.to_csv("worldpolitics_data.csv", index=False)
TrueReddit_df.to_csv("TrueReddit_data.csv", index=False)

print("✅ Data saved to CSV files successfully!")


✅ Data saved to CSV files successfully!
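
Because `created_utc` is stored as epoch seconds, the date range covered by each file is not obvious from the CSV itself. The sketch below, which uses only the standard library, reports the earliest and latest post date in a saved file. The helper name `date_coverage` is illustrative, and it assumes the column layout written by the cell above.

```python
import csv
from datetime import datetime, timezone


def date_coverage(csv_path):
    """Return (earliest, latest) UTC post dates from a CSV with a `created_utc` column.

    Returns None if the file contains no data rows.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        stamps = [float(row["created_utc"]) for row in csv.DictReader(f)]
    if not stamps:
        return None

    def to_date(ts):
        # Interpret the epoch value as UTC and keep only the calendar date
        return datetime.fromtimestamp(ts, tz=timezone.utc).date()

    return to_date(min(stamps)), to_date(max(stamps))
```

For example, `date_coverage("worldnews_data.csv")` would show where the r/worldnews sample ends, which is a quick way to confirm the gaps described in the sampling caveat.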
