# 📊 Reddit Post Analysis Dashboard (Notebook Version)

This notebook explores Reddit post data to understand:
- Sentiment trends (positive/negative/neutral/etc.)
- Misinformation patterns
- Post-comment theme alignment
- Engagement insights (upvotes/downvotes)


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import ast

# Load data
df = pd.read_csv("Expanded_Fake_Reddit_Posts.csv")

# Convert stringified lists into real lists
for col in ['post_details'] + [f'comment_{i}_details' for i in range(1, 6)]:
    df[col] = df[col].apply(ast.literal_eval)

# Extract structured fields
df['post_theme'] = df['post_details'].apply(lambda x: x[0])
df['post_misinfo'] = df['post_details'].apply(lambda x: x[1])
df['post_topic'] = df['post_details'].apply(lambda x: x[2])
df['post_date'] = pd.to_datetime(df['post_date'])
df['post_month'] = df['post_date'].dt.to_period('M')
df['upvote_ratio'] = df['post_upvote'] / (df['post_upvote'] + df['post_downvote'])

# Flatten comment data
comment_themes, comment_misinfos = [], []
for i in range(1, 6):
    comment_themes += df[f'comment_{i}_details'].apply(lambda x: x[0]).tolist()
    comment_misinfos += df[f'comment_{i}_details'].apply(lambda x: x[1]).tolist()

comment_theme_df = pd.DataFrame({'theme': comment_themes})
comment_misinfo_df = pd.DataFrame({'misinfo': comment_misinfos})

# Aggregations
misinfo_df = pd.DataFrame({
    'Posts': df['post_misinfo'].value_counts(),
    'Comments': comment_misinfo_df['misinfo'].value_counts()
}).fillna(0)

monthly_trends = df.groupby(['post_month', 'post_theme']).size().unstack(fill_value=0)
monthly_trends.index = monthly_trends.index.to_timestamp()

topic_theme = df.groupby(['post_topic', 'post_theme']).size().unstack(fill_value=0)
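
The `*_details` columns are stored as stringified lists, which is why the cell above runs them through `ast.literal_eval`. A minimal sketch of that step on a single value follows. The sample string and the `parse_details` helper are hypothetical and not part of the notebook; only the field order (theme, misinformation level, topic) follows the extraction code above.

```python
import ast

# Hypothetical raw cell value shaped like the `post_details` column:
# a stringified list of [theme, misinformation level, topic].
raw_details = "['funny', 'none', 'Procrastination']"

# literal_eval only evaluates Python literals, so it does not execute code
details = ast.literal_eval(raw_details)
theme, misinfo, topic = details


def parse_details(value):
    """Return the parsed list, or None when the cell is not a valid list literal."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError, TypeError):
        return None
    return parsed if isinstance(parsed, list) else None
```

A tolerant helper like this could be passed to `.apply` in place of `ast.literal_eval` if some rows turn out to be empty or malformed.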


### 🎭 Post Theme Distribution

This donut chart shows how frequently each post theme appears (positive, neutral, funny, ignorant, etc.).
It gives a quick overview of the tone of the posts shared on Reddit.
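
The percentages printed on the chart can also be inspected as numbers. A small sketch, using made-up theme labels rather than the real dataset:

```python
import pandas as pd

# Hypothetical theme labels standing in for df['post_theme']
themes = pd.Series(["positive", "positive", "funny", "neutral"])

# normalize=True returns fractions; scale to percent and round like autopct='%1.1f%%'
theme_share = themes.value_counts(normalize=True).mul(100).round(1)
```

Here `theme_share["positive"]` is 50.0, matching what the donut label would show for the same data.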


In [None]:
# Donut chart for post themes
theme_counts = df['post_theme'].value_counts()

plt.figure(figsize=(6, 6))
plt.pie(theme_counts, labels=theme_counts.index, autopct='%1.1f%%',
        startangle=140, wedgeprops=dict(width=0.4), colors=sns.color_palette("Set2"))
plt.title("Post Theme Distribution")
plt.axis('equal')
plt.show()


In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(y='theme', data=comment_theme_df,
              order=comment_theme_df['theme'].value_counts().index,
              palette='Set3')
plt.title("Comment Theme Distribution")
plt.xlabel("Count")
plt.ylabel("Theme")
plt.show()


### ⚠️ Misinformation in Posts vs Comments

This grouped bar chart compares the frequency of misinformation across posts and comments.
Categories include: none, mild, and blatant misinformation.
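
The comparison table relies on pandas aligning two `value_counts()` results on their labels. The sketch below illustrates that alignment with invented labels. The conversion to shares is an optional extra step, not something the notebook does, and may help because each post carries five comments, so raw comment counts are naturally larger.

```python
import pandas as pd

# Hypothetical labels; the real categories come from the dataset
post_levels = pd.Series(["none", "none", "mild"])
comment_levels = pd.Series(["none", "mild", "blatant", "blatant"])

# Building a DataFrame from two Series aligns them on the union of their labels;
# a label missing on one side becomes NaN, which fillna(0) turns into a zero count
counts = pd.DataFrame({
    "Posts": post_levels.value_counts(),
    "Comments": comment_levels.value_counts(),
}).fillna(0).astype(int)

# Optional: divide each column by its total so posts and comments share a scale
shares = counts / counts.sum()
```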


In [None]:
misinfo_df.plot(kind='bar', figsize=(8, 5), color=sns.color_palette("Set1"))
plt.title("Misinformation in Posts vs Comments")
plt.xlabel("Misinformation Level")
plt.ylabel("Count")
plt.show()


### 📚 Topics Most Discussed in Posts

This horizontal bar chart reveals the most frequent discussion topics, such as:
- AI in education
- Study habits
- Procrastination


In [None]:
plt.figure(figsize=(8, 6))
df['post_topic'].value_counts().sort_values().plot(kind='barh', color='skyblue')
plt.title("Most Discussed Topics in Posts")
plt.xlabel("Post Count")
plt.ylabel("Topic")
plt.show()


### 👍 Upvote Ratio by Post Theme

This box plot compares upvote ratio, computed as upvotes / (upvotes + downvotes), across post themes (positive, funny, etc.) to show which themes receive higher approval.
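
As a worked example of the ratio behind this plot, the sketch below computes it on a few invented rows and summarizes it per theme. The vote counts are illustrative only.

```python
import pandas as pd

# Hypothetical posts; column names mirror the notebook's DataFrame
votes = pd.DataFrame({
    "post_theme": ["positive", "positive", "funny", "funny"],
    "post_upvote": [90, 70, 10, 0],
    "post_downvote": [10, 30, 30, 0],
})

total_votes = votes["post_upvote"] + votes["post_downvote"]
# A post with no votes at all gives 0 / 0, which pandas reports as NaN
votes["upvote_ratio"] = votes["post_upvote"] / total_votes

# median() skips NaN by default, so the unvoted post does not affect the summary
median_by_theme = votes.groupby("post_theme")["upvote_ratio"].median()
```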


In [None]:
plt.figure(figsize=(8, 5))
sns.boxplot(x='post_theme', y='upvote_ratio', data=df, palette="Set2")
plt.title("Upvote Ratio by Post Theme")
plt.xlabel("Theme")
plt.ylabel("Upvote Ratio")
plt.show()


### 🔁 Upvotes vs Downvotes

This scatter plot (log-log scale) helps spot controversial or viral posts based on their upvote/downvote counts.
Posts with zero upvotes or zero downvotes cannot be placed on a log axis and will not appear.


In [None]:
plt.figure(figsize=(8, 5))
sns.scatterplot(x='post_upvote', y='post_downvote', data=df, alpha=0.6)
plt.xscale("log")
plt.yscale("log")
plt.title("Upvotes vs Downvotes (Log Scale)")
plt.xlabel("Upvotes")
plt.ylabel("Downvotes")
plt.show()


### 📅 Monthly Theme Trends

This line plot tracks how post themes have evolved month by month.
Useful for spotting seasonal or trending sentiment changes.
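
The monthly table is built by bucketing each post date into a calendar month and counting posts per theme. A minimal sketch of that grouping on invented dates:

```python
import pandas as pd

# Hypothetical posts; dates and themes are made up for illustration
posts = pd.DataFrame({
    "post_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "post_theme": ["positive", "funny", "positive"],
})

# to_period('M') collapses every date to its calendar month
posts["post_month"] = posts["post_date"].dt.to_period("M")

# One row per month, one column per theme, zero where a theme did not occur
monthly = posts.groupby(["post_month", "post_theme"]).size().unstack(fill_value=0)

# Convert the period index to timestamps (month start) for plotting
monthly.index = monthly.index.to_timestamp()
```

Only months that contain at least one post appear as rows, so a month with no posts at all is absent rather than zero.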


In [None]:
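# DataFrame.plot creates its own figure, so pass figsize directly
# instead of calling plt.figure(), which would leave an empty figure behind
monthly_trends.plot(linewidth=2, figsize=(10, 5))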
plt.title("Monthly Theme Trends")
plt.xlabel("Month")
plt.ylabel("Post Count")
plt.grid(True)
plt.show()


### 🔥 Topic vs Theme Heatmap

This heatmap shows how different themes appear under each discussion topic.
Helps detect which topics draw controversy, support, humor, etc.
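
The topic-by-theme table is a plain contingency table. As a sketch, `pd.crosstab` produces the same counts as the `groupby(...).size().unstack()` call used in this notebook, and its `normalize="index"` option gives per-topic shares. The rows below are invented, and the normalized view is an optional variation rather than what the heatmap plots.

```python
import pandas as pd

# Hypothetical posts using topic names mentioned earlier in the notebook
posts = pd.DataFrame({
    "post_topic": ["AI in education", "AI in education", "Study habits", "Procrastination"],
    "post_theme": ["positive", "negative", "positive", "funny"],
})

# The notebook's approach: count pairs, then pivot themes into columns
by_groupby = posts.groupby(["post_topic", "post_theme"]).size().unstack(fill_value=0)

# Equivalent counts from crosstab
by_crosstab = pd.crosstab(posts["post_topic"], posts["post_theme"])

# Optional: share of each theme within a topic (every row sums to 1)
theme_share_by_topic = pd.crosstab(posts["post_topic"], posts["post_theme"], normalize="index")
```

Row shares can be easier to read when some topics have far more posts than others.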


In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(topic_theme, annot=True, fmt='d', cmap="YlGnBu")
plt.title("Topic vs Theme Heatmap")
plt.xlabel("Theme")
plt.ylabel("Post Topic")
plt.show()


### 📊 Comment Misinformation Distribution

This chart shows how misinformation levels are distributed across the comment sections.
Compare it with the posts-vs-comments chart above to judge whether misinformation is more common in replies.


In [None]:
plt.figure(figsize=(7, 4))
sns.countplot(x='misinfo', data=comment_misinfo_df, palette="muted",
              order=comment_misinfo_df['misinfo'].value_counts().index)
plt.title("Comment Misinformation Distribution")
plt.xlabel("Misinformation Level")
plt.ylabel("Comment Count")
plt.show()


### 🔥 Most Controversial Posts

Controversy Score = Upvotes + Downvotes  
This chart ranks posts by their combined vote count to spotlight polarizing content.
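
A worked example of the score on three invented posts, using `nlargest` as a shorthand for sorting and taking the top rows:

```python
import pandas as pd

# Hypothetical posts; titles and vote counts are made up
posts = pd.DataFrame({
    "post_title": ["Post A", "Post B", "Post C"],
    "post_upvote": [500, 120, 40],
    "post_downvote": [20, 110, 5],
})

# Controversy score as defined in this notebook: upvotes plus downvotes
posts["controversy_score"] = posts["post_upvote"] + posts["post_downvote"]

# Top two posts by score
top_posts = posts.nlargest(2, "controversy_score")
```

In this toy data Post A ranks first with a score of 520 even though about 96% of its votes are upvotes, while the evenly split Post B scores 230. The sum therefore tracks overall engagement as much as disagreement.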


In [None]:
df['controversy_score'] = df['post_upvote'] + df['post_downvote']
top_controversial = df.sort_values(by='controversy_score', ascending=False).head(10)

plt.figure(figsize=(10, 6))
sns.barplot(y=top_controversial['post_title'], x=top_controversial['controversy_score'], palette="rocket")
plt.xlabel("Controversy Score (Upvotes + Downvotes)")
plt.ylabel("Post Title")
plt.title("Top 10 Most Controversial Posts")
plt.show()


### 🎯 Theme Alignment Between Posts & Comments

This compares whether comment tone aligns with the original post’s theme.  
Helps identify agreement vs sarcasm, trolling, or sentiment drift.
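
The match rate can be sketched on a small table as follows. The `comment_1_theme` and `comment_2_theme` columns are hypothetical stand-ins for the theme extracted from each `comment_{i}_details` list.

```python
import pandas as pd

# Hypothetical posts with two comments each
posts = pd.DataFrame({
    "post_theme": ["positive", "funny", "negative"],
    "comment_1_theme": ["positive", "negative", "negative"],
    "comment_2_theme": ["funny", "negative", "positive"],
})
comment_cols = ["comment_1_theme", "comment_2_theme"]

# eq(..., axis=0) compares every comment column with the post theme, row by row
matches = posts[comment_cols].eq(posts["post_theme"], axis=0)

# Flatten to one boolean per comment, then compute percentages.
# reindex pins the order to [True, False] regardless of which is more common.
match_pct = (
    pd.Series(matches.to_numpy().ravel())
    .value_counts(normalize=True)
    .reindex([True, False], fill_value=0)
    * 100
)
```

Pinning the order matters because `value_counts` sorts by frequency, so without it the first bar could be either the matched or the unmatched group.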


In [None]:
# Match count across 5 comments per post
alignment = []
for i in range(1, 6):
    match = df['post_theme'] == df[f'comment_{i}_details'].apply(lambda x: x[0])
    alignment.extend(match)

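# value_counts sorts by frequency, so pin the order to [True, False]
# to keep the bars in line with the colors and tick labels below
alignment_df = (pd.Series(alignment).value_counts(normalize=True)
                .reindex([True, False], fill_value=0) * 100)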

plt.figure(figsize=(6, 4))
alignment_df.plot(kind='bar', color=['green', 'red'])
plt.xticks([0, 1], ['Matched', 'Did Not Match'], rotation=0)
plt.ylabel("Percentage of Comments")
plt.title("Theme Match Between Post & Comments")
plt.show()


### 📈 Monthly Trend of Misinformation in Posts

This line chart shows how misinformation levels (e.g. none, mild, blatant) have changed over time.


In [None]:
monthly_misinfo = df.groupby(['post_month', 'post_misinfo']).size().unstack(fill_value=0)
monthly_misinfo.index = monthly_misinfo.index.to_timestamp()

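# DataFrame.plot creates its own figure, so pass figsize directly
# instead of calling plt.figure(), which would leave an empty figure behind
monthly_misinfo.plot(figsize=(10, 5))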
plt.title("Monthly Trend of Misinformation in Posts")
plt.xlabel("Month")
plt.ylabel("Number of Posts")
plt.grid(True)
plt.show()
