Skip to content

manishreddy123/Reddit-Persona-Generator-using-OpenRouter-API

Repository files navigation

🤖 Reddit Persona Generator using OpenRouter API 🚀

Welcome to the Reddit Persona Generator – a powerful tool that transforms Reddit user activity into deep, emotionally enriched, AI-generated user personas.

Personas are saved as text files in the output/ directory.

This project combines:

  • 📄 Reddit data scraping
  • 🧠 Sentiment & emotion analysis
  • 🪄 Advanced prompt engineering
  • 🔮 LLM-powered personality inference using OpenRouter

Project Directory Structure

reddit_persona_generator/
├── config.py
├── reddit_scraper.py
├── openrouter_credit_report.py
├── llm_inference.py
├── main.py
├── preprocessing.py
├── prompt_builder.py
├── persona_writer.py
├── requirements.txt
├── test_openrouter_key.py
├── README.md
├── data/
│   ├── [username]_raw.json
│   └── ...
├── enriched_data/
│   ├── [username]_enriched.json
│   └── ...
└── output/
    ├── [username]_persona.txt
    └── ...
  • data/ — Raw scraped Reddit data JSON files
  • enriched_data/ — Enriched JSON files with sentiment, emotion, NER, archetype, toxicity
  • output/ — Final generated persona text files
  • Root folder contains main Python scripts and configuration files

📌 Table of Contents


🎯 Project Overview

The Reddit Persona Generator takes a Reddit username and analyzes their public posts and comments to infer a psychographic and behavioral profile. This profile includes:

  • 🔍 Name, age, location (inferred)
  • 🧬 Personality traits (MBTI-style)
  • 🧠 Motivations and emotional triggers
  • 🧭 Decision-making patterns
  • 💳 Spending habits and brand loyalties

The analysis pipeline leverages HuggingFace Transformers, VADER sentiment analysis, and OpenRouter API to create LLM-driven personas in plain English.


⚙️ Setup Instructions

🔁 1. Clone the Repository

git clone https://github.com/yourusername/reddit_persona_generator.git
cd reddit_persona_generator

🧪 2. (Optional) Set Up Virtual Environment

python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate

📦 3. Install Dependencies

pip install -r requirements.txt

🔐 4. Set up Reddit & OpenRouter Credentials

Create an OpenRouter API Key

  • Go to OpenRouter
  • Sign up and generate your API key
  • While selecting the model to use it is preferred to use "google/gemini-2.5-pro" if you have credits in openrouter account other wise use "nvidia/llama-3.1-nemotron-ultra-253b-v1:free"

Create a Reddit App

  • Visit Reddit Apps
  • Create a new "script" app and fill in:
  • Save and note the Client ID and Client Secret

Create a .env file

OPENROUTER_API_KEY=your_openrouter_api_key
LLM_MODEL_NAME=google/gemini-2.5-pro
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=your_unique_user_agent

🗂️ Detailed Code Modules

📥 reddit_scraper.py — Collects Reddit Activity Uses PRAW to fetch Reddit posts & comments

Saves raw data to JSON for downstream processing

🧪 preprocessing.py — Adds Sentiment, Emotion, NER, Archetype & Toxicity Applies VADER for sentiment scoring

Uses HuggingFace model j-hartmann/emotion-english-distilroberta-base for emotion labeling

Uses spaCy for Named Entity Recognition (NER)

Maps subreddits to behavioral archetypes

Uses Detoxify for toxicity and civility scoring

Stores enriched data in JSON

📊 openrouter_credit_report.py — Check OpenRouter API Key & Credits Sends a GET request to OpenRouter API to retrieve API key details and credit status.

Usage:

export OPENROUTER_API_KEY=your_api_key
python openrouter_credit_report.py
🧪 test_openrouter_key.py — Test OpenRouter API Key with Sample Request Sends a sample chat completion request to OpenRouter API to verify API key and model.

Usage:

export OPENROUTER_API_KEY=your_api_key
export LLM_MODEL_NAME=google/gemini-2.5-pro
python test_openrouter_key.py
✍️ prompt_builder.py — Crafts AI Prompts Extracts and formats behavioral data

Builds input prompts that are LLM-friendly

🤖 llm_inference.py — Connects to OpenRouter Sends prompts to OpenRouter models like GPT-4

Handles retries, chunking, rate limits, and timeouts

📄 persona_writer.py — Saves Persona Writes final persona to output/_persona.txt
🎬 main.py — Runs Everything Orchestrates full pipeline from scraping to persona output

🧠 NLP Techniques & Their Benefits

Technique Tool Used Benefits
Sentiment Analysis VADER Lightweight, real-time, social-media optimized
Emotion Detection distilroberta-base fine-tuned on emotion data Identifies nuanced emotions: joy, anger, fear, etc.
Named Entity Recognition (NER) spaCy Extracts entities like people, organizations, locations for richer context
Subreddit Archetype Mapping Custom dictionary mapping Maps subreddit interests to behavioral archetypes
Toxicity & Civility Scoring Detoxify Detects toxic or uncivil language for content moderation insights
Prompt Chunking Custom batching Handles large input while respecting token limits
LLM Inference Gemini 2.5/ Nvidia nemotron via OpenRouter Enables deep personality generation
Prompt Engineering Context-rich template generation Guides LLMs to output structured and relevant persona
Modular Pipeline Python modules per task Easy to maintain, debug, and extend

🔄 Workflow Diagram

graph TD
    A[Input Reddit Username] --> B[Scrape Posts & Comments]
    B --> C[Save Raw JSON]
    C --> D[Preprocess: Sentiment, Emotion, NER, Archetype, Toxicity]
    D --> E[Build Prompt from Enriched Data]
    E --> F[Call OpenRouter LLM]
    F --> G[Generate Persona]
    G --> H[Write Persona to File]
    H --> I[Done ✅]
Loading

🚀 Running the Generator

Use this command to generate a persona:

python main.py https://www.reddit.com/user/your_target_user/

📂 Output Files

The project organizes files into the following directories:

  • data/ — Contains raw scraped Reddit data JSON files.
  • enriched_data/ — Contains enriched JSON files with added sentiment, emotion, NER, archetype, and toxicity features.
  • output/ — Contains the final generated persona text files.

Example output files for a user:

  • data/your_target_user_raw.json — Scraped raw data
  • enriched_data/your_target_user_enriched.json — Enriched data with sentiment, emotion, NER, archetype, toxicity
  • output/your_target_user_persona.txt — Final persona ✨

💬 Common Errors & Fixes

Error Code Error Message Cause Fix
429 RateLimitError Too many requests Add delays or use retry
401 AuthenticationError Bad/missing API key Recheck .env file
408 TimeoutError Server/network slow Use retry logic (already built-in)
404 ModelNotFoundError Wrong model name Verify OpenRouter model ID
N/A Token Overflow Text too long Chunking is already handled automatically

🤝 Contributions & Feedback

I welcome contributions:

  • 🌟 Fork the repo
  • 🛠 Fix bugs or add new models
  • ✨ Extend the prompt builder
  • 📣 Suggest new prompt templates or psychographic layers

Open an issue or PR anytime!


📌 Future Features (Roadmap)

  • 🧠 Add OCEAN / Big Five personality scores
  • 📊 Subreddit-based sentiment heatmaps
  • ✍️ Stylometry and writing fingerprint detection
  • 📄 Export to PDF / Notion integration
  • 🕵️‍♂️ Optional sarcasm detection (via better model)

📣 Final Thoughts

This project blends behavioral science, NLP, and LLMs to create a mirror of how users present themselves on Reddit. Use it for research, digital marketing, audience profiling, or just for fun!

Turn Reddit data into rich, human-readable insights — instantly. 🚀

About

a powerful tool that transforms Reddit user activity into deep, emotionally enriched, AI-generated user personas

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages