Welcome to the Reddit Persona Generator – a powerful tool that transforms Reddit user activity into deep, emotionally enriched, AI-generated user personas.
Personas are saved as text files in the output/
directory.
This project combines:
- 📄 Reddit data scraping
- 🧠 Sentiment & emotion analysis
- 🪄 Advanced prompt engineering
- 🔮 LLM-powered personality inference using OpenRouter
reddit_persona_generator/
├── config.py
├── reddit_scraper.py
├── openrouter_credit_report.py
├── llm_inference.py
├── main.py
├── preprocessing.py
├── prompt_builder.py
├── persona_writer.py
├── requirements.txt
├── test_openrouter_key.py
├── README.md
├── data/
│ ├── [username]_raw.json
│ └── ...
├── enriched_data/
│ ├── [username]_enriched.json
│ └── ...
└── output/
├── [username]_persona.txt
└── ...
data/
— Raw scraped Reddit data JSON filesenriched_data/
— Enriched JSON files with sentiment, emotion, NER, archetype, toxicityoutput/
— Final generated persona text files- Root folder contains main Python scripts and configuration files
- 🎯 Project Overview
- ⚙️ Setup Instructions
- 🗂️ Detailed Code Modules
- 🧠 NLP Techniques & Their Benefits
- 🔄 Workflow Diagram
- 🚀 Running the Generator
- 💬 Common Errors & Fixes
- 🤝 Contributions & Feedback
- 📌 Future Features (Roadmap)
- 📣 Final Thoughts
The Reddit Persona Generator takes a Reddit username and analyzes their public posts and comments to infer a psychographic and behavioral profile. This profile includes:
- 🔍 Name, age, location (inferred)
- 🧬 Personality traits (MBTI-style)
- 🧠 Motivations and emotional triggers
- 🧭 Decision-making patterns
- 💳 Spending habits and brand loyalties
The analysis pipeline leverages HuggingFace Transformers, VADER sentiment analysis, and OpenRouter API to create LLM-driven personas in plain English.
git clone https://github.com/yourusername/reddit_persona_generator.git
cd reddit_persona_generator
python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
pip install -r requirements.txt
- Go to OpenRouter
- Sign up and generate your API key
- While selecting the model to use it is preferred to use "google/gemini-2.5-pro" if you have credits in openrouter account other wise use "nvidia/llama-3.1-nemotron-ultra-253b-v1:free"
- Visit Reddit Apps
- Create a new "script" app and fill in:
- Name: anything
- Redirect URI: http://localhost
- Save and note the Client ID and Client Secret
OPENROUTER_API_KEY=your_openrouter_api_key
LLM_MODEL_NAME=google/gemini-2.5-pro
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=your_unique_user_agent
📥 reddit_scraper.py — Collects Reddit Activity
Uses PRAW to fetch Reddit posts & commentsSaves raw data to JSON for downstream processing
🧪 preprocessing.py — Adds Sentiment, Emotion, NER, Archetype & Toxicity
Applies VADER for sentiment scoringUses HuggingFace model j-hartmann/emotion-english-distilroberta-base for emotion labeling
Uses spaCy for Named Entity Recognition (NER)
Maps subreddits to behavioral archetypes
Uses Detoxify for toxicity and civility scoring
Stores enriched data in JSON
📊 openrouter_credit_report.py — Check OpenRouter API Key & Credits
Sends a GET request to OpenRouter API to retrieve API key details and credit status.Usage:
export OPENROUTER_API_KEY=your_api_key
python openrouter_credit_report.py
🧪 test_openrouter_key.py — Test OpenRouter API Key with Sample Request
Sends a sample chat completion request to OpenRouter API to verify API key and model.Usage:
export OPENROUTER_API_KEY=your_api_key
export LLM_MODEL_NAME=google/gemini-2.5-pro
python test_openrouter_key.py
✍️ prompt_builder.py — Crafts AI Prompts
Extracts and formats behavioral dataBuilds input prompts that are LLM-friendly
🤖 llm_inference.py — Connects to OpenRouter
Sends prompts to OpenRouter models like GPT-4Handles retries, chunking, rate limits, and timeouts
📄 persona_writer.py — Saves Persona
Writes final persona to output/_persona.txt🎬 main.py — Runs Everything
Orchestrates full pipeline from scraping to persona outputTechnique | Tool Used | Benefits |
---|---|---|
Sentiment Analysis | VADER | Lightweight, real-time, social-media optimized |
Emotion Detection | distilroberta-base fine-tuned on emotion data | Identifies nuanced emotions: joy, anger, fear, etc. |
Named Entity Recognition (NER) | spaCy | Extracts entities like people, organizations, locations for richer context |
Subreddit Archetype Mapping | Custom dictionary mapping | Maps subreddit interests to behavioral archetypes |
Toxicity & Civility Scoring | Detoxify | Detects toxic or uncivil language for content moderation insights |
Prompt Chunking | Custom batching | Handles large input while respecting token limits |
LLM Inference | Gemini 2.5/ Nvidia nemotron via OpenRouter | Enables deep personality generation |
Prompt Engineering | Context-rich template generation | Guides LLMs to output structured and relevant persona |
Modular Pipeline | Python modules per task | Easy to maintain, debug, and extend |
graph TD
A[Input Reddit Username] --> B[Scrape Posts & Comments]
B --> C[Save Raw JSON]
C --> D[Preprocess: Sentiment, Emotion, NER, Archetype, Toxicity]
D --> E[Build Prompt from Enriched Data]
E --> F[Call OpenRouter LLM]
F --> G[Generate Persona]
G --> H[Write Persona to File]
H --> I[Done ✅]
Use this command to generate a persona:
python main.py https://www.reddit.com/user/your_target_user/
The project organizes files into the following directories:
data/
— Contains raw scraped Reddit data JSON files.enriched_data/
— Contains enriched JSON files with added sentiment, emotion, NER, archetype, and toxicity features.output/
— Contains the final generated persona text files.
Example output files for a user:
data/your_target_user_raw.json
— Scraped raw dataenriched_data/your_target_user_enriched.json
— Enriched data with sentiment, emotion, NER, archetype, toxicityoutput/your_target_user_persona.txt
— Final persona ✨
Error Code | Error Message | Cause | Fix |
---|---|---|---|
429 | RateLimitError | Too many requests | Add delays or use retry |
401 | AuthenticationError | Bad/missing API key | Recheck .env file |
408 | TimeoutError | Server/network slow | Use retry logic (already built-in) |
404 | ModelNotFoundError | Wrong model name | Verify OpenRouter model ID |
N/A | Token Overflow | Text too long | Chunking is already handled automatically |
I welcome contributions:
- 🌟 Fork the repo
- 🛠 Fix bugs or add new models
- ✨ Extend the prompt builder
- 📣 Suggest new prompt templates or psychographic layers
Open an issue or PR anytime!
- 🧠 Add OCEAN / Big Five personality scores
- 📊 Subreddit-based sentiment heatmaps
- ✍️ Stylometry and writing fingerprint detection
- 📄 Export to PDF / Notion integration
- 🕵️♂️ Optional sarcasm detection (via better model)
This project blends behavioral science, NLP, and LLMs to create a mirror of how users present themselves on Reddit. Use it for research, digital marketing, audience profiling, or just for fun!
Turn Reddit data into rich, human-readable insights — instantly. 🚀