Automated newsletter generator from X (Twitter) hashtags. Fetches tweets, processes with AI, and sends email newsletters.
- Automated Scraping: Extract tweets from X.com based on configurable hashtag groups
- AI Processing: Filter and summarize tweets using LLM via OpenRouter
- Email Delivery: Send newsletters via Resend API
- Scheduling: Configure automatic execution times
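The four features above form a linear pipeline: scrape, process, deliver, on a schedule. As a rough sketch of that flow (the function names here are hypothetical placeholders, not the project's actual API):

```python
def run_newsletter_pipeline(fetch, summarize, send):
    """Chain the stages: scrape tweets -> AI filter/summary -> email delivery.

    Each argument is a callable standing in for the corresponding module
    (scraper, AI processor, email sender); the names are illustrative only.
    """
    tweets = fetch()                # scrape X.com for the configured hashtags
    newsletter = summarize(tweets)  # filter and summarize via the LLM
    return send(newsletter)         # deliver via the email API
```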
- Python 3.11+
- Playwright (browser automation)
- API keys (see Configuration)
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate    # Linux/Mac
.venv\Scripts\activate.ps1   # Windows

# Install dependencies
pip install -e ".[dev]"

# Install Playwright browsers
playwright install chromium
```

Create a `.env` file with the following variables:
```env
# OpenRouter (AI processing)
OPENROUTER_API_KEY=your_openrouter_api_key

# Resend (Email delivery)
RESEND_API_KEY=your_resend_api_key
EMAIL_FROM=your_email@example.com
EMAIL_TO=recipient@example.com
```

To scrape X.com, you need to export your browser cookies:
- Log in to X.com in your browser (Chrome/Firefox)
- Install a cookie-export extension (e.g., "Get cookies.txt" or "Cookie-Editor")
- Export the cookies for `x.com` as JSON
- Save them as `cookies.json` in the project root
Example `cookies.json` format:

```json
[
  {
    "domain": ".x.com",
    "name": "auth_token",
    "value": "your_token_here",
    "path": "/",
    "secure": true,
    "sameSite": "Lax"
  }
]
```

Note: Cookies expire periodically. Re-export them if you encounter login walls.
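Before launching the scraper, it can help to sanity-check the exported file. A minimal sketch, assuming the list-of-dicts shape shown above (the helper below is illustrative, not part of the project):

```python
import json  # used in the commented example below

def has_auth_token(cookies):
    """True if the cookie list has a non-empty auth_token for x.com.

    `cookies` is the parsed contents of cookies.json: a list of dicts,
    as in the example above.
    """
    return any(
        c.get("name") == "auth_token"
        and c.get("value")
        and c.get("domain", "").endswith("x.com")
        for c in cookies
    )

# Example usage:
# cookies = json.load(open("cookies.json"))
# if not has_auth_token(cookies):
#     raise SystemExit("no auth_token in cookies.json; re-export your cookies")
```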
Edit `config/hashtags.yaml` to configure hashtag groups and scraping parameters:
```yaml
groups:
  - name: "IA & Data"
    hashtags:
      - "#AI"
      - "#MachineLearning"

scraper:
  min_tweets: 50
  min_interactions: 10
  wait_between_requests_ms: 7000

scheduler:
  hours:
    - 8
    - 13
    - 17
  timezone: "America/Lima"
```

Run the pipeline immediately:

```bash
python main.py --run-now
```

Or start the scheduler:

```bash
python main.py --schedule
```

| Option | Description |
|---|---|
| `--run-now` | Execute pipeline immediately |
| `--schedule` | Start scheduler with configured times |
| `--config` | Path to configuration file (default: `config/hashtags.yaml`) |
| `--headless` | Run browser in headless mode (default: true) |
| `--no-headless` | Run browser in visible mode for debugging |
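The option table maps naturally onto `argparse`. A sketch of how `main.py` might parse these flags (a hypothetical reconstruction, not necessarily the project's actual entry point):

```python
import argparse

def build_parser():
    """Hypothetical CLI parser matching the options table above."""
    p = argparse.ArgumentParser(prog="main.py")
    p.add_argument("--run-now", action="store_true",
                   help="Execute pipeline immediately")
    p.add_argument("--schedule", action="store_true",
                   help="Start scheduler with configured times")
    p.add_argument("--config", default="config/hashtags.yaml",
                   help="Path to configuration file")
    # --headless / --no-headless toggle one boolean, defaulting to True
    p.add_argument("--headless", dest="headless",
                   action="store_true", default=True)
    p.add_argument("--no-headless", dest="headless", action="store_false")
    return p
```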
```
XScrapper/
├── main.py              # Entry point
├── config/
│   └── hashtags.yaml    # Hashtag groups configuration
├── src/
│   ├── scraper.py       # X.com scraping module
│   ├── ai_processor.py  # AI processing module
│   ├── email_sender.py  # Email delivery module
│   └── scheduler.py     # Scheduling module
├── tests/               # Test suite
└── output/              # Raw tweet exports
```
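The `scheduler.hours` list in the config can be turned into concrete run times with a small stdlib helper. This is a sketch of the idea only, not the actual `scheduler.py` implementation, and it ignores the configured timezone for brevity:

```python
from datetime import datetime, timedelta

def next_run(now, hours):
    """Return the next run time strictly after `now`, given hours like [8, 13, 17]."""
    for hour in sorted(hours):
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if candidate > now:
            return candidate
    # Every slot today has passed: take the earliest slot tomorrow.
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=min(hours), minute=0, second=0, microsecond=0)
```

For example, with `hours: [8, 13, 17]`, a run finishing at 14:00 schedules the next one for 17:00 the same day.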
Run the test suite:

```bash
pytest
```

Run tests with coverage:

```bash
pytest --cov=src --cov-report=term-missing
```

Type-check and lint:

```bash
mypy src/
ruff check src/ tests/
```

MIT