TechPulse

An automated AI-powered newsletter pipeline that scrapes the latest tech and AI trends, curates the content using LLM agents, and delivers a formatted HTML digest to your inbox on a schedule.


How It Works

Scrapers  →  datasets/  →  Editor Agent  →  Writer Agent  →  Reviewer Agent  →  Email
(data)        (CSVs)        (selects)         (formats)         (validates)       (sent)
  1. Scrapers run in parallel and populate datasets/ with fresh CSVs.
  2. Editor Agent (Gemini) reads each CSV and selects the most relevant items.
  3. Writer Agent (Gemini) formats the selected items into styled HTML sections.
  4. Reviewer Agent (Gemini) validates the assembled email and fixes issues.
  5. The final HTML is saved to process/emails/ and sent via Gmail SMTP.
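
In code, the orchestration might look like the sketch below; the class and function names are illustrative, not the repo's actual API:

# Hypothetical orchestration sketch; names are illustrative, not the
# repo's actual API.
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from pathlib import Path

def run_pipeline(scrapers, editor, writer, reviewer, send_email):
    # 1. Scrapers run in parallel; each writes a fresh CSV to datasets/.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda scrape: scrape(), scrapers))

    # 2. Editor Agent picks the most relevant rows from each CSV.
    selected = editor.select(Path("datasets"))
    # 3. Writer Agent turns the selection into styled HTML sections.
    html = writer.render(selected)
    # 4. Reviewer Agent validates the assembled email and fixes issues.
    final_html = reviewer.validate(html)

    # 5. Save a dated copy, then deliver via Gmail SMTP.
    out = Path("process/emails") / f"{date.today():%Y-%m-%d}.html"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(final_html, encoding="utf-8")
    send_email(final_html)
    return out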

Data Sources

Section                                            Source
Tech News                                          Perplexity API (sonar-pro)
GitHub Trending                                    GitHub API
HuggingFace Models / Datasets / Spaces / Papers    HuggingFace API
Research Papers                                    arXiv API + Semantic Scholar
Google Research Blog                               google.github.io/googleai RSS
AWS Blog                                           AWS Blog API
OpenAI & Claude Cookbooks                          GitHub (openai/openai-cookbook, anthropics/anthropic-cookbook)
Startup Funding                                    Google News RSS
LinkedIn Updates                                   Apify (Y Combinator, a16z, Sequoia)
Private Equity                                     Apify (a16z, Sequoia LinkedIn posts)

Prerequisites

  • Python 3.10+
  • A Gmail account with an App Password enabled
  • API keys (see Configuration below)

Installation

git clone https://github.com/your-username/techpulse.git
cd techpulse

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Configuration

1. Environment Variables

Copy the example below into a .env file in the project root:

# AI Agents (required)
GEMINI_API_KEY=your_gemini_api_key

# Tech News scraper (required)
PERPLEXITY_API_KEY=your_perplexity_api_key

# LinkedIn scraper via Apify (required for LinkedIn sections)
APIFY_API_KEY=your_apify_api_key

# Email delivery (required)
GMAIL_USER=your_email@gmail.com
GMAIL_PASSWORD=your_gmail_app_password

# GitHub API — increases rate limits (optional but recommended)
GITHUB_TOKEN=your_github_token
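
At startup these can be read with a minimal loading sketch like the one below, assuming python-dotenv is available; the repo's own loading code may differ:

# Sketch: load .env and fail fast on missing required keys.
# Assumes python-dotenv is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

REQUIRED = ["GEMINI_API_KEY", "PERPLEXITY_API_KEY", "GMAIL_USER", "GMAIL_PASSWORD"]
missing = [key for key in REQUIRED if not os.getenv(key)]
if missing:
    raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")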

2. Recipients

Edit config.json to set the email recipient list:

{
  "recipients": ["you@example.com", "colleague@example.com"]
}
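
For reference, delivery to this recipient list through Gmail SMTP with an App Password can be as simple as the standard-library sketch below; the repo's process/send_email.py may structure this differently:

# Sketch: send the digest to everyone in config.json via Gmail SMTP.
# Standard-library only; process/send_email.py may differ.
import json
import os
import smtplib
from email.message import EmailMessage

def send_digest(html: str) -> None:
    with open("config.json", encoding="utf-8") as f:
        recipients = json.load(f)["recipients"]

    msg = EmailMessage()
    msg["Subject"] = "TechPulse Digest"
    msg["From"] = os.environ["GMAIL_USER"]
    msg["To"] = ", ".join(recipients)
    msg.set_content("An HTML-capable email client is required.")  # plain-text fallback
    msg.add_alternative(html, subtype="html")

    # Gmail's SMTP-over-SSL endpoint; requires an App Password,
    # not the account password.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["GMAIL_USER"], os.environ["GMAIL_PASSWORD"])
        server.send_message(msg)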

3. Editor Limits (config/editor.yaml)

Control how many items appear per section:

default_limit: 3
categories:
  "Tech News": 10
  "GitHub Trending": 3
  "HuggingFace Models": 3

4. AI Models (config/models.yaml)

Override which Gemini model each agent uses:

default_model: "gemini-flash-latest"
agents:
  EditorAgent: "gemini-flash-latest"
  WriterAgent: "gemini-flash-latest"
  ReviewerAgent: "gemini-flash-latest"
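
The resolved model name is then what each agent hands to the Gemini client; a sketch assuming the google-genai SDK (the repo's agent code may use a different client):

# Sketch: call Gemini with an agent's configured model.
# Assumes the google-genai SDK; the repo may use another client.
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-flash-latest",  # resolved from config/models.yaml,
                                  # same lookup pattern as editor.yaml above
    contents="Select the 3 most newsworthy rows from this CSV: ...",
)
print(response.text)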

5. LinkedIn Targets (config/influencers.yaml)

Add or remove LinkedIn company slugs to scrape:

companies:
  - y-combinator
  - a16z
  - sequoia-capital

Usage

Run the full pipeline once:

python main.py

Or use the shell script (recommended for cron):

chmod +x run_pipeline.sh
./run_pipeline.sh

Schedule with cron (e.g. every Monday at 08:00):

0 8 * * 1 /path/to/techpulse/run_pipeline.sh >> /path/to/techpulse/cron_log.log 2>&1

Output:

  • Raw data → datasets/*.csv
  • Generated email → process/emails/YYYY-MM-DD.html

Project Structure

techpulse/
├── main.py                     # Entry point — orchestrates scrapers + pipeline
├── run_pipeline.sh             # Shell wrapper for cron scheduling
├── requirements.txt
├── config/
│   ├── editor.yaml             # Per-section item limits
│   ├── models.yaml             # Gemini model overrides per agent
│   └── influencers.yaml        # LinkedIn targets
├── scrapers/                   # One file per data source
│   ├── github_scraper.py
│   ├── huggingface_scraper.py
│   ├── news_scraper.py         # Perplexity API
│   ├── papers_scraper.py       # arXiv + Semantic Scholar
│   ├── startup_scraper.py
│   ├── linkedin_scraper.py     # Apify
│   ├── aws_blog_scraper.py
│   ├── openai_cookbook_scraper.py
│   └── config.py               # Scraper limits
├── process/
│   ├── flow_manager.py         # AI editorial pipeline
│   ├── email_template.html     # HTML email layout
│   ├── send_email.py           # Gmail SMTP sender
│   └── agents/
│       ├── editor.py           # Selects top items per section
│       ├── writer.py           # Writes HTML for each section
│       └── reviewer.py         # Validates & fixes the final email
└── datasets/                   # Auto-generated CSVs (git-ignored)

Contributing

Contributions are welcome. To add a new data source:

  1. Create a scraper in scrapers/your_source_scraper.py that saves a CSV to datasets/ (a sketch follows this list).
  2. Register it in main.py under the scrapers list.
  3. Add a category entry in process/flow_manager.py (cats_config) with the filename and an HTML format hint.
  4. Add the corresponding {{ placeholder }} in process/email_template.html.
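
A minimal scraper for step 1 might look like the sketch below; the endpoint, field names, and filename are hypothetical placeholders, and requests is assumed to be available:

# scrapers/your_source_scraper.py (hypothetical example; the URL,
# fields, and filename are placeholders for your real source)
import csv
from pathlib import Path

import requests

def run() -> Path:
    resp = requests.get("https://api.example.com/v1/items", timeout=30)
    resp.raise_for_status()
    items = resp.json()

    out = Path("datasets/your_source.csv")
    out.parent.mkdir(exist_ok=True)
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url", "summary"])
        writer.writeheader()
        for item in items:
            writer.writerow({k: item.get(k, "") for k in ("title", "url", "summary")})
    return out

if __name__ == "__main__":
    run()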

For bug reports and feature requests, please open a GitHub Issue.


License

This project is licensed under the MIT License. See LICENSE for details.
