TechPulse

An automated AI-powered newsletter pipeline that scrapes the latest tech and AI trends, curates the content using LLM agents, and delivers a formatted HTML digest to your inbox on a schedule.


How It Works

Scrapers  →  datasets/  →  Editor Agent  →  Writer Agent  →  Reviewer Agent  →  Email
(data)        (CSVs)        (selects)         (formats)         (validates)       (sent)
  1. Scrapers run in parallel and populate datasets/ with fresh CSVs.
  2. Editor Agent (Gemini) reads each CSV and selects the most relevant items.
  3. Writer Agent (Gemini) formats the selected items into styled HTML sections.
  4. Reviewer Agent (Gemini) validates the assembled email and fixes issues.
  5. The final HTML is saved to process/emails/ and sent via Gmail SMTP.
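
In code, the orchestration might look like the sketch below; the class and function names are illustrative, not the repo's actual API:

# Hypothetical orchestration sketch; names are illustrative, not the
# repo's actual API.
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from pathlib import Path

def run_pipeline(scrapers, editor, writer, reviewer, send_email):
    # 1. Scrapers run in parallel; each writes a fresh CSV to datasets/.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda scrape: scrape(), scrapers))

    # 2. Editor Agent picks the most relevant rows from each CSV.
    selected = editor.select(Path("datasets"))
    # 3. Writer Agent turns the selection into styled HTML sections.
    html = writer.render(selected)
    # 4. Reviewer Agent validates the assembled email and fixes issues.
    final_html = reviewer.validate(html)

    # 5. Save a dated copy, then deliver via Gmail SMTP.
    out = Path("process/emails") / f"{date.today():%Y-%m-%d}.html"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(final_html, encoding="utf-8")
    send_email(final_html)
    return out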

Data Sources

Section                                            Source
Tech News                                          Perplexity API (sonar-pro)
GitHub Trending                                    GitHub API
HuggingFace Models / Datasets / Spaces / Papers    HuggingFace API
Research Papers                                    arXiv API + Semantic Scholar
Google Research Blog                               google.github.io/googleai RSS
AWS Blog                                           AWS Blog API
OpenAI & Claude Cookbooks                          GitHub (openai/openai-cookbook, anthropics/anthropic-cookbook)
Startup Funding                                    Google News RSS
LinkedIn Updates                                   Apify (Y Combinator, a16z, Sequoia)
Private Equity                                     Apify (a16z, Sequoia LinkedIn posts)

Prerequisites

  • Python 3.10+
  • A Gmail account with an App Password enabled
  • API keys (see Configuration below)

Installation

git clone https://github.com/your-username/techpulse.git
cd techpulse

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Configuration

1. Environment Variables

Copy the example below into a .env file in the project root:

# AI Agents (required)
GEMINI_API_KEY=your_gemini_api_key

# Tech News scraper (required)
PERPLEXITY_API_KEY=your_perplexity_api_key

# LinkedIn scraper via Apify (required for LinkedIn sections)
APIFY_API_KEY=your_apify_api_key

# Email delivery (required)
GMAIL_USER=your_email@gmail.com
GMAIL_PASSWORD=your_gmail_app_password

# GitHub API — increases rate limits (optional but recommended)
GITHUB_TOKEN=your_github_token
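
At startup these can be read with a minimal loading sketch like the one below, assuming python-dotenv is available; the repo's own loading code may differ:

# Sketch: load .env and fail fast on missing required keys.
# Assumes python-dotenv is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

REQUIRED = ["GEMINI_API_KEY", "PERPLEXITY_API_KEY", "GMAIL_USER", "GMAIL_PASSWORD"]
missing = [key for key in REQUIRED if not os.getenv(key)]
if missing:
    raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")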

2. Recipients

Edit config.json to set the email recipient list:

{
  "recipients": ["you@example.com", "colleague@example.com"]
}
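
For reference, delivery to this recipient list through Gmail SMTP with an App Password can be as simple as the standard-library sketch below; the repo's process/send_email.py may structure this differently:

# Sketch: send the digest to everyone in config.json via Gmail SMTP.
# Standard-library only; process/send_email.py may differ.
import json
import os
import smtplib
from email.message import EmailMessage

def send_digest(html: str) -> None:
    with open("config.json", encoding="utf-8") as f:
        recipients = json.load(f)["recipients"]

    msg = EmailMessage()
    msg["Subject"] = "TechPulse Digest"
    msg["From"] = os.environ["GMAIL_USER"]
    msg["To"] = ", ".join(recipients)
    msg.set_content("An HTML-capable email client is required.")  # plain-text fallback
    msg.add_alternative(html, subtype="html")

    # Gmail's SMTP-over-SSL endpoint; requires an App Password,
    # not the account password.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(os.environ["GMAIL_USER"], os.environ["GMAIL_PASSWORD"])
        server.send_message(msg)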

3. Editor Limits (config/editor.yaml)

Control how many items appear per section:

default_limit: 3
categories:
  "Tech News": 10
  "GitHub Trending": 3
  "HuggingFace Models": 3

4. AI Models (config/models.yaml)

Override which Gemini model each agent uses:

default_model: "gemini-flash-latest"
agents:
  EditorAgent: "gemini-flash-latest"
  WriterAgent: "gemini-flash-latest"
  ReviewerAgent: "gemini-flash-latest"
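
The resolved model name is then what each agent hands to the Gemini client; a sketch assuming the google-genai SDK (the repo's agent code may use a different client):

# Sketch: call Gemini with an agent's configured model.
# Assumes the google-genai SDK; the repo may use another client.
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
response = client.models.generate_content(
    model="gemini-flash-latest",  # resolved from config/models.yaml,
                                  # same lookup pattern as editor.yaml above
    contents="Select the 3 most newsworthy rows from this CSV: ...",
)
print(response.text)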

5. LinkedIn Targets (config/influencers.yaml)

Add or remove LinkedIn company slugs to scrape:

companies:
  - y-combinator
  - a16z
  - sequoia-capital

Usage

Run the full pipeline once:

python main.py

Or use the shell script (recommended for cron):

chmod +x run_pipeline.sh
./run_pipeline.sh

Schedule with cron (e.g. every Monday at 08:00):

0 8 * * 1 /path/to/techpulse/run_pipeline.sh >> /path/to/techpulse/cron_log.log 2>&1

Output:

  • Raw data → datasets/*.csv
  • Generated email → process/emails/YYYY-MM-DD.html

Project Structure

techpulse/
├── main.py                     # Entry point — orchestrates scrapers + pipeline
├── run_pipeline.sh             # Shell wrapper for cron scheduling
├── requirements.txt
├── config/
│   ├── editor.yaml             # Per-section item limits
│   ├── models.yaml             # Gemini model overrides per agent
│   └── influencers.yaml        # LinkedIn targets
├── scrapers/                   # One file per data source
│   ├── github_scraper.py
│   ├── huggingface_scraper.py
│   ├── news_scraper.py         # Perplexity API
│   ├── papers_scraper.py       # arXiv + Semantic Scholar
│   ├── startup_scraper.py
│   ├── linkedin_scraper.py     # Apify
│   ├── aws_blog_scraper.py
│   ├── openai_cookbook_scraper.py
│   └── config.py               # Scraper limits
├── process/
│   ├── flow_manager.py         # AI editorial pipeline
│   ├── email_template.html     # HTML email layout
│   ├── send_email.py           # Gmail SMTP sender
│   └── agents/
│       ├── editor.py           # Selects top items per section
│       ├── writer.py           # Writes HTML for each section
│       └── reviewer.py         # Validates & fixes the final email
└── datasets/                   # Auto-generated CSVs (git-ignored)

Contributing

Contributions are welcome. To add a new data source:

  1. Create a scraper in scrapers/your_source_scraper.py that saves a CSV to datasets/ (a sketch follows this list).
  2. Register it in main.py under the scrapers list.
  3. Add a category entry in process/flow_manager.py (cats_config) with the filename and an HTML format hint.
  4. Add the corresponding {{ placeholder }} in process/email_template.html.
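
A minimal scraper for step 1 might look like the sketch below; the endpoint, field names, and filename are hypothetical placeholders, and requests is assumed to be available:

# scrapers/your_source_scraper.py (hypothetical example; the URL,
# fields, and filename are placeholders for your real source)
import csv
from pathlib import Path

import requests

def run() -> Path:
    resp = requests.get("https://api.example.com/v1/items", timeout=30)
    resp.raise_for_status()
    items = resp.json()

    out = Path("datasets/your_source.csv")
    out.parent.mkdir(exist_ok=True)
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url", "summary"])
        writer.writeheader()
        for item in items:
            writer.writerow({k: item.get(k, "") for k in ("title", "url", "summary")})
    return out

if __name__ == "__main__":
    run()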

For bug reports and feature requests, please open a GitHub Issue.


License

This project is licensed under the MIT License. See LICENSE for details.
