SCOUT is a tool I built to automate part of my literature review workflow as an academic economist. It retrieves recent papers from arXiv and NBER, ranks them against my research profile using semantic embeddings, summarizes the top hits with an LLM, and packages everything into an HTML digest.
This is a personal tool, not a production system. It works well for my workflow but hasn't been hardened for general use.
- Retrieve — Pulls recent papers from arXiv (API) and NBER (web scraping) based on configurable keywords and lookback windows.
- Rank — Computes semantic similarity between retrieved papers and a research profile (stated interests plus embeddings of your own uploaded papers). A configurable `paper_weight` blends the two signals.
- Summarize — Sends the top-ranked papers to an LLM (OpenAI, Claude, or Gemini) for structured summaries tailored to the user's interests.
- Digest — Generates a styled HTML digest with relevance scores, explanations, and links. Optionally sends it by email.
- Feedback loop — Users rate recommendations; a Thompson Sampling bandit adjusts the `paper_weight` parameter over time to improve relevance.
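The blended score in the ranking step can be sketched as follows. This is an illustrative sketch, not SCOUT's actual `relevance_ranking.py`: the function names, the max-over-papers aggregation, and the 0.6 default are all assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relevance_score(candidate_vec, interests_vec, paper_vecs, paper_weight=0.6):
    """Blend similarity to stated interests with similarity to uploaded papers.

    paper_weight=0 ranks on stated interests alone; paper_weight=1 ranks only
    against the user's own papers. (Hypothetical sketch, not SCOUT's code.)
    """
    interest_sim = cosine(candidate_vec, interests_vec)
    if not paper_vecs:
        return interest_sim
    # Take the closest uploaded paper as the "own work" signal.
    paper_sim = max(cosine(candidate_vec, p) for p in paper_vecs)
    return paper_weight * paper_sim + (1 - paper_weight) * interest_sim
```

Taking the maximum over uploaded papers (rather than the mean) rewards candidates close to any one line of your research; either choice is defensible.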
```
┌──────────────┐     ┌──────────────┐     ┌───────────────┐
│  Retrieval   │────▸│   Ranking    │────▸│ Summarization │
│ (arXiv/NBER) │     │ (embeddings) │     │  (LLM call)   │
└──────────────┘     └──────┬───────┘     └───────┬───────┘
                            │                     │
                     ┌──────▼─────────────────────▼──────┐
                     │        HTML Digest Builder        │
                     └──────────────┬────────────────────┘
                                    │
                     ┌──────────────▼────────────────────┐
                     │  Feedback → Parameter Optimizer   │
                     │    (Thompson Sampling bandit)     │
                     └───────────────────────────────────┘
```
Modules:

- `paper_retrieval.py` — arXiv API + NBER scraper
- `relevance_ranking.py` — OpenAI embeddings, cosine similarity, weighted scoring
- `paper_summarization.py` — Multi-provider LLM summarization
- `llm_providers.py` — Unified interface for OpenAI / Claude / Gemini
- `paper_processor.py` — PDF text extraction, embedding generation for uploaded papers
- `parameter_optimizer.py` — Thompson Sampling (Beta-bandit) for tuning ranking weights from user feedback
- `feedback_store.py` — JSON-based feedback persistence
- `pdf_downloader.py` — Downloads PDFs for top-ranked papers
```bash
# Clone and install
git clone https://github.com/matiasbayas/SCOUT.git
cd SCOUT
pip install -r requirements.txt

# Set up API keys (at minimum, OpenAI for embeddings)
export OPENAI_API_KEY="your-key"
# Optional: export ANTHROPIC_API_KEY="your-key" or GEMINI_API_KEY="your-key"

# Create a config file and edit research interests
python -m scout_agent.scout_agent --create-config
# Edit config.json: set your research_interests and preferred summarization provider

# Run
python -m scout_agent.scout_agent
```

The output is an HTML digest in the `digests/` directory.
Copy `config.json.example` to `config.json` and edit. Key settings:

| Section | Key | What it controls |
|---|---|---|
| top-level | `research_interests` | Topics for ranking |
| `retriever` | `source`, `lookback_days`, `max_results` | Where and how far back to search |
| `ranker` | `paper_weight`, `similarity_threshold` | Balance between interests and uploaded papers |
| `summarizer` | `provider`, `temperature` | Which LLM to use |
| `papers` | `enabled`, `use_for_ranking` | Whether uploaded papers influence ranking |

API keys can be set via environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`), in `config.json`, or via CLI flags.
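Putting the settings above together, a minimal `config.json` might look like the following. The specific values (and the example interests) are placeholders, not recommended defaults:

```json
{
  "research_interests": ["monetary policy", "firm dynamics"],
  "retriever": { "source": "arxiv", "lookback_days": 7, "max_results": 50 },
  "ranker": { "paper_weight": 0.6, "similarity_threshold": 0.3 },
  "summarizer": { "provider": "openai", "temperature": 0.3 },
  "papers": { "enabled": true, "use_for_ranking": true }
}
```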
SCOUT can embed your papers and use them to personalize ranking:
```bash
# Single paper
python -m scout_agent.scout_agent --upload-paper paper.pdf --paper-title "My Paper"

# Directory of papers
python -m scout_agent.scout_agent --upload-dir ./my-papers/

# Upload and immediately run
python -m scout_agent.scout_agent --upload-paper paper.pdf --run-after-upload
```

After reviewing a digest, rate papers to improve future recommendations:
```bash
python -m scout_agent.scout_agent --feedback http://arxiv.org/abs/2401.00001v1 highly_relevant
```

Ratings: `highly_relevant`, `somewhat_relevant`, `not_relevant`.
To run a tuning session (presents papers and collects feedback interactively):
```bash
python -m scout_agent.scout_agent --tune --tune-iters 5
```

Under the hood, this uses Thompson Sampling on a Beta posterior to adjust the `paper_weight` parameter — a simple Bayesian bandit, not a reward model or policy optimization. It's a lightweight way to personalize the interest-vs-paper balance over time.
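The bandit mechanics can be sketched as follows. This assumes a small discrete grid of candidate `paper_weight` values, each with its own Beta posterior updated from positive/negative ratings; the grid, class name, and rating-to-reward mapping are illustrative assumptions, not SCOUT's actual `parameter_optimizer.py`.

```python
import random

class WeightBandit:
    """Thompson Sampling over a discrete grid of paper_weight candidates.

    Each arm keeps a Beta(alpha, beta) posterior over the probability that
    digests built with that weight receive positive ratings.
    (Hypothetical sketch, not SCOUT's implementation.)
    """

    def __init__(self, weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
        self.weights = list(weights)
        self.alpha = [1.0] * len(self.weights)  # uniform Beta(1, 1) priors
        self.beta = [1.0] * len(self.weights)

    def choose(self):
        """Thompson step: sample each arm's posterior, play the argmax."""
        draws = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, relevant):
        """Record feedback: a 'relevant' rating counts as a success."""
        if relevant:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

bandit = WeightBandit()
arm = bandit.choose()
bandit.update(arm, relevant=True)
next_weight = bandit.weights[arm]  # weight to use for the next digest
```

Because the posterior draws are random, the bandit keeps occasionally exploring under-sampled weights instead of locking onto an early favorite.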
- Ranking depends on OpenAI embeddings — the ranking module requires an OpenAI API key even when using Claude or Gemini for summarization.
- NBER retrieval uses Selenium — requires Chrome/ChromeDriver. ArXiv-only mode works without it.
- No CI or comprehensive test suite — tests cover core logic (feedback store, API key precedence, provider construction) but not end-to-end workflows.
- Single-parameter tuning — the feedback loop only adjusts one weight. A richer approach would tune retrieval keywords, similarity thresholds, or summarization prompts.
MIT