SCOUT is a tool I built to automate part of my literature review workflow as an academic economist. It retrieves recent papers from arXiv and NBER, ranks them against my research profile using semantic embeddings, summarizes the top hits with an LLM, and packages everything into an HTML digest.
This is a personal tool, not a production system. It works well for my workflow but hasn't been hardened for general use.
- Retrieve — Pulls recent papers from arXiv (API) and NBER (web scraping) based on configurable keywords and lookback windows.
- Rank — Computes semantic similarity between retrieved papers and a research profile (stated interests plus embeddings of your own uploaded papers). A configurable `paper_weight` blends the two signals.
- Summarize — Sends the top-ranked papers to an LLM (OpenAI, Claude, or Gemini) for structured summaries tailored to the user's interests.
- Digest — Generates a styled HTML digest with relevance scores, explanations, and links. Optionally sends it by email.
- Feedback loop — Users rate recommendations; a Thompson Sampling bandit adjusts the `paper_weight` parameter over time to improve relevance.
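The blended score in the ranking step can be sketched as follows. This is an illustrative sketch, not SCOUT's actual `relevance_ranking.py`: the function names, the max-over-papers aggregation, and the 0.6 default are all assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relevance_score(candidate_vec, interests_vec, paper_vecs, paper_weight=0.6):
    """Blend similarity to stated interests with similarity to uploaded papers.

    paper_weight=0 ranks on stated interests alone; paper_weight=1 ranks only
    against the user's own papers. (Hypothetical sketch, not SCOUT's code.)
    """
    interest_sim = cosine(candidate_vec, interests_vec)
    if not paper_vecs:
        return interest_sim
    # Take the closest uploaded paper as the "own work" signal.
    paper_sim = max(cosine(candidate_vec, p) for p in paper_vecs)
    return paper_weight * paper_sim + (1 - paper_weight) * interest_sim
```

Taking the maximum over uploaded papers (rather than the mean) rewards candidates close to any one line of your research; either choice is defensible.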
```
┌──────────────┐     ┌──────────────┐     ┌───────────────┐
│  Retrieval   │────▸│   Ranking    │────▸│ Summarization │
│ (arXiv/NBER) │     │ (embeddings) │     │  (LLM call)   │
└──────────────┘     └──────┬───────┘     └───────┬───────┘
                            │                     │
                     ┌──────▼─────────────────────▼──────┐
                     │        HTML Digest Builder        │
                     └──────────────┬────────────────────┘
                                    │
                     ┌──────────────▼────────────────────┐
                     │  Feedback → Parameter Optimizer   │
                     │    (Thompson Sampling bandit)     │
                     └───────────────────────────────────┘
```
Modules:

- `paper_retrieval.py` — arXiv API + NBER scraper
- `relevance_ranking.py` — OpenAI embeddings, cosine similarity, weighted scoring
- `paper_summarization.py` — Multi-provider LLM summarization
- `llm_providers.py` — Unified interface for OpenAI / Claude / Gemini
- `paper_processor.py` — PDF text extraction, embedding generation for uploaded papers
- `parameter_optimizer.py` — Thompson Sampling (Beta-bandit) for tuning ranking weights from user feedback
- `feedback_store.py` — JSON-based feedback persistence
- `pdf_downloader.py` — Downloads PDFs for top-ranked papers
```bash
# Clone and install
git clone https://github.com/matiasbayas/SCOUT.git
cd SCOUT
pip install -r requirements.txt

# Set up API keys (at minimum, OpenAI for embeddings)
export OPENAI_API_KEY="your-key"
# Optional: export ANTHROPIC_API_KEY="your-key" or GEMINI_API_KEY="your-key"

# Create a config file and edit research interests
python -m scout_agent.scout_agent --create-config
# Edit config.json: set your research_interests and preferred summarization provider

# Run
python -m scout_agent.scout_agent
```

The output is an HTML digest in the `digests/` directory.
Copy `config.json.example` to `config.json` and edit. Key settings:

| Section | Key | What it controls |
|---|---|---|
| top-level | `research_interests` | Topics for ranking |
| `retriever` | `source`, `lookback_days`, `max_results` | Where and how far back to search |
| `ranker` | `paper_weight`, `similarity_threshold` | Balance between interests and uploaded papers |
| `summarizer` | `provider`, `temperature` | Which LLM to use |
| `papers` | `enabled`, `use_for_ranking` | Whether uploaded papers influence ranking |

API keys can be set via environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`), in `config.json`, or via CLI flags.
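Putting the settings above together, a minimal `config.json` might look like the following. The specific values (and the example interests) are placeholders, not recommended defaults:

```json
{
  "research_interests": ["monetary policy", "firm dynamics"],
  "retriever": { "source": "arxiv", "lookback_days": 7, "max_results": 50 },
  "ranker": { "paper_weight": 0.6, "similarity_threshold": 0.3 },
  "summarizer": { "provider": "openai", "temperature": 0.3 },
  "papers": { "enabled": true, "use_for_ranking": true }
}
```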
SCOUT can embed your papers and use them to personalize ranking:
```bash
# Single paper
python -m scout_agent.scout_agent --upload-paper paper.pdf --paper-title "My Paper"

# Directory of papers
python -m scout_agent.scout_agent --upload-dir ./my-papers/

# Upload and immediately run
python -m scout_agent.scout_agent --upload-paper paper.pdf --run-after-upload
```

After reviewing a digest, rate papers to improve future recommendations:
```bash
python -m scout_agent.scout_agent --feedback http://arxiv.org/abs/2401.00001v1 highly_relevant
```

Ratings: `highly_relevant`, `somewhat_relevant`, `not_relevant`.
To run a tuning session (presents papers and collects feedback interactively):
```bash
python -m scout_agent.scout_agent --tune --tune-iters 5
```

Under the hood, this uses Thompson Sampling on a Beta posterior to adjust the `paper_weight` parameter — a simple Bayesian bandit, not a reward model or policy optimization. It's a lightweight way to personalize the interest-vs-paper balance over time.
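The bandit mechanics can be sketched as follows. This assumes a small discrete grid of candidate `paper_weight` values, each with its own Beta posterior updated from positive/negative ratings; the grid, class name, and rating-to-reward mapping are illustrative assumptions, not SCOUT's actual `parameter_optimizer.py`.

```python
import random

class WeightBandit:
    """Thompson Sampling over a discrete grid of paper_weight candidates.

    Each arm keeps a Beta(alpha, beta) posterior over the probability that
    digests built with that weight receive positive ratings.
    (Hypothetical sketch, not SCOUT's implementation.)
    """

    def __init__(self, weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
        self.weights = list(weights)
        self.alpha = [1.0] * len(self.weights)  # uniform Beta(1, 1) priors
        self.beta = [1.0] * len(self.weights)

    def choose(self):
        """Thompson step: sample each arm's posterior, play the argmax."""
        draws = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, relevant):
        """Record feedback: a 'relevant' rating counts as a success."""
        if relevant:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

bandit = WeightBandit()
arm = bandit.choose()
bandit.update(arm, relevant=True)
next_weight = bandit.weights[arm]  # weight to use for the next digest
```

Because the posterior draws are random, the bandit keeps occasionally exploring under-sampled weights instead of locking onto an early favorite.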
- Ranking depends on OpenAI embeddings — the ranking module requires an OpenAI API key even when using Claude or Gemini for summarization.
- NBER retrieval uses Selenium — requires Chrome/ChromeDriver. ArXiv-only mode works without it.
- No CI or comprehensive test suite — tests cover core logic (feedback store, API key precedence, provider construction) but not end-to-end workflows.
- Single-parameter tuning — the feedback loop only adjusts one weight. A richer approach would tune retrieval keywords, similarity thresholds, or summarization prompts.
MIT