PaperTrail - Research Paper Catalog

🤖 AI-GENERATED PROJECT
This project was created in a single-shot AI agent benchmark run using GitHub Copilot with Claude Sonnet 4.5, on the prompt defined in prompt.md.
Original benchmark: agent-comparison/copilot_sonnet-45

The initial result was near production-ready, so this repository now serves as a standalone project for final refinements and ongoing development.
The .report/ directory contains the original chat history and related artifacts from that run.

A research paper catalog with continuous autonomous ingestion, GraphRAG-based search, and multiple specialized views. Its standout feature, Theory Mode, finds papers that provide pro and contra arguments for a given hypothesis. Built with a FastAPI backend and a React frontend.

Features

  • Paper Ingestion: Manual arXiv link input and continuous autonomous monitoring
  • GraphRAG: Knowledge graph showing paper relationships (authors, citations, topics)
  • Semantic Search: Vector similarity-based paper discovery
  • Theory Mode: RAG-based argument discovery for hypothesis validation
  • Four Specialized Views:
    • Dashboard: Statistics, charts, continuous import management
    • Paper List: Filterable/sortable catalog with status management
    • Paper Detail: Comprehensive metadata, AI summaries, relationships, notes
    • Theory Mode: Pro/contra argument analysis for hypothesis validation
  • Real-time Updates: WebSocket-based progress tracking
  • Resilient Architecture: Graceful degradation when LLM/embeddings unavailable

Quick Start

1. One-Command Startup (Recommended)

The easiest way to get started:

bash run.sh

This automated script will:

  • Install all frontend dependencies (npm install)
  • Build the production frontend (npm run build)
  • Install all backend dependencies (uv sync)
  • Start the backend server on port 8000
  • Serve the built frontend automatically

After running, access the application at http://localhost:8000

2. Manual Setup

If you prefer to set up components separately:

Install Dependencies:

# Backend dependencies
uv sync

# Frontend dependencies
cd frontend
npm install
cd ..

Configure Environment:

# Copy template and edit with your credentials
cp .envtemplate .env

Required environment variables:

  • DEFAULT_MODEL: LLM model (e.g., azure/gpt-4.1 or openai/gpt-4)
  • DEFAULT_EMBEDDING_MODEL: Embedding model (e.g., azure/text-embedding-3-small)
  • Plus the corresponding API keys (see .envtemplate); a usage sketch follows this list
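For orientation, here is a minimal Python sketch of how these variables might reach litellm. It is illustrative only; the real configuration loading lives in config.py, and litellm reads provider API keys (e.g. OPENAI_API_KEY) from the environment on its own.

import os

import litellm

model = os.environ.get("DEFAULT_MODEL")                  # e.g. "openai/gpt-4"
embed_model = os.environ.get("DEFAULT_EMBEDDING_MODEL")  # e.g. "azure/text-embedding-3-small"

if model:
    # Chat completion against whichever provider the model string selects
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": "Say hello"}],
    )
    print(response.choices[0].message.content)

if embed_model:
    # Embedding call; response.data mirrors the OpenAI response shape
    embedding = litellm.embedding(model=embed_model, input=["attention mechanisms"])
    print(len(embedding.data[0]["embedding"]))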

Run the Application:

Option A: Run both backend and frontend together

# Terminal 1: Start backend
uv run uvicorn researcher.main:app --reload --host 0.0.0.0 --port 8000

# Terminal 2: Start frontend
cd frontend && npm run dev

Option B: Build and run production

# Build frontend
cd frontend && npm run build && cd ..

# Run backend (serves frontend automatically)
uv run uvicorn researcher.main:app --host 0.0.0.0 --port 8000

Access the Application:

  • Production build: http://localhost:8000 (the backend serves the built frontend)
  • Development: open the Vite dev server URL printed by npm run dev

Usage Guide

Adding Papers

  1. Navigate to Papers view
  2. Enter arXiv URL or ID (e.g., 2103.12345 or https://arxiv.org/abs/2103.12345)
  3. Click "Add Paper" and watch real-time progress
  4. The paper is then automatically (see the sketch after this list):
    • Downloaded and text extracted
    • Embedded for semantic search
    • Analyzed by LLM (if available)
    • Added to knowledge graph
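A rough Python sketch of the download-and-extract stage, using the PyPDF2 reader named in the tech stack; the helper name is hypothetical, and the real logic lives in ingestion.py:

import io

import requests
from PyPDF2 import PdfReader

def extract_arxiv_text(arxiv_id: str) -> str:
    """Download an arXiv PDF and extract its plain text (hypothetical helper)."""
    response = requests.get(f"https://arxiv.org/pdf/{arxiv_id}", timeout=60)
    response.raise_for_status()
    reader = PdfReader(io.BytesIO(response.content))
    # extract_text() can return None for pages without extractable text
    return "\n".join(page.extract_text() or "" for page in reader.pages)

print(extract_arxiv_text("2103.12345")[:500])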

Continuous Import

  1. Go to Dashboard
  2. Create a new import task:
    • Name it (e.g., "CS.AI Papers")
    • Optional: Filter by arXiv category (e.g., cs.AI)
    • Click "Create Task"
  3. Task runs every 5 minutes, importing new matching papers
  4. Monitor progress and the imported-paper count on the dashboard (a polling sketch follows)
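The actual worker lives in continuous_import.py; a minimal asyncio polling loop in its spirit, with the ingestion step stubbed out, might look like this:

import asyncio

POLL_INTERVAL_SECONDS = 5 * 60  # the task runs every 5 minutes

async def import_new_papers(category: str | None) -> int:
    """Stand-in for the real arXiv query-and-ingest step."""
    return 0

async def run_import_task(category: str | None = None) -> None:
    # Poll indefinitely; each pass ingests papers matching the category filter
    while True:
        count = await import_new_papers(category)
        print(f"imported {count} papers ({category or 'all categories'})")
        await asyncio.sleep(POLL_INTERVAL_SECONDS)

# asyncio.run(run_import_task("cs.AI")) would start the loop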

Semantic Search

  1. Use the search bar in Papers view
  2. Enter a natural-language query (e.g., "transformers for image processing")
  3. Results are ranked by semantic similarity (see the sketch below)
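Under the hood the ranking is embedding-based; a self-contained sketch with sentence-transformers (the fallback embedder; the model name here is the library's common default, not necessarily what PaperTrail configures):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

abstracts = [
    "Vision transformers for large-scale image classification.",
    "A survey of reinforcement learning for robotic control.",
]
query = "transformers for image processing"

# normalize_embeddings=True makes the dot product equal cosine similarity
doc_vecs = model.encode(abstracts, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ query_vec
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {abstracts[idx]}")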

Theory Mode

  1. Navigate to Theory Mode
  2. Enter your hypothesis (e.g., "Attention mechanisms improve model interpretability")
  3. Click "Analyze Hypothesis"
  4. View pro/contra arguments (a stance-classification sketch follows this list), each with:
    • Relevance scores
    • AI-generated summaries
    • Key quotes from papers
  5. Export analysis as text file
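A hedged sketch of how one paper could be labeled pro or contra via litellm; the prompt wording and helper name are illustrative, not PaperTrail's actual prompt (see search.py for the real theory-mode logic):

import os

import litellm

def classify_stance(hypothesis: str, abstract: str) -> str:
    """Hypothetical helper: label one abstract as pro or contra."""
    response = litellm.completion(
        model=os.environ["DEFAULT_MODEL"],
        messages=[{
            "role": "user",
            "content": (
                f"Hypothesis: {hypothesis}\n\nAbstract: {abstract}\n\n"
                "Does this abstract support (pro) or contradict (contra) "
                "the hypothesis? Answer with exactly one word: pro or contra."
            ),
        }],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_stance(
    "Attention mechanisms improve model interpretability",
    "We find that attention weights often fail to explain model decisions.",
))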

Paper Details

Click any paper to view:

  • Full metadata and abstract
  • AI-generated summary and key contributions
  • Similar papers (by embeddings)
  • Related papers (by graph relationships)
  • Personal notes
  • Export options (BibTeX, PDF link); a BibTeX sketch follows
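For illustration, a sketch of what a BibTeX export for an arXiv paper could look like; the field layout follows common arXiv convention rather than PaperTrail's verified output:

def to_bibtex(arxiv_id: str, title: str, authors: list[str], year: int) -> str:
    """Hypothetical helper: render paper metadata as a BibTeX entry."""
    key = f"arxiv{arxiv_id.replace('.', '')}"
    lines = [
        f"@misc{{{key},",
        f"  title = {{{title}}},",
        f"  author = {{{' and '.join(authors)}}},",
        f"  year = {{{year}}},",
        f"  eprint = {{{arxiv_id}}},",
        "  archivePrefix = {arXiv},",
        f"  url = {{https://arxiv.org/abs/{arxiv_id}}}",
        "}",
    ]
    return "\n".join(lines)

print(to_bibtex("2103.12345", "An Example Paper", ["Ada Lovelace"], 2021))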

Fallback Strategy

PaperTrail is designed to work even when services are unavailable:

Embedding Fallback

  • Primary: litellm with configured embedding model
  • Fallback: sentence-transformers (automatic, always works)
  • Papers can always be ingested and searched; the fallback pattern is sketched below
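A minimal sketch of this primary/fallback pattern; function and variable names are illustrative, and the real code lives in embeddings.py:

import os

import litellm
from sentence_transformers import SentenceTransformer

_fallback = None  # local model, loaded lazily on first use

def embed(texts: list[str]) -> list[list[float]]:
    """Try the configured litellm model first, else sentence-transformers."""
    global _fallback
    model = os.environ.get("DEFAULT_EMBEDDING_MODEL")
    if model:
        try:
            response = litellm.embedding(model=model, input=texts)
            return [item["embedding"] for item in response.data]
        except Exception:
            pass  # fall through to the local model
    if _fallback is None:
        _fallback = SentenceTransformer("all-MiniLM-L6-v2")
    return _fallback.encode(texts).tolist()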

LLM Graceful Degradation

  • When Available: Full AI summaries, analysis, theory mode
  • When Unavailable:
    • Papers stored with <summary> placeholders
    • Added to backfill queue
    • Automatically processed when LLM becomes available
    • Theory mode disabled with clear message

Background Backfill Worker

  • Monitors backfill queue continuously
  • Processes papers with placeholders when LLM becomes available
  • No manual intervention needed (see the worker sketch below)
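A minimal sketch of such a worker loop, with hypothetical queue and helper names (the real implementation is backfill.py):

import asyncio

backfill_queue: list[str] = []  # paper IDs whose summaries are placeholders

def llm_available() -> bool:
    """Stand-in for a real LLM health check."""
    return True

async def summarize(paper_id: str) -> None:
    print(f"replacing placeholder summary for {paper_id}")

async def backfill_worker(poll_seconds: int = 30) -> None:
    # Drain the queue whenever the LLM is reachable, then sleep and re-check
    while True:
        while backfill_queue and llm_available():
            await summarize(backfill_queue.pop(0))
        await asyncio.sleep(poll_seconds)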

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            React Frontend (Vite)            β”‚
β”‚  Dashboard β”‚ Papers β”‚ Detail β”‚ Theory       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚ REST API + WebSocket
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          FastAPI Backend (Python)           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Services Layer                        β”‚ β”‚
β”‚  β”‚  β€’ Ingestion  β€’ Search  β€’ Graph        β”‚ β”‚
β”‚  β”‚  β€’ LLM        β€’ Embeddings β€’ Backfill  β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚  Data Layer (TinyDB)                   β”‚ β”‚
β”‚  β”‚  β€’ Papers  β€’ Embeddings  β€’ Graph       β”‚ β”‚
β”‚  β”‚  β€’ Tasks   β€’ Backfill Queue            β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tech Stack:

  • Backend: FastAPI, litellm, sentence-transformers, TinyDB, NetworkX
  • Frontend: React, Vite, Recharts, Axios
  • Data Sources: arXiv API, PyPDF2 for text extraction
  • Real-time: WebSocket for progress updates (a minimal endpoint sketch follows)
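To make the WebSocket entry concrete, here is a minimal FastAPI progress endpoint; the path and message shape are illustrative, not PaperTrail's actual protocol:

import asyncio

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/progress")
async def progress(websocket: WebSocket) -> None:
    await websocket.accept()
    # Push one message per ingestion stage; the client updates its UI on each
    for stage in ("downloading", "extracting", "embedding", "done"):
        await websocket.send_json({"stage": stage})
        await asyncio.sleep(1)  # simulated work
    await websocket.close()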

Testing

# Run core functionality tests
uv run pytest tests/test_core.py -v

# Test with LLM disabled (validates fallback)
# Remove DEFAULT_MODEL from .env temporarily
uv run pytest tests/test_core.py -v
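As a reference point, a core test might take roughly this shape; the root-route assertion is a hypothetical example, and tests/test_core.py holds the real checks:

from fastapi.testclient import TestClient

from researcher.main import app

def test_root_serves_frontend() -> None:
    client = TestClient(app)
    response = client.get("/")  # hypothetical: backend serves the built frontend here
    assert response.status_code == 200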

Project Structure

researcher/
β”œβ”€β”€ src/researcher/
β”‚   β”œβ”€β”€ main.py              # FastAPI application
β”‚   β”œβ”€β”€ models.py            # Pydantic models
β”‚   β”œβ”€β”€ config.py            # Configuration
β”‚   β”œβ”€β”€ database.py          # TinyDB wrapper
β”‚   β”œβ”€β”€ embeddings.py        # Embedding service (with fallback)
β”‚   β”œβ”€β”€ llm.py               # LLM service (with graceful degradation)
β”‚   β”œβ”€β”€ ingestion.py         # arXiv ingestion
β”‚   β”œβ”€β”€ search.py            # Search and theory mode
β”‚   β”œβ”€β”€ graph.py             # GraphRAG service
β”‚   β”œβ”€β”€ continuous_import.py # Background import tasks
β”‚   β”œβ”€β”€ backfill.py          # Background backfill worker
β”‚   └── logger.py            # Logging setup
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ main.jsx         # React entry point
β”‚   β”‚   β”œβ”€β”€ App.jsx          # Main app component
β”‚   β”‚   β”œβ”€β”€ api.js           # API client
β”‚   β”‚   β”œβ”€β”€ websocket.js     # WebSocket hook
β”‚   β”‚   └── views/           # View components
β”‚   β”‚       β”œβ”€β”€ DashboardView.jsx
β”‚   β”‚       β”œβ”€β”€ PaperListView.jsx
β”‚   β”‚       β”œβ”€β”€ PaperDetailView.jsx
β”‚   β”‚       └── TheoryModeView.jsx
β”‚   β”œβ”€β”€ package.json
β”‚   └── vite.config.js
β”œβ”€β”€ tests/
β”‚   └── test_core.py         # Core functionality tests
β”œβ”€β”€ data/                    # Created at runtime (TinyDB storage)
β”œβ”€β”€ pyproject.toml
└── README.md

Comprehensive Logging

Both backend and frontend include extensive logging:

Backend (Terminal):

  • Service initialization and availability
  • Paper ingestion progress
  • Search operations and results
  • Fallback activations
  • Background worker activities
  • All errors with stack traces

Frontend (Browser Console):

  • Component lifecycle
  • API calls and responses
  • WebSocket messages
  • User actions
  • State changes
  • Service availability status

Check browser DevTools console and terminal output for detailed operation logs.
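Backend logging presumably goes through logger.py; a minimal standard-library setup in that spirit:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("researcher")
logger.info("embedding service initialized (fallback=%s)", True)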

Troubleshooting

LLM Not Working

  • Check .env has correct DEFAULT_MODEL and API keys
  • Papers will still be ingested with placeholders
  • Backfill worker will process them when LLM becomes available

Embeddings Not Working

  • Should automatically fall back to sentence-transformers
  • Check terminal logs for "Fallback embedding model loaded"
  • All functionality still works; it just uses a different embedding model

Papers Not Appearing

  • Check terminal for ingestion errors
  • Verify arXiv ID is valid
  • Check WebSocket connection status in UI

Continuous Import Not Running

  • Ensure task is active (green status)
  • Check task interval and filters
  • Monitor backend logs for import activity

Development

# Run backend with auto-reload
uv run uvicorn researcher.main:app --reload

# Run frontend with hot-reload
cd frontend && npm run dev

# Build frontend for production
cd frontend && npm run build

License

MIT
