Notion export files: Dockerfile.render, docker/render-start.sh

Add the following two variables to .env:
- NOTION_API_TOKEN= (issue an API token at https://www.notion.so/profile/integrations, then grant the integration edit access to the target page)
- NOTION_PARENT_PAGE_ID= (the trailing part of the URL of the Notion page where results should be saved)
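Since NOTION_PARENT_PAGE_ID is "the trailing part of the page URL", it can be pulled out programmatically. A minimal sketch, assuming the standard Notion URL shape that ends in a 32-character hex ID (the helper name is illustrative, not part of this project):

```python
# Sketch: extract the Notion page ID from a page URL so it can be
# placed in NOTION_PARENT_PAGE_ID. Assumes the URL ends with the
# standard 32-character hex page ID.
import re

def notion_page_id(url: str) -> str:
    match = re.search(r"([0-9a-f]{32})$", url.rstrip("/"))
    if not match:
        raise ValueError(f"no page ID found in {url!r}")
    return match.group(1)

print(notion_page_id("https://www.notion.so/My-Page-0123456789abcdef0123456789abcdef"))
```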

A Docker-based research assistant with MCP (Model Context Protocol) architecture and paper graph visualization.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Web UI (React) β
β ββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ β
β β GlobalGraphPage (B) β β PaperGraphPage (A) β β
β β - All papers overview β β - Reference exploration β β
β β - Embedding similarity β β - Incremental expansion β β
β ββββββββββββββ¬ββββββββββββββ βββββββββββββββββ¬βββββββββββββββββββ β
β ββββββββββββββββββ¬ββββββββββββββββ β
β β HTTP :3000 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββ
β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββββ
β Agent Layer β
β βββββββββββ ββββββββββββ ββββββββββββ βββββββββββββββ β
β β client ββββ planner ββββ executor ββββ memory β β
β β (entry) β β β β β β β β
β ββββββ¬βββββ ββββββββββββ βββββββ¬βββββ βββββββββββββββ β
β β Planning β State β
β β (no side-effects) β Storage β
βββββββββΌββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββ
β β
β HTTP API :8000 β
βΌ βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MCP Layer β
β βββββββββββ ββββββββββββ βββββββββββββββββββββββββββββββββββββββ β
β β server ββββ registry ββββ tools/ β β
β β (entry) β β (SSOT) β β pdf | arxiv | web_search | graph β β
β βββββββββββ ββββββββββββ βββββββββββββββββββββββββββββββββββββββ β
β Side-effects allowed β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Separation of Concerns
- Agent Layer: thinking, planning, decision-making (NO side-effects)
- MCP Layer: tool execution, file I/O, API calls (side-effects allowed)
- Web UI: visualization and interaction

Single Source of Truth (SSOT)
- `registry.py` is the ONLY place where tools are discovered and collected
- Tools are auto-discovered from the `mcp/tools/` directory
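The SSOT registry pattern can be sketched as a single module-level dict plus a decorator that tool modules use to self-register; the names `TOOLS` and `register` below are illustrative assumptions, not the actual `registry.py` API:

```python
# Illustrative sketch of a single-source-of-truth tool registry.
# The real registry.py may differ; all names here are assumptions.
from typing import Callable, Dict

TOOLS: Dict[str, Callable] = {}  # the ONE place tools are collected

def register(name: str):
    """Decorator used by modules under mcp/tools/ to self-register."""
    def wrap(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return wrap

@register("list_pdfs")
def list_pdfs() -> list:
    return ["paper1.pdf"]  # placeholder implementation

# Auto-discovery: importing every module under mcp/tools/ fires the
# decorators, e.g.:
#   for m in pkgutil.iter_modules(tools_pkg.__path__):
#       importlib.import_module(f"mcp.tools.{m.name}")
```

The decorator runs at import time, so discovering tools is just importing the `mcp/tools/` package; no other module ever needs to enumerate tools itself.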
Single Entry Points
- `mcp/server.py` - MCP server entry point
- `agent/client.py` - Agent entry point
- Other modules (planner, executor, memory) are NEVER executed directly
Research_agent/
ββ mcp/ # MCP Layer (side-effects allowed)
β ββ __init__.py
β ββ server.py # MCP server entrypoint
β ββ registry.py # Tool auto-discovery (SSOT)
β ββ base.py # Tool base classes
β ββ tools/
β ββ pdf.py # PDF extraction tools
β ββ refer.py # Reference extraction tools
β ββ arxiv.py # arXiv API tools
β ββ web_search.py # Web search tools (Tavily)
β ββ paper_graph.py # Paper graph tools (Graph A/B)
β
ββ agent/ # Agent Layer (no side-effects)
β ββ __init__.py
β ββ client.py # Agent entrypoint (orchestration)
β ββ planner.py # Goal β plan creation
β ββ executor.py # Plan β MCP calls
β ββ memory.py # State storage
β
ββ web/ # Web UI (React/TypeScript)
β ββ api/mcp.ts # MCP REST client
β ββ components/
β β ββ GraphCanvas.tsx # D3.js force-directed graph
β β ββ SidePanel.tsx # Paper details panel
β β ββ PaperCard.tsx # Paper metadata card
β ββ pages/
β β ββ GlobalGraphPage.tsx # Graph B - All papers overview
β β ββ PaperGraphPage.tsx # Graph A - Reference exploration
β ββ App.tsx # Router setup
β ββ main.tsx # Entry point
β ββ package.json
β
ββ requirements/
β ββ base.txt # Common dependencies
β ββ mcp.txt # MCP server dependencies
β ββ agent.txt # Agent dependencies
β
ββ docker/
β ββ Dockerfile.mcp # MCP server image
β ββ Dockerfile.agent # Agent image
β ββ Dockerfile.web # Web UI image
β
ββ scripts/
β ββ run-mcp.sh # Start MCP server
β ββ process-all.sh # Process all PDFs
β
ββ pdf/ # Input PDF files
ββ output/ # Extracted content
β ββ images/ # Extracted images
β ββ text/ # Extracted text
β ββ graph/ # Graph cache
β ββ global_graph.json # Global graph (Graph B)
β ββ paper/ # Per-paper graphs (Graph A)
β ββ <paper_id>.json
β
ββ .env.example # Environment template
ββ docker-compose.yml # Service orchestration
ββ README.md
Graph A - Paper Reference Graph (PaperGraphPage)
- Purpose: Explore references of a specific paper
- Behavior: On-demand, incremental expansion
- Usage: Double-click nodes to expand their references
- Center: Selected paper is fixed at center

Graph B - Global Paper Graph (GlobalGraphPage)
- Purpose: Visualize relationships across all papers
- Behavior: Batch processing with embedding-based similarity
- Clustering: Louvain community detection
- Similarity: SentenceTransformer embeddings
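Graph B's edge construction can be sketched as thresholded cosine similarity over paper embeddings. This is a pure-Python stand-in; the real pipeline uses SentenceTransformer vectors, and the function names here are illustrative:

```python
# Sketch of Graph B edge building: connect papers whose embedding
# cosine similarity meets a threshold. Real embeddings come from
# SentenceTransformer; tiny hand-made vectors are used here.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_edges(embeddings, threshold=0.7):
    """embeddings: {paper_id: vector} -> list of (id_a, id_b, similarity)."""
    ids = sorted(embeddings)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                edges.append((a, b, round(sim, 3)))
    return edges

emb = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0]}
print(build_edges(emb, threshold=0.7))
```

The resulting edge list is what a community-detection pass (Louvain, per the list above) would then cluster.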
# Clone the repository
git clone <repo-url>
cd Research_agent
# Create .env file
cp .env.example .env
# Edit .env and add your API keys:
# - OPENAI_API_KEY (required)
# - TAVILY_API_KEY (optional, for web search)

# Build images
docker compose build

# Copy your PDFs into the input directory
cp /path/to/your/papers/*.pdf ./pdf/

Start All Services:
# Start MCP server and Web UI
docker compose up -d mcp-server web
# Open Web UI
open http://localhost:3000

Interactive Agent:
# Run Agent interactively
docker compose run --rm agent

Single Command:
docker compose run --rm agent "List all PDFs and extract text from each"

Process All PDFs:
./scripts/process-all.sh

PDF Tools
| Tool | Description |
|---|---|
| list_pdfs | List all PDF files in the directory |
| extract_text | Extract text from a PDF |
| extract_images | Extract images from a PDF |
| extract_all | Extract text and images |
| process_all_pdfs | Process all PDFs |
| get_pdf_info | Get PDF metadata |
| read_extracted_text | Read previously extracted text |
| check_github_link | Find GitHub repository URLs in extracted text |
arXiv Tools
| Tool | Description |
|---|---|
| arxiv_search | Search arXiv for papers |
| arxiv_get_paper | Get paper details by ID |
| arxiv_download | Download paper PDF |
Web Search Tools
| Tool | Description |
|---|---|
| web_search | Search the web (Tavily) |
| web_get_content | Fetch URL content |
| web_research | In-depth topic research |
Recommendation Tools
| Tool | Description |
|---|---|
| update_user_profile | Update interests/keywords and toggle exclude_local_papers (writes to OUTPUT_DIR/users/profile.json) |
| apply_hard_filters | Apply ALREADY_READ, blacklist keywords, and year filters |
| calculate_semantic_scores | Compute hybrid semantic relevance scores (embeddings + optional LLM for borderline cases) |
| evaluate_paper_metrics | Compute dimension scores (keywords/authors/institutions/recency/practicality) and soft penalties |
| rank_and_select_top_k | Combine scores, compute final ranking, and optionally add a contrastive paper |
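The final ranking step (rank_and_select_top_k) can be sketched as a weighted sum of the dimension scores minus soft penalties, then a top-k cut. The weights, field names, and scores below are assumptions for illustration, not the project's actual configuration:

```python
# Illustrative sketch of rank_and_select_top_k-style scoring:
# combine per-dimension scores with weights, subtract soft
# penalties, and keep the top k papers. All values are made up.

def final_score(paper, weights):
    base = sum(weights[d] * paper["scores"].get(d, 0.0) for d in weights)
    return base - paper.get("penalty", 0.0)

def rank_top_k(papers, weights, k=2):
    ranked = sorted(papers, key=lambda p: final_score(p, weights), reverse=True)
    return [p["id"] for p in ranked[:k]]

weights = {"semantic": 0.5, "keywords": 0.3, "recency": 0.2}
papers = [
    {"id": "a", "scores": {"semantic": 0.9, "keywords": 0.8, "recency": 0.5}},
    {"id": "b", "scores": {"semantic": 0.4, "keywords": 0.9, "recency": 0.9}, "penalty": 0.1},
    {"id": "c", "scores": {"semantic": 0.7, "keywords": 0.2, "recency": 0.8}},
]
print(rank_top_k(papers, weights))
```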
Graph Tools
| Tool | Description |
|---|---|
| has_pdf | Check if PDF exists for a paper ID |
| fetch_paper_if_missing | Download from arXiv if not present |
| extract_references | Extract references from a PDF |
| get_references | Get cached references for a paper |
| build_reference_subgraph | Build Graph A (paper-centered) |
| build_global_graph | Build Graph B (all papers) |
The MCP server exposes a REST API at http://localhost:8000:
# List all tools
curl http://localhost:8000/tools
# Get tools in OpenAI format
curl http://localhost:8000/tools/schema
# Execute a tool
curl -X POST http://localhost:8000/tools/list_pdfs/execute \
-H "Content-Type: application/json" \
-d '{"arguments": {}}'
# Graph endpoints
curl http://localhost:8000/tools/build_global_graph/execute \
-H "Content-Type: application/json" \
-d '{"arguments": {"similarity_threshold": 0.7}}'
# Convenience endpoints
curl http://localhost:8000/pdf/list
curl "http://localhost:8000/arxiv/search?query=transformer"

Access the web interface at http://localhost:3000:
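The curl calls above translate directly into a small Python client. This is a sketch assuming only the `/tools/<name>/execute` shape shown in the examples; it is not the project's official client, and error handling is elided:

```python
# Sketch of a client for the MCP REST API's execute endpoint,
# mirroring the curl examples above.
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_request(tool: str, arguments: dict) -> urllib.request.Request:
    """Build the POST request for /tools/<tool>/execute."""
    return urllib.request.Request(
        f"{BASE_URL}/tools/{tool}/execute",
        data=json.dumps({"arguments": arguments}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def execute_tool(tool: str, arguments: dict) -> dict:
    with urllib.request.urlopen(build_request(tool, arguments)) as resp:
        return json.load(resp)

# e.g. execute_tool("build_global_graph", {"similarity_threshold": 0.7})
```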
| Page | URL | Description |
|---|---|---|
| Global Graph | / | Overview of all papers (Graph B) |
| Paper Graph | /paper/:id | Reference exploration for a paper (Graph A) |
- Click: Select a node to view details
- Double-click: Expand references (Graph A only)
- Drag: Reposition nodes
- Controls: Adjust similarity threshold, rebuild graph
You: What PDFs do I have?
[Calling list_pdfs...]
Assistant: You have 3 PDF files: paper1.pdf, paper2.pdf, paper3.pdf
You: Search arXiv for papers about attention mechanisms
[Calling arxiv_search...]
Assistant: Found 10 papers about attention mechanisms...
You: Download the first one and extract its content
[Calling arxiv_download...]
[Calling extract_all...]
Assistant: Downloaded and extracted the paper. Here's a summary...
You: Build a reference graph for paper 2106.09685
[Calling build_reference_subgraph...]
Assistant: Built reference graph with 15 papers and 23 edges.
- Docker & Docker Compose
- OpenAI API key
- (Optional) Tavily API key for web search
MIT