
Deleting these two files reverts the project to its original state:

Dockerfile.render, docker/render-start.sh

Notion Save Usage

Add the following two entries to .env:

  • NOTION_API_TOKEN= (generate an integration token at https://www.notion.so/profile/integrations)
  • NOTION_PARENT_PAGE_ID= (the trailing part of the URL of the page you want to save to)

Then grant the integration edit access to that page so it can write to it.

Research Agent

A Docker-based research assistant with an MCP (Model Context Protocol) architecture and paper graph visualization.

Reference Graph

[screenshot: reference graph]

NeurIPS Graph

[screenshot: NeurIPS graph]

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                           Web UI (React)                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚   GlobalGraphPage (B)    β”‚  β”‚      PaperGraphPage (A)          β”‚ β”‚
β”‚  β”‚   - All papers overview  β”‚  β”‚   - Reference exploration        β”‚ β”‚
β”‚  β”‚   - Embedding similarity β”‚  β”‚   - Incremental expansion        β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚               β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                    β”‚
β”‚                                β”‚ HTTP :3000                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        Agent Layer                                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”‚
β”‚  β”‚ client  │──│ planner  │──│ executor │──│   memory    β”‚          β”‚
β”‚  β”‚ (entry) β”‚  β”‚          β”‚  β”‚          β”‚  β”‚             β”‚          β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
β”‚       β”‚         Planning          β”‚         State                   β”‚
β”‚       β”‚         (no side-effects) β”‚         Storage                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚                           β”‚
        β”‚      HTTP API :8000       β”‚
        β–Ό                           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          MCP Layer                                   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ server  │──│ registry │──│              tools/                 β”‚ β”‚
β”‚  β”‚ (entry) β”‚  β”‚ (SSOT)   β”‚  β”‚  pdf | arxiv | web_search | graph   β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                               Side-effects allowed                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Principles

  1. Separation of Concerns

    • Agent Layer: Thinking, planning, decision-making (NO side-effects)
    • MCP Layer: Tool execution, file I/O, API calls (side-effects allowed)
    • Web UI: Visualization and interaction
  2. Single Source of Truth (SSOT)

    • registry.py is the ONLY place where tools are discovered and collected
    • Tools are auto-discovered from the mcp/tools/ directory (see the sketch after this list)
  3. Single Entry Points

    • mcp/server.py - MCP server entry point
    • agent/client.py - Agent entry point
    • Other modules (planner, executor, memory) are NEVER executed directly
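
To make the SSOT idea concrete, auto-discovery of this kind usually amounts to a few lines of importlib/pkgutil. The sketch below is an assumption about the shape of registry.py, not its actual code:

```python
# Hypothetical sketch of tool auto-discovery (not the actual registry.py).
import importlib
import pkgutil

def discover_tools(package_name: str = "mcp.tools") -> dict:
    """Import every module under mcp/tools/ and collect anything that
    looks like a tool (here: has a `name` and a callable `execute`)."""
    tools = {}
    package = importlib.import_module(package_name)
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        for obj in vars(module).values():
            if hasattr(obj, "name") and callable(getattr(obj, "execute", None)):
                tools[obj.name] = obj
    return tools
```

Because discovery is centralized here, adding a new tool is just dropping a file into mcp/tools/; no other module needs to change.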

Project Structure

Research_agent/
β”œβ”€ mcp/                          # MCP Layer (side-effects allowed)
β”‚  β”œβ”€ __init__.py
β”‚  β”œβ”€ server.py                  # MCP server entrypoint
β”‚  β”œβ”€ registry.py                # Tool auto-discovery (SSOT)
β”‚  β”œβ”€ base.py                    # Tool base classes
β”‚  └─ tools/
β”‚     β”œβ”€ pdf.py                  # PDF extraction tools
β”‚     β”œβ”€ refer.py                # Reference extraction tools
β”‚     β”œβ”€ arxiv.py                # arXiv API tools
β”‚     β”œβ”€ web_search.py           # Web search tools (Tavily)
β”‚     └─ paper_graph.py          # Paper graph tools (Graph A/B)
β”‚
β”œβ”€ agent/                        # Agent Layer (no side-effects)
β”‚  β”œβ”€ __init__.py
β”‚  β”œβ”€ client.py                  # Agent entrypoint (orchestration)
β”‚  β”œβ”€ planner.py                 # Goal β†’ plan creation
β”‚  β”œβ”€ executor.py                # Plan β†’ MCP calls
β”‚  └─ memory.py                  # State storage
β”‚
β”œβ”€ web/                          # Web UI (React/TypeScript)
β”‚  β”œβ”€ api/mcp.ts                 # MCP REST client
β”‚  β”œβ”€ components/
β”‚  β”‚  β”œβ”€ GraphCanvas.tsx         # D3.js force-directed graph
β”‚  β”‚  β”œβ”€ SidePanel.tsx           # Paper details panel
β”‚  β”‚  └─ PaperCard.tsx           # Paper metadata card
β”‚  β”œβ”€ pages/
β”‚  β”‚  β”œβ”€ GlobalGraphPage.tsx     # Graph B - All papers overview
β”‚  β”‚  └─ PaperGraphPage.tsx      # Graph A - Reference exploration
β”‚  β”œβ”€ App.tsx                    # Router setup
β”‚  β”œβ”€ main.tsx                   # Entry point
β”‚  └─ package.json
β”‚
β”œβ”€ requirements/
β”‚  β”œβ”€ base.txt                   # Common dependencies
β”‚  β”œβ”€ mcp.txt                    # MCP server dependencies
β”‚  └─ agent.txt                  # Agent dependencies
β”‚
β”œβ”€ docker/
β”‚  β”œβ”€ Dockerfile.mcp             # MCP server image
β”‚  β”œβ”€ Dockerfile.agent           # Agent image
β”‚  └─ Dockerfile.web             # Web UI image
β”‚
β”œβ”€ scripts/
β”‚  β”œβ”€ run-mcp.sh                 # Start MCP server
β”‚  └─ process-all.sh             # Process all PDFs
β”‚
β”œβ”€ pdf/                          # Input PDF files
β”œβ”€ output/                       # Extracted content
β”‚  β”œβ”€ images/                    # Extracted images
β”‚  β”œβ”€ text/                      # Extracted text
β”‚  └─ graph/                     # Graph cache
β”‚     β”œβ”€ global_graph.json       # Global graph (Graph B)
β”‚     └─ paper/                  # Per-paper graphs (Graph A)
β”‚        └─ <paper_id>.json
β”‚
β”œβ”€ .env.example                  # Environment template
β”œβ”€ docker-compose.yml            # Service orchestration
└─ README.md

Paper Graph System

Graph A: Paper Mode (Reference Exploration)

  • Purpose: Explore references of a specific paper
  • Behavior: On-demand, incremental expansion (sketched below)
  • Usage: Double-click nodes to expand their references
  • Center: Selected paper is fixed at center
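
For intuition, the on-demand expansion can be sketched in a few lines. The get_references callable below stands in for the cached reference lookup and is an assumption, not the actual code:

```python
# Minimal sketch of Graph A's double-click expansion (not the actual code).
import networkx as nx

def expand_node(graph: nx.DiGraph, paper_id: str, get_references) -> None:
    """Add one paper's references to the subgraph on demand."""
    for ref_id in get_references(paper_id):  # e.g. backed by the get_references tool
        graph.add_edge(paper_id, ref_id)     # edge points from citing paper to reference
```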

Graph B: Global Mode (All Papers Overview)

  • Purpose: Visualize relationships across all papers
  • Behavior: Batch processing with embedding-based similarity (sketched below)
  • Clustering: Louvain community detection
  • Similarity: SentenceTransformer embeddings
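
To make the pipeline concrete, here is a minimal sketch of how such a graph could be built, assuming abstracts as input. The model name, the default threshold, and the use of networkx for Louvain clustering are illustrative assumptions, not the repository's actual implementation:

```python
# Illustrative Graph B construction: embed abstracts, connect papers whose
# cosine similarity clears a threshold, then cluster with Louvain.
# Model name and threshold default are assumptions, not the repo's values.
import networkx as nx
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

def build_global_graph(papers: dict[str, str], threshold: float = 0.7) -> nx.Graph:
    """papers maps paper_id -> abstract text."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    ids = list(papers)
    embeddings = model.encode([papers[i] for i in ids])
    sims = cos_sim(embeddings, embeddings)  # pairwise cosine similarity matrix

    graph = nx.Graph()
    graph.add_nodes_from(ids)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            if float(sims[a][b]) >= threshold:
                graph.add_edge(ids[a], ids[b], weight=float(sims[a][b]))

    # Louvain community detection, stored as a node attribute
    for cid, nodes in enumerate(nx.community.louvain_communities(graph)):
        for node in nodes:
            graph.nodes[node]["community"] = cid
    return graph
```

Raising the similarity threshold (exposed in the UI controls and the build_global_graph tool) prunes weak edges and yields tighter clusters.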

Quick Start

1. Setup Environment

# Clone the repository
git clone <repo-url>
cd Research_agent

# Create .env file
cp .env.example .env
# Edit .env and add your API keys:
# - OPENAI_API_KEY (required)
# - TAVILY_API_KEY (optional, for web search)

2. Build Docker Images

docker compose build

3. Add PDF Files

cp /path/to/your/papers/*.pdf ./pdf/

4. Run

Start All Services:

# Start MCP server and Web UI
docker compose up -d mcp-server web

# Open Web UI
open http://localhost:3000

Interactive Agent:

# Run Agent interactively
docker compose run --rm agent

Single Command:

docker compose run --rm agent "List all PDFs and extract text from each"

Process All PDFs:

./scripts/process-all.sh

Available Tools

PDF Tools

| Tool | Description |
| --- | --- |
| list_pdfs | List all PDF files in the directory |
| extract_text | Extract text from a PDF |
| extract_images | Extract images from a PDF |
| extract_all | Extract text and images |
| process_all_pdfs | Process all PDFs |
| get_pdf_info | Get PDF metadata |
| read_extracted_text | Read previously extracted text |
| check_github_link | Find GitHub repository URLs in extracted text |

arXiv Tools

| Tool | Description |
| --- | --- |
| arxiv_search | Search arXiv for papers |
| arxiv_get_paper | Get paper details by ID |
| arxiv_download | Download paper PDF |

Web Search Tools

| Tool | Description |
| --- | --- |
| web_search | Search the web (Tavily) |
| web_get_content | Fetch URL content |
| web_research | In-depth topic research |

Ranking Tools

| Tool | Description |
| --- | --- |
| update_user_profile | Update interests/keywords and toggle exclude_local_papers (writes to OUTPUT_DIR/users/profile.json) |
| apply_hard_filters | Apply ALREADY_READ, blacklist keywords, and year filters |
| calculate_semantic_scores | Compute hybrid semantic relevance scores (embeddings + optional LLM for borderline cases) |
| evaluate_paper_metrics | Compute dimension scores (keywords/authors/institutions/recency/practicality) and soft penalties |
| rank_and_select_top_k | Combine scores, compute the final ranking, and optionally add a contrastive paper |
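
For intuition only: the final step presumably reduces to a weighted combination of the dimension scores. The weights and field names in this toy sketch are invented for illustration and are not the actual rank_and_select_top_k logic:

```python
# Toy score combination; weights and keys are assumptions, not the real logic.
def final_score(paper: dict, weights: dict[str, float] | None = None) -> float:
    weights = weights or {"semantic": 0.5, "keywords": 0.2,
                          "recency": 0.2, "practicality": 0.1}
    return sum(w * paper.get(dim, 0.0) for dim, w in weights.items())

def select_top_k(papers: list[dict], k: int = 10) -> list[dict]:
    return sorted(papers, key=final_score, reverse=True)[:k]
```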

Paper Graph Tools

| Tool | Description |
| --- | --- |
| has_pdf | Check if a PDF exists for a paper ID |
| fetch_paper_if_missing | Download from arXiv if not present |
| extract_references | Extract references from a PDF |
| get_references | Get cached references for a paper |
| build_reference_subgraph | Build Graph A (paper-centered) |
| build_global_graph | Build Graph B (all papers) |

API Endpoints

The MCP server exposes a REST API at http://localhost:8000:

# List all tools
curl http://localhost:8000/tools

# Get tools in OpenAI format
curl http://localhost:8000/tools/schema

# Execute a tool
curl -X POST http://localhost:8000/tools/list_pdfs/execute \
  -H "Content-Type: application/json" \
  -d '{"arguments": {}}'

# Graph endpoints
curl http://localhost:8000/tools/build_global_graph/execute \
  -H "Content-Type: application/json" \
  -d '{"arguments": {"similarity_threshold": 0.7}}'

# Convenience endpoints
curl http://localhost:8000/pdf/list
curl "http://localhost:8000/arxiv/search?query=transformer"

Web UI

Access the web interface at http://localhost:3000:

| Page | URL | Description |
| --- | --- | --- |
| Global Graph | / | Overview of all papers (Graph B) |
| Paper Graph | /paper/:id | Reference exploration for a paper (Graph A) |

Graph Interactions

  • Click: Select a node to view details
  • Double-click: Expand references (Graph A only)
  • Drag: Reposition nodes
  • Controls: Adjust similarity threshold, rebuild graph

Example Usage

You: What PDFs do I have?
  [Calling list_pdfs...]
Assistant: You have 3 PDF files: paper1.pdf, paper2.pdf, paper3.pdf

You: Search arXiv for papers about attention mechanisms
  [Calling arxiv_search...]
Assistant: Found 10 papers about attention mechanisms...

You: Download the first one and extract its content
  [Calling arxiv_download...]
  [Calling extract_all...]
Assistant: Downloaded and extracted the paper. Here's a summary...

You: Build a reference graph for paper 2106.09685
  [Calling build_reference_subgraph...]
Assistant: Built reference graph with 15 papers and 23 edges.

Requirements

  • Docker & Docker Compose
  • OpenAI API key
  • (Optional) Tavily API key for web search

License

MIT
