## LLM Server Setup

Use the unified Python launcher to start local vLLM servers before running these tests:

- Start both servers, 1B on port 8001 and 8B on port 8000:

```bash
python scripts/start_vllm.py start-both
```

- Start only 1B:

```bash
python scripts/start_vllm.py start-1b
```

- Start only 8B:

```bash
python scripts/start_vllm.py start-8b
```

This launcher replaces the previous `scripts/start_llama_1b.sh` and `start_both_llm.sh` shell scripts.
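
Before running the cells below, it can help to confirm that both servers are accepting connections. The following is a minimal sketch and not part of `scripts/start_vllm.py`: the helper name `is_port_open` is hypothetical, and it only checks that something is listening on the port, not that the model has finished loading.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Ports taken from the launcher notes above: 8B on 8000, 1B on 8001.
for name, port in [("8B", 8000), ("1B", 8001)]:
    status = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} server on port {port}: {status}")
```

A port that reports "not reachable" usually means the corresponding `start-*` command has not been run yet or the server is still starting up.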


In [1]:
import autorootcwd

from src.agents.state import AgentState
from src.agents.graph import build_agent_graph
from src.agents.coordinator import CoordinatorAgent
from src.agents.rating import RatingAgent
from src.agents.reviewer import ReviewerAgent
from src.agents.arxiv import ArxivPaperRetriever
from src.agents.retriever import ReviewRetriever
from src.agents.parser import ParserAgent
from src.agents.llm import get_llm



## 1. Environment & LLM Setup
Check if the LLM is configured correctly.

In [2]:
llm = get_llm()
if llm:
    print(f"LLM Initialized: {llm}")
else:
    print("LLM not configured. Some agents may not work fully.")

[get_llm] Using vLLM with model meta-llama/Meta-Llama-3-8B-Instruct at http://localhost:8000/v1
LLM Initialized: client=<openai.resources.chat.completions.completions.Completions object at 0x7da39af7b8c0> async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7da1dcfb4440> root_client=<openai.OpenAI object at 0x7da39af79010> root_async_client=<openai.AsyncOpenAI object at 0x7da1dcfb41a0> model_name='meta-llama/Meta-Llama-3-8B-Instruct' temperature=0.3 model_kwargs={} openai_api_key=SecretStr('**********') openai_api_base='http://localhost:8000/v1'


## 2. Parser Agent Test
Test parsing a sample PDF (`raw_text` is left as `None`).

In [3]:
import importlib
import src.agents.parser as parser_mod
importlib.reload(parser_mod)
ParserAgent = parser_mod.ParserAgent

parser = ParserAgent()

pdf_path = "samples/2512.15716.pdf"

parsed = parser.parse(pdf_path=pdf_path, raw_text=None)
print("Parsed Sections:", parsed.keys())
print("Abstract:", parsed.get("abstract", "Not found")[:500])
if "tables_md" in parsed:
    print("Tables captured (first 500 chars):")
    print(parsed["tables_md"][:500])

ParserAgent: Using local LLM for PDF cleanup.
[get_llm] Using vLLM with model meta-llama/Llama-3.2-1B at http://localhost:8001/v1
Parsed Sections: dict_keys(['abstract', 'conclusion', 'full_text', 'markdown_path'])
Abstract: Existing video generation models struggle to maintain long-
term spatial and temporal consistency due to the dense,
high-dimensional nature of video signals.
To overcome
this limitation, we propose Spatia, a spatial memory–aware
video generation framework that explicitly preserves a 3D
scene point cloud as persistent spatial memory. Spatia it-
eratively generates video clips conditioned on this spatial
memory and continuously updates it through visual SLAM.
This dynamic–static disentanglement de
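
The abstract printed above keeps the hard line breaks and end-of-line hyphenation of the PDF layout. If flowing text is needed downstream, a post-processing step along these lines could be applied. This is a heuristic sketch, not something `ParserAgent` is known to do, and `unwrap_lines` is a hypothetical helper name.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into flowing text.

    Heuristic: a hyphen at a line end between two lowercase letters is
    treated as a word split across lines and removed. This wrongly merges
    genuine compounds that happen to break at the hyphen (e.g. "long-term").
    """
    # "it-\neratively" -> "iteratively"
    text = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # Remaining newlines (with any surrounding spaces) become single spaces.
    text = re.sub(r"\s*\n\s*", " ", text)
    return text.strip()


sample = "scene point cloud as persistent spatial memory. Spatia it-\neratively generates video clips"
print(unwrap_lines(sample))
```

Because the heuristic cannot tell a split word from a real hyphenated compound, it is safer to use it for display or search than to overwrite the parsed text.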


## 3. Retriever Agent Test (ChromaDB)
Test retrieving similar reviews from ChromaDB.

In [4]:
# Ensure you have the chromadb directory in the root or adjust path
db_path = "chromadb"

try:
    retriever = ReviewRetriever(db_path=db_path)
    query = "Video Generation with Updatable Spatial Memory"
    results = retriever.retrieve_similar_reviews(query, k=2)

    documents, metadatas, abstracts, paper_urls, pdf_urls, pdf_paths = results

    print(f"Retrieved {len(documents)} reviews.")
    if len(documents) > 0:
        print("First review preview:", documents[0][:100])
        print("Metadata:", metadatas[0])
except Exception as e:
    print(f"Retriever test failed (check DB path): {e}")

ReviewRetriever initialized with 12 collections: ['neurips_2025', 'neurips_2022', 'iclr_2022', 'neurips_2024', 'iclr_2023', 'iclr_2021', 'neurips_2023', 'iclr_2025', 'iclr_2024', 'icml_2025', 'tmlr', 'neurips_2021']
Retrieved 2 reviews.
First review preview: Review: summary: This work proposes using video generation models (e.g., Hunyuan Video) to achieve s
Metadata: {'note_id': '7vr2mRnyHF', 'forum_id': 'L1m5124sNQ', 'type': 'review', 'collection_name': 'neurips_2025', 'distance': 0.6797904372215271}
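
The retriever returns parallel lists, and each metadata entry carries a `distance` field, as shown in the output above. A small sketch of pairing reviews with their metadata and ordering them by distance follows. It assumes that a smaller distance means a closer match; `to_records` is a hypothetical helper, and the sample values are illustrative rather than taken from the database.

```python
from typing import Any, Dict, List


def to_records(documents: List[str], metadatas: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each review with its metadata and sort by ascending distance."""
    records = [{**meta, "text": doc} for doc, meta in zip(documents, metadatas)]
    # Entries without a distance sort last.
    return sorted(records, key=lambda r: r.get("distance", float("inf")))


# Illustrative values only; the shape mirrors the metadata printed above.
documents = ["Review A ...", "Review B ..."]
metadatas = [
    {"note_id": "a", "collection_name": "neurips_2025", "distance": 0.72},
    {"note_id": "b", "collection_name": "iclr_2024", "distance": 0.68},
]
for rec in to_records(documents, metadatas):
    print(f"{rec['distance']:.3f}  {rec['collection_name']}  {rec['text'][:40]}")
```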


## 4. Arxiv Agent Test
Test retrieving similar papers from Arxiv.

In [5]:
arxiv_agent = ArxivPaperRetriever(enable_cache=True, cache_path="../logs/arxiv_cache.json")

title = "Spatia: Video Generation with Updatable Spatial Memory"
abstract = "Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory–aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates it through visual SLAM. This dynamic–static disentanglement design enhances spatial consistency throughout the generation process while preserving the model’s ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation"
papers, snippets, query, cache_hit = arxiv_agent.retrieve_similar_papers(
    title=title,
    abstract=abstract,
    raw_text=""
)

print(f"Cache hit: {cache_hit}")
print(f"Found {len(papers)} papers.")
if papers:
    print("First paper title:", papers[0].get("title"))

Cache hit: True
Found 3 papers.
First paper title: Spatia: Video Generation with Updatable Spatial Memory


In [6]:
# 1. Run the same query again to verify cache hit
import os
import json
print("Re-running the same query...")
_, _, _, cache_hit_2 = arxiv_agent.retrieve_similar_papers(
    title=title,
    abstract=abstract,
    raw_text=""
)

print(f"Second run cache hit: {cache_hit_2}")

if cache_hit_2:
    print("✅ Cache verification PASSED: Result served from cache.")
else:
    print("❌ Cache verification FAILED: Result not served from cache.")

# 2. Check Cache File Structure

cache_file_path = "../logs/arxiv_cache.json"

if os.path.exists(cache_file_path):
    print(f"\nReading cache file: {cache_file_path}")
    try:
        with open(cache_file_path, 'r', encoding='utf-8') as f:
            cache_data = json.load(f)

        print(f"Total cached queries: {len(cache_data)}")

        # Show a sample key (query)
        if cache_data:
            sample_query = list(cache_data.keys())[0]
            print(f"Sample cached query: '{sample_query}'")
            print(f"Cached data keys: {list(cache_data[sample_query].keys())}")  # Observed in the run below: ['papers', 'saved_at']

    except Exception as e:
        print(f"Error reading cache file: {e}")
else:
    print(f"Cache file not found at {cache_file_path}")

Re-running the same query...
Second run cache hit: True
✅ Cache verification PASSED: Result served from cache.

Reading cache file: ../logs/arxiv_cache.json
Total cached queries: 3
Sample cached query: 'attention is all you need'
Cached data keys: ['papers', 'saved_at']


### ArXiv Caching Verification
The preceding cell re-runs the same query to confirm that the second request is served from the cache, then inspects the structure of the cache file.
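
To make the observed file layout concrete, here is a minimal sketch of a JSON cache keyed by query, with `papers` and `saved_at` fields as seen in the output above. It is not the `ArxivPaperRetriever` implementation: the function names are hypothetical, and the key normalization (lowercasing and collapsing whitespace) is an assumption suggested only by the lowercase sample key.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def normalize_query(query: str) -> str:
    """Assumed key normalization: lowercase and collapse whitespace."""
    return " ".join(query.lower().split())


def cache_get(path: Path, query: str):
    """Return cached papers for the query, or None on a miss."""
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    entry = data.get(normalize_query(query))
    return entry["papers"] if entry else None


def cache_put(path: Path, query: str, papers: list) -> None:
    """Store papers under the normalized query with a save timestamp."""
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    data[normalize_query(query)] = {
        "papers": papers,
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "arxiv_cache.json"
    print(cache_get(cache_file, "Attention Is All You Need"))   # miss: nothing stored yet
    cache_put(cache_file, "Attention Is All You Need", [{"title": "Attention Is All You Need"}])
    print(cache_get(cache_file, "attention is  all you need"))  # hit after normalization
```

Keying on a normalized query lets trivially different spellings share one entry, at the cost of treating case-sensitive queries as identical.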

## 5. Reviewer Agent Test
Test generating a review based on dummy inputs.

In [7]:
reviewer = ReviewerAgent(llm=llm)

paper_text = "This paper proposes a novel architecture for image classification..."
similar_reviews = ["This paper is good but lacks experiments.", "Novelty is limited."]
arxiv_refs = ["Related work A shows similar results."]

review = reviewer.generate_review(
    paper_text=paper_text,
    similar_reviews=similar_reviews,
    arxiv_references=arxiv_refs,
    paper_title="Test Paper"
)

print("Generated Review Preview:")
print(review[:200] + "...")

Generated Review Preview:
Here is the review:

**Summary**

The paper proposes a novel architecture for image classification, which is an interesting contribution to the field. However, the paper falls short in providing suffi...


## 6. Rating Agent Test
Test predicting a rating from a dummy review.

In [8]:
rating_agent = RatingAgent(llm=llm)

dummy_review = """
Summary: Good paper.
Strengths: Novel idea.
Weaknesses: Poor writing.
Rating: 7.5
Detailed Review: ...
"""

score, rationale = rating_agent.predict_rating(dummy_review)
print(f"Predicted Score: {score}")
print(f"Rationale: {rationale}")

Predicted Score: 6.0
Rationale: The paper's novel idea is a major strength, but poor writing and mediocre scores in other criteria bring down the overall score.
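
The dummy review states `Rating: 7.5`, while the predicted score above is 6.0. When a review already carries an explicit rating line, extracting it gives a simple reference point for the model's prediction. The sketch below is an illustration only: `extract_stated_rating` is a hypothetical helper and is not part of `RatingAgent`.

```python
import re
from typing import Optional


def extract_stated_rating(review: str) -> Optional[float]:
    """Return the number on the first line that starts with 'Rating:', if any."""
    match = re.search(
        r"^\s*Rating:\s*(\d+(?:\.\d+)?)",
        review,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    return float(match.group(1)) if match else None


dummy_review = """
Summary: Good paper.
Strengths: Novel idea.
Weaknesses: Poor writing.
Rating: 7.5
Detailed Review: ...
"""

stated = extract_stated_rating(dummy_review)
predicted = 6.0  # value printed by the run above
print(f"stated={stated}, predicted={predicted}, gap={abs(stated - predicted)}")
```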


## 7. Coordinator Agent Test (Full Flow)
Test the full flow using the coordinator.

In [9]:
coordinator = CoordinatorAgent()

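# Point the retriever and ArXiv agents at explicit paths. The "../" paths below assume the
# working directory is the notebooks folder. In the run below the retriever reports 0 collections
# for "../chromadb", whereas db_path="chromadb" in section 3 found 12, so adjust the path to
# match your working directory.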
coordinator.retriever_agent = ReviewRetriever(db_path="../chromadb")
coordinator.arxiv_agent = ArxivPaperRetriever(cache_path="../logs/arxiv_cache.json")

state = coordinator.run(
    paper_title="Test Paper for Coordination",
    paper_text="This is a test paper text for the coordinator agent to process."
)

print("Final Rating:", state.get("predicted_rating"))
print("Progress Log:")
for log in state.get("progress_log", []):
    print(f" - {log}")

ReviewRetriever initialized with 12 collections: ['neurips_2025', 'neurips_2022', 'iclr_2022', 'neurips_2024', 'iclr_2023', 'iclr_2021', 'neurips_2023', 'iclr_2025', 'iclr_2024', 'icml_2025', 'tmlr', 'neurips_2021']
[get_llm] Using vLLM with model meta-llama/Meta-Llama-3-8B-Instruct at http://localhost:8000/v1
[get_llm] Using vLLM with model meta-llama/Meta-Llama-3-8B-Instruct at http://localhost:8000/v1
ReviewRetriever initialized with 0 collections: []
Final Rating: 1.0
Progress Log:
 - Coordinator: Parser Agent starting...
 - Coordinator: Parser Agent completed.
 - Coordinator: RAG Agent starting retrieval...
 - Coordinator: RAG Agent retrieved references.
 - Coordinator: ArXiv Agent starting retrieval...
 - Coordinator: ArXiv Agent fetched related literature.
 - Coordinator: Reviewer Agent starting...
 - Coordinator: Reviewer Agent drafted report.
 - Coordinator: Rating Agent starting...
 - Coordinator: Rating Agent finalized the score.
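
The log above shows the patched retriever starting with 0 collections, which suggests that `../chromadb` did not point at the populated store in this run. A small guard that resolves the database directory before constructing the retriever can surface this earlier. This is a sketch under that assumption; `first_existing_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing_dir(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate that is an existing directory, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


# Both spellings appear in this notebook; which one exists depends on the working directory.
db_dir = first_existing_dir(["chromadb", "../chromadb"])
if db_dir is None:
    print(f"No ChromaDB directory found from {Path.cwd()}")
else:
    print(f"Using ChromaDB at: {db_dir.resolve()}")
```

An existing directory is not necessarily a populated one, so the collection count printed when the retriever initializes remains the more reliable signal.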


## 8. LangGraph Workflow Test
Test the graph execution.

In [10]:
# Note: The graph uses default initializations, so it might look for chromadb in ./chromadb relative to execution
# You might need to adjust paths in the source code or run this from root if paths are hardcoded.

try:
    app = build_agent_graph()

    inputs = AgentState(
        paper_text="This paper proposes a new method for efficient transformer training...",
        paper_title="Efficient Training with LoRA",
        progress_log=[]
    )

    print("Starting Graph...")
    for output in app.stream(inputs):
        for key, value in output.items():
            print(f"Finished Node: {key}")

    print("Graph execution finished.")
except Exception as e:
    print(f"Graph test failed: {e}")

Starting Graph...
Finished Node: parse
--- RETRIEVE SIMILAR REVIEWS ---
ReviewRetriever initialized with 12 collections: ['neurips_2025', 'neurips_2022', 'iclr_2022', 'neurips_2024', 'iclr_2023', 'iclr_2021', 'neurips_2023', 'iclr_2025', 'iclr_2024', 'icml_2025', 'tmlr', 'neurips_2021']
Finished Node: retrieve
Finished Node: arxiv
--- GENERATE REVIEW ---
[get_llm] Using vLLM with model meta-llama/Meta-Llama-3-8B-Instruct at http://localhost:8000/v1
Finished Node: review
[get_llm] Using vLLM with model meta-llama/Meta-Llama-3-8B-Instruct at http://localhost:8000/v1
Finished Node: rate
Graph execution finished.
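
The loop in the cell above only prints node names and discards the values. Since each streamed chunk maps a node name to that node's output, the same loop can also accumulate a final state. The sketch below uses a stand-in generator instead of `app.stream(inputs)` so that it runs without the servers; the values are made up, and merging by plain overwrite is an assumption that ignores any per-key reducers the real graph may define.

```python
from typing import Any, Dict, Iterable, List, Tuple


def collect_updates(stream: Iterable[Dict[str, Dict[str, Any]]]) -> Tuple[List[str], Dict[str, Any]]:
    """Consume {node_name: state_update} chunks; return node order and merged state."""
    order: List[str] = []
    merged: Dict[str, Any] = {}
    for chunk in stream:
        for node, update in chunk.items():
            order.append(node)
            # Plain overwrite; a real graph may combine values differently.
            merged.update(update or {})
    return order, merged


def fake_stream():
    """Stand-in for app.stream(inputs); node names match the log above."""
    yield {"parse": {"parsed_sections": {"abstract": "..."}}}
    yield {"retrieve": {"retrieved_reviews": ["..."]}}
    yield {"rate": {"predicted_rating": 6.0}}


order, final_state = collect_updates(fake_stream())
print(order)
print(final_state.get("predicted_rating"))
```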


## 9. Run Sample Paper Review
Run the full multi-agent review process on the sample PDF.

In [11]:
import autorootcwd

In [12]:
import os
import json
import importlib
from datetime import datetime
from pathlib import Path

import src.agents.llm
import src.agents.coordinator

# Reload first so that the names imported below are bound to the reloaded modules.
importlib.reload(src.agents.llm)
importlib.reload(src.agents.coordinator)

from src.agents import (ReviewerAgent, RatingAgent, ReviewRetriever, ArxivPaperRetriever)
from src.agents.coordinator import CoordinatorAgent
from src.agents.llm import get_llm


def render_review_markdown(state, timestamp):
    """Render a human-readable markdown report covering every agent output."""
    def _trim_text(value, limit=1200):
        if not value:
            return ""
        text = str(value)
        return text if len(text) <= limit else f"{text[:limit]}...[truncated]"

    parsed_sections = state.get("parsed_sections") or {}
    retrieved_reviews = state.get("retrieved_reviews") or []
    metadatas = state.get("retrieved_metadatas") or []
    abstracts = state.get("retrieved_abstracts") or []
    paper_urls = state.get("retrieved_paper_urls") or []
    pdf_urls = state.get("retrieved_pdf_urls") or []
    pdf_paths = state.get("retrieved_pdf_paths") or []
    arxiv_results = state.get("arxiv_results") or []
    arxiv_snippets = state.get("arxiv_reference_texts") or []
    lines = [
        f"# Review Generation Result - {timestamp}",
        "",
        "## Paper Information",
        f"- **Paper Title:** {state.get('paper_title') or 'N/A'}",
        f"- **PDF Path:** {state.get('paper_pdf_path') or 'N/A'}",
        f"- **ArXiv Query:** {state.get('arxiv_query') or 'N/A'}",
        f"- **ArXiv Cache Hit:** {state.get('arxiv_cache_hit')}",
        "",
        "## Parser Output",
    ]

    if parsed_sections:
        for section_name, section_text in parsed_sections.items():
            lines.append(f"### {section_name.title()}")
            lines.append(_trim_text(section_text, limit=2000) or "_Empty section_")
            lines.append("")
    else:
        lines.append("_No parser output available._")
        lines.append("")

    lines.append("## Retrieved Reviews")
    if retrieved_reviews:
        for idx, review_text in enumerate(retrieved_reviews, start=1):
            meta = metadatas[idx - 1] if idx - 1 < len(metadatas) else {}
            abstract = abstracts[idx - 1] if idx - 1 < len(abstracts) else ""
            paper_url = paper_urls[idx - 1] if idx - 1 < len(paper_urls) else ""
            pdf_url = pdf_urls[idx - 1] if idx - 1 < len(pdf_urls) else ""
            pdf_path = pdf_paths[idx - 1] if idx - 1 < len(pdf_paths) else ""
            lines.append(f"### Retrieved Review {idx}")
            lines.append(f"- **Paper URL:** {paper_url or 'N/A'}")
            lines.append(f"- **PDF URL:** {pdf_url or 'N/A'}")
            lines.append(f"- **PDF Path:** {pdf_path or 'N/A'}")
            lines.append(f"- **Metadata:** {json.dumps(meta, ensure_ascii=False)}")
            lines.append("")
            lines.append("**Related Abstract:**")
            lines.append(_trim_text(abstract, limit=1500) or "_No abstract available._")
            lines.append("")
            lines.append("**Review Snippet:**")
            lines.append(_trim_text(review_text, limit=2000) or "_No review available._")
            lines.append("")
    else:
        lines.append("_No retrieved reviews available._")
        lines.append("")

    lines.append("## ArXiv Matches")
    if arxiv_results:
        for idx, paper in enumerate(arxiv_results, start=1):
            snippet = arxiv_snippets[idx - 1] if idx - 1 < len(arxiv_snippets) else ""
            lines.append(f"### Match {idx}")
            lines.append(f"- **Title:** {paper.get('title', 'N/A')}")
            lines.append(f"- **URL:** {paper.get('url', 'N/A')}")
            lines.append(f"- **PDF URL:** {paper.get('pdf_url', 'N/A')}")
            lines.append(f"- **Published:** {paper.get('published', 'N/A')}")
            lines.append(f"- **Authors:** {', '.join(paper.get('authors', [])) or 'N/A'}")
            lines.append("")
            lines.append("**Summary:**")
            lines.append(_trim_text(paper.get("summary", ""), limit=2000) or "_No summary available._")
            lines.append("")
            if snippet:
                lines.append("**Reference Snippet:**")
                lines.append(_trim_text(snippet, limit=1500))
                lines.append("")
    else:
        lines.append("_No related arXiv papers found._")
        lines.append("")

    lines.append("## Final Review")
    lines.append(state.get("final_review") or "_No final review produced._")
    lines.append("")
    lines.append("## Rating Prediction")
    lines.append(f"- **Score:** {state.get('predicted_rating')}")
    lines.append(f"- **Rationale:** {state.get('rating_rationale') or 'N/A'}")
    lines.append("")
    lines.append("## Progress Log")
    progress_log = state.get("progress_log", [])
    if progress_log:
        for log in progress_log:
            lines.append(f"- {log}")
    else:
        lines.append("- No progress recorded.")
    lines.append("")
    return "\n".join(lines)


def save_agent_outputs(state, output_root="outputs"):
    """Persist raw agent state (JSON) plus a markdown summary under outputs/."""
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    output_root = Path(output_root)
    results_dir = output_root / "results"
    results_dir.mkdir(parents=True, exist_ok=True)

    state_path = results_dir / f"agent_state_{timestamp}.json"
    report_path = results_dir / f"review_{timestamp}.md"

    with state_path.open("w", encoding="utf-8") as fp:
        json.dump(state, fp, ensure_ascii=False, indent=2, default=str)

    report_path.write_text(render_review_markdown(state, timestamp), encoding="utf-8")
    return state_path, report_path


# Define paths relative to the current working directory (the project root in the run below)
chroma_path = os.path.join("chromadb")
cache_path = os.path.join("logs", "arxiv_cache.json")
sample_pdf_path = os.path.join("samples", "sample_paper_deep_learning_STEM-EDX_tomography_of_nanocrystals.pdf")

print(f"Using ChromaDB at: {chroma_path}")
print(f"Using ArXiv Cache at: {cache_path}")
print(f"Processing PDF: {sample_pdf_path}")

# 1. Initialize LLMs
print("\n--- Initializing LLMs ---")

# Main model (8B) on port 8000 (Replaces Gemini)
llama_8b = get_llm(
    provider="vllm",
    base_url="http://localhost:8000/v1",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    temperature=0.3
)

# Small model (1B) on port 8001 (Replaces previous OSS LLM usage)
# Ensure you have started the server with `python scripts/start_vllm.py start-1b`
llama_1b = get_llm(
    provider="vllm",
    base_url="http://localhost:8001/v1",
    model="meta-llama/Llama-3.2-1B",
    temperature=0.1
)

# Alternative: Use HuggingFace Local Pipeline for 1B if server is not desired
# llama_1b = get_llm(provider="huggingface", model="meta-llama/Llama-3.2-1B")

if not llama_8b:
    print("⚠️ Llama-8B not available. Check if vLLM server is running on port 8000.")
if not llama_1b:
    print("⚠️ Llama-1B not available. Check if vLLM server is running on port 8001.")

# 2. Initialize Agents with specific LLMs
# Reviewer & Rating -> Llama 8B (was Gemini)
reviewer = ReviewerAgent(llm=llama_8b)
rating = RatingAgent(llm=llama_8b)

# Coordinator -> receives the 8B-powered agents and, in this cell, llama_8b as its own LLM.
# llama_1b is initialized above but is not passed to any agent here; pass it as `llm` to use 1B for orchestration.
coordinator = CoordinatorAgent(
    reviewer_agent=reviewer,
    rating_agent=rating,
    llm=llama_8b  # Assuming Coordinator accepts an LLM for its own operations
)

# Patch paths for retriever/arxiv agents inside coordinator
coordinator.retriever_agent = ReviewRetriever(db_path=chroma_path)
coordinator.arxiv_agent = ArxivPaperRetriever(cache_path=cache_path)

# Run the review process
if os.path.exists(sample_pdf_path):
    print("\nStarting review generation... This may take a few minutes.")
    state = coordinator.run(
        paper_pdf_path=sample_pdf_path,
        paper_title="Deep Learning STEM-EDX Tomography of Nanocrystals"
    )

    print("\n" + "="*30)
    print("       FINAL REVIEW REPORT       ")
    print("="*30 + "\n")
    print(state.get("final_review"))

    print("\n" + "="*30)
    print("       RATING PREDICTION       ")
    print("="*30 + "\n")
    print(f"Score: {state.get('predicted_rating')}")
    print(f"Rationale: {state.get('rating_rationale')}")

    print("\n" + "="*30)
    print("       PROGRESS LOG       ")
    print("="*30 + "\n")
    for log in state.get("progress_log", []):
        print(f" - {log}")

    state_path, report_path = save_agent_outputs(state)
    print("\nSaved agent state to:", state_path)
    print("Saved markdown report to:", report_path)
else:
    print(f"Error: Sample file not found at {sample_pdf_path}")

Using ChromaDB at: chromadb
Using ArXiv Cache at: logs/arxiv_cache.json
Processing PDF: samples/sample_paper_deep_learning_STEM-EDX_tomography_of_nanocrystals.pdf

--- Initializing LLMs ---
[get_llm] Using vLLM with model meta-llama/Meta-Llama-3-8B-Instruct at http://localhost:8000/v1
[get_llm] Using vLLM with model meta-llama/Llama-3.2-1B at http://localhost:8001/v1
ReviewRetriever initialized with 12 collections: ['neurips_2025', 'neurips_2022', 'iclr_2022', 'neurips_2024', 'iclr_2023', 'iclr_2021', 'neurips_2023', 'iclr_2025', 'iclr_2024', 'icml_2025', 'tmlr', 'neurips_2021']
ReviewRetriever initialized with 12 collections: ['neurips_2025', 'neurips_2022', 'iclr_2022', 'neurips_2024', 'iclr_2023', 'iclr_2021', 'neurips_2023', 'iclr_2025', 'iclr_2024', 'icml_2025', 'tmlr', 'neurips_2021']
Error: Sample file not found at samples/sample_paper_deep_learning_STEM-EDX_tomography_of_nanocrystals.pdf
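
The run above stops because the sample PDF is not at the expected path, although an earlier cell read a different PDF from `samples/`. When a path is missing, listing the PDFs that do exist in the same folder, ranked by name similarity, makes the mismatch easier to spot. This is a sketch only; `suggest_pdfs` is a hypothetical helper.

```python
import difflib
from pathlib import Path
from typing import List


def suggest_pdfs(missing_path: str, max_suggestions: int = 3) -> List[str]:
    """List PDFs next to a missing path, closest file names first."""
    target = Path(missing_path)
    if not target.parent.is_dir():
        return []
    names = sorted(p.name for p in target.parent.glob("*.pdf"))
    # cutoff=0.0 keeps every candidate and only uses similarity for ordering.
    close = difflib.get_close_matches(target.name, names, n=max_suggestions, cutoff=0.0)
    return [str(target.parent / name) for name in close]


sample_pdf_path = "samples/sample_paper_deep_learning_STEM-EDX_tomography_of_nanocrystals.pdf"
if not Path(sample_pdf_path).exists():
    for candidate in suggest_pdfs(sample_pdf_path):
        print("Did you mean:", candidate)
```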
