
@wizardlancet (Contributor)

1. Re-implement RAG agent with Agent-Lightning v2.x
2. Merge environment management into uv
3. Update sample dataset to a tiny dataset for future CI
4. Update README

Copilot AI review requested due to automatic review settings December 2, 2025 05:54

Copilot AI left a comment

Pull request overview

This pull request updates the RAG (Retrieval-Augmented Generation) example to be compatible with Agent-Lightning v2.x. The update includes migrating from shell-based training scripts to a Python-based training interface, consolidating environment management into uv, and using a smaller "tiny" dataset for easier testing and CI integration.

  • Migrated from shell script configuration to Python-based training with train_rag.py
  • Updated agent implementation to use v2.x API signatures (training_rollout_async and validation_rollout_async now accept a Rollout parameter)
  • Consolidated dependency management by removing conda-based environment setup in favor of uv-managed dependencies

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `pyproject.toml` | Added `rag` dependency group with required packages (fastmcp, faiss-cpu, sentence-transformers) and conflict resolution overrides |
| `wiki_retriever_mcp/wiki_retriever_mcp.py` | Updated to use tiny dataset files and reduced `top_k` from 4 to 1 for demonstration purposes |
| `wiki_retriever_mcp/wiki_retriever_install.sh` | Removed conda-based installation script in favor of uv dependency management |
| `train_rag.py` | New Python-based training script with configurable training modes (fast/single_gpu) replacing the shell script approach |
| `train.sh` | Removed shell-based training script, replaced by `train_rag.py` |
| `rag_run_dev.py` | New development/testing script for running the RAG agent with a local vLLM server |
| `rag_agent.py` | Updated to v2.x API with new method signatures, improved logging, and type hints |
| `README.md` | Comprehensive documentation update with step-by-step setup instructions for the tiny dataset |
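The `top_k` change in `wiki_retriever_mcp.py` controls how many passages the retriever returns per query. The following is a minimal pure-Python sketch of top-k retrieval, assuming cosine similarity over precomputed embeddings. `retrieve_top_k`, `PASSAGES`, and `VECS` are hypothetical. The actual MCP server presumably performs this search with FAISS and sentence-transformers (the packages listed in the `rag` dependency group) rather than with a Python loop.

```python
import math
from typing import List, Sequence

# Hypothetical toy corpus with hand-made 3-d "embeddings".
PASSAGES = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "Mount Everest is the highest mountain.",
]
VECS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query_vec, passages, passage_vecs, top_k: int = 1) -> List[str]:
    """Return the top_k passages ranked by similarity to query_vec (hypothetical helper)."""
    ranked = sorted(
        zip(passages, passage_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [passage for passage, _ in ranked[:top_k]]
```

With `top_k=1`, each query returns a single passage, which shortens the context the agent has to read. With `top_k=4`, it would return up to four.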
Comments suppressed due to low confidence (2)

examples/rag/rag_run_dev.py:1

  • Import of 'json' is not used.
import json

examples/rag/train_rag.py:13

  • Import of 'os' is not used.
import os
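Both suppressed comments flag an import that is bound but never read. Below is a rough sketch of how such a check can work using the standard-library `ast` module. `unused_imports` is a hypothetical helper. Real linters (for example pyflakes, or Ruff's F401 rule) also handle `__all__`, string annotations, and re-exports, which this sketch does not.

```python
import ast
from typing import List, Set


def unused_imports(source: str) -> List[str]:
    """Return names bound by import statements that are never referenced as a Name."""
    tree = ast.parse(source)
    imported: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # __future__ imports change compiler behavior and are never "used" by name.
            if node.module == "__future__":
                continue
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # Attribute access such as argparse.ArgumentParser starts with a Name node,
    # so collecting Name ids also covers module-qualified uses.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


SAMPLE = """from __future__ import annotations
import argparse
import os

parser = argparse.ArgumentParser()
"""
```

Running `unused_imports(SAMPLE)` returns `["os"]`, which mirrors the `train_rag.py` finding.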

Comment on lines 6 to 7
python train_rag_agent.py fast # Fast training for CI/testing
python train_rag_agent.py single_gpu # Optimized for Single GPU (1.5B/7B models)

Copilot AI Dec 2, 2025

The usage examples in the docstring reference train_rag_agent.py, but the actual filename is train_rag.py. Please update the docstring to match the correct filename.

Suggested change
-python train_rag_agent.py fast # Fast training for CI/testing
-python train_rag_agent.py single_gpu # Optimized for Single GPU (1.5B/7B models)
+python train_rag.py fast # Fast training for CI/testing
+python train_rag.py single_gpu # Optimized for Single GPU (1.5B/7B models)

Copilot uses AI. Check for mistakes.
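The docstring advertises two invocation modes. Because `train_rag.py` imports `argparse`, one plausible shape for that entry point is a positional `mode` argument restricted to the known presets. In this sketch, `CONFIGS` (including its keys and values) and `build_config` are hypothetical. The real presets are not shown in this thread.

```python
import argparse
from typing import Any, Dict, List, Optional

# Hypothetical presets; the actual settings in train_rag.py may differ.
CONFIGS: Dict[str, Dict[str, Any]] = {
    "fast": {"n_epochs": 1, "train_batch_size": 2, "val_every_n_steps": 1},
    "single_gpu": {"n_epochs": 2, "train_batch_size": 8, "val_every_n_steps": 10},
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train the RAG agent (sketch).")
    # choices= makes argparse reject unknown modes with a usage error (exit code 2).
    parser.add_argument("mode", choices=sorted(CONFIGS), help="training preset")
    return parser.parse_args(argv)


def build_config(mode: str) -> Dict[str, Any]:
    # Return a copy so callers can tweak a preset without mutating CONFIGS.
    return dict(CONFIGS[mode])
```

In a script, calling `build_config(parse_args().mode)` inside the `if __name__ == "__main__":` block would pick the preset from the command line. Any mode other than `fast` or `single_gpu` exits with an argparse usage error.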
Comment on lines 79 to 85
val_resources = {
"main_llm": agl.LLM(
endpoint=llm.endpoint,
model=llm.model,
sampling_parameters={"temperature": 0.7},
)
}

Copilot AI Dec 2, 2025

val_resources is created as a plain dictionary but is passed to training_rollout_async, which expects an agl.NamedResources object. This type mismatch could cause runtime errors. Consider wrapping it properly or ensuring it's the correct type.

Suggested change
-val_resources = {
-    "main_llm": agl.LLM(
-        endpoint=llm.endpoint,
-        model=llm.model,
-        sampling_parameters={"temperature": 0.7},
-    )
-}
+val_resources = agl.NamedResources({
+    "main_llm": agl.LLM(
+        endpoint=llm.endpoint,
+        model=llm.model,
+        sampling_parameters={"temperature": 0.7},
+    )
+})

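The snippet under review derives validation settings from the training LLM. It keeps the same endpoint and model but uses different sampling parameters. Below is a self-contained sketch of that pattern. It uses a hypothetical `LLM` dataclass as a stand-in for `agl.LLM` and models `NamedResources` as a plain `Dict[str, LLM]` alias. That modeling is an assumption. Before applying the suggestion, check the Agent-Lightning source to see whether `agl.NamedResources` is such an alias (in which case a plain dict already satisfies it) or a class that needs wrapping.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass
class LLM:
    """Hypothetical stand-in for agl.LLM, limited to the fields used in the snippet."""
    endpoint: str
    model: str
    sampling_parameters: Dict[str, Any] = field(default_factory=dict)


# Assumption: resources are a mapping from resource name to resource object.
NamedResources = Dict[str, LLM]


def make_val_resources(llm: LLM, temperature: float = 0.7) -> NamedResources:
    # replace() builds a new LLM, so the training LLM's settings are left untouched.
    return {"main_llm": replace(llm, sampling_parameters={"temperature": temperature})}
```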
| `train_rag.py` | Initiates the GRPO training process |
| `rag_run_dev.py` | Development run test |
| `utils.py` | Scoring utilities for exact match, F1 score, and response parsing |
| `wiki_retriever_mcp/` | Setup scripts and MCP server (`wiki_retriever_install.sh`, `wiki_retriever_mcp.py`) for Wikipedia retrieval |

Copilot AI Dec 2, 2025

The description mentions wiki_retriever_install.sh as part of the wiki_retriever_mcp/ directory, but this file has been deleted in this PR. Please update the description to remove the reference to this file.

Suggested change
-| `wiki_retriever_mcp/` | Setup scripts and MCP server (`wiki_retriever_install.sh`, `wiki_retriever_mcp.py`) for Wikipedia retrieval |
+| `wiki_retriever_mcp/` | MCP server (`wiki_retriever_mcp.py`) for Wikipedia retrieval |

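`utils.py` is described as providing exact-match and F1 scoring. The sketch below uses the widely used SQuAD-style answer normalization (lowercase, strip punctuation, drop the articles "a", "an", "the", collapse whitespace) and token-level F1. The function names are illustrative. It is an assumption that the repository's `utils.py` normalizes answers the same way.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most min(count_pred, count_gold) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower!", "eiffel tower")` is `1.0` because case, punctuation, and articles are normalized away. `f1_score("Paris France", "Paris")` gives partial credit of 2/3 (precision 1/2, recall 1).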
@@ -0,0 +1,34 @@
# Copyright (c) Microsoft. All rights reserved.

Copilot AI Dec 2, 2025

The json module is imported but never used in this file. Consider removing this unused import.

Suggested change
# Copyright (c) Microsoft. All rights reserved.

from __future__ import annotations

import argparse
import os

Copilot AI Dec 2, 2025

The os module is imported but never used in this file. Consider removing this unused import.

Suggested change
-import os

3. Start the MCP server
Open a terminal and run:
```
cd examples/rag/wiki_retriever_mcp

Copilot AI Dec 2, 2025

There's a path navigation inconsistency in the instructions. After step 2, users are left in the examples/rag/wiki_retriever_mcp directory, but step 3 says to cd examples/rag/wiki_retriever_mcp (which assumes they're at the repo root). Consider adding a comment to indicate the expected working directory at each step or adjusting the paths to be consistent.

Suggested change
-cd examples/rag/wiki_retriever_mcp
+# (Assuming you are still in examples/rag/wiki_retriever_mcp)

@ultmaster (Contributor)

/ci

github-actions bot commented Dec 3, 2025

🚀 CI Watcher for correlation id-3605104551-mipjmxvu triggered by comment 3605104551
🏃‍♀️ Tracking 1 workflow run(s):

✅ All runs completed.

@ultmaster changed the title from "Update RAG example to v2.x" to "Update RAG example to v0.2.x" on Dec 3, 2025
@ultmaster merged commit 34811cb into microsoft:main on Dec 3, 2025
10 checks passed