🤖 RAG LangChain Agent with LangGraph

An AI agent project that uses RAG (Retrieval-Augmented Generation) to answer questions based on your own data.

📋 Description

This project builds an AI agent that can:

🔍 Search your documents using RAG
💬 Answer questions based on the retrieved context
🛠️ Use multiple tools to solve problems
📝 Search files for patterns using regex

✨ Features

🎯 Two agents:

agent_graph.py: Agent using OllamaLLM with LangGraph ReActAgent
agent_rag.py: Debug version using ChatOllama with detailed logging

🔧 Available tools:

rag_qa: Answers questions from the knowledge base (FAISS vectorstore)
search_in_file: Finds the lines in a file that match a pattern (regex supported)

🚀 Installation

System requirements:

Python 3.8+
Ollama installed and running

Step 1: Clone the repository

git clone <your-repo-url>
cd agent_rag

Step 2: Install dependencies

pip install -r requirements.txt

Step 3: Pull the Ollama models

# Pull an LLM (choose one of the following)
ollama pull qwen3:latest
# or
ollama pull llama3.2:latest
# or
ollama pull mistral:latest

# Pull an embedding model (choose one)
ollama pull nomic-embed-text
# or
ollama pull mxbai-embed-large

📚 Usage

1. Build the index for the first time

python agent_graph.py --input du_lieu.txt --question "Trung ương làm gì?"

2. Rebuild the index (when you have new data)

python agent_graph.py --rebuild --input du_lieu.txt

3. Query an existing index

# Default question
python agent_graph.py

# Custom question
python agent_graph.py --question "Đại hội 14 diễn ra khi nào?" --time

# Use the agent in debug mode
python agent_rag.py --question "Tìm kiếm thông tin về Trung ương"

4. Use plain RAG (without the agent)

python rag_langchain.py --input du_lieu.txt --question "Hội nghị Trung ương 12 bàn về gì?"

⚙️ Command-line arguments

Agent (agent_graph.py & agent_rag.py)

Argument	Default	Description
`--input`	None	.txt file or directory to build the index from
`--index-path`	`.faiss_index`	Directory where the FAISS index is stored
`--embedding-model`	`nomic-embed-text`	Ollama embedding model
`--llm-model`	`qwen3:latest`	Ollama LLM
`--chunk-size`	800	Size of each text chunk
`--chunk-overlap`	100	Overlap between consecutive chunks
`--top-k`	4	Number of passages retrieved from the vectorstore
`--rebuild`	False	Rebuild the index from scratch
`--question, -q`	None	Question to answer
`--time`	False	Show execution time
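
The table maps naturally onto a command-line parser. The sketch below assumes the scripts use Python's standard argparse module, which the source does not show; the function name build_parser is hypothetical, and only the flags and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags and defaults in the table above."""
    parser = argparse.ArgumentParser(description="RAG agent CLI (illustrative sketch)")
    parser.add_argument("--input", default=None,
                        help=".txt file or directory to build the index from")
    parser.add_argument("--index-path", default=".faiss_index",
                        help="directory where the FAISS index is stored")
    parser.add_argument("--embedding-model", default="nomic-embed-text")
    parser.add_argument("--llm-model", default="qwen3:latest")
    parser.add_argument("--chunk-size", type=int, default=800)
    parser.add_argument("--chunk-overlap", type=int, default=100)
    parser.add_argument("--top-k", type=int, default=4)
    # Boolean switches default to False and become True when present.
    parser.add_argument("--rebuild", action="store_true")
    parser.add_argument("--question", "-q", default=None)
    parser.add_argument("--time", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-q", "Trung ương làm gì?", "--top-k", "6"])
print(args.llm_model, args.top_k, args.rebuild)  # prints: qwen3:latest 6 False
```

argparse converts dashes in flag names to underscores, so `--top-k` is read back as `args.top_k`.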

RAG Core (rag_langchain.py)

Same as above, plus:

Argument	Default	Description
`--encoding`	`utf-8`	Encoding used when reading files

📁 Project structure

agent_rag/
├── agent_graph.py          # Main agent built with LangGraph
├── agent_rag.py            # Debug-mode agent
├── rag_langchain.py        # RAG core module
├── search_in_file.py       # File search tool
├── test.py                 # Test file
├── du_lieu.txt             # Sample data
├── .faiss_index/           # FAISS vector database (auto-generated)
├── requirements.txt        # Dependencies
├── .gitignore             # Git ignore rules
└── README.md              # This document

🛠️ Tool: search_in_file

Input syntax when the agent calls the tool:

file=<path>; pattern=<text_or_regex>; ignore_case=true|false; max_results=<int>

Example:

file=du_lieu.txt; pattern=Trung ương; ignore_case=true; max_results=5
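
The input format above can be parsed and applied in a few lines. The following is a minimal sketch, not the code in search_in_file.py (which the source does not show): the helper names parse_tool_input and search_lines, the "line number: text" output format, and the default max_results of 20 are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List


def parse_tool_input(spec: str) -> Dict[str, str]:
    """Split 'key=value; key=value' into a dict (hypothetical helper).

    Limitation of this sketch: a pattern that itself contains ';' is cut short.
    """
    fields: Dict[str, str] = {}
    for part in spec.split(";"):
        if "=" in part:
            # Split on the first '=' only, so values may contain '='.
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def search_lines(lines: Iterable[str], pattern: str,
                 ignore_case: bool = False, max_results: int = 20) -> List[str]:
    """Return up to max_results matching lines, prefixed with 1-based line numbers."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits: List[str] = []
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            hits.append(f"{number}: {line.rstrip()}")
            if len(hits) >= max_results:
                break
    return hits


fields = parse_tool_input(
    "file=du_lieu.txt; pattern=Trung ương; ignore_case=true; max_results=5"
)
# In-memory sample lines stand in for the contents of fields["file"].
sample = ["Hội nghị Trung ương 12", "Một dòng khác", "trung ương họp"]
print(search_lines(sample, fields["pattern"],
                   ignore_case=fields["ignore_case"].lower() == "true",
                   max_results=int(fields["max_results"])))
```

To search a real file, pass an open file handle as `lines`, for example `open(fields["file"], encoding="utf-8")`.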

💡 Examples

Example 1: Basic query

python agent_graph.py \
  --input du_lieu.txt \
  --question "Hội nghị Trung ương 12 bàn những vấn đề gì?" \
  --time

Example 2: Custom models

python agent_graph.py \
  --llm-model llama3.2:latest \
  --embedding-model mxbai-embed-large \
  --question "Ai là Tổng Bí thư?" \
  --top-k 3

Example 3: Add new data

# Step 1: Add .txt files to the data/ directory
mkdir data
cp your_new_data.txt data/

# Step 2: Rebuild the index from the new directory
python agent_graph.py \
  --rebuild \
  --input data/ \
  --question "Câu hỏi về dữ liệu mới"

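🔧 Advanced configuration

Changing the chunk size

If your documents contain long passages, increase chunk-size. Chunking is applied when the index is built, so an existing index likely needs --rebuild before new values take effect: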

python agent_graph.py --chunk-size 1200 --chunk-overlap 200
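
To see how chunk-size and chunk-overlap interact, here is a minimal sketch that uses a plain character-based sliding window. It is an illustration only: the project presumably relies on a LangChain text splitter (not shown in this README), which splits on separators, so real chunk boundaries will differ. The function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "x" * 3000
print(len(chunk_text(text, 800, 100)))    # prints: 5
print(len(chunk_text(text, 1200, 200)))   # prints: 3
```

Larger chunks give each retrieved passage more surrounding context but produce fewer, coarser entries in the index; the overlap reduces the chance that a sentence is cut in half at a boundary.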

Increasing the number of context passages (top-k)

To give the LLM more information:

python agent_graph.py --top-k 6
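
Conceptually, top-k retrieval ranks every stored chunk by its similarity to the question and keeps the k best. The sketch below shows this with cosine similarity over toy vectors in pure Python. It is an assumption-laden illustration: the project uses a FAISS index with Ollama embeddings, the similarity measure that index uses may differ, and the names cosine and top_k are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float],
          chunk_vecs: Sequence[Sequence[float]],
          k: int = 4) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunks, best first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real embedding vectors have hundreds of dimensions.
chunks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print([index for index, _ in top_k([1.0, 0.0], chunks, k=2)])  # prints: [0, 1]
```

A higher k passes more passages to the LLM, which can help recall but also lengthens the prompt and may add less relevant text.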

🐛 Troubleshooting

Error: "select() timeout" or Ollama fails to connect

# Check that Ollama is running
ollama list

# If not, start Ollama
ollama serve
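
The same check can be scripted. This is a minimal sketch that assumes Ollama is listening on its default local address, http://localhost:11434, and queries its /api/tags route, which lists installed models. The function name list_ollama_models is hypothetical and is not part of this project.

```python
import json
import urllib.request
from typing import List, Optional


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 3.0) -> Optional[List[str]]:
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and URLError;
        # ValueError covers a response body that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; try `ollama serve`.")
else:
    print("Installed models:", models)
```

If the returned list does not contain the model named in --llm-model or --embedding-model, pull it with `ollama pull` first.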

Error: Model not found

# List installed models
ollama list

# Pull the required models
ollama pull qwen3:latest
ollama pull nomic-embed-text

Error: Unicode/encoding on Windows

The project already handles UTF-8 automatically. If errors persist, run the following in PowerShell:

$env:PYTHONIOENCODING="utf-8"
python agent_graph.py --question "..."
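
The README does not show how the automatic UTF-8 handling is implemented. One common approach, sketched here purely as an assumption, is to reconfigure the standard streams at startup so that Vietnamese text prints correctly even when the console defaults to a legacy code page. The function name force_utf8_output is hypothetical.

```python
import sys
from typing import Iterable, Optional, TextIO


def force_utf8_output(streams: Optional[Iterable[TextIO]] = None) -> None:
    """Switch the given text streams (stdout and stderr by default) to UTF-8."""
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    for stream in streams:
        # reconfigure() exists on io.TextIOWrapper (Python 3.7+); streams that
        # were replaced by other objects may lack it, so they are skipped.
        reconfigure = getattr(stream, "reconfigure", None)
        if callable(reconfigure):
            reconfigure(encoding="utf-8", errors="replace")


force_utf8_output()
print("Trung ương")
```

Setting PYTHONIOENCODING as shown above has a similar effect without touching the code, since Python reads it when it sets up the standard streams.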

📊 File comparison

File	Purpose	When to use
`agent_graph.py`	Production agent	Main, stable entry point
`agent_rag.py`	Debug agent	Debugging and inspecting details
`rag_langchain.py`	Plain RAG	Q&A only, no agent needed
`search_in_file.py`	Standalone tool	Searching individual files

🤝 Contributing

All contributions are welcome! To contribute:

Fork the project
Create a new branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📝 License

This project is distributed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgements

LangChain - Powerful AI framework
LangGraph - Building complex agents
Ollama - Running LLMs locally with ease
FAISS - Fast vector search

📧 Contact

If you have questions or feedback, please open an issue on GitHub!

Happy Coding! 🎉
