A RAG (Retrieval-Augmented Generation) application that uses MongoDB Atlas Vector Search to enhance LLM-powered applications with semantic information retrieval.
- Batch PDF document processing
- Automatic metadata extraction using LLM
- Vector embeddings with OpenAI
- MongoDB Atlas Vector Search integration
- Semantic search capabilities
This project is part of the MongoDB University educational materials for the MongoDB Skills course: RAG with MongoDB.
- Database: documents
- Collection: docs-chunks
Navigate to Atlas Search and create a new Vector Search Index:
Index Name: vector_index
JSON Configuration:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "hasCode"
    }
  ]
}
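If you prefer to create the index from code rather than the Atlas UI, recent pymongo releases expose SearchIndexModel and create_search_index. The sketch below is an assumption about how that could look for this project (it is not part of the course materials) and expects MONGODB_URI to be set in the environment:

import os

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# Connect to the database/collection used by this project.
collection = MongoClient(os.environ["MONGODB_URI"])["documents"]["docs-chunks"]

# Same definition as the JSON configuration above.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine",
            },
            {"type": "filter", "path": "hasCode"},
        ]
    },
    name="vector_index",
    type="vectorSearch",  # the type argument requires a recent pymongo (4.7+)
)
collection.create_search_index(model=index_model)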
- Prerequisites:
  - Python 3.13+
  - MongoDB Atlas account with a Vector Search Index configured
  - OpenAI API key
- Install dependencies:
pip install langchain langchain_community langchain_core langchain_openai langchain_mongodb pymongo pypdf python-dotenv
- Create a .env file in the root directory:
OPENAI_API_KEY=your-openai-api-key
MONGODB_URI=your-mongodb-connection-string
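The scripts read these variables with python-dotenv (listed in the dependencies above). A minimal sketch of how that loading typically looks, using the variable names from the .env file:

import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory
mongodb_uri = os.environ["MONGODB_URI"]
# OPENAI_API_KEY is picked up automatically by the OpenAI/LangChain clients
# once it is present in the environment.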
- Place your PDF files in the docs/ folder
- Run the ingestion script:
python load_data.py
The script will automatically process all PDFs in the docs/ folder and (see the sketch after this list):
- Load and clean each PDF
- Extract metadata (title, keywords, code detection)
- Split documents into chunks (500 chars with 150 overlap)
- Generate embeddings using OpenAI
- Store everything in MongoDB Atlas
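A minimal sketch of what that pipeline can look like with the libraries listed in the prerequisites; the actual load_data.py may differ, and the LLM-based metadata extraction is omitted here for brevity:

import os
from glob import glob

from dotenv import load_dotenv
from pymongo import MongoClient
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch

load_dotenv()
collection = MongoClient(os.environ["MONGODB_URI"])["documents"]["docs-chunks"]

# 500-character chunks with 150 characters of overlap, as described above.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=150)

# The default OpenAI embedding model produces 1536-dimensional vectors,
# matching numDimensions in the vector index.
vector_store = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=OpenAIEmbeddings(),
    index_name="vector_index",
)

for pdf_path in sorted(glob("docs/*.pdf")):
    pages = PyPDFLoader(pdf_path).load()      # one Document per PDF page
    chunks = splitter.split_documents(pages)  # split into overlapping chunks
    # NOTE: the metadata step (title, keywords, hasCode) described above
    # would enrich chunk.metadata here before insertion.
    vector_store.add_documents(chunks)        # embed and store in Atlas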
- After ingesting documents, you can query your data:
python rag.py
Or import and use programmatically:
from rag import query_data
answer = query_data("What is the difference between a collection and database in MongoDB?")
print(answer)
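Internally, query_data can be implemented as a retrieval chain over the same collection and index. The following is a minimal sketch under that assumption; the prompt, model, and number of retrieved chunks are illustrative and not necessarily what rag.py uses:

import os

from dotenv import load_dotenv
from pymongo import MongoClient
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

load_dotenv()
collection = MongoClient(os.environ["MONGODB_URI"])["documents"]["docs-chunks"]

# Reuse the vector index built during ingestion for semantic retrieval.
vector_store = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=OpenAIEmbeddings(),
    index_name="vector_index",
)
retriever = vector_store.as_retriever(search_kwargs={"k": 5})

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the following context:\n\n"
    "{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Concatenate the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI()
    | StrOutputParser()
)

def query_data(question: str) -> str:
    # Retrieve relevant chunks, then let the LLM answer from that context.
    return chain.invoke(question)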