grammerpro/Java-Native-RAG-System

Java-Native RAG System

A complete, full-stack Retrieval-Augmented Generation (RAG) application for healthcare semantic search using the MedQuAD dataset.

🏗️ Architecture

  • Backend: Java Spring Boot 3.x with Spring AI
  • Vector Database: PostgreSQL with pgvector extension
  • Frontend: React + TypeScript with Vite
  • Embeddings & LLM: OpenAI (text-embedding-ada-002 & GPT-4)
  • Evaluation: Python RAGAS framework

📁 Project Structure

java-rag-semantic-search/
├── backend/              # Spring Boot application
│   ├── src/
│   │   └── main/
│   │       ├── java/com/vardhan/rag/
│   │       │   ├── RagApplication.java
│   │       │   ├── dto/
│   │       │   ├── service/
│   │       │   └── controller/
│   │       └── resources/
│   │           └── application.properties
│   ├── pom.xml
│   └── Dockerfile
├── frontend/             # React TypeScript app
│   ├── src/
│   │   ├── components/
│   │   │   ├── SearchUI.tsx
│   │   │   └── BenchmarkDashboard.tsx
│   │   ├── App.tsx
│   │   └── main.tsx
│   ├── package.json
│   ├── Dockerfile
│   └── nginx.conf
├── data/                 # Dataset files
│   ├── data_prep.py
│   └── README.md
├── research/             # Evaluation scripts
│   ├── evaluate.py
│   ├── eval_dataset.csv
│   ├── requirements.txt
│   └── README.md
├── docker-compose.yml
└── README.md

🚀 Quick Start

Prerequisites

  • Java 21+
  • Node.js 20+
  • Docker & Docker Compose
  • Python 3.10+
  • OpenAI API Key

1. Clone and Setup

git clone https://github.com/grammerpro/Java-Native-RAG-System.git
cd Java-Native-RAG-System

2. Set Environment Variables

Create a .env file in the root directory:

OPENAI_API_KEY=your-openai-api-key-here
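Before starting the stack, it can help to confirm that the key is actually present and is not still the placeholder. The following is a minimal sketch, not part of the repository: `load_env_file` and `require_openai_key` are hypothetical helper names, and the parser only handles plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_openai_key(env: dict[str, str]) -> str:
    """Return the key, or raise if it is missing or still the placeholder."""
    key = env.get("OPENAI_API_KEY") or os.environ.get("OPENAI_API_KEY", "")
    if not key or key == "your-openai-api-key-here":
        raise ValueError("OPENAI_API_KEY is not set to a real value")
    return key
```

Calling `require_openai_key(load_env_file())` from the project root fails fast with a readable error instead of a later OpenAI authentication failure.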

3. Prepare Data

Download the MedQuAD dataset from Kaggle, place it in the data/ directory, and run the preparation script:

cd data
pip install pandas
python data_prep.py
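The contents of data_prep.py are not shown in this README, so the following is only a sketch of what such a preparation step could look like. It assumes the Kaggle download is a CSV with `question` and `answer` columns (an assumption) and writes a JSON list, matching the medquad.json file name mentioned under Troubleshooting. It uses the standard-library `csv` module rather than pandas to stay self-contained.

```python
import csv
import json
from pathlib import Path


def csv_to_json(csv_path: str, json_path: str) -> int:
    """Convert Q&A rows to a JSON list, dropping rows with an empty field.

    ASSUMPTION: the CSV has 'question' and 'answer' columns.
    Returns the number of records written.
    """
    records = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            question = (row.get("question") or "").strip()
            answer = (row.get("answer") or "").strip()
            if question and answer:
                records.append({"question": question, "answer": answer})
    Path(json_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return len(records)
```

Dropping incomplete rows up front keeps empty documents out of the vector store, where they would waste embedding calls.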

4. Start with Docker

docker-compose up --build

This will start:

  • PostgreSQL with pgvector on port 5432
  • Spring Boot backend on port 8080
  • React frontend on port 3000
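To confirm that all three services came up, a quick TCP probe of the ports listed above is often enough. This is a minimal sketch assuming the services are reachable on localhost; `port_is_open` and `check_services` are hypothetical helper names.

```python
import socket

# Ports taken from the list above.
SERVICES = {"postgres": 5432, "backend": 8080, "frontend": 3000}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Probe each service port and report which ones accept connections."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}
```

An open port only shows that something is listening; the `/api/rag/health` endpoint is the better signal that the backend itself is ready.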

5. Access the Application

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8080

📊 Using the Application

1. Ingest Data

First, ingest the MedQuAD data into the vector store:

  • Navigate to the Search UI
  • Click the "Ingest Data" button
  • Wait for the process to complete

2. Ask Questions

Once data is ingested, you can ask medical questions:

  • Type your question in the input field
  • Click "Send"
  • The system will retrieve relevant context and generate an answer
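The retrieve-then-generate step can be illustrated with a toy example. In the application itself, Spring AI and pgvector handle embedding and similarity search; the sketch below only shows the idea, using hand-written vectors in place of real embeddings. The function names and the prompt wording are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k document texts whose vectors are most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: list[str]) -> str:
    """Combine the retrieved passages and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

The prompt returned by `build_prompt` is what would be sent to the LLM, so the quality of the answer depends directly on which passages `top_k` selects.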

3. Run Benchmarks

To evaluate RAG performance:

  1. Install Python dependencies:

    cd research
    pip install -r requirements.txt
  2. Navigate to /benchmark in the frontend

  3. Click "Run Benchmark"

  4. View RAGAS metrics and visualizations

🔧 Development

Backend Development

cd backend
./mvnw spring-boot:run

Frontend Development

cd frontend
npm install
npm run dev

Python Evaluation

cd research
pip install -r requirements.txt
python evaluate.py

📈 RAGAS Metrics

The system evaluates RAG performance using:

  1. Faithfulness: Factual consistency with context (0-1)
  2. Answer Relevancy: How well answers address questions (0-1)
  3. Context Relevancy: Relevance of retrieved documents (0-1)
  4. Context Recall: Completeness of retrieved information (0-1)
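As an illustration of how such a 0-1 score arises, faithfulness is the fraction of claims in the generated answer that the retrieved context supports. RAGAS uses an LLM to extract and judge those claims; the sketch below assumes the per-claim verdicts are already available and also shows how per-sample scores might be averaged into a report. Both function names are hypothetical and are not part of the RAGAS API.

```python
def faithfulness(claim_supported: list[bool]) -> float:
    """Fraction of answer claims that are supported by the retrieved context."""
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)


def mean_scores(rows: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric across evaluation samples.

    Assumes every row contains the same metric names.
    """
    if not rows:
        raise ValueError("at least one sample is required")
    return {m: sum(r[m] for r in rows) / len(rows) for m in rows[0]}
```

For example, an answer with four claims of which three are supported scores 0.75 on faithfulness.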

🛠️ Technology Stack

Backend

  • Spring Boot 3.2.5
  • Spring AI 1.0.0-M1
  • PostgreSQL + pgvector
  • OpenAI API
  • Lombok

Frontend

  • React 18
  • TypeScript
  • Vite
  • Axios
  • Recharts
  • React Router

Research

  • RAGAS
  • LangChain
  • Pandas
  • Datasets

📝 API Endpoints

RAG Controller

  • POST /api/rag/query - Query the RAG system
  • POST /api/rag/ingest - Ingest data into vector store
  • GET /api/rag/health - Health check
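The query endpoint can be called from a script as well as from the UI. The request and response schemas are not documented in this README, so the sketch below assumes a JSON body with a `question` field and a JSON response; check the DTO classes in the backend for the actual shape. The base URL assumes the backend runs locally on port 8080.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # backend port from the Quick Start


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /api/rag/query without sending it."""
    # ASSUMPTION: the endpoint accepts JSON with a "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/rag/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> dict:
    """Send the query and decode the response (assumed to be a JSON object)."""
    with urllib.request.urlopen(build_query_request(question), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect or log the exact payload before it reaches the backend.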

Benchmark Controller

  • GET /api/benchmark/run - Run RAGAS evaluation
  • GET /api/benchmark/status - Check benchmark service status

🐳 Docker Commands

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop all services
docker-compose down

# Rebuild and start
docker-compose up --build

# Remove volumes (reset database)
docker-compose down -v

🔍 Troubleshooting

Backend Issues

  1. Can't connect to database: Ensure PostgreSQL container is running
  2. OpenAI API errors: Check your API key in environment variables
  3. Data ingestion fails: Verify medquad.json exists in the data directory

Frontend Issues

  1. Can't connect to backend: Check backend is running on port 8080
  2. Build errors: Delete node_modules and run npm install

Evaluation Issues

  1. Python dependencies: Install all requirements with pip install -r research/requirements.txt
  2. RAGAS errors: Ensure OpenAI API key is set and valid

📚 Dataset

The MedQuAD dataset contains medical Q&A pairs from authoritative sources:

  • National Institutes of Health (NIH)
  • Centers for Disease Control and Prevention (CDC)
  • Food and Drug Administration (FDA)

Download from: https://www.kaggle.com/datasets/gpreda/medquad

🀝 Contributing

This is a research project. Feel free to:

  • Add more evaluation metrics
  • Experiment with different embedding models
  • Improve the UI/UX
  • Add more medical datasets

📄 License

This project is for educational and research purposes.

🙏 Acknowledgments

  • Spring AI team for the excellent framework
  • RAGAS team for the evaluation framework
  • MedQuAD dataset creators

Built with ❤️ for healthcare AI research
