A complete, full-stack Retrieval-Augmented Generation (RAG) application for healthcare semantic search using the MedQuAD dataset.
- Backend: Java Spring Boot 3.x with Spring AI
- Vector Database: PostgreSQL with pgvector extension
- Frontend: React + TypeScript with Vite
- Embeddings & LLM: OpenAI (text-embedding-ada-002 & GPT-4)
- Evaluation: Python RAGAS framework
java-rag-semantic-search/
├── backend/                      # Spring Boot application
│   ├── src/
│   │   └── main/
│   │       ├── java/com/vardhan/rag/
│   │       │   ├── RagApplication.java
│   │       │   ├── dto/
│   │       │   ├── service/
│   │       │   └── controller/
│   │       └── resources/
│   │           └── application.properties
│   ├── pom.xml
│   └── Dockerfile
├── frontend/                     # React TypeScript app
│   ├── src/
│   │   ├── components/
│   │   │   ├── SearchUI.tsx
│   │   │   └── BenchmarkDashboard.tsx
│   │   ├── App.tsx
│   │   └── main.tsx
│   ├── package.json
│   ├── Dockerfile
│   └── nginx.conf
├── data/                         # Dataset files
│   ├── data_prep.py
│   └── README.md
├── research/                     # Evaluation scripts
│   ├── evaluate.py
│   ├── eval_dataset.csv
│   ├── requirements.txt
│   └── README.md
├── docker-compose.yml
└── README.md
- Java 21+
- Node.js 20+
- Docker & Docker Compose
- Python 3.10+
- OpenAI API Key
cd "c:\Projects\Java-Native RAG System"

Create a .env file in the root directory:

OPENAI_API_KEY=your-openai-api-key-here

Download the MedQuAD dataset from Kaggle and place it in the data/ directory:

cd data
pip install pandas
python data_prep.py

docker-compose up --build

This will start:
- PostgreSQL with pgvector on port 5432
- Spring Boot backend on port 8080
- React frontend on port 3000
- Frontend: http://localhost:3000
- Backend API: http://localhost:8080/api
- Health Check: http://localhost:8080/api/rag/health
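Once the containers are up, the health endpoint listed above can be polled from a script. The following is a minimal sketch using only the Python standard library. It assumes the endpoint answers with a 2xx status when the backend is ready; the function names are illustrative and not part of this project.

```python
import urllib.error
import urllib.request

# Base URL taken from the list above; adjust if your ports differ.
BACKEND_URL = "http://localhost:8080/api"


def health_url(base_url: str = BACKEND_URL) -> str:
    """Build the health-check URL from the backend base URL."""
    return base_url.rstrip("/") + "/rag/health"


def backend_is_up(base_url: str = BACKEND_URL, timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers with a 2xx status.

    Assumption: any 2xx response means "healthy". The response body is
    not inspected because its format is not documented in this README.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP error.
        return False
```

Calling `backend_is_up()` in a loop with a short sleep is one way to wait for the backend before triggering ingestion.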
First, you need to ingest the MedQuAD data into the vector store:
- Navigate to the Search UI
- Click the "Ingest Data" button
- Wait for the process to complete
Once data is ingested, you can ask medical questions:
- Type your question in the input field
- Click "Send"
- The system will retrieve relevant context and generate an answer
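The retrieve-then-generate flow described above can be illustrated with a toy example. This sketch is not the project's implementation (the backend uses Spring AI with pgvector and OpenAI embeddings). It uses made-up three-dimensional vectors in place of real embeddings, and the names `top_k` and `build_prompt` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k document texts whose vectors are most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy "embeddings": hand-written placeholder vectors, not real model output.
DOCS = [
    ("Diabetes is a condition that affects blood sugar.", [0.9, 0.1, 0.0]),
    ("Asthma involves inflammation of the airways.", [0.0, 0.9, 0.1]),
    ("Insulin helps regulate blood glucose.", [0.8, 0.0, 0.2]),
]

hits = top_k([1.0, 0.0, 0.0], DOCS, k=2)
prompt = build_prompt("What is diabetes?", hits)
```

In the real system the vector store performs the similarity search, so only the ranking idea carries over from this sketch.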
To evaluate RAG performance:
- Install Python dependencies:
  cd research
  pip install -r requirements.txt
- Navigate to /benchmark in the frontend
- Click "Run Benchmark"
- View RAGAS metrics and visualizations
cd backend
./mvnw spring-boot:run

cd frontend
npm install
npm run dev

cd research
pip install -r requirements.txt
python evaluate.py

The system evaluates RAG performance using:
- Faithfulness: Factual consistency with context (0-1)
- Answer Relevancy: How well answers address questions (0-1)
- Context Relevancy: Relevance of retrieved documents (0-1)
- Context Recall: Completeness of retrieved information (0-1)
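To make the 0-1 scale concrete, the sketch below averages per-question scores into one number per metric. It is an illustration under assumptions: the row format and the metric keys are hypothetical and may not match what `evaluate.py` produces, and any scores passed in are the caller's own, not results from this project.

```python
from statistics import mean

# Hypothetical keys, named after the four metrics listed above.
METRICS = ("faithfulness", "answer_relevancy", "context_relevancy", "context_recall")


def summarize(rows: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over all rows, checking that scores lie in [0, 1]."""
    if not rows:
        raise ValueError("no rows to summarize")
    summary = {}
    for metric in METRICS:
        values = [row[metric] for row in rows]
        for value in values:
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{metric} score out of range: {value}")
        summary[metric] = round(mean(values), 3)
    return summary
```

Rejecting out-of-range values early makes it easier to spot a parsing problem before the numbers reach a dashboard.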
- Spring Boot 3.2.5
- Spring AI 1.0.0-M1
- PostgreSQL + pgvector
- OpenAI API
- Lombok
- React 18
- TypeScript
- Vite
- Axios
- Recharts
- React Router
- RAGAS
- LangChain
- Pandas
- Datasets
- POST /api/rag/query - Query the RAG system
- POST /api/rag/ingest - Ingest data into vector store
- GET /api/rag/health - Health check
- GET /api/benchmark/run - Run RAGAS evaluation
- GET /api/benchmark/status - Check benchmark service status
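The endpoints above can be called from any HTTP client. Below is a minimal sketch with the Python standard library. The request schema is not documented in this README, so the JSON field `question` and the empty ingest body are assumptions; check the classes under `dto/` for the actual shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api"


def query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /rag/query."""
    # ASSUMPTION: the body is JSON with a single "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/rag/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /rag/ingest."""
    # ASSUMPTION: ingestion needs no request body.
    return urllib.request.Request(
        base_url.rstrip("/") + "/rag/ingest", data=b"", method="POST"
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the stack running, `send(query_request("What are the symptoms of asthma?"))` would return the raw response text, provided the assumed body matches the backend.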
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop all services
docker-compose down
# Rebuild and start
docker-compose up --build
# Remove volumes (reset database)
docker-compose down -v

- Can't connect to database: Ensure PostgreSQL container is running
- OpenAI API errors: Check your API key in environment variables
- Data ingestion fails: Verify medquad.json exists in the data directory
- Can't connect to backend: Check backend is running on port 8080
- Build errors: Delete node_modules and run npm install
- Python dependencies: Install all requirements: pip install -r research/requirements.txt
- RAGAS errors: Ensure OpenAI API key is set and valid
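Two of the checks above (the API key and the dataset file) can be automated before starting the stack. The sketch below assumes a `.env` file of simple `KEY=value` lines in the project root and a `medquad.json` file under `data/`, as mentioned above. `read_env` and `preflight` are hypothetical helpers, not part of this project.

```python
from pathlib import Path

# Placeholder value shown in the setup instructions.
PLACEHOLDER = "your-openai-api-key-here"


def read_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(root: Path) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    key = read_env(root / ".env").get("OPENAI_API_KEY", "")
    if not key or key == PLACEHOLDER:
        problems.append("OPENAI_API_KEY is missing or still the placeholder in .env")
    if not (root / "data" / "medquad.json").is_file():
        problems.append("data/medquad.json not found")
    return problems
```

Running `preflight(Path("."))` from the project root prints nothing by itself; the caller decides how to report the returned list.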
The MedQuAD dataset contains medical Q&A pairs from authoritative sources:
- National Institutes of Health (NIH)
- Centers for Disease Control and Prevention (CDC)
- Food and Drug Administration (FDA)
Download from: https://www.kaggle.com/datasets/gpreda/medquad
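The preparation step turns the downloaded file into JSON for ingestion. The sketch below shows one way such a conversion could look. It assumes the download is a CSV with `question` and `answer` columns and that the output is named `medquad.json`; the real file name and columns may differ. The setup steps install pandas for `data_prep.py`, whereas this sketch uses only the standard library to stay self-contained.

```python
import csv
import json
from pathlib import Path


def csv_to_json(csv_path: Path, json_path: Path) -> int:
    """Convert Q&A rows to a JSON list, skipping rows with an empty field.

    Returns the number of records written.
    ASSUMPTION: the CSV has "question" and "answer" columns.
    """
    records = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            question = (row.get("question") or "").strip()
            answer = (row.get("answer") or "").strip()
            if question and answer:
                records.append({"question": question, "answer": answer})
    json_path.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(records)
```

Dropping rows with a missing question or answer keeps empty documents out of the vector store.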
This is a research project. Feel free to:
- Add more evaluation metrics
- Experiment with different embedding models
- Improve the UI/UX
- Add more medical datasets
This project is for educational and research purposes.
- Spring AI team for the excellent framework
- RAGAS team for the evaluation framework
- MedQuAD dataset creators
Built with ❤️ for healthcare AI research