Parallel implementations of Retrieval-Augmented Generation (RAG) in Java (Spring AI), Java (LangChain4j), and Python (LangChain) for teaching purposes.
All demos follow the same flow: Load PDF → Chunk → Embed → Store → Retrieve → Generate
| | Java (Spring AI) | Java (LangChain4j) | Python |
|---|---|---|---|
| Framework | Spring AI 1.0.3 | LangChain4j 1.10.0 | LangChain |
| LLM | OpenAI (gpt-5-mini) | OpenAI (gpt-5-mini) | OpenAI (gpt-5-mini) |
| Embeddings | text-embedding-3-small | text-embedding-3-small | text-embedding-3-small |
| Vector Store | SimpleVectorStore (in-memory) | InMemoryEmbeddingStore | InMemoryVectorStore |
| PDF Parser | PagePdfDocumentReader | Apache Tika | PyPDFLoader |
| IDE | IntelliJ IDEA | IntelliJ IDEA | PyCharm / VS Code |
- OpenAI API Key: set as the OPENAI_API_KEY environment variable
- Java 21+ (for Java versions)
- Python 3.10+ (for Python version)
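
The prerequisites above can be checked before running the Python demo. The following is a minimal sketch, not part of the repository; the function name `check_prerequisites` is hypothetical, and it only inspects the environment variable and interpreter version named in the list.

```python
import os
import sys


def check_prerequisites(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `version` default to the real environment and interpreter,
    and exist as parameters only so the check is easy to exercise.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if version < (3, 10):
        problems.append("Python 3.10+ is required, found %d.%d" % version)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

An empty result means both checks passed; the Java 21+ requirement is not covered by this sketch.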
Java (Spring AI):

    cd springai
    export OPENAI_API_KEY=sk-...
    ./gradlew bootRun

Java (LangChain4j):

    cd langchain4j
    export OPENAI_API_KEY=sk-...
    ./gradlew run

Python:

    cd python
    python -m venv .venv
    source .venv/bin/activate # Windows: .venv\Scripts\activate
    pip install -r requirements.txt
    export OPENAI_API_KEY=sk-...
    python -m ragdemo.main

This main branch focuses on a minimal, in-memory setup for teaching core RAG ideas quickly.
For a persistence-backed, production-style version across all three implementations (Spring AI, LangChain4j, and Python), use the supabase-pgvector branch:

    git switch supabase-pgvector

That branch's README includes full setup and run instructions for Supabase/pgvector.
| Mode | Vector Storage | Setup Effort | Best For |
|---|---|---|---|
| Main branch | In-memory stores | Low | Learning the RAG pipeline mechanics |
| supabase-pgvector branch | PostgreSQL + pgvector (Supabase) | Medium | Persistence, realism, and deployment discussions |
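
The difference between the two modes can be illustrated with a small sketch. This is not the code from either branch: the class names are hypothetical, and SQLite (from the Python standard library) stands in for PostgreSQL + pgvector purely to show what persistence changes. An in-memory store starts empty on every run, while a file-backed store still holds its rows after the process that wrote them has closed it.

```python
import json
import sqlite3


class InMemoryStore:
    """Keeps (text, vector) rows in a list; contents vanish when the process exits."""

    def __init__(self):
        self.rows = []

    def add(self, text, vector):
        self.rows.append((text, list(vector)))

    def count(self):
        return len(self.rows)


class SqliteStore:
    """Persists rows to a file. SQLite is an assumption standing in for pgvector."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks (text TEXT, vector TEXT)"
        )
        self.conn.commit()

    def add(self, text, vector):
        # Vectors are stored as JSON text; a real vector database uses a native type.
        self.conn.execute(
            "INSERT INTO chunks VALUES (?, ?)", (text, json.dumps(list(vector)))
        )
        self.conn.commit()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]

    def close(self):
        self.conn.close()
```

With the in-memory variant, every start of the application has to re-embed the documents; with a persistent store, indexing can be done once and reused.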
ragdemo/
├── springai/                               # Spring AI implementation
│   ├── build.gradle.kts
│   └── src/main/java/edu/trincoll/ragdemo/
│       ├── RagDemoApplication.java         # Spring Boot entry point
│       ├── RagDemoRunner.java              # Interactive CLI
│       ├── config/RagConfig.java           # VectorStore + ChatClient beans
│       └── service/
│           ├── DocumentLoaderService.java  # PDF loading + chunking
│           └── RagService.java             # Q&A via ChatClient
│
├── langchain4j/                            # LangChain4j implementation (no Spring)
│   ├── build.gradle.kts
│   └── src/main/java/edu/trincoll/ragdemo/
│       ├── RagDemo.java                    # Plain Java CLI entry point
│       ├── RagService.java                 # RAG pipeline + InMemoryEmbeddingStore
│       └── DocumentLoader.java             # PDF loading via Apache Tika
│
└── python/                                 # Python/LangChain implementation
    └── src/ragdemo/
        ├── main.py                         # CLI entry point
        ├── document_loader.py              # PDF loading + chunking
        ├── vector_store.py                 # Embedding + storage
        └── rag_chain.py                    # LCEL chain (retriever → prompt → LLM)
┌─────────────────────────────────────────────────────────────────┐
│ INDEXING PHASE │
├─────────────────────────────────────────────────────────────────┤
│ PDF ──▶ Load ──▶ Chunk ──▶ Embed ──▶ Vector Store │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ QUERY PHASE │
├─────────────────────────────────────────────────────────────────┤
│ Question ──▶ Embed ──▶ Search ──▶ Retrieve ──▶ Augment ──▶ LLM │
└─────────────────────────────────────────────────────────────────┘
- Load — Extract text from PDF documents
- Chunk — Split into smaller pieces (~800 tokens) for better retrieval
- Embed — Convert text chunks to vector embeddings
- Store — Save embeddings in vector store for similarity search
- Retrieve — Find chunks most similar to the user's question
- Generate — Send question + retrieved context to LLM for answer
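
The steps above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions, not the code in this repository: word counts stand in for the OpenAI embedding model, a list stands in for the vector store, chunk sizes are in words rather than tokens, and the final LLM call is replaced by building the prompt that would be sent. All names (`chunk`, `embed`, `ToyVectorStore`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: a sparse word-count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (chunk, embedding) pairs and ranks them by similarity to a query."""

    def __init__(self):
        self.entries = []

    def add(self, chunks):
        for text in chunks:
            self.entries.append((text, embed(text)))

    def search(self, question, k=2):
        query = embed(question)
        ranked = sorted(
            self.entries, key=lambda entry: cosine(query, entry[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Augment the question with retrieved context, ready to send to an LLM."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add(chunk("Cats sleep most of the day. Dogs enjoy long walks in the park."))
    question = "What do dogs enjoy?"
    print(build_prompt(question, store.search(question, k=1)))
```

The three implementations replace each toy piece with a framework component from the comparison table: a real PDF parser, the text-embedding-3-small model, an in-memory vector store, and a chat model for the final answer.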
The demo includes the "Attention Is All You Need" paper. Try asking:
- What is the Transformer architecture?
- How does self-attention work?
- What are the key contributions of this paper?
- What is multi-head attention?
Drop additional PDF files into the documents/ folder:
- Java (Spring AI): springai/src/main/resources/documents/
- Java (LangChain4j): langchain4j/src/main/resources/documents/
- Python: python/documents/
The application will automatically load all PDFs on startup.
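
Loading every PDF in a folder at startup could look roughly like the sketch below. It is an illustration only: the function name `find_pdfs` is hypothetical, it looks only at files directly inside the folder, and the repository's actual loaders may discover files differently.

```python
from pathlib import Path


def find_pdfs(documents_dir):
    """Return all PDF paths directly inside documents_dir, sorted by path."""
    folder = Path(documents_dir)
    if not folder.is_dir():
        return []  # a missing folder simply means there is nothing to load
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    # Path taken from the Python entry in the list above.
    for pdf in find_pdfs("python/documents"):
        print("Would load:", pdf.name)
```

Each path found this way would then go through the Load and Chunk steps before being embedded.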
Java (Spring AI):

    cd springai
    ./gradlew test

Java (LangChain4j):

    cd langchain4j
    ./gradlew test

Python:

    cd python
    pip install -e ".[dev]"
    pytest

MIT License. See LICENSE for details.
Kenneth Kousen