A minimal Retrieval-Augmented Generation (RAG) prototype for asking questions over a folder of PDFs. It indexes documents into Pinecone and uses LangChain (LCEL) + OpenAI to generate answers grounded in retrieved context.
- Loads PDFs from
./docs/ - Chunks text for retrieval
- Creates embeddings with OpenAI
- Stores/retrieves vectors with Pinecone
- Builds a context-grounded prompt and answers with GPT via LangChain
Ingest (Indexing)
- Load environment variables from
.env - Load PDFs from
./docs - Split text into overlapping chunks
- Embed chunks
- Upsert embeddings to Pinecone index (
sentry-index)
Query (Retrieval + Generation)
- Take a user question
- Retrieve top-matching chunks from Pinecone
- Insert chunks + question into a prompt template
- Generate an answer with the chat model
- Print the result
- Python 3.11+
- An OpenAI API key
- A Pinecone API key (and an existing Pinecone project)
python3 -m venv venv
source venv/bin/activate
pip install -U langchain langchain-community langchain-openai langchain-pinecone pypdf pinecone-client python-dotenvIn the project root (next to sentry_query.py), create a file named .env. It is gitignored; do not commit it.
Add exactly these two variables, each on its own line, with your real keys after the =:
OPENAI_API_KEY=your_openai_key_here
PINECONE_API_KEY=your_pinecone_key_here
load_dotenv() in sentry_query.py loads this file. LangChain’s OpenAI classes read OPENAI_API_KEY. Pinecone reads PINECONE_API_KEY.
Place one or more PDFs into:
./docs/
./venv/bin/python sentry_query.pyEdit the query = "..." line in sentry_query.py and try:
- "Summarize the main purpose of these documents."
- "List key requirements or controls and group them by category."
- "Extract an implementation checklist."
- "What access control or PII-related guidance is described?"
- This script currently re-embeds and re-upserts on every run (good for a demo, inefficient for production).
- Data is sent to cloud services (OpenAI + Pinecone). Avoid indexing confidential data unless you have permission and governance controls.
MIT (see LICENSE).