# Notebook 1 – Secure RAG Pipeline: Setup, Ingestion & Indexing  
*Created 2025-05-08*

**Objective**  
1. Install core libraries (LangChain, Chroma, OpenAI)  
2. Load environment variables securely  
3. Ingest sample healthcare notes  
4. Build an **encrypted** Chroma vector store


## 1 . Install dependencies (runtime ≈ 1 min)

In [None]:
!pip -q install langchain chromadb openai tiktoken

## 2 . Configure environment

In [None]:
# 👉 Enter your OpenAI key at the prompt. It is kept in this session's
#    environment only and is not echoed or saved in the notebook output.
import getpass
import os

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')
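
If you re-run the cell above often, you may prefer not to be prompted when the key is already set. The helper below is a minimal sketch of that idea; `ensure_api_key` and its parameters are hypothetical names introduced here for illustration, not part of any library.

```python
import os
from getpass import getpass


def ensure_api_key(name='OPENAI_API_KEY', env=os.environ, prompt=getpass):
    """Return the key stored under `name`, prompting once if it is missing or blank."""
    key = env.get(name, '').strip()
    if not key:
        # Only ask when nothing usable is set; the input is not echoed.
        key = prompt(f'{name}: ').strip()
        if not key:
            raise ValueError(f'{name} must not be empty')
        env[name] = key
    return key
```

Calling `ensure_api_key()` in place of the direct assignment keeps the behaviour the same on first run and skips the prompt afterwards. The `env` and `prompt` arguments exist only so the helper can be exercised without a real terminal.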

## 3 . Load sample documents  
For brevity, we embed five de‑identified clinical notes included in this repo; the loader below expects them under `/content/sample_notes`. Feel free to replace them with your own data.

In [None]:
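from pathlib import Path

# Sort first so the same five files are picked on every run (glob order is not guaranteed)
note_paths = sorted(Path('/content/sample_notes').glob('*.txt'))[:5]
notes = [p.read_text(encoding='utf-8') for p in note_paths]
print(f'Loaded {len(notes)} docs')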
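
When you swap in your own data, it can help to keep track of which file each note came from. The sketch below wraps the loading step in a small function that also returns one metadata dict per note; `load_notes` and the `source` key are illustrative names chosen here, not something the repo defines.

```python
from pathlib import Path


def load_notes(directory, limit=5):
    """Read up to `limit` .txt files from `directory` in sorted order.

    Returns (texts, metadatas), where each metadata dict records the
    file name the corresponding text was read from.
    """
    paths = sorted(Path(directory).glob('*.txt'))[:limit]
    texts = [p.read_text(encoding='utf-8') for p in paths]
    metadatas = [{'source': p.name} for p in paths]
    return texts, metadatas
```

The second return value can be passed to `Chroma.from_texts` through its `metadatas` argument, so that retrieved chunks can later be traced back to their source file.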

## 4 . Create encrypted Chroma vector store

In [None]:
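# These import paths match the classic `langchain` layout; newer releases
# expose the same classes from `langchain_community` / `langchain_openai`.
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

persist = '/content/secure_chroma'
store = Chroma.from_texts(notes,
                          embedding=OpenAIEmbeddings(),
                          persist_directory=persist,
                          collection_metadata={
                              # NOTE: this is stored as a plain metadata label. To our
                              # knowledge Chroma does not encrypt its files in response
                              # to it, so at-rest encryption must come from the storage
                              # layer (e.g. an encrypted volume or an encrypted archive).
                              'encrypt': True,
                          })
store.persist()  # needed on chromadb < 0.4; later versions persist automatically
print('Vector DB persisted (encrypt flag stored as metadata only) ✅')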
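
If you need encryption at rest that does not depend on the vector store itself, one option is to encrypt the persisted directory as a whole once you are done writing to it. The sketch below is one way to do that under stated assumptions: it relies on the third-party `cryptography` package (not installed by the cell in section 1), the function names `encrypt_directory` and `decrypt_directory` are hypothetical, and the whole archive is held in memory, so it only suits small stores like this demo.

```python
import io
import os
import zipfile
from pathlib import Path

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # bytes; the nonce length recommended for AES-GCM


def encrypt_directory(src_dir, out_file, key):
    """Zip `src_dir` in memory and write nonce + AES-GCM ciphertext to `out_file`."""
    src_dir = Path(src_dir)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob('*')):
            if path.is_file():  # empty directories are not preserved
                zf.write(path, path.relative_to(src_dir).as_posix())
    nonce = os.urandom(NONCE_SIZE)  # must be fresh for every encryption
    ciphertext = AESGCM(key).encrypt(nonce, buf.getvalue(), None)
    Path(out_file).write_bytes(nonce + ciphertext)


def decrypt_directory(in_file, dest_dir, key):
    """Reverse `encrypt_directory`; raises InvalidTag on a wrong key or tampering."""
    blob = Path(in_file).read_bytes()
    nonce, ciphertext = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
    with zipfile.ZipFile(io.BytesIO(plaintext)) as zf:
        zf.extractall(dest_dir)
```

A possible flow is to generate a key with `AESGCM.generate_key(bit_length=256)`, call `encrypt_directory(persist, '/content/secure_chroma.enc', key)`, and then delete the plaintext directory (for example with `shutil.rmtree`). The store has to be decrypted again before Chroma can open it, and the key should live outside the notebook; how you manage it is beyond this sketch.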

### Next → Notebook 2 to add policy‑aware retrieval & generation ➡️