
# RAG Demo: Embeddings + FAISS retrieval (small, runnable notebook)

**Purpose:** Demonstrates a minimal Retrieval-Augmented Generation (RAG) flow:
1. Create embeddings for a small set of documents using `sentence-transformers`.
2. Index embeddings with `faiss`.
3. Perform a simple retrieval for a user query.
4. (Optional) Show how to plug retrieved context into a prompt for an LLM.

**How to run:**  
- This notebook can be run locally or in Google Colab.  
- If running locally, create a virtualenv and install the required packages:
```
pip install -U sentence-transformers faiss-cpu numpy
```
- If you want to use an LLM provider (e.g., OpenAI), you'll need to add your API key and follow their usage rules. This demo only shows retrieval + prompt composition and uses a placeholder for model calls.

---

Below are the code cells in order. Execute them sequentially.


In [None]:

# Sample documents (small)
documents = [
    "The Port of Hamburg is one of the largest ports in Europe and handles millions of TEU annually.",
    "Container throughput can be forecasted using time-series models and operational features like arrival rates and berth availability.",
    "Predictive maintenance for cranes reduces downtime by combining sensor data with failure logs and scheduled inspections.",
    "RAG systems use dense vector retrieval (embeddings) to fetch supporting documents, then condition an LLM on the retrieved context."
]

len(documents)



**Install required packages** (skip this if you already ran the `pip install` command above). Inside a notebook cell, including Colab, prefix the command with `!`:
```python
!pip install -U sentence-transformers faiss-cpu numpy
```


In [None]:

# Create embeddings using sentence-transformers
# Requires the sentence-transformers package (see the install step above).
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')  # small, fast model
embeddings = model.encode(documents, convert_to_numpy=True)
print('Embeddings shape:', embeddings.shape)


In [None]:

# Build a FAISS index (CPU)
import faiss

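d = embeddings.shape[1]       # embedding dimensionality
index = faiss.IndexFlatL2(d)  # exact (exhaustive) search on squared L2 distance
index.add(embeddings)         # FAISS expects a float32 array of shape (n, d)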
print('Indexed vectors:', index.ntotal)
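
For intuition, `IndexFlatL2` performs an exhaustive search and reports squared Euclidean distances. The sketch below reproduces that ranking in plain Python on toy 2-D vectors. `brute_force_l2` and `toy_vectors` are hypothetical names introduced for illustration and are not part of FAISS.

```python
# Hypothetical helper (not a FAISS API): exhaustive squared-L2 search,
# i.e. the ranking a flat L2 index is expected to return.
def brute_force_l2(vectors, query, k=2):
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query.
        sq_dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((sq_dist, i))
    scored.sort()          # smallest distance first
    return scored[:k]      # list of (squared_distance, vector_index)

# Toy data standing in for real embeddings.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
neighbours = brute_force_l2(toy_vectors, [0.9, 0.1], k=2)
print(neighbours)
```

This is only meant to show what the index computes; for real workloads the FAISS index above is far faster than a Python loop.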


In [None]:

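# Simple retrieval function
def retrieve(query, k=2):
    """Return up to k nearest documents; 'score' is a squared L2 distance (lower = closer)."""
    q_emb = model.encode([query], convert_to_numpy=True)
    distances, indices = index.search(q_emb, k)
    results = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx == -1:  # FAISS pads with -1 when k exceeds the number of indexed vectors
            continue
        results.append({'doc': documents[idx], 'score': float(dist)})
    return results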

# Example query
query = 'How can we predict container throughput?'
results = retrieve(query, k=2)
results
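
The `score` returned by `retrieve` comes from `IndexFlatL2`, so it is a squared L2 distance and lower values mean closer matches. If the embeddings are scaled to unit length (an assumption; this notebook does not normalise them), the distance maps to cosine similarity via `cos = 1 - d² / 2`. The sketch below checks that identity on toy vectors; all function names in it are hypothetical helpers.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_from_squared_l2(sq_dist):
    """Convert a squared L2 distance to cosine similarity.

    Assumption: both vectors have unit length; otherwise the result is wrong.
    """
    return 1.0 - sq_dist / 2.0

# Toy vectors standing in for real embeddings.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cos_direct = sum(x * y for x, y in zip(a, b))          # dot product of unit vectors
cos_from_dist = cosine_from_squared_l2(squared_l2(a, b))
print(round(cos_direct, 6), round(cos_from_dist, 6))
```

With `sentence-transformers`, passing `normalize_embeddings=True` to `model.encode` is one way to obtain unit-length vectors for both documents and queries.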



## Prompt composition (example)
Compose a prompt for an LLM by combining the retrieved context with the user question.

**Example template (to be sent to an LLM):**
```
You are an expert on port logistics. Use the following context to answer the question.

Context:
[retrieved_doc_1]
[retrieved_doc_2]
...

Question:
[User question]

Answer concisely and include which assumptions you make.
```

Below is an example of how to create that prompt in code (no API call shown).


In [None]:

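# Create the prompt from retrieved docs
# Retrieval uses an English paraphrase of the question; the user's original
# German wording goes into the prompt so the LLM can answer in German.
retrieved = retrieve('Which methods can forecast arrivals at a port?', k=2)
context = '\n\n'.join(r['doc'] for r in retrieved)
# English: "Which methods are suitable for predicting arrival times in the port?"
user_question = 'Welche Methoden eignen sich zur Vorhersage von Ankunftszeiten im Hafen?'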

prompt = f"""You are an expert on port logistics. Use the following context to answer the question.

Context:
{context}

Question:
{user_question}

Answer concisely in German and mention key features you would use for forecasting.
"""

print(prompt)
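
If you build prompts in more than one place, the template can be wrapped in a small function. The following is a minimal sketch, not part of the original notebook: `build_prompt` and its `answer_language` parameter are hypothetical names, and numbering the passages is an added convention so an answer can point back to its sources.

```python
def build_prompt(question, docs, answer_language="German"):
    """Compose an LLM prompt from a question and retrieved passages.

    Hypothetical helper: passages are numbered [1], [2], ... in the order given.
    """
    context = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        "You are an expert on port logistics. "
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n\n"
        f"Answer concisely in {answer_language}."
    )

# Placeholder passages standing in for retrieve() output.
example_prompt = build_prompt(
    "How can we predict container throughput?",
    ["Doc about forecasting.", "Doc about cranes."],
    answer_language="English",
)
print(example_prompt)
```

In the notebook this could be called as `build_prompt(user_question, [r['doc'] for r in retrieved])`.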



## Next steps / notes
- To run a full RAG pipeline in production, consider:
  - Storing embeddings in a managed vector DB (Pinecone, Milvus, Weaviate) or a scalable FAISS setup.
  - Using a model registry (e.g., MLflow) for your embedding models and LLM prompt templates.
  - Implementing security, access control, and data governance (e.g., GDPR compliance) for documents used in retrieval.
- This notebook is intentionally minimal so you can quickly run and adapt it. Add real documents, increase `k`, or plug in an LLM provider for generation.
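
One way to keep the generation step provider-agnostic is to pass the retriever and the LLM call in as plain functions. The sketch below makes that assumption: `answer_with_rag`, `fake_retrieve`, and `echo_llm` are hypothetical names, and `echo_llm` is a stub that performs no real API call.

```python
def answer_with_rag(question, retrieve_fn, generate_fn, k=2):
    """Retrieve context, compose a prompt, and delegate generation.

    retrieve_fn(question, k) must return a list of dicts with a 'doc' key,
    matching the shape returned by the notebook's retrieve().
    generate_fn(prompt) must return the model's answer as a string.
    """
    hits = retrieve_fn(question, k)
    context = "\n\n".join(hit["doc"] for hit in hits)
    prompt = f"Context:\n{context}\n\nQuestion:\n{question}\n"
    return generate_fn(prompt)

def fake_retrieve(query, k):
    # Stand-in retriever returning fixed passages (no embeddings involved).
    corpus = ["Doc A", "Doc B", "Doc C"]
    return [{"doc": d, "score": 0.0} for d in corpus[:k]]

def echo_llm(prompt):
    # Stub for a provider call; replace with a real client of your choice.
    return f"(stub answer based on {len(prompt)} prompt characters)"

reply = answer_with_rag("What is RAG?", fake_retrieve, echo_llm, k=2)
print(reply)
```

With the notebook's own function, the call would look like `answer_with_rag(user_question, retrieve, my_llm_call)`, where `my_llm_call` is whatever wrapper you write around your provider.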
