# Part 1: Simple RAG

In this notebook, we'll build a basic RAG (retrieval-augmented generation) system and see where it falls short.

**What is RAG?**
- **R**etrieval — find relevant documents
- **A**ugmented — add them to the AI's context
- **G**eneration — generate an answer
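
The three steps above can be sketched in plain Python before any library is involved. This is a toy illustration, not the pipeline built later in this notebook: `retrieve` ranks by shared words instead of embeddings, `generate` is a placeholder for a real model call, and the three-sentence corpus is made up.

```python
def retrieve(question, corpus, k=1):
    """Rank documents by shared words with the question (a toy stand-in for vector search)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def augment(question, docs):
    """Put the retrieved documents into the prompt as context."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}"


def generate(prompt_text):
    """Placeholder for the LLM call; a real system would send the prompt to a model."""
    return f"[model answer based on a {len(prompt_text)}-character prompt]"


# Hypothetical mini-corpus, for illustration only
corpus = [
    "Tokens expire after 24 hours",
    "Payments start with an invoice",
    "Webhooks notify your server",
]
toy_question = "When do tokens expire"
toy_docs = retrieve(toy_question, corpus)
toy_prompt = augment(toy_question, toy_docs)
print(generate(toy_prompt))
```

The rest of the notebook replaces each toy function with a real component: a vector store for retrieval, a prompt template for augmentation, and a chat model for generation.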

By the end, you'll understand why simple RAG scores only about 0.30 accuracy in the benchmark shown at the end of this notebook.

## Step 0: Setup

Run this cell first to clone the repo and install dependencies.

In [1]:
!git clone https://github.com/i33ym/workshop.git 2>/dev/null || echo "Already cloned"
%cd workshop

/content/workshop


In [6]:
!pip install -q openai langchain langchain-openai langchain-community langchain-text-splitters chromadb

## Step 1: Set Your API Key

Get your OpenAI API key from [platform.openai.com](https://platform.openai.com/api-keys)

In [7]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

Enter your OpenAI API key: ··········


## Step 2: Load the Documents

We'll load markdown files from the `docs/` folder.

In [8]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader

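loader = DirectoryLoader(
    "docs/",
    glob="**/*.md",
    loader_cls=TextLoader,
    # Read files as UTF-8 explicitly so non-ASCII docs load on any platform
    loader_kwargs={"encoding": "utf-8"},
)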

documents = loader.load()
print(f"Loaded {len(documents)} documents")

Loaded 58 documents


## Step 3: Split Into Chunks

Whole documents are often too long to retrieve precisely or fit into a prompt, so we split them into smaller chunks.

**Why chunking matters:**
- LLMs have context limits
- Smaller chunks = more precise retrieval
- But too small = losing context
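
To make the size and overlap trade-off concrete, here is a minimal sketch of a fixed-size character chunker. It is a simplification: `RecursiveCharacterTextSplitter` tries to split on separators such as paragraphs and lines first, while this hypothetical `chunk_text` helper simply slides a window over the text.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


sample_text = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample_text, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each piece repeats the last 4 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.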

In [9]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")

Split into 679 chunks


In [10]:
# Let's look at one chunk
print("=== Sample Chunk ===")
print(chunks[0].page_content[:500])
print("\n=== Metadata ===")
print(chunks[0].metadata)

=== Sample Chunk ===
# storeModel

## OpenAPI Specification

=== Metadata ===
{'source': 'docs/52.md'}


## Step 4: Create Embeddings

**What are embeddings?**

Embeddings convert text into numbers (vectors) that capture meaning.

Similar texts have similar vectors. This lets us find relevant documents by comparing vectors.
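
One common way to compare two vectors is cosine similarity. The sketch below uses made-up 3-dimensional vectors in place of real embeddings, so the numbers are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical "embeddings" for three short texts
vec_login = [0.9, 0.1, 0.0]
vec_sign_in = [0.8, 0.2, 0.1]
vec_refund = [0.0, 0.2, 0.9]

sim_close = cosine_similarity(vec_login, vec_sign_in)
sim_far = cosine_similarity(vec_login, vec_refund)
print(f"login vs sign-in: {sim_close:.2f}")
print(f"login vs refund:  {sim_far:.2f}")
```

The first pair points in nearly the same direction and scores much higher than the second, which is the property retrieval relies on.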

In [11]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Test it
test_embedding = embeddings.embed_query("How do I authenticate?")
print(f"Embedding dimension: {len(test_embedding)}")
print(f"First 5 values: {test_embedding[:5]}")

Embedding dimension: 1536
First 5 values: [-0.023669837042689323, 0.011893004179000854, -0.011457362212240696, 0.013490354642271996, 0.015828296542167664]


## Step 5: Create Vector Store

A vector store holds all our chunk embeddings and lets us search by similarity.

We'll use ChromaDB (runs in memory, no setup needed).
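
Conceptually, a vector store does two things: it keeps (text, vector) pairs and returns the stored vectors closest to a query. The hypothetical `TinyVectorStore` below is a brute-force sketch of that idea with made-up 2-dimensional vectors; it does not reflect how ChromaDB is implemented, which uses indexes to avoid scanning every item.

```python
import math


class TinyVectorStore:
    """Toy in-memory store: linear scan, Euclidean distance."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query_vector, k=3):
        """Return the k stored texts closest to the query, nearest first."""
        scored = [
            (text, math.dist(query_vector, vector))
            for text, vector in self._items
        ]
        scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
        return scored[:k]


store = TinyVectorStore()
store.add("How to get a token", [0.9, 0.1])
store.add("Creating a payment", [0.1, 0.9])
store.add("Token expiry rules", [0.8, 0.3])

hits = store.search([1.0, 0.0], k=2)
for text, distance in hits:
    print(f"{distance:.3f} | {text}")
```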

In [12]:
from langchain_community.vectorstores import Chroma

vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings
)

print(f"Vector store created with {len(chunks)} chunks")

Vector store created with 679 chunks


## Step 6: Test Retrieval

Let's search for relevant documents.

In [13]:
query = "How do I get an authorization token?"

results = vector_store.similarity_search(query, k=3)

print(f"Query: {query}\n")
for i, doc in enumerate(results):
    print(f"=== Result {i+1} ===")
    print(doc.page_content[:300])
    print()

Query: How do I get an authorization token?

=== Result 1 ===
# Авторизация

> Для отправки авторизированных запросов необходимо получить токен и отправлять его в последующих запросах в заголовке:

`Authorization: Bearer {token}`

В поле expiry указана дата устаревания токена (24 часа с момента выпуска токена). После этого срока запросы к API будут возвращать 

=== Result 2 ===
example:
              application_id: rhmt_test
              secret: Pw18axeBFo8V7NamKHXX
      responses:
        '200':
          description: ''
          content:
            application/json:
              schema:
                type: object
                properties:
                  toke

=== Result 3 ===
запросить заново. Временная зона GMT+5
                required:
                  - token
                  - role
                  - expiry
                x-apidog-orders:
                  - token
                  - role
                  - expiry
              example:
                token: 

## Step 7: Build Simple RAG

Now let's combine retrieval with generation.

In [14]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context.
If you can't find the answer, say "I don't know."

Context:
{context}

Question: {question}

Answer:
""")

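def simple_rag(question, k=3):
    """Retrieve the top-k chunks for a question and generate an answer from them."""
    # Retrieve: the k chunks closest to the question in embedding space
    docs = vector_store.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Generate: fill the prompt, call the model, return the answer as plain text
    chain = prompt | llm | StrOutputParser()
    answer = chain.invoke({"context": context, "question": question})

    return answer, docs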

In [15]:
# Test it!
question = "How do I get an authorization token?"

answer, docs = simple_rag(question)

print(f"Question: {question}\n")
print(f"Answer: {answer}")

Question: How do I get an authorization token?

Answer: To get an authorization token, you need to send a request with the required fields, including your application ID and secret. Upon a successful request, you will receive a JWT token, which you must include in the header of subsequent requests as follows:

`Authorization: Bearer {token}`

Make sure to check the expiry field in the response, as the token will expire 24 hours after issuance. After that, you will need to request a new token.


## Step 8: Test More Questions

Let's see how well it performs on different types of questions.

In [16]:
test_questions = [
    "How do I create a payment?",
    "What error codes can the API return?",
    "How do I set up webhooks?",
    "What is the endpoint for checking payment status?",
    "How do I authenticate API requests?"
]

for q in test_questions:
    answer, _ = simple_rag(q)
    print(f"Q: {q}")
    print(f"A: {answer[:200]}...\n")

Q: How do I create a payment?
A: To create a payment, follow these steps:

1. The Partner's system creates an invoice and receives a link to the payment page in response.
2. The user makes the payment by adding a card or using paymen...

Q: What error codes can the API return?
A: The API can return the error code 400....

Q: How do I set up webhooks?
A: I don't know....

Q: What is the endpoint for checking payment status?
A: I don't know....

Q: How do I authenticate API requests?
A: To authenticate API requests, you need to obtain a token and send it in the header of subsequent requests as follows:

`Authorization: Bearer {token}`

The token has an expiry date, which is 24 hours ...



## Problems with Simple RAG

You probably noticed some issues:

### 1. Retrieval misses exact terms
Vector search is semantic — it finds similar *meanings*, not exact *words*.

If you search for an exact endpoint such as `POST /api/payment/create`, you might get docs about "creating payments" in general instead of the chunk that defines that endpoint.
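
For contrast, here is a minimal sketch of exact-term matching, the kind of search that vector similarity does not perform. The `keyword_search` helper and the sample chunks are hypothetical, and the endpoint path is only an example string, not a confirmed route in these docs.

```python
def keyword_search(query, texts):
    """Return the texts that contain every whitespace-separated query term verbatim."""
    terms = query.lower().split()
    return [t for t in texts if all(term in t.lower() for term in terms)]


# Made-up chunks, for illustration only
sample_chunks = [
    "Creating a payment: the partner creates an invoice first.",
    "paths:\n  /api/payment/create:\n    post:\n      summary: Create payment",
    "Authorization: send the token in the Authorization header.",
]

matches = keyword_search("POST /api/payment/create", sample_chunks)
print(f"{len(matches)} chunk(s) contain the exact terms")
```

Only the chunk that literally contains both `post` and the path is returned, even though the first chunk is also "about" creating payments.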

In [17]:
# Try an exact endpoint search
results = vector_store.similarity_search("POST /api/payment/create", k=3)

print("Searching for exact endpoint 'POST /api/payment/create':\n")
for i, doc in enumerate(results):
    print(f"Result {i+1}: {doc.page_content[:150]}...\n")

Searching for exact endpoint 'POST /api/payment/create':

Result 1: # Создание платежа на выплату c передачей номера карты

## OpenAPI Specification...

Result 2: # Создание платежа по токену карты

## OpenAPI Specification...

Result 3: # Создание платежа с передачей карточных данных

## OpenAPI Specification

```yaml
openapi: 3.0.1
info:
  title: ''
  description: ''
  version: 1.0.0...



### 2. No relevance verification
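Even when the retrieved documents aren't relevant, we still pass them to the model and ask for an answer. Nothing checks relevance before generation; the only safeguard is the prompt's "I don't know" instruction.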
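
A simple form of relevance verification is a distance cutoff applied before generation. The sketch below assumes results arrive as (text, distance) pairs where lower means more similar; the `build_context` helper and the cutoff of 1.0 are hypothetical, and a real cutoff would need tuning on your own data.

```python
def build_context(candidates, max_distance):
    """Keep only chunks within the distance cutoff; return None if nothing qualifies."""
    relevant = [text for text, distance in candidates if distance <= max_distance]
    if not relevant:
        return None  # caller should answer "I don't know." without calling the model
    return "\n\n".join(relevant)


# Made-up retrieval results, for illustration only
candidates = [("Token docs", 0.4), ("Unrelated schema", 1.3)]

context_ok = build_context(candidates, max_distance=1.0)
context_none = build_context([("Unrelated schema", 1.3)], max_distance=1.0)
print(context_ok)
print(context_none)
```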

In [20]:
# Ask about something NOT in the docs
answer, docs = simple_rag("How do I integrate with Stripe?")

print("Question about something NOT in docs:\n")
print(f"Answer: {answer}")
print("\n(Notice: it might hallucinate or give wrong info)")

Question about something NOT in docs:

Answer: I don't know.

(Notice: it might hallucinate or give wrong info)


### 3. All retrieved docs are used equally
Some retrieved documents are more relevant than others, but we treat them all the same: every retrieved chunk goes into the context regardless of its score.
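
One way to stop treating results equally is to fill the context in rank order and stop at a size budget, so weaker matches are the first to be dropped. This is a sketch under assumptions: results are (text, distance) pairs with lower meaning more similar, and `select_by_rank` and the character budget are hypothetical.

```python
def select_by_rank(scored, max_chars):
    """Take chunks from most to least similar until the character budget is used up."""
    ordered = sorted(scored, key=lambda pair: pair[1])  # smallest distance first
    selected, used = [], 0
    for text, _distance in ordered:
        if used + len(text) > max_chars:
            break  # the next-best chunk no longer fits
        selected.append(text)
        used += len(text)
    return selected


# Made-up results, for illustration only
scored_results = [("bbbbb", 1.30), ("aaaa", 1.20), ("cccccc", 1.31)]
selected = select_by_rank(scored_results, max_chars=10)
print(selected)
```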

In [21]:
# Look at similarity scores
results_with_scores = vector_store.similarity_search_with_score("How do I authenticate?", k=5)

print("Similarity scores (lower = more similar):\n")
for doc, score in results_with_scores:
    print(f"Score: {score:.3f} | {doc.page_content[:60]}...")

Similarity scores (lower = more similar):

Score: 1.217 | - is_commitent
        - active
      required:
        - id...
Score: 1.260 | otp:
                  type: string
                  descri...
Score: 1.297 | # Авторизация

> Для отправки авторизированных запросов необ...
Score: 1.300 | headers: {}
          x-apidog-name: Проверка привязанной ка...
Score: 1.306 | x-run-in-apidog: https://app.apidog.com/web/project/1022226/...


## Benchmark Results

A benchmark comparing 18 RAG techniques reported the following accuracy scores (six of the techniques are shown here):

| Technique | Accuracy |
|-----------|----------|
| **Simple RAG** | **0.30** |
| Semantic Chunking | 0.20 |
| HyDE | 0.50 |
| Reranker | 0.70 |
| Hybrid Search | 0.83 |
| Adaptive RAG | 0.86 |

Simple RAG only gets 30% right. We can do much better.
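
An accuracy score like the ones in the table is the fraction of test questions answered correctly. The sketch below shows the arithmetic with a crude keyword check; this is not how the benchmark above was scored, and the answers and expected keywords are made up.

```python
def accuracy(answers, expected_keywords):
    """Fraction of answers that contain their expected keyword (case-insensitive)."""
    if not answers:
        return 0.0
    correct = sum(
        1
        for answer, keyword in zip(answers, expected_keywords)
        if keyword.lower() in answer.lower()
    )
    return correct / len(answers)


# Hypothetical answers and the keyword each correct answer should mention
sample_answers = [
    "Send the header Authorization: Bearer {token}",
    "I don't know.",
    "I don't know.",
    "Create an invoice first, then redirect the user.",
]
sample_expected = ["Bearer", "webhook", "status", "invoice"]

score = accuracy(sample_answers, sample_expected)
print(f"Accuracy: {score:.2f}")
```

A keyword check will miss correct answers phrased differently and accept wrong ones that happen to contain the word, so real evaluations usually rely on human or model-based grading.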

## Summary

**What we built:**
- Loaded documents
- Split into chunks
- Created embeddings
- Built a vector store
- Combined retrieval + generation

**Why it's not enough:**
- Vector search misses exact matches
- No relevance verification
- No reranking of results
- No hallucination prevention

**Next notebook:** We'll fix all of these problems and build a production-ready system.

In [22]:
print("Part 1 complete!")
print("Next: 02_production_rag.ipynb")

Part 1 complete!
Next: 02_production_rag.ipynb
