# Toy Experiment: Build a RAG Application

This notebook is inspired by LangChain's [RAG tutorial](https://python.langchain.com/docs/get_started/quickstart). Key differences:
- Instead of LangChain's `WebBaseLoader`, we use trafilatura to download the page and extract its text.
- Instead of LangChain's text splitters, we use semantic-text-splitter.
- Instead of LangGraph, we implement the same application logic (in the Retrieval and Generation part) by invoking the individual components directly.

The reason for these changes is to reduce the dependency on LangChain and to better understand how the process works.

Note: This notebook covers retrieval from a single web page. The next notebook extends it to multiple web pages and offline files.

#### Installation

In [None]:
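# LangChain pieces used below: init_chat_model, OpenAIEmbeddings, InMemoryVectorStore, PromptTemplate.
# langgraph, langchain-community and langchain-text-splitters are not imported anywhere in this notebook,
# and the openai package is installed as a dependency of langchain-openai.
%pip install -qU "langchain[openai]" langchain-openai langchain-core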

In [None]:
%pip install trafilatura
%pip install semantic-text-splitter

### Indexing

#### Loading

In [1]:
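import trafilatura
from semantic_text_splitter import TextSplitter

url = "https://www.amsterdamsebos.nl/english/eat/"
# fetch_url returns the raw HTML, or None if the download fails
downloaded = trafilatura.fetch_url(url)
# extract returns the main text of the page without navigation and boilerplate
text = trafilatura.extract(downloaded)

print(f"Total characters: {len(text)}")
print(text[:500])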

Total characters: 4004
Het Bosch restaurant
Lounge in style while taking in the spectacular view of the Nieuwe Meer lake. French-Mediterranean cuisine offering lunch and dinner options. Regularly features live music. Open from Monday to Saturday. Lunch is served from 12.00 am to 3:00 pm. Dinner from 6.00 pm to 10.00 pm (on Saturday from 7.00 pm to 10.00 pm). Price indication: approx. € 50 for a four course meal. For reservations call +31 (0) 20 644 58 00.
Restaurant De Bosbaan
Escape the city and enjoy the view of the
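
Both `trafilatura.fetch_url` and `trafilatura.extract` return `None` when they fail, in which case `len(text)` in the cell above would raise a `TypeError`. A minimal sketch of a guard is shown here. `load_text` is a hypothetical helper (not part of trafilatura), and the fetch and extract functions are passed in as arguments so that the example runs without network access.

```python
from typing import Callable, Optional


def load_text(
    url: str,
    fetch: Callable[[str], Optional[str]],
    extract: Callable[[str], Optional[str]],
) -> str:
    """Download a page and return its main text, failing loudly on None."""
    downloaded = fetch(url)
    if downloaded is None:
        raise RuntimeError(f"Could not download {url}")
    text = extract(downloaded)
    if not text:
        raise RuntimeError(f"No text could be extracted from {url}")
    return text


# Stand-ins for trafilatura.fetch_url / trafilatura.extract (illustration only)
def fake_fetch(url: str) -> Optional[str]:
    return "<html><body><p>Pancakes!</p></body></html>" if "ok" in url else None


def fake_extract(html: str) -> Optional[str]:
    return "Pancakes!"


demo_text = load_text("https://example.com/ok", fake_fetch, fake_extract)
print(demo_text)
```

In the notebook itself, the call would presumably look like `text = load_text(url, trafilatura.fetch_url, trafilatura.extract)`.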


#### Splitting

In [2]:
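# Each chunk holds at most max_characters characters
max_characters = 1000
# trim=False keeps the leading and trailing whitespace of each chunk
splitter = TextSplitter(max_characters, trim=False)
chunks = splitter.chunks(text)

print(f"Split web page into {len(chunks)} sub-documents.")

Split web page into 5 sub-documents.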
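
To get a feeling for what a character-capacity splitter does, here is a deliberately simplified stand-in written with the standard library only. It is not the algorithm used by semantic-text-splitter, which picks its split points at semantic boundaries; this sketch just packs whole lines greedily and hard-splits any line that is longer than the capacity. The function name `greedy_chunks` and the sample text are made up for illustration.

```python
def greedy_chunks(text: str, max_characters: int) -> list[str]:
    """Pack whole lines into chunks of at most max_characters characters."""
    chunks: list[str] = []
    current = ""
    # keepends=True preserves line breaks, so "".join(chunks) == text
    for piece in text.splitlines(keepends=True):
        # A single line longer than the capacity has to be cut mid-line
        while len(piece) > max_characters:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_characters])
            piece = piece[max_characters:]
        if len(current) + len(piece) > max_characters:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First heading\n"
    "A short description that fits on one line.\n"
    "Second heading\n"
    "A much longer description that does not fit into a single small chunk at all.\n"
)
demo_chunks = greedy_chunks(sample, max_characters=50)
print([len(chunk) for chunk in demo_chunks])
```

Printing the chunk lengths, as in the last line, is also a quick sanity check worth running on the real `chunks` list.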


#### Storing

In [19]:
import getpass
import os

from langchain_openai import OpenAIEmbeddings

# Prompt for the API key only if it is not already set in the environment
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [None]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)
# add_texts embeds every chunk and returns one ID per stored chunk
document_ids = vector_store.add_texts(chunks)

In [23]:
print(f"Added {len(document_ids)} documents to the vector store.")
print(document_ids)

Added 5 documents to the vector store.
['5be065d4-17a6-4594-be7b-e2c89ba26dda', '4af8e730-817e-4da3-a30d-1a6be592464c', '8243f805-3753-4fdb-bcc2-b683db00aad2', 'd132baa3-6097-4185-aba3-aa7f02569559', '3157ab94-d971-4cbc-b75b-2acd3202a6ec']
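
Conceptually, storing means: embed each chunk, generate an ID for it, and keep the ID, vector, and text together. The sketch below mimics that with a toy class. `ToyVectorStore` and `toy_embed` are hypothetical names, not LangChain code, and the three-number "embedding" (letter counts) only stands in for the much longer learned vector that a real embedding model returns.

```python
import uuid
from typing import Callable


class ToyVectorStore:
    """Illustrative stand-in for an in-memory vector store (not LangChain's class)."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self.embed = embed
        # Maps document ID -> (vector, original text)
        self.records: dict[str, tuple[list[float], str]] = {}

    def add_texts(self, texts: list[str]) -> list[str]:
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())  # random ID, one per stored text
            self.records[doc_id] = (self.embed(text), text)
            ids.append(doc_id)
        return ids


def toy_embed(text: str) -> list[float]:
    # Hypothetical embedding: how often the letters a, e and o occur
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeo"]


store = ToyVectorStore(toy_embed)
ids = store.add_texts(["pancakes and syrup", "lake view", "live music"])
print(f"{len(ids)} records stored")
```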


### Retrieval and Generation

In [28]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

question = "Where can I find a restaurant in Amsterdamse Bos that serves pancakes?"

# similarity_search returns the k most similar chunks; k defaults to 4
retrieved_docs = vector_store.similarity_search(question)
print(f"Retrieved {len(retrieved_docs)} documents.")
docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

Retrieved 4 documents.
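
Only 4 of the 5 stored chunks come back because `similarity_search` uses `k=4` unless told otherwise, for example `vector_store.similarity_search(question, k=2)`. The ranking step can be sketched with the standard library, assuming cosine similarity as the score. The five two-dimensional vectors below are hand-made stand-ins for the real embeddings, and `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the names of the k vectors most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Five hand-made vectors standing in for the five embedded chunks
doc_vecs = {
    "chunk-0": [1.0, 0.0],
    "chunk-1": [0.9, 0.1],
    "chunk-2": [0.5, 0.5],
    "chunk-3": [0.1, 0.9],
    "chunk-4": [0.0, 1.0],
}
query_vec = [1.0, 0.2]
hits = top_k(query_vec, doc_vecs)
print(hits)  # ['chunk-1', 'chunk-0', 'chunk-2', 'chunk-3']
```

The least similar vector, `chunk-4`, is dropped, which mirrors what happens to the fifth chunk of the web page.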


In [25]:
from langchain_core.prompts import PromptTemplate

template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Question: {question}

Context: {context}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Manually render the prompt (no abstraction)
rendered = prompt.format(question=question, context=docs_content)
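
For a plain single-string template like this one, `PromptTemplate.format` behaves much like Python's built-in `str.format`: each `{name}` placeholder is replaced by the matching keyword argument. The sketch below shows the same step without LangChain. The template is shortened, and the question and context values are placeholders rather than real retrieval results.

```python
# Shortened version of the template, for illustration only
demo_template = """Use the following context to answer the question.

Question: {question}

Context: {context}

Answer:"""

demo_rendered = demo_template.format(
    question="Where can I eat pancakes?",
    context="(retrieved chunks, joined by blank lines)",
)
print(demo_rendered)
```

One thing to watch with either approach: literal curly braces in the template text itself have to be doubled (`{{` and `}}`), otherwise they are read as placeholders.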

In [27]:
# Pass the rendered string directly to the model
# (Chat models accept strings; LangChain wraps it as a HumanMessage internally)
answer = llm.invoke(rendered).content
print(f"Answer: {answer}")
print(f"Answer length: {len(answer)} characters")

Answer: You can find traditional Dutch pancakes at Paviljoen Aquarius in Amsterdamse Bos. They are open from Wednesday to Sunday, with varied hours depending on the season. Prices start at €6.50 for a bacon pancake with syrup.
Answer length: 218 characters


## Limitations

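This notebook has the following limitations:
- It indexes a single web page; offline files and multiple web pages are not supported.
- It answers a single hard-coded question; asking another question means editing and re-running the retrieval and generation cells.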
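
One possible way to lift the second limitation is to wrap retrieval, prompt rendering, and generation in a single function and call it once per question. The sketch below is an assumption about how that could look, not part of the notebook: `answer_question`, `PROMPT`, and the two stand-in functions are hypothetical, and the stand-ins replace the vector store and the chat model so that the example runs offline.

```python
from typing import Callable

PROMPT = (
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "Question: {question}\n\nContext: {context}\n\nAnswer:"
)


def answer_question(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for one question, render the prompt, generate an answer."""
    context = "\n\n".join(retrieve(question))
    return generate(PROMPT.format(question=question, context=context))


# Stand-ins so the sketch runs offline. In the notebook these could be:
#   retrieve = lambda q: [d.page_content for d in vector_store.similarity_search(q)]
#   generate = lambda p: llm.invoke(p).content
def fake_retrieve(question: str) -> list[str]:
    return ["Chunk about pancakes.", "Chunk about opening hours."]


def fake_generate(prompt: str) -> str:
    return f"(model answer to a prompt of {len(prompt)} characters)"


for q in ["Where can I eat pancakes?", "Which restaurants have live music?"]:
    print(q, "->", answer_question(q, fake_retrieve, fake_generate))
```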