# Welcome

Authors:
- Jonathan Guerne, research assistant at Haute Ecole Arc Ingénierie, Switzerland
- Henrique Marques Reis, research assistant at Haute Ecole Arc Ingénierie, Switzerland
- Célien Donzé, Scientific Collaborator at HEIA-FR, Switzerland

## Introduction
This workshop explains how to build a RAG (Retrieval-Augmented Generation) system that answers questions about PDF documents. A self-hosted LLM served by Ollama generates the answers, and a vector database retrieves the document chunks relevant to each question.

Topics covered in this exercise:
- LLM
- Ollama
- Vector database (FAISS)
- RAG
- LangChain

## Package installation

Uncomment the following line to install the required packages when working on Google Colab.

In [2]:
# !pip install langchain==0.2.7 langchain-community faiss-cpu pymupdf pypdf sentence_transformers rich wget python-dotenv cryptography

## Imports

In [3]:
import os
from pathlib import Path

import langchain
import wget
from dotenv import load_dotenv
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain_community.llms.ollama import Ollama
from langchain_core.documents.base import Document
from rich.console import Console
from rich.markdown import Markdown

console = Console()

## Downloading the PDFs

In [61]:
# Create the "data/PDFs" folder if it doesn't exist
PDF_FOLDER = Path("data/PDFs")
os.makedirs(PDF_FOLDER, exist_ok=True)

urls = [
    "https://ai-days.swiss-ai-center.ch/PDF/Session1.pdf",
    "https://ai-days.swiss-ai-center.ch/PDF/Session2a.pdf",
    "https://ai-days.swiss-ai-center.ch/PDF/Session2b.pdf",
    "https://ai-days.swiss-ai-center.ch/PDF/Session3a.pdf",
    "https://ai-days.swiss-ai-center.ch/PDF/Session3b.pdf",
]

# Download each PDF unless it is already present locally
for url in urls:
    name = url.split("/")[-1]
    target = PDF_FOLDER / name
    if not target.is_file():
        wget.download(url, str(target))
console.print("PDF files downloaded successfully.", style="bold green")

## Documentation

- [LangChain](https://python.langchain.com/v0.1/docs/get_started/introduction/)
- [Ollama website](https://ollama.com/)

## Constants

In [62]:
load_dotenv(override=True)

OLLAMA_ADDRESS = os.getenv("OLLAMA_ADDRESS")
LLM_NAME = "qwen2.5:0.5b"
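If `OLLAMA_ADDRESS` is missing from the `.env` file, `os.getenv` returns `None`. A small sketch of a fallback, assuming Ollama runs locally on its default port 11434; the helper name `resolve_ollama_address` and the constant `DEFAULT_OLLAMA_ADDRESS` are hypothetical and not part of the workshop code:

```python
import os

# Assumption: a local Ollama server listening on its default port.
DEFAULT_OLLAMA_ADDRESS = "http://localhost:11434"


def resolve_ollama_address() -> str:
    """Return OLLAMA_ADDRESS from the environment, or the local default."""
    # `or` also covers the case where the variable is set but empty.
    return os.getenv("OLLAMA_ADDRESS") or DEFAULT_OLLAMA_ADDRESS


print(resolve_ollama_address())
```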

# start step 1

## Connecting to LLM

In [63]:
llm = Ollama(
    model=LLM_NAME,
    base_url=OLLAMA_ADDRESS,
    temperature=0.1,  # Will be explained later
    stop=["<end_of_turn>"],
)

llm.generate(["Hello, how are you today?"]).generations[0][0].text

"Hello! I'm Qwen, an artificial intelligence language model created by Alibaba Cloud. How can I assist you today?"

## Creating a prompt

A prompt is generally divided into two parts: the context and the question.

The context provides the information that the model will use to generate its answer, while the question specifies what the model is expected to respond to.

Additionally, LangChain templates use markers in curly braces, such as `{question}`, to indicate where to insert the user's question and, later in this workshop, the context retrieved from documents.

[Langchain prompt templates documentation](https://python.langchain.com/v0.2/docs/concepts/#prompt-templates)

In [64]:
template = """
You are a helpful assistant that answers the question in detail.

Human input: {question}
Assistant:"""

prompt = PromptTemplate(input_variables=["question"], template=template)
prompt

PromptTemplate(input_variables=['question'], template='\nYou are a helpful assistant that answers the question in detail.\n\nHuman input: {question}\nAssistant:')
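To see what the model actually receives, the placeholder substitution can be reproduced with plain Python string formatting. This is only a sketch of the idea, not LangChain's implementation; `fill_template` and the shortened `TEMPLATE` are hypothetical.

```python
# Shortened stand-in for the notebook template (assumption for illustration).
TEMPLATE = "Human input: {question}\nAssistant:"


def fill_template(template: str, **variables: str) -> str:
    """Replace each {marker} in the template with the matching keyword argument."""
    return template.format(**variables)


filled = fill_template(TEMPLATE, question="When is the AI-days 2025?")
print(filled)
```

In the notebook, `prompt.format(question="...")` returns the equivalent string for the real template.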

## Creating the chain and starting a conversation

In [65]:
llm_chain = prompt | llm

In [83]:
# Pass the template variable explicitly by name
result = llm_chain.invoke({"question": "When is the AI-days 2025?"})

console.print(Markdown(result))

# end step 1

# start step 2

## Loading a PDF

In [67]:
VECTORSTORES_DIR = Path("data/vectorstores")
PDF_FOLDER

WindowsPath('data/PDFs')

In [68]:
loader = PyPDFDirectoryLoader(PDF_FOLDER)
doc = loader.load()
doc

Ignoring wrong pointing object 6 0 (offset 0)

[Document(metadata={'source': 'data\\PDFs\\Session1.pdf', 'page': 0}, page_content=' \n  \nAI-DAYS@HES-SO 2025 –GENEVA & LAUSANNE –JANUARY 27-JANUARY 29, 2025\nWORKSHOP DAY JANUARY 27\nSession 1: Compute Infrastructures for IA applications in the wild Location: Movie theater 6 With the advent of Chatbots, LLMs and other generative IA technologies, as well as other progresses in the IA ﬁeld, there is an explosion of the demand for compute force. IA is no longer computer science: it is computational science. As such, it can no longer be done with casual, self-managed equipment. More advanced compute infrastructures are required both to satisfy user needs (in terms of compute power, GPU Ram capacity) and to ensure a decent utilization of the increasingly costly resources. Content and topics The purpose of this workshop is to gather people in charge of compute infrastructure (on-prem, cloud or hybrid) destined to support AI workloads (both training and inference). Being “in charge” means s

## Embedding a PDF in a vectorstore

In [69]:
CHUNK_SIZE = 500
CHUNK_OVERLAP = 100
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"

<div>
<img src="chunk_overlap_size_scheme.png" width="800"/>
</div>
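The effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` can be illustrated with a plain sliding window over characters. This is a simplified sketch: `RecursiveCharacterTextSplitter` additionally tries to cut at separators such as paragraph breaks and spaces, so its real chunks will differ. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size chunks where consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook values (500 and 100), each chunk would start 400 characters after the previous one and repeat its last 100 characters.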

In [70]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
)

embedding_model = HuggingFaceBgeEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True},
)

In [71]:
all_splits = text_splitter.split_documents(doc)
vectorstore = FAISS.from_documents(documents=all_splits, embedding=embedding_model)

In [72]:
vectorstore.save_local(VECTORSTORES_DIR)
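Conceptually, the vectorstore maps each chunk to a vector and returns the chunks whose vectors are closest to the query vector. The sketch below shows this with hand-made 2-D vectors and cosine similarity; it does not use FAISS or the BGE model, and `top_k`, `normalize` and the toy vectors are illustrative assumptions. With `normalize_embeddings=True` the vectors have unit length, so ranking by cosine similarity and ranking by Euclidean distance give the same order.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k(query: list[float], chunk_vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    q = normalize(query)
    # For unit vectors, the dot product equals the cosine similarity.
    scores = {
        name: sum(a * b for a, b in zip(q, normalize(vec)))
        for name, vec in chunk_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy "index": made-up vectors standing in for real embeddings.
toy_index = {
    "chunk about dates": [0.9, 0.1],
    "chunk about GPUs": [0.1, 0.9],
    "chunk about venues": [0.7, 0.3],
}
print(top_k([1.0, 0.0], toy_index, k=2))
```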

# end step 2

# start step 3

## Loading a vectorstore

In [73]:
vectorstore = FAISS.load_local(
    VECTORSTORES_DIR, embedding_model, allow_dangerous_deserialization=True
)

## What is temperature?

The temperature parameter of a large language model (LLM) controls the randomness of the model's output.

A lower temperature value (closer to 0) makes the model more deterministic, favoring higher probability words and resulting in more predictable and repetitive text.

A higher temperature value (around 1 or above) increases randomness, allowing for more creative and diverse responses by giving less probable words a better chance of being chosen.

Adjusting the temperature helps balance between coherence and creativity in the generated text.
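Concretely, the model's raw scores (logits) are divided by the temperature before the softmax turns them into probabilities. The sketch below uses three made-up logits; the function name and values are illustrative and not taken from Ollama.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all probability mass goes to the top-scoring token, while a higher temperature spreads it more evenly across the candidates.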

## New prompt

In RAG, we need another marker to indicate where the retrieved information (the context) should be inserted. For this we use the variable `{context}`. The question marker is now named `{input}`, which is the key that `create_retrieval_chain` expects.

In [74]:
prompt = """
Use the following pieces of context to answer the question at the end.
Don't try to make up an answer and only use the information you know.
Use three sentences maximum and keep the answer as concise as possible.
You must answer in English.
Context:
{context}

Question:
{input}

Answer:
"""

prompt_template = PromptTemplate(input_variables=["context", "input"], template=prompt)
prompt_template

PromptTemplate(input_variables=['context', 'input'], template="\nUse the following pieces of context to answer the question at the end.\nDon't try to make up an answer and only use the information you know.\nUse three sentences maximum and keep the answer as concise as possible.\nYou must answer in English.\nContext:\n{context}\n\nQuestion:\n{input}\n\nAnswer:\n")

## Creating the chain

In [75]:
# Number of chunks (top k) to retrieve from the vectorstore
NB_RETRIEVED_CHUNKS = 8

In [76]:
question_answer_chain = create_stuff_documents_chain(llm=llm, prompt=prompt_template)
retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": NB_RETRIEVED_CHUNKS,
    }
)
chain = create_retrieval_chain(retriever, question_answer_chain)
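The "stuff" strategy simply concatenates the retrieved chunks and places them in the `{context}` slot of the prompt. A rough, LangChain-free sketch of that step follows; `build_rag_prompt` and `RAG_TEMPLATE` are hypothetical, and the blank-line separator is an assumption about how the chunks are joined.

```python
# Shortened stand-in for the RAG prompt (assumption for illustration).
RAG_TEMPLATE = "Context:\n{context}\n\nQuestion:\n{input}\n\nAnswer:\n"


def build_rag_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and insert them, with the question, into the template."""
    context = "\n\n".join(chunks)  # assumed separator between chunks
    return RAG_TEMPLATE.format(context=context, input=question)


example = build_rag_prompt(
    ["Workshop day: January 27.", "Location: Movie theater 6."],
    "When is the workshop day?",
)
print(example)
```

Because every retrieved chunk ends up in a single prompt, the number of chunks multiplied by the chunk size has to fit in the model's context window.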

## Chatting with a PDF

In [85]:
result = chain.invoke(input={"input": "When is the AI-days 2025?"})

console.print(Markdown(result["answer"]))

## Embellishing the output

In [90]:
console.print(result)
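The dictionary returned by the retrieval chain contains the original `input`, the retrieved `context` documents and the generated `answer`. Below is a small sketch for listing where the context came from. `summarize_sources` is a hypothetical helper, and the stand-in objects only mimic the `metadata` attribute of LangChain's `Document` so that the snippet runs without a model.

```python
from types import SimpleNamespace


def summarize_sources(result: dict) -> list[str]:
    """List each distinct (source, page) pair found in the retrieved context."""
    seen = []
    for doc in result["context"]:
        label = f"{doc.metadata.get('source')} (page {doc.metadata.get('page')})"
        if label not in seen:
            seen.append(label)
    return seen


# Stand-in for a real chain result (assumption for illustration).
fake_result = {
    "input": "When is the AI-days 2025?",
    "context": [
        SimpleNamespace(metadata={"source": "data/PDFs/Session1.pdf", "page": 0}),
        SimpleNamespace(metadata={"source": "data/PDFs/Session1.pdf", "page": 0}),
    ],
    "answer": "(model output)",
}
print(summarize_sources(fake_result))
```

In the notebook, `summarize_sources(result)` could be called directly on the chain output to check which pages the answer was based on.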