# RAG Tutorial - From Basic to Advanced

The goal of this tutorial is to explore a set of techniques for giving LLMs context about private, recent, or domain-specific data, so that they neither hallucinate nor fail to give a proper answer.

These techniques are collectively known as RAG, which stands for Retrieval-Augmented Generation.
<br>
<br>

_**Warning**_ <br>
Before diving into the world of RAG, we strongly recommend reading the glossary below, especially if you are not very familiar with the field. It gives a basic understanding of fundamental topics in Generative AI, Machine Learning and LLMs, which is crucial for understanding RAG.

## Glossary

#### Machine Learning

#### Generative AI

#### Large Language Models (LLMs)

#### Query

#### Hallucinations

#### Embeddings

#### Vector Database

#### Semantic Search

#### System Message

#### Chunking

## Tech Stack

The following libraries and technologies will be used in the development of this tutorial and the applications of RAG within it.

* LangChain/LlamaIndex
* Pinecone/MongoDB - Vector Database
* LLM APIs (OpenAI, Anthropic Claude, ...)

## What is RAG and how does it work?

RAG is a technique for giving LLMs context about private, recent, or domain-specific data that the model had no previous access to. Its goal is to prevent the LLM from hallucinating or failing to give a proper answer.

The process of building the infrastructure for RAG consists of four steps:
* Getting the data (specific context you want the LLM to know)
* Chunking the data (dividing it into small pieces - chunks)
* Embedding the chunks (transforming the chunks into 'lots of numbers' - vectors of dimension _n_)
* Storing the embeddings in a vector database
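
The four steps above can be sketched in plain Python. This is only an illustration under loud assumptions: `chunk_text`, `toy_embed` and `build_index` are hypothetical helpers, the "embedding" is a hashed bag-of-words stand-in for a real embedding model, and a Python list stands in for a real vector database such as Pinecone or MongoDB.

```python
import hashlib
import math


def chunk_text(text, chunk_size=200, overlap=50):
    """Step 2: split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def toy_embed(text, dim=64):
    """Step 3 (toy version): hash each word into one of `dim` buckets, then L2-normalise.

    ASSUMPTION: this stands in for a real embedding model; it captures word
    overlap only, not meaning.
    """
    vector = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a hash that is stable across runs (built-in hash() is not).
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def build_index(documents, chunk_size=200, overlap=50, dim=64):
    """Steps 1-4: take raw documents, chunk them, embed each chunk, store the results."""
    index = []  # ASSUMPTION: an in-memory list plays the role of the vector database
    for doc_id, text in documents.items():
        for chunk_id, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            index.append({
                "doc_id": doc_id,
                "chunk_id": chunk_id,
                "text": chunk,
                "vector": toy_embed(chunk, dim),
            })
    return index


# Step 1: the data (a made-up document used purely for illustration).
documents = {"handbook": "Employees may work remotely up to three days per week. " * 5}
index = build_index(documents, chunk_size=80, overlap=20)
print(f"{len(index)} chunks stored, each with a vector of dimension {len(index[0]['vector'])}")
```

In a real application each helper would be replaced by a library component (a document loader, a text splitter, an embedding model and a vector store), but the data flow stays the same.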

Once this infrastructure is built, the prompting workflow is as follows:

(IMAGE SHOWING BASIC RAG WORKFLOW)

As shown, the process has 3 main stages:
* Embedding
* Retrieval
* Generation

This workflow shows the simplest process a RAG application follows: the user prompt (query) is embedded, and then a similarity search is run to find the chunks most closely related to it. The most related chunks (the top-k chunks) are then added to the user prompt and fed into the LLM. The LLM now has (hopefully) not just the user prompt but also the context needed to answer it properly.
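
The embed, retrieve and augment stages described above can be sketched as follows. Everything here is an assumption made for illustration: `toy_embed` is a sparse word-count vector rather than a real embedding model (chosen so the scores are easy to check by hand), the three chunks are made up, and the prompt template is just one possible wording. The final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Toy 'embedding': sparse bag-of-words counts (ASSUMPTION: not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query, chunks, k=2):
    """Embed the query, score every chunk against it, return the k best chunks."""
    query_vector = toy_embed(query)
    scored = [(cosine_similarity(query_vector, toy_embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_augmented_prompt(query, context_chunks):
    """Prepend the retrieved chunks to the user query (hypothetical template)."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


chunks = [
    "The office is closed on public holidays.",
    "Remote work is allowed up to three days per week.",
    "Expense reports are due by the fifth of each month.",
]
query = "How many days of remote work are allowed?"

top_chunks = retrieve_top_k(query, chunks, k=2)
print(build_augmented_prompt(query, top_chunks))
```

The string printed at the end is what would be sent to the LLM in place of the bare user query.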

# Example Application - Doing xyzxyzxyzxyz
Now we will apply the RAG technique, starting with a simple RAG application and exploring more advanced approaches later on. The goal of this example is to explore the different variables of a RAG infrastructure and how they might affect the quality of the answer provided by the LLM.

Different (i) embedding models, (ii) vector databases, (iii) similarity searches and (iv) retrieval techniques will be explored.
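
To see why the choice of similarity search matters, here is a small sketch comparing three common measures: dot product, cosine similarity and Euclidean distance. The vectors and the `rank` helper are made up for illustration and are not tied to any particular vector database.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank(query, candidates, score, higher_is_better=True):
    """Return candidate names ordered from best to worst match for the query."""
    return sorted(
        candidates,
        key=lambda name: score(query, candidates[name]),
        reverse=higher_is_better,
    )


query = [1.0, 0.0]
candidates = {
    "a": [2.0, 0.0],  # same direction as the query, but twice as long
    "b": [0.9, 0.1],  # slightly different direction, similar length
}

print("cosine:   ", rank(query, candidates, cosine_similarity))
print("dot:      ", rank(query, candidates, dot_product))
# For a distance, smaller means more similar.
print("euclidean:", rank(query, candidates, euclidean_distance, higher_is_better=False))
```

Cosine similarity and dot product rank `a` first, while Euclidean distance ranks `b` first, because it also penalises the difference in vector length. When all vectors are normalised to unit length, the three measures produce the same ranking.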

At the end of the day, our main goal when applying RAG is to increase the accuracy and quality of the documents retrieved as context, in order to give the LLM enough information to provide a proper answer.


Note to self: test at least two embedding methods, two databases (consider whether this one is valid), two ways of computing similarity, and two ways of retrieving context.

## Application Context

## Loading Documents

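```python
# Loader that reads a PDF file into LangChain documents
from langchain.document_loaders import PyPDFLoader
# Splitter that breaks long text into smaller chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
```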

## Chunking Documents

## Embedding Chunks

## Creating Vector Database

## Query Augmentation (Adding most similar Context to User Prompt)

## Evaluating RAG
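
One possible way to quantify retrieval quality, not necessarily the approach this tutorial will end up using, is the hit rate at k: the fraction of test queries for which at least one relevant chunk appears among the top k retrieved chunks. The sketch below assumes you have a small hand-labelled set of queries; the function name, the chunk ids and the labels are all made up for illustration.

```python
def hit_rate_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant chunk in their top-k results.

    retrieved: dict mapping query id -> ranked list of retrieved chunk ids
    relevant:  dict mapping query id -> set of chunk ids labelled as relevant
    """
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for query_id, ranked in retrieved.items()
        if set(ranked[:k]) & relevant.get(query_id, set())
    )
    return hits / len(retrieved)


# Hypothetical retrieval results and labels for three test queries.
retrieved = {
    "q1": ["c3", "c1", "c7"],
    "q2": ["c2", "c9", "c4"],
    "q3": ["c5", "c6", "c8"],
}
relevant = {"q1": {"c1"}, "q2": {"c4"}, "q3": {"c0"}}

for k in (1, 2, 3):
    print(f"hit rate @ {k}: {hit_rate_at_k(retrieved, relevant, k):.2f}")
```

Running the same labelled queries against different embedding models, similarity measures or retrieval techniques gives a simple number for comparing them.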

# Example Application 2 - (Exploring Different Advanced Techniques)