In [None]:
%pip install --upgrade vllm -q

In [None]:
%pip install langchain langchain_community pypdf sentence-transformers chromadb -q

# RAG: Retrieval Augmented Generation

<a target="_blank" href="https://colab.research.google.com/github/juanhuguet/intro_to_llms/blob/main/intro_to_llms/03_local_rag_qa.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

![](https://python.langchain.com/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png)

![](https://python.langchain.com/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png)

# Import the LangChain components

In [1]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.llms.vllm import VLLM
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate

No API key needs to be set: the embeddings and the LLM both run locally, and a Hugging Face token is optional for public models.

## Preprocess the PDF

`PyPDFLoader` reads the PDF and extracts its text automatically.

The splitter creates `chunks` of roughly 1000 characters with an overlap of 200 characters, so that information spanning a chunk boundary is not lost.

In [2]:
loader = PyPDFLoader("data/insurance_policy_example.pdf")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

chunks = loader.load_and_split(text_splitter=text_splitter)
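To make the role of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking with overlap. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally (the real splitter prefers to cut at separators such as paragraph and line breaks); the helper name `split_with_overlap` and the synthetic sample text are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive windows
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Synthetic 2500-character text, only for illustration
sample = "".join(str(i % 10) for i in range(2500))
pieces = split_with_overlap(sample)

print([len(p) for p in pieces])
# The tail of one chunk is repeated at the head of the next one
print(pieces[0][-200:] == pieces[1][:200])
```

The overlap trades some duplicated text for a lower chance that a sentence relevant to the query is cut in half at a chunk boundary.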

### Load the document into the vector database

We use Chroma, which runs in memory and requires no setup, and compute the embeddings locally with the `all-MiniLM-L6-v2` sentence-transformers model through `HuggingFaceEmbeddings`.

In [3]:
embedder = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [15]:
# populate the database with the chunks and their embeddings
docsearch = Chroma.from_documents(chunks, embedder)

### Create the `retriever`

Given a query, this component computes its embedding and retrieves the k most relevant chunks to build the context.

In [16]:
# MMR: fetch the 10 most similar chunks, then keep the 2 that best balance relevance and diversity
retriever = docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 2, "fetch_k": 10})
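With `search_type="mmr"` the retriever uses maximal marginal relevance: it first takes the `fetch_k` chunks most similar to the query and then picks `k` of them one by one, penalizing candidates that are too similar to chunks already chosen. The sketch below illustrates the idea in plain Python; it is not LangChain's or Chroma's code, and the toy 2-D vectors, the `mmr` helper and the `lambda_mult=0.5` weighting are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, fetch_k=10, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    # Step 1: keep only the fetch_k documents most similar to the query
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    candidates = ranked[:fetch_k]
    selected = []
    # Step 2: greedily add the candidate with the best relevance/redundancy trade-off
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings: documents 0 and 1 are near-duplicates of each other
query = [1.0, 0.0]
docs = [[0.9, 0.4], [0.9, 0.41], [0.8, -0.6], [0.0, 1.0]]

by_similarity = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:2]
print(by_similarity)     # plain top-2 similarity returns both near-duplicates
print(mmr(query, docs))  # MMR swaps the duplicate for a different, still relevant chunk
```

In a document such as an insurance policy, where headers and clauses repeat, this diversity term helps avoid filling the context with near-identical chunks.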

### Create the reader

Once the context is retrieved, we pass it to an LLM. We use `Mistral-7B-Instruct-v0.2` with AWQ quantization, run locally through vLLM.

In [6]:
reader = VLLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=1000,
    top_p=0.95,
    stop=["\n\n"],
    temperature=0.3,
    vllm_kwargs={"quantization": "awq",
                 "max_model_len": 10000},
)

INFO 05-06 15:44:34 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='TheBloke/Mistral-7B-Instruct-v0.2-AWQ', speculative_config=None, tokenizer='TheBloke/Mistral-7B-Instruct-v0.2-AWQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=10000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=awq, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=TheBloke/Mistral-7B-Instruct-v0.2-AWQ)
INFO 05-06 15:44:34 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 05-06 15:44:35 selector.py:69] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 05-06 15:44:35 selector.py:32] Using XFormers backend.
INFO 05-06 15:44:35 weigh

## Create the prompt

To control the model's behavior, we create a prompt that gives it instructions.

In [7]:
prompt_template = \
"""Use the following pieces of context to answer the question at the end.
If you don't know the answer, don't try to make up an answer and answer `not-in-text`
Answer in English only.

Context:

{context}

Question:

{question}

Answer:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
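To see what the model actually receives, the template can be filled by hand. The sketch below uses plain `str.format` on a shortened template with the same `{context}` and `{question}` placeholders; the placeholder values are hypothetical and are not taken from the policy PDF.

```python
# Shortened stand-in for the template above, with the same two placeholders
TEMPLATE = """Use the following pieces of context to answer the question at the end.

Context:

{context}

Question:

{question}

Answer:"""

# Hypothetical placeholder values, only to show where each piece lands
filled = TEMPLATE.format(
    context="<retrieved chunk 1>\n\n<retrieved chunk 2>",
    question="<user question>",
)
print(filled)
```

Ending the prompt with `Answer:` leaves the model to continue the text from that point, which is why the completion starts directly with the answer.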

## Create the QA chain

We use the LangChain wrapper that orchestrates the data flow:

`query` -> `embedding` -> `retrieve context` -> `prompt completion` -> `answer`

In [17]:
qa = RetrievalQA.from_chain_type(
    llm=reader,
    chain_type="stuff",  # put all retrieved chunks into a single prompt
    retriever=retriever,
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=True,  # also return the chunks used as context
)
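The flow that the chain orchestrates can be sketched without LangChain. The function below is an illustration of the "stuff" strategy under stated assumptions, not the library's implementation: `answer_with_context`, `fake_retrieve` and `fake_llm` are hypothetical stand-ins, and joining the chunks with a blank line is assumed.

```python
from typing import Callable, Dict, List


def answer_with_context(
    query: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
    template: str,
    separator: str = "\n\n",
) -> Dict[str, object]:
    """Retrieve chunks, stuff them into one prompt, and ask the model."""
    docs = retrieve(query)                                      # query -> relevant chunks
    context = separator.join(docs)                              # "stuff": concatenate all chunks
    prompt = template.format(context=context, question=query)   # fill the prompt
    return {"query": query, "result": llm(prompt), "source_documents": docs}


# Stand-ins so the sketch runs without a vector store or a GPU
def fake_retrieve(query: str) -> List[str]:
    return ["chunk A", "chunk B"]


def fake_llm(prompt: str) -> str:
    return "echo: " + prompt


out = answer_with_context(
    "What is covered?",
    fake_retrieve,
    fake_llm,
    "Context:\n\n{context}\n\nQuestion:\n\n{question}\n\nAnswer:",
)
print(out["result"])
print(out["source_documents"])
```

Because every retrieved chunk goes into a single prompt, the number and size of chunks must stay within the model's context length.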

## Testing the system

In [22]:
results = qa.invoke({"query": "What are the issues that are not covered by the insurance?"})

Processed prompts: 100%|██████████| 1/1 [00:08<00:00,  8.86s/it]


In [23]:
print(results["result"])



The insurance does not cover:

1. Bodily injury or medical expenses that are not reasonable or necessary or already covered under Section II.
2. Fines, penalties, restitution orders, or attorney’s fees or costs.
3. Punitive damages.
4. Damages for any insured other than as specified in the coverage (i.e., damages for care, loss of consortium, loss of services, and negligent entrustment, and includes imputed negligence, agency, conspiracy, the family purpose doctrine, joint enterprise or venture, employment relationship, partnership, concert of action, or negligent hiring or supervision or retention, sustained by the injured person and any other person).
5. Multiple policy periods for the same accident or injury, the limits of liability cannot be added, combined, or stacked to increase the coverage.

Therefore, the insurance does not cover various issues such as fines, penalties, restitution orders, attorney’s fees or costs, punitive damages, and damages for any insured other than as 

In [26]:
for d in results["source_documents"]:
  print(d)

page_content="SAMPLE DOCUMENT\n51 to which this coverage applies. We will pay only for \nservices actually incurred and reported to us within 3 \nyears from the accident dat e. This coverage does not \napply to you or regular residents of your  household \nexcept residence employees . \n \nThis coverage applies to:  \n1. persons on an insured location  with the \npermission of any insured , and \n 2. persons off an insured location  if the bodily \ninjury : \n a. arises out of a condition on an insured \nlocation  or the ways immediately \nadjoining; \n b. is caused by the activities of any insured ; \n c. is caused by a residence employee  in the \ncourse of that residence employee's  \nemployment by any insured ; or \n d. is caused by an animal owned by or in the \ncare or custody of any insured  except any \nanimal that is excluded under 1.n. of \nWHAT LOSSES ARE NOT COVERED –  EXCLUSIONS - SECTION II. \n \nWHAT LOSSES ARE NOT COVERED – \nEXCLUSIONS – SECTION II \n \n1. Under SECTIO