### Retrieval-augmented generation (RAG)

Retrieval-augmented generation (RAG) is an AI framework that combines LLMs with information retrieval systems. It is useful for answering questions or generating content that draws on external knowledge. RAG has two main steps:
1) **retrieval**: retrieve relevant information from a knowledge base **with text embeddings stored in a vector store**;
2) **generation**: insert the **relevant information into the prompt** so the LLM can generate a response.
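
The two steps can be illustrated with a minimal, dependency-free sketch. It is not how the notebook does retrieval: word overlap stands in for embedding similarity, and the names `score`, `retrieve`, and `build_prompt` are hypothetical helpers invented for illustration. The generation step is only prepared here (the prompt is built, no LLM is called).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(query, passage):
    """Count shared word occurrences (a crude stand-in for embedding similarity)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(query, knowledge_base, k=1):
    """Step 1 (retrieval): return the k passages that best match the query."""
    return sorted(knowledge_base, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query, passages):
    """Step 2 (generation input): insert the retrieved passages into the prompt."""
    context = "\n\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


knowledge_base = [
    "Santander is in Cantabria. It is close to the beach.",
    "Segovia is in Castilla y León. It is not close to the beach.",
]
question = "Which town is in Cantabria?"
top = retrieve(question, knowledge_base, k=1)
toy_prompt = build_prompt(question, top)
print(toy_prompt)
```

In a real pipeline the scoring function is replaced by a vector-store lookup and the prompt is sent to the LLM, which is what the rest of this notebook sets up.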

### Documentation

- Mistral API: https://docs.mistral.ai/api/
- LangChain: https://python.langchain.com/docs/introduction/

In [1]:
import os
import warnings

# Silence library warnings before the LangChain imports to keep the output readable
warnings.filterwarnings("ignore")

from dotenv import load_dotenv
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_mistralai.embeddings import MistralAIEmbeddings


In [2]:
# Load environment variables
load_dotenv()

True

In [3]:
langchain_api_key = os.getenv("LANGCHAIN_API_KEY")
mistral_api_key = os.getenv("MISTRAL_API_KEY")

In [41]:
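# Load the plain-text knowledge base
loader = TextLoader("../data/rag/towns.txt", encoding="utf-8")
# Note: TextLoader reads plain text only, so the PDF below would need a PDF loader (e.g. PyPDFLoader) instead
# loader = TextLoader("../data/rag/Growth_and_decline_in_rural_Spain.pdf")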

In [13]:
docs = loader.load()

In [14]:
docs

[Document(metadata={'source': '../towns.txt'}, page_content='Galapagar is a town of 20,000 habitants. It is in Madrid. It has a warm climate in summer and cold in winter. It has several schools in the surroundings and two hospitals nearby. Its job situation is good enough. It has excellent connections, and the cost of living is 140 EUR a day. It is not close to the beach.\n\nSantander is a town of 170,000 habitants. It is in Cantabria. It has mild summers and cool winters. It has several schools in the surroundings and three hospitals nearby. Its job situation is fair. It has excellent connections, and the cost of living is 120 EUR a day. It is close to the beach.\n\nSegovia is a town of 55,000 habitants. It is in Castilla y León. It has a continental climate with cold winters and warm summers. It has several schools in the surroundings and two hospitals nearby. Its job situation is moderate. It has good connections, and the cost of living is 110 EUR a day. It is not close to the beach

In [15]:
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
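
With default settings, `RecursiveCharacterTextSplitter` may leave a short file as a single chunk (its default `chunk_size` is, to my knowledge, 4000 characters); passing smaller `chunk_size` and `chunk_overlap` values to the constructor produces finer chunks. The sketch below is not the library's algorithm. It is a simplified, hypothetical `split_paragraphs` helper that shows the general idea of packing blank-line-separated paragraphs into size-limited chunks, using made-up sample text.

```python
def split_paragraphs(text, chunk_size=400):
    """Greedily pack blank-line-separated paragraphs into chunks of at most
    chunk_size characters. A single paragraph longer than chunk_size is kept
    whole rather than cut mid-sentence."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Galapagar is in Madrid.\n\n"
    "Santander is in Cantabria.\n\n"
    "Segovia is in Castilla y León."
)
small_chunks = split_paragraphs(sample, chunk_size=40)    # one town per chunk
single_chunk = split_paragraphs(sample, chunk_size=4000)  # whole text in one chunk
print(len(small_chunks), len(single_chunk))
```

Smaller chunks let the retriever return only the towns that match a question, whereas one large chunk sends the whole file to the model on every query.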

In [20]:
documents

[Document(metadata={'source': '../towns.txt'}, page_content='Galapagar is a town of 20,000 habitants. It is in Madrid. It has a warm climate in summer and cold in winter. It has several schools in the surroundings and two hospitals nearby. Its job situation is good enough. It has excellent connections, and the cost of living is 140 EUR a day. It is not close to the beach.\n\nSantander is a town of 170,000 habitants. It is in Cantabria. It has mild summers and cool winters. It has several schools in the surroundings and three hospitals nearby. Its job situation is fair. It has excellent connections, and the cost of living is 120 EUR a day. It is close to the beach.\n\nSegovia is a town of 55,000 habitants. It is in Castilla y León. It has a continental climate with cold winters and warm summers. It has several schools in the surroundings and two hospitals nearby. Its job situation is moderate. It has good connections, and the cost of living is 110 EUR a day. It is not close to the beach

In [21]:
# Define the embedding model
embeddings = MistralAIEmbeddings(model="mistral-embed", api_key=mistral_api_key)

In [22]:
embeddings

MistralAIEmbeddings(client=<httpx.Client object at 0x0000026F1572A890>, async_client=<httpx.AsyncClient object at 0x0000026F06D2E950>, mistral_api_key=SecretStr('**********'), endpoint='https://api.mistral.ai/v1/', max_retries=5, timeout=120, wait_time=30, max_concurrent_requests=64, tokenizer=<langchain_mistralai.embeddings.DummyTokenizer object at 0x0000026F07588410>, model='mistral-embed')

In [24]:
# Create the vector store 
vector = FAISS.from_documents(documents, embeddings)
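
Conceptually, a vector store keeps one embedding per chunk and answers a query by returning the chunks whose vectors lie nearest to the query vector. The sketch below is a toy stand-in, not FAISS itself: `TinyVectorStore` is a hypothetical class, the 2-D vectors are made up for illustration (real embeddings have far more dimensions), and Euclidean distance is assumed as the metric.

```python
import math


class TinyVectorStore:
    """Brute-force nearest-neighbour store (illustrative only)."""

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._items.append((list(vector), text))

    def search(self, query_vector, k=4):
        """Return the texts of the k vectors closest to query_vector."""
        ranked = sorted(
            self._items,
            key=lambda item: math.dist(item[0], query_vector),
        )
        return [text for _, text in ranked[:k]]


store = TinyVectorStore()
# Made-up 2-D "embeddings": first axis ~ warm climate, second axis ~ cold climate
store.add([0.9, 0.1], "Salou has a Mediterranean climate with hot summers and mild winters.")
store.add([0.1, 0.9], "Ávila has very cold winters and warm summers.")

nearest = store.search([0.8, 0.2], k=1)
print(nearest)
```

A library such as FAISS replaces the brute-force sort with index structures that scale to large collections.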

In [30]:
# Define a retriever interface
retriever = vector.as_retriever()
# Define LLM
model = ChatMistralAI(api_key=mistral_api_key)

In [33]:
# Define prompt template
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")
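
To make the template's role concrete, here is a rough sketch of what filling it amounts to: the retrieved chunks are joined and substituted for `{context}`, and the user question replaces `{input}`. The `fill_prompt` helper and the blank-line separator are assumptions for illustration; LangChain performs this step internally.

```python
TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def fill_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks and substitute them into the template."""
    return TEMPLATE.format(context=separator.join(chunks), input=question)


filled = fill_prompt(
    ["Santander is close to the beach.", "Segovia is not close to the beach."],
    "Which town is close to the beach?",
)
print(filled)
```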


In [34]:
# Create a retrieval chain to answer questions
document_chain = create_stuff_documents_chain(model, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [37]:
response = retrieval_chain.invoke({"input": "I would like to live in a sunny city with more than 5000 habitants, anywhere in spain"})

In [39]:
print(response["answer"])

Based on the context provided, there are several sunny cities in Spain with more than 5000 habitants that you might consider:

* Galapagar (20,000 habitants) in Madrid has a warm climate in the summer and a cold climate in the winter.
* Santander (170,000 habitants) in Cantabria has mild summers and cool winters.
* Segovia (55,000 habitants) in Castilla y León has a continental climate with cold winters and warm summers.
* Salou (27,000 habitants) in Tarragona has a Mediterranean climate with hot summers and mild winters.
* Toledo (85,000 habitants) in Castilla-La Mancha has hot summers and cold winters.
* Ávila (58,000 habitants) in Castilla y León has very cold winters and warm summers.
* Torrevieja (82,000 habitants) in Alicante has a warm Mediterranean climate.
* Gandía (75,000 habitants) in Valencia has a Mediterranean climate with warm summers and mild winters.
* Castellón de la Plana (171,000 habitants) in Valencia has a Mediterranean climate with warm summers and mild winters.
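
Besides `"answer"`, the dictionary returned by the retrieval chain also carries the retrieved documents under `"context"`, which helps when checking what the model was actually shown. The sketch below uses a hypothetical `summarize_sources` helper and a stand-in response built with `SimpleNamespace` objects (not a real chain output), so it runs without an API key.

```python
from types import SimpleNamespace


def summarize_sources(response, preview_chars=60):
    """List the source and a short preview of each retrieved document."""
    lines = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"{source}: {preview}")
    return lines


# Stand-in for a chain response; real entries would be LangChain Document objects
fake_response = {
    "input": "Which town is close to the beach?",
    "context": [
        SimpleNamespace(
            metadata={"source": "towns.txt"},
            page_content="Santander is a town of 170,000 habitants.",
        )
    ],
    "answer": "Santander",
}
sources = summarize_sources(fake_response)
print(sources)
```

With the real chain, the same helper could be called as `summarize_sources(response)`.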
