### Retrieval-augmented generation (RAG)

Retrieval-augmented generation (RAG) is an AI framework that combines an LLM with an information retrieval system. It is useful for answering questions or generating content that draws on external knowledge. RAG has two main steps:
1) **retrieval**: retrieve relevant information from a knowledge base **using text embeddings stored in a vector store**;
2) **generation**: insert the **relevant information into the prompt** so the LLM can generate an answer grounded in it.
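
The two steps above can be sketched without any external service. This is a toy illustration only: word overlap stands in for embedding similarity, the knowledge base is a hypothetical three-sentence list, and the final call to an LLM is left out. The names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are made up for this sketch and are not part of LangChain or the Mistral client.

```python
import re
from collections import Counter

# Hypothetical mini knowledge base; any list of short texts would do.
KNOWLEDGE_BASE = [
    "Santander is in Cantabria and is close to the beach.",
    "Segovia is in Castilla y León and has cold winters.",
    "Galapagar is in Madrid and is not close to the beach.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def retrieve(question, documents, k=1):
    """Step 1: rank documents by word overlap with the question (toy stand-in for embeddings)."""
    question_words = Counter(tokenize(question))

    def score(doc):
        # Size of the multiset intersection between question and document words
        return sum((question_words & Counter(tokenize(doc))).values())

    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(question, context_docs):
    """Step 2: insert the retrieved text into the prompt that would be sent to the LLM."""
    context = "\n".join(context_docs)
    return f"Answer based only on this context:\n{context}\n\nQuestion: {question}"

question = "Which town is in Cantabria?"
top_docs = retrieve(question, KNOWLEDGE_BASE)
toy_prompt = build_prompt(question, top_docs)
print(toy_prompt)
```

In the rest of the notebook, the retrieval step is handled by `mistral-embed` plus FAISS, and the generation step by a Mistral chat model.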

### Documentation

- Mistral API: https://docs.mistral.ai/api/
- LangChain: https://python.langchain.com/docs/introduction/

In [1]:
import os
import warnings

from dotenv import load_dotenv
from mistralai import Mistral

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")


from langchain_community.document_loaders import TextLoader
from langchain_mistralai.chat_models import ChatMistralAI
from langchain_mistralai.embeddings import MistralAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain


In [2]:
# Load environment variables
load_dotenv()

True

In [3]:
langchain_api_key = os.getenv("LANGCHAIN_API_KEY")
mistral_api_key = os.getenv("MISTRAL_API_KEY")

In [24]:
client = Mistral(api_key=mistral_api_key)

In [4]:
loader = TextLoader("../data/rag/towns.txt", encoding="utf-8")
# loader = TextLoader("../data/rag/Growth_and_decline_in_rural_Spain.pdf")

In [5]:
docs = loader.load()

In [6]:
docs

[Document(metadata={'source': '../data/rag/towns.txt'}, page_content='Galapagar is a town of 20,000 habitants. It is in Madrid. It has a warm climate in summer and cold in winter. It has several schools in the surroundings and two hospitals nearby. Its job situation is good enough. It has excellent connections, and the cost of living is 140 EUR a day. It is not close to the beach.\n\nSantander is a town of 170,000 habitants. It is in Cantabria. It has mild summers and cool winters. It has several schools in the surroundings and three hospitals nearby. Its job situation is fair. It has excellent connections, and the cost of living is 120 EUR a day. It is close to the beach.\n\nSegovia is a town of 55,000 habitants. It is in Castilla y León. It has a continental climate with cold winters and warm summers. It has several schools in the surroundings and two hospitals nearby. Its job situation is moderate. It has good connections, and the cost of living is 110 EUR a day. It is not close to 

In [7]:
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
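
`RecursiveCharacterTextSplitter()` is called here with its defaults, and `chunk_size` and `chunk_overlap` can be passed explicitly to control the chunking. To show what those two parameters mean, here is a simplified fixed-size splitter. It is a sketch, not LangChain's algorithm, which also tries to break on separators such as paragraphs and sentences. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=50):
    """Cut text into chunks of at most chunk_size characters.

    Each chunk starts chunk_overlap characters before the previous one ends,
    so neighbouring chunks share some context.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghij" * 5  # 50 characters
sample_chunks = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)
print(len(sample_chunks), [len(c) for c in sample_chunks])
```

A small file such as `towns.txt` can fit in a single chunk when the chunk size is large, which would explain why `documents` looks the same as `docs` in the next cell.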

In [8]:
documents

[Document(metadata={'source': '../data/rag/towns.txt'}, page_content='Galapagar is a town of 20,000 habitants. It is in Madrid. It has a warm climate in summer and cold in winter. It has several schools in the surroundings and two hospitals nearby. Its job situation is good enough. It has excellent connections, and the cost of living is 140 EUR a day. It is not close to the beach.\n\nSantander is a town of 170,000 habitants. It is in Cantabria. It has mild summers and cool winters. It has several schools in the surroundings and three hospitals nearby. Its job situation is fair. It has excellent connections, and the cost of living is 120 EUR a day. It is close to the beach.\n\nSegovia is a town of 55,000 habitants. It is in Castilla y León. It has a continental climate with cold winters and warm summers. It has several schools in the surroundings and two hospitals nearby. Its job situation is moderate. It has good connections, and the cost of living is 110 EUR a day. It is not close to 

In [9]:
# Define the embedding model
embeddings = MistralAIEmbeddings(model="mistral-embed", api_key=mistral_api_key)

In [10]:
embeddings

MistralAIEmbeddings(client=<httpx.Client object at 0x00000244FDF6DA10>, async_client=<httpx.AsyncClient object at 0x00000244EFA8A690>, mistral_api_key=SecretStr('**********'), endpoint='https://api.mistral.ai/v1/', max_retries=5, timeout=120, wait_time=30, max_concurrent_requests=64, tokenizer=<langchain_mistralai.embeddings.DummyTokenizer object at 0x00000244EFC8FA90>, model='mistral-embed')

In [11]:
# Create the vector store 
vector = FAISS.from_documents(documents, embeddings)
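
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below does this by brute force with Euclidean (L2) distance, which is assumed here to match the default distance of LangChain's FAISS wrapper. The 3-dimensional vectors and the names `toy_store` and `nearest` are invented for illustration; real `mistral-embed` vectors are much longer.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query_vec, stored, k=2):
    """Return the k (label, distance) pairs closest to query_vec, nearest first."""
    scored = [(label, l2_distance(query_vec, vec)) for label, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Hypothetical toy "embeddings" keyed by town name
toy_store = {
    "Santander": [0.9, 0.1, 0.0],
    "Segovia": [0.1, 0.9, 0.0],
    "Torrevieja": [0.8, 0.0, 0.3],
}

hits = nearest([1.0, 0.0, 0.1], toy_store, k=2)
print([label for label, _ in hits])
```

FAISS performs the same kind of lookup with optimized index structures instead of a Python loop.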

In [12]:
# Expose the vector store as a retriever
retriever = vector.as_retriever()
# Define the chat model that generates the answers
pueblos_model = ChatMistralAI(api_key=mistral_api_key)

In [13]:
# Define prompt template
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}

The answer should provide a selection of the 3 most suitable cities for the user and a description of each of them, then ask whether they have any preferences regarding transport or schools.""")


In [14]:
# Create a retrieval chain to answer questions
document_chain = create_stuff_documents_chain(pueblos_model, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)

In [15]:
response = retrieval_chain.invoke({"input": "I would like to live in a sunny city with more than 5000 habitants, anywhere in spain"})

In [16]:
print(response["answer"])

Based on your preference to live in a sunny city with a population of more than 50,000 inhabitants, I have selected the following three cities in Spain:

1. Torrevieja, Alicante: With a population of 82,000 inhabitants, Torrevieja has a warm Mediterranean climate, with hot summers and mild winters. It is close to the beach and has several schools in the surroundings and one hospital nearby. Its job situation is seasonal. The cost of living in Torrevieja is 135 EUR a day, and it has good connections.

2. Gandía, Valencia: With a population of 75,000 inhabitants, Gandía has a Mediterranean climate, with warm summers and mild winters. It is close to the beach and has several schools in the surroundings and two hospitals nearby. Its job situation is seasonal. The cost of living in Gandía is 135 EUR a day, and it has good connections.

3. Manresa, Barcelona: With a population of 78,000 inhabitants, Manresa has warm summers and cool winters. It is not close to the beach, but it has several s

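### Work on conversation

The chat examples below pass the conversation to the model as a list of messages, each tagged with one of three roles: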

| **Role**     | **Purpose**                                                                |
|--------------|----------------------------------------------------------------------------|
| `system`     | Sets the behavior, tone, and personality of the AI.                        |
| `user`       | Represents the user's input or questions to the AI.                        |
| `assistant`  | The AI's response to the user based on previous context and system prompt.  |


Mistral chat completion documentation: https://docs.mistral.ai/capabilities/completion/#tag/batch/operation/jobs_api_routes_batch_cancel_batch_job


In [17]:
# Initialize the message history
messages = [
    {"role": "system", "content": "You are a helpful assistant, originally from a small town in Spain, dedicated to helping people find a new place to live while fighting depopulation. You should always stay friendly and conscientious, and try to better understand the user's needs and preferences in order to provide the best solutions."}
]
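
Because the whole `messages` list is sent on every call, the history grows with each turn. One possible way to cap it is to keep the system message and only the most recent exchanges. The helper below is a sketch under the assumption that user and assistant messages strictly alternate. `trim_history` and `demo_history` are hypothetical names, not part of the Mistral client.

```python
def trim_history(history, max_turns=3):
    """Keep all system messages plus the last max_turns user/assistant exchanges."""
    if max_turns < 0:
        raise ValueError("max_turns must be non-negative")
    system = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    # Each exchange is one user message plus one assistant reply
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system + recent

# Build a hypothetical history: one system message and four exchanges
demo_history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(4):
    demo_history.append({"role": "user", "content": f"question {i}"})
    demo_history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(demo_history, max_turns=2)
print([m["content"] for m in trimmed])
```

The trimmed list could be passed as `messages` in place of the full history when conversations get long.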

In [38]:
# Chat with the pueblos assistant
def chat_with_pueblos_assistant(user_input):
    # Add the user message to the conversation history
    messages.append({"role": "user", "content": user_input})

    # Send the whole history to Mistral with client.chat.complete.
    # The earlier client.chat.parse call expects a Pydantic model class as
    # response_format, so passing a dict raised:
    # AttributeError: 'dict' object has no attribute 'model_json_schema'
    response = client.chat.complete(
        model="mistral-small-latest",  # assumed model name; must be a string, not the ChatMistralAI object
        messages=messages,
        temperature=0,
    )
    # Extract the assistant's reply
    pueblos_reply = response.choices[0].message.content

    # Append the reply to the conversation history
    messages.append({"role": "assistant", "content": pueblos_reply})
    return pueblos_reply


In [None]:
user_input = "Hi, who are you?"
print("You:", user_input)
print("Pueblos assistant:", chat_with_pueblos_assistant(user_input))

In [None]:
user_input = "Can you explain quantum mechanics?"
print("You:", user_input)
print("Pueblos assistant:", chat_with_pueblos_assistant(user_input))

In [None]:

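# Example conversation loop; type "exit" or "quit" to stop
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    # Use the helper defined above (chat_with_mistral was never defined)
    ai_response = chat_with_pueblos_assistant(user_input)
    print("Pueblos assistant:", ai_response)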
