# 🌍 **Puerto Rico Travel Planning Chatbot** with Chroma, RAG, Weather Integration, and Personalized Recommendations

### **Ironhack Data Science and Machine Learning Bootcamp**  
📅 **Date:** February 12, 2025  
📁 **Notebook:** `puerto_rico_travel_chatbot.ipynb`  
👩‍💻 **Authors:** Ginosca Alejandro Dávila & Natanael Santiago Morales  

---

## **📌 Project Overview**  
This notebook is part of **The Hitchhiker’s Guide to Puerto Rico**, an interactive **travel planning chatbot** designed to suggest **landmarks**, **municipalities**, **weather forecasts**, and relevant **news articles** based on users' preferences and queries. The chatbot will assist in planning trips by integrating the following components:

1. **Chroma Vector Store**: We will store structured information from three datasets—**landmarks**, **municipalities**, and **news articles**—into a **Chroma vector store** for efficient querying and retrieval.
   
2. **Retrieval-Augmented Generation (RAG)**: We will connect the Chroma vector store to a **Generative Language Model** (e.g., GPT or T5). For each user query, the system retrieves the most relevant documents and passes them to the model as context, so the generated response is grounded in the stored data.

3. **Weather Forecast Integration**: The chatbot will fetch weather forecasts from the **OpenWeather API** and provide travel recommendations based on the forecasted conditions (e.g., sunny, rainy, hot).

4. **Personalized Travel Recommendations**: The chatbot will evaluate user preferences (such as preferred weather conditions) and recommend locations accordingly. It will also provide tailored responses based on the user's queries regarding historical events, landmarks, and municipalities in Puerto Rico.

By combining all these approaches, we aim to create a **context-aware, personalized travel planning chatbot** that helps users plan their trips efficiently.

---

## **🛠️ What We Are Doing in This Notebook**  
✔ **Step 1:** Load the merged data from `news_landmarks_municipalities_merged.pkl`.  
✔ **Step 2:** Convert the merged data into **Chroma's Document objects** for indexing.  
✔ **Step 3:** Store the documents in **Chroma's vector store**.  
✔ **Step 4:** Perform **retrieval-based querying** from the vector store using user input.  
✔ **Step 5:** Use **Retrieval-Augmented Generation (RAG)** to generate accurate and context-aware responses based on the retrieved documents.  
✔ **Step 6:** Fetch weather data using the **OpenWeather API** and integrate weather-based recommendations into the chatbot's responses.  
✔ **Step 7:** Create a chatbot flow to interact with the user, incorporating preferences and providing personalized recommendations.

---

## **🛠️ How the Merged Data Will Be Used**  
The merged dataset contains:
- **Landmarks**:
  - **File Name**
  - **Landmark Name**
  - **Coordinates** (latitude and longitude)
  - **Municipality**
  - **Wikipedia URL**
  - **Brief Description**
  
- **Municipalities**:
  - **File Name**
  - **Municipality Name**
  - **Coordinates** (latitude and longitude)
  - **Wikipedia URL**
  - **Brief Description**

- **News Articles**:
  - **File Name**
  - **Publication Date**
  - **Locations Mentioned**
  - **Article Text** (content of the news article)

### **How This Data Supports the Chatbot:**
- ✅ **Context-Aware Responses**: The chatbot can suggest landmarks, municipalities, and relevant news articles based on user queries.
- ✅ **Personalized Recommendations**: The chatbot combines user preferences with relevant context from landmarks, news, and weather conditions.
- ✅ **Efficient Querying**: Chroma's vector store enables quick retrieval of relevant documents, and the RAG system ensures responses are generated based on that context.

---

## **📂 Dataset Description**  
- **Source**: Raw text files from Wikipedia (landmarks and municipalities) and El Mundo news articles.  
- **Format**: Merged into a **.pkl file** containing **Documents** with metadata and content.  
- **Location**:  
  📁 `My Drive/Colab Notebooks/Ironhack/Week 9/project-dsml-interactive-travel-planner/data/merged-data/news_landmarks_municipalities_merged.pkl`

---

## **🛠️ Chroma Usage**  
- Chroma is used to **index and store** the documents from the merged dataset. This allows fast retrieval of relevant documents for the chatbot based on user queries, combining information from:
  - **Landmarks** (landmark name, coordinates, municipality, descriptions, Wikipedia URL)
  - **Municipalities** (municipality name, coordinates, descriptions, Wikipedia URL)
  - **News Articles** (file name, publication date, locations mentioned, article text)

The **Chroma vector store** enables the chatbot to provide **contextual recommendations** efficiently.
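Because every document carries a `source` field in its metadata (for example `landmarks` or `news`, as seen in the outputs below), retrieval can be restricted to one dataset. LangChain's Chroma wrapper accepts a `filter` dictionary for this, e.g. `db.similarity_search(query, k=5, filter={"source": "landmarks"})`. The pure-Python sketch below only illustrates the equality-match idea on plain dictionaries; the sample records and helper names are hypothetical, and it does not call Chroma.

```python
from typing import Any

# Hypothetical records shaped like the merged dataset's metadata.
records = [
    {"metadata": {"source": "landmarks", "landmark": "Old San Juan"}, "text": "Historic district..."},
    {"metadata": {"source": "news", "date": "May 27, 1922"}, "text": "In the office of..."},
    {"metadata": {"source": "landmarks", "landmark": "Letras de Ponce"}, "text": "Five large letters..."},
]


def matches_filter(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True when every key in `where` equals the record's metadata value."""
    return all(metadata.get(key) == value for key, value in where.items())


def filter_records(items: list[dict[str, Any]], where: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep only records whose metadata satisfies the equality filter."""
    return [item for item in items if matches_filter(item["metadata"], where)]


landmark_only = filter_records(records, {"source": "landmarks"})
print([r["metadata"]["landmark"] for r in landmark_only])
```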

---

## **📈 RAG Usage**  
- **Retrieval-Augmented Generation (RAG)** will be used to **augment** the chatbot's ability to generate **contextual and accurate** responses based on the retrieved documents.  
- We will retrieve relevant documents from the Chroma vector store based on user input and use them as context for generating answers using a **generative language model**.

---

## **🌤️ Weather Forecast Integration**  
- The chatbot will use the **OpenWeather API** to fetch weather data for the user's desired travel dates and locations. The weather data will be used to:
  - Provide weather-based recommendations.
  - Inform users about whether the weather conditions align with their preferences (e.g., sunny, rainy, hot).
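A minimal sketch of how forecast data could be checked against a user's weather preference. It assumes the JSON shape of OpenWeather's 5-day / 3-hour forecast endpoint (`/data/2.5/forecast`), where each entry in `list` has `dt_txt`, `main.temp`, and `weather[0].main` (e.g. `"Clear"` or `"Rain"`). The preference labels, the 30 °C "hot" threshold, and the helper names are hypothetical, and the network request is left out so the logic runs offline.

```python
from urllib.parse import urlencode

FORECAST_ENDPOINT = "https://api.openweathermap.org/data/2.5/forecast"


def build_forecast_url(lat: float, lon: float, api_key: str) -> str:
    """Build the request URL for a 5-day / 3-hour forecast in metric units."""
    query = urlencode({"lat": lat, "lon": lon, "appid": api_key, "units": "metric"})
    return f"{FORECAST_ENDPOINT}?{query}"


def classify_entry(entry: dict, hot_threshold_c: float = 30.0) -> str:
    """Map one forecast entry to 'rainy', 'hot', 'sunny', or 'other'.

    Rain takes priority, then heat, then clear skies.
    """
    condition = entry["weather"][0]["main"]
    if condition in ("Rain", "Drizzle", "Thunderstorm"):
        return "rainy"
    if entry["main"]["temp"] >= hot_threshold_c:
        return "hot"
    if condition == "Clear":
        return "sunny"
    return "other"


def matching_slots(forecast: dict, preference: str) -> list[str]:
    """Return the timestamps whose classified weather matches the user's preference."""
    return [e["dt_txt"] for e in forecast["list"] if classify_entry(e) == preference]


# Hypothetical two-entry response, for illustration only.
sample = {
    "list": [
        {"dt_txt": "2025-02-15 12:00:00", "main": {"temp": 27.5}, "weather": [{"main": "Clear"}]},
        {"dt_txt": "2025-02-15 15:00:00", "main": {"temp": 26.0}, "weather": [{"main": "Rain"}]},
    ]
}
print(build_forecast_url(18.4664, -66.1103, "YOUR_KEY"))
print(matching_slots(sample, "sunny"))
```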

---

## **🛠️ Chatbot Flow**  
- The chatbot will interact with the user by asking for:
  1. **Travel Dates**: To retrieve weather forecasts for the user's intended travel dates.
  2. **Location Preferences**: To recommend landmarks and municipalities based on the user's interests.
  3. **Weather Preferences**: To provide recommendations based on the weather forecast for selected locations.

The chatbot will then suggest locations based on interests and weather preferences, and users can lock in their desired destinations after evaluating the weather.
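The flow above can be sketched as a small function that takes candidate locations (from retrieval), a per-location weather label (from the forecast step), and the user's preference, then groups the candidates so the user can lock in a choice. The `recommend_locations` and `lock_in` names and the input shapes are hypothetical illustrations, not the notebook's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    recommended: list[str] = field(default_factory=list)
    not_recommended: list[str] = field(default_factory=list)
    no_forecast: list[str] = field(default_factory=list)


def recommend_locations(candidates: list[str], weather_by_location: dict[str, str], preference: str) -> Recommendation:
    """Split candidate locations by whether their forecast label matches the preference."""
    result = Recommendation()
    for location in candidates:
        label = weather_by_location.get(location)
        if label is None:
            result.no_forecast.append(location)  # no weather data: let the user decide
        elif label == preference:
            result.recommended.append(location)
        else:
            result.not_recommended.append(location)
    return result


def lock_in(choice: str, rec: Recommendation, itinerary: list[str]) -> bool:
    """Add a destination the user picked after seeing the weather; ignore unknown or repeated picks."""
    known = rec.recommended + rec.not_recommended + rec.no_forecast
    if choice in known and choice not in itinerary:
        itinerary.append(choice)
        return True
    return False


rec = recommend_locations(
    ["San Juan", "Adjuntas", "Ponce"],
    {"San Juan": "sunny", "Adjuntas": "rainy"},
    preference="sunny",
)
itinerary: list[str] = []
lock_in("San Juan", rec, itinerary)
print(rec, itinerary)
```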

---

🔹 **Let’s store the data in Chroma and prepare it for chatbot use with RAG and weather integration! 🚀**


## 🔗 **Mounting Google Drive**

In this step, we will mount Google Drive to access the necessary dataset files stored in our drive. This will allow us to load the `news_landmarks_municipalities_merged.pkl` file, which contains the merged data.

Let's mount the drive so we can access the files stored in it.


In [None]:
from google.colab import drive

# 🔹 Mount Google Drive
drive.mount('/content/drive')


Mounted at /content/drive


## 📂 **Loading the Merged Dataset**

Now that we have mounted Google Drive, we will load the `news_landmarks_municipalities_merged.pkl` file from the specified path into the notebook. This file contains the combined information from landmarks, municipalities, and news articles.

Let’s load the data and inspect its contents.


In [None]:
# 🔹 Path to the merged dataset in Google Drive
file_path = '/content/drive/My Drive/Colab Notebooks/Ironhack/Week 9/project-dsml-interactive-travel-planner/data/merged-data/news_landmarks_municipalities_merged.pkl'

# 🔹 Load the merged data
import pickle

with open(file_path, 'rb') as file:
    merged_data = pickle.load(file)

# 🔹 Display the total number of documents
print(f"Loaded {len(merged_data)} documents from the dataset.")

# 🔹 Show the first 5 documents that are from the elmundo_chunked_es_page1_40years dataset
print("\nFirst 5 documents of the News dataset:")
print(merged_data[:5])

# 🔹 Show the first 5 documents of the landmarks dataset (index 1668 to 1742)
print("\nFirst 5 documents of the Landmarks dataset:")
print(merged_data[1668:1673])


# 🔹 Show the last 5 documents of the municipalities dataset (index 2242 to 2320)
print("\nLast 5 documents of the Municipalities dataset:")
print(merged_data[-5:])


Loaded 2320 documents from the dataset.

First 5 documents of the News dataset:
[Document(metadata={'filename': '19220527_1.txt', 'date': 'May 27, 1922', 'locations': 'the United States, Puerto Rico, Caguas, puerto rico, Arecibo, San Juan', 'source': 'news'}, page_content="In the office of the Free Federation, we found Senator and socialist leader Santiago Iglesias, with whom we discussed various economic and political issues. He praised the Worker Indemnity Commission, referring to it as one of the laws with the most humanitarian spirit. Iglesias mentioned that critics claimed 86% of the Commission's income goes to salaries, while only 14% is for worker indemnities; he deemed this assertion exaggerated and called for a clear report on the Commission's finances. He also expressed concern that shipping companies are not contributing to the Commission, arguing that since Puerto Rico is not incorporated into the United States, local laws should apply. Iglesias urged lawyers to address thi
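The cell above locates each dataset with hard-coded index ranges (such as `merged_data[1668:1673]`), which silently break if the merge order changes. A sketch that derives per-source counts and the first index of each source from the `source` metadata field instead. It uses a small stand-in class with the same `metadata` and `page_content` attributes so it runs without LangChain; the sample records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document: only metadata and page_content."""
    metadata: dict
    page_content: str


def summarize_sources(docs):
    """Return per-source document counts and the index where each source first appears."""
    counts = Counter(doc.metadata.get("source", "unknown") for doc in docs)
    first_index = {}
    for i, doc in enumerate(docs):
        first_index.setdefault(doc.metadata.get("source", "unknown"), i)
    return counts, first_index


sample_docs = [
    FakeDoc({"source": "news"}, "..."),
    FakeDoc({"source": "news"}, "..."),
    FakeDoc({"source": "landmarks"}, "..."),
    FakeDoc({"source": "municipalities"}, "..."),
]
counts, first_index = summarize_sources(sample_docs)
print(counts, first_index)
# In the notebook, the same call would be: summarize_sources(merged_data)
```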

## 📑 **Converting Merged Data into Chroma's Document Format**

In this step, we will convert the merged dataset into a list of LangChain **Document objects**, the format that LangChain's Chroma integration accepts. Each **Document object** carries **metadata** (the dataset columns) and **page content** (the description or article text) for **landmarks**, **municipalities**, and **news articles**. Since the pickled records already expose `metadata` and `page_content`, this step mainly copies them into fresh `Document` instances.

This step ensures that Chroma can index the data for efficient querying and retrieval.
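Chroma stores only scalar metadata values (strings, numbers, and booleans), so lists or `None` values in a record's metadata can cause errors when documents are added. The merged dataset already stores `locations` as a comma-separated string, but if other fields ever arrive as lists, a sanitizing step like the hypothetical `sanitize_metadata` helper below could run during conversion, e.g. `Document(metadata=sanitize_metadata(doc.metadata), page_content=doc.page_content)`. LangChain also provides a `filter_complex_metadata` utility that drops non-scalar values instead of converting them.

```python
from typing import Any

SCALAR_TYPES = (str, int, float, bool)


def sanitize_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with list/tuple values joined into strings and None values dropped."""
    clean: dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # skip missing values rather than storing None
        if isinstance(value, (list, tuple)):
            clean[key] = ", ".join(str(v) for v in value)
        elif isinstance(value, SCALAR_TYPES):
            clean[key] = value
        else:
            clean[key] = str(value)  # fall back to a string representation
    return clean


print(sanitize_metadata({
    "filename": "19220527_1.txt",
    "locations": ["Caguas", "Arecibo", "San Juan"],
    "latitude": 18.466,
    "url": None,
}))
```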

---

🔹 **Let’s convert the merged data into Chroma's Document format! 🚀**


In [None]:
from langchain.schema import Document

# 🔹 Convert the merged data into Document format
documents = []

for doc in merged_data:
    metadata = doc.metadata.copy()  # Create a copy of metadata
    page_content = doc.page_content  # Content of the document

    # Convert to Chroma's Document format
    document = Document(
        metadata=metadata,
        page_content=page_content
    )

    # Append to the documents list
    documents.append(document)

# 🔹 Check the first document to confirm the structure
print(documents[0].metadata)
print(documents[0].page_content[:500])  # Show the first 500 characters of the page content


{'filename': '19220527_1.txt', 'date': 'May 27, 1922', 'locations': 'the United States, Puerto Rico, Caguas, puerto rico, Arecibo, San Juan', 'source': 'news'}
In the office of the Free Federation, we found Senator and socialist leader Santiago Iglesias, with whom we discussed various economic and political issues. He praised the Worker Indemnity Commission, referring to it as one of the laws with the most humanitarian spirit. Iglesias mentioned that critics claimed 86% of the Commission's income goes to salaries, while only 14% is for worker indemnities; he deemed this assertion exaggerated and called for a clear report on the Commission's finances. H


## 🔹 **Installing Required Packages**

In this step, we install the packages needed to work with Chroma and the HuggingFace embeddings model: `langchain-community` (which provides LangChain's Chroma and HuggingFace integrations), `chromadb` (the underlying vector database), and `langchain` itself.

By installing these packages, we ensure that the environment is ready for creating and storing documents in Chroma's vector store for efficient querying and retrieval.

---

🔹 **Let's install the required packages for Chroma and HuggingFace! 🚀**


In [None]:
# 🔹 Install required packages for Chroma and HuggingFace embeddings with minimal output
!pip install -U langchain-community -q &>/dev/null
!pip install chromadb -q &>/dev/null
!pip install langchain -q &>/dev/null


## 🔹 **Storing Documents in Chroma's Vector Store**

In this step, we will store the converted documents in **Chroma's vector database**. This will allow us to index the documents for **efficient retrieval** during chatbot interactions. By using **Chroma**, we will ensure that the chatbot can quickly access relevant information based on user queries.

---

🔹 **Let’s store the documents in Chroma’s vector database! 🚀**


In [None]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

# 🔹 Path to store Chroma vector store in Google Drive
chroma_db_path = "/content/drive/My Drive/Colab Notebooks/Ironhack/Week 9/project-dsml-interactive-travel-planner/chroma_db"

# 🔹 Initialize the HuggingFace Embeddings model
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# 🔹 Create the Chroma vector store and store the documents
db = Chroma.from_documents(
    documents,
    embedding_model,
    persist_directory=chroma_db_path  # Save in Google Drive
)

# 🔹 Confirm that the database has been successfully created
print("Chroma database has been created and documents are stored in Google Drive.")

Chroma database has been created and documents are stored in Google Drive.
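Each call to `Chroma.from_documents` adds the documents again, and when no `ids` are supplied LangChain generates random ones, so re-running this cell against the same `persist_directory` can store duplicate copies. One option, sketched below under that assumption, is to derive a stable ID from each document's source, filename, and content with `hashlib` and pass the list through the `ids` argument. The stand-in class and helper names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (metadata + page_content)."""
    metadata: dict
    page_content: str


def stable_id(doc) -> str:
    """Hash source, filename, and content so the same chunk always gets the same ID."""
    key = "|".join([
        str(doc.metadata.get("source", "")),
        str(doc.metadata.get("filename", "")),
        doc.page_content,
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def unique_docs_with_ids(docs):
    """Drop exact duplicates (Chroma raises an error on repeated IDs in one batch) and return (docs, ids)."""
    seen = set()
    kept, ids = [], []
    for doc in docs:
        doc_id = stable_id(doc)
        if doc_id in seen:
            continue
        seen.add(doc_id)
        kept.append(doc)
        ids.append(doc_id)
    return kept, ids


docs = [
    FakeDoc({"source": "landmarks", "filename": "old_san_juan.txt"}, "Old San Juan is a historic district..."),
    FakeDoc({"source": "landmarks", "filename": "old_san_juan.txt"}, "Old San Juan is a historic district..."),
    FakeDoc({"source": "news", "filename": "19220527_1.txt"}, "In the office of the Free Federation..."),
]
kept, doc_ids = unique_docs_with_ids(docs)
print(len(kept), doc_ids[0][:12])
# In the notebook (assumption): Chroma.from_documents(kept, embedding_model, ids=doc_ids, persist_directory=chroma_db_path)
```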


## 🔗 **Loading Chroma Vector Store**

In this step, we will load the **Chroma vector store** that was created and saved in the previous step. This allows us to use the stored documents for retrieval in the **Retrieval-Augmented Generation (RAG)** system.

By loading the vector store, we can query the documents for relevant information based on user queries, enabling us to provide contextual and accurate responses for the chatbot.

If the Chroma vector store was already saved in a previous session, we can skip the creation step and load the database directly from Google Drive.

---

🔹 **Let’s load the Chroma vector store for RAG! 🚀**


In [None]:
import os

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

# 🔹 Path to the persisted Chroma vector store in Google Drive
chroma_db_path = "/content/drive/My Drive/Colab Notebooks/Ironhack/Week 9/project-dsml-interactive-travel-planner/chroma_db"

# 🔹 Initialize the HuggingFace Embeddings model (same as before)
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# 🔹 Check if the Chroma vector store exists
if os.path.exists(chroma_db_path):
    # 🔹 Load the Chroma vector store
    db = Chroma(persist_directory=chroma_db_path, embedding_function=embedding_model)
    print("Chroma vector store loaded successfully!")
else:
    print("Chroma vector store not found. Please make sure it was saved correctly.")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Chroma vector store loaded successfully!


## 🔹 **Testing Document Retrieval from Chroma's Vector Store**

In this step, we will test the document retrieval process from Chroma's vector store. This will ensure that the chatbot can efficiently fetch relevant documents based on user queries.

We will query the vector store with a sample question and display the top 5 most similar documents returned by Chroma. The retrieval process is based on the semantic similarity between the query and the stored documents.
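Under the hood, the query and each document are embedded as vectors (384-dimensional for `all-MiniLM-L6-v2`), and the nearest vectors are returned. The toy sketch below ranks documents by cosine similarity using hand-made 3-dimensional vectors. The vectors and the second filename are invented for illustration, and Chroma itself searches an HNSW index with its configured distance (squared L2 by default), so this shows the idea rather than Chroma's code path.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every document against the query and return the k highest-scoring ones."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented 3-d "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
doc_vectors = {
    "old_san_juan.txt": [0.9, 0.1, 0.0],
    "castillo_san_felipe_del_morro.txt": [0.8, 0.3, 0.1],
    "19220527_1.txt": [0.1, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]  # stands in for "What are some landmarks in San Juan?"
print(top_k(query_vector, doc_vectors, k=2))
```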

---

🔹 **Let’s test the retrieval process! 🚀**


In [None]:
# 🔹 Querying Chroma's vector store with a sample user question
user_query = "What are some landmarks in San Juan?"

# 🔹 Retrieve the top 5 most similar documents from the vector store
retrieved_docs = db.similarity_search(user_query, k=5)

# 🔹 Display the retrieved documents
for doc in retrieved_docs:
    print(f"Document Metadata: {doc.metadata}")
    print(f"Content: {doc.page_content[:300]}...")  # Display the first 300 characters of content
    print("-" * 80)  # Separator between documents


Document Metadata: {'filename': 'old_san_juan.txt', 'landmark': 'Old San Juan', 'latitude': 18.46638888888889, 'longitude': -66.11027777777777, 'municipality': 'San Juan', 'source': 'landmarks', 'url': 'https://en.wikipedia.org/wiki/Old_San_Juan'}
Content: Old San Juan is ahistoric districtlocated at the "northwest triangle"[2]of theislet of San Juanin San Juan. Its area roughly correlates to the Ballajá, Catedral, Marina, Mercado, San Cristóbal, and San Franciscosub-barrios of barrio San Juan Antiguoin the municipality of San Juan, Puerto Rico. \n...
--------------------------------------------------------------------------------
Document Metadata: {'filename': 'old_san_juan.txt', 'landmark': 'Old San Juan', 'latitude': 18.46638888888889, 'longitude': -66.11027777777777, 'municipality': 'San Juan', 'source': 'landmarks', 'url': 'https://en.wikipedia.org/wiki/Old_San_Juan'}
Content: Old San Juan is ahistoric districtlocated at the "northwest triangle"[2]of theislet of San Juanin San Ju

## 🔍 **Testing Document Retrieval Output**

In this step, we queried the **Chroma vector store** to retrieve the top 5 documents most relevant to the user’s question: **"What are some landmarks in San Juan?"**

The system successfully fetched a set of documents based on the **semantic similarity** between the query and the stored documents. The retrieved documents contain relevant information about **landmarks** and **municipalities** in San Juan.

### **Key Observations:**
- The **document metadata** includes key information such as the **filename**, **landmark** or **municipality name**, **location coordinates**, **source**, and a **URL** to the relevant Wikipedia page.
- The **content** of the documents provides descriptions of each landmark or municipality, including historical and geographical details.
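- The first two results are identical (same filename, metadata, and content). This suggests the same chunk is stored more than once, for example because `Chroma.from_documents` was run more than once against the same `persist_directory`; it also shows that **Old San Juan** ranks highest for this query.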

This retrieval process demonstrates how the system is able to pull **contextually relevant documents** from the Chroma vector store to answer specific user queries, thus enhancing the chatbot's ability to provide meaningful and accurate responses.
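If duplicate chunks are already stored, the chatbot can still avoid passing the same text twice by removing repeated results after retrieval. A minimal sketch, assuming results are objects with `metadata` and `page_content` (as LangChain `Document`s are) and that a `(filename, page_content)` pair identifies a chunk. Over-fetching (e.g. `db.similarity_search(user_query, k=10)`) and trimming back to 5 keeps the result count stable. The second filename is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (metadata + page_content)."""
    metadata: dict
    page_content: str


def dedupe_results(docs, limit: int = 5):
    """Keep the first occurrence of each (filename, content) pair, preserving rank order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("filename"), doc.page_content)
        if key in seen:
            continue
        seen.add(key)
        unique.append(doc)
        if len(unique) == limit:
            break
    return unique


results = [
    FakeDoc({"filename": "old_san_juan.txt"}, "Old San Juan is a historic district..."),
    FakeDoc({"filename": "old_san_juan.txt"}, "Old San Juan is a historic district..."),
    FakeDoc({"filename": "san_juan.txt"}, "San Juan is the capital..."),
]
print([d.metadata["filename"] for d in dedupe_results(results)])
# In the notebook: dedupe_results(db.similarity_search(user_query, k=10), limit=5)
```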

---

🔹 **Next Steps**: With this document retrieval process working, we can move on to integrating the **Retrieval-Augmented Generation (RAG)** system to generate answers based on the retrieved documents.


## 🔑 **Setting Up OpenAI API Key**

In this step, we will load the **OpenAI API key** from the `openai_key.txt` file that we previously saved in the project folder. This API key will be used to authenticate and interact with the OpenAI GPT model for the **Retrieval-Augmented Generation (RAG)** system.

We will read the key from the file and set it as an environment variable so that it can be accessed by the `ChatOpenAI` model in subsequent steps.
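Reading the key from a file works, but the cell fails with an unclear error if the file is missing or empty. A sketch of a slightly more defensive loader that prefers an already-set environment variable, falls back to the file, and raises a clear error otherwise; the `load_api_key` name is hypothetical. In Colab, the Secrets panel (read with `google.colab.userdata.get`) is another option.

```python
import os
import tempfile
from pathlib import Path


def load_api_key(path: str, env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment if set, else from the file; raise if neither has one."""
    existing = os.environ.get(env_var, "").strip()
    if existing:
        return existing
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"API key file not found: {path}")
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {path}")
    os.environ[env_var] = key  # make it visible to ChatOpenAI and other clients
    return key


# Usage with a throwaway file (the notebook would pass api_key_path instead).
os.environ.pop("DEMO_API_KEY", None)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("  example-key\n")
demo_key = load_api_key(tmp.name, env_var="DEMO_API_KEY")
```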

---

🔹 **Let's load and set the OpenAI API key! 🚀**


In [None]:
# 🔹 Load the API key from the file and set it as an environment variable
api_key_path = "/content/drive/My Drive/Colab Notebooks/Ironhack/Week 9/project-dsml-interactive-travel-planner/API keys/openai_key.txt"

with open(api_key_path, "r") as file:
    openai_api_key = file.read().strip()

import os
os.environ["OPENAI_API_KEY"] = openai_api_key


## 🔹 **Retrieval-Augmented Generation (RAG) Integration**

In this step, we will integrate the **Retrieval-Augmented Generation (RAG)** system with the Chroma vector store. The goal is to enable the chatbot to retrieve relevant documents from Chroma based on user queries and then generate an answer using a language model.

RAG combines **retrieval** (fetching relevant documents) and **generation** (using a language model to synthesize an answer) to provide accurate and context-aware responses. This will enhance the chatbot's ability to answer questions based on the stored documents, such as landmarks, municipalities, and news articles.
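The `"stuff"` chain type used below is the simplest way to combine retrieval and generation: all retrieved documents are concatenated into a single prompt together with the question. A sketch of that idea with a hypothetical prompt template (LangChain's built-in template is worded differently) and plain strings instead of `Document` objects. The character budget is an added assumption, a crude stand-in for the model's token limit; the real "stuff" chain does not truncate.

```python
def build_stuff_prompt(question: str, contexts: list[str], max_chars: int = 6000) -> str:
    """Concatenate retrieved passages into one prompt, stopping before max_chars of context."""
    pieces = []
    used = 0
    for i, text in enumerate(contexts, start=1):
        block = f"[{i}] {text.strip()}"
        if used + len(block) > max_chars:
            break  # the whole prompt must fit in the model's context window
        pieces.append(block)
        used += len(block)
    context = "\n\n".join(pieces)
    return (
        "Use the following pieces of context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


prompt = build_stuff_prompt(
    "What are some landmarks in San Juan?",
    ["Old San Juan is a historic district...", "San Juan is the capital of Puerto Rico..."],
)
print(prompt)
```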

---

🔹 **Let’s integrate RAG for contextual responses! 🚀**


In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 🔹 Initialize the LLM (an OpenAI GPT model) used to generate RAG answers
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7, openai_api_key=os.getenv("OPENAI_API_KEY"))

# 🔹 Create the RetrievalQA chain using Chroma vector store and the LLM
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # 'stuff' concatenates all retrieved documents into a single prompt
    retriever=db.as_retriever(search_type="similarity", search_kwargs={"k": 5})  # Retrieve top 5 documents
)

# 🔹 Perform the query with RAG integration
user_query = "What are some landmarks in San Juan?"
response = qa_chain.invoke({"query": user_query})["result"]  # .invoke() replaces the deprecated .run()

# 🔹 Display the answer generated by the RAG system
print(f"Answer: {response}")




Answer: Some landmarks in San Juan include:
1. El Morro - a historic fort overlooking the entrance to San Juan Harbor.
2. San Cristóbal Castle - a massive Spanish fortification located in Old San Juan.
3. La Fortaleza - the official residence of the Governor of Puerto Rico, located in Old San Juan.
4. Cathedral of San Juan Bautista - a historic cathedral in Old San Juan.
5. Paseo de la Princesa - a scenic promenade along the city walls in Old San Juan.
6. Plaza de Armas - a historic square in Old San Juan.
7. Casa Blanca - the historic residence of the Ponce de León family, now a museum.
8. Museo de las Américas - a museum showcasing the history and culture of Puerto Rico and the Americas.


## 🔍 **Retrieval-Augmented Generation (RAG) Output**

In this step, we used the **Retrieval-Augmented Generation (RAG)** system to query the **Chroma vector store** for relevant documents based on the user's query: **"What are some landmarks in San Juan?"**

The system retrieved the 5 documents most similar to the query from the vector store and passed them to the language model, which generated a list of **landmarks in San Juan**. This shows that the chatbot can turn retrieved context into a readable, relevant answer.

### **Key Takeaways from the Output:**
- The RAG system combined the power of **retrieval** (fetching relevant documents) and **generation** (using OpenAI’s GPT model) to produce a coherent, detailed answer.
- The system retrieved documents from various sources, including **landmarks** and **municipalities**, that matched the user's query about landmarks in San Juan.
- The listed landmarks are consistent with the Old San Juan documents retrieved in the previous test. However, because this chain does not return its source documents, we cannot tell which items came from the retrieved context and which came from the model's own knowledge.

This process demonstrates how the RAG system combines retrieval with GPT-based generation to produce context-aware answers.
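Passing `return_source_documents=True` to `RetrievalQA.from_chain_type` makes the chain's output include the retrieved documents under `source_documents`, alongside the answer under `result`. The chain then has two outputs, so it must be called with `qa_chain.invoke({"query": ...})` rather than `.run`. A sketch of a formatter for that output dictionary, assuming the documents carry the `filename` and `source` metadata seen above; the `format_with_sources` helper, the stand-in class, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (metadata + page_content)."""
    metadata: dict
    page_content: str


def format_with_sources(output: dict) -> str:
    """Render the answer followed by a de-duplicated list of the retrieved source files."""
    lines = [output["result"].strip(), "", "Sources:"]
    seen = set()
    for doc in output.get("source_documents", []):
        label = f"{doc.metadata.get('filename', 'unknown')} ({doc.metadata.get('source', 'unknown')})"
        if label not in seen:
            seen.add(label)
            lines.append(f"- {label}")
    return "\n".join(lines)


# Assumed shape of qa_chain.invoke({"query": ...}) when return_source_documents=True.
fake_output = {
    "query": "What are some landmarks in San Juan?",
    "result": "Some landmarks in San Juan include El Morro and La Fortaleza.",
    "source_documents": [
        FakeDoc({"filename": "old_san_juan.txt", "source": "landmarks"}, "..."),
        FakeDoc({"filename": "old_san_juan.txt", "source": "landmarks"}, "..."),
    ],
}
print(format_with_sources(fake_output))
```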

---

🔹 **Next Steps**: Continue testing the system with different queries to evaluate the accuracy and relevance of the generated answers.


## 🔹 **Testing the RAG System with a New Query**

In this step, we have wrapped the query process into a function called `query_rag_system`. This function allows us to easily test the Retrieval-Augmented Generation (RAG) system with different queries.

We will test this function with a new query, **"Tell me more about the history of Old San Juan"**. This will allow us to verify that the RAG system can correctly retrieve relevant documents from the Chroma vector store and generate a meaningful, context-aware response using the OpenAI model.

---

🔹 **Let’s test the RAG system with a historical query! 🚀**


In [None]:
def query_rag_system(query):
    """Send a question through the RetrievalQA chain and return the generated answer text."""
    return qa_chain.invoke({"query": query})["result"]

# Test the function with a new query
print(query_rag_system("Tell me more about the history of Old San Juan"))


Old San Juan is a historic district located on the islet of San Juan in the capital city of San Juan, Puerto Rico. The area roughly correlates to several sub-barrios within the municipality of San Juan. Old San Juan was founded by Spanish colonists in 1521, who initially called it Ciudad de Puerto Rico. The district is known for its well-preserved Spanish colonial architecture and historical sites, making it a popular tourist destination.


## 🔹 **RAG System Response to the Query**

The output from the `query_rag_system` function provides a detailed answer to the query **"Tell me more about the history of Old San Juan."** The response combines information from the relevant documents in the Chroma vector store and generates a context-aware answer based on the historical details of Old San Juan.

The response highlights key aspects of Old San Juan, such as its founding in 1521, its colonial architecture, and its significance as a historical district in Puerto Rico. Much of the wording closely follows the Old San Juan document retrieved earlier, which suggests the answer is grounded in the stored data and that retrieval and generation are working together as expected.

---

🔹 **RAG successfully generated the historical response for Old San Juan! 🚀**


## 🔍 **Testing the RAG System with More Queries**

In this step, we will continue testing the **Retrieval-Augmented Generation (RAG)** system with additional queries. These tests will help verify that the chatbot can respond to a variety of topics, such as historical information, tourist attractions, and other queries related to Puerto Rico.

We will test the system using different questions about **landmarks**, **municipalities**, and **historical events** to evaluate how well it generates relevant, context-aware answers from the Chroma vector store.
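Reading each answer by eye becomes tedious as the query list grows. A lightweight, hypothetical check is to list a few keywords each answer is expected to mention and report which ones are missing; keyword hits are only a rough proxy for relevance. The sketch accepts any callable, so `query_rag_system` could be passed in; here a stub answers from a fixed dictionary (paraphrased from this notebook's outputs) so the example runs offline.

```python
from typing import Callable


def keyword_report(ask: Callable[[str], str], cases: dict[str, list[str]]) -> dict[str, list[str]]:
    """Run each query through `ask` and return the expected keywords missing from each answer."""
    report = {}
    for query, keywords in cases.items():
        answer = ask(query).lower()
        report[query] = [kw for kw in keywords if kw.lower() not in answer]
    return report


canned = {
    "Tell me more about the history of Old San Juan": "Old San Juan was founded by Spanish colonists in 1521...",
    "What are the famous landmarks in Puerto Rico?": "The Archivo General de Puerto Rico and the Letras de Ponce...",
}


def stub(query: str) -> str:
    """Offline stand-in for query_rag_system."""
    return canned[query]


test_cases = {
    "Tell me more about the history of Old San Juan": ["1521", "Spanish"],
    "What are the famous landmarks in Puerto Rico?": ["El Morro", "Ponce"],
}
report = keyword_report(stub, test_cases)
print(report)
# In the notebook: keyword_report(query_rag_system, test_cases)
```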

---

🔹 **Let’s test the RAG system with different queries! 🚀**


In [None]:
# 🔹 Test the RAG system with a query about landmarks in Puerto Rico
user_query_1 = "What are the famous landmarks in Puerto Rico?"
response_1 = query_rag_system(user_query_1)
print(f"Answer to Query 1: {response_1}")

# 🔹 Test the RAG system with a query about historical events in San Juan
user_query_2 = "Can you tell me about important historical events in San Juan?"
response_2 = query_rag_system(user_query_2)
print(f"Answer to Query 2: {response_2}")

# 🔹 Test the RAG system with a query about the municipality of San Juan
user_query_3 = "What are the key features of the municipality of San Juan?"
response_3 = query_rag_system(user_query_3)
print(f"Answer to Query 3: {response_3}")

# 🔹 Test the RAG system with a query about a specific landmark (e.g., Castillo San Felipe del Morro)
user_query_4 = "Tell me more about Castillo San Felipe del Morro."
response_4 = query_rag_system(user_query_4)
print(f"Answer to Query 4: {response_4}")


Answer to Query 1: Two famous landmarks in Puerto Rico are the Archivo General de Puerto Rico, established in 1955, which documents the history and culture of Puerto Rico, and the Letras de Ponce, located in Barrio Sabanetas in Ponce, consisting of five large letters that spell out the name of the city.
Answer to Query 2: One significant historical event in San Juan is the founding of the city by Spanish colonists in 1521. Another important event is the 30th anniversary of the Jones Act approaching, which marked a significant moment in the island's political history.
Answer to Query 3: The key features of the municipality of San Juan include being the capital city and most populous municipality in the Commonwealth of Puerto Rico, with a population of 342,259 as of the 2020 census. San Juan was founded by Spanish colonists in 1521, who originally called it Ciudad de Puerto Rico. Additionally, Old San Juan is a historic district located at the "northwest triangle" of the islet of San Jua

## 🔍 **Evaluation of RAG System's Performance on Multiple Queries**

In this step, we will evaluate the performance of the **Retrieval-Augmented Generation (RAG)** system based on the queries tested. The responses generated for different queries provide insights into the system’s ability to retrieve accurate and contextually relevant information from the Chroma vector store.

### **Key Observations from the Outputs:**
1. **Landmarks in Puerto Rico**: The system returned only two places, the Archivo General de Puerto Rico and the Letras de Ponce. These reflect whichever documents ranked highest in retrieval rather than the island's best-known sites (such as El Morro, which appeared in the earlier San Juan answer), so broad questions may need more retrieved documents or more specific phrasing.
2. **Historical Events in San Juan**: The answer cited the city's founding by Spanish colonists in 1521 and, from a news article, the approaching 30th anniversary of the Jones Act, a milestone in the island's political history.
3. **Municipality of San Juan**: The chatbot highlighted San Juan's status as the capital and most populous municipality, its 2020 census population, and its founding in 1521, combining historical and geographical context.
4. **Castillo San Felipe del Morro**: The system provided a thorough historical description of the Castillo San Felipe del Morro, illustrating the richness of the information stored in the Chroma vector store.

These results demonstrate the effectiveness of combining **retrieval** and **generation** for answering user queries, with the ability to pull in specific historical and cultural information in a coherent and informative way.

---

🔹 **Next Steps**: Continue testing the system with additional, more specific queries and assess whether the RAG system maintains its performance across various topics.


## 🔍 **Expanding Query Testing & Performance Optimization**

In this step, we will test the **Retrieval-Augmented Generation (RAG)** system with new, varied queries to further evaluate its capabilities. These tests will assess how well the system can handle different types of questions and whether the responses remain contextually relevant across a broad range of topics.

Additionally, we'll explore the possibility of optimizing the system by:
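- Adjusting retrieval parameters, such as the number of retrieved documents (`k`) and the search type
- Tuning response generation (for example, the prompt and the `temperature` setting) to improve coherence and relevance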
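One retrieval parameter worth trying is the search type. Besides plain similarity, LangChain's Chroma retriever supports maximal marginal relevance, e.g. `db.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 20})`, which trades a little relevance for diversity and would also push near-duplicate chunks, such as the repeated Old San Juan entries, out of the top results. The pure-Python sketch below shows the greedy MMR selection rule on invented toy vectors; `lambda_mult` plays the same role as LangChain's parameter of that name (1.0 = pure relevance, 0.0 = maximum diversity), but the code is an illustration, not LangChain's implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _mmr_score(name, query, candidates, selected, lambda_mult):
    """lambda_mult * relevance to the query minus (1 - lambda_mult) * max similarity to picks so far."""
    relevance = cosine(query, candidates[name])
    redundancy = max((cosine(candidates[name], candidates[s]) for s in selected), default=0.0)
    return lambda_mult * relevance - (1 - lambda_mult) * redundancy


def mmr_select(query: list[float], candidates: dict[str, list[float]], k: int, lambda_mult: float = 0.5) -> list[str]:
    """Greedily pick k candidates, each maximizing the MMR score given earlier picks."""
    selected: list[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda name: _mmr_score(name, query, candidates, selected, lambda_mult))
        selected.append(best)
        remaining.remove(best)
    return selected


# Invented vectors: two identical Old San Juan chunks and one El Morro chunk.
candidate_vecs = {
    "old_san_juan.txt#1": [0.9, 0.1, 0.0],
    "old_san_juan.txt#2": [0.9, 0.1, 0.0],
    "castillo_san_felipe_del_morro.txt": [0.6, 0.6, 0.1],
}
query_vec = [1.0, 0.2, 0.0]
print(mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5))
print(mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=1.0))
```

With `lambda_mult=1.0` the two duplicates fill the top two slots; at 0.5 the second duplicate is penalized for redundancy and El Morro takes its place.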

### **Next Testing Areas:**
1. **Cultural and Festival Queries**: Test the system’s response to questions about Puerto Rican festivals, traditional music, and notable cultural events.
2. **Historical Figures**: Ask about famous Puerto Rican figures, such as political leaders, artists, and historical heroes.
3. **Geographical Queries**: Query the system about Puerto Rico's towns and natural landmarks beyond the typical tourist spots.

---

🔹 **Let's continue testing and optimizing the RAG system! 🚀**


In [None]:
# 🔹 Test the RAG system with a query about Puerto Rican festivals
user_query_1 = "What are some popular festivals in Puerto Rico?"
response_1 = query_rag_system(user_query_1)
print(f"Answer to Query 1: {response_1}")

# 🔹 Test the RAG system with a query about famous Puerto Rican artists
user_query_2 = "Can you tell me about famous Puerto Rican artists?"
response_2 = query_rag_system(user_query_2)
print(f"Answer to Query 2: {response_2}")

# 🔹 Test the RAG system with a query about a lesser-known Puerto Rican town
user_query_3 = "What can you tell me about the town of Adjuntas?"
response_3 = query_rag_system(user_query_3)
print(f"Answer to Query 3: {response_3}")

# 🔹 Test the RAG system with a query about natural landmarks in Puerto Rico
user_query_4 = "What are the natural landmarks in Puerto Rico?"
response_4 = query_rag_system(user_query_4)
print(f"Answer to Query 4: {response_4}")


Answer to Query 1: Some popular festivals in Puerto Rico include the Aibonito Festival of Flowers, which is a yearly festival celebrating flowers, fruit, and vegetables with music, food, and arts and crafts. Another popular event is the San Sebastián Street Festival in San Juan, which is a vibrant and colorful celebration featuring music, dance, food, and parades.
Answer to Query 2: One of the famous Puerto Rican artists is Francisco Manuel Oller y Cestero, who was a painter and played a role in the development of Impressionism. He is recognized as one of the most distinguished transatlantic painters of his time and helped transform painting in the Caribbean.
Answer to Query 3: Adjuntas is a small mountainside town and municipality in Puerto Rico, located in the central midwestern portion of the island on the Cordillera Central. It is north of Yauco, Guayanilla, and Peñuelas; southeast of Utuado; east of Lares and Yauco; and northwest of Ponce. Adjuntas is spread over 16 barrios and Ad

## 🔍 **Evaluation of RAG System on Expanded Query Set**

In this step, we tested the **Retrieval-Augmented Generation (RAG)** system with a set of varied queries that covered a broad range of topics, including **festivals**, **Puerto Rican artists**, **lesser-known towns**, and **natural landmarks**. These tests allowed us to further assess how well the system retrieves and generates contextually relevant responses.

### **Key Observations from the Outputs:**
1. **Puerto Rican Festivals**: The system successfully retrieved information on popular Puerto Rican festivals like the Aibonito Festival of Flowers and the San Sebastián Street Festival, providing a good overview of cultural events.
2. **Famous Puerto Rican Artists**: The system described Francisco Manuel Oller y Cestero and his role in the development of Impressionism, showing that the chatbot can handle queries about historical figures, although it named only one artist.
3. **Town of Adjuntas**: The system retrieved detailed geographic information about the small town of Adjuntas (its location on the Cordillera Central, its neighboring municipalities, and its barrios), showing that it can answer queries about less touristy locations. However, the answer was cut off mid-word ("...16 barrios and Ad").
4. **Natural Landmarks in Puerto Rico**: The chatbot provided information about natural landmarks, including the Laguna Tortuguero Natural Reserve, demonstrating its ability to handle geography-related queries.

### **Takeaways:**
- The system handled a diverse set of queries with accuracy and relevance, demonstrating the power of combining **retrieval** and **generation**.
- The chatbot's performance is promising, showing that it can offer insightful and context-aware answers across different topics related to Puerto Rico.
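The observations above come from reading the printed answers. A minimal sketch of how two of those checks could be automated: flagging an answer that stops mid-sentence (as the Adjuntas answer does) and listing expected keywords that an answer never mentions. The helper names, the keyword lists, and the punctuation heuristic are assumptions for illustration; the answers are excerpts of the outputs printed above.

```python
# Hypothetical helpers for spot-checking RAG answers; assumes each answer is a plain string.

def looks_truncated(answer: str) -> bool:
    """True if the answer does not end with sentence-final punctuation (a hint it was cut off)."""
    return not answer.rstrip().endswith((".", "!", "?"))

def missing_keywords(answer: str, expected: list) -> list:
    """Return the expected keywords (case-insensitive) that never appear in the answer."""
    text = answer.lower()
    return [kw for kw in expected if kw.lower() not in text]

# Excerpts of two answers printed above ("..." marks text omitted here for brevity)
answers = {
    "festivals": "Some popular festivals in Puerto Rico include the Aibonito Festival of Flowers ... "
                 "Another popular event is the San Sebastián Street Festival in San Juan, which is a "
                 "vibrant and colorful celebration featuring music, dance, food, and parades.",
    "adjuntas": "Adjuntas is a small mountainside town and municipality in Puerto Rico, located in the "
                "central midwestern portion of the island on the Cordillera Central. ... "
                "Adjuntas is spread over 16 barrios and Ad",
}

# Hypothetical keyword expectations for each query
expected = {
    "festivals": ["Aibonito", "San Sebastián"],
    "adjuntas": ["Cordillera Central", "barrios"],
}

for name, answer in answers.items():
    print(f"{name}: truncated={looks_truncated(answer)}, missing={missing_keywords(answer, expected[name])}")
# festivals: truncated=False, missing=[]
# adjuntas: truncated=True, missing=[]
```

The punctuation test is only a heuristic (a correct answer could end with a list or a name), so it is best used to flag answers for a human to re-read.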

---

🔹 **Next Steps**: Continue optimizing the system by refining the parameters and testing on new topics to ensure its robustness for a wide range of user queries.


## **🔍 Evaluation and Reflection on RAG System Performance**

### **Evaluation of Query Handling**:
The **Retrieval-Augmented Generation (RAG)** system demonstrated its ability to retrieve and generate contextually relevant answers to a variety of queries. We tested the system across different types of questions related to **landmarks**, **historical events**, **cultural practices**, and **geographical features**. In each case, the system performed well, retrieving relevant documents from the **Chroma vector store** and using them to generate coherent, informative responses.

### **Key Findings**:
1. **Accuracy of Responses**:
   - The system provided **factually accurate** and **contextually relevant** answers. For example, when asked about landmarks in San Juan, it correctly listed well-known locations like El Morro and San Juan Cathedral.
2. **Context-Awareness**:
   - The RAG system showed a strong ability to maintain **contextual relevance**, whether answering questions about cultural festivals, historical figures, or specific locations in Puerto Rico.
3. **Handling of Diverse Queries**:
   - The system was able to handle a **wide range of queries**, from historical facts to geographical details, showcasing its versatility.

### **System Limitations**:
1. **Edge Cases**:
   - Some less common or very specific queries may require further testing to ensure the system’s robustness across all possible inputs. For example, obscure historical events or lesser-known local figures might pose a challenge.
2. **Response Creativity**:
   - While the system excels at factual retrieval, certain types of creative or opinion-based queries might require further fine-tuning to improve **response variability**.

---

## **🔄 Reflection on the Process**

### **Strengths of the RAG System**:
- **Combination of Retrieval and Generation**: The integration of document retrieval and response generation allowed the system to provide **rich, context-aware answers**.
- **Efficient Use of Chroma Vector Store**: Storing and retrieving documents via the Chroma vector store was effective for fast document retrieval based on semantic similarity.
- **OpenAI’s GPT Model**: Leveraging the GPT-3.5-turbo model contributed to high-quality text generation, providing coherent and relevant answers to user queries.

### **Areas for Improvement**:
- **Query Processing**: Further optimization of query processing could improve the system's speed and efficiency. Different `k` values (the number of documents retrieved per query) are also worth exploring.
- **Fine-tuning for Specific Domains**: The system can be fine-tuned to handle specific domains more effectively, such as cultural or historical queries, by further training on domain-specific data.
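To make the `k` experiment concrete, here is a minimal, self-contained sketch of what `k` controls in retrieval: documents are ranked by similarity to the query embedding and only the top `k` are passed on to the generator. Chroma ranks documents by a configurable embedding distance; this toy uses cosine similarity and made-up 3-dimensional vectors, both assumptions for illustration. `query_rag_system` is not shown in this section, so where `k` is set there is also an assumption: in Chroma's own client the equivalent parameter is `n_results` of `collection.query`, and LangChain retrievers usually take `search_kwargs={"k": ...}`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k(query_vec, docs, k):
    """Return the ids of the k documents most similar to the query, most similar first."""
    ranked = sorted(docs, key=lambda doc_id: cosine(query_vec, docs[doc_id]), reverse=True)
    return ranked[:k]

# Toy "embeddings" (hypothetical); real embedding models produce hundreds of dimensions
docs = {
    "el_morro": [0.9, 0.1, 0.0],
    "san_juan_cathedral": [0.7, 0.4, 0.1],
    "laguna_tortuguero": [0.1, 0.9, 0.2],
    "adjuntas": [0.1, 0.3, 0.9],
}
query = [0.85, 0.2, 0.05]  # stands in for a query like "landmarks in San Juan"

for k in (1, 2, 4):
    print(k, top_k(query, docs, k))
```

A small `k` keeps the prompt focused but may miss useful context; a large `k` adds context but also adds loosely related documents (here, `adjuntas` for a San Juan question) and more prompt tokens.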

---

## **📈 Next Steps**:
1. **Extending the Dataset**: To further improve the system’s performance, consider adding more datasets to the Chroma vector store (e.g., more historical documents, local news articles, etc.).
2. **Performance Evaluation**: Test the system with **real users** or edge cases to gauge how well it handles complex, ambiguous, or contradictory queries.
3. **UI Development**: Integrating this system into a user interface (e.g., a web or mobile app) for easier user interaction would be a natural next step to expand its usability.


## 🔑 **Weather Forecast Function**

In this step, we are going to implement the function `find_weather_forecast(date, location)`. This function will fetch the weather forecast for the specified date and location using the OpenWeather API.

This is a key function for the chatbot to evaluate whether the user should visit a location based on weather conditions during their planned travel dates.

---

🔹 **Let’s implement the weather forecast function! 🚀**


In [None]:
import requests

# 🔹 Load the OpenWeather API key from the .txt file
api_key_path = "/content/drive/My Drive/Colab Notebooks/Ironhack/Week 9/project-dsml-interactive-travel-planner/API keys/openweather_api_key.txt"

with open(api_key_path, "r") as file:
    openweather_api_key = file.read().strip()

# 🔹 Function to get the weather forecast for a specific location and date using OpenWeather API
def find_weather_forecast(date, location):
    base_url = "https://api.openweathermap.org/data/2.5/forecast"

    # Request the 5-day / 3-hour forecast for the location
    params = {
        'q': location,   # Location name (e.g., 'San Juan, PR')
        'appid': openweather_api_key,  # API key loaded from file
        'units': 'metric',  # Temperatures in Celsius (converted to Fahrenheit below)
        'cnt': '40',  # Number of 3-hour forecasts to fetch (8 per day for 5 days)
    }

    def c_to_f(celsius):
        return (celsius * 9/5) + 32

    try:
        response = requests.get(base_url, params=params, timeout=10)  # Avoid hanging on a slow connection
    except requests.RequestException:
        return "Error fetching data from OpenWeather API."

    if response.status_code == 200:
        data = response.json()

        # Return the first 3-hour forecast on the requested date (format "YYYY-MM-DD").
        # 'dt_txt' is in UTC, so for a future date this is the 00:00 UTC slot (8:00 PM the previous evening in Puerto Rico).
        for forecast in data['list']:
            if forecast['dt_txt'].startswith(date):
                main = forecast['main']
                # Return the data along with the units
                return {
                    'temp': {'value': c_to_f(main['temp']), 'unit': '°F'},
                    'feels_like': {'value': c_to_f(main['feels_like']), 'unit': '°F'},
                    'temp_min': {'value': c_to_f(main['temp_min']), 'unit': '°F'},
                    'temp_max': {'value': c_to_f(main['temp_max']), 'unit': '°F'},
                    'pressure': {'value': main['pressure'], 'unit': 'hPa'},
                    'sea_level': {'value': main['sea_level'], 'unit': 'hPa'},
                    'grnd_level': {'value': main['grnd_level'], 'unit': 'hPa'},
                    'humidity': {'value': main['humidity'], 'unit': '%'},
                    'weather': forecast['weather'][0]['description']
                }

        return "No forecast available for this date."
    else:
        return "Error fetching data from OpenWeather API."

# Example usage: Get weather forecast for San Juan, PR on a specific date
date = "2025-02-14"
location = "San Juan, PR"
weather_info = find_weather_forecast(date, location)
print(weather_info)


{'temp': {'value': 80.618, 'unit': '°F'}, 'feels_like': {'value': 84.434, 'unit': '°F'}, 'temp_min': {'value': 79.844, 'unit': '°F'}, 'temp_max': {'value': 80.618, 'unit': '°F'}, 'pressure': {'value': 1022, 'unit': 'hPa'}, 'sea_level': {'value': 1022, 'unit': 'hPa'}, 'grnd_level': {'value': 1015, 'unit': 'hPa'}, 'humidity': {'value': 73, 'unit': '%'}, 'weather': 'scattered clouds'}


## 🔍 **Weather Forecast Information**

In this step, we queried the **OpenWeather API** to fetch the weather forecast for the specified location (**San Juan, PR**) and date (**February 14, 2025**).

The retrieved forecast includes detailed weather information, such as the temperature, humidity, pressure, and the weather description. Additionally, the temperature has been converted to **Fahrenheit** for easier interpretation.

### **Weather Forecast Output**:
- **Temperature**: 80.62°F
- **Feels Like**: 84.43°F
- **Minimum Temperature**: 79.84°F
- **Maximum Temperature**: 80.62°F
- **Pressure**: 1022 hPa
- **Sea Level Pressure**: 1022 hPa
- **Ground Level Pressure**: 1015 hPa
- **Humidity**: 73%
- **Weather Description**: Scattered clouds

This weather data can now be used to determine if the location is suitable for travel based on the user's preferences and the forecasted weather conditions.
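As written, `find_weather_forecast` returns the first 3-hour slot whose `dt_txt` starts with the requested date. OpenWeather reports `dt_txt` in UTC, so for a future date that is the 00:00 UTC slot, i.e. 8:00 PM of the previous evening in Puerto Rico. A minimal sketch of an alternative that picks the slot closest to local noon, run on a small hypothetical response instead of the live API; `pick_midday_slot`, `c_to_f`, and the sample values are illustrative assumptions. The same conversion formula also explains the figures above: 80.618°F corresponds to 27.01°C returned by the API in metric units.

```python
from datetime import datetime, timedelta

# Puerto Rico stays on Atlantic Standard Time (UTC-4) all year; it has no daylight saving time
PR_UTC_OFFSET = timedelta(hours=-4)

def c_to_f(celsius):
    """Celsius to Fahrenheit, the same formula find_weather_forecast() applies."""
    return celsius * 9 / 5 + 32

def pick_midday_slot(forecast_list, local_date, target_hour=12):
    """Return the 3-hour slot closest to target_hour (Puerto Rico time) on local_date, or None.

    forecast_list mimics data['list'] from the forecast endpoint: each entry has a
    'dt_txt' string in UTC ("YYYY-MM-DD HH:MM:SS") plus 'main' and 'weather' fields.
    """
    target = datetime.strptime(local_date, "%Y-%m-%d") + timedelta(hours=target_hour)
    best, best_gap = None, None
    for entry in forecast_list:
        local_time = datetime.strptime(entry["dt_txt"], "%Y-%m-%d %H:%M:%S") + PR_UTC_OFFSET
        if local_time.date() != target.date():
            continue  # this slot belongs to a different local day
        gap = abs(local_time - target)
        if best_gap is None or gap < best_gap:
            best, best_gap = entry, gap
    return best

# Hypothetical three-slot response (values invented for illustration, not real API output)
sample = [
    {"dt_txt": "2025-02-14 00:00:00", "main": {"temp": 25.0}, "weather": [{"description": "few clouds"}]},
    {"dt_txt": "2025-02-14 15:00:00", "main": {"temp": 28.5}, "weather": [{"description": "scattered clouds"}]},
    {"dt_txt": "2025-02-14 18:00:00", "main": {"temp": 28.0}, "weather": [{"description": "light rain"}]},
]

slot = pick_midday_slot(sample, "2025-02-14")
print(slot["dt_txt"], round(c_to_f(slot["main"]["temp"]), 1), slot["weather"][0]["description"])
# 2025-02-14 15:00:00 83.3 scattered clouds  (15:00 UTC is 11:00 AM in Puerto Rico)
```

Noon is an arbitrary choice; a trip planner might instead average all daytime slots or report the wettest one.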

---

🔹 **Next Steps**: Incorporate the weather forecast function into the chatbot's decision-making process to suggest whether or not to visit certain locations based on the forecasted weather.


## 🔍 **Integrating Weather Forecast with Chatbot's Decision-Making**

In this step, we will integrate the **weather forecast function** with the chatbot's **decision-making process**. The chatbot will ask the user for the travel dates and location, retrieve the weather forecast, and evaluate whether the weather conditions are favorable for the user to visit the location.

This will allow the chatbot to suggest whether or not a location is suitable for a visit based on the forecasted weather and provide recommendations based on user preferences.

---

🔹 **Let's implement the weather-aware decision-making process! 🚀**


In [None]:
# 🔹 Function to make weather-based travel recommendations
def recommend_travel_based_on_weather(date, location, user_preferences):
    # Get the weather forecast for the location and date
    weather_info = find_weather_forecast(date, location)

    # Check if the weather information was successfully retrieved
    if isinstance(weather_info, dict):
        # Extract weather details
        temperature = weather_info['temp']['value']
        feels_like = weather_info['feels_like']['value']
        weather_description = weather_info['weather']

        # Decision logic based on user preferences (if/elif chain: only the first matching rule applies)
        if 'hot' in user_preferences and temperature > 80:
            recommendation = f"The temperature is {temperature}°F, which is quite hot. If you're looking for cooler weather, you might want to consider another location."
        elif 'rainy' in user_preferences and "rain" in weather_description.lower():
            recommendation = f"It looks rainy, with {weather_description} expected. If you don't mind the rain, you can still visit, but be prepared for wet conditions."
        else:
            recommendation = f"The weather looks fine with {weather_description}. Temperature is {temperature}°F and feels like {feels_like}°F."

        return recommendation
    else:
        return "Sorry, there was an issue retrieving the weather data."

# Example usage: Recommend whether to visit San Juan, PR on February 14, 2025
date = "2025-02-14"
location = "San Juan, PR"
user_preferences = ['hot', 'rainy']  # Example user preferences
recommendation = recommend_travel_based_on_weather(date, location, user_preferences)
print(recommendation)


The temperature is 80.618°F, which is quite hot. If you're looking for cooler weather, you might want to consider another location.


## 🔍 **Weather-Aware Travel Recommendation Output**

In this step, we integrated the **weather forecast function** into the chatbot's **decision-making process**. Based on the user's preferences and the weather forecast, the chatbot provided a recommendation on whether a specific location is suitable for visiting on a given date.

### **User Intention**:
The user’s intention was to inquire about the **weather conditions** for a specific location (**San Juan, PR**) on a **future date** (**February 14, 2025**). Additionally, the user provided their **travel preferences**—in this case, the desire for **hot** weather and **rainy** conditions.

By providing these preferences, the user aimed to receive a tailored recommendation that accounts for both the **weather forecast** and their specific comfort levels (e.g., whether they would prefer to avoid rain or enjoy hotter temperatures).

### **Output for Travel Recommendation**:
The recommendation for visiting **San Juan, PR** on **February 14, 2025** was based on this forecast:

- **Weather**: Scattered clouds
- **Temperature**: 80.62°F
- **Feels Like**: 84.43°F

The recommendation is:  
"The temperature is 80.618°F, which is quite hot. If you're looking for cooler weather, you might want to consider another location."

### **Explanation**:
- The system took the **user preferences** into account:
  - **"Hot" preference**: The temperature (80.62°F) exceeds the function's 80°F threshold, so the first rule fired.
  - **"Rainy" preference**: Because the rules form an `if`/`elif` chain, this check was never reached; in any case, the forecast (scattered clouds) contained no rain.
- The wording of the "hot" rule suggests a cooler location even though this user asked for hot weather, so the message reads as a warning rather than a confirmation of the preference.

This example shows how the **user preferences** and the forecast together determine the message, and that each rule's wording needs to reflect what the user actually asked for.

---

🔹 **Next Steps**: Test the function with other user preferences (e.g., sunny weather) and locations, and integrate this decision-making into the full chatbot interaction flow.
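Because `recommend_travel_based_on_weather` uses an `if`/`elif` chain, only the first matching rule is reported, and "sunny" has no rule at all. A minimal sketch of an alternative that checks every stated preference and reports each one; the helper name `match_preferences`, the 80°F threshold (taken from the function above), and treating "clear" descriptions as sunny are assumptions. The input dict mirrors the shape returned by `find_weather_forecast`, using the values printed earlier for San Juan.

```python
def match_preferences(weather_info, user_preferences, hot_threshold_f=80):
    """Check every stated preference against the forecast.

    Returns {preference: True/False}, or None for a preference the rules do not understand.
    weather_info is assumed to have the shape returned by find_weather_forecast().
    """
    temperature = weather_info["temp"]["value"]
    description = weather_info["weather"].lower()
    rules = {
        "hot": temperature > hot_threshold_f,   # same 80°F threshold as the function above
        "rainy": "rain" in description,
        "sunny": "clear" in description,        # assumption: "clear sky" is the sunny case
    }
    return {pref: rules.get(pref) for pref in user_preferences}

# Values from the San Juan forecast printed earlier (80.618°F, scattered clouds)
forecast = {"temp": {"value": 80.618, "unit": "°F"}, "weather": "scattered clouds"}
result = match_preferences(forecast, ["hot", "rainy"])
print(result)  # {'hot': True, 'rainy': False}

met = [p for p, ok in result.items() if ok]
print(f"Matches your preferences: {', '.join(met) or 'none'}")  # Matches your preferences: hot
```

Returning a per-preference result keeps the decision logic separate from the wording, so the chatbot can confirm the preferences that match and mention the ones that do not.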


## 🔍 **Integrating Travel Recommendations into Chatbot Flow**

In this step, we will integrate the **weather-based travel recommendation** function into the overall **chatbot flow**. This will allow the chatbot to interact with the user by asking for the travel date, destination, and weather preferences. Based on these details, the chatbot will fetch the weather forecast, analyze it, and provide a personalized travel recommendation.

The goal is to create an interactive experience where the chatbot not only answers queries about locations but also guides users based on weather conditions that match their preferences.

---

🔹 **Let’s integrate the weather-aware travel recommendation into the chatbot interaction! 🚀**


In [None]:
# 🔹 Function to handle user interaction for weather-based travel recommendations
def chatbot_interaction():
    print("Welcome to the Puerto Rico Travel Chatbot! Let's plan your trip.")

    # Get the user's travel date and location
    date = input("Please enter your travel date (YYYY-MM-DD): ")
    location = input("Please enter your destination (e.g., 'San Juan, PR'): ")

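    # Get the user's weather preferences (comma-separated; extra spaces are ignored)
    raw_preferences = input("What kind of weather do you prefer? (e.g., 'hot', 'rainy', 'sunny'): ")
    user_preferences = [p.strip() for p in raw_preferences.lower().split(",") if p.strip()]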

    # Get the weather forecast and generate a recommendation
    recommendation = recommend_travel_based_on_weather(date, location, user_preferences)

    print("\n--- Travel Recommendation ---")
    print(recommendation)

# Example usage: Start the chatbot interaction
chatbot_interaction()


Welcome to the Puerto Rico Travel Chatbot! Let's plan your trip.
Please enter your travel date (YYYY-MM-DD): 2025-02-15
Please enter your destination (e.g., 'San Juan, PR'): culebra
What kind of weather do you prefer? (e.g., 'hot', 'rainy', 'sunny'): sunny

--- Travel Recommendation ---
The weather looks fine with light rain. Temperature is 77.72°F and feels like 78.71°F.


## 🔍 **Testing the Chatbot's Travel Recommendation System**

In this step, we tested the chatbot's **weather-aware travel recommendation** system. The chatbot asked for the **user's travel date**, **destination**, and **weather preferences**, then it retrieved the weather forecast for the specified location and date, evaluated the user's preferences, and provided a tailored recommendation based on the forecasted weather.

### **Test Case Example**:
- **User Input**:
  - Travel Date: 2025-02-15
  - Destination: Culebra
  - Weather Preference: Sunny
- **Weather Forecast**: The system retrieved the forecast for Culebra on February 15, 2025: light rain, 77.72°F (feels like 78.71°F).

### **Key Observations**:
- The chatbot successfully asked for the necessary information and returned a recommendation with the temperature and the forecasted conditions.
- The "sunny" preference was not actually evaluated: `recommend_travel_based_on_weather` only has rules for "hot" and "rainy", so it fell back to its generic message ("The weather looks fine with light rain..."), even though light rain does not match a preference for sunny weather.

---

🔹 **Next Steps**:
- Test the system with a variety of queries and ensure that the recommendations align with the user's preferences.
- Consider refining the logic to handle more specific preferences or edge cases (e.g., user input errors).
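Interactive cells such as `chatbot_interaction()` pause for typed input, which makes repeated testing tedious. A minimal sketch of scripting the answers with the standard library's `unittest.mock.patch`, which temporarily replaces `builtins.input` with a stub that returns one prepared answer per call. `ask_trip_details` is a hypothetical, simplified stand-in for the prompts above and does not call the weather API.

```python
from unittest.mock import patch

def ask_trip_details():
    """Hypothetical, simplified stand-in for the prompts in chatbot_interaction()."""
    date = input("Please enter your travel date (YYYY-MM-DD): ").strip()
    location = input("Please enter your destination (e.g., 'San Juan, PR'): ").strip()
    raw = input("What kind of weather do you prefer? (e.g., 'hot', 'rainy', 'sunny'): ")
    # Split on commas and tolerate missing or extra spaces ("hot,sunny" or "hot , sunny")
    preferences = [p.strip() for p in raw.lower().split(",") if p.strip()]
    return date, location, preferences

# Each item in side_effect is returned by one successive input() call, so nothing is typed
with patch("builtins.input", side_effect=["2025-02-15", "culebra", "Sunny,rainy"]):
    details = ask_trip_details()

print(details)  # ('2025-02-15', 'culebra', ['sunny', 'rainy'])
```

The same `patch(...)` wrapper could be placed around `chatbot_interaction()` itself to replay a whole conversation, with the caveat that it would then also make a live OpenWeather request.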


## 🔍 **Evaluation and Testing of Chatbot with Weather Recommendations**

In this step, we will evaluate the chatbot's **weather-aware travel recommendation** system by testing a variety of queries to assess its functionality, robustness, and how well it aligns with user preferences.

### **Test Case Example**:
- **User Input**:
  - Travel Date: 2025-02-15
  - Destination: Culebra
  - Weather Preference: Sunny
- **Expected Output**: The system will check if the weather forecast aligns with the "sunny" preference and provide a recommendation based on the forecasted conditions (e.g., light rain).

### **Key Observations**:
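- The system accepted the "sunny" preference and returned the forecast data, but the basic recommendation function does not check "sunny" against the forecast, which is one motivation for the refined function below.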
- A variety of test cases will ensure the chatbot can handle edge cases, such as invalid input, unusual preferences (e.g., "I prefer snow"), or ambiguous weather requests.

### **Next Steps**:
- **Refining the chatbot**: We will continue to refine the logic, especially for complex queries or unrecognized user inputs.
- **Error handling**: Add error handling for invalid locations or unclear preferences.


In [None]:
# 🔹 Function to handle more complex preferences and edge cases
def refined_recommend_travel_based_on_weather(date, location, user_preferences):
    # Get the weather forecast for the location and date
    weather_info = find_weather_forecast(date, location)

    if isinstance(weather_info, dict):
        # Extract weather details
        temperature = weather_info['temp']['value']
        feels_like = weather_info['feels_like']['value']
        weather_description = weather_info['weather']

        # Rules are checked in order (hot, rainy, sunny); the first one that matches the forecast wins
        if 'hot' in user_preferences and temperature > 85:
            recommendation = f"The temperature is {temperature}°F, which is very hot, just as you prefer!"
        elif 'rainy' in user_preferences and "rain" in weather_description.lower():
            recommendation = f"Expect rainy weather with {weather_description}. You might want to prepare for wet conditions."
        # OpenWeather reports sunny conditions as "clear sky" (descriptions do not use "sun"), so match "clear"
        elif 'sunny' in user_preferences and "clear" in weather_description.lower():
            recommendation = f"The weather looks sunny with {weather_description}. Temperature is {temperature}°F, perfect for your preference!"
        else:
            recommendation = f"The weather looks fine with {weather_description}. Temperature is {temperature}°F and feels like {feels_like}°F."

        return recommendation
    else:
        return "Sorry, there was an issue retrieving the weather data."

# Example of refined function usage
date = "2025-02-15"
location = "Culebra, PR"
user_preferences = ['sunny', 'rainy']  # Multiple preferences
recommendation = refined_recommend_travel_based_on_weather(date, location, user_preferences)
print(recommendation)


Expect rainy weather with light rain. You might want to prepare for wet conditions.


## 🔍 **Refined Weather-Aware Travel Recommendations with Multiple Preferences**

In this step, we refined the **weather-aware travel recommendation** system by considering multiple user preferences. The chatbot now evaluates whether the forecasted weather conditions meet more complex and varied user preferences, such as **sunny**, **rainy**, or **hot** weather.

### **Test Case Example**:
- **User Input**:
  - Travel Date: 2025-02-15
  - Destination: Culebra, PR
  - Weather Preferences: Sunny, Rainy
- **Weather Forecast**: The system retrieves the forecast for Culebra on February 15, 2025, and checks if it matches any of the user's preferences.

### **Key Observations**:
- The system handled multiple preferences (**sunny** and **rainy**) and returned a recommendation based on the actual weather conditions (light rain in this case).
- The rules are checked in a fixed order (hot, then rainy, then sunny), and the first rule that matches both a stated preference and the forecast determines the recommendation.

### **Output Example**:
- **Weather Forecast**: Light rain expected
- **User Preferences**: Sunny, Rainy
- **Recommendation**: "Expect rainy weather with light rain. You might want to prepare for wet conditions."

This refinement allows the chatbot to provide more nuanced travel recommendations based on a variety of weather preferences, making the system more flexible and capable of handling diverse user inputs.

---

🔹 **Next Steps**:
- Test the refined system with more complex and diverse preferences.
- Implement further optimizations, such as handling conflicting preferences or adding more weather conditions.
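One way to add more weather conditions, and to let contradictory preferences such as "sunny" and "rainy" resolve cleanly, is to map each forecast description to a single category first. A minimal sketch, assuming simple keyword rules based on OpenWeather's English descriptions, which report sunny weather as "clear sky" rather than using the word "sun"; `categorize_description` and the category names are hypothetical.

```python
def categorize_description(description):
    """Map an OpenWeather description (e.g. 'light rain') to one coarse category.

    Assumption: simple keyword rules; precipitation is checked first so that
    'thunderstorm with light rain' counts as rainy.
    """
    desc = description.lower()
    if any(word in desc for word in ("rain", "drizzle", "thunderstorm")):
        return "rainy"
    if "clear" in desc:
        return "sunny"
    if "cloud" in desc:
        return "cloudy"
    return "other"  # e.g. mist or haze

# Descriptions that appear in this notebook's outputs, plus "clear sky"
for desc in ["light rain", "scattered clouds", "few clouds", "clear sky"]:
    print(f"{desc!r} -> {categorize_description(desc)}")

# With contradictory preferences, only the one matching the forecast's category survives
preferences = ["sunny", "rainy"]
category = categorize_description("light rain")
print([p for p in preferences if p == category])  # ['rainy']
```

Because each forecast maps to exactly one category, at most one of two contradictory preferences can match, which is the behavior the warning message promises.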


## 🔍 **Integrating Edge Cases and Error Handling into the Travel Recommendation System**

In this step, we will finalize the chatbot by improving its ability to handle edge cases, user errors, and conflicting preferences. By integrating these features, we ensure the chatbot can handle a wide variety of real-world scenarios without breaking or providing irrelevant information.

### **Key Improvements**:
- **Handling invalid locations**: If a user enters an incorrect or unrecognized location, the system shows an error message and asks for a valid location. Input is flexible: locations can be entered with or without ", PR", in any letter case, and with common variations such as unaccented or space-less spellings (e.g., "mayaguez" or "sanjuan").
  
- **Handling contradictory preferences**: For conflicting preferences (e.g., "sunny" and "rainy"), the chatbot shows a warning about the contradiction and then bases its recommendation on whichever condition the forecast actually shows.

- **Handling missing inputs**: The chatbot will prompt the user for missing data (such as a travel date or location) and ensure that all required fields are filled in before proceeding. For example, if the user doesn't provide a valid date or location, the system will prompt them again until valid input is entered.

- **Handling unusual or unrealistic weather preferences**: The chatbot will check for unrealistic weather preferences (e.g., "snow" in Puerto Rico) and provide an appropriate response.

- **Validating and limiting date range**: The chatbot will ensure that the provided travel date is in the correct format (YYYY-MM-DD) and within the next 5 days to avoid out-of-range queries.

---

🔹 **Let’s finalize the chatbot integration with error handling and user feedback! 🚀**

In this finalized version, the chatbot will dynamically guide users, point out input errors, and ask them to correct invalid dates and locations, providing a smooth and user-friendly experience when planning their trip.

### **Example**:
- **Scenario 1**: A user enters an incorrect date format. The chatbot will prompt the user to enter a valid date in the format "YYYY-MM-DD."
- **Scenario 2**: A user inputs an unrecognized location, such as "Atlantis, PR." The chatbot will inform the user and ask them to enter a valid Puerto Rican municipality.
- **Scenario 3**: A user provides contradictory weather preferences (e.g., "sunny" and "rainy"). The chatbot will display a warning and prioritize the weather condition based on the forecast.
- **Scenario 4**: A user misses a required input (like a travel date or location). The chatbot will re-prompt the user until all necessary information is provided.

By handling these edge cases and error scenarios, the chatbot will be better equipped to handle real-world situations and provide useful, accurate travel recommendations.
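The location checks rely on an exact list of municipalities plus a fixed mapping of spelling variants, so a typo such as "sanjun" is still rejected. A minimal sketch of suggesting the nearest known name with the standard library's `difflib.get_close_matches`; the helper name `suggest_municipality`, the shortened candidate list, and the 0.6 cutoff (difflib's default) are assumptions for illustration.

```python
import difflib

# A few canonical names from the municipality list; the full 78-name list would work the same way
municipalities = ["san juan", "san germán", "san lorenzo", "san sebastián", "santa isabel", "ponce", "culebra"]

def suggest_municipality(user_text, candidates=municipalities, cutoff=0.6):
    """Return up to 3 known names that closely resemble user_text, best match first."""
    # Light normalization: lowercase, drop a ", pr" suffix, collapse repeated spaces
    cleaned = " ".join(user_text.lower().replace(", pr", "").split())
    return difflib.get_close_matches(cleaned, candidates, n=3, cutoff=cutoff)

print(suggest_municipality("sanjun"))        # ['san juan']
print(suggest_municipality("Atlantis, PR"))  # []
```

With a suggestion in hand, the chatbot could reply "Did you mean San Juan?" instead of only reporting an error, while names with no close match (like "Atlantis") still fall through to the error message.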


In [None]:
import re
from datetime import datetime, timedelta

# 🔹 List of valid Puerto Rican municipalities (78 municipalities) in lowercase
valid_locations = [
    "adjuntas", "aguada", "aguadilla", "aguas buenas", "aibonito", "añasco", "arecibo", "arroyo", "barceloneta", "barranquitas",
    "bayamón", "cabo rojo", "caguas", "camuy", "canóvanas", "carolina", "cataño", "cayey", "ceiba", "ciales", "cidra", "coamo",
    "comerío", "corozal", "culebra", "dorado", "fajardo", "florida", "guánica", "guayama", "guayanilla", "guaynabo", "gurabo",
    "hatillo", "hormigueros", "humacao", "isabela", "jayuya", "juana díaz", "juncos", "lajas", "lares", "las marías", "las piedras",
    "loíza", "luquillo", "manatí", "maricao", "maunabo", "mayagüez", "moca", "morovis", "naguabo", "naranjito", "orocovis", "patillas",
    "peñuelas", "ponce", "quebradillas", "rincón", "río grande", "sabana grande", "salinas", "san germán", "san juan", "san lorenzo",
    "san sebastián", "santa isabel", "toa alta", "toa baja", "trujillo alto", "utuado", "vega alta", "vega baja", "vieques", "villalba",
    "yabucoa", "yauco"
]

# 🔹 Mapping variations of municipalities with special characters or spaces to their canonical names
location_mapping = {
    "anasco": "añasco", "bayamon": "bayamón", "canovanas": "canóvanas", "catano": "cataño",
    "comerio": "comerío", "guanica": "guánica", "juana diaz": "juana díaz", "las marias": "las marías",
    "loiza": "loíza", "manati": "manatí", "mayaguez": "mayagüez", "penuelas": "peñuelas", "rincon": "rincón",
    "rio grande": "río grande", "san german": "san germán", "san sebastian": "san sebastián", "juanadiaz": "juana díaz",
    "lasmarias": "las marías", "laspiedras": "las piedras", "riogrande": "río grande", "sabanagrande": "sabana grande",
    "sangerman": "san germán", "sanjuan": "san juan", "sanlorenzo": "san lorenzo", "sansebastian": "san sebastián",
    "santaisabel": "santa isabel", "toaalta": "toa alta", "toabaja": "toa baja", "trujilloalto": "trujillo alto",
    "vegaalta": "vega alta", "vegabaja": "vega baja"
}

# 🔹 Function to validate the date format (YYYY-MM-DD) and ensure it's between today and 5 days ahead
def validate_date(date):
    try:
        valid_date = datetime.strptime(date.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None  # Wrong format (or an impossible date such as 2025-02-30)
    today = datetime.today().date()  # Compare calendar dates so that today itself is accepted
    if valid_date < today or valid_date > today + timedelta(days=5):
        return None  # Outside the 5-day forecast window
    return valid_date

# 🔹 Function to normalize and validate location (case-insensitive, check for valid locations)
def validate_location(location):
    # Lowercase, collapse repeated spaces, and drop an optional ", PR" / " PR" suffix
    location = " ".join(location.strip().lower().split())
    location = re.sub(r"[,\s]*\bpr$", "", location)

    # Map unaccented or space-less variants (e.g., 'mayaguez', 'sanjuan') to their canonical names
    location = location_mapping.get(location, location)

    # Check if location is in the list of valid Puerto Rican municipalities
    if location in valid_locations:
        return location.title()  # e.g., 'san juan' -> 'San Juan'

    return None  # If not found, return None

# 🔹 Function to validate weather preferences and handle contradictory preferences (e.g., 'sunny' and 'rainy')
def validate_weather_preferences(user_preferences):
    invalid_preferences = ["snow", "foggy"]
    contradictory_preferences = {"sunny", "rainy"}

    # Check for unrealistic weather preferences
    if any(pref in invalid_preferences for pref in user_preferences):
        print("Error: This weather condition is not typical in Puerto Rico.")
        return False

    # Check for contradictory preferences
    if contradictory_preferences.issubset(set(user_preferences)):
        print("Warning: You’ve selected both sunny and rainy preferences. The system will prioritize the forecasted condition.")

    return True

# 🔹 Function to handle complex preferences and edge cases, including user input errors
def refined_chatbot_interaction_with_error_handling():
    print("Welcome to the Puerto Rico Travel Chatbot! Let's plan your trip.")

    # Keep asking until the date has the YYYY-MM-DD format and falls within the next 5 days
    date = input("Please enter your travel date (YYYY-MM-DD): ").strip()
    while not validate_date(date):
        print("Error: Invalid date. Please enter the date in YYYY-MM-DD format and make sure it's within the next 5 days.")
        date = input("Please enter your travel date (YYYY-MM-DD): ").strip()

    # Keep asking until the destination is a recognized municipality (an empty answer is re-prompted too)
    location = input("Please enter your destination (e.g., 'San Juan, PR'): ")
    validated_location = validate_location(location)
    while not validated_location:
        print(f"Error: The location '{location}' is not recognized. Please enter a valid Puerto Rican municipality.")
        location = input("Please enter your destination (e.g., 'San Juan, PR'): ")
        validated_location = validate_location(location)

    # Keep asking until at least one weather preference is given (comma-separated, spaces ignored)
    prompt = "What kind of weather do you prefer? (e.g., 'hot', 'rainy', 'sunny'): "
    user_preferences = [p.strip() for p in input(prompt).lower().split(",") if p.strip()]
    while not user_preferences:
        print("Error: Please enter at least one weather preference.")
        user_preferences = [p.strip() for p in input(prompt).lower().split(",") if p.strip()]

    # Stop if a preference is unrealistic for Puerto Rico (contradictions only trigger a warning)
    if not validate_weather_preferences(user_preferences):
        return

    # Use the refined recommender (it also handles 'sunny') and append ", PR" so the API looks
    # the name up in Puerto Rico rather than possibly matching a same-named place elsewhere
    recommendation = refined_recommend_travel_based_on_weather(date, f"{validated_location}, PR", user_preferences)

    print("\n--- Travel Recommendation ---")
    print(recommendation)

# Example usage: Start the chatbot interaction
refined_chatbot_interaction_with_error_handling()


Welcome to the Puerto Rico Travel Chatbot! Let's plan your trip.
Please enter your travel date (YYYY-MM-DD): 2025-02-15
Please enter your destination (e.g., 'San Juan, PR'): cabo rojo
What kind of weather do you prefer? (e.g., 'hot', 'rainy', 'sunny'): sunny

--- Travel Recommendation ---
The weather looks fine with few clouds. Temperature is 75.56°F and feels like 76.334°F.


## 🔍 **Recap: Weather Function Implementation and Integration**

In this section, we have successfully implemented and integrated the **weather forecasting function** into the Puerto Rico Travel Chatbot. The function retrieves weather data for a specified location and date from the **OpenWeather API**, processes the data, and provides tailored travel recommendations based on the user's weather preferences.

### **Key Features Implemented:**

1. **Weather Forecast Retrieval**:
   - We created the `find_weather_forecast(date, location)` function that queries the OpenWeather API to fetch the forecast for a given **date** and **location**.
   - The function processes the data and returns detailed information, including:
     - Temperature (converted to **Fahrenheit**).
     - Feels like temperature.
     - Weather description (e.g., sunny, rainy).
     - Humidity and pressure levels.

2. **Weather-Based Travel Recommendations**:
   - The chatbot now dynamically integrates weather data to offer **personalized travel recommendations**.
   - Based on the user's **weather preferences** (e.g., "hot", "sunny", "rainy"), the chatbot suggests whether the destination is suitable or provides alternative suggestions.
   - The system considers contradictory preferences, such as "sunny" and "rainy," and prioritizes one based on the weather forecast.

3. **User Feedback**:
   - The chatbot validates the user's input, prompting for missing or incorrect data (e.g., unrecognized location, invalid date).
   - It checks for unrealistic or contradictory preferences (e.g., "snow" in Puerto Rico) and provides appropriate feedback.
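
The preference checks described in item 3 can be sketched as a pair of small helpers. This is not the notebook's `validate_weather_preferences`, which is defined elsewhere: the helper names, the vocabulary of accepted preferences, the "unrealistic" set, and the list of contradictory pairs are all assumptions for illustration.

```python
# Hypothetical vocabulary; the notebook's own validate_weather_preferences() may use different terms
KNOWN_PREFERENCES = {"hot", "warm", "mild", "sunny", "cloudy", "rainy"}
# Conditions assumed to be unrealistic for Puerto Rico's tropical climate
UNREALISTIC_PREFERENCES = {"snow", "snowy", "freezing"}
# Pairs assumed to contradict each other
CONTRADICTORY_PAIRS = [("sunny", "rainy")]


def parse_preferences(raw: str) -> list:
    """Split comma-separated input, ignoring case, extra spaces, and blank entries."""
    return [item.strip().lower() for item in raw.split(",") if item.strip()]


def check_preferences(prefs: list) -> list:
    """Return human-readable problems with the preferences; an empty list means they are usable."""
    problems = []
    if not prefs:
        problems.append("No weather preferences were given.")
    for pref in prefs:
        if pref in UNREALISTIC_PREFERENCES:
            problems.append(f"'{pref}' is not realistic for Puerto Rico's tropical climate.")
        elif pref not in KNOWN_PREFERENCES:
            problems.append(f"'{pref}' is not a recognized weather preference.")
    for first, second in CONTRADICTORY_PAIRS:
        if first in prefs and second in prefs:
            problems.append(f"'{first}' and '{second}' contradict each other; the forecast will decide which applies.")
    return problems


prefs = parse_preferences("Sunny,  rainy, ")
print(prefs)                     # ['sunny', 'rainy']
print(check_preferences(prefs))  # one problem: the sunny/rainy contradiction
```

Returning a list of problems rather than a single boolean lets the chatbot report every issue at once, for example an unrecognized word and a contradiction in the same answer.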

### **Summary of Functionality**:
- **Location Input**: Users can input a location (e.g., "San Juan, PR"), and the chatbot will validate it using a comprehensive list of valid Puerto Rican municipalities.
- **Date Input**: Users can enter a travel date, and the system ensures it is in YYYY-MM-DD format and falls within the next five days.
- **Weather Preferences**: Users provide weather preferences (e.g., "hot", "rainy"), and the system uses the weather forecast to generate tailored travel advice.
- **Recommendations**: The system checks the forecast and informs users whether the weather conditions align with their preferences, offering helpful advice for preparation.
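
The date rule above can be expressed as a small, testable function. This is a sketch, not the notebook's `validate_date`: the name `is_valid_travel_date`, the inclusive five-day window, and the injectable `today` parameter (which keeps the check deterministic in tests) are assumptions.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def is_valid_travel_date(text: str, today: Optional[date] = None, horizon_days: int = 5) -> bool:
    """Return True if `text` is a YYYY-MM-DD date from today through today + horizon_days (inclusive)."""
    today = today or date.today()
    try:
        travel_date = datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        # Raised for a wrong format ("15/02/2025") and for impossible dates ("2025-02-30")
        return False
    return today <= travel_date <= today + timedelta(days=horizon_days)


# Usage with a fixed "today" so the result does not depend on when the cell runs
reference_day = date(2025, 2, 12)
print(is_valid_travel_date("2025-02-15", today=reference_day))  # True
print(is_valid_travel_date("2025-02-30", today=reference_day))  # False
```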

---

🔹 **Next Steps**:
- Test the complete system with various edge cases, including contradictory preferences and invalid locations.
- Continue refining the weather recommendation logic to handle more complex scenarios and improve user interaction.


## 🗺️ Add Locations to Visit List 📋

In this section, we implement the function **`add_location_to_visit_list`**, which lets the user add locations to their list of places to visit. The function receives the current list and appends the new location, together with its coordinates, unless a location with the same name is already there. This lets the user keep track of the places they are considering.

### **🔍 Function Overview**:
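1. **`add_location_to_visit_list(location_list, location, latitude=None, longitude=None)`**: This function takes the list of locations already selected, the name of a new location, and its optional coordinates. It stores the new location as a dictionary with `name`, `latitude`, and `longitude` keys and appends it to the list, so the user can keep track of the places they want to visit.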

Let's implement the function to add locations to the user's visit list. 🚀


In [None]:
def add_location_to_visit_list(location_list, location, latitude=None, longitude=None):
    """
    This function adds a given location to the list of locations the user plans to visit, along with its coordinates.

    Parameters:
    - location_list (list): The list of locations already selected for the visit.
    - location (str): The new location to be added to the visit list.
    - latitude (float): The latitude of the location.
    - longitude (float): The longitude of the location.

    Returns:
    - list: The updated list of locations.
    """

    # Create a dictionary to store location details, including coordinates
    location_info = {
        'name': location,
        'latitude': latitude,
        'longitude': longitude
    }

    # Check if the location is already in the list to avoid duplicates
    if not any(loc['name'] == location for loc in location_list):  # Check by location name
        location_list.append(location_info)
        print(f"{location} has been added to your visit list.")
    else:
        print(f"{location} is already in your visit list.")

    return location_list

# Example usage: Adding a location with coordinates to the visit list
visit_list = []  # Empty list to start with
new_location = "Flamenco Beach"
latitude = 18.331667
longitude = -65.318056

visit_list = add_location_to_visit_list(visit_list, new_location, latitude, longitude)

# Display the updated visit list with coordinates
print("Current Visit List:")
for loc in visit_list:
    print(f"- {loc['name']} (Lat: {loc['latitude']}, Lon: {loc['longitude']})")


Flamenco Beach has been added to your visit list.
Current Visit List:
- Flamenco Beach (Lat: 18.331667, Lon: -65.318056)


## 🛠️ Handling Edge Cases in Add Location to Visit List 📋

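In this section, we enhance the **`add_location_to_visit_list`** function by adding error handling and addressing edge cases. The enhanced version takes the new location as a single dictionary with `name`, `latitude`, and `longitude` keys. The function should handle situations where: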

1. **Invalid Inputs**: The input parameters (e.g., the location name or coordinates) are not in the correct format.
2. **Duplicate Locations**: The same location is added multiple times to the visit list.
3. **Empty Inputs**: The list or location could be empty or null, and the function should handle these gracefully.
4. **Invalid Coordinates**: The latitude and longitude values should be checked to ensure they are within valid ranges.

Let’s improve the `add_location_to_visit_list` function to handle these edge cases effectively and ensure data consistency. 🚀
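
Item 4 only requires coordinates to fall within the global ranges, which still admits points far from Puerto Rico. A stricter, optional check could also test a bounding box around the archipelago. The box below is a rough approximation (an assumption, rounded outward to cover the main island, Vieques, Culebra, and Mona), not an official boundary, and `coordinate_problem` is a hypothetical helper.

```python
# Approximate bounding box for Puerto Rico (assumed, rounded outward; not an official boundary)
PR_LAT_RANGE = (17.8, 18.6)
PR_LON_RANGE = (-68.0, -65.2)


def is_number(value) -> bool:
    """True for ints and floats; bools are excluded even though bool subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def coordinate_problem(lat, lon):
    """Return a description of what is wrong with (lat, lon), or None if the point looks valid."""
    if not (is_number(lat) and is_number(lon)):
        return "Latitude and longitude must be numbers."
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "Coordinates are outside the valid global range."
    in_lat = PR_LAT_RANGE[0] <= lat <= PR_LAT_RANGE[1]
    in_lon = PR_LON_RANGE[0] <= lon <= PR_LON_RANGE[1]
    if not (in_lat and in_lon):
        return "Coordinates are valid but appear to lie outside Puerto Rico."
    return None


print(coordinate_problem(18.331667, -65.318056))  # None (Flamenco Beach)
print(coordinate_problem(40.7128, -74.0060))      # outside Puerto Rico
```

Returning a message instead of printing keeps the helper reusable: the chatbot can decide whether an out-of-box point is an error or just a warning.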


In [None]:
def add_location_to_visit_list(location_list, location):
    """
    This function adds a given location to the list of locations the user plans to visit.
    Handles edge cases such as invalid input, duplicate locations, empty inputs, and invalid coordinates.

    Parameters:
    - location_list (list): The list of locations already selected for the visit, including their coordinates.
    - location (dict): The location to be added to the visit list, with 'name', 'latitude', and 'longitude' keys.

    Returns:
    - list: The updated list of locations.
    """

    # Edge case: Check that location_list is a list (checked first so later steps can rely on it)
    if not isinstance(location_list, list):
        print("Error: The visit list should be a list. Please provide a valid list.")
        return location_list

    # Edge case: Check that location is a dictionary containing all required keys
    if not isinstance(location, dict) or not all(key in location for key in ('name', 'latitude', 'longitude')):
        print("Error: Invalid location data. Please provide a location with 'name', 'latitude', and 'longitude'.")
        return location_list

    # Edge case: Validate coordinates (they must be numbers within the valid ranges)
    latitude = location['latitude']
    longitude = location['longitude']

    if not all(isinstance(value, (int, float)) for value in (latitude, longitude)):
        print("Error: Latitude and longitude must be numbers.")
        return location_list

    if not (-90 <= latitude <= 90):
        print(f"Error: Invalid latitude {latitude}. Latitude must be between -90 and 90 degrees.")
        return location_list

    if not (-180 <= longitude <= 180):
        print(f"Error: Invalid longitude {longitude}. Longitude must be between -180 and 180 degrees.")
        return location_list

    # Edge case: Note when this is the first location being added
    if not location_list:
        print("Warning: The visit list is currently empty.")

    # Check if the location already exists in the visit list (case-insensitive) to avoid duplicates
    for loc in location_list:
        if loc['name'].lower() == location['name'].lower():
            print(f"{location['name']} is already in your visit list.")
            return location_list

    # If no duplicates, add the new location to the list
    location_list.append(location)
    print(f"{location['name']} has been added to your visit list.")

    return location_list

# Example usage: Adding a location to the visit list
visit_list = [
    {'name': 'Flamenco Beach', 'latitude': 18.331667, 'longitude': -65.318056}
]
new_location = {'name': 'Jobos Beach', 'latitude': 18.514215, 'longitude': -67.075744}

# Add the location to the list
visit_list = add_location_to_visit_list(visit_list, new_location)

# Display the updated visit list
print("Current Visit List:")
for loc in visit_list:
    print(f"- {loc['name']} (Lat: {loc['latitude']}, Lon: {loc['longitude']})")

# Try adding a duplicate location
duplicate_location = {'name': 'Flamenco Beach', 'latitude': 18.331667, 'longitude': -65.318056}
visit_list = add_location_to_visit_list(visit_list, duplicate_location)


Jobos Beach has been added to your visit list.
Current Visit List:
- Flamenco Beach (Lat: 18.331667, Lon: -65.318056)
- Jobos Beach (Lat: 18.514215, Lon: -67.075744)
Flamenco Beach is already in your visit list.


## 🌍 Compute Distance to the Closest Location in the Visit List 📏

In this section, we implement the **`compute_distance_to_list`** function, which calculates the distance from a new location to the closest location already selected in the user's visit list. This helps the user plan their trip and judge how close the places they want to visit are to one another. The distance is reported in both miles and kilometers for convenience.

### **🔍 Function Overview**:
1. **`compute_distance_to_list(location_list, new_location)`**: This function calculates the distance between the new location and the closest location in the user's current visit list. It uses geopy's **`geodesic`** distance, which measures the shortest path over the surface of the WGS-84 ellipsoid and is accurate at any scale, including the short distances within Puerto Rico. Note that this is a straight-line distance, not a driving distance.

2. **Units**: The function displays the distance in both **miles** and **kilometers** and returns the value in miles.

3. **Refinement**: The distance is calculated in kilometers, then converted to miles using a conversion factor.
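
For a quick, dependency-free sanity check of geopy's result, the haversine formula gives the great-circle distance on a spherical Earth. This is a different (spherical) model from geopy's ellipsoidal `geodesic`, so its results typically differ by up to about 0.5%; the helper name `haversine_km` and the mean-radius constant are assumptions for illustration.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # mean Earth radius; treating Earth as a sphere is an approximation
KM_PER_MILE = 1.609344            # exact length of the international mile


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


# Playa Sucia to Condado Beach, using the coordinates from this section's example
km = haversine_km(17.970000, -67.145833, 18.465500, -66.077000)
print(f"{km:.1f} km / {km / KM_PER_MILE:.1f} miles")
```

For points inside Puerto Rico the spherical and ellipsoidal answers agree closely, which makes this useful for spotting swapped latitude/longitude values rather than as a replacement for `geodesic`.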

Let’s implement the function to compute distances between locations. 🚀


In [None]:
from geopy.distance import geodesic

def compute_distance_to_list(location_list, new_location):
    """
    This function computes the distance from the new location to the closest location in the user's visit list.

    Parameters:
    - location_list (list): The list of locations already selected for the visit, including their coordinates.
    - new_location (dict): The new location with 'name', 'latitude', and 'longitude' keys to calculate the distance to.

    Returns:
    - float or None: The distance in miles to the closest location in the list, or None if the list is empty.
    """

    # Initialize the minimum distance as None
    min_distance = None
    closest_location = None

    # Extract the coordinates of the new location
    new_coords = (new_location['latitude'], new_location['longitude'])

    # Iterate over the existing locations in the visit list
    for loc in location_list:
        # Get the coordinates of the current location
        current_coords = (loc['latitude'], loc['longitude'])

        # Compute the distance between the new location and the current location using geodesic
        distance_km = geodesic(new_coords, current_coords).kilometers

        # Convert distance to miles
        distance_miles = distance_km * 0.621371

        # Track the closest location
        if min_distance is None or distance_miles < min_distance:
            min_distance = distance_miles
            closest_location = loc['name']

    # Report the closest location in both miles and kilometers (the return value is in miles)
    if closest_location:
        print(f"The closest location to {new_location['name']} is {closest_location}, located {min_distance:.2f} miles or {min_distance * 1.60934:.2f} kilometers away.")
    else:
        print("No locations in the visit list to compare distances.")

    return min_distance

# Example usage: Computing the distance to the closest location in the visit list (in miles and kilometers)
location_list = [
    {'name': 'El Yunque National Forest', 'latitude': 18.319167, 'longitude': -65.760000},
    {'name': 'Condado Beach', 'latitude': 18.465500, 'longitude': -66.077000}
]
new_location = {'name': 'Playa Sucia', 'latitude': 17.970000, 'longitude': -67.145833}

# Compute the distance
distance = compute_distance_to_list(location_list, new_location)


The closest location to Playa Sucia is Condado Beach, located 78.08 miles or 125.65 kilometers away.


## 🛠️ Handling Edge Cases in Compute Distance to List 📏

In this section, we enhance the **`compute_distance_to_list`** function by adding error handling and addressing edge cases. The function should handle situations where:

1. **Invalid Inputs**: The input parameters (e.g., the `location_list` or `new_location`) are not in the correct format.
2. **Invalid Coordinates**: The latitude and longitude values should be checked to ensure they are within valid ranges.
3. **Empty or Invalid List**: The function checks if the visit list is empty or not a valid list.
4. **Missing Location Data**: Ensure that each location in the list has the necessary data (latitude, longitude, name) before performing calculations.

Let’s improve the `compute_distance_to_list` function to handle these edge cases effectively and ensure data consistency. 🚀


In [None]:
from geopy.distance import geodesic

def compute_distance_to_list(location_list, new_location):
    """
    This function computes the distance from the new location to the closest location in the user's visit list.

    Parameters:
    - location_list (list): The list of locations already selected for the visit, including their coordinates.
    - new_location (dict): The new location with 'name', 'latitude', and 'longitude' keys to calculate the distance to.

    Returns:
    - float or None: The distance in miles to the closest valid location in the list, or None if it cannot be computed.
    """

    # Edge case: Check if location_list is a valid list
    if not isinstance(location_list, list):
        print("Error: The visit list should be a list. Please provide a valid list.")
        return None

    # Edge case: Check if new_location is a valid dictionary with required keys
    if not isinstance(new_location, dict) or 'name' not in new_location or 'latitude' not in new_location or 'longitude' not in new_location:
        print("Error: Invalid location data. The location must contain 'name', 'latitude', and 'longitude'.")
        return None

    # Edge case: Validate new_location coordinates (they must be numbers within the valid ranges)
    latitude = new_location['latitude']
    longitude = new_location['longitude']

    if not all(isinstance(value, (int, float)) for value in (latitude, longitude)):
        print("Error: Latitude and longitude must be numbers.")
        return None

    if not (-90 <= latitude <= 90):
        print(f"Error: Invalid latitude {latitude}. Latitude must be between -90 and 90 degrees.")
        return None

    if not (-180 <= longitude <= 180):
        print(f"Error: Invalid longitude {longitude}. Longitude must be between -180 and 180 degrees.")
        return None

    # Initialize the minimum distance as None
    min_distance = None
    closest_location = None

    # Extract the coordinates of the new location
    new_coords = (latitude, longitude)

    # Iterate over the existing locations in the visit list
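    for loc in location_list:
        # Skip entries that are not dictionaries or lack a name or coordinates
        # (reading the name with .get() avoids a KeyError when the name itself is missing)
        if not isinstance(loc, dict) or not all(key in loc for key in ('name', 'latitude', 'longitude')):
            name = loc.get('name', '<unnamed>') if isinstance(loc, dict) else repr(loc)
            print(f"Error: Invalid location data for {name}. Skipping this location.")
            continue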

        current_coords = (loc['latitude'], loc['longitude'])

        # Compute the distance between the new location and the current location using geodesic
        distance_km = geodesic(new_coords, current_coords).kilometers

        # Convert distance from kilometers to miles
        distance_miles = distance_km * 0.621371

        # Track the closest location
        if min_distance is None or distance_miles < min_distance:
            min_distance = distance_miles
            closest_location = loc['name']

    # Report the closest location in both miles and kilometers (the return value is in miles)
    if closest_location:
        print(f"The closest location to {new_location['name']} is {closest_location}, located {min_distance:.2f} miles or {min_distance / 0.621371:.2f} kilometers away.")
    else:
        print("No valid locations in the visit list to compare distances.")

    return min_distance

# Example usage: Computing the distance to the closest location in the visit list (in miles and kilometers)
location_list = [
    {'name': 'El Yunque National Forest', 'latitude': 18.319167, 'longitude': -65.760000},
    {'name': 'Condado Beach', 'latitude': 18.465500, 'longitude': -66.077000}
]
new_location = {'name': 'Playa Sucia', 'latitude': 17.970000, 'longitude': -67.145833}

# Compute the distance (the function returns None when no valid comparison is possible)
distance = compute_distance_to_list(location_list, new_location)
if distance is not None:
    print(f"Distance to the closest location: {distance:.2f} miles or {distance / 0.621371:.2f} kilometers")


The closest location to Playa Sucia is Condado Beach, located 78.08 miles or 125.65 kilometers away.
Distance to the closest location: 78.08 miles or 125.65 kilometers


## 📍 Finalizing Visit List and Distance Calculations 🔄

In this section, we finalize the process of adding locations to the user's visit list and calculating distances to the closest location in the list. This allows the user to assess the proximity of different locations based on their preferences and travel plans.

### **🔍 Function Overview**:
1. **`add_location_to_visit_list`**: Adds a new location to the list while ensuring no duplicates.
2. **`compute_distance_to_list`**: Calculates the distance from a new location to the closest location already on the list.

We can further optimize the user experience by allowing the user to continuously add locations and calculate distances dynamically.

### Next steps:
- Add more locations to the visit list.
- Calculate distances to the closest locations dynamically.
- Plan the next steps in the trip based on the closest locations.

Let’s continue implementing the system that will help users plan their trips efficiently. 🚀
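
One way to act on the last next step (planning the trip around the closest locations) is a greedy nearest-neighbor ordering: start at one stop and repeatedly move to the closest unvisited one. The sketch below is a hypothetical helper, not part of the notebook. It uses a spherical haversine distance instead of geopy's `geodesic` to stay dependency-free, and greedy ordering is a heuristic that does not guarantee the shortest overall route.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) tuples, treating Earth as a sphere."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(h))


def order_by_nearest_neighbor(locations, start_name):
    """Greedy visit order: begin at start_name, then always go to the closest unvisited stop."""
    # Map each name to its coordinates (assumes names are unique, as the visit list enforces)
    remaining = {loc['name']: (loc['latitude'], loc['longitude']) for loc in locations}
    if start_name not in remaining:
        raise ValueError(f"{start_name!r} is not in the visit list")
    route = [start_name]
    current = remaining.pop(start_name)
    while remaining:
        next_name = min(remaining, key=lambda name: haversine_km(current, remaining[name]))
        route.append(next_name)
        current = remaining.pop(next_name)
    return route


visit_list = [
    {'name': 'Condado Beach', 'latitude': 18.465500, 'longitude': -66.077000},
    {'name': 'El Yunque National Forest', 'latitude': 18.319167, 'longitude': -65.760000},
    {'name': 'Playa Sucia', 'latitude': 17.970000, 'longitude': -67.145833},
    {'name': 'Flamenco Beach', 'latitude': 18.331667, 'longitude': -65.318056},
]
print(order_by_nearest_neighbor(visit_list, 'Playa Sucia'))
# ['Playa Sucia', 'Condado Beach', 'El Yunque National Forest', 'Flamenco Beach']
```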


In [None]:
from geopy.distance import geodesic

def add_location_to_visit_list(location_list, location, latitude=None, longitude=None):
    """
    This function adds a given location to the list of locations the user plans to visit, along with its coordinates.

    Parameters:
    - location_list (list): The list of locations already selected for the visit.
    - location (dict or str): Either a dictionary with 'name', 'latitude', and 'longitude' keys,
      or just the location name, in which case `latitude` and `longitude` supply the coordinates.
    - latitude (float, optional): The latitude, used only when `location` is a name string.
    - longitude (float, optional): The longitude, used only when `location` is a name string.

    Returns:
    - list: The updated list of locations.
    """

    # Accept the name-plus-coordinates call style by converting it to the dictionary form
    if isinstance(location, str):
        location = {'name': location, 'latitude': latitude, 'longitude': longitude}

    # Edge case: Check that location_list is a list
    if not isinstance(location_list, list):
        print("Error: The visit list should be a list. Please provide a valid list.")
        return location_list

    # Edge case: Check that location is a dictionary containing all required keys
    if not isinstance(location, dict) or not all(key in location for key in ('name', 'latitude', 'longitude')):
        print("Error: Invalid location data. Please provide a location with 'name', 'latitude', and 'longitude'.")
        return location_list

    # Edge case: Validate coordinates (they must be numbers within the valid ranges)
    latitude = location['latitude']
    longitude = location['longitude']

    if not all(isinstance(value, (int, float)) for value in (latitude, longitude)):
        print("Error: Latitude and longitude must be numbers.")
        return location_list

    if not (-90 <= latitude <= 90):
        print(f"Error: Invalid latitude {latitude}. Latitude must be between -90 and 90 degrees.")
        return location_list

    if not (-180 <= longitude <= 180):
        print(f"Error: Invalid longitude {longitude}. Longitude must be between -180 and 180 degrees.")
        return location_list

    # Edge case: Note when this is the first location being added
    if not location_list:
        print("Warning: The visit list is currently empty.")

    # Check if the location already exists in the visit list (case-insensitive) to avoid duplicates
    for loc in location_list:
        if loc['name'].lower() == location['name'].lower():
            print(f"{location['name']} is already in your visit list.")
            return location_list

    # If no duplicates, add the new location to the list
    location_list.append(location)
    print(f"{location['name']} has been added to your visit list.")

    return location_list

def compute_distance_to_list(location_list, new_location):
    """
    This function computes the distance from the new location to the closest location in the user's visit list.

    Parameters:
    - location_list (list): The list of locations already selected for the visit, including their coordinates.
    - new_location (dict): The new location with 'name', 'latitude', and 'longitude' keys to calculate the distance to.

    Returns:
    - float or None: The distance in miles to the closest valid location in the list, or None if it cannot be computed.
    """

    # Edge case: Check if location_list is a valid list
    if not isinstance(location_list, list):
        print("Error: The visit list should be a list. Please provide a valid list.")
        return None

    # Edge case: Check if new_location is a valid dictionary with required keys
    if not isinstance(new_location, dict) or 'name' not in new_location or 'latitude' not in new_location or 'longitude' not in new_location:
        print("Error: Invalid location data. The location must contain 'name', 'latitude', and 'longitude'.")
        return None

    # Edge case: Validate new_location coordinates (they must be numbers within the valid ranges)
    latitude = new_location['latitude']
    longitude = new_location['longitude']

    if not all(isinstance(value, (int, float)) for value in (latitude, longitude)):
        print("Error: Latitude and longitude must be numbers.")
        return None

    if not (-90 <= latitude <= 90):
        print(f"Error: Invalid latitude {latitude}. Latitude must be between -90 and 90 degrees.")
        return None

    if not (-180 <= longitude <= 180):
        print(f"Error: Invalid longitude {longitude}. Longitude must be between -180 and 180 degrees.")
        return None

    # Initialize the minimum distance as None
    min_distance = None
    closest_location = None

    # Extract the coordinates of the new location
    new_coords = (latitude, longitude)

    # Iterate over the existing locations in the visit list
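    for loc in location_list:
        # Skip entries that are not dictionaries or lack a name or coordinates
        # (reading the name with .get() avoids a KeyError when the name itself is missing)
        if not isinstance(loc, dict) or not all(key in loc for key in ('name', 'latitude', 'longitude')):
            name = loc.get('name', '<unnamed>') if isinstance(loc, dict) else repr(loc)
            print(f"Error: Invalid location data for {name}. Skipping this location.")
            continue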

        current_coords = (loc['latitude'], loc['longitude'])

        # Compute the distance between the new location and the current location using geodesic
        distance_km = geodesic(new_coords, current_coords).kilometers

        # Convert distance from kilometers to miles
        distance_miles = distance_km * 0.621371

        # Track the closest location
        if min_distance is None or distance_miles < min_distance:
            min_distance = distance_miles
            closest_location = loc['name']

    # Report the closest location in both miles and kilometers (the return value is in miles)
    if closest_location:
        print(f"The closest location to {new_location['name']} is {closest_location}, located {min_distance:.2f} miles or {min_distance * 1.60934:.2f} kilometers away.")
    else:
        print("No valid locations in the visit list to compare distances.")

    return min_distance

def update_and_compute_distance(location_list, new_location):
    """
    This function allows the user to continuously add new locations to the visit list
    and calculate the distance to the closest location already in the list.

    The distance is computed before the new location is added; otherwise the closest
    location would always be the new location itself, at a distance of 0.

    Parameters:
    - location_list (list): The list of locations already selected for the visit, including their coordinates.
    - new_location (dict): The new location with 'name', 'latitude', and 'longitude' keys.

    Returns:
    - list: The updated list of locations including the new location.
    - float or None: The distance in miles to the closest previously selected location.
    """
    # Compute the distance to the closest location already in the list
    distance = compute_distance_to_list(location_list, new_location)

    # Add the new location, passing the whole dictionary as add_location_to_visit_list expects
    location_list = add_location_to_visit_list(location_list, new_location)

    return location_list, distance

# Example usage: Dynamically add a location and compute the distance
location_list = [
    {'name': 'Condado Beach', 'latitude': 18.465500, 'longitude': -66.077000},
    {'name': 'El Yunque National Forest', 'latitude': 18.319167, 'longitude': -65.760000}
]

new_location = {'name': 'Playa Sucia', 'latitude': 17.970000, 'longitude': -67.145833}

# Update the visit list and calculate the distance to the closest location
updated_location_list, closest_distance = update_and_compute_distance(location_list, new_location)

# Display updated list and distance
print("Updated Visit List:")
for loc in updated_location_list:
    print(f"- {loc['name']} (Lat: {loc['latitude']}, Lon: {loc['longitude']})")

if closest_distance is not None:
    print(f"\nDistance to the closest location: {closest_distance:.2f} miles or {closest_distance * 1.60934:.2f} kilometers")


The closest location to Playa Sucia is Condado Beach, located 78.08 miles or 125.65 kilometers away.
Playa Sucia has been added to your visit list.
Updated Visit List:
- Condado Beach (Lat: 18.4655, Lon: -66.077)
- El Yunque National Forest (Lat: 18.319167, Lon: -65.76)
- Playa Sucia (Lat: 17.97, Lon: -67.145833)

Distance to the closest location: 78.08 miles or 125.65 kilometers


## **🔍 find_info_on_location Function**

In this step, we are implementing the function **`find_info_on_location(user_prompt, location)`**, which will retrieve relevant information about a given **location** based on the **user's prompt**. The function will use **Chroma** for document retrieval and **RAG** for generating context-aware responses.

### **Function Overview**:
1. **`user_prompt`**: The user's question or query about a location (e.g., "Tell me about the history of El Morro").
2. **`location`**: The specific location (e.g., "El Morro") for which the user is seeking information.
3. **Chroma**: Retrieve documents related to the **location** from the **Chroma vector store**.
4. **RAG (Retrieval-Augmented Generation)**: Use the **RAG system** to generate an answer by combining the retrieved documents with the user’s query, ensuring a contextual response.

This function enables the chatbot to provide **accurate, context-aware answers** to user queries about Puerto Rican landmarks, municipalities, or historical events.

---

### **Next Steps**:
- Implement the function to query Chroma for relevant documents.
- Pass the retrieved documents to RAG to generate a detailed response based on the user’s query.
- Return the final generated response as the chatbot’s answer.
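
With `chain_type="stuff"`, the retrieved documents are concatenated into a single prompt alongside the question before it is sent to the LLM. The sketch below illustrates that idea in plain Python so it can run without an API key. `Doc`, `build_stuff_prompt`, the character budget, and the prompt wording are illustrative assumptions, not LangChain's internal template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document; LangChain documents also carry page_content."""
    page_content: str


def build_stuff_prompt(question: str, docs: List[Doc], max_chars: int = 6000) -> str:
    """Concatenate ("stuff") retrieved texts into one context block placed before the question."""
    parts, used = [], 0
    for doc in docs:
        text = doc.page_content.strip()
        # Skip empty documents and any document that would push the context past the budget
        if not text or used + len(text) > max_chars:
            continue
        parts.append(text)
        used += len(text)
    context = "\n\n".join(parts)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = [Doc("El Morro was commissioned in 1539."), Doc("   "), Doc("It guards the entry to San Juan Bay.")]
print(build_stuff_prompt("When was El Morro commissioned?", docs))
```

The "stuff" strategy is simple and keeps every retrieved passage visible to the model, but the combined text must fit in the model's context window, which is why the sketch enforces a budget.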


In [None]:
import os

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

def find_info_on_location(user_prompt, location):
    """
    This function retrieves relevant information about a location based on the user's prompt.
    It uses Chroma for document retrieval and RAG for generating context-aware responses.

    Parameters:
    - user_prompt (str): The user's question or query (e.g., "Tell me about the history of El Morro").
    - location (str): The specific location (e.g., "El Morro") for which the user is asking.

    Returns:
    - str: The generated answer to the user's query.
    """

    # Step 1: Combine the location and the prompt so retrieval is anchored to the place the user asked about
    # (db is the Chroma vector store built earlier in the notebook)
    query = f"{location} {user_prompt}"

    # Step 2: Initialize the LLM (Large Language Model) for RAG
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7, openai_api_key=os.getenv("OPENAI_API_KEY"))

    # Create the RetrievalQA chain; its retriever fetches the top 5 most similar documents from Chroma
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # 'stuff' inserts all retrieved documents into a single prompt
        retriever=db.as_retriever(search_type="similarity", search_kwargs={"k": 5})
    )

    # Step 3: Run retrieval and generation on the combined query, so the location also guides retrieval
    response = qa_chain.run(query)

    # Step 4: Return the generated answer
    return response

# Example usage: Find information on "El Morro" based on a user prompt
user_prompt = "Tell me about the history of El Morro."
location = "El Morro"
info = find_info_on_location(user_prompt, location)
print(info)


El Morro, officially known as Castillo San Felipe del Morro, is a fortress and citadel in the historic district of Old San Juan in Puerto Rico. Commissioned by King Charles I of Spain in 1539, it was initially built as a fortified tower in honor of King Philip II. Over the next 200 years, it expanded into a hornwork fort and was further developed, reaching its current form in 1787. El Morro stands on a steep headland promontory on San Juan Islet, guarding the entry to San Juan Bay. Alongside other forts, it protected strategically important Puerto Rico from invasion by world powers during the Age of Sail. El Morro was designated a World Heritage Site by UNESCO in 1983.


## **🔍 Handling Edge Cases and Errors for `find_info_on_location` Function**

In this step, we will enhance the **`find_info_on_location`** function to handle edge cases and errors effectively. This ensures the chatbot can manage situations such as:

1. **Invalid Location Names**: When the location specified by the user is not recognized or does not exist in the dataset.
2. **Empty or Invalid User Queries**: When the user submits an empty or malformed query.
3. **No Relevant Information**: When Chroma does not retrieve any relevant documents for the given query.
4. **API or Network Failures**: When there is an issue with retrieving data from the Chroma vector store or OpenAI API.

### **Enhancement Goals**:
1. Ensure the chatbot gracefully handles missing or invalid locations.
2. Provide helpful error messages when no relevant documents are retrieved.
3. Handle potential API or network failures without breaking the user experience.

By addressing these edge cases, we ensure that the chatbot can provide a seamless and user-friendly experience, even in the face of unexpected inputs or system issues.

---

### **Next Steps**:
- Implement error handling for location validation and user input.
- Handle cases where no information is retrieved from Chroma.
- Add fallback error messages for API or network-related failures.
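
A caveat for the "No Relevant Information" case: `similarity_search` returns the top `k` nearest documents whenever the collection is non-empty, even if none of them is actually relevant, so an emptiness check mainly catches an empty store. One option is to request scores with LangChain's `similarity_search_with_relevance_scores`, which returns `(document, score)` pairs where a higher score means more similar, and to discard weak matches. The `keep_relevant` helper and the 0.3 cutoff below are hypothetical; a sensible threshold depends on the embedding model and distance metric and would need tuning.

```python
from typing import Any, List, Tuple


def keep_relevant(results: List[Tuple[Any, float]], min_score: float = 0.3) -> List[Any]:
    """Return the documents whose relevance score is at least min_score, highest score first."""
    ranked = sorted(results, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in ranked if score >= min_score]


# Hypothetical wiring inside find_info_on_location (db is the notebook's Chroma store):
#   results = db.similarity_search_with_relevance_scores(query, k=5)
#   retrieved_docs = keep_relevant(results, min_score=0.3)
#   if not retrieved_docs:
#       return "Sorry, no relevant information was found for this location. Please try a different query."

print(keep_relevant([("doc A", 0.12), ("doc B", 0.81), ("doc C", 0.47)]))  # ['doc B', 'doc C']
```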


In [None]:
import os

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

def find_info_on_location(user_prompt, location):
    """
    This function retrieves relevant information about a location based on the user's prompt.
    It uses Chroma for document retrieval and RAG for generating context-aware responses.

    Parameters:
    - user_prompt (str): The user's question or query (e.g., "Tell me about the history of El Morro").
    - location (str): The specific location (e.g., "El Morro") for which the user is asking.

    Returns:
    - str: The generated answer to the user's query.
    """

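    # Edge case: Check that both the location and the query are provided (whitespace-only input counts as empty)
    if not (location and location.strip() and user_prompt and user_prompt.strip()):
        return "Error: Please provide both a valid location and a clear query."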

    # Step 1: Query Chroma's vector store to retrieve relevant documents related to the location
    query = f"{location} {user_prompt}"
    try:
        retrieved_docs = db.similarity_search(query, k=5)  # Adjust 'k' for the number of retrieved documents
    except Exception as e:
        return f"Error: Failed to retrieve documents from Chroma. Please try again later. Error details: {str(e)}"

    # Edge case: If no relevant documents are retrieved
    if not retrieved_docs:
        return "Sorry, no relevant information was found for this location. Please try a different query."

    # Step 2: Use the RAG system to generate an answer based on the retrieved documents
    try:
        # Initialize the LLM (Large Language Model) for RAG
        llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7, openai_api_key=os.getenv("OPENAI_API_KEY"))

        # Create the RetrievalQA chain using Chroma vector store and the LLM
        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",  # Using 'stuff' to combine document contents into one response
            retriever=db.as_retriever(search_type="similarity", search_kwargs={"k": 5})  # Retrieve top 5 documents
        )

        # Step 3: Perform the query with RAG integration
        response = qa_chain.run(user_prompt)
    except Exception as e:
        return f"Error: Failed to generate response using the RAG system. Please try again later. Error details: {str(e)}"

    # Step 4: Return the generated answer
    return response

# Example usage: Find information on "El Morro" based on a user prompt
user_prompt = "Tell me about the history of El Morro."
location = "El Morro"
info = find_info_on_location(user_prompt, location)
print(info)


El Morro, officially known as Castillo San Felipe del Morro, is a large fortress and citadel in Old San Juan, Puerto Rico. It was commissioned by King Charles I of Spain in 1539 and was first built as a fortified tower in honor of King Philip II. Over the next 200 years, especially during the reign of King Charles III, El Morro was developed and expanded into its current form in 1787. It stands as a strategically important structure guarding the entry to San Juan Bay in Old San Juan. El Morro, along with other forts in the area, protected Puerto Rico from invasion by competing world powers during the Age of Sail. It was designated as a World Heritage Site by UNESCO in 1983.


## **🔍 Testing the `find_info_on_location` Function**

In this step, we will test the **`find_info_on_location`** function with different user inputs to ensure that it behaves as expected. The tests will cover:

1. **Valid Queries**: Providing a valid location (e.g., "El Morro") and a relevant user query (e.g., "Tell me about the history of El Morro").
2. **Invalid Location**: Testing with a location that does not exist in the dataset to see how the function handles it.
3. **Empty Query**: Testing with an empty user query to verify that the function responds appropriately.
4. **Location without Relevant Documents**: Checking how the system behaves when the Chroma vector store does not contain relevant documents for the query.

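These tests help confirm that the function returns correct, meaningful answers for valid input and handles errors appropriately otherwise. A similarity search returns the `k` nearest documents whenever the collection is non-empty. As a result, an unknown place such as "Atlantis" will usually still retrieve documents, and in that case the "I don't know" answer comes from the language model rather than from the function's empty-result check.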
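Nearest-neighbour search ranks every stored vector by similarity and keeps the top `k`, however weak the best match is. A minimum-similarity threshold is what turns "no good match" into an empty result. The following toy sketch uses hand-made 3-dimensional vectors and cosine similarity in place of real embeddings (both assumptions), and the 0.75 threshold is illustrative. With LangChain vector stores, `similarity_search_with_relevance_scores` returns `(document, score)` pairs that could be filtered in a similar way, with the threshold tuned on real queries.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: Sequence[float], store: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Plain nearest-neighbour search: always returns min(k, len(store)) hits."""
    scored = [(name, cosine(query, vec)) for name, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def relevant_only(hits: List[Tuple[str, float]], min_score: float = 0.75) -> List[Tuple[str, float]]:
    """Keep only the hits whose similarity clears the threshold."""
    return [(name, score) for name, score in hits if score >= min_score]


# Toy "embeddings": dimensions loosely mean (fortress/history, beach, mountains)
store = {
    "El Morro": [0.9, 0.1, 0.0],
    "Flamenco Beach": [0.0, 1.0, 0.1],
    "El Yunque": [0.1, 0.0, 1.0],
}
```

A vague query vector such as `[0.3, 0.3, 0.3]` (standing in for "Atlantis") still gets two hits from `top_k(..., k=2)`, but `relevant_only` removes both. A history-focused vector `[1.0, 0.0, 0.0]` keeps only "El Morro".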

---

### **Next Steps**:
- Run the function with the test cases outlined above.
- Verify that the function handles both valid and invalid inputs appropriately.
- Ensure that the error handling mechanism is working smoothly.


In [None]:
# Testing the find_info_on_location function

# Test case 1: Valid query with a known location
user_prompt_1 = "Tell me about the history of El Morro."
location_1 = "El Morro"
info_1 = find_info_on_location(user_prompt_1, location_1)
print(f"Test 1 - Valid Query:\n{info_1}\n")

# Test case 2: Invalid location (location does not exist in the dataset)
user_prompt_2 = "Tell me about the history of Atlantis."
location_2 = "Atlantis"
info_2 = find_info_on_location(user_prompt_2, location_2)
print(f"Test 2 - Invalid Location:\n{info_2}\n")

# Test case 3: Empty user query
user_prompt_3 = ""
location_3 = "El Morro"
info_3 = find_info_on_location(user_prompt_3, location_3)
print(f"Test 3 - Empty Query:\n{info_3}\n")

# Test case 4: Location without relevant documents in Chroma (assuming no docs for "Atlantis")
user_prompt_4 = "Tell me about the local attractions in Atlantis."
location_4 = "Atlantis"
info_4 = find_info_on_location(user_prompt_4, location_4)
print(f"Test 4 - No Relevant Documents:\n{info_4}\n")


Test 1 - Valid Query:
El Morro, officially named Castillo San Felipe del Morro, is a large fortress and citadel in Old San Juan, Puerto Rico. It was commissioned by King Charles I of Spain in 1539 and was initially built as a fortified tower in honor of King Philip II. Over the next 200 years, especially during the reign of King Charles II, El Morro was expanded and developed into its current form by 1787. The fortress stands on a steep headland on San Juan Islet, guarding the entry to San Juan Bay. El Morro, along with other forts in Old San Juan, played a significant role in protecting Puerto Rico from invasion by world powers during the Age of Sail. El Morro was designated as a World Heritage Site by UNESCO in 1983.

Test 2 - Invalid Location:
I don't know about the history of Atlantis.

Test 3 - Empty Query:
Error: Please provide both a valid location and a clear query.

Test 4 - No Relevant Documents:
I don't know about any local attractions in Atlantis.



## **🔍 rank_appropriate_locations Function**

In this step, we implement the function **`rank_appropriate_locations(user_prompt)`**. It evaluates the user's query (or stated preferences) and ranks the most appropriate destinations to visit, taking into account preferences expressed in the prompt, such as a liking for sunny weather, beaches, or historical places.

### **Function Overview**:
1. **`user_prompt`**: The user's input that may mention preferences like sunny places, historical sites, beaches, etc.
2. **Chroma**: Retrieve locations related to keywords from the user prompt. For example, if the prompt mentions "beach" or "history," Chroma will search the vector store for relevant locations.
3. **RAG (Retrieval-Augmented Generation)**: Use the **RAG system** to rank the retrieved locations based on relevance to the user’s preferences. The ranking will be generated by combining the retrieved documents with the user's query to ensure the recommendations are contextual and tailored.
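The `"stuff"` chain type used in the code below places every retrieved document into a single prompt together with the question. To illustrate what "combining the retrieved documents with the user's query" means, the following minimal sketch builds such a ranking prompt by hand. The `Doc` stand-in, the `build_ranking_prompt` name, and the template wording are assumptions; this is not LangChain's internal prompt.

```python
from typing import List, NamedTuple


class Doc(NamedTuple):
    """Stand-in for a retrieved document: a landmark name plus its text."""
    landmark: str
    page_content: str


def build_ranking_prompt(user_prompt: str, docs: List[Doc], max_chars: int = 500) -> str:
    """'Stuff' every retrieved document into one prompt that asks for a ranking."""
    context = "\n\n".join(
        # Truncate long documents so the whole prompt fits the model's context window
        f"[{i}] {doc.landmark}: {doc.page_content[:max_chars]}"
        for i, doc in enumerate(docs, start=1)
    )
    return (
        "You are a Puerto Rico travel assistant.\n"
        f"Traveller preferences: {user_prompt}\n\n"
        f"Candidate places:\n{context}\n\n"
        "Rank the candidate places from best to worst match, using only the context above, "
        "and give one sentence of justification for each."
    )
```

Sending this one string to the chat model produces a single ranked answer. This is the trade-off of the `stuff` approach: one call, but the number and length of documents are limited by the context window.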

### **How It Works**:
- **Step 1**: Analyze the `user_prompt` to detect keywords related to user preferences (e.g., "sun," "beach," "history").
- **Step 2**: Use **Chroma** to retrieve documents about locations matching the user’s preferences.
- **Step 3**: Use **RAG** to rank and filter locations based on the relevance of retrieved information.
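Step 1 as implemented below relies on plain substring checks (`keyword in preferences`). As a result, the prompt "I love sunny beaches and historical places." matches `beach` and `sunny` but not `history`, because "history" is not a substring of "historical". The following is a minimal sketch of a more forgiving detector that matches word stems at word boundaries. The `PREFERENCE_STEMS` mapping is a hypothetical example, not part of the notebook.

```python
import re
from typing import Dict, List

# Hypothetical mapping: preference label -> stems that signal it
PREFERENCE_STEMS: Dict[str, List[str]] = {
    "beach": ["beach", "coast", "snorkel"],
    "sunny": ["sun"],
    "history": ["histor", "fortress", "colonial"],
    "museum": ["museum"],
    "nature": ["nature", "hik", "rainforest"],
    "mountain": ["mountain"],
    "culture": ["cultur", "music"],
}


def detect_preferences(user_prompt: str) -> List[str]:
    """Return the labels whose stems start a word in the prompt (dict order)."""
    text = user_prompt.lower()
    found = []
    for label, stems in PREFERENCE_STEMS.items():
        # \b anchors each stem at the start of a word, so "histor" matches "historical"
        if any(re.search(rf"\b{re.escape(stem)}", text) for stem in stems):
            found.append(label)
    return found
```

Each detected label could then drive the same Chroma query used below, for example `f"{label} locations"`.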

### **Next Steps**:
- Implement the function to retrieve and rank locations based on the user’s prompt.
- Integrate **Chroma** for document retrieval and **RAG** for context-based ranking.
- Return a ranked list of suggested locations to the user.
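A ranked list should contain one entry per place. Without de-duplication, several chunks of the same landmark, or the same chunk retrieved for two different keywords, produce repeated entries. The following is a minimal sketch of keyword-count scoring with de-duplication, using plain `(landmark, text)` pairs in place of Chroma documents. Skipping documents that have no landmark name (for example, news or municipality records) is an assumption about the metadata.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def rank_by_keyword_hits(
    docs: Iterable[Tuple[Optional[str], str]],
    keywords: List[str],
    top_n: int = 5,
) -> List[Tuple[str, int]]:
    """Aggregate keyword counts per landmark and return unique landmarks, best first."""
    scores: Counter = Counter()
    seen = set()
    for landmark, text in docs:
        if not landmark or (landmark, text) in seen:
            continue  # Skip unnamed documents and exact chunks retrieved more than once
        seen.add((landmark, text))
        lowered = text.lower()
        scores[landmark] += sum(lowered.count(kw) for kw in keywords)
    # most_common sorts by score, descending; ties keep first-seen order
    return scores.most_common(top_n)
```

In `rank_appropriate_locations`, the pairs would come from `(doc.metadata.get('landmark'), doc.page_content)`, and the RAG chain would then be called once per unique landmark rather than once per retrieved chunk.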


In [None]:
import os

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

def rank_appropriate_locations(user_prompt):
    """
    This function ranks appropriate locations for the user based on their preferences in the prompt.
    It uses Chroma for document retrieval and RAG for ranking locations based on context-aware responses.

    Parameters:
    - user_prompt (str): The user's question or preferences (e.g., "I love sunny beaches").

    Returns:
    - str: The model's ranked suggestions as text, or a user-facing error message.
    """

    # Step 1: Analyze the user prompt for keywords (e.g., "beach," "history," "sunny")
    preferences = user_prompt.lower()
    keyword_list = ['beach', 'sunny', 'history', 'museum', 'nature', 'mountain', 'culture']  # Example keyword list

    # Step 2: Query Chroma to retrieve locations that match user preferences
    relevant_locations = []

    for keyword in keyword_list:
        if keyword in preferences:
            query = f"{keyword} locations"
            try:
                retrieved_docs = db.similarity_search(query, k=5)  # Adjust 'k' for the number of retrieved documents
                if retrieved_docs:
                    relevant_locations.extend(retrieved_docs)
            except Exception as e:
                return f"Error: Failed to retrieve documents from Chroma. Please try again later. Error details: {str(e)}"

    # Edge case: If no relevant documents are retrieved
    if not relevant_locations:
        return "Sorry, no relevant locations were found based on your preferences. Please try a different query."

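    # Step 3: Ask the RAG chain to rank locations. The chain runs its own retrieval on the full
    # prompt; the keyword hits above only decide whether there is anything to rank.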
    try:
        # Initialize the LLM (Large Language Model) for RAG
        llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7, openai_api_key=os.getenv("OPENAI_API_KEY"))

        # Create the RetrievalQA chain using Chroma vector store and the LLM
        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",  # Using 'stuff' to combine document contents into one response
            retriever=db.as_retriever(search_type="similarity", search_kwargs={"k": 5})  # Retrieve top 5 documents
        )

        # Step 4: Perform the query with RAG integration
        ranked_response = qa_chain.run(user_prompt)

    except Exception as e:
        return f"Error: Failed to rank locations using RAG. Please try again later. Error details: {str(e)}"

    # Step 5: Return the ranked list of suggestions
    return ranked_response

# Example usage: Rank locations based on a user prompt
user_prompt = "I love sunny beaches and historical places."
ranked_locations = rank_appropriate_locations(user_prompt)
print(ranked_locations)


Flamenco Beach in Culebra, known for its turquoise waters and white sand, could be a great fit for you. It's a popular beach destination with historical significance, having been ranked as the 3rd best beach in the world. Additionally, Puerto Rico, where Flamenco Beach is located, has a rich history and many historical sites to explore.


In [None]:
import os

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

def rank_appropriate_locations(user_prompt):
    """
    Rank appropriate locations for the user based on the preferences in the prompt.
    It uses Chroma for document retrieval, a keyword-count score for ordering, and RAG
    for a short context-aware explanation of each location.

    Parameters:
    - user_prompt (str): The user's question or preferences (e.g., "I love sunny beaches").

    Returns:
    - str: One line per unique location ("name: explanation"), highest score first,
      or a user-facing error message.
    """

    # Step 1: Analyze the user prompt for keywords (e.g., "beach," "history," "sunny")
    preferences = user_prompt.lower()
    keyword_list = ['beach', 'sunny', 'history', 'museum', 'nature', 'mountain', 'culture']  # Example keyword list

    # Step 2: Query Chroma for each matched keyword and score the retrieved landmarks
    relevance_scores = {}  # Landmark name -> accumulated keyword count (one entry per landmark)

    for keyword in keyword_list:
        if keyword in preferences:
            query = f"{keyword} locations"
            try:
                retrieved_docs = db.similarity_search(query, k=5)  # Adjust 'k' for the number of retrieved documents
            except Exception as e:
                return f"Error: Failed to retrieve documents from Chroma. Please try again later. Error details: {str(e)}"
            for doc in retrieved_docs:
                loc_name = doc.metadata.get('landmark')
                if not loc_name:
                    continue  # Skip documents without a landmark name (e.g., news or municipality records)
                # Relevance score: number of times the keyword appears in the document text
                score = doc.page_content.lower().count(keyword)
                relevance_scores[loc_name] = relevance_scores.get(loc_name, 0) + score

    # Edge case: no named locations were retrieved
    if not relevance_scores:
        return "Sorry, no relevant locations were found based on your preferences. Please try a different query."

    # Step 3: Order the unique locations by relevance score (higher is better; ties keep first-seen order)
    ordered_names = sorted(relevance_scores, key=relevance_scores.get, reverse=True)

    ranked_locations = []
    try:
        # Initialize the LLM (Large Language Model) for RAG
        llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7, openai_api_key=os.getenv("OPENAI_API_KEY"))

        # Create the RetrievalQA chain using the Chroma vector store and the LLM
        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",  # 'stuff' inserts all retrieved documents into a single prompt
            retriever=db.as_retriever(search_type="similarity", search_kwargs={"k": 5})  # Retrieve top 5 documents
        )

        # Step 4: Generate one explanation per unique location (not one per retrieved chunk,
        # which previously produced repeated entries for the same landmark)
        for loc_name in ordered_names:
            response = qa_chain.run(f"How does {loc_name} match the user's preference: {user_prompt}")
            ranked_locations.append((loc_name, response))

    except Exception as e:
        return f"Error: Failed to rank locations using RAG. Please try again later. Error details: {str(e)}"

    # Step 5: Format the ranked suggestions, one per line
    return "\n".join(f"{name}: {response}" for name, response in ranked_locations)

# Example usage: Rank locations based on a user prompt
user_prompt = "I love sunny beaches and historical places."
ranked_locations = rank_appropriate_locations(user_prompt)
print(ranked_locations)


Jobos Beach: Jobos Beach may not be the best match for someone who loves historical places, as there is no mention of historical significance in the provided context. However, if you enjoy sunny beaches, Jobos Beach could be a good fit as it is a beach facing the Atlantic Ocean located in Puerto Rico, which typically has sunny weather.
Jobos Beach: Jobos Beach may not be the best match for someone who loves historical places, as there is no mention of historical significance in the provided context. However, if you enjoy sunny beaches, Jobos Beach could be a good match based on its location facing the Atlantic Ocean in Puerto Rico.
Jobos Beach: Jobos Beach may not fully match your preference for historical places, as there is no specific mention of historical significance related to Jobos Beach itself in the provided context. However, it is a sunny beach located in Puerto Rico, which aligns with your love for sunny beaches. If you are specifically looking for historical places, you may

In [None]:
# Example usage: Rank locations based on a user prompt
user_prompt = "I love history and culture."
ranked_locations = rank_appropriate_locations(user_prompt)
print(ranked_locations)


Museum of Transportation of Puerto Rico: The Museum of Transportation of Puerto Rico may not directly match your preference for history and culture, as it focuses specifically on transportation-related exhibits. It may not provide as much insight into the broader history and culture of Puerto Rico compared to museums like the Museo de la Historia de Ponce or the Museo del Autonomismo Puertorriqueño.
Museum of Transportation of Puerto Rico: The Museum of Transportation of Puerto Rico may not directly align with your preference for history and culture as it focuses specifically on transportation-related exhibits. However, you might enjoy visiting the Museo de la Historia de Ponce or the Museo del Autonomismo Puertorriqueño in Ponce, Puerto Rico, as they both showcase different aspects of the history and culture of Puerto Rico.
Museum of Transportation of Puerto Rico: The Museum of Transportation of Puerto Rico may not directly match your preference for history and culture, as it primaril

In [None]:
# Example usage: Rank locations based on a user prompt
user_prompt = "I’m looking for a relaxing beach getaway."
ranked_locations = rank_appropriate_locations(user_prompt)
print(ranked_locations)


Jobos Beach: Jobos Beach may not be the best match for someone looking for a relaxing beach getaway, as it is described as dangerous due to strong currents. It may be more suitable for those looking for adventure or water sports rather than relaxation.
Jobos Beach: Based on the provided context, Jobos Beach may not match the user's preference for a relaxing beach getaway. While it is popular and located in a beautiful area, it is also known to have dangerous currents, which might not make it the most relaxing option for a beach getaway.
Jobos Beach: Jobos Beach may not match the user's preference for a relaxing beach getaway. While it is a popular beach facing the Atlantic Ocean, it is known to have dangerous currents, which may not provide the most relaxing experience for visitors.
Buyé Beach: Buyé Beach in Cabo Rojo, Puerto Rico, could be a great match for your preference of a relaxing beach getaway. It is located on the south-western coast of Puerto Rico, offering a serene and pictu