[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1aMa3Ull4rTcsBqtZwIf6LdVV05YgFMp9?usp=sharing)

In [1]:
# Install pinned versions of the required packages
!pip install -qU langchain==0.1.5 langchain-community==0.0.17 langchain-core==0.1.18 langchain-openai==0.0.5 openai==1.11.0 tiktoken==0.5.2 chromadb==0.4.22 pandas==2.2.0

In [2]:
!pip freeze | grep langchain

langchain==0.1.5
langchain-community==0.0.17
langchain-core==0.1.18
langchain-openai==0.0.5


## Data Download
First, we fetch the data from a website that lists the official UAE public holidays for this year. To work with our own data, we save the table as a CSV file and later load it with `CSVLoader`. In principle, one could use `WebCrawler` instead of a custom function, or wrap our function in a tool.

In [3]:
import requests
import bs4
import pandas as pd
import pprint

In [4]:
html = '''
<table class="publicholidays phgtable ">
  <thead>
    <tr>
      <th>Date</th>
      <th>Day</th>
      <th>Holiday</th>
    </tr>
  </thead>
  <tbody>
    <tr class="even ">
      <td>1 Jan</td>
      <td>Mon</td>
      <td>
        <a href="https://publicholidays.ae/new-years-day/" class="summary url">
          New Year's Day
        </a>
      </td>
    </tr>
    <tr class="odd ">
      <td>8 Apr</td>
      <td>Mon</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-fitr/" class="summary url">
          Eid al-Fitr Holiday
        </a>
      </td>
    </tr>
    <tr class="even ">
      <td>9 Apr</td>
      <td>Tue</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-fitr/" class="summary url">
          Eid al-Fitr Holiday
        </a>
      </td>
    </tr>
    <tr class="odd ">
      <td>10 Apr</td>
      <td>Wed</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-fitr/" class="summary url">
          Eid al-Fitr
        </a>
      </td>
    </tr>
    <tr class="even ">
      <td>11 Apr</td>
      <td>Thu</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-fitr/" class="summary url">
          Eid al-Fitr Holiday
        </a>
      </td>
    </tr>
    <tr class="odd ">
      <td>12 Apr</td>
      <td>Fri</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-fitr/" class="summary url">
          Eid al-Fitr Holiday
        </a>
      </td>
    </tr>
    <tr class="even ">
      <td>15 Jun</td>
      <td>Sat</td>
      <td>
        <a href="https://publicholidays.ae/arafat-day/" class="summary url">
          Arafat Day
        </a>
      </td>
    </tr>
    <tr class="odd ">
      <td>16 Jun</td>
      <td>Sun</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-adha/" class="summary url">
          Eid al-Adha
        </a>
      </td>
    </tr>
    <tr class="even ">
      <td>17 Jun</td>
      <td>Mon</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-adha/" class="summary url">
          Eid al-Adha Holiday
        </a>
      </td>
    </tr>
    <tr class="odd ">
      <td>18 Jun</td>
      <td>Tue</td>
      <td>
        <a href="https://publicholidays.ae/eid-al-adha/" class="summary url">
          Eid al-Adha Holiday
        </a>
      </td>
    </tr>
    <tr class="even ">
      <td>7 Jul</td>
      <td>Sun</td>
      <td>
        <a
          href="https://publicholidays.ae/islamic-new-year/"
          class="summary url"
        >
          Islamic New Year
        </a>
      </td>
    </tr>
    <tr class="odd ">
      <td>15 Sep</td>
      <td>Sun</td>
      <td>
        <a
          href="https://publicholidays.ae/prophet-muhammads-birthday/"
          class="summary url"
        >
          Prophet Muhammad's Birthday
        </a>
      </td>
    </tr>
    <tr class="even ">
      <td>1 Dec</td>
      <td>Sun</td>
      <td>
        <a href="https://publicholidays.ae/martyrs-day/" class="summary url">
          Commemoration Day
        </a>
      </td>
    </tr>
    <tr class="odd ">
      <td>2 Dec</td>
      <td>Mon</td>
      <td>
        <a href="https://publicholidays.ae/national-day/" class="summary url">
          National Day
        </a>
      </td>
    </tr>
    <tr class="even ">
      <td>3 Dec</td>
      <td>Tue</td>
      <td>
        <a href="https://publicholidays.ae/national-day/" class="summary url">
          National Day Holiday
        </a>
      </td>
    </tr>
    <tr>
      <td colspan="3">
        Visit u.ae for the original
        <a
          href="https://wam.ae/article/apr45ya-uae-cabinet-approves-official-holidays-calendar"
          target="_blank"
        >
          release
        </a>
        .
      </td>
    </tr>
  </tbody>
</table>;

'''

In [5]:
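# Function to make an HTTP GET request with a browser-like user agent
def get_request(url, cookies=None, headers=None):
    default_headers = {
      'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    }
    # Caller-supplied headers override the defaults instead of being discarded
    request_headers = {**default_headers, **(headers or {})}

    return requests.get(url, cookies=cookies, headers=request_headers)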

# Function to collect data from a URL and extract the table
def collect_data(url):
    response = get_request(url)
    soup = bs4.BeautifulSoup(response.text, features="lxml")
    table = soup.find("table", class_="publicholidays")
    # Fall back to the saved HTML snapshot if the request is blocked by Cloudflare
    if not table:
        soup = bs4.BeautifulSoup(html, features="lxml")
        table = soup.find("table", class_="publicholidays")
    return table

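# Function to convert HTML table to pandas DataFrame
def convert_html_table_to_df(html_text):
    from io import StringIO

    # Wrap the markup in StringIO: passing literal HTML to read_html is deprecated
    return pd.read_html(StringIO(str(html_text)))[0]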

In [6]:
# Root URL for the website containing holiday data
ROOT_URL = "https://publicholidays.ae/2024-dates/"

# Collect the data and convert it to a DataFrame
html_text = collect_data(url=ROOT_URL)
df = convert_html_table_to_df(html_text=html_text)



In [7]:
df

|  | Date | Day | Holiday |
| --- | --- | --- | --- |
| 0 | 1 Jan | Mon | New Year's Day |
| 1 | 8 Apr | Mon | Eid al-Fitr Holiday |
| 2 | 9 Apr | Tue | Eid al-Fitr Holiday |
| 3 | 10 Apr | Wed | Eid al-Fitr |
| 4 | 11 Apr | Thu | Eid al-Fitr Holiday |
| 5 | 12 Apr | Fri | Eid al-Fitr Holiday |
| 6 | 15 Jun | Sat | Arafat Day |
| 7 | 16 Jun | Sun | Eid al-Adha |
| 8 | 17 Jun | Mon | Eid al-Adha Holiday |
| 9 | 18 Jun | Tue | Eid al-Adha Holiday |


In [8]:
# Save the DataFrame to a CSV file, dropping the last row (the "Visit u.ae" footer of the table)
df.iloc[:-1, :].to_csv("uae_holidays.csv")
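
`CSVLoader` produces one document per CSV row, with the row rendered as `column: value` lines. The sketch below imitates that rendering with the standard library only, so it approximates the behaviour rather than reproducing LangChain's own code. `SAMPLE_CSV` and `rows_to_documents` are hypothetical names introduced for illustration.

```python
import csv
import io

# Two rows as they would appear in uae_holidays.csv. The leading empty
# header is the DataFrame index that to_csv writes by default.
SAMPLE_CSV = """,Date,Day,Holiday
0,1 Jan,Mon,New Year's Day
1,8 Apr,Mon,Eid al-Fitr Holiday
"""


def rows_to_documents(csv_text: str) -> list[str]:
    """Hypothetical helper: render each CSV row as 'column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        "\n".join(f"{key}: {value}" for key, value in row.items())
        for row in reader
    ]


documents = rows_to_documents(SAMPLE_CSV)
print(documents[0])
```

The first line of each document comes from the unnamed index column. Passing `index=False` to `to_csv` would leave it out, if that noise turns out to matter for retrieval.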

## LangChain
Now we import the LangChain components we will use. For this demo, we start with a straightforward approach based on the `ChatOpenAI` model: we load the previously saved file and create a vector index from its contents, write a simple prompt, and set up a memory to store the conversation history. Finally, we bring all these components together in an `AgentExecutor` that queries the index through a retriever tool.

In [9]:
# Set OpenAI API key from Google Colab's user environment or default
def set_openai_api_key(default_key: str = "YOUR_API_KEY") -> None:
    """Set the OpenAI API key from Google Colab's user environment or use a default value."""
    from google.colab import userdata
    import os

    os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY") or default_key


set_openai_api_key()

In [10]:
'''
# Load language model, embeddings, and index for conversational AI
from langchain.chat_models import ChatOpenAI                #model
from langchain.indexes import VectorstoreIndexCreator       #index
from langchain.document_loaders.csv_loader import CSVLoader #tool
from langchain.prompts import PromptTemplate                #prompt
from langchain.memory import ConversationBufferMemory       #memory
from langchain.chains import RetrievalQA                    #chain

#import langchain
#langchain.verbose = True
'''



In [11]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda
from langchain.agents import AgentExecutor
from langchain_core.output_parsers import StrOutputParser
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers.openai_functions import OpenAIFunctionsAgentOutputParser
from langchain.indexes import VectorstoreIndexCreator
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.memory import ConversationSummaryBufferMemory, ChatMessageHistory

from operator import itemgetter

In [12]:
model_name = "gpt-4-0125-preview"
MEMORY_KEY = "chat_history"
verbose = False

In [13]:
# Create a prompt using the template
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are Larry. You are an assistant that answers questions about when the official UAE holidays are, based only on the data provided.",
        ),
        MessagesPlaceholder(variable_name=MEMORY_KEY),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

In [14]:
# Creating memory

history = ChatMessageHistory()
history.add_user_message("Hi, my name is Ivan.")
history.add_ai_message("Hello Ivan! How can I assist you today?")

memory = ConversationSummaryBufferMemory(
    llm=ChatOpenAI(model=model_name),
    return_messages=True,
    memory_key=MEMORY_KEY,
    chat_memory=history
)

In [15]:
print(memory.load_memory_variables({}))
memory.predict_new_summary(existing_summary="", messages=history.messages)

{'chat_history': [HumanMessage(content='Hi, my name is Ivan.'), AIMessage(content='Hello Ivan! How can I assist you today?')]}


'Ivan greets the AI, and the AI responds warmly, asking how it can assist him.'

In [16]:
# Creating vectorstore

def load_index():
    # if you want to avoid the step of saving/loading a file, you can use the `from_documents()` method of the VectorstoreIndexCreator()
    loader = CSVLoader(file_path='uae_holidays.csv')
    index = VectorstoreIndexCreator().from_loaders([loader])
    return index

retriever = load_index().vectorstore.as_retriever()



In [17]:
# Import things that are needed generically
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool, format_tool_to_openai_function
from langchain.tools.retriever import create_retriever_tool

class SearchInput(BaseModel):
    query: str = Field(description="should be a search query")


@tool("search-tool", args_schema=SearchInput, return_direct=True)
def search_tool(query: str) -> str:
    """Placeholder tool. Does nothing. Don't use it!"""
    return "LangChain"

retrieve_tool = create_retriever_tool(
    retriever,
    "search_for_uae_holidays",
    "Searches and returns dates of UAE holidays",
)

tools = [search_tool, retrieve_tool]
formatted_functions = [format_tool_to_openai_function(t) for t in tools]



In [18]:
def load_llm():
    llm = ChatOpenAI(temperature=0, model_name=model_name)
    return llm

llm = load_llm()
llm_with_tools = llm.bind(functions=formatted_functions)

In [19]:
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | RunnablePassthrough.assign(
        # chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter(MEMORY_KEY)
        **{MEMORY_KEY: RunnableLambda(memory.load_memory_variables) | itemgetter(MEMORY_KEY)}
    )
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

qa = AgentExecutor(agent=agent, tools=tools, memory=memory, verbose=verbose)

## Q&A
Let's now ask some questions about the holidays in the UAE:

In [20]:
# Function to print the response for a given query
def print_response_for_query(query):
    result = qa.invoke({"input": query})['output']
    output_format = f"""===================================
    {result}
    ==================================="""
    # pprint.pprint prints the text and returns None, so there is nothing to return
    pprint.pprint(output_format)

### Holidays in March/December

In [21]:
query = "Are there any holidays in March?"
print_response_for_query(query)

 '    There are no official UAE holidays in March. However, there are several '
 'holidays in April related to Eid al-Fitr. Would you like to know about '
 'holidays in any other month?\n'


Correct response. What about December?

In [22]:
query = "Sorry, I meant December"
print_response_for_query(query)

 '    In December, the UAE observes the following official holidays:\n'
 '\n'
 '- **Commemoration Day** on December 1st, Sunday\n'
 '- **National Day** on December 2nd, Monday\n'
 '- **National Day Holiday** on December 3rd, Tuesday\n'
 '\n'
 'These holidays are a time for celebration and remembrance across the '
 'country. Is there anything else you would like to know?\n'


Notice how the **memory** was used here. Without it, the response would have been something like:
> Sorry, I can't understand you. What exactly are you looking for in December?
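
To make the role of the memory concrete, here is a minimal sketch in plain Python of the message order the prompt defines: system message, stored history, then the new input. It does not use LangChain. `build_messages` is a hypothetical helper, and the system text is shortened for illustration.

```python
def build_messages(system_prompt, chat_history, user_input):
    """Hypothetical helper: assemble messages in the order the prompt uses."""
    return [("system", system_prompt), *chat_history, ("user", user_input)]


# The previous exchange, as the memory would hand it back to the prompt.
chat_history = [
    ("user", "Are there any holidays in March?"),
    ("assistant", "There are no official UAE holidays in March."),
]

messages = build_messages(
    "You answer questions about UAE holidays.",
    chat_history,
    "Sorry, I meant December",
)

for role, content in messages:
    print(f"{role}: {content}")
```

Because the March question precedes "Sorry, I meant December" in the same message list, the model has the context it needs to read the follow-up as a question about holidays.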

### Multichain Ramadan example

In [23]:
query = "When does this year's holiday marking the end of Ramadan start?"
print_response_for_query(query)

 '    This year, the holiday marking the end of Ramadan, Eid al-Fitr, starts '
 'on April 10th, Wednesday, and extends through April 12th, Friday. Here are '
 'the specific dates:\n'
 '\n'
 '- **Eid al-Fitr** on April 10th, Wednesday\n'
 '- **Eid al-Fitr Holiday** on April 11th, Thursday\n'
 '- **Eid al-Fitr Holiday** also includes April 8th, Monday, and April 12th, '
 'Friday\n'
 '\n'
 'Please note that the exact dates may vary slightly based on the moon '
 "sighting. Is there anything else you'd like to know?\n"


Now this is quite interesting. The chain correctly identified Eid al-Fitr as the holiday that marks the end of Ramadan. But there is a reason why I started with scraping instead of a clean CSV file. As the table shows, only one row is named exactly "Eid al-Fitr":

| Date | Day | Holiday |
| --- | --- | --- |
| 8 Apr | Mon | Eid al-Fitr Holiday |
| 9 Apr | Tue | Eid al-Fitr Holiday |
| 10 Apr | Wed | Eid al-Fitr |
| 11 Apr | Thu | Eid al-Fitr Holiday |
| 12 Apr | Fri | Eid al-Fitr Holiday |

The problem is that the data is dirty, so the model cannot tell that this is actually a single 5-day holiday. The easy fix would be to clean the data (possibly through tools) or to modify the prompt.
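
One possible way to clean the data is sketched below with the standard library only. It strips the trailing " Holiday" from each name and collapses adjacent rows that share the resulting name. `base_name` and `summarize` are hypothetical helpers, and the sketch assumes that adjacent rows with the same name are consecutive days of one holiday, which holds for this table but is not checked.

```python
from itertools import groupby

# Rows copied from the scraped table above.
rows = [
    ("8 Apr", "Mon", "Eid al-Fitr Holiday"),
    ("9 Apr", "Tue", "Eid al-Fitr Holiday"),
    ("10 Apr", "Wed", "Eid al-Fitr"),
    ("11 Apr", "Thu", "Eid al-Fitr Holiday"),
    ("12 Apr", "Fri", "Eid al-Fitr Holiday"),
]


def base_name(holiday: str) -> str:
    """Strip the trailing ' Holiday' so every day shares one name."""
    return holiday.removesuffix(" Holiday")


def summarize(rows):
    """Collapse adjacent rows with the same base name into one record."""
    summary = []
    for name, group in groupby(rows, key=lambda row: base_name(row[2])):
        days = list(group)
        summary.append(
            {
                "holiday": name,
                "start": days[0][0],
                "end": days[-1][0],
                "days": len(days),
            }
        )
    return summary


print(summarize(rows))
```

For these rows the result is a single Eid al-Fitr record running from 8 Apr to 12 Apr with 5 days, which is the fact the model failed to infer from the raw table.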




In [24]:
query = "How many days is it celebrated for this year?"
print_response_for_query(query)

 '    This year, Eid al-Fitr is celebrated for 4 days. Here are the specific '
 'dates for the holiday:\n'
 '\n'
 '- **Eid al-Fitr** on April 10th, Wednesday\n'
 '- **Eid al-Fitr Holiday** on April 11th, Thursday\n'
 '- **Eid al-Fitr Holiday** also includes April 8th, Monday, and April 12th, '
 'Friday\n'
 '\n'
 'These dates mark the official public holidays for Eid al-Fitr in the UAE. Is '
 'there anything else I can help you with?\n'


### What is the next holiday?

In [25]:
query = "Today is February 17. When is the nearest holiday?"
print_response_for_query(query)

 '    The nearest holiday from today, February 17, is **Eid al-Fitr**, which '
 'starts on April 10th, Wednesday, and extends through April 12th, Friday. '
 'This marks the end of Ramadan and is celebrated for 4 days. \n'
 '\n'
 'Is there anything else you would like to know?\n'


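The model correctly identifies Eid al-Fitr as the next holiday. Note, however, that it repeats the earlier date confusion: according to the table, the break starts on 8 April, not 10 April.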
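
An answer like this can be checked deterministically against the table. The sketch below is a minimal check in plain Python. `next_holiday` and `MONTHS` are hypothetical names, the list holds only a few rows copied from the table, and the year 2024 is an assumption taken from the page URL.

```python
from datetime import date

# Month abbreviations as used in the table, mapped to month numbers.
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}

# A few rows copied from the scraped table (not the full list).
holidays = [
    ("1 Jan", "New Year's Day"),
    ("8 Apr", "Eid al-Fitr Holiday"),
    ("10 Apr", "Eid al-Fitr"),
    ("15 Jun", "Arafat Day"),
]


def next_holiday(today: date, holidays, year: int = 2024):
    """Return the earliest (date, name) pair on or after `today`, or None."""
    parsed = []
    for day_month, name in holidays:
        day, month = day_month.split()
        parsed.append((date(year, MONTHS[month], int(day)), name))
    upcoming = [item for item in parsed if item[0] >= today]
    return min(upcoming, default=None)


print(next_holiday(date(2024, 2, 17), holidays))
```

For 17 February the function returns the 8 April row, so a check like this would flag an answer that puts the start of the break on 10 April.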