# Building our First RAG bot - Skill: talk to Search Engine

We now have all the building blocks to build our first Bot that "talks with my data". These blocks are:

1) A well indexed hybrid (text and vector) engine with my data in chunks -> Azure AI Search
2) A good LLM python framework to build LLM Apps -> LangChain
3) Quality OpenAI GPT models that understand language and follow instructions -> GPT3.5 and GPT4
4) A persistent memory database -> CosmosDB

We are missing just one thing: **Agents**.

In this Notebook we introduce the concept of Agents and use it to build our first RAG bot.

In [3]:
import os
import random
import asyncio
from typing import Dict, List, Optional, Type
from concurrent.futures import ThreadPoolExecutor

from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import AzureChatOpenAI
from langchain_core.runnables import ConfigurableField, ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain.callbacks.manager import AsyncCallbackManagerForToolRun, CallbackManagerForToolRun
from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool

# Custom libraries that we will use later in the app
from common.utils import GetDocSearchResults_Tool
from common.prompts import AGENT_DOCSEARCH_PROMPT

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

from dotenv import load_dotenv
load_dotenv("credentials.env")


True

In [4]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

## Introducing: Agents

The implementation of Agents is inspired by two papers: the [MRKL Systems](https://arxiv.org/abs/2205.00445) paper (pronounced ‘miracle’ 😉) and the [ReAct](https://arxiv.org/abs/2210.03629) paper.

Agents are a way to leverage the ability of LLMs to understand and act on prompts. In essence, an Agent is an LLM that has been given a very clever initial prompt. The prompt tells the LLM to break down the process of answering a complex query into a sequence of steps that are resolved one at a time.

Agents become really cool when we combine them with ‘experts’, introduced in the MRKL paper. A simple example: an Agent might not have the inherent capability to reliably perform mathematical calculations by itself. However, we can introduce an expert - in this case a calculator, an expert at mathematical calculations. Now, when we need to perform a calculation, the Agent can call in the expert rather than trying to predict the result itself. This is actually the concept behind [ChatGPT Plugins](https://openai.com/blog/chatgpt-plugins).
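The expert idea can be sketched without any LLM at all. The snippet below is a toy illustration, not how LangChain or the MRKL paper implements it: `calculator`, `EXPERTS`, and `call_expert` are hypothetical names, and in a real agent the LLM would be the one choosing which expert to call and with what input.

```python
import ast
import operator

# Arithmetic operators the toy calculator is allowed to evaluate.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> float:
    """Hypothetical 'expert': evaluate a plain arithmetic expression safely."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval").body)


# Registry of experts, keyed by the name the agent would use to call them.
EXPERTS = {"calculator": calculator}


def call_expert(name: str, expert_input: str):
    """Dispatch a request to the named expert instead of guessing the answer."""
    if name not in EXPERTS:
        raise KeyError(f"Unknown expert: {name}")
    return EXPERTS[name](expert_input)


print(call_expert("calculator", "2 * (3 + 4)"))
```

The point of the registry is that adding a new skill means adding a new entry, while the dispatch logic stays the same.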

In our case, to solve the problem "How do I build a smart bot that talks to my data", we need this ReAct/MRKL approach: we instruct the LLM to use 'experts/tools' to read, load, understand, and interact with any particular source of data.

Let's then create an Agent that interacts with the user and uses a Tool to get information from the search engine.

#### 1. We start by defining the Tool/Expert

Tools are functions that an agent can invoke. If you don't give the agent access to a correct set of tools, it will never be able to accomplish the objectives you give it. If you don't describe the tools well, the agent won't know how to use them properly.

In [5]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books"
indexes = [index1_name, index2_name, index3_name]

We have to convert the Retriever object into a Tool object ("the expert"). Check out the Tool `GetDocSearchResults_Tool` in `utils.py` to see how it is done.

Declare the tools the agent will use

In [6]:
tools = [GetDocSearchResults_Tool(indexes=indexes, k=5, reranker_th=1, sas_token=os.environ['BLOB_SAS_TOKEN'])]

#### 2. Define the LLM to use

In [7]:
COMPLETION_TOKENS = 1500
llm = AzureChatOpenAI(deployment_name=os.environ["GPT4_DEPLOYMENT_NAME"], 
                      temperature=0.5, max_tokens=COMPLETION_TOKENS, streaming=True)

#### 3. Bind tools to the LLM

Newer OpenAI models (1106 and newer) have been fine-tuned to detect when one or more function(s) should be called and respond with the inputs that should be passed to the function(s). In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call these functions. The goal of the OpenAI tools APIs is to more reliably return valid and useful function calls than what can be done using a generic text completion or chat API.

OpenAI termed the capability to invoke a single function as **functions**, and the capability to invoke one or more functions as [**tools**](https://platform.openai.com/docs/guides/function-calling).

> OpenAI API has deprecated functions in favor of tools. The difference between the two is that the tools API allows the model to request that multiple functions be invoked at once, which can reduce response times in some architectures. It’s recommended to use the tools agent for OpenAI models.

Having an LLM call multiple tools at the same time can greatly speed up agents when there are tasks that benefit from doing so. Thankfully, OpenAI model versions 1106 and newer support parallel function calling, which we will need to make sure our smart bot is performant.
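To see why parallel tool calls matter for latency, here is a minimal sketch using only `asyncio`. The `fake_search` coroutine is a hypothetical stand-in for a real search tool, and the 0.2 second delay is an arbitrary assumption that simulates network time.

```python
import asyncio
import time


async def fake_search(query: str, delay: float = 0.2) -> str:
    """Hypothetical tool: pretend to query a search index."""
    await asyncio.sleep(delay)  # stands in for network latency
    return f"results for {query!r}"


async def run_sequential(queries):
    # One call after another: total time is roughly the sum of the delays.
    return [await fake_search(q) for q in queries]


async def run_parallel(queries):
    # All calls in flight at once: total time is roughly the longest delay.
    return await asyncio.gather(*(fake_search(q) for q in queries))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


queries = ["markov chains", "virus spread", "drug resistance"]
sequential_results, sequential_time = timed(run_sequential(queries))
parallel_results, parallel_time = timed(run_parallel(queries))
print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

Note that `asyncio.run` raises an error inside an already running event loop, so in a notebook cell you would `await run_parallel(queries)` directly instead.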

##### **From now on and for the rest of the notebooks, we are going to use the OpenAI tools API to call our experts/tools**

To pass our tools to the agent, we just need to format them to the [OpenAI tool format](https://platform.openai.com/docs/api-reference/chat/create) and pass them to our model. (By binding the functions, we’re making sure that they’re passed in each time the model is invoked.)
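As a rough illustration of that format, the dictionary below describes a document-search tool the way the tools API expects it. The tool name `docsearch` and its `query` argument match what the agent prints later in this notebook, but the description text and the call id are assumptions made up for this sketch; the real schema is generated from the tool class.

```python
import json

# Hypothetical description of a document-search tool in the OpenAI "tools" format.
docsearch_tool = {
    "type": "function",
    "function": {
        "name": "docsearch",
        "description": "Search the indexed documents and return relevant chunks.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."}
            },
            "required": ["query"],
        },
    },
}

# When the model decides to use the tool, it returns a tool call whose
# arguments are a JSON-encoded string (the id here is a made-up placeholder).
example_tool_call = {
    "id": "call_123",
    "type": "function",
    "function": {
        "name": "docsearch",
        "arguments": json.dumps({"query": "markov chains in medicine"}),
    },
}

# The application decodes the arguments before invoking the actual function.
arguments = json.loads(example_tool_call["function"]["arguments"])
print(json.dumps(docsearch_tool, indent=2))
print(arguments["query"])
```

The `description` fields are what the model reads when deciding whether and how to call the tool, which is why a vague description tends to produce poor tool usage.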

In [8]:
# Bind (attach) the tools/functions we want on each LLM call

llm_with_tools = llm.bind_tools(tools)

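# Let's also add the option to choose the model at run time.
# Note: "gpt35" is only the key for the default model (the `llm` bound above); "gpt4" selects the alternative defined here.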

llm_with_tools = llm_with_tools.configurable_alternatives(
    ConfigurableField(id="model"),
    default_key="gpt35",
    gpt4=AzureChatOpenAI(deployment_name=os.environ["GPT4_DEPLOYMENT_NAME"], temperature=0.5, max_tokens=COMPLETION_TOKENS, streaming=True),
)

#### 4. Define the System Prompt

Because OpenAI function calling is fine-tuned for tool usage, we hardly need any instructions on how to reason or how to format the output. We will just have two input variables: `question` and `agent_scratchpad`. The input variable `question` should be a string containing the user objective, and `agent_scratchpad` should be a sequence of messages that contains the previous agent tool invocations and the corresponding tool outputs.

Get the prompt to use, `AGENT_DOCSEARCH_PROMPT`. You can modify it in `prompts.py`, so check it out!
It looks like this:

```python
AGENT_DOCSEARCH_PROMPT = ChatPromptTemplate.from_messages(
    [
        ("system", CUSTOM_CHATBOT_PREFIX  + DOCSEARCH_PROMPT_TEXT),
        MessagesPlaceholder(variable_name='history', optional=True),
        ("human", "{question}"),
        MessagesPlaceholder(variable_name='agent_scratchpad')
    ]
)
```
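To make the ordering in that template concrete, here is a plain-Python stand-in for what happens when it is formatted. `build_messages` is a hypothetical helper, not a LangChain function, and messages are simplified to `(role, content)` tuples.

```python
from typing import List, Optional, Tuple

Message = Tuple[str, str]  # (role, content)


def build_messages(
    system_text: str,
    question: str,
    history: Optional[List[Message]] = None,
    agent_scratchpad: Optional[List[Message]] = None,
) -> List[Message]:
    """Assemble messages in the same order as the prompt template above."""
    messages: List[Message] = [("system", system_text)]
    messages.extend(history or [])           # optional past conversation
    messages.append(("human", question))     # the current user question
    messages.extend(agent_scratchpad or [])  # tool calls and tool outputs so far
    return messages


first_turn = build_messages("You are a helpful assistant.", "What are Markov chains?")
later_turn = build_messages(
    "You are a helpful assistant.",
    "Tell me more",
    history=[("human", "What are Markov chains?"), ("ai", "A stochastic process...")],
    agent_scratchpad=[("tool", "search results")],
)
print(first_turn)
print(later_turn)
```

On the first turn both optional parts are empty, so the model sees only the system text and the question; the scratchpad grows as the agent calls tools within a single turn.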

In [9]:
prompt = AGENT_DOCSEARCH_PROMPT

#### 5. Create the agent

The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.

In [10]:
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

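agent = (
    {
        "question": lambda x: x["question"],
        # Forward the chat history too; otherwise the prompt's optional `history` placeholder stays empty
        "history": lambda x: x.get("history", []),
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)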

Equivalently, LangChain provides a function that does what the cell above does: `create_openai_tools_agent`

```python
agent = create_openai_tools_agent(llm, tools, prompt)
```

**Important note: other models such as Mistral Large or Command R+ don't work with the OpenAI Tools API, so to create agents with those models, try LangChain's ReAct agent type instead**, like [this Cohere agent](https://python.langchain.com/docs/integrations/providers/cohere/#react-agent), for example.

Create an agent executor by passing in the agent and tools

In [11]:
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=False)
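Conceptually, the executor runs a loop: ask the agent what to do, run the chosen tool, record the result, and repeat until the agent produces a final answer. The sketch below shows that loop with a scripted function in place of the LLM. All names (`scripted_model`, `run_agent`, `toy_tools`) are hypothetical, and this is a simplification under those assumptions, not the actual `AgentExecutor` implementation.

```python
from typing import Callable, Dict, List, Tuple, Union

ToolCall = Tuple[str, str]   # (tool name, tool input)
Step = Tuple[ToolCall, str]  # (tool call, observation)


def scripted_model(question: str, intermediate_steps: List[Step]) -> Union[ToolCall, str]:
    """Stand-in for the LLM: search once, then answer from the observation."""
    if not intermediate_steps:
        return ("docsearch", question)
    _, observation = intermediate_steps[-1]
    return f"Answer based on: {observation}"


def run_agent(
    question: str,
    model: Callable[[str, List[Step]], Union[ToolCall, str]],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 5,
) -> str:
    """Loop until the model returns a final answer or the iteration limit is hit."""
    intermediate_steps: List[Step] = []
    for _ in range(max_iterations):
        decision = model(question, intermediate_steps)
        if isinstance(decision, str):  # a plain string means "final answer"
            return decision
        tool_name, tool_input = decision
        observation = tools[tool_name](tool_input)
        intermediate_steps.append((decision, observation))
    return "Stopped: iteration limit reached."


toy_tools = {"docsearch": lambda query: f"3 documents about {query}"}
print(run_agent("markov chains", scripted_model, toy_tools))
```

The `intermediate_steps` list plays the role of the `agent_scratchpad` described earlier: it is how the model sees what it has already tried.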

Give it memory. Since AgentExecutor is also a Runnable class, we do the same as we did in Notebook 5.

In [12]:
def get_session_history(session_id: str, user_id: str) -> CosmosDBChatMessageHistory:
    cosmos = CosmosDBChatMessageHistory(
        cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
        cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
        cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
        connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
        session_id=session_id,
        user_id=user_id
        )

    # prepare the cosmosdb instance
    cosmos.prepare_cosmos()
    return cosmos

Because CosmosDB needs two fields (an id and a partition), and RunnableWithMessageHistory by default takes only one identifier for memory (session_id), we need to use the `history_factory_config` parameter and define multiple keys for the memory class.
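The idea of a history factory keyed by two identifiers can be sketched with an in-memory dictionary. This is only an illustration of the keying scheme: `get_in_memory_history` and `_store` are hypothetical, messages are plain strings, and nothing here talks to CosmosDB.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One message list per (user_id, session_id) pair.
_store: Dict[Tuple[str, str], List[str]] = defaultdict(list)


def get_in_memory_history(session_id: str, user_id: str) -> List[str]:
    """Return the history for this user and session, creating it on first use."""
    return _store[(user_id, session_id)]


# Same user and session: the same history object comes back.
history = get_in_memory_history(session_id="session1", user_id="user1")
history.append("Hi, I'm Pablo")
print(get_in_memory_history(session_id="session1", user_id="user1"))

# Same session id but a different user: a separate, empty history.
print(get_in_memory_history(session_id="session1", user_id="user2"))
```

Keying on both values keeps two users from sharing a conversation even if their session ids happen to collide.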

In [14]:
userid_spec = ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        )
session_id = ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="",
            is_shared=True,
        )

In [15]:
agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    history_factory_config=[userid_spec,session_id]
)

In [16]:
# configure the session id and user id
random_session_id = "session" + str(random.randint(1, 1000))
random_user_id = "user" + str(random.randint(1, 1000))

config = {"configurable": {"session_id": random_session_id, "user_id": random_user_id}}
config

{'configurable': {'session_id': 'session411', 'user_id': 'user46'}}

#### 6. Run the Agent!

In [17]:
%%time
agent_with_chat_history.invoke({"question": "Hi, I'm Pablo Marin. What's yours"}, config=config)

CPU times: total: 0 ns
Wall time: 5.67 s


{'question': "Hi, I'm Pablo Marin. What's yours",
 'history': [],
 'output': 'Hello Pablo Marin, my name is Jarvis. How can I assist you today?'}

In [18]:
printmd(agent_with_chat_history.invoke(
    {"question": "What are markov chains and is there an application in medicine?"}, 
    config=config)["output"])



### What are Markov Chains?

A Markov chain is a mathematical system that undergoes transitions from one state to another within a finite or countable number of possible states. It is a stochastic process characterized by the "memoryless" property, which means that the next state depends only on the current state and not on the sequence of events that preceded it. This property is known as the Markov property.

Markov chains can be described by:
1. **States**: The different possible conditions or positions in which the system can be.
2. **Transition Probabilities**: The probabilities of moving from one state to another.
3. **Initial State**: The state in which the system starts.

The process is typically represented using a state transition matrix, where each element indicates the probability of transitioning from one state to another.

### Applications of Markov Chains in Medicine

Markov chains have several applications in the field of medicine, ranging from disease modeling to treatment planning. Here are some notable examples:

1. **Phylogenetic Analysis**:
   - Markov chain Monte Carlo (MCMC) methods are used to improve the resolution of phylogenies for rapidly evolving pathogens. This approach helps in joint estimation of alignment and phylogeny, taking into account the ensemble of near-optimal alignments to avoid biases [[1]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1853084/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

2. **Epidemiological Modeling**:
   - Markov processes are used in stochastic models to predict the spread of diseases, such as in the SEIR (Susceptible-Exposed-Infected-Recovered) model. This approach helps in improved time series prediction of the number of infectious cases, which is crucial for managing public health responses [[2]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2780467/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

3. **Disease Outbreak Analysis**:
   - During the 2001 foot and mouth disease (FMD) epidemic in Great Britain, Markov Chain Monte Carlo (MCMC) methods were used to estimate epidemiological parameters. This helped in understanding the impact of control measures and in creating predictive risk maps of transmission potential [[3]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1876810/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

4. **Dynamic Treatment Regimes**:
   - In the context of personalized medicine, Markov Decision Processes (MDPs) and their extensions are used to determine optimal treatment strategies based on a patient's evolving health state. This method helps in making data-driven decisions in dynamic treatment regimes, improving the efficiency and effectiveness of treatments [[4]](https://blobstoragek2ozpi26tpz6e.blob.core.windows.net/arxivcs/2310.07518.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

These applications demonstrate the versatility and importance of Markov chains in addressing complex problems in the medical field.

In [19]:
printmd(agent_with_chat_history.invoke(
        {"question": "Interesting, Tell me more about the use specifically in the spread of viruses"},
        config=config)["output"])

The spread of viruses is influenced by a variety of factors, including human contact networks, transportation systems, healthcare resources, and the strategic use of medical interventions. Here are some key insights from recent studies:

1. **Human Contact Networks**:
    - The dynamics of infectious diseases that spread via direct person-to-person transmission (e.g., influenza, smallpox, HIV/AIDS) are heavily influenced by the structure of human contact networks. These networks often exhibit strong community structures, which can significantly impact disease dynamics. Effective immunization strategies often target individuals who bridge different communities rather than just highly connected individuals [[1]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851561/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

2. **Transportation Networks**:
    - The spread of diseases through transportation networks, such as bus transportation systems, can be modeled to understand the impact of the starting point of an epidemic. These models can help in designing control and preventive measures by highlighting the importance of the topological context of the outbreak's origin [[2]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275240/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

3. **Global Spread and Vaccination Strategies**:
    - The global spread of viruses like influenza can be tracked through sequence analysis, despite challenges like antigenic drift and sampling biases. Network analysis has shown that regions such as China and Hong Kong are often origins of new seasonal strains, while increased vaccination in regions like the United States can disrupt global virus spread [[3]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987833/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

4. **Antiviral Use and Drug Resistance**:
    - The strategic use of antiviral drugs during a pandemic is crucial to prevent the development of drug resistance. Mathematical models suggest that conservative treatment levels during the early stages of an outbreak, followed by a timely increase in drug use, can effectively manage drug resistance and avoid depletion of stockpiles [[4]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2653495/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

5. **Healthcare Resources and International Spread**:
    - The availability of healthcare resources significantly affects the reporting and control of pandemics. Countries with lower healthcare resources often experience delays in reporting cases, which can hinder early control measures. Enhanced surveillance and rapid deployment of resources to these countries can help in early detection and reduction of pandemic impacts [[5]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2939898/?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2025-01-30T14:55:42Z&st=2024-07-10T06:55:42Z&spr=https&sig=NQgAnpUqrSUPKdOZwtvdSQP2pjwoeUK0xlNtd%2F554t8%3D).

These insights highlight the complexity of virus spread and the importance of targeted interventions, efficient use of resources, and global cooperation in managing pandemics.

In [20]:
printmd(agent_with_chat_history.invoke({"question": "Thank you!"}, config=config)["output"])

You're welcome! If you have any questions or need assistance with something, feel free to ask.

### Let's add more things we have learned so far: dynamic LLM selection of GPT4 and asynchronous streaming

In [21]:
agent = create_openai_tools_agent(llm_with_tools.with_config(configurable={"model": "gpt4"}), tools, prompt) # We select now GPT-4
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=False)
agent_with_chat_history = RunnableWithMessageHistory(agent_executor,get_session_history,input_messages_key="question", 
                                                     history_messages_key="history", history_factory_config=[userid_spec,session_id])

In prior notebooks we used the `.stream()` function of the runnable to stream the tokens. However, if you need to stream individual tokens from the agent or surface steps occurring within tools, you need to use a combination of `Callbacks` and `.astream()`, or the new `astream_events` API (beta).

Let’s use the `astream_events` API here to stream the following events:

- Agent Start with inputs
- Tool Start with inputs
- Tool End with outputs
- Stream the agent final answer token by token
- Agent End with outputs
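Before running this against the real agent, it can help to see the pattern in isolation. The sketch below fakes an event stream with dictionaries shaped like the ones the next cell reads (`event`, `name`, `data`). The exact payloads of the real API may differ, and `fake_event_stream` and `collect_answer_tokens` are hypothetical names.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator, Dict, List


async def fake_event_stream() -> AsyncIterator[Dict]:
    """Hypothetical stand-in for an agent's event stream."""
    yield {"event": "on_chain_start", "name": "AgentExecutor", "data": {}}
    yield {"event": "on_tool_start", "name": "docsearch",
           "data": {"input": {"query": "virus spread"}}}
    yield {"event": "on_tool_end", "name": "docsearch",
           "data": {"output": "tool output"}}
    for token in ["Markov ", "chains ", "model ", "transitions."]:
        yield {"event": "on_chat_model_stream", "name": "model",
               "data": {"chunk": SimpleNamespace(content=token)}}
    yield {"event": "on_chain_end", "name": "AgentExecutor", "data": {}}


async def collect_answer_tokens(events: AsyncIterator[Dict]) -> str:
    """Keep only the streamed answer tokens and join them into one string."""
    tokens: List[str] = []
    async for event in events:
        if event["event"] == "on_chat_model_stream":
            content = event["data"]["chunk"].content
            if content:  # skip empty chunks
                tokens.append(content)
    return "".join(tokens)


answer = asyncio.run(collect_answer_tokens(fake_event_stream()))
print(answer)
```

In a notebook, replace the `asyncio.run(...)` call with `await collect_answer_tokens(fake_event_stream())`, since the notebook already runs an event loop.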

In [22]:
QUESTION = "Tell me more about your last answer, search again multiple times and provide a deeper explanation"

In [23]:
async for event in agent_with_chat_history.astream_events(
    {"question": QUESTION}, config=config, version="v1",
):
    kind = event["event"]
    if kind == "on_chain_start":
        if event["name"] == "AgentExecutor":
            print(f"Starting agent: {event['name']}")
    elif kind == "on_chain_end":
        if event["name"] == "AgentExecutor":
            print()
            print("--")
            print(f"Done agent: {event['name']}")
    elif kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        # Empty content in the context of OpenAI means that the model is asking for a tool to be invoked.
        # So we only print non-empty content
        if content:
            print(content, end="")
    elif kind == "on_tool_start":
        print("--")
        print(f"Starting tool: {event['name']} with inputs: {event['data'].get('input')}")
    elif kind == "on_tool_end":
        print(f"Done tool: {event['name']}")
        # print(f"Tool output was: {event['data'].get('output')}")
        print("--")

Starting agent: AgentExecutor
--
Starting tool: docsearch with inputs: {'query': 'human contact networks and virus spread'}
--
Starting tool: docsearch with inputs: {'query': 'transportation networks and virus spread'}
--
Starting tool: docsearch with inputs: {'query': 'global spread of viruses and vaccination strategies'}
--
Starting tool: docsearch with inputs: {'query': 'antiviral use and drug resistance in pandemics'}
--
Starting tool: docsearch with inputs: {'query': 'healthcare resources and international virus spread'}
Done tool: docsearch
--
Done tool: docsearch
--
Done tool: docsearch
--
Done tool: docsearch
--
Done tool: docsearch
--
### Detailed Explanation on the Spread of Viruses Using Markov Chains and Other Models

#### Human Contact Networks

Human contact networks play a crucial role in the spread of infectious diseases that are transmitted through direct person-to-person contact, such as influenza, smallpox, and HIV/AIDS. These networks often exhibit strong community 

# Summary

We just built our first RAG bot!

- We learned that **Agents + Tools are the best way to go about building Bots**.
- We converted the Azure Search retriever into a Tool using `GetDocSearchResults_Tool` in `utils.py`
- We learned about the events API (Beta), one way to stream the answer from agents
- We learned that for comprehensive, quality answers we will run out of token space with GPT3.5, so GPT4 becomes necessary.


# NEXT

Now that we have a bot with one skill (Document Search), let's build more skills! In the next Notebook, we are going to build an agent that can understand tabular data in CSV files and execute Python commands.