<a href="https://colab.research.google.com/github/mrhamedani/LLM-Agents/blob/main/12_LangSmith_agent_ipynb2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Tracing and Evaluating LLMs with LangSmith.Medical (Agent with LangSmith)

Models: OpenAI

Colab Environment: CPU-High RAM.

Keys:
* Tracing LLMs
* LangSmith
* Agents.




In this notebook, we will explore how to trace the different calls that occur in a LangChain Agent using LangSmith. We will use a familiar agent, employed in the LangChain section of the course, where a RAG system with medical information was created.

So, not only will we observe the traces of the agent, but we will also examine the traces of the retriever. Additionally, we'll inspect the query sent to the vector database and the results it returns.
__________

In [None]:
!pip install -q langchain==0.3.0
!pip install -q langchain-openai==0.2.0
!pip install -q langchainhub==0.1.21
!pip install -q datasets==3.0.0
!pip install -q chromadb==0.5.5
!pip install -q langchain-community==0.3.0

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.3/2.3 MB[0m [31m7.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.3/49.3 kB[0m [31m3.3 MB/s[0m eta [36m0:00:00[0m
[?25h

In [None]:
from datasets import load_dataset  #We will download the dataset from the Hugging Face datasets library. It's a dataset with information about diseases.
data = load_dataset("keivalya/MedQuad-MedicalQnADataset", split='train')

In [None]:
data = data.to_pandas()
data.head(10)

Unnamed: 0,qtype,Question,Answer
0,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,LCMV infections can occur after exposure to fr...
1,symptoms,What are the symptoms of Lymphocytic Choriomen...,LCMV is most commonly recognized as causing ne...
2,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...
3,exams and tests,How to diagnose Lymphocytic Choriomeningitis (...,"During the first phase of the disease, the mos..."
4,treatment,What are the treatments for Lymphocytic Chorio...,"Aseptic meningitis, encephalitis, or meningoen..."
5,prevention,How to prevent Lymphocytic Choriomeningitis (L...,LCMV infection can be prevented by avoiding co...
6,information,What is (are) Parasites - Cysticercosis ?,Cysticercosis is an infection caused by the la...
7,susceptibility,Who is at risk for Parasites - Cysticercosis? ?,Cysticercosis is an infection caused by the la...
8,exams and tests,How to diagnose Parasites - Cysticercosis ?,"If you think that you may have cysticercosis, ..."
9,treatment,What are the treatments for Parasites - Cystic...,Some people with cysticercosis do not need to ...


In [None]:
#Limit the size of the data to the first 100 rows. Comment out this line to use the full dataset.
data = data[0:100]

In [None]:
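#These classes live in langchain-community (installed above); the old langchain.* import paths are deprecated.
from langchain_community.document_loaders import DataFrameLoader
from langchain_community.vectorstores import Chroma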

In [None]:
df_loader = DataFrameLoader(data, page_content_column="Answer") #The document text is in the Answer column; the other columns become metadata.

In [None]:
df_document = df_loader.load()
display(df_document[:2])

We can chunk the documents. The size to which we want to split the document is a design decision. The larger it is, the larger the prompt will be, and the slower the Model's response process.

We also need to consider the maximum prompt size and ensure that the document does not exceed it.

In [None]:
from langchain.text_splitter import CharacterTextSplitter

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1250,separator="\n",chunk_overlap=100)
texts = text_splitter.split_documents(df_document)

The warnings we see appear because the splitter cannot always produce chunks of the requested size. It only divides the text at the separator (a line break, `"\n"`), so a passage with no line break inside the size limit is kept as a single, larger chunk.
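
To make the warning easier to picture, here is a minimal sketch of a splitter that only cuts at a separator. It is not the `CharacterTextSplitter` implementation: the function name `split_on_separator` is hypothetical, and overlap handling is left out on purpose. It only illustrates why a piece without a line break can end up longer than the requested size.

```python
def split_on_separator(text: str, chunk_size: int, separator: str = "\n") -> list[str]:
    """Simplified, hypothetical splitter: merges pieces greedily and only cuts at `separator`."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, keep growing the current chunk.
            current = candidate
        else:
            if current:
                chunks.append(current)
            # The new piece starts a chunk of its own, even if it is
            # longer than chunk_size (there is nowhere else to cut).
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "short line\n" + "x" * 40 + "\nanother short line"
chunks = split_on_separator(sample, chunk_size=20)
oversized = [c for c in chunks if len(c) > 20]
print(len(chunks), [len(c) for c in chunks])
```

The middle piece has no line break, so it is emitted whole even though it exceeds the limit. That is the situation the warnings report.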

In [None]:
first_doc = texts[1]
print(first_doc.page_content)

LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. 
                
For infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial phase, which may last as long as a week, typically begins with any or all of the following symptoms: fever, malaise, lack of appetite, muscle aches, headache, nausea, and vomiting. Other symptoms appearing less frequently include sore throat, cough, joint pain, chest pain, testicular pain, and parotid (salivary gland) pain.


### Initialize the Embedding Model and Vector DB

In [None]:
from getpass import getpass
import os
if 'OPENAI_API_KEY' not in os.environ:
  os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key: ")

OpenAI API Key: ··········


Obtain your LangChain API key from the Personal -> Settings area of the LangSmith panel.

![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/langsmith_API_KEY.jpg?raw=true)

In [None]:
if 'LANGCHAIN_API_KEY' not in os.environ:
  os.environ["LANGCHAIN_API_KEY"] = getpass("LangChain API Key: ")

LangChain API Key: ··········


In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"]="https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"]="langsmith_test3"
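
Since tracing is configured only through environment variables, a quick check before running the agent can save a silent misconfiguration. The helper below is a hypothetical convenience, not part of LangSmith. It looks for the four variables this notebook sets, and it takes a plain mapping so it can be tried without touching the real environment.

```python
import os

# The variables this notebook sets for LangSmith tracing.
NOTEBOOK_TRACING_VARS = (
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_PROJECT",
    "LANGCHAIN_API_KEY",
)


def missing_tracing_vars(env) -> list[str]:
    """Return the names from NOTEBOOK_TRACING_VARS that are absent or empty in `env`."""
    return [name for name in NOTEBOOK_TRACING_VARS if not env.get(name)]


# Example with a made-up, incomplete configuration.
example_env = {
    "LANGCHAIN_TRACING_V2": "true",
    "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
}
missing = missing_tracing_vars(example_env)
print(missing)
```

In the notebook itself the call would be `missing_tracing_vars(os.environ)`, which should return an empty list once the cells above have run.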

We load the text-embedding-ada-002 model from OpenAI.


In [None]:
from langchain_openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed = OpenAIEmbeddings(
    model=model_name
)

The execution of this cell may take 3 to 5 minutes. If you want it to be faster, you can reduce the number of records in the dataset.

In [None]:
directory_cdb = '/content/drive/MyDrive/chromadb'
chroma_db = Chroma.from_documents(
    df_document, embed, persist_directory=directory_cdb
)
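
Conceptually, the vector store keeps one embedding per document and answers a query by returning the documents whose embeddings are closest to the query embedding. The sketch below illustrates that idea in plain Python. It does not use Chroma or the OpenAI embeddings: `toy_embed` is a hypothetical word-count stand-in for a real embedding model, and the three documents are made up for the example.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "LCMV infection can cause fever and headache",
    "Cysticercosis is an infection caused by tapeworm larvae",
    "Botulism is a rare paralytic illness",
]
# "Indexing": store each document next to its vector.
index = [(doc, toy_embed(doc)) for doc in docs]


def top_k(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


results = top_k("what is botulism", k=1)
print(results)
```

A real embedding model captures meaning rather than shared words, but the retrieval step (embed the query, rank by similarity, return the top results) follows the same shape.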

We are going to create three objects:

* The language model, which can be any of the OpenAI models, such as gpt-3.5.
* The memory, responsible for keeping the recent conversation history that is added to the prompt.
* The retriever, used to obtain the information stored in ChromaDB.
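
The windowed memory in the list above can be pictured as a fixed-length queue of exchanges: once it is full, the oldest exchange is dropped. The class below is a hypothetical illustration of that behaviour, not the LangChain implementation.

```python
from collections import deque


class WindowMemory:
    """Hypothetical stand-in: keeps only the last k (human, ai) exchanges."""

    def __init__(self, k: int):
        # A deque with maxlen discards the oldest item automatically.
        self.exchanges = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def history(self) -> list[tuple[str, str]]:
        return list(self.exchanges)


memory = WindowMemory(k=4)
for i in range(6):
    memory.save(f"question {i}", f"answer {i}")

kept = memory.history()
print(kept)
```

After six exchanges only the last four remain, so the prompt stays bounded no matter how long the conversation runs.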

In [None]:
model="gpt-4o"
#model="gpt-3.5-turbo"

In [None]:
from langchain_openai import OpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA

#OpenAI is the completion-model wrapper. The `model` variable defined above is not passed here, so the library's default model is used.
llm=OpenAI(temperature=0.0)

conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=4, #Number of most recent exchanges (question and answer pairs) kept in memory
    return_messages=True #Return the history as a list of message objects instead of a single string.
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    #chain_type="stuff",
    retriever=chroma_db.as_retriever()
)

We can try the retrieval chain on its own to check whether the information it returns is relevant.




In [None]:
qa.run("What is the main symptom of LCM?")



' The main symptom of LCM is a biphasic febrile illness, which includes symptoms such as fever, malaise, lack of appetite, muscle aches, headache, nausea, and vomiting.'

When observing the retriever in LangSmith, it is possible to see its input and the documents it returns:
![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_4_Retriever_1.jpg?raw=true)

In the image below you can observe the call to the Model where the full prompt and the response are displayed.
![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_4_Retriever_2.jpg?raw=true)


## Creating the Agent.

In [None]:
from langchain.agents import Tool

#Defining the list of tool objects to be used by LangChain.
tools = [
    Tool(
        name='Medical KB',
        func=qa.run,
        description=(
            """use this tool when answering medical knowledge queries to get
            more information about the topic"""
        )
    )
]

In [None]:
from langchain.agents import create_react_agent
from langchain import hub

prompt = hub.pull("hwchase17/react-chat")
agent = create_react_agent(
    tools=tools,
    llm=llm,
    prompt=prompt,
)



In [None]:
# Create an agent executor by passing in the agent and tools
from langchain.agents import AgentExecutor
agent_executor2 = AgentExecutor(agent=agent,
                               tools=tools,
                               verbose=True,
                               memory=conversational_memory,
                               max_iterations=30,
                               max_execution_time=600,
                               handle_parsing_errors=True
                               )
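
The executor's job can be summarised as a loop: ask the model for the next step, run the chosen tool if there is one, feed the observation back, and stop at a final answer or at the iteration limit. The sketch below illustrates that loop under simplifying assumptions. `run_agent` and `scripted_llm` are hypothetical names, the "model" is a scripted function instead of an OpenAI call, and the tool returns a canned string.

```python
def run_agent(next_step, tools: dict, user_input: str, max_iterations: int = 30) -> str:
    """Hypothetical, simplified agent loop (not the AgentExecutor implementation)."""
    scratchpad = []  # (step, observation) pairs accumulated so far
    for _ in range(max_iterations):
        step = next_step(user_input, scratchpad)
        if step["type"] == "finish":
            return step["output"]
        # Dispatch to the tool the model selected by name.
        observation = tools[step["tool"]](step["tool_input"])
        scratchpad.append((step, observation))
    return "Stopped: iteration limit reached."


def scripted_llm(user_input: str, scratchpad: list) -> dict:
    """Stand-in for the model: first asks for the tool, then answers."""
    if not scratchpad:
        return {"type": "action", "tool": "Medical KB", "tool_input": "Botulism"}
    last_observation = scratchpad[-1][1]
    return {"type": "finish", "output": f"Based on the KB: {last_observation}"}


toy_tools = {"Medical KB": lambda query: f"notes about {query}"}
answer = run_agent(scripted_llm, toy_tools, "How can I confirm Botulism?")
print(answer)
```

Each pass through the loop corresponds to one model call in the trace, and each tool dispatch shows up as a nested run. This is why a question that needs the knowledge base produces more calls than one that does not.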

### Using the Conversational Agent

To make queries, we simply call `invoke` on the agent executor.

First, I will try a request not related to the medical field.

In [None]:
agent_executor2.invoke({"input": "What is 2 multiplied by 2?"})



[32;1m[1;3mThought: Do I need to use a tool? No
Final Answer: 2 multiplied by 2 is 4.[0m

[1m> Finished chain.[0m


{'input': 'What is 2 multiplied by 2?',
 'chat_history': [],
 'output': '2 multiplied by 2 is 4.'}

The initial call to the Agent happens with just one call to OpenAI. One piece of information available in LangSmith is the entire prompt. I'll copy it just below the image, so you can see.

![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_4_1AE_1.jpg?raw=true)



```
Assistant is a large language model trained by OpenAI.

Assistant is designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, Assistant is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

Assistant is constantly learning and improving, and its capabilities are constantly evolving. It is able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. Additionally, Assistant is able to generate its own text based on the input it receives, allowing it to engage in discussions and provide explanations and descriptions on a wide range of topics.

Overall, Assistant is a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether you need help with a specific question or just want to have a conversation about a particular topic, Assistant is here to assist.

TOOLS:
------

Assistant has access to the following tools:

Medical KB: use this tool when answering medical knowledge queries to get
            more information about the topic

To use a tool, please use the following format:


Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [Medical KB]
Action Input: the input to the action
Observation: the result of the action


When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:


Thought: Do I need to use a tool? No
Final Answer: [your response here]


Begin!

Previous conversation history:
[]

New input: What is 2 multiplied by 2?
```

Perfect, the model has responded without accessing the configured knowledge database.
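
The format instructions in the prompt exist so that the model's text can be turned back into a decision: either call a tool or return a final answer. The parser below is a simplified, hypothetical sketch of that step. It is not the parser LangChain uses, and the function name `parse_react_output` is an assumption made for the example.

```python
import re


def parse_react_output(text: str) -> dict:
    """Hypothetical parser for the Thought / Action / Final Answer format."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return {"type": "finish", "output": final.group(1).strip()}
    action = re.search(r"Action:\s*(.*?)\s*\n\s*Action Input:\s*(.*)", text, re.DOTALL)
    if action:
        return {
            "type": "action",
            "tool": action.group(1).strip(),
            "tool_input": action.group(2).strip(),
        }
    # Neither pattern matched: the model did not follow the format.
    raise ValueError("Could not parse model output")


no_tool = parse_react_output(
    "Thought: Do I need to use a tool? No\nFinal Answer: 2 multiplied by 2 is 4."
)
with_tool = parse_react_output(
    "Thought: Do I need to use a tool? Yes\nAction: Medical KB\nAction Input: Botulism"
)
print(no_tool)
print(with_tool)
```

When the output matches neither pattern, an agent has to decide what to do. In this notebook the executor is created with `handle_parsing_errors=True`, so such errors are handled inside the loop instead of raising.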

Now I will try a question that is related to health.

In [None]:
agent_executor2.memory.clear()

In [None]:
agent_executor2.invoke({"input": """I have a patient that can have Botulism,
how can I confirm the diagnosis?"""})



[32;1m[1;3m
Thought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Botulism[0m[36;1m[1;3m Botulism is a rare but serious paralytic illness caused by a nerve toxin produced by certain bacteria. It can be contracted through contaminated food, wounds, or ingestion of bacterial spores. Symptoms include muscle paralysis, difficulty swallowing, and respiratory failure. Treatment includes antitoxin, supportive care, and removal of contaminated food or wound.[0m[32;1m[1;3mDo I need to use a tool? No
Final Answer: To confirm the diagnosis, you should perform a physical exam and order laboratory tests, such as a stool or blood test, to detect the presence of the bacteria or its toxin. You may also need to consult with a specialist, such as an infectious disease doctor, for further evaluation and treatment.[0m

[1m> Finished chain.[0m


{'input': 'I have a patient that can have Botulism,\nhow can I confirm the diagnosis?',
 'chat_history': [],
 'output': 'To confirm the diagnosis, you should perform a physical exam and order laboratory tests, such as a stool or blood test, to detect the presence of the bacteria or its toxin. You may also need to consult with a specialist, such as an infectious disease doctor, for further evaluation and treatment.'}

Perfect. The most important thing for us is that the agent identified that it should query the medical database for information about the disease.
![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_4_1AE_3.jpg?raw=true)

In [None]:
agent_executor2.invoke({"input": "Is this an important illness?"})



[32;1m[1;3m
Thought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Botulism[0m[36;1m[1;3m Botulism is a rare but serious paralytic illness caused by a nerve toxin produced by certain bacteria. It can be contracted through contaminated food, wounds, or ingestion of bacterial spores. Symptoms include muscle paralysis, difficulty swallowing, and respiratory failure. Treatment includes antitoxin, supportive care, and removal of the source of the toxin.[0m[32;1m[1;3mDo I need to use a tool? No
Final Answer: Yes, botulism is an important illness that requires prompt diagnosis and treatment. If you suspect a patient may have botulism, it is important to consult with a specialist and perform necessary tests to confirm the diagnosis.[0m

[1m> Finished chain.[0m


{'input': 'Is this an important illness?',
 'chat_history': [HumanMessage(content='I have a patient that can have Botulism,\nhow can I confirm the diagnosis?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='To confirm the diagnosis, you should perform a physical exam and order laboratory tests, such as a stool or blood test, to detect the presence of the bacteria or its toxin. You may also need to consult with a specialist, such as an infectious disease doctor, for further evaluation and treatment.', additional_kwargs={}, response_metadata={})],
 'output': 'Yes, botulism is an important illness that requires prompt diagnosis and treatment. If you suspect a patient may have botulism, it is important to consult with a specialist and perform necessary tests to confirm the diagnosis.'}

On the left side of the image, you can see the various calls made by the Agent, which took 2.5 seconds to execute and consumed 2000 tokens.

I will try to describe what happens in each call.

**OpenAI**: The complete prompt, including the template pulled from the LangChain Hub and the user's question, is passed to the OpenAI model. It responds with the next step. Its answer is:

*Thought: Do I need to use a tool? Yes*

*Action: Medical KB*

*Action Input: Botulism*

**Medical KB**: The Agent utilizes the configured tool, passing only a single word as a parameter: "Botulism."

**Medical KB.Retriever:** The retriever returns four documents extracted from the vector database.

**Medical KB.OpenAI:** The prompt is constructed with the information from the vector database. Although I won't paste the prompt here, I can see in it that the four returned documents are actually two documents, each duplicated. In a real project, this could have helped me detect an issue: perhaps the dataset contains duplicates, or maybe I loaded the documents twice. In any case, I'm consuming many more tokens than necessary. The model returns a response based on the information contained in the prompt.

**OpenAI:** In this final call to the model, it decides that the received response is what the user needs. Therefore, it marks it as correct and returns it to the user.
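
The duplicated documents noticed in the retriever trace suggest a simple guard: drop repeated texts before they reach the prompt. The helper below is a hypothetical sketch working on plain strings. The sample texts are made up, and in the notebook the equivalent values would be the `page_content` of each retrieved document.

```python
def deduplicate(texts: list[str]) -> list[str]:
    """Keep the first occurrence of each text, comparing with whitespace normalized."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(text.split())  # collapse runs of whitespace
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


# Made-up example mirroring the trace: four results, only two distinct.
retrieved = [
    "Botulism is a rare illness.",
    "Botulism can be foodborne.",
    "Botulism is a rare illness.",
    "Botulism can be foodborne.",
]
unique = deduplicate(retrieved)
print(len(retrieved), "->", len(unique))
```

Filtering at query time only hides the symptom. If the duplicates come from loading the data twice, rebuilding the collection is the cleaner fix.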

![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_4_1AE_3.jpg?raw=true)

And the memory works perfectly. We can maintain a conversation, taking into account that the model knows the previous questions and answers.


# Conclusions.
LangSmith is an incredibly useful tool for tracing and storing all the information generated when making calls to LangChain.

The experiment has been a small success. The vector database has been configured and filled with information from the dataset. A LangChain agent has been created, and it has been able to retrieve information from the database only when necessary. Don't forget that our chatbot also has memory.



And you have all the information in a project stored in LangSmith!
![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_4_1AE_Final.jpg?raw=true)



---