## Comparing models for our different tasks

In this notebook, we are going to use another model, Flan-T5-Small, alongside Mistral-7B and see how it behaves.

Flan-T5-Small is much smaller: it runs without a GPU and on only 1 GB of RAM. But is it up to the task?

### Requirements and Imports

If you have selected the right workbench image to launch as per the Lab's instructions, you should already have all the needed libraries. If not, uncomment the first line in the next cell to install all the right packages.

In [None]:
from langchain.chains import LLMChain
from langchain_community.llms import VLLMOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

from certs import prepare_certs
from tgis_client import TgisClient

### Langchain pipeline

We are now going to define two different LLM endpoints, and two different pipelines.

In [None]:
mistral_inference_server_url = "REPLACE_ME"

prepare_certs(mistral_inference_server_url)
llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base= f"{mistral_inference_server_url}/v1",
    model_name="/mnt/models/",
    max_tokens=512,
    top_p=0.95,
    temperature=0.01,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

Flan-T5-Small client

In [None]:
flan_t5_inference_url = 'REPLACE_ME'
flan_t5_model_id = 'flan-t5-tgis'

flan_t5_client = TgisClient(flan_t5_inference_url, model_id=flan_t5_model_id)

The **template** will be the same for both models.

In [None]:
template = """<s>[INST]
You are a helpful, respectful and honest assistant.
Always assist with care, respect, and truth. Respond with utmost utility yet securely.
Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
I will give you a text, then ask a question about it. Give a precise and as concise as possible answer to this question.

### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
[/INST]
"""
# The template uses two placeholders, {text} and {query}, so declare both.
PROMPT = PromptTemplate(input_variables=["text", "query"], template=template)
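
The template relies on two placeholders, `text` and `query`. As a quick check that a template exposes the placeholders you expect, here is a minimal sketch using only the Python standard library. The shortened `sample_template` and the helper `placeholder_names` are hypothetical names introduced for illustration; they are not part of the Lab's code or of LangChain.

```python
from string import Formatter

# Shortened, hypothetical stand-in for the notebook's template.
sample_template = """<s>[INST]
### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
[/INST]
"""


def placeholder_names(template: str) -> list[str]:
    """Return the placeholder names found in a template, in order of first appearance."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        # field_name is None for trailing literal text, so skip those entries.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


found = placeholder_names(sample_template)

# Render the template with plain str.format to see the final prompt text.
rendered = sample_template.format(text="A short claim.", query="Where did it happen?")
print(found)
print(rendered)
```

The same helper can be pointed at the full `template` string defined above to confirm that the names passed as `input_variables` match the placeholders.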

We can now create the **conversation** object that we will use to query Mistral-7B. Flan-T5-Small will be queried directly through its client.

In [None]:
conversation = LLMChain(llm=llm, prompt=PROMPT, verbose=False)

We are now ready to query the models!

In this example, we are only going to query one claim and see what happens. Of course, feel free to try with different ones.

In [None]:
text_input = '''I hope this email finds you well. My name is Sarah Turner, and I am writing to file a claim for a recent car accident that occurred on January 2nd, 2024, at approximately 3:30 PM. My policy number is AC-987654321.

The accident took place at the intersection of Birch Street and Willow Avenue in the city of Evergreen. I was driving my vehicle, a black Toyota Camry with license plate number DEF-456, heading south on Birch Street. At the intersection, the traffic signal was green, and I proceeded through the intersection.

At the same time, another vehicle, a blue Chevrolet Traverse with license plate number GHI-789, was traveling west on Willow Avenue. Unfortunately, the driver failed to stop at the red traffic signal, resulting in a collision with the front passenger side of my vehicle.

The impact caused significant damage to both vehicles. The front bumper and right headlight of my Toyota Camry are extensively damaged, and there are also damages to the front driver's side of the Chevrolet Traverse. Fortunately, no injuries were sustained during the accident, and both drivers were able to move their vehicles to the side of the road.

I promptly exchanged information with the other driver, Mr. Daniel Reynolds, including our names, phone numbers, insurance details, and a brief description of the accident. Additionally, I took photos of the accident scene, including the damages to both vehicles and the position of the traffic signal.'''

Analyze the claim

In [None]:
sentiment_query = "What is the sentiment of the person sending this claim?"
location_query = "Where does the event the claim is related to happen?"
time_query = "When does the event the claim is related to happen? If possible, specify the date and the time."

print("***************************")
print("Original content:")
print("-----------------")
print(text_input)
print("\n\n                          ----====----\n")

print('Analysis with Mistral-7B:')
print("--------")
print("- Sentiment: ")
conversation.predict(text=text_input, query=sentiment_query);
print("\n- Location: ")
conversation.predict(text=text_input, query=location_query);
print("\n- Time: ")
conversation.predict(text=text_input, query=time_query);
print("\n\n                          ----====----\n")

print('Analysis with Flan-T5-Small:')
print("--------")
print("- Sentiment: ")
print(flan_t5_client.predict(text=text_input, query=sentiment_query))
print("\n- Location: ")
print(flan_t5_client.predict(text=text_input, query=location_query))
print("\n- Time: ")
print(flan_t5_client.predict(text=text_input, query=time_query))
print("\n\n                          ----====----\n")

As you can see, Flan-T5-Small is much faster. After all, it is only an 80-million-parameter model. It seems to work to some extent, but the results are nowhere near those from Mistral-7B, which is a 7-billion-parameter model.
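
To put a number on "much faster", you can time each call. The following is a minimal sketch, not part of the Lab's code: `timed` and `fake_predict` are hypothetical names, and `fake_predict` stands in for a real model call so the example runs on its own.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_predict(text, query):
    # Hypothetical stand-in for a model call; sleeps briefly to simulate latency.
    time.sleep(0.01)
    return "Evergreen"


answer, seconds = timed(fake_predict, text="some claim", query="Where did it happen?")
print(f"{answer!r} returned in {seconds:.3f}s")
```

In this notebook, the same helper could wrap the real calls, for example `timed(conversation.predict, text=text_input, query=location_query)` and `timed(flan_t5_client.predict, text=text_input, query=location_query)`. A single measurement is noisy, so repeat it a few times before drawing conclusions.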

The art of working with LLMs is finding the right balance between the performance and accuracy you require and the resources and costs involved.

Therefore, it's important to have sanity checks in place to make sure that, as your data changes or your model evolves, you always get the behaviour you expect.
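
One simple form of sanity check is to verify that each answer still mentions the facts you know are in the text. The following is a minimal sketch under stated assumptions: the choice of expected terms is ours (taken from the sample claim), and `missing_terms`, `run_sanity_checks`, and `stub_predict` are hypothetical names, with `stub_predict` standing in for a real model so the example runs on its own.

```python
from typing import Callable

# Expected terms come from the sample claim; which terms to require is an assumption.
EXPECTED_TERMS = {
    "Where does the event the claim is related to happen?": ["Birch Street", "Willow Avenue"],
    "When does the event the claim is related to happen?": ["January 2", "2024"],
}


def missing_terms(answer: str, expected: list[str]) -> list[str]:
    """Return the expected terms that do not appear in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [term for term in expected if term.lower() not in lowered]


def run_sanity_checks(predict: Callable[[str, str], str], text: str) -> dict[str, list[str]]:
    """Map each query to the list of expected terms missing from its answer."""
    return {
        query: missing_terms(predict(text, query), terms)
        for query, terms in EXPECTED_TERMS.items()
    }


def stub_predict(text: str, query: str) -> str:
    # Hypothetical stand-in for a real model call.
    if query.startswith("Where"):
        return "At the intersection of Birch Street and Willow Avenue in Evergreen."
    return "On January 2nd, 2024."


report = run_sanity_checks(stub_predict, "claim text goes here")
failures = {query: terms for query, terms in report.items() if terms}
print("All checks passed" if not failures else f"Failed checks: {failures}")
```

To run this against a real model from this notebook, pass a small wrapper such as `lambda text, query: flan_t5_client.predict(text=text, query=query)` in place of `stub_predict`. Substring checks are deliberately crude; they catch regressions on known facts but say nothing about answers that are phrased differently yet still correct.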