## Example of querying the vLLM Inference server with Mistral-7b, Langchain and a custom Prompt

### Set the Inference server url (replace with your own address)

In [32]:
inference_server_url = "https://mistral-instruct.dell-digital-assistant.svc.cluster.local"

## Preparation

In [33]:
!pip install -q langchain==0.1.9 openai==1.13.3


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [34]:
# Imports
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import VLLMOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
import httpx

#### Create the LLM instance

In [35]:
# LLM definition
llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base= f"{inference_server_url}/v1",
    model_name="mistral-instruct",
    top_p=0.92,
    temperature=0.01,
    max_tokens=512,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    async_client=httpx.AsyncClient(verify=False),
    http_client=httpx.Client(verify=False)
)

#### Create the Prompt

In [36]:
template="""

{history}
{input}

"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)

#### And finally some memory for the conversation

In [37]:
memory=ConversationBufferMemory()

## First example, fully verbose mode

#### Create the Chain using the different components

In [38]:
# Verbose mode is intentionally set to True so you can see the prompt and the history from the buffer memory
conversation = ConversationChain(llm=llm,
                                 prompt=PROMPT,
                                 verbose=False,
                                 memory=memory,
                                )

#### Let's talk...

In [42]:
first_input = "give output in valid json format.  below the chat between bot and customer . response should not include any text outside of json format . nothing before json and after json response. here is the json output format: {\"chatSummary\": { \"customerNumber\":\"<customer Number>\", \"dateOfBirth\":\"<customer data of birth>\", \"overallSentiment\":\"<Positive/Neutral/Negative>\", \"summary\":[\"<chat summary in one liner bullet points>\"] } }. Here is the conversation history:Customer Says:Hi Can you get me FD interest rates over a period of 3 year time Bot Says:Hi! To get the FD interest rates over a period of 3 years, I'd be happy to help you with that. Based on Vista Finance's current rates, here are the options for a 3-year period: * 36-Month CD: 2.00% APY * 60-Month CD: 2.25% APY Please note that these rates are subject to change and may be updated based on market conditions. You can check our website or contact customer service for the most current rates. If you have any further questions or would like to open a CD account, feel free to ask! For a specific query feel free to chat with a <b><a href=\"#\" onclick=\"handleButtonClick('Connect me to agent')\">Live Agent</a></b>.. Customer Says:Good, what is home loan interest rates for 25 years period Bot Says:Good day! I'd be happy to help you with your question. For a 25-year period, the interest rate for a mortgage at Vista Finance would be: * 30-Year Fixed-Rate Mortgage: Interest Rate Range: 5.00% - 6.50% APR Please note that these rates are subject to change and may vary based on individual borrower profiles and market conditions. If you'd like to know more or discuss your options, feel free to contact us through the methods listed in our Contact Details section. For a specific query feel free to chat with a <b><a href=\"#\" onclick=\"handleButtonClick('Connect me to agent')\">Live Agent</a></b>.. Customer Says:Connect me to agent Bot Says:Ok, so before I connect to an human agent, i would like to know if you are an existing customer of Vista Banking & Finance. <br> Customer Says:Yes Bot Says:Please Provide your 8 digit Customer Number Customer Says:12345678 Bot Says:Thanks, now last thing, Please give me your date of birth Customer Says:2024-11-20 Bot Says:Connecting to available Agent. Please wait... Bot Says:Connected to an agent: Jane Smith "
conversation.predict(input=first_input);

AI: {
  "chatSummary": {
    "customerNumber": "12345678",
    "dateOfBirth": "2024-11-20",
    "overallSentiment": "Neutral",
    "summary": [
      "Customer asked for FD interest rates for a 3 year period and received options for 36-month CD and 60-month CD with respective APYs.",
      "Customer then asked for home loan interest rates for a 25 year period and received the interest rate range for a 30-year fixed-rate mortgage.",
      "Customer requested to be connected to an agent and provided their customer number and date of birth."
    ]
  }
}

In [75]:
# This is a follow-up question without context to showcase the conversation memory
second_input = """Respond strictly in valid JSON format only. Do not include any text outside the JSON. 
Here is the required format: 
{"chatSummary": { "customerNumber":"<customer Number>", "dateOfBirth":"<customer data of birth>", "overallSentiment":"<Positive/Neutral/Negative>", "summary":["<chat summary in one liner bullet points>"] }} 
Process this conversation history: 
Customer Says:Hi Can you get me FD interest rates over a period of 3 year time ...
"""
conversation.predict(input=second_input);

Current conversation:
Human: give me output in valid json format given below for the chat bot and customer conversation. response should not include any text outside of json format given. nothing before json and after json. here is the output format: {"chatSummary": { "customerNumber":"<customer Number>", "dateOfBirth":"<customer data of birth>", "overallSentiment":"<Positive/Neutral/Negative>", "summary":["<chat summary in one liner bullet points>"] } }. Here is the conversation history:nullCustomer Says:Hi Can you get me FD interest rates over a period of 3 year time Bot Says:Hi! To get the FD interest rates over a period of 3 years, I'd be happy to help you with that. Based on Vista Finance's current rates, here are the options for a 3-year period: * 36-Month CD: 2.00% APY * 60-Month CD: 2.25% APY Please note that these rates are subject to change and may be updated based on market conditions. You can check our website or contact customer service for the most current rates. If you h

### Second example, no verbosity

In [None]:
# Verbose mode is intentionally set to True so you can see the prompt and the history from the buffer memory
conversation2 = ConversationChain(llm=llm,
                                 prompt=PROMPT,
                                 verbose=False,
                                 memory=memory
                                )

In [96]:
# Let's clear the previous conversation first
memory.clear()

In [None]:
first_input = "Can you describe Naples in 100 words?"
conversation2.predict(input=first_input);

In [None]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a river?"
conversation2.predict(input=second_input);