# Not-so-simple phi2 chatbot

This notebook uses all the pieces explained in

+ `experiments.ipynb` and
+ `experiments_langchain.ipynb`

to implement a chatbot that can

+ classify input queries into 3 different categories (`support`, `sales` and `joke`),
+ respond to each user query according to the category it belongs to.

## Putting it all together

To put it all together (prompting, conditional branching and conversation memory), we first need to slightly modify our prompts from the _routing_ section. They were meant to be used standalone and rely on the `Instruction/Output` template from `phi-2`. Once we add the conversation parts, that template no longer makes sense.

We're going to do this in 3 steps:
1. Create the classification chain
2. Add the branch template
3. Add the conversation memory

## Step 0: load the model

The very first thing we have to do is load the phi model in a format that can be used with LangChain. We've created some convenience functions for that in `load_phi_model.py`.

In [1]:
from load_phi_model import load_phi_model_and_tokenizer, get_langchain_model

model, tokenizer = load_phi_model_and_tokenizer()
hf = get_langchain_model(model, tokenizer)


Your device is cuda


Loading checkpoint shards: 100%|██████████████████| 2/2 [00:00<00:00,  2.21it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


### First, create the initial classification template

We rename the `text` variable to `human_input`, as this is the name we'll use for the chat at the end.

In [2]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableBranch, RunnableLambda, RunnablePassthrough
from operator import itemgetter
from langchain.memory import ConversationBufferMemory

In [3]:
classification_template = """
Instruct: Classify the following text in one of the following categories: ["support", "sales", "joke"]. Output only the name of the category.
+ "support" for customer support texts
+ "sales" for sales and commercial texts
+ "joke" for jokes, funny or comedy like texts
Text: {human_input}
Output:
""".strip()

In [4]:
classification_prompt = ChatPromptTemplate.from_template(classification_template)
classification_chain = (
    classification_prompt
    | hf
    | StrOutputParser()
)

In [5]:
print(classification_chain.invoke({"human_input": "Can I track my order? I'm eager to know its status"}))

 support
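
The raw completion comes back with surrounding whitespace (note the leading space in ` support`), and a small model may occasionally wrap the label in extra words. Below is a minimal, dependency-free sketch of how such an output could be normalized before routing. The helper `normalize_topic` and the `general` fallback label are hypothetical names used for illustration; they are not part of the notebook or of LangChain.

```python
CATEGORIES = ("support", "sales", "joke")


def normalize_topic(raw_output: str, default: str = "general") -> str:
    """Map a raw model completion to one of the known categories.

    Hypothetical helper: returns the first known category found in the
    lower-cased text, or `default` when none is present.
    """
    text = raw_output.strip().lower()
    for category in CATEGORIES:
        if category in text:
            return category
    return default


print(normalize_topic(" support\n"))
print(normalize_topic("I think this is a Joke."))
print(normalize_topic("no idea"))
```

The substring check mirrors the `"support" in x["topic"].lower()` style of test used later for branching, so a noisy completion still lands in a sensible category.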



### Second, add the branch template

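The branch template selects the instructions the chatbot should follow, based on the category returned by the classification chain. At this stage the chain only returns the instruction text; the model's actual reply comes in the next step.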

In [26]:
support_instructions = """\
You are a customer support agent. It seems that the user may have some issues. Answer to their query politely and sincerely. \
Be kind, understanding and say you're sorry for the inconvenience or the situation whenever necessary. Be brief and to the point.\
"""

sales_instructions = """\
Instruction: You are an aggressive salesperson. The user is looking for some information on products. \
Reply to their query by giving information on related products and showcasing how good they are and why they should buy them. \
Be brief and to the point.
"""

joke_instructions = """\
Instruction: You are a comedian. The user want's to have some fun. Reply to their query in a funny way.\
"""

general_instructions = """\
Instruction: Respond to the following query.\
"""

support_chain = PromptTemplate.from_template(support_instructions)
sales_chain = PromptTemplate.from_template(sales_instructions)
joke_chain = PromptTemplate.from_template(joke_instructions)
general_chain = PromptTemplate.from_template(general_instructions)

In [27]:
branch = RunnableBranch(
    (lambda x: "support" in x["topic"].lower(), support_chain),
    (lambda x: "sales" in x["topic"].lower(), sales_chain),
    (lambda x: "joke" in x["topic"].lower(), joke_chain),
    general_chain,
) | RunnableLambda(lambda x: x.text)

branch_chain = {"topic": classification_chain, "human_input": lambda x: x["human_input"]} | branch
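
`RunnableBranch` evaluates its `(condition, runnable)` pairs in order and runs the first one whose condition is true, falling back to the default otherwise. The sketch below reproduces that first-match behaviour in plain Python so the routing logic can be checked without loading the model. The function `first_match` and the placeholder instruction strings are hypothetical and only stand in for the real prompt templates.

```python
from typing import Callable, Dict, List, Tuple

Condition = Callable[[Dict[str, str]], bool]


def first_match(
    branches: List[Tuple[Condition, str]], default: str, x: Dict[str, str]
) -> str:
    """Return the value of the first branch whose condition holds, else the default."""
    for condition, value in branches:
        if condition(x):
            return value
    return default


# Placeholder strings standing in for the real instruction templates
branches = [
    (lambda x: "support" in x["topic"].lower(), "support instructions"),
    (lambda x: "sales" in x["topic"].lower(), "sales instructions"),
    (lambda x: "joke" in x["topic"].lower(), "joke instructions"),
]

print(first_match(branches, "general instructions", {"topic": " Support"}))
print(first_match(branches, "general instructions", {"topic": "weather"}))
# A completion that names two categories resolves to the earliest branch
print(first_match(branches, "general instructions", {"topic": "sales or joke"}))
```

Because the order of the branches decides ties, an ambiguous classification such as `sales or joke` is routed to whichever category is listed first.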

In [28]:
response = branch_chain.invoke({"human_input": "Can I track my order? I'm eager to know its status"})



In [29]:
response

"You are a customer support agent. It seems that the user may have some issues. Answer to their query politely and sincerely. Be kind, understanding and say you're sorry for the inconvenience or the situation whenever necessary. Be brief and to the point."

### Third and final, add the memory and chat template

In [30]:
template = """\
You are a chatbot having a conversation with a human. Follow the given instructions to reply to the Human message below.

Instructions:{instructions}

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["instructions", "chat_history", "human_input"], template=template
)
memory = ConversationBufferMemory(
    return_messages=False,
    ai_prefix="Chatbot", 
    human_prefix="Human",
    memory_key="chat_history"
)

In [31]:
def print_prompt(prompt):
    # Add `RunnableLambda(print_prompt)` to the chain to see the prompt at this point
    print("---FULL PROMPT---")
    print(prompt.text)
    print("---END  PROMPT---")
    return prompt

chat_chain = (
    {
        "human_input": lambda x: x["human_input"],
        # A Runnable can be used directly as a dict value; it receives the same input dict
        "instructions": branch_chain,
        "chat_history": (RunnableLambda(memory.load_memory_variables) | itemgetter("chat_history")),
    } | prompt | hf
)
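
To make the three inputs of `chat_chain` concrete, here is a dependency-free sketch of the string the prompt step ends up producing. It assumes the buffer memory, with `return_messages=False`, yields the history as plain `Human: ...` / `Chatbot: ...` lines joined by newlines. `TEMPLATE`, `format_history` and the sample `turns` are hypothetical names used only for this illustration.

```python
# Same layout as the chat template defined above
TEMPLATE = """\
You are a chatbot having a conversation with a human. Follow the given instructions to reply to the Human message below.

Instructions:{instructions}

{chat_history}
Human: {human_input}
Chatbot:"""


def format_history(turns, human_prefix="Human", ai_prefix="Chatbot"):
    """Render (human, bot) message pairs as prefixed lines, one message per line."""
    lines = []
    for human_msg, bot_msg in turns:
        lines.append(f"{human_prefix}: {human_msg}")
        lines.append(f"{ai_prefix}: {bot_msg}")
    return "\n".join(lines)


turns = [("Hi", "Hello! How can I help?")]  # made-up conversation so far
full_prompt = TEMPLATE.format(
    instructions="Be brief and to the point.",
    chat_history=format_history(turns),
    human_input="Can I track my order?",
)
print(full_prompt)
```

The prompt ends with an open `Chatbot:` line, which is what invites the model to continue with the bot's next turn.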

In [32]:
from langchain.callbacks.tracers import ConsoleCallbackHandler

input_ = {"human_input": "Can I track my order? I'm eager to know its status"}

response = chat_chain.invoke(
    input_
    #, config={'callbacks': [ConsoleCallbackHandler()]}
)
memory.save_context({"input": input_['human_input']}, {"output": response.strip()})
response



" I'm sorry to hear that you're eager to know the status of your order. I can certainly help you with that. Could you please provide me with your order number?\n"

## Add user interface

In [33]:
import gradio as gr
import torch
from load_phi_model import StopOnTokens, StopOnNames
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer, pipeline
from threading import Thread
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
import re

In [35]:
HUMAN_NAME = "Human"
BOT_NAME = "Chatbot"

In [38]:
device = "cuda" if torch.cuda.is_available() else "cpu"

chat_name_pattern_end = r'\n.+:$' # matches substrings like `\nUser:` at the end

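# For the UI, the chain stops at the prompt: generation is streamed manually in `predict`
chat_chain = (
    {
        "human_input": lambda x: x["human_input"],
        # A Runnable can be used directly as a dict value; it receives the same input dict
        "instructions": branch_chain,
        "chat_history": lambda x: x["chat_history"],
    } | prompt
)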

def predict(message, history):
    """Stream the chatbot's reply to `message`, given the Gradio chat `history`."""
    stop_on_tokens = StopOnTokens()
    stop_on_names = StopOnNames(
        [tokenizer.encode(HUMAN_NAME), tokenizer.encode(BOT_NAME)])

    # Flatten the (user, bot) pairs in `history` into the `Human:`/`Chatbot:` transcript the prompt expects
    messages = "".join(["".join(
        [f"\n{HUMAN_NAME}: "+item[0], f"\n{BOT_NAME}:"+item[1]]
    ) for item in history]).strip()

    input_dict = {
        "human_input": message,
        "chat_history": messages,
    }

    print(input_dict)  # debug: the inputs passed to the chain
    input_prompt = chat_chain.invoke(input_dict).text
    print(input_prompt)  # debug: the fully assembled prompt

    model_inputs = tokenizer([input_prompt], return_tensors="pt").to(device)
    streamer = TextIteratorStreamer(tokenizer, timeout=10., 
                                    skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        top_k=1000,
        temperature=1.0,
        num_beams=1,
        stopping_criteria=StoppingCriteriaList([stop_on_tokens, stop_on_names])
        )
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    partial_message = ""
    for new_token in streamer:
        partial_message += new_token
        match = re.search(chat_name_pattern_end, partial_message)
        if match:
            partial_message = partial_message[:-len(match.group())]
        yield partial_message


gr.ChatInterface(predict).queue().launch()

Running on local URL:  http://127.0.0.1:7868

To create a public link, set `share=True` in `launch()`.




{'human_input': 'Why did the chicken cross the road?', 'chat_history': ''}
You are a chatbot having a conversation with a human. Follow the given instructions to reply to the Human message below.

Instructions:Instruction: You are a comedian. The user want's to have some fun. Reply to their query in a funny way.


Human: Why did the chicken cross the road?
Chatbot:




{'human_input': 'I have an issue with my order', 'chat_history': 'Human: Why did the chicken cross the road?\nChatbot:To get to the other side. Or to avoid the kazoo bird. Or because it was too chicken to walk in the rain.'}
You are a chatbot having a conversation with a human. Follow the given instructions to reply to the Human message below.

Instructions:You are a customer support agent. It seems that the user may have some issues. Answer to their query politely and sincerely. Be kind, understanding and say you're sorry for the inconvenience or the situation whenever necessary. Be brief and to the point.

Human: Why did the chicken cross the road?
Chatbot:To get to the other side. Or to avoid the kazoo bird. Or because it was too chicken to walk in the rain.
Human: I have an issue with my order
Chatbot:
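
The streaming loop in `predict` trims a trailing speaker label (for example `\nHuman:`) that the model may start generating before the stopping criteria kick in. The standalone sketch below isolates that step so it can be tried without the model. The helper name `strip_trailing_name` is hypothetical, and it cuts at `match.start()` instead of slicing by the match length as the notebook does; the two agree whenever the match ends exactly at the end of the string.

```python
import re

# Same pattern as `chat_name_pattern_end`: a newline, some text, then a colon at the end
CHAT_NAME_PATTERN_END = r'\n.+:$'


def strip_trailing_name(partial_message: str) -> str:
    """Drop a trailing speaker-label line such as a final 'Human:' if present."""
    match = re.search(CHAT_NAME_PATTERN_END, partial_message)
    if match:
        return partial_message[:match.start()]
    return partial_message


print(repr(strip_trailing_name("Sure, what is your order number?\nHuman:")))
print(repr(strip_trailing_name("Sure, what is your order number?")))
```

Note that the pattern matches any final line ending in a colon, not only the two speaker names, so a reply whose last streamed line happens to end with `:` is temporarily hidden until more tokens arrive.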


