# Custom Chatbot

## Setup and prerequisites

**Python/Jupyter notebook setup**
1. Install Python from [python.org](https://www.python.org/downloads/)
2. Create a virtual environment: `python3.11 -m venv my_virtual_environment`
3. Activate the virtual environment: `source my_virtual_environment/bin/activate`
4. Install the dependencies with pip: `python3.11 -m pip install llama-cpp-python ipython ipykernel jupyter panel`
5. Create a kernel for this virtual environment: `python3.11 -m ipykernel install --user --name my_virtual_environment --display-name "Python kernel display name"`
6. Start the notebook server: `jupyter notebook`

(When you are done, stop the notebook server and deactivate the virtual environment with `deactivate`.)
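
Before opening the notebook, it can help to confirm that the packages from step 4 are importable in the active environment. The snippet below is a minimal sketch: the helper name `missing_modules` is hypothetical, and the list assumes the usual import names (`llama-cpp-python` is imported as `llama_cpp`).

```python
import importlib.util

# Import names for the packages installed in step 4 (assumed mapping:
# the llama-cpp-python distribution is imported as llama_cpp).
REQUIRED_MODULES = ["llama_cpp", "IPython", "ipykernel", "panel"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported missing, the most likely cause is that the virtual environment was not activated before running pip.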

**LLM model download**
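1. Download a Llama 2 model in GGUF format from Hugging Face, for example from [TheBloke/Llama-2-7B-GGUF](https://huggingface.co/TheBloke/Llama-2-7B-GGUF). The cells below load `llama-2-7b-chat.Q5_K_M.gguf` and `llama-2-13b.Q5_K_M.gguf`, so adjust `model_path` to match the file you downloaded.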
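
A truncated or mislabeled download is a common reason for a model failing to load. As a rough sanity check, the sketch below reads the file header and returns the GGUF version. It assumes the standard little-endian layout (the 4-byte magic `GGUF` followed by a 32-bit version), and the helper name `read_gguf_version` is hypothetical.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF format version stored in the file header.

    Raises ValueError if the file does not start with the b"GGUF" magic.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Assumption: the magic is followed by a little-endian uint32 version.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

For the files used in this notebook, the loader log reports `version GGUF V2`, so `read_gguf_version("../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf")` would be expected to agree with that.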


**Documentation**
1. [llama-cpp-python API reference (`Llama` class)](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama)
2. [Panel `ChatInterface` reference](https://panel.holoviz.org/reference/chat/ChatInterface.html)


### Chat LLM

In [122]:
from llama_cpp import Llama

llm_chat = Llama(model_path="../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf", n_ctx=2048)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from ../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q5_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weight q5_K     [  4096,  4096,

In [77]:
llm = Llama(model_path="../llama-2-13b-gguf/llama-2-13b.Q5_K_M.gguf")

llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from ../llama-2-13b-gguf/llama-2-13b.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q5_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q5_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q5_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight q5_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weight q5_K     [  5120,  5120,     1, 

In [78]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user","content": "Which one is the largest planet in our solar system?"}
]

llm_chat.create_chat_completion(messages=messages)


llama_print_timings:        load time =   10779.74 ms
llama_print_timings:      sample time =       7.25 ms /    74 runs   (    0.10 ms per token, 10204.08 tokens per second)
llama_print_timings: prompt eval time =   10779.48 ms /    36 tokens (  299.43 ms per token,     3.34 tokens per second)
llama_print_timings:        eval time =    4503.99 ms /    73 runs   (   61.70 ms per token,    16.21 tokens per second)
llama_print_timings:       total time =   15377.24 ms


{'id': 'chatcmpl-af8526ed-1457-45c5-9614-3736332203d6',
 'object': 'chat.completion',
 'created': 1701952139,
 'model': '../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': "  The largest planet in our solar system is Jupiter! It has a diameter of approximately 89,000 miles (143,000 kilometers) and is more than 300 times more massive than Earth. It's so big that it could fit all the other planets in our solar system inside of it!"},
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 36, 'completion_tokens': 73, 'total_tokens': 109}}
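
The response above is a plain dictionary, so the reply text and token counts can be pulled out with ordinary indexing. The sketch below uses a hand-written dictionary that mirrors the shape of that output (shortened content) so it runs without a model; `summarize_completion` is a hypothetical helper name.

```python
def summarize_completion(response):
    """Return (reply_text, usage) from a non-streamed chat completion dict."""
    choice = response["choices"][0]
    # The model output above starts with two spaces, so strip the reply.
    reply = choice["message"]["content"].strip()
    return reply, response["usage"]


# Hand-written stand-in with the same structure as the output shown above.
example_response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "  The largest planet in our solar system is Jupiter!",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 36, "completion_tokens": 73, "total_tokens": 109},
}

reply, usage = summarize_completion(example_response)
print(reply)
print("total tokens:", usage["total_tokens"])
```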

In [80]:
streamed_response=llm_chat.create_chat_completion(messages=messages, stream=True)

In [81]:
for r in streamed_response:
    print(r)

Llama.generate: prefix-match hit


{'id': 'chatcmpl-cdeb9f1d-5d79-437b-96f4-2d0af27d9c34', 'model': '../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf', 'created': 1701952204, 'object': 'chat.completion.chunk', 'choices': [{'index': 0, 'delta': {'role': 'assistant'}, 'finish_reason': None}]}
{'id': 'chatcmpl-cdeb9f1d-5d79-437b-96f4-2d0af27d9c34', 'model': '../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf', 'created': 1701952204, 'object': 'chat.completion.chunk', 'choices': [{'index': 0, 'delta': {'content': ' '}, 'finish_reason': None}]}
{'id': 'chatcmpl-cdeb9f1d-5d79-437b-96f4-2d0af27d9c34', 'model': '../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf', 'created': 1701952204, 'object': 'chat.completion.chunk', 'choices': [{'index': 0, 'delta': {'content': ' Ah'}, 'finish_reason': None}]}
{'id': 'chatcmpl-cdeb9f1d-5d79-437b-96f4-2d0af27d9c34', 'model': '../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf', 'created': 1701952204, 'object': 'chat.completion.chunk', 'choices': [{'index': 0, 'delta': {'content': ','}


llama_print_timings:        load time =   10779.74 ms
llama_print_timings:      sample time =       9.28 ms /   106 runs   (    0.09 ms per token, 11417.49 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    7187.85 ms /   106 runs   (   67.81 ms per token,    14.75 tokens per second)
llama_print_timings:       total time =    7333.18 ms


In [85]:
# {'id': 'chatcmpl-cdeb9f1d-5d79-437b-96f4-2d0af27d9c34',
#  'model': '../llama-2-7b-chat-gguf/llama-2-7b-chat.Q5_K_M.gguf', 
#  'created': 1701952204, 
#  'object': 'chat.completion.chunk', 
#  'choices': [
#      {'index': 0, 'delta': {'content': '!'}, 'finish_reason': None}]
# }

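llm_chat.verbose = False  # turn off the llama_print_timings log output

def consume_stream_chat_response(stream_response):
    """Print the streamed reply piece by piece as chunks arrive."""
    for response in stream_response:
        # The first chunk carries only the role, so skip chunks without 'content'.
        if 'choices' in response and 'delta' in response['choices'][0] and 'content' in response['choices'][0]['delta']:
            print(response['choices'][0]['delta']['content'], end='', flush=True)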


consume_stream_chat_response(llm_chat.create_chat_completion(messages=messages, stream=True))


  Ah, an excellent question! The largest planet in our solar system is Jupiter. It has a diameter of approximately 89,000 miles (143,000 kilometers) and is more than 300 times more massive than Earth. Its size and mass make it the most prominent planet in the solar system and give it a significant impact on the orbit of other planets. So, the answer to your question is Jupiter!
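
When the full reply is needed as a string rather than printed, the same chunk structure can be joined instead. This is a minimal sketch: `collect_stream_text` is a hypothetical helper, and `fake_stream` is a hand-written list imitating the first chunks printed earlier so the example runs without a model.

```python
def collect_stream_text(stream_response):
    """Join the 'content' pieces of streamed chat chunks into one string."""
    parts = []
    for chunk in stream_response:
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta", {})
        # Role-only chunks have no 'content' key and are skipped.
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)


# Stand-in for llm_chat.create_chat_completion(messages=messages, stream=True).
fake_stream = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " "}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " Ah"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": ","}, "finish_reason": None}]},
]

text = collect_stream_text(fake_stream)
print(repr(text))
```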

### Simple Chatbot

In [94]:
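import panel as pn

pn.extension()  # load Panel's front-end resources in the notebook

# Plain text completion with the base 13B model (alternative callback, not used below).
def llm_text_completion(contents, user, instance):
    return llm(contents, max_tokens=25)['choices'][0]['text']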

def llm_chat_completion(contents, user, instance):
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user","content": f'{contents}'}
    ]
    return llm_chat.create_chat_completion(messages=messages, max_tokens=50)['choices'][0]['message']['content']


pn.chat.ChatInterface(
    callback=llm_chat_completion,
    widgets=pn.widgets.TextAreaInput(
        placeholder="Enter some text...", auto_grow=True, max_rows=3
    ),
)

### Streaming Chatbot

In [99]:
def llm_chat_completion_stream(contents, user, instance):
    messages_input = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user","content": f'{contents}'}
    ]
    stream_response = llm_chat.create_chat_completion(messages=messages_input, max_tokens=50, stream=True)
    message_response = ""
    for response in stream_response:
        if 'choices' in response and 'delta' in response['choices'][0] and 'content' in response['choices'][0]['delta']:
            message_response += response['choices'][0]['delta']['content']
            yield message_response


chat_interface=pn.chat.ChatInterface(
    callback=llm_chat_completion_stream,
    widgets=pn.widgets.TextAreaInput(
        placeholder="Enter some text...", auto_grow=True, max_rows=3
    ),
)

chat_interface.servable()


### Streaming Chatbot with memory

In [105]:
chat_interface.serialize()

[{'role': 'user', 'content': 'Hi!'},
 {'role': 'assistant',
  'content': "cannot access local variable 'context' where it is not associated with a value"}]

In [107]:
[(row['role'], row['content']) for row in chat_interface.serialize()]

[('user', 'Hi!'),
 ('assistant',
  "cannot access local variable 'context' where it is not associated with a value")]

In [109]:


def llm_chat_completion_stream(contents, user, instance):
    system_msg = {"role": "system", "content": "You are a helpful assistant"}
    chat_history = chat_interface.serialize()
    new_msg = {"role": "user","content": f'{contents}'}
    messages_input = [system_msg] + chat_history + [new_msg]

    stream_response = llm_chat.create_chat_completion(messages=messages_input, max_tokens=50, stream=True)
    message_response = ""
    for response in stream_response:
        if 'choices' in response and 'delta' in response['choices'][0] and 'content' in response['choices'][0]['delta']:
            message_response += response['choices'][0]['delta']['content']
            yield message_response


chat_interface=pn.chat.ChatInterface(
    callback=llm_chat_completion_stream,
    widgets=pn.widgets.TextAreaInput(
        placeholder="Enter some text...", auto_grow=True, max_rows=3
    ),
)

chat_interface.servable()
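
Because the whole history is resent on every turn, a long conversation will eventually exceed the `n_ctx=2048` context window set when the model was loaded. One simple way to bound the prompt is to keep only the most recent messages. The sketch below is an assumption-laden illustration, not part of the notebook: `build_messages` and `max_history` are hypothetical names, and capping by message count is only a crude proxy for counting tokens.

```python
def build_messages(system_prompt, history, new_content, max_history=6):
    """Assemble the message list, keeping only the last `max_history` history entries."""
    system_msg = {"role": "system", "content": system_prompt}
    # Older messages are dropped; the system prompt is always kept.
    recent = list(history[-max_history:]) if max_history > 0 else []
    new_msg = {"role": "user", "content": str(new_content)}
    return [system_msg] + recent + [new_msg]


# Hand-written history in the same role/content shape that serialize() returns.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]

messages_input = build_messages("You are a helpful assistant", history, "Hi!", max_history=4)
print(len(messages_input))
print(messages_input[1]["content"])
```

Inside the callback, `history` would be the list returned by `chat_interface.serialize()`, and the result would be passed as `messages=` to `create_chat_completion`.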

### Custom chatbot

In [121]:
def llm_chat_completion_stream(contents, user, instance):
    system_msg = {"role": "system", "content": """
        Your name is OrderBot, an automated service to collect orders for a pizza restaurant. \
        You first greet the customer, then collect the order, \
        and then ask whether it's a pickup or delivery. \
        You wait to collect the entire order, then summarize it and check for a final \
        time if the customer wants to add anything else. \
        If it's a delivery, you ask for an address. \
        You respond in a short, very conversational friendly style. \
        The menu includes \
        pepperoni pizza  $10.00 \
        cheese pizza   $8.95 \
        fries $4.00 \
        coke $3.00 \
        bottled water $5.00 \
    """}
    chat_history = chat_interface.serialize()
    new_msg = {"role": "user","content": f'{contents}'}
    messages_input = [system_msg] + chat_history + [new_msg]

    stream_response = llm_chat.create_chat_completion(messages=messages_input, stream=True)
    message_response = ""
    for response in stream_response:
        if 'choices' in response and 'delta' in response['choices'][0] and 'content' in response['choices'][0]['delta']:
            message_response += response['choices'][0]['delta']['content']
            yield message_response


chat_interface=pn.chat.ChatInterface(
    callback=llm_chat_completion_stream,
    widgets=pn.widgets.TextAreaInput(
        placeholder="Enter some text...", auto_grow=True, max_rows=3
    ),
)

chat_interface.servable()

In [117]:
chat_interface.serialize()

[{'role': 'user', 'content': 'Hi'},
 {'role': 'assistant',
  'content': "  Hey there! Welcome to OrderBot, your personal pizza concierge! 🍕👋 How can I help you today? Do you have a craving for some delicious pizza? Let me know and I'll take care of the rest. Are you picking up or having it delivered?"},
 {'role': 'user', 'content': "What's on the menu?"},
 {'role': 'assistant',
  'content': "  Oh, great! You've come to the right place! Our menu includes some mouth-watering options:\n* Pepperoni Pizza - $10.00\n* Cheese Pizza - $8.95\n* French Fries - $4.00\n* Coke - $3.00\n* Bottled Water - $5.00\n\nWhat can I get for you today?"},
 {'role': 'user', 'content': '2 waters only please'},
 {'role': 'assistant',
  'content': '  Great choice! Two bottles of water will be added to your order. Would you like me to add anything else to your order, such as a side of fries or a drink upgrade?'}]
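
The menu in the system prompt is hard-coded text, and the model is not guaranteed to add prices correctly. A possible refinement, sketched below under stated assumptions, is to keep the menu in a dictionary, generate the prompt lines from it, and compute order totals in Python. `MENU`, `menu_as_prompt`, and `order_total` are hypothetical names; the items and prices are copied from the prompt above.

```python
# Items and prices copied from the system prompt above.
MENU = {
    "pepperoni pizza": 10.00,
    "cheese pizza": 8.95,
    "fries": 4.00,
    "coke": 3.00,
    "bottled water": 5.00,
}


def menu_as_prompt(menu):
    """Format the menu as one 'item $price' line per entry."""
    return "\n".join(f"{item} ${price:.2f}" for item, price in menu.items())


def order_total(menu, order):
    """Sum price * quantity for an order given as {item: quantity}."""
    return sum(menu[item] * quantity for item, quantity in order.items())


print(menu_as_prompt(MENU))
# The order from the conversation above: two bottled waters.
print(f"${order_total(MENU, {'bottled water': 2}):.2f}")
```

The string returned by `menu_as_prompt(MENU)` could be appended to the system prompt in place of the hand-typed menu lines.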