<a href="https://colab.research.google.com/github/puneetsaha/popular/blob/master/basic_chat_gradio_hf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### This is a simple example application using Gradio and Hugging Face (HF) hosted LLMs

* We will use the OpenAI Python library to send requests to HF-hosted LLMs
* Make sure you have an HF account and an [access token](https://huggingface.co/settings/tokens)

In [17]:
# install the necessary libraries
!pip install -q gradio
!pip install -q openai

In [18]:
# By default, outputs are limited to a maximum of 1000px of vertical height, so large outputs get nested scrollbars.
# To remove this limitation, use:
from google.colab import output
output.no_vertical_scroll()


In [3]:
# set up the necessary import statements
import json
import gradio as gr
from openai import OpenAI

import textwrap
from IPython.display import display
from IPython.display import Markdown

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [19]:
# Used to securely store your API key
from google.colab import userdata
HF_ACCESS_TOKEN = userdata.get('HTOKEN')
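
The cell above only works inside Colab, because `google.colab.userdata` is not available elsewhere. The following is a minimal sketch of a fallback, assuming you also export the token as an environment variable with the same name. The helper name `load_hf_token` and its `env` parameter are hypothetical and not part of the original notebook.

```python
import os

def load_hf_token(secret_name="HTOKEN", env=None):
    """Return the token from Colab secrets if possible, else from the environment."""
    env = os.environ if env is None else env
    try:
        # google.colab can only be imported inside a Colab runtime
        from google.colab import userdata
        token = userdata.get(secret_name)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook
        pass
    token = env.get(secret_name)
    if not token:
        raise RuntimeError(
            f"No token found; set the {secret_name} secret or environment variable"
        )
    return token
```

With this in place, `HF_ACCESS_TOKEN = load_hf_token()` could replace the direct `userdata.get` call when the notebook is run outside Colab.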

In [20]:
# define the constants for the HF LLM API URLs
HF_MIXTRAL_API_URL = "https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/"
HF_ZEPHYR_API_URL = "https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/"

In [21]:
hf_mixtral_client = OpenAI(base_url=HF_MIXTRAL_API_URL, api_key=HF_ACCESS_TOKEN)
hf_zephyr_client = OpenAI(base_url=HF_ZEPHYR_API_URL, api_key=HF_ACCESS_TOKEN)

In [22]:
def hf_llm_chat(client, new_messages,  temperature, max_tokens):
    print(client.base_url)

    completion = client.chat.completions.create(
        model="tgi",
        messages=new_messages,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return completion

def build_messages(prompt, system_prompt):
    messages = [
        {"role": "user", "content": prompt},
    ]

    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    # log each message as JSON for debugging
    for msg in messages:
        json_str = json.dumps(msg)
        print("msg: ", json_str)

    return messages

def chat_with_mixtral(prompt, temperature=0.7, max_tokens=500, system_prompt=""):
  messages = build_messages(prompt, system_prompt)

  response = hf_llm_chat(hf_mixtral_client, messages, temperature, max_tokens)
  return response.choices[0].message.content

def chat_with_zephyr(prompt,  temperature=0.7, max_tokens=500, system_prompt=""):
  messages = build_messages(prompt, system_prompt)
  response = hf_llm_chat(hf_zephyr_client, messages, temperature, max_tokens)
  return response.choices[0].message.content
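
The two `chat_with_*` helpers return the reply text directly, so any network or API failure is raised as an exception to the caller. Once these helpers are wired into a UI, it may be nicer to show a short message instead. Here is a minimal sketch of such a wrapper. The name `safe_chat` is hypothetical, and catching every `Exception` is a deliberate simplification rather than a recommendation.

```python
def safe_chat(chat_fn, prompt, **kwargs):
    """Call chat_fn(prompt, **kwargs) and turn any exception into displayable text."""
    try:
        return chat_fn(prompt, **kwargs)
    except Exception as exc:  # broad on purpose: this is only a UI-level safety net
        return f"Request failed: {type(exc).__name__}: {exc}"
```

For example, `safe_chat(chat_with_zephyr, "why is the earth round?", max_tokens=200)` would return either the model reply or a one-line error description.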

In [23]:
%%time
prompt = "why is the earth round?"

#response = chat_with_zephyr(prompt)
response = chat_with_zephyr(prompt, system_prompt="You are a funny astronomer")

to_markdown(response)

msg:  {"role": "user", "content": "why is the earth round?"}
msg:  {"role": "system", "content": "You are a funny astronomer"}
https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta/v1/
CPU times: user 48.1 ms, sys: 4.06 ms, total: 52.1 ms
Wall time: 4.18 s


> I'm not a human, but I can understand your statement as a compliment or a joke. If it's meant as a compliment, thank you! If it's a joke, I apologize for any confusion or misunderstanding you might have had. My primary function is to assist with information and answers, and while I might be able to make a joke or two, I'm not capable of being "funny" in the traditional human sense. I'm here to provide helpful answers to your questions!
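
Note that the Zephyr reply above reacts to the system prompt instead of answering the question. One plausible cause (an assumption, not verified in this notebook) is that `build_messages` appends the system message after the user message, while chat models are usually prompted with the system message first. A minimal sketch of a system-first variant follows. The name `build_messages_system_first` is hypothetical.

```python
def build_messages_system_first(prompt, system_prompt=""):
    """Build a chat message list with the optional system message placed first."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # The user message always comes last so it is the turn the model answers
    messages.append({"role": "user", "content": prompt})
    return messages
```

Swapping this in for `build_messages` and rerunning the cell is a quick way to check whether the ordering explains the reply.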

In [24]:
%%time
prompt = "why is the earth round?"
#response = chat_with_mixtral(prompt)
response = chat_with_mixtral(prompt, system_prompt="You are a funny astronomer")

to_markdown(response)

msg:  {"role": "user", "content": "why is the earth round?"}
msg:  {"role": "system", "content": "You are a funny astronomer"}
https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/
CPU times: user 44.1 ms, sys: 2.1 ms, total: 46.2 ms
Wall time: 4.05 s


> The Earth is round, not because of a cosmic joke, but because it formed from a rotating cloud of gas and dust. As particles of this cloud clumped together, gravity pulled them into a sphere. This spherical shape minimizes the gravitational potential energy and is the most stable configuration for a celestial body. So, it's not a joke, but a result of gravity and natural forces.

## Now we are going to add the Gradio user interface to chat with HF LLMs

* ChatInterface [documentation](https://www.gradio.app/docs/chatinterface)

### The first example uses ChatInterface with minimal customization and the Mixtral LLM

In [25]:
# define a specific handler for mixtral_chatbot
def mixtral_chatbot_handler(message, history):
  return chat_with_mixtral(message)

In [26]:
# chat with Mixtral
mixtral_chatbot = gr.ChatInterface(mixtral_chatbot_handler, title="Welcome Mixtral Chatbot", undo_btn=None)

gr.close_all()

mixtral_chatbot.launch(debug=False)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://c6a6d6f137b201e8fa.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
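
The handler above ignores the `history` argument, so each request is sent to the model without the earlier turns. Below is a minimal sketch of folding the history into the message list. It assumes `history` is either a list of `[user_text, assistant_text]` pairs or a list of `{"role": ..., "content": ...}` dicts, depending on how `ChatInterface` is configured. The name `history_to_messages` is hypothetical.

```python
def history_to_messages(message, history, system_prompt=""):
    """Convert Gradio chat history plus the new message into OpenAI-style messages."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for turn in history or []:
        if isinstance(turn, dict):
            # dict-style history: each entry already carries role and content
            messages.append({"role": turn["role"], "content": turn["content"]})
        else:
            # pair-style history: [user_text, assistant_text]
            user_text, assistant_text = turn
            if user_text:
                messages.append({"role": "user", "content": user_text})
            if assistant_text:
                messages.append({"role": "assistant", "content": assistant_text})
    # The new user message goes last
    messages.append({"role": "user", "content": message})
    return messages
```

The resulting list could be passed as `new_messages` to `hf_llm_chat` to give the model the full conversation.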




### The second example uses ChatInterface with some customizations and the Zephyr LLM

In [27]:
# define a specific handler for zephyr_chatbot
def zephyr_chatbot_handler(message, history):
  return chat_with_zephyr(message)

In [28]:
# chat with Zephyr
zephyr_chatbot = gr.ChatInterface(zephyr_chatbot_handler,
                                  textbox=gr.Textbox(placeholder="Ask anything"),
                                  title="Welcome Zephyr Chatbot",
                                  undo_btn=None, retry_btn=None, clear_btn=None)

gr.close_all()

zephyr_chatbot.launch(debug=True)

TypeError: Expected a gr.Textbox or gr.MultimodalTextbox component, but got <class 'gradio.components.textbox.Textbox'>

### For the final example, we will assemble the chatbot UI in a custom way and include additional capabilities, such as
* System prompt
* Side by side comparison

In [30]:
def chatbot_duet_handler(prompt, system_prompt):
  mixtral_response = chat_with_mixtral(prompt, system_prompt=system_prompt)
  zephyr_response = chat_with_zephyr(prompt, system_prompt=system_prompt)

  return mixtral_response, zephyr_response, ""
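
The handler above calls the two models one after the other, so the user waits for the sum of both response times. As a minimal sketch, the two requests could instead be issued concurrently with a thread pool. The helper name `ask_both` is hypothetical, and it assumes each chat function accepts `prompt` and a `system_prompt` keyword, as `chat_with_mixtral` and `chat_with_zephyr` do.

```python
from concurrent.futures import ThreadPoolExecutor

def ask_both(prompt, system_prompt, chat_fns):
    """Run each chat function in its own thread and return the replies in input order."""
    # chat_fns must be non-empty: ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=len(chat_fns)) as pool:
        futures = [
            pool.submit(fn, prompt, system_prompt=system_prompt) for fn in chat_fns
        ]
        # result() blocks until that call finishes and re-raises any exception
        return [future.result() for future in futures]
```

Inside the handler this could be used as `mixtral_response, zephyr_response = ask_both(prompt, system_prompt, [chat_with_mixtral, chat_with_zephyr])`.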

In [31]:
# See this link for more themes to try out - https://huggingface.co/spaces/gradio/theme-gallery
with gr.Blocks(theme="bethecloud/storj_theme", css=".gradio-container {background-color: #938bd6}") as chatbot_duet_app:
    gr.Label("Welcome to GalaxiaBuddy", color="#d1d9f0")

    system_prompt_tb = gr.Textbox(label="System Prompt", lines=2, value="You are a helpful and funny assistant with expertise in astronomy and philosophy")

    with gr.Row():
      mixtral_response_tb = gr.Textbox(label="Mixtral Response", lines=8)
      # needed for the side-by-side comparison; it is referenced in the outputs below
      zephyr_response_tb = gr.Textbox(label="Zephyr Response", lines=8)

    with gr.Row():
      prompt_tb = gr.Textbox(label="Prompt", placeholder="Ask away")
    with gr.Row():
      submit_btn = gr.Button(value="Submit")

    prompt_tb.submit(fn=chatbot_duet_handler, inputs=[prompt_tb, system_prompt_tb], outputs=[mixtral_response_tb, zephyr_response_tb, prompt_tb])
    submit_btn.click(fn=chatbot_duet_handler, inputs=[prompt_tb, system_prompt_tb], outputs=[mixtral_response_tb, zephyr_response_tb, prompt_tb])

gr.close_all()
chatbot_duet_app.launch(debug=True, height=800)


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://c3951ba49b2f7242ed.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://c6a6d6f137b201e8fa.gradio.live
Killing tunnel 127.0.0.1:7861 <> https://c3951ba49b2f7242ed.gradio.live


