# Gradio Experiment. Ollama ChatBot.

Let's create a ChatBot using Ollama and Gradio. We can add a dropdown to choose among a few Ollama models.

The purpose is to get a local ChatBot to test for coding assistance.

Let's install/update gradio and ollama first.

In [1]:
%pip install -U gradio
%pip install -U ollama

Collecting gradio
  Downloading gradio-5.21.0-py3-none-any.whl.metadata (16 kB)
Downloading gradio-5.21.0-py3-none-any.whl (46.2 MB)
Installing collected packages: gradio
  Attempting uninstall: gradio
    Found existing installation: gradio 5.20.0
    Uninstalling gradio-5.20.0:
      Successfully uninstalled gradio-5.20.0
Successfully installed gradio-5.21.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
from langchain_ollama import ChatOllama
from ollama import chat
from ollama import Client
import gradio as gr
from typing import Iterator, Optional, Any

from langchain_community.tools.tavily_search import TavilySearchResults

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

In [3]:
# defining the custom client to target ollama docker service
client = Client(
  host='ollama',
)

In [2]:
# using LangChain
llm = ChatOllama(
    model="llama3.1:8b-instruct-fp16",
    temperature=0.12,
    base_url="ollama"
)

In [7]:
gr.load_chat("http://ollama:11434/v1/", model="llama3.1:8b-instruct-fp16", token="ollama").launch(share=True)

* Running on local URL:  http://127.0.0.1:7864
* Running on public URL: https://aa2c687c02ec85dd91.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [4]:
system = "You are a friendly and helpful assistant. Your aim is to answer the questions posed by your human friends thoroughly, providing examples where possible."

### Let's Create a Class

Here is a description of what it should do:

- The constructor takes:
    1. the model name, a `Literal` of the supported names that have been pulled into Ollama
    2. system_message
    3. temperature
    4. top_p
- Exposes a `get_stream` function that takes in messages and returns the stream from the model defined above
- Parameters should come from the Gradio chat interface, via a few additional inputs
- The model is a dropdown, to limit what can be used
- The rest of the inputs are text fields
- Potentially extend it with tool use later
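
One caveat on the `Literal` constraint: Python does not enforce type hints at runtime, so a misspelled model name would only surface when Ollama rejects the request. Below is a minimal sketch of an explicit check, assuming the same five model names used in the class that follows. `ModelName`, `SUPPORTED_MODELS` and `validate_model` are hypothetical names, not part of the notebook.

```python
from typing import Literal, get_args

# Assumed list of models; mirror whatever has actually been pulled into Ollama.
ModelName = Literal[
    "llama3.2:latest",
    "qwen2.5-coder:32b",
    "qwen2.5:14b",
    "llama3.1:8b-instruct-fp16",
    "deepseek-r1:32b",
]

# get_args() unpacks the Literal into a plain tuple of strings.
SUPPORTED_MODELS = get_args(ModelName)


def validate_model(name: str) -> str:
    """Return the name unchanged if it is supported, else raise ValueError."""
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model {name!r}; choose one of {SUPPORTED_MODELS}")
    return name
```

The same tuple could also feed the dropdown choices (`list(SUPPORTED_MODELS)`), which would keep the type hint and the UI in sync.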

In [3]:
from typing import Iterator, List, Literal
from ollama import ChatResponse, Client

class ChatWrapper:
    # One client shared by all instances, targeting the ollama docker service
    client = Client(
      host='ollama',
    )
    def __init__(self, 
                 model: Literal['llama3.2:latest', 'qwen2.5-coder:32b', 'qwen2.5:14b', 'llama3.1:8b-instruct-fp16', 'deepseek-r1:32b'],
                 system_message: str,
                 temperature: float = 0.25,
                 top_p: float = 0.8
                ):
        self.model = model
        # Kept as a one-item list so it can be prepended to the chat history
        self.system_message = [{"role": "system", "content": system_message}]
        self.temperature = temperature
        self.top_p = top_p

    def get_stream(self, messages: List[dict]) -> Iterator[ChatResponse]:
        # stream=True returns an iterator of partial responses
        return ChatWrapper.client.chat(
            model=self.model,
            messages=self.system_message + messages,
            stream=True,
            options={"temperature": self.temperature,
                    "top_p": self.top_p
                    }
        )

    def get_response(self, messages: List[dict]) -> ChatResponse:
        # stream=False returns one complete response object, not a string
        return ChatWrapper.client.chat(
            model=self.model,
            messages=self.system_message + messages,
            stream=False,
            options={"temperature": self.temperature,
                    "top_p": self.top_p
                    }
        )

    def get_with_tools(self, messages: List[dict], tools: List) -> ChatResponse:
        # Not streamed, so tool calls can be inspected on the full response
        return ChatWrapper.client.chat(
            model=self.model,
            messages=self.system_message + messages,
            stream=False,
            options={"temperature": self.temperature,
                    "top_p": self.top_p
                    },
            tools=tools
        )

In [4]:
def web_search(query: str) -> Optional[list[dict[str, Any]]]:

    """
    Query search engine.
    This function queries the web to fetch comprehensive, accurate and trusted results. It's useful
    for answering questions about current events.
    
    Args:
        query (str): The query to search
    
    Returns:
        list: A list of the results
    """
    if query:
        wrapped = TavilySearchResults(max_results=3)
        result = wrapped.invoke({"query": query})
        return result
    else:
        raise ValueError("No query was supplied by the LLM")

In [63]:
## Testing DeepSeek without Gradio
tools_map = {"web_search": web_search}
wrapper = ChatWrapper(model="deepseek-r1:32b", 
                     system_message="You are a friendly, polite AI assistant that answers questions",
                    temperature=0.15)
# Adjacent string literals are concatenated, so no indentation leaks into the prompt
wrapper_qwen = ChatWrapper(model="qwen2.5-coder:32b", 
                     system_message=("You are a friendly, polite AI assistant that answers questions. "
                                     "Use your vast knowledge to answer the questions posed by our human friend. "
                                     "You have a 'web_search' tool at your disposal. Only use the tool for questions that "
                                     "concern current events or that you have insufficient knowledge of. "
                                     "Otherwise answer the question directly."),
                    temperature=0.1)
messages = [{"role": "user", "content": "what was the command to clean up yay cache on an Arch based distro?"}]

In [64]:
response = wrapper_qwen.get_with_tools(messages, [web_search])

In [65]:
max_iter = 0
# Let the model request tools at most twice before giving up
while response.message.tool_calls and max_iter < 2:
    for tool in response.message.tool_calls:
        formatted_search_docs = ''
        if function_to_call := tools_map.get(tool.function.name):
            print('Calling function:', tool.function.name)
            
            output = function_to_call(**tool.function.arguments)
            formatted_search_docs = "\n\n---\n\n".join(
                [
                    f'<Document href="{doc["url"]}"/>\n{doc["content"]}\n</Document>'
                    for doc in output
                ]
            )
            
        else:
            print('Function', tool.function.name, 'not found')
        # Append inside the loop so every tool call gets its own result message
        messages.append({'role': 'tool', 'content': formatted_search_docs, 'name': tool.function.name})
    response = wrapper_qwen.get_with_tools(messages, [web_search])
    print(response.message.content)
    max_iter += 1
    

In [66]:
print(response.message.content)

To clean up the Yay cache on an Arch-based distribution, you can use the following command:

```bash
yay -Sc
```

This command will remove unused packages from the cache. If you want to remove all cached packages, you can use:

```bash
yay -Scc
```

Please be cautious with `yay -Scc` as it will clear your entire package cache, which might require re-downloading packages in the future.
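
The tool-resolution loop boils down to a dispatch step: look up each requested tool by name, call it with the model-supplied arguments, and hand the result back as a `tool` message. Here is a minimal sketch of that step in isolation, appending one `tool` message per call. `fake_search` and `run_tool_calls` are hypothetical names, and the stubbed calls assume only the `function.name` / `function.arguments` shape that the loop already relies on.

```python
from types import SimpleNamespace


def fake_search(query: str) -> list:
    """Stand-in for web_search; returns canned results instead of calling Tavily."""
    return [{"url": "https://example.com", "content": f"result for {query}"}]


def run_tool_calls(tool_calls, tools_map, messages):
    """Append one 'tool' message per requested call and return the message list."""
    for call in tool_calls:
        func = tools_map.get(call.function.name)
        if func is None:
            # Tell the model the tool is unavailable instead of sending an empty string
            content = f"Tool {call.function.name} not found"
        else:
            docs = func(**call.function.arguments)
            content = "\n\n---\n\n".join(doc["content"] for doc in docs)
        messages.append({"role": "tool", "content": content, "name": call.function.name})
    return messages


# Two stubbed calls shaped like the entries of response.message.tool_calls
calls = [
    SimpleNamespace(function=SimpleNamespace(name="web_search", arguments={"query": "yay cache"})),
    SimpleNamespace(function=SimpleNamespace(name="missing_tool", arguments={})),
]
history = run_tool_calls(calls, {"web_search": fake_search}, [])
```

Keeping the dispatch separate from the chat call makes it possible to exercise the tool plumbing without a running model.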


In [26]:
formatted_search_docs = "\n\n---\n\n".join(
        [
            f'<Document href="{doc["url"]}"/>\n{doc["content"]}\n</Document>'
            for doc in output
        ]
    )

In [56]:
chunks = []
for chunk in stream:
    # chunks.append(chunk["message"]["content"])
    print( chunk["message"]["content"], end="", flush=True)

<think>
Okay, so I need to figure out whether Elrond or Tom Bombadil is older in the Lord of the Rings universe. Hmm, both are characters from J.R.R. Tolkien's works, but they don't interact much, so it's not immediately clear which one is older.

First, let me recall what I know about each character. Elrond is a prominent figure in Middle-earth, especially in Rivendell. He was present during the First Age and played a role in significant events like the War of the Ring. Tom Bombadil, on the other hand, is more of a mysterious figure who appears early in The Fellowship of the Ring when Frodo and his companions encounter him in the Old Forest.

I think Elrond's age is mentioned somewhere. I remember that he was born during the First Age, specifically around the time of the creation of the Silmarils. That would make him very old by the time of the Third Age when The Lord of the Rings takes place. Tom Bombadil's origins are a bit more unclear. He seems to be an ancient being, perhaps even

In [5]:
def resolve_tool_response(response, chat_wrapper: ChatWrapper, tools_map: dict, messages: list):
    max_iter = 0
    # Let the model request tools at most twice before giving up
    while response.message.tool_calls and max_iter < 2:
        for tool in response.message.tool_calls:
            formatted_search_docs = ''
            if function_to_call := tools_map.get(tool.function.name):
                print('Calling function:', tool.function.name)
                
                output = function_to_call(**tool.function.arguments)
                formatted_search_docs = "\n\n---\n\n".join(
                    [
                        f'<Document href="{doc["url"]}"/>\n{doc["content"]}\n</Document>'
                        for doc in output
                    ]
                )
                
            else:
                print('Function', tool.function.name, 'not found')
            # Append inside the loop so every tool call gets its own result message
            messages.append({'role': 'tool', 'content': formatted_search_docs, 'name': tool.function.name})
        # Offer the tools from tools_map rather than a hard-coded list
        response = chat_wrapper.get_with_tools(messages, list(tools_map.values()))
        print(response.message.content)
        max_iter += 1
    return response
    
def chatter(message, history: list, model: str, tools: str, system_message: str, temperature: float):
    tools_map = {"web_search": web_search}
    chat_wrapper = ChatWrapper(model=model, 
                               system_message=system_message,
                               temperature=float(temperature),
                               top_p=0.85
                              )
    history.append({"role": "user", "content": message})
  
    if tools in tools_map:
        response = chat_wrapper.get_with_tools(history, [tools_map[tools]])
        # let's catch the tool call
        if response.message.tool_calls:
            print("need tools")
            response = resolve_tool_response(response, chat_wrapper, tools_map, history)
        
        yield response["message"]["content"]
    else:
        stream = chat_wrapper.get_stream(history)
        chunks = []
        for chunk in stream:
            chunks.append(chunk["message"]["content"])
            yield "".join(chunks)
    # print(type(response["message"]["content"]))
    
    # yield response["message"]["content"]
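
The streaming branch of `chatter` yields the text accumulated so far rather than each chunk on its own, since a generator passed to `gr.ChatInterface` is expected to yield the whole message so far, and each yield replaces what is displayed. A minimal sketch of that accumulation pattern, runnable without Ollama: `accumulate` and `fake_stream` are hypothetical, and the chunks are plain dicts shaped like the ones the streaming cells index into.

```python
from typing import Iterable, Iterator


def accumulate(stream: Iterable[dict]) -> Iterator[str]:
    """Yield the cumulative text after each chunk of a streamed reply."""
    text = ""
    for chunk in stream:
        text += chunk["message"]["content"]
        yield text


# Hypothetical chunks standing in for a real model stream
fake_stream = [
    {"message": {"content": "Hello"}},
    {"message": {"content": ", "}},
    {"message": {"content": "world"}},
]
partials = list(accumulate(fake_stream))
```

Each element of `partials` is what the chat window would show at that step.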
    

In [4]:
# messages = [{"role": "system", "content": system}]
# messages.append({"role": "user", 
#                  "content": "Write me a 1000 word story on elves and dwarves who are in alliance against an evil human king."})

# # using chat from Ollama with custom client
# # llama3.1:8b-instruct-fp16
# # qwen2.5-coder:32b
# stream = client.chat(
#     model='qwen2.5-coder:32b',
#     messages=messages,
#     stream=True,
    
# )

NameError: name 'system' is not defined

In [8]:
messages

[{'role': 'system',
  'content': 'You are a friendly and helpful assistant. Your aim is to answer the questions posed by your human friends thoroughly, providing examples where possible.'},
 {'role': 'user',
  'content': 'Who is the oldest on Arda that has a role in the LOTR books?'}]

In [6]:

for c in stream:
    print(c['message']['content'], flush=True, end="")

In the realm of Eridoria, where the sun dipped into the horizon and painted the sky with hues of crimson and gold, two ancient enemies had put aside their differences to form an unlikely alliance against a common foe: King Malakai, the tyrannical ruler of the human kingdom.

For centuries, elves and dwarves had clashed in battle, each side convinced that they were the superior beings. The elves, with their lithe bodies and agile limbs, were skilled archers and assassins, while the dwarves, sturdy and proud, excelled at mining and smithing. Their conflicts had shaped the history of Eridoria, leaving scars that would never fully heal.

However, as King Malakai's power grew, his ambition consumed him. He sought to conquer all of Eridoria, enslaving its inhabitants and bending them to his will. His armies marched across the land, leaving destruction in their wake.

It was amidst this darkness that the elves and dwarves found common ground. In a secret gathering deep within the heart of the

In [None]:
def update_system_prompt(selected_tool):
    if selected_tool == "web_search":
        return gr.update(value="You are a research assistant with access to a 'web_search' tool. Use it to find recent or unknown information.")
    return gr.update(value="You are a friendly and helpful assistant. Your aim is to answer the questions posed by your human friends thoroughly, providing examples where possible.")


In [7]:
system_tools = "You are a friendly and helpful assistant. Your aim is to answer the questions posed by your human friends thoroughly, providing \
examples where possible. You have access to a 'web_search' tool to help you with questions that concern recent events or topics that \
you have limited knowledge of. If the question can be answered by using your own knowledge base, do not resort to the tool use.".strip()

system = "You are a friendly and helpful assistant. Your aim is to answer the questions posed by your human friends thoroughly, providing \
examples where possible.".strip()

In [10]:
gr.ChatInterface(
    fn=chatter, 
    type="messages",
    additional_inputs=[
        gr.Dropdown(['llama3.1:8b-instruct-fp16', 'qwen2.5-coder:32b', 'llama3.2', 'qwen2.5:14b', 'deepseek-r1:32b'], value='qwen2.5:14b', label="Choose Downloaded Ollama Model"),
        gr.Dropdown(['No tools', 'web_search'], value='web_search', label="Choose Tools to Use"),
        gr.TextArea(system.strip(), label="System Prompt"),
        gr.Textbox(0.1, label="Temperature")
    ]).launch(share=True)

* Running on local URL:  http://127.0.0.1:7861
* Running on public URL: https://a1827ec920874bc6b5.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
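
Because the temperature comes from a `gr.Textbox`, `chatter` converts it with `float(temperature)`, which raises a `ValueError` if the field is empty or not numeric. Here is a sketch of a more forgiving conversion. `parse_temperature` is a hypothetical helper, and the default of 0.1 and the 0.0 to 1.0 range are assumptions that mirror the initial textbox value and the slider bounds used elsewhere in this notebook.

```python
import math


def parse_temperature(raw, default: float = 0.1, low: float = 0.0, high: float = 1.0) -> float:
    """Convert user input to a float, fall back to a default, and clamp to [low, high]."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Empty, missing, or non-numeric input
        return default
    if math.isnan(value):
        return default
    return min(max(value, low), high)
```

Replacing the textbox with a `gr.Slider` would avoid the parsing problem altogether, at the cost of a fixed range.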




> The below does not work: `launch()` returns a tuple-like object (`TupleNoPrint`) rather than the interface, so calling `close()` on its return value raises an `AttributeError`.

In [4]:
import gradio as gr

class ChatInterfaceContext:
    def __init__(self, fn, type="messages", additional_inputs=None):
        self.fn = fn
        self.type = type
        self.additional_inputs = additional_inputs if additional_inputs is not None else []
        self.app = None

    def __enter__(self):
        chatbot = gr.ChatInterface(
                    fn=self.fn,
                    type=self.type,
                    additional_inputs=self.additional_inputs
                )
            
        self.app = chatbot.launch(share=True, inbrowser=False)
        return self.app

    def __exit__(self, exc_type, exc_value, traceback):
        if self.app:
            # not supported
            self.app.close()

def chatter(message, history, model, system_prompt, temperature):
    # Your chat logic here
    response = f"Response from {model} with prompt: {system_prompt} and temperature: {temperature}"
    history.append((message, response))
    return "", history

with ChatInterfaceContext(
    fn=chatter,
    additional_inputs=[
        gr.Dropdown(['llama3.1:8b-instruct-fp16', 'qwen2.5-coder:32b', 'llama3.2', 'deepseek-r1:32b'], value='deepseek-r1:32b', label="Choose Downloaded Ollama Model"),
        gr.TextArea("You are a friendly and helpful assistant. Your aim is to answer the questions posed by your human friends thoroughly, providing examples where possible.", label="System Prompt"),
        gr.Slider(0.0, 1.0, value=0.1, step=0.01, label="Temperature")
    ]
) as demo:
    print("Gradio app is running.")

* Running on local URL:  http://127.0.0.1:7862
* Running on public URL: https://9b06d4caad36a9abd5.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Gradio app is running.


AttributeError: 'TupleNoPrint' object has no attribute 'close'

Traceback (most recent call last):
  File "/opt/conda/lib/python3.12/site-packages/gradio/queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/gradio/blocks.py", line 2042, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/gradio/blocks.py", line 1587, in call_function
    prediction = await fn(*processed_input)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/gradio/utils.py", line 850, in async_wrapper
    response = await f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-
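
The `AttributeError` suggests a way around the problem: keep a reference to the interface object and close that, not the value returned by `launch()`. Below is a sketch of the pattern under stated assumptions. `FakeInterface` is a hypothetical stand-in so the snippet runs without Gradio, and `launched` is a hypothetical helper. With the real library this relies on the interface exposing `close()` (as `gr.Blocks` does), and it has not been verified against this notebook's setup.

```python
from contextlib import contextmanager


class FakeInterface:
    """Hypothetical stand-in for gr.ChatInterface, so the sketch runs without Gradio."""

    def __init__(self):
        self.running = False

    def launch(self, **kwargs):
        self.running = True
        # Mimics the tuple-like return value: (app, local_url, share_url)
        return ("app", "http://127.0.0.1:7860", None)

    def close(self):
        self.running = False


@contextmanager
def launched(interface, **launch_kwargs):
    """Launch on entry; close the interface itself (not launch()'s result) on exit."""
    interface.launch(**launch_kwargs)
    try:
        yield interface
    finally:
        interface.close()


demo = FakeInterface()
with launched(demo, share=False) as running_demo:
    was_running = running_demo.running
```

With a real interface, the server would stay up only for the duration of the `with` body, so the body would need to block (or do its work) for the app to be usable.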

In [8]:
def update_system_prompt(selected_tool):
    if selected_tool == "web_search":
        return gr.update(value=system_tools)
    return gr.update(value=system)


In [9]:
with gr.Blocks() as demo:
    chatbot = gr.Chatbot(type="messages")
    model_dropdown = gr.Dropdown(
        ['llama3.1:8b-instruct-fp16', 'qwen2.5-coder:32b', 'llama3.2', 'deepseek-r1:32b'],
        value='qwen2.5-coder:32b',
        label="Choose Downloaded Ollama Model"
    )
    
    tools_dropdown = gr.Dropdown(
        ['No tools', 'web_search'],
        value='No tools',
        label="Choose Tools to Use"
    )

    system_prompt = gr.TextArea(
        system,
        label="System Prompt"
    )

    temperature = gr.Textbox(0.1, label="Temperature")

    tools_dropdown.change(update_system_prompt, tools_dropdown, system_prompt)

demo.launch(share=True)

* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://11d960ef5c040c3495.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [8]:
client.pull("qwen2.5-coder:32b")

{'status': 'success'}

In [5]:
client.pull('deepseek-r1:32b')

{'status': 'success'}

In [None]:
client.pull("qwen2.5:14b")

In [19]:
import json

In [4]:
for model in client.list().models:
    print(model.model_dump_json())  # pydantic v2 name for the deprecated .json()

{"model":"qwen2.5:14b","modified_at":"2025-03-08T13:16:08.873753Z","digest":"7cdf5a0187d5c58cc5d369b255592f7841d1c4696d45a8c8a9489440385b22f6","size":8988124069,"details":{"parent_model":"","format":"gguf","family":"qwen2","families":["qwen2"],"parameter_size":"14.8B","quantization_level":"Q4_K_M"}}
{"model":"deepseek-r1:32b","modified_at":"2025-02-03T07:45:31.124246Z","digest":"38056bbcbb2d068501ecb2d5ea9cea9dd4847465f1ab88c4d4a412a9f7792717","size":19851337640,"details":{"parent_model":"","format":"gguf","family":"qwen2","families":["qwen2"],"parameter_size":"32.8B","quantization_level":"Q4_K_M"}}
{"model":"qwen2.5-coder:32b","modified_at":"2025-01-26T14:18:39.353482Z","digest":"4bd6cbf2d094264457a17aab6bd6acd1ed7a72fb8f8be3cfb193f63c78dd56df","size":19851349856,"details":{"parent_model":"","format":"gguf","family":"qwen2","families":["qwen2"],"parameter_size":"32.8B","quantization_level":"Q4_K_M"}}
{"model":"llama3.1:8b-instruct-fp16","modified_at":"2024-11-20T15:44:02.680559Z","dig