<a href="https://colab.research.google.com/github/sourcesync/kagglex_gemma/blob/gw%2Finitial/colab/Rukayat_medical_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This notebook demonstrates:
* Loading a fine-tuned model stored in Google Drive
* Using Gradio as a chatbot interface within the notebook

# Install required packages

In [4]:
%%time
!pip install -q -U keras-nlp
# Quote the requirement so the shell does not treat ">" as an output redirect.
!pip install -q -U "keras>=3"
!pip install gradio

CPU times: user 60.9 ms, sys: 3.77 ms, total: 64.7 ms
Wall time: 7.15 s


# Configure this notebook
* Note: Keras reads `KERAS_BACKEND` when it is first imported, so run this cell before the imports

In [5]:
import os

os.environ["KERAS_BACKEND"] = "jax"  # Or "torch" or "tensorflow".
# Avoid memory fragmentation on JAX backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]="1.00"

# Import required packages

In [6]:
import textwrap

import gradio as gr
import keras
import keras_nlp
from IPython.display import Markdown
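
Keras selects its backend from `KERAS_BACKEND` at import time, so setting the variable after `import keras` has no effect. The following is a minimal sketch of a guard against that ordering mistake. `configure_keras_backend` is a hypothetical helper introduced here for illustration; it is not part of Keras or of this notebook.

```python
import os
import sys


def configure_keras_backend(backend="jax"):
    """Set the Keras backend, refusing to do so if Keras is already imported."""
    # Hypothetical helper: Keras picks its backend when it is first imported,
    # so changing KERAS_BACKEND afterwards would be silently ignored.
    if "keras" in sys.modules:
        raise RuntimeError(
            "keras is already imported; restart the runtime and set "
            "KERAS_BACKEND before importing it."
        )
    os.environ["KERAS_BACKEND"] = backend
    # Same memory setting as in the configuration cell.
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"
    return backend
```

Calling `configure_keras_backend("jax")` in the first cell of a fresh runtime, before any Keras import, would make the ordering requirement explicit instead of relying on cell order.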

# Define some useful functions

In [7]:
def display_chat(prompt, response):
  '''Displays an LLM prompt and response in a pretty way.'''
  prompt = prompt.replace('\n\n','<br><br>')
  prompt = prompt.replace('\n','<br>')
  formatted_prompt = "<font size='+1' color='brown'>🙋‍♂️<blockquote>" + prompt + "</blockquote></font>"
  response = response.replace('•', '  *')
  response = textwrap.indent(response, '', predicate=lambda _: True)
  response = response.replace('\n\n','<br><br>')
  response = response.replace('\n','<br>')
  response = response.replace("```","")
  formatted_text = "<font size='+1' color='teal'>🤖<blockquote>" + response + "</blockquote></font>"
  return Markdown(formatted_prompt+formatted_text)

# Load the model stored in Google Drive
* Note: Mount your Google Drive in Colab before running this cell
* Adjust the path below to match where the model is stored in your Drive
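
One way to handle the mounting step is sketched below, assuming the notebook runs in Colab and that Drive is mounted at `/content/drive`. `ensure_preset_dir` is a hypothetical helper, not part of the original notebook; outside Colab it skips the mount and only checks that the directory exists.

```python
import os


def ensure_preset_dir(preset_dir, mount_point="/content/drive"):
    """Mount Google Drive when running in Colab, then check the preset path."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        drive = None
    # Skip the mount if Drive already appears to be mounted.
    if drive is not None and not os.path.isdir(os.path.join(mount_point, "MyDrive")):
        drive.mount(mount_point)  # Prompts for authorization in Colab.
    # Fail early with a clear message instead of deep inside from_preset().
    if not os.path.isdir(preset_dir):
        raise FileNotFoundError(f"Preset directory not found: {preset_dir}")
    return preset_dir
```

The returned path can then be passed to `keras_nlp.models.CausalLM.from_preset(...)` in place of the hard-coded string in the cell below.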

In [8]:
%%time
# Rukayat's original Kaggle path:
# gemma_lm = keras_nlp.models.CausalLM.from_preset("/kaggle/input/bert/keras/finetuned_gemma2/1")
gemma_lm = keras_nlp.models.CausalLM.from_preset("/content/drive/MyDrive/Kaggle_X/Rukayat/bert/keras/finetuned_gemma2/1")
gemma_lm.summary()

CPU times: user 8.72 s, sys: 18.7 s, total: 27.5 s
Wall time: 2min 38s


# Test the model with a prompt

In [10]:
%%time
# Ask a simple query using the instruction/response prompt template from the documentation
template = "Instruction:\n{instruction}\n\nResponse:\n{response}"
prompt = template.format(
    instruction="Can you tell me if i can take fluconazole with my simvastatin?",
    response="",
)
completion = gemma_lm.generate(prompt)
response = completion.replace(prompt, "")
display_chat(prompt, response)

CPU times: user 41.9 s, sys: 312 ms, total: 42.2 s
Wall time: 50.2 s


<font size='+1' color='brown'>🙋‍♂️<blockquote>Instruction:<br>Can you tell me if i can take fluconazole with my simvastatin?<br><br>Response:<br></blockquote></font><font size='+1' color='teal'>🤖<blockquote>Fluconazole can increase the levels of simvastatin in the blood, which may increase the risk of side effects like muscle pain or liver problems. It’s important to monitor for symptoms and consult your doctor for advice.</blockquote></font>
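
The prompt handling in the test cell above can be factored into two small helpers, sketched here under the assumption that `generate()` returns the prompt followed by the model's answer (which is why the notebook strips the prompt). `build_prompt` and `extract_response` are hypothetical names, not part of the notebook or of KerasNLP.

```python
TEMPLATE = "Instruction:\n{instruction}\n\nResponse:\n{response}"


def build_prompt(instruction):
    """Fill the template, leaving the response slot empty for the model."""
    return TEMPLATE.format(instruction=instruction, response="")


def extract_response(completion, prompt):
    """Return only the text generated after the prompt."""
    # Strip the prompt only when it is a leading echo, so that a repeat of
    # the question inside the answer itself is left intact.
    if completion.startswith(prompt):
        completion = completion[len(prompt):]
    return completion.strip()
```

Unlike `completion.replace(prompt, "")`, this removes the prompt only at the start of the output, which is a design choice rather than a requirement of the model.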

# Prepare a function for the Gradio chat interface
* Create a function that wraps the interaction with the model and returns a Gradio-friendly generator object

In [14]:
def chat_with_model(message, history):
    '''Generates a response from the fine-tuned Gemma model.'''
    # history is not used, so each message is answered independently.
    template = "Instruction:\n{instruction}\n\nResponse:\n{response}"
    prompt = template.format(
        instruction=message,
        response="",
    )
    completion = gemma_lm.generate(prompt, max_length=1024)
    # The completion echoes the prompt; keep only the generated answer.
    answer = completion.replace(prompt, "")
    # Yield one message dict, matching gr.ChatInterface(type="messages").
    yield {"role": "assistant", "content": answer}
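
Because the model takes several minutes to load, it can help to check the wrapper's generator behavior with a stand-in before wiring it to Gradio. The sketch below assumes the same prompt template as above; `make_chat_fn` and `fake_generate` are hypothetical names used only for illustration.

```python
def make_chat_fn(generate, max_length=1024):
    """Build a chat function around any generate(prompt, max_length=...) callable."""
    template = "Instruction:\n{instruction}\n\nResponse:\n{response}"

    def chat(message, history):
        prompt = template.format(instruction=message, response="")
        completion = generate(prompt, max_length=max_length)
        # Yield a single assistant message, as in chat_with_model.
        yield {"role": "assistant", "content": completion.replace(prompt, "")}

    return chat


def fake_generate(prompt, max_length=None):
    # Stand-in for gemma_lm.generate: echoes the prompt, then a canned answer.
    return prompt + "Please consult your pharmacist."


chat = make_chat_fn(fake_generate)
replies = list(chat("Can I take drug A with drug B?", history=[]))
```

With the real model loaded, `make_chat_fn(gemma_lm.generate)` would produce a function with the same behavior as `chat_with_model`.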

# Create a simple Gradio chat interface and launch it

In [15]:
%%time
demo = gr.ChatInterface(chat_with_model,
                        type="messages",
                        description = "Gemma-powered Drug Interactions AI App")
demo.launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://20b7e6152f3d101912.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


CPU times: user 709 ms, sys: 54.3 ms, total: 764 ms
Wall time: 2.7 s




# Shut down the demo interface

In [None]:
demo.close()