<a href="https://colab.research.google.com/github/mcadete/LLM_4_Biz_16/blob/main/tech16_finetune_prepared.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using any model

In [1]:
%pip install 'aisuite[all]'



In [2]:
from google.colab import userdata
import os
open_ai_key = userdata.get('open_ai_key')
anthropic_key = userdata.get('claude_key')
hf_key = userdata.get('hf_token')
os.environ["OPENAI_API_KEY"] = open_ai_key
os.environ["ANTHROPIC_API_KEY"] = anthropic_key
os.environ["HF_TOKEN"] = hf_key
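The cell above assumes every secret exists. Assigning `None` into `os.environ` raises a `TypeError`, so a missing secret stops the whole cell. The sketch below shows one way to copy only the secrets that are present and report the rest. `export_secrets` and `fake_secrets` are hypothetical names introduced for illustration; a plain dictionary stands in for Colab's secret store so the example runs anywhere.

```python
import os


def export_secrets(secrets, mapping, environ=os.environ):
    """Copy secrets into environment variables and return the names that were missing.

    `secrets` is any object with a dict-style `.get(name)` method (assumption:
    a plain dict here, standing in for the Colab secret store).
    `mapping` maps secret names to environment variable names.
    """
    missing = []
    for secret_name, env_name in mapping.items():
        value = secrets.get(secret_name)
        if value:
            environ[env_name] = value
        else:
            # Skip instead of writing None, which os.environ would reject.
            missing.append(secret_name)
    return missing


# Hypothetical stand-in values; real keys would come from the secret store.
fake_secrets = {"open_ai_key": "sk-demo", "claude_key": "ak-demo"}
env = {}
missing = export_secrets(
    fake_secrets,
    {
        "open_ai_key": "OPENAI_API_KEY",
        "claude_key": "ANTHROPIC_API_KEY",
        "hf_token": "HF_TOKEN",
    },
    environ=env,
)
print("exported:", sorted(env))
print("missing:", missing)
```

Passing `environ=env` keeps the demonstration from touching the real process environment; dropping that argument writes to `os.environ` as the original cell does.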

In [8]:
import aisuite as ai
client = ai.Client()

models = ["openai:gpt-4o", "anthropic:claude-3-opus-20240229"]

messages = [
    {"role": "system", "content": "You are the most senior-level AI staff engineer. Respond in clear and concise English."},
    {"role": "user", "content": "Provide a clear step-by-step guide to making an AI chat application using Streamlit and Google Colab."},
]

for model in models:
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.75
    )
    print(response.choices[0].message.content)
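
The loop above prints each reply without saying which model produced it, and an error from one provider would stop the remaining calls. A minimal sketch of one alternative follows: collect the replies in a dictionary keyed by model identifier and record failures instead of raising. `ask_all` and `FakeCompletions` are hypothetical names; the fake client only mimics the `client.chat.completions.create(...)` shape used above so the example runs without API keys.

```python
from types import SimpleNamespace


def ask_all(client, models, messages, **kwargs):
    """Return {model: reply text or error string} for each model identifier."""
    results = {}
    for model in models:
        try:
            response = client.chat.completions.create(
                model=model, messages=messages, **kwargs
            )
            results[model] = response.choices[0].message.content
        except Exception as exc:
            # One provider failing should not prevent the others from running.
            results[model] = f"[error: {exc}]"
    return results


class FakeCompletions:
    """Hypothetical stand-in that mimics the shape of `client.chat.completions`."""

    def create(self, model, messages, **kwargs):
        if model.startswith("broken:"):
            raise RuntimeError("provider unavailable")
        message = SimpleNamespace(content=f"{model} saw {len(messages)} messages")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
demo_messages = [{"role": "user", "content": "hello"}]
replies = ask_all(
    fake_client, ["openai:gpt-4o", "broken:model"], demo_messages, temperature=0.75
)
for model, text in replies.items():
    print(f"=== {model} ===\n{text}\n")
```

With the real client, `ask_all(client, models, messages, temperature=0.75)` would be the equivalent call, assuming the client keeps the interface shown in the cell above.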


Creating an AI chat application using Streamlit and Google Colab involves several steps. Below is a clear, step-by-step guide to help you build a simple chat application:

### Step 1: Set Up Your Environment

1. **Google Colab Setup:**
   - Open Google Colab and create a new notebook.
   - Ensure you have access to a GPU by navigating to `Runtime > Change runtime type` and selecting `GPU`.

2. **Streamlit Setup:**
   - Install Streamlit in your local environment or a virtual environment using the command:
     ```bash
     pip install streamlit
     ```

### Step 2: Develop the Chatbot Model in Google Colab

1. **Import Required Libraries:**
   - Use libraries like `transformers`, `torch`, and `numpy` for model development.
     ```python
     !pip install transformers
     import torch
     from transformers import AutoModelForCausalLM, AutoTokenizer
     ```

2. **Load a Pre-trained Model:**
   - Use a pre-trained model like GPT-2 for the chatbot.
     ```python
     tokenizer = Auto
     ```

# Fine-tuning

In [4]:
!pip install transformers



In [5]:
import tensorflow as tf
import keras_nlp  # A Keras-based library for natural language processing tasks.
from tensorflow import keras
from transformers import AutoTokenizer, TFAutoModel
# Mixed Precision Training:
# This enables the model to use both 16-bit and 32-bit floating-point types.
# Using float16 for most operations reduces memory usage and speeds up computation,
# while keeping some operations in float32 maintains stability.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

# ------------------------------
# Load the Pre-trained Gemma Model
# ------------------------------
print("Loading model (this may take a while)...")
# The call below loads the pre-trained, instruction-tuned Gemma 2 2B checkpoint from
# Hugging Face as a KerasNLP GemmaCausalLM.
# "Causal" means the model generates text sequentially, left to right, one token at a time.

# base_model = keras_nlp.models.Llama3CausalLM.from_preset(
#   "hf://deepseek-ai/DeepSeek-R1"
# )

# base_model = TFAutoModel.from_pretrained("deepseek-ai/DeepSeek-R1")
# tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")

base_model = keras_nlp.models.GemmaCausalLM.from_preset(
    "hf://google/gemma-2-2b-it"
)

# base_model = keras_nlp.models.Llama3CausalLM.from_preset(
#     "hf://meta-llama/Llama-3.2-1B"
# )


# Display the structure of the model, including layers and number of parameters.
base_model.summary()

# ------------------------------
# Enable LoRA Fine-Tuning
# ------------------------------
# LoRA (Low-Rank Adaptation) is a technique to efficiently fine-tune large models.
# Instead of updating every parameter in the model (which can be millions or billions),
# LoRA adds smaller matrices with a much lower rank (here, rank=2) to approximate the needed adjustments.
# Think of it as fine-tuning by "tweaking" only a few parameters instead of re-writing a whole book.
base_model.backbone.enable_lora(rank=2)
print("Enabled LoRA for efficient fine-tuning with reduced rank.")

# ------------------------------
# Prepare Training Data
# ------------------------------
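# Here, we define a small dataset of example strings. Two alternative datasets
# (symptoms -> disease, line item -> fraud label) are kept below, commented out;
# the active one maps a movie quote to its movie, character, and release year.
# Each string follows the format:
# "<input fields>\n<expected output fields>"
# The "\n" is a newline character that separates the input from the expected output.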
# train_data = [
#     "Symptom: persistent cough, fever, difficulty breathing.\nDisease: Pneumonia.",
#     "Symptom: severe headache, neck stiffness, photophobia.\nDisease: Meningitis.",
#     "Symptom: sudden weakness on one side, slurred speech.\nDisease: Stroke.",
#     "Symptom: increased thirst, frequent urination, unexplained weight loss.\nDisease: Diabetes.",
#     "Symptom: joint pain, prolonged morning stiffness, swelling in multiple joints.\nDisease: Rheumatoid Arthritis."
# ]

# train_data = [
#     "Line Item: Starbucks, $5.67, 2025-02-28, Coffee Shop.\nLabel: Not Fraud.",
#     "Line Item: Unknown Merchant, $1200.00, 2025-02-27, Electronics.\nLabel: Fraud.",
#     "Line Item: Walmart Supercenter, $45.32, 2025-02-26, Groceries.\nLabel: Not Fraud.",
#     "Line Item: Luxury Boutique, $2200.00, 2025-02-28, Designer Clothing.\nLabel: Fraud.",
#     "Line Item: Uber, $18.75, 2025-02-27, Ride Share.\nLabel: Not Fraud."
# ]

train_data = [
    "Quote: May the Force be with you.\n Movie: Star Wars, Character: Obi-Wan Kenobi, Release Date: 1977.",
    "Quote: I'll be back.\n Movie: The Terminator, Character: Terminator, Release Date: 1984.",
    "Quote: I'm going to make him an offer he can't refuse.\n Movie: The Godfather, Character: Vito Corleone, Release Date: 1972.",
    "Quote: Here's looking at you, kid.\n Movie: Casablanca, Character: Rick Blaine, Release Date: 1942.",
    "Quote: You talking to me?\n Movie: Taxi Driver, Character: Travis Bickle, Release Date: 1976."
]

# ------------------------------
# Compile the Model
# ------------------------------
# Before training, the model is compiled by specifying:
# - A loss function: Measures how well the model's predictions match the actual labels.
# - An optimizer: Determines how the model's weights are updated during training.
# - Metrics: Additional measurements to judge performance (here, accuracy).
base_model.compile(
    # SparseCategoricalCrossentropy is used when you have multiple classes and your labels are integers.
    # "from_logits=True" indicates that the model's outputs are raw values (logits), not probabilities.
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # Adam optimizer is chosen for its ability to adjust learning rates during training.
    # It combines ideas from momentum and adaptive learning rate techniques.
    optimizer=keras.optimizers.Adam(learning_rate=5e-5),
    # SparseCategoricalAccuracy computes the percentage of correct predictions.
    metrics=[keras.metrics.SparseCategoricalAccuracy()]
)

# ------------------------------
# Fine-Tune the Model
# ------------------------------
print("Starting fine-tuning...")
# The model is fine-tuned (trained) on the provided training data.
# Fine-tuning adjusts the model's weights to specialize in the new task
# (here, mapping a movie quote to its movie, character, and release year).
# A batch size of 1 is used, meaning one training sample is processed at a time.
# The training runs for 2 epochs, meaning the model sees the entire dataset twice.
# train_encodings = tokenizer(train_data, truncation=True, padding=True, return_tensors="tf")
base_model.fit(train_data, batch_size=1, epochs=2)
print("Fine-tuning complete.")

# ------------------------------
# Save the Fine-Tuned Model
# ------------------------------
# After training, the model is saved in the recommended .keras format.
# This allows you to reuse the model later without retraining.
base_model.save("fine_tuned_model.keras")
print("Model saved.")


Loading model (this may take a while)...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Enabled LoRA for efficient fine-tuning with reduced rank.
Starting fine-tuning...
Epoch 1/2


ResourceExhaustedError: Graph execution error:

Detected at node StatefulPartitionedCall defined at (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main

  File "<frozen runpy>", line 88, in _run_code

  File "/usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py", line 37, in <module>

  File "/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py", line 992, in launch_instance

  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelapp.py", line 712, in start

  File "/usr/local/lib/python3.11/dist-packages/tornado/platform/asyncio.py", line 205, in start

  File "/usr/lib/python3.11/asyncio/base_events.py", line 608, in run_forever

  File "/usr/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once

  File "/usr/lib/python3.11/asyncio/events.py", line 84, in _run

  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue

  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 499, in process_one

  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell

  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelbase.py", line 730, in execute_request

  File "/usr/local/lib/python3.11/dist-packages/ipykernel/ipkernel.py", line 383, in do_execute

  File "/usr/local/lib/python3.11/dist-packages/ipykernel/zmqshell.py", line 528, in run_cell

  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 2975, in run_cell

  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3030, in _run_cell

  File "/usr/local/lib/python3.11/dist-packages/IPython/core/async_helpers.py", line 78, in _pseudo_sync_runner

  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3257, in run_cell_async

  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3473, in run_ast_nodes

  File "/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py", line 3553, in run_code

  File "<ipython-input-5-2e09202b945c>", line 105, in <cell line: 0>

  File "/usr/local/lib/python3.11/dist-packages/keras_hub/src/utils/pipeline_model.py", line 177, in fit

  File "/usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler

  File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 371, in fit

  File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 219, in function

  File "/usr/local/lib/python3.11/dist-packages/keras/src/backend/tensorflow/trainer.py", line 132, in multi_step_on_iterator

Out of memory while trying to allocate 1065353216 bytes.
	 [[{{node StatefulPartitionedCall}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_multi_step_on_iterator_76065]
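
The run above fails with an out-of-memory error while allocating 1065353216 bytes (roughly 1 GB), even though LoRA is enabled. LoRA shrinks the number of trainable parameters, but it does not shrink activations such as the output logits. The back-of-the-envelope sketch below illustrates both effects. All numbers are assumptions chosen for illustration: a single 2304 x 2048 weight matrix, a vocabulary of 256,000 tokens, a default sequence length of 1024, and 4 bytes per value. The estimate is not meant to reproduce the exact figure in the traceback, only its order of magnitude.

```python
def full_params(d_in, d_out):
    """Parameters in one dense weight matrix of shape (d_in, d_out)."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)


def logits_bytes(batch_size, sequence_length, vocab_size, bytes_per_value=4):
    """Size of one logits tensor of shape (batch, sequence, vocab)."""
    return batch_size * sequence_length * vocab_size * bytes_per_value


# Illustrative layer shape (assumption, not read from the loaded model).
d_in, d_out, rank = 2304, 2048, 2
print("full matrix params:", full_params(d_in, d_out))
print("LoRA params (rank=2):", lora_trainable_params(d_in, d_out, rank))

# Assumed vocabulary size and sequence lengths.
vocab_size = 256_000
for seq_len in (1024, 128):
    size_mb = logits_bytes(1, seq_len, vocab_size) / 1e6
    print(f"logits at sequence_length={seq_len}: {size_mb:.0f} MB")
```

Under these assumptions the logits alone take about 1 GB at a sequence length of 1024 and about an eighth of that at 128. The KerasNLP Gemma LoRA guides lower `base_model.preprocessor.sequence_length` before calling `fit` for this reason; the training strings here are short, so a small value such as 128 is a plausible starting point to try, along with a runtime that has more GPU memory.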

In [None]:
# ------------------------------
# Reload the Model for Inference
# ------------------------------
# The saved model is reloaded for performing inference (generating predictions).

reloaded_model = keras.models.load_model("fine_tuned_model.keras")
print("Model reloaded for inference.")

# ------------------------------
# Set Up a Sampler for Text Generation
# ------------------------------
# When generating text, a sampler helps decide the next token (word or subword).
# GreedySampler always selects the token with the highest probability at each step.
sampler = keras_nlp.samplers.GreedySampler()
# The sampler is integrated into the model for use during inference.
reloaded_model.compile(sampler=sampler)
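
To make the greedy strategy described above concrete, here is a minimal, library-free sketch: at each step it scores every vocabulary entry and appends the index with the highest score, stopping at an end token or a length cap. The vocabulary, the score table, and the function names are toy values invented for illustration; they are not taken from Gemma or from KerasNLP's `GreedySampler` implementation.

```python
def greedy_decode(next_logits, prompt_tokens, end_token, max_length):
    """Extend prompt_tokens by repeatedly picking the highest-scoring next token.

    `next_logits(tokens)` returns one score per vocabulary entry.
    `max_length` caps the total length, prompt included.
    """
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        logits = next_logits(tokens)
        # Greedy choice: index of the largest score (argmax).
        next_token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_token)
        if next_token == end_token:
            break
    return tokens


# Toy vocabulary and scores (assumptions for the demo only).
VOCAB = ["<end>", "Movie:", "Wall", "Street"]
TOY_LOGITS = {
    1: [0.1, 0.0, 2.0, 0.5],  # after "Movie:"  the best score is "Wall"
    2: [0.2, 0.0, 0.1, 3.0],  # after "Wall"    the best score is "Street"
    3: [4.0, 0.0, 0.1, 0.2],  # after "Street"  the best score is "<end>"
}


def toy_next_logits(tokens):
    """Score the next token from the last token only (a toy bigram model)."""
    return TOY_LOGITS[tokens[-1]]


decoded = greedy_decode(toy_next_logits, [1], end_token=0, max_length=10)
print(" ".join(VOCAB[t] for t in decoded))
```

Because the choice is always the argmax, the same prompt yields the same output on every run, which is convenient when checking whether fine-tuning changed the model's answers.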


In [None]:
# Generate a completion for a movie-quote prompt
# (prompts for the two alternative datasets are kept below, commented out)
# prompt = "Symptom: sudden weakness on one side, slurred speech.\nDisease:"
# prompt = "Line Item: random merchant, $543.67, 2025-02-31, Retail.\nLabel:"
prompt = "Quote: Greed is good.\n Movie:"
result = reloaded_model.generate(prompt, max_length=50)
print("Generated Response:")
print(result)
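
If the fine-tuned model follows the training format, the generated string can be split back into fields. The sketch below is one hedged way to do that with a regular expression. `parse_completion` and `FIELD_PATTERN` are hypothetical names, and the sample string is copied from the training data rather than from an actual model output, since the run above did not complete.

```python
import re

# Matches "Movie: <title>, Character: <name>, Release Date: <4-digit year>".
# Limitation: titles or names that contain commas will not match.
FIELD_PATTERN = re.compile(
    r"Movie:\s*(?P<movie>[^,\n]+),\s*"
    r"Character:\s*(?P<character>[^,\n]+),\s*"
    r"Release Date:\s*(?P<year>\d{4})"
)


def parse_completion(text):
    """Return a dict of the extracted fields, or None if the format is not found."""
    match = FIELD_PATTERN.search(text)
    return match.groupdict() if match else None


# Sample taken from the training strings, not from a model generation.
sample = "Quote: I'll be back.\n Movie: The Terminator, Character: Terminator, Release Date: 1984."
parsed = parse_completion(sample)
print(parsed)
print(parse_completion("no structured fields here"))
```

Returning `None` for unmatched text makes it easy to count how often the model strays from the expected format.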
