# Gemma-3n-Swahili: Improving Gemma-3n Swahili Capabilities

## Install Unsloth and dependencies

In [None]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" huggingface_hub hf_transfer
    !pip install --no-deps unsloth

In [None]:
%%capture
# Upgrade timm, which Gemma 3N requires
!pip install --no-deps --upgrade timm # Only for Gemma 3N

### Unsloth

`FastModel` supports loading nearly any model now! This includes Vision and Text models!

In [None]:
from unsloth import FastModel
import torch

fourbit_models = [
    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit",
    "unsloth/gemma-3n-E2B-it-unsloth-bnb-4bit",
    # Pretrained models
    "unsloth/gemma-3n-E4B-unsloth-bnb-4bit",
    "unsloth/gemma-3n-E2B-unsloth-bnb-4bit",

    # Other Gemma 3 quants
    "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-27b-it-unsloth-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3n-E2B-it",
    dtype = None, # None for auto detection
    max_seq_length = 1024, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.1: Fast Gemma3N patching. Transformers: 4.54.0.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Gemma3N does not support SDPA - switching to eager!



# Gemma 3N can process Text, Vision and Audio!

Let's first see how Gemma 3N handles multimodal inputs. We use Gemma 3N's recommended sampling settings: `temperature = 1.0, top_p = 0.95, top_k = 64`.
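To make the three sampling settings concrete, here is a minimal pure-Python sketch of temperature scaling, top-k filtering, and top-p (nucleus) filtering over a toy vocabulary. The function name `sample_next_token` and the toy logits are hypothetical and only for illustration; this is not the code `model.generate` runs internally.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=64, top_p=0.95, rng=None):
    """Illustrative sampler over a dict mapping token -> logit (hypothetical helper)."""
    rng = rng or random.Random()
    # Temperature rescales the logits before the softmax.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # top_k: keep only the k highest-scoring tokens.
    kept = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the surviving tokens (subtract the max for numerical stability).
    max_logit = kept[0][1]
    weights = [(tok, math.exp(logit - max_logit)) for tok, logit in kept]
    total = sum(w for _, w in weights)
    probs = [(tok, w / total) for tok, w in weights]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for tok, p in probs:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, token_probs = zip(*nucleus)
    # Draw one token from the renormalized nucleus.
    return rng.choices(tokens, weights=token_probs, k=1)[0]

# Toy logits: one token dominates, so it is the only one left in the nucleus.
toy_logits = {"twiga": 5.0, "tembo": 1.0, "simba": 0.5}
print(sample_next_token(toy_logits))
```

Lower `top_k` or `top_p` values narrow the candidate set and make the output more deterministic; higher `temperature` flattens the distribution and makes it more varied.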

In [None]:
from transformers import TextStreamer
# Helper function for inference
def do_gemma_3n_inference(messages, max_new_tokens = 128):
    _ = model.generate(
        **tokenizer.apply_chat_template(
            messages,
            add_generation_prompt = True, # Must add for generation
            tokenize = True,
            return_dict = True,
            return_tensors = "pt",
        ).to("cuda"),
        max_new_tokens = max_new_tokens,
        temperature = 1.0, top_p = 0.95, top_k = 64,
        streamer = TextStreamer(tokenizer, skip_prompt = True),
    )

# Gemma 3N can see images!

<img src="https://files.worldwildlife.org/wwfcmsprod/images/Kenya_Giraffes_Travel_8.16.2012/story_full_width/2ump6hpm1j_giraffes_WWF_US_Terry_Macko.jpg" alt="Alt text" height="256">

In [None]:
giraffe_link = "https://files.worldwildlife.org/wwfcmsprod/images/Kenya_Giraffes_Travel_8.16.2012/story_full_width/2ump6hpm1j_giraffes_WWF_US_Terry_Macko.jpg"

messages = [{
    "role" : "user",
    "content": [
        { "type": "image", "image" : giraffe_link },
        { "type": "text",  "text" : "Unaweza kuniambia huyu mnyama anaitwa nani na ana sifa zipi?" }
    ]
}]
# You might have to wait 1 minute for Unsloth's auto compiler
do_gemma_3n_inference(messages, max_new_tokens = 256)

Mnyama huyo unaitwa **giraffe**, au **giraffe ya Afrika**.

Sifa zake ni:

* **Mavutwa:** Ina mavutwa mweusi yenye vidogo vidogo na ni mharibai sana.
* **Mitaa ya mavutwa:** Ina mitaa ya mavutwa iliyopatikana juu ya mmea mrefu, ikihimili kutoa chakula.
* **Moyo mkubwa:** Ina moyo mkubwa sana, unaweza kuendesha mitaa ya mavutwa kwa muda mrefu.
* **Mita:** Inaweza kuwa mrefu sana, hali ya kawaida ni urefu wa karibu mita 4.5 hadi 6.
* **Mwezi:** Ina mwezi mrefu, unaweza kuishi kwa muda mrefu bila kula.
* **Mmea mrefu:** Inaweza kupata chakula cha juu sana kwa kuangalia mmea mrefu.

Giraffe ni mnyama mrefu na mchumlu unaopatikana barani Afrika.<end_of_turn>


**Although the model answers in Swahili, it does not use the animal's Swahili name, 'twiga'. The run shown above calls it 'giraffe'; in another run it answered 'ng'ombe wa savanna'.**

# Gemma 3N can also hear!

In [None]:
from IPython.display import Audio, display
Audio("https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3")

In [None]:
!wget -qqq https://www.nasa.gov/wp-content/uploads/2015/01/591240main_JFKmoonspeech.mp3 -O kennedy-speech.mp3

In [None]:
audio_file = "kennedy-speech.mp3"

messages = [{
    "role" : "user",
    "content": [
        { "type": "audio", "audio" : audio_file },
        { "type": "text",  "text" : "Tafsiri hotuba hii kwa kiswahili sanifu" }
    ]
}]
do_gemma_3n_inference(messages, max_new_tokens = 512)

Ninakuwepo kwa kumwongoza nchi hii kufikia jicho hili kwa muda wa mwisho wa dekada hii, kwa kuwezesha mtu mmoja kupata mwezi na kurudiwa salama kwenye ardhi ya dunia.<end_of_turn>


**The model understands Swahili prompts, but its generated Swahili is still weak in word choice, sentence structure, and coherence.**

# Data

In [None]:
!pip install --quiet kagglehub

In [None]:
# =============================================================================
# BLOCK 1: IMPORTS AND SETUP
# =============================================================================

# Required installations (run once)
# !pip install kagglehub datasets huggingface_hub pandas

import os
import pandas as pd
from pathlib import Path
import kagglehub
from datasets import load_dataset
from huggingface_hub import login, hf_hub_download
import getpass

print(" All required libraries imported successfully!")

 All required libraries imported successfully!


In [None]:
# =============================================================================
# BLOCK 2: DIRECTORY STRUCTURE SETUP
# =============================================================================

# Create directories for organized storage
DATA_DIR = Path("./datasets")
KAGGLE_DIR = DATA_DIR / "kaggle"
HF_DIR = DATA_DIR / "huggingface"

# Create directories
for dir_path in [DATA_DIR, KAGGLE_DIR, HF_DIR]:
    dir_path.mkdir(exist_ok=True, parents=True)

print("📁 Created directory structure:")
print(f"   Main data directory: {DATA_DIR}")
print(f"   Kaggle datasets: {KAGGLE_DIR}")
print(f"   HuggingFace datasets: {HF_DIR}")
print("✅ Directory setup complete!")

📁 Created directory structure:
   Main data directory: datasets
   Kaggle datasets: datasets/kaggle
   HuggingFace datasets: datasets/huggingface
✅ Directory setup complete!


In [None]:
# =============================================================================
# BLOCK 3: HUGGINGFACE AUTHENTICATION
# =============================================================================

def setup_hf_auth():
    """Setup HuggingFace authentication for gated datasets"""
    try:
        # Check if already logged in
        from huggingface_hub import whoami
        user_info = whoami()
        print(f"✅ Already authenticated as: {user_info['name']}")
        return True
    except Exception:  # not logged in (or the check failed), so ask for a token
        print("\n🔐 HuggingFace Authentication Required for Gated Datasets")
        print("Please provide your HuggingFace token (get it from: https://huggingface.co/settings/tokens)")

        # Get token securely
        hf_token = getpass.getpass("Enter your HuggingFace token: ")

        try:
            login(token=hf_token)
            print("✅ Successfully authenticated with HuggingFace")
            return True
        except Exception as e:
            print(f"❌ Authentication failed: {e}")
            return False

# Setup HF authentication
print("🔑 Setting up HuggingFace authentication...")
hf_authenticated = setup_hf_auth()

🔑 Setting up HuggingFace authentication...

🔐 HuggingFace Authentication Required for Gated Datasets
Please provide your HuggingFace token (get it from: https://huggingface.co/settings/tokens)
Enter your HuggingFace token: ··········
✅ Successfully authenticated with HuggingFace


In [None]:
# =============================================================================
# BLOCK 4: KAGGLE DOWNLOAD HELPER FUNCTION
# =============================================================================

def download_kaggle_with_hub(dataset_id, local_name):
    """Download Kaggle dataset using kagglehub"""
    try:
        print(f"📥 Downloading {dataset_id}...")
        dataset_path = kagglehub.dataset_download(dataset_id)
        print(f"✅ Downloaded to: {dataset_path}")

        # Create symlink or copy to our organized structure
        target_path = KAGGLE_DIR / local_name
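        if target_path.is_symlink():
            # shutil.rmtree refuses to remove symlinks, so unlink a link left by an earlier run
            target_path.unlink()
        elif target_path.exists():
            import shutil
            shutil.rmtree(target_path)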

        # Create symlink for efficient access
        try:
            target_path.symlink_to(Path(dataset_path), target_is_directory=True)
            print(f"🔗 Linked to: {target_path}")
        except OSError:
            # If symlink fails, copy the data
            import shutil
            shutil.copytree(dataset_path, target_path)
            print(f"📁 Copied to: {target_path}")

        return str(dataset_path), target_path
    except Exception as e:
        print(f"❌ Error downloading {dataset_id}: {e}")
        return None, None

print("Kaggle download helper function ready!")

Kaggle download helper function ready!


In [None]:
# =============================================================================
# BLOCK 5: DOWNLOAD KAGGLE DATASETS
# =============================================================================

print("🚀 Starting Kaggle dataset downloads...")

# 1. Swahili Instructions Dataset
print("\n" + "="*50)
print("📚 SWAHILI INSTRUCTIONS DATASET")
print("="*50)
swahili_instructions_path, swahili_instructions_dir = download_kaggle_with_hub(
    "alfaxadeyembe/swahili-instructions", "swahili_instructions"
)

print("\n✅ Kaggle datasets download completed!")

🚀 Starting Kaggle dataset downloads...

📚 SWAHILI INSTRUCTIONS DATASET
📥 Downloading alfaxadeyembe/swahili-instructions...
✅ Downloaded to: /kaggle/input/swahili-instructions
🔗 Linked to: datasets/kaggle/swahili_instructions

✅ Kaggle datasets download completed!


In [None]:
# =============================================================================
# DATASET SUMMARY AND VARIABLES
# =============================================================================

print("\n" + "="*70)
print(" DATASET SUMMARY")
print("="*70)

# Dataset paths for easy access
DATASETS = {
    'swahili_instructions_path': swahili_instructions_dir if swahili_instructions_dir else None,

}


# Print summary
print(" Dataset Paths:")
for name, path in DATASETS.items():
    if path and path.exists():
        status = " Available"
        # Show some file info
        files = list(path.glob("*")) if path.is_dir() else []
        file_info = f" ({len(files)} files)" if files else ""
        print(f"   {name}: {path}{file_info} - {status}")
    else:
        print(f"   {name}: Not downloaded -  Failed")

print(f"\n  Main data directory: {DATA_DIR}")


 DATASET SUMMARY
 Dataset Paths:
   swahili_instructions_path: datasets/kaggle/swahili_instructions (1 files) -  Available

  Main data directory: datasets


In [None]:
# =============================================================================
# UTILITY FUNCTIONS FOR EASY ACCESS
# =============================================================================

def load_swahili_instructions():
    """Load Swahili instructions dataset"""
    if not swahili_instructions_dir or not swahili_instructions_dir.exists():
        print(" Swahili instructions dataset not available")
        return None

    print(" Searching for Swahili instructions data files...")

    # Look for CSV files first
    csv_files = list(swahili_instructions_dir.rglob("*.csv"))
    if csv_files:
        print(f" Found CSV file: {csv_files[0]}")
        return pd.read_csv(csv_files[0])

    # Look for JSON files
    json_files = list(swahili_instructions_dir.rglob("*.json"))
    if json_files:
        print(f" Found JSON file: {json_files[0]}")
        return pd.read_json(json_files[0])

    # Look for parquet files
    parquet_files = list(swahili_instructions_dir.rglob("*.parquet"))
    if parquet_files:
        print(f" Found Parquet file: {parquet_files[0]}")
        return pd.read_parquet(parquet_files[0])

    print(" No supported data files found in Swahili instructions directory")
    print(f"📁 Available files: {[f.name for f in swahili_instructions_dir.rglob('*') if f.is_file()][:5]}")
    return None



def get_dataset_info():
    """Get detailed information about all datasets"""
    print("📊 Getting detailed dataset information...")
    info = {}

    # Swahili Instructions
    if swahili_instructions_dir and swahili_instructions_dir.exists():
        files = list(swahili_instructions_dir.rglob("*"))
        info['swahili_instructions'] = {
            'path': swahili_instructions_dir,
            'files': len([f for f in files if f.is_file()]),
            'file_types': list(set([f.suffix for f in files if f.is_file() and f.suffix]))
        }

    return info

# Display available functions
print(" Utility Functions Available:")
print("    load_swahili_instructions() - Load Swahili instructions dataset")
print("    get_dataset_info() - Get detailed info about all datasets")

 Utility Functions Available:
    load_swahili_instructions() - Load Swahili instructions dataset
    get_dataset_info() - Get detailed info about all datasets


In [None]:
# Show quick dataset info
print(f"\n Quick Dataset Overview:")
dataset_info = get_dataset_info()
for name, info in dataset_info.items():
    print(f"    {name}: {info}")



 Quick Dataset Overview:
📊 Getting detailed dataset information...
    swahili_instructions: {'path': PosixPath('datasets/kaggle/swahili_instructions'), 'files': 1, 'file_types': ['.json']}


<a name="Data"></a>
### Data Prep
We now use the `Gemma-3` format for conversation-style finetunes. We use the Swahili Instructions dataset (`alfaxadeyembe/swahili-instructions` on Kaggle, downloaded above), whose instruction/input/output records we convert to ShareGPT style. Gemma-3 renders multi-turn conversations like below:

```
<bos><start_of_turn>user
Hello!<end_of_turn>
<start_of_turn>model
Hey there!<end_of_turn>
```
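The rendering above can be approximated by hand. The sketch below is a simplified illustration, not the template that `get_chat_template` installs: `render_gemma3` is a hypothetical helper, and it assumes only `user` and `assistant` roles with plain-text content.

```python
def render_gemma3(messages, add_bos=True):
    """Hand-rolled approximation of the Gemma-3 turn layout (hypothetical helper)."""
    # Gemma-3 labels the assistant turn "model".
    role_names = {"user": "user", "assistant": "model"}
    text = "<bos>" if add_bos else ""
    for message in messages:
        role = role_names[message["role"]]
        text += f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n"
    return text

example = render_gemma3([
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hey there!"},
])
print(example)
```

Passing `add_bos=False` gives text without the leading `<bos>` token, which matches how the training text is stored later in this notebook.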

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3` and more.

In [None]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)

In [None]:
# =============================================================================
# SWAHILI INSTRUCTIONS DATA PREPARATION FOR GEMMA-3 FORMAT
# =============================================================================

import json
import pandas as pd
from pathlib import Path

# First, let's read and examine the Swahili instructions JSON file
print(" Reading Swahili Instructions Dataset...")

# Load the JSON file
swahili_instructions_path = Path("datasets/kaggle/swahili_instructions")
json_files = list(swahili_instructions_path.glob("*.json"))

if json_files:
    json_file = json_files[0]
    print(f" Found JSON file: {json_file}")

    # Read the JSON data
    with open(json_file, 'r', encoding='utf-8') as f:
        swahili_data = json.load(f)

    print(f" Successfully loaded JSON data")
    print(f" Data type: {type(swahili_data)}")

    # If it's a list, show first few items
    if isinstance(swahili_data, list):
        print(f" Number of entries: {len(swahili_data)}")
        print(f" First entry structure:")
        if swahili_data:
            first_item = swahili_data[0]
            print(f"   Keys: {list(first_item.keys()) if isinstance(first_item, dict) else 'Not a dict'}")
            print(f"   Sample: {first_item}")

            # Show a few more samples to understand the structure
            print(f"\n Sample entries (first 3):")
            for i, item in enumerate(swahili_data[:3]):
                print(f"   Entry {i+1}: {item}")

    # If it's a dict, show the structure
    elif isinstance(swahili_data, dict):
        print(f" Dictionary keys: {list(swahili_data.keys())}")
        for key, value in swahili_data.items():
            print(f"   {key}: {type(value)} - {len(value) if hasattr(value, '__len__') else value}")

else:
    print(" No JSON files found in the directory")
    swahili_data = None

 Reading Swahili Instructions Dataset...
 Found JSON file: datasets/kaggle/swahili_instructions/swahili-instructions-response.json
 Successfully loaded JSON data
 Data type: <class 'list'>
 Number of entries: 67017
 First entry structure:
   Keys: ['instruction', 'input', 'id', 'output']
   Sample: {'instruction': 'Unda tangazo fupi la nafaka mpya ya kiamsha kinywa.', 'input': '', 'id': 'alpaca-20113', 'output': '"Nafaka mpya kwa kiamsha kinywa! Sasa unaweza kufurahia kifungua kinywa chenye ladha nzuri na afya. Jaribu nafaka yetu mpya leo na ujionee mwenyewe kwa nini ni bidhaa bora zaidi kwa afya yako."'}

 Sample entries (first 3):
   Entry 1: {'instruction': 'Unda tangazo fupi la nafaka mpya ya kiamsha kinywa.', 'input': '', 'id': 'alpaca-20113', 'output': '"Nafaka mpya kwa kiamsha kinywa! Sasa unaweza kufurahia kifungua kinywa chenye ladha nzuri na afya. Jaribu nafaka yetu mpya leo na ujionee mwenyewe kwa nini ni bidhaa bora zaidi kwa afya yako."'}
   Entry 2: {'instruction': 'Tathm

In [None]:
# =============================================================================
# STANDARDIZE DATA FORMATS USING UNSLOTH
# =============================================================================

from unsloth.chat_templates import standardize_data_formats
from datasets import Dataset

print(" Converting Swahili instructions to standardized format...")

# Convert the raw JSON data to ShareGPT format first
def convert_to_conversations_format(data):
    """Convert instruction-input-output format to conversations format"""
    conversations_data = []

    for item in data:
        # Create the user message
        user_message = item['instruction']

        # If there's input content, add it to the instruction
        if (item.get('input') or '').strip():  # also handles a null/None input
            user_message += f"\n\n{item['input']}"

        # Create conversation structure
        conversation = {
            "conversations": [
                {"from": "human", "value": user_message},
                {"from": "gpt", "value": item['output']}
            ]
        }

        # Keep the ID for reference
        if 'id' in item:
            conversation['id'] = item['id']

        conversations_data.append(conversation)

    return conversations_data

# Convert to conversations format
print(f" Converting {len(swahili_data)} entries to conversations format...")
conversations_data = convert_to_conversations_format(swahili_data)

# Create HuggingFace dataset
dataset = Dataset.from_list(conversations_data)
print(f" Created dataset with {len(dataset)} conversations")

# Apply unsloth's standardize_data_formats
print(" Applying unsloth's standardize_data_formats...")
try:
    dataset = standardize_data_formats(dataset)
    print(" Data standardization completed!")

    # Print sample to see the structure after standardization
    print(f"\n Sample after standardization:")
    print(f"Dataset features: {dataset.features}")

    if len(dataset) > 0:
        sample = dataset[0]
        print(f"\nSample entry structure:")
        for key, value in sample.items():
            if isinstance(value, list) and len(value) > 0:
                print(f"  {key}: {value}")
            elif isinstance(value, str) and len(value) > 200:
                print(f"  {key}: {value[:200]}...")
            else:
                print(f"  {key}: {value}")

        print(f"\n📋 Sample conversation:")
        if 'conversations' in sample:
            for i, turn in enumerate(sample['conversations']):
                # Handle the new format: role/content instead of from/value
                role = turn.get('role', turn.get('from', 'unknown'))
                content = turn.get('content', turn.get('value', 'no content'))
                print(f"  Turn {i+1} ({role}): {content[:150]}...")

    print(f"\n Dataset ready for chat template application!")
    print(f"   Total conversations: {len(dataset)}")
    print(f"   Features: {list(dataset.features.keys())}")

    # Check the actual structure after standardization
    print(f"\n🔧 Detected conversation format:")
    conversation_format = "unknown"  # default if the dataset is empty or unrecognized
    if len(dataset) > 0:
        first_convo = dataset[0]['conversations'][0]
        if 'role' in first_convo:
            print("   Using role/content format (standardized)")
            conversation_format = "role_content"
        elif 'from' in first_convo:
            print("   Using from/value format (original)")
            conversation_format = "from_value"
        else:
            print("   Unknown format detected")

    # conversation_format is a notebook-level variable, so later cells can read it directly
    print(f"   Format saved for next blocks: {conversation_format}")

except Exception as e:
    print(f" Error during standardization: {e}")
    print("Dataset created but standardization failed. You may proceed with manual formatting.")
    # Set a fallback format
    conversation_format = "from_value"

 Converting Swahili instructions to standardized format...
 Converting 67017 entries to conversations format...
 Created dataset with 67017 conversations
 Applying unsloth's standardize_data_formats...



 Data standardization completed!

 Sample after standardization:
Dataset features: {'conversations': [{'content': Value(dtype='string', id=None), 'role': Value(dtype='string', id=None)}], 'id': Value(dtype='string', id=None)}

Sample entry structure:
  conversations: [{'content': 'Unda tangazo fupi la nafaka mpya ya kiamsha kinywa.', 'role': 'user'}, {'content': '"Nafaka mpya kwa kiamsha kinywa! Sasa unaweza kufurahia kifungua kinywa chenye ladha nzuri na afya. Jaribu nafaka yetu mpya leo na ujionee mwenyewe kwa nini ni bidhaa bora zaidi kwa afya yako."', 'role': 'assistant'}]
  id: alpaca-20113

📋 Sample conversation:
  Turn 1 (user): Unda tangazo fupi la nafaka mpya ya kiamsha kinywa....
  Turn 2 (assistant): "Nafaka mpya kwa kiamsha kinywa! Sasa unaweza kufurahia kifungua kinywa chenye ladha nzuri na afya. Jaribu nafaka yetu mpya leo na ujionee mwenyewe kw...

 Dataset ready for chat template application!
   Total conversations: 67017
   Features: ['conversations', 'id']

🔧 Detected

In [None]:
# =============================================================================
# GEMMA-3 CHAT TEMPLATE AND FORMAT DATASET
# =============================================================================


print("🤖 Setting up Gemma-3 chat template and formatting dataset...")

# First, apply Gemma-3 chat template to tokenizer
print("🔧 Applying Gemma-3 chat template to tokenizer...")
try:
    tokenizer = get_chat_template(
        tokenizer,
        chat_template = "gemma-3",
    )
    print("✅ Gemma-3 chat template applied successfully!")

except NameError:
    print(" Tokenizer not found!")
    print("Make sure you have loaded your tokenizer first:")
    print("from unsloth import FastModel")
    print("model, tokenizer = FastModel.from_pretrained(...)")
    raise RuntimeError("Tokenizer not loaded. Please initialize it before applying chat template.")

# Test what format the tokenizer expects
print("\n Testing tokenizer format requirements...")
test_conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there!"}
]
try:
    test_result = tokenizer.apply_chat_template(test_conversation, tokenize=False, add_generation_prompt=False)
    print(" Tokenizer expects role/content format")
    expected_format = "role_content"
except Exception:  # role/content was rejected, so try the from/value layout
    try:
        test_conversation_alt = [
            {"from": "human", "value": "Hello"},
            {"from": "gpt", "value": "Hi there!"}
        ]
        test_result = tokenizer.apply_chat_template(test_conversation_alt, tokenize=False, add_generation_prompt=False)
        print(" Tokenizer expects from/value format")
        expected_format = "from_value"
    except Exception as e:
        print(f" Could not determine tokenizer format: {e}")
        expected_format = "unknown"

print(f" Expected format: {expected_format}")

# Define the formatting function based on what tokenizer expects
def formatting_prompts_func(examples):
    """Apply chat template to conversations and create text field"""
    convos = examples["conversations"]

    # Convert to the format the tokenizer expects
    processed_convos = []
    for convo in convos:
        processed_convo = []

        for turn in convo:
            if expected_format == "role_content":
                # Keep the original role/content format from standardization
                if turn['role'] == 'user':
                    processed_convo.append({"role": "user", "content": turn['content']})
                elif turn['role'] == 'assistant':
                    processed_convo.append({"role": "assistant", "content": turn['content']})
            else:
                # Convert to from/value format
                if turn['role'] == 'user':
                    processed_convo.append({"from": "human", "value": turn['content']})
                elif turn['role'] == 'assistant':
                    processed_convo.append({"from": "gpt", "value": turn['content']})

        processed_convos.append(processed_convo)

    # Apply chat template to each conversation
    texts = []
    failed_count = 0
    for i, convo in enumerate(processed_convos):
        try:
            text = tokenizer.apply_chat_template(
                convo,
                tokenize=False,
                add_generation_prompt=False
            )
            # Remove <bos> token if present
            if text.startswith('<bos>'):
                text = text.removeprefix('<bos>')
            texts.append(text)
        except Exception as e:
            if failed_count < 3:  # Only show first 3 errors
                print(f" Error processing conversation {i}: {e}")
                print(f"   Conversation format: {convo}")
            failed_count += 1
            texts.append("")  # Empty text for failed conversations

    if failed_count > 0:
        print(f"⚠️ {failed_count} conversations failed to process")

    return {"text": texts}

# Apply the formatting function to the dataset
print("\n Applying formatting function to dataset...")
try:
    # Test on a small sample first
    print("🧪 Testing on first 5 examples...")
    test_sample = dataset.select(range(5))
    test_result = test_sample.map(formatting_prompts_func, batched=True)

    # Check if test was successful
    successful_texts = [text for text in test_result["text"] if len(text.strip()) > 0]

    if len(successful_texts) > 0:
        print(f" Test successful! {len(successful_texts)}/5 conversations processed correctly")
        print(f"\n Sample formatted text:")
        print("=" * 60)
        print(successful_texts[0])
        print("=" * 60)

        # Apply to full dataset
        print("\n Applying to full dataset...")
        dataset = dataset.map(formatting_prompts_func, batched=True)
        print(" Dataset formatting completed!")

        # Remove empty texts
        original_length = len(dataset)
        dataset = dataset.filter(lambda x: len(x["text"].strip()) > 0)
        final_length = len(dataset)

        if final_length < original_length:
            print(f" Filtered out {original_length - final_length} empty texts")

        print(f" Final dataset size: {final_length} conversations")

        # Print the sample as requested
        if len(dataset) > 100:
            print(f"\n dataset[100]['text']:")
            print("=" * 80)
            print(dataset[100]["text"])
            print("=" * 80)
        else:
            print(f"\n dataset[0]['text'] (dataset has {len(dataset)} entries):")
            print("=" * 80)
            print(dataset[0]["text"])
            print("=" * 80)

        # Show text length statistics
        text_lengths = [len(item["text"]) for item in dataset if len(item["text"]) > 0]
        if text_lengths:
            print(f"\n Dataset Summary:")
            print(f"   Total examples: {len(dataset)}")
            print(f"   Features: {list(dataset.features.keys())}")
            print(f"   Average text length: {sum(text_lengths) / len(text_lengths):.1f} characters")
            print(f"   Min text length: {min(text_lengths)} characters")
            print(f"   Max text length: {max(text_lengths)} characters")

        print(f"\n SUCCESS! Dataset ready for training!")
        print(f" Your {len(dataset)} Swahili instruction pairs are formatted for Gemma-3!")

    else:
        print(" Test failed completely. All conversations failed to process.")
        print("This suggests a fundamental issue with the tokenizer or data format.")

except Exception as e:
    print(f" Unexpected error during formatting: {e}")
    print("Please check your tokenizer setup.")

print(f"\n Ready for training!")


🤖 Setting up Gemma-3 chat template and formatting dataset...
🔧 Applying Gemma-3 chat template to tokenizer...
✅ Gemma-3 chat template applied successfully!

 Testing tokenizer format requirements...
 Tokenizer expects role/content format
 Expected format: role_content

 Applying formatting function to dataset...
🧪 Testing on first 5 examples...



 Test successful! 5/5 conversations processed correctly

 Sample formatted text:
<start_of_turn>user
Unda tangazo fupi la nafaka mpya ya kiamsha kinywa.<end_of_turn>
<start_of_turn>model
"Nafaka mpya kwa kiamsha kinywa! Sasa unaweza kufurahia kifungua kinywa chenye ladha nzuri na afya. Jaribu nafaka yetu mpya leo na ujionee mwenyewe kwa nini ni bidhaa bora zaidi kwa afya yako."<end_of_turn>


 Applying to full dataset...



 Dataset formatting completed!



 Final dataset size: 67017 conversations

 dataset[100]['text']:
<start_of_turn>user
Fafanua sentensi ifuatayo ili kumaanisha kitu kimoja.

Mtihani haukuwa mgumu.<end_of_turn>
<start_of_turn>model
Mtihani ulikuwa rahisi.<end_of_turn>


 Dataset Summary:
   Total examples: 67017
   Features: ['conversations', 'id', 'text']
   Average text length: 795.5 characters
   Min text length: 78 characters
   Max text length: 26120 characters

 SUCCESS! Dataset ready for training!
 Your 67017 Swahili instruction pairs are formatted for Gemma-3!

 Ready for training!
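
To make the target format concrete, here is a minimal sketch that renders a conversation into the turn layout shown in the samples above. It is a simplified stand-in, not the tokenizer's actual chat template: the helper name `render_gemma_turns` and the `assistant` to `model` role mapping are assumptions for illustration, and no `<bos>` token is added because the printed samples do not start with one.

```python
def render_gemma_turns(messages):
    """Render role/content messages into Gemma-style turn text (illustrative only)."""
    # Hypothetical role mapping: the samples above label the assistant side as "model".
    role_names = {"user": "user", "assistant": "model", "model": "model"}
    parts = []
    for message in messages:
        role = role_names[message["role"]]
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "Fafanua sentensi ifuatayo ili kumaanisha kitu kimoja.\n\nMtihani haukuwa mgumu."},
    {"role": "assistant", "content": "Mtihani ulikuwa rahisi."},
]
print(render_gemma_turns(example))
```

In the notebook itself the formatting goes through the tokenizer's chat template, so treat this only as a way to sanity-check what a formatted row should look like.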


In [None]:
# =============================================================================
# SAVE FORMATTED DATASET AND SHOW ADDITIONAL SAMPLES
# =============================================================================

print(" Saving formatted dataset and showing additional samples...")

if 'text' in dataset.features:
    # Save the formatted dataset
    save_path = "datasets/processed/swahili_instructions_gemma3_formatted"
    print(f" Saving formatted dataset to: {save_path}")
    dataset.save_to_disk(save_path)
    print(" Dataset saved successfully!")

    # Show additional samples for verification
    print(f"\n Additional sample texts:")
    sample_indices = [0, 25, 50, 75] if len(dataset) > 75 else [0, min(25, len(dataset)-1)]

    for idx in sample_indices:
        if idx < len(dataset):
            print(f"\n Sample {idx}:")
            print("-" * 60)
            sample_text = dataset[idx]["text"]
            print(sample_text)
            print("-" * 60)

    print(f"\n Final Dataset Statistics:")
    print(f"    Total Swahili conversations: {len(dataset)}")
    print(f"    Features: {list(dataset.features.keys())}")
    print(f"    Format: Gemma-3 chat template")
    print(f"    Ready for: Fine-tuning with unsloth")

    print(f"\n Your Swahili instruction dataset is ready for training! ")

else:
    print(" 'text' field not found. Make sure Block 3 ran successfully.")

 Saving formatted dataset and showing additional samples...
 Saving formatted dataset to: datasets/processed/swahili_instructions_gemma3_formatted


Saving the dataset (0/1 shards):   0%|          | 0/67017 [00:00<?, ? examples/s]

 Dataset saved successfully!

 Additional sample texts:

 Sample 0:
------------------------------------------------------------
<start_of_turn>user
Unda tangazo fupi la nafaka mpya ya kiamsha kinywa.<end_of_turn>
<start_of_turn>model
"Nafaka mpya kwa kiamsha kinywa! Sasa unaweza kufurahia kifungua kinywa chenye ladha nzuri na afya. Jaribu nafaka yetu mpya leo na ujionee mwenyewe kwa nini ni bidhaa bora zaidi kwa afya yako."<end_of_turn>

------------------------------------------------------------

 Sample 25:
------------------------------------------------------------
<start_of_turn>user
Simulia hadithi ya athari za coronavirus kwenye uchumi wa Amerika katika sentensi tano.<end_of_turn>
<start_of_turn>model
1. Coronavirus imesababisha kupungua kwa uzalishaji na biashara huko Amerika, kwa sababu ya hatua za kuzuia kuenea kwa virusi.
2. Kuanguka kwa masoko ya hisa na kupoteza ajira kunachangia kupungua kwa mapato ya walaji, ambayo yanaweza kuathiri uchumi wa Amerika.
3. Mashirika ya n

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This notebook trains for one full epoch (`num_train_epochs = 1`) on a 10,000-sample subset of the dataset. For a quick test run, set `max_steps` to a small value such as 60 instead.
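
As a rough guide to how long a run is, the number of optimizer steps follows from the dataset size and the effective batch size. The sketch below is plain arithmetic under the settings used further down in this notebook (10,000 samples, batch size 2, 8 gradient accumulation steps, 1 GPU); the helper name `optimizer_steps` is hypothetical, and the trainer's own rounding may differ when the dataset size is not divisible by the effective batch size.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps, num_gpus=1, epochs=1):
    """Estimate optimizer steps: one step consumes one effective batch."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
    return math.ceil(num_examples / effective_batch) * epochs


steps = optimizer_steps(10_000, per_device_batch_size=2, grad_accum_steps=8)
print(steps)  # 625, the same total the training log reports
```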

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.557 GB.
8.309 GB of memory reserved.


In [None]:
# =============================================================================
# ADD LORA ADAPTERS TO QUANTIZED MODEL
# =============================================================================

from unsloth import FastLanguageModel

print("🔧 Adding LoRA adapters to the quantized model...")

# Add LoRA adapters using unsloth's method
model = FastLanguageModel.get_peft_model(
    model,
    finetune_vision_layers     = False, # False if not finetuning vision layers
    finetune_language_layers   = True, # False if not finetuning language layers
    finetune_attention_modules = True, # False if not finetuning attention layers
    finetune_mlp_modules       = True, # False if not finetuning MLP layers

    r = 16,  # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # LoftQ
)

print(" LoRA adapters added successfully!")
print(f" LoRA Configuration:")
print(f"    Rank (r): 16")
print(f"    Alpha: 16")
print(f"    Dropout: 0 (optimized)")
print(f"    Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj")
print(f"    Memory optimization: unsloth gradient checkpointing")

# Show model statistics
print(f"\n📋 Model Statistics:")
model.print_trainable_parameters()

print(f"\n Model is now ready for training with LoRA adapters!")
print(f" Proceed to run the training setup block")

🔧 Adding LoRA adapters to the quantized model...
Unsloth: Making `model.base_model.model.model.language_model` require gradients
 LoRA adapters added successfully!
 LoRA Configuration:
    Rank (r): 16
    Alpha: 16
    Dropout: 0 (optimized)
    Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
    Memory optimization: unsloth gradient checkpointing

📋 Model Statistics:
trainable params: 22,904,832 || all params: 5,462,343,104 || trainable%: 0.4193

 Model is now ready for training with LoRA adapters!
 Proceed to run the training setup block
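
The trainable-parameter line above can be sanity-checked with a little arithmetic. A LoRA adapter on a linear layer adds two low-rank matrices, so it contributes `rank * (d_in + d_out)` parameters. The sketch below uses hypothetical layer dimensions (2048 x 2048) purely for illustration, since the real per-layer shapes are not printed in this notebook; only the two totals are taken from the output above.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_percent(trainable, total):
    """Share of trainable parameters, in percent."""
    return 100.0 * trainable / total


# Hypothetical 2048 x 2048 projection with the rank used in this notebook (r = 16).
print(lora_param_count(2048, 2048, 16))

# Totals copied from the model statistics printed above.
print(round(trainable_percent(22_904_832, 5_462_343_104), 4))
```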


In [None]:
# =============================================================================
# TRAINING DATA SET SIZE REDUCTION
# =============================================================================

import math

print(" Optimizing training on high quality data...")
print(f" Current dataset: {len(dataset)} samples")

# Strategy 1: Length-based subset selection (length is used here as a simple proxy for quality)
def select_high_quality_subset(dataset, target_size=10000):
    """Drop very short and very long samples, then take a strided subset of at most target_size."""
    print(f" Selecting {target_size} high-quality samples...")

    # Calculate text lengths for filtering
    text_lengths = [len(item["text"]) for item in dataset]

    # Filter out very short (< 100 chars) and very long (> 4000 chars) samples
    min_length, max_length = 100, 4000

    filtered_indices = []
    for i, length in enumerate(text_lengths):
        if min_length <= length <= max_length:
            filtered_indices.append(i)

    print(f"    Filtered by length: {len(filtered_indices)} samples remain")

    # Take every `step`-th sample from the filtered set, then truncate to target_size
    if len(filtered_indices) > target_size:
        step = len(filtered_indices) // target_size
        selected_indices = filtered_indices[::step][:target_size]
    else:
        selected_indices = filtered_indices

    return dataset.select(selected_indices)

# Strategy 2: Increase efficiency with larger batch size
def optimize_batch_settings():
    """Return the fixed batch settings chosen for this run on an A100 40GB."""
    # These values are hard-coded rather than computed from available memory
    per_device_batch_size = 2  # Increase from 1
    gradient_accumulation_steps = 8  # Increase from 4
    effective_batch_size = per_device_batch_size * gradient_accumulation_steps

    print(f" Optimized batch settings:")
    print(f"    Per device batch size: {per_device_batch_size}")
    print(f"    Gradient accumulation: {gradient_accumulation_steps}")
    print(f"    Effective batch size: {effective_batch_size}")

    return per_device_batch_size, gradient_accumulation_steps

# Apply Strategy 1: Smart data selection
print("\n Data Selection")
optimized_dataset = select_high_quality_subset(dataset, target_size=10000)

# Apply Strategy 2: Optimize batch settings
print("\n Optimize Batch Settings")
per_device_batch_size, gradient_accumulation_steps = optimize_batch_settings()

# Rough training time estimate, assuming ~1.5 seconds per optimizer step (an assumption, not a measurement).
# estimated_time_hours is computed for reference only; it is not printed or used below.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
steps_per_epoch = len(optimized_dataset) // effective_batch_size
estimated_time_hours = (steps_per_epoch * 1.5) / 3600


# Update the global dataset variable
dataset = optimized_dataset

print(f"\n Training optimized!")
print(f" Ready to train on {len(dataset)} high-quality Swahili samples")


 Optimizing training on high quality data...
 Current dataset: 67017 samples

 Data Selection
 Selecting 10000 high-quality samples...
    Filtered by length: 66735 samples remain

 Optimize Batch Settings
 Optimized batch settings:
    Per device batch size: 2
    Gradient accumulation: 8
    Effective batch size: 16

 Training optimized!
 Ready to train on 10000 high-quality Swahili samples
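
One detail of the stride-then-truncate selection is worth seeing in numbers: with 66,735 filtered samples and a target of 10,000, the stride is 6, so the first 10,000 strided picks never reach the last part of the filtered list. The sketch below compares that behavior with an evenly spaced alternative. Both helper names are hypothetical, they operate on positions in the filtered list rather than on the real dataset, and the alternative is a suggestion rather than what this notebook ran.

```python
def strided_then_truncated(n_items, target):
    """Mirror the notebook's approach: fixed stride, then keep the first `target` picks."""
    step = n_items // target
    return list(range(0, n_items, step))[:target]


def evenly_spaced(n_items, target):
    """Alternative: spread `target` picks across the whole range with integer arithmetic."""
    return [i * n_items // target for i in range(target)]


strided = strided_then_truncated(66_735, 10_000)
spaced = evenly_spaced(66_735, 10_000)

# Compare how far into the filtered list each selection reaches.
print(len(strided), strided[-1])
print(len(spaced), spaced[-1])
```

If the dataset is ordered in any meaningful way (by source or topic, for example), the evenly spaced variant samples from the tail as well.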


In [None]:
# =============================================================================
# TRAINING SETUP
# =============================================================================

from trl import SFTTrainer, SFTConfig
from unsloth.chat_templates import train_on_responses_only
import wandb

print(" Setting up training for Swahili Gemma-3...")
print(f" Optimized dataset: {len(dataset)} high-quality Swahili samples")


# Initialize W&B with updated config
print("\n Initializing Weights & Biases...")
try:
    wandb.init(
        project="swahili-gemma3-finetune",
        name="gemma-3n-E2b-swahili-text-lora-finetune",
        config={
            "model": "gemma-3n-E2B-it",
            "dataset_size": len(dataset),
            "optimization": "fast-training",
            "language": "swahili",
            "task": "text-instruction-following",
        },
        reinit=True
    )
    print(" W&B initialized successfully!")
    report_to = "wandb"
except Exception:  # Avoid a bare except so KeyboardInterrupt still propagates
    print(" W&B not available, proceeding without logging")
    report_to = "none"

# Create optimized SFT Trainer
print("\n Training started...")
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None,
    args = SFTConfig(
        # Dataset configuration
        dataset_text_field = "text",

        # Batch size: 2 sequences per device x 8 accumulation steps = effective batch of 16
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,

        # Training configuration
        num_train_epochs = 1,                   # One full epoch over the selected subset

        # Learning rate and schedule
        learning_rate = 3e-5,
        warmup_steps = 50,                      # Warm up before the cosine decay begins
        weight_decay = 0.01,
        optim = "adamw_8bit",
        lr_scheduler_type = "cosine",

        # Logging and checkpointing
        logging_steps = 5,                      # Log the training loss every 5 steps
        save_steps = 200,                       # Save a checkpoint every 200 steps
        report_to = report_to,

        # Output configuration
        output_dir = "./gemma-3n-E2b-swahili-text-lora",
        run_name = "gemma-3n-E2b-swahili-text-lora",

        # Performance optimizations
        dataloader_pin_memory = True,           # Faster data loading
        dataloader_num_workers = 2,             # Parallel data loading

        # Reproducibility
        seed = 3407,
    ),
)

print("✅ SFT Trainer created successfully!")

# Apply train_on_responses_only
print("\n Applying train_on_responses_only...")
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<start_of_turn>user\n",
    response_part = "<start_of_turn>model\n",
)

print("✅ Response-only training configured!")




 Setting up training for Swahili Gemma-3...
 Optimized dataset: 10000 high-quality Swahili samples

 Initializing Weights & Biases...


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33malfaxadeyembe[0m ([33malfaxad[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


 W&B initialized successfully!

 Training started...


Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/10000 [00:00<?, ? examples/s]

✅ SFT Trainer created successfully!

 Applying train_on_responses_only...


Map (num_proc=12):   0%|          | 0/10000 [00:00<?, ? examples/s]

✅ Response-only training configured!
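
Conceptually, response-only training means the loss is computed only on the model's turns: every other position gets the label `-100`, which PyTorch's cross-entropy loss ignores. The sketch below illustrates that masking on plain lists of token ids. It is not Unsloth's implementation; the function name `mask_non_response` and the toy token ids are assumptions for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_non_response(token_ids, instruction_ids, response_ids):
    """Keep labels only for tokens that follow a response marker (illustrative only)."""
    labels = [IGNORE_INDEX] * len(token_ids)
    in_response = False
    i = 0
    while i < len(token_ids):
        if token_ids[i:i + len(response_ids)] == response_ids:
            # Response marker found: start keeping labels after it.
            in_response = True
            i += len(response_ids)
            continue
        if token_ids[i:i + len(instruction_ids)] == instruction_ids:
            # Instruction marker found: stop keeping labels.
            in_response = False
            i += len(instruction_ids)
            continue
        if in_response:
            labels[i] = token_ids[i]
        i += 1
    return labels


# Toy ids: [10, 11] stands in for the user marker, [12, 13] for the model marker.
tokens = [1, 10, 11, 20, 21, 2, 12, 13, 30, 31, 3]
print(mask_non_response(tokens, instruction_ids=[10, 11], response_ids=[12, 13]))
```

Only the three tokens after the model marker keep their labels here; the prompt and both markers are masked.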


In [None]:
# =============================================================================
# START TRAINING - SWAHILI GEMMA-3 TEXT FINE-TUNING
# =============================================================================

print("Starting Swahili Gemma-3 text fine-tuning...")
print(" Training gemma-3n-E2b-swahili-text-lora")
print("="*70)

# Start the training process
try:
    trainer_stats = trainer.train()

    print("\n Training completed successfully!")
    print("="*70)

    # Display training statistics
    print("Training Statistics:")
    if trainer_stats:
        print(f"   Training completed with {trainer_stats.global_step} steps")
        if hasattr(trainer_stats, 'training_loss'):
            # Note: training_loss is the loss averaged over the whole run,
            # not the loss of the last step.
            print(f"    Final training loss: {trainer_stats.training_loss:.4f}")

except Exception as e:
    print(f" Training error: {e}")
    print("Check your setup and try again.")


Starting Swahili Gemma-3 text fine-tuning...
 Training gemma-3n-E2b-swahili-text-lora


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10,000 | Num Epochs = 1 | Total steps = 625
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 22,904,832 of 5,462,343,104 (0.42% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
5,6.5732
10,3.3648
15,3.0755
20,2.9144
25,2.8881
30,2.8198
35,2.7513
40,2.669
45,2.7349
50,2.5658



 Training completed successfully!
Training Statistics:
   Training completed with 625 steps
    Final training loss: 1.8536


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

6515.9335 seconds used for training.
108.6 minutes used for training.
Peak reserved memory = 9.746 GB.
Peak reserved memory for training = 1.437 GB.
Peak reserved memory % of max memory = 24.638 %.
Peak reserved memory for training % of max memory = 3.633 %.
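
The runtime above also gives the measured throughput, which is useful to compare against the rough 1.5 seconds per step assumed in the dataset-reduction cell. The sketch below is plain arithmetic on numbers copied from the outputs in this notebook (runtime, total steps, subset size); the function name `throughput` is hypothetical.

```python
def throughput(train_runtime_s, total_steps, num_examples):
    """Return (seconds per optimizer step, examples processed per second)."""
    return train_runtime_s / total_steps, num_examples / train_runtime_s


seconds_per_step, examples_per_second = throughput(6515.9335, 625, 10_000)
print(round(seconds_per_step, 2), round(examples_per_second, 2))
```

On this run each optimizer step took roughly ten seconds, so time estimates for larger subsets are better based on the measured figure than on the earlier assumption.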


<a name="Inference"></a>
### Inference
Let's run the model via Unsloth native inference! According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`
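
For intuition about what `temperature`, `top_k`, and `top_p` do, here is a minimal pure-Python sketch of the filtering applied to one step's logits before a token is sampled. It is a conceptual illustration under simplifying assumptions, not the `transformers` implementation; the function name `filter_top_k_top_p` and the toy logits are made up for the example.

```python
import math


def filter_top_k_top_p(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    top_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose renormalized cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / top_mass
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a distribution to sample from.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


candidates = filter_top_k_top_p([2.0, 1.0, 0.0, -1.0], top_k=3, top_p=0.9)
print(candidates)
```

With these toy logits only the two most likely tokens remain, which shows how a lower `top_p` trims the unlikely tail while a higher `temperature` would flatten the distribution before filtering.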

In [None]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)
messages = [{
    "role": "user",
    "content": [{
        "type" : "text",
        "text" : "Endeleza huu mlolongo: 1, 1, 2, 3, 5, 8,",
    }]
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
    tokenize = True,
    return_dict = True,
).to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens = 64, # Increase for longer outputs!
    # Recommended Gemma-3 settings!
    temperature = 1.0, top_p = 0.95, top_k = 64,
)
tokenizer.batch_decode(outputs)

['<bos><start_of_turn>user\nEndeleza huu mlolongo: 1, 1, 2, 3, 5, 8,<end_of_turn>\n<start_of_turn>model\n13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6']

You can also use a `TextStreamer` to stream the output token by token instead of waiting for the full generation to finish!

In [None]:
messages = [{
    "role": "user",
    "content": [{"type" : "text", "text" : "Kwanini anga lina rangi ya bluu?",}]
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
    tokenize = True,
    return_dict = True,
).to("cuda")

from transformers import TextStreamer
_ = model.generate(
    **inputs,
    max_new_tokens = 2048, # Increase for longer outputs!
    # Recommended Gemma-3 settings!
    temperature = 1.0, top_p = 0.95, top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

Anga lina rangi ya bluu kutokana na uchambuzi wa mwanga wa jua na vumbi na kaboni vii katika anga. Vumbi na kaboni vii vina uwezo wa kusambaza mwanga wa jua katika mwelekeo mbalimbali, na mwanga wa bluu unasambazwa zaidi kuliko mwanga wa rangi nyingine. Hivyo, mwanga wa bluu unapatikana zaidi katika anga, na kwa hiyo anga inaonekana ya bluu.<end_of_turn>


## Save The LoRA Adapters, Save The Model In Different Formats And Push It To HuggingFace

In [None]:
model.save_pretrained("gemma-3n-swahili-2b")  # Local saving
tokenizer.save_pretrained("gemma-3n-swahili-2b")
# model.push_to_hub("HF_ACCOUNT/gemma-3", token = "...") # Online saving
# tokenizer.push_to_hub("HF_ACCOUNT/gemma-3", token = "...") # Online saving

['gemma-3n-swahili-2b/processor_config.json']

Now if you want to load the LoRA adapters we just saved for inference, change `if False` to `if True`:

In [None]:
if False:
    from unsloth import FastModel
    model, tokenizer = FastModel.from_pretrained(
        model_name = "gemma-3n-swahili-2b", # Directory where the LoRA adapters were saved above
        max_seq_length = 2048,
        load_in_4bit = True,
    )

messages = [{
    "role": "user",
    "content": [{"type" : "text", "text" : "Gemma-3N ni nani?",}]
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
    tokenize = True,
    return_dict = True,
).to("cuda")

from transformers import TextStreamer
_ = model.generate(
    **inputs,
    max_new_tokens = 128, # Increase for longer outputs!
    # Recommended Gemma-3 settings!
    temperature = 1.0, top_p = 0.95, top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

Gemma-3N ni mfumo wa AI wa kwanza wa kufunguliwa kwa ujumla wa Google, unaweza kufanya kazi nyingi kama ChatGPT na Gemini. Gemma-3N ni mfumo wa AI wa kifurushi cha kufunguliwa kwa ujumla, ambayo inamaanisha kuwa inaweza kutumika na wanyamapawi na watafiti kwa ajili ya kazi zao bila mahitaji ya idhoni za Google. Gemma-3N inapatikana kwa ajili ya Windows, macOS, Linux, Android, na iOS.<end_of_turn>


### Saving to float16 for VLLM

We also support saving to `float16` directly for deployment! The cell below saves the merged model in the folder `gemma-3n-swahili-e2b-text-finetuned`. Change `if True` to `if False` to skip it.
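
Conceptually, a merged save folds each LoRA update into its base weight matrix, so the result no longer needs the adapter at inference time. The sketch below shows the standard LoRA merge formula, `W + (alpha / r) * B @ A`, on tiny hand-written matrices. It is only an illustration of the formula: the helper names and values are made up, and the real export also has to handle details such as quantized base weights and sharded files, which are not shown here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy example with rank 1: weight is 2 x 2, lora_a is 1 x 2, lora_b is 2 x 1.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[3.0, 4.0]]
lora_b = [[1.0], [2.0]]
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

With the settings used in this notebook (`r = 16`, `lora_alpha = 16`, `use_rslora = False`) the scale factor `alpha / r` is 1.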

In [None]:
if True: # Set to False to skip saving the merged model
    model.save_pretrained_merged("gemma-3n-swahili-e2b-text-finetuned", tokenizer)

Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00003.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Downloading safetensors index for unsloth/gemma-3n-e2b-it...


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Unsloth: Merging weights into 16bit:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/3.08G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit:  33%|███▎      | 1/3 [00:18<00:37, 18.92s/it]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit:  67%|██████▋   | 2/3 [01:00<00:32, 32.45s/it]

model-00003-of-00003.safetensors:   0%|          | 0.00/2.82G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit: 100%|██████████| 3/3 [01:30<00:00, 30.29s/it]


In [None]:
if True: # Set to False to skip the upload
    model.push_to_hub_merged(
        "Alfaxad/gemma-3n-swahili-e2b-text-finetuned", tokenizer,
        token="hf_"  # Placeholder: replace with your own Hugging Face token
        )

  0%|          | 0/2 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/33.4M [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.70M [00:00<?, ?B/s]

Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Cache check failed: model-00001-of-00003.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Downloading safetensors index for unsloth/gemma-3n-e2b-it...


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Unsloth: Merging weights into 16bit:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/3.08G [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/3.08G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit:  33%|███▎      | 1/3 [00:54<01:48, 54.13s/it]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit:  67%|██████▋   | 2/3 [02:26<01:16, 76.83s/it]

model-00003-of-00003.safetensors:   0%|          | 0.00/2.82G [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/2.82G [00:00<?, ?B/s]

Unsloth: Merging weights into 16bit: 100%|██████████| 3/3 [03:17<00:00, 65.79s/it]


In [None]:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
    tokenizer,
    chat_template = "gemma-3",
)
messages = [{
    "role": "user",
    "content": [{
        "type" : "text",
        "text" : "endeleza huu mlolongo: 1, 1, 2, 3, 5, 8,",
    }]
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
    tokenize = True,
    return_dict = True,
).to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens = 64, # Increase for longer outputs!
    # Recommended Gemma-3 settings!
    temperature = 1.0, top_p = 0.95, top_k = 64,
)
tokenizer.batch_decode(outputs)

In [None]:
messages = [{
    "role": "user",
    "content": [{"type" : "text", "text" : "Kwanini Anga lina rangi ya bluu?",}]
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
    tokenize = True,
    return_dict = True,
).to("cuda")

from transformers import TextStreamer
_ = model.generate(
    **inputs,
    max_new_tokens = 64, # Increase for longer outputs!
    # Recommended Gemma-3 settings!
    temperature = 1.0, top_p = 0.95, top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

[NOTE] This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("gemma-3n-Swahili")  # Local saving
tokenizer.save_pretrained("gemma-3n-Swahili")
# model.push_to_hub("HF_ACCOUNT/gemma-3", token = "...") # Online saving
# tokenizer.push_to_hub("HF_ACCOUNT/gemma-3", token = "...") # Online saving

Now if you want to load the LoRA adapters we just saved for inference, change `if False` to `if True`:

In [None]:
if False:
    from unsloth import FastModel
    model, tokenizer = FastModel.from_pretrained(
        model_name = "gemma-3n-Swahili", # Directory where the LoRA adapters were saved above
        max_seq_length = 2048,
        load_in_4bit = True,
    )

messages = [{
    "role": "user",
    "content": [{"type" : "text", "text" : "What is Gemma-3N?",}]
}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
    tokenize = True,
    return_dict = True,
).to("cuda")

from transformers import TextStreamer
_ = model.generate(
    **inputs,
    max_new_tokens = 128, # Increase for longer outputs!
    # Recommended Gemma-3 settings!
    temperature = 1.0, top_p = 0.95, top_k = 64,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

### Saving to float16 for VLLM

We also support saving to `float16` directly for deployment! The cell below saves the merged model in the folder `gemma-3n-Swahili-4B`. Change `if False` to `if True` to let it run!


In [None]:
if False: # Change to True to save finetune!
    model.save_pretrained_merged("gemma-3n-Swahili-4B", tokenizer)

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later!

In [None]:
if False: # Change to True to save to GGUF
    model.save_pretrained_gguf(
        "gemma-3n-Swahili-4B",
        quantization_type = "Q8_0", # For now only Q8_0, BF16, F16 supported
    )