# Gemma 3n Kaggle Test Notebook

This notebook runs on Kaggle and tests the Gemma 3n multimodal model on an image description task: the model is asked to describe a photo and to look for an "SOS" pattern in it.

In [None]:
import torch
print(f"🧠 Torch CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f'🚀 Using GPU: {torch.cuda.get_device_name(torch.cuda.current_device())}')

🧠 Torch CUDA Available: True
🚀 Using GPU: Tesla P100-PCIE-16GB
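
The Tesla P100 reported above is a Pascal-generation card with compute capability 6.0, while native bfloat16 hardware support arrived with compute capability 8.0 (Ampere). Since the model is later loaded with `torch.bfloat16`, a quick diagnostic can be useful. The sketch below is only an illustration: `has_native_bf16` is a hypothetical helper name, and the capability threshold is the only rule it encodes. It says nothing about whether bfloat16 tensors still run through slower fallback paths on older cards.

```python
def has_native_bf16(capability):
    """Return True when a (major, minor) compute capability is 8.0 or newer.

    Hypothetical helper: it only compares against the Ampere threshold.
    """
    major, _minor = capability
    return major >= 8


# torch is optional here so the helper can be tried outside the notebook.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability()
    print(f"Compute capability: {cap[0]}.{cap[1]}")
    print(f"Native bfloat16 hardware support: {has_native_bf16(cap)}")
```

On a P100 the helper returns `False`, which is worth keeping in mind when comparing inference speed against newer GPUs.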


In [None]:
# 🧠 KAGGLE Notebook Cell — installs dependencies
!pip install timm --upgrade
!pip install accelerate
!pip install git+https://github.com/huggingface/transformers.git
!pip install kagglehub


Collecting timm
  Downloading timm-1.0.16-py3-none-any.whl.metadata (57 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m57.6/57.6 kB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch->timm)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch->timm)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch->timm)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch->timm)
  Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch->timm)
  Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12

In [None]:
# Common imports for file handling, arrays, tables, image loading, and plotting
import os
import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt

## ✅ Gemma 3n Image Captioning Test (Patched for P100 GPU)
This notebook uses the `google/gemma-3n-E4B-it` multimodal model in its Hugging Face Transformers format, downloaded here through `kagglehub`. Make sure your Kaggle runtime is set to **P100 GPU** (or T4 ×2) and that `transformers>=4.53.0` is installed.
**The model takes an image plus a text prompt and returns a text caption.**
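
The `transformers>=4.53.0` requirement can be checked before loading the model. The following is a minimal sketch using only the standard library. The helper names `release_tuple` and `meets_minimum` are made up for illustration, and the parsing deliberately ignores pre-release and development suffixes (an install from the Git main branch typically reports a version such as `4.54.0.dev0`). For strict comparisons, `packaging.version.Version` is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Parse leading numeric components, e.g. '4.54.0.dev0' -> (4, 54, 0)."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.53' compares equal to '4.53.0'
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed release is at least the minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = version("transformers")
    ok = meets_minimum(installed, "4.53.0")
    print(f"transformers {installed}; meets >=4.53.0: {ok}")
except PackageNotFoundError:
    print("transformers is not installed in this environment")
```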

In [None]:
import kagglehub

GEMMA_PATH = kagglehub.model_download("google/gemma-3n/transformers/gemma-3n-e4b-it")

In [None]:
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient

# Read the Hugging Face token stored as a Kaggle secret named "HF_TOKEN"
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("HF_TOKEN")

# Authenticate with Hugging Face
login(hf_token)
print("✅ Hugging Face login successful.")

✅ Hugging Face login successful.


In [None]:
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
from PIL import Image
import torch


# Load model and processor
processor = AutoProcessor.from_pretrained(GEMMA_PATH)
model = Gemma3nForConditionalGeneration.from_pretrained(
    GEMMA_PATH,
    device_map="auto",
    torch_dtype=torch.bfloat16
).eval()


print(f'📦 Model loaded on device: {model.device}')


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

📦 Model loaded on device: cuda:0


In [None]:
# Load test image
img_path = "/kaggle/input/sosimagescnntest/sand_sos.png"
img = Image.open(img_path).convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "Describe this image in detail and look for pattern that matches SOS word."}
        ]
    }
]

# Prepare input
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

input_len = inputs["input_ids"].shape[-1]

# Generate
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    output = output[0][input_len:]

text = processor.decode(output, skip_special_tokens=True)
print("🖼️ Caption:", text)

The following generation flags are not valid and may be ignored: ['top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


🖼️ Caption: ## Detailed Description of the Image:

The image shows the numbers "2020" deeply imprinted in light brown sand. The numbers are arranged vertically, one below the other, and are clearly formed by someone tracing the digits with their finger or a similar tool. 

The sand is fine-grained and appears slightly damp, allowing the impressions to be well-defined. The texture of the sand is visible, with subtle variations in tone and small grains scattered across the surface. 

The lighting in the image seems to be coming from above, casting soft shadows within the impressions of the numbers, which adds depth and dimension to the
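
The prompt asked the model to look for an SOS pattern, yet the recorded caption reads the marks as "2020" and stops mid-sentence, which is consistent with the `max_new_tokens=128` limit. When running this test over several images, a simple automatic check on the caption can flag such misses. The sketch below is one possible approach: `mentions_sos` is a hypothetical helper, and a whole-word regular expression match is an assumption about what counts as a detection (it would not catch spellings such as "S.O.S.").

```python
import re


def mentions_sos(caption):
    """Return True if the caption contains 'SOS' as a standalone word."""
    return re.search(r"\bSOS\b", caption, flags=re.IGNORECASE) is not None


# First sentence of the caption recorded above.
caption = 'The image shows the numbers "2020" deeply imprinted in light brown sand.'
print("SOS mentioned:", mentions_sos(caption))  # prints: SOS mentioned: False
```

A `False` result here matches what the output shows: the model described the sand and the lighting but never reported the word SOS.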
