# 🧠 Image Captioning with Google PaliGemma
This notebook demonstrates how to use the `google/paligemma-3b-pt-224` model from Hugging Face for image captioning using PyTorch.
We'll walk through loading the model, preprocessing an image, generating a description, and displaying the result.

In [None]:
# Install required libraries if not already installed
!pip install transformers pillow torch huggingface_hub

## 🔐 Authenticate with Hugging Face

In [None]:
from huggingface_hub import login

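# Replace with your actual token. The model is gated, so the account behind
# the token must have accepted its license on the model page first.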
HF_TOKEN = "hf_your_token_here"
login(token=HF_TOKEN)
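
Hard-coding a token in a notebook makes it easy to leak when the file is shared. As a minimal sketch, the token could be read from an environment variable instead. The helper name `resolve_hf_token` and the choice of the `HF_TOKEN` variable name are assumptions made for illustration and are not part of the original notebook.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN"):
    """Return the token stored in `env_var`, or None if it is unset or blank.

    Hypothetical helper: the name and the default variable are illustrative.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


token_from_env = resolve_hf_token()
if token_from_env is None:
    print("No token found in the environment; set HF_TOKEN before running the login cell.")
```

If a token is found, it can be passed to `login(token=token_from_env)` in place of the literal string.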

## 📦 Load the Model and Processor

In [None]:
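from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
import torch

model_id = "google/paligemma-3b-pt-224"
processor = AutoProcessor.from_pretrained(model_id, token=HF_TOKEN)
# Load the PaliGemma model class directly (requires a transformers release with PaliGemma support)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    token=HF_TOKEN,
    # Half precision on GPU to save memory; full precision on CPU
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)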

## 🖥️ Move Model to Device

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

## 🖼️ Load and Preprocess Image

In [None]:
from PIL import Image

# Replace with your own image path
image_path = "/content/road_cracks.jpeg"
image = Image.open(image_path).convert("RGB")

## 💬 Prepare Prompt and Process Input

In [None]:
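# Note: pretrained ("pt") checkpoints are usually prompted with short task
# prefixes such as "caption en"; a free-form prompt may give weaker captions.
prompt = "Describe the image."
# Move inputs to the model's device and cast floating-point tensors (pixel values) to its dtype
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype=model.dtype)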
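
PaliGemma's pretrained checkpoints are generally described as responding to short task prefixes rather than free-form instructions. The sketch below composes two such prefixes. The helper `build_task_prompt` is hypothetical, and the prefix formats (`caption <lang>`, `answer <lang> <question>`) are assumptions based on the commonly documented prompt syntax, so check the model card before relying on them.

```python
def build_task_prompt(task: str, lang: str = "en", argument: str = "") -> str:
    """Compose a task-prefix prompt (hypothetical helper, assumed prefix formats)."""
    task = task.strip().lower()
    if task == "caption":
        # Assumed format: "caption <language code>"
        return f"caption {lang}"
    if task == "answer":
        # Assumed format: "answer <language code> <question>"
        if not argument:
            raise ValueError("The 'answer' task needs a question in `argument`.")
        return f"answer {lang} {argument}"
    raise ValueError(f"Unsupported task: {task!r}")


caption_prompt = build_task_prompt("caption")
```

To try it, assign `caption_prompt` to `prompt` before calling the processor and compare the output with the free-form prompt.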

## 🔮 Generate Caption

In [None]:
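generated_ids = model.generate(**inputs, max_new_tokens=50)
# generate() returns the prompt tokens followed by the new tokens; decode only the new ones
input_len = inputs["input_ids"].shape[-1]
generated_text = processor.batch_decode(generated_ids[:, input_len:], skip_special_tokens=True)[0]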

## 🖨️ Display Caption

In [None]:
print("🖼️ Image Caption:")
print(generated_text)