# CodSoft Task 3 – Image Caption Generator 🧠📸

🔹 Internship Task: Build an AI that generates captions for uploaded images.  
🔹 Tech Used: ViT-GPT2 pretrained transformer model from Hugging Face.  
🔹 Tools: Python, Google Colab, Gradio.

This notebook demonstrates how computer vision and NLP are combined to generate image captions. A Vision Transformer (ViT) encoder extracts features from the image, and a GPT2 decoder turns those features into a human-like text caption.
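
To make the encoder half more concrete, here is a minimal sketch that counts how many tokens a ViT-style encoder would pass to the decoder for one image. The 224×224 input size and 16×16 patch size are assumptions taken from the common ViT-Base configuration, not values stated in this notebook, and `vit_sequence_length` is a hypothetical helper rather than a library function.

```python
def vit_sequence_length(image_size: int = 224,
                        patch_size: int = 16,
                        add_cls_token: bool = True) -> int:
    """Return the number of tokens a ViT-style encoder produces for a square image.

    Assumed defaults: 224x224 input, 16x16 patches, one extra [CLS] token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + (1 if add_cls_token else 0)


# 14 patches per side -> 196 patches, plus the [CLS] token
print(vit_sequence_length())  # 197
```

Under these assumptions the decoder attends over 197 feature vectors per image instead of raw pixels, which is what lets a text model such as GPT2 condition on the picture.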


In [1]:
!pip install transformers gradio torch torchvision --quiet

from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from PIL import Image
import torch
import gradio as gr

# Load model components
brain_model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
vision_eyes = ViTImageProcessor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
caption_mouth = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Setup device
energy_source = torch.device("cuda" if torch.cuda.is_available() else "cpu")
brain_model.to(energy_source)

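# Caption generation function
def caption_creator(photo):
    # Gradio passes None when the user submits without uploading an image
    if photo is None:
        return "Please upload an image first."

    try:
        # The image processor expects 3-channel RGB input
        if photo.mode != "RGB":
            photo = photo.convert("RGB")

        # Preprocess the image and move the tensor to the model's device
        pixels_ready = vision_eyes(images=photo, return_tensors="pt").pixel_values.to(energy_source)
        # Generate caption token IDs, capped at 20 tokens
        brain_output = brain_model.generate(pixels_ready, max_length=20)
        # Convert token IDs back to text, dropping special tokens
        final_words = caption_mouth.decode(brain_output[0], skip_special_tokens=True)
        return final_words.strip()
    except Exception as oops:
        return f"Oops! Something went wrong: {oops}"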

# Gradio interface
gr.Interface(
    fn=caption_creator,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="🖼️ Image Caption Wizard",
    description="Upload an image and get a caption! Powered by ViT + GPT2 transformer model.",
    allow_flagging="never"
).launch()


[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m4.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m38.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m73.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m43.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m4.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m11.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m127.9/127.9 MB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.





It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://9de6c4689088c77560.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
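
The `max_length=20` argument passed to `generate` in the caption cell caps how long a caption can grow. As a rough illustration of what such a cap does, the sketch below runs a greedy decoding loop over a toy scoring function. Everything here is hypothetical (`greedy_decode`, `toy_scores`, the `<bos>`/`<eos>` markers and the four-word vocabulary), and greedy selection is an assumption: the real model may use a different decoding strategy depending on its generation settings.

```python
from typing import Callable, Dict, List

BOS, EOS = "<bos>", "<eos>"  # hypothetical start/end markers


def greedy_decode(next_token_scores: Callable[[List[str]], Dict[str, float]],
                  max_length: int = 20) -> List[str]:
    """Pick the highest-scoring token at each step until EOS or the length cap.

    In this sketch the cap counts the start token as well.
    """
    tokens = [BOS]
    while len(tokens) < max_length:
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        tokens.append(best)
        if best == EOS:
            break
    # Drop the markers, similar in spirit to skip_special_tokens=True
    return [t for t in tokens if t not in (BOS, EOS)]


TOY_CAPTION = ["a", "dog", "on", "grass"]


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Toy stand-in for a decoder: always prefers the next word of TOY_CAPTION."""
    position = len(prefix) - 1  # words generated so far (prefix starts with BOS)
    target = TOY_CAPTION[position] if position < len(TOY_CAPTION) else EOS
    return {tok: (1.0 if tok == target else 0.0) for tok in TOY_CAPTION + [EOS]}


print(" ".join(greedy_decode(toy_scores)))                # a dog on grass
print(" ".join(greedy_decode(toy_scores, max_length=3)))  # a dog
```

With a small cap the caption is cut off before the end marker is reached, which is why very short `max_length` values can produce truncated captions.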


