# Give Meaningful Names to Your Photos with AI

## 📜 Table of Contents
1. [Introduction](#introduction)
2. [Learning Objectives](#learning-objectives)
3. [Setting Up the Environment](#setting-up-the-environment)
4. [Generating Image Captions with the BLIP Model](#generating-image-captions-with-the-blip-model)
5. [Building an Image Captioning App with Gradio](#building-an-image-captioning-app-with-gradio)
6. [Automating Image Captioning from a URL](#automating-image-captioning-from-a-url)
7. [Conclusion & Next Steps](#conclusion-next-steps)

## 📖 Introduction
Images contain rich, untapped information, but search engines and data systems often overlook them because they cannot read pixel content directly. AI image captioning helps by:

- **Improving accessibility** (for visually impaired individuals).
- **Enhancing SEO** (search engines can better index images).
- **Supporting content discovery** (helps in cataloging large datasets).
- **Boosting security** (automated monitoring with descriptions).
- **Aiding education & research** (AI-assisted content analysis).
- **Providing multilingual support** (global reach).
- **Saving time** (faster than manual captioning).

## 🎯 Learning Objectives
By the end of this project, you will be able to:

- ✅ Implement an image captioning tool using the **BLIP model** from Hugging Face.
- ✅ Use **Gradio** to build a user-friendly image captioning interface.
- ✅ Adapt AI-based captioning for **real-world business applications**.

## 🔧 Setting Up the Environment
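Before building the application, create a virtual environment and install pinned versions of the required libraries.

In [None]:
!pip install virtualenv
!virtualenv my_env
# Note: each `!` command runs in its own subshell, so this activation does not
# persist to later cells. Run these commands in a terminal if the environment
# needs to stay active.
!source my_env/bin/activate

In [2]:
# Install the required libraries at pinned versions
!pip install langchain==0.1.11 gradio==4.44.0 transformers==4.38.2 bs4==0.0.2 requests==2.31.0 torch==2.2.1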

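Because the install command pins exact versions, it can help to confirm what actually ended up in the running interpreter. The following is a minimal sketch using only the standard library; the `PINNED` dictionary and the `check_versions` helper are hypothetical names introduced here, not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command in this section
PINNED = {
    "langchain": "0.1.11",
    "gradio": "4.44.0",
    "transformers": "4.38.2",
    "bs4": "0.0.2",
    "requests": "2.31.0",
    "torch": "2.2.1",
}


def check_versions(pinned):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a local build suffix such as "+cu121" before comparing
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the expected version.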

## 🎨 Generating Image Captions with the BLIP Model

In [1]:
import requests
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# Load the BLIP processor and model
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")


In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [8]:
# Load an image from Google Drive
img_path = "/content/drive/MyDrive/tulip-symphony.jpeg"
image = Image.open(img_path).convert('RGB')
# Conditional captioning: the text acts as a prompt that the model continues
inputs = processor(images=image, text="the image of", return_tensors="pt")
# Generate token IDs, then decode them into a caption string
outputs = model.generate(**inputs, max_length=50)
caption = processor.decode(outputs[0], skip_special_tokens=True)
print("Generated Caption:", caption)

Generated Caption: the image of a field of tuliplips
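To turn a caption like the one above into a meaningful file name, the text can be reduced to a short, filesystem-friendly slug. This is a minimal sketch under stated assumptions: `caption_to_filename` is a hypothetical helper, and the word limit and the `untitled` fallback are arbitrary choices rather than anything prescribed by BLIP.

```python
import re


def caption_to_filename(caption, ext=".jpeg", prompt="the image of", max_words=8):
    """Build a lowercase, hyphen-separated file name from a caption.

    The prompt prefix used for conditional captioning is removed first,
    so it does not end up in every file name.
    """
    text = caption.strip().lower()
    if prompt and text.startswith(prompt.lower()):
        text = text[len(prompt):]
    # Keep only runs of ASCII letters and digits; everything else separates words
    words = re.findall(r"[a-z0-9]+", text)
    slug = "-".join(words[:max_words]) or "untitled"
    return slug + ext


print(caption_to_filename("the image of a field of tuliplips"))
# prints: a-field-of-tuliplips.jpeg
```

The result could then be passed to `os.rename` or `pathlib.Path.rename` to rename the original photo.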


## 🚀 Building an Image Captioning App with Gradio

In [16]:
import gradio as gr

def greet(name):
    return f"Hello {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0", server_port=7860)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()


--------


Running on public URL: https://e4c73fac54489f03b2.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [17]:
import gradio as gr
import numpy as np
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# Load the BLIP processor and model once, outside the request handler
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(input_image: np.ndarray):
    # Gradio passes the uploaded image as a NumPy array; convert it to an RGB PIL image
    raw_image = Image.fromarray(input_image).convert('RGB')
    # Unconditional captioning: no text prompt is supplied
    inputs = processor(images=raw_image, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=50)
    caption = processor.decode(outputs[0], skip_special_tokens=True)
    return caption

# Image input, text output
iface = gr.Interface(fn=caption_image, inputs=gr.Image(), outputs="text", title="Image Captioning AI")
iface.launch()



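The `caption_image` function assumes it always receives an image array. If a user submits the form without uploading anything, Gradio is expected to pass `None`, which would make `Image.fromarray` raise an error. One way to guard against that is sketched below; `make_caption_fn` and `build_interface` are hypothetical helpers, and the message text is an arbitrary choice.

```python
def make_caption_fn(captioner):
    """Wrap a captioning function so that empty input returns a hint instead of raising."""

    def caption_or_message(input_image):
        if input_image is None:
            return "Please upload an image first."
        return captioner(input_image)

    return caption_or_message


def build_interface(captioner):
    """Build the same kind of interface as in the notebook, using the guarded function."""
    import gradio as gr  # imported lazily so the wrapper above works without Gradio

    return gr.Interface(
        fn=make_caption_fn(captioner),
        inputs=gr.Image(),
        outputs="text",
        title="Image Captioning AI",
    )
```

In the notebook this would be used as `build_interface(caption_image).launch()`.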




## 🌍 Automating Image Captioning from a URL

In [15]:
import requests
from PIL import Image, UnidentifiedImageError
from io import BytesIO
from bs4 import BeautifulSoup
from transformers import AutoProcessor, BlipForConditionalGeneration
import time

# Load BLIP model
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Wikipedia URL
url = "https://en.wikipedia.org/wiki/Windmill"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
}

# Fetch page content (fail fast on network problems or HTTP errors)
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

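# Extract links to image media pages (not just thumbnails)
img_links = []
# Assumption: Wikipedia marks these links with class "image" in older markup
# and "mw-file-description" in newer markup, so match either one
for img in soup.find_all("a", class_=["image", "mw-file-description"]):
    media_page_url = "https://en.wikipedia.org" + img["href"]
    img_links.append(media_page_url)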

# Output file for captions
output_file = "captions.txt"
max_images = 5  # Limit number of processed images
image_count = 0

with open(output_file, "w", encoding="utf-8") as caption_file:
    for media_page_url in img_links:
        if image_count >= max_images:
            break  # Stop after processing max_images

        try:
            # Fetch the media page where the full image is stored
            media_response = requests.get(media_page_url, headers=headers, timeout=10)
            media_response.raise_for_status()
            media_soup = BeautifulSoup(media_response.text, "html.parser")

            # Extract full-size image from the media page
            full_img_tag = media_soup.find("a", {"class": "internal"})
            if not full_img_tag:
                print(f"Skipping: No full image found on {media_page_url}")
                continue

            full_img_url = "https:" + full_img_tag["href"]
            print(f"Extracted Full Image URL: {full_img_url}")

            # Download the full image
            img_response = requests.get(full_img_url, headers=headers, timeout=5)
            img_response.raise_for_status()

            # Open image
            raw_image = Image.open(BytesIO(img_response.content)).convert('RGB')

            # Skip very small images (likely icons)
            if raw_image.size[0] < 100 or raw_image.size[1] < 100:
                print(f"Skipping small image: {full_img_url}")
                continue

            # Process image with BLIP
            inputs = processor(images=raw_image, return_tensors="pt")
            outputs = model.generate(**inputs, max_length=50)
            caption = processor.decode(outputs[0], skip_special_tokens=True)

            # Write caption to file
            caption_file.write(f"{full_img_url}: {caption}\n")
            print(f"Processed Image: {full_img_url} --> {caption}")

            image_count += 1
            time.sleep(2)  # Small delay to avoid being rate-limited

        except (requests.RequestException, UnidentifiedImageError, OSError) as e:
            print(f"Error processing image {media_page_url}: {e}")

print(f"\n✅ Captions saved to {output_file}")



✅ Captions saved to captions.txt
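The script above builds absolute URLs by string concatenation (`"https://en.wikipedia.org" + href` and `"https:" + href`), which only works if every `href` has exactly the expected form. A more forgiving alternative is `urllib.parse.urljoin` from the standard library, sketched below. The `absolute_url` wrapper and the example paths are illustrative assumptions, not real Wikipedia file names.

```python
from urllib.parse import urljoin

PAGE_URL = "https://en.wikipedia.org/wiki/Windmill"


def absolute_url(href, base=PAGE_URL):
    """Resolve an href found on `base` into an absolute URL.

    Handles site-relative paths, protocol-relative URLs, and URLs that
    are already absolute.
    """
    return urljoin(base, href)


# Site-relative link, as found in the article's image anchors
print(absolute_url("/wiki/File:Example.jpg"))
# prints: https://en.wikipedia.org/wiki/File:Example.jpg

# Protocol-relative link, as found on the media page
print(absolute_url("//upload.wikimedia.org/wikipedia/commons/a/a1/Example.jpg"))
# prints: https://upload.wikimedia.org/wikipedia/commons/a/a1/Example.jpg
```

With this helper, the same call covers both concatenation sites in the script, and an `href` that is already absolute passes through unchanged.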


## 🎯 Conclusion & Next Steps
Congratulations! 🎉 You built an **AI-powered image captioning tool**.

### **Next Steps:**
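- ✅ Deploy your app, for example to Hugging Face Spaces by running `gradio deploy` from a terminal.
- ✅ Experiment with **different models**, such as other BLIP checkpoints on the Hugging Face Hub.
- ✅ Enhance the UI for **better engagement**, for example by adding a description and example images.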