# BLIP Image Captioning for Influencer Content Analysis

This notebook demonstrates how to use BLIP (Bootstrapping Language-Image Pre-training), a vision-language model from Salesforce, to automatically generate captions for images in a local folder.

## What is BLIP Captioning?

BLIP is a multimodal AI model that can understand images and generate human-readable text descriptions. You can use it in two ways:

1. **Unconditional Captioning**: The model generates a complete caption from scratch, based only on the image.
2. **Conditional Captioning**: You provide a text prompt (e.g., "a photo of"), and the model continues the caption from that prompt.
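
In code, the difference between the two modes comes down to whether a text prompt is passed to the processor alongside the image. The following is a minimal sketch, assuming `processor` behaves like a `BlipProcessor` (any callable with the same call signature works); the helper name `build_caption_inputs` is hypothetical and not part of any library.

```python
from typing import Any, Callable, Optional


def build_caption_inputs(
    processor: Callable[..., Any],
    image: Any,
    prompt: Optional[str] = None,
) -> Any:
    """Build model inputs for unconditional or conditional captioning.

    `processor` is assumed to accept (image, [text], return_tensors="pt"),
    which is how this notebook calls BlipProcessor.
    """
    if prompt:
        # Conditional: the prompt is encoded as the start of the caption,
        # and the model continues from it.
        return processor(image, prompt, return_tensors="pt")
    # Unconditional: only the image is encoded.
    return processor(image, return_tensors="pt")
```

Passing `prompt=None` (or an empty string) selects unconditional captioning; any non-empty string selects conditional captioning.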

## Use Case for Influencer Marketing

Automatically generating captions for influencer content helps you:
- Understand what visual elements are prominent in influencer posts
- Extract semantic descriptions of brand visibility and product placement
- Analyze whether images convey intended messaging
- Process large volumes of influencer content programmatically
- Create alternative text descriptions for accessibility

## Workflow

This notebook will:
1. Load the BLIP model
2. Find all images in a local folder (`./Images/`)
3. Generate captions for each image
4. Save results to a CSV file for analysis
5. Display sample results

## 1. Import Required Libraries and Setup

Import the necessary libraries for image processing, model loading, and data management.
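
Before running the imports, it can help to confirm that the required packages are installed. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the module list simply mirrors the imports used in this notebook (Pillow is imported under the name `PIL`).

```python
import importlib.util

# Top-level import names used by this notebook (Pillow is imported as "PIL")
REQUIRED_MODULES = ["torch", "transformers", "PIL", "pandas", "tqdm"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```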

In [None]:
import torch
import os
import glob
import pandas as pd
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from tqdm import tqdm

# Check if GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 2. Configuration

Set the paths and parameters for the captioning task.
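
A typo in the mode name or folder path would otherwise only surface later, during captioning. As a hedged sketch, the settings could be checked up front with a small helper; `validate_config` and `VALID_MODES` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# The two modes this notebook supports
VALID_MODES = {"unconditional", "conditional"}


def validate_config(image_folder, mode, prompt=""):
    """Return a list of problems found; an empty list means the settings look usable."""
    problems = []
    if not Path(image_folder).is_dir():
        problems.append(f"Image folder not found: {image_folder}")
    if mode not in VALID_MODES:
        problems.append(f"Unknown captioning mode: {mode!r}")
    if mode == "conditional" and not prompt.strip():
        problems.append("Conditional mode needs a non-empty prompt")
    return problems
```

In this notebook it could be called as `validate_config(IMAGE_FOLDER, CAPTIONING_MODE, CONDITIONAL_PROMPT)` right after the configuration cell, printing any problems it returns.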

In [None]:
"""
Configuration Settings
"""

# Path to folder containing images
IMAGE_FOLDER = './Images'

# File extensions to look for
IMAGE_EXTENSIONS = ['*.jpg', '*.jpeg', '*.png', '*.gif', '*.webp']

# Output file to save results
OUTPUT_CSV = 'image_captions.csv'

# Type of captioning
# Options: "unconditional" or "conditional"
CAPTIONING_MODE = "unconditional"

# If using conditional captioning, provide a prompt
# Example: "a photo of" or "the main subject in this image is"
CONDITIONAL_PROMPT = "a photo of"

## 3. Load the BLIP Model

Download and load the pre-trained BLIP model. The first run may take a few minutes because it downloads the model weights (roughly 1 GB for the base checkpoint); later runs load them from the local cache.

In [None]:
print("Loading BLIP model and processor...")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
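# float16 halves GPU memory use; on a CPU-only machine, torch.float32 is the
# safer choice because float16 support on CPU is limited (use the same dtype
# when casting the inputs)
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base",
    torch_dtype=torch.float16
).to(device)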

print("Model loaded successfully!")

## 4. Find All Images in the Folder

Locate all image files in the specified directory.
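
Glob patterns such as `*.jpg` are case-sensitive on case-sensitive filesystems, so a file named `IMG_001.JPG` may be skipped. One possible alternative, sketched here with `pathlib`, compares lowercased suffixes instead; `find_images` and `IMAGE_SUFFIXES` are hypothetical names, and the suffix set mirrors the extensions configured in this notebook.

```python
from pathlib import Path

# Lowercase suffixes to accept (mirrors IMAGE_EXTENSIONS in the configuration)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def find_images(folder):
    """Recursively collect image files under `folder`, ignoring extension case."""
    root = Path(folder)
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

Because each file is visited exactly once, no duplicate removal is needed.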

In [None]:
# Find all image files. With recursive=True, the '**' pattern matches
# IMAGE_FOLDER itself as well as every subdirectory, so one glob per
# extension is enough.
image_paths = []
for extension in IMAGE_EXTENSIONS:
    image_paths.extend(glob.glob(os.path.join(IMAGE_FOLDER, '**', extension), recursive=True))

# Remove duplicates and sort for a stable processing order
image_paths = sorted(set(image_paths))

print(f"Found {len(image_paths)} images in {IMAGE_FOLDER}")
if len(image_paths) > 0:
    print("\nFirst few images:")
    for img_path in image_paths[:5]:
        print(f"  - {img_path}")

## 5. Generate Captions for All Images

Process each image through the model and generate captions. This may take several minutes, depending on the number of images and whether a GPU is available.
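
Captioning one image per `generate` call is simple, but on a GPU it may be faster to process several images at once. The sketch below shows only the batching helper; `chunked` is a hypothetical name, and the commented usage assumes that the processor accepts a list of images and that `processor.batch_decode` is used to decode all outputs in one call.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Possible usage with the objects defined in this notebook (assumption, untested here):
# for batch_paths in chunked(image_paths, 8):
#     images = [Image.open(p).convert("RGB") for p in batch_paths]
#     inputs = processor(images=images, return_tensors="pt").to(device, model.dtype)
#     out = model.generate(**inputs, max_length=50)
#     captions = processor.batch_decode(out, skip_special_tokens=True)
```

A batch size that fits comfortably in GPU memory has to be found by experiment; one unreadable image also fails the whole batch, so per-image error handling becomes coarser.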

In [None]:
results = []

print(f"\nGenerating captions for {len(image_paths)} images...")
print(f"Captioning mode: {CAPTIONING_MODE}")

for image_path in tqdm(image_paths):
    try:
        # Open the image and convert it to RGB, closing the file handle promptly
        with Image.open(image_path) as img:
            raw_image = img.convert('RGB')

        # Build model inputs, cast to the model's own dtype so the two always match
        if CAPTIONING_MODE == "conditional":
            # Conditional captioning with prompt
            inputs = processor(raw_image, CONDITIONAL_PROMPT, return_tensors="pt").to(device, model.dtype)
        else:
            # Unconditional captioning
            inputs = processor(raw_image, return_tensors="pt").to(device, model.dtype)
        
        # Generate caption
        out = model.generate(**inputs, max_length=50)
        caption = processor.decode(out[0], skip_special_tokens=True)
        
        # Store result
        results.append({
            'image_path': image_path,
            'caption': caption
        })
        
    except Exception as e:
        print(f"\nError processing {image_path}: {str(e)}")
        results.append({
            'image_path': image_path,
            'caption': f"Error: {str(e)}"
        })

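# Failed images are stored with an "Error: ..." caption, so exclude them from the success count
num_failed = sum(r['caption'].startswith('Error:') for r in results)
print(f"\nGenerated captions for {len(results) - num_failed} of {len(results)} images ({num_failed} failed)")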

## 6. Save Results to CSV

Save all captions to a CSV file for further analysis and review.
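
As one example of the further analysis the CSV enables, the sketch below reads the file back and counts the most frequent caption terms. It assumes the column names `image_path` and `caption` used in this notebook and skips rows whose caption starts with `Error:`; the helper name `top_caption_terms` and the small stop-word list are illustrative assumptions.

```python
import csv
from collections import Counter

# Small illustrative stop-word list (an assumption; extend as needed)
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "with", "and", "is", "to"}


def top_caption_terms(csv_path, n=10):
    """Return the `n` most common non-stop-words across all successful captions."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            caption = row["caption"]
            if caption.startswith("Error:"):
                continue  # skip rows where captioning failed
            counts.update(
                word for word in caption.lower().split() if word not in STOP_WORDS
            )
    return counts.most_common(n)
```

Calling `top_caption_terms('image_captions.csv')` after the save step gives a quick view of which objects and settings dominate the image set.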

In [None]:
# Convert to DataFrame
df_results = pd.DataFrame(results)

# Save to CSV
df_results.to_csv(OUTPUT_CSV, index=False)
print(f"Results saved to: {OUTPUT_CSV}")

# Display summary
print(f"\nDataFrame shape: {df_results.shape}")
print("\nFirst 5 results:")
print(df_results.head())

## 7. Display Sample Results

View captions alongside their images to verify quality.
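
For a more compact review, captions and thumbnails could also be rendered side by side in a single HTML table. The following is a minimal sketch; `captions_to_html` is a hypothetical helper, and it assumes the image paths are reachable from the notebook's working directory so that the browser can load them.

```python
import html


def captions_to_html(rows, width=200):
    """Build an HTML table with one thumbnail and one caption per row.

    `rows` is an iterable of (image_path, caption) pairs.
    """
    cells = []
    for image_path, caption in rows:
        cells.append(
            "<tr>"
            f'<td><img src="{html.escape(image_path, quote=True)}" width="{int(width)}"></td>'
            f"<td>{html.escape(caption)}</td>"
            "</tr>"
        )
    return "<table>" + "".join(cells) + "</table>"
```

In Jupyter, the result can be shown with `display(HTML(captions_to_html(zip(df_results['image_path'], df_results['caption']))))`, with `display` and `HTML` imported from `IPython.display`. Escaping the caption text keeps any stray markup in a caption from being interpreted as HTML.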

In [None]:
from IPython.display import Image as IPImage, display

# Display first 5 images with their captions
num_samples = min(5, len(df_results))

print(f"Displaying first {num_samples} results:\n")

for idx in range(num_samples):
    row = df_results.iloc[idx]
    image_path = row['image_path']
    caption = row['caption']
    
    print(f"\n{'='*60}")
    print(f"Image {idx+1}: {os.path.basename(image_path)}")
    print(f"Caption: {caption}")
    print(f"Full path: {image_path}")
    
    # Display the image inline (requires Jupyter and a readable image file)
    try:
        display(IPImage(filename=image_path, width=400))
    except Exception as e:
        print(f"(Image preview not available: {e})")

## 8. (Optional) Try Different Prompts

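You can re-run the captioning with different prompts to guide the model. Change the settings in the configuration cell, then re-run the captioning cells. Note that this captioning checkpoint treats the prompt as the beginning of a caption and continues it, so short caption-like prefixes such as "a photo of" tend to work better than question-style or instruction-style prompts.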
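
When comparing several prompts, each run needs its own output file so that earlier results are not overwritten. One way to handle this, sketched below, is to derive the CSV filename from the prompt; `output_csv_for_prompt` is a hypothetical helper, and the default base name matches the `OUTPUT_CSV` setting in this notebook.

```python
import re


def output_csv_for_prompt(prompt, base="image_captions"):
    """Derive a CSV filename from a prompt so each prompt writes to its own file."""
    # Replace every run of characters other than a-z and 0-9 with one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{base}_{slug}.csv" if slug else f"{base}.csv"


print(output_csv_for_prompt("a photo of"))  # image_captions_a_photo_of.csv
```

Setting `OUTPUT_CSV = output_csv_for_prompt(CONDITIONAL_PROMPT)` in the configuration cell would then keep the results of each prompt separate.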

In [None]:
"""
Example prompts you might try:

# For product-focused analysis
- "the main product shown in this image is"
- "the brand elements visible in this image include"

# For influencer analysis
- "the person in this image is"
- "the influencer appears to be"

# For general analysis
- "this image shows"
- "a photo of"

# For engagement analysis
- "the emotional tone of this image is"
- "this image conveys"


To use a different prompt:
1. Change CAPTIONING_MODE to "conditional"
2. Update CONDITIONAL_PROMPT with your desired prompt
3. Update OUTPUT_CSV to a new filename
4. Re-run the configuration and captioning cells
"""

print("Ready to process images!")
print(f"Images found: {len(image_paths)}")
print(f"Current mode: {CAPTIONING_MODE}")
if CAPTIONING_MODE == "conditional":
    print(f"Current prompt: '{CONDITIONAL_PROMPT}'")