# Demonstration of BLIP on Image Captioning

## Introduction

This notebook explores the outputs one can expect from BLIP on an image captioning task.

It first shows the output when an image is passed without a prompt. Then it shows the output when both
a prompt and an image are passed.

## Imports

In [1]:
# Import packages
import os
import sys
sys.path.append('..')

# Depending on the platform/IDE used, the working directory might be either socraticmodels or
# socraticmodels/scripts. The following ensures that the working directory is the scripts folder.
try:
    os.chdir('scripts')
except FileNotFoundError:
    pass

from scripts.image_captioning import ImageManager, BlipManager
from scripts.utils import get_device

## Instantiate the BLIP and image manager classes

In [2]:
# Set the device to use
device = get_device()

# Instantiate the BLIP manager
blip_manager = BlipManager(device)

# Instantiate the image manager
image_manager = ImageManager()

### Load the image

In [3]:
img_folder = '../data/images/example_images/'
img_file = 'astronaut_with_beer.jpg'
img_path = img_folder + img_file
image = image_manager.load_image(img_path)

### Set the model parameters

In [4]:
model_params = {
    'max_length': 40,
    'no_repeat_ngram_size': 2,
    'repetition_penalty': 1.5,
}
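
The parameter names above match generation arguments in Hugging Face `transformers`, so the following is a minimal sketch of what `no_repeat_ngram_size` and `repetition_penalty` do there. It assumes `BlipManager` forwards `model_params` to the model's `generate` call, which this notebook does not show. The function names `banned_next_tokens` and `apply_repetition_penalty` are hypothetical, and the code works on plain Python lists and dicts rather than tensors.

```python
from typing import Dict, List, Set


def banned_next_tokens(generated: List[int], ngram_size: int) -> Set[int]:
    """Return the tokens that would repeat an n-gram already present in `generated`."""
    if ngram_size <= 0 or len(generated) + 1 < ngram_size:
        return set()
    prefix_len = ngram_size - 1
    # The last (n - 1) tokens are the prefix of the n-gram the next token would complete.
    current_prefix = tuple(generated[len(generated) - prefix_len:])
    banned: Set[int] = set()
    for start in range(len(generated) - ngram_size + 1):
        ngram = generated[start:start + ngram_size]
        # If an earlier n-gram starts with the same prefix, its last token is banned.
        if tuple(ngram[:-1]) == current_prefix:
            banned.add(ngram[-1])
    return banned


def apply_repetition_penalty(
    scores: Dict[int, float], generated: List[int], penalty: float
) -> Dict[int, float]:
    """Lower the score of every token that has already been generated."""
    adjusted = dict(scores)
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            # Negative scores are multiplied and positive scores divided,
            # so the token becomes less likely in both cases.
            adjusted[token] = score * penalty if score < 0 else score / penalty
    return adjusted


# With no_repeat_ngram_size=2, the bigram (5, 7) may not occur a second time.
print(banned_next_tokens([5, 7, 9, 5], ngram_size=2))
# With repetition_penalty=1.5, tokens 1 and 2 are penalised and token 3 is untouched.
print(apply_repetition_penalty({1: 3.0, 2: -2.0, 3: 1.0}, [1, 2], penalty=1.5))
```

With the values used in this notebook, no bigram can appear twice in a caption, and tokens that were already generated are down-weighted by a factor of 1.5.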

### Example 1: Caption without a prompt

In [5]:
caption = blip_manager.generate_response(image, model_params=model_params)
print(f'BLIP caption without prompt: "{caption}"')

BLIP caption without prompt: "astronaut drinking beer on the moon"


### Example 2: Asking BLIP questions

In [6]:
question1 = "Question: where is the picture taken? Answer:"
response1 = blip_manager.generate_response(image, prompt=question1, model_params=model_params)
print(f'Prompt input: "{question1}" - BLIP Output: "{response1}"')

question2 = "Question: what are the different objects in the image? Answer:"
response2 = blip_manager.generate_response(image, prompt=question2, model_params=model_params)
print(f'Prompt input: "{question2}" - BLIP Output: "{response2}"')

question3 = "Question: Who is in the image? Answer:"
response3 = blip_manager.generate_response(image, prompt=question3, model_params=model_params)
print(f'Prompt input: "{question3}" - BLIP Output: "{response3}"')

Prompt input: "Question: where is the picture taken? Answer:" - BLIP Output: "question : where is the picture taken? answer : astronaut"
Prompt input: "Question: what are the different objects in the image? Answer:" - BLIP Output: "question : what are the different objects in the image? answer :"
Prompt input: "Question: Who is in the image? Answer:" - BLIP Output: "question : who is in the image? answer : astronaut"
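
The responses above repeat the prompt in lowercased, re-spaced form before the answer itself. A small post-processing step can isolate the answer. The helper below is a sketch only: `extract_answer` is a hypothetical name that is not part of `scripts.image_captioning`, and it assumes the prompt follows the "Question: ... Answer:" pattern used in this notebook.

```python
import re


def extract_answer(response: str) -> str:
    """Return the text after the last 'answer :' marker, or the whole response if none is found."""
    # Tolerate the spacing and casing changes introduced when the output is decoded.
    parts = re.split(r'answer\s*:', response, flags=re.IGNORECASE)
    if len(parts) > 1:
        return parts[-1].strip()
    return response.strip()


outputs = [
    "question : where is the picture taken? answer : astronaut",
    "question : what are the different objects in the image? answer :",
    "question : who is in the image? answer : astronaut",
]
for output in outputs:
    print(repr(extract_answer(output)))
```

For the second question the extracted answer is an empty string, which makes it explicit that the model produced no answer for that prompt.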
