## BLIP-2 Install & Testing

Notebook cells for installing and testing BLIP-2, the image-to-text generation model from Salesforce Research.

Paper: [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597)  
Article: [Zero-shot image-to-text generation with BLIP-2](https://huggingface.co/blog/blip-2)  
Documentation: [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2)

### Install

In [7]:
!rustup target add x86_64-apple-darwin

info: downloading component 'rust-std' for 'x86_64-apple-darwin'
info: installing component 'rust-std' for 'x86_64-apple-darwin'
 24.5 MiB /  24.5 MiB (100 %)  19.0 MiB/s in  1s ETA:  0s


In [8]:
!pip install safetensors

Collecting safetensors
  Using cached safetensors-0.3.2.tar.gz (35 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: safetensors
  Building wheel for safetensors (pyproject.toml) ... done
  Created wheel for safetensors: filename=safetensors-0.3.2-cp39-cp39-macosx_12_0_x86_64.whl size=403925 sha256=effc533704dddfd0c0a7d7ce2171cdfb1184fe587778b4cc6b8299a3d31217a1
  Stored in directory: /Users/rachelharrison/Library/Caches/pip/wheels/33/f3/12/beb2fa43480705c919e21e7b9c9bccec1abf7da624d027067e
Successfully built safetensors
Installing collected packages: safetensors
Successfully installed safetensors-0.3.2


In [9]:
!pip install git+https://github.com/huggingface/transformers

Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /private/var/folders/01/5sfcrl8n50zb9t8h8tgh_7pw0000gp/T/pip-req-build-fqjzvq49
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /private/var/folders/01/5sfcrl8n50zb9t8h8tgh_7pw0000gp/T/pip-req-build-fqjzvq49
  Resolved https://github.com/huggingface/transformers to commit 1982dd3b15867c46e1c20645901b0de469fd935f
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers==4.32.0.dev0)
  Obtaining dependency information for huggingface-hub<1.0,>=0.15.1 from https://files.pythonhosted.org/packages/7f/c4/adcbe9a696c135578cabcbdd7331332daad4d49b7c43688bc2d36b3a47d2/huggingface_hub-0.16.4-py3-none-any.whl.metadata
  Using cached huggingface_hub-0.16.4-py3-none-any.w

In [11]:
!pip install accelerate

Collecting accelerate
  Obtaining dependency information for accelerate from https://files.pythonhosted.org/packages/70/f9/c381bcdd0c3829d723aa14eec8e75c6c377b4ca61ec68b8093d9f35fc7a7/accelerate-0.21.0-py3-none-any.whl.metadata
  Downloading accelerate-0.21.0-py3-none-any.whl.metadata (17 kB)
Downloading accelerate-0.21.0-py3-none-any.whl (244 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 244.2/244.2 kB 4.9 MB/s eta 0:00:00
Installing collected packages: accelerate
Successfully installed accelerate-0.21.0
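
After the installs above, it can help to confirm which versions actually ended up in the environment. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical and not part of any of the installed packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# The three packages installed in the cells above.
for name, ver in installed_versions(["safetensors", "transformers", "accelerate"]).items():
    print(f"{name}: {ver or 'not installed'}")
```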


### BLIP-2, OPT-2.7b

In [1]:
# Model setup: OPT-2.7b

from PIL import Image
import requests
from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float32
)
model.to(device);

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
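
The model and the processed inputs need to share a floating-point dtype, otherwise `generate` can fail with a dtype mismatch. One way to keep them in sync is to derive a single dtype from `device` and reuse it everywhere. A minimal sketch: `pick_dtype_name` is a hypothetical helper, and the float16-on-GPU choice is an assumption rather than something this notebook tested.

```python
def pick_dtype_name(device: str) -> str:
    """Name of the torch dtype to use for both the model and its inputs.

    Assumption: half precision on CUDA to save memory, full precision on CPU.
    """
    return "float16" if device.startswith("cuda") else "float32"


print(pick_dtype_name("cpu"))
print(pick_dtype_name("cuda:0"))
```

With `torch` imported, `dtype = getattr(torch, pick_dtype_name(device))` could then be passed as `torch_dtype=dtype` to `from_pretrained` and reused in `.to(device, dtype)` on the processor output.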

In [34]:
# Image captioning (without providing a text prompt):

# url = "http://images.cocodataset.org/val2017/000000039769.jpg"
# image = Image.open(requests.get(url, stream=True).raw)
image = Image.open('./wain23_images_v2/pexels-zen-chung-5529541.jpg')

inputs = processor(images=image, return_tensors="pt").to(device, torch.float32)

generated_ids = model.generate(**inputs, max_new_tokens=64, temperature=0)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

a woman in a straw hat picking apples from an orchard


In [8]:
# Visual question answering (prompt = question):

image = Image.open('./wain23_images_v2/pexels-zen-chung-5529541.jpg')

prompt = "How many people are in this image?"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model.generate(**inputs, max_new_tokens=40)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)




In [58]:
# Testing complex QA prompt:

image = Image.open('./wain23_images_v2/pexels-zen-chung-5529541.jpg')

prompt = "Question: Formatted as alt text, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? Answer:"

inputs = processor(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model.generate(**inputs, max_new_tokens=124)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)




In [139]:
"Generate alt text consisting of no more than one sentence for the following image that briefly \
describes the activity shown as well as the approximate gender, age, and race of the person in the image \
(if possible). The age categories are: young adult, adult, middle aged, senior."

'Generate alt text consisting of no more than one sentence for the following image that briefly describes the activity shown as well as the approximate gender, age, and race of the person in the image (if possible). The age categories are: young adult, adult, middle aged, senior.'
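
The prompts in this notebook follow a `Question: ... Answer:` template with a list of age categories in parentheses. A hypothetical helper (`build_prompt` is not part of any library) could make such variants easier to generate consistently. The wording of the template is one possible phrasing, chosen here for illustration.

```python
def build_prompt(age_categories, describe_if_no_person=False):
    """Compose a 'Question: ... Answer:' prompt with the given age categories."""
    ages = ", ".join(age_categories)
    prompt = (
        "Question: What is the gender, "
        f"age ({ages}), and race of the person in this photo, "
        "and what are they doing?"
    )
    if describe_if_no_person:
        # Optional fallback instruction for images without people.
        prompt += " If there is no person, simply describe the image."
    return prompt + " Answer:"


print(build_prompt(["young adult", "adult", "middle aged", "senior"]))
print(build_prompt(["young", "adult", "senior"], describe_if_no_person=True))
```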

### BLIP-2, Flan T5-xxl

In [41]:
# Model setup: Flan T5-xxl

from PIL import Image
import requests
from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

processor2 = Blip2Processor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
model2 = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xxl", torch_dtype=torch.float32
)
model2.to(device);

Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

In [55]:
# VQA Testing

image = Image.open('./wain23_images_v2/pexels-zen-chung-5529541.jpg')

prompt = "Question: Formatted as alt text, what is the gender, age (young adult, adult, middle aged, senior), \
and race (white, black, asian, hispanic) of the person in the photo, and what activity are they engaging in? Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

senior black woman picking apples in an orchard - fbf018


In [59]:
image = Image.open('./wain23_images_v2/pexels-andrea-piacquadio-3768176.jpg')

prompt = "Question: Formatted as alt text, what is the gender, age (young adult, adult, middle aged, senior), \
and race (white, black, asian, hispanic) of the person in the photo, and what activity are they engaging in? Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

woman, middle aged, white, working at home, writing


In [64]:
image = Image.open('./wain23_images_v2/pexels-madison-inouye-2180092.jpg')

prompt = "Question: Formatted as an alt text caption, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? \
If there is no person, simply describe the image. Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

a white notebook, a cup of coffee, and a pen on a white table


In [63]:
image = Image.open('./wain23_images_v2/pexels-binyamin-mellish-169523.jpg')

prompt = "Question: Formatted as an alt text caption, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? \
If there is no person, simply describe the image. Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

a man is planting a seed in the ground in a garden


In [65]:
image = Image.open('./wain23_images_v2/pexels-orione-conceição-8663203.jpg')

prompt = "Question: Formatted as an alt text caption, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? \
If there is no person, simply describe the image. Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

young african american woman listening to music with headphones on an orange background


In [67]:
image = Image.open('./wain23_images_v2/pexels-karolina-grabowska-4467583.jpg')

prompt = "Question: Formatted as an alt text caption, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? \
If there is no person, simply describe the image. Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

woman writing on a paper with a glass of wine


In [68]:
image = Image.open('./wain23_images_v2/pexels-ono-kosuki-5973906.jpg')

prompt = "Question: Formatted as an alt text caption, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? \
If there is no person, simply describe the image. Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

asian man preparing a dish in a kitchen


In [69]:
image = Image.open('./wain23_images_v2/pexels-shvets-production-7513038.jpg')

prompt = "Question: Formatted as an alt text caption, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? \
If there is no person, simply describe the image. Answer:"

inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)

woman cleaning the mirror with rubber gloves and rags in the bathroom


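def create_caption(prompt, image):
    """Generate text for a PIL image and a text prompt with the Flan T5-xxl model."""
    # Preprocess the image and prompt, matching the model's device and float32 dtype.
    inputs = processor2(image, text=prompt, return_tensors="pt").to(device, torch.float32)

    generated_ids = model2.generate(**inputs, min_new_tokens=12, max_new_tokens=128, temperature=0)
    # Decode the first (only) sequence and trim surrounding whitespace.
    generated_text = processor2.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()

    return generated_text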

In [102]:
prompt = "Question: If the photo contains a person, what is their gender, race, and age (young adult, adult, \
middle aged, senior), and what activity are they engaging in? Answer:"

In [103]:
import os

image_dir = './wain23_images_v3/'
for image_name in sorted(os.listdir(image_dir)):
    image = Image.open(image_dir + image_name)
    caption = create_caption(prompt, image)
    print(f'{image_name}: {caption}')

pexels-andrea-piacquadio-3768176.jpg: woman, middle aged, white, writing, working at home
pexels-zen-chung-5529541.jpg: senior, black, female, picking apples in an orchard


In [122]:
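def caption_images(prompt, image_dir='./wain23_images_v3/'):
    """Print a caption for every image in image_dir, using the given prompt."""
    for image_name in sorted(os.listdir(image_dir)):
        # Skip the macOS Finder metadata file, which is not an image.
        if image_name == '.DS_Store':
            continue
        image = Image.open(os.path.join(image_dir, image_name))
        caption = create_caption(prompt, image)
        print(f'{image_name}: {caption}')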
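
Skipping `.DS_Store` by name works for this folder, but other stray files would still reach `Image.open`. A more general filter might select files by extension instead. This is a sketch only: `list_images` and the set of extensions are assumptions, not part of the notebook.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def list_images(image_dir):
    """Return sorted image paths in image_dir, skipping hidden and non-image files."""
    return sorted(
        path
        for path in Path(image_dir).iterdir()
        if path.is_file()
        and not path.name.startswith(".")
        and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each returned `Path` can be passed directly to `Image.open`.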

In [None]:
prompt = "Question: Formatted as an alt text caption, what is the gender, age (young adult, adult, middle aged, senior), \
and race of the person in the photo, and what activity are they engaging in? \
If there is no person, simply describe the image. Answer:"
caption_images(prompt)

In [None]:
prompt = "Question: If the image contains a person, what is their gender, age (young adult, adult, \
middle aged, senior), and race, and what activity are they engaging in? If there is no person, \
simply describe the image. Answer:"
caption_images(prompt)

In [None]:
prompt = "Question: What would be good alt text for this image? If the image contains a person, \
describe their gender, age (young adult, adult, senior), and race. Answer:"
caption_images(prompt)
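
Running one prompt per cell makes side-by-side comparison awkward. A small harness could collect the captions for several prompts per image in one structure. In this sketch `compare_prompts` is hypothetical, and `caption_fn` is assumed to be any callable that takes a prompt and an image identifier; a stand-in is used here so the example runs without loading a model.

```python
def compare_prompts(prompts, image_names, caption_fn):
    """Return {image_name: [caption for each prompt]} using caption_fn(prompt, image_name)."""
    return {
        name: [caption_fn(prompt, name) for prompt in prompts]
        for name in image_names
    }


# Stand-in captioner (not a real model call), used only to show the output shape.
def fake_caption(prompt, image_name):
    return f"caption of {image_name} for a {len(prompt)}-character prompt"


table = compare_prompts(
    ["Question: What is this a photo of? Answer:", "This is an image of"],
    ["example.jpg"],
    fake_caption,
)
for name, captions in table.items():
    print(name)
    for caption in captions:
        print("  ", caption)
```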

In [105]:
# Current best prompt. Next step: remove "middle aged" from the age categories.

prompt = "Question: If the photo contains a person, what is their gender, race, and age (young adult, adult, \
middle aged, senior), and what activity are they engaging in? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman, middle aged, white, writing, working at home
pexels-zen-chung-5529541.jpg: senior, black, female, picking apples in an orchard


In [107]:
prompt = "Question: If the photo contains a person, what is their gender, race, and age (young adult, adult, \
senior), and what activity are they engaging in? Format your answer as an alt text caption. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman in glasses writing at home on a laptop stock photo
pexels-zen-chung-5529541.jpg: senior woman picking apples in an orchard - fbf018
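
Several captions end in fragments such as `- fbf018` or `stock photo` that appear unrelated to the image content. If they are unwanted, a post-processing step could strip them. The sketch below is an assumption-laden example: `clean_caption` is hypothetical, and the pattern only covers the suffixes seen in this notebook's outputs.

```python
import re

# Trailing "- fb..." codes and "stock photo" (with or without a leading dash).
_TRAILING_NOISE = re.compile(r"\s*(?:-\s*fb\w*|-?\s*stock photo)\s*$", re.IGNORECASE)


def clean_caption(caption):
    """Remove the trailing noise fragments observed in the generated captions."""
    return _TRAILING_NOISE.sub("", caption).strip()


for raw in [
    "senior woman picking apples in an orchard - fbf018",
    "woman in glasses writing at home on a laptop stock photo",
]:
    print(clean_caption(raw))
```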


In [108]:
prompt = "Question: Formatted as alt text, if the photo contains a person, what is their gender, race, \
and age (young adult, adult, senior), and what activity are they engaging in? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman in glasses sitting at a desk writing on a piece of paper
pexels-zen-chung-5529541.jpg: senior woman picking apples in an orchard - fbf018


In [109]:
prompt = "Question: If the photo contains a person, what is their gender, race, and age (young adult, adult, \
senior), and what activity are they engaging in? If the photo does not contain a person, what is the photo \
of? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman, white, age, adult, activity, writing, work
pexels-zen-chung-5529541.jpg: senior woman picking apples in an orchard - fbf018


In [110]:
prompt = "Question: If the photo contains a person, what is their gender, race, and age (young adult, adult, \
senior), and what are they doing? If the photo does not contain a person, what is the photo of? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman in glasses sitting at a desk writing on a piece of paper
pexels-zen-chung-5529541.jpg: senior woman picking apples in an orchard - fb013


In [111]:
prompt = "Question: What is this a photo of? If the photo contains a person, what is their gender, race, \
and age (young adult, adult, senior), and what are they doing? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman in glasses sitting at a desk writing on a piece of paper
pexels-zen-chung-5529541.jpg: senior woman picking apples in an orchard - fb015


In [112]:
prompt = "If an image contais a person, the caption should identify their gender, race, \
age (young adult, adult, senior), and what they are doing. This is an image of"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: a woman working at home on her laptop, writing a letter
pexels-zen-chung-5529541.jpg: an older woman picking apples in an orchard - fb015


In [113]:
prompt = "Question: What is a good caption for this photo? If it contais a person, the caption should \
identify their gender, race, age (young adult, adult, senior), and what they are doing. What is a \
good caption? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen on paper - stock photo
pexels-zen-chung-5529541.jpg: woman picking apples in an orchard - fb


In [114]:
prompt = "Question: If the photo contains a person, what is their gender, race, and age (young adult, adult, \
middle aged, senior), and what activity are they engaging in? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman, middle aged, white, writing, working at home
pexels-madison-inouye-2180092.jpg: young adult, white, adult, middle aged, senior, writing
pexels-zen-chung-5529541.jpg: senior, black, female, picking apples in an orchard


In [115]:
prompt = "Question: Caption the photo. If the photo contains a person, what is their gender, race, and \
age (young adult, adult, senior), and what activity are they engaging in? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen in hand - stock photo
pexels-madison-inouye-2180092.jpg: young adult, white, adult, writing in a journal
pexels-zen-chung-5529541.jpg: senior woman picking apples in an orchard - fb015


In [116]:
prompt = "Question: Caption the photo, and if the photo contains a person, list their gender, race, and \
age (young, adult, senior), and what they are doing. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen on paper - stock photo
pexels-madison-inouye-2180092.jpg: a notebook, a cup of coffee, and a pen
pexels-zen-chung-5529541.jpg: woman picking apples in an orchard - fbf018


In [117]:
prompt = "Caption the photo, and if the photo contains a person, list their gender, race, and \
age (young, adult, senior), and what they are doing. A good caption with all required attributes is"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen on a table
pexels-madison-inouye-2180092.jpg: a young woman is writing in her journal and drinking a cup of coffee
pexels-zen-chung-5529541.jpg: a woman picking apples in an orchard - fbf018


In [118]:
prompt = "Question: Caption the photo, and if the photo contains a person, list their gender, race, and \
age (young, adult, senior), and what they are doing. Example: A young black woman reading a book. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: young black woman reading a book at home in front of a window
pexels-madison-inouye-2180092.jpg: young black woman reading a book on a white surface
pexels-zen-chung-5529541.jpg: young black woman reading a book in the park - fb
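
In the output above, every caption repeats the example sentence from the prompt regardless of the image. A simple check could flag this kind of leakage automatically. This is a sketch: `echoes_example` is a hypothetical helper, and the normalization (lowercasing, dropping punctuation and a leading article) is an assumption about what should count as a match.

```python
import string


def _normalize(text):
    """Lowercase, drop punctuation, and remove a leading article."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = text.split()
    if words and words[0] in {"a", "an", "the"}:
        words = words[1:]
    return " ".join(words)


def echoes_example(caption, example):
    """True if the caption contains the prompt's example sentence."""
    example_norm = _normalize(example)
    return bool(example_norm) and example_norm in _normalize(caption)


example = "A young black woman reading a book."
print(echoes_example("young black woman reading a book at home in front of a window", example))
print(echoes_example("a desk with a clock, pencils, and a book", example))
```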


In [123]:
prompt = "Question: Formatted as an alt text caption, what is the gender, age (young, adult, senior), \
and race of the person in the photo, and what activity are they engaging in? \ If there is no person, \
simply describe the image. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen in hand - stock photo
pexels-pixabay-159618.jpg: A desk with a clock, pencils, pens, and a book
pexels-zen-chung-5529541.jpg: woman picking apples in an orchard - fbf018


In [125]:
prompt = "Question: If the photo contains a person, what is their gender, race, and age (young, adult, \
senior), and what activity are they engaging in? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman, white, adult, writing, working at home, home office
pexels-pixabay-159618.jpg: young, adult, senior, writing in a book,
pexels-zen-chung-5529541.jpg: senior woman picking apples in an orchard - fbf018


In [126]:
prompt = "Question: Caption the photo, and if the photo contains a person, mention their gender, race, \
age (young, adult, senior), and what they are doing. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen on paper - stock photo
pexels-pixabay-159618.jpg: a desk with a clock, pencils, and a book
pexels-zen-chung-5529541.jpg: woman picking apples in an orchard - fbf018


In [127]:
prompt = "Question: What is the gender, age (young, adult, senior), and race of the person in the photo, \
and what activity are they engaging in? If there is no person, simply describe the image. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman sitting at a table writing on a piece of paper
pexels-pixabay-159618.jpg: A desk with a clock, pencils, pens, and a book
pexels-zen-chung-5529541.jpg: woman picking apples in an orchard - fb013


In [128]:
prompt = "Question: Generate alt text consisting of no more than one sentence for the following image that briefly \
describes the activity shown as well as the approximate gender, age, and race of the person in the image \
(if possible). The age categories are: young, adult, and senior. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen in the living room
pexels-pixabay-159618.jpg: A desk with a clock, pencils, and a book
pexels-zen-chung-5529541.jpg: senior woman picking apples in orchard - fb015


In [129]:
prompt = "Question: You are tasked with generating alt text. Alt text must describe the activity shown \
as well as the approximate gender, age, and race of the person in the image (if possible). The age \
categories are: young, adult, and senior. If there is no person in the image, describe the image. Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen on paper - stock photo
pexels-pixabay-159618.jpg: A desk with a clock, pencils, and a book
pexels-zen-chung-5529541.jpg: woman picking apples in orchard - fbf018


In [130]:
prompt = "Good alt text contains all information about an image, including all the attributes of people in \
the image. Attributes are gender, age (young, adult, senior), and race. Good alt text for this image would be"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: a woman is working at home on her laptop and writing
pexels-pixabay-159618.jpg: A desk with a clock, pencils, pens, and a notebook
pexels-zen-chung-5529541.jpg: a senior woman picking apples in an orchard - fbf018


In [131]:
prompt = "Alt text that thoroughly describes the person in this image (including gender, age, and race \
attributes) would be"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: a woman is working at home on her laptop and writing
pexels-pixabay-159618.jpg: A desk with a clock, pencils, and a notebook
pexels-zen-chung-5529541.jpg: a woman picking apples in an orchard - fbf018


In [132]:
prompt = "Question: What is a thorough and detailed caption for this image? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman working at home with laptop and pen in hand - stock photo
pexels-pixabay-159618.jpg: A desk with a clock, pencils, pens, and a notebook
pexels-zen-chung-5529541.jpg: woman picking apples in an orchard - fb


In [133]:
prompt = "Question: Formatted as a sentence, what is the gender, age (young, adult, senior), and race of \
the person in this photo, and what are they doing? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman in glasses sitting at a table writing on a piece of paper
pexels-toa-heftiba-şinca-1194408.jpg: male, young, adult, white, reading a book in the dead sea
pexels-zen-chung-5529541.jpg: woman, age adult, race african american, picking apples in an orchard


In [134]:
prompt = "Question: Formatted as a sentence, what is the person in this photo doing and what is their \
gender, age (young, adult, senior), and race? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman in glasses writing at home with laptop and pen stock photo
pexels-toa-heftiba-şinca-1194408.jpg: reading a book in the dead sea, israel, male, young, adult, israeli
pexels-zen-chung-5529541.jpg: woman picking apples in an orchard, age adult, race african american


In [135]:
prompt = "Question: What is the person in this photo doing and what is their gender, age (young, adult, senior), \
and race? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman writing at home, white, adult, senior, blond hair
pexels-toa-heftiba-şinca-1194408.jpg: reading a book in the dead sea, israel, male, young, adult, israeli
pexels-zen-chung-5529541.jpg: picking apples in an orchard, female, adult, black


In [136]:
# Person-only working prompt

prompt = "Question: What is the gender, age (young, adult, senior), and race of the person in this photo, \
and what are they doing? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3768176.jpg: woman, adult, white, sitting at a desk, writing
pexels-toa-heftiba-şinca-1194408.jpg: male, young, adult, israeli, reading a book in the dead sea
pexels-zen-chung-5529541.jpg: senior, black, female, picking apples in an orchard
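
This prompt tends to return comma-separated fields, which makes the output easy to inspect programmatically, for example to spot captions that list more than one age category. The sketch below is illustrative only: `parse_attributes` and the gender vocabulary are assumptions, and race terms are left in `other` because the model's wording for them varies.

```python
GENDER_TERMS = {"male", "female", "man", "woman"}  # assumed vocabulary
AGE_TERMS = {"young", "adult", "senior"}  # the categories named in the prompt


def parse_attributes(caption):
    """Sort comma-separated caption fields into gender, age, and everything else."""
    fields = {"gender": [], "age": [], "other": []}
    for part in caption.split(","):
        token = part.strip().lower()
        if not token:
            continue
        if token in GENDER_TERMS:
            fields["gender"].append(token)
        elif token in AGE_TERMS:
            fields["age"].append(token)
        else:
            fields["other"].append(token)
    return fields


parsed = parse_attributes("male, young, adult, israeli, reading a book in the dead sea")
print(parsed)
if len(parsed["age"]) > 1:
    print("more than one age category returned:", parsed["age"])
```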


In [137]:
# using wain23_images_v3_init

prompt = "Question: What is the gender, age (young, adult, senior), and race of the person in this photo, \
and what are they doing? Answer:"
caption_images(prompt)

pexels-akshar-dave-977971.jpg: male, young, indian, playing guitar, smiling, sitting on the floor
pexels-alena-koval-820673.jpg: female, adult, white, drawing a doodle
pexels-andrea-piacquadio-3768176.jpg: woman, adult, white, sitting at a desk, writing
pexels-andrea-piacquadio-3769999.jpg: woman, adult, white, preparing food, smiling, preparing food
pexels-andrea-piacquadio-3782829.jpg: senior, white, male, glasses, reading, walking, outdoors
pexels-beyzaa-yurtkuran-13533591.jpg: male, adult, white, reading a newspaper, crossword puzzle
pexels-binyamin-mellish-169523.jpg: male, adult, white, and they are planting a seed
pexels-cottonbro-studio-4004116.jpg: female, adult, and asian, and they are putting clothes on a hanger
pexels-cottonbro-studio-4045621.jpg: reading a newspaper in bed, afro-american, adult
pexels-cottonbro-studio-7885580.jpg: senior, white, male, painting, painting a flower
pexels-italo-melo-1786244.jpg: senior, female, asian, and dancing in the street
pexels-jackson-

In [138]:
prompt = "Question: What is the gender, age (young, adult, senior), and race of the person in this photo, \
and what are they doing? Answer:"
caption_images(prompt)

pexels-andrea-piacquadio-3967832.jpg: female, young, adult, and smiling while leaning on the edge of a swimming pool
pexels-antoni-shkraba-production-8791162.jpg: senior man with beard playing with wooden blocks in the living room
pexels-budgeron-bach-5157742.jpg: young black man holding a skateboard in the street, posing for a photo
pexels-cottonbro-studio-9710643.jpg: male, young, black, watering plants in a kitchen
pexels-eli-zaturanski-821683.jpg: male, adult, white, drawing a mountain landscape on a piece of paper
pexels-jeshootscom-7432.jpg: female, young, adult, and asian, and they are tying their shoes
pexels-ketut-subiyanto-4132326.jpg: writing in a journal, white, adult, and writing
pexels-ketut-subiyanto-5038856.jpg: young, male, indian, tying shoelaces
pexels-ketut-subiyanto-5039638.jpg: young, black, female, sitting on stairs, tying shoes
pexels-mikhail-nilov-6620627.jpg: female, adult, white, aiming at a bow and arrow
pexels-nappy-936037.jpg: a young person, white, is pla