## Image Captioning with Google Imagen

Mustafa Ozcicek ozcicekmustafa@gmail.com

I have been testing many different captioning models. Although all of them work pretty well, most of them fail when describing simple logos, graphic design work, and the like. If you are (like me) trying to build a training dataset with captions and do not want to write the captions manually, I found that Google's Imagen works pretty well.

Before you start, make sure your images are named with consecutive numbers starting from 1 (`1.png`, `2.png`, and so on). You also need to adjust the range value in the loop manually to match the number of images you have.
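If you want to check the naming before spending API calls, something like the sketch below could help. The helper names `numbered_image_ids` and `missing_ids` are my own and not part of any library, and the sketch assumes the images sit in one flat directory.

```python
import os
import re
from typing import List


def numbered_image_ids(directory: str, extension: str = ".png") -> List[int]:
    """Return the sorted numeric IDs of files named like 1.png, 2.png, ..."""
    pattern = re.compile(rf"^(\d+){re.escape(extension)}$")
    ids = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def missing_ids(ids: List[int]) -> List[int]:
    """Return the numbers absent from the sequence 1..max(ids)."""
    if not ids:
        return []
    present = set(ids)
    return [n for n in range(1, max(ids) + 1) if n not in present]
```

If `missing_ids` returns an empty list, the largest ID is also the number of images, so the loop can use `range(1, max(ids) + 1)`.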

### Import Libraries

In [16]:
import requests
import json
import base64
import os
import pandas as pd
from tqdm import tqdm
from PIL import Image
from io import BytesIO

### Define Useful Functions

In [17]:
def get_image_base64_encoding(image_path: str) -> str:
    """
    Function to return the base64 string representation of an image
    """
    with open(image_path, 'rb') as file:
        image_data = file.read()
    image_extension = os.path.splitext(image_path)[1]
    base64_encoded = base64.b64encode(image_data).decode('utf-8')
    return f"data:image/{image_extension[1:]};base64,{base64_encoded}"


def get_image_base64_encoding2(image_path: str) -> str:
    """
    Return the raw base64 string representation of an image.

    Unlike get_image_base64_encoding, this function does not prepend the
    "data:image/...;base64," prefix. I use it when iterating over HTTP requests.
    """
    with open(image_path, 'rb') as file:
        image_data = file.read()
    return base64.b64encode(image_data).decode('utf-8')


def image_to_base64_PNG(image, format="PNG"):
    """
    Serialize a PIL image (as PNG by default) and return it as a base64 string.
    """
    buffer = BytesIO()
    image.save(buffer, format=format)
    image_str = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return image_str
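
A quick sanity check for the encoded strings, as a minimal sketch: every PNG file starts with the same 8-byte signature, so a correctly encoded PNG always decodes to bytes that begin with it. The name `looks_like_png` is hypothetical and only used for illustration.

```python
import base64

# The fixed 8-byte signature at the start of every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(base64_string: str) -> bool:
    """Decode a base64 string and check that it starts with the PNG signature."""
    raw = base64.b64decode(base64_string)
    return raw.startswith(PNG_SIGNATURE)
```

This is also why base64-encoded PNG data always starts with `iVBORw0KGgo`: those characters are the encoded signature.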


### Create a Dataframe to Keep the Captions and Other Info

In [18]:
captiondb = pd.DataFrame({"Image ID": [], "Image": [], "Description": []})
print(captiondb)

Empty DataFrame
Columns: [Image ID, Image, Description]
Index: []


### Start the Request

**Enter Google Cloud Project Credentials**

To get an access token, enable the Vertex AI API in your Google Cloud project, open the Google Cloud terminal, and enter this line:

```!gcloud auth application-default print-access-token``` *without the exclamation mark*
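
If the `gcloud` CLI is installed and authenticated on the machine running the notebook, the token could also be fetched from Python instead of being pasted by hand. This is only a sketch under that assumption; `fetch_access_token` is a hypothetical helper that simply runs the command above and returns its output.

```python
import subprocess


def fetch_access_token(
    command=("gcloud", "auth", "application-default", "print-access-token"),
) -> str:
    """Run the given command and return its standard output without whitespace."""
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

With this in place, `access_token = fetch_access_token()` would replace the pasted string in the next cell.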

In [21]:
project_id = "projectname"

locations = "us-central1"

# Paste the token printed by: gcloud auth application-default print-access-token
access_token = "access token"

lang = "en"  # caption language: en, fr, es, de, it

caption_count = 1  # number of caption alternatives to request (max: 3)

url = f"https://{locations}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{locations}/publishers/google/models/imagetext:predict"

headers = {
    "Authorization": "Bearer " + access_token,
    "Content-Type": "application/json; charset=utf-8"
}
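
The request bodies in the cells below are written as f-strings. As an alternative sketch, the same body can be built from a dictionary and serialized with `json.dumps`, which takes care of quoting and makes it easy to pass in `lang` and `caption_count`. The field names are the ones used in this notebook; `build_caption_payload` is a hypothetical helper name.

```python
import json


def build_caption_payload(image_b64: str, sample_count: int = 1, language: str = "en") -> str:
    """Return the JSON request body for one base64-encoded image."""
    payload = {
        "instances": [
            {"image": {"bytesBase64Encoded": image_b64}}
        ],
        "parameters": {
            "sampleCount": sample_count,  # number of caption alternatives
            "language": language,         # caption language code
        },
    }
    return json.dumps(payload)
```

The returned string can be passed as `data=` to `requests.post`, for example `build_caption_payload(encode_img, caption_count, lang)`.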


#### Single Image

Skip this cell if you are captioning multiple images in a directory.

In [28]:
i = 1  # number of the image to caption, e.g. 1.png

data = f'''{{
        "instances": [
            {{
                "image": {{
                    "bytesBase64Encoded": "{image_to_base64_PNG(Image.open("../../ImageSimilarity/data/"+ str(i) + ".png"))}"
                }}
            }}
        ],
        "parameters": {{
            "sampleCount": 1,
            "language": "en"
        }}
    }}'''

response = requests.post(url, headers=headers, data=data)

if response.status_code == 200:
    result = response.json()
    print(result)
else:
    print("Request failed with status code:", response.status_code)
    print(response.text)


{'predictions': ['the letter h is in the shape of a house .'], 'deployedModelId': '6747203681382301696'}


#### Batch Captioning with for loop

In [None]:
# range(1, 51) covers 1.png to 50.png; adjust the upper bound to your image count
for i in tqdm(range(1, 51)):

    encode_img = image_to_base64_PNG(Image.open("../../ImageSimilarity/data/" + str(i) + ".png"))

    data = f'''{{
        "instances": [
            {{
                "image": {{
                    "bytesBase64Encoded": "{encode_img}"
                }}
            }}
        ],
        "parameters": {{
            "sampleCount": {caption_count},
            "language": "{lang}"
        }}
    }}'''

    response = requests.post(url, headers=headers, data=data)

    if response.status_code == 200:
        result = response.json()
        # Store the file name, its base64 encoding, and the first caption
        captiondb.loc[i] = [str(i) + ".png", encode_img, result['predictions'][0]]
    else:
        print("Request failed with status code:", response.status_code)
        print(response.text)
        print(f"The process failed while captioning {i}.png")


### Check the DB and Export

In [25]:
captiondb.head()

| | Image ID | Image | Description |
|---|---|---|---|
| 1 | 1.png | iVBORw0KGgoAAAANSUhEUgAAAQAAAAEACAIAAADTED8xAA... | a cross with the number 2 and 1 on it |
| 2 | 2.png | iVBORw0KGgoAAAANSUhEUgAAAQAAAAEACAIAAADTED8xAA... | a black and white icon of a house with a squar... |
| 3 | 3.png | iVBORw0KGgoAAAANSUhEUgAAAQAAAAEACAIAAADTED8xAA... | the letter s is in a circle on a white backgro... |
| 4 | 4.png | iVBORw0KGgoAAAANSUhEUgAAAQAAAAEACAIAAADTED8xAA... | the 3m logo is black and white on a white back... |
| 5 | 5.png | iVBORw0KGgoAAAANSUhEUgAAAQAAAAEACAIAAADTED8xAA... | a black and white logo with a cross and the nu... |


In [28]:
print(captiondb.info())

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 1 to 20
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Image ID     14 non-null     object
 1   Image        14 non-null     object
 2   Description  14 non-null     object
dtypes: object(3)
memory usage: 448.0+ bytes
None


#### Export DB to a csv file

In [30]:
captiondb.to_csv("captiondb.csv", index=False)
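
Since a run may stop before every image is captioned, it can be handy to read the exported file back and work out which images are still missing. The sketch below assumes the CSV has the `Image ID` column used above and that the images are numbered from 1; `remaining_image_ids` is a hypothetical helper name.

```python
import csv

# The base64 column can be longer than the csv module's default field limit,
# so raise the limit to a value that fits a C long on every platform.
csv.field_size_limit(2**31 - 1)


def remaining_image_ids(csv_path: str, total_images: int) -> list:
    """Return the image numbers in 1..total_images that have no row in the CSV."""
    done = set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            stem = row["Image ID"].rsplit(".", 1)[0]  # "7.png" -> "7"
            if stem.isdigit():
                done.add(int(stem))
    return [n for n in range(1, total_images + 1) if n not in done]
```

The batch loop could then iterate over the returned list instead of a fixed range.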

### Closing Remarks

Google Cloud does not let you send requests indefinitely. With the default quota, the loop comes to a halt after roughly 10 to 20 iterations. To go further, you need to get in touch with Google Cloud and ask for a quota increase. I have not done that yet, so I cannot provide details on the process.
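
Until the quota is raised, one thing that might help is pausing and retrying when a request is rejected. The sketch below is not a way around the quota. It assumes the rejection arrives as HTTP status 429, which I have not verified for this endpoint, and `post_with_backoff` and its parameters are hypothetical names.

```python
import time


def post_with_backoff(send, max_attempts=5, base_delay=2.0,
                      retry_statuses=(429,), sleep=time.sleep):
    """Call send() until its status code is not retryable or attempts run out.

    send is a zero-argument callable returning an object with a status_code
    attribute, for example: lambda: requests.post(url, headers=headers, data=data)
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in retry_statuses:
            return response
        if attempt < max_attempts - 1:
            # Wait 2 s, 4 s, 8 s, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
    return response
```

Inside the batch loop, `response = post_with_backoff(lambda: requests.post(url, headers=headers, data=data))` would replace the direct `requests.post` call.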