<a href="https://colab.research.google.com/github/michaelachmann/social-media-lab/blob/main/notebooks/2023_12_19_Caption_Generation_VertexAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Caption Generation with VertexAI (Google) [![DOI](https://zenodo.org/badge/660157642.svg)](https://zenodo.org/badge/latestdoi/660157642)
![Notes on (Computational) Social Media Research Banner](https://raw.githubusercontent.com/michaelachmann/social-media-lab/main/images/banner.png)

## Overview

This Jupyter notebook is a part of the social-media-lab.net project, which is a work-in-progress textbook on computational social media analysis. The notebook is intended for use in my classes.

The **Caption Generation with VertexAI (Google)** notebook provides a few lines of code to download an image from a given URL, convert it to base64, and upload it to VertexAI using a signed request.

### Project Information

- Project Website: [social-media-lab.net](https://social-media-lab.net/)
- GitHub Repository: [https://github.com/michaelachmann/social-media-lab](https://github.com/michaelachmann/social-media-lab)

## License Information

This notebook, along with all other notebooks in the project, is licensed under the following terms:

- License: [GNU General Public License version 3.0 (GPL-3.0)](https://www.gnu.org/licenses/gpl-3.0.de.html)
- License File: [LICENSE.md](https://github.com/michaelachmann/social-media-lab/blob/main/LICENSE.md)


## Citation

If you use or reference this notebook in your work, please cite it appropriately. Here is an example of the citation:

```
Michael Achmann. (2023). michaelachmann/social-media-lab: DOI Release (v0.0.7). Zenodo. https://doi.org/10.5281/zenodo.8199902
```

# Captions using Google Vertex AI

In [1]:
!pip install -q google-cloud-aiplatform

Run the next cell, click the link, and authorize the request. Then paste the authorization code into the input field next to *Enter authorization code*. If the field is not visible, click next to that prompt to make it appear.

In [None]:
!gcloud auth login

In [5]:
import requests
import base64
import subprocess
import numpy as np

class GoogleAPI:
    def __init__(self, project_id):
        self.token = self.get_gcloud_access_token()
        self.project_id = project_id

    def get_gcloud_access_token(self):
        token = subprocess.check_output(["gcloud", "auth", "print-access-token"]).strip().decode('utf-8')
        return token

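    def make_request(self, image_url, response_count=1, language_code="en"):
        # Download the image first; a failed download is reported like a failed API call
        try:
            image_content = self.get_image_from_signed_url(image_url)
        except requests.RequestException as e:
            print(f"Error for URL {image_url}: {e}")
            return np.nan
        b64_image = self.image_to_base64(image_content)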

        json_data = {
            "instances": [
                {
                    "image": {
                        "bytesBase64Encoded": b64_image
                    }
                }
            ],
            "parameters": {
                "sampleCount": response_count,
                "language": language_code
            }
        }

        url = f"https://us-central1-aiplatform.googleapis.com/v1/projects/{self.project_id}/locations/us-central1/publishers/google/models/imagetext:predict"
        headers = {
            "Authorization": f"Bearer {self.token}",
            "Content-Type": "application/json; charset=utf-8"
        }

        try:
            response = requests.post(url, headers=headers, json=json_data)

            if response.status_code == 401:
                # Refresh the token and retry once
                self.token = self.get_gcloud_access_token()
                headers["Authorization"] = f"Bearer {self.token}"
                response = requests.post(url, headers=headers, json=json_data)

            response.raise_for_status()  # Raise an exception for HTTP errors
            response_data = response.json()

            # Return the first prediction, or None if the response contains none
            predictions = response_data.get('predictions', [])
            if predictions:
                return predictions[0]
            return None

        except requests.HTTPError as e:
            print(f"Error for URL {image_url}: {e}")
            return np.nan

    @staticmethod
    def get_image_from_signed_url(url):
        response = requests.get(url)
        response.raise_for_status()
        return response.content

    @staticmethod
    def image_to_base64(image_content):
        return base64.b64encode(image_content).decode('utf-8')
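
To see what `make_request` sends, the request body can be assembled and inspected without any network call. The following is a minimal sketch, not part of the class above: `build_caption_payload` is a hypothetical helper name, and the image bytes are a placeholder rather than a real image.

```python
import base64
import json


def build_caption_payload(image_bytes, sample_count=1, language="en"):
    """Assemble the JSON body in the same shape that make_request uses.

    Hypothetical helper for illustration; it mirrors the dictionary built
    inside GoogleAPI.make_request and performs no network request.
    """
    b64_image = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "instances": [{"image": {"bytesBase64Encoded": b64_image}}],
        "parameters": {"sampleCount": sample_count, "language": language},
    }


# Placeholder bytes stand in for a downloaded image.
payload = build_caption_payload(b"not-a-real-image", sample_count=2)
print(json.dumps(payload, indent=2))
```

Printing the payload this way is a quick check that the base64 string and the `sampleCount` and `language` parameters look as intended before any quota is spent.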

Enable the [Vertex AI API in the Google Cloud Console](https://console.cloud.google.com/marketplace/product/google/aiplatform.googleapis.com) for your project, then enter your `project_id` below.

In [6]:
project_id = "some-greatid-23125"
api = GoogleAPI(project_id)
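
The `project_id` ends up in the endpoint URL that `make_request` calls. Here is a small sketch of how that URL is composed, using a hypothetical helper `build_predict_url`. The `location` and `model` parameters are assumptions added for illustration; the class above hard-codes `us-central1` and `imagetext`, and whether the model is served from other regions is not covered here.

```python
def build_predict_url(project_id, location="us-central1", model="imagetext"):
    """Compose the predict endpoint used in make_request (hypothetical helper)."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model}:predict"
    )


# With the defaults this matches the URL string built inside make_request.
print(build_predict_url("some-greatid-23125"))
```

A mistyped project ID typically shows up as an HTTP error from this endpoint, so comparing the printed URL against the project shown in the Cloud Console is a cheap first check.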

In [8]:
response = api.make_request("https://placekitten.com/408/287")

Let's print the caption:

In [9]:
print(response)

a cat is peeking out from behind a door and looking at the camera .
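
`make_request` returns the caption string on success, `None` when the response contains no predictions, and `np.nan` when an HTTP error occurs. When captioning several images, these three cases need to be told apart. The sketch below is one possible way to do that. `caption_many` and `fake_caption` are hypothetical names, and `fake_caption` only stands in for `api.make_request` so the example runs without credentials.

```python
import math


def caption_many(image_urls, caption_fn):
    """Call caption_fn for each URL and collect the results in a dict.

    caption_fn is any callable that takes a URL and returns a caption,
    None, or NaN; in this notebook it would be api.make_request.
    """
    captions = {}
    for url in image_urls:
        result = caption_fn(url)
        # Treat both None (no prediction) and NaN (HTTP error) as "no caption".
        if result is None or (isinstance(result, float) and math.isnan(result)):
            captions[url] = None
        else:
            captions[url] = result
    return captions


# Stand-in for api.make_request (assumption for illustration only).
def fake_caption(url):
    return float("nan") if "broken" in url else f"caption for {url}"


urls = ["https://example.com/a.jpg", "https://example.com/broken.jpg"]
print(caption_many(urls, fake_caption))
```

In the notebook itself, the call would be `caption_many(urls, api.make_request)` with real image URLs.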
