# Vertex AI Gemini Pro (Vision)

Many LLMs and embedding models are already available, but few models accept multimodal input (image and video). Gemini Pro Vision on Vertex AI is one of them.

In this example, we show how to use it in an RPA (robotic process automation) scenario.

In [4]:
#! pip3 install --upgrade google-cloud-aiplatform
#! pip3 install google-cloud-storage



In [3]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}


Wait for the kernel to finish restarting before running the next cells.

In [18]:
from vertexai.preview.generative_models import (
    GenerationConfig,
    GenerativeModel,
    Image,
    Part,
    HarmBlockThreshold,
    HarmCategory,
)

In [2]:
multimodal_model = GenerativeModel("gemini-pro-vision")

The sample video (wooribank_login.mp4) is a screen recording. It was converted to 10 fps to keep the file under 8 MB, the size limit for a Gemini Pro video clip.
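
The size constraint can be checked before uploading. The following is a minimal sketch, not the method used to prepare the sample clip: the helper names `build_ffmpeg_command` and `fits_size_limit` are hypothetical, the 8 MB figure is taken from the text and assumed here to mean 8 * 1024 * 1024 bytes, and the `ffmpeg` `fps` video filter is one common way to resample a clip, which the original conversion may or may not have used.

```python
import os
from typing import List

# Assumption: the 8 MB limit quoted in the text, read as binary megabytes.
MAX_VIDEO_BYTES = 8 * 1024 * 1024


def build_ffmpeg_command(src: str, dst: str, fps: int = 10) -> List[str]:
    """Return an ffmpeg argument list that resamples src to the given frame rate."""
    # -y overwrites dst if it exists; -vf fps=N applies the fps video filter.
    return ["ffmpeg", "-y", "-i", src, "-vf", f"fps={fps}", dst]


def fits_size_limit(path: str, limit_bytes: int = MAX_VIDEO_BYTES) -> bool:
    """Return True if the file at path is strictly smaller than limit_bytes."""
    return os.path.getsize(path) < limit_bytes
```

If `ffmpeg` is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`, and `fits_size_limit("resources/wooribank_login.mp4")` can then guard the upload step.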

In [5]:
from google.cloud import storage

bucket_name = "cloud-ai-platform-9cc0d33e-aa40-42fb-8ec6-78c0bc4a557c"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)

blob = bucket.blob("wooribank_login.mp4")
blob.upload_from_filename("resources/wooribank_login.mp4")




In [19]:
prompt = "Please describe the user's actions in this video in detail, including the URL sites they accessed and the buttons they clicked."

generation_config = GenerationConfig(
    temperature=0.1,
    max_output_tokens=2048,
)

# BLOCK_NONE turns off blocking for each listed harm category.
safety_config = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

video = Part.from_uri(
    uri=f"gs://{bucket_name}/wooribank_login.mp4",
    mime_type="video/mp4",
)
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, generation_config=generation_config, 
    safety_settings=safety_config, stream=False)
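
With `stream=False`, the call returns a single response object whose printed form nests the generated text under `candidates`, `content`, and `parts`. A minimal sketch of pulling that text out follows. The helper name `first_candidate_text` is hypothetical, and it relies only on that attribute layout. The stand-in object built with `SimpleNamespace` is there purely so the sketch runs without calling the API.

```python
from types import SimpleNamespace


def first_candidate_text(response) -> str:
    """Join the text parts of the first candidate and strip outer whitespace."""
    candidates = getattr(response, "candidates", None)
    if not candidates:
        return ""
    parts = candidates[0].content.parts
    return "".join(getattr(part, "text", "") or "" for part in parts).strip()


# Stand-in that mimics the response layout (not a real API response).
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                role="model",
                parts=[SimpleNamespace(text=" The user opens Google Chrome.")],
            )
        )
    ]
)
print(first_candidate_text(fake_response))
```

With the real SDK, `first_candidate_text(responses)` should give the description as a plain string that downstream RPA logic can consume; the response object's `text` attribute is a shorter route to the same value when the response holds a single text part.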


In [20]:
print(responses)

candidates {
  content {
    role: "model"
    parts {
      text: " The user opens Google Chrome and goes to google.com. They then search for \"wooribank.\" The user clicks on the first result, which is the Woori Bank website.\n\nOn the Woori Bank website, the user clicks on the \"Personal Login\" button. A login page appears, and the user enters their login information. The user then clicks on the \"Login\" button.\n\nThe user is now logged into their Woori Bank account. They click on the \"My Profile\" tab, and then click on the \"Change Password\" link. The user enters their current password and new password, and then clicks on the \"Change Password\" button.\n\nThe user\'s password has now been changed. They click on the \"Logout\" button to log out of their account."
    }
  }
  finish_reason: STOP
  safety_ratings {
    category: HARM_CATEGORY_HARASSMENT
    probability: NEGLIGIBLE
  }
  safety_ratings {
    category: HARM_CATEGORY_HATE_SPEECH
    probability: NEGLIGIBLE
  }
  s