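### Vertex AI Gemini API Lab

- Use the Vertex AI SDK for Python to interact with the Gemini 1.5 Pro model and practice a range of tasks.

- Generate text from different input types (text prompts, images, video), and experiment with the available features and configuration options to tune the results.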

<br />

### Gemini Pro

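- Designed for complex reasoning, including:

  - Analyzing and summarizing large amounts of information

  - Sophisticated cross-modal extraction (text, code, images)

  - Problem solving across complex codebases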


<br />

### Gemini Flash

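- Optimized for speed and efficiency, offering:

  - Sub-second response times and high throughput

  - High quality at low cost across a wide range of tasks

  - Enhanced multimodal capabilities, including improved spatial understanding, new output modalities (text, audio, images), and native tool use (Google Search, code execution, third-party functions)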


<br />

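### Objectives

- Use the Gemini API in Vertex AI through the Vertex AI SDK for Python

- Interact with the Gemini 1.5 Pro (gemini-1.5-pro) model

- Generate text from text prompts

- Explore various features and configuration options

- Generate text from image and text prompts

- Generate text from video and text prompts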



### Set Google Cloud project information and initialize Vertex AI SDK

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

import vertexai

PROJECT_ID = "{project_ID}"  # @param {type: "string", placeholder: "[your-project-id]" isTemplate: true}
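if not PROJECT_ID or PROJECT_ID == "{project_ID}":
    # Keep None (not the string "None") when the variable is unset,
    # so vertexai.init() can fall back to the environment's default project.
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")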

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "{region}")

vertexai.init(project=PROJECT_ID, location=LOCATION)
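
If failing early is preferable to a silent fallback, the project lookup can be factored into a small function. The following is a minimal sketch; the helper name `resolve_project_id` and the `PLACEHOLDER` constant are hypothetical and not part of the Vertex AI SDK.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "{project_ID}"  # template value used in the notebook cell


def resolve_project_id(
    explicit: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit project ID, else GOOGLE_CLOUD_PROJECT, else raise."""
    # Treat an empty value or the untouched template placeholder as "not provided".
    if explicit and explicit != PLACEHOLDER:
        return explicit
    from_env = env.get("GOOGLE_CLOUD_PROJECT")
    if from_env:
        return from_env
    raise ValueError(
        "Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable."
    )
```

The result could then be passed to `vertexai.init(project=..., location=LOCATION)` in place of the raw `PROJECT_ID` value.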

### Import library

In [None]:
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    Image,
    Part,
    SafetySetting,
)

### Load the Gemini 1.5 Pro model

In [None]:
model = GenerativeModel("gemini-1.5-pro")

### Generate text from text prompts

- Send a text prompt to the model using the generate_content method. The generate_content method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports.

In [None]:
response = model.generate_content("Why is the sky blue?")

print(response.text)

### Streaming

- By default, the model returns a response after completing the entire generation process. You can also stream the response as it is being generated, and the model will return chunks of the response as soon as they are generated.

In [None]:
responses = model.generate_content("Why is the sky blue?", stream=True)

for response in responses:
    print(response.text, end="")
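
When the full text is also needed after streaming (for logging or post-processing), the chunks can be accumulated while they are printed. This is a sketch that assumes each chunk exposes a `.text` attribute, as in the loop above; `collect_stream` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without calling the API.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each chunk's text as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks for a dry run. With the SDK, pass
# model.generate_content(prompt, stream=True) instead.
fake_chunks = [SimpleNamespace(text="The sky "), SimpleNamespace(text="is blue.")]
full_text = collect_stream(fake_chunks)
```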

### Try your own prompts

- What are the biggest challenges facing the healthcare industry?

- What are the latest developments in the automotive industry?

- What are the biggest opportunities in the retail industry?

- (Try your own prompts!)

In [None]:
prompt = """Create a numbered list of 10 items. Each item in the list should be a trend in the tech industry.

Each trend should be less than 5 words."""  # try your own prompt

response = model.generate_content(prompt)

print(response.text)

### Model parameters

- Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.

In [None]:
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=1.0,
    top_k=32,
    candidate_count=1,
    max_output_tokens=8192,
)

response = model.generate_content(
    "Why is the sky blue?",
    generation_config=generation_config,
)

print(response.text)
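
To build intuition for what temperature, top_k, and top_p do, the toy sketch below applies them to a hand-written table of logits. It is an illustration only: the function name, the logit values, and the order of operations (top-k, then top-p, then temperature) are assumptions made for this example and do not describe Gemini's actual decoding.

```python
import math
from typing import Dict, Optional


def next_token_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: float = 1.0,
) -> Dict[str, float]:
    """Toy next-token distribution after top-k, top-p, and temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Softmax over the raw logits (shifted by the max for numerical stability).
    peak = max(logits.values())
    weights = {tok: math.exp(v - peak) for tok, v in logits.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # top_k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # temperature: p ** (1 / T) sharpens (T < 1) or flattens (T > 1) the survivors.
    reweighted = {tok: prob ** (1.0 / temperature) for tok, prob in kept}
    norm = sum(reweighted.values())
    return {tok: w / norm for tok, w in reweighted.items()}


toy_logits = {"blue": 2.0, "red": 1.0, "green": 0.5, "purple": -1.0}
for temp in (0.2, 0.9, 2.0):
    dist = next_token_distribution(toy_logits, temperature=temp, top_k=3, top_p=0.95)
    print(temp, {tok: round(p, 3) for tok, p in dist.items()})
```

Lower temperatures concentrate probability on the most likely token, while smaller top_k or top_p values shrink the set of candidates before sampling.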

### Safety filters

- The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what's appropriate for your use case. See the [Configure safety filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-filters) page for details.

- When you make a request to Gemini, the content is analyzed and assigned a safety rating. You can inspect the safety ratings of the generated content by printing out the model responses, as in this example:

In [None]:
response = model.generate_content("Why is the sky blue?")

print(f"Safety ratings:\n{response.candidates[0].safety_ratings}")
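
The raw ratings can be verbose, so a compact summary is sometimes easier to scan. The sketch below assumes each rating exposes `category`, `probability`, and `blocked` attributes; `summarize_ratings` is a hypothetical helper, and the stand-in objects use made-up values so the example runs without calling the API.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def summarize_ratings(ratings: Iterable) -> List[Tuple[str, str, bool]]:
    """Return (category, probability, blocked) tuples, one per rating."""
    rows = []
    for rating in ratings:
        # Enum members expose .name; plain strings are used as-is.
        category = getattr(rating.category, "name", str(rating.category))
        probability = getattr(rating.probability, "name", str(rating.probability))
        blocked = bool(getattr(rating, "blocked", False))
        rows.append((category, probability, blocked))
    return rows


# Made-up stand-ins for a dry run. With the SDK, pass
# response.candidates[0].safety_ratings instead.
fake_ratings = [
    SimpleNamespace(
        category="HARM_CATEGORY_DANGEROUS_CONTENT",
        probability="NEGLIGIBLE",
        blocked=False,
    ),
    SimpleNamespace(category="HARM_CATEGORY_HARASSMENT", probability="LOW"),
]
for category, probability, blocked in summarize_ratings(fake_ratings):
    print(f"{category:<35} {probability:<12} blocked={blocked}")
```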

You can use safety_settings to adjust the safety settings for each request you make to the API. This example demonstrates how you set the block threshold to BLOCK_ONLY_HIGH for the dangerous content category:

In [None]:
safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
]

prompt = """
    Write a list of 2 disrespectful things that I might say to the universe after stubbing my toe in the dark.
"""

response = model.generate_content(
    prompt,
    safety_settings=safety_settings,
)

print(response)

### Generate text from a multimodal prompt

- Gemini 1.5 Pro (gemini-1.5-pro) is a multimodal model that supports multimodal prompts. You can include text, image(s), and video in your prompt requests and get text or code responses.

### Define helper functions

- Define helper functions to load and display images.

In [None]:
import http.client
import typing
import urllib.request

import IPython.display
from PIL import Image as PIL_Image
from PIL import ImageOps as PIL_ImageOps


def display_images(
    images: typing.Iterable[Image],
    max_width: int = 600,
    max_height: int = 350,
) -> None:
    for image in images:
        pil_image = typing.cast(PIL_Image.Image, image._pil_image)
        if pil_image.mode != "RGB":
            # RGB is supported by all Jupyter environments (e.g. RGBA is not yet)
            pil_image = pil_image.convert("RGB")
        image_width, image_height = pil_image.size
        if max_width < image_width or max_height < image_height:
            # Resize to display a smaller notebook image
            pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))
        IPython.display.display(pil_image)


def get_image_bytes_from_url(image_url: str) -> bytes:
    with urllib.request.urlopen(image_url) as response:
        response = typing.cast(http.client.HTTPResponse, response)
        image_bytes = response.read()
    return image_bytes


def load_image_from_url(image_url: str) -> Image:
    image_bytes = get_image_bytes_from_url(image_url)
    return Image.from_bytes(image_bytes)


def get_url_from_gcs(gcs_uri: str) -> str:
    # converts GCS uri to url for image display.
    url = "https://storage.googleapis.com/" + gcs_uri.replace("gs://", "").replace(
        " ", "%20"
    )
    return url


def print_multimodal_prompt(contents: list):
    """
    Given contents that would be sent to Gemini,
    output the full multimodal prompt for ease of readability.
    """
    for content in contents:
        if isinstance(content, Image):
            display_images([content])
        elif isinstance(content, Part):
            url = get_url_from_gcs(content.file_data.file_uri)
            IPython.display.display(load_image_from_url(url))
        else:
            print(content)

### Generate text from a video file

- Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the mime_type field. Supported video MIME types include video/mp4.

In [None]:
file_path = "github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
video_uri = f"gs://{file_path}"
video_url = f"https://storage.googleapis.com/{file_path}"

IPython.display.Video(video_url, width=450)

In [None]:
prompt = """
Answer the following questions using the video only:
What is the profession of the main person?
What are the main features of the phone highlighted?
Which city was this recorded in?
Provide the answer in JSON.
"""

video = Part.from_uri(video_uri, mime_type="video/mp4")
contents = [prompt, video]

response = model.generate_content(contents)

print(response.text)
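
Because this prompt asks for JSON in free text rather than enforcing it, the reply may arrive wrapped in a Markdown code fence. The sketch below shows one way to parse it defensively; `parse_json_response` is a hypothetical helper, and the sample string is hand-written for a dry run, not real model output.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker, built here to keep this block readable
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, flags=re.DOTALL)


def parse_json_response(text: str):
    """Parse JSON from model text, tolerating an optional Markdown fence."""
    match = FENCED_JSON.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# Hand-written sample for a dry run. With the SDK, pass response.text instead.
sample = FENCE + 'json\n{"answer": "example"}\n' + FENCE
print(parse_json_response(sample))
```

`json.loads` raises `json.JSONDecodeError` if the reply is not valid JSON, which callers may want to catch and handle with a retry.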

### Direct analysis of publicly available web media

- Gemini can directly process publicly available URL resources, including images, text, video, and audio. This works for all currently supported modalities and file formats.

In this example, you add the file URL of a publicly available image file to the request to identify what's in the image.

In [None]:
prompt = """
Extract the objects in the given image and output them in a list in alphabetical order.
"""

image_file = Part.from_uri(
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/office-desk.jpeg",
    "image/jpeg",
)

response = model.generate_content([image_file, prompt])

print(response.text)
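
Since `Part.from_uri` needs a MIME type alongside the URL, the type can often be inferred from the file extension instead of being hard-coded. The following is a minimal sketch using only the standard library; `guess_mime_type` is a hypothetical helper, and extension-based guessing is an assumption that fails for URLs without a recognizable suffix.

```python
import mimetypes
from urllib.parse import urlparse


def guess_mime_type(url: str) -> str:
    """Guess a MIME type from the file extension in a URL's path."""
    path = urlparse(url).path
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot infer a MIME type from: {url!r}")
    return mime_type


image_url = "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/office-desk.jpeg"
print(guess_mime_type(image_url))
```

The returned string could then be passed as the second argument to `Part.from_uri`.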

This example demonstrates how to add the file URL of a publicly available video file to the request, and how to use the controlled generation capability to constrain the model output to a structured format.

In [None]:
response_schema = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {
            "timecode": {
                "type": "STRING",
            },
            "chapter_summary": {
                "type": "STRING",
            },
        },
        "required": ["timecode", "chapter_summary"],
    },
}

prompt = """
Chapterize this video content by grouping the video content into chapters and providing a brief summary for each chapter. 
Please only capture key events and highlights. If you are not sure about any info, please do not make it up. 
"""

video_file = Part.from_uri(
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/video/rio_de_janeiro_beyond_the_map_rio.mp4",
    "video/mp4",
)

response = model.generate_content(
    contents=[video_file, prompt],
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
    ),
)

print(response.text)
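
With a response schema in place, the reply text should be loadable with `json.loads`. The sketch below parses it and checks the two required keys before use; `parse_chapters` is a hypothetical helper, and the sample payload is hand-written for a dry run, not real model output.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = ("timecode", "chapter_summary")  # mirrors the schema's "required" list


def parse_chapters(response_text: str) -> List[Dict[str, str]]:
    """Parse the JSON array of chapters and verify the required keys."""
    chapters = json.loads(response_text)
    if not isinstance(chapters, list):
        raise ValueError("Expected a JSON array of chapters.")
    for index, chapter in enumerate(chapters):
        if not isinstance(chapter, dict):
            raise ValueError(f"Chapter {index} is not a JSON object.")
        missing = [key for key in REQUIRED_KEYS if key not in chapter]
        if missing:
            raise ValueError(f"Chapter {index} is missing keys: {missing}")
    return chapters


# Hand-written sample for a dry run. With the SDK, pass response.text instead.
sample_text = '[{"timecode": "00:00", "chapter_summary": "Placeholder summary."}]'
for chapter in parse_chapters(sample_text):
    print(f'{chapter["timecode"]}  {chapter["chapter_summary"]}')
```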