In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Build Zeitghost Custom Container Image for MLOps 
<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>

## Overview

<center>
<img src="imgs/zghost_overview_pipeline_steps.png" width="1200"/>
</center>

Vertex AI Pipelines runs each step (component) of a pipeline workflow as a containerized execution. To reduce the overhead of creating containers, this notebook builds a single custom container image that every component of the pipeline in notebook 05-gdelt-pipelines can use.

You don't have to use a custom container. The alternative is to use a base container image and, in the pipeline definition, pass the list of packages required to run each component step.
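
As a rough illustration of the two options, the sketch below builds the keyword arguments you might pass to a Kubeflow Pipelines component decorator. `base_image` and `packages_to_install` are parameters of the KFP `@dsl.component` decorator; the image URI, the package list, and the `component_options` helper are hypothetical placeholders, not values taken from this project.

```python
# Hypothetical image URI for illustration; substitute the image you build below.
CUSTOM_IMAGE = "us-central1-docker.pkg.dev/my-project/zghost-way/gdelt-pipe-v1"


def component_options(use_custom_image: bool) -> dict:
    """Return keyword arguments for a KFP component decorator (illustrative helper)."""
    if use_custom_image:
        # Option A: all dependencies are already baked into the custom image.
        return {"base_image": CUSTOM_IMAGE}
    # Option B: generic base image; the listed packages are pip-installed
    # each time the step starts. The package names here are assumptions.
    return {
        "base_image": "python:3.10",
        "packages_to_install": ["google-cloud-aiplatform", "langchain"],
    }


# With the KFP SDK installed, the options would be applied roughly like this:
#   from kfp import dsl
#   @dsl.component(**component_options(use_custom_image=True))
#   def my_step(): ...
print(component_options(True))
print(component_options(False))
```

Option B avoids building an image, but every step pays the package installation cost at run time, which is the overhead the custom image is meant to remove.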

The previous notebooks are:

1. [Setup Vertex Vector Store](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb)
2. [GDELT DataOps](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb)
3. [Vector Store Index Loader](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb)
4. [Alternative document format embeddings](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb)

They showed how to perform a one-off extraction, indexing, and load into the Matching Engine Vector Store, and how to use a LangChain agent to query the Vector Store with natural language queries.

However, given the nature of GDELT data, it may become critical to orchestrate and schedule a regularly occurring update of the index with new vectors as new GDELT entity and event data becomes available for a given topic or actor.

To address this challenge, we've created a Vertex AI pipeline that modularizes and orchestrates these steps and updates the existing Matching Engine Index only when new data has been extracted. The [GDELT Pipelines](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/intro_palm_api.ipynb) notebook shows how to create this pipeline.

To create this pipeline, you need to containerize the environment dependencies that all of these different components require. To streamline this process, you can use this notebook to generate the container image used by the pipeline.

If your environment changes and you're extending your application, we also show how you can create a [Cloud Build Trigger](https://cloud.google.com/build/docs/automating-builds/create-manage-triggers) to automatically rebuild this container image each time new code changes are pushed to your `main` branch.

---

### Objectives

In this notebook, you will:

- Create a `Dockerfile_gdelt` Dockerfile that defines the image used by Vertex AI Pipelines components
- Build the custom container image either:
    - Locally with `Docker`
    - Using Cloud services with [Cloud Build](https://cloud.google.com/build/docs)
- Set up a Cloud Build [trigger](https://cloud.google.com/build/docs/automating-builds/create-manage-triggers) that rebuilds the image each time changes are pushed to the `main` branch and tags it with `latest`
- Use the built image as the base image in pipeline steps

### Costs
This tutorial uses billable components of Google Cloud:

* Cloud Build
* Artifact Registry

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing),
and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Getting Started
**Colab only:** Uncomment the following cell to restart the kernel. For Vertex AI Workbench, you can restart the kernel using the button at the top.

In [None]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

### Authenticating your notebook environment
* If you are using **Colab** to run this notebook, uncomment the cell below and continue.
* If you are using **Vertex AI Workbench**, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env).

In [None]:
# from google.colab import auth
# auth.authenticate_user()

### Make sure you edit the values below
The first time you run the notebook with a new configuration, you only need to edit the actor prefix and version variables below. They are used to locate all the other variables in the notebook configuration.

In [None]:
# CREATE_NEW_ASSETS        = True # True | False
ACTOR_PREFIX             = "way"
VERSION                  = 'v1'

# print(f"CREATE_NEW_ASSETS  : {CREATE_NEW_ASSETS}")
print(f"ACTOR_PREFIX       : {ACTOR_PREFIX}")
print(f"VERSION            : {VERSION}")

### Load configuration settings from setup notebook
> Set the constants used in this notebook and load the config settings from the `00-env-setup.ipynb` notebook.

In [None]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

BUCKET_NAME              = f'zghost-{ACTOR_PREFIX}-{VERSION}-{PROJECT_ID}'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)

print(f"BUCKET_NAME        : {BUCKET_NAME}")
print(f"BUCKET_URI         : {BUCKET_URI}")
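
The cell above reads the config file as text (`config.n` is IPython's newline-joined view of the captured command output) and runs it with `exec`, which defines each assignment as a notebook variable. A minimal sketch of the same mechanism, assuming a made-up two-line config and executing it into a separate dictionary so the defined names are easy to inspect:

```python
# Hypothetical stand-in for the text returned by `gsutil cat .../notebook_env.py`.
sample_config = '''
PROJECT_ID = "my-project"
REGION = "us-central1"
'''

# Execute the assignments into a dedicated dict rather than the notebook
# globals, so you can see exactly which names the config defines.
settings = {}
exec(sample_config, {}, settings)

for name, value in sorted(settings.items()):
    print(f"{name:<12}: {value}")
```

Because `exec` runs whatever the file contains, only load configuration from a bucket you control.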

### Container Image Variables

In [None]:
import os

root_path = '..'
os.chdir(root_path)
os.getcwd()

In [None]:
DOCKERNAME                = 'Dockerfile_gdelt'

REPOSITORY                = f'zghost-{ACTOR_PREFIX}'
IMAGE_NAME                = f'gdelt-pipe-{VERSION}'

REMOTE_IMAGE_NAME         = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE_NAME}"

print(f"DOCKERNAME        = {DOCKERNAME}")
print(f"REPOSITORY        = {REPOSITORY}")
print(f"IMAGE_NAME        = {IMAGE_NAME}")
print(f"REMOTE_IMAGE_NAME = {REMOTE_IMAGE_NAME}")
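
`REMOTE_IMAGE_NAME` follows the Artifact Registry naming scheme for Docker images: `REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE`, optionally followed by `:TAG`. The helper below is an illustrative sketch of that scheme; the function name and the sample values are assumptions, not part of the project.

```python
def artifact_registry_uri(region, project_id, repository, image_name, tag=None):
    """Compose an Artifact Registry Docker image URI, with an optional tag."""
    uri = f"{region}-docker.pkg.dev/{project_id}/{repository}/{image_name}"
    # Docker treats a URI without a tag as if it ended in ":latest".
    return f"{uri}:{tag}" if tag else uri


# Sample values are placeholders for illustration only.
print(artifact_registry_uri("us-central1", "my-project", "zghost-way", "gdelt-pipe-v1"))
print(artifact_registry_uri("us-central1", "my-project", "zghost-way", "gdelt-pipe-v1", tag="latest"))
```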

### Create Artifact Repository
If you don't have an existing Artifact Registry repository, create one using the gcloud command below.

In [None]:
! gcloud artifacts repositories create $REPOSITORY --repository-format=docker --location=$REGION
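
Re-running the create command against an existing repository returns an error. One possible way to make the step safe to re-run is to call `gcloud artifacts repositories describe` first and create the repository only when that fails. The sketch below assumes `gcloud` is installed and authenticated; `ensure_repository` and its `run` parameter are hypothetical names introduced here for illustration.

```python
import subprocess


def ensure_repository(repository, location, run=subprocess.run):
    """Create a Docker repository unless `describe` shows it already exists."""
    describe = ["gcloud", "artifacts", "repositories", "describe", repository,
                f"--location={location}"]
    # A zero exit code from `describe` means the repository is already there.
    if run(describe, capture_output=True).returncode == 0:
        return "exists"
    create = ["gcloud", "artifacts", "repositories", "create", repository,
              "--repository-format=docker", f"--location={location}"]
    run(create, check=True)  # raises CalledProcessError if creation fails
    return "created"


# Example call (requires gcloud; values are placeholders):
#   ensure_repository("zghost-way", "us-central1")
```

Note that `describe` can also fail for reasons other than a missing repository, such as missing permissions, in which case the subsequent create call raises and surfaces the underlying error.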

## Local Docker build
The Dockerfile name was set above in `DOCKERNAME`. Before building, make sure Docker is authenticated with Artifact Registry.

In [None]:
! gcloud auth configure-docker $REGION-docker.pkg.dev --quiet

Create your Dockerfile.

In [None]:
%%writefile {DOCKERNAME}

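# Base image with Python 3.10
FROM python:3.10

# Send Python output straight to the container logs without buffering
ENV PYTHONUNBUFFERED=True

ENV APP_HOME=/workspace

WORKDIR $APP_HOME

# Install dependencies first so this layer is cached across code changes
COPY notebooks/requirements.txt $APP_HOME/requirements.txt

RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r $APP_HOME/requirements.txt

# Copy the zeitghost package into the image
COPY zeitghost $APP_HOME/zeitghost

# Sanity check: list the copied package contents in the build log
RUN ls zeitghost

# Set PYTHONPATH with ENV so it persists into the running container
ENV PYTHONPATH=$APP_HOME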

### Build Image Locally

In [None]:
!docker build -t $REMOTE_IMAGE_NAME -f $DOCKERNAME .

Once the image has finished building, push it to Artifact Registry. After it's pushed, it can be used in the pipelines.

In [None]:
# Push the container image to Artifact Registry
!docker push $REMOTE_IMAGE_NAME
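
A `!` shell command in a notebook does not raise a Python error when the command fails, so with "Run all" the push cell still runs after a failed build. If that matters for your workflow, one option is to drive both steps from Python and stop on the first failure. This is a minimal sketch, assuming Docker is installed locally; `build_and_push` and its `run` parameter are hypothetical names introduced for illustration.

```python
import subprocess


def build_and_push(image, dockerfile, context=".", run=subprocess.run):
    """Build the image, then push it; stop at the first failing command."""
    commands = [
        ["docker", "build", "-t", image, "-f", dockerfile, context],
        ["docker", "push", image],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed build prevents the push from running.
        run(cmd, check=True)
    return commands


# Example call using the notebook variables defined above:
#   build_and_push(REMOTE_IMAGE_NAME, DOCKERNAME)
```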