
# 00 - Environment Setup

This notebook sets up the GCP project for the other notebooks in this repository.

**Conceptual Flow & Workflow**

<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/00_arch.png" width="70%">
</p>
<p align="center">
  <img alt="Workflow" src="../architectures/slides/00_console.png" width="70%">
</p>

---
## Setup

inputs:

In [7]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'avid-streamer-396319'
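
The `!gcloud` syntax above is IPython shell magic, so it only works inside a notebook. As a minimal sketch for running the same lookup from a plain Python script, the standard library's `subprocess` module can call the same command. The helper name `gcloud_config_value` is hypothetical, and the sketch assumes the `gcloud` CLI is on the `PATH`; it returns `None` when it is not.

```python
import shutil
import subprocess
from typing import Optional


def gcloud_config_value(key: str) -> Optional[str]:
    """Return a gcloud config value (for example 'project'), or None if unavailable."""
    # Hypothetical helper: give up quietly when the gcloud CLI is not installed.
    gcloud_path = shutil.which("gcloud")
    if gcloud_path is None:
        return None
    result = subprocess.run(
        [gcloud_path, "config", "get-value", key],
        capture_output=True,
        text=True,
        check=False,
    )
    value = result.stdout.strip()
    # Treat a failed call, empty output, or an "(unset)" marker as "no value".
    if result.returncode != 0 or value in ("", "(unset)"):
        return None
    return value
```

Calling `gcloud_config_value("project")` would then stand in for the `project[0]` lookup above.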

In [2]:
REGION = 'us-central1'

packages:

In [3]:
from google.cloud import storage
from google.cloud import bigquery

import pandas as pd
from sklearn import datasets

### Clients:

We use the Google Cloud client libraries to interact with Google Cloud Storage (GCS) and BigQuery (BQ):
*  Google Cloud Storage (GCS): a scalable object storage service for storing and retrieving any amount of data securely and cost-effectively. It is designed for files, objects, and unstructured data such as images, videos, and logs.
*  BigQuery: a serverless data warehouse and analytics platform, designed for analyzing large datasets with SQL queries. It excels at processing and querying structured and semi-structured data.


#### Why should we use BigQuery?
*  You can store data in GCS and access it programmatically, but GCS itself has no built-in querying or analytics capabilities. To run analytics, you would typically load the data from GCS into BigQuery or another data warehouse.
*  BigQuery is serverless, so storage and query capacity scale to petabytes of data without you having to buy or provision capacity in advance.
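
To make the querying point concrete, here is a minimal sketch of running SQL through a BigQuery client and collecting the rows as dictionaries. The helper name `run_query` is hypothetical; it relies only on the client's `query()` method and the job's `result()` method, and it accepts any object that offers them.

```python
from typing import Any, Dict, List


def run_query(bq_client: Any, sql: str) -> List[Dict[str, Any]]:
    """Run a SQL statement and return the result rows as plain dictionaries."""
    # query() starts the job; result() waits for it and yields the rows.
    rows = bq_client.query(sql).result()
    return [dict(row.items()) for row in rows]
```

With a `bigquery.Client` such as the `bq` client this notebook creates, a call like `run_query(bq, "SELECT COUNT(*) AS n FROM my_dataset.my_table")` would return a one-element list. The table name `my_dataset.my_table` is a placeholder, not something this repository defines.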


In [4]:
gcs = storage.Client(project = PROJECT_ID)
bq = bigquery.Client(project = PROJECT_ID)
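
Creating a client does not by itself confirm that the credentials work, so a quick listing call is a reasonable sanity check. The sketch below is an assumption about how one might do that: the helper name `list_project_resources` is hypothetical, and it uses the `list_buckets()` and `list_datasets()` methods of the two clients.

```python
from typing import Any, Dict, List


def list_project_resources(gcs_client: Any, bq_client: Any) -> Dict[str, List[str]]:
    """Collect the bucket names and dataset IDs visible to the two clients."""
    return {
        "buckets": sorted(bucket.name for bucket in gcs_client.list_buckets()),
        "datasets": sorted(ds.dataset_id for ds in bq_client.list_datasets()),
    }
```

Calling `list_project_resources(gcs, bq)` raises an exception if the credentials or project are misconfigured, which surfaces the problem before any later cell depends on the clients.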

parameters:

In [5]:
BUCKET = PROJECT_ID
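
Bucket names share a single global namespace, so reusing the project ID is a convenient way to get a name that is likely to be unique. A project ID is not guaranteed to be a valid bucket name, though (for example, one that contains a colon). The sketch below is a simplified check based on the published naming rules; the function name `is_plausible_bucket_name` is hypothetical, and the check deliberately ignores the longer limit for dotted names and the rule against IP-address-like names.

```python
import re

# Simplified pattern: 3-63 characters made of lowercase letters, digits, dots,
# dashes, and underscores, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Simplified check of a candidate bucket name; not the full rule set."""
    if name.startswith("goog") or "google" in name:
        return False  # reserved prefix or substring
    return _BUCKET_NAME.fullmatch(name) is not None
```

Running `is_plausible_bucket_name(BUCKET)` before the bucket is created turns a confusing API error into an early, readable failure.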

---
## Create Storage Bucket
To run a training job on Vertex AI, we need a storage bucket to hold the job's assets (data, saved models, and so on). The bucket needs to be regional; this notebook uses `us-central1`, the value of `REGION`.

Check whether the bucket already exists and create it if it is missing:
- [GCS Python Client](https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.client.Client)

In [8]:
# lookup_bucket returns None when the bucket does not exist
if not gcs.lookup_bucket(BUCKET):
    bucketDef = gcs.bucket(BUCKET)
    bucket = gcs.create_bucket(bucketDef, project=PROJECT_ID, location=REGION)
    print(f'Created Bucket: {bucket.name}')
else:
    bucketDef = gcs.bucket(BUCKET)
    print(f'Bucket already exists: {bucketDef.name}')

Created Bucket: avid-streamer-396319
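
The check-then-create logic above can be wrapped in a small reusable function so that rerunning the notebook is safe. This is a minimal sketch, not the notebook's own code: the name `ensure_bucket` is hypothetical, and it relies on `lookup_bucket()` returning `None` for a missing bucket and on `create_bucket()` accepting a `location` argument.

```python
from typing import Any, Tuple


def ensure_bucket(gcs_client: Any, name: str, location: str) -> Tuple[Any, bool]:
    """Return (bucket, created); created is True only if the bucket was just made."""
    existing = gcs_client.lookup_bucket(name)  # None when the bucket is missing
    if existing is not None:
        return existing, False
    return gcs_client.create_bucket(name, location=location), True
```

With the notebook's variables this would be called as `bucket, created = ensure_bucket(gcs, BUCKET, REGION)`. Returning the `created` flag lets the caller decide what to print instead of hard-coding messages inside the helper.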


In [9]:
print(f'Review the storage bucket in the console here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID};tab=objects&project={PROJECT_ID}')

Review the storage bucket in the console here:
https://console.cloud.google.com/storage/browser/avid-streamer-396319;tab=objects&project=avid-streamer-396319


Alternatively, open Cloud Storage > Buckets in the console.

---
<a id = 'permissions'></a>
## Service Account & Permissions

A service account is a special account used by an application or a virtual machine (VM) instance rather than by a person. You grant roles to a service account to give it specific permissions on a resource or application.

-  Administrators or project owners can manage service accounts in the Google Cloud Console. They can create and modify service accounts and assign roles that control access to resources and services, including Vertex AI.
-  This notebook instance runs as a service account in the Google Cloud project. The same service account will also be used to run other Vertex AI services, such as training jobs and pipelines. It needs permission to interact with objects in Cloud Storage, which requires the role [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles).

Get the current service account:

In [10]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'114910882374-compute@developer.gserviceaccount.com'

**Enable the Cloud Resource Manager API:**
-  The Google Cloud Resource Manager API is responsible for managing Google Cloud resources, such as projects, folders, and organizations. Enabling this API allows you to create, manage, and organize your Google Cloud resources through API calls and other interactions.

In [14]:
!gcloud services enable cloudresourcemanager.googleapis.com

List the service account's current roles:

In [18]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/editor
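
Deciding whether the service account already holds a suitable role can also be done in code. The sketch below parses the table output shown above; the names `SUFFICIENT_ROLES`, `parse_roles`, and `has_object_access` are hypothetical, and the set of sufficient roles contains only the two roles this notebook names, so extend it if your project relies on a different role.

```python
from typing import Iterable, List, Set

# Assumption: only the roles this notebook names as sufficient are listed here.
SUFFICIENT_ROLES: Set[str] = {"roles/storage.objectAdmin", "roles/owner"}


def parse_roles(lines: Iterable[str]) -> List[str]:
    """Extract role names from the table output, skipping the header and blanks."""
    # Custom roles (projects/.../roles/...) are ignored by this simple filter.
    return [line.strip() for line in lines if line.strip().startswith("roles/")]


def has_object_access(lines: Iterable[str]) -> bool:
    """Return True if any listed role is in SUFFICIENT_ROLES."""
    return any(role in SUFFICIENT_ROLES for role in parse_roles(lines))
```

The input is the list of output lines that IPython returns when a `!gcloud` command is assigned to a variable, as this notebook already does for `SERVICE_ACCOUNT`.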


If the resulting list is missing `roles/storage.objectAdmin` or another role that contains this permission, like the basic role `roles/owner`, then it will need to be added for the service account. Use these instructions to complete this:

In [16]:
print(f'Go To IAM in the Google Cloud Console:\nhttps://console.cloud.google.com/iam-admin/iam?orgonly=true&project={PROJECT_ID}&supportedpurview=organizationId')

Go To IAM in the Google Cloud Console:
https://console.cloud.google.com/iam-admin/iam?orgonly=true&project=avid-streamer-396319&supportedpurview=organizationId


From the console link above, or by going to https://console.cloud.google.com and navigating to "IAM & Admin > IAM":
- Locate the row for the service account listed above: `<project number>-compute@developer.gserviceaccount.com`
- Under the `inheritance` column, click the pencil icon to edit roles
- In the flyout panel, under `Assign roles`, select `Add Another Role`
- Click the `Select a role` box and type `Storage Object Admin`, then select `Storage Object Admin`
- Click Save
- Rerun the role listing below and verify that the role has been added:

In [20]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/editor
roles/owner
roles/storage.objectAdmin
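
As an alternative to the console steps, the same role binding can be added with the `gcloud projects add-iam-policy-binding` command. The sketch below only builds the argument list; the helper name `grant_role_command` is hypothetical, and actually running the command assumes the caller is allowed to change the project's IAM policy.

```python
import shlex
from typing import List


def grant_role_command(project_id: str, service_account: str, role: str) -> List[str]:
    """Build the gcloud argument list that binds a role to a service account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account}",
        f"--role={role}",
    ]


cmd = grant_role_command(
    "avid-streamer-396319",
    "114910882374-compute@developer.gserviceaccount.com",
    "roles/storage.objectAdmin",
)
# Print the command for review before running it.
print(shlex.join(cmd))
```

After reviewing the printed command, pass `cmd` to `subprocess.run(cmd, check=True)` to execute it, then rerun the role listing to confirm the change.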


---
## Install KFP

Vertex AI Pipelines lets you orchestrate your machine learning (ML) workflows in a serverless manner. Before Vertex AI Pipelines can orchestrate your ML workflow, you must describe your workflow as a pipeline. ML pipelines are portable and scalable ML workflows that are based on containers and Google Cloud services.

Vertex AI Pipelines can run pipelines built with SDKs such as the Kubeflow Pipelines (KFP) SDK, which is installed here.

If a step fails with an error, rerun it. The dependencies sometimes resolve on a second attempt.
- [Install the Kubeflow Pipelines SDK](https://www.kubeflow.org/docs/components/pipelines/v1/sdk/install-sdk/)

In [21]:
!pip install kfp -U -q

Install the Kubeflow components that interact with Vertex AI:

In [22]:
!pip install google-cloud-pipeline-components -U -q

---
## Update AIPlatform Package:

The `google-cloud-aiplatform` package updates frequently. Update it to get the latest functionality.

- [aiplatform Python Client](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform)
- [GitHub Repo for api-common-protos](https://github.com/googleapis/api-common-protos)

For a better understanding of the Vertex AI APIs client, version, and layers please review the tip here [aiplatform_notes.md](../Tips/aiplatform_notes.md).

In [23]:
!pip install googleapis-common-protos -U -q

In [24]:
!pip install google-cloud-aiplatform -U -q

In [25]:
from google.cloud import aiplatform
aiplatform.__version__

'1.31.0'
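
To confirm all four upgrades in one place, the installed versions can be read from package metadata with the standard library. This is a minimal sketch; the names `PACKAGES` and `installed_versions` are hypothetical, and the package list simply mirrors the `pip install` commands in this notebook.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

PACKAGES = (
    "kfp",
    "google-cloud-pipeline-components",
    "googleapis-common-protos",
    "google-cloud-aiplatform",
)


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Note that `aiplatform.__version__` reports the module already loaded in the running kernel, while `installed_versions()` reads what is installed on disk. If the two differ after an upgrade, restart the kernel.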