# Getting Started

Before we proceed with the labs, we need to make sure that the required Google Cloud services are enabled, the Python packages are installed, and the global variables are set.

These variables, packages, and services support the three different architectures we will build along the way. More details are presented in each lab.

### Architecture 1 - API Centric
<img src="./images/4_arch_front.png"
     style="width:70%"
     />

### Architecture 2 - Event Driven
<img src="./images/5_arch_event.png"
     style="width:70%"
     />

### Architecture 3 - Data Visualization
<img src="./images/6_arch_dataviz.png"
     style="width:70%"
     />



## Install required packages

The following Python packages are required to run the labs. Let's make sure they are installed.

In [None]:
%pip install -U google-cloud-core
%pip install -U google-cloud-resource-manager
%pip install -U google-cloud-documentai
%pip install -U google-cloud-storage
%pip install -U google-cloud-firestore
%pip install -U google-cloud-dlp
%pip install -U google-cloud-language
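If you want to confirm that the installs succeeded, a small check like the one below may help. It is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `REQUIRED_PACKAGES` and `installed_versions` are illustrative names, not part of the labs. Note that after `%pip install` the kernel may need a restart before newly installed versions are importable.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names installed by the %pip cell above
REQUIRED_PACKAGES = [
    "google-cloud-core",
    "google-cloud-resource-manager",
    "google-cloud-documentai",
    "google-cloud-storage",
    "google-cloud-firestore",
    "google-cloud-dlp",
    "google-cloud-language",
]


def installed_versions(packages):
    """Return {package: installed version string, or None if it is missing}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```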

## Activate the APIs

We need to enable all the APIs used in the labs.

In [1]:
# This operation may take a few seconds
!gcloud services enable documentai.googleapis.com storage.googleapis.com automl.googleapis.com bigquery.googleapis.com
!gcloud services enable cloudapis.googleapis.com cloudfunctions.googleapis.com cloudresourcemanager.googleapis.com
!gcloud services enable cloudscheduler.googleapis.com containerregistry.googleapis.com dataflow.googleapis.com dlp.googleapis.com
!gcloud services enable eventarc.googleapis.com firestore.googleapis.com language.googleapis.com run.googleapis.com
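To double-check which of these APIs are still disabled, you could compare the list above with the output of `gcloud services list --enabled`. The sketch below assumes that output was produced with `--format="value(config.name)"`, i.e. one service name per line; `REQUIRED_SERVICES` and `missing_services` are illustrative names.

```python
# APIs enabled by the cell above
REQUIRED_SERVICES = {
    "documentai.googleapis.com", "storage.googleapis.com",
    "automl.googleapis.com", "bigquery.googleapis.com",
    "cloudapis.googleapis.com", "cloudfunctions.googleapis.com",
    "cloudresourcemanager.googleapis.com", "cloudscheduler.googleapis.com",
    "containerregistry.googleapis.com", "dataflow.googleapis.com",
    "dlp.googleapis.com", "eventarc.googleapis.com",
    "firestore.googleapis.com", "language.googleapis.com",
    "run.googleapis.com",
}


def missing_services(required, enabled_output):
    """Return, sorted, the required services absent from the gcloud output.

    enabled_output is assumed to hold one service name per line.
    """
    enabled = {line.strip() for line in enabled_output.splitlines() if line.strip()}
    return sorted(set(required) - enabled)
```

In the notebook, the output could be captured with `enabled = !gcloud services list --enabled --format="value(config.name)"` and checked with `missing_services(REQUIRED_SERVICES, "\n".join(enabled))`; an empty list means every API is on.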

## Get project ID using resource manager API

Let's use the resource manager API to retrieve the project ID.

In [50]:
# Resource manager to get project_id
from google.cloud import resource_manager

In [51]:
# Set the name prefix of the project
project_name_prefix = 'cool-ml-'

In [52]:
# Get project_id using Resource Manager API
res_client = resource_manager.Client()

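# Default value, in case no project matches the filter
PROJECT_ID = ''
project_filter = {'name': project_name_prefix + '*'}

# If several projects match, the ID of the last one listed is kept
for project in res_client.list_projects(filter_params=project_filter):
    PROJECT_ID = project.project_id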

In [53]:
# Make sure the project name starts with the prefix defined above
print(f'The ID of your project is: {PROJECT_ID}')

The ID of your project is: cool-ml-demos
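The loop above silently leaves `PROJECT_ID` empty when nothing matches and keeps an arbitrary match when several do. A stricter selection step could look like the sketch below; `select_project_id` is a hypothetical helper that works on plain `(name, project_id)` pairs, so it can be tested without calling the Resource Manager API.

```python
def select_project_id(projects, name_prefix):
    """Return the project ID of the first project (sorted by name) whose name has the prefix.

    projects is an iterable of (name, project_id) pairs.
    """
    matches = sorted((name, pid) for name, pid in projects if name.startswith(name_prefix))
    if not matches:
        raise ValueError(f"No project name starts with {name_prefix!r}")
    if len(matches) > 1:
        names = [name for name, _ in matches]
        print(f"Warning: several projects match {name_prefix!r}: {names}; using {names[0]!r}")
    return matches[0][1]
```

Assuming the listed project objects expose `name` and `project_id` attributes, it could be called as `select_project_id(((p.name, p.project_id) for p in res_client.list_projects(filter_params=project_filter)), project_name_prefix)`. Raising an error early is a deliberate choice: an empty project ID would otherwise only surface later as confusing `gcloud` or `gsutil` failures.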


## Create the local service account key

This key is required by some API calls, such as generating a signed URL for Cloud Storage.

In [55]:
SERVICE_ACCOUNT_ID = 'sa-geral'
FULL_SERVICE_ACCOUNT_ID = f'{SERVICE_ACCOUNT_ID}@{PROJECT_ID}.iam.gserviceaccount.com'
MEMBER = f'serviceAccount:{FULL_SERVICE_ACCOUNT_ID}'
ROLE = 'roles/editor'
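The email and member strings above follow a fixed pattern, so they can be derived from the account ID and project ID in one place. The sketch below does that and also rejects IDs that `gcloud iam service-accounts create` would refuse; the rule encoded here (6 to 30 characters, starting with a lowercase letter, then lowercase letters, digits, or hyphens, not ending with a hyphen) is our understanding of the IAM constraint, and `service_account_identity` is a hypothetical helper.

```python
import re

# Assumed IAM rule for service account IDs: 6-30 characters, lowercase letter
# first, then lowercase letters/digits/hyphens, no trailing hyphen.
_SA_ID_PATTERN = re.compile(r"[a-z][-a-z0-9]{4,28}[a-z0-9]")


def service_account_identity(account_id, project_id):
    """Return (email, IAM member string) for a user-managed service account."""
    if not _SA_ID_PATTERN.fullmatch(account_id):
        raise ValueError(f"Invalid service account ID: {account_id!r}")
    email = f"{account_id}@{project_id}.iam.gserviceaccount.com"
    return email, f"serviceAccount:{email}"
```

For example, `FULL_SERVICE_ACCOUNT_ID, MEMBER = service_account_identity(SERVICE_ACCOUNT_ID, PROJECT_ID)` would replace the two f-strings in the cell above.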

In [None]:
# Create a Service Account without any roles
!gcloud iam service-accounts create $SERVICE_ACCOUNT_ID \
    --description="Interact with GCP Services" \
    --display-name="SA Demo"

In [None]:
# Assign a role for this account
!gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member=$MEMBER \
    --role=$ROLE

In [42]:
# Download the key to a local directory
!gcloud iam service-accounts keys create ~/key.json \
  --iam-account $FULL_SERVICE_ACCOUNT_ID

created key [56079b9ee7e93fca75691d564ffea7ca40641db9] of type [json] as [/home/jupyter/key.json] for [sa-geral@cool-ml-demos.iam.gserviceaccount.com]
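A JSON service account key contains fields such as `type`, `project_id`, `private_key_id`, `private_key`, and `client_email`. The sketch below loads `~/key.json` and checks those fields before the key is used, for instance with `google.cloud.storage.Client.from_service_account_json`. `load_service_account_key` is an illustrative helper; since the file holds a private key, keep it out of version control.

```python
import json
from pathlib import Path

# Fields present in a JSON service account key file
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def load_service_account_key(path):
    """Load a JSON key file and check that it looks like a service account key."""
    data = json.loads(Path(path).expanduser().read_text())
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Key file is missing fields: {missing}")
    if data["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {data['type']!r}")
    return data
```

Usage: `key = load_service_account_key("~/key.json")`, after which `key["client_email"]` should equal `FULL_SERVICE_ACCOUNT_ID`.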


## Define REGION_NAME

Some services need a region; for example, the Cloud Storage buckets below are created in this region.

In [79]:
REGION_NAME = 'us-east1'

## Google Cloud Storage

We need to create four buckets to build our architecture. As the diagram shows, these buckets are common to all the architectures we will develop.

<img src="./images/4_1_arch_storage.png"
     style="width:70%"
     />

 - **Original Document**: Store the original file uploaded by the user
 - **Raw Results**: Results from the Document AI processing
 - **Anonymized Documents**: Results from the DLP processing
 - **Test** (not shown in the diagram): Test the Document AI and DLP APIs
 
These buckets store the original files and the Document AI and DLP results. We will create them with the gsutil utility, although the web console works as well.

In [80]:
# First define the names of the buckets
TEST_BUCKET = f'{PROJECT_ID}-test'
ORIGINAL_BUCKET = f'{PROJECT_ID}-original'
DOCAI_BUCKET = f'{PROJECT_ID}-docai'
DLP_BUCKET = f'{PROJECT_ID}-dlp'
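Bucket names are global across all of Cloud Storage, which is why each one is prefixed with the project ID. The sketch below builds the same four names and checks them against a simplified subset of the bucket naming rules (dotless names only, 3 to 63 characters, no reserved `goog` prefix); the full rules have more cases, and `lab_bucket_names` is a hypothetical helper.

```python
import re

# Simplified subset of the Cloud Storage bucket naming rules (dotless names only):
# 3-63 characters of lowercase letters, digits, hyphens and underscores,
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")

# Suffixes used by the cell above
BUCKET_SUFFIXES = ("test", "original", "docai", "dlp")


def lab_bucket_names(project_id):
    """Return {suffix: '<project_id>-<suffix>'}, rejecting names that break the rules."""
    names = {suffix: f"{project_id}-{suffix}" for suffix in BUCKET_SUFFIXES}
    for name in names.values():
        # Names may not start with the reserved "goog" prefix
        if not _BUCKET_NAME.fullmatch(name) or name.startswith("goog"):
            raise ValueError(f"Invalid bucket name: {name!r}")
    return names
```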

In [None]:
# Create a bucket in GCS to store the original documents
!gsutil mb -p $PROJECT_ID -c regional -l $REGION_NAME gs://$ORIGINAL_BUCKET/

# Create a bucket in GCS to store the Document AI results
!gsutil mb -p $PROJECT_ID -c regional -l $REGION_NAME gs://$DOCAI_BUCKET/

# Create a bucket in GCS to store the DLP results
!gsutil mb -p $PROJECT_ID -c regional -l $REGION_NAME gs://$DLP_BUCKET/

# Create a bucket in GCS for testing
!gsutil mb -p $PROJECT_ID -c regional -l $REGION_NAME gs://$TEST_BUCKET/
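The four `gsutil mb` calls differ only in the bucket name, so they could also be generated in a loop. The sketch below builds the same command lines and only runs them through `subprocess` when `dry_run=False`; the helper names are hypothetical, and it assumes `gsutil` is on the `PATH` when actually executing.

```python
import subprocess


def make_bucket_command(project_id, region, bucket):
    """Return the gsutil argv that the cell above runs for one bucket."""
    return ["gsutil", "mb", "-p", project_id, "-c", "regional", "-l", region, f"gs://{bucket}/"]


def create_buckets(project_id, region, buckets, dry_run=True):
    """Build one `gsutil mb` command per bucket and run them unless dry_run is set.

    A non-zero exit code (for example, when a bucket already exists) is
    reported and the loop moves on to the next bucket.
    """
    commands = [make_bucket_command(project_id, region, b) for b in buckets]
    if not dry_run:
        for cmd in commands:
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print(f"Command failed with exit code {result.returncode}: {' '.join(cmd)}")
    return commands
```

Usage: `create_buckets(PROJECT_ID, REGION_NAME, [ORIGINAL_BUCKET, DOCAI_BUCKET, DLP_BUCKET, TEST_BUCKET], dry_run=False)`. Reporting failures instead of stopping means a re-run still attempts every bucket.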

Before we continue, navigate to the Google Cloud console and check that the buckets were created:

<img src="./images/1_9_storage.png"
     style="width:30%"
     />
     
If you run into any problems, re-execute the previous cell or create the buckets in the web console.
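The same check can be done from the notebook: `gsutil ls -p PROJECT_ID` lists the project's buckets, one `gs://name/` URL per line. The sketch below parses such a listing and reports the expected buckets that are missing; `missing_buckets` is an illustrative helper, and the one-URL-per-line format is an assumption about the listing.

```python
def missing_buckets(expected, ls_output):
    """Return, sorted, the expected bucket names absent from `gsutil ls` output.

    ls_output is assumed to contain one URL per line, such as 'gs://name/'.
    """
    found = set()
    for line in ls_output.splitlines():
        line = line.strip()
        if line.startswith("gs://"):
            found.add(line[len("gs://"):].rstrip("/"))
    return sorted(set(expected) - found)
```

In the notebook, `listing = !gsutil ls -p $PROJECT_ID` followed by `missing_buckets([ORIGINAL_BUCKET, DOCAI_BUCKET, DLP_BUCKET, TEST_BUCKET], "\n".join(listing))` should return an empty list once all four buckets exist.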

## IMPORTANT: Execute the next two cells

They append the variables defined above to `./docai_module/config.py`.

In [73]:
%%capture output --no-stderr
print(f'SERVICE_ACCOUNT_ID = \'{SERVICE_ACCOUNT_ID}\'')
print(f'FULL_SERVICE_ACCOUNT_ID = \'{FULL_SERVICE_ACCOUNT_ID}\'')
print(f'PROJECT_ID = \'{PROJECT_ID}\'')
print(f'TEST_BUCKET = \'{TEST_BUCKET}\'')
print(f'ORIGINAL_BUCKET = \'{ORIGINAL_BUCKET}\'')
print(f'DOCAI_BUCKET = \'{DOCAI_BUCKET}\'')
print(f'DLP_BUCKET = \'{DLP_BUCKET}\'')

In [74]:
# Append the captured assignments to the module's configuration file
with open('./docai_module/config.py', 'a') as f:
    f.write(output.stdout)
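Because the file is opened in append mode, running these cells again adds a second copy of every assignment (harmless in Python, since the last one wins, but untidy). If `config.py` contains only these generated variables, an alternative is to overwrite it with `repr`-quoted values, as in the sketch below; `write_config` is a hypothetical helper, and if the module ships other settings in that file you should keep the original append instead.

```python
from pathlib import Path


def write_config(path, values):
    """Overwrite path with one `NAME = 'value'` line per entry in values.

    Unlike appending, re-running this leaves a single copy of each variable.
    """
    lines = [f"{name} = {value!r}" for name, value in values.items()]
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n")
    return path.read_text()
```

Usage: `write_config('./docai_module/config.py', {'SERVICE_ACCOUNT_ID': SERVICE_ACCOUNT_ID, 'PROJECT_ID': PROJECT_ID, 'DLP_BUCKET': DLP_BUCKET})`, listing the same variables as the `print` calls above.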