<a href="https://colab.research.google.com/github/gstripling00/introduction_to_neural_networks/blob/Notebooks/Vertex_AI_Batch_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vertex AI Batch Prediction

This is the **third** part of a series. I recommend reading the previous articles first, as they are prerequisites for this part.

1. [Train Machine Learning models with Vertex AI Training](https://medium.com/google-cloud/how-to-train-ml-models-with-vertex-ai-training-f9046bfbcfab)
2. [Serving Machine Learning models with Google Vertex AI](https://medium.com/google-cloud/serving-machine-learning-models-with-google-vertex-ai-5d9644ededa3)
3. Vertex AI Batch Prediction (this notebook; the article link will be added once it is published)


Your feedback and questions are highly appreciated. <br>You can find me on Twitter [@HeyerSascha](https://twitter.com/HeyerSascha) or connect with me via [LinkedIn](https://www.linkedin.com/in/saschaheyer/). <br>Even better, subscribe to my [YouTube](https://www.youtube.com/channel/UC--Sm3D-rqCUeLXmraypdPQ) channel ❤️.

## Authentication and Dependencies

In [None]:
from google.colab import auth
auth.authenticate_user()

In [None]:
! gcloud config set project sascha-playground-doit

In [None]:
!pip install google-cloud-aiplatform==1.20.0 "shapely<2" # temp workaround see https://github.com/googleapis/python-aiplatform/issues/1852

## Init SDK

In [None]:
from google.cloud import aiplatform 

In [None]:
aiplatform.init(project='sascha-playground-doit')

## Upload model

If you want to know how the container image (the prediction container) was created, check out my previous article: <br>[Serving Machine Learning models with Google Vertex AI](https://medium.com/google-cloud/serving-machine-learning-models-with-google-vertex-ai-5d9644ededa3)

### gcloud

In [None]:
!gcloud ai models upload \
  --container-ports=80 \
  --container-predict-route="/predict" \
  --container-health-route="/health" \
  --region=us-central1 \
  --display-name=sentiment-batch-example \
  --container-image-uri=gcr.io/sascha-playground-doit/sentiment-fast-api

### SDK

In [None]:
model = aiplatform.Model.upload(
    display_name="sentiment-batch-example",
    serving_container_image_uri="gcr.io/sascha-playground-doit/sentiment-fast-api",
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[80],
)

## Reference the model

In [None]:
# If the model is already uploaded, create a reference to it instead of uploading it again.
model = aiplatform.Model('projects/sascha-playground-doit/locations/us-central1/models/6575275247170224128')
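The resource name passed to `aiplatform.Model` follows the pattern `projects/<project>/locations/<region>/models/<model id>`. A minimal sketch of assembling it from its parts; the helper name `model_resource_name` is hypothetical and not part of the SDK:

```python
def model_resource_name(project: str, location: str, model_id: str) -> str:
    """Build the fully qualified Vertex AI model resource name."""
    return f"projects/{project}/locations/{location}/models/{model_id}"


# Values taken from the cell above.
name = model_resource_name(
    "sascha-playground-doit", "us-central1", "6575275247170224128"
)
print(name)
```

The resulting string can be passed to `aiplatform.Model(...)` in place of the hard-coded name.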

## Start Batch Prediction / Cloud Storage

In [None]:
batch_prediction_job = model.batch_predict(
    instances_format='jsonl',
    job_display_name="batch_predict_sentiment",
    gcs_source=['gs://doit-vertex-demo/batch/input/batch-key-2.jsonl'],
    gcs_destination_prefix='gs://doit-vertex-demo/batch/output',
    machine_type="n1-standard-4",
    starting_replica_count=4,
)
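
Vertex AI writes the results as JSON Lines files under the destination prefix. Assuming each output line holds the original `instance` next to its `prediction` (the shape of `prediction` depends on what the container returns, and the sample values below are invented for illustration), the results can be matched back to their inputs by `key`. The helper `predictions_by_key` is hypothetical:

```python
import json


def predictions_by_key(lines):
    """Map each instance key to its prediction.

    Assumes every line is a JSON object with an "instance" that carries a
    "key" field and a "prediction" field next to it.
    """
    results = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        results[record["instance"]["key"]] = record["prediction"]
    return results


# Invented sample lines; real prediction values depend on the container.
sample_output = [
    '{"instance": {"text": "Great movie", "key": 0}, "prediction": "positive"}',
    '{"instance": {"text": "Too long", "key": 1}, "prediction": "negative"}',
]
results = predictions_by_key(sample_output)
print(results)
```

This is why the input records carry a `key` field: output lines can then be matched to their inputs by key rather than by position.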

## Start Batch Prediction / BigQuery

In [None]:
batch_prediction_job = model.batch_predict(
    job_display_name="batch_predict_sentiment",
    machine_type="n1-standard-4",
    starting_replica_count=2,

    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source='bq://sascha-playground-doit.batch.data',
    bigquery_destination_prefix="bq://sascha-playground-doit.batch",
)

## IMDB Dataset preprocessing (dataset used in this notebook)

The batch prediction runs on the test split of the IMDB dataset.

https://ai.stanford.edu/~amaas/data/sentiment/

In [None]:
!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

In [None]:
!tar -xvf "/content/aclImdb_v1.tar.gz"

In [None]:
!pip install jsonlines==3.1.0

In [None]:
import os

import jsonlines

sentences = []      # records written to the JSONL input file
sentences_raw = []  # plain review texts, kept for quick inspection

index = 0  # running key that identifies each instance

# Negative reviews are read first, then positive ones.
for path in ('/content/aclImdb/test/neg', '/content/aclImdb/test/pos'):
    for filename in os.listdir(path):
        if filename.endswith(".txt"):
            with open(os.path.join(path, filename), "r", encoding="utf-8") as file:
                text = file.read()
            sentences.append({"text": text, "key": index, "test": "test"})
            sentences_raw.append(text)
            index += 1

# Write one JSON object per line.
with jsonlines.open('batch-key-2.jsonl', 'w') as writer:
    writer.write_all(sentences)



In [None]:
print(sentences_raw[1])
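
Each line of the JSONL input file is one self-contained JSON object, which is the layout `instances_format='jsonl'` refers to. A small sketch using only the standard library that writes two made-up records in the same shape and reads them back to check the format; the file name `sample.jsonl` and the review texts are hypothetical:

```python
import json
import os
import tempfile

records = [
    {"text": "Loved it.", "key": 0, "test": "test"},
    {"text": "Not my taste.", "key": 1, "test": "test"},
]

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")

# One JSON object per line, with no enclosing list.
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Reading the file back line by line restores the records.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded), loaded[0]["key"])
```

The same read-back check can be pointed at `batch-key-2.jsonl` before the file is used as batch prediction input.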