# Goal of this notebook
* Run end-to-end training with scikit-learn
* Save the model and register it in Vertex AI
* Make batch predictions with the registered model
* Make local predictions with the registered model
* Deploy the registered model to an endpoint and make online predictions

**For Documentation Needs**

* High-level documentation on deploying models can be found [here](https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api)
* The Vertex AI SDK for Python reference documentation is [here](https://cloud.google.com/python/docs/reference/aiplatform/latest/aiplatform)
* The AutoML reference for the batch prediction input format is [here](https://cloud.google.com/automl/docs/reference/rest/v1/projects.locations.models/batchPredict#BatchPredictInputConfig)


In [196]:
PROJECT_ID = 'YOUR-PROJECT-ID'  # set this to your project ID
BUCKET = "gs://YOUR-BUCKET"  # create the bucket first, e.g. gsutil mb -l <REGION> gs://YOUR-BUCKET

### Register the model in Vertex AI
  1. Export the model artifact - [Guide](https://cloud.google.com/vertex-ai/docs/training/exporting-model-artifacts#scikit-learn)
  2. Upload (import) the model to the Vertex AI Model Registry using the artifact - [Guide](https://cloud.google.com/vertex-ai/docs/model-registry/import-model)
  3. Use the pre-built container guide to choose the right serving container (unless you bring your own custom container) - [Guide](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers)
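Step 1 comes down to writing the trained estimator to a file that the pre-built scikit-learn serving container can load; those containers look for a file named `model.joblib` or `model.pkl` in the artifact directory. Below is a minimal stdlib sketch of that export plus a round-trip check. `export_artifact` and `load_artifact` are hypothetical helper names, and a plain dict stands in for the fitted estimator (in this notebook it would be `classifier`).

```python
import os
import pickle
import tempfile


def export_artifact(model, artifact_dir):
    """Pickle `model` as model.pkl inside artifact_dir and return the file path."""
    os.makedirs(artifact_dir, exist_ok=True)
    # The pre-built scikit-learn serving containers look for model.joblib or model.pkl.
    path = os.path.join(artifact_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def load_artifact(path):
    """Unpickle a model artifact from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for a fitted estimator.
stand_in_model = {"classes": list(range(10)), "n_features": 64}

artifact_dir = tempfile.mkdtemp()
path = export_artifact(stand_in_model, artifact_dir)
restored = load_artifact(path)
print(os.path.basename(path), restored == stand_in_model)  # model.pkl True
```

Checking the round trip locally before uploading catches pickling problems early, before they surface as a failed model upload or a serving error.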

In [197]:
import os
import pickle

from google.cloud import storage
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

digits = datasets.load_digits()
classifier = RandomForestClassifier()
classifier.fit(digits.data, digits.target)

artifact_filename = 'digits.pkl'

# Save the model artifact to the local filesystem (not persisted; the upload below stages a copy in GCS)
local_path = artifact_filename
with open(local_path, 'wb') as model_file:
    pickle.dump(classifier, model_file)

In [198]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location='us-central1')  # region used throughout this notebook

model = aiplatform.Model.upload_scikit_learn_model_file(
        display_name='digits test',
        model_file_path=local_path,
        description='test for deploying models to vertex',
        sync=False,  # run asynchronously so the notebook is not blocked while the model is created
    )  # with no sklearn_version given, the SDK picks the latest pre-built scikit-learn serving container

In [199]:
model

<google.cloud.aiplatform.models.Model object at 0x7feba3217390> is waiting for upstream dependencies to complete.

Creating Model
Create Model backing LRO: projects/633325234048/locations/us-central1/models/674647140663820288/operations/4328195155866681344


### The model is now uploaded and ready for batch predictions
Python SDK guide: [here](https://cloud.google.com/vertex-ai/docs/predictions/batch-predictions)

The same [guide](https://cloud.google.com/vertex-ai/docs/predictions/batch-predictions) describes which input formats batch prediction accepts. For scikit-learn models that take arrays, the recommended format is JSON Lines with one list of features per line.
[One more guide on AutoML here for the input format](https://cloud.google.com/automl/docs/reference/rest/v1/projects.locations.models/batchPredict#BatchPredictInputConfig)
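For a model that takes a flat feature array, each line of the JSON Lines input file is one instance: a JSON list of that row's feature values. Here is a minimal stdlib sketch of writing such a file and reading it back to confirm the shape. `write_jsonl_instances` and `read_jsonl_instances` are hypothetical helper names, and the two short rows stand in for `digits.data` (whose rows have 64 features each).

```python
import json
import os
import tempfile


def write_jsonl_instances(rows, path):
    """Write one JSON list of features per line, with no trailing blank line."""
    with open(path, "w") as f:
        f.write("\n".join(json.dumps(list(row)) for row in rows))


def read_jsonl_instances(path):
    """Parse the file back into a list of feature lists, skipping empty lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


rows = [[0.0, 1.0, 16.0], [3.0, 0.0, 7.0]]
path = os.path.join(tempfile.mkdtemp(), "instances.jsonl")
write_jsonl_instances(rows, path)

with open(path) as f:
    print(f.read())
print(read_jsonl_instances(path) == rows)
```

The file contains `[0.0, 1.0, 16.0]` and `[3.0, 0.0, 7.0]` on separate lines, and the round trip prints `True`.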

In [200]:
import json

# Write the input as JSON Lines: one list of features per line, per the batch-prediction docs
gcs_input_uri = BUCKET + "/data/digits_test.jsonl"
with open("digits_test.jsonl", "w") as f:
    # join with "\n" so there is no trailing newline after the last instance
    f.write("\n".join(json.dumps(row.tolist()) for row in digits.data))

# Upload the file to the GCS path used as the batch job's source
blob = storage.blob.Blob.from_string(gcs_input_uri, client=storage.Client())
blob.upload_from_filename("digits_test.jsonl")

Model created. Resource name: projects/633325234048/locations/us-central1/models/674647140663820288
To use this Model in another session:
model = aiplatform.Model('projects/633325234048/locations/us-central1/models/674647140663820288')


In [201]:
batch_prediction_job = model.batch_predict(
        job_display_name='test batch predict job sklearn',
        gcs_source=gcs_input_uri,
        gcs_destination_prefix=BUCKET+"/predictions",
        machine_type='n1-standard-2',
        # accelerator_count=accelerator_count,
        # accelerator_type=accelerator_type, #if you want gpus
        starting_replica_count=1,
        max_replica_count=2,
        sync=False,
    )

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/633325234048/locations/us-central1/batchPredictionJobs/6640973815708909568
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/633325234048/locations/us-central1/batchPredictionJobs/6640973815708909568')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/6640973815708909568?project=633325234048
BatchPredictionJob projects/633325234048/locations/us-central1/batchPredictionJobs/6640973815708909568 current state:
JobState.JOB_STATE_PENDING
BatchPredictionJob projects/633325234048/locations/us-central1/batchPredictionJobs/6640973815708909568 current state:
JobState.JOB_STATE_PENDING
BatchPredictionJob projects/633325234048/locations/us-central1/batchPredictionJobs/6640973815708909568 current state:
JobState.JOB_STATE_PENDING
BatchPredictionJob projects/633325234048/locations/us-central1/batchPredictionJobs/
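When the job finishes, its results land under `gcs_destination_prefix` as JSON Lines files. The sketch below assumes that, as for custom-trained models, each output line is a JSON object with `instance` and `prediction` keys; confirm this against one of your own output files before relying on it. `parse_prediction_lines` is a hypothetical helper, and the sample lines are made up to illustrate the assumed format, not real job output.

```python
import json


def parse_prediction_lines(lines):
    """Return (instance, prediction) pairs from batch-prediction JSON Lines output.

    Assumes each non-empty line is an object with "instance" and "prediction" keys.
    """
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        pairs.append((record["instance"], record["prediction"]))
    return pairs


# Illustrative lines in the assumed output format (not real job output).
sample_output = [
    '{"instance": [0.0, 1.0, 16.0], "prediction": 0.0}',
    '{"instance": [3.0, 0.0, 7.0], "prediction": 8.0}',
    "",
]
for instance, prediction in parse_prediction_lines(sample_output):
    print(int(prediction), instance)
```

To read real results, download the files under the prediction prefix (for example with `gsutil cp`) and pass each file's lines to the function.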

### Make local predictions with registered model

Paste the resource name of the model to load (printed above as `model.resource_name`).
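Because the resource name is pasted by hand, a quick format check can catch copy/paste mistakes before calling `aiplatform.Model(...)`. This is a minimal sketch using the stdlib `re` module; `parse_model_resource_name` is a hypothetical helper, and the example name is the one printed earlier in this notebook.

```python
import re

# Expected shape: projects/<project>/locations/<location>/models/<model_id>
MODEL_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/models/(?P<model_id>[^/]+)$"
)


def parse_model_resource_name(name):
    """Split a model resource name into its project, location, and model ID parts."""
    match = MODEL_NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a model resource name: {name!r}")
    return match.groupdict()


parts = parse_model_resource_name(
    "projects/633325234048/locations/us-central1/models/674647140663820288"
)
print(parts)
```

This prints `{'project': '633325234048', 'location': 'us-central1', 'model_id': '674647140663820288'}`. A truncated or mangled paste raises `ValueError` instead of failing later inside an API call.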

In [186]:
# Reload the registered model by resource name and look up where its artifact is stored

model = aiplatform.Model('projects/633325234048/locations/us-central1/models/3962274868644282368')
artifact_uri = model.to_dict()['artifactUri']

print(f"The saved model's artifact path is: {artifact_uri}")

The saved model's artifact path is: gs://wortz-project-vertex-staging-us-central1/vertex_ai_auto_staging/2022-04-22-16:28:47.148


In [193]:
# Copy the artifact from GCS and load it locally
!gsutil cp $artifact_uri'/model.pkl' .
with open('model.pkl', 'rb') as artifact_file:
    local_artifact = pickle.load(artifact_file)


Copying gs://wortz-project-vertex-staging-us-central1/vertex_ai_auto_staging/2022-04-22-16:28:47.148/model.pkl...
/ [1 files][  5.4 MiB/  5.4 MiB]                                                
Operation completed over 1 objects/5.4 MiB.                                      
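The download can also stay in Python instead of shelling out to `gsutil`. The only piece needed is splitting the `gs://` URI into a bucket name and an object path, shown here as a minimal sketch; `split_gcs_uri` is a hypothetical helper, and the example URI is the artifact path printed above.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    bucket, _, blob_name = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, blob_name


example_uri = "gs://wortz-project-vertex-staging-us-central1/vertex_ai_auto_staging/2022-04-22-16:28:47.148"
bucket, blob_name = split_gcs_uri(example_uri + "/model.pkl")
print(bucket)
print(blob_name)
```

With those two parts, `storage.Client().bucket(bucket).blob(blob_name).download_to_filename('model.pkl')` from `google-cloud-storage` can replace the `gsutil cp` call.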


### Predict with the local copy of the model

In [195]:
local_artifact.predict(digits.data)

array([0, 1, 2, ..., 8, 9, 8])

### Deploy the model to an endpoint and make online predictions

In [211]:
endpoint = model.deploy(
        deployed_model_display_name="deployed digits test",
        traffic_percentage=100,
        # traffic_split=traffic_split, #for ab tests
        machine_type='n1-standard-2',
        min_replica_count=1,
        max_replica_count=2,
        # accelerator_type=accelerator_type,
        # accelerator_count=accelerator_count,
        sync=False,
    )

Creating Endpoint
Create Endpoint backing LRO: projects/633325234048/locations/us-central1/endpoints/8512621332381302784/operations/6145397605510676480
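The commented-out `traffic_split` argument is meant for A/B tests. Per the SDK docstring, it is a dict mapping deployed-model IDs to the percentage of endpoint traffic each receives. The percentages must add up to 100, and the key `"0"` stands for the model being deployed in this call; treat that convention as an assumption to double-check against your SDK version. This is a minimal sketch of building and validating such a dict. `build_traffic_split` is a hypothetical helper and `"1234567890"` is a made-up ID for a model already deployed on the endpoint.

```python
def build_traffic_split(new_model_percent, existing_deployed_model_id=None):
    """Build a traffic_split dict for deploying a model to an endpoint.

    Key "0" refers to the model being deployed; any other key is the ID of a
    model already deployed on the endpoint. Percentages must sum to 100.
    """
    if not 0 <= new_model_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    split = {"0": new_model_percent}
    remainder = 100 - new_model_percent
    if remainder:
        if existing_deployed_model_id is None:
            raise ValueError("an existing deployed model ID is needed for the remaining traffic")
        split[existing_deployed_model_id] = remainder
    return split


print(build_traffic_split(100))
print(build_traffic_split(20, existing_deployed_model_id="1234567890"))
```

This prints `{'0': 100}` and `{'0': 20, '1234567890': 80}`. The resulting dict would be passed as `traffic_split=...` in place of `traffic_percentage`.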


In [224]:
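endpoint.wait()  # deploy ran with sync=False; block until the endpoint is ready

# Send the same digits data to the deployed endpoint for online predictions
preds = endpoint.predict(digits.data.tolist())  # instances must be JSON-serializable lists, not NumPy arrays
print(f"Online prediction results: {preds.predictions}")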

Online prediction results: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 9.0, 5.0, 5.0, 6.0, 5.0, 0.0, 9.0, 8.0, 9.0, 8.0, 4.0, 1.0, 7.0, 7.0, 3.0, 5.0, 1.0, 0.0, 0.0, 2.0, 2.0, 7.0, 8.0, 2.0, 0.0, 1.0, 2.0, 6.0, 3.0, 3.0, 7.0, 3.0, 3.0, 4.0, 6.0, 6.0, 6.0, 4.0, 9.0, 1.0, 5.0, 0.0, 9.0, 5.0, 2.0, 8.0, 2.0, 0.0, 0.0, 1.0, 7.0, 6.0, 3.0, 2.0, 1.0, 7.0, 4.0, 6.0, 3.0, 1.0, 3.0, 9.0, 1.0, 7.0, 6.0, 8.0, 4.0, 3.0, 1.0, 4.0, 0.0, 5.0, 3.0, 6.0, 9.0, 6.0, 1.0, 7.0, 5.0, 4.0, 4.0, 7.0, 2.0, 8.0, 2.0, 2.0, 5.0, 7.0, 9.0, 5.0, 4.0, 8.0, 8.0, 4.0, 9.0, 0.0, 8.0, 9.0, 8.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, 9.0, 5.0, 5.0, 6.0, 5.0, 0.0, 9.0, 8.0, 9.0, 8.0, 4.0, 1.0, 7.0, 7.0, 3.0, 5.0, 1.0, 0.0, 0.0, 2.0, 2.0, 7.0, 8.0, 2.0, 0.0, 1.0, 2.0, 6.0, 3.0, 3.0, 7.0, 3.0, 3.0, 4.
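Because the endpoint serves the same registered model as the local copy, the two sets of predictions should agree. Online results come back as floats (`0.0`, `1.0`, ...), so the sketch below casts both sides to `int` before comparing. `agreement_rate` is a hypothetical helper, and the short lists are illustrative; in the notebook you would pass `local_artifact.predict(digits.data)` and `preds.predictions`.

```python
def agreement_rate(local_preds, online_preds):
    """Fraction of positions where the local and online predicted labels match.

    Online predictions come back as floats (e.g. 8.0), so both sides are cast to int.
    Uses len() rather than truthiness so NumPy arrays are accepted too.
    """
    if len(local_preds) != len(online_preds):
        raise ValueError("prediction lists must be the same length")
    if len(local_preds) == 0:
        return 0.0
    matches = sum(int(a) == int(b) for a, b in zip(local_preds, online_preds))
    return matches / len(local_preds)


# Illustrative values, not results from this notebook.
local = [0, 1, 2, 3]
online = [0.0, 1.0, 2.0, 5.0]
print(agreement_rate(local, online))  # 0.75
```

A rate below 1.0 on the real data would suggest the endpoint and the local artifact are not the same model version.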