# Deploying the Pipeline
This notebook assumes that the required artifacts have already been created, built, and committed. Here we deploy only a new pipeline.

## Environment Setup
**NOTE:** Set `PROJECT_ID` to your own project ID.  

In [50]:
PROJECT_ID = 'mmlops3'
PREFIX = PROJECT_ID
REGION = 'us-central1'
JOB_DIR_ROOT='gs://{}-artifact-store/jobs'.format(PREFIX)
NAMESPACE='kubeflow'
ZONE='us-central1-a'
ARTIFACT_STORE_URI='gs://{}-artifact-store'.format(PREFIX)
GCS_STAGING_PATH='{}/staging'.format(ARTIFACT_STORE_URI)
GKE_CLUSTER_NAME='{}-cluster'.format(PREFIX)


!gcloud container clusters get-credentials $GKE_CLUSTER_NAME --zone $ZONE
HOST_TEMP=!(kubectl describe configmap inverse-proxy-config -n $NAMESPACE | grep "googleusercontent.com")
INVERSE_PROXY_HOSTNAME=HOST_TEMP[0]


Fetching cluster endpoint and auth data.
kubeconfig entry generated for mmlops3-cluster.
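
The cell above takes the first line that `grep` matched and uses it directly as the hostname. If that line ever carries a label or surrounding whitespace, the value would not be a usable endpoint. As a minimal sketch, assuming the matched lines are plain strings (the `Hostname:` label in the sample is an assumed format, and `extract_hostname` is a hypothetical helper, not part of the notebook), the hostname could be isolated with a regular expression:

```python
import re


def extract_hostname(lines):
    """Return the first *.googleusercontent.com hostname found in lines, or None."""
    pattern = re.compile(r"[A-Za-z0-9.-]+\.googleusercontent\.com")
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(0)
    return None


# Sample input: the "Hostname:" prefix and padding are assumptions for illustration.
sample = ["Hostname:  3ea90122a145b3e7-dot-us-central2.pipelines.googleusercontent.com  "]
print(extract_hostname(sample))
```

In the notebook this would be used as `INVERSE_PROXY_HOSTNAME = extract_hostname(HOST_TEMP)`; a `None` result signals that no proxy hostname was found in the configmap output.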


## Deploying the pipeline
Select a pipeline name and ensure it is not already in use at the allocated hostname (otherwise a 500 error is returned), then deploy the pipeline.

In [51]:
PIPELINE_NAME='covertype_classifier_training_v02'

!kfp --endpoint {INVERSE_PROXY_HOSTNAME} pipeline upload -p {PIPELINE_NAME} covertype_training_pipeline.yaml

  import cryptography.exceptions
Pipeline d7e7272e-6f80-4a82-8665-037443c79fb1 has been submitted

Pipeline Details
------------------
ID           d7e7272e-6f80-4a82-8665-037443c79fb1
Name         covertype_classifier_training_v02
Description
Uploaded at  2021-04-07T04:33:58+00:00
+-----------------------------+--------------------------------------------------+
| Parameter Name              | Default Value                                    |
| project_id                  |                                                  |
+-----------------------------+--------------------------------------------------+
| region                      |                                                  |
+-----------------------------+--------------------------------------------------+
| source_table_name           |                                                  |
+-----------------------------+--------------------------------------------------+
| gcs_root                    |                      

The following command returns a list of the pipelines deployed at the given hostname. We see that `covertype_classifier_training_v02` has been deployed. The list also lets us copy the pipeline ID.

In [52]:
!kfp --endpoint {INVERSE_PROXY_HOSTNAME} pipeline list

  import cryptography.exceptions
+--------------------------------------+-----------------------------------+---------------------------+
| Pipeline ID                          | Name                              | Uploaded at               |
| d7e7272e-6f80-4a82-8665-037443c79fb1 | covertype_classifier_training_v02 | 2021-04-07T04:33:58+00:00 |
+--------------------------------------+-----------------------------------+---------------------------+
| 8d2f1468-7cfa-49bf-bf1b-11728e7970e6 | covertype_classifier_training     | 2021-04-06T06:45:31+00:00 |
+--------------------------------------+-----------------------------------+---------------------------+
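
Rather than copying the pipeline ID by hand, the table can be parsed. The following is a rough sketch, assuming the listing has been captured as a list of lines (for example with `output = !kfp ... pipeline list`) and that the table keeps the three-column layout shown above. `parse_pipeline_table` is a hypothetical helper, not part of the `kfp` CLI.

```python
def parse_pipeline_table(lines):
    """Map pipeline name -> pipeline ID from the rows of a `kfp pipeline list` table."""
    pipelines = {}
    for line in lines:
        if not line.startswith("|"):
            continue  # skip borders, warnings, and blank lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2 or cells[0] == "Pipeline ID":
            continue  # skip the header row
        pipelines[cells[1]] = cells[0]
    return pipelines


# Rows copied from the listing above.
sample_output = [
    "  import cryptography.exceptions",
    "| Pipeline ID                          | Name                              | Uploaded at               |",
    "| d7e7272e-6f80-4a82-8665-037443c79fb1 | covertype_classifier_training_v02 | 2021-04-07T04:33:58+00:00 |",
    "+--------------------------------------+-----------------------------------+---------------------------+",
    "| 8d2f1468-7cfa-49bf-bf1b-11728e7970e6 | covertype_classifier_training     | 2021-04-06T06:45:31+00:00 |",
]

pipeline_ids = parse_pipeline_table(sample_output)
print(pipeline_ids["covertype_classifier_training_v02"])
```

The same dictionary can also be used to check whether a candidate pipeline name is already taken before uploading.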


### Viewing the pipeline
The deployed pipeline can be viewed through the Kubeflow Pipeline UI given at the URL below. 

In [53]:
print('https://{}'.format(INVERSE_PROXY_HOSTNAME))

https://3ea90122a145b3e7-dot-us-central2.pipelines.googleusercontent.com


## Run Experiment 
Now that the pipeline is deployed, we want to run an experiment. This causes the pipeline to run: it pulls the data from BigQuery and splits it, trains the models, evaluates them, and deploys the best-performing model. The experiment takes approximately an hour to execute and results in a deployed model that can be queried through GCP's AI Platform Prediction service.

**NOTE:** Change the PIPELINE_ID to reflect the ID copied from above.  

In [54]:
PIPELINE_ID='d7e7272e-6f80-4a82-8665-037443c79fb1'

EXPERIMENT_NAME='Covertype_Classifier_Training_v02'
RUN_ID='Run_001'
SOURCE_TABLE='covertype_dataset.covertype'
DATASET_ID='splits'
EVALUATION_METRIC='accuracy'
EVALUATION_METRIC_THRESHOLD='0.69'
MODEL_ID='covertype_classifier'
VERSION_ID='v02'
REPLACE_EXISTING_VERSION=True

In [55]:
!kfp --endpoint {INVERSE_PROXY_HOSTNAME} run submit \
-e {EXPERIMENT_NAME} \
-r {RUN_ID} \
-p {PIPELINE_ID} \
project_id={PROJECT_ID} \
gcs_root={GCS_STAGING_PATH} \
region={REGION} \
source_table_name={SOURCE_TABLE} \
dataset_id={DATASET_ID} \
evaluation_metric_name={EVALUATION_METRIC} \
evaluation_metric_threshold={EVALUATION_METRIC_THRESHOLD} \
model_id={MODEL_ID} \
version_id={VERSION_ID} \
replace_existing_version={REPLACE_EXISTING_VERSION}

  import cryptography.exceptions
Creating experiment Covertype_Classifier_Training_v02.
Run ff86ae15-6e36-498e-814e-ea9060e78e35 is submitted
+--------------------------------------+---------+----------+---------------------------+
| run id                               | name    | status   | created at                |
| ff86ae15-6e36-498e-814e-ea9060e78e35 | Run_001 |          | 2021-04-07T04:35:56+00:00 |
+--------------------------------------+---------+----------+---------------------------+
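
The run submission passes each pipeline parameter as a `name=value` argument, with the notebook variables interpolated as strings. As an illustrative sketch (the helper `render_run_parameters` is hypothetical, and the values are the ones defined earlier in this notebook), the arguments can be rendered and inspected before submitting:

```python
def render_run_parameters(params):
    """Render a parameter mapping as name=value arguments, in insertion order."""
    return ["{}={}".format(name, value) for name, value in params.items()]


# Values taken from the cells above; the gcs_root value is GCS_STAGING_PATH expanded.
params = {
    "project_id": "mmlops3",
    "gcs_root": "gs://mmlops3-artifact-store/staging",
    "region": "us-central1",
    "source_table_name": "covertype_dataset.covertype",
    "dataset_id": "splits",
    "evaluation_metric_name": "accuracy",
    "evaluation_metric_threshold": "0.69",
    "model_id": "covertype_classifier",
    "version_id": "v02",
    "replace_existing_version": True,
}

for argument in render_run_parameters(params):
    print(argument)
```

Note that the Python boolean is rendered as the string `True`, which is what the pipeline receives for `replace_existing_version`.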


## Testing model
To test the model, we can use the AI Platform Prediction API to request a prediction based on a JSON input. Alternatively, we can use the prediction UI and enter *{"instances":[[2395,0,0,60,6,1170,218,238,156,1054,"Cache","C2717"]]}* in the test case window.

We write a prediction JSON file with two data points; the correct cover types are 6 and 1, respectively.

In [None]:
%%writefile predict.json
[3366,122,15,789,127,2881,244,227,107,2437,"Commanche","C8772"]
[2791,340,15,30,10,3906,188,217,168,5401,"Rawah","C7745"]

In [None]:
INPUT_DATA_FILE="./predict.json"

!gcloud ai-platform predict --model {MODEL_ID} \
  --version {VERSION_ID} \
  --json-instances {INPUT_DATA_FILE}
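
Two input shapes appear in this section: the file passed to `--json-instances` holds one JSON instance per line, while the prediction UI takes a single object with an `instances` list. A small sketch of producing both from the same rows follows; `to_json_lines` and `to_request_body` are hypothetical helper names, and the rows are the two data points written to `predict.json` above.

```python
import json

# The two data points from predict.json.
instances = [
    [3366, 122, 15, 789, 127, 2881, 244, 227, 107, 2437, "Commanche", "C8772"],
    [2791, 340, 15, 30, 10, 3906, 188, 217, 168, 5401, "Rawah", "C7745"],
]


def to_json_lines(rows):
    """Serialize rows as newline-delimited JSON, one instance per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


def to_request_body(rows):
    """Serialize rows as a single object with an "instances" list."""
    return json.dumps({"instances": rows})


print(to_json_lines(instances), end="")
print(to_request_body(instances))
```

Writing the result of `to_json_lines(instances)` to a file gives the same content as the `%%writefile` cell, which is convenient when the test points come from a DataFrame or another program rather than being typed by hand.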