<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.</font>

In [7]:
import os
import time

import tensorflow_data_validation as tfdv

In [8]:
!python -c "import tensorflow_data_validation; print('TFDV version: {}'.format(tensorflow_data_validation.__version__))"

TFDV version: 0.21.5


# Deploying and triggering the Data Drift Monitor Flex template

This notebook steps through deploying and triggering the Data Drift Monitor Flex template.


## Deploying the template


### Configure environment settings

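Update the constants below with settings that reflect your environment.

- `TEMPLATE_NAME` - the name of the template; it is also used as the name of the template's container image.
- `TEMPLATE_LOCATION` - the GCS location for the template.
- `METADATA_FILE` - the local path to the template's metadata file.

`PROJECT_ID` is read from the active `gcloud` configuration, so it does not need to be set by hand.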
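`TEMPLATE_PATH` is built by appending `/<TEMPLATE_NAME>.json` to `TEMPLATE_LOCATION`, so a location with a trailing slash yields a `//` in the template path. The hypothetical helper below (`check_gcs_location` is not part of any Google Cloud library) is a minimal sketch that rejects values that do not look like a `gs://` URI or that end in a slash. It checks only the string's format, not whether the bucket exists or is writable.

```python
import re

# Hypothetical format check: "gs://", a bucket-like name, then an optional path.
_GCS_URI = re.compile(r"gs://[a-z0-9][a-z0-9._-]*[a-z0-9](/.*)?")


def check_gcs_location(uri: str) -> str:
    """Return `uri` unchanged if it looks like a usable GCS folder URI."""
    if _GCS_URI.fullmatch(uri) is None:
        raise ValueError("not a gs:// URI: {!r}".format(uri))
    if uri.endswith("/"):
        # TEMPLATE_PATH appends '/<name>.json', so a trailing slash gives '//'.
        raise ValueError("drop the trailing slash: {!r}".format(uri))
    return uri


check_gcs_location("gs://mlops-dev-workspace/flex-templates")
```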


In [9]:
!gsutil ls

gs://artifacts.mlops-dev-env.appspot.com/
gs://dataflow-staging-us-central1-881178567352/
gs://hostedkfp-default-36un4wco1q/
gs://hostedkfp-default-w2dc42i8jo/
gs://jk-mlops-dev-sandbox/
gs://mlops-dev-env-staging/
gs://mlops-dev-env.appspot.com/
gs://mlops-dev-env_cloudbuild/
gs://mlops-dev-workspace/
gs://staging.mlops-dev-env.appspot.com/
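Run without arguments, `gsutil ls` lists the buckets in the active project, and the bucket in `TEMPLATE_LOCATION` should be one of them. As a sketch, the hypothetical `bucket_in_listing` function below compares the two using only the standard library. In the notebook, the listing could be captured with IPython's `listing = !gsutil ls` and passed in as `'\n'.join(listing)`.

```python
from urllib.parse import urlparse


def bucket_in_listing(location: str, gsutil_ls_output: str) -> bool:
    """Return True if the bucket of `location` appears in `gsutil ls` output."""
    bucket = urlparse(location).netloc
    listed = {
        urlparse(line.strip()).netloc
        for line in gsutil_ls_output.splitlines()
        if line.strip().startswith("gs://")
    }
    return bucket in listed


# A shortened copy of the listing shown above.
listing = """gs://mlops-dev-env_cloudbuild/
gs://mlops-dev-workspace/
gs://staging.mlops-dev-env.appspot.com/"""

print(bucket_in_listing("gs://mlops-dev-workspace/flex-templates", listing))  # True
```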


In [16]:
TEMPLATE_NAME = 'drift-analyzer'
TEMPLATE_LOCATION = 'gs://mlops-dev-workspace/flex-templates'
METADATA_FILE = 'drift_analyzer_template/metadata.json'

TEMPLATE_PATH = '{}/{}.json'.format(TEMPLATE_LOCATION, TEMPLATE_NAME)
PROJECT_ID = !gcloud config get-value core/project
PROJECT_ID = PROJECT_ID[0]
TEMPLATE_IMAGE = 'gcr.io/{}/{}:latest'.format(PROJECT_ID, TEMPLATE_NAME)
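The `!` shell-capture syntax used for `PROJECT_ID` works only inside IPython/Jupyter. If the same settings are needed in a plain Python script, a minimal sketch with `subprocess` might look like the following. The function names are hypothetical, and `get_project_id` assumes the `gcloud` CLI is installed and has a default project configured.

```python
import subprocess


def parse_project_id(stdout: str) -> str:
    """Return the first non-empty line of `gcloud config get-value` output."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("no default project; run `gcloud config set project PROJECT_ID`")
    return lines[0]


def get_project_id() -> str:
    """Plain-Python stand-in for the IPython capture of `gcloud config get-value core/project`."""
    result = subprocess.run(
        ["gcloud", "config", "get-value", "core/project"],
        capture_output=True, text=True, check=True,
    )
    return parse_project_id(result.stdout)


def template_uris(location: str, name: str, project_id: str) -> tuple:
    """Build TEMPLATE_PATH and TEMPLATE_IMAGE the same way as the cell above."""
    template_path = "{}/{}.json".format(location, name)
    template_image = "gcr.io/{}/{}:latest".format(project_id, name)
    return template_path, template_image
```

For example, `template_uris('gs://mlops-dev-workspace/flex-templates', 'drift-analyzer', 'mlops-dev-env')` returns the same template path and image that appear in the deploy output further down.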

### Build the template Docker image


In [19]:
!gcloud builds submit --tag {TEMPLATE_IMAGE} drift_analyzer

Creating temporary tarball archive of 27 file(s) totalling 49.5 KiB before compression.
Uploading tarball of [drift_analyzer] to [gs://mlops-dev-env_cloudbuild/source/1589068895.3-5232a70c88764973b223a103f9cb103c.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/mlops-dev-env/builds/2aaaecd2-6d38-44c6-a214-1351867201b4].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/2aaaecd2-6d38-44c6-a214-1351867201b4?project=881178567352].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "2aaaecd2-6d38-44c6-a214-1351867201b4"

FETCHSOURCE
Fetching storage object: gs://mlops-dev-env_cloudbuild/source/1589068895.3-5232a70c88764973b223a103f9cb103c.tgz#1589068895656844
Copying gs://mlops-dev-env_cloudbuild/source/1589068895.3-5232a70c88764973b223a103f9cb103c.tgz#1589068895656844...
/ [1 files][ 10.7 KiB/ 10.7 KiB]                                                
Operation completed over 1 objects/10.7 KiB.               
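With `--tag`, `gcloud builds submit` uploads the source directory, runs a Docker build of it on Cloud Build (so the directory is expected to contain a Dockerfile), and pushes the image under the given tag. To run the same step outside the notebook, for example from a CI script, a sketch using `subprocess` follows. `build_image_command` and `build_image` are hypothetical names; the flags are exactly those used in the cell above, and `shlex.join` requires Python 3.8 or later.

```python
import shlex
import subprocess


def build_image_command(image: str, source_dir: str) -> list:
    """argv for the `gcloud builds submit --tag` call used in the cell above."""
    return ["gcloud", "builds", "submit", "--tag", image, source_dir]


def build_image(image: str, source_dir: str) -> None:
    # check=True raises CalledProcessError if gcloud exits with a non-zero status.
    subprocess.run(build_image_command(image, source_dir), check=True)


cmd = build_image_command("gcr.io/mlops-dev-env/drift-analyzer:latest", "drift_analyzer")
print(shlex.join(cmd))
# gcloud builds submit --tag gcr.io/mlops-dev-env/drift-analyzer:latest drift_analyzer
```

Building the argv as a list, rather than one shell string, avoids quoting problems if the image tag or directory ever contains spaces.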

### Deploy the template


In [20]:
!gcloud beta dataflow flex-template build {TEMPLATE_PATH} \
  --image {TEMPLATE_IMAGE} \
  --sdk-language "PYTHON" \
  --metadata-file {METADATA_FILE}

Successfully saved container spec in flex template file.
Template File GCS Location: gs://mlops-dev-workspace/flex-templates/drift-analyzer.json
Container Spec:

{
    "image": "gcr.io/mlops-dev-env/drift-analyzer:latest",
    "metadata": {
        "description": "Data drift detector Python flex template.",
        "name": "Data drift detector Python flex template",
        "parameters": [
            {
                "helpText": "A full name of the BQ request-response log table",
                "label": "Request response log table.",
                "name": "request_response_log_table",
                "regexes": [
                    "[-_.a-zA-Z0-9]+"
                ]
            },
            {
                "helpText": "A type of instances in request_response log_table",
                "label": "Instance type.",
                "name": "instance_type",
                "regexes": [
                    "LIST|OBJECT"
                ]
            },
            {
              

### Run the template

In [23]:
JOB_NAME = "data-drift-{}".format(time.strftime("%Y%m%d-%H%M%S"))

PARAMETERS = {
    'request_response_log_table': 'mlops-dev-env.data_validation.covertype_classifier_logs_tf',
    'instance_type': 'OBJECT',
    'start_time': '2020-05-09T05:05:14',
    'end_time': '2020-05-09T18:05:14',
    'output_path': 'gs://mlops-dev-workspace/drift_monitor/output/tf',
    'schema_file': 'gs://mlops-dev-workspace/drift_monitor/schema/schema.pbtxt',
    'setup_file': './setup.py',
}

# gcloud expects the parameters as a single comma-separated KEY=VALUE string.
PARAMETERS = ','.join(['{}={}'.format(key, value) for key, value in PARAMETERS.items()])
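The template metadata printed by the deploy step attaches regexes to its parameters, for example `[-_.a-zA-Z0-9]+` for `request_response_log_table` and `LIST|OBJECT` for `instance_type`. Because the values are also joined with commas into one `--parameters` string, a value that itself contains a comma would be split incorrectly. The sketch below checks these conditions locally before a job is submitted. It is hypothetical: it covers only the two regexes visible in the (truncated) metadata output, assumes each regex must match the entire value, and assumes the `%Y-%m-%dT%H:%M:%S` timestamp format used in this notebook. It takes the dictionary, so it would be called before the `','.join(...)` line replaces `PARAMETERS` with a string.

```python
import re
from datetime import datetime

# Regexes copied from the template metadata printed by the deploy step.
# Assumption: a value is valid only if the regex matches all of it.
PARAM_REGEXES = {
    "request_response_log_table": r"[-_.a-zA-Z0-9]+",
    "instance_type": r"LIST|OBJECT",
}

# Assumption: timestamps follow the format used in the PARAMETERS cell.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def validate_parameters(params: dict) -> None:
    """Raise ValueError if a value fails a check; start_time and end_time are required."""
    for name, pattern in PARAM_REGEXES.items():
        value = params.get(name)
        if value is None or re.fullmatch(pattern, value) is None:
            raise ValueError("{}={!r} does not match {!r}".format(name, value, pattern))
    for name, value in params.items():
        # gcloud splits the joined KEY=VALUE string on commas.
        if "," in str(value):
            raise ValueError("{} contains a comma: {!r}".format(name, value))
    start = datetime.strptime(params["start_time"], TIME_FORMAT)
    end = datetime.strptime(params["end_time"], TIME_FORMAT)
    if start >= end:
        raise ValueError("start_time must be earlier than end_time")


example = {
    "request_response_log_table": "mlops-dev-env.data_validation.covertype_classifier_logs_tf",
    "instance_type": "OBJECT",
    "start_time": "2020-05-09T05:05:14",
    "end_time": "2020-05-09T18:05:14",
}
validate_parameters(example)  # passes without raising
```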

In [24]:
!gcloud beta dataflow flex-template run {JOB_NAME} \
  --template-file-gcs-location {TEMPLATE_PATH} \
  --parameters {PARAMETERS}

job:
  createTime: '2020-05-10T00:59:52.214378Z'
  currentStateTime: '1970-01-01T00:00:00Z'
  id: 2020-05-09_17_59_51-11189043958712317435
  location: us-central1
  name: data-drift-20200510-005949
  projectId: mlops-dev-env
  startTime: '2020-05-10T00:59:52.214378Z'
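The output above describes the newly launched job rather than its result; the pipeline itself keeps running on Dataflow. The `id` and `location` fields identify the job, and `gcloud dataflow jobs describe` can be used to check on it later. The sketch below pulls those fields out of the printed output with simple line parsing. It assumes the flat `key: value` layout shown above (a YAML parser such as PyYAML would be more robust), and `job_fields` and `describe_command` are hypothetical names.

```python
import shlex


def job_fields(run_output: str) -> dict:
    """Collect `key: value` pairs from the YAML printed by `flex-template run`."""
    fields = {}
    for line in run_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skips blank lines and the bare `job:` header
        fields[key] = value.strip().strip("'")
    return fields


def describe_command(fields: dict) -> list:
    """argv for `gcloud dataflow jobs describe` in the job's region."""
    return ["gcloud", "dataflow", "jobs", "describe", fields["id"],
            "--region", fields["location"]]


run_output = """job:
  createTime: '2020-05-10T00:59:52.214378Z'
  id: 2020-05-09_17_59_51-11189043958712317435
  location: us-central1
  name: data-drift-20200510-005949
"""
print(shlex.join(describe_command(job_fields(run_output))))
# gcloud dataflow jobs describe 2020-05-09_17_59_51-11189043958712317435 --region us-central1
```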
