<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.</font>

In [1]:
import os
import time

import tensorflow_data_validation as tfdv

In [2]:
!python -c "import tensorflow_data_validation; print('TFDV version: {}'.format(tensorflow_data_validation.__version__))"

TFDV version: 0.21.5


# Deploying and triggering the Data Drift Monitor Flex template

This notebook steps through deploying and triggering the Data Drift Monitor Flex template.


## Deploying the template


### Configure environment settings

Update the constants below with the settings that reflect your environment.

- `TEMPLATE_LOCATION` - the GCS location for the template.
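
The template spec path and the container image URI are both derived from these settings. The sketch below shows one way to perform that derivation with a basic sanity check. The helper name `build_template_settings` and the `gs://` prefix check are assumptions added for illustration; they are not part of the original notebook.

```python
def build_template_settings(template_location, template_name, project_id):
    """Derive the template spec path and image URI from the basic settings.

    Hypothetical helper: the notebook builds these strings inline.
    """
    # Assumption: the template location should always be a GCS URI.
    if not template_location.startswith('gs://'):
        raise ValueError('TEMPLATE_LOCATION must be a GCS URI starting with gs://')

    # Drop a trailing slash so the joined path does not contain '//'.
    location = template_location.rstrip('/')
    return {
        'template_path': '{}/{}.json'.format(location, template_name),
        'template_image': 'gcr.io/{}/{}:latest'.format(project_id, template_name),
    }


settings = build_template_settings(
    'gs://mlops-dev-workspace/flex-templates', 'drift-detector', 'mlops-dev-env')
print(settings['template_path'])
print(settings['template_image'])
```

Failing early on a malformed location is cheaper than discovering the problem after the container image has been built.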


In [9]:
!gsutil ls

gs://artifacts.mlops-dev-env.appspot.com/
gs://dataflow-staging-us-central1-881178567352/
gs://hostedkfp-default-36un4wco1q/
gs://hostedkfp-default-w2dc42i8jo/
gs://jk-mlops-dev-sandbox/
gs://mlops-dev-env-staging/
gs://mlops-dev-env.appspot.com/
gs://mlops-dev-env_cloudbuild/
gs://mlops-dev-workspace/
gs://staging.mlops-dev-env.appspot.com/


In [6]:
TEMPLATE_NAME = 'drift-detector'
TEMPLATE_LOCATION = 'gs://mlops-dev-workspace/flex-templates'
METADATA_FILE = 'drift_detector/metadata.json'

TEMPLATE_PATH = '{}/{}.json'.format(TEMPLATE_LOCATION, TEMPLATE_NAME)
PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]
TEMPLATE_IMAGE='gcr.io/{}/{}:latest'.format(PROJECT_ID, TEMPLATE_NAME)

### Build the template Docker image


In [4]:
!gcloud builds submit --tag {TEMPLATE_IMAGE} drift_detector

tion==0.21.5) (1.4.1)
Collecting tensorflow-serving-api<3,>=1.15
  Downloading tensorflow_serving_api-2.1.0-py2.py3-none-any.whl (38 kB)
Collecting google-api-python-client<2,>=1.7.11
  Downloading google_api_python_client-1.8.3-py3-none-any.whl (58 kB)
Collecting typing<3.8.0,>=3.7.0; python_version < "3.5.3"
  Downloading typing-3.7.4.1-py3-none-any.whl (25 kB)
Collecting decorator
  Downloading decorator-4.4.2-py2.py3-none-any.whl (9.2 kB)
Collecting jedi>=0.10
  Downloading jedi-0.17.0-py2.py3-none-any.whl (1.1 MB)
Collecting traitlets>=4.2
  Downloading traitlets-4.3.3-py2.py3-none-any.whl (75 kB)
Collecting backcall
  Downloading backcall-0.1.0.tar.gz (9.7 kB)
Collecting pygments
  Downloading Pygments-2.6.1-py3-none-any.whl (914 kB)
Collecting pexpect; sys_platform != "win32"
  Downloading pexpect-4.8.0-py2.py3-none-any.whl (59 kB)
Collecting pickleshare
  Downloading pickleshare-0.7.5-py2.py3-none-any.whl (6.9 kB)
Collecting prompt-toolkit<2.1.0,>=2.0.0
  Downloading prompt_too

### Deploy the template


In [12]:
!gcloud beta dataflow flex-template build {TEMPLATE_PATH} \
  --image {TEMPLATE_IMAGE} \
  --sdk-language "PYTHON" \
  --metadata-file {METADATA_FILE}

Successfully saved container spec in flex template file.
Template File GCS Location: gs://mlops-dev-workspace/flex-templates/drift-detector.json
Container Spec:

{
    "image": "gcr.io/mlops-dev-env/drift-detector:latest",
    "metadata": {
        "description": "Data drift detector Python flex template.",
        "name": "Data drift detector Python flex template",
        "parameters": [
            {
                "helpText": "A full name of the BQ request-response log table",
                "label": "Request response log table.",
                "name": "request_response_log_table",
                "regexes": [
                    "[-_.a-zA-Z0-9]+"
                ]
            },
            {
                "helpText": "A type of instances in request_response log_table",
                "label": "Instance type.",
                "name": "instance_type",
                "regexes": [
                    "LIST|OBJECT"
                ]
            },
            {
              

### Run the template

In [18]:
TEMPLATE_PATH

'gs://mlops-dev-workspace/flex-templates/drift-detector.json'

In [16]:
JOB_NAME = "data-drift-{}".format(time.strftime("%Y%m%d-%H%M%S"))

PARAMETERS = {
    'request_response_log_table': 'mlops-dev-env.data_validation.covertype_classifier_logs_tf',
    'instance_type': 'OBJECT',
    'start_time': '2020-05-15T00:15:00',
    'end_time': '2020-05-15T02:51:00',
    'output_path': 'gs://mlops-dev-workspace/drift_monitor/output/tf',
    'schema_file': 'gs://mlops-dev-workspace/drift_monitor/schema/schema.pbtxt',
    'setup_file': './setup.py',
}

PARAMETERS = ','.join(['{}={}'.format(key,value) for key, value in PARAMETERS.items()])
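
Joining the dictionary into a `key=value,key=value` string assumes that no value contains a comma and that each value is acceptable to the template. The sketch below checks values before serializing them. It uses only the two patterns visible in the container spec output above; the remaining parameters are truncated there and are left unchecked. The helper name `serialize_parameters` is hypothetical, and treating each pattern as a full-string match is an assumption about how the patterns are applied.

```python
import re

# Patterns copied from the visible part of the container spec.
# Other parameters are truncated in that output, so they are not validated.
KNOWN_PATTERNS = {
    'request_response_log_table': r'[-_.a-zA-Z0-9]+',
    'instance_type': r'LIST|OBJECT',
}


def serialize_parameters(parameters):
    """Validate parameter values, then join them as 'key=value,key=value'."""
    for key, value in parameters.items():
        # A comma inside a value would be read as a parameter separator.
        if ',' in value:
            raise ValueError('Parameter {} contains a comma: {}'.format(key, value))
        pattern = KNOWN_PATTERNS.get(key)
        # Assumption: the pattern must match the entire value.
        if pattern is not None and not re.fullmatch(pattern, value):
            raise ValueError('Parameter {} does not match {}'.format(key, pattern))
    return ','.join('{}={}'.format(key, value) for key, value in parameters.items())


example_parameters = {
    'instance_type': 'OBJECT',
    'start_time': '2020-05-15T00:15:00',
}
serialized = serialize_parameters(example_parameters)
print(serialized)
```

Validating locally gives immediate feedback instead of waiting for the job submission to be rejected.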

In [17]:
!gcloud beta dataflow flex-template run {JOB_NAME} \
--template-file-gcs-location {TEMPLATE_PATH} \
--parameters {PARAMETERS}

job:
  createTime: '2020-05-15T03:05:40.928537Z'
  currentStateTime: '1970-01-01T00:00:00Z'
  id: 2020-05-14_20_05_40-9719059282455536528
  location: us-central1
  name: data-drift-20200515-030536
  projectId: mlops-dev-env
  startTime: '2020-05-15T03:05:40.928537Z'
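
When the job has to be triggered from a script or a scheduler rather than from a notebook cell, the same `gcloud` invocation can be assembled as an argument list. This is a minimal sketch under that assumption: the helper name `build_run_command` is hypothetical, and the command is only printed here, not executed.

```python
import shlex


def build_run_command(job_name, template_path, parameters):
    """Assemble the gcloud invocation used above as a list of arguments."""
    return [
        'gcloud', 'beta', 'dataflow', 'flex-template', 'run', job_name,
        '--template-file-gcs-location', template_path,
        '--parameters', parameters,
    ]


command = build_run_command(
    'data-drift-20200515-030536',
    'gs://mlops-dev-workspace/flex-templates/drift-detector.json',
    'instance_type=OBJECT,start_time=2020-05-15T00:15:00',
)

# Print a shell-safe rendering for inspection. To submit the job, the list
# could be passed to subprocess.run(command, check=True).
print(' '.join(shlex.quote(argument) for argument in command))
```

Passing a list rather than a single shell string avoids quoting problems when parameter values contain characters that are special to the shell.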
