## Evaluation

Batch prediction and evaluation are very similar. Both are built on Dataflow pipelines, and CloudML provides Evaluate and Prediction Dataflow transforms for them. Datalab can generate a Dataflow pipeline code template for you, just as it does for preprocessing.

Run "%ml evaluate" to generate an input cell.

In [None]:
%ml evaluate

After filling in the required fields, run the cell to generate the code.

In [None]:
%%ml evaluate
preprocessed_eval_data_path: /content/datalab/tmp/ml/iris/preprocessed/features_eval
metadata_path: /content/datalab/tmp/ml/iris/preprocessed/metadata.yaml
model_dir: /content/datalab/tmp/ml/iris/model/model
output_dir: /content/datalab/tmp/ml/iris/evaluate
output_prediction_name: predictions


Run the generated code. Optionally, uncomment the code that creates the confusion matrix plot. Note that the confusion matrix code is generated only for classification models.

In [9]:

# header
"""
Following code is generated from command line:
%%ml evaluate
preprocessed_eval_data_path: /content/datalab/tmp/ml/iris/preprocessed/features_eval
metadata_path: /content/datalab/tmp/ml/iris/preprocessed/metadata.yaml
model_dir: /content/datalab/tmp/ml/iris/model/model
output_dir: /content/datalab/tmp/ml/iris/evaluate
output_prediction_name: predictions

Please modify as appropriate!!!
"""

# imports
import apache_beam as beam
from apache_beam.io import fileio
import google.cloud.ml as ml
import google.cloud.ml.analysis as analysis
import google.cloud.ml.dataflow.io.tfrecordio as tfrecordio
import google.cloud.ml.io as io
import json
import os

# defines
def extract_values((example, prediction)):
  import tensorflow as tf
  tf_example = tf.train.Example()
  tf_example.ParseFromString(example.values()[0])
  feature_map = tf_example.features.feature
  values = {'target': feature_map['species'].int64_list.value[0]}
  values.update(prediction)
  return values

OUTPUT_DIR = '/content/datalab/tmp/ml/iris/evaluate'
pipeline = beam.Pipeline('DirectPipelineRunner')

# evaluation
eval_parameters = tfrecordio.TFRecordParameters(
    file_path_prefix='/content/datalab/tmp/ml/iris/preprocessed/features_eval',
    file_name_suffix='',
    shard_file=False,
    compress_file=True)
eval_features = pipeline | io.LoadFeatures('LoadEvalFeatures', eval_parameters)
trained_model = pipeline | io.LoadModel('LoadModel', '/content/datalab/tmp/ml/iris/model/model')
evaluations = (eval_features | ml.Evaluate(trained_model, label='Evaluate')
    | beam.Map('ExtractEvaluationResults', extract_values))
eval_data_sink = beam.io.TextFileSink(os.path.join(OUTPUT_DIR, 'eval'), shard_name_template='')
evaluations | beam.Write('WriteEval', eval_data_sink)

# analysis
def make_data_for_analysis(values):
  return {
      'target': values['target'],
      'predicted': values['predictions'],
      'score': 0.0,
  }

metadata = pipeline | io.LoadMetadata('/content/datalab/tmp/ml/iris/preprocessed/metadata.yaml')
analysis_source = evaluations | beam.Map('CreateAnalysisSource', make_data_for_analysis)
confusion_matrix, precision_recall, logloss = (analysis_source |
    analysis.AnalyzeModel('Analyze Model', metadata))
confusion_matrix_file = os.path.join(OUTPUT_DIR, 'analyze_cm.json')
confusion_matrix_sink = beam.io.TextFileSink(confusion_matrix_file, shard_name_template='')
confusion_matrix | beam.io.Write('WriteConfusionMatrix', confusion_matrix_sink)

# run pipeline
pipeline.run()

# View Confusion Matrix with the following code:
#
import datalab.ml
import yaml
with ml.util._file.open_local_or_gcs(confusion_matrix_file, 'r') as f:
  data = [yaml.load(line) for line in f.read().rstrip().split('\n')]
datalab.ml.ConfusionMatrix([d['predicted'] for d in data],
                           [d['target'] for d in data],
                           [d['count'] for d in data]).plot()
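The plotting code above reads one record per (target, predicted) pair, each carrying a count. As a rough illustration of what such a tally amounts to, here is a minimal sketch that counts pairs with `collections.Counter`. The helper name `count_confusion_pairs` and the sample records are hypothetical, and this is not the implementation of `analysis.AnalyzeModel`; it only mirrors the record shape (`target`, `predicted`, `count`) that the plotting code expects.

```python
from collections import Counter


def count_confusion_pairs(records):
    """Tally (target, predicted) pairs into confusion-matrix style records."""
    counts = Counter((r['target'], r['predicted']) for r in records)
    return [
        {'target': target, 'predicted': predicted, 'count': count}
        for (target, predicted), count in sorted(counts.items())
    ]


# Hypothetical sample records, shaped like the output of make_data_for_analysis.
sample_records = [
    {'target': 0, 'predicted': 0, 'score': 0.0},
    {'target': 1, 'predicted': 1, 'score': 0.0},
    {'target': 1, 'predicted': 1, 'score': 0.0},
    {'target': 2, 'predicted': 1, 'score': 0.0},
]

confusion_records = count_confusion_pairs(sample_records)
for record in confusion_records:
    print(record)
```

Off-diagonal records (where `target` and `predicted` differ) are the misclassifications; in this sample there is exactly one, a class 2 example predicted as class 1.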


Also check the eval output file:

In [10]:
!head /content/datalab/tmp/ml/iris/evaluate/eval

{u'score': [0.20627270638942719, 0.643595278263092, 0.15013201534748077], 'target': 2L, u'key': ['107'], u'predictions': 1}
{u'score': [0.20522856712341309, 0.6443212628364563, 0.1504502147436142], 'target': 1L, u'key': ['100'], u'predictions': 1}
{u'score': [0.40772390365600586, 0.5516511797904968, 0.04062490537762642], 'target': 1L, u'key': ['99'], u'predictions': 1}
{u'score': [0.8944299221038818, 0.1028687059879303, 0.002701378194615245], 'target': 0L, u'key': ['13'], u'predictions': 0}
{u'score': [0.21631915867328644, 0.6909329295158386, 0.09274788945913315], 'target': 1L, u'key': ['70'], u'predictions': 1}
{u'score': [0.8793689012527466, 0.11243349313735962, 0.008197642862796783], 'target': 0L, u'key': ['11'], u'predictions': 0}
{u'score': [0.8749828934669495, 0.11722233891487122, 0.007794762961566448], 'target': 0L, u'key': ['37'], u'predictions': 0}
{u'score': [0.06351552158594131, 0.6299393177032471, 0.30654507875442505], 'target': 1L, u'key': ['69'], u'predictions': 1}
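Each line of the eval file is a Python 2 dict repr, so it can be parsed back into a dict to compute a quick summary such as accuracy. The following is a minimal sketch, not part of the generated code: the helper names `parse_eval_line` and `accuracy` are hypothetical, and the regular expression assumes that no quoted string in a line contains digits followed by `L`. Stripping the `L` suffix lets the same lines parse under Python 3 as well.

```python
import ast
import re


def parse_eval_line(line):
    """Parse one eval output line (a Python 2 dict repr) into a dict."""
    # Drop the Python 2 long suffix (e.g. 2L -> 2) so the literal also
    # parses under Python 3. Assumes no quoted string contains digits + 'L'.
    cleaned = re.sub(r'(\d+)L\b', r'\1', line)
    return ast.literal_eval(cleaned)


def accuracy(lines):
    """Return the fraction of records whose prediction equals the target."""
    records = [parse_eval_line(line) for line in lines if line.strip()]
    correct = sum(1 for r in records if r['predictions'] == r['target'])
    return float(correct) / len(records)


# Three of the output lines shown above, copied verbatim.
sample_lines = [
    "{u'score': [0.20627270638942719, 0.643595278263092, 0.15013201534748077], 'target': 2L, u'key': ['107'], u'predictions': 1}",
    "{u'score': [0.20522856712341309, 0.6443212628364563, 0.1504502147436142], 'target': 1L, u'key': ['100'], u'predictions': 1}",
    "{u'score': [0.8944299221038818, 0.1028687059879303, 0.002701378194615245], 'target': 0L, u'key': ['13'], u'predictions': 0}",
]

print(accuracy(sample_lines))
```

To run this against the real file, pass the lines of `/content/datalab/tmp/ml/iris/evaluate/eval` instead of `sample_lines`.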

To generate a pipeline that runs in the cloud, run "%ml evaluate --cloud". In this mode, all paths must be GCS paths. Let's define the variables first.

In [11]:
import os

bucket = 'gs://' + datalab_project_id() + '-sampledata'
eval_data_path = os.path.join(bucket, 'iris', 'preprocessed', 'features_eval')
metadata_path = os.path.join(bucket, 'iris', 'preprocessed', 'metadata.yaml')
model_path = os.path.join(bucket, 'iris', 'trained', 'model')
output_dir = os.path.join(bucket, 'iris', 'evaluate')
eval_file = os.path.join(output_dir, 'eval*')
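The cell above builds GCS paths with `os.path.join`, which uses the separator of the local platform. That yields `/` on the Linux environment Datalab runs in, but would yield backslashes on Windows. As a hedged alternative, the sketch below uses `posixpath.join`, which always joins with `/`. The project ID `my-project` is a hypothetical stand-in for the value returned by `datalab_project_id()`.

```python
import posixpath

# Hypothetical project ID; in the notebook this comes from datalab_project_id().
project_id = 'my-project'

bucket = 'gs://' + project_id + '-sampledata'
output_dir = posixpath.join(bucket, 'iris', 'evaluate')
# Wildcard over every output file whose name starts with "eval".
eval_file = posixpath.join(output_dir, 'eval*')

print(output_dir)
print(eval_file)
```

The `eval*` wildcard presumably exists because the cloud pipeline may write its output as several sharded files rather than a single `eval` file.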

Then run the following cell to generate the code:

In [None]:
%%ml evaluate --cloud
preprocessed_eval_data_path: $eval_data_path
metadata_path: $metadata_path
model_dir: $model_path
output_dir: $output_dir

The generated code looks like the following:

```
# header
"""
Following code is generated from command line:
%%ml evaluate --cloud
preprocessed_eval_data_path: $eval_data_path
metadata_path: $metadata_path
model_dir: $model_path
output_dir: $output_dir

Please modify as appropriate!!!
"""

# imports
import apache_beam as beam
from apache_beam.io import fileio
import google.cloud.ml as ml
import google.cloud.ml.analysis as analysis
import google.cloud.ml.dataflow.io.tfrecordio as tfrecordio
import google.cloud.ml.io as io
import json
import os

# defines
def extract_values((example, prediction)):
  import tensorflow as tf
  tf_example = tf.train.Example()
  tf_example.ParseFromString(example.values()[0])
  feature_map = tf_example.features.feature
  values = {'target': feature_map['species'].int64_list.value[0]}
  values.update(prediction)
  return values

OUTPUT_DIR = 'gs://cloud-ml-test-automated-sampledata/iris/evaluate'
import datetime
options = {
    'staging_location': os.path.join(OUTPUT_DIR, 'tmp', 'staging'),
    'temp_location': os.path.join(OUTPUT_DIR, 'tmp'),
    'job_name': 'evaluate' + '-' + datetime.datetime.now().strftime('%y%m%d-%H%M%S'),
    'project': 'cloud-ml-test-automated',
    'extra_packages': ['gs://cloud-ml/sdk/cloudml-0.1.2.latest.tar.gz'],
    'teardown_policy': 'TEARDOWN_ALWAYS',
    'no_save_main_session': True
}
opts = beam.pipeline.PipelineOptions(flags=[], **options)
pipeline = beam.Pipeline('DataflowPipelineRunner', options=opts)


# evaluation
eval_parameters = tfrecordio.TFRecordParameters(
    file_path_prefix='gs://cloud-ml-test-automated-sampledata/iris/preprocessed/features_eval',
    file_name_suffix='',
    shard_file=False,
    compress_file=True)
eval_features = pipeline | io.LoadFeatures('LoadEvalFeatures', eval_parameters)
trained_model = pipeline | io.LoadModel('LoadModel', 'gs://cloud-ml-test-automated-sampledata/iris/trained/model')
evaluations = (eval_features | ml.Evaluate(trained_model, label='Evaluate')
    | beam.Map('ExtractEvaluationResults', extract_values))
eval_data_sink = beam.io.TextFileSink(os.path.join(OUTPUT_DIR, 'eval'))
evaluations | beam.Write('WriteEval', eval_data_sink)

# analysis

# run pipeline
pipeline.run()
```

After you run the generated code above, you can go to the Developer Console to see the Dataflow job: https://pantheon.corp.google.com/dataflow (make sure to select the right project). You can also check the results as shown below:

In [16]:
!gsutil cat $eval_file | head -10

{u'score': [0.9643456935882568, 0.03509025275707245, 0.000563996727578342], 'target': 0L, u'key': ['4'], u'predictions': 0}
{u'score': [0.985572099685669, 0.013915198855102062, 0.0005126940086483955], 'target': 0L, u'key': ['20'], u'predictions': 0}
{u'score': [0.9754565358161926, 0.024147897958755493, 0.00039562786696478724], 'target': 0L, u'key': ['43'], u'predictions': 0}
{u'score': [0.019101984798908234, 0.7742735147476196, 0.20662443339824677], 'target': 1L, u'key': ['88'], u'predictions': 1}
{u'score': [0.05851663649082184, 0.6017542481422424, 0.33972907066345215], 'target': 1L, u'key': ['76'], u'predictions': 1}
{u'score': [0.038775887340307236, 0.9083987474441528, 0.05282538756728172], 'target': 1L, u'key': ['63'], u'predictions': 1}
{u'score': [0.9870647192001343, 0.012523564510047436, 0.00041171154589392245], 'target': 0L, u'key': ['47'], u'predictions': 0}
{u'score': [0.0019517333712428808, 0.08908909559249878, 0.9089592099189758], 'target': 2L, u'key': ['146'], u'prediction