## Evaluation

Batch prediction and evaluation are very similar. Both run as DataFlow pipelines, and CloudML provides Evaluate and Prediction DataFlow transforms for them. As with preprocessing, Datalab can generate a DataFlow pipeline code template for you.

Run "%mlalpha evaluate" to generate an input cell.

In [None]:
%mlalpha evaluate

After filling in the required fields, you will have:
```
%%mlalpha evaluate
preprocessed_eval_data_path: /content/datalab/tmp/ml/iris/preprocessed/features_eval.tfrecord.gz
metadata_path: /content/datalab/tmp/ml/iris/preprocessed/metadata.yaml
model_dir: /content/datalab/tmp/ml/iris/model/model
output_dir: /content/datalab/tmp/ml/iris/evaluate
output_prediction_name: predicted_index
output_score_name: scores
```

Run the generated code. Optionally, uncomment the code after the comment "View Confusion Matrix with the following code" to also plot a confusion matrix. Note that the confusion matrix code is generated only if Datalab detects a classification problem.

In [4]:

# header
"""
Following code is generated from command line:
%%mlalpha evaluate
preprocessed_eval_data_path: /content/datalab/tmp/ml/iris/preprocessed/features_eval.tfrecord.gz
metadata_path: /content/datalab/tmp/ml/iris/preprocessed/metadata.yaml
model_dir: /content/datalab/tmp/ml/iris/model/model
output_dir: /content/datalab/tmp/ml/iris/evaluate
output_prediction_name: predicted_index
output_score_name: scores

Please modify as appropriate!!!
"""

# imports
import apache_beam as beam
import google.cloud.ml as ml
import google.cloud.ml.analysis as analysis
import google.cloud.ml.io as io
import json
import os

# defines
def extract_values((example, prediction)):
  import tensorflow as tf
  tf_example = tf.train.Example()
  tf_example.ParseFromString(example.values()[0])
  feature_map = tf_example.features.feature
  values = {'target': feature_map['species'].int64_list.value[0]}
  values.update(prediction)
  return values

OUTPUT_DIR = '/content/datalab/tmp/ml/iris/evaluate'
pipeline = beam.Pipeline('DirectPipelineRunner')


# evaluation

eval_features = (pipeline | 'ReadEval' >> io.LoadFeatures('/content/datalab/tmp/ml/iris/preprocessed/features_eval.tfrecord.gz'))
trained_model = pipeline | 'LoadModel' >> io.LoadModel('/content/datalab/tmp/ml/iris/model/model')
evaluations = (eval_features | 'Evaluate' >> ml.Evaluate(trained_model) |
    beam.Map('ExtractEvaluationResults', extract_values))
eval_data_sink = beam.io.TextFileSink(os.path.join(OUTPUT_DIR, 'eval'), shard_name_template='')
evaluations | beam.io.textio.WriteToText(os.path.join(OUTPUT_DIR, 'eval'), shard_name_template='')

# analysis
def make_data_for_analysis(values):
  return {
      'target': values['target'],
      'predicted': values['predicted_index'],
      'score': values['scores'][values['predicted_index']],
  }

metadata = pipeline | io.LoadMetadata('/content/datalab/tmp/ml/iris/preprocessed/metadata.yaml')
analysis_source = evaluations | beam.Map('CreateAnalysisSource', make_data_for_analysis)
confusion_matrix, precision_recall, logloss = (analysis_source |
    'Analyze Model' >> analysis.AnalyzeModel(metadata))
confusion_matrix_file = os.path.join(OUTPUT_DIR, 'analyze_cm.json')
confusion_matrix_sink = beam.io.TextFileSink(confusion_matrix_file, shard_name_template='')
confusion_matrix | beam.io.Write('WriteConfusionMatrix', confusion_matrix_sink)

# run pipeline
pipeline.run()

# View Confusion Matrix with the following code:
#
import datalab.mlalpha
import yaml
with ml.util._file.open_local_or_gcs(confusion_matrix_file, 'r') as f:
  data = [yaml.load(line) for line in f.read().rstrip().split('\n')]
datalab.mlalpha.ConfusionMatrix([d['predicted'] for d in data],
                           [d['target'] for d in data],
                           [d['count'] for d in data]).plot()
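
To make the plotting input more concrete, here is a minimal sketch of how confusion matrix records can be counted from (target, predicted) pairs. It does not use or reproduce `analysis.AnalyzeModel`; the helper name `confusion_counts` is hypothetical, and the record layout (`target`, `predicted`, `count`) is an assumption based only on the keys read by the plotting code above. The pairs are taken from the six complete rows of the sample eval output in this section.

```python
from collections import Counter

def confusion_counts(pairs):
    """Count (target, predicted) pairs; return one record per distinct pair.

    Hypothetical helper: the real analyze_cm.json may differ in layout.
    """
    counts = Counter(pairs)
    return [
        {'target': t, 'predicted': p, 'count': n}
        for (t, p), n in sorted(counts.items())
    ]

# (target, predicted_index) for the six complete sample rows of the eval output.
pairs = [(2, 1), (1, 1), (1, 1), (0, 0), (1, 1), (0, 0)]
records = confusion_counts(pairs)
for record in records:
    print(record)
```

Each record describes one cell of the matrix, so the off-diagonal record (target 2, predicted 1) is the single misclassified example among these rows.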




Also check the eval output file:

In [5]:
!head /content/datalab/tmp/ml/iris/evaluate/eval

{u'predicted_index': 1, u'predicted_label': 'versicolor', 'target': 2L, u'key': '107', u'scores': [0.00020953394414391369, 0.9861494302749634, 0.013641098514199257]}
{u'predicted_index': 1, u'predicted_label': 'versicolor', 'target': 1L, u'key': '100', u'scores': [3.5766664950642735e-05, 0.9999605417251587, 3.7191889532550704e-06]}
{u'predicted_index': 1, u'predicted_label': 'versicolor', 'target': 1L, u'key': '99', u'scores': [0.0016989074647426605, 0.998301088809967, 5.1294335889906506e-08]}
{u'predicted_index': 0, u'predicted_label': 'setosa', 'target': 0L, u'key': '13', u'scores': [0.9999939203262329, 6.045279405952897e-06, 7.037634122755438e-16]}
{u'predicted_index': 1, u'predicted_label': 'versicolor', 'target': 1L, u'key': '70', u'scores': [1.6221962141571566e-05, 0.999983549118042, 2.96937997745772e-07]}
{u'predicted_index': 0, u'predicted_label': 'setosa', 'target': 0L, u'key': '11', u'scores': [0.9999990463256836, 8.985851991383242e-07, 7.794925517434922e-16]}
{u'predic
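
As a quick sanity check on this file, the sketch below parses a few of the rows shown above and computes accuracy. It assumes each line is a Python 2 dict repr, as in the output above, and strips the `L` long-integer suffix so the same code also runs on Python 3. The helper names `parse_eval_line` and `accuracy` are hypothetical and are not part of Datalab or CloudML.

```python
import ast
import re

def parse_eval_line(line):
    """Parse one eval output line (a Python 2 dict repr) into a dict."""
    # Drop the Python 2 long suffix (2L -> 2). Caveat: this would also alter
    # a string value that ends in a digit followed by 'L'.
    cleaned = re.sub(r'(?<=\d)L\b', '', line)
    return ast.literal_eval(cleaned)

def accuracy(records, predicted_key='predicted_index'):
    """Fraction of records whose prediction equals the target."""
    correct = sum(1 for r in records if r['target'] == r[predicted_key])
    return correct / float(len(records))

# Three rows copied from the eval output above.
sample_lines = [
    "{u'predicted_index': 1, u'predicted_label': 'versicolor', 'target': 2L, u'key': '107', u'scores': [0.00020953394414391369, 0.9861494302749634, 0.013641098514199257]}",
    "{u'predicted_index': 1, u'predicted_label': 'versicolor', 'target': 1L, u'key': '100', u'scores': [3.5766664950642735e-05, 0.9999605417251587, 3.7191889532550704e-06]}",
    "{u'predicted_index': 0, u'predicted_label': 'setosa', 'target': 0L, u'key': '13', u'scores': [0.9999939203262329, 6.045279405952897e-06, 7.037634122755438e-16]}",
]
records = [parse_eval_line(line) for line in sample_lines]
print(accuracy(records))
```

If the prediction is stored under a different key, pass that key as `predicted_key`.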

To generate a pipeline that runs in the cloud, run "%mlalpha evaluate --cloud". All paths must then be GCS paths. Let's define the variables first.

In [11]:
import os

bucket = 'gs://' + datalab_project_id() + '-sampledata'
eval_data_path = os.path.join(bucket, 'iris', 'preprocessed', 'features_eval.tfrecord.gz')
metadata_path = os.path.join(bucket, 'iris', 'preprocessed', 'metadata.yaml')
model_path = os.path.join(bucket, 'iris', 'trained', 'model')
output_dir = os.path.join(bucket, 'iris', 'evaluate')
eval_file = os.path.join(output_dir, 'eval*')
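
The cell above relies on `os.path.join`, which produces forward slashes on the Linux host that Datalab runs on. As a sketch of the same path layout outside Datalab, the variant below uses `posixpath.join`, which uses `/` on every platform. The function name `build_gcs_paths` and the project id `my-project` are hypothetical stand-ins, since `datalab_project_id()` is only available inside Datalab.

```python
import posixpath

def build_gcs_paths(project_id):
    """Build the GCS paths used for cloud evaluation (hypothetical helper)."""
    bucket = 'gs://' + project_id + '-sampledata'
    output_dir = posixpath.join(bucket, 'iris', 'evaluate')
    return {
        'eval_data_path': posixpath.join(
            bucket, 'iris', 'preprocessed', 'features_eval.tfrecord.gz'),
        'metadata_path': posixpath.join(
            bucket, 'iris', 'preprocessed', 'metadata.yaml'),
        'model_path': posixpath.join(bucket, 'iris', 'trained', 'model'),
        'output_dir': output_dir,
        # Wildcard that matches the eval output file(s) for gsutil.
        'eval_file': posixpath.join(output_dir, 'eval*'),
    }

paths = build_gcs_paths('my-project')  # 'my-project' is a placeholder id
for name, path in sorted(paths.items()):
    print(name, path)
```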

Then copy the following into a cell to generate the Cloud DataFlow pipeline. Note that we don't provide "output_prediction_name" this time, so the generated pipeline code does not include the eval analysis step.
```
%%mlalpha evaluate --cloud
preprocessed_eval_data_path: $eval_data_path
metadata_path: $metadata_path
model_dir: $model_path
output_dir: $output_dir
```

In [19]:

# header
"""
Following code is generated from command line:
%%mlalpha evaluate --cloud
preprocessed_eval_data_path: $eval_data_path
metadata_path: $metadata_path
model_dir: $model_path
output_dir: $output_dir

Please modify as appropriate!!!
"""

# imports
import apache_beam as beam
import google.cloud.ml as ml
import google.cloud.ml.analysis as analysis
import google.cloud.ml.io as io
import json
import os

# defines
def extract_values((example, prediction)):
  import tensorflow as tf
  tf_example = tf.train.Example()
  tf_example.ParseFromString(example.values()[0])
  feature_map = tf_example.features.feature
  values = {'target': feature_map['species'].int64_list.value[0]}
  values.update(prediction)
  return values

OUTPUT_DIR = 'gs://cloud-ml-test-automated-sampledata/iris/evaluate'
import datetime
options = {
    'staging_location': os.path.join(OUTPUT_DIR, 'tmp', 'staging'),
    'temp_location': os.path.join(OUTPUT_DIR, 'tmp'),
    'job_name': 'evaluate' + '-' + datetime.datetime.now().strftime('%y%m%d-%H%M%S'),
    'project': 'cloud-ml-test-automated',
    'extra_packages': ['gs://cloud-ml/sdk/cloudml-0.1.6-alpha.tar.gz'],
    'teardown_policy': 'TEARDOWN_ALWAYS',
    'no_save_main_session': True
}
opts = beam.pipeline.PipelineOptions(flags=[], **options)
pipeline = beam.Pipeline('DataflowPipelineRunner', options=opts)


# evaluation

eval_features = (pipeline | 'ReadEval' >> io.LoadFeatures('gs://cloud-ml-test-automated-sampledata/iris/preprocessed/features_eval.tfrecord.gz'))
trained_model = pipeline | 'LoadModel' >> io.LoadModel('gs://cloud-ml-test-automated-sampledata/iris/trained/model')
evaluations = (eval_features | 'Evaluate' >> ml.Evaluate(trained_model) |
    beam.Map('ExtractEvaluationResults', extract_values))
eval_data_sink = beam.io.TextFileSink(os.path.join(OUTPUT_DIR, 'eval'), shard_name_template='')
evaluations | beam.io.textio.WriteToText(os.path.join(OUTPUT_DIR, 'eval'), shard_name_template='')

# analysis

# run pipeline
pipeline.run()




<DataflowPipelineResult <Job
 id: u'2016-09-28_23_36_18-16258675668926611840'
 projectId: u'cloud-ml-test-automated'
 steps: []
 tempFiles: []
 type: TypeValueValuesEnum(JOB_TYPE_BATCH, 1)> at 0x7ff47804bb50>

After you run the generated code above, you can view the DataFlow job in the Developer Console: https://pantheon.corp.google.com/dataflow (select the right project). You can also check the results as shown below:

In [16]:
!gsutil cat $eval_file | head -10

{u'score': [0.9643456935882568, 0.03509025275707245, 0.000563996727578342], 'target': 0L, u'key': ['4'], u'predictions': 0}
{u'score': [0.985572099685669, 0.013915198855102062, 0.0005126940086483955], 'target': 0L, u'key': ['20'], u'predictions': 0}
{u'score': [0.9754565358161926, 0.024147897958755493, 0.00039562786696478724], 'target': 0L, u'key': ['43'], u'predictions': 0}
{u'score': [0.019101984798908234, 0.7742735147476196, 0.20662443339824677], 'target': 1L, u'key': ['88'], u'predictions': 1}
{u'score': [0.05851663649082184, 0.6017542481422424, 0.33972907066345215], 'target': 1L, u'key': ['76'], u'predictions': 1}
{u'score': [0.038775887340307236, 0.9083987474441528, 0.05282538756728172], 'target': 1L, u'key': ['63'], u'predictions': 1}
{u'score': [0.9870647192001343, 0.012523564510047436, 0.00041171154589392245], 'target': 0L, u'key': ['47'], u'predictions': 0}
{u'score': [0.0019517333712428808, 0.08908909559249878, 0.9089592099189758], 'target': 2L, u'key': ['146'], u'prediction