## Evaluation

Batch prediction and evaluation are very similar. Both are built on Dataflow pipelines, and CloudML provides Evaluate and Prediction Dataflow transforms for them. Datalab can generate a Dataflow pipeline code template for you, just as it does for preprocessing.

Run "%ml evaluate" to generate an input cell. After filling in the required fields, run that cell to generate the code.

In [None]:
%%ml evaluate
preprocessed_eval_data_path: /content/datalab/ml/census/preprocessed/features_eval-00000-of-00001
metadata_path: /content/datalab/ml/census/preprocessed/metadata.yaml
model_dir: /content/datalab/ml/census/model/model
output_dir: /content/datalab/ml/census/evaluate

In [None]:

# header
"""
Following code is generated from command line:
%%ml evaluate
preprocessed_eval_data_path: /content/datalab/ml/census/preprocessed/features_eval-00000-of-00001
metadata_path: /content/datalab/ml/census/preprocessed/metadata.yaml
model_dir: /content/datalab/ml/census/model/model
output_dir: /content/datalab/ml/census/evaluate

Please modify as appropriate!!!
"""

# imports
import apache_beam as beam
import google.cloud.ml as ml
import google.cloud.ml.analysis as analysis
import google.cloud.ml.io as io
import json
import os

# defines
def extract_values((example, prediction)):
  feature = json.loads(example.values()[0])['features']['feature']
  values = {'target': feature['target']['floatList']['value'][0]}
  values.update(prediction)
  return values

OUTPUT_DIR = '/content/datalab/ml/census/evaluate'
pipeline = beam.Pipeline('DirectPipelineRunner')


# evaluation
eval_data_source = beam.io.TextFileSource('/content/datalab/ml/census/preprocessed/features_eval-00000-of-00001', strip_trailing_newlines=True)
eval_features = pipeline | beam.Read('ReadEvalData', eval_data_source)
trained_model = pipeline | io.LoadModel('/content/datalab/ml/census/model/model')
evaluations = (eval_features | ml.Evaluate(trained_model, label='Evaluate')
    | beam.Map('CreateEvaluations', extract_values))
eval_data_sink = beam.io.TextFileSink(os.path.join(OUTPUT_DIR, 'eval'))
evaluations | beam.Write('WriteEval', eval_data_sink)

# analysis

# run pipeline
pipeline.run()
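
The `extract_values` step in the generated code can be tried on its own, outside the pipeline. The sketch below restates it for Python 3, where tuple parameters such as `(example, prediction)` in a function signature are no longer allowed. The sample input is hypothetical: the key `'key-0'`, the `'predicted'` field, and the numeric values are assumptions made for illustration, not output from the census model.

```python
import json


def extract_values(pair):
    # Python 3 has no tuple parameters, so unpack the pair inside the body.
    example, prediction = pair
    # As in the generated code, take the first value of the example
    # and parse it as a JSON-encoded record.
    record = json.loads(list(example.values())[0])
    feature = record['features']['feature']
    # Keep the true target, then merge in whatever the prediction holds.
    values = {'target': feature['target']['floatList']['value'][0]}
    values.update(prediction)
    return values


# Hypothetical inputs, shaped only to match what extract_values reads.
sample_example = {
    'key-0': json.dumps(
        {'features': {'feature': {'target': {'floatList': {'value': [1.0]}}}}}
    )
}
sample_prediction = {'predicted': 0.87}

result = extract_values((sample_example, sample_prediction))
print(result)
```

Each output record therefore pairs the true `target` with the fields of the prediction, which is the form written to the `eval` output under `OUTPUT_DIR`.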


To generate a pipeline that runs in the cloud, add --cloud to "%ml evaluate". In that case, all paths must be GCS paths.
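
Before switching to the cloud version, it can help to confirm that every path field points at GCS. The following is a minimal sketch rather than part of Datalab: the helper `find_non_gcs_paths` and the bucket name `my-bucket` are hypothetical, and the field names are those of the input cell above.

```python
def find_non_gcs_paths(fields):
    """Return the sorted names of fields whose value is not a gs:// path."""
    return sorted(
        name for name, path in fields.items() if not path.startswith('gs://')
    )


# The local paths used in the input cell above.
local_fields = {
    'preprocessed_eval_data_path':
        '/content/datalab/ml/census/preprocessed/features_eval-00000-of-00001',
    'metadata_path': '/content/datalab/ml/census/preprocessed/metadata.yaml',
    'model_dir': '/content/datalab/ml/census/model/model',
    'output_dir': '/content/datalab/ml/census/evaluate',
}

# Hypothetical GCS equivalents; 'my-bucket' is a placeholder bucket name.
cloud_fields = {
    name: path.replace('/content/datalab', 'gs://my-bucket', 1)
    for name, path in local_fields.items()
}

print(find_non_gcs_paths(local_fields))  # every field is still local
print(find_non_gcs_paths(cloud_fields))  # nothing left to fix
```

An empty result means all fields are ready to use with --cloud; any names returned still point at local paths.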