# Using TensorFlow with H2O

This notebook shows how to use the TensorFlow backend of H2O DeepWater to tackle a simple image classification problem.

We start by connecting to our H2O cluster:

```python
import h2o
h2o.init(port=54321, nthreads=-1)
```

```
Checking whether there is an H2O instance running at http://localhost:54321. connected.
```

| Property | Value |
|---|---|
| H2O cluster uptime | 54 mins 37 secs |
| H2O cluster version | 3.11.0.99999 |
| H2O cluster version age | 6 days |
| H2O cluster name | ubuntu |
| H2O cluster total nodes | 1 |
| H2O cluster free memory | 8.86 Gb |
| H2O cluster total cores | 8 |
| H2O cluster allowed cores | 8 |
| H2O cluster status | locked, healthy |
| H2O connection url | http://localhost:54321 |


Next, we make sure that the H2O cluster includes DeepWater support:

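```python
from h2o.estimators.deepwater import H2ODeepWaterEstimator

# Stop here if this H2O build has no DeepWater backend available
# (a bare `exit` without parentheses does nothing)
if not H2ODeepWaterEstimator.available():
    raise SystemExit("DeepWater is not available on this H2O cluster")
```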

Load some Python utility libraries:

```python
import sys
import os
import random

import numpy as np
import pandas as pd
```

Finally, we configure the IPython notebook to display plots inline:

```python
%matplotlib inline
from IPython.display import Image, display, HTML
import matplotlib.pyplot as plt
```

## Configuration

Set the path to your H2O source checkout, and download the `bigdata` dataset by running `./gradlew syncBigdataLaptop` from the H2O source distribution.

```python
H2O_PATH = os.path.expanduser("~/h2o-3/")
```
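The image paths inside `cat_dog_mouse.csv` are relative (`bigdata/laptop/...`), so we assume the H2O cluster was started from the root of the H2O source tree, where those paths resolve; we have not verified how H2O resolves relative image paths in general. A minimal sketch for checking that the dataset has been downloaded before importing it (the name `csv_path` is our own):

```python
import os

# Root of the H2O source checkout (same value as H2O_PATH above)
H2O_PATH = os.path.expanduser("~/h2o-3/")

# os.path.join avoids doubled or missing slashes between path parts
csv_path = os.path.join(H2O_PATH, "bigdata", "laptop", "deepwater",
                        "imagenet", "cat_dog_mouse.csv")

if not os.path.exists(csv_path):
    print("Dataset not found at", csv_path)
    print("Run ./gradlew syncBigdataLaptop from", H2O_PATH)
```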

## Image Classification Task

H2O DeepWater accepts a column of image URIs (local file paths) or URLs, together with a response column that is either a class label (enum, for classification) or a numeric target (for regression).
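If your images are not already listed in a CSV file, one way to build such a file is sketched below. It assumes a hypothetical folder-per-class layout (`root/<label>/<image>.jpg`), mirroring the paths in the sample dataset, and writes rows without a header, so we expect H2O to name the columns `C1` and `C2` as it does for the sample file. `write_image_csv` and the extension list are our own choices, not part of H2O.

```python
import csv
import os

# Hypothetical set of file extensions treated as images
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def write_image_csv(root, out_csv):
    """Write one (image_path, label) row per image found under root/<label>/.

    The label is taken from the name of the sub-folder. Returns the row count.
    """
    rows = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level
        for name in sorted(os.listdir(class_dir)):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                rows.append((os.path.join(class_dir, name), label))
    # No header row: two columns, path first and label second
    with open(out_csv, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

The resulting file can then be loaded with `h2o.import_file(out_csv)` in the same way as the sample dataset.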

For this example, we use a small dataset of 267 images in three classes: cat, dog and mouse.

```python
# os.path.join avoids a doubled slash, since H2O_PATH already ends with "/"
frame = h2o.import_file(os.path.join(H2O_PATH, "bigdata/laptop/deepwater/imagenet/cat_dog_mouse.csv"))
print(frame.dim)
print(frame.head(5))
```

```
[267, 2]
```

| C1 | C2 |
|---|---|
| bigdata/laptop/deepwater/imagenet/cat/102194502_49f003abd9.jpg | cat |
| bigdata/laptop/deepwater/imagenet/cat/11146807_00a5f35255.jpg | cat |
| bigdata/laptop/deepwater/imagenet/cat/1140846215_70e326f868.jpg | cat |
| bigdata/laptop/deepwater/imagenet/cat/114170569_6cbdf4bbdb.jpg | cat |
| bigdata/laptop/deepwater/imagenet/cat/1217664848_de4c7fc296.jpg | cat |
To build a LeNet image classification model in H2O, specify `network = "lenet"` and `backend = "tensorflow"` to use the **TensorFlow** implementation of LeNet:

```python
model = H2ODeepWaterEstimator(epochs      = 500,
                              network     = "lenet",
                              image_shape = [28, 28],  # input image size
                              channels    = 3,         # RGB
                              backend     = "tensorflow",
                              model_id    = "deepwater_tf_simple")

model.train(x = [0],  # column 0: image file path, e.g. xxx/xxx/xxx.jpg
            y = 1,    # column 1: label (cat/dog/mouse)
            training_frame = frame)

model.show()
```

```
Model Details
H2ODeepWaterEstimator :  Deep Water
Model Key:  deepwater_tf_simple

ModelMetricsMultinomial: deepwater
** Reported on train data. **

MSE: 0.347845369945
RMSE: 0.589784172342
LogLoss: 0.981696653168
Mean Per-Class Error: 0.482660793786
```

Confusion Matrix (vertical: actual; across: predicted):

| actual | cat | dog | mouse | Error | Rate |
|---|---|---|---|---|---|
| cat | 74.0 | 7.0 | 9.0 | 0.1777778 | 16 / 90 |
| dog | 42.0 | 26.0 | 17.0 | 0.6941176 | 59 / 85 |
| mouse | 49.0 | 4.0 | 39.0 | 0.5760870 | 53 / 92 |
| Totals | 165.0 | 37.0 | 65.0 | 0.4794007 | 128 / 267 |
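The error columns follow directly from the counts: each row's error is the share of that class's images predicted as another class, the Error value in the last row is the overall error, and Mean Per-Class Error is the unweighted average of the three per-class errors. A minimal sketch in plain Python (the function name is ours) that reproduces the reported numbers from the counts above:

```python
def confusion_metrics(matrix):
    """Per-class, overall and mean per-class error of a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    per_class_error = []
    for i, row in enumerate(matrix):
        total = sum(row)
        per_class_error.append((total - row[i]) / total)  # off-diagonal share
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    overall_error = (n - correct) / n
    mean_per_class_error = sum(per_class_error) / len(per_class_error)
    return per_class_error, overall_error, mean_per_class_error

# Counts from the LeNet confusion matrix (rows/columns: cat, dog, mouse)
cm = [[74, 7, 9],
      [42, 26, 17],
      [49, 4, 39]]
per_class, overall, mean_pc = confusion_metrics(cm)
```

Because the classes are roughly balanced (90, 85 and 92 images), the mean per-class error (0.4827) is close to the overall error (0.4794).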


Top-3 Hit Ratios:

| k | hit_ratio |
|---|---|
| 1 | 0.5205992 |
| 2 | 0.82397 |
| 3 | 1.0 |
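A top-k hit ratio is the fraction of rows whose true class is among the k classes with the highest predicted probability. The top-1 ratio is therefore one minus the overall error (1 - 128/267 = 0.5206), and with only three classes the top-3 ratio is always 1.0. A small sketch on made-up probabilities (the data and the function name are hypothetical):

```python
def top_k_hit_ratios(prob_rows, labels, max_k):
    """Return [hit ratio for k = 1, ..., max_k] given class probabilities per row."""
    hits = [0] * max_k
    for probs, y in zip(prob_rows, labels):
        # Classes ordered from most to least probable
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        rank = ranked.index(y)  # 0-based position of the true class
        # The row counts as a hit for every k greater than its rank
        for k in range(rank, max_k):
            hits[k] += 1
    return [h / len(labels) for h in hits]

# Hypothetical predictions for 4 images over classes (cat, dog, mouse)
probs = [[0.7, 0.2, 0.1],
         [0.3, 0.5, 0.2],
         [0.2, 0.3, 0.5],
         [0.6, 0.3, 0.1]]
labels = [0, 0, 2, 2]
ratios = top_k_hit_ratios(probs, labels, max_k=3)
```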


Scoring History:

| timestamp | duration | training_speed | epochs | iterations | samples | training_rmse | training_logloss | training_classification_error |
|---|---|---|---|---|---|---|---|---|
| 2016-11-30 09:36:48 | 0.000 sec | | 0.0 | 0 | 0.0 | | | |
| 2016-11-30 09:36:50 | 1.989 sec | 564 obs/sec | 3.8352060 | 1 | 1024.0 | 0.8077921 | 3.3077188 | 0.6629213 |
| 2016-11-30 09:36:55 | 7.348 sec | 2149 obs/sec | 57.5280899 | 15 | 15360.0 | 0.6257361 | 0.9911735 | 0.4831461 |
| 2016-11-30 09:37:00 | 12.678 sec | 2385 obs/sec | 111.2209738 | 29 | 29696.0 | 0.5897842 | 0.9816967 | 0.4794007 |
| 2016-11-30 09:37:06 | 18.014 sec | 2479 obs/sec | 164.9138577 | 43 | 44032.0 | 0.8095860 | 10.8698045 | 0.6554307 |
| 2016-11-30 09:37:11 | 23.315 sec | 2534 obs/sec | 218.6067416 | 57 | 58368.0 | 0.8095859 | 10.8075838 | 0.6554307 |
| 2016-11-30 09:37:16 | 28.628 sec | 2567 obs/sec | 272.2996255 | 71 | 72704.0 | 0.8095858 | 10.7393858 | 0.6554307 |
| 2016-11-30 09:37:22 | 33.936 sec | 2590 obs/sec | 325.9925094 | 85 | 87040.0 | 0.8095855 | 10.6550124 | 0.6554307 |
| 2016-11-30 09:37:27 | 39.239 sec | 2608 obs/sec | 379.6853933 | 99 | 101376.0 | 0.8095851 | 10.5413824 | 0.6554307 |
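The scoring history can be read with a little arithmetic. Each iteration here processes 1024 samples, and `epochs` is simply the number of samples seen divided by the 267 training rows. The training loss improves up to about 111 epochs and then jumps (log loss above 10). The training metrics reported at the top match that ~111-epoch row rather than the last one, which is consistent with H2O keeping the best-scoring model (an assumption; the notebook does not say so). After the jump, the classification error 0.6554307 equals 175/267, the error of always predicting the largest class (mouse, 92 images), which suggests the network collapsed to a single prediction. The sketch below checks these relationships on values copied from the table; the variable names are ours.

```python
N_ROWS = 267   # rows in the training frame
BATCH = 1024   # samples per iteration, as read off the table

# (epochs, iterations, samples, training_rmse, training_logloss, training_classification_error)
history = [
    (3.8352060,   1,  1024.0,   0.8077921, 3.3077188,  0.6629213),
    (57.5280899,  15, 15360.0,  0.6257361, 0.9911735,  0.4831461),
    (111.2209738, 29, 29696.0,  0.5897842, 0.9816967,  0.4794007),
    (164.9138577, 43, 44032.0,  0.8095860, 10.8698045, 0.6554307),
    (218.6067416, 57, 58368.0,  0.8095859, 10.8075838, 0.6554307),
    (272.2996255, 71, 72704.0,  0.8095858, 10.7393858, 0.6554307),
    (325.9925094, 85, 87040.0,  0.8095855, 10.6550124, 0.6554307),
    (379.6853933, 99, 101376.0, 0.8095851, 10.5413824, 0.6554307),
]

# epochs = samples / training rows, and samples = iterations * batch size
for epochs, iterations, samples, *_ in history:
    assert abs(samples / N_ROWS - epochs) < 1e-6
    assert samples == iterations * BATCH

# Row with the lowest training log loss
best = min(history, key=lambda row: row[4])
best_epochs, _, _, best_rmse, best_logloss, best_error = best

# Error rate of a model that always predicts "mouse" (92 of 267 images)
majority_error = 1 - 92 / N_ROWS
```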


If you'd like to build your own TensorFlow network architecture, that is easy as well; this example again uses the **TensorFlow** backend.
Because H2O uses TensorFlow's own format for the model definition (a MetaGraph exported with `tf.train.export_meta_graph`), models can be moved between H2O and TensorFlow.

```python
def simple_model(w, h, channels, classes):
    import json
    import tensorflow as tf  # TensorFlow 1.x graph-mode API

    # Always create a new graph inside IPython; otherwise the default
    # graph is reused across cells, which can lead to unexpected behavior
    graph = tf.Graph()
    with graph.as_default():
        # A single linear layer on the flattened image
        size = w * h * channels
        x = tf.placeholder(tf.float32, [None, size])
        W = tf.Variable(tf.zeros([size, classes]))
        b = tf.Variable(tf.zeros([classes]))
        y = tf.matmul(x, W) + b

        # one-hot labels
        y_ = tf.placeholder(tf.float32, [None, classes])

        # accuracy
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # training; labels/logits must be passed by keyword in TF >= 1.0
        cross_entropy = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
        train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

        tf.add_to_collection("train", train_step)
        # this variable is required by the H2O TensorFlow backend
        global_step = tf.Variable(0, name="global_step", trainable=False)

        # replaces the deprecated tf.initialize_all_variables()
        init = tf.global_variables_initializer()
        tf.add_to_collection("init", init)
        tf.add_to_collection("logits", y)
        saver = tf.train.Saver()
        # tensor names H2O uses to feed the graph and read its outputs
        meta = json.dumps({
                "inputs": {"batch_image_input": x.name, "categorical_labels": y_.name},
                "outputs": {"categorical_logits": y.name},
                "metrics": {"accuracy": accuracy.name, "total_loss": cross_entropy.name},
                "parameters": {"global_step": global_step.name},
        })
        print(meta)
        tf.add_to_collection("meta", meta)
        filename = "/tmp/lenet_tensorflow.meta"
        tf.train.export_meta_graph(filename, saver_def=saver.as_saver_def())
    return filename
```

```python
filename = simple_model(28, 28, 3, classes=3)
```

```
{"metrics": {"total_loss": "Mean_1:0", "accuracy": "Mean:0"}, "inputs": {"categorical_labels": "Placeholder_1:0", "batch_image_input": "Placeholder:0"}, "parameters": {"global_step": "global_step:0"}, "outputs": {"categorical_logits": "add:0"}}
```
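The JSON printed by `simple_model` tells the H2O backend which tensors to feed and read. Before passing a hand-built graph to H2O, it can help to check that every entry used in this example is present. The sketch below treats the keys from this example as required; that list is our assumption, not a documented H2O contract, and `missing_meta_keys` is a hypothetical helper.

```python
import json

# Keys used by the example graph; treating all of them as required is an assumption
REQUIRED = {
    "inputs": ["batch_image_input", "categorical_labels"],
    "outputs": ["categorical_logits"],
    "metrics": ["accuracy", "total_loss"],
    "parameters": ["global_step"],
}

def missing_meta_keys(meta_json):
    """Return 'section.key' names from REQUIRED that are absent in meta_json."""
    meta = json.loads(meta_json)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if key not in meta.get(section, {}):
                missing.append(section + "." + key)
    return missing
```

Calling `missing_meta_keys(meta)` on the string printed above should return an empty list.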


```python
model = H2ODeepWaterEstimator(epochs                  = 500,
                              network_definition_file = filename,  # the exported TensorFlow graph
                              image_shape             = [28, 28],  # should match w, h passed to simple_model
                              channels                = 3,
                              backend                 = "tensorflow",
                              model_id                = "deepwater_tf_custom")

model.train(x = [0],  # column 0: image file path, e.g. xxx/xxx/xxx.jpg
            y = 1,    # column 1: label (cat/dog/mouse)
            training_frame = frame)

model.show()
```

```
Model Details
H2ODeepWaterEstimator :  Deep Water
Model Key:  deepwater_tf_custom

ModelMetricsMultinomial: deepwater
** Reported on train data. **

MSE: 6.60075876885e+12
RMSE: 2569194.18668
LogLoss: -14.4921790248
Mean Per-Class Error: 0.0
```

Confusion Matrix (vertical: actual; across: predicted):

| actual | cat | dog | mouse | Error | Rate |
|---|---|---|---|---|---|
| cat | 90.0 | 0.0 | 0.0 | 0.0 | 0 / 90 |
| dog | 0.0 | 85.0 | 0.0 | 0.0 | 0 / 85 |
| mouse | 0.0 | 0.0 | 92.0 | 0.0 | 0 / 92 |
| Totals | 90.0 | 85.0 | 92.0 | 0.0 | 0 / 267 |
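A perfect training confusion matrix from a single linear layer is less surprising than it looks. Each 28x28 RGB image becomes 2352 inputs, so the model has 2352 x 3 weights plus 3 biases, far more parameters than the 267 training images; with that many free parameters a linear model can often fit the training set exactly. These are training metrics only, so held-out data (for example a split passed as a validation frame) would be needed to judge how well the model generalizes. The arithmetic:

```python
w, h, channels, classes = 28, 28, 3, 3   # arguments passed to simple_model above
n_rows = 267                             # images in the training frame

n_inputs = w * h * channels              # length of each flattened image vector
n_params = n_inputs * classes + classes  # weight matrix W plus bias vector b

print(n_inputs, n_params, n_params / n_rows)
```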


Top-3 Hit Ratios:

| k | hit_ratio |
|---|---|
| 1 | 1.0 |
| 2 | 1.0 |
| 3 | 1.0 |


Scoring History:

| timestamp | duration | training_speed | epochs | iterations | samples | training_rmse | training_logloss | training_classification_error |
|---|---|---|---|---|---|---|---|---|
| 2016-11-30 09:20:03 | 0.000 sec | | 0.0 | 0 | 0.0 | | | |
| 2016-11-30 09:20:05 | 2.017 sec | 534 obs/sec | 3.8352060 | 1 | 1024.0 | 1327446.7778600 | | 0.5655431 |
| 2016-11-30 09:20:10 | 7.083 sec | 9107 obs/sec | 237.7827715 | 62 | 63488.0 | 2569194.1866800 | -14.4921790 | 0.0 |
| 2016-11-30 09:20:15 | 12.159 sec | 10548 obs/sec | 475.5655431 | 124 | 126976.0 | 2569194.1866800 | -14.4921790 | 0.0 |
| 2016-11-30 09:20:15 | 12.745 sec | 10635 obs/sec | 502.4119850 | 131 | 134144.0 | 2569194.1866800 | -14.4921790 | 0.0 |
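The MSE, RMSE and LogLoss reported for this custom model cannot be taken at face value. Multinomial log loss is the mean of -log(p), where p is the probability assigned to the true class, so it is never negative when p is a proper probability; yet the report shows -14.49. One way such values can arise, and only our guess at what happened here, is that raw logits rather than softmax probabilities were scored. The sketch below (plain Python, hypothetical logits) contrasts the two:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(rows, labels, eps=1e-15):
    """Mean of -log(score of the true class), with scores clipped below at eps."""
    return -sum(math.log(max(row[y], eps)) for row, y in zip(rows, labels)) / len(labels)

logits = [[4.0, 1.0, 2.0]]   # hypothetical unnormalized output for one image
labels = [0]                 # true class: cat

proper = log_loss([softmax(row) for row in logits], labels)  # positive
improper = log_loss(logits, labels)  # raw logits treated as probabilities: -log(4)
```

Softmax does not change which class scores highest, so if this guess is right, the confusion matrix and hit ratios, which depend only on the ranking of the classes, should still be valid training-set figures.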
