# H2O TensorFlow Deep Learning Demo

You can also follow this [YouTube video](https://www.youtube.com/watch?v=62TFK641gG8&feature=youtu.be).

## Prerequisites
1. Install TensorFlow from [https://www.tensorflow.org](https://www.tensorflow.org)
2. Download Sparkling Water from [here](http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.0/latest.html)
3. Follow the [instructions to set up PySparkling](https://github.com/h2oai/sparkling-water/blob/master/py/README.rst) (especially steps 1 and 2)
4. Launch a Jupyter Notebook that connects to PySparkling:
```bash
cd ~/Downloads
unzip sparkling-water-2.0.0.zip
cd sparkling-water-2.0.0
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" bin/pysparkling
```
5. Point the Notebook to [this file](https://raw.githubusercontent.com/h2oai/sparkling-water/master/py/examples/notebooks/TensorFlowDeepLearning.ipynb) (e.g., download it first, then upload it into the Notebook)

### Introduction
In this tutorial, we'll build a simple deep artificial neural network with two hidden layers to classify the handwritten digits of the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. If you are not familiar with these terms, please check out our [Deep Learning Booklet](https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/booklets/v2_2015/PDFs/online/DeepLearning_Vignette.pdf).

## Connect to H2O

We connect to an H2O cluster (here: 3 nodes) and import the MNIST dataset, pre-split into 60k rows for training and 10k rows for testing. Each row holds 28 × 28 = 784 grayscale pixel values from 0 to 255, followed by the digit label.
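To make the row layout concrete, here is a small pure-Python sketch (no H2O required) that splits one comma-separated MNIST row into its 784 pixel values and its label. It assumes, as the `RowData` helper in the model cell does, that columns 0 to 783 hold the pixels and column 784 holds the digit; `parse_mnist_row` is a hypothetical name, not part of H2O.

```python
# Hypothetical helper (not an H2O API): split one MNIST CSV row into
# normalized pixel values and the digit label.
# Assumed layout: columns 0..783 are pixels (0..255), column 784 is the label.
NUM_PIXELS = 28 * 28  # 784


def parse_mnist_row(line):
    values = [float(v) for v in line.strip().split(",")]
    if len(values) != NUM_PIXELS + 1:
        raise ValueError("expected %d columns, got %d" % (NUM_PIXELS + 1, len(values)))
    pixels = [v / 255.0 for v in values[:NUM_PIXELS]]  # scale 0..255 to 0..1
    label = int(values[NUM_PIXELS])
    return pixels, label


# Synthetic row: a black image with a single white pixel, labeled 7
row = ",".join(["0"] * 100 + ["255"] + ["0"] * 683 + ["7"])
pixels, label = parse_mnist_row(row)
print(len(pixels), max(pixels), label)  # 784 1.0 7
```

The same division by 255 appears later in `RowData.next_batch`, where it is applied to whole batches.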

```python
## Read MNIST data into H2O
import h2o
from pysparkling import H2OContext
h2o.__version__
# `sc` is the SparkContext created by the PySparkling shell
hc = H2OContext.getOrCreate(sc)
print(hc)
DATASET_DIR="http://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/mnist"
train_frame = h2o.import_file("{}/{}".format(DATASET_DIR, "train.csv.gz"))
test_frame = h2o.import_file("{}/{}".format(DATASET_DIR, "test.csv.gz"))
```



| Property | Value |
|---|---|
| H2O cluster uptime | 6 seconds 696 milliseconds |
| H2O cluster version | 3.8.2.6 |
| H2O cluster name | sparkling-water-arno_-1896306748 |
| H2O cluster total nodes | 3 |
| H2O cluster total free memory | 2.88 GB |
| H2O cluster total cores | 24 |
| H2O cluster allowed cores | 24 |
| H2O cluster healthy | True |
| H2O Connection ip | 172.16.2.20 |
| H2O Connection port | 54327 |


```
H2OContext: ip=172.16.2.20, port=54327 (open UI at http://172.16.2.20:54327 )

Parse Progress: [##################################################] 100%
Parse Progress: [##################################################] 100%
```


### Let's first confirm that TensorFlow is working properly

```python
## Number of nodes (Spark partitions) to use; a larger value simulates a larger cluster
NODES=3
```

```python
import tensorflow
## Initialize a TensorFlow session on the workers and test it
def map_fun(i):
    import tensorflow as tf
    with tf.Graph().as_default() as g:
        hello = tf.constant('Sparkling, TensorFlow!', name="hello_constant")
        with tf.Session() as sess:
            return sess.run(hello)
## NODES elements in NODES partitions, so map_fun runs once per partition
sc.parallelize(list(range(NODES)), NODES).map(map_fun).collect()
```

```
['Sparkling, TensorFlow!', 'Sparkling, TensorFlow!', 'Sparkling, TensorFlow!']
```

Next, we expose the data stored in H2O through the Spark DataFrame API (no copy is made), so that the Python processes running TensorFlow can access it from the PySpark(ling) context.

```python
train_df = hc.as_spark_frame(train_frame).repartition(NODES)
test_df = hc.as_spark_frame(test_frame).repartition(NODES)
#train_df.printSchema()
```

## Define a TensorFlow Deep Learning model
Now, we define a TensorFlow Deep Learning model with 2 hidden layers of 50 neurons each and the Rectifier (ReLU) activation function. The Softmax function turns the 10 output neuron activations into 10 class probabilities. We initialize the weights and biases with Gaussian noise (standard deviation 0.1) and train the model with mini-batch gradient descent, using a fixed learning rate of 0.01 and no momentum.
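The forward pass described above can be sketched in plain Python, outside TensorFlow, to show what each layer computes: an affine map, ReLU on the two hidden layers, softmax on the output, and the cross-entropy loss. This is an illustration under stated assumptions, not the notebook's code: the helper names are hypothetical, the softmax subtracts the maximum logit for numerical stability, and the loss clamps probabilities at `1e-10` to avoid `log(0)`, which the TensorFlow graph in the next cell does not do.

```python
import math
import random

HN = 50  # hidden neurons per layer, as in the notebook


def gaussian_matrix(rows, cols, rng, stddev=0.1):
    # Weights initialized with Gaussian noise (mean 0, stddev 0.1)
    return [[rng.gauss(0.0, stddev) for _ in range(cols)] for _ in range(rows)]


def affine(x, W, b):
    # x @ W + b with W stored as [inputs, outputs], like tf.matmul(x, W) + b
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(z):
    m = max(z)  # subtracting the max leaves the result unchanged but avoids overflow
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def forward(x, W, b):
    h1 = relu(affine(x, W[0], b[0]))
    h2 = relu(affine(h1, W[1], b[1]))
    return softmax(affine(h2, W[2], b[2]))


def cross_entropy(y_true, y_pred, eps=1e-10):
    # -sum(y_ * log(y)) for one example; eps guards against log(0)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


rng = random.Random(0)
W = [gaussian_matrix(784, HN, rng), gaussian_matrix(HN, HN, rng), gaussian_matrix(HN, 10, rng)]
b = [[rng.gauss(0.0, 0.1) for _ in range(n)] for n in (HN, HN, 10)]

x = [0.5] * 784  # a dummy, already normalized input image
y = forward(x, W, b)
label = [0.0] * 10
label[3] = 1.0  # one-hot encoding of digit 3
print(len(y), round(sum(y), 6), cross_entropy(label, y) > 0)  # 10 1.0 True
```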

```python
import tensorflow as tf

## Define the number of hidden neurons per layer
HN=50

# Build and train a TensorFlow model with 2 hidden layers on node-local data:
# - data_train/data_test provide next_batch(n) -> (features, one-hot labels)
# - train for `iterations` mini-batches of size `batch_size`
# - print the accuracy on one batch of data_test
# - return the learned weights and biases (as numpy arrays)
def create_nn(data_train, data_test, iterations, batch_size):
    ## input
    x = tf.placeholder(tf.float32, [None, 784])
    ## weights
    W = [tf.Variable(tf.random_normal([784,HN],stddev=0.1))
        ,tf.Variable(tf.random_normal([HN, HN],stddev=0.1))
        ,tf.Variable(tf.random_normal([HN, 10],stddev=0.1))]
    ## biases
    b = [tf.Variable(tf.random_normal([HN],    stddev=0.1))
        ,tf.Variable(tf.random_normal([HN],    stddev=0.1))
        ,tf.Variable(tf.random_normal([10],    stddev=0.1))]
    ## hidden layer activation
    h1 = tf.nn.relu(   tf.matmul(x,  W[0]) + b[0])
    h2 = tf.nn.relu(   tf.matmul(h1, W[1]) + b[1])
    ## output
    y = tf.nn.softmax( tf.matmul(h2, W[2]) + b[2])
    ## storage for actual labels
    y_ = tf.placeholder(tf.float32, [None, 10])
    ## cost function (tf.log(y) can yield NaN if a predicted probability underflows to 0)
    cross_entropy = -tf.reduce_sum(y_*tf.log(y))
    ## optimizer
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

    # Train the model
    # (newer TensorFlow 1.x releases call this tf.global_variables_initializer())
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init)
    print("Training TensorFlow Deep Learning model")
    for i in range(iterations):
        #print("TensorFlow iter: ", i, " session: ", sess)
        batch_xs, batch_ys = data_train.next_batch(batch_size)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    # Fetch the trained weights and biases as numpy arrays (one tuple per partition)
    model = [(sess.run(W[0]),sess.run(W[1]),sess.run(W[2]),sess.run(b[0]),sess.run(b[1]),sess.run(b[2]))]

    # Model evaluation on one batch of data_test
    correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
    batch_xs, batch_ys = data_test.next_batch(batch_size)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("Training Accuracy:", sess.run(accuracy, feed_dict={x: batch_xs, y_: batch_ys}))
    #print(sess.run(tf.argmax(y,1), feed_dict={x: batch_xs, y_: batch_ys}))

    sess.close()
    return iter(model)

    # Export the model (disabled)
    #from tensorflow_serving.session_bundle import exporter
    #export_path = "/tmp/xxx/"
    #saver = tf.train.Saver(sharded=True)
    #model_exporter = exporter.Exporter(saver)
    #signature = exporter.classification_signature(input_tensor=x, scores_tensor=y)
    #model_exporter.init(sess.graph.as_graph_def(), default_graph_signature=signature)
    #model_exporter.export(export_path, tf.constant(FLAGS.export_version), sess)

## Internal Helpers

# Sample rows without replacement (within each batch) to provide a batch
# Load everything into a numpy data structure
import numpy as np

def expand1hot(response, levels):
    nrows = response.shape[0]
    result = np.zeros((nrows, levels), dtype=np.float32)
    result[np.arange(nrows), response.astype(np.int8)] = 1.0
    return result

class RowData:
    def __init__(self, it):
        self._part_array = np.array([ [a for a in x] for x in it], dtype=np.float32)
        # Definition of input features
        self._x = list(range(784))
        # Index of response
        self._y = 784

    def next_batch(self, n):
        # Sample from local data without replacement
        dim = self._part_array.shape[0] # number of rows
        sample = np.random.choice(dim, n, replace=False)
        data = self._part_array[sample, :]
        # Data coming from H2O, pixel values are 0..255 -> normalize to 0..1
        # FIXME: this should be done via RDD or H2O API directly !
        train = data[:, self._x]/255
        response = expand1hot(data[:, self._y], 10)
        return (train, response)
```
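The batch helpers above can be mirrored in plain Python to show what a single batch looks like. In this hypothetical sketch, `next_batch` draws distinct row indices with `random.sample` (standing in for `np.random.choice(dim, n, replace=False)`), and `expand_one_hot` turns each label into a 10-element indicator vector, as `expand1hot` does. The row layout (features first, label last) is shrunk to two features for brevity.

```python
import random


def expand_one_hot(labels, levels):
    # One output row per label, with 1.0 in the label's column (like expand1hot)
    rows = []
    for label in labels:
        row = [0.0] * levels
        row[int(label)] = 1.0
        rows.append(row)
    return rows


def next_batch(rows, n, rng):
    # n distinct rows: no replacement within a batch. Like np.random.choice with
    # replace=False, this raises ValueError if n exceeds the number of rows.
    indices = rng.sample(range(len(rows)), n)
    features = [rows[i][:-1] for i in indices]
    labels = [rows[i][-1] for i in indices]
    return features, expand_one_hot(labels, 10)


rng = random.Random(42)
# Toy partition: 20 rows of [feature_a, feature_b, label]
partition = [[float(i), float(2 * i), float(i % 10)] for i in range(20)]
xs, ys = next_batch(partition, 5, rng)
print(len(xs), len(ys), [sum(r) for r in ys])  # 5 5 [1.0, 1.0, 1.0, 1.0, 1.0]
```

Because sampling is without replacement only within a batch, the same row can still appear in several different batches.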

## Run TensorFlow on each H2O/PySparkling node

We use Spark's MapReduce paradigm to distribute the training across multiple worker nodes. Each node trains on its local data, which is stored in H2O and accessed by TensorFlow through the JVM-to-Python serialization provided by the PySpark(ling) API.

Here, we train only for a short time for demo purposes, so this is certainly not the best model we can build.
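To see why the collected result holds one entry per partition, here is a Spark-free simulation of `train_df.rdd.mapPartitions(...).collect()`. It is a hypothetical sketch: rows are dealt round-robin into `NODES` partitions (Spark's `repartition` also spreads rows roughly evenly, but its exact assignment differs), and `fake_train` stands in for `create_nn`, returning an iterator over a single tuple of six values.

```python
NODES = 3


def split_into_partitions(rows, num_partitions):
    # Deal rows out round-robin so every partition gets about the same count
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def map_partitions_collect(partitions, f):
    # Call f once per partition with an iterator over its rows and
    # concatenate whatever each call yields, like mapPartitions(f).collect()
    results = []
    for part in partitions:
        results.extend(f(iter(part)))
    return results


def fake_train(it):
    # Stand-in for create_nn: consume local rows, return one tuple of 6 "coefficients"
    rows = list(it)
    mean = sum(rows) / len(rows)
    return iter([(mean, mean, mean, 0.0, 0.0, 0.0)])


partitions = split_into_partitions(list(range(60)), NODES)
coeffs_per_node = map_partitions_collect(partitions, fake_train)
print(len(coeffs_per_node), len(coeffs_per_node[0]), [len(p) for p in partitions])
# 3 6 [20, 20, 20]
```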

```python
# Number of batches to iterate
ITERATIONS = 100
# Batch size (per iteration)
BATCH_SIZE = 100
# Use MNIST dataset provided by TensorFlow - for debugging only
USE_TF_MNIST=False

def train_nn(iterations, batch_size, use_tf_mnist=False):
    def perPartition(it):
        if not use_tf_mnist:
            train_data = RowData(it)
            test_data = train_data
        else:
            from tensorflow.examples.tutorials.mnist import input_data
            mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
            train_data = mnist.train
            test_data = mnist.train

        return create_nn(train_data, test_data, iterations, batch_size)

    return perPartition

coeffs_per_node = train_df.rdd.mapPartitions(train_nn(ITERATIONS, BATCH_SIZE, USE_TF_MNIST)).collect()
```
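A quick calculation puts "a short time" in perspective. Assuming the 60,000 training rows are split evenly across the three partitions, each node runs `ITERATIONS` batches of `BATCH_SIZE` rows, i.e., it draws 10,000 rows, half the size of its partition. Because every batch is sampled independently, some rows are drawn more than once and others never, so each node sees at most half of its local data. For comparison, the later H2O training step with `epochs=1` makes one full pass over all 60,000 rows.

```python
# Assumption: the 60,000 training rows are split evenly across NODES partitions
TRAIN_ROWS = 60000
NODES = 3
ITERATIONS = 100  # batches per node (from the cell above)
BATCH_SIZE = 100  # rows per batch (from the cell above)

rows_per_node = TRAIN_ROWS // NODES          # rows in each partition
samples_per_node = ITERATIONS * BATCH_SIZE   # rows drawn during training on each node
fraction = samples_per_node / rows_per_node  # share of one local pass over the data
print(rows_per_node, samples_per_node, fraction)  # 20000 10000 0.5
```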

```python
# Now, we have the weights and biases for each node
print(len(coeffs_per_node))    ## Number of nodes
print(len(coeffs_per_node[0])) ## Number of weight and bias arrays
```

```
3
6
```


## Convert the TensorFlow model into an H2O Deep Learning model

```python
# Average the weights and biases across all node-local models
avg_coeffs = [c for c in coeffs_per_node[0]]
for i in range(0,len(avg_coeffs)):
    for node in range(1,NODES):
        avg_coeffs[i] = avg_coeffs[i] + coeffs_per_node[node][i]
avg_coeffs = [c/NODES for c in avg_coeffs]

num_weights=int(len(coeffs_per_node[0])/2)

## Convert the model coefficients (weights/biases) to H2O Frames
## (transposed: TensorFlow stores weights as [inputs, outputs], H2O expects [outputs, inputs])
H2O_w = [h2o.H2OFrame(np.transpose(c)) for c in avg_coeffs[0:num_weights]]
H2O_b = [h2o.H2OFrame(np.transpose(np.matrix(c))) for c in avg_coeffs[num_weights:2*num_weights]]

print([c.dim for c in H2O_w])
print([c.dim for c in H2O_b])
```


```
Parse Progress: [##################################################] 100%
Parse Progress: [##################################################] 100%
Parse Progress: [##################################################] 100%
Parse Progress: [##################################################] 100%
Parse Progress: [##################################################] 100%
Parse Progress: [##################################################] 100%
[[50, 784], [50, 50], [10, 50]]
[[50, 1], [50, 1], [10, 1]]
```
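The averaging and reshaping steps above can be sketched without NumPy or H2O. In this hypothetical version, `average_coefficients` takes the element-wise mean of one matrix across nodes, `transpose` converts TensorFlow's `[inputs, outputs]` weight layout into the `[outputs, inputs]` layout shown by the printed frame dimensions (for example, `[784, 50]` becomes `[50, 784]`), and `as_column` turns a bias vector into a single-column matrix. Note that each node starts from its own random initialization before training.

```python
def average_coefficients(per_node):
    # per_node: one equally shaped 2-D list per node; returns their element-wise mean
    n = len(per_node)
    rows, cols = len(per_node[0]), len(per_node[0][0])
    return [[sum(m[r][c] for m in per_node) / n for c in range(cols)] for r in range(rows)]


def transpose(matrix):
    # [inputs, outputs] (TensorFlow) -> [outputs, inputs] (H2O initial_weights)
    return [list(col) for col in zip(*matrix)]


def as_column(vector):
    # A bias vector of length n becomes an n x 1 column, like the [50, 1] frames above
    return [[v] for v in vector]


# Two nodes, each with a 2-input x 3-output weight matrix and a 3-element bias
w_node_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w_node_b = [[3.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
b_node_a = [1.0, 2.0, 3.0]
b_node_b = [3.0, 4.0, 5.0]

w_avg = average_coefficients([w_node_a, w_node_b])
b_avg = average_coefficients([[b_node_a], [b_node_b]])[0]  # each bias as a 1-row matrix
print(transpose(w_avg))  # [[2.0, 2.0], [2.0, 3.0], [2.0, 4.0]]
print(as_column(b_avg))  # [[2.0], [3.0], [4.0]]
```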


```python
# Initialize an H2O Model with those weights/biases
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

## Create an H2O Deep Learning model from the TensorFlow model
dlmodel = H2ODeepLearningEstimator(
    model_id="model_from_TF", ## we want to be able to find the model in Flow later
    hidden=[HN,HN],           ## same Network layout as TF - two hidden layers
    epochs=0,                 ## no training done in H2O - just copy over the model from TF
    ignore_const_cols=False,  ## keep all input features (unless we also drop const cols in TF)
    sparse=True,              ## faster as 0 input remains 0 -> sparse activation -> sparse updates
    variable_importances=True
    ### Initialize the H2O model with the TensorFlow model state
    ### Requires H2O 3.8.2.1 or later
    ,initial_weights=[H2O_w[0],H2O_w[1],H2O_w[2]]
    ,initial_biases =[H2O_b[0],H2O_b[1],H2O_b[2]]
)
## Treat the label column as categorical so that H2O builds a classifier
train_frame[784] = train_frame[784].asfactor()
dlmodel.train(x=list(range(784)),y=784,training_frame=train_frame)
```


```
deeplearning Model Build Progress: [                                                  ] 00%
```
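The `sparse=True` comment above rests on a simple observation: a zero input contributes nothing to any neuron's weighted sum, and most MNIST pixels are background zeros. The pure-Python sketch below (hypothetical helper names; it illustrates the idea only and is not how H2O implements it) collects the non-zero inputs once, reuses them for every neuron, and checks that the result matches the dense computation.

```python
import random


def nonzero_entries(x):
    # (index, value) pairs for the non-zero inputs only
    return [(i, v) for i, v in enumerate(x) if v != 0.0]


def dense_layer(w_rows, biases, x):
    # One weight row per neuron ([outputs, inputs] layout)
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(w_rows, biases)]


def sparse_layer(w_rows, biases, entries):
    # Same result, but only the non-zero inputs are visited
    return [sum(row[i] * v for i, v in entries) + bias for row, bias in zip(w_rows, biases)]


rng = random.Random(1)
x = [0.0] * 784
for i in (100, 101, 128, 129, 400):  # a handful of "ink" pixels
    x[i] = rng.random()
w_rows = [[rng.gauss(0.0, 0.1) for _ in range(784)] for _ in range(50)]
biases = [rng.gauss(0.0, 0.1) for _ in range(50)]

entries = nonzero_entries(x)
dense = dense_layer(w_rows, biases, x)
sparse = sparse_layer(w_rows, biases, entries)
print(len(entries), max(abs(d - s) for d, s in zip(dense, sparse)) < 1e-12)  # 5 True
```

Here the sparse path touches 5 inputs per neuron instead of 784.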


## Score the H2O Deep Learning model in H2O (with the TensorFlow state)

```python
## We can let H2O evaluate the performance of the TensorFlow model on the test set
dlmodel.model_performance(test_frame)
```


```
ModelMetricsMultinomial: deeplearning
** Reported on test data. **

MSE: 0.778273808871
R^2: 0.907184785182
LogLoss: 2.18184795723
```

Confusion Matrix (vertical: actual; across: predicted):

| Actual | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Error | Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 0 | 0 | 0 | 0 | 1 | 132 | 0 | 784 | 0 | 0.9357143 | 917 / 980 |
| 1 | 0 | 0 | 1 | 0 | 0 | 3 | 104 | 0 | 1027 | 0 | 1.0 | 1,135 / 1,135 |
| 2 | 0 | 0 | 102 | 0 | 0 | 0 | 831 | 1 | 95 | 3 | 0.9011628 | 930 / 1,032 |
| 3 | 0 | 0 | 1 | 0 | 0 | 0 | 122 | 1 | 884 | 2 | 1.0 | 1,010 / 1,010 |
| 4 | 0 | 0 | 0 | 0 | 179 | 1 | 682 | 0 | 107 | 13 | 0.8177189 | 803 / 982 |
| 5 | 0 | 0 | 0 | 0 | 1 | 25 | 436 | 0 | 430 | 0 | 0.9719731 | 867 / 892 |
| 6 | 0 | 0 | 3 | 0 | 1 | 1 | 930 | 0 | 21 | 2 | 0.0292276 | 28 / 958 |
| 7 | 0 | 0 | 2 | 1 | 2 | 0 | 298 | 1 | 720 | 4 | 0.9990272 | 1,027 / 1,028 |
| 8 | 0 | 0 | 0 | 0 | 1 | 3 | 827 | 0 | 142 | 1 | 0.8542094 | 832 / 974 |

Top-10 Hit Ratios:

| k | hit_ratio |
|---|---|
| 1 | 0.1446 |
| 2 | 0.3175 |
| 3 | 0.5003 |
| 4 | 0.6101 |
| 5 | 0.6873 |
| 6 | 0.733 |
| 7 | 0.8392 |
| 8 | 0.9147 |
| 9 | 0.9926 |
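A top-k hit ratio is the fraction of test rows whose true label is among the k classes with the highest predicted probability. Here is a minimal pure-Python sketch of that definition (`top_k_hit_ratio` is a hypothetical helper, not the H2O implementation, and ties are broken by class index). Since the top-1 hit ratio counts exactly the correctly classified rows, it equals 1 minus the overall classification error; for this model, 1 - 0.1446 = 0.8554.

```python
def top_k_hit_ratio(probabilities, labels, k):
    # Fraction of rows whose true label is among the k most probable classes
    hits = 0
    for probs, label in zip(probabilities, labels):
        # Class indices sorted by descending probability (stable: ties keep index order)
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


probabilities = [
    [0.1, 0.7, 0.2],  # true label 1 ranked 1st
    [0.5, 0.3, 0.2],  # true label 1 ranked 2nd
    [0.2, 0.3, 0.5],  # true label 0 ranked 3rd
]
labels = [1, 1, 0]
ratios = [top_k_hit_ratio(probabilities, labels, k) for k in (1, 2, 3)]
print([round(r, 4) for r in ratios])  # [0.3333, 0.6667, 1.0]

# Top-1 hit ratio and overall error add up to 1 (values reported for this model)
print(round(1 - 0.1446, 4))  # 0.8554
```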




```python
## Overall classification error of the TF model (in H2O form) on the test set: not very good yet, it needs more training
dlmodel.model_performance(test_frame).confusion_matrix()['Error'][-1]
```

```
0.8554
```
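Each per-class entry in the confusion matrix above is the share of that digit's test rows that fall outside the diagonal cell. A small sketch (the `class_error` helper is hypothetical) reproduces the `Error` and `Rate` columns for digits 0 and 6 from their rows. Most predictions land in the 6 and 8 columns, which is why digit 6 scores well while most other digits are almost always misclassified.

```python
def class_error(row, actual):
    # row: counts of predicted classes 0..9 for one actual class
    total = sum(row)
    wrong = total - row[actual]
    return wrong / total, "%d / %d" % (wrong, total)


# Rows for actual digits 0 and 6, copied from the confusion matrix above
row_0 = [63, 0, 0, 0, 0, 1, 132, 0, 784, 0]
row_6 = [0, 0, 3, 0, 1, 1, 930, 0, 21, 2]

err_0, rate_0 = class_error(row_0, 0)
err_6, rate_6 = class_error(row_6, 6)
print(round(err_0, 7), rate_0)  # 0.9357143 917 / 980
print(round(err_6, 7), rate_6)  # 0.0292276 28 / 958
```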

## Extract the Java scoring code (POJO) for the TensorFlow model

```python
#dlmodel.download_pojo()  ## too large for GitHub
```

```
Filepath: /model_from_TF.java
/*
  Licensed under the Apache License, Version 2.0
    http://www.apache.org/licenses/LICENSE-2.0.html

  AUTOGENERATED BY H2O at 2016-06-06T18:29:29.587-07:00
  3.8.2.3
  
  Standalone prediction code with sample test data for DeepLearningModel named model_from_TF

  How to download, compile and execute:
      mkdir tmpdir
      cd tmpdir
      curl http://localhost:54321/3/h2o-genmodel.jar > h2o-genmodel.jar
      curl http://localhost:54321/3/Models.java/model_from_TF > model_from_TF.java
      javac -cp h2o-genmodel.jar -J-Xmx2g -J-XX:MaxPermSize=128m model_from_TF.java

     (Note:  Try java argument -XX:+PrintCompilation to show runtime JIT compiler behavior.)
*/
import java.util.Map;
import hex.genmodel.GenModel;
import hex.genmodel.annotations.ModelPojo;
...
  // Pass in data in a double[], pre-aligned to the Model's requirements.
  // Jam predictions into the preds[] array; preds[0] is reserved for the
  // main prediction (class for classifiers or value for regression),
  // and remaining columns hold a probability distribution for classifiers.
  public final double[] score0( double[] data, double[] preds ) {
    java.util.Arrays.fill(preds,0);
    java.util.Arrays.fill(NUMS,0);
    int i = 0, ncats = 0;
    final int n = data.length;
    for(; i<n; ++i) {
      NUMS[i] = Double.isNaN(data[i]) ? 0 : (data[i] - NORMSUB.VALUES[i])*NORMMUL.VALUES[i];
    }
    java.util.Arrays.fill(ACTIVATION[0],0);
    for (i=0; i<NUMS.length; ++i) {
      ACTIVATION[0][CATOFFSETS[CATOFFSETS.length-1] + i] = Double.isNaN(NUMS[i]) ? 0 : NUMS[i];
    }
    for (i=1; i<ACTIVATION.length; ++i) {
      java.util.Arrays.fill(ACTIVATION[i],0);
      int cols = ACTIVATION[i-1].length;
      int rows = ACTIVATION[i].length;
      int extra=cols-cols%8;
      int multiple = (cols/8)*8-1;
      int idx = 0;
      float[] a = WEIGHT[i];
      double[] x = ACTIVATION[i-1];
      double[] y = BIAS[i];
      double[] res = ACTIVATION[i];
      for (int row=0; row<rows; ++row) {
        double psum0 = 0, psum1 = 0, psum2 = 0, psum3 = 0, psum4 = 0, psum5 = 0, psum6 = 0, psum7 = 0;
        for (int col = 0; col < multiple; col += 8) {
          int off = idx + col;
          psum0 += a[off    ] * x[col    ];
          psum1 += a[off + 1] * x[col + 1];
          psum2 += a[off + 2] * x[col + 2];
          psum3 += a[off + 3] * x[col + 3];
          psum4 += a[off + 4] * x[col + 4];
          psum5 += a[off + 5] * x[col + 5];
          psum6 += a[off + 6] * x[col + 6];
          psum7 += a[off + 7] * x[col + 7];
        }
        res[row] += psum0 + psum1 + psum2 + psum3;
        res[row] += psum4 + psum5 + psum6 + psum7;
        for (int col = extra; col < cols; col++)
          res[row] += a[idx + col] * x[col];
        res[row] += y[row];
        idx += cols;
      }
      if (i<ACTIVATION.length-1) {
        for (int r=0; r<ACTIVATION[i].length; ++r) {
          ACTIVATION[i][r] = Math.max(0, ACTIVATION[i][r]);
        }
      }
      if (i == ACTIVATION.length-1) {
        double max = ACTIVATION[i][0];
        for (int r=1; r<ACTIVATION[i].length; r++) {
          if (ACTIVATION[i][r]>max) max = ACTIVATION[i][r];
        }
        double scale = 0;
        for (int r=0; r<ACTIVATION[i].length; r++) {
          ACTIVATION[i][r] = Math.exp(ACTIVATION[i][r] - max);
          scale += ACTIVATION[i][r];
        }
        for (int r=0; r<ACTIVATION[i].length; r++) {
          if (Double.isNaN(ACTIVATION[i][r]))
            throw new RuntimeException("Numerical instability, predicted NaN.");
          ACTIVATION[i][r] /= scale;
          preds[r+1] = ACTIVATION[i][r];
        }
      }
    }
    preds[0] = hex.genmodel.GenModel.getPrediction(preds, PRIOR_CLASS_DISTRIB, data, 0.5);
    return preds;
  }
}
...

```
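The generated `score0` method above follows a fixed recipe: it standardizes each numeric input as `(value - NORMSUB) * NORMMUL`, mapping missing values to 0, and then computes each neuron's weighted sum from a flat, row-major weight array using eight independent partial sums plus a tail loop, presumably to speed up the inner loop. The sketch below mirrors those two steps in Python under assumptions: the normalization constants and weights are made-up inputs, since their actual values live in the elided part of the POJO, and the helper names are hypothetical.

```python
import math


def standardize(data, sub, mul):
    # Like NUMS[i] = isNaN(data[i]) ? 0 : (data[i] - NORMSUB[i]) * NORMMUL[i]
    return [0.0 if math.isnan(v) else (v - s) * m for v, s, m in zip(data, sub, mul)]


def unrolled_dot(a, off, x):
    # Dot product of x with the weight row starting at a[off], using eight
    # partial sums over blocks of 8 columns, then a tail loop for the rest
    cols = len(x)
    full = cols - cols % 8
    psum = [0.0] * 8
    for col in range(0, full, 8):
        for k in range(8):
            psum[k] += a[off + col + k] * x[col + k]
    total = (psum[0] + psum[1] + psum[2] + psum[3]) + (psum[4] + psum[5] + psum[6] + psum[7])
    for col in range(full, cols):
        total += a[off + col] * x[col]
    return total


def dense_layer(a, bias, x):
    # a holds len(bias) rows of len(x) weights, stored one row after another
    cols = len(x)
    return [unrolled_dot(a, row * cols, x) + bias[row] for row in range(len(bias))]


x = standardize([1.0, float("nan"), 3.0] + [2.0] * 47, [1.0] * 50, [0.5] * 50)
a = [float(i % 7) for i in range(2 * 50)]  # 2 neurons x 50 inputs, flattened
out = dense_layer(a, [0.0, 1.0], x)
print(out)
```

With 50 inputs, the unrolled loop covers columns 0 to 47 and the tail loop handles the last two, the same split the POJO uses for a 50-neuron hidden layer.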

## Continue Training the Deep Learning model in H2O

```python
## Train in H2O for 1 more epoch (one full pass over the training data)
dlmodel.epochs=1
dlmodel.train(x=list(range(784)),y=784,training_frame=train_frame)
```

```
deeplearning Model Build Progress: [##################################################] 100%
```


```python
## Check the classification error of the H2O model after a bit of training in H2O: much better!
p=dlmodel.model_performance(test_frame)
p.confusion_matrix()['Error'][-1]
```

```
0.0726
```

## Inspect the model in Flow
Since the model is now in H2O, we can inspect it in [Flow](http://localhost:54321). Run `print(hc)` to see the URL for connecting to Flow.

![alt text](./getCloud.png "Cloud status")

For example, we can graphically inspect the variable importances or the confusion matrix. We can also score the model on the test set in Flow, continue training the model from this checkpoint, or inspect the Java scoring code (POJO). We highly recommend getting familiar with Flow if you aren't already.

![alt text](./TF_model_in_Flow.png "TF model in Flow")