In [1]:
%load_ext autoreload
%autoreload 2

In [9]:
from __future__ import (print_function, division, absolute_import)

In [10]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from tools import array_in

# Tensorflow Introduction
## A toy problem
Let's assume you are responsible for a machine that produces a particular powder ingredient for baking. You are not too surprised to hear that the quality of the output depends on the humidity within the last inches of the output pipe. Indeed, at certain humidity levels the product becomes more likely to clump together or stick to the pipe. It's your job to find out what to do.

Now, that machine has two technical and somewhat mysterious parameters $\beta_1$ and $\beta_2$ that can be measured and tuned. You suspect that the humidity is somehow influenced by these parameters and you want to prove it now. 

A reasonable first hypothesis could be written in mathematical terms as:

$$
h= A_1 \cdot \beta_1 + A_2 \cdot \beta_2 + C
$$

where $h$ is the measured humidity, and $A_1, A_2$ and $C$ are three model parameters that we need to compute now. A value of $A_1$ or $A_2$ that is clearly different from zero will support our suspicion.

Here's how we take some measurements. Note that we also recorded the weekday and the hour of the day when the measurement took place. At this point in time we consider them irrelevant.

### Data 

In [11]:
from measurements import measure

In [26]:
data = measure(5)
data.head()

Unnamed: 0,beta1,beta2,hour,humidity,weekday
0,0.052852,-0.481176,8,19.809202,2
1,-0.496018,1.102635,0,15.910741,0
2,4.09273,-4.053264,23,27.410046,4
3,3.379585,-1.427107,4,24.614017,2
4,4.457862,4.324174,23,26.260317,1


In [28]:
data = measure(10000)
data.describe()

Unnamed: 0,beta1,beta2,hour,humidity,weekday
count,10000.0,10000.0,10000.0,10000.0,10000.0
mean,0.017523,-0.013838,11.616,19.416244,2.9885
std,2.891635,2.888683,6.920414,6.599335,1.994785
min,-4.999856,-4.999501,0.0,3.543744,0.0
25%,-2.515663,-2.498346,6.0,14.156107,1.0
50%,0.005618,-0.003567,12.0,19.463711,3.0
75%,2.525692,2.511405,18.0,24.459729,5.0
max,4.999936,4.999768,23.0,40.495883,6.0


### Working with tensors

Next, we'll simply encode the hypothesis in Tensorflow as an *affine* function of those technical parameters $\beta_1$ and $\beta_2$ using matrix multiplication:

$$
h(\beta_1, \beta_2) = (A_1, A_2) \cdot
\left( 
\begin {array} {c}
\beta_1 \\
\beta_2
\end{array}
\right) + C
$$

Now we try to find $A_1, A_2$ and $C$ such that the above function $h$ best reproduces the associated humidity measurements.
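As a quick sanity check of the matrix form, here is a minimal NumPy sketch (NumPy stands in for Tensorflow here). The values for $A$ and $C$ are arbitrary illustrations, not fitted results, and the two columns of `beta` are the first two rows of the sample table above. It shows how a coefficient matrix of shape (1, 2) and an input of shape (2, N) yield one humidity value per measurement.

```python
import numpy as np

# Arbitrary illustrative coefficients, not fitted values
A = np.array([[1.0, 2.0]])      # shape (1, 2): one row holding (A_1, A_2)
C = 3.0                         # scalar bias

# One column per measurement: row 0 holds beta_1, row 1 holds beta_2
beta = np.array([[0.052852, -0.496018],
                 [-0.481176, 1.102635]])   # shape (2, N) with N = 2

h = np.matmul(A, beta) + C      # shape (1, N): one humidity value per column
print(h.shape)
print(h)
```

Keeping the measurements in columns is what lets a single matrix product evaluate the hypothesis for all of them at once.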

In [29]:
# A tensor (matrix) for the coefficients A_i
A = tf.Variable([[1., 2.]], name="coefficients_A", dtype=tf.float32) # Starting with arbitrary values

# A scalar tensor for the *bias* C
C = tf.Variable(3., name="bias_C", dtype=tf.float32)

Next come the placeholders for the input data and the *labels*. In machine learning, *labels* represent the *true* values, that is, what we want the hypothesis to mimic (reproduce). Here these are the humidity measurements, or the *true humidity*, if you wish.

In [30]:
beta = tf.placeholder(shape=(2,None), name="beta", dtype=tf.float32) 
lbls = tf.placeholder(shape=(1,None), name="true_humidity", dtype=tf.float32)  

### The hypothesis function as a computational graph

The linear humidity hypothesis ```h``` represents a computational graph that we can actually navigate to see its branches and leaves, as demonstrated below.

In [34]:
h = tf.matmul(A, beta) + C
h.op.inputs[0].op, h.op.inputs[0].op.inputs[0], h.op.inputs[0].op.inputs[1]

(<tf.Operation 'MatMul_2' type=MatMul>,
 <tf.Tensor 'coefficients_A_1/read:0' shape=(1, 2) dtype=float32>,
 <tf.Tensor 'beta_1:0' shape=(2, ?) dtype=float32>)

In [39]:
true_humidity = list(data['humidity'])
true_humidity[:5]

[11.55043632984082,
 11.877147020088872,
 14.760552579017997,
 13.675038889426459,
 12.71854178824518]

We use a Tensorflow session object to evaluate the computational graph at the particular values given by ```beta_input```:

In [40]:
beta_input = [list(data['beta1']), list(data['beta2'])]

In [41]:
beta_input

[[-2.194795854547047,
  -1.7636154445689192,
  -2.7728287850555358,
  -2.711533135736872,
  -2.0309069523093393,
  -4.242566112166909,
  0.7121427667319216,
  -0.4926047494193053,
  -3.3497251140447992,
  2.4181839423814813,
  0.9284137708500548,
  -2.92654616120666,
  -2.62416676725282,
  3.404798883953763,
  -3.5887129618366154,
  -2.061477038980213,
  -3.646634187472877,
  0.6640945789503716,
  -4.354993483047846,
  -2.2557942647185882,
  -4.123006807207663,
  2.7389945323453224,
  -3.790083132138722,
  -2.8991788144910604,
  1.713812916194164,
  -2.813814577706416,
  -3.5866978439219723,
  -3.667019276962132,
  2.332917219957471,
  0.19904848301392253,
  -3.797177988933208,
  -1.0681528501099513,
  -1.2822112267263952,
  3.279903812012595,
  4.127980593600167,
  -2.7083599962857385,
  3.8701968334179497,
  1.2860777951360465,
  0.27266302578203927,
  -2.312638401789214,
  0.12718424236766968,
  2.1442863688457994,
  2.690556896240979,
  -3.6790927732950927,
  0.44436879777883487,
 

In [37]:
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    model_humidity = session.run(h, feed_dict={beta: beta_input })

ValueError: setting an array element with a sequence.

The results differ vastly from the true humidity - unsurprisingly so - we haven't tuned the parameters $A$ and $C$ yet.  

In [21]:
zip(true_humidity, list(model_humidity.squeeze()))

[(15.73248461739454, 7.135569),
 (21.889868246241644, -3.230329),
 (10.711489756167234, 8.704295),
 (13.208130994156985, 9.905912),
 (14.069801464344964, -7.2277393)]

### Distance between model and reality

We want the model humidity to be as close as possible to the *true* humidity. To achieve that, we need to measure how close the model has come to the true humidity. The mean squared difference is a good candidate for that purpose, and we'll use the placeholder ```lbls``` ('labels' is commonly used in machine learning) that we introduced previously to represent the true humidity.
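To make the distance measure concrete, the following NumPy sketch computes the mean squared difference by hand for the five (true, model) pairs listed in the untuned comparison above. The helper name `mean_squared_difference` is made up for this illustration; it is meant to mirror the idea behind the Tensorflow loss, not its implementation.

```python
import numpy as np

# (true, model) pairs copied from the untuned comparison above
true_h = np.array([[15.73248461739454, 21.889868246241644, 10.711489756167234,
                    13.208130994156985, 14.069801464344964]])
model_h = np.array([[7.135569, -3.230329, 8.704295, 9.905912, -7.2277393]])

def mean_squared_difference(predictions, labels):
    """Average of the squared element-wise differences."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((predictions - labels) ** 2))

print(mean_squared_difference(model_h, true_h))
```

Squaring makes every deviation count as positive and penalises large misses more than small ones.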

In [22]:
sq_distance = tf.losses.mean_squared_error(labels=lbls, predictions=h)
true_humidity = [true_humidity] # Wrap in a list to match the (1, None) shape of the lbls placeholder

In [23]:
with tf.Session() as session:
    session.run(init)
    error = session.run(sq_distance, feed_dict={beta: beta_input, lbls: true_humidity })
print(error)

InvalidArgumentError: Incompatible shapes: [1,5] vs. [1,10000]
	 [[node mean_squared_error/SquaredDifference (defined at <ipython-input-22-5bfb9e4baecc>:1)  = SquaredDifference[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_true_humidity_0_1, add)]]

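The shapes in the message suggest that the two feeds held different numbers of samples (5 versus 10000), which can happen when notebook cells are run out of order and `beta_input` and `true_humidity` come from different calls to `measure`. A small guard, sketched here with NumPy, would catch that before the session runs. Note that `check_feeds` is a hypothetical helper name, not part of Tensorflow.

```python
import numpy as np

def check_feeds(beta_input, true_humidity):
    """Return the sample count N if the feeds have shapes (2, N) and (1, N)."""
    beta_arr = np.asarray(beta_input, dtype=np.float32)
    lbls_arr = np.asarray(true_humidity, dtype=np.float32)
    if beta_arr.ndim != 2 or beta_arr.shape[0] != 2:
        raise ValueError("beta must have shape (2, N), got %s" % (beta_arr.shape,))
    if lbls_arr.shape != (1, beta_arr.shape[1]):
        raise ValueError("labels must have shape (1, %d), got %s"
                         % (beta_arr.shape[1], lbls_arr.shape))
    return beta_arr.shape[1]

# Matching feeds pass and report their sample count
print(check_feeds([[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]], [[19.0, 20.0, 21.0]]))

# Mismatched sample counts are reported before any graph is run
try:
    check_feeds([[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]], [[19.0, 20.0]])
except ValueError as err:
    print(err)
```

Re-creating `beta_input` and `true_humidity` from the same `data` frame in one cell avoids the mismatch in the first place.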


### Tuning the parameters

We'll use some advanced concepts that we can't cover to the full extent here. Please refer to any introductory source of your liking to get familiar with the concept of gradient descent.

The *gradient* is a measure of how much the value of a function changes when its input changes infinitesimally. If, for example, the gradient of the distance with respect to $A_1$ is positive, then making $A_1$ a bit smaller would also make the distance smaller. And that's exactly what we're trying to achieve. So what we'll do is iteratively subtract a fraction (defined by some learning rate) of the gradient from the values of $A$ and $C$. And we do this in vector form.
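The update rule described above can be sketched without Tensorflow. The NumPy example below is only an illustration under stated assumptions: the data is synthetic and noise-free, the "true" parameters are made up, and the gradients of the mean squared difference are written out by hand instead of being derived automatically. The starting values, the learning rate and the 200 iterations mirror the ones used in this notebook.

```python
import numpy as np

rng = np.random.RandomState(0)                  # fixed seed for repeatability

# Synthetic stand-in for the measurements (made-up ground truth, no noise)
true_A = np.array([[2.0, -0.5]])                # shape (1, 2)
true_C = 15.0
beta = rng.uniform(-5.0, 5.0, size=(2, 1000))   # shape (2, N)
labels = np.matmul(true_A, beta) + true_C       # shape (1, N)

A = np.array([[1.0, 2.0]])                      # arbitrary starting values
C = 3.0
learning_rate = 1e-2
errors = []

for step in range(200):
    residual = np.matmul(A, beta) + C - labels  # model minus truth, shape (1, N)
    errors.append(float(np.mean(residual ** 2)))

    # Hand-written gradients of mean(residual ** 2)
    grad_A = 2.0 * np.matmul(residual, beta.T) / beta.shape[1]   # shape (1, 2)
    grad_C = 2.0 * float(np.mean(residual))

    # Step against the gradient, scaled by the learning rate
    A = A - learning_rate * grad_A
    C = C - learning_rate * grad_C

print("first error: %.3f, last error: %.3f" % (errors[0], errors[-1]))
print(A, C)
```

In this sketch the bias tends to approach its target more slowly than the coefficients, because its gradient does not scale with the spread of the inputs.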

In [146]:
learning_rate = 1e-2

In [147]:
grad_A = tf.gradients(sq_distance, A)
grad_C = tf.gradients(sq_distance, C)

In [148]:
Tune_A = tf.assign_add( A, tf.multiply(grad_A[0], -learning_rate))
Tune_C = tf.assign_add( C, tf.multiply(grad_C[0], -learning_rate))

In [149]:
test_data = measure(10)
test_true = [list(test_data['humidity'])]
test_beta = [list(test_data['beta1']), list(test_data['beta2'])]

In [150]:
data = measure(10000)
true_humidity = [list(data['humidity'])]
beta_input = [list(data['beta1']), list(data['beta2'])]

Note that the implementation below is deliberately inefficient to keep things simple and readable.

In [151]:
errors = []
with tf.Session() as session:
    session.run(init)
    for count in range(200):
        
        # compute the current error/distance
        error = session.run(sq_distance, feed_dict={beta: beta_input, lbls: true_humidity })
        errors.append(error)
        
        # tune and tweak the parameters a little
        session.run([Tune_A, Tune_C], feed_dict={beta: beta_input, lbls: true_humidity })

    # Storing test results and parameters
    test_results, a, c = session.run([h, A, C], feed_dict={beta: test_beta})
        
print(errors)

[209.6048, 185.51477, 167.22884, 153.04346, 141.77051, 132.5804, 124.893486, 118.30402, 112.52771, 107.3645, 102.67336, 98.35419, 94.3358, 90.5667, 87.00981, 83.63739, 80.429016, 77.368835, 74.444595, 71.6464, 68.96613, 66.396935, 63.93293, 61.5689, 59.3001, 57.12235, 55.031593, 53.02417, 51.09662, 49.24564, 47.468136, 45.76114, 44.12181, 42.547436, 41.035446, 39.583332, 38.188732, 36.849377, 35.56301, 34.327618, 33.141136, 32.001617, 30.907213, 29.856138, 28.846668, 27.877163, 26.946043, 26.051794, 25.192936, 24.368084, 23.575878, 22.815035, 22.084322, 21.382534, 20.708525, 20.061197, 19.4395, 18.842415, 18.268972, 17.71822, 17.189272, 16.681274, 16.19338, 15.724803, 15.274772, 14.84256, 14.42746, 14.028793, 13.645909, 13.278184, 12.925015, 12.585824, 12.260075, 11.947203, 11.646725, 11.358141, 11.080983, 10.8147955, 10.559144, 10.313621, 10.077812, 9.851341, 9.633836, 9.424942, 9.224313, 9.031631, 8.846571, 8.668842, 8.498149, 8.334217, 8.176765, 8.02555, 7.880325, 7.7408485, 7.60689

The steadily decreasing list of differences above is a very welcome sign. It means: Our model's output values are indeed getting closer and closer to the *true* humidity. I dare say our model is getting better. And we should definitely see that when we explicitly compare true humidity with the one that our model *predicts*.

In [152]:
zip(list(test_results[0]), test_true[0])

[(6.66325, 5.523764775538433),
 (7.041059, 8.327010385711805),
 (13.917435, 11.770791417017335),
 (5.384243, 6.342719303490137),
 (12.05286, 13.060140138861419),
 (16.964622, 16.803717234725916),
 (8.140636, 9.52450880641151),
 (11.898044, 14.687673245174535),
 (11.872751, 18.924829215105383),
 (23.782742, 21.554171031016892)]
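To condense the comparison above into a couple of numbers, the following NumPy sketch computes the mean absolute deviation and the largest single deviation for the ten (predicted, true) pairs listed above. The choice of these two summary figures is an assumption made for illustration; other metrics would serve equally well.

```python
import numpy as np

# (predicted, true) pairs copied from the comparison above
predicted = np.array([6.66325, 7.041059, 13.917435, 5.384243, 12.05286,
                      16.964622, 8.140636, 11.898044, 11.872751, 23.782742])
true = np.array([5.523764775538433, 8.327010385711805, 11.770791417017335,
                 6.342719303490137, 13.060140138861419, 16.803717234725916,
                 9.52450880641151, 14.687673245174535, 18.924829215105383,
                 21.554171031016892])

abs_error = np.abs(predicted - true)
worst = int(abs_error.argmax())      # index of the largest deviation

print("mean absolute error: %.3f" % abs_error.mean())
print("largest deviation:   %.3f at index %d" % (abs_error[worst], worst))
```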


The parameters $A_1, A_2$ and $C$ have eventually converged to:

In [131]:
list(a[0]), c

([2.0003443, -0.50254774], 14.83859)
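With the fitted values in hand, the hypothesis can be evaluated for any setting of the two machine parameters. The NumPy sketch below copies the numbers from the output above; `predict_humidity` is a hypothetical helper name introduced only for this illustration.

```python
import numpy as np

# Fitted values copied from the output above
a = np.array([[2.0003443, -0.50254774]])   # (A_1, A_2)
c = 14.83859                                # bias C

def predict_humidity(beta1, beta2):
    """Evaluate h = A_1 * beta_1 + A_2 * beta_2 + C for the fitted parameters."""
    beta = np.array([[beta1], [beta2]], dtype=float)   # shape (2, 1)
    return float(np.matmul(a, beta)[0, 0] + c)

# Effect of raising beta_1 by one unit while holding beta_2 fixed
baseline = predict_humidity(0.0, 0.0)
shifted = predict_humidity(1.0, 0.0)
print(shifted - baseline)   # equals A_1 up to rounding, by construction
```

Reading the coefficients this way shows how strongly, and in which direction, each parameter moves the modelled humidity.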

That concludes the first part of this exercise. You learned to construct and execute computational graphs with Tensorflow and to apply one of the most fundamental techniques, namely gradient descent, to solve an analytical problem.

---
So, are we happy? To a large degree we could indeed be. We have found convincing evidence that those
mysterious parameters indeed have some influence on the humidity, so tuning them appropriately may help overcome the aforementioned problems.
