Feature engineering is the task of deciding how to represent the data that goes into and out of your machine learning system. It is a vital part of network design, because how features (i.e. data) are represented has a huge impact on how easily your system can learn the task you are interested in.

To illustrate how important Feature Engineering is, one good approach is to "make up" some synthetic examples with synthetic inputs & outputs, and see if we can learn the task at hand. 

## Counting Bits in Binary 

For this exercise, we are going to use "counting the binary bits" (also referred to as "*popcount*" or "*Hamming weight*") as the function we are trying to learn. This function cannot be represented by a linear function, so we'll be using a 2-layer network. We will constrain our input values to the range $[0,1024)$, so each input fits in 10 bits and our outputs will be in the range $[0,10]$. We can think of our training set as looking somewhat like this:

$$
\begin{array}{|c|c|}
\hline input & popcount(input) \\\hline
  374 & 6 \\\hline
  924 & 6 \\\hline
  708 & 4 \\\hline
  6 & 2 \\\hline
  ... & ... \\\hline
\end{array}
$$
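
As a quick check on rows like these, here is a minimal pure-Python sketch of the target function. The helper name `popcount` and the sample values chosen are illustrative; it leans on Python's built-in `bin()` rather than the bit-shifting loop used later in this section.

```python
def popcount(x):
    """Count the 1 bits in a non-negative integer."""
    return bin(x).count("1")

# Rebuild a few (input, popcount) pairs in the same shape as the table.
samples = [924, 708, 6]
rows = [(v, popcount(v)) for v in samples]
for v, c in rows:
    # Show the 10-bit binary form next to the count.
    print("{:>4} = {:010b} -> {}".format(v, v, c))

# Every input in [0, 1024) fits in 10 bits, so labels stay within [0, 10].
largest_label = max(popcount(v) for v in range(1024))
smallest_label = min(popcount(v) for v in range(1024))
```

Printing the binary form alongside the count makes it easy to verify a table row by eye.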

Let's try passing these integer values directly into the first layer of the network and see what happens!
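
Before the TensorFlow version, it may help to see the shape of the computation in plain Python. This is only a sketch under assumptions: `HIDDEN`, `sigmoid`, `hidden_layer`, and `forward` are illustrative names, and the weights are drawn with `random.gauss` rather than a truncated normal. It shows a single raw integer feeding 16 sigmoid units, whose outputs are combined by a weighted sum.

```python
import math
import random

HIDDEN = 16  # hidden width, matching the 16 units used in this section

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hidden_layer(x, w1, b1):
    # First layer: the raw integer x is the only input feature.
    return [sigmoid(x * w1[j] + b1[j]) for j in range(HIDDEN)]

def forward(x, w1, b1, w2, b2):
    # Second layer: a plain weighted sum, with no activation on the output.
    h = hidden_layer(x, w1, b1)
    return sum(h[j] * w2[j] for j in range(HIDDEN)) + b2

# Small random starting weights (assumption: plain Gaussian, stddev 0.1).
rng = random.Random(0)
w1 = [rng.gauss(0.0, 0.1) for _ in range(HIDDEN)]
b1 = [rng.gauss(0.0, 0.1) for _ in range(HIDDEN)]
w2 = [rng.gauss(0.0, 0.1) for _ in range(HIDDEN)]
b2 = rng.gauss(0.0, 0.1)

# The network is untrained here, so this is just an arbitrary real number.
estimate = forward(6, w1, b1, w2, b2)
print(estimate)
```

Training then amounts to adjusting `w1`, `b1`, `w2`, and `b2` so that `forward(x, ...)` moves toward the true bit count of `x`.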

In [38]:
import tensorflow as tf 
import random

inputs = tf.placeholder(shape=(None, 1), dtype=tf.float32, name='input')
popcount = tf.placeholder(shape=(None, 1), dtype=tf.float32, name='popcount')

input_weight = tf.Variable(expected_shape=(1, 16), 
                           initial_value=tf.truncated_normal((1, 16), mean=0, stddev=0.1))
input_bias = tf.Variable(expected_shape=(1,16), 
                         initial_value=tf.truncated_normal((1,16), mean=0, stddev=0.1))

# "Perceptron" 
mid_layer = tf.nn.sigmoid(tf.add(tf.matmul(inputs, input_weight), input_bias))

out_weight = tf.Variable(expected_shape=(16, 1), 
                         initial_value=tf.truncated_normal((16, 1), mean=0, stddev=0.1))
out_bias = tf.Variable(expected_shape=(1, 1), 
                       initial_value=tf.truncated_normal((1, 1), mean=0, stddev=0.1))

# Perceptron formula again.
output = tf.add(tf.matmul(mid_layer, out_weight), out_bias)

# Our error function is computed as "Squared Difference" between the computed output
# and the expected value. 
loss = tf.reduce_mean(tf.pow(output - popcount, 2))

# Learning rate and optimizer similar to our previous examples. 
learning_rate = 0.001
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

def f(x): 
    popcount = 0 
    while x: 
        if x&1:
            popcount+=1
        x>>=1 
    return popcount

def makeinput(x):
    return [x,]

with tf.Session() as sess: 
    sess.run([tf.local_variables_initializer(), 
              tf.global_variables_initializer()])
    
    # Training:
    #
    # Iterate many times with random inputs to "learn" the parameters
    # stored in the input_weight, input_bias, out_weight, and out_bias
    # Variables above. Each batch holds batch_size random integers drawn
    # from the range [0, max_value).
    print "TRAIN"
    train_iterations = 100000
    batch_size = 128
    max_value = 16
    for it in xrange(train_iterations):
        feed = {inputs:[], popcount:[]}
        for bt in xrange(batch_size):
            v = random.randrange(0, max_value)
            feed[inputs].append(makeinput(v))
            feed[popcount].append(makeinput(f(v)))
        _, l = sess.run([optimizer, loss], feed_dict=feed)
        if it % 1000 == 0:
            print "i=",it," loss=", l
            
    # Once we have learned the parameters, we can validate by passing in
    # individual inputs. Note that these are drawn from the same range
    # [0, max_value) as the training data, so they are not held out.
    print "VALIDATE"
    validate_iterations = 25
    for it in xrange(validate_iterations):
        v = random.randrange(0, max_value)
        i = [makeinput(v),]
        e = [makeinput(f(v)),]
        out, l = sess.run([output, loss], feed_dict={inputs:i, popcount:e})
        print "input=",i," out=", out, "actual=", e," loss=", l


TRAIN
i= 0  loss= 4.62908
i= 1000  loss= 0.419672
i= 2000  loss= 0.350638
i= 3000  loss= 0.413125
i= 4000  loss= 0.352155
i= 5000  loss= 0.327442
i= 6000  loss= 0.344848
i= 7000  loss= 0.356394
i= 8000  loss= 0.305952
i= 9000  loss= 0.308031
i= 10000  loss= 0.283118
i= 11000  loss= 0.369738
i= 12000  loss= 0.303562
i= 13000  loss= 0.308296
i= 14000  loss= 0.333424
i= 15000  loss= 0.30559
i= 16000  loss= 0.291579
i= 17000  loss= 0.351013
i= 18000  loss= 0.338948
i= 19000  loss= 0.362444
i= 20000  loss= 0.304563
i= 21000  loss= 0.180726
i= 22000  loss= 0.266295
i= 23000  loss= 0.270902
i= 24000  loss= 0.25015
i= 25000  loss= 0.241592
i= 26000  loss= 0.272542
i= 27000  loss= 0.272879
i= 28000  loss= 0.263778
i= 29000  loss= 0.211322
i= 30000  loss= 0.282963
i= 31000  loss= 0.319068
i= 32000  loss= 0.230075
i= 33000  loss= 0.284408
i= 34000  loss= 0.301685
i= 35000  loss= 0.247171
i= 36000  loss= 0.319898
i= 37000  loss= 0.253544
i= 38000  loss= 0.240501
i= 39000  loss= 0.2356
i= 40000  lo