### Background

The training and testing below are largely based on the TensorFlow tutorial
[MNIST for Beginners](https://www.tensorflow.org/versions/r0.10/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners).

In [1]:
import tensorflow as tf
import numpy as np

In [2]:
def get_batch(data, labels, size):
    # Draw `size` random row indices (with replacement) and return the matching rows
    indices = np.random.randint(0, data.shape[0], size)
    return data[indices, :], labels[indices, :]
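
As a quick check of how `get_batch` behaves, the sketch below calls it on small made-up arrays (the toy shapes are assumptions, not the real data). Because `np.random.randint` draws each index independently, a batch is sampled with replacement and may contain the same row more than once.

```python
import numpy as np

def get_batch(data, labels, size):
    # Same helper as above: draw `size` row indices uniformly, with replacement.
    indices = np.random.randint(0, data.shape[0], size)
    return data[indices, :], labels[indices, :]

# Toy stand-ins for the real arrays: 10 samples, 4 features, 3 one-hot classes.
toy_data = np.arange(40, dtype=np.float32).reshape(10, 4)
toy_labels = np.eye(3, dtype=np.float32)[np.arange(10) % 3]

batch_x, batch_y = get_batch(toy_data, toy_labels, 6)
print(batch_x.shape, batch_y.shape)  # (6, 4) (6, 3)
```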

### Train Model

* Read in the training data
* Normalize the training data so that each feature value is in the range [0, 1]
* Set up placeholders and variables to feed into the TensorFlow graph
* Define the linear model
* Define a loss function that will be minimized during training
* Define which algorithm will be used to minimize the loss function
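
The normalization step can be sketched on its own with NumPy. The helper names `fit_min_max` and `apply_min_max` are hypothetical, as is the guard for constant features; the notebook itself normalizes inline without either.

```python
import numpy as np

def fit_min_max(data):
    """Return the per-feature minimum and range of `data` (hypothetical helper)."""
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Assumption: a constant feature has zero range; use 1 to avoid dividing by zero.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Shift and scale each feature with previously computed statistics."""
    return (data - col_min) / col_range

# Made-up data: 3 samples, 3 features; the last feature is constant.
toy = np.array([[1.0, 10.0, 5.0],
                [3.0, 20.0, 5.0],
                [2.0, 40.0, 5.0]])
col_min, col_range = fit_min_max(toy)
toy_norm = apply_min_max(toy, col_min, col_range)
print(toy_norm.min(axis=0), toy_norm.max(axis=0))
```

Keeping the statistics separate from the transform makes it possible to reuse the training-set values on other data, a common alternative to normalizing each data set by its own minimum and maximum.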

In [3]:
training_set = np.load('training_set.npy')
training_set_shift = training_set - training_set.min(axis=0)
training_set_norm = training_set_shift / training_set_shift.max(axis=0)
training_labels = np.load('training_labels.npy')

x = tf.placeholder(tf.float32, [None, training_set.shape[1]])
W = tf.Variable(tf.zeros([training_set.shape[1], training_labels.shape[1]]))
b = tf.Variable(tf.zeros([training_labels.shape[1]]))

# The linear model x*W + b (one row of x per sample), with softmax() to turn it into probabilities across the categories
y = tf.nn.softmax(tf.matmul(x, W) + b)

# The known values
y_ = tf.placeholder(tf.float32, [None, training_labels.shape[1]])

# loss function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
# print out cross entropy value to track convergence
cross_entropy = tf.Print(cross_entropy, [cross_entropy], "CrossE", first_n=50)

# how to train/optimize the model
train_step = tf.train.GradientDescentOptimizer(0.8).minimize(cross_entropy)
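
To make the loss concrete, here is a NumPy sketch of the same softmax and cross-entropy computation on made-up values. Subtracting the row maximum and computing the log-softmax directly are assumptions added here for numerical stability; the graph above takes `tf.log` of the softmax output instead, which can yield `-inf` if a probability underflows to zero.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max so that exp() cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, one_hot_labels):
    # Mean over the batch of -sum(y_ * log(y)), matching the loss defined above.
    return np.mean(-np.sum(one_hot_labels * log_softmax(logits), axis=1))

# Made-up logits and one-hot labels for 2 samples, 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = np.exp(log_softmax(logits))
loss = cross_entropy(logits, labels)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```

The second row mirrors the start of training: with all-zero `W` and `b` every logit is zero, the prediction is uniform, and that sample's loss is the log of the number of classes.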

### Run Training

* Initialize the variables and create a TensorFlow session
* Run the training step many times, each time grabbing a new batch of training data

In [4]:
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

for i in range(10000):
    batch_xs, batch_ys = get_batch(training_set_norm, training_labels, 500)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
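
For readers who want to see what each `train_step` does numerically, the sketch below runs minibatch gradient descent for softmax regression in plain NumPy. The synthetic data set, the seed, the batch size, and the step count are assumptions for illustration; the update uses the standard gradient of the mean cross-entropy with respect to the logits, `(probs - labels) / batch_size`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data: 3 well-separated classes, 2 features in [0, 1].
n_per_class, n_classes = 200, 3
centers = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8]])
data = np.concatenate([c + 0.05 * rng.standard_normal((n_per_class, 2)) for c in centers])
data = np.clip(data, 0.0, 1.0)
labels = np.eye(n_classes)[np.repeat(np.arange(n_classes), n_per_class)]

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((data.shape[1], n_classes))
b = np.zeros(n_classes)
learning_rate = 0.8  # same value as the optimizer in the notebook

for step in range(2000):
    idx = rng.integers(0, data.shape[0], 100)   # sample a batch with replacement
    xs, ys = data[idx], labels[idx]
    probs = softmax(xs @ W + b)
    grad_logits = (probs - ys) / xs.shape[0]    # gradient of the mean cross-entropy
    W -= learning_rate * (xs.T @ grad_logits)
    b -= learning_rate * grad_logits.sum(axis=0)

train_accuracy = np.mean(softmax(data @ W + b).argmax(axis=1) == labels.argmax(axis=1))
print(train_accuracy)
```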

### Test Model

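* Read the test data and normalize it the same way as the training data, here using the test set's own per-feature minimum and maximum
* Define the operation that calculates the accuracy, which in this case is the fraction of predictions that match the known labels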

In [5]:
test_set = np.load('test_set.npy')
test_set_shift = test_set - test_set.min(axis=0)
test_set_norm = test_set_shift / test_set_shift.max(axis=0)
test_labels = np.load('test_labels.npy')

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: test_set_norm, y_: test_labels}))

0.959167
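
The accuracy operation is easy to verify by hand. Here is a NumPy sketch on four made-up predictions, three of which match their one-hot labels:

```python
import numpy as np

# Made-up predicted probabilities (rows) and one-hot labels for 4 samples, 3 classes.
predicted = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])
known = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1],
                  [0, 1, 0]])

# Same logic as tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) followed by a mean.
correct = predicted.argmax(axis=1) == known.argmax(axis=1)
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # 0.75
```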


### Save Model
Save all variables in the current session to a checkpoint file so that they can be restored later.

In [6]:
saver = tf.train.Saver()
save_path = saver.save(sess, "model.ckpt")
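
The TensorFlow checkpoint is the route this notebook takes. As a separate, framework-independent sketch (a swapped-in technique, not what the notebook does), parameters held as NumPy arrays can also be round-tripped with `np.savez` and `np.load`. The arrays and the file name `linear_model.npz` below are placeholders.

```python
import os
import tempfile
import numpy as np

# Placeholder parameters; in the notebook these would come from sess.run([W, b]).
weights = np.arange(6, dtype=np.float32).reshape(3, 2)
bias = np.array([0.5, -0.5], dtype=np.float32)

# Write both arrays into one .npz archive in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "linear_model.npz")
np.savez(path, W=weights, b=bias)

# Read them back by the keyword names used when saving.
with np.load(path) as restored:
    restored_W = restored["W"]
    restored_b = restored["b"]

print(np.array_equal(restored_W, weights), np.array_equal(restored_b, bias))  # True True
```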