# Neural Network Tutorial

(This notebook follows a Torch tutorial. http://dp.readthedocs.org/en/latest/neuralnetworktutorial/)

We begin with a simple neural network example. The first line loads the dp package, whose first matter of business is to load its dependencies (see init.lua):

If you get the error message `module 'dp' not found`,
install the package via luarocks (http://dp.readthedocs.org/en/latest/):

    $> sudo luarocks install dp

In [3]:
require 'dp'

{
  XpLog : table: 0x14f180c0
  mkdir : function: 0x14f17fe8
  FKDKaggle : table: 0x151d1f48
  SAVE_DIR : /Users/Calvin/save
  ListView : table: 0x14610fe0
  is_file : function: 0x14615398
  BillionWords : table: 0x15150690
  ImageNet : table: 0x14f35310
  SequenceView : table: 0x135b2178
  SentenceSet : table: 0x145f9a78
  TextSet : table: 0x12fa8cc0
  Cifar100 : table: 0x15190ee8
  NotMnist : table: 0x14f705b0
  download_file : function: 0x14efb188
  reverseDist : function: 0x15171f68
  XpLogEntry : table: 0x151a2308
  FaceDetection : table: 0x14ed45a0
  Preprocess : table: 0x15158eb0
  AdaptiveDecay : table: 0x151e5d70
  Pipeline : table: 0x1512c520
  reload : function: 0x14f2f2d8
  SaveToFile : table: 0x151e1ec0
  printG : function: 0x14f5f810
  Perplexity : table: 0x151c8c28
  distReport : function: 0x151709c8
  Standardize : table: 0x15148c40
  unzip : function: 0x15163248
  images2tensor : function: 0x14f2f2b8
  Channel : table: 0x14f71250
  decompress_tarball : function: 0x14f7

## Command-line Arguments

Let's define some command-line arguments. These will be stored in the table `opt`, which is printed when the script is launched. Command-line arguments make it easy to control the experiment and try out different hyper-parameters without modifying any code.

In [4]:
--[[command line arguments]]--
cmd = torch.CmdLine()
cmd:text()
cmd:text('Image Classification using MLP Training/Optimization')
cmd:text('Example:')
cmd:text('$> th neuralnetwork.lua --batchSize 128 --momentum 0.5')
cmd:text('Options:')
cmd:option('--learningRate', 0.1, 'learning rate at t=0')
cmd:option('--lrDecay', 'linear', 'type of learning rate decay : adaptive | linear | schedule | none')
cmd:option('--minLR', 0.00001, 'minimum learning rate')
cmd:option('--saturateEpoch', 300, 'epoch at which linear decayed LR will reach minLR')
cmd:option('--schedule', '{}', 'learning rate schedule')
cmd:option('--maxWait', 4, 'maximum number of epochs to wait for a new minima to be found. After that, the learning rate is decayed by decayFactor.')
cmd:option('--decayFactor', 0.001, 'factor by which learning rate is decayed for adaptive decay.')
cmd:option('--maxOutNorm', 1, 'max norm each layers output neuron weights')
cmd:option('--momentum', 0, 'momentum')
cmd:option('--hiddenSize', '{200,200}', 'number of hidden units per layer')
cmd:option('--batchSize', 32, 'number of examples per batch')
cmd:option('--cuda', false, 'use CUDA')
cmd:option('--useDevice', 1, 'sets the device (GPU) to use')
cmd:option('--maxEpoch', 100, 'maximum number of epochs to run')
cmd:option('--maxTries', 30, 'maximum number of epochs to try to find a better local minima for early-stopping')
cmd:option('--dropout', false, 'apply dropout on hidden neurons')
cmd:option('--batchNorm', false, 'use batch normalization. dropout is mostly redundant with this')
cmd:option('--dataset', 'Mnist', 'which dataset to use : Mnist | NotMnist | Cifar10 | Cifar100')
cmd:option('--standardize', false, 'apply Standardize preprocessing')
cmd:option('--zca', false, 'apply Zero-Component Analysis whitening')
cmd:option('--progress', false, 'display progress bar')
cmd:option('--silent', false, 'dont print anything to stdout')
cmd:text()
opt = cmd:parse(arg or {})
opt.schedule = dp.returnString(opt.schedule)
opt.hiddenSize = dp.returnString(opt.hiddenSize)
if not opt.silent then
   table.print(opt)
end

{
   batchNorm : false
   batchSize : 32
   cuda : false
   dataset : "Mnist"
   decayFactor : 0.001
   dropout : false
   hiddenSize : {200,200}
   learningRate : 0.1
   lrDecay : "linear"
   maxEpoch : 100
   maxOutNorm : 1
   maxTries : 30
   maxWait : 4
   minLR : 1e-05
   momentum : 0
   progress : false
   saturateEpoch : 300
   schedule : {}
   silent : false
   standardize : false
   useDevice : 1
   zca : false
}	
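
Options such as `--hiddenSize '{200,200}'` and `--schedule` arrive as strings and have to be turned into Lua tables before use, which is the role of the `dp.returnString` calls above. The following is a minimal sketch of that idea in plain Lua. `parse_table_string` is a hypothetical helper written for illustration, and dp's actual implementation may differ. Because it evaluates the string as Lua code, it should only be used on trusted input.

```lua
-- Hypothetical helper: evaluate a string such as '{200,200}' as a Lua expression.
-- loadstring exists in Lua 5.1/LuaJIT; load accepts strings in Lua 5.2+.
local compile = loadstring or load

local function parse_table_string(str)
   local chunk, err = compile("return " .. str)
   if not chunk then
      error("could not parse option: " .. tostring(err))
   end
   return chunk()
end

local hiddenSize = parse_table_string('{200,200}')
local schedule = parse_table_string('{[200] = 0.01, [300] = 0.001}')
print(#hiddenSize, hiddenSize[1], schedule[200], schedule[300])
```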


## Preprocess

The `--standardize` and `--zca` cmd-line arguments can be toggled on to perform some preprocessing on the data.

In [5]:
--[[preprocessing]]--
local input_preprocess = {}
if opt.standardize then
   table.insert(input_preprocess, dp.Standardize())
end
if opt.zca then
   table.insert(input_preprocess, dp.ZCA())
end
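-- note: no --lecunlcn option is defined above, so this branch is skipped unless opt.lecunlcn is set manually
if opt.lecunlcn then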
   table.insert(input_preprocess, dp.GCN())
   table.insert(input_preprocess, dp.LeCunLCN{progress=true})
end

A very common and easy preprocessing technique is to Standardize the datasource, which subtracts the mean and divides by the standard deviation. Both statistics (mean and standard deviation) are measured on the `train` set only. This is a common pattern when preprocessing data. When statistics need to be measured across different examples (as in the ZCA and LeCunLCN preprocesses), we fit the preprocessor on the `train` set and apply it to all sets (`train`, `valid` and `test`). However, some preprocesses require that statistics be measured on each example separately, as is the case for global contrast normalization ([GCN](preprocess.md#dp.GCN)), so that no fitting is needed.
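
The fit-on-`train`, apply-to-all pattern can be sketched in plain Lua as follows. This is an illustration only, not dp's `Standardize` code: the function names, the nested-table data layout, the use of the population standard deviation and the small epsilon are all assumptions of the sketch.

```lua
-- Fit per-feature mean and standard deviation on the train set only.
local function fit_standardize(examples)
   local n, d = #examples, #examples[1]
   local mean, std = {}, {}
   for j = 1, d do
      local sum = 0
      for i = 1, n do sum = sum + examples[i][j] end
      mean[j] = sum / n
      local sq = 0
      for i = 1, n do sq = sq + (examples[i][j] - mean[j])^2 end
      -- small epsilon (an assumption of this sketch) guards against division by zero
      std[j] = math.sqrt(sq / n) + 1e-8
   end
   return {mean = mean, std = std}
end

-- Apply previously fitted statistics to any set (train, valid or test).
local function apply_standardize(stats, examples)
   local out = {}
   for i, x in ipairs(examples) do
      out[i] = {}
      for j = 1, #x do
         out[i][j] = (x[j] - stats.mean[j]) / stats.std[j]
      end
   end
   return out
end

local train_set = {{1, 10}, {3, 30}}
local test_set = {{2, 40}}
local stats = fit_standardize(train_set)          -- statistics come from train only
local test_std = apply_standardize(stats, test_set)
print(stats.mean[1], stats.mean[2], test_std[1][1], test_std[1][2])
```

Note that the test example is transformed with the `train` statistics; nothing is ever measured on `test_set` itself.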

## DataSource

We intend to build and train a neural network so we need some data, which we encapsulate in a DataSource object. dp provides the option of training on different datasets, notably MNIST, NotMNIST, CIFAR-10 or CIFAR-100. The default for this script is the archetypal MNIST (don't leave home without it). However, you can use the `--dataset` argument to specify a different image classification dataset.

If you have trouble downloading `mnist4.zip`, you can manually place that file in `~/data/mnist/`.

In [9]:
--[[data]]--

if opt.dataset == 'Mnist' then
   ds = dp.Mnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'NotMnist' then
   ds = dp.NotMnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar10' then
   ds = dp.Cifar10{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar100' then
   ds = dp.Cifar100{input_preprocess = input_preprocess}
else
   error("Unknown Dataset")
end

decompressing file: 	/Users/Calvin/data/mnist/mnist4.zip	
Archive:  /Users/Calvin/data/mnist/mnist4.zip
  inflating: ./train.th7             


  inflating: ./test.th7              


A DataSource contains up to three DataSets:  `train`, `valid` and `test`. The first is for training the model. The second is used for early-stopping and cross-validation. The third is used for publishing papers and comparing results across different models.

## Model of Modules

OK, so we have a DataSource; now we need a model. Let's build a multi-layer perceptron (MLP) with one or more parameterized non-linear layers. Note that if the hidden layers are omitted (`--hiddenSize '{}'`), the model is just a linear classifier:

In [10]:
--[[Model]]--

model = nn.Sequential()
model:add(nn.Convert(ds:ioShapes(), 'bf')) -- to batchSize x nFeature (also type converts)

-- hidden layers
inputSize = ds:featureSize()
for i,hiddenSize in ipairs(opt.hiddenSize) do
   model:add(nn.Linear(inputSize, hiddenSize)) -- parameters
   if opt.batchNorm then
      model:add(nn.BatchNormalization(hiddenSize))
   end
   model:add(nn.Tanh())
   if opt.dropout then
      model:add(nn.Dropout())
   end
   inputSize = hiddenSize
end

-- output layer
model:add(nn.Linear(inputSize, #(ds:classes())))
model:add(nn.LogSoftMax())

Output and hidden layers are defined using a Linear, which contains the parameters that will be learned, followed by a non-linear transfer function like Tanh (for the hidden neurons) and LogSoftMax (for the output layer). The latter might seem odd (why not use SoftMax instead?), but the ClassNLLCriterion only works with LogSoftMax (or with SoftMax + Log).
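
To make the LogSoftMax and ClassNLLCriterion pairing concrete, here is a small plain-Lua sketch of the two computations for a single example. It is not the nn package's implementation; `log_softmax` and `class_nll` are hypothetical names, and the example scores are made up.

```lua
-- Numerically stable log-softmax: subtract the max before exponentiating.
local function log_softmax(scores)
   local max = -math.huge
   for _, s in ipairs(scores) do max = math.max(max, s) end
   local sum = 0
   for _, s in ipairs(scores) do sum = sum + math.exp(s - max) end
   local log_sum = max + math.log(sum)
   local out = {}
   for i, s in ipairs(scores) do out[i] = s - log_sum end
   return out
end

-- Negative log-likelihood of the target class, given log-probabilities.
local function class_nll(log_probs, target)
   return -log_probs[target]
end

local log_probs = log_softmax({1.0, 2.0, 3.0})
local loss = class_nll(log_probs, 3)   -- target is class 3
print(loss)
```

Because the criterion simply negates the entry for the target class, it needs log-probabilities as input, which is why the model ends in LogSoftMax rather than SoftMax.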

The Linear modules are constructed using 2 arguments,  `inputSize` (number of input units) and `outputSize` (number of output units). For the first layer, the `inputSize` is the number of features in the input image. In our case, that is `1x28x28=784`, which is what `ds:featureSize()` will return.

Now for the odd-looking nn.Convert Module. It has two purposes. First, whatever type of Tensor it receives, it outputs the type of Tensor used by the Module. Second, it can convert between different Tensor `shapes`. The `shape` of a typical image is bchw, short for batch, color/channel, height and width. Modules like SpatialConvolution and SpatialMaxPooling expect this type of input. Our MLP, on the other hand, expects an input of shape bf, short for batch, feature. It's a pretty simple conversion: all you need to do is flatten the chw dimensions into a single f dimension (in this case, of size 784).
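
The bchw to bf conversion can be illustrated without Tensors. The sketch below uses nested Lua tables in place of Tensors, and `flatten_chw` and `bchw_to_bf` are hypothetical helpers; nn.Convert itself operates on Tensors and also handles type conversion.

```lua
-- Flatten one image stored as nested tables image[c][h][w] into a flat feature vector.
local function flatten_chw(image)
   local features = {}
   for c = 1, #image do
      for h = 1, #image[c] do
         for w = 1, #image[c][h] do
            features[#features + 1] = image[c][h][w]
         end
      end
   end
   return features
end

-- Convert a batch of shape bchw into a batch of shape bf.
local function bchw_to_bf(batch)
   local out = {}
   for b, image in ipairs(batch) do
      out[b] = flatten_chw(image)
   end
   return out
end

-- Build a dummy batch of two 1x28x28 images filled with zeros.
local batch = {}
for b = 1, 2 do
   batch[b] = {{}}
   for h = 1, 28 do
      batch[b][1][h] = {}
      for w = 1, 28 do batch[b][1][h][w] = 0 end
   end
end

local bf = bchw_to_bf(batch)
print(#bf, #bf[1])   -- batch size and number of features
```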

For those not familiar with the nn package, all the `nn.*` objects in the above snippet of code are Module subclasses. This is true even for Sequential, although it is special: it is a Container of other Modules, i.e. a composite.

## Propagator

Next we initialize some Propagators. Each such Propagator will propagate examples from a different DataSet. Samplers iterate over DataSets to generate Batches of examples (inputs and targets) to propagate through the `model`:

In [11]:
--[[Propagators]]--
if opt.lrDecay == 'adaptive' then
   ad = dp.AdaptiveDecay{max_wait = opt.maxWait, decay_factor=opt.decayFactor}
elseif opt.lrDecay == 'linear' then
   opt.decayFactor = (opt.minLR - opt.learningRate)/opt.saturateEpoch
end

train = dp.Optimizer{
   acc_update = opt.accUpdate,
   loss = nn.ModuleCriterion(nn.ClassNLLCriterion(), nil, nn.Convert()),
   epoch_callback = function(model, report) -- called every epoch
      -- learning rate decay
      if report.epoch > 0 then
         if opt.lrDecay == 'adaptive' then
            opt.learningRate = opt.learningRate*ad.decay
            ad.decay = 1
         elseif opt.lrDecay == 'schedule' and opt.schedule[report.epoch] then
            opt.learningRate = opt.schedule[report.epoch]
         elseif opt.lrDecay == 'linear' then 
            opt.learningRate = opt.learningRate + opt.decayFactor
         end
         opt.learningRate = math.max(opt.minLR, opt.learningRate)
         if not opt.silent then
            print("learningRate", opt.learningRate)
         end
      end
   end,
   callback = function(model, report) -- called every batch
      if opt.accUpdate then
         model:accUpdateGradParameters(model.dpnn_input, model.output, opt.learningRate)
      else
         model:updateGradParameters(opt.momentum) -- affects gradParams
         model:updateParameters(opt.learningRate) -- affects params
      end
      model:maxParamNorm(opt.maxOutNorm) -- affects params
      model:zeroGradParameters() -- affects gradParams 
   end,
   feedback = dp.Confusion(),
   sampler = dp.ShuffleSampler{batch_size = opt.batchSize},
   progress = opt.progress
}
valid = dp.Evaluator{
   feedback = dp.Confusion(),  
   sampler = dp.Sampler{batch_size = opt.batchSize}
}
test = dp.Evaluator{
   feedback = dp.Confusion(),
   sampler = dp.Sampler{batch_size = opt.batchSize}
}

For this example, we use an Optimizer for the training DataSet, and two Evaluators, one for cross-validation and another for testing. Now let's explore the different constructor arguments.

### `sampler`

The Evaluators use a simple Sampler which iterates sequentially through the DataSet. The Optimizer, on the other hand, uses a ShuffleSampler. This Sampler shuffles the (indices of a) DataSet before each pass over all examples in the DataSet. Shuffling is useful for training because the model sees a different sequence of batches in each epoch, which makes the training algorithm more stochastic (subject to the constraint that each example is presented once and only once per epoch).
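
The "shuffle the indices, then visit each example exactly once" behaviour can be sketched in plain Lua as below. This is not the ShuffleSampler source; `shuffled_indices` and `epoch_batches` are hypothetical helpers that only deal with indices, not with actual Batches.

```lua
-- Return a shuffled list of the indices 1..n (Fisher-Yates shuffle).
local function shuffled_indices(n)
   local idx = {}
   for i = 1, n do idx[i] = i end
   for i = n, 2, -1 do
      local j = math.random(i)
      idx[i], idx[j] = idx[j], idx[i]
   end
   return idx
end

-- Split the shuffled indices into consecutive batches of at most batch_size.
local function epoch_batches(n, batch_size)
   local idx = shuffled_indices(n)
   local batches = {}
   for start = 1, n, batch_size do
      local batch = {}
      for k = start, math.min(start + batch_size - 1, n) do
         batch[#batch + 1] = idx[k]
      end
      batches[#batches + 1] = batch
   end
   return batches
end

math.randomseed(42)
local batches = epoch_batches(10, 4)

-- Count how often each example index appears during the epoch.
local seen, count = {}, 0
for _, batch in ipairs(batches) do
   for _, i in ipairs(batch) do
      seen[i] = (seen[i] or 0) + 1
      count = count + 1
   end
end
print(#batches, count)
```

Calling `epoch_batches` again for the next epoch gives a new ordering, while every index still appears exactly once.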

### `loss`

Each Propagator can also specify a loss for training or evaluation. This argument is only mandatory for the Optimizer, as it is required for backpropagation. If you have previously used the nn package, there is nothing new here. The loss is a Criterion. Each example has a single target class and our Model output is LogSoftMax so we use a ClassNLLCriterion. The criterion is wrapped in ModuleCriterion, which is a decorator that allows us to pass each input and target through a module before it is passed on to the decorated criterion. In our case, we want to make sure each target gets converted to the type of the loss.

### `feedback`

The feedback parameter is used to provide us with, you guessed it, feedback (like performance measures and statistics after each epoch). We use Confusion, which is a wrapper for the optim package's ConfusionMatrix. While our Loss measures the Negative Log-Likelihood (NLL) of the model on different datasets, our Feedback measures classification accuracy (which is what we will use for early-stopping and comparing our model to the state-of-the-art).
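
As an illustration of what a confusion-based accuracy measures, here is a plain-Lua sketch. It is independent of optim's ConfusionMatrix; the function names and the toy targets and predictions are assumptions made for the example.

```lua
-- Build a confusion matrix: rows are true classes, columns are predicted classes.
local function confusion_matrix(targets, predictions, n_classes)
   local m = {}
   for i = 1, n_classes do
      m[i] = {}
      for j = 1, n_classes do m[i][j] = 0 end
   end
   for k = 1, #targets do
      local t, p = targets[k], predictions[k]
      m[t][p] = m[t][p] + 1
   end
   return m
end

-- Accuracy is the sum of the diagonal divided by the total count.
local function accuracy(m)
   local correct, total = 0, 0
   for i = 1, #m do
      for j = 1, #m[i] do
         total = total + m[i][j]
         if i == j then correct = correct + m[i][j] end
      end
   end
   return correct / total
end

local targets     = {1, 2, 3, 3, 2}
local predictions = {1, 2, 3, 2, 2}
local acc = accuracy(confusion_matrix(targets, predictions, 3))
print(acc)   -- 4 of the 5 predictions are correct
```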

### `callback`

Since the Optimizer is used to train the model on a DataSet, we need to specify a callback function that will be called after successive forward/backward calls. Among other things, the callback should call either updateParameters or accUpdateGradParameters. Depending on what is specified on the command line, it can also call updateGradParameters (commonly known as momentum learning). You can also choose to regularize the model with weightDecay or maxParamNorm (personally, I prefer the latter to the former). In any case, the callback is a function that you can define to fit your needs.
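
The three steps a callback typically performs (momentum, parameter update, max-norm constraint) can be sketched on plain Lua tables. This uses a generic textbook formulation of momentum and max-norm; the function names are hypothetical, and the exact formulas used by updateGradParameters and maxParamNorm (for example, any damping term) may differ.

```lua
-- Classical momentum: blend the new gradient into a running velocity.
local function momentum_step(velocity, grad, momentum)
   for i = 1, #grad do
      velocity[i] = momentum * velocity[i] + grad[i]
   end
   return velocity
end

-- Plain SGD update: params <- params - learning_rate * direction.
local function sgd_step(params, direction, learning_rate)
   for i = 1, #params do
      params[i] = params[i] - learning_rate * direction[i]
   end
   return params
end

-- Max-norm constraint: rescale a weight vector whose L2 norm exceeds max_norm_value.
local function max_norm(weights, max_norm_value)
   local sq = 0
   for i = 1, #weights do sq = sq + weights[i]^2 end
   local norm = math.sqrt(sq)
   if norm > max_norm_value then
      for i = 1, #weights do
         weights[i] = weights[i] * max_norm_value / norm
      end
   end
   return weights
end

local params   = {3, 4}       -- L2 norm is 5
local velocity = {0, 0}
local grad     = {1, 0}
momentum_step(velocity, grad, 0.5)   -- velocity becomes {1, 0}
sgd_step(params, velocity, 1)        -- params become {2, 4}
max_norm(params, 1)                  -- rescaled so that the L2 norm is 1
print(params[1], params[2])
```

The order mirrors the callback above: gradients are adjusted first, parameters are updated second, and the norm constraint is applied to the updated parameters last.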

### `epoch_callback`

While the callback argument is called every batch, the epoch_callback is called between epochs. This is useful for decaying hyper-parameters such as the learning rate, which is what we do in this example. Since the learning rate is the most important hyper-parameter, it is a good idea to try different learning rate decay schedules during hyper-optimization.

The `--lrDecay 'linear'` decay is the easiest to use (it is the default cmd-line argument). It involves specifying a starting learning rate `--learningRate`, a minimum learning rate `--minLR` and the epoch at which that minimum will be reached: `--saturateEpoch`.

The `--lrDecay 'adaptive'` mode uses an exponentially decaying learning rate. By default this mode only decays the learning rate when a new minimum hasn't been found for `--maxWait` epochs. But by using `--maxWait -1`, we can decay the learning rate every epoch with the following rule: `lr = lr*decayFactor`. This will decay the learning rate much faster than a linear decay.

Another option (`--lrDecay 'schedule'`) is to specify the learning rate `--schedule` manually as a table mapping epochs to learning rates, like `'{[200] = 0.01, [300] = 0.001}'`, which will set the learning rate to the given value at the given epoch.

Of course, because this argument is just another callback function, you can use it however you please by coding your own function.
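
The three decay rules can be written as small standalone functions. The sketch below mirrors the arithmetic of the `epoch_callback` shown earlier, but the function names are hypothetical and the values are the script's defaults (`learningRate` 0.1, `minLR` 0.00001, `saturateEpoch` 300).

```lua
-- Linear decay: add a constant (negative) step each epoch, never going below min_lr.
local function linear_decay(lr, start_lr, min_lr, saturate_epoch)
   local step = (min_lr - start_lr) / saturate_epoch
   return math.max(min_lr, lr + step)
end

-- Exponential decay: multiply by a constant factor each epoch.
local function exponential_decay(lr, decay_factor, min_lr)
   return math.max(min_lr, lr * decay_factor)
end

-- Schedule: jump to the listed value when its epoch is reached, else keep the current rate.
local function scheduled(lr, schedule, epoch)
   return schedule[epoch] or lr
end

local lr = 0.1
lr = linear_decay(lr, 0.1, 0.00001, 300)
print(string.format("%.7f", lr))   -- prints 0.0996667
```

With the defaults, one linear step takes the learning rate from 0.1 to about 0.0996667, and each further epoch subtracts the same amount until `minLR` is reached.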

### `acc_update`

When set to true, the gradients w.r.t. parameters (a.k.a. gradParameters) are accumulated directly into the parameters to produce an update after the forward and backward pass. In other words, for `acc_update=true`, the sequence for propagating a batch is essentially:

1. updateOutput
2. updateGradInput
3. accUpdateGradParameters

Instead of the more common:

1. updateOutput
2. updateGradInput
3. accGradParameters
4. updateParameters

This means that no gradParameters are actually used internally. The default value is false. Some methods do not work with acc_update, as they require that the gradParameters tensors be populated before being added to the parameters. This is the case for updateGradParameters (momentum learning) and weightDecay.
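
For plain SGD, the two sequences above lead to the same parameters; the difference is only whether an intermediate gradient buffer exists. The following plain-Lua sketch illustrates this on a toy parameter vector. The function names are hypothetical, and the code stands in for the Module methods rather than reproducing them.

```lua
-- Usual route: accumulate gradients into a buffer, then apply them to the parameters.
local function acc_then_update(params, grads, learning_rate)
   local grad_params = {}
   for i = 1, #params do grad_params[i] = 0 end
   -- corresponds to accGradParameters
   for i = 1, #params do grad_params[i] = grad_params[i] + grads[i] end
   -- corresponds to updateParameters
   for i = 1, #params do params[i] = params[i] - learning_rate * grad_params[i] end
   return params
end

-- acc_update route: fold the scaled gradient straight into the parameters, no buffer.
local function acc_update(params, grads, learning_rate)
   for i = 1, #params do params[i] = params[i] - learning_rate * grads[i] end
   return params
end

local a = acc_then_update({1.0, 2.0}, {0.5, -0.5}, 0.1)
local b = acc_update({1.0, 2.0}, {0.5, -0.5}, 0.1)
print(a[1] == b[1], a[2] == b[2])
```

Since the second route never materializes `grad_params`, anything that needs to read or modify the gradients before the update (such as momentum) has nothing to work with.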

### `progress`

Finally, we allow for the Optimizer's progress bar to be switched on so that we can monitor training progress.

## Experiment

Now it's time to put this all together to form an Experiment:

In [12]:
--[[Experiment]]--

xp = dp.Experiment{
   model = model,
   optimizer = train,
   validator = valid,
   tester = test,
   observer = {
      dp.FileLogger(),
      dp.EarlyStopper{
         error_report = {'validator','feedback','confusion','accuracy'},
         maximize = true,
         max_epochs = opt.maxTries
      }
   },
   random_seed = os.time(),
   max_epoch = opt.maxEpoch
}

### Observer

The Experiment can be initialized using a list of Observers. The order is not important. Observers listen to mediator Channels. The Mediator calls them back when certain events occur. In particular, they may listen to the doneEpoch Channel to receive a report from the Experiment after each epoch. A report is nothing more than a bunch of nested tables matching the object structure of the experiment. After each epoch, the component objects of the Experiment (except Observers) can each submit a report to its composite parent thereby forming a tree of reports. The Observers can analyse these and modify the components which they are assigned to (in this case, Experiment). Observers may be attached to Experiments, Propagators, Visitors, etc.

### FileLogger

Here we use a simple FileLogger which will store serialized reports in a simple text file for later use. Each experiment has a unique ID which is included in the corresponding reports, thus allowing the FileLogger to name its file appropriately.

### EarlyStopper

The EarlyStopper stops the Experiment when the error has stopped decreasing, or the accuracy has stopped increasing. It also saves the best version of the Experiment to disk whenever it finds a new one. It is initialized with a channel to maximize or minimize (the default is to minimize). In this case, we intend to early-stop the experiment on a field of the report, in particular the accuracy field of the confusion table of the feedback table of the validator. The path `{'validator','feedback','confusion','accuracy'}` happens to measure the accuracy of the Model on the validation DataSet after each training epoch. So by early-stopping on this measure, we hope to find a Model that generalizes well. The parameter max_epochs indicates how many consecutive epochs of training can occur without finding a new best model before the experiment is signaled to stop by the doneExperiment Mediator Channel.
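
The two ideas in this section, looking up a value in a nested report by a path of keys and stopping after too many epochs without improvement, can be sketched in plain Lua. This is a hypothetical stand-in, not dp's EarlyStopper: `get_by_path`, `new_early_stopper` and the toy accuracies are assumptions, and nothing is saved to disk.

```lua
-- Follow a path of keys through nested report tables.
local function get_by_path(report, path)
   local node = report
   for _, key in ipairs(path) do
      if type(node) ~= 'table' then return nil end
      node = node[key]
   end
   return node
end

-- Minimal early-stopping tracker (hypothetical; not dp's EarlyStopper).
local function new_early_stopper(path, max_epochs, maximize)
   local self = {best = nil, best_epoch = 0, stop = false}
   function self.observe(epoch, report)
      local value = get_by_path(report, path)
      local improved = self.best == nil
         or (maximize and value > self.best)
         or (not maximize and value < self.best)
      if improved then
         -- a real observer would also save the model here
         self.best, self.best_epoch = value, epoch
      elseif epoch - self.best_epoch >= max_epochs then
         self.stop = true
      end
      return self.stop
   end
   return self
end

local stopper = new_early_stopper({'validator', 'feedback', 'confusion', 'accuracy'}, 2, true)
local accuracies = {0.92, 0.94, 0.93, 0.935, 0.95}
local stopped_at
for epoch, acc in ipairs(accuracies) do
   local report = {validator = {feedback = {confusion = {accuracy = acc}}}}
   if stopper.observe(epoch, report) then
      stopped_at = epoch
      break
   end
end
print(stopper.best, stopped_at)
```

In this toy run the best accuracy (0.94) is found at epoch 2, and the tracker signals a stop at epoch 4 after two epochs without improvement, so the later 0.95 is never seen. A larger patience value trades longer training for a lower chance of stopping too early.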

## Running the Experiment

Once we have initialized the experiment, we need only run it on the datasource to begin training.

In [13]:
xp:run(ds)

==> epoch # 1 for optimizer :	


==> example speed = 12005.248724198 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.012680900638777	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.8883	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9205	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9254	
==> epoch # 2 for optimizer :	
learningRate	0.0996667	
==> example speed = 12960.909405483 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0072663029688749	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.93184	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9406	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9446	
==> epoch # 3 for optimizer :	
learningRate	0.0993334	


==> example speed = 13045.26678063 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0057252667801146	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9477	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.949	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.95	
==> epoch # 4 for optimizer :	
learningRate	0.0990001	
==> example speed = 12711.484342217 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0047952777237153	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.95654	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9563	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9566	
==> epoch # 5 for optimizer :	
learningRate	0.0986668	


==> example speed = 13002.122534108 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0041577624015475	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.96222	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9598	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9603	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 6 for optimizer :	
learningRate	0.0983335	
==> example speed = 12262.992596543 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0037269017884712	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.96636	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9628	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9631	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 7 for optimizer :	
learningRate	0.0980002	


==> example speed = 12928.291833463 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0033465400241032	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.96964	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9669	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9652	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 8 for optimizer :	
learningRate	0.0976669	
==> example speed = 11338.133065433 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0030726780987977	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.97288	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9661	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9663	
==> epoch # 9 for optimizer :	
learningRate	0.0973336	


==> example speed = 12318.769495228 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0028642991677042	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9748	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9664	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9679	
==> epoch # 10 for optimizer :	
learningRate	0.0970003	
==> example speed = 12799.475802718 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.002672306250463	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.97662	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9676	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9695	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 11 for optimizer :	
learningRate	0.096667	


==> example speed = 12933.251007265 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0025164904230903	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.97924	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9697	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9684	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 12 for optimizer :	
learningRate	0.0963337	
==> example speed = 12980.145468599 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0023981546546169	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98002	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9718	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9722	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 13 for optimizer :	
learningRate	0.0960004	


==> example speed = 12046.276867368 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0022687325645182	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98128	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9705	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9714	
==> epoch # 14 for optimizer :	
learningRate	0.0956671	
==> example speed = 12913.863323715 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0021489548732926	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98294	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9718	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9739	
==> epoch # 15 for optimizer :	
learningRate	0.0953338	
==> example speed = 11846.053587165 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0020779247435501	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98274	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9737	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.975	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 16 for optimizer :	
learningRate	0.0950005	
==> example speed = 12369.682424787 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0019733026542487	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98408	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9746	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9745	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 17 for optimizer :	
learningRate	0.0946672	
==> example speed = 13032.841124962 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0019218221888643	
Mingming.local:1438830384:1:optimizer

==> example speed = 12864.178778637 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0018303194976572	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98622	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9751	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9748	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 19 for optimizer :	
learningRate	0.0940006	
==> example speed = 11896.983785762 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.001781126733682	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98666	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9744	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9761	
==> epoch # 20 for optimizer :	
learningRate	0.0936673	


==> example speed = 11320.007783628 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0017280557069612	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98704	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9767	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9763	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 21 for optimizer :	
learningRate	0.093334	
==> example speed = 12563.764571518 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.001667178212658	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98766	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9771	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9767	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 22 for optimizer :	
learningRate	0.0930007	


==> example speed = 12118.086206913 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0016241946991758	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98826	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9754	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.977	
==> epoch # 23 for optimizer :	
learningRate	0.0926674	


==> example speed = 12303.297901387 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0015961189712455	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98844	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9713	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9764	
==> epoch # 24 for optimizer :	
learningRate	0.0923341	
==> example speed = 12569.045326434 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0015394483816462	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98946	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9727	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9748	
==> epoch # 25 for optimizer :	
learningRate	0.0920008	


==> example speed = 12892.013758107 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0015102322104702	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.98934	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9768	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9782	
==> epoch # 26 for optimizer :	
learningRate	0.0916675	
==> example speed = 12870.364381047 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.001476799684578	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9897	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9709	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9725	
==> epoch # 27 for optimizer :	
learningRate	0.0913342	


==> example speed = 12588.652929282 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0014476106598127	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99032	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.977	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9784	
==> epoch # 28 for optimizer :	
learningRate	0.0910009	
==> example speed = 11468.86854441 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0014155623099914	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99086	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9754	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9772	
==> epoch # 29 for optimizer :	
learningRate	0.0906676	


==> example speed = 12954.334000914 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0013904414211491	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9907	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9778	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9787	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 30 for optimizer :	
learningRate	0.0903343	
==> example speed = 13082.909126162 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0013618815349777	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9916	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9783	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9789	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 31 for optimizer :	
learningRate	0.090001	
==> example speed = 12358.308881232 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0013367883618983	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99162	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9763	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9797	
==> epoch # 32 for optimizer :	
learningRate	0.0896677	
==> example speed = 13037.310239136 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0013128595637443	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99198	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9778	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9795	
==> epoch # 33 for optimizer :	
learningRate	0.0893344	
==> example speed = 12465.361645228 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0012823231851735	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99224	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9787	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9782	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 34 for optimizer :	
learningRate	0.0890011	


==> example speed = 12692.49801244 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0012714623908622	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9924	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9783	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9799	


==> epoch # 35 for optimizer :	
learningRate	0.0886678	


==> example speed = 12845.836207919 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0012593490944031	


Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99222	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.978	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.98	


==> epoch # 36 for optimizer :	
learningRate	0.0883345	


==> example speed = 12365.197716502 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0012318011223081	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9929	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9764	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9794	
==> epoch # 37 for optimizer :	
learningRate	0.0880012	
==> example speed = 11416.19533137 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0012317491399752	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99256	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9779	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.979	
==> epoch # 38 for optimizer :	
learningRate	0.0876679	


==> example speed = 12250.401555647 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0012069451313965	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99256	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9779	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9787	
==> epoch # 39 for optimizer :	
learningRate	0.0873346	


==> example speed = 12564.4209417 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011909162897546	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99322	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9787	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.98	
==> epoch # 40 for optimizer :	
learningRate	0.0870013	
==> example speed = 13023.912891056 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011829063401755	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99312	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9784	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9803	
==> epoch # 41 for optimizer :	
learningRate	0.086668	
==> example speed = 12988.436174948 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011556369835593	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99392	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9791	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.979	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 42 for optimizer :	
learningRate	0.0863347	
==> example speed = 13018.392593972 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011491813484371	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99358	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9767	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9802	
==> epoch # 43 for optimizer :	
learningRate	0.0860014	
==> example speed = 12983.566423154 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011346706456584	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99444	
Mingming.local:1438830384:1:validator:confu

==> example speed = 12973.318595425 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011345734376488	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99378	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9809	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9803	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 45 for optimizer :	
learningRate	0.0853348	


==> example speed = 12813.042181669 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011104919509359	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99414	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9773	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9786	
==> epoch # 46 for optimizer :	
learningRate	0.0850015	
==> example speed = 11389.238693538 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0011151399677256	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99426	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9788	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9794	
==> epoch # 47 for optimizer :	
learningRate	0.0846682	
==> example speed = 12846.376800231 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010887182846636	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99444	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9791	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9803	
==> epoch # 48 for optimizer :	
learningRate	0.0843349	
==> example speed = 12986.234041543 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010840497986231	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99454	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9789	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9803	
==> epoch # 49 for optimizer :	
learningRate	0.0840016	


==> example speed = 12655.06536136 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010750676081854	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99454	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9806	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9814	
==> epoch # 50 for optimizer :	
learningRate	0.0836683	


==> example speed = 12358.351848892 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010603279452475	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99452	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9802	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9803	
==> epoch # 51 for optimizer :	
learningRate	0.083335	
==> example speed = 13001.056933497 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010524556404985	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99484	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9794	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9804	
==> epoch # 52 for optimizer :	
learningRate	0.0830017	
==> example speed = 12995.343393946 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010489264100897	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.995	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.98	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9822	
==> epoch # 53 for optimizer :	
learningRate	0.0826684	
==> example speed = 13040.662464897 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010330553910676	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99512	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9798	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9818	
==> epoch # 54 for optimizer :	
learningRate	0.0823351	
==> example speed = 12822.353814209 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010402103082934	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99466	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9789	
Mingming.local:1438830384:1:tester:confusion accuracy

==> example speed = 13003.834147838 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010282927425012	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99516	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9772	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9807	
==> epoch # 56 for optimizer :	
learningRate	0.0816685	
==> example speed = 13017.321900935 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010138760337517	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99512	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9807	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9811	
==> epoch # 57 for optimizer :	
learningRate	0.0813352	


==> example speed = 13023.577238499 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010136864885497	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9953	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9791	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9809	
==> epoch # 58 for optimizer :	
learningRate	0.0810019	
==> example speed = 12997.336761688 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010063021545357	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99514	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.98	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9817	
==> epoch # 59 for optimizer :	
learningRate	0.0806686	


==> example speed = 13009.734322887 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0010001659140171	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99522	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.98	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9814	
==> epoch # 60 for optimizer :	
learningRate	0.0803353	
==> example speed = 13024.584248078 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00098899253424316	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99548	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9804	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9821	
==> epoch # 61 for optimizer :	
learningRate	0.080002	


==> example speed = 13026.630291646 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00098585489293185	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99538	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9811	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9831	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 62 for optimizer :	
learningRate	0.0796687	
==> example speed = 13027.356957626 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00097453959869528	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99562	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9799	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9813	
==> epoch # 63 for optimizer :	
learningRate	0.0793354	


==> example speed = 13013.594031481 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00096692859466214	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9959	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9794	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9816	
==> epoch # 64 for optimizer :	
learningRate	0.0790021	
==> example speed = 13056.599962881 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00097182100964989	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9956	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.981	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9809	
==> epoch # 65 for optimizer :	
learningRate	0.0786688	
==> example speed = 13046.015005002 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.00096308701916634	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99584	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.981	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9813	
==> epoch # 66 for optimizer :	
learningRate	0.0783355	
==> example speed = 13026.043677252 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.000959024777077	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99586	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.98	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9817	
==> epoch # 67 for optimizer :	
learningRate	0.0780022	


==> example speed = 12928.860111673 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00095222550375682	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99576	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.98	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9812	
==> epoch # 68 for optimizer :	
learningRate	0.0776689	
==> example speed = 13064.851915622 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00094910704690601	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99606	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9807	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.982	
==> epoch # 69 for optimizer :	
learningRate	0.0773356	


==> example speed = 13014.356394876 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0009482668412729	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99594	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9765	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9783	
==> epoch # 70 for optimizer :	
learningRate	0.0770023	


==> example speed = 13004.01315561 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00094365990315384	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99584	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9812	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.982	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 71 for optimizer :	
learningRate	0.076669	
==> example speed = 13054.557496894 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00093214907624351	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99618	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9797	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9817	
==> epoch # 72 for optimizer :	
learningRate	0.0763357	


==> example speed = 13051.180255848 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00092384555260832	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99624	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9799	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9811	
==> epoch # 73 for optimizer :	
learningRate	0.0760024	
==> example speed = 13020.226514412 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.00092541953881822	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99604	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9794	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9809	
==> epoch # 74 for optimizer :	
learningRate	0.0756691	


==> example speed = 12973.530472163 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00091730438716163	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9964	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9812	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.982	
==> epoch # 75 for optimizer :	
learningRate	0.0753358	
==> example speed = 13085.950052858 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0009177488588405	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99658	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9776	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9759	
==> epoch # 76 for optimizer :	
learningRate	0.0750025	


==> example speed = 13059.065102225 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00090949309777768	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99634	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9711	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9728	
==> epoch # 77 for optimizer :	
learningRate	0.0746692	


==> example speed = 13025.802573829 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00090860059405899	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99652	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9819	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9825	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 78 for optimizer :	
learningRate	0.0743359	
==> example speed = 13058.96670647 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00090616522434592	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99638	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9802	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9822	
==> epoch # 79 for optimizer :	
learningRate	0.0740026	


==> example speed = 13056.746284218 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00090109108263381	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9964	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9803	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9808	
==> epoch # 80 for optimizer :	
learningRate	0.0736693	
==> example speed = 13067.457777319 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.0008967868474894	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99638	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9811	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9818	
==> epoch # 81 for optimizer :	
learningRate	0.073336	
==> example speed = 13066.958667414 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00089859647202261	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9965	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9806	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9817	
==> epoch # 82 for optimizer :	
learningRate	0.0730027	
==> example speed = 13053.650661574 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00089533807013431	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99664	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9817	
Mingming.local:1438830384:1:tester:confusion accu

==> example speed = 13052.370255649 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00088304498327732	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99648	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9808	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9814	
==> epoch # 84 for optimizer :	
learningRate	0.0723361	


==> example speed = 13071.67364198 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.000885516645372	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99648	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9806	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9825	
==> epoch # 85 for optimizer :	
learningRate	0.0720028	
==> example speed = 13057.016174397 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00088330774408195	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99652	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9799	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9822	
==> epoch # 86 for optimizer :	
learningRate	0.0716695	


==> example speed = 13054.832985261 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00088214082166616	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99652	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9809	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9818	
==> epoch # 87 for optimizer :	
learningRate	0.0713362	


==> example speed = 13058.699988244 examples/s	


Mingming.local:1438830384:1:optimizer:loss avgErr 0.00087362034735549	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99682	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.982	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9829	
SaveToFile: saving to /Users/Calvin/save/Mingming.local:1438830384:1.dat	
==> epoch # 88 for optimizer :	
learningRate	0.0710029	
==> example speed = 12999.874659793 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00087125914070715	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99672	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9809	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.982	
==> epoch # 89 for optimizer :	
learningRate	0.0706696	
==> example speed = 13069.237133437 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00086763750401841	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99668	
Mingming.local:1438830384:1:validator:con

==> example speed = 13055.877346152 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00086284329332613	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9968	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9797	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9803	
==> epoch # 91 for optimizer :	
learningRate	0.070003	


==> example speed = 12692.133904472 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00086367925535907	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99672	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9815	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9826	
==> epoch # 92 for optimizer :	
learningRate	0.0696697	
==> example speed = 13088.734250111 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.0008640336190396	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99678	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.981	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9815	
==> epoch # 93 for optimizer :	
learningRate	0.0693364	


==> example speed = 13005.64622431 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00085610268385092	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99706	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9814	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9818	
==> epoch # 94 for optimizer :	
learningRate	0.0690031	


==> example speed = 13078.21703627 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00085911493317722	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99674	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9817	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9824	
==> epoch # 95 for optimizer :	
learningRate	0.068669800000001	
==> example speed = 13078.805912364 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00086003277828493	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99708	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9804	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9833	
==> epoch # 96 for optimizer :	
learningRate	0.068336500000001	
==> example speed = 13066.351319188 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00084473250915891	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99712	
Mingming.local:1438830384:1:validator:confusion ac

==> example speed = 13025.551771014 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00084968017767125	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99694	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9814	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9818	
==> epoch # 98 for optimizer :	
learningRate	0.067669900000001	
==> example speed = 13095.82055043 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00084763783664583	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99714	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9822	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9827	
==> epoch # 99 for optimizer :	
learningRate	0.067336600000001	


==> example speed = 13088.539832368 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00084568746335509	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.99702	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.982	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9833	
==> epoch # 100 for optimizer :	
learningRate	0.067003300000001	


==> example speed = 12995.887784398 examples/s	
Mingming.local:1438830384:1:optimizer:loss avgErr 0.00083685147374363	
Mingming.local:1438830384:1:optimizer:confusion accuracy = 0.9972	
Mingming.local:1438830384:1:validator:confusion accuracy = 0.9806	
Mingming.local:1438830384:1:tester:confusion accuracy = 0.9816	



We don't initialize the `Experiment` with the `DataSource`. This makes the experiment easy to save to disk and keeps the snapshot separate from its data (which the experiment shouldn't modify).

Let's run the script from the command line (with default arguments):

    nicholas@xps:~/projects/dp$ th examples/neuralnetwork.lua

First it prints the command-line arguments stored in `opt`:

    {
       batchNorm : false
       batchSize : 32
       cuda : false
       dataset : "Mnist"
       dropout : false
       hiddenSize : {200,200}
       learningRate : 0.1
       lecunlcn : false
       maxEpoch : 100
       maxOutNorm : 1
       maxTries : 30
       momentum : 0
       progress : false
       schedule : {[200]=0.01,[400]=0.001}
       silent : false
       standardize : false
       useDevice : 1
       zca : false
    }  

After that it prints the model.

    Model : 
    nn.Sequential {
      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> output]
      (1): nn.Convert
      (2): nn.Linear(784 -> 200)
      (3): nn.Tanh
      (4): nn.Linear(200 -> 200)
      (5): nn.Tanh
      (6): nn.Linear(200 -> 10)
      (7): nn.LogSoftMax
    }

The `FileLogger` then prints where the epoch logs will be saved. This location can be controlled with the `$TORCH_DATA_PATH` or `$DEEP_SAVE_PATH` environment variables. It defaults to `$HOME/save`.

    FileLogger: log will be written to /home/nicholas/save/xps:1432747515:1/log 
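
As a rough illustration of how such a lookup could work, the plain-Lua sketch below resolves a save directory from the two environment variables. The function name `resolve_save_dir`, the precedence (`$DEEP_SAVE_PATH` first) and the `save` subdirectory appended under `$TORCH_DATA_PATH` are assumptions made for this example. They are not taken from the dp source.

```lua
-- Hypothetical helper: pick a save directory from environment variables.
-- `getenv` is injectable so the logic can be tried without changing the
-- real environment; it defaults to os.getenv.
local function resolve_save_dir(getenv)
   getenv = getenv or os.getenv

   -- Assumption: an explicit save path wins over everything else.
   local explicit = getenv('DEEP_SAVE_PATH')
   if explicit and explicit ~= '' then
      return explicit
   end

   -- Assumption: otherwise use <TORCH_DATA_PATH>/save, falling back to
   -- <HOME>/save (the default named in the text above).
   local base = getenv('TORCH_DATA_PATH')
   if not base or base == '' then
      base = getenv('HOME') or '.'
   end
   return base .. '/save'
end

print(resolve_save_dir())
```

Passing a stub such as `function(k) return ({HOME = '/home/nicholas'})[k] end` as `getenv` makes it easy to check each branch in isolation.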
    
Finally, we get to the fun part: the actual training. Every epoch, some performance data gets printed to `stdout`:

    ==> epoch # 1 for optimizer :   
    ==> example speed = 4508.3691689025 examples/s  
    xps:1432747515:1:optimizer:loss avgErr 0.012714211946021    
    xps:1432747515:1:optimizer:confusion accuracy = 0.8877  
    xps:1432747515:1:validator:confusion accuracy = 0.9211  
    xps:1432747515:1:tester:confusion accuracy = 0.9292 
    ==> epoch # 2 for optimizer :   
    ==> example speed = 4526.7213369494 examples/s  
    xps:1432747515:1:optimizer:loss avgErr 0.0072034133582363   
    xps:1432747515:1:optimizer:confusion accuracy = 0.93302 
    xps:1432747515:1:validator:confusion accuracy = 0.9405  
    xps:1432747515:1:tester:confusion accuracy = 0.9428 
    ==> epoch # 3 for optimizer :   
    ==> example speed = 4486.8207535058 examples/s  
    xps:1432747515:1:optimizer:loss avgErr 0.0056732489919492   
    xps:1432747515:1:optimizer:confusion accuracy = 0.94704 
    xps:1432747515:1:validator:confusion accuracy = 0.9512  
    xps:1432747515:1:tester:confusion accuracy = 0.9518 
    ==> epoch # 4 for optimizer :   
    ==> example speed = 4524.4831336064 examples/s  
    xps:1432747515:1:optimizer:loss avgErr 0.0047361240094285   
    xps:1432747515:1:optimizer:confusion accuracy = 0.95672 
    xps:1432747515:1:validator:confusion accuracy = 0.9565  
    xps:1432747515:1:tester:confusion accuracy = 0.9584 
    ==> epoch # 5 for optimizer :   
    ==> example speed = 4527.7260154406 examples/s  
    xps:1432747515:1:optimizer:loss avgErr 0.0041567858616232   
    xps:1432747515:1:optimizer:confusion accuracy = 0.96188 
    xps:1432747515:1:validator:confusion accuracy = 0.9603  
    xps:1432747515:1:tester:confusion accuracy = 0.9613 
    SaveToFile: saving to /home/nicholas/save/xps:1432747515:1.dat  
    ==> epoch # 6 for optimizer :   
    ==> example speed = 4519.2735741475 examples/s  
    xps:1432747515:1:optimizer:loss avgErr 0.0037086909102431   
    xps:1432747515:1:optimizer:confusion accuracy = 0.9665  
    xps:1432747515:1:validator:confusion accuracy = 0.9602  
    xps:1432747515:1:tester:confusion accuracy = 0.9629 
    ==> epoch # 7 for optimizer :   
    ==> example speed = 4528.1356378239 examples/s  
    xps:1432747515:1:optimizer:loss avgErr 0.0033203622647625   
    xps:1432747515:1:optimizer:confusion accuracy = 0.97062 
    xps:1432747515:1:validator:confusion accuracy = 0.966   
    xps:1432747515:1:tester:confusion accuracy = 0.9665 
    SaveToFile: saving to /home/nicholas/save/xps:1432747515:1.dat  
    
After 5 epochs, the experiment starts early-stopping by saving to disk the version of the model with the highest `xps:1432747515:1:validator:confusion accuracy`. This is why a `SaveToFile` line follows epochs 5 and 7 in the log above but not epoch 6, where the validator accuracy dropped slightly. The first part of that string (`xps:1432747515:1`) is the unique id of the experiment. It concatenates the hostname of the computer (`xps` in this case) and a time-stamp.
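
The relationship between the log lines and the saves can be checked with a small plain-Lua sketch. It parses the validator lines copied from the excerpt above, splits out the experiment id, and reports the epochs at which the validator accuracy reaches a new maximum from epoch 5 onward. The helper names (`parse_line`, `epochs_to_save`), the `start_epoch` parameter and the name `suffix` for the trailing `1` in the id are assumptions for illustration. This is not dp's own early-stopping code.

```lua
-- Validator lines copied from the log excerpt above (epochs 1 to 7).
local log = [[
xps:1432747515:1:validator:confusion accuracy = 0.9211
xps:1432747515:1:validator:confusion accuracy = 0.9405
xps:1432747515:1:validator:confusion accuracy = 0.9512
xps:1432747515:1:validator:confusion accuracy = 0.9565
xps:1432747515:1:validator:confusion accuracy = 0.9603
xps:1432747515:1:validator:confusion accuracy = 0.9602
xps:1432747515:1:validator:confusion accuracy = 0.966
]]

-- Split one log line into the experiment id parts, the channel name
-- (optimizer, validator or tester) and the reported accuracy.
local function parse_line(line)
   local host, stamp, suffix, channel, acc =
      line:match('^([^:]+):(%d+):(%d+):(%w+):confusion accuracy = ([%d%.]+)')
   if not host then
      return nil
   end
   return {
      id = host .. ':' .. stamp .. ':' .. suffix,
      hostname = host,
      timestamp = tonumber(stamp),
      channel = channel,
      accuracy = tonumber(acc),
   }
end

-- Hypothetical rule: track the best accuracy seen so far and, from
-- `start_epoch` on, record every epoch that sets a new maximum.
local function epochs_to_save(accuracies, start_epoch)
   local best, saved = -math.huge, {}
   for epoch, acc in ipairs(accuracies) do
      if acc > best then
         best = acc
         if epoch >= start_epoch then
            saved[#saved + 1] = epoch
         end
      end
   end
   return saved
end

local accuracies = {}
for line in log:gmatch('[^\n]+') do
   local rec = parse_line(line)
   if rec and rec.channel == 'validator' then
      accuracies[#accuracies + 1] = rec.accuracy
   end
end

print(table.concat(epochs_to_save(accuracies, 5), ', ')) -- prints: 5, 7
```

The result matches the two `SaveToFile` lines in the excerpt, which appear after epochs 5 and 7.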

## Loading the saved Experiment

The experiment is saved at `/home/nicholas/save/xps:1432747515:1.dat`. You can load it and access the `model` with:

    require 'dp'
    require 'cutorch' -- only if you used cmd-line argument --cuda
    require 'cunn'    -- only if you used cmd-line argument --cuda

    xp = torch.load("/home/nicholas/save/xps:1432747515:1.dat")
    model = xp:model()
    print(torch.type(model)) -- prints: nn.Serial
    
For efficiency, the `model` here is wrapped in an `nn.Serial` decorator. You can access the `model` you passed to the experiment by adding:

    model = model.module
    print(torch.type(model)) -- prints: nn.Sequential