Training process hangs! #11

Closed
bigwhite opened this Issue Jan 23, 2017 · 6 comments


@bigwhite

I started the training. It ran OK for a short time, but when the training loop reached epoch 4 it hung and never recovered.

The snapshot is below:

[screenshot: training-hang-snapshot]

The environment is the same as in issue 10.

@bigwhite

Does the "hang" state mean the training is over?

@showforj

Friend, how did you solve the `'Cycler' object has no attribute 'by_key'` problem?

@martin-gorner
Owner

I think your training simply finished, yes.
Look towards the end of the sample files for an iterations=XXXXX parameter. That is where you change the number of iterations.
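
For context, the iteration count is just the bound of an ordinary loop; once it is reached, the visualisation window simply stops updating, which can look like a hang. A minimal self-contained sketch of the pattern (the `training_step` body here is a placeholder, not the tutorial's actual code):

```python
# Illustration only: training ends when the bounded iteration count is
# reached; the visualisation stops updating, it does not crash.
ITERATIONS = 10000  # the iterations=XXXXX parameter mentioned above

def training_step(i, update_test_data, update_train_data):
    # placeholder for the real step, e.g. sess.run(train_step, feed_dict=...)
    pass

for i in range(ITERATIONS + 1):
    training_step(i, i % 50 == 0, i % 10 == 0)

print("training finished after", ITERATIONS, "iterations")
```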

@martin-gorner
Owner

@showforj update the cycler module
pip3 install --upgrade cycler

@martin-gorner
Owner

@showforj
Correction: pip3 install --upgrade matplotlib
The reason is that on Linux you need to install matplotlib with apt-get, because that also pulls in the graphics backend on which matplotlib relies.
Unfortunately, the version of matplotlib you get that way is not the freshest.
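
To verify the upgrade took, a quick check like this should work (a minimal sketch; `by_key` is the attribute the error above complains about, and it is present in recent versions of cycler):

```python
import matplotlib
from cycler import cycler

print(matplotlib.__version__)

# On an outdated cycler this line raises:
# AttributeError: 'Cycler' object has no attribute 'by_key'
c = cycler(color=['r', 'g', 'b'])
print(c.by_key())  # {'color': ['r', 'g', 'b']}
```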

@bigwhite
bigwhite commented Feb 6, 2017

I have seen the comments in each sample file:

```python
# mnist_1.0_softmax.py
# final max test accuracy = 0.9268 (10K iterations). Accuracy should peak above 0.92 in the first 2000 iterations.

# mnist_2.0_five_layers_sigmoid.py
# Some results to expect:
# (In all runs, if sigmoids are used, all biases are initialised at 0; if RELUs are used,
# all biases are initialised at 0.1 apart from the last one, which is initialised at 0.)

## learning rate = 0.003, 10K iterations
# final test accuracy = 0.9788 (sigmoid - slow start, training cross-entropy not stabilised in the end)
# final test accuracy = 0.9825 (relu - above 0.97 in the first 1500 iterations but noisy curves)

## now with learning rate = 0.0001, 10K iterations
# final test accuracy = 0.9722 (relu - slow but smooth curve, would have gone higher in 20K iterations)

## decaying learning rate from 0.003 to 0.0001, decay_speed 2000, 10K iterations
# final test accuracy = 0.9746 (sigmoid - training cross-entropy not stabilised)
# final test accuracy = 0.9824 (relu - training set fully learned, test accuracy stable)
# ...
```
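
For reference, the decaying learning rate mentioned in the last set of comments is an exponential interpolation between the two rates. A minimal sketch, reconstructed from the numbers quoted above (the variable names `lrmax`, `lrmin`, and `decay_speed` are my assumptions):

```python
import math

lrmax, lrmin = 0.003, 0.0001
decay_speed = 2000  # iterations over which the rate decays towards lrmin

def learning_rate(i):
    # exponential decay from lrmax at i=0 towards lrmin as i grows
    return lrmin + (lrmax - lrmin) * math.exp(-i / decay_speed)

print(learning_rate(0))      # 0.003
print(learning_rate(2000))   # ~0.0012
print(learning_rate(10000))  # ~0.00012
```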

@martin-gorner thanks

@bigwhite closed this Feb 6, 2017