Training process hangs! #11

Closed
bigwhite opened this Issue Jan 23, 2017 · 6 comments


@bigwhite

I started the training. It ran OK for a short time, but when the training loop reached epoch 4 it hung and never recovered.

The snapshot is below:

[screenshot: training-hang-snapshot]

The environment is the same as in issue 10.

@bigwhite

Does the "hang" state mean the training is over?

@showforj

Friend, how did you solve the `'Cycler' object has no attribute 'by_key'` problem?

@martin-gorner
Owner

I think your training simply finished, yes.
Look towards the end of the sample files for an iterations=XXXXX parameter. That is where you change the number of iterations.
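
For context, the iteration count is just the bound of an ordinary loop; once it is reached, the visualisation window simply stops updating, which can look like a hang. A minimal self-contained sketch of the pattern (the `training_step` body here is a placeholder, not the tutorial's actual code):

```python
# Illustration only: training ends when the bounded iteration count is
# reached; the visualisation stops updating, it does not crash.
ITERATIONS = 10000  # the iterations=XXXXX parameter mentioned above

def training_step(i, update_test_data, update_train_data):
    # placeholder for the real step, e.g. sess.run(train_step, feed_dict=...)
    pass

for i in range(ITERATIONS + 1):
    training_step(i, i % 50 == 0, i % 10 == 0)

print("training finished after", ITERATIONS, "iterations")
```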

@martin-gorner
Owner

@showforj update the cycler module
pip3 install --upgrade cycler

@martin-gorner
Owner

@showforj
Correction: pip3 install --upgrade matplotlib
The reason is that on Linux you need to install matplotlib with apt-get, because that also pulls in the graphics backend on which matplotlib relies.
Unfortunately, the version of matplotlib you get that way is not the freshest.
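
To verify the upgrade took, a quick check like this should work (a minimal sketch; `by_key` is the attribute the error above complains about, and it is present in recent versions of cycler):

```python
import matplotlib
from cycler import cycler

print(matplotlib.__version__)

# On an outdated cycler this line raises:
# AttributeError: 'Cycler' object has no attribute 'by_key'
c = cycler(color=['r', 'g', 'b'])
print(c.by_key())  # {'color': ['r', 'g', 'b']}
```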

@bigwhite
bigwhite commented Feb 6, 2017

I have seen the comments in each sample file:

```python
# mnist_1.0_softmax.py
# final max test accuracy = 0.9268 (10K iterations). Accuracy should peak above 0.92 in the first 2000 iterations.

# mnist_2.0_five_layers_sigmoid.py
# Some results to expect:
# (In all runs, if sigmoids are used, all biases are initialised at 0; if RELUs are used,
# all biases are initialised at 0.1 apart from the last one, which is initialised at 0.)

## learning rate = 0.003, 10K iterations
# final test accuracy = 0.9788 (sigmoid - slow start, training cross-entropy not stabilised in the end)
# final test accuracy = 0.9825 (relu - above 0.97 in the first 1500 iterations but noisy curves)

## now with learning rate = 0.0001, 10K iterations
# final test accuracy = 0.9722 (relu - slow but smooth curve, would have gone higher in 20K iterations)

## decaying learning rate from 0.003 to 0.0001, decay_speed 2000, 10K iterations
# final test accuracy = 0.9746 (sigmoid - training cross-entropy not stabilised)
# final test accuracy = 0.9824 (relu - training set fully learned, test accuracy stable)
# ...
```
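
For reference, the decaying learning rate mentioned in the last set of comments is an exponential interpolation between the two rates. A minimal sketch, reconstructed from the numbers quoted above (the variable names `lrmax`, `lrmin`, and `decay_speed` are my assumptions):

```python
import math

lrmax, lrmin = 0.003, 0.0001
decay_speed = 2000  # iterations over which the rate decays towards lrmin

def learning_rate(i):
    # exponential decay from lrmax at i=0 towards lrmin as i grows
    return lrmin + (lrmax - lrmin) * math.exp(-i / decay_speed)

print(learning_rate(0))      # 0.003
print(learning_rate(2000))   # ~0.0012
print(learning_rate(10000))  # ~0.00012
```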

@martin-gorner thanks

@bigwhite closed this Feb 6, 2017