Training details about different sizes #27

Open
LucyLu-LX opened this issue Sep 3, 2018 · 4 comments

LucyLu-LX commented Sep 3, 2018

python preprocess.py resizes images to 256*256, 384*384, 512*512, 640*640, 736*736, and training on each size respectively could speed up the training process.

I am a bit confused about the meaning of "train respectively".
Does this mean training the network in a coarse-to-fine process, which initializes the network at 256x256 and then fine-tunes it on larger sizes?
Does this accelerate the convergence of the network compared to training it on size 736x736 directly?

huoyijie (Owner) commented Sep 4, 2018

1. Does this mean training the network in a coarse-to-fine process, which initializes the network at 256x256 and then fine-tunes it on larger sizes?
Yes.
2. Does this accelerate the convergence of the network compared to training it on size 736x736 directly?
Yes, because training directly at size 736 is very slow.

My own training method:
set cfg.train_task_id = '2T256'
set patience to 5 (anywhere between 2 and 6)
python preprocess.py && python label.py && python advanced_east.py

When training ends, copy the best saved weights file (.h5) to initialize the training at size 384: set cfg.train_task_id = '2T384', cfg.initial_epoch = "the ending epoch", and cfg.load_weights = True, then continue training.

Then train at 512, and so on. You could try this method; maybe there are better ways.
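
For concreteness, here is a minimal Keras sketch of what one warm-started stage of this schedule could look like. It is not the repo's exact code: build_east_network, train_gen and val_gen are hypothetical stand-ins for the network and data pipeline, and the checkpoint name, step counts and epoch numbers are only illustrative.

```python
# Hypothetical sketch of the 384x384 stage, warm-started from the best
# 256x256 checkpoint. The network is fully convolutional, so the same
# weights can be reused at the larger input size.
from keras.callbacks import EarlyStopping, ModelCheckpoint

model = build_east_network()                             # hypothetical model builder
model.load_weights('model/weights_2T256.008-0.427.h5')   # best weights from the 256 run

callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, verbose=1),      # patience in 2~6
    ModelCheckpoint('model/weights_2T384.{epoch:03d}-{val_loss:.3f}.h5',
                    monitor='val_loss', save_best_only=True,
                    save_weights_only=True, verbose=1),
]

model.fit_generator(train_gen(size=384),                 # hypothetical data generator
                    steps_per_epoch=1000,
                    epochs=24,
                    initial_epoch=8,                     # the epoch the 256 run ended at
                    validation_data=val_gen(size=384),
                    validation_steps=100,
                    callbacks=callbacks)
```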

@hcnhatnam

@huoyijie Does the network still remember what it learned at 256 while training at 736?

@globalmaster

Hi,
I downloaded the Tianchi ICPR dataset, set cfg.train_task_id = '3T256', and ran python3 preprocess.py && python3 label.py && python3 advanced_east.py, but I get the error shown in the output below. How can I fix it? Can you help me? @LucyLu-LX @huoyijie @hcnhatnam

Epoch 00008: val_loss improved from 0.43569 to 0.42750, saving model to model/weights_3T256.008-0.427.h5
Epoch 9/24
1125/1125 [==============================] - 157s 139ms/step - loss: 0.2762 - val_loss: 0.4373

Epoch 00009: val_loss did not improve from 0.42750
Epoch 10/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2579 - val_loss: 0.4435

Epoch 00010: val_loss did not improve from 0.42750
Epoch 11/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2466 - val_loss: 0.4710

Epoch 00011: val_loss did not improve from 0.42750
Epoch 12/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2342 - val_loss: 0.4633

Epoch 00012: val_loss did not improve from 0.42750
Epoch 13/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2228 - val_loss: 0.4724

Epoch 00013: val_loss did not improve from 0.42750
Epoch 00013: early stopping

hcnhatnam commented Mar 20, 2019

@globalmaster It isn't an error. Training stopped early (early stopping) to avoid overfitting. It looks like the model is not converging, though, and this is still a problem for me as well. @globalmaster, can you share the dataset with me via a Google Drive link?
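
For reference, the log above matches standard Keras early-stopping behaviour (a sketch, assuming the patience value of 5 suggested earlier in this thread): the best val_loss (0.42750) was reached at epoch 8, and epochs 9-13 brought no improvement, so training stopped at epoch 13.

```python
# Minimal sketch of the callbacks that produce messages like
# "Epoch 00013: early stopping" and "val_loss did not improve from 0.42750".
from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop once val_loss has not improved for `patience` consecutive epochs.
    EarlyStopping(monitor='val_loss', patience=5, verbose=1),
    # Save only the best weights, e.g. model/weights_3T256.008-0.427.h5.
    ModelCheckpoint('model/weights_3T256.{epoch:03d}-{val_loss:.3f}.h5',
                    monitor='val_loss', save_best_only=True,
                    save_weights_only=True, verbose=1),
]
# passed as model.fit_generator(..., callbacks=callbacks)
```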
