support save snapshot by iteration #1204

fanlu · 2019-09-19T02:14:02Z

When training on a large dataset, the number of epochs may be small because of time limits, and average_checkpoints is very important for achieving a lower CER. So I think it is necessary to support saving snapshots at iteration intervals (10000, 20000, etc.).
Any ideas would be appreciated, thanks.
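
To make the request concrete, here is a minimal sketch of which iterations would receive a snapshot for a given interval. `snapshot_iterations` is a hypothetical helper written only for illustration; it is not part of ESPnet or Chainer. It assumes that an interval of 0 or less disables iteration-based saving, in line with the `args.save_interval_iters > 0` guard in this PR.

```python
def snapshot_iterations(total_iters, save_interval_iters):
    """Return the iteration counts at which an interval-based save would fire.

    Hypothetical helper for illustration only (not an ESPnet/Chainer API).
    """
    if save_interval_iters <= 0:
        # Assumption: a non-positive interval means "no iteration snapshots".
        return []
    # Saves happen at every multiple of the interval up to total_iters.
    return list(range(save_interval_iters, total_iters + 1, save_interval_iters))


print(snapshot_iterations(45000, 10000))  # [10000, 20000, 30000, 40000]
print(snapshot_iterations(45000, 0))      # []
```

With a fixed interval, the number of snapshots available for averaging depends on the number of updates rather than on how many epochs fit in the time budget.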

kan-bayashi

Sorry for the late review.

kan-bayashi · 2019-09-23T11:27:50Z

espnet/asr/pytorch_backend/asr.py

@@ -428,11 +429,11 @@ def train(args):
    # we used an empty collate function instead which returns list
    train_iter = {'main': ChainerDataLoader(
        dataset=TransformDataset(train, lambda data: converter([load_tr(data)])),
-        batch_size=1, num_workers=args.n_iter_processes,
+        batch_size=1, num_workers=args.n_iter_processes, pin_memory=True,


It might be better not to include this change in this PR.

OK, I will delete it.

kan-bayashi · 2019-09-23T11:27:55Z

espnet/asr/pytorch_backend/asr.py

        shuffle=not use_sortagrad, collate_fn=lambda x: x[0])}
    valid_iter = {'main': ChainerDataLoader(
        dataset=TransformDataset(valid, lambda data: converter([load_cv(data)])),
-        batch_size=1, shuffle=False, collate_fn=lambda x: x[0],
+        batch_size=1, pin_memory=True, shuffle=False, collate_fn=lambda x: x[0],


kan-bayashi · 2019-09-23T11:30:01Z

espnet/asr/asr_utils.py

@@ -284,6 +284,21 @@ def torch_snapshot(trainer):
    return torch_snapshot


+def torch_snapshot_iter(savefun=torch.save,
+                        filename='snapshot.iter.{.updater.iteration}'):


I don't think it is necessary to add a new function.
Just reusing torch_snapshot is fine.

kan-bayashi · 2019-09-23T11:35:31Z

espnet/asr/pytorch_backend/asr.py

@@ -490,6 +494,8 @@ def train(args):

    # save snapshot which contains model and optimizer states
    trainer.extend(torch_snapshot(), trigger=(1, 'epoch'))
+    if args.save_interval_iters > 0:
+        trainer.extend(torch_snapshot_iter(), trigger=(args.save_interval_iters, 'iteration'))


Why don't you reuse torch_snapshot instead of torch_snapshot_iter?

Suggested change

-        trainer.extend(torch_snapshot_iter(), trigger=(args.save_interval_iters, 'iteration'))
+        trainer.extend(torch_snapshot(filename='snapshot.iter.{.updater.iteration}'),
+                       trigger=(args.save_interval_iters, 'iteration'))
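
For readers unfamiliar with the `{.updater.iteration}` placeholder: it is ordinary `str.format` attribute access on the single positional argument. The sketch below assumes the snapshot extension builds the file name by calling `filename.format(trainer)`; the `trainer` object here is a stand-in built from `SimpleNamespace`, not a real Chainer trainer.

```python
from types import SimpleNamespace

# Stand-in for a trainer: only the attributes the template reads are present.
# This is an illustrative assumption, not the real chainer.training.Trainer.
trainer = SimpleNamespace(updater=SimpleNamespace(iteration=20000, epoch=3))

template = 'snapshot.iter.{.updater.iteration}'

# '{.updater.iteration}' means: take positional argument 0, then follow
# the attributes .updater and .iteration.
name = template.format(trainer)
print(name)  # snapshot.iter.20000
```

Because the template is just a parameter, passing a different `filename` to the existing function is enough to get iteration-based names without a second snapshot function.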

codecov · 2019-09-23T14:41:23Z

Codecov Report

Merging #1204 into v.0.6.0 will increase coverage by <.01%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           v.0.6.0    #1204      +/-   ##
===========================================
+ Coverage    78.32%   78.32%   +<.01%     
===========================================
  Files          100      100              
  Lines         9295     9296       +1     
===========================================
+ Hits          7280     7281       +1     
  Misses        2015     2015

Impacted Files	Coverage Δ
espnet/bin/asr_train.py	`64.53% <100%> (+0.2%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1d62a17...0a4058b.

sw005320 · 2019-09-25T15:10:33Z

Thanks a lot!

fanlu added 5 commits September 19, 2019 09:59

support save snapshot by iteration

3d4e3c9

support save snapshot by iteration

0948a65

no iteration save as default

c8524a1

fix format and duplicate code

58037b1

fix indent error

7c91878

kan-bayashi added the New Features label Sep 20, 2019

sw005320 requested a review from kan-bayashi September 20, 2019 07:04

sw005320 modified the milestones: v.0.5.3, v.0.6.0 Sep 20, 2019

kan-bayashi reviewed Sep 23, 2019

remove pin_memory and duplicate torch_snapshot for iter

5b57091

sw005320 reviewed Sep 23, 2019

sw005320 reviewed Sep 23, 2019

change to if else code bolck

0a4058b

sw005320 merged commit 2626d0c into espnet:v.0.6.0 Sep 25, 2019
