Minor updates to sequence training and adjusting priors. #1345
Conversation
@@ -307,24 +303,25 @@ while [ $x -lt $num_iters ]; do
  nnet3-am-copy --set-raw-nnet=- $dir/$x.mdl $dir/$[$x+1].mdl || exit 1;

  rm $nnets_list
  [ ! -f $dir/$[$x+1].mdl ] && echo "$0: Did not create $dir/$[$x+1].mdl" && exit 1;
  if [ -f $dir/$[$x-1].mdl ] && $cleanup && \
     [ $[($x-1)%$keep_model_iters] -ne 0 ] && \
I don't think this has very good defaults... $keep_model_iters defaults to 1, which means all models are kept initially.
And I'm concerned that the code is too complicated... users will assume that these keep_model_iters models are kept permanently, not deleted at the end.
I think it would be better to delete the [x-5]'th model (so we have 5 models around for debugging purposes), and have the cleanup stage at the end delete the last 5 models... and just hardcode this 5 in the code; I don't see that anyone would ever want to configure it. Right now it's configurable to a confusing extent.
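The hardcoded scheme suggested here can be sketched as follows. This is a minimal illustration, not the actual script: `$dir` and the `.mdl` names mimic the training directory layout, and the loop stands in for the training iterations.

```shell
#!/bin/sh
# Sketch of the "keep only the last 5 models" cleanup: inside the
# training loop, remove the model from 5 iterations back, so at most
# the 5 most recent models exist at any time.
dir=$(mktemp -d)
for x in $(seq 0 9); do
  touch $dir/$x.mdl                      # stand-in for producing model x
  old=$((x - 5))
  if [ $old -ge 0 ]; then
    rm -f $dir/$old.mdl                  # delete the [x-5]'th model
  fi
done
ls $dir                                  # only 5.mdl .. 9.mdl remain
```

A final cleanup stage would then delete these last 5 models as well once training is known to be good.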
... I'd be OK with having a 'keep_model_iters' flag, defaulting to, say, 100, that would make it save more models [but permanently, not temporarily].
I made the default 100. keep_model_iters now keeps models permanently, and the remove-models stage at the end uses the same shift. This is the same as the code used in train_tdnn.sh.
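The keep_model_iters behavior being discussed can be sketched like this (a simplified illustration, not the script itself): during cleanup, the model from iteration i survives permanently iff i is a multiple of keep_model_iters.

```shell
#!/bin/sh
# Sketch of the keep_model_iters cleanup rule with the default of 100:
# a model is kept permanently iff its iteration number is a multiple
# of keep_model_iters.
keep_model_iters=100
dir=$(mktemp -d)
for x in 98 99 100 101 200; do touch $dir/$x.mdl; done
for x in 98 99 100 101 200; do
  if [ $((x % keep_model_iters)) -ne 0 ]; then
    rm -f $dir/$x.mdl    # not a multiple: eligible for deletion
  fi
done
ls $dir                  # 100.mdl and 200.mdl survive
```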
if [ $stage -le $num_iters ]; then
  steps/nnet3/adjust_priors.sh --egs-type degs \
    --cmd "$cmd $prior_queue_opt" --use-gpu false \
    --num-jobs-compute-prior $num_archives_priors \
Are you sure this doesn't automatically happen on the last iteration anyway, i.e. inside the loop?
Perhaps by being careful with the loop indexes we could make sure it happens in the loop, to avoid code duplication.
Also, I think it would be helpful if the script waited at the very end for any straggling adjust-priors jobs;
otherwise the decoding might start before they are done. The adjust-priors jobs should also touch .error on error, and the script should detect this at the end.
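The wait-and-detect pattern described here can be sketched as follows. The two subshells are stand-ins for background adjust-priors jobs (one deliberately failing); `$dir/.error` follows the naming convention mentioned above, but the rest is illustrative.

```shell
#!/bin/sh
# Sketch of the "touch .error on failure, wait, then check" pattern:
# each background job touches $dir/.error if its command fails; the
# script waits for all of them before deciding whether to proceed.
dir=$(mktemp -d)
( false || touch $dir/.error ) &   # stand-in for a job that fails
( true  || touch $dir/.error ) &   # stand-in for a job that succeeds
wait                               # block until all background jobs finish
if [ -f $dir/.error ]; then
  echo "a background adjust-priors job failed"
fi
```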
I moved the code around and added a wait.