Refactoring nnet3 python scripts and adding raw nnet training #1066
Conversation
I would like to review this. I will do it later today unless there is a hurry. I would like to initiate some discussions:
performed a cursory check, will review in detail later today.
default=0.0)
parser.add_argument("--include-log-softmax", type=str, action=nnet3_train_lib.StrToBoolAction,
                    help="add the final softmax layer ", default=True, choices = ["false", "true"])
parser.add_argument("--add-lda", type=str, action=nnet3_train_lib.StrToBoolAction,
I remember adding this option when I wanted to test if the LDA layer was still necessary. Experimentation showed it was important when using iVectors, so I think we should add this every time unless we are using initial CNN layers. Even in that case we don't need an additional option, as we can remove it when we see that the CNN layers are being used.
This script does not assume a classification task or a notion of classes in the output layer. So this option is required here.
Also the LDA code works only with sparse matrices.
clipping_threshold = args.clipping_threshold,
ng_per_element_scale_options = args.ng_per_element_scale_options,
ng_affine_options = args.ng_affine_options,
label_delay = args.label_delay,
This script is very similar to the normal lstm/make_configs.py, and unlike the TDNN config generator it is not very complicated. So I would recommend just combining the two scripts.
This script writes config variables like objective_type, add_lda, include_final_sigmoid required by train_raw_rnn.py, which would crash the train_rnn.py script unless changes are made to it.
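For context, these variables end up in the configs/vars file that the train scripts later parse. A minimal sketch of what writing them might look like; the function name and exact format here are hypothetical, not the actual script:

```python
# Hypothetical sketch: write the extra raw-training variables into configs/vars.
# Variable names come from the comment above; the exact set and format are assumptions.
def write_raw_config_vars(config_dir, objective_type, add_lda, include_final_sigmoid):
    with open("{0}/vars".format(config_dir), "w") as f:
        f.write("objective_type={0}\n".format(objective_type))
        f.write("add_lda={0}\n".format("true" if add_lda else "false"))
        f.write("include_final_sigmoid={0}\n".format(
            "true" if include_final_sigmoid else "false"))
```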
Let's try to reduce the number of scripts, as maintenance will be a pain. You could add a new python module for all the convenience functions you want, to ensure that you don't severely lengthen the top-level script. So if you have multiple config file types, you can add different writers and readers for all these types in the new module and call the appropriate one.
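A minimal sketch of what such a module could look like; the module, function, and type names are hypothetical, purely to illustrate the writer-dispatch idea:

```python
# steps/nnet3/config_io.py -- hypothetical module name, sketching one writer per
# config-file type behind a single entry point, so top-level scripts stay short.

def write_acoustic_model_configs(config_dir, opts):
    pass  # write configs for standard acoustic-model training (sketch)

def write_raw_nnet_configs(config_dir, opts):
    pass  # write configs for raw (no transition model) training (sketch)

_WRITERS = {
    "acoustic": write_acoustic_model_configs,
    "raw": write_raw_nnet_configs,
}

def write_configs(config_type, config_dir, opts):
    # Top-level scripts only ever call this; the dispatch picks the right writer.
    try:
        writer = _WRITERS[config_type]
    except KeyError:
        raise ValueError("Unknown config type: {0}".format(config_type))
    writer(config_dir, opts)
```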
@@ -105,6 +111,49 @@ def GetSuccessfulModels(num_models, log_file_pattern, difference_threshold=1.0):

    return [accepted_models, max_index+1]

def GetAverageNnetModel(dir, iter, nnets_list, run_opts, use_raw_nnet = False, shrink = None):
    scale = 1.0
    if shrink is not None:
Could you test if shrink is actually helping in your case?
I haven't done any experiment on this. I used this since it is tested in acoustic model training.
@@ -351,7 +475,7 @@ def ComputePresoftmaxPriorScale(dir, alidir, num_jobs, run_opts,
    WriteKaldiMatrix(output_file, [scaled_counts])
    ForceSymlink("../presoftmax_prior_scale.vec", "{0}/configs/presoftmax_prior_scale.vec".format(dir))

-def PrepareInitialAcousticModel(dir, alidir, run_opts):
+def PrepareInitialAcousticModel(dir, alidir, run_opts, use_raw_nnet = False):
If you just want to initialize a raw nnet which is not an acoustic model, you should just create a wrapper around nnet3-init and not try to reuse this method.
I can add a function PrepareInitialNetwork.
Yes please do this.
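A minimal sketch of such a PrepareInitialNetwork wrapper, assuming the existing RunKaldiCommand helper and the usual configs layout; the config file name, srand value, and log path are assumptions:

```python
def PrepareInitialNetwork(dir, run_opts):
    # Initialize a raw nnet3 network (no transition model) directly from the
    # configs, instead of reusing PrepareInitialAcousticModel.
    # Sketch only: assumes RunKaldiCommand and a configs/init.config layout.
    RunKaldiCommand("""{command} {dir}/log/nnet_init.log \
    nnet3-init --srand=-2 {dir}/configs/init.config {dir}/0.raw""".format(
        command=run_opts.command, dir=dir))
```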
@@ -478,13 +603,16 @@ def GetLearningRate(iter, num_jobs, num_iters, num_archives_processed,

    return num_jobs * effective_learning_rate

-def DoShrinkage(iter, model_file, non_linearity, shrink_threshold):
+def DoShrinkage(iter, model_file, name, non_linearity, shrink_threshold, use_raw_nnet = False):
@freewym Could you please run a simple experiment where you disable shrinkage for RNN training? IIRC it did not lead to WER improvements when I last tested this. It was just ensuring good gradient means, and we have other methods like self-repair to ensure this right now.
Regarding Vijay's comments:
I think it would be easy to re-factor the scripts and move common lines into methods that can be imported by the train scripts and thus create very short train scripts.
Could you implement your proposal? BTW please remove support for realignment when making these changes. --Vijay
Please look at the commit d074e56 for refactoring DNN training. A similar one can be done for RNN.
@vimalmanohar I added some comments on your commit d074e56. Could you also please update the steps/nnet3/train_rnn.py script to use your new library wherever possible.
Force-pushed from 5416bbc to 0782aab.
@vijayaditya I have made the changes and tried to merge the RNN and DNN scripts. But it seems there are many differing options. So for now, I have kept the functions in two separate libraries. If you think there is a way to merge them, let me know.
Could you resolve the conflicts? I will start the review.
@vijayaditya Resolved conflicts with master.
Completed one round, will resume after the requested changes have been made.
@freewym It would be better to have another set of eyes review this, as this is a change which affects all the recipes. Do you have time to review this ?
@@ -0,0 +1,348 @@
#!/usr/bin/env python
Please add the comment
This is a module with methods which will be used by acoustic model training and raw model (i.e., generic neural network without transition model) training scripts.
BTW mention that these are normal frame level training scripts.
import logging
import imp

imp.load_source('nnet3_train_lib', 'steps/nnet3/nnet3_train_lib.py')
why are you importing the same module three different times?
It's not importing it three times. The first loads the module from source, then we import the module, and then we import functions from the module.
It looks very redundant. If you just want to import the functions then directly use from <module> import *; you would just have to modify sys.path in the script to add the module's location, as these are not on the path by default. Regarding the import commands, just pick one and stay consistent. BTW I would recommend using import <module>, as the namespace specification when calling functions makes it easy to debug and helps avoid issues when functions with similar signatures are defined in different modules (e.g. this happens a lot in the python modules for chain and xent training).
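For reference, a sketch of the two styles being discussed, assuming the script is run from the egs directory so the relative path below is valid:

```python
import sys
sys.path.insert(0, 'steps/nnet3')  # make the library importable without imp.load_source

# Recommended: keep the module namespace, so calls such as
# nnet3_train_lib.RunKaldiCommand(...) are easy to trace and cannot silently
# clash with similarly named functions from chain/xent modules.
import nnet3_train_lib

# Alternative: pull everything into the current namespace (shorter call sites,
# but name clashes between modules go unnoticed).
# from nnet3_train_lib import *
```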
# Set off jobs doing some diagnostics, in the background.
# Use the egs dir from the previous iteration for the diagnostics
logger.info("Training neural net (pass {0})".format(iter))
I usually add details-of-interest, e.g. learning rate per iteration, shrinkage value etc.
Is there an example script for this?
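A per-iteration log line carrying these details of interest might look like the sketch below; learning_rate and shrinkage_value are assumed variable names, not the actual ones in the script:

```python
logger.info("Training neural net (pass {0}), learning rate {1:.6f}, "
            "shrinkage value {2}".format(iter, learning_rate, shrinkage_value))
```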
cache_read_opt = "" | ||
if iter > 0 and (iter <= (num_hidden_layers-1) * add_layers_period) and (iter % add_layers_period == 0): | ||
|
||
do_average = False # if we've just mixed up, don't do averaging but take the |
we no longer have mix-up so please update this comment.
@@ -86,6 +95,14 @@ def GetArgs():
    parser.add_argument("--lstm-delay", type=str, default=None,
                        help="option to have different delays in recurrence for each lstm")
add a comment describing this group of options.
egs_dir = egs_dir), wait = wait)

def ComputeProgress(dir, iter, egs_dir, run_opts, mb_size=256, wait=False, use_raw_nnet = False):
use_raw_nnet does not seem to be an appropriate name for the variable, as you are using raw networks in both cases. Better to use the name get_raw_nnet_from_am=True.
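A sketch of how the renamed flag might read at the call site; the nnet3-am-copy usage illustrates the distinction being made, but is not necessarily the exact code in the PR:

```python
def ComputeProgress(dir, iter, egs_dir, run_opts, mb_size=256, wait=False,
                    get_raw_nnet_from_am=True):
    # A raw network is used in both branches; the flag only says whether it
    # first has to be extracted from an acoustic model (.mdl) or is already
    # stored as a raw network (.raw).
    if get_raw_nnet_from_am:
        model = "nnet3-am-copy --raw=true {0}/{1}.mdl - |".format(dir, iter)
    else:
        model = "{0}/{1}.raw".format(dir, iter)
    # ... the rest of the diagnostics job then uses 'model' ...
```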
@@ -48,6 +48,7 @@ def GetArgs():
    parser.add_argument("--comparison-dir", type=str, action='append', help="other experiment directories for comparison. These will only be used for plots, not tables")
    parser.add_argument("--start-iter", type=int, help="Iteration from which plotting will start", default = 1)
    parser.add_argument("--is-chain", type=str, default = False, action = train_lib.StrToBoolAction, help="Iteration from which plotting will start")
    parser.add_argument("--is-linear-objf", type=str, default = True, action = train_lib.StrToBoolAction, help="Nnet trained with linear objective as against with quadratic objective")
provide the choices argument for bool type variable. See other boolean inputs for example.
Actually I think you can make this a multiple-choice variable with options {linear,quadratic}. This would help future-proof the code, as we can add more cost functions.
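A sketch of the two forms being suggested; the boolean form follows the existing StrToBoolAction convention from the surrounding diff, while the multiple-choice form is the future-proof variant:

```python
# Boolean flag with an explicit choices list, like the other boolean inputs:
parser.add_argument("--is-linear-objf", type=str, default=True,
                    action=train_lib.StrToBoolAction, choices=["true", "false"],
                    help="Nnet trained with linear objective as against quadratic objective")

# Multiple-choice alternative, easier to extend with more cost functions later:
parser.add_argument("--objective-type", type=str, default="linear",
                    choices=["linear", "quadratic"],
                    help="Objective used to train the nnet")
```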
@@ -116,6 +112,13 @@ def GetArgs():
    parser.add_argument("--use-presoftmax-prior-scale", type=str, action=nnet3_train_lib.StrToBoolAction,
                        help="if true, a presoftmax-prior-scale is added",
                        choices=['true', 'false'], default = True)
add a comment describing this group of options and where you plan to use them.
if args.use_dense_targets:
    target_type = "dense"
    compute_accuracy = False
add a comment here describing why compute_accuracy is false for dense type. This is not readily obvious as even dense posterior targets can exist.
I will test this out and see if the code supports this.
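For illustration, the kind of in-code comment being requested might read as below; the stated reason is an assumption and should be confirmed by the test mentioned above:

```python
if args.use_dense_targets:
    target_type = "dense"
    # Do not compute accuracy for dense targets: the accuracy diagnostic
    # assumes one discrete class label per frame, while dense targets are
    # arbitrary real-valued vectors (even when they happen to be posteriors).
    # [Assumed rationale -- to be verified.]
    compute_accuracy = False
```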
# we add compulsary arguments as named arguments for readability
parser = argparse.ArgumentParser(description="""
Trains an RNN acoustic model using the cross-entropy objective.
RNNs include LSTMs, BLSTMs and GRUs.
Update the usage message.
Perhaps some of you could run your work with this for a while, to see if ...
OK. Will do.
Force-pushed from 91d7334 to 5b17a4c.
@vijayaditya Made all the changes.
@vijayaditya, where are we on this pull request?
@@ -86,6 +99,16 @@ def GetArgs():
    parser.add_argument("--lstm-delay", type=str, default=None,
                        help="option to have different delays in recurrence for each lstm")

    # Options to convert input MFCC into Fbank features. This is useful when a
    # LDA layer is not added (such as when using dense targets)
    parser.add_argument("--cepstral-lifter", type=float, dest = "cepstral_lifter",
Is there a reason why you need to use filterbanks instead of MFCCs? E.g. you have a convolutional architecture?
The notion that many people have of Fbank being better than MFCCs is mostly a misunderstanding, since filterbanks are generally extracted with a higher dimension and that's what matters. If this turns out to be removable, it would be a good simplification.
I thought it might be useful in cases where LDA is not used.
"e.g. 22.0", default=22.0) | ||
|
||
parser.add_argument("--add-idct", type=str, action=nnet3_train_lib.StrToBoolAction, | ||
help="Add an IDCT after input to convert MFCC to Fbank", default = False) |
... unless this fbank stuff is being used, I think it's better to remove it for now.
What testing has been done on regular RNN and DNN training, since this affects those scripts?
It needs to be made much clearer, in comments in the source, which of these files are deprecated and which are supposed to be the way forward. Does libs/train_lib.py deprecate nnet3_train_lib.py? If so, this should be clarified in both source files. Are there python scripts that still use the old library and need to be upgraded to the new one? If so, that should be mentioned in comments in the source too.
I'd rather not add any feature to these scripts unless there is a concrete ...
No testing has been done on regular RNN and DNN training. But I can do that.
The organization seems OK, but we need to have some clarity on what the ...
Initially, we wanted to combine the DNN and RNN training libraries, but there are a few options that are different, like min_deriv_time, num_chunk_per_minibatch and shrinkage. So that wasn't done. @vijayaditya might have a better opinion on this.
I believe this organization is more scalable in the long term. If @pegahgh could use these training libraries to simplify the multilingual training script, that would be a good way to check the usefulness of these changes.
Can you please write something here explaining what the various libraries do? Think about it carefully -- not time sensitive.
I already use the new libraries for the multilingual setup. I committed new ...
Pegah, if you could explain how they differ from the existing libraries, it would help.
It's your call if you want to refactor all these other scripts. Until now we re-implemented any necessary functions in these other scripts, as there was no common library and we did not want to create a dependency on nnet3 in these other scripts. As your proposal requires maintaining a parallel sub-directory structure in the steps directory, corresponding to the python libs package, corresponding to the sub-directory structure of shell and perl scripts in steps (there are a lot more perl/shell scripts in the parent directory than in nnet3), I would wait for @danpovey's comments. Also remember that we have had some python scripts even in the utils/data directory which required some functions like RunKaldiCommand or GetFeatDim. So you should also decide if you want utils/data/ to depend on steps/libs or if you want to store the common python functions in a different location.
Sorry, I meant: "As your proposal requires maintaining a parallel sub-directory structure in the steps directory, corresponding to the python libs package, _in addition_ to the sub-directory structure of shell and perl scripts".
Let's keep it localized to nnet3 for now. For the most part I'd prefer ...
I have created the first python file conforming to PEP8 standards: egs/wsj/s5/steps/libs/nnet3/train/common.py.
Please let me know if there are any issues that need to be fixed. @danpovey @vijayaditya
Looks good to me.
This branch has conflicts.
@vimalmanohar Could you let me know the expected merge date for this PR?
I have to re-run all the scripts after reorganizing the libraries and adding max-deriv-time. It will take at least a week.
If the changes are not too extensive, it might be easier to make them to the ...
OK, I will just hack the configs dir to make it back-compatible and make the ... --Vijay
Force-pushed from e7f7075 to b69c161.
-logger = logging.getLogger(__name__)
+logger = logging.getLogger('libs')
why does just this call to getLogger use the name 'libs'?
BTW, please fix the issues that Gaofeng found (see email).
Force-pushed from 009300e to 6102c60.
All the top-level scripts must use this; I will fix it if not. It will automatically add 'subloggers' for all modules in libs, like libs.common, libs.nnet3, etc. These subloggers will inherit the properties of the top-level logger, such as the verbosity level.
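A minimal sketch of the pattern being described, with the hierarchy spelled out explicitly; the handler and format details are illustrative:

```python
import logging

# In the top-level script (e.g. steps/nnet3/train_dnn.py): configure the
# parent logger for the whole 'libs' package once.
logger = logging.getLogger('libs')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(name)s] %(levelname)s: %(message)s"))
logger.addHandler(handler)

# Inside a library module, logging.getLogger(__name__) resolves to a child
# name such as 'libs.nnet3.train.common', so it inherits the level and
# handlers configured on 'libs' above.
module_logger = logging.getLogger('libs.nnet3.train.common')
module_logger.info("this message is emitted via the parent 'libs' handler")
```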
@@ -478,7 +481,7 @@ def Main():
    GeneratePlots(args.exp_dir, args.output_dir,
                  comparison_dir = args.comparison_dir,
                  start_iter = args.start_iter,
-                 is_chain = args.is_chain)
+                 objective_type = args.objective_type)

if __name__ == "__main__":
    Main()
Vimal, I think the module loading in this file should also be changed; train_lib = imp.load_source('ntl', 'steps/nnet3/nnet3_train_lib.py') is deleted in your PR.
                  remove_egs=True,
                  get_raw_nnet_from_am=True):
    try:
        if remove_egs:
This part of the code crashed for me with the error:
TypeError: 'bool' object is not callable
Looks like there is a name clash after changing the function naming style. I'd suggest renaming the function to e.g. remove_nnet_egs.
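For illustration, the clash comes from the boolean keyword argument shadowing a module-level function of the same name once both use snake_case; renaming the helper as suggested avoids it. The names below are sketched, not the exact code:

```python
def remove_nnet_egs(egs_dir):
    # Module-level helper that deletes the training examples (sketch);
    # renamed from remove_egs() so it no longer collides with the keyword
    # argument below.
    pass

def clean_nnet_dir(nnet_dir, egs_dir, remove_egs=True, get_raw_nnet_from_am=True):
    try:
        if remove_egs:
            # With the old name, the boolean parameter shadowed the helper
            # here, and calling remove_egs(egs_dir) raised
            # "TypeError: 'bool' object is not callable".
            remove_nnet_egs(egs_dir)
    except OSError:
        pass  # cleanup failures are not fatal (sketch)
```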
"backpropagated up to t=-5 and t=154 in the forward and backward LSTM sequence respectively; " | ||
"otherwise, the derivative will be backpropagated to the end of the sequence.") | ||
parser.add_argument("--trainer.num-chunk-per-minibatch", | ||
"--trainer.rnn.num-chunk-per-minibatch", |
Vimal, this line is a bit confusing. Is it intentional, or is it some kind of bug?
It was intentional, but I should remove the option "--trainer.rnn.num-chunk-per-minibatch" and keep only "--trainer.num-chunk-per-minibatch", since it is not an RNN-specific option. It should also be removed from the "# RNN specific trainer options" section.
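A sketch of the cleaned-up option as described, with the dest and default taken from the surrounding diff:

```python
# Keep a single option name; the "--trainer.rnn.num-chunk-per-minibatch" alias
# is dropped because the option is not RNN-specific, and the argument moves
# out of the "RNN specific trainer options" section.
parser.add_argument("--trainer.num-chunk-per-minibatch", type=int,
                    dest='num_chunk_per_minibatch', default=512,
                    help="Number of sequences to be processed in parallel "
                         "every minibatch")
```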
Is this merged? Does this need to be closed?
Yes you can close it, I merged all changes from this PR.
All changes merged in #1229.
Fixes a bug that would have affected nnet3 (non-chain) TDNN training since PR #1066 was merged 2 weeks ago. Would have slowed it down, and affected results in an unpredictable way.