
Refactoring nnet3 python scripts and adding raw nnet training #1066

Closed

Conversation

vimalmanohar
Contributor

No description provided.

@vijayaditya
Contributor

I would like to review this. I will do it later today unless there is a hurry.

I would like to initiate some discussions:
E.g., given the recent changes in RNN training (including self-repair and gradient clipping), do we still need to maintain two training scripts for frame-level training (in both AM training and raw model training), one of which has additional stages for model shrinkage?

Contributor

@vijayaditya vijayaditya left a comment

Performed a cursory check; will review in detail later today.

default=0.0)
parser.add_argument("--include-log-softmax", type=str, action=nnet3_train_lib.StrToBoolAction,
help="add the final softmax layer ", default=True, choices = ["false", "true"])
parser.add_argument("--add-lda", type=str, action=nnet3_train_lib.StrToBoolAction,
Contributor

I remember adding this option when I wanted to test whether the LDA layer was still necessary. Experimentation showed it was important when using iVectors, so I think we should be adding it every time unless we are using initial CNN layers. Even in that case we don't need an additional option, as we can remove it when we see that CNN layers are being used.

Contributor Author

@vimalmanohar vimalmanohar Sep 27, 2016

This script does not assume a classification task or a notion of classes in the output layer. So this option is required here.
Also the LDA code works only with sparse matrices.

clipping_threshold = args.clipping_threshold,
ng_per_element_scale_options = args.ng_per_element_scale_options,
ng_affine_options = args.ng_affine_options,
label_delay = args.label_delay,
Contributor

This script is very similar to the normal lstm/make_configs.py, and unlike the TDNN config generator it is not very complicated. So I would recommend just combining the two scripts.

Contributor Author

@vimalmanohar vimalmanohar Sep 27, 2016

This script writes config variables like objective_type, add_lda and include_final_sigmoid, which are required by train_raw_rnn.py but would crash the train_rnn.py script unless changes are made to it.

Contributor

Let's try to reduce the number of scripts, as maintenance will be a pain. You could add a new python module for all the convenience functions you want, to ensure that you don't severely lengthen the top-level script. So if you have multiple config file types, you can add different writers and readers for all these types in the new module and call the appropriate one.
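As a rough illustration of that suggestion, such a module might contain paired writer/reader helpers along these lines (the module name, function names and variable keys here are hypothetical, not from this PR):

# Hypothetical module, e.g. steps/nnet3/libs/config_io.py
def WriteGenericConfigVars(var_file, variables):
    # variables is a dict such as {'objective_type': 'linear', 'add_lda': 'true'}
    with open(var_file, 'w') as f:
        for key, value in variables.items():
            f.write("{0}={1}\n".format(key, value))

def ReadGenericConfigVars(var_file):
    variables = {}
    with open(var_file) as f:
        for line in f:
            if line.strip():
                key, value = line.strip().split('=', 1)
                variables[key] = value
    return variables

The top-level script would then call whichever reader matches the config type it was given, keeping the script itself short.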

@@ -105,6 +111,49 @@ def GetSuccessfulModels(num_models, log_file_pattern, difference_threshold=1.0):

return [accepted_models, max_index+1]

def GetAverageNnetModel(dir, iter, nnets_list, run_opts, use_raw_nnet = False, shrink = None):
scale = 1.0
if shrink is not None:
Contributor

Could you test whether shrink is actually helping in your case?

Contributor Author

I haven't done any experiment on this. I used this since it is tested in acoustic model training.

@@ -351,7 +475,7 @@ def ComputePresoftmaxPriorScale(dir, alidir, num_jobs, run_opts,
WriteKaldiMatrix(output_file, [scaled_counts])
ForceSymlink("../presoftmax_prior_scale.vec", "{0}/configs/presoftmax_prior_scale.vec".format(dir))

def PrepareInitialAcousticModel(dir, alidir, run_opts):
def PrepareInitialAcousticModel(dir, alidir, run_opts, use_raw_nnet = False):
Contributor

If you just want to initialize a raw nnet which is not an acoustic model, you should just create a wrapper for nnet3-init and not try to reuse this method.

Contributor Author

I can add a function PrepareInitialNetwork.

Contributor

Yes please do this.
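A minimal sketch of what such a PrepareInitialNetwork wrapper might look like, assuming it lives alongside RunKaldiCommand in nnet3_train_lib.py and reusing the nnet3-init invocation quoted later in this thread; the srand value, log file and config/model names are assumptions, not the committed code:

def PrepareInitialNetwork(dir, run_opts, srand=-2):
    # Initialize a raw nnet3 network directly from the config,
    # without attaching a transition model.
    RunKaldiCommand("""
{command} {dir}/log/nnet_init.log \
  nnet3-init --srand={srand} {dir}/configs/init.config {dir}/init.raw""".format(
        command=run_opts.command, dir=dir, srand=srand))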

@@ -478,13 +603,16 @@ def GetLearningRate(iter, num_jobs, num_iters, num_archives_processed,

return num_jobs * effective_learning_rate

def DoShrinkage(iter, model_file, non_linearity, shrink_threshold):
def DoShrinkage(iter, model_file, name, non_linearity, shrink_threshold, use_raw_nnet = False):
Contributor

@freewym Could you please run a simple experiment where you disable shrinkage for RNN training. IIRC it did not lead to WER improvements when I last tested this. It was just ensuring good gradient means, and we have other methods like self-repair to ensure this right now.

@vimalmanohar
Contributor Author

Regarding Vijay's comments:

I would like to initiate some discussions:
E.g., given the recent changes in RNN training (including self-repair and gradient clipping), do we still need to maintain two training scripts for frame-level training (in both AM training and raw model training), one of which has additional stages for model shrinkage?

I think it would be easy to refactor the scripts and move common lines into methods that can be imported by the train scripts, and thus create very short train scripts.
E.g., the main difference between raw model training and AM training is the realignment stage and the setting-up stage from the ali-dir etc. But the entire methods TrainOneIteration and TrainNewModels can be moved into a separate python library like steps/nnet3/libs/train_lib.py.

@vijayaditya
Contributor

Could you implement your proposal?

BTW please remove support for realignment, when making these changes. I don't think it is worth the increase in training time. We could always align with the DNN after training is complete, reinitialize the model and retrain, when we are desperate. Even this method was not giving us any appreciable results.

--Vijay


@vimalmanohar
Contributor Author

Please look at the commit d074e56 for refactoring DNN training. A similar one can be done for RNN.

@vijayaditya
Contributor

@vimalmanohar added some comments on your commit d074e56 . Could you also please update the steps/nnet3/train_rnn.py script to use your new library, wherever possible.

@vimalmanohar vimalmanohar force-pushed the raw_python_script branch 2 times, most recently from 5416bbc to 0782aab on October 2, 2016 00:20
Contributor Author

@vimalmanohar vimalmanohar left a comment

@vijayaditya I have made the changes and tried to merge the RNN and DNN scripts. But it seems there are many differing options. So for now, I have kept the functions in two separate libraries. If you think there is a way to merge them, let me know.

@vijayaditya
Contributor

Could you resolve the conflicts? I will start the review.

@vimalmanohar
Contributor Author

@vijayaditya Resolved conflicts with master.

Contributor

@vijayaditya vijayaditya left a comment

Completed one round, will resume after the requested changes have been made.

@freewym It would be better to have another set of eyes review this, as this is a change which affects all the recipes. Do you have time to review this?

@@ -0,0 +1,348 @@
#!/usr/bin/env python


Contributor

Please add the comment

This is a module with methods which will be used by acoustic model training and raw model (i.e., generic neural network without transition model) training scripts.

Contributor

BTW mention that these are normal frame level training scripts.
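Put together, the requested header comment might read roughly as follows (wording is a suggestion based on the two comments above, not necessarily the text that was committed):

# This is a module with methods which will be used by acoustic model training
# and raw model (i.e. a generic neural network without a transition model)
# training scripts. These are normal frame-level training scripts.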

import logging
import imp

imp.load_source('nnet3_train_lib', 'steps/nnet3/nnet3_train_lib.py')
Contributor

Why are you importing the same module three different times?

Contributor Author

It's not importing it three times. The first line loads the module from source, then we import the module, and then we import functions from the module.
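Reconstructed from that description (the exact lines are in the diff; this is just an illustration), the pattern under discussion is roughly:

import imp

imp.load_source('nnet3_train_lib', 'steps/nnet3/nnet3_train_lib.py')  # load the module from its source file
import nnet3_train_lib                                                # import the module (now registered in sys.modules)
from nnet3_train_lib import *                                         # pull its functions into this namespace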

Contributor

It looks very redundant. If you just want to import the functions then directly use from <module> import *; you would just have to modify sys.path in the script to add the module's directory, as these are not in sys.path by default. Regarding the import commands, just pick one and stay consistent. BTW I would recommend using import <module>, as the namespace qualification when calling a function makes it easy to debug and helps avoid issues when functions with similar signatures are defined in different modules (e.g. this happens a lot in python modules for chain and xent training).
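A minimal sketch of the recommended style; the sys.path manipulation and alias are assumptions for illustration, since these scripts are normally invoked from the egs/*/s5 directory:

import sys
sys.path.insert(0, 'steps/nnet3')      # make nnet3_train_lib importable

import nnet3_train_lib as train_lib    # keep the module namespace

# Calls are then qualified, which makes their origin obvious when debugging:
# train_lib.RunKaldiCommand(...)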


# Set off jobs doing some diagnostics, in the background.
# Use the egs dir from the previous iteration for the diagnostics
logger.info("Training neural net (pass {0})".format(iter))
Contributor

I usually add details-of-interest, e.g. learning rate per iteration, shrinkage value etc.

Contributor Author

Is there an example script for this?

cache_read_opt = ""
if iter > 0 and (iter <= (num_hidden_layers-1) * add_layers_period) and (iter % add_layers_period == 0):

do_average = False # if we've just mixed up, don't do averaging but take the
Contributor

We no longer have mix-up, so please update this comment.

@@ -86,6 +95,14 @@ def GetArgs():
parser.add_argument("--lstm-delay", type=str, default=None,
help="option to have different delays in recurrence for each lstm")

Contributor

add a comment describing this group of options.

egs_dir = egs_dir), wait = wait)

def ComputeProgress(dir, iter, egs_dir, run_opts, mb_size=256, wait=False, use_raw_nnet = False):
Contributor

use_raw_nnet does not seem to be an appropriate name for the variable, as you are using raw networks in both cases. Better to use the name get_raw_nnet_from_am=True.
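With that rename, the intent is roughly the following (a sketch; the model paths and the nnet3-am-copy invocation are assumptions, not the committed code):

def ComputeProgress(dir, iter, egs_dir, run_opts, mb_size=256, wait=False,
                    get_raw_nnet_from_am=True):
    if get_raw_nnet_from_am:
        # Strip the transition model off the acoustic model to get the raw nnet.
        model = "nnet3-am-copy --raw=true {0}/{1}.mdl - |".format(dir, iter)
    else:
        # The model on disk is already a raw nnet.
        model = "{0}/{1}.raw".format(dir, iter)
    # ... rest of the function unchanged.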

@@ -48,6 +48,7 @@ def GetArgs():
parser.add_argument("--comparison-dir", type=str, action='append', help="other experiment directories for comparison. These will only be used for plots, not tables")
parser.add_argument("--start-iter", type=int, help="Iteration from which plotting will start", default = 1)
parser.add_argument("--is-chain", type=str, default = False, action = train_lib.StrToBoolAction, help="Iteration from which plotting will start")
parser.add_argument("--is-linear-objf", type=str, default = True, action = train_lib.StrToBoolAction, help="Nnet trained with linear objective as against with quadratic objective")
Contributor

Provide the choices argument for the bool-type variable. See other boolean inputs for an example.

Contributor

Actually I think you can make this a multiple-choice variable with options {linear,quadratic}. This would help future-proof the code, as we can add more cost functions.
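For illustration, the multiple-choice form might look like this (the option name matches the objective_type variable that appears later in this PR, but the help text is a suggestion):

parser.add_argument("--objective-type", type=str, default="linear",
                    choices=["linear", "quadratic"],
                    help="Objective the nnet was trained with; determines "
                         "which diagnostics are plotted")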

@@ -116,6 +112,13 @@ def GetArgs():
parser.add_argument("--use-presoftmax-prior-scale", type=str, action=nnet3_train_lib.StrToBoolAction,
help="if true, a presoftmax-prior-scale is added",
choices=['true', 'false'], default = True)

Contributor

add a comment describing this group of options and where you plan to use them.


if args.use_dense_targets:
target_type = "dense"
compute_accuracy = False
Contributor

Add a comment here describing why compute_accuracy is false for the dense type. This is not readily obvious, as even dense posterior targets can exist.

Contributor Author

I will test this out and see if the code supports this.
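For example, the snippet could carry a comment along these lines (the wording is a suggestion; whether dense posterior targets are supported is exactly what is being checked above):

if args.use_dense_targets:
    target_type = "dense"
    # Accuracy is not computed for dense targets: they are treated as
    # regression-style targets rather than one-hot class labels per frame.
    compute_accuracy = False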

# we add compulsary arguments as named arguments for readability
parser = argparse.ArgumentParser(description="""
Trains an RNN acoustic model using the cross-entropy objective.
RNNs include LSTMs, BLSTMs and GRUs.
Contributor

Update the usage message.

@danpovey
Contributor

danpovey commented Oct 6, 2016

Perhaps some of you could run your work with this for a while, to see if any bugs show up.

On Wed, Oct 5, 2016 at 8:53 PM, Vijayaditya Peddinti <notifications@github.com> wrote:

@vijayaditya requested changes on this pull request.

In egs/wsj/s5/steps/nnet3/lstm/make_configs.py #1066 (review):

config_files[config_dir + '/init.config'] = init_config_lines
prev_layer_output = nodes.AddLdaLayer(config_lines, "L0", prev_layer_output, config_dir + '/lda.mat')
if add_lda:

Write a comment giving an example of a scenario where this variable will be false.

In egs/wsj/s5/steps/nnet3/nnet3_train_lib.py #1066 (review):

@@ -205,6 +260,28 @@ def ParseModelConfigVarsFile(var_file):
    raise Exception('Error while parsing the file {0}'.format(var_file))
+def ParseModelConfigGenericVarsFile(var_file):

Could you rename this to ParseGenericModelVarsFile. The current name is a bit difficult to parse.

In egs/wsj/s5/steps/nnet3/nnet3_train_lib.py #1066 (review):

@@ -242,6 +319,53 @@ def GenerateEgs(data, alidir, egs_dir,
    egs_dir = egs_dir,
    egs_opts = egs_opts if egs_opts is not None else '' ))
+def GenerateEgsFromTargets(data, targets_scp, egs_dir,

Maybe call it GenerateEgsUsingTargets. Also add a comment saying that this differs from the other method, which uses alignments.

In egs/wsj/s5/steps/nnet3/nnet3_train_lib.py #1066 (review):

    "ark,bg:nnet3-merge-egs --minibatch-size={mb_size} ark:{egs_dir}/valid_diagnostic.egs ark:- |"
    """.format(command = run_opts.command,
               dir = dir,
               iter = iter,
               mb_size = mb_size,
               model = model,
               compute_prob_opts = compute_prob_opts,

The code can be made more compact:

compute_prob_opts = '--compute-accuracy' if compute_accuracy else ''


@freewym
Contributor

freewym commented Oct 6, 2016

OK. Will do.

@vimalmanohar
Contributor Author

@vijayaditya Made all the changes.

@danpovey
Contributor

@vijayaditya, where are we on this pull request?

@@ -86,6 +99,16 @@ def GetArgs():
parser.add_argument("--lstm-delay", type=str, default=None,
help="option to have different delays in recurrence for each lstm")

# Options to convert input MFCC into Fbank features. This is useful when a
# LDA layer is not added (such as when using dense targets)
parser.add_argument("--cepstral-lifter", type=float, dest = "cepstral_lifter",
Contributor

Is there a reason why you need to use filterbanks instead of MFCCs? E.g. do you have a convolutional architecture?

The notion that many people have of Fbank being better than MFCCs is mostly a misunderstanding, since filterbanks are generally extracted with a higher dimension and that's what matters. If this turns out to be removable, it would be a good simplification.

Contributor Author

I thought it might be useful in cases where LDA is not used.

"e.g. 22.0", default=22.0)

parser.add_argument("--add-idct", type=str, action=nnet3_train_lib.StrToBoolAction,
help="Add an IDCT after input to convert MFCC to Fbank", default = False)
Contributor

... unless this fbank stuff is being used, I think it's better to remove it for now.

@danpovey
Contributor

What testing has been done on regular RNN and DNN training, since this affects those scripts?

@danpovey
Contributor

It needs to be made much more clear, in comments in the source, which of these files are deprecated and which are supposed to be the way forward. Does libs/train_lib.py deprecate nnet3_train_lib.py? This should be clarified in both source files, if so. Are there now python scripts that still use the old library that need to be upgraded to the new library? If so, that should be mentioned in comments in the source too.

@danpovey
Contributor

I'd rather not add any feature to these scripts unless there is a concrete scenario where it's known to be useful... otherwise we add work to maintain these features and they won't get tested properly.


@vimalmanohar
Contributor Author

No testing has been done on regular RNN and DNN training, but I can do that.
In the current commit, libs/train_lib.py does not deprecate nnet3_train_lib.py. libs/train_lib.py contains only the top-level functions related to DNN training, while libs/rnn_train_lib.py contains the functions related to RNN training.
nnet3_train_lib.py remains almost the same as before. I am not sure if we should move all the functions in nnet3_train_lib.py to libs/train_lib.py.

@danpovey
Contributor

The organization seems OK, but we need to have some clarity on what the plan is for the long term. Have you and Vijay agreed on something with regard to that?


@vimalmanohar
Contributor Author

Initially, we wanted to combine the DNN and RNN training libraries, but there are a few options that are different, like min_deriv_time, num_chunk_per_minibatch and shrinkage. So that wasn't done. @vijayaditya might have a better opinion on this.

@vimalmanohar
Contributor Author

I believe this organization is more scalable in the long term. If @pegahgh could use these training libraries to simplify the multilingual training script, that would be a good way to check the usefulness of these changes.

@danpovey
Contributor

Can you please write something here explaining what the various libraries are (the ones you propose to add and the existing one), what their purposes are and how they relate to each other, and which top-level scripts use which libraries? Also which scripts should use which libraries (in the future, in your ideal scenario), and which scripts are deprecated and should not be used at all?

Think about it carefully -- not time sensitive.


@pegahgh
Contributor

pegahgh commented Oct 18, 2016

I already use the new libraries for the multilingual setup. I committed new scripts for the multilingual setup yesterday, which use the recent raw python scripts.


@danpovey
Contributor

Pegah, if you could explain how they differ from the existing libraries it would be good; you might provide some complementary info to what Vimal is preparing.


@vijayaditya
Contributor

It's your call if you want to refactor all these other scripts. Till now we have re-implemented any necessary functions in these other scripts, as there was no common library and we did not want to create a dependency on nnet3 in these other scripts.

As your proposal requires maintaining a parallel sub-directory structure in the steps directory, corresponding to the python libs package, corresponding to the sub-directory structure of shell and perl scripts in steps (there are a lot more perl/shell scripts in the parent directory than in nnet3), I would wait for @danpovey's comments.

Also remember that we have had some python scripts even in the utils/data directory which required some functions like RunKaldiCommand or GetFeatDim. So you should also decide whether you want utils/data/ to depend on steps/libs, or whether you want to store the common python functions in a different location.

@vijayaditya
Contributor

Sorry I meant

As your proposal requires maintaining a parallel sub-directory structure in the steps directory, corresponding to the python libs package, _in addition_ to the sub-directory structure of shell and perl scripts

@danpovey
Contributor

Let's keep it localized to nnet3 for now. For the most part I'd prefer python scripts in steps/ to be self-contained, and I don't anticipate a general move of converting bash scripts to python scripts.


@vimalmanohar
Contributor Author

Reorganizing libraries into packages, and incorporating changes from #1194 and dealing with issues #847 and #520.

Contributor Author

@vimalmanohar vimalmanohar left a comment

I have created the first python file conforming to PEP8 standards: egs/wsj/s5/steps/libs/nnet3/train/common.py.
Please let me know if there are any issues that need to be fixed. @danpovey @vijayaditya

@danpovey
Contributor

Looks good to me.


@vijayaditya vijayaditya changed the title from "raw_python_script: Adding raw nnet training" to "Refactoring nnet3 python scripts and adding raw nnet training" on Nov 15, 2016
@danpovey
Contributor

This branch has conflicts.

@vijayaditya
Contributor

@vimalmanohar Could you let me know what the expected merge date for this PR is?
Xconfig testing requires some changes to the training scripts, and I would like to know where these have to be made.

@vimalmanohar
Contributor Author

vimalmanohar commented Nov 18, 2016

I have to re-run all the scripts again after reorganizing libraries and adding max-deriv-time. It will take at least a week.
It is better to add your xconfig testing and all nnet3 work over this branch.

@danpovey
Contributor

If the changes are not too extensive it might be easier to make them to the old scripts and have Vimal port them to the new scripts... that would mean you don't have to redo all your testing. I'm concerned that by merging these two substantial changes into one pull request, we'd never be done with it. But up to you, Vijay.


@vijayaditya
Contributor

OK I will just hack the configs dir to make it back-compatible and make the actual changes once Vimal checks in the recipes. This will help us avoid two rounds of training script testing.

Vijay


@vimalmanohar vimalmanohar force-pushed the raw_python_script branch 2 times, most recently from e7f7075 to b69c161 on November 23, 2016 19:18

logger = logging.getLogger(__name__)
logger = logging.getLogger('libs')
Contributor

Why does just this call to getLogger use the name 'libs'?
BTW, please fix the issues that Gaofeng found (see email).
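For context, a sketch contrasting the two conventions (the module names are hypothetical, just to show how the logger hierarchy behaves):

import logging

# Usual convention: each module logs under its own dotted name, e.g.
# "libs.nnet3.train.common", and inherits handlers/levels set on the
# ancestor logger "libs".
module_logger = logging.getLogger(__name__)

# Using the literal name 'libs' instead attaches this module's messages
# directly to the package-level logger rather than a per-module child.
package_logger = logging.getLogger('libs')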

@vimalmanohar
Contributor Author

vimalmanohar commented Nov 26, 2016 via email

@@ -478,7 +481,7 @@ def Main():
GeneratePlots(args.exp_dir, args.output_dir,
comparison_dir = args.comparison_dir,
start_iter = args.start_iter,
is_chain = args.is_chain)
objective_type = args.objective_type)

if __name__ == "__main__":
Main()
Contributor

Vimal, I think the module loading in this file should also be changed; train_lib = imp.load_source('ntl', 'steps/nnet3/nnet3_train_lib.py') is deleted in your PR.

@danpovey danpovey mentioned this pull request Nov 27, 2016
remove_egs=True,
get_raw_nnet_from_am=True):
try:
if remove_egs:
Contributor

This part of the code crashed for me with the error:
TypeError: 'bool' object is not callable
Looks like there is a name clash after changing the function naming style. I'd suggest renaming the function to e.g. remove_nnet_egs.
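The clash being described is presumably of this shape (a reconstructed illustration; the function and argument names other than remove_egs are placeholders, not the actual code):

import shutil

# Module-level helper, now snake_case after the naming-style change.
def remove_egs(egs_dir):
    shutil.rmtree(egs_dir)

# A function taking a boolean keyword argument with the same name shadows
# the helper inside its own body.
def clean_nnet_dir(nnet_dir, egs_dir, remove_egs=True):
    if remove_egs:
        # 'remove_egs' here is the bool argument, not the helper above, so
        # this call raises: TypeError: 'bool' object is not callable
        remove_egs(egs_dir)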

"backpropagated up to t=-5 and t=154 in the forward and backward LSTM sequence respectively; "
"otherwise, the derivative will be backpropagated to the end of the sequence.")
parser.add_argument("--trainer.num-chunk-per-minibatch",
"--trainer.rnn.num-chunk-per-minibatch",
Contributor

Vimal, this line is a bit confusing. Is it intentional, or is it some kind of bug?
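If it is intentional, passing two option strings to one add_argument call just registers the second flag as an alias of the first; a minimal illustration (shortened from the actual option, with an explicit dest):

import argparse

parser = argparse.ArgumentParser()
# Both flags write to the same destination; the second acts as an alias.
parser.add_argument("--trainer.num-chunk-per-minibatch",
                    "--trainer.rnn.num-chunk-per-minibatch",
                    type=int, dest="num_chunk_per_minibatch", default=100)

args = parser.parse_args(["--trainer.rnn.num-chunk-per-minibatch", "64"])
print(args.num_chunk_per_minibatch)  # prints 64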

@vimalmanohar
Contributor Author

vimalmanohar commented Nov 29, 2016 via email

@vimalmanohar
Contributor Author

Is this merged? Does this need to be closed?

@danpovey
Contributor

danpovey commented Dec 3, 2016 via email

@vimalmanohar
Contributor Author

All changes merged in #1229

danpovey pushed a commit that referenced this pull request Dec 18, 2016
Fixes a bug that would have affected nnet3 (non-chain) TDNN training since PR #1066 was merged 2 weeks ago.  Would have slowed it down, and affected results in an unpredictable way.