Refactoring nnet3 python scripts and adding raw nnet training #1066
Conversation
I would like to review this. I will do it later today unless there is a hurry. I would like to initiate some discussions:
performed a cursory check, will review in detail later today.
default=0.0)
parser.add_argument("--include-log-softmax", type=str, action=nnet3_train_lib.StrToBoolAction,
                    help="add the final softmax layer ", default=True, choices = ["false", "true"])
parser.add_argument("--add-lda", type=str, action=nnet3_train_lib.StrToBoolAction,
I remember adding this option when I wanted to test if the LDA layer was still necessary. Experimentation showed it was important when using iVectors, so I think we should add this every time unless we are using initial CNN layers. Even in that case we don't need an additional option, as we can remove it when we see that the CNN layers are being used.
This script does not assume a classification task or a notion of classes in the output layer. So this option is required here.
Also the LDA code works only with sparse matrices.
clipping_threshold = args.clipping_threshold,
ng_per_element_scale_options = args.ng_per_element_scale_options,
ng_affine_options = args.ng_affine_options,
label_delay = args.label_delay,
This script is very similar to the normal lstm/make_configs.py, and unlike the TDNN config generator it is not very complicated. So I would recommend just combining the two scripts.
This script writes config variables like objective_type, add_lda, include_final_sigmoid required by train_raw_rnn.py, which would crash the train_rnn.py script unless changes are made to it.
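For context, these variables end up in the configs/vars file that the train scripts later parse. A minimal sketch of what writing them might look like; the function name and exact format here are hypothetical, not the actual script:

```python
# Hypothetical sketch: write the extra raw-training variables into configs/vars.
# Variable names come from the comment above; the exact set and format are assumptions.
def write_raw_config_vars(config_dir, objective_type, add_lda, include_final_sigmoid):
    with open("{0}/vars".format(config_dir), "w") as f:
        f.write("objective_type={0}\n".format(objective_type))
        f.write("add_lda={0}\n".format("true" if add_lda else "false"))
        f.write("include_final_sigmoid={0}\n".format(
            "true" if include_final_sigmoid else "false"))
```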
Let's try to reduce the number of scripts, as maintenance will be a pain. You could add a new python module for all the convenience functions you want, to ensure that you don't severely lengthen the top-level script. So if you have multiple config file types, you can add different writers and readers for all these types in the new module and call the appropriate one.
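A minimal sketch of what such a module could look like; the module, function, and type names are hypothetical, purely to illustrate the writer-dispatch idea:

```python
# steps/nnet3/config_io.py -- hypothetical module name, sketching one writer per
# config-file type behind a single entry point, so top-level scripts stay short.

def write_acoustic_model_configs(config_dir, opts):
    pass  # write configs for standard acoustic-model training (sketch)

def write_raw_nnet_configs(config_dir, opts):
    pass  # write configs for raw (no transition model) training (sketch)

_WRITERS = {
    "acoustic": write_acoustic_model_configs,
    "raw": write_raw_nnet_configs,
}

def write_configs(config_type, config_dir, opts):
    # Top-level scripts only ever call this; the dispatch picks the right writer.
    try:
        writer = _WRITERS[config_type]
    except KeyError:
        raise ValueError("Unknown config type: {0}".format(config_type))
    writer(config_dir, opts)
```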
@@ -105,6 +111,49 @@ def GetSuccessfulModels(num_models, log_file_pattern, difference_threshold=1.0):

    return [accepted_models, max_index+1]

def GetAverageNnetModel(dir, iter, nnets_list, run_opts, use_raw_nnet = False, shrink = None):
    scale = 1.0
    if shrink is not None:
Could you test if shrink is actually helping in your case?
I haven't done any experiment on this. I used this since it is tested in acoustic model training.
@@ -351,7 +475,7 @@ def ComputePresoftmaxPriorScale(dir, alidir, num_jobs, run_opts,
    WriteKaldiMatrix(output_file, [scaled_counts])
    ForceSymlink("../presoftmax_prior_scale.vec", "{0}/configs/presoftmax_prior_scale.vec".format(dir))

-def PrepareInitialAcousticModel(dir, alidir, run_opts):
+def PrepareInitialAcousticModel(dir, alidir, run_opts, use_raw_nnet = False):
If you just want to initialize a raw nnet which is not an acoustic model, you should just create a wrapper around nnet3-init and not try to reuse this method.
I can add a function PrepareInitialNetwork.
Yes please do this.
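A minimal sketch of such a PrepareInitialNetwork wrapper, assuming the existing RunKaldiCommand helper and the usual configs layout; the config file name, srand value, and log path are assumptions:

```python
def PrepareInitialNetwork(dir, run_opts):
    # Initialize a raw nnet3 network (no transition model) directly from the
    # configs, instead of reusing PrepareInitialAcousticModel.
    # Sketch only: assumes RunKaldiCommand and a configs/init.config layout.
    RunKaldiCommand("""{command} {dir}/log/nnet_init.log \
    nnet3-init --srand=-2 {dir}/configs/init.config {dir}/0.raw""".format(
        command=run_opts.command, dir=dir))
```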
@@ -478,13 +603,16 @@ def GetLearningRate(iter, num_jobs, num_iters, num_archives_processed,

    return num_jobs * effective_learning_rate

-def DoShrinkage(iter, model_file, non_linearity, shrink_threshold):
+def DoShrinkage(iter, model_file, name, non_linearity, shrink_threshold, use_raw_nnet = False):
@freewym Could you please run a simple experiment where you disable shrinkage for RNN training? IIRC it did not lead to WER improvements when I last tested this. It was just ensuring good gradient means, and we have other methods like self-repair to ensure this right now.
Regarding Vijay's comments:
I think it would be easy to re-factor the scripts and move common lines into methods that can be imported by the train scripts and thus create very short train scripts.
Could you implement your proposal? BTW please remove support for realignment when making these changes. --Vijay
Please look at the commit d074e56 for refactoring DNN training. A similar one can be done for RNN.
@vimalmanohar I added some comments on your commit d074e56. Could you also please update the steps/nnet3/train_rnn.py script to use your new library wherever possible.
Force-pushed from 5416bbc to 0782aab.
@vijayaditya I have made the changes and tried to merge the RNN and DNN scripts. But it seems there are many differing options. So for now, I have kept the functions in two separate libraries. If you think there is a way to merge them, let me know.
Could you resolve the conflicts? I will start the review.
@vijayaditya Resolved conflicts with master.
Completed one round, will resume after the requested changes have been made.
@freewym It would be better to have another set of eyes review this, as this is a change which affects all the recipes. Do you have time to review this ?
@@ -0,0 +1,348 @@
#!/usr/bin/env python
Please add the comment
This is a module with methods which will be used by acoustic model training and raw model (i.e., generic neural network without transition model) training scripts.
BTW mention that these are normal frame level training scripts.
import logging
import imp

imp.load_source('nnet3_train_lib', 'steps/nnet3/nnet3_train_lib.py')
why are you importing the same module three different times?
It's not importing it three times. The first loads the module from source, then we import the module, and then we import functions from the module.
It looks very redundant. If you just want to import the functions then directly use from <module> import *; you would just have to modify sys.path in the script to add the module's location, as these are not on the path by default. Regarding the import commands, just pick one and stay consistent. BTW I would recommend using import <module>, as the namespace specification when calling functions makes it easy to debug and helps avoid issues when functions with similar signatures are defined in different modules (e.g. this happens a lot in the python modules for chain and xent training).
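For reference, a sketch of the two styles being discussed, assuming the script is run from the egs directory so the relative path below is valid:

```python
import sys
sys.path.insert(0, 'steps/nnet3')  # make the library importable without imp.load_source

# Recommended: keep the module namespace, so calls such as
# nnet3_train_lib.RunKaldiCommand(...) are easy to trace and cannot silently
# clash with similarly named functions from chain/xent modules.
import nnet3_train_lib

# Alternative: pull everything into the current namespace (shorter call sites,
# but name clashes between modules go unnoticed).
# from nnet3_train_lib import *
```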
# Set off jobs doing some diagnostics, in the background.
# Use the egs dir from the previous iteration for the diagnostics
logger.info("Training neural net (pass {0})".format(iter))
I usually add details-of-interest, e.g. learning rate per iteration, shrinkage value etc.
Is there an example script for this?
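A per-iteration log line carrying these details of interest might look like the sketch below; learning_rate and shrinkage_value are assumed variable names, not the actual ones in the script:

```python
logger.info("Training neural net (pass {0}), learning rate {1:.6f}, "
            "shrinkage value {2}".format(iter, learning_rate, shrinkage_value))
```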
cache_read_opt = "" | ||
if iter > 0 and (iter <= (num_hidden_layers-1) * add_layers_period) and (iter % add_layers_period == 0): | ||
|
||
do_average = False # if we've just mixed up, don't do averaging but take the |
we no longer have mix-up so please update this comment.
@@ -86,6 +95,14 @@ def GetArgs():
    parser.add_argument("--lstm-delay", type=str, default=None,
                        help="option to have different delays in recurrence for each lstm")
add a comment describing this group of options.
egs_dir = egs_dir), wait = wait)

def ComputeProgress(dir, iter, egs_dir, run_opts, mb_size=256, wait=False, use_raw_nnet = False):
use_raw_nnet does not seem to be an appropriate name for the variable, as you are using raw networks in both cases. Better to use the name get_raw_nnet_from_am=True.
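A sketch of how the renamed flag might read at the call site; the nnet3-am-copy usage illustrates the distinction being made, but is not necessarily the exact code in the PR:

```python
def ComputeProgress(dir, iter, egs_dir, run_opts, mb_size=256, wait=False,
                    get_raw_nnet_from_am=True):
    # A raw network is used in both branches; the flag only says whether it
    # first has to be extracted from an acoustic model (.mdl) or is already
    # stored as a raw network (.raw).
    if get_raw_nnet_from_am:
        model = "nnet3-am-copy --raw=true {0}/{1}.mdl - |".format(dir, iter)
    else:
        model = "{0}/{1}.raw".format(dir, iter)
    # ... the rest of the diagnostics job then uses 'model' ...
```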
@@ -48,6 +48,7 @@ def GetArgs():
    parser.add_argument("--comparison-dir", type=str, action='append', help="other experiment directories for comparison. These will only be used for plots, not tables")
    parser.add_argument("--start-iter", type=int, help="Iteration from which plotting will start", default = 1)
    parser.add_argument("--is-chain", type=str, default = False, action = train_lib.StrToBoolAction, help="Iteration from which plotting will start")
    parser.add_argument("--is-linear-objf", type=str, default = True, action = train_lib.StrToBoolAction, help="Nnet trained with linear objective as against with quadratic objective")
provide the choices argument for bool type variable. See other boolean inputs for example.
Actually I think you can make this a multiple-choice variable with options {linear,quadratic}. This would help future-proof the code, as we can add more cost functions.
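A sketch of the two forms being suggested; the boolean form follows the existing StrToBoolAction convention from the surrounding diff, while the multiple-choice form is the future-proof variant:

```python
# Boolean flag with an explicit choices list, like the other boolean inputs:
parser.add_argument("--is-linear-objf", type=str, default=True,
                    action=train_lib.StrToBoolAction, choices=["true", "false"],
                    help="Nnet trained with linear objective as against quadratic objective")

# Multiple-choice alternative, easier to extend with more cost functions later:
parser.add_argument("--objective-type", type=str, default="linear",
                    choices=["linear", "quadratic"],
                    help="Objective used to train the nnet")
```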
@@ -116,6 +112,13 @@ def GetArgs():
    parser.add_argument("--use-presoftmax-prior-scale", type=str, action=nnet3_train_lib.StrToBoolAction,
                        help="if true, a presoftmax-prior-scale is added",
                        choices=['true', 'false'], default = True)
add a comment describing this group of options and where you plan to use them.
if args.use_dense_targets:
    target_type = "dense"
    compute_accuracy = False
add a comment here describing why compute_accuracy is false for dense type. This is not readily obvious as even dense posterior targets can exist.
I will test this out and see if the code supports this.
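For illustration, the kind of in-code comment being requested might read as below; the stated reason is an assumption and should be confirmed by the test mentioned above:

```python
if args.use_dense_targets:
    target_type = "dense"
    # Do not compute accuracy for dense targets: the accuracy diagnostic
    # assumes one discrete class label per frame, while dense targets are
    # arbitrary real-valued vectors (even when they happen to be posteriors).
    # [Assumed rationale -- to be verified.]
    compute_accuracy = False
```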
# we add compulsary arguments as named arguments for readability
parser = argparse.ArgumentParser(description="""
Trains an RNN acoustic model using the cross-entropy objective.
RNNs include LSTMs, BLSTMs and GRUs.
Update the usage message.
Perhaps some of you could run your work with this for a while, to see if ...
OK. Will do.
Force-pushed from 91d7334 to 5b17a4c.
@vijayaditya Made all the changes.
@vijayaditya, where are we on this pull request?
@@ -86,6 +99,16 @@ def GetArgs():
    parser.add_argument("--lstm-delay", type=str, default=None,
                        help="option to have different delays in recurrence for each lstm")

    # Options to convert input MFCC into Fbank features. This is useful when a
    # LDA layer is not added (such as when using dense targets)
    parser.add_argument("--cepstral-lifter", type=float, dest = "cepstral_lifter",
Is there a reason why you need to use filterbanks instead of MFCCs? E.g. you have a convolutional architecture?
The notion that many people have of Fbank being better than MFCCs is mostly a misunderstanding, since filterbanks are generally extracted with a higher dimension and that's what matters. If this turns out to be removable, it would be a good simplification.
I thought it might be useful in cases where LDA is not used.
"e.g. 22.0", default=22.0) | ||
|
||
parser.add_argument("--add-idct", type=str, action=nnet3_train_lib.StrToBoolAction, | ||
help="Add an IDCT after input to convert MFCC to Fbank", default = False) |
... unless this fbank stuff is being used, I think it's better to remove it for now.
What testing has been done on regular RNN and DNN training, since this affects those scripts?
It needs to be made much clearer, in comments in the source, which of these files are deprecated and which are supposed to be the way forward. Does libs/train_lib.py deprecate nnet3_train_lib.py? If so, this should be clarified in both source files. Are there python scripts that still use the old library and need to be upgraded to the new one? If so, that should be mentioned in comments in the source too.
I'd rather not add any feature to these scripts unless there is a concrete ...
No testing has been done on regular RNN and DNN training. But I can do that.
The organization seems OK, but we need to have some clarity on what the ...
Initially, we wanted to combine the DNN and RNN training libraries, but there are a few options that are different, like min_deriv_time, num_chunk_per_minibatch and shrinkage. So that wasn't done. @vijayaditya might have a better opinion on this.
I believe this organization is more scalable in the long term. If @pegahgh could use these training libraries to simplify the multilingual training script, that would be a good way to check the usefulness of these changes.
Can you please write something here explaining what the various libraries do? Think about it carefully -- not time sensitive.
I already use the new libraries for the multilingual setup. I committed new ...
Pegah, if you could explain how they differ from the existing libraries, it would help.
It's your call if you want to refactor all these other scripts. Until now we re-implemented any necessary functions in these other scripts, as there was no common library and we did not want to create a dependency on nnet3 in these other scripts. As your proposal requires maintaining a parallel sub-directory structure in the steps directory, corresponding to the python libs package, corresponding to the sub-directory structure of shell and perl scripts in steps (there are a lot more perl/shell scripts in the parent directory than in nnet3), I would wait for @danpovey's comments. Also remember that we have had some python scripts even in the utils/data directory which required some functions like RunKaldiCommand or GetFeatDim. So you should also decide if you want utils/data/ to depend on steps/libs or if you want to store the common python functions in a different location.
Sorry, I meant: "As your proposal requires maintaining a parallel sub-directory structure in the steps directory, corresponding to the python libs package, _in addition_ to the sub-directory structure of shell and perl scripts".
Let's keep it localized to nnet3 for now. For the most part I'd prefer ...
I have created the first python file conforming to PEP8 standards: egs/wsj/s5/steps/libs/nnet3/train/common.py.
Please let me know if there are any issues that need to be fixed. @danpovey @vijayaditya
Looks good to me.
This branch has conflicts.
@vimalmanohar Could you let me know the expected merge date for this PR?
I have to re-run all the scripts after reorganizing the libraries and adding max-deriv-time. It will take at least a week.
If the changes are not too extensive, it might be easier to make them to the ...
OK, I will just hack the configs dir to make it back-compatible and make the ... --Vijay
Force-pushed from e7f7075 to b69c161.
-logger = logging.getLogger(__name__)
+logger = logging.getLogger('libs')
why does just this call to getLogger use the name 'libs'?
BTW, please fix the issues that Gaofeng found (see email).
Force-pushed from 009300e to 6102c60.
All the top-level scripts must use this; I will fix it if not. It will automatically add 'subloggers' for all modules in libs, like libs.common, libs.nnet3, etc. These subloggers will inherit the properties of the top-level logger, such as the verbosity level.
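A minimal sketch of the pattern being described, with the hierarchy spelled out explicitly; the handler and format details are illustrative:

```python
import logging

# In the top-level script (e.g. steps/nnet3/train_dnn.py): configure the
# parent logger for the whole 'libs' package once.
logger = logging.getLogger('libs')
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(name)s] %(levelname)s: %(message)s"))
logger.addHandler(handler)

# Inside a library module, logging.getLogger(__name__) resolves to a child
# name such as 'libs.nnet3.train.common', so it inherits the level and
# handlers configured on 'libs' above.
module_logger = logging.getLogger('libs.nnet3.train.common')
module_logger.info("this message is emitted via the parent 'libs' handler")
```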
@@ -478,7 +481,7 @@ def Main():
    GeneratePlots(args.exp_dir, args.output_dir,
                  comparison_dir = args.comparison_dir,
                  start_iter = args.start_iter,
-                 is_chain = args.is_chain)
+                 objective_type = args.objective_type)

if __name__ == "__main__":
    Main()
Vimal, I think the module loading in this file should also be changed; train_lib = imp.load_source('ntl', 'steps/nnet3/nnet3_train_lib.py') is deleted in your PR.
                  remove_egs=True,
                  get_raw_nnet_from_am=True):
    try:
        if remove_egs:
This part of the code crashed for me with the error:
TypeError: 'bool' object is not callable
Looks like there is a name clash after changing the function naming style. I'd suggest renaming the function to e.g. remove_nnet_egs.
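For illustration, the clash comes from the boolean keyword argument shadowing a module-level function of the same name once both use snake_case; renaming the helper as suggested avoids it. The names below are sketched, not the exact code:

```python
def remove_nnet_egs(egs_dir):
    # Module-level helper that deletes the training examples (sketch);
    # renamed from remove_egs() so it no longer collides with the keyword
    # argument below.
    pass

def clean_nnet_dir(nnet_dir, egs_dir, remove_egs=True, get_raw_nnet_from_am=True):
    try:
        if remove_egs:
            # With the old name, the boolean parameter shadowed the helper
            # here, and calling remove_egs(egs_dir) raised
            # "TypeError: 'bool' object is not callable".
            remove_nnet_egs(egs_dir)
    except OSError:
        pass  # cleanup failures are not fatal (sketch)
```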
"backpropagated up to t=-5 and t=154 in the forward and backward LSTM sequence respectively; " | ||
"otherwise, the derivative will be backpropagated to the end of the sequence.") | ||
parser.add_argument("--trainer.num-chunk-per-minibatch", | ||
"--trainer.rnn.num-chunk-per-minibatch", |
Vimal, this line is a bit confusing. Is it intentional, or is it some kind of bug?
It was intentional, but I should remove the option "--trainer.rnn.num-chunk-per-minibatch" and keep only "--trainer.num-chunk-per-minibatch", since it is not an RNN-specific option. It should also be removed from the "# RNN specific trainer options" section.
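A sketch of the cleaned-up option as described, with the dest and default taken from the surrounding diff:

```python
# Keep a single option name; the "--trainer.rnn.num-chunk-per-minibatch" alias
# is dropped because the option is not RNN-specific, and the argument moves
# out of the "RNN specific trainer options" section.
parser.add_argument("--trainer.num-chunk-per-minibatch", type=int,
                    dest='num_chunk_per_minibatch', default=512,
                    help="Number of sequences to be processed in parallel "
                         "every minibatch")
```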
Is this merged? Does this need to be closed?
Yes you can close it, I merged all changes from this PR.
All changes merged in #1229.
Fixes a bug that would have affected nnet3 (non-chain) TDNN training since PR #1066 was merged 2 weeks ago. Would have slowed it down, and affected results in an unpredictable way.