
Transfer learning wsj->rm #1633

Merged

merged 43 commits into kaldi-asr:master on Sep 15, 2017

Conversation

pegahgh
Contributor

@pegahgh pegahgh commented May 18, 2017

This is the transfer learning setup, which transfers a model trained on wsj to rm.
The preliminary result, without any tuning, shows an improvement (2.71 -> 2.21).

@danpovey
Contributor

danpovey commented May 18, 2017 via email

Contributor

@danpovey danpovey left a comment

Various requests for more documentation.
At this point it's not easy for me to see what you are doing, so I'm not expressing a view on the actual content.

# This function reads an xconfig file and returns it as a list of layers
# (usually we use the variable name 'all_layers' elsewhere for this).
# It will die if the xconfig file is empty or if there was
# some error parsing it.
def read_xconfig_file(xconfig_filename):
Contributor

there is no documentation of aux_layers. Also they are components, not layers so the name is wrong.

# This function reads existing model file with nnet3 format and returns it as
# list of layers with name and dimension to be used as auxilary information
# to generate xconfig.
def read_model(model_filename):
Contributor

there is no documentation of what the return format of this function is, and the name is too non-specific (e.g. get_model_component_info). Don't use the word 'layer' if what you really mean is component, they are different things. And grep for '^component-node', not -node. You could otherwise get problems in future if something that looks like that appears in the file.

Contributor Author

I grep '-node', since we need both component-node and input-node!
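
For illustration, a rough sketch (not from this PR) of the node lines such a grep would match; the model path and the exact dims are assumptions, based on typical nnet3-info output:

# dump the node lines of an existing acoustic model (illustrative only)
nnet3-am-copy --raw=true $srcdir/exp/chain/tdnn1d_sp/final.mdl - | nnet3-info - | grep -e '-node'
# matching lines have roughly these shapes:
#   input-node name=input dim=40
#   input-node name=ivector dim=100
#   component-node name=tdnn1.affine component=tdnn1.affine input=... input-dim=440 output-dim=450
#   output-node name=output input=output.log-softmax objective=linear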

steps/nnet3/xconfig_to_configs.py --xconfig-file $dir/configs/network.xconfig \
--config-dir $dir/configs/ \
--nnet-edits="rename-node old-name=output-0 new-name=output"
--nnet-edits=$dir/configs/edits.config
Contributor

nnet-edits expects a string, not a filename. Also, you are already supplying --config-dir as an arg to the script, and I see that elsewhere you are treating edits.config as a standard part of that config directory, so there seems to be some disagreement about whether scripts are supposed to pick that up automatically or whether it needs to be supplied explicitly.
If you are intending to change how that option works, you should rename it and change the documentation.

Contributor Author

I modified the script to accept nnet-edits as a filename and to apply it as an edits config. If you agree with that, I will update the description and rename the option to edits-config.
Also, edits.config is passed explicitly to xconfig_to_configs.py, but train.py automatically picks up edits.config in the prepare_acoustic_model function.

@@ -0,0 +1,184 @@
#!/bin/bash
set -e
Contributor

Please add an explanation of what this script does.

@@ -28,6 +28,12 @@ def get_args():
epilog='Search egs/*/*/local/{nnet3,chain}/*sh for examples')
parser.add_argument('--xconfig-file', required=True,
help='Filename of input xconfig file')
parser.add_argument('--existing-model',
Contributor

The doc-string doesn't explain what this option expects, which is presumably a filename of a .mdl
file containing a previously trained neural net.

@@ -219,17 +225,19 @@ def write_config_files(config_dir, all_layers):
raise


def add_back_compatibility_info(config_dir, nnet_edits=None):
def add_back_compatibility_info(config_dir, existing_model=None,
Contributor

This function is differently named in kaldi_52. Please make this PR against kaldi_52 as it will save me hassle in merging.

@@ -310,7 +310,8 @@ def train(args, run_opts, background_process_handler):
logger.info("Creating denominator FST")
chain_lib.create_denominator_fst(args.dir, args.tree_dir, run_opts)

if (args.stage <= -4):
if (args.stage <= -4) and os.path.exists("{dir}/config/init.config".format(
Contributor

The directory should be configs instead of config

@danpovey
Contributor

danpovey commented May 22, 2017 via email

@danpovey danpovey changed the base branch from master to kaldi_52 May 23, 2017 21:35
@pegahgh
Contributor Author

pegahgh commented May 24, 2017

@vimalmanohar you mentioned you already tested this PR. Would you update me with your thoughts on it?

elif key == "input-dim":
input_dim = int(value)
elif key == "output-dim":
output_dim = int(value)
Contributor

The key can also be "dim", in which case it would be both input-dim and output-dim.

key_to_value['name'] = layer_name
if input_dim != -1:
if output_dim == -1:
# The layer is input layer type.
Contributor

I am not sure what this is for. There should never be a case where only input_dim is defined and output_dim is not.

# The layer is input layer type.
key_to_value['dim'] = input_dim
elif input_str is not None:
key_to_value['dim'] = output_dim
Contributor

input_str is None for input-node, but its dim is still defined by 'dim'

all_layers = []
try:
f = open(model_filename, 'r')
except Exeption as e:
Contributor

Exception

@pegahgh pegahgh changed the base branch from kaldi_52 to master June 7, 2017 18:57
@pegahgh
Contributor Author

pegahgh commented Jun 9, 2017

@vimalmanohar I fixed some issues with this PR and merged it with master. Since you have used this PR, do you have any comments on it?

@@ -424,6 +424,19 @@ def prepare_initial_acoustic_model(dir, run_opts, srand=-1):
common_train_lib.prepare_initial_network(dir, run_opts,
srand=srand)

# edits 0.raw using edits.config before adding transition model.
Contributor

Add an example for edits.config here and explain that it is used for transfer learning.
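
For instance, a hedged sketch of what such an edits.config might contain for this transfer-learning setup (the rename-node line mirrors the example earlier in this thread; treat the file as an illustration, not the PR's actual contents):

cat <<EOF > $dir/configs/edits.config
rename-node old-name=output-0 new-name=output
remove-orphans
EOF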

"".format(dir))
common_lib.execute_command(
"""{command} {dir}/log/edit.log \
nnet3-copy --edits-config={edits_config} {dir}/0.raw \
Contributor

I suggest you move this inside prepare_initial_network, as this is not just for the chain objective but generic for anything.
Also, you are modifying 0.raw itself. It's better to write an intermediate model like pre_edit.raw.

if len(key_value) == 2:
key = key_value[0]
value = key_value[1]
if key == "name":
Contributor

Give examples for different types of lines that will be parsed. input-node, component-node etc.

key_to_value = dict()
layer_names.append(layer_name)
key_to_value['name'] = layer_name
if input_dim != -1:
Contributor

Comment which type of lines go to which case.

@@ -60,6 +60,10 @@ def get_dim_from_layer_name(all_layers, current_layer, full_layer_name):
for layer in all_layers:
if layer is current_layer:
break
# if "." used in layer name
Contributor

Give an example of when this is applicable.

@@ -219,17 +227,20 @@ def write_config_files(config_dir, all_layers):
raise


def add_nnet_context_info(config_dir, nnet_edits=None):
def add_nnet_context_info(config_dir, existing_model=None,
edits_config=None):
Contributor

Check indentation.

model = """ - | nnet3-copy --edits-config={0} - {1}""".format(edits_config,
model)
common_lib.execute_command("""nnet3-init {0} {1}/ref.config """
""" {2} """.format(existing_model if
Contributor

Check indentation.
I am not sure if @danpovey wants this to be in Google standards, since the rest of this file is not.

Contributor

Thanks for checking this, Vimal.
I'd like it to conform to Google standards as much as possible, but don't be too strict with the column-length of 80, it's hard to stick to that in Python, what with the indentation.

@@ -202,11 +202,13 @@ EOF
echo " relu-renorm-layer name=prefinal-affine-lang-${lang_index} input=tdnn_bn dim=1024"
Contributor

Is this supposed to be in this PR? Also, I do not see any WSJ -> RM examples.

@pegahgh
Contributor Author

pegahgh commented Jul 14, 2017

@vimalmanohar would you 1) review the PR and comment on it, and 2) test it on MGB to verify that you can still get the best results you had?
The new weight transfer degrades performance on wsj->rm, but it gives a good improvement on libri->wsj.
Since the PR name is transfer-learning-wsj-rm, I didn't add scripts for libri->wsj, but I can add them in the next commit and rename the PR.

--nnet-edits="rename-node old-name=output-0 new-name=output"
--edits-config=$dir/configs/edits.config

cat <<EOF >> $dir/configs/vars
Contributor

Is this script relevant to this PR? Also include_log_softmax is no longer used.

Contributor Author

I updated xconfig_to_configs.py to accept edits-config instead of nnet-edits, so this script should be updated.

@@ -0,0 +1,192 @@
#!/bin/bash
Contributor

Is this different from _1a?

Contributor Author

@pegahgh pegahgh Jul 17, 2017

This is a soft link to _1a

Contributor

This is not a soft link; Git would show it as a soft link if it were. There is already local/chain/run_tdnn_wsj_rm.sh, which is a soft link to _1a.

# and use already trained model on wsj and remove the last layer and
# add new randomly initialized layer and retrain the whole network.
# while training new added layer using rm data.
# The chain config is as run_tdnn_5n.sh and the result is:
Contributor

"is as" -> "is as in"

/export/b0{3,4,5,6}/$USER/kaldi-data/egs/rm-$(date +'%m_%d_%H_%M')/s5c/$dir/egs/storage $dir/egs/storage
fi
echo "$0: set the learning-rate-factor for initial network to be zero."
nnet3-am-copy --raw=true --edits="set-learning-rate-factor name=* learning-rate-factor=$primary_lr_factor" \
Contributor

I think you can use nnet3-copy instead of nnet3-am-copy --raw=true

mkdir -p $dir
mkdir -p $dir/configs
cat <<EOF > $dir/configs/network.xconfig
output-layer name=output-tmp input=tdnn6.renorm dim=$num_targets
Contributor

Can you create an empty xconfig file instead and also remove edits.config?

@@ -32,37 +32,51 @@ int main(int argc, char *argv[]) {
"Initialize un-smoothed phone language model for 'chain' training\n"
"Output in FST format (epsilon-free deterministic acceptor)\n"
"\n"
"Usage: chain-est-phone-lm [options] <phone-seqs-rspecifier> <phone-lm-fst-out>\n"
"Usage: chain-est-phone-lm [options] <phone-seqs-rspecifier-1> ... <phone-seqs-rspecifier-n> <phone-lm-fst-out>\n"
Contributor

Use [ ] to indicate that they are optional.

@@ -654,6 +654,32 @@ void ReadEditConfig(std::istream &edit_config_is, Nnet *nnet) {
}
}
KALDI_LOG << "Set learning rates for " << num_learning_rates_set << " nodes.";
} else if (directive == "set-learning-rate-factor") {
std::string name_pattern = "*";
// name_pattern defaults to '*' if non is given.
Contributor

non -> none

# 2016 Vimal Manohar
# Apache 2.0.

""" This script is based on steps/nnet3/chain/train.sh
Contributor

Change the description

Contributor Author

OK, I will do that, but I believe the amount of changes to train.py is very small, and we could still apply them to train.py!

Contributor

Add in the description how this script is different from train.py and what it is for.

args.lat_dir)

# Set some variables.
num_jobs = 4 #common_lib.get_number_of_jobs(args.lat_dir)
Contributor

Why is this fixed at 4?

logger.info("Copying the properties from {0} to {1}".format(egs_dir, args.dir))
common_train_lib.copy_egs_properties_to_exp_dir(egs_dir, args.dir)

if (args.stage <= -2) and os.path.exists(args.dir+"/configs/init.config"):
Contributor

This stage is not required in this script. But you should check that init.raw exists.

Contributor Author

I am still not sure whether it is a good idea to have a separate script train_more.py!

@danpovey
Contributor

danpovey commented Jul 17, 2017 via email

@danpovey
Contributor

danpovey commented Jul 17, 2017 via email

Contributor

@danpovey danpovey left a comment

Here are a lot of comments-- after reading through only part of the PR.

But the closer I look at it, the less happy I am with the architecture. My particular concern is the way in which the already-trained models are modified. The way the edits-config is used, and the interaction with the train.py scripts, makes it very confusing. And I think you are using "rename-node" in an unnecessary way where it would be better to just overwrite the old layer name.

I think a better approach than this, would be to have the example script in local/ create a model file directly, e.g. named $dir/input.raw, and pass it in as an optional argument to the training scripts; if this is supplied to the training scripts, then the configs/ directory would not be interrogated (and they wouldn't attempt to initialize the LDA-like matrix).

I'm pondering whether it might be a better idea, even for the existing training scripts, to better separate the model initialization and the training, so that the input would be a .raw file (with LDA layer already initialized), instead of having the training scripts do this. But we can do this as a separate thread.


# configs for transfer learning
common_egs_dir=
#srcdir=../../wsj/s5/
Contributor

you should probably replace your own directory with the original $srcdir.
Also, for all scripts like this, I think you should check at the top of the script whether the required inputs, such as $src_mdl, are present.
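
A minimal sketch of such a check, using the variable names that appear in the next hunk (with the src_gmm_dir rename suggested there); exactly which files to require is an assumption:

# fail early if the source-model inputs are missing
for f in $src_mdl $src_lang/phones.txt $src_gmm_dir/final.mdl; do
  [ ! -f "$f" ] && echo "$0: expected file $f to exist" && exit 1
done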

srcdir=/export/a09/pegahgh/kaldi-transfer-learning/egs/wsj/s5-sp
src_mdl=$srcdir/exp/chain/tdnn1d_sp/final.mdl
src_lang=$srcdir/data/lang
src_gmm_mdl=$srcdir/exp/tri4b
Contributor

why call this _mdl if it's a directory? It should be src_gmm_dir.

# while training new added layer using rm data.
# The chain config is as run_tdnn_5n.sh and the result is:
# System tdnn_5n tdnn_wsj_rm_1a tdnn_wsj_rm_1b tdnn_wsj_rm_1c
# WER 2.71 2.09 3.45 3.38
Contributor

You should explain more about why you added the 1b and 1c scripts. I can see that the lexicon in this case may have words without pronunciations, since you use WSJ's lexicon. I guess you added these to demonstrate different ways to do it. But you're not clearly explaining what the difference is.
You say it uses " a src-tree-dir to generate new target alignment and lattices using source model".
But it seems to me a key difference that you didn't mention is that it uses the lexicon from WSJ. I'd like a clear explanation of what 1c is as well.

utils/create_split_dir.pl \
/export/b0{3,4,5,6}/$USER/kaldi-data/egs/rm-$(date +'%m_%d_%H_%M')/s5c/$dir/egs/storage $dir/egs/storage
fi
echo "$0: set the learning-rate-factor for initial network to be zero."
Contributor

This 'zero' in the echo statement is not right. Also, I don't see that an echo statement is helpful here; you can use a comment.

/export/b0{3,4,5,6}/$USER/kaldi-data/egs/rm-$(date +'%m_%d_%H_%M')/s5c/$dir/egs/storage $dir/egs/storage
fi
echo "$0: set the learning-rate-factor for initial network to be zero."
nnet3-copy --edits="set-learning-rate-factor name=* learning-rate-factor=$primary_lr_factor" \
Contributor

I don't see what the point of the primary_lr_factor is here-- if you are setting a learning-rate factor for all layers, IMO it would make more sense to just multiply the initial and final learning rates by that factor.

@@ -20,32 +21,64 @@
logger.addHandler(logging.NullHandler())


def create_phone_lm(dir, tree_dir, run_opts, lm_opts=None):
"""Create a phone LM for chain training
def create_phone_lm(dir, tree_dir, run_opts, alignment_dirs=None, lm_opts=None):
Contributor

Please revert your changes to this function and create another version of this function that takes its input from multiple alignment dirs. This will mean that we don't need to so carefully test whether this will break the normal code path; it also keeps the individual functions simple.
Also revert your changes to chain-est-phone-lm; what you need to do can be implemented much more simply.
You can just run a command for each source alignment dir that uses ali-to-phones to dump the phone sequences into a single gzipped archive for that source. And then you can combine the gzipped archives while repeating some of them the requisite number of times, e.g. "gunzip -c $dir/src1_phones.gz $dir/src2_phones.gz $dir/src2_phones.gz|", assuming the second source was repeated twice.

Contributor

Vimal pointed out that this might fail due to argument list too long. But you can work around that by doing:
for x in [arg_list]; do gunzip -c $x; done | ...
I don't like the C++ code changes because having a scale that is required to be an integer is kind of weird, and because IMO it's just more elegant to repeat the input that many times, since that is what the code does. And also because of the principle that the "unusual case" shouldn't cause code to be modified in things used in the "common case". I want to keep the common case simple, which makes the codebase easier to maintain.
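
A rough bash sketch of that suggestion (the number of sources, directory names, and repetition counts are assumptions for illustration):

# dump each source alignment dir's phone sequences into its own gzipped archive
for n in 1 2; do
  adir=exp/src${n}_ali
  ali-to-phones $adir/final.mdl "ark:gunzip -c $adir/ali.*.gz |" "ark:|gzip -c > $dir/src${n}_phones.gz"
done
# estimate the phone LM, repeating source 2 twice to give it roughly double weight
chain-est-phone-lm $lm_opts \
  "ark:for x in $dir/src1_phones.gz $dir/src2_phones.gz $dir/src2_phones.gz; do gunzip -c \$x; done |" \
  $dir/phone_lm.fst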

"{0}/ref.raw".format(config_dir))
def add_nnet_context_info(config_dir, existing_model=None,
edits_config=None):
"""This will be removed when python script refactoring is done."""
Contributor

it seems like you have resolved a git conflict wrongly, this comment is an old comment that we removed.

right_context_final = (args.chunk_right_context_final + model_right_context if
args.chunk_right_context_final >= 0 else -1)

# Initialize as "raw" nnet, prior to training the LDA-like preconditioning
Contributor

this comment seems misplaced.

"creating does not have an output node called 'output' "
"(e.g. for multilingual setups). You can set this to "
"an edit-string like: "
"an edits-config can contain string like: "
Contributor

This comment still doesn't really explain what the option does. Explaining what an option is useful for is not a substitute for saying what it does.
Also you should say edits-config filename not config filename, because they are different things.
But I think you should just remove this option completely, because I think this option is based on a misunderstanding. I think you didn't realize that it's possible to overwrite nodes and components by creating a new node with the old one's name. Just, if an older layer was of a different type (e.g. LSTM vs. TDNN), there may be confusingly named 'orphan' components and component-nodes left around, so it may make sense to do 'remove-orphans' in this situation. You could have this code do remove-orphans automatically in the case when you provide an initial nnet to this script.
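
To illustrate the overwrite-by-name idea (a hedged sketch, not the PR's code; the layer, file, and variable names are assumptions): declare the new output layer under the source model's existing output name, then clean up leftovers with remove-orphans:

cat <<EOF > $dir/configs/network.xconfig
# same name as the source model's output node, so it overwrites that node
output-layer name=output input=tdnn6.renorm dim=$num_targets
EOF
# ... after the new layer has been initialized on top of the existing model,
# drop any components/nodes that are no longer reachable:
nnet3-copy --edits='remove-orphans' $dir/pre_edit.raw $dir/0.raw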

model)
out = common_lib.get_command_stdout('nnet3-info "{0}" | head -n 4 '
.format(model))
if edits_config is not None:
Contributor

This whole code block is unreadable and very unclear. And it doesn't look correct to me either (the - " at the beginning).

i.e. layer in *.mdl -> corresponding 'XconfigExistingLayer' layer
'input-node name=ivector dim=100' ->
'existing name=ivector dim=100'
'component-node name=tdnn1.affine ** input-node=1000 '
Contributor

input-node -> input-dim and output-node -> output-dim

class XconfigExistingLayer(XconfigLayerBase):
"""
This class is for lines like
'existing name=tdnn1.affine dim=40'.
Contributor

This documentation is not meaningful. Where I wrote:

 This class is for lines like 
     xxxxxx

the line xxxxx would appear in an xconfig file. But these wouldn't appear in an xconfig file.
I don't want to have to give you the documentation word for word. Figure it out.

echo "e.g.: $0 exp/tri1_ali exp/tri2_ali exp/chain/tdnn_1a_sp";
echo "Options: "
echo " --cmd (run.pl|queue.pl...) # specify how to run the sub-processes.";
echo "--lm-opts # options for phone LM generation";
Contributor

please document the 'weights' here. It's important so it should go in the usage message as well as in the comment.

Contributor

@danpovey danpovey left a comment

Some more mostly cosmetic comments.

if [ -z $weights ]; then
# If 'weights' is not specified, comma-separated array '1' with dim
#'num_alignments' is defined as 'weights'.
for n in `seq 1 $num_alignments`;do weights="$weights,1"; done
Contributor

this doesn't look right, there would be a leading comma.
And it's not ignored by cut:

echo ,a,b,c | cut -d, -f1
(prints an empty line -- the leading empty field is selected)

echo ,a,b,c | cut -d, -f2
a

That means this code path was not tested!

Contributor

@vimalmanohar vimalmanohar Sep 5, 2017

@pegahgh Since you are already creating w_arr below, you can use w_arr=() and then, num_alignments times, do w_arr+=(1.0). Then at line 90 you can use w=${w_arr[$n]}
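
A minimal sketch of that suggestion (variable names follow the surrounding discussion; splitting a user-supplied comma-separated list is an added assumption):

w_arr=()
if [ -z "$weights" ]; then
  # default: weight 1.0 for every alignment dir
  for n in $(seq $num_alignments); do w_arr+=(1.0); done
else
  # split the user-supplied comma-separated weights into the array
  IFS=',' read -r -a w_arr <<< "$weights"
fi
# later, for the n-th alignment dir (bash arrays are 0-based):
w=${w_arr[$((n-1))]}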

rm $adir/alignment_files.txt 2>/dev/null || true
for x in `seq $w`;do
for j in `seq $num_jobs`;do
echo $adir/ali.$j.gz >> $adir/alignment_files.txt
Contributor

You are making this more complicated than it needs to be by writing to separate files. Write them all to the same file, $dir/alignment_files.txt. Then you don't need a loop in the $cmd script.

Contributor Author

I need the loop, since we need a different final.mdl for each "ali-to-phones $adir/final.mdl" call.

Contributor

I think the complexity of this command is underestimated, and this is being made more difficult than it previously was.

@pegahgh it is better to write the alignments in $dir/alignment.$n.txt instead of inside $adir, since you may not have write access to that. Also there would be race-conditions if multiple experiments are run using the same alignment directories.

Contributor Author

I can also write the model into the file on the same line as the alignment. Then we can have a single file.

Contributor

There will still be one loop inside the command. There needs to be more escaping to split each line into model and alignment, which might again make it more complicated.

}
}
KALDI_LOG << "Set learning rate factors for " << num_learning_rate_factors_set
<< " nodes.";
Contributor

change 'nodes' to 'components' since the learning rate factor applies to components, not nodes.

@@ -283,6 +283,9 @@ void CollapseModel(const CollapseModelConfig &config,
Note: this sets the 'underlying' learning rate, i.e. it will get
multiplied by any 'learning-rate-factor' set in the nodes.
Contributor

nodes -> components

@@ -283,6 +283,9 @@ void CollapseModel(const CollapseModelConfig &config,
Note: this sets the 'underlying' learning rate, i.e. it will get
multiplied by any 'learning-rate-factor' set in the nodes.

set-learning-rate-factor [name=<name-pattern>] learning-rate-factor=<learning-rate-factor>
Sets the learning rate factor for any updatable nodes matching the name pattern.
Contributor

nodes -> components

Contributor Author

I agree! Thanks. Probably I should also make the same change for set-learning-rate. Do you agree?

@danpovey
Contributor

danpovey commented Sep 5, 2017 via email

@danpovey
Contributor

danpovey commented Sep 5, 2017 via email

@danpovey danpovey merged commit 82686ee into kaldi-asr:master Sep 15, 2017
kronos-cm added a commit to kronos-cm/kaldi that referenced this pull request Sep 16, 2017
* 'master' of https://github.com/kaldi-asr/kaldi: (43 commits)
  [src,scripts,egs] Transfer learning for ASR with nnet3 (kaldi-asr#1633)
  [src,scripts,egs] Attention modeling, with example scripts (kaldi-asr#1731)
  [src] Fix bug in block matrix addition (thanks: Sidhi Adkoli).
  [egs] Fix inconseqential input-checking bug in Swbd example script (kaldi-asr#1886)
  [build] dependency-check: that python2.7 and python3 exist and 2.7 is default (kaldi-asr#1876)
  [scripts] A cosmetic change to info messages in chain training (kaldi-asr#1880)
  [doc] Keep tutorial code up to date (thanks: Luwei Yang)
  [scripts] Bug-fix in long-utterance-segmentation script (thanks: Armin Oliya) (kaldi-asr#1877)
  [egs] Fixed some issues in the multilingual BABEL example scripts (kaldi-asr#1850)
  [build] Cosmetic fix in Makefile
  Remove memory leaks and unused variables (when CUDA is not enabled) (kaldi-asr#1866)
  [scripts] Fix default for egs.cmd in nnet3 training scripts (kaldi-asr#1865)
  [doc] Fix to how documentation is built (thanks: David van Leeuwen)
  [scripts] Add --decode-extra-opts in steps/decode.sh (required for speech activity detection scripts) (kaldi-asr#1859)
  [src] Adding documentation for lattice discriminative training functions (kaldi-asr#1854)
  [src] Typo fixes in documenation. (kaldi-asr#1857)
  [egs] Update to score.sh in fisher_swbd setup, allow --iter option (kaldi-asr#1853)
  [scripts] bug-fix in TFRNNLM rescoring script (no 'ark' needed for unk.probs file) (kaldi-asr#1851)
  [src] Remove repeated parameter documentation. (kaldi-asr#1849)
  [egs] Aspire example scripts: Update autoencoder example to xconfig (kaldi-asr#1847)
  ...
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018