
Xconfigs : extension #1197

Merged: 2 commits merged into kaldi-asr:master from xconfig_vijay on Nov 21, 2016

Conversation

@vijayaditya (Contributor)

This is an extension of PR #1170. I am currently testing the full pipeline. This also includes the commits from Xconfigs.

vijayaditya mentioned this pull request on Nov 17, 2016
@vijayaditya (Contributor, Author)

@GaofengCheng Do you have time to write an xconfig class for the HLSTM? This would help us find any modifications necessary to support architectures that access internal nodes of "layers".
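For context, a rough sketch of what such a layer class might look like, using only the XconfigLayerBase interface visible in this PR (set_default_configs(), check_configs(), get_input_descriptor_names()); the class name and all HLSTM specifics are placeholders, not an actual implementation.

    # Hypothetical skeleton only -- the HLSTM specifics are placeholders.
    class XconfigHlstmLayer(XconfigLayerBase):
        def set_default_configs(self):
            # '[-1]' (the previous layer) is the conventional default input;
            # cell-dim is compulsory, so -1 marks it as unset.
            self.config = {'input': '[-1]',
                           'cell-dim': -1}

        def check_configs(self):
            if self.config['cell-dim'] <= 0:
                raise Exception("cell-dim must be specified and positive.")

        def get_input_descriptor_names(self):
            return ['input']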

@GaofengCheng (Contributor)

@vijayaditya I could do it, if a slight delay is tolerable.

@vijayaditya (Contributor, Author)

@GaofengCheng No hurry, do it at your convenience.

@vijayaditya (Contributor, Author)

I think this is ready for a round of review.
TODO: Make modifications to conform to the Google style doc. This mainly consists of adding docstrings and conforming to the 80-characters-per-line limit.

@danpovey (Contributor) left a comment

Looks good, some small comments.

@@ -0,0 +1,233 @@
#!/bin/bash

# 6i is based on run_lstm_6h.sh, but changing the HMM context from triphone to left biphone.
Contributor

Is this comment out of date? Perhaps give it a different letter and run the comparison script so you can update the comment?

Contributor Author

OK.

train_stage=-1
get_egs_stage=-10
speed_perturb=true
dir=exp/chain/lstm_6i_xconf # Note: _sp will get added to this if $speed_perturb == true.
Contributor

I'd suggest 6i_xconf -> 6j or something like that... We don't want to start using _xconf in these names; we can just quietly switch over.

Contributor Author

OK.

@@ -0,0 +1,30 @@
# This library has classes and methods to form neural network computation graphs,
Contributor

You might want to add a copyright header.
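For instance, something along the lines of the headers used elsewhere in these scripts; the attribution line here is just a placeholder.

    # Copyright 2016    Johns Hopkins University (author: Vijayaditya Peddinti)
    # Apache 2.0.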



# 'ref.config' : which is a version of the config file used to generate
# a model for getting left and right context it doesn't read anything for the
Contributor

it -> (it


class XconfigLayerBase(object):
""" A base-class for classes representing layers of xconfig files.
This mainly just sets self.layer_type, self.name and self.config/
Contributor

I'd remove "This mainly just sets self.layer_type, self.name and self.config", as the base-class has a bunch of functions... that comment was outdated I think.

def set_default_configs(self):
raise Exception("Child classes must override set_default_configs().")

# this is expected to be called after set_configs and before check_configs()
Contributor

is expected to be called -> is called in the constructor.
I think this default function definition is potentially dangerous, but I guess I could be OK with it.
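A minimal sketch of the constructor ordering this implies; only set_default_configs(), set_configs() and check_configs() are named in the thread, so set_derived_configs below is an illustrative name for the default function being discussed, and the rest is assumed.

    class XconfigLayerBase(object):
        def __init__(self, first_token, key_to_value):
            self.layer_type = first_token      # e.g. 'lstm-layer'
            self.name = key_to_value['name']
            self.set_default_configs()         # child-class defaults
            self.set_configs(key_to_value)     # user-supplied values override defaults
            self.set_derived_configs()         # runs after set_configs(), before check_configs()
            self.check_configs()               # child-class validation

        def set_default_configs(self):
            raise Exception("Child classes must override set_default_configs().")

        def set_configs(self, key_to_value):
            # illustrative: copy user-supplied keys over the defaults
            for key, value in key_to_value.items():
                if key != 'name':
                    self.config[key] = value

        def set_derived_configs(self):
            pass    # overridable no-op; the "default function" discussed above

        def check_configs(self):
            pass    # overridable no-op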


def input_dim(self):
dim = 0
for input_name in self.get_input_descriptor_names():
Contributor

I don't think this function should exist, at least not with this implementation-- in general we don't want to make any assumptions that if there are multiple input descriptors, they will be appended together. Having just 1 input is the norm, and auxiliary inputs might not be appended with the primary one [if you wanted that, you could just use Append() in the descriptor.]
Elsewhere the code sets
input_dim = self.descriptors['input']['dim']
which I think makes more sense [if there is an 'input' descriptor]. If you must have this function, I'd prefer it to use that expression.
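If the function is kept, the suggested form would be roughly:

    def input_dim(self):
        # read the primary 'input' descriptor's dimension directly,
        # rather than summing over all input descriptors
        return self.descriptors['input']['dim']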


def set_default_configs(self):
self.config = {'input' : '[-1]',
'cell-dim' : -1, # this is a compulsary argument
Contributor

compulsory

configs.append("component name={0}.c1 type=ElementwiseProductComponent input-dim={1} output-dim={2}".format(name, 2 * cell_dim, cell_dim))
configs.append("component name={0}.c2 type=ElementwiseProductComponent input-dim={1} output-dim={2}".format(name, 2 * cell_dim, cell_dim))
configs.append("component name={0}.m type=ElementwiseProductComponent input-dim={1} output-dim={2}".format(name, 2 * cell_dim, cell_dim))
configs.append("component name={0}.c type=ClipGradientComponent dim={1} {2}".format(name, cell_dim, clipgrad_str))
Contributor

Vijay, we have checked in a change to how the gradients are clipped, and that change should be applied to this PR. The component name is different and the list of options is different.

Contributor Author

I have not included this for now, as I am running comparison experiments with the 6i setup (to see if I forgot some {param,bias}-stddev initialization).

I will make the change when I am done with the experiments.
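For reference, the shape of the substitution being asked for is roughly the following; the component name and its options are recalled from the Kaldi codebase after this change was checked in, not quoted from this thread, so treat them as assumptions to verify.

    # Assumed replacement for the ClipGradientComponent line above.
    configs.append("component name={0}.c type=BackpropTruncationComponent dim={1} "
                   "clipping-threshold=30.0 zeroing-threshold=15.0 "
                   "zeroing-interval=20 recurrence-interval=1".format(name, cell_dim))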

# Apache 2.0.

""" This module contains the top level xconfig parsing functions.
It has been separated from the utils module to
Contributor

You don't have to explain changes; no one saw the original.
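A hypothetical sketch of what such a top-level parsing loop might look like; read_xconfig_file and config_to_layer are illustrative names, not necessarily this PR's actual API.

    def read_xconfig_file(filename):
        all_layers = []
        with open(filename) as f:
            for line in f:
                line = line.split('#')[0].strip()   # drop comments and whitespace
                if line == '':
                    continue
                tokens = line.split()
                first_token = tokens[0]             # e.g. 'lstm-layer'
                # split('=', 1) keeps any further '=' inside a value intact
                key_to_value = dict(t.split('=', 1) for t in tokens[1:])
                all_layers.append(config_to_layer(first_token, key_to_value))
        return all_layers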

@vijayaditya (Contributor, Author)

@freewym Could you please help check the BLSTM model? It is available here:
/export/b01/vpeddinti/xconfig/egs/swbd/s5c/exp/chain/blstm_6j_sp/configs/ref.raw

@freewym (Contributor) commented Nov 19, 2016

OK, will do.


@vijayaditya (Contributor, Author) commented Nov 19, 2016

Removing pretraining seems to help a lot.

# System                  7f     7g
# WER on train_dev(tg)    14.46  13.85
# WER on train_dev(fg)    13.23  12.67
# WER on eval2000(tg)     17.0   16.5
# WER on eval2000(fg)     15.4   14.8
# Final train prob     -0.0882071 -0.0885075
# Final valid prob     -0.107545  -0.113462
# Final train prob (xent) -1.26246 -1.25788
# Final valid prob (xent) -1.35525 -1.37058

@danpovey (Contributor)

Great! Where do you think the improvement comes from?


@vijayaditya (Contributor, Author)

The only thing that changed was the removal of layer-wise pretraining.

(I updated the previous message when I realized that I had forgotten to write the details of the two experiments, but GitHub doesn't seem to send email alerts when an existing message is updated.)

@freewym (Contributor) commented Nov 19, 2016

@vijayaditya It seems that the max-change option is missing for all diagonal matrices in BLSTM, according to /export/b01/vpeddinti/xconfig/egs/swbd/s5c/exp/chain/blstm_6j_sp/configs/final.config
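The shape of the fix would be to add an explicit max-change to the per-element (diagonal) components, as the full matrices already have; the component type, the w_ic parameter name and the 0.75 value below are assumptions based on typical Kaldi LSTM configs, not quoted from this thread.

    # Assumed form of the fix for one of the diagonal (peephole) matrices.
    configs.append("component name={0}.w_ic type=NaturalGradientPerElementScaleComponent "
                   "dim={1} max-change=0.75".format(name, cell_dim))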

@vijayaditya (Contributor, Author)

OK, thanks, I will modify that.

--Vijay


@danpovey (Contributor)

I'd like to get this xconfig stuff checked in ASAP, assuming it won't break the existing setups -- that way we can accelerate the process of converting recipes to the new style by getting others involved. Unless you think it's better to wait a bit...


@vijayaditya (Contributor, Author)

The LSTM training will be done in a few hours. I would like to check it in after that.

There are still some Python style changes to be made, but I can do those after the code is checked in.

--Vijay


vijayaditya force-pushed the xconfig_vijay branch 2 times, most recently from 72e3261 to 4611e41, on November 21, 2016
@vijayaditya (Contributor, Author)

I think this is ready for merge. I will complete any necessary style changes over the next few days.
TODO: Test the architectures which use auxiliary outputs (@GaofengCheng will test HLSTMs).

Note: I will clean up the local/chain/run_{tdnn,lstm,blstm}.sh scripts before the merge.

@danpovey (Contributor)

It's OK with me-- merge it yourself when you think it's ready. Thanks!!

@GaofengCheng (Contributor)

@vijayaditya I will help test after you have merged.

@danpovey (Contributor)

Trying to fix it by a code change.

danpovey added a commit that referenced this pull request on Nov 21, 2016:

…ontext. This prevents a crash that has been happening since pull request #1197.
2. Added ability to provide strings with '=' as values in xconfig.
3. Added swbd recipes for TDNN, LSTM and BLSTM using xconfig.
vijayaditya merged commit 07a5d51 into kaldi-asr:master on Nov 21, 2016
vijayaditya mentioned this pull request on Nov 21, 2016