New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cosmetic and other minor fixes for segmentation #1784
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're making progress. I have a lot of comments but they are all very minor.
@@ -0,0 +1,322 @@ | |||
#! /usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the kind of script that should go in utils/, probably in utils/ctm/. Please create a directory utils/ctm/, put this there, put fix_ctm.sh and convert_ctm.pl there, and make soft links from their old locations.
head -n 10 $dir/utt2dur | paste - $temp | \ | ||
awk '{ dur += $2; frames += $4; } END { shift = dur / frames; if (shift > 0.0098 && shift < 0.0102) shift = 0.01; print shift; }' || exit 1; | ||
frame_shift=$(cat $dir/frame_shift) | ||
if [ -z "$frame_shift" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing silently here is a problem because not all invocations of this script check the return status: see steps/nnet3/chain/get_egs.sh. Probably best to fix wsj/s5/steps/nnet3/chain/get_egs.sh to check the status, but also add an error message here.
# Apache 2.0 | ||
|
||
if [ $# -ne 1 ]; then | ||
echo "This script outputs a mapping from recording to a list of utterances " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script doesn't have a very good, or well-described, interface. It seems that in addition to doing what it says it does, it also creates a segments file if one did not already exist. I don't see why that behavior would be desired; and I don't even see why it's necessary, because you could easily do it in a pipe.
I don't recommend to have it create the segments file, because for a script whose primary purpose is to output to stdout, to have a side effect, is very confusing.
a new wav.scp file. | ||
If --reco2vol is provided, then for each recording, the volume factor | ||
specified in that file is applied. | ||
Otherwise, a volume factor is chosen randomly between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
randomly -> from a uniform distribution
if [ $# -ne 2 ]; then | ||
echo "This script adds a sox line in wav.scp to resample the audio at a " | ||
echo "different sampling-rate" | ||
echo "Usage: $0 <frequency> <data-dir>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need example.
@@ -36,7 +36,8 @@ class NnetComputerFromEg { | |||
|
|||
// Compute the output (which will have the same number of rows as the number | |||
// of Indexes in the output of the eg), and put it in "output". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation is wrong now that you have changed the code.
"in compressed format (recommended). Update: this is now " | ||
"only relevant if the features being read are un-compressed; " | ||
"if already compressed, we keep we same compressed format when " | ||
"dumping-egs."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove '-'
po.Register("compress", &compress, "If true, write egs in " | ||
"compressed format."); | ||
po.Register("compress", &compress, "If true, write egs with input features " | ||
"in compressed format (recommended). Update: this is now " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace 'Update: this is now only' with 'This is only'. We don't document the changes to options, except if likely to break things, because the average user won't be aware of, and won't care about, the history.
@@ -197,6 +197,11 @@ int main(int argc, char *argv[]) { | |||
"--online-ivectors option"); | |||
po.Register("length-tolerance", &length_tolerance, "Tolerance for " | |||
"difference in num-frames between feat and ivector matrices"); | |||
po.Register("targets-length-tolerance", &targets_length_tolerance, | |||
"Tolerance for " | |||
"difference in num-frames after subsampling between " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parenthesize (after subsampling)
po.Register("targets-length-tolerance", &targets_length_tolerance, | ||
"Tolerance for " | ||
"difference in num-frames after subsampling between " | ||
"feat and target matrices"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feat -> feature
9b655fe
to
4abdf3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some very small comments.
awk '{ dur += $2; frames += $4; } END { shift = dur / frames; if (shift > 0.0098 && shift < 0.0102) shift = 0.01; print shift; }' || exit 1; | ||
frame_shift=$(cat $dir/frame_shift) | ||
if [ -z "$frame_shift" ]; then | ||
echo "$0: Could not read get frame shift from directory $dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to be redirecting this to stderr.
@@ -7,6 +7,17 @@ | |||
# the wav.scp to perturb the volume (typically useful for training data when | |||
# using systems that don't have cepstral mean normalization). | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how much testing have you done with this? Can you please test it with directories that have pipes and those that have raw wav files, and make sure it doesn't break in a very obvious way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested for various cases. Also addressed all other comments.
src/nnet3/nnet-example.h
Outdated
|
||
/// This constructor creates NnetIo with name "name", indexes with n=0, x=0, | ||
/// and t values ranging from t_begin to t_begin + feats.NumRows() - 1, and | ||
/// and t values ranging from t_begin to t_begin + t_stride * feats.NumRows() - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need parenthesis in excpression (also below), and watch line length.
// of Indexes in the output of the eg), and put it in "output". | ||
void Compute(const NnetExample &eg, Matrix<BaseFloat> *output) { | ||
// of Indexes in the output with the name 'output_name' of the eg), | ||
// and put it in "output". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pre-existing problem, but please change "output" to '*output'.
|
||
ParseOptions po(usage); | ||
po.Register("binary", &binary_write, "Write output in binary mode"); | ||
po.Register("apply-exp", &apply_exp, "If true, apply exp function to " | ||
"output"); | ||
po.Register("output-name", &output_name, "Do computation for " | ||
"specified output"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output -> output-node.
src/nnet3bin/nnet3-get-egs.cc
Outdated
@@ -179,10 +177,10 @@ int main(int argc, char *argv[]) { | |||
ParseOptions po(usage); | |||
|
|||
po.Register("compress", &compress, "If true, write egs with input features " | |||
"in compressed format (recommended). Update: this is now " | |||
"in compressed format (recommended). Update: this is " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove 'Update: '
If there are no more changes, I can rebase and create a cleaner PR. |
I forgot about this. It's OK, I'm going to squash and merge. |
* 'master' of https://github.com/kaldi-asr/kaldi: [egs] Update tedlium and aspire recipes to use xconfigs (avoid crash) (kaldi-asr#1790) [egs] fix bugs in Multi-database English LVCSR recipe (kaldi-asr#1785) [src,egs,scripts] Cosmetic and other minor fixes, some required for segmentation PR (kaldi-asr#1784) [egs] update multi_condition script in swbd (kaldi-asr#1788) [src] Circumvent Visual Studio 2017 bug regarding name resolution (kaldi-asr#1783) [tools] adding phonetisaurus install scripts (PR#1734) [scripts] update nnet3 scripts to fix bug where rejecting 'bad' models was not happening (kaldi-asr#1777) [src] make dithering in feature processing more efficient by using random state, thanks: liximin244@gmail.com
No description provided.