Is the TIMIT fbank result OK? And how to add features such as delta-deltas? #59

Open

zhangjiulong opened this issue Jun 6, 2016 · 16 comments

@zhangjiulong

Hi, I tested TIMIT data using Eesen, but the result is not good, as follows:

training process

EPOCH 11 RUNNING ... ENDS [2016-Jun-6 17:02:47]: lrate 4e-05, TRAIN ACCURACY 23.4300%, VALID ACCURACY 17.3147%
EPOCH 12 RUNNING ... ENDS [2016-Jun-6 17:07:02]: lrate 4e-05, TRAIN ACCURACY 25.2924%, VALID ACCURACY 16.1223%
EPOCH 13 RUNNING ... ENDS [2016-Jun-6 17:11:18]: lrate 4e-05, TRAIN ACCURACY 26.1150%, VALID ACCURACY 18.4033%
EPOCH 14 RUNNING ... ENDS [2016-Jun-6 17:15:33]: lrate 4e-05, TRAIN ACCURACY 26.6806%, VALID ACCURACY 19.5179%
EPOCH 15 RUNNING ... ENDS [2016-Jun-6 17:19:51]: lrate 4e-05, TRAIN ACCURACY 27.1350%, VALID ACCURACY 18.6625%
EPOCH 16 RUNNING ... ENDS [2016-Jun-6 17:24:07]: lrate 2e-05, TRAIN ACCURACY 27.4092%, VALID ACCURACY 20.1400%
EPOCH 17 RUNNING ... ENDS [2016-Jun-6 17:28:23]: lrate 1e-05, TRAIN ACCURACY 27.5363%, VALID ACCURACY 20.2177%
finished, too small rel. improvement .0777
Training succeeded. The final model exp/train_phn_l5_c320/final.nnet
Removing features tmpdir exp/train_phn_l5_c320/ptrXL @ pingan-nlp-001
cv.ark  train.ark

testing process

rjb1_sx64-0000000-0000248 out-moded 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjb1_sx64-0000000-0000248 is 0.454562 over 246 frames.
mrjh0_sa1-0000000-0000385 she had your dark suit in greasy wash water all 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa1-0000000-0000385 is 0.577131 over 383 frames.
mrjh0_sa2-0000000-0000317 how ask me to carry an oily rag like 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_sa2-0000000-0000317 is 0.483511 over 315 frames.
mrjh0_si1145-0000000-0000487 how unauthentic 
LOG (latgen-faster:RebuildRepository():determinize-lattice-pruned.cc:294) Rebuilding repository.
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1145-0000000-0000487 is 0.258022 over 485 frames.
mrjh0_si1775-0000000-0000306 how unauthentic 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si1775-0000000-0000306 is 0.384129 over 304 frames.
mrjh0_si515-0000000-0000296 out-moded 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mrjh0_si515-0000000-0000296 is 0.429838 over 294 frames.
mrjh0_sx155-0000000-0000394 how unauthentic

I checked the ark files of the TIMIT and TEDLIUM data and found some differences, but I do not know where the difference comes from.
The TEDLIUM ark file looks like this:

AlGore_2009  [
  510340.6 586395.1 608272.1 621239.9 642546.4 653072.2 651401.9 651305.8 653922.6 659371.4 654681.1 652654.5 646230.6 645681.9 650887.6 655483.5 666377.6 671666.1 672115.6 669366.7 669373.2 681050.7 703447.4 715073.2 709013.8 702928.3 713154.4 718430.6 711170 688705.3 658752.9 641324.2 630078.5 628411.7 623944.6 627934.9 639849.6 641777.4 643522.4 627100.5 39020
  6946354 9087419 9763794 1.018412e+07 1.091917e+07 1.127568e+07 1.123372e+07 1.124698e+07 1.134869e+07 1.154412e+07 1.137156e+07 1.1279e+07 1.104712e+07 1.103819e+07 1.121266e+07 1.137482e+07 1.174376e+07 1.193318e+07 1.195034e+07 1.184173e+07 1.18378e+07 1.224044e+07 1.303755e+07 1.345864e+07 1.321482e+07 1.298168e+07 1.337751e+07 1.358918e+07 1.332344e+07 1.25133e+07 1.146994e+07 1.091216e+07 1.056395e+07 1.05361e+07 1.041939e+07 1.053006e+07 1.088328e+07 1.093435e+07 1.097803e+07 1.044419e+07 0 ]

And the TIMIT one looks like this:

fadg0_sa1  [
  3077.437 3576.837 3893.808 4497.17 4646.433 4888.595 5084.933 5245.375 5266.312 5316.513 5304.906 5279.905 5159.947 5092.513 5093.656 5096.891 5198.106 5342.096 5525.816 5622.102 5590.077 5587.714 5621.955 5658.111 5640.733 5684.978 5922.412 6028.531 5843.909 5494.285 5123.665 4873.254 4768.456 4619.075 4454.212 4446.68 4533.783 4809.863 5073.438 5097.519 372
  28369.65 38061.98 44509.9 59787.96 63547.87 70383.9 75846.95 80695.33 82071.57 83632.43 82730.72 81498.48 78174.86 76341.12 75977.55 75682.39 78059.57 82118.61 87383.28 90191.3 89340.34 89230.34 90614.35 91722.68 90768.06 91814.1 99787.2 103876.6 97762.07 85880.71 74550.43 67565.18 64682.43 60528.35 56254.4 56227.67 58352.52 65413.38 72421.16 72856.04 0 ]

But the script is the same as TEDLIUM's (I just modified the existing code); the diff looks like this:

91c91
<      || exit 208;
---
>      || exit 1;
106c106
<      || exit 209;
---
>      || exit 1;

And the script is as follows:

#!/bin/bash 

# Copyright 2012  Karel Vesely  Johns Hopkins University (Author: Daniel Povey)
# Apache 2.0
# To be run from .. (one directory up from here)
# see ../run.sh for example

# Begin configuration section.
nj=4
cmd=run.pl
fbank_config=conf/fbank.conf
compress=true
# End configuration section.

echo "$0 $@"  # Print the command line for logging

if [ -f path.sh ]; then . ./path.sh; fi
. parse_options.sh || exit 1;

if [ $# != 3 ]; then
   echo "usage: make_fbank.sh [options] <data-dir> <log-dir> <path-to-fbankdir>";
   echo "options: "
   echo "  --fbank-config <config-file>                      # config passed to compute-fbank-feats "
   echo "  --nj <nj>                                        # number of parallel jobs"
   echo "  --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
   exit 1;
fi

data=$1
logdir=$2
fbankdir=$3


# make $fbankdir an absolute pathname.
fbankdir=`perl -e '($dir,$pwd)= @ARGV; if($dir!~m:^/:) { $dir = "$pwd/$dir"; } print $dir; ' $fbankdir ${PWD}`

# use "name" as part of name of the archive.
name=`basename $data`

mkdir -p $fbankdir || exit 1;
mkdir -p $logdir || exit 1;

if [ -f $data/feats.scp ]; then
  mkdir -p $data/.backup
  echo "$0: moving $data/feats.scp to $data/.backup"
  mv $data/feats.scp $data/.backup
fi

scp=$data/wav.scp

required="$scp $fbank_config"

for f in $required; do
  if [ ! -f $f ]; then
    echo "make_fbank.sh: no such file $f"
    exit 1;
  fi
done

utils/validate_data_dir.sh --no-text --no-feats $data || exit 1;

if [ -f $data/spk2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/spk2warp"
  vtln_opts="--vtln-map=ark:$data/spk2warp --utt2spk=ark:$data/utt2spk"
elif [ -f $data/utt2warp ]; then
  echo "$0 [info]: using VTLN warp factors from $data/utt2warp"
  vtln_opts="--vtln-map=ark:$data/utt2warp"
fi

for n in $(seq $nj); do
  # the next command does nothing unless $fbankdir/storage/ exists, see
  # utils/create_data_link.pl for more info.
  utils/create_data_link.pl $fbankdir/raw_fbank_$name.$n.ark  
done

if [ -f $data/segments ]; then
  echo "$0 [info]: segments file exists: using that."
  split_segments=""
  for n in $(seq $nj); do
    split_segments="$split_segments $logdir/segments.$n"
  done

  utils/split_scp.pl $data/segments $split_segments || exit 1;
  rm $logdir/.error.$name 2>/dev/null

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    extract-segments scp,p:$scp $logdir/segments.JOB ark:- \| \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config ark:- ark:- \| \
    copy-feats --compress=$compress ark:- \
     ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
     || exit 208;

else
  echo "$0: [info]: no segments file exists: assuming wav.scp indexed by utterance."
  split_scps=""
  for n in $(seq $nj); do
    split_scps="$split_scps $logdir/wav.$n.scp"
  done

  utils/split_scp.pl $scp $split_scps || exit 1;

  $cmd JOB=1:$nj $logdir/make_fbank_${name}.JOB.log \
    compute-fbank-feats $vtln_opts --verbose=2 --config=$fbank_config scp,p:$logdir/wav.JOB.scp ark:- \| \
    copy-feats --compress=$compress ark:- \
     ark,scp:$fbankdir/raw_fbank_$name.JOB.ark,$fbankdir/raw_fbank_$name.JOB.scp \
     || exit 209;

fi


if [ -f $logdir/.error.$name ]; then
  echo "Error producing fbank features for $name:"
  tail $logdir/make_fbank_${name}.1.log
  exit 1;
fi

# concatenate the .scp files together.
for n in $(seq $nj); do
  cat $fbankdir/raw_fbank_$name.$n.scp || exit 1;
done > $data/feats.scp

rm $logdir/wav.*.scp  $logdir/segments.* 2>/dev/null

nf=`cat $data/feats.scp | wc -l` 
nu=`cat $data/utt2spk | wc -l` 
if [ $nf -ne $nu ]; then
  echo "It seems not all of the feature files were successfully ($nf != $nu);"
  echo "consider using utils/fix_data_dir.sh $data"
fi

echo "Succeeded creating filterbank features for $name"

Is there something wrong?
And what do "out-moded" and "journalese" mean?

LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx340-0000000-0000242 is 0.466467 over 240 frames.
mbns0_sx430-0000000-0000343 out-moded 
LOG (latgen-faster:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:111) Log-like per frame for utterance mbns0_sx430-0000000-0000343 is 0.430763 over 341 frames.
mbns0_sx70-0000000-0000119 journalese 
@fmetze
Contributor

fmetze commented Jun 6, 2016

Hi,

my guess is that you will need to reduce the number of parameters in the model - l=5 and c=320 are good settings for Switchboard and TEDLIUM, with hundreds of hours of training data, but not for TIMIT, with just a few. The difference in the ark files shows this (somewhat): the TIMIT speakers are much shorter than the TEDLIUM speakers, and therefore the sum and sum-of-squares of the data for each speaker are much smaller (which is what I think you're showing). Finally, during decoding, you can see that the network likes to output "out-moded" and "journalese" for some reason. Presumably you are still using the TEDLIUM language model?
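
A quick sanity check, as a sketch: the two rows per key above are the per-dimension sums and sums of squares, with the frame count as the last entry of the first row. Assuming you paste just the two numeric rows into a file stats.txt, something like this recovers the frame count, mean, and variance of the first filterbank dimension:

awk 'NR==1 {sum=$1; n=$NF}
     NR==2 {m=sum/n; print "frames:", n, "mean:", m, "var:", $1/n - m*m}' stats.txt

That shows 372 frames for your TIMIT utterance versus 39020 for the TEDLIUM speaker, which is the size difference I mean.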

Do you know someone who is familiar with the Kaldi TIMIT recipe? I think you need to adapt the Eesen recipe a bit more for it to give good results; the Kaldi TIMIT recipe would probably be a good starting point to see what is being done.

Florian


@zhangjiulong
Author

Hi @fmetze, thanks for your suggestion; I will try it.
But the language model I used is built from the TIMIT text, and the result is still very strange.

@yajiemiao
Collaborator

A word-based language model built on TIMIT is relatively weak.
I recommend composing a phone language model instead. You plug in a fake dictionary that simply contains each phone mapped to itself:
A A
B B
....
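
For example, a sketch (the paths are illustrative, assuming an Eesen-style units.txt whose first field on each line is a phone symbol):

awk '{print $1, $1}' data/lang_phn/units.txt > data/local/dict_phn/lexicon.txt

Then build the language model on the phone transcripts instead of the word transcripts.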

@zhangjiulong
Author

@yajiemiao, do you mean evaluating the phones Eesen recognized, not the words?

@yajiemiao
Collaborator

Yep.
My very first verification of EESEN was done on TIMIT; I was able to get reasonable (if not state-of-the-art) phone error rates.

@zhangjiulong
Author

@yajiemiao ok thanks very much.

@double22a

@fmetze
Hi fmetze,
Have you run any experiments with EESEN based on uni-directional LSTMs? My uni-LSTM results are terrible.

@fmetze
Contributor

fmetze commented Aug 30, 2016

We have not run such experiments. I think there is some work on how to build uni-directional LSTMs that work for speech (mainly by stacking future frames rather than relying on the RNN to learn them), or on decomposing the sentence-level BiLSTM into a series of shorter BiLSTMs that can be evaluated quickly, but we have not implemented any of this in Eesen. It would be a great feature, though ;-)
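
The frame-stacking idea, just as a sketch and not something we have in Eesen: Kaldi's splice-feats can append future frames to each input frame, so a uni-directional network still sees some right context:

splice-feats --left-context=0 --right-context=5 \
  scp:data/train/feats.scp ark:- | \
  copy-feats ark:- ark,scp:feats_spliced.ark,feats_spliced.scp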


@yajiemiao
Collaborator

In general, CTC depends heavily on BiLSTMs for reasonable performance. If you refer to http://www.cs.cmu.edu/~ymiao/pub/icassp2016_ctc.pdf: on Switchboard, uni-directional models perform >15% worse than bi-directional models with the same number of model parameters.

@Aasimrafique

@yajiemiao @zhangjiulong, could you please share the example you tested with the TIMIT dataset?

@zhangjiulong
Author

I just converted the TIMIT data to STM format and ran it using the TEDLIUM scripts.
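
Roughly like this; a sketch from memory, untested as written (the $timit path, file-name case, and directory depth are guesses to check against your TIMIT copy). Each TIMIT .txt file holds "<start_sample> <end_sample> <transcript>" at 16 kHz, and an STM line is "<id> <channel> <speaker> <begin> <end> <transcript>":

for f in $timit/*/*/*/*.txt; do
  spk=$(basename $(dirname $f))
  utt=${spk}_$(basename $f .txt)
  awk -v u=$utt -v s=$spk \
    '{printf "%s 1 %s %.2f %.2f", u, s, $1/16000, $2/16000;
      for (i = 3; i <= NF; i++) printf " %s", tolower($i); print ""}' $f
done > timit.stm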

@razor1179

razor1179 commented Feb 28, 2017

@Aasimrafique, were you able to convert the TIMIT data to STM format as instructed by @zhangjiulong? If so, could you please share exactly how you did it?
@yajiemiao, @fmetze, it would be very helpful if you could share a TIMIT dataset test.

Thanks.

@riebling
Contributor

it would be very helpful if you could share a TIMIT dataset test.

Unfortunately, as mentioned in Wikipedia:
TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset.
We are not permitted to distribute TIMIT data.

@razor1179

@riebling, I forgot to add "scripts" at the end. I do have access to the TIMIT dataset; what I meant to ask was whether the TIMIT test scripts could be shared.

@riebling
Contributor

Oops, my misunderstanding. My best guess is that, at least here at CMU, there is no TIMIT Eesen experiment to share. The only person who seems to have tried this (aside from Yajie, who is no longer with us) is @zhangjiulong.

Florian suggests people try adapting the Kaldi TIMIT experiment. This does not imply that he has done so ("We have not run such experiments") or therefore has any scripts to share.

@razor1179

razor1179 commented Mar 29, 2017

@riebling Okay, I see. But I did create a new issue, #128, describing what I've done and the issues I am facing. Could you please suggest how I could move forward?
