Recipe for African Accented French #2813

Open
wants to merge 36 commits into
base: master
Commits
be8c443
Initial commit.
johnjosephmorgan Nov 1, 2018
a4204c9
Moved results to RESULTS file.
johnjosephmorgan Nov 2, 2018
845bd47
Corrected name of transcript file for yaounde answers training data.
johnjosephmorgan Nov 2, 2018
078889d
I had made some changes to the corpus transcripts that are not in the…
johnjosephmorgan Nov 2, 2018
413f0bc
The transcript file for the yaounde answers that is currently on open…
johnjosephmorgan Nov 2, 2018
055ca6e
Add decoding with large lm rescoring for monophones?
johnjosephmorgan Nov 20, 2018
dd49f32
Fixed bug. eq instead of == in string comparison.
johnjosephmorgan Nov 20, 2018
8f23a41
Changed comments to echoes.
johnjosephmorgan Nov 20, 2018
439dfbd
Do not exit after phonetisaurus align ? output.
johnjosephmorgan Nov 20, 2018
7fe5495
Experiment with lower bottleneck dimension. Lowered from 128 to 96.
johnjosephmorgan Nov 21, 2018
071d7f4
Merge.
johnjosephmorgan Nov 21, 2018
75e3846
Lower l2 regularization from 0.03 to 0.02.
johnjosephmorgan Nov 21, 2018
688f768
Added clauses to stage conditionals to allow small lm decoding exclus…
johnjosephmorgan Nov 26, 2018
588d5b7
Added clauses to stage conditionals to allow small lm decoding exclus…
johnjosephmorgan Nov 26, 2018
3efc399
More fixing stage conditionals for small lm decoding only.
johnjosephmorgan Nov 27, 2018
35f23b5
Inserted a wait before decoding tri3b models with larger lms. The dec…
johnjosephmorgan Nov 28, 2018
8230717
Inserted a wait before decoding with larger LMs and enhanced lexicon.
johnjosephmorgan Nov 28, 2018
1ee6068
added another tuning script for trying to improve the chain model WER…
johnjosephmorgan Nov 28, 2018
77436d8
Updated experiment script.
johnjosephmorgan Nov 28, 2018
76e9ed6
Updated experiment script.
johnjosephmorgan Nov 28, 2018
b45cf38
Updated experiment script.
johnjosephmorgan Nov 28, 2018
0786dfc
Added a script to experiment with number of epochs.
johnjosephmorgan Nov 29, 2018
38115e6
Update experiment script.
johnjosephmorgan Nov 29, 2018
c615a70
Corrected comparison. I ran it too early. Fewer epochs gave better re…
johnjosephmorgan Nov 29, 2018
a485acf
Added current results. They look bad. Why are they so bad?
johnjosephmorgan Nov 29, 2018
569eda1
Minor cleaning.
johnjosephmorgan Nov 29, 2018
7d5b1e3
Put tri2b models back.
johnjosephmorgan Nov 30, 2018
682f49c
Adding Yenda's scripts for kws.
johnjosephmorgan Dec 1, 2018
e1fe905
Adding a script to do l2 regularization tuning experiments.
johnjosephmorgan Dec 1, 2018
f49a67b
Adjusted number of leaves and gaussians after experimenting.
johnjosephmorgan Dec 3, 2018
652f04b
Add pruned lm modeling.
johnjosephmorgan Dec 3, 2018
0f1824c
I am simplifying the directory and file names in this recipe.
johnjosephmorgan Jan 7, 2019
c2946e5
Simplifying.
johnjosephmorgan Mar 6, 2019
1081fbd
Update.
johnjosephmorgan Mar 6, 2019
ee5726a
Update.
johnjosephmorgan Mar 6, 2019
334ab41
No small medium or large LMs.
johnjosephmorgan Mar 7, 2019
15 changes: 15 additions & 0 deletions egs/yaounde_fr/s5/README.txt
@@ -0,0 +1,15 @@
Recipe for the African Accented French Corpus

This recipe follows the pattern of the mini_librispeech recipe.
It is built using the African Accented French Corpus available from the Open Speech and Language Resources repository.

Information about the corpus is at:

http://www.openslr.org/57

This recipe uses about 11 hours of speech from the corpus for training, about 1.5 hours for development and about 20 minutes for testing.
Most of the speakers are from Cameroon.
The corpus also includes recordings of speakers from Chad, Congo, and Gabon.

All of the data resources required to run this recipe are freely available on the web.
In addition to the speech data, the recipe uses the CMUSphinx French lexicon as a pronouncing dictionary and the French OpenSubtitles text corpus to build language models.
15 changes: 15 additions & 0 deletions egs/yaounde_fr/s5/cmd.sh
@@ -0,0 +1,15 @@
# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances of 'queue.pl' to 'run.pl' (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine). queue.pl works with GridEngine (qsub). slurm.pl works
# with slurm. Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration. Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
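For a machine without a queueing system, the comment in cmd.sh suggests swapping queue.pl for run.pl. A minimal sketch of that variant follows. It is not part of this PR, and it assumes all three commands run on the local host; the queue-specific `--mem` options are left out for simplicity.

```shell
# Hypothetical local-machine variant of cmd.sh (an assumption, not from the PR).
# run.pl executes jobs on the local host instead of submitting them to a queue.
export train_cmd="run.pl"
export decode_cmd="run.pl"
export mkgraph_cmd="run.pl"
```

As the original comment warns, running every job locally can exhaust memory, so it may be safer to run the recipe stage by stage.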
1 change: 1 addition & 0 deletions egs/yaounde_fr/s5/conf/decode.config
@@ -0,0 +1 @@
# empty config, just use the defaults.
1 change: 1 addition & 0 deletions egs/yaounde_fr/s5/conf/mfcc.conf
@@ -0,0 +1 @@
--use-energy=false # only non-default option.
10 changes: 10 additions & 0 deletions egs/yaounde_fr/s5/conf/mfcc_hires.conf
@@ -0,0 +1,10 @@
# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false # use average of log energy, not energy.
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
# there might be some information at the low end.
--high-freq=-400 # high cutoff frequency, relative to Nyquist of 8000 (=7600)
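The "(=7600)" in the last comment comes from treating a non-positive `--high-freq` as an offset from the Nyquist frequency. A small sketch of that arithmetic, assuming 16 kHz audio (the sampling rate is inferred from the "Nyquist of 8000" remark, not stated in the config):

```shell
# Assumed sampling rate: 16 kHz, implied by the "Nyquist of 8000" comment.
sample_rate=16000
nyquist=$(( sample_rate / 2 ))
high_freq=-400   # value taken from mfcc_hires.conf

# A non-positive --high-freq is read as an offset from the Nyquist frequency.
if [ "$high_freq" -le 0 ]; then
  effective_high=$(( nyquist + high_freq ))
else
  effective_high=$high_freq
fi
echo "effective high cutoff: ${effective_high} Hz"
```

With these values the mel bins span roughly 20 Hz to 7600 Hz.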
1 change: 1 addition & 0 deletions egs/yaounde_fr/s5/conf/online_cmvn.conf
@@ -0,0 +1 @@
# configuration file for apply-cmvn-online, used in the script ../local/run_online_decoding.sh
26 changes: 26 additions & 0 deletions egs/yaounde_fr/s5/local/aafr_download.sh
@@ -0,0 +1,26 @@
#!/bin/bash

# Copyright 2018 John Morgan
# Apache 2.0.

speech=$1

# where to put the downloaded speech corpus
download_dir=$(pwd)
data_dir=$download_dir/African_Accented_French

# download the corpus from openslr
if [ ! -f "$download_dir/aafr.tar.gz" ]; then
  wget -O "$download_dir/aafr.tar.gz" "$speech"
else
  echo "$0: The corpus $speech was already downloaded."
fi

if [ ! -d "$data_dir" ]; then
  (
    cd "$download_dir"
    tar -xzf aafr.tar.gz
  )
else
  echo "$0: The corpus was already unzipped."
fi
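The two guards in aafr_download.sh make the script safe to rerun: the download and the extraction are each skipped when their result already exists. A minimal sketch of the extraction guard as a reusable function; `extract_once` is a hypothetical helper introduced here for illustration and is not part of the recipe.

```shell
# extract_once ARCHIVE TARGET_DIR
# Hypothetical helper: unpack ARCHIVE only if TARGET_DIR does not exist yet.
extract_once() {
  archive=$1   # path to the .tar.gz file
  target=$2    # directory that exists once the archive is unpacked
  if [ -d "$target" ]; then
    echo "$target: already unpacked"
    return 0
  fi
  # Unpack next to the archive so the target appears alongside it.
  tar -xzf "$archive" -C "$(dirname "$archive")" && echo "$target: unpacked"
}
```

A call such as `extract_once aafr.tar.gz African_Accented_French` would unpack on the first run and do nothing on later runs.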
87 changes: 87 additions & 0 deletions egs/yaounde_fr/s5/local/ca16_conv/make_lists.pl
@@ -0,0 +1,87 @@
#!/usr/bin/env perl

# Copyright 2018 John Morgan
# Apache 2.0.

# make_lists.pl - write lists for acoustic model training
# writes files under data/local/tmp/ca16conv_train/lists
# This script associates a .wav file with a transcript.

use strict;
use warnings;
use Carp;

BEGIN {
@ARGV == 1 or croak "USAGE: $0 <DATA_SRC_DIR>
Example:
$0 African_Accented_French
";
}

use File::Spec;
use File::Copy;
use File::Basename;

my ($d) = @ARGV;

# initialize variables
my $tmpdir = "data/local/tmp/ca16conv_train";
my $transcripts = "$d/transcripts/train/ca16_conv/transcripts.txt";
# input wav file list
my $w = "$tmpdir/wav_list.txt";
# output temporary wav.scp file
my $o = "$tmpdir/lists/wav.scp";
# output temporary utt2spk file
my $u = "$tmpdir/lists/utt2spk";
# output temporary text files
my $t = "$tmpdir/lists/text";
# initialize hash for transcripts
my %transcript = ();
# done setting variables

system "mkdir -p $tmpdir/lists";
open my $TRANS, '<', $transcripts or croak "problem with $transcripts $!";
# store prompts in hash
LINEA: while ( my $line = <$TRANS> ) {
chomp $line;
my ($j,$sent) = split /\s/, $line, 2;
my ($volume,$directories,$file) = File::Spec->splitpath( $j );
my @dirs = split /\//, $directories;
my $b = basename $file, '.tdf';
my ($x,$d,$s,$y,$i) = split /\_/, $b, 5;
my $bn = 'gabonconv_' . $s . '_' . $i;
# dashes?
$sent =~ s/(\w)(\p{dash_punctuation}+?)/$1 $2/g;
$transcript{$bn} = $sent;
}
close $TRANS;

open my $W, '<', $w or croak "problem with $w $!";
open my $O, '+>', $o or croak "problem with $o $!";
open my $U, '+>', $u or croak "problem with $u $!";
open my $T, '+>', $t or croak "problem with $t $!";

LINE: while ( my $line = <$W> ) {
chomp $line;
my ($volume,$directories,$file) = File::Spec->splitpath( $line );
my @dirs = split /\//, $directories;
my $r = basename $line, ".wav";
my ($x,$d,$s,$y,$i) = split /\_/, $r, 5;
my $speaker = $dirs[-1];

my $bn = 'gabonconv_' . $s . '_' . $i;

# only work with utterances in transcript file
if ( exists $transcript{$bn} ) {
my $fn = $bn . ".wav";
print $T "$bn $transcript{$bn}\n";
print $O "$bn sox $line -t .wav - |\n";
print $U "$bn gabonconv_${s}\n";
} else {
# warn "no transcript for $line";
}
}
close $T;
close $O;
close $U;
close $W;
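The utterance ID in make_lists.pl is built by splitting the file's base name on underscores into at most five fields and keeping the third (speaker) and fifth (index). A sketch of the same derivation in shell; the function name `utt_id` and the sample file name are made up for illustration and do not come from the corpus.

```shell
# utt_id WAV_PATH
# Mirrors the Perl logic: split the base name on "_" into five fields
# and join the third and fifth with the 'gabonconv_' prefix.
utt_id() {
  base=$(basename "$1" .wav)
  # read assigns any remaining text (including "_") to the last variable,
  # which matches Perl's split with a limit of 5.
  IFS=_ read -r x d s y i <<EOF
$base
EOF
  printf 'gabonconv_%s_%s\n' "$s" "$i"
}

utt_id /corpus/spk01/aa_bb_spk01_cc_0007.wav
```

Because the speaker field is part of the ID prefix, utterance IDs sort together by speaker, which is the ordering Kaldi data directories expect.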
16 changes: 16 additions & 0 deletions egs/yaounde_fr/s5/local/ca16_conv/prepare_data.sh
@@ -0,0 +1,16 @@
#!/bin/bash

# Copyright 2018 John Morgan
# Apache 2.0.

# set variables
datadir=$1
speech_datadir=$datadir/speech/train/ca16
tmpdir=data/local/tmp/ca16conv_train
# end setting variables

mkdir -p $tmpdir
find $speech_datadir -type f -name "*.wav" | grep conv > $tmpdir/wav_list.txt
local/ca16_conv/make_lists.pl $datadir
utils/utt2spk_to_spk2utt.pl $tmpdir/lists/utt2spk > $tmpdir/lists/spk2utt
utils/fix_data_dir.sh $tmpdir/lists
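The spk2utt file is the inverse of utt2spk: one line per speaker, followed by that speaker's utterance IDs. The awk sketch below illustrates the mapping only; it is a stand-in written for this note, not the implementation of `utils/utt2spk_to_spk2utt.pl`, and the function name `invert_utt2spk` is hypothetical.

```shell
# invert_utt2spk UTT2SPK_FILE
# Each input line is "<utterance-id> <speaker-id>"; the output groups
# utterance IDs by speaker, one speaker per line, sorted by speaker.
invert_utt2spk() {
  awk '{ utts[$2] = utts[$2] " " $1 }
       END { for (spk in utts) print spk utts[spk] }' "$1" | sort
}
```

Utterances keep their input order within each speaker, so sorting utt2spk first gives a fully sorted spk2utt.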
80 changes: 80 additions & 0 deletions egs/yaounde_fr/s5/local/ca16_read_devtest/make_lists.pl
@@ -0,0 +1,80 @@
#!/usr/bin/env perl

# Copyright 2018 John Morgan
# Apache 2.0.

# make_lists.pl - write lists for the ca16 read devtest set
# writes files under data/local/tmp/ca16read_devtest/lists

use strict;
use warnings;
use Carp;

BEGIN {
@ARGV == 1 or croak "USAGE: $0 <DATA_DIR>
Example:
$0 African_Accented_French";
}

use File::Spec;
use File::Copy;
use File::Basename;

my ($d) = @ARGV;

# initialize variables
my $tmpdir = "data/local/tmp/ca16read_devtest";
my $p = "$d/transcripts/devtest/ca16_read/conditioned.txt";
# input wav file list
my $wav_list = "$tmpdir/wav_list.txt";
# output temporary wav.scp files
my $wav_scp = "$tmpdir/lists/wav.scp";
# output temporary utt2spk files
my $u = "$tmpdir/lists/utt2spk";
# output temporary text files
my $t = "$tmpdir/lists/text";
# initialize hash for prompts
my %p = ();
# done setting variables

system "mkdir -p $tmpdir/lists";
open my $P, '<', $p or croak "problem with $p $!";
# store prompts in hash
LINEA: while ( my $line = <$P> ) {
chomp $line;
my ($j,$sent) = split /\s/, $line, 2;
my ($x,$d,$s,$y,$i) = split /\_/, $j, 5;
my $bn = 'gabonread_' . $s . '_' . $i;
# dashes?
$sent =~ s/(\w)(\p{dash_punctuation}+?)/$1 $2/g;
$p{$bn} = $sent;
}
close $P;

open my $WAVLIST, '<', $wav_list or croak "problem with $wav_list $!";
open my $WAVSCP, '+>', $wav_scp or croak "problem with $wav_scp $!";
open my $U, '+>', $u or croak "problem with $u $!";
open my $T, '+>', $t or croak "problem with $t $!";

LINE: while ( my $line = <$WAVLIST> ) {
chomp $line;
my ($volume,$directories,$file) = File::Spec->splitpath( $line );
my @dirs = split /\//, $directories;
my $r = basename $line, ".wav";
my ($x,$d,$s,$y,$i) = split /\_/, $r, 5;
my $speaker = $dirs[-1];
my $bn = 'gabonread_' . $s . '_' . $i;
# only work with utterances in transcript file
if ( exists $p{$bn} ) {
my $fn = $bn . ".wav";
print $T "$bn $p{$bn}\n";
print $WAVSCP "$bn sox $line -t .wav - |\n";
print $U "$bn gabonread_${s}\n";
} else {
# warn "no transcript for $line";
}
}
close $T;
close $WAVSCP;
close $U;
close $WAVLIST;
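The substitution under the "# dashes?" comment inserts a space between a word character and a dash that follows it, so the dash is no longer glued to the preceding word. A simplified sketch with sed, assuming only the ASCII hyphen matters (the Perl pattern covers every Unicode dash punctuation character); `separate_dashes` is a hypothetical name.

```shell
# separate_dashes: read text on stdin and put a space before any hyphen
# that directly follows a letter, digit, or underscore.
# Simplification: handles only the ASCII "-", unlike \p{dash_punctuation}.
separate_dashes() {
  sed -E 's/([[:alnum:]_])(-)/\1 \2/g'
}

printf 'dis-moi\n' | separate_dashes
```

The hyphen stays attached to the following word, which matches what the Perl replacement `$1 $2` does.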
30 changes: 30 additions & 0 deletions egs/yaounde_fr/s5/local/ca16_read_devtest/prepare_data.sh
@@ -0,0 +1,30 @@
#!/bin/bash

# Copyright 2018 John Morgan
# Apache 2.0.

# ca16 read devtest prep

if [ $# -ne 1 ]; then
  echo "usage: $0 <CORPUS_DIRECTORY>
example:
$0 African_Accented_French";
  exit 1
fi

# set variables
datadir=$1
speech_datadir=$datadir/speech/devtest/ca16
tmpdir=data/local/tmp/ca16read_devtest
# done setting variables

mkdir -p $tmpdir
# get a list of the ca16 read devtest .wav files
find $speech_datadir -type f -name "*.wav" | grep read > $tmpdir/wav_list.txt
# make ca16 read devtest lists
local/ca16_read_devtest/make_lists.pl $datadir
utils/fix_data_dir.sh $tmpdir/lists
mkdir -p data/devtest
for x in spk2utt text utt2spk wav.scp; do
cp $tmpdir/lists/$x data/devtest/
done
85 changes: 85 additions & 0 deletions egs/yaounde_fr/s5/local/ca16_read_train/make_lists.pl
@@ -0,0 +1,85 @@
#!/usr/bin/env perl

# Copyright 2018 John Morgan
# Apache 2.0.

# make_lists.pl - write lists for acoustic model training
# writes files under data/local/tmp/ca16read_train/lists

use strict;
use warnings;
use Carp;

BEGIN {
@ARGV == 1 or croak "USAGE: $0 <DATA_DIR>
Example:
$0 African_Accented_French
";
}

use File::Spec;
use File::Copy;
use File::Basename;

my ($d) = @ARGV;
# Initialize variables
my $tmpdir = "data/local/tmp/ca16read_train";
my $p = "$d/transcripts/train/ca16_read/conditioned.txt";
# input wav file list
my $wav_list = "$tmpdir/wav_list.txt";

# output temporary wav.scp files
my $wav_scp = "$tmpdir/lists/wav.scp";

# output temporary utt2spk files
my $u = "$tmpdir/lists/utt2spk";

# output temporary text files
my $t = "$tmpdir/lists/text";

# initialize hash for prompts
my %p = ();
# done setting variables

system "mkdir -p $tmpdir/lists";
open my $P, '<', $p or croak "problem with $p $!";
# store prompts in hash
LINEA: while ( my $line = <$P> ) {
chomp $line;
my ($j,$sent) = split /\s/, $line, 2;
my ($x,$d,$s,$y,$i) = split /\_/, $j, 5;
my $bn = 'gabonread_' . $s . '_' . $i;
# dashes?
$sent =~ s/(\w)(\p{dash_punctuation}+?)/$1 $2/g;
$p{$bn} = $sent;
}
close $P;

open my $WAVLIST, '<', $wav_list or croak "problem with $wav_list $!";
open my $WAVSCP, '+>', $wav_scp or croak "problem with $wav_scp $!";
open my $U, '+>', $u or croak "problem with $u $!";
open my $T, '+>', $t or croak "problem with $t $!";

LINE: while ( my $line = <$WAVLIST> ) {
chomp $line;
my ($volume,$directories,$file) = File::Spec->splitpath( $line );
my @dirs = split /\//, $directories;
my $r = basename $line, ".wav";
my ($x,$d,$s,$y,$i) = split /\_/, $r, 5;
my $speaker = $dirs[-1];
my $bn = 'gabonread_' . $s . '_' . $i;

# only work with utterances in transcript file
if ( exists $p{$bn} ) {
my $fn = $bn . ".wav";
print $T "$bn $p{$bn}\n";
print $WAVSCP "$bn sox $line -t .wav - |\n";
print $U "$bn gabonread_${s}\n";
} else {
# warn "no transcript for $line";
}
}
close $T;
close $WAVSCP;
close $U;
close $WAVLIST;