# Add recipe for OCR task on IAM handwriting dataset (#4707)

**Merged**: 13 commits, merged on Nov 6, 2022
1 change: 1 addition & 0 deletions egs2/README.md
@@ -55,6 +55,7 @@ See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2
| how2_2000h | How2_2000h fbank features | ASR/SUM | ENG->POR | https://arxiv.org/pdf/2110.06263.pdf | |
| hub4_spanish | 1997 Spanish Broadcast News Speech | ASR | SPA | https://catalog.ldc.upenn.edu/LDC98S74 | |
| hui_acg | HUI-audio-corpus-german | TTS | DEU | https://opendata.iisys.de/datasets.html#hui-audio-corpus-german | |
+| iam | IAM Handwriting Database 3.0 | OCR | ENG | https://fki.tic.heia-fr.ch/databases/iam-handwriting-database | |
| iemocap | IEMOCAP database: The Interactive Emotional Dyadic Motion Capture database | SLU | ENG | https://sail.usc.edu/iemocap/ | |
| indic_speech | IndicSpeech: Text-to-Speech Corpus for Indian Languages | TTS | 3 indic languages | http://cvit.iiit.ac.in/research/projects/cvit-projects/text-to-speech-dataset-for-indian-languages | |
| iwslt14 | IWSLT14 MT shared task | MT | DEU->ENG | http://dl.fbaipublicfiles.com/fairseq/data/iwslt14/de-en.tgz | |
2 changes: 1 addition & 1 deletion egs2/TEMPLATE/asr1/asr.sh
@@ -552,7 +552,7 @@ if ! "${skip_data_prep}"; then
_suf=""
fi
# Generate dummy wav.scp to avoid error by copy_data_dir.sh
-<data/"${dset}"/cmvn.scp awk ' { print($1,"<DUMMY>") }' > data/"${dset}"/wav.scp
+<data/"${dset}"/feats.scp awk ' { print($1,"<DUMMY>") }' > data/"${dset}"/wav.scp
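# e.g. a feats.scp entry such as "utt1 dump/fbank/train/feats.1.ark:12" (an
# illustrative ID and path, not taken from the recipe) becomes the wav.scp
# entry "utt1 <DUMMY>"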
utils/copy_data_dir.sh --validate_opts --non-print data/"${dset}" "${data_feats}${_suf}/${dset}"

# Derive the frame length and feature dimension
1 change: 1 addition & 0 deletions egs2/TEMPLATE/asr1/db.sh
@@ -158,6 +158,7 @@ HARPERVALLEY=downloads
TALROMUR=downloads
DCASE=
TEDX_SPANISH=downloads
+IAM=downloads
OFUTON=
OPENCPOP=
M_AILABS=downloads
36 changes: 36 additions & 0 deletions egs2/iam/ocr1/README.md
@@ -0,0 +1,36 @@
This is a recipe for the IAM handwriting recognition dataset, and an experiment in using end-to-end
ASR models to solve an OCR task.

To run it, first create an account at https://fki.tic.heia-fr.ch/databases/iam-handwriting-database and fill
in the username and password in `local/data.sh`. Then run `./run.sh`.
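
The core idea is to treat a line image like a spectrogram: image columns play the role of time frames and pixel rows play the role of feature bins. Below is a minimal sketch of that conversion, assuming a grayscale line image; the function name and resizing logic are illustrative, not the actual contents of `local/data_prep.py` (which exposes the `--feature_dim` and `--downsampling_factor` parameters used below).

```python
import numpy as np
from PIL import Image


def image_to_feats(path: str, feature_dim: int = 100,
                   downsampling_factor: float = 0.5) -> np.ndarray:
    """Render a grayscale line image as an ASR-style (time, feature) matrix."""
    img = Image.open(path).convert("L")  # force grayscale
    width, height = img.size
    # Scale the height to the fixed feature dimension, and shrink the width
    # by the downsampling factor so the "frame rate" stays manageable.
    new_width = max(1, int(width * (feature_dim / height) * downsampling_factor))
    img = img.resize((new_width, feature_dim))
    pixels = np.asarray(img, dtype=np.float32) / 255.0
    # Transpose so that each image column becomes one frame of size feature_dim.
    return pixels.T  # shape: (new_width, feature_dim)
```

Matrices of this shape, written to `feats.scp`, are what allows the otherwise unmodified ASR pipeline to consume handwriting data.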


<!-- Generated by scripts/utils/show_asr_result.sh -->
# RESULTS
## Environments
- date: `Fri Oct 7 05:52:11 EDT 2022`
- python version: `3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]`
- espnet version: `espnet 202207`
- pytorch version: `pytorch 1.10.0`
- Git hash: `5a6319300231b8193f1b6e8465d572be63150119`
- Commit date: `Sat Sep 24 12:14:08 2022 -0400`

> **Review comment (Contributor):** Can you add a pre-trained model?

## asr_conformer_full_vocab
### WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|inference_asr_model_valid.acc.ave/test|2915|25932|80.3|17.4|2.3|0.9|20.6|73.4|

### CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|inference_asr_model_valid.acc.ave/test|2915|125616|93.9|4.4|1.8|0.7|6.8|73.4|

### TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|inference_asr_model_valid.acc.ave/test|2915|128531|94.0|4.3|1.7|0.7|6.7|73.4|

1 change: 1 addition & 0 deletions egs2/iam/ocr1/asr.sh
110 changes: 110 additions & 0 deletions egs2/iam/ocr1/cmd.sh
@@ -0,0 +1,110 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
#   --time <time>: Limit the maximum time to execute.
#   --mem <mem>: Limit the maximum memory usage.
#   --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
#   --num-threads <nthreads>: Specify the number of CPU cores.
#   --gpu <ngpu>: Specify the number of GPU devices.
#   --config: Change the configuration file from the default.
#
# "JOB=1:10" is used for "array jobs" and controls the number of parallel jobs.
# The string left of "=", i.e. "JOB", is replaced by <N> (the Nth job) in the command and the log file name,
# e.g. "echo JOB" becomes "echo 3" for the 3rd job and "echo 8" for the 8th job.
# Note that the range must start with a positive number, so "JOB=0:10" is not allowed.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl have a unified interface that does not depend on the backend.
# The options are mapped to backend-specific options, as configured by
# "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs fail, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
#   "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================
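#
# For example, "run.pl JOB=1:2 exp/demo.JOB.log echo JOB" runs two jobs in
# parallel, roughly equivalent to:
#   echo 1 >exp/demo.1.log &
#   echo 2 >exp/demo.2.log &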


# Select the backend used by run.sh from "local", "stdout", "sge", "pbs", "slurm", "ssh", or "jhu"
cmd_backend='local'

# Local machine, without any Job scheduling system
if [ "${cmd_backend}" = local ]; then

    # Used for the other general jobs
    export train_cmd="run.pl"
    # Used for "*_train.py": "--gpu" is appended optionally by run.sh
    export cuda_cmd="run.pl"
    # Used for "*_recog.py"
    export decode_cmd="run.pl"

# Local machine, logging to stdout and a log file, without any Job scheduling system
elif [ "${cmd_backend}" = stdout ]; then

    # Used for the other general jobs
    export train_cmd="stdout.pl"
    # Used for "*_train.py": "--gpu" is appended optionally by run.sh
    export cuda_cmd="stdout.pl"
    # Used for "*_recog.py"
    export decode_cmd="stdout.pl"


# "qsub" (Sun Grid Engine, or a derivative of it)
elif [ "${cmd_backend}" = sge ]; then
    # The default setting is written in conf/queue.conf.
    # You must change "-q g.q" to match a queue in your environment.
    # To list the queue names, type "qhost -q".
    # Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler.

    export train_cmd="queue.pl"
    export cuda_cmd="queue.pl"
    export decode_cmd="queue.pl"


# "qsub" (Torque/PBS)
elif [ "${cmd_backend}" = pbs ]; then
    # The default setting is written in conf/pbs.conf.

    export train_cmd="pbs.pl"
    export cuda_cmd="pbs.pl"
    export decode_cmd="pbs.pl"


# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
    # The default setting is written in conf/slurm.conf.
    # You must change "-p cpu" and "-p gpu" to match the partitions in your environment.
    # To list the partition names, type "sinfo".
    # You can use "--gpu *" by default for slurm, and it is interpreted as "--gres gpu:*".
    # The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

    export train_cmd="slurm.pl"
    export cuda_cmd="slurm.pl"
    export decode_cmd="slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
    # You have to create ".queue/machines" to specify the hosts on which to execute jobs.
    # e.g. .queue/machines
    #   host1
    #   host2
    #   host3
    # This assumes you can log in to them without a password, i.e. you have to set up SSH keys.

    export train_cmd="ssh.pl"
    export cuda_cmd="ssh.pl"
    export decode_cmd="ssh.pl"

# This is an example of specifying several unique options in the JHU CLSP cluster setup.
# Users can modify/add their own command options according to their cluster environments.
elif [ "${cmd_backend}" = jhu ]; then

    export train_cmd="queue.pl --mem 2G"
    export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/queue.conf"
    export decode_cmd="queue.pl --mem 4G"

else
    echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
    return 1
fi
2 changes: 2 additions & 0 deletions egs2/iam/ocr1/conf/fbank.conf
@@ -0,0 +1,2 @@
--sample-frequency=16000
--num-mel-bins=80
11 changes: 11 additions & 0 deletions egs2/iam/ocr1/conf/pbs.conf
@@ -0,0 +1,11 @@
# Default configuration
command qsub -V -v PATH -S /bin/bash
option name=* -N $0
option mem=* -l mem=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -l ncpus=$0
option num_threads=1 # Do not add anything to qsub_opts
option num_nodes=* -l nodes=$0:ppn=1
default gpu=0
option gpu=0
option gpu=* -l ngpus=$0
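# e.g. with the mappings above, "pbs.pl --num-threads 4 --gpu 1 ..." roughly
# adds "-l ncpus=4 -l ngpus=1" to the qsub command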
1 change: 1 addition & 0 deletions egs2/iam/ocr1/conf/pitch.conf
@@ -0,0 +1 @@
--sample-frequency=16000
12 changes: 12 additions & 0 deletions egs2/iam/ocr1/conf/queue.conf
@@ -0,0 +1,12 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option name=* -N $0
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
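# e.g. with the mappings above, "queue.pl --mem 4G --gpu 1 ..." roughly
# adds "-l mem_free=4G,ram_free=4G -l gpu=1 -q g.q" to the qsub command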
14 changes: 14 additions & 0 deletions egs2/iam/ocr1/conf/slurm.conf
@@ -0,0 +1,14 @@
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0  # Recommended: allocate at least as many CPUs as GPUs
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
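# e.g. with the mappings above, "slurm.pl --mem 2G --gpu 1 ..." roughly
# adds "--mem-per-cpu 2G -p gpu --gres=gpu:1 -c 1" to the sbatch command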
55 changes: 55 additions & 0 deletions egs2/iam/ocr1/conf/train_asr_conformer.yaml
@@ -0,0 +1,55 @@
# This config is adapted from the Librispeech conformer model

keep_nbest_models: 10
encoder: conformer
encoder_conf:
    output_size: 256
    attention_heads: 4
    linear_units: 1024
    num_blocks: 12
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.1
    input_layer: conv2d
    normalize_before: true
    macaron_style: true
    rel_pos_type: latest
    pos_enc_layer_type: rel_pos
    selfattention_layer_type: rel_selfattn
    activation_type: swish
    use_cnn_module: true
    cnn_module_kernel: 31

decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.1
    src_attention_dropout_rate: 0.1

model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1
    length_normalized_loss: false

batch_type: folded
batch_size: 64
accum_grad: 1
max_epoch: 200
patience: none
init: xavier_uniform
best_model_criterion:
-   - valid
    - acc
    - max

optim: adam
optim_conf:
    lr: 0.002
    weight_decay: 0.000001
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 15000
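
# Assumed usage (run.sh is not shown in this diff): this config would typically be
# selected via something like "./run.sh --asr_config conf/train_asr_conformer.yaml".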
1 change: 1 addition & 0 deletions egs2/iam/ocr1/db.sh
48 changes: 48 additions & 0 deletions egs2/iam/ocr1/local/data.sh
@@ -0,0 +1,48 @@
#!/bin/bash
# Set bash to 'debug' mode; it will exit on:
# -e 'error', -u 'undefined variable', -o ... 'error in pipeline', -x 'print commands',
set -e
set -u
set -o pipefail

log() {
    local fname=${BASH_SOURCE[1]##*/}
    echo -e "$(date '+%Y-%m-%dT%H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

SECONDS=0

. ./db.sh
. ./path.sh
. ./cmd.sh

stage=1
stop_stage=2

# Fill in username/password from account on https://fki.tic.heia-fr.ch/register
iam_username=""
iam_password=""

# Set parameters for the feature dimensions used during image feature extraction;
# see local/data_prep.py for details.
feature_dim=100
downsampling_factor=0.5

data_dir=data/

if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then
    log "Stage 1: Downloading the IAM Handwriting dataset (username: ${iam_username})"
    mkdir -p ${IAM}
    local/download_and_untar.sh ${IAM} ${iam_username} ${iam_password}
fi

if [ ${stage} -le 2 ] && [ ${stop_stage} -ge 2 ]; then
    log "Stage 2: Data preparation - generating text, utt2spk, spk2utt, and feats.scp for the train/valid/test splits"
    if [ -e ${data_dir} ]; then
        echo "Error: directory ${data_dir} already exists; to re-generate it, please remove it manually first"
        exit 1
    fi
    python local/data_prep.py --feature_dim ${feature_dim} --downsampling_factor ${downsampling_factor}
fi
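
# For reference, the generated files follow the usual Kaldi data-directory layout,
# e.g. (IDs, text, and paths here are illustrative, not taken from the corpus):
#   data/train/text       utt1 A line of handwritten text
#   data/train/utt2spk    utt1 writer1
#   data/train/feats.scp  utt1 data/train/data/feats.ark:25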

log "Successfully finished. [elapsed=${SECONDS}s]"