Madcat Arabic handwritten text line recognition #2356
Conversation
… fixed bug with character encoding
… with certain gedi files
Guys, let's leave PIL for now and sort this out later.
It's not a deep dependency.
…On Fri, May 4, 2018 at 11:46 AM, jtrmal ***@***.***> wrote:
I wonder if we should use imagemagick or a similar package, either for converting the image on-the-fly to some easier format and/or for querying the image properties (size, for example). I know it adds another dependency, but it's a fairly common package and it might simplify the whole issue for us.
y.
On Fri, May 4, 2018 at 11:27 AM Hossein Hadian ***@***.***> wrote:
> I'm a little surprised to see you replace scipy with PIL. I thought we
> were going to move, in the long term at least, from PIL to scikit-image;
> at least I see a conversation in my email about this. @hhadian
> <https://github.com/hhadian> might want to comment.
>
> One option is to have a script image/check_dependencies.sh and have it
> check dependencies for all the things used in the image/ scripts -- we can
> add options to enable or disable certain checks as needed depending on
> where it's called from. @hhadian <https://github.com/hhadian>, what do
> you think?
>
> As Ashish explained, it was because scipy would inefficiently load the
> whole image just to get its size, and since, AFAIK, PIL is required
> (when we use the recommended imageio) regardless, I imagined it would
> make sense to use it here.
> As an alternative approach (to get_image2num_frames.py), I think we can
> skip this step, get the features (without enforcing image lengths to
> allowed lengths), then compute image2num_frames from the features (which
> will be fast), and finally enforce the utterance length when getting the
> e2e egs in nnet3-chain-e2e-get-egs.cc. This can work for OCR because the
> beginning/end frames are always white pixel padding and we can simply
> repeat them.
>
> I agree with image/check_dependencies.sh. We can remove
> local/check_tools.sh and instead call that in run.sh. Also, we can move
> local/make_features.py to image/ because I guess all the OCR recipes use
> the same copy (is that right Ashish?).
>
> Hossein
>
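The efficiency point above (PIL can report an image's size without decoding the pixel data, unlike the scipy-based approach) can be sketched as follows; the temporary image below is just a stand-in for a real MADCAT line image:

```python
from PIL import Image  # Pillow, the maintained PIL fork
import os, tempfile

# Create a small stand-in image (in the recipe this would be a MADCAT line image).
path = os.path.join(tempfile.mkdtemp(), "line.png")
Image.new("L", (640, 48), color=255).save(path)

# Image.open() is lazy: it parses only the file header, so querying .size
# does not decode the pixel data.
with Image.open(path) as im:
    width, height = im.size
print(width, height)
```

This is why querying the size via PIL is cheap even for large page images.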
Madcat ar lm
updating parameters
Madcat ar 2
@@ -0,0 +1,226 @@
+#!/bin/bash
+
@aarora8, there should be results at the top of these files, obtained from the compare_wer.sh script.
Currently, run_cnn_chainali_1b.sh was giving a slightly worse result than run_flatstart_cnn1a.sh. I am currently running it with more epochs and more tree leaves. Should I update it after the current run completes, or with the recent results?
Sorry, I realized I haven't run run_cnn_1a.sh, run_cnn_chainali_1b, and run_cnn_end2endali_1a with the recent code. I have currently run the run_flatstart_cnn1a.sh and run_cnn_end2endali_1b (UC) scripts. I updated the results on those two scripts; I will run and update the results for the other scripts as well.
adding latest results, removing dev from extract features to save time
You can just put the current results for now and change them later.
But I think you should create a tuning/ directory and make those run* scripts links into it, i.e. all the 1a and 1b suffixes (etc.) would be within the tuning/ directory and there would be soft links from one directory above.
Experiments that differ only in a suffix like 1a or 1b are supposed to be different versions of the same experiment (differently tuned), and you generally won't change the results after doing the initial experiment. At least that's how we normally do it.
But before we merge this, it's OK to just take the best current experiment and make it 1a.
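The layout described above can be sketched as follows; the paths are illustrative (standing in for egs/madcat_ar/v1/local/chain/), and Python's os.symlink plays the role of `ln -s`:

```python
import os, tempfile

# Toy directory standing in for local/chain/ (illustrative paths only).
chain_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(chain_dir, "tuning"))

# The tuned variant lives inside tuning/ ...
with open(os.path.join(chain_dir, "tuning", "run_cnn_chainali_1b.sh"), "w") as f:
    f.write("#!/bin/bash\n")

# ... and a relative soft link one directory above points at it.
os.symlink(os.path.join("tuning", "run_cnn_chainali_1b.sh"),
           os.path.join(chain_dir, "run_cnn_chainali_1b.sh"))

link = os.path.join(chain_dir, "run_cnn_chainali_1b.sh")
print(os.path.islink(link))
```

Using a relative link target keeps the links valid if the recipe directory is moved or copied.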
OK, thank you. I will add the results and make it 1a.
Yes, update it later.
…On Tue, May 15, 2018 at 1:32 AM, Ashish Arora ***@***.***> wrote:
In egs/madcat_ar/v1/local/train_lm.sh
<#2356 (comment)>:
+#for order in 3; do
+#rm -f ${lm_dir}/${num_word}_${order}.pocolm/.done
+
+if [ $stage -le 0 ]; then
+ mkdir -p ${dir}/data
+ mkdir -p ${dir}/data/text
+
+ echo "$0: Getting the Data sources"
+
+ rm ${dir}/data/text/* 2>/dev/null || true
+
+ # use the validation data as the dev set.
+ # Note: the name 'dev' is treated specially by pocolm, it automatically
+ # becomes the dev set.
+
+ cat data/dev/text | cut -d " " -f 2- > ${dir}/data/text/dev.txt
I tried using the first 5000 lines of the train text here and the rest for training, but the first 5k lines occur again in the remaining training text, and train_lm.sh gives an error because of it. Currently, to remove the error, I reverted the change. I was also working on adding Arabic Gigaword corpus text data to the language model; that is not complete yet. Should I update it later with some portion of the Arabic Gigaword corpus text data?
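One way around the duplicate-lines error described above would be to drop from the training portion any line that also appears in the held-out dev portion; a toy sketch (the corpus, names, and split size are illustrative, not the recipe's actual code):

```python
# Toy corpus with deliberate repeats, mimicking duplicated lines in the
# MADCAT train text (illustrative data only).
lines = [f"line {i % 7000}" for i in range(20000)]

# First 5000 lines become the dev set (pocolm treats 'dev' specially).
dev = lines[:5000]
dev_set = set(dev)

# Drop any training line that also occurs in dev, so the sets stay disjoint.
train = [l for l in lines[5000:] if l not in dev_set]
print(len(dev), len(train))
```

Whether deduplicating like this is statistically appropriate for LM training is a separate question; the sketch only shows how to keep the dev set disjoint.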
It uses 750k utterances from the MADCAT Arabic data. It extracts the line images using the MAR (minimum area rectangle). It contains the recent TDNN training recipes for end-to-end and regular chain training. @hhadian
- Add README (README.txt)
- Add scripts for data preparation (text, wav.scp and utt2spk files): local/prepare_data.sh, local/process_data.py, local/create_line_image_from_page_image.py
- Add script for feature extraction: local/make_features.py
- Add scripts for lexicon, language modeling and grammar: local/train_lm.sh, local/prepare_lexicon.py, local/prepare_dict.sh
- Add scripts for GMM-HMM training and chain models: local/chain/run_cnn_1a.sh, run.sh, run_end2end.sh, local/chain/run_cnn_chainali_1b.sh, local/chain/run_flatstart_cnn1a.sh, local/chain/compare_wer.sh
- Other: cmd.sh, links to image/steps/utils, v1/local/score.sh, path.sh, local/check_tools.sh
Some of its info and features are as follows:
- It gets the line image from the MAR (minimum area rectangle).
- It currently builds the language model from the training utterances only.
- Its lexicon size is 95k words; the OOV rate is around 1.5%.
- For quick debugging and experiments, it can be run on a subset of the dataset based on the writing conditions (writing style, speed, carefulness) of the image.
- It contains the recent TDNN training recipes for end-to-end and regular chain training.
- WER 12.97% with line images formed by stitching the word images.
- WER 15.03% with line images formed using the MAR.
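The minimum-area-rectangle extraction mentioned above can be illustrated with a brute-force sketch (create_line_image_from_page_image.py presumably uses a proper library routine for this; the toy version below just searches a grid of rotation angles for the tightest axis-aligned box):

```python
import numpy as np

def min_area_rect(points, steps=360):
    """Approximate the minimum-area bounding rectangle of 2-D points by
    rotating them through a grid of angles in [0, 90) degrees and keeping
    the angle whose axis-aligned bounding box has the smallest area."""
    pts = np.asarray(points, dtype=float)
    best_area, best_angle = np.inf, 0.0
    for theta in np.linspace(0.0, np.pi / 2, steps, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        rot = pts @ np.array([[c, -s], [s, c]]).T
        area = np.ptp(rot[:, 0]) * np.ptp(rot[:, 1])
        if area < best_area:
            best_area, best_angle = area, theta
    return best_area, best_angle

# Corners of a slightly rotated rectangle (true minimum area is 52;
# its axis-aligned bounding box has area 77).
corners = [(0, 0), (10, 2), (9, 7), (-1, 5)]
area, angle = min_area_rect(corners)
print(area)
```

The payoff for line images is the gap between the two areas: the MAR crops a rotated text line much more tightly than the axis-aligned bounding box would.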
To do:
- replace PIL in create_line_image_from_page_image
- replace the convex_hull library routine
- update configs