How to extend the default dictionary? #19

attitudechunfeng · 2019-02-21T08:47:31Z

As the title says: if I want to add words that are not included in the default dictionary, what should I do?

boredomed · 2019-03-07T20:31:50Z

You have to recreate the files in $FLITEDIR/lang/cmulex using your new dictionary.
Just follow the steps mentioned here:
https://boredomed.wordpress.com/2019/03/07/festvox-to-flite-tts-conversion/
Feel free to ask if you have any queries.

attitudechunfeng · 2019-03-08T04:21:05Z

One more question. Does the "lexicon.out" file represent the new lexicon, and does its content follow the format "word pronunciation"?

boredomed · 2019-03-08T04:47:31Z

Yes, lexicon.out is your new or extended lexicon. It has to be built in the format of words followed by their pronunciation, i.e. their phonetic representation. You can use festival/lib/dicts/cmu/cmudict.out as a reference.
You can make it simpler by leaving out the 0 and 1 emphasis (stress) markers if they are not required.
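
If the markers are dropped, one way to do it is sketched below in Python. This is only an illustration: it assumes the marker is a trailing 0 or 1 on a phone (as in "aa1" or "ax0"), and `strip_stress` is a made-up helper name, not part of festival or flite.

```python
import re

# Hypothetical helper, not part of festival or flite.
# Assumes stress is written as a trailing 0 or 1 on a phone (e.g. "aa1", "ax0").
_STRESS = re.compile(r"[01]$")


def strip_stress(phones):
    """Return the phones with any trailing 0/1 stress digit removed."""
    return [_STRESS.sub("", p) for p in phones]


print(strip_stress(["aa1", "k", "ax0", "n"]))  # ['aa', 'k', 'ax', 'n']
```

The same change would have to be applied consistently wherever the phone names are listed, otherwise the phone sets no longer agree.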

attitudechunfeng · 2019-03-08T05:32:04Z

Thanks for your quick reply. Your answer has solved my question, so I am closing this issue.

attitudechunfeng · 2019-03-08T09:45:03Z

Sorry, I've run into another problem while trying to extend the dictionary. The tutorial says "Also place your lexicon and allowables files in this ‘lex’ directory", so exactly which files should I put into the lex directory? The .out file, the allowables file, and anything else? And how can I generate these files?

boredomed · 2019-03-08T10:18:41Z

Just 2 files: the lexicon and the allowables.
You have to place the lexicon.out file in the 'lex' directory. In your case that is the lexicon you have extended; its format is described in the 'Lexicon format' section of the tutorial.
If you are using English, you can take allowables.scm from festival/lib/dicts/allowables.scm. Otherwise, you have to use your own allowables file, the one on which you are basing the word pronunciations in the lexicon.

attitudechunfeng · 2019-03-08T10:31:20Z

I've tried it as described, but it doesn't work. The error output is as follows:

cat: alllets.out: No such file or directory
cat: allphones.out: No such file or directory
cat: let2phones.out: No such file or directory
Find probabilities of letter-phone pairs
/festival/src/main/festival: argument for "--heap" not an int
Type -h for help on options.
Align letter-phone pairs with best path
/festival/src/main/festival: argument for "--heap" not an int
Type -h for help on options.
Build letter to phone CART trees
awk: fatal: cannot open file lex.feats' for reading (No such file or directory) awk: fatal: cannot open file lex.feats' for reading (No such file or directory)
awk: fatal: cannot open file lex.feats' for reading (No such file or directory) awk: fatal: cannot open file lex.feats' for reading (No such file or directory)
awk: fatal: cannot open file lex.feats' for reading (No such file or directory) awk: fatal: cannot open file lex.feats' for reading (No such file or directory)
awk: fatal: cannot open file lex.feats' for reading (No such file or directory) awk: fatal: cannot open file lex.feats' for reading (No such file or directory)
awk: fatal: cannot open file lex.feats' for reading (No such file or directory) awk: fatal: cannot open file lex.feats' for reading (No such file or directory)
awk: cmd. line:1: fatal: cannot open file lex.feats' for reading (No such file or directory) awk: fatal: cannot open file lex.feats' for reading (No such file or directory)
Build complete model
/festival/src/main/festival: argument for "--heap" not an int
Type -h for help on options.
cp: cannot stat 'lts_scratch/lex_lts_rules.scm': No such file or directory
Test model
/festival/src/main/festival: argument for "--heap" not an int
Type -h for help on options.
with ALL data -- no held out test set
I also found that allowables.scm has been replaced. What might cause this problem?

boredomed · 2019-03-08T11:01:32Z

The issue is in the build_lts file, in which the heap value is not declared.
In build_lts, remove '--heap HEAP' from the cummulate, build, align and merge if statements, and then run it again.

attitudechunfeng · 2019-03-11T03:32:25Z

I removed '--heap HEAP' and re-ran it, but it seems slow. How long will the process take to finish?

boredomed · 2019-03-11T06:31:43Z

It should not take long.
Are you using the allowables from the CMU dict?
If yes, did you comment out the line that makes the allowables in the build_lts file, in the 'lts' argument?
Do that; the issue may be that it is wrongly remaking the allowables file.

attitudechunfeng · 2019-03-11T06:33:38Z

I've got it, and the process now finishes successfully. Thanks very much.

attitudechunfeng · 2019-03-12T03:50:53Z

I've run through the whole process. However, the result seems wrong. With the new dictionary, not only is the new word wrong, but the original words are too. What might be the reason? By the way, when I run "build_lts test", almost all words fail. Is this normal?

boredomed · 2019-03-12T04:01:16Z

Are 0 and 1 appended to the phonemes in the lexicon that is created in lts_scratch?
If yes, does your allowables file also contain the same phonemes, i.e. with 0 and 1?
The issue may be that the phonemes in your allowables do not match those in the lexicon, so the alignments failed.
Try removing the 0s and 1s from the lexicon in lts_scratch if they are not in the allowables, or vice versa,
and do the steps again.

attitudechunfeng · 2019-03-12T04:16:11Z

I've checked "lts_scratch/lex_entries.out" and "allowables.scm"; some fragments are shown below:

( ("a" "a" "a") nil (t r ih1 p ax0 l ey1 ))
( ("a" "a" "b" "e" "r" "g") nil (aa1 b er0 g ))
( ("a" "a" "c" "h" "e" "n") nil (aa1 k ax0 n ))
( ("a" "a" "k" "e" "r") nil (aa1 k er0 ))
( ("a" "a" "l" "s" "e" "t" "h") nil (aa1 l s eh0 th ))
( ("a" "a" "m" "o" "d" "t") nil (aa1 m ax0 t ))
( ("a" "a" "n" "c" "o" "r") nil (aa1 n k ao1 r ))
( ("a" "a" "r" "d" "e" "m" "a") nil (aa0 r d eh1 m ax0 ))
( ("a" "a" "r" "d" "v" "a" "r" "k") nil (aa1 r d v aa1 r k ))
( ("a" "a" "r" "o" "n") nil (eh1 r ax0 n ))

(require 'lts_build)
(set! allowables
'((a epsilon aa aa1 aa0
ax ax1 ax0
eh eh1 eh0
ah ah1 ah0
ae ae1 ae0
ey ey1 ey0
ay ay1 ay0
er er1 er0

I think they are consistent. What do you think?

boredomed · 2019-03-12T04:22:20Z

Check the log files for the entries that failed to align.
Take a word that failed and check its phones: are they actually right and wrongly left unaligned, or is something else going on?
Then match them against the phones in the allowables.
This can help you find the bug.
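
That matching can be sketched in Python as follows. It assumes entries look like the lts_scratch/lex_entries.out lines quoted earlier in this thread, and that the allowables have already been read into a dict from letter to allowed phones. `parse_entry`, `unmatched_phones` and the tiny `allowables` dict are invented for illustration; the real allowables.scm is much larger and this sketch does not parse it.

```python
import re


def parse_entry(line):
    """Split a line such as ( ("a" "a" "k" "e" "r") nil (aa1 k er0 )) into letters and phones."""
    letters = re.findall(r'"([^"]*)"', line)
    # Innermost parenthesised groups; the last one holds the phones.
    groups = re.findall(r"\(([^()]*)\)", line)
    phones = groups[-1].split() if groups else []
    return letters, phones


def unmatched_phones(letters, phones, allowables):
    """Return the phones that none of the word's letters is allowed to map to."""
    allowed = set()
    for letter in letters:
        allowed |= allowables.get(letter, set())
    return [p for p in phones if p not in allowed]


# Toy subset for illustration only, not the real allowables.scm.
allowables = {"a": {"epsilon", "aa", "aa1", "aa0", "ax", "ax1", "ax0", "ey", "ey1", "ey0"}}

letters, phones = parse_entry('( ("a" "a" "a") nil (t r ih1 p ax0 l ey1 ))')
print(unmatched_phones(letters, phones, allowables))
```

A phone reported here is one that no letter of the word may produce under the given allowables, which is one plausible reason for that entry not aligning. An empty result does not guarantee that the entry aligns.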

attitudechunfeng · 2019-03-12T04:37:38Z

However, I only added one new word to the original festival cmudict and used the same allowables as cmudict, yet the new results are wrong. That seems strange to me.

boredomed · 2019-03-12T04:43:06Z

Your system is not reading the lexicon and is falling back to the LTS rules. Did you make sure that the allowables are not remade, and that you commented out that line in build_lex? Secondly, did you replace the data_raw file with the compressed data, as mentioned in the tutorial?

attitudechunfeng · 2019-03-12T06:16:15Z

I have commented out the line "./build_lts make_allowables_smt" in build_lex and replaced the data_raw file with the compressed data, as mentioned in the tutorial.

ZhenheZhang · 2019-05-08T03:04:44Z

Hi there,

I'm following your conversation to expand the dictionary in my system as well. My objective is to upgrade the default CMUdict-0.4 to 0.7b in flite-2.2.
I have commented out the line "./build_lts make_allowables_smt" in build_lex, but I have not removed '--heap HEAP', as I did not get the error shown above.
My problem is that the step "./build_lex lts" takes a long time: 2 days already. I've attached my server info below. Can you explain how long it should be expected to take? Thanks.
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E7-4820 v2 @ 2.00GHz
Stepping: 7
CPU MHz: 2000.000
BogoMIPS: 4002.65

Btw, this tutorial is very helpful.
https://boredomed.wordpress.com/2019/03/07/festvox-to-flite-tts-conversion/

Regards,
Zhenhe

boredomed · 2019-05-08T08:04:04Z

Hi,
Instead of running the step ./build_lex lts directly, run the individual commands in it, which are as follows:
./build_lts cummulate
./build_lts align
./build_lts build
./build_lts merge
Check which of these steps takes the longest, so you can pinpoint the issue.
Also, don't run the last command in it, ./build_lts test. It may be the one taking the longest, and it is not compulsory.
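
To compare the steps, each command can be timed separately. A small Python sketch, assuming it is run from the directory that contains build_lts; `time_steps` is a name made up here, and the step list is just the four commands listed above:

```python
import subprocess
import time

# The four steps listed above; "test" is left out on purpose.
STEPS = [
    ["./build_lts", "cummulate"],
    ["./build_lts", "align"],
    ["./build_lts", "build"],
    ["./build_lts", "merge"],
]


def time_steps(commands):
    """Run each command in order and return (command, seconds, exit code) tuples."""
    results = []
    for cmd in commands:
        start = time.perf_counter()
        proc = subprocess.run(cmd)
        results.append((" ".join(cmd), time.perf_counter() - start, proc.returncode))
    return results


# Usage (from the directory containing build_lts):
#   for name, seconds, code in time_steps(STEPS):
#       print(f"{name}: {seconds:.1f}s (exit {code})")
```

The steps run one after another, each only once, so a step that hangs will block the ones after it.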

ZhenheZhang · 2019-05-09T00:52:06Z

It's ./build_lts align that takes the longest. What's worse, alignment failed for several words, like:
align failed: (("a" "a" "a") nil (t r ih1 p ax0 l ey1))
What do you think could be the cause?
And regarding the time consumption, is it related to the HEAP as well?
What is this command doing in festival? Can you briefly explain the fundamentals?
$FESTIVAL -b --heap $SIODHEAPSIZE allowables.scm lts_scratch/lex-pl-tablesp.scm

Thanks,
Zhenhe

bringtree · 2020-07-05T08:52:55Z

Yes, I also found that "./build_lts align" is too slow.
It spends too much time generating "/lex/lts_scratch/lex.align", and it is a single-threaded program.

@boredomed @ZhenheZhang Could you share the generated files with me?

bringtree · 2020-07-10T06:09:11Z

Oh, I found that the allowables file generated by ./build_lts make_allowables_smt is very large.

attitudechunfeng closed this as completed Mar 8, 2019
