Liaison in French #7
Comments
Liaison looks like it's going to be fairly complex to implement correctly. I'm leaving this link here for future me: https://github.com/juliacarbajal/french_phonologizer/blob/master/phonologize.py |
I can suggest this algorithm.
|
Also, I don't think all the examples at that link are correct in modern French, but I'm not a native French speaker. |
I agree that liaison in French is important: it helps a lot in understanding the speaker, and Larynx doesn't produce it. |
For French speakers, here is an article about when we should make the liaison and when we shouldn't: Tomorrow I could write a summary in English. |
@tjiho, I don't think it's possible to make one universal solution. For example, the article you linked says you shouldn't use a liaison in phrases like “des haricots” (final 's' before initial 'h'), but in the SIWIS dataset, in the sentence "Voilà donc de quoi dépendent les destins des hommes !", the speaker uses a liaison for "des_hommes", and as far as I know that's the standard pronunciation. Maybe it depends on the region where the speaker lives, but even the best-known self-study guide, Mauger's "Cours de Langue et de Civilisation Françaises", uses the liaison in that case (see page 4: [dezom]). |
@alt131 About liaison with |
So it lists some rules:
|
@synesthesiam, does gruut determine the part of speech of each word? |
@tjiho, I doubt that gruut has a syntactic analyzer. |
I also believe it's not a big problem if we don't add a liaison for "un_ami" etc., at least for a speech-to-text neural net. |
200 words? |
A bit more 😇 There are 573 words. That's nice, so we have all the words with an h aspiré. |
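A minimal sketch of how such a word list could be used to settle the "des haricots" / "des hommes" case from earlier in the thread. The `H_ASPIRE` set below is a tiny hand-picked sample standing in for the full list, and `allows_liaison` is a hypothetical helper name, not part of gruut. The vowel check works on spelling only, so it is a crude approximation.

```python
# Hypothetical sample; a real implementation would load the full h aspiré list.
H_ASPIRE = {"haricot", "haricots", "héros", "hibou", "hiboux", "honte"}

# Letters treated as vowel-initial spellings (approximation, spelling-based).
VOWEL_LETTERS = set("aàâeéèêëiîïoôuûüœ")


def allows_liaison(next_word: str) -> bool:
    """Return True if a liaison may be made before next_word.

    A word starting with 'h' allows liaison only if it is NOT in the
    h aspiré list (i.e. it has a mute h). Any other word allows liaison
    only if it starts with a vowel letter.
    """
    word = next_word.lower()
    if not word:
        return False
    if word[0] == "h":
        return word not in H_ASPIRE
    return word[0] in VOWEL_LETTERS


if __name__ == "__main__":
    for w in ("hommes", "haricots", "ami", "danger"):
        print(w, allows_liaison(w))
```

With a lookup like this, "des hommes" keeps its liaison while "des haricots" does not, which matches both the article and the SIWIS recording.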
|
Gruut can do syntax analysis using python-crfsuite. I trained a model for French today on the French Universal Dependencies treebank, and it seems to work quite well. Here's the result of my first attempt: https://drive.google.com/drive/folders/1a232BIJ_gTfm3wHEKr0F86K8BkepQYay?usp=sharing I took my example sentences from here and implemented just these few rules (for now):
This is the log with generated phonemes:
|
It applies to this kind of adjective (qualificatif):
Sure, the article talks about some expressions: |
@tjiho, thanks. |
|
@tjiho, I don't like the pronunciation of 'ton' :) |
Here are the same sentences with a word break ( Do these sound better or worse? |
@synesthesiam, 'Ton excellent vin.' is OK |
@synesthesiam All sentences are OK for me. |
Sorry, in "Je vis en Amérique." the liaison is lost, but the pronunciation is still better)) |
In the second one, |
OK, I'll keep the word breaks in then. This seems like progress at least 🙂 |
Please generate these sentences also |
I've added them here: https://drive.google.com/drive/folders/1U8i14JX_IB2HC-0YlGrTunFkzM9lpAvR |
'Sa vie n’était pas en danger' is OK |
Check the phonemes for it; if they are OK, then do nothing. |
DEBUG:larynx:Words for 'Un bâtiment est en vue de l'île.': ['un/DET', 'bâtiment/NOUN', 'est/AUX', 'en/ADP', 'vue/NOUN', 'de/ADP', "l'île/NOUN", './PUNCT'] 😕 |
'#', 'ɛ', '#', 'ɑ̃', '#' should be '#', 'ɛ', 't', '#', 'ɑ̃', '#'. 't' was lost. |
Another example for you 'Amalia est en danger.' |
Ah, I'm missing the verb -> vowel case. Hang on. |
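For illustration only, here is a rough sketch of what a verb -> vowel check over the tagged words from the log above might look like. This is not gruut's actual rule code: `verb_vowel_liaisons` and the tables are hypothetical names, the check is spelling-based, and it ignores mute h and optional liaisons.

```python
# Final letter -> liaison sound (standard French: s/x/z give [z], t/d give [t]).
LIAISON_SOUND = {"s": "z", "x": "z", "z": "z", "t": "t", "d": "t"}
VOWEL_LETTERS = set("aàâeéèêëiîïoôuûüœ")
VERB_TAGS = {"VERB", "AUX"}


def verb_vowel_liaisons(tagged):
    """Find verb -> vowel liaison candidates.

    tagged: list of 'word/TAG' strings, as printed in the larynx debug log.
    Returns a list of (word_index, liaison_sound) tuples.
    """
    # Split on the last '/' so tokens like './PUNCT' are handled correctly.
    pairs = [item.rsplit("/", 1) for item in tagged]
    found = []
    for i in range(len(pairs) - 1):
        word, tag = pairs[i]
        next_word = pairs[i + 1][0].lower()
        sound = LIAISON_SOUND.get(word[-1].lower())
        if tag in VERB_TAGS and sound and next_word[0] in VOWEL_LETTERS:
            found.append((i, sound))
    return found


if __name__ == "__main__":
    words = ["un/DET", "bâtiment/NOUN", "est/AUX", "en/ADP", "vue/NOUN",
             "de/ADP", "l'île/NOUN", "./PUNCT"]
    print(verb_vowel_liaisons(words))  # [(2, 't')]
```

For the sentence above it flags only "est" followed by "en", which is where the 't' went missing.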
Let's check this too "C`est incroyable!" |
Updated the Google Drive directory.
|
They are OK. (Phonemes) |
The pronunciation is OK too. |
Great, thanks! I've uploaded new code for gruut and larynx as well as the French model with POS tagging. I won't be able to update Docker images until later. |
OK, thank you, I'll check it tomorrow. |
DEBUG:larynx:Words for 'je peux vous aider à le retrouver': ['je', 'peux', 'vous', 'aider', 'à', 'le', 'retrouver'] no liaison in vous_aider, and the 'z' sound was lost in the phonemes |
'Chacun est uni à l`arbre de vie.' And then: |
You have at least 2 bugs in French models. |
Haven't updated the Docker images yet. I had to roll back to push a different fix. The ValueError you got is likely from leaving the |
No, it's because " l`arbre" uses a non-standard apostrophe. If I change it to the standard ', it works OK. |
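One possible workaround on the caller's side is to normalize apostrophe look-alikes before passing text to the phonemizer. This is only a sketch: `normalize_apostrophes` is a hypothetical helper, and the set of variants is an assumption that may need extending.

```python
# Characters commonly typed in place of the ASCII apostrophe (assumed set):
# backtick, right/left single quotation marks, modifier apostrophe, acute accent.
APOSTROPHE_VARIANTS = "`\u2019\u2018\u02bc\u00b4"

# Translation table mapping every variant to the standard apostrophe.
_APOSTROPHE_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHE_VARIANTS})


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with the standard ASCII apostrophe."""
    return text.translate(_APOSTROPHE_TABLE)


if __name__ == "__main__":
    print(normalize_apostrophes("Chacun est uni à l`arbre de vie."))
```

Doing this once at the input boundary means the tokenizer only ever sees the standard apostrophe.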
Just wanted to mention:
On this pronunciation sample, I hear a D-sound rather than the expected T-sound: est “D”un grand instead of est “T”un grand. |
In French sometimes two words sound like one
DEBUG:larynx:Words for 'oui, c'est un': ['oui', ',', "c'est", 'un']
DEBUG:larynx:Phonemes for 'c'est un': ['#', 's', 'e', 't', '#', 'œ̃', '#', '‖', '‖']
't' was lost in the output wav and phonemes should be something like this
DEBUG:larynx:Phonemes for 'c'est un': ['#', 's', 'e', 't', 'œ̃', '#', '‖', '‖']
DEBUG:larynx:Words for 'ce n'est pas un': ['ce', "n'est", 'pas', 'un']
DEBUG:larynx:Phonemes for 'ce n'est pas un': ['#', 's', 'e', 'ə', '#', 'n', 'ɛ', '#', 'p', 'a', '#', 'œ̃', '#', '‖', '‖']
the output wav was OK ('z' was added) but I think phonemes should be something like this
DEBUG:larynx:Phonemes for 'ce n'est pas un': ['#', 's', 'e', 'ə', '#', 'n', 'ɛ', '#', 'p', 'a', 'z', 'œ̃', '#', '‖', '‖']
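To make the requested change concrete, here is a small sketch of joining two words in a phoneme list like the ones logged above. `link_words` is a hypothetical helper, not an existing larynx or gruut function: it drops the '#' word break at a given position and optionally inserts a liaison consonant in its place.

```python
from typing import List, Optional

WORD_BREAK = "#"


def link_words(phonemes: List[str], break_index: int,
               consonant: Optional[str] = None) -> List[str]:
    """Return a copy of phonemes with the word break at break_index removed.

    If consonant is given, it is inserted where the break used to be.
    """
    if phonemes[break_index] != WORD_BREAK:
        raise ValueError(f"no word break at index {break_index}")
    linked = phonemes[:break_index]
    if consonant is not None:
        linked.append(consonant)
    linked.extend(phonemes[break_index + 1:])
    return linked


if __name__ == "__main__":
    c_est_un = ['#', 's', 'e', 't', '#', 'œ̃', '#', '‖', '‖']
    print(link_words(c_est_un, 4))

    pas_un = ['#', 's', 'e', 'ə', '#', 'n', 'ɛ', '#',
              'p', 'a', '#', 'œ̃', '#', '‖', '‖']
    print(link_words(pas_un, 10, 'z'))
```

The first call reproduces the suggested phonemes for 'c'est un' (the 't' is already present, so only the break is dropped); the second reproduces the suggestion for 'ce n'est pas un' by replacing the break with 'z'.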