This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Liaison in French #7

Open
alt131 opened this issue Apr 1, 2021 · 47 comments
Labels
bug Something isn't working

Comments

@alt131

alt131 commented Apr 1, 2021

In French, two words sometimes sound like one:
DEBUG:larynx:Words for 'oui, c'est un': ['oui', ',', "c'est", 'un']
DEBUG:larynx:Phonemes for 'c'est un': ['#', 's', 'e', 't', '#', 'œ̃', '#', '‖', '‖']
The 't' was lost in the output WAV; the phonemes should be something like this:
DEBUG:larynx:Phonemes for 'c'est un': ['#', 's', 'e', 't', 'œ̃', '#', '‖', '‖']

DEBUG:larynx:Words for 'ce n'est pas un': ['ce', "n'est", 'pas', 'un']
DEBUG:larynx:Phonemes for 'ce n'est pas un': ['#', 's', 'e', 'ə', '#', 'n', 'ɛ', '#', 'p', 'a', '#', 'œ̃', '#', '‖', '‖']
The output WAV was OK (a 'z' was added), but I think the phonemes should be something like this:
DEBUG:larynx:Phonemes for 'ce n'est pas un': ['#', 's', 'e', 'ə', '#', 'n', 'ɛ', '#', 'p', 'a', 'z', 'œ̃', '#', '‖', '‖']

synesthesiam added the bug label Apr 12, 2021
@synesthesiam
Contributor

Liaison looks like it's going to be fairly complex to implement correctly. I'm leaving this link here for future me: https://github.com/juliacarbajal/french_phonologizer/blob/master/phonologize.py

@alt131
Author

alt131 commented Apr 12, 2021

I can suggest this algorithm.

  1. If the last letter of the first word is 's', 'x', 'z', 't' or 'd' and the first letter of the second word is 'h' or any vowel
  2. If the first word ends in consonant + apostrophe and the first letter of the second word is 'h' or any vowel
    then, if the last consonant of the first word is pronounced, just delete the pause between the words;
    if the last consonant of the first word is NOT pronounced, replace the pause between the words with 'z' for 's', 'x', 'z' and with 't' for 't', 'd'.
    I think it covers most cases.
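
The two cases above can be sketched in a few lines of Python. This is only an illustration of the proposal, not code from gruut or larynx: the function name `boundary`, the `final_pronounced` flag (which a lexicon would have to supply) and the vowel list are assumptions made for the example.

```python
from typing import List

# Assumption: a rough set of French vowel letters, enough for the sketch.
VOWELS = set("aeiouyàâäéèêëîïôöùûüœ")
# Final letter of the first word -> liaison sound, as proposed above.
CONSONANT_TO_SOUND = {"s": "z", "x": "z", "z": "z", "t": "t", "d": "t"}
APOSTROPHES = ("'", "’")


def boundary(first: str, second: str, final_pronounced: bool) -> List[str]:
    """Return what goes between two words: ['#'] keeps the pause,
    [] deletes it, ['z'] or ['t'] replaces it with a liaison sound."""
    first, second = first.lower(), second.lower()
    if not first or not second:
        return ["#"]
    # Both cases require the second word to start with 'h' or a vowel.
    if second[0] != "h" and second[0] not in VOWELS:
        return ["#"]
    # Case 2: consonant + apostrophe (e.g. "l'"); the consonant is pronounced.
    if len(first) >= 2 and first[-1] in APOSTROPHES and first[-2] not in VOWELS:
        return []
    # Case 1: the first word ends in s, x, z, t or d.
    last = first[-1]
    if last not in CONSONANT_TO_SOUND:
        return ["#"]
    if final_pronounced:
        return []
    return [CONSONANT_TO_SOUND[last]]


print(boundary("pas", "un", final_pronounced=False))
print(boundary("c'est", "un", final_pronounced=False))
```

Because the sketch looks only at letters, it treats every initial 'h' alike and knows nothing about exceptions such as "et".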

@alt131
Author

alt131 commented Apr 12, 2021

Also, I think not all of the examples at that link are correct in modern French.
For example (from your French training dataset), regarding "any instances of après that are not part of après-midi":
Cyrus Smith et Gédéon Spilett, après être << liaison
C’est le mot du professeur, qui, après avoir << liaison

but I'm not a native French speaker.

@tjiho

tjiho commented Apr 12, 2021

I agree that liaison in French is important: it helps a lot in understanding the speaker, and larynx doesn't do it.
At this late hour, I will not pick a solution. It is fairly complex to know when to make a liaison (in fact, it's more a feeling than a rule that I apply).
I found this algorithm yesterday. It has two functions: check_liaison, which checks whether a liaison should be made between two words, and liaison, which applies it.
https://github.com/tjiho/PoemesProfonds/blob/ede1b32df153254e826cd9779f971fe72d6bd3eb/lecture.py#L143

@tjiho

tjiho commented Apr 13, 2021

For French speakers, here is an article about when to make a liaison:
https://www.francaisauthentique.com/quand-faire-la-liaison-en-francais/

Tomorrow I could make a summary in English.

@alt131
Author

alt131 commented Apr 13, 2021

@tjiho, I don't think it's possible to make one universal solution. For example, the author of your article writes that you shouldn't make a liaison in phrases like “des haricots” (final 's' and initial 'h'), but in the siwis dataset sentence "Voilà donc de quoi dépendent les destins des hommes !" the speaker makes a liaison in "des_hommes", and as far as I know that is the standard pronunciation. Maybe it depends on the region where the speaker lives, but even the well-known self-study guide by Mauger, "Cours de Langue et de Civilisation Françaises", uses the liaison in that case (see page 4: [dezom]).

@tjiho

tjiho commented Apr 13, 2021

@alt131 About liaison with h: I searched the internet and found this: http://www.languefrancaise.net/forum/viewtopic.php?id=180
It says (in summary) that there is no liaison with words beginning with a non-Latin h (Germanic or Greek). There is a list of those (~200); I will look for it and post it when I have it.
It's very ugly to say des_haies [dezɛ]. And you're right, des_hommes should be pronounced [dezom].

@tjiho

tjiho commented Apr 13, 2021

> For French speakers, here is an article about when to make a liaison:
> https://www.francaisauthentique.com/quand-faire-la-liaison-en-francais/
>
> Tomorrow I could make a summary in English.

So it lists some rules:

  • make a liaison only if the first word ends in a consonant and the second begins with a vowel.
  • determiner + noun: liaison
  • pronoun + verb: liaison
  • adjective + noun: liaison
  • there is a list of typical expressions to know for which you should make a liaison (todo: find this list).
  • noun + adjective: no liaison
  • after et: no liaison (otherwise it sounds like est)
  • after the verbs être and avoir, liaison is optional (it sounds more formal with the liaison); we could simplify this to: after a verb, no liaison
  • if the second word begins with an h aspiré (non-Latin words): no liaison (todo: find the list)
  • if the second word begins with an h muet (Latin words): liaison
@alt131
Author

alt131 commented Apr 13, 2021

@synesthesiam, does gruut determine the part of speech of each word?
@tjiho, OK, it's a good list, but do you realize it's not so easy to detect the part of speech of a word in text?

@tjiho

tjiho commented Apr 13, 2021

> @tjiho, OK, it's a good list, but do you realize it's not so easy to detect the part of speech of a word in text?

@alt131 Yes, I understand 😅 It will be a big improvement to have correct liaison.

@alt131
Author

alt131 commented Apr 13, 2021

@tjiho, I doubt that gruut has a syntactic analyzer.
I suggest using my algorithm above and adding some exceptions, like 'et' for the first word and 'à', 'il', 'elle', etc. for the second word. It covers 95-98% of cases for TTS and STT, and I believe it takes about three lines of Python.

@alt131
Author

alt131 commented Apr 13, 2021

And I also believe it's not a big problem if we don't add a liaison for "un_ami" etc., at least for a speech-to-text neural net.

@alt131
Author

alt131 commented Apr 13, 2021

200 words?
https://fr.wikipedia.org/wiki/H_aspir%C3%A9

@tjiho

tjiho commented Apr 13, 2021

> 200 words?
> https://fr.wikipedia.org/wiki/H_aspir%C3%A9

A bit more 😇 There are 573 words. That's nice, so we have all the words with an h aspiré.

@alt131
Author

alt131 commented Apr 13, 2021

> • adjective + noun: liaison

I believe these are adjectives like grand, petit, gros, long, beaux, bel.
It's possible to add them to a first-word exception list.

> • there is a list of typical expressions to know for which you should make a liaison (todo: find this list).

Can you give some example(s)?

@synesthesiam
Contributor

Gruut can do syntax analysis using python-crfsuite. I trained a model for French today on the French Universal Dependencies treebank, and it seems to work quite well.

Here's the result of my first attempt: https://drive.google.com/drive/folders/1a232BIJ_gTfm3wHEKr0F86K8BkepQYay?usp=sharing

I took my example sentences from here and implemented just these few rules (for now):

  • "et" = no liaison
  • determiner -> anything = liaison
  • pronoun -> verb = liaison
  • preposition or très -> noun = liaison

This is the log with generated phonemes:

Words for 'Un enfant.': ['un/DET', 'enfant/NOUN', './PUNCT']
Phonemes for 'Un enfant.': ['#', 'œ̃', 'n', 'ɑ̃', 'f', 'ɑ̃', '#', '‖', '‖']

Words for 'Les arbres.': ['les/DET', 'arbres/NOUN', './PUNCT']
Phonemes for 'Les arbres.': ['#', 'l', 'e', 'z', 'a', 'ʁ', 'b', 'ʁ', '#', '‖', '‖']

Words for 'Deux amis.': ['deux/NUM', 'amis/NOUN', './PUNCT']
Phonemes for 'Deux amis.': ['#', 'd', 'ø', 'z', 'a', 'm', 'i', '#', '‖', '‖']

Words for 'Ton excellent vin.': ['ton/DET', 'excellent/ADJ', 'vin/NOUN', './PUNCT']
Phonemes for 'Ton excellent vin.': ['#', 't', 'ɔ̃', 'n', 'ɛ', 'k', 's', 'ɛ', 'l', '#', 'v', 'ɛ̃', '#', '‖', '‖']

Words for 'Ces autres voyages.': ['ces/DET', 'autres/ADJ', 'voyages/NOUN', './PUNCT']
Phonemes for 'Ces autres voyages.': ['#', 's', 'ɛ', 'z', 'o', 't', 'ʁ', '#', 'v', 'w', 'a', 'j', 'a', 'ʒ', '#', '‖', '‖']

Words for 'On est là!': ['on/PRON', 'est/AUX', 'là/ADV', '!/PUNCT']
Phonemes for 'On est là!': ['#', 'ɔ̃', 'n', 'ɛ', '#', 'l', 'a', '#', '‖', '‖']

Words for 'Elles ont faim!': ['elles/PRON', 'ont/AUX', 'faim/VERB', '!/PUNCT']
Phonemes for 'Elles ont faim!': ['#', 'ɛ', 'l', 'z', 'ɔ̃', '#', 'f', 'ɛ̃', '#', '‖', '‖']

Words for 'Vous êtes sûrs?': ['vous/PRON', 'êtes/AUX', 'sûrs/ADJ', '?/PUNCT']
Phonemes for 'Vous êtes sûrs?': ['#', 'v', 'u', 'z', 'ɛ', 't', '#', 's', 'y', 'ʁ', '#', '‖', '‖']

Words for 'Tu nous entends.': ['tu/PRON', 'nous/PRON', 'entends/VERB', './PUNCT']
Phonemes for 'Tu nous entends.': ['#', 't', 'y', '#', 'n', 'u', 'z', 'ɑ̃', 't', 'ɑ̃', '#', '‖', '‖']

Words for 'Je les adore.': ['je/PRON', 'les/PRON', 'adore/VERB', './PUNCT']
Phonemes for 'Je les adore.': ['#', 'ʒ', 'ə', '#', 'l', 'e', 'z', '#', 'a', 'd', 'ɔ', 'ʁ', '#', '‖', '‖']

Words for 'J'ai des petites oreilles.': ["j'ai/VERB", 'des/DET', 'petites/ADJ', 'oreilles/NOUN', './PUNCT']
Phonemes for 'J'ai des petites oreilles.': ['#', 'ʒ', 'e', '#', 'd', 'e', '#', 'p', 'ə', 't', 'i', 't', 'z', 'ɔ', 'ʁ', 'ɛ', 'j', '#', '‖', '‖']

Words for 'Michel est un grand ami.': ['michel/PROPN', 'est/AUX', 'un/DET', 'grand/ADJ', 'ami/NOUN', './PUNCT']
Phonemes for 'Michel est un grand ami.': ['#', 'm', 'i', 'ʃ', 'ɛ', 'l', '#', 'ɛ', '#', 'œ̃', 'n', '#', 'ɡ', 'ʁ', 'ɑ̃', 't', 'a', 'm', 'i', '#', '‖', '‖']

Words for 'Je regarde la télé sur un petit écran.': ['je/PRON', 'regarde/VERB', 'la/DET', 'télé/NOUN', 'sur/ADP', 'un/DET', 'petit/ADJ', 'écran/NOUN', './PUNCT']
Phonemes for 'Je regarde la télé sur un petit écran.': ['#', 'ʒ', 'ə', '#', 'ʁ', 'ə', 'ɡ', 'a', 'ʁ', 'd', '#', 'l', 'a', '#', 't', 'e', 'l', 'e', '#', 's', 'y', 'ʁ', '#', 'œ̃', 'n', '#', 'p', 'ə', 't', 'i', 't', 'e', 'k', 'ʁ', 'ɑ̃', '#', '‖', '‖']

Words for 'C'est un ancien élève.': ["c'est/AUX", 'un/DET', 'ancien/ADJ', 'élève/NOUN', './PUNCT']
Phonemes for 'C'est un ancien élève.': ['#', 's', 'ɛ', '#', 'œ̃', 'n', '#', 'ɑ̃', 's', 'j', 'ɛ̃', 'n', 'e', 'l', 'ɛ', 'v', '#', '‖', '‖']

Words for 'C'est très amusant!': ["c'est/AUX", 'très/ADV', 'amusant/ADJ', '!/PUNCT']
Phonemes for 'C'est très amusant!': ['#', 's', 'ɛ', '#', 't', 'ʁ', 'ɛ', 'z', 'a', 'm', 'y', 'z', 'ɑ̃', '#', '‖', '‖']

Words for 'Je vis en Amérique.': ['je/PRON', 'vis/VERB', 'en/ADP', 'amérique/PROPN', './PUNCT']
Phonemes for 'Je vis en Amérique.': ['#', 'ʒ', 'ə', '#', 'v', 'i', '#', 'ɑ̃', 'n', 'a', 'm', 'e', 'ʁ', 'i', 'k', '#', '‖', '‖']

Words for 'Ils sont chez eux.': ['ils/PRON', 'sont/AUX', 'chez/ADP', 'eux/PRON', './PUNCT']
Phonemes for 'Ils sont chez eux.': ['#', 'i', 'l', '#', 's', 'ɔ̃', '#', 'ʃ', 'e', 'z', 'ø', '#', '‖', '‖']

Words for 'J'arrive dans une minute.': ["j'arrive/VERB", 'dans/ADP', 'une/DET', 'minute/NOUN', './PUNCT']
Phonemes for 'J'arrive dans une minute.': ['#', 'ʒ', 'a', 'ʁ', 'i', 'v', '#', 'd', 'ɑ̃', 'z', 'y', 'n', '#', 'm', 'i', 'n', 'y', 't', '#', '‖', '‖']
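
For readers who want to experiment, here is a minimal sketch of the four rules listed in this comment, applied to tokens in the `word/TAG` form shown in the log. It is not gruut's implementation. The helper names, the letter-based precondition, and the choice to count AUX as a verb and PROPN as a noun are all assumptions. Since it encodes only the four listed rules, it flags fewer liaison sites than the log above shows.

```python
from typing import List, Tuple

# Assumption: a rough set of French vowel letters.
VOWELS = set("aeiouyàâäéèêëîïôöùûüœ")
VERB_TAGS = {"VERB", "AUX"}     # assumption: AUX counts as a verb
NOUN_TAGS = {"NOUN", "PROPN"}   # assumption: PROPN counts as a noun


def parse(tagged: List[str]) -> List[Tuple[str, str]]:
    """Split 'word/TAG' strings into (word, tag) pairs."""
    pairs = []
    for item in tagged:
        word, tag = item.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


def liaison_sites(tagged: List[str]) -> List[int]:
    """Return each index i where a liaison links token i and token i + 1."""
    tokens = parse(tagged)
    sites = []
    for i in range(len(tokens) - 1):
        (w1, t1), (w2, t2) = tokens[i], tokens[i + 1]
        # Assumed precondition, not one of the four rules:
        # consonant letter at the end, vowel letter or 'h' at the start.
        if not w1[-1].isalpha() or w1[-1] in VOWELS:
            continue
        if w2[0] != "h" and w2[0] not in VOWELS:
            continue
        if w1 == "et":                                   # rule 1
            continue
        if (t1 == "DET"                                  # rule 2
                or (t1 == "PRON" and t2 in VERB_TAGS)    # rule 3
                or ((t1 == "ADP" or w1 == "très") and t2 in NOUN_TAGS)):  # rule 4
            sites.append(i)
    return sites


print(liaison_sites(['tu/PRON', 'nous/PRON', 'entends/VERB', './PUNCT']))
```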

@tjiho

tjiho commented Apr 13, 2021

> • adjective + noun: liaison
> I believe these are adjectives like grand, petit, gros, long, beaux, bel.
> It's possible to add them to a first-word exception list.

It applies to this kind of adjective (qualificatif): un petit_ami
But also to:

  • possessive adjectives (mon, ma, mes, ton, ta, tes, etc.): mon_amant
  • relative adjectives (lequel, duquel, auquel, laquelle, lesquels, desquels, auxquels, lesquelles, desquelles, auxquelles): Lesquelles_avocats
  • numeral adjectives (numbers): deux_amants
  • other adjectives are in fact determiners

> • there is a list of typical expressions to know for which you should make a liaison (todo: find this list).
> Can you give some example(s)?

Sure, the article talks about some expressions:
tout_à l’heure
c’est_à dire
plus_ou moins
peut_être
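
Fixed expressions like these could be handled with a small lookup that overrides any part-of-speech rule. The sketch below is hypothetical: the table holds only the four expressions quoted above, and the tokenization (lowercase words, straight apostrophes, "peut être" as two tokens) is assumed.

```python
from typing import List

# (tokens of the expression, index of the token followed by the liaison)
FIXED_EXPRESSIONS = [
    (("tout", "à", "l'heure"), 0),
    (("c'est", "à", "dire"), 0),
    (("plus", "ou", "moins"), 0),
    (("peut", "être"), 0),
]


def forced_liaisons(tokens: List[str]) -> List[int]:
    """Return each index i where a fixed expression forces a liaison
    between token i and token i + 1."""
    tokens = [t.lower() for t in tokens]
    found = []
    for expression, offset in FIXED_EXPRESSIONS:
        n = len(expression)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == expression:
                found.append(i + offset)
    return sorted(found)


print(forced_liaisons(["À", "tout", "à", "l'heure"]))
```

Matching the whole expression rather than just the word pair avoids forcing a liaison on every occurrence of "tout à" or "plus ou".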

@alt131
Author

alt131 commented Apr 13, 2021

@tjiho, thanks.
@synesthesiam, I like it, but for "Je vis en Amérique." the phonemes are correct and the pronunciation is not.
'Ton excellent vin.' also has a pronunciation problem. I think both are related to #5.

@tjiho

tjiho commented Apr 13, 2021

> 'Ton excellent vin.' also has a pronunciation problem. I think both are related to #5.

"excellent" can have multiple pronunciations depending on the context. Here it was pronounced like the verb, as in "ils excellent en mathématiques".

@alt131
Author

alt131 commented Apr 13, 2021

@tjiho, I don't like the pronunciation of 'ton' :)
It's very different from the original one.

@alt131
Author

alt131 commented Apr 13, 2021

@synesthesiam
Contributor

Here are the same sentences with a word break (#) added after the liaison: https://drive.google.com/drive/folders/1U8i14JX_IB2HC-0YlGrTunFkzM9lpAvR?usp=sharing

Do these sound better or worse?

@alt131
Author

alt131 commented Apr 13, 2021

@synesthesiam, 'Ton excellent vin.' is OK.
"Je vis en Amérique." is OK.
I'll check the others and write later.

@alt131
Author

alt131 commented Apr 13, 2021

@synesthesiam All sentences are OK for me.

@alt131
Author

alt131 commented Apr 13, 2021

Sorry, in "Je vis en Amérique." the liaison is lost, but the pronunciation is still better :)

@tjiho

tjiho commented Apr 13, 2021

In the second one, "les arbres" is better. In the first one, I hear ['#', 'l', 'e', 'z', 'a', 'b', 'ʁ', '#', '‖', '‖'] instead of ['#', 'l', 'e', 'z', 'a', 'ʁ', 'b', 'ʁ', '#', '‖', '‖'] (the 'ʁ' before the 'b' is missing).

@synesthesiam
Contributor

OK, I'll keep the word breaks in then. This seems like progress at least 🙂
Thanks for all your help!
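
To make the two variants concrete, here is a small sketch that assembles a phoneme sequence in the same shape as the logs in this thread ('#' around words, '‖', '‖' at the end), with or without a word break after the liaison consonant. The function `assemble` and its parameters are hypothetical and are not taken from gruut.

```python
from typing import List, Optional


def assemble(words: List[List[str]],
             liaisons: List[Optional[str]],
             keep_break: bool = True) -> List[str]:
    """Flatten per-word phonemes into one sequence.

    liaisons[i] is the liaison consonant between word i and word i + 1,
    or None when there is no liaison at that boundary.
    """
    out = ["#"]
    for i, phonemes in enumerate(words):
        out.extend(phonemes)
        consonant = liaisons[i] if i < len(liaisons) else None
        if consonant is None:
            out.append("#")          # ordinary word break
        else:
            out.append(consonant)    # liaison consonant joins the first word
            if keep_break:
                out.append("#")      # break kept after the liaison
    out.extend(["‖", "‖"])
    return out


les_arbres = [["l", "e"], ["a", "ʁ", "b", "ʁ"]]
print(assemble(les_arbres, ["z"], keep_break=False))
print(assemble(les_arbres, ["z"], keep_break=True))
```

With `keep_break=False` the output has the shape of the first set of samples; with `keep_break=True` it has the shape of the second set.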

@alt131
Author

alt131 commented Apr 13, 2021

Please generate these sentences as well:
Un bâtiment est en vue de l’île.
Sa vie n’était pas en danger
Let's check them too.

@synesthesiam
Contributor

@alt131
Author

alt131 commented Apr 13, 2021

'Sa vie n’était pas en danger' is OK
In 'Un bâtiment est en vue de l’île.' I hear "est en" as something like 'e' 'd' 'ɑ̃', but it should be 'e' 't' 'ɑ̃'.

@alt131
Author

alt131 commented Apr 13, 2021

Check the phonemes for it; if they are OK, then do nothing.

@synesthesiam
Contributor

synesthesiam commented Apr 13, 2021

DEBUG:larynx:Words for 'Un bâtiment est en vue de l'île.': ['un/DET', 'bâtiment/NOUN', 'est/AUX', 'en/ADP', 'vue/NOUN', 'de/ADP', "l'île/NOUN", './PUNCT']
DEBUG:larynx:Phonemes for 'Un bâtiment est en vue de l'île.': ['#', 'œ̃', '#', 'b', 'a', 't', 'i', 'm', 'ɑ̃', '#', 'ɛ', '#', 'ɑ̃', '#', 'v', 'y', '#', 'd', 'ə', '#', 'l', 'i', 'l', '#', '‖', '‖']

😕

@alt131
Author

alt131 commented Apr 13, 2021

'#', 'ɛ', '#', 'ɑ̃', '#' should be '#', 'ɛ', 't', '#', 'ɑ̃', '#'. 't' was lost.

@alt131
Author

alt131 commented Apr 13, 2021

Another example for you 'Amalia est en danger.'

@synesthesiam
Contributor

Ah, I'm missing the verb -> vowel case. Hang on.

@alt131
Author

alt131 commented Apr 13, 2021

Let's check this too "C`est incroyable!"

@synesthesiam
Contributor

Updated the Google Drive directory.

DEBUG:larynx:Words for 'Un bâtiment est en vue de l'île.': ['un/DET', 'bâtiment/NOUN', 'est/AUX', 'en/ADP', 'vue/NOUN', 'de/ADP', "l'île/NOUN", './PUNCT']
DEBUG:larynx:Phonemes for 'Un bâtiment est en vue de l'île.': ['#', 'œ̃', '#', 'b', 'a', 't', 'i', 'm', 'ɑ̃', '#', 'ɛ', 't', '#', 'ɑ̃', '#', 'v', 'y', '#', 'd', 'ə', '#', 'l', 'i', 'l', '#', '‖', '‖']

DEBUG:larynx:Words for 'Amalia est en danger.': ['amalia/PROPN', 'est/AUX', 'en/ADP', 'danger/NOUN', './PUNCT']
DEBUG:larynx:Phonemes for 'Amalia est en danger.': ['#', 'a', 'm', 'a', 'l', 'j', 'a', '#', 'ɛ', 't', '#', 'ɑ̃', '#', 'd', 'ɑ̃', 'ʒ', 'e', '#', '‖', '‖']

DEBUG:larynx:Words for 'C'est incroyable!': ["c'est/AUX", 'incroyable/ADJ', '!/PUNCT']
DEBUG:larynx:Phonemes for 'C'est incroyable!': ['#', 's', 'ɛ', 't', '#', 'ɛ̃', 'k', 'ʁ', 'w', 'a', 'j', 'a', 'b', 'l', '#', '‖', '‖']

@alt131
Author

alt131 commented Apr 13, 2021

They are OK (the phonemes).

@alt131
Author

alt131 commented Apr 13, 2021

The pronunciation is OK too.

@synesthesiam
Contributor

Great, thanks!

I've uploaded new code for gruut and larynx as well as the French model with POS tagging. I won't be able to update Docker images until later.

@alt131
Author

alt131 commented Apr 13, 2021

OK, thank you, I'll check it tomorrow.
I use Windows Subsystem for Linux 1 :)

@alt131
Author

alt131 commented Apr 14, 2021

DEBUG:larynx:Words for 'je peux vous aider à le retrouver': ['je', 'peux', 'vous', 'aider', 'à', 'le', 'retrouver']
DEBUG:larynx:Phonemes for 'je peux vous aider à le retrouver': ['#', 'ʒ', 'ə', '#', 'p', 'ø', '#', 'v', 'u', '#', 'e', 'd', 'e', '#', 'a', '#', 'l', 'ə', '#', 'ʁ', 'ə', 't', 'ʁ', 'u', 'v', 'e', '#', '‖', '‖']

There is no liaison in vous_aider, and the 'z' sound is missing from the phonemes.

@alt131
Author

alt131 commented Apr 14, 2021

'Chacun est uni à l`arbre de vie.'
sh: 1: arbre: not found
sh: 1: chacun: not found

And then:
DEBUG:hifi_gan:Initializing denoiser
Traceback (most recent call last):
  File "/usr/local/bin/larynx", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/larynx/main.py", line 165, in main
    line_id, line = line.split(args.id_delimiter, maxsplit=1)
ValueError: not enough values to unpack (expected 2, got 1)

@alt131
Author

alt131 commented Apr 14, 2021

You have at least 2 bugs in the French models.
First bug: the current NNs were trained without liaison.
Second bug: to compensate for the first bug, you added # in the liaison.
I think you need to remove the # in the liaison and train the swiss model as a test. I believe the result will be of better quality and more accurate.
And bug #5 still exists: for example, for 'livre' or 'homme' the phonemes are OK but the pronunciation is not (inside a word).

@synesthesiam
Contributor

Haven't updated the Docker images yet. I had to roll back to push a different fix.

The ValueError you got is likely from leaving the --csv command-line argument on while passing in sentences without an id field (like id|text).
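
The unpacking in the traceback fails whenever a line contains no delimiter, because `split` then returns a single element. A tolerant variant could look like the sketch below. `split_id` is a hypothetical helper, not part of larynx, and the default delimiter "|" is assumed from the `id|text` example.

```python
from typing import Optional, Tuple


def split_id(line: str, delimiter: str = "|") -> Tuple[Optional[str], str]:
    """Split 'id|text' into (id, text).

    Returns (None, line) when the delimiter is absent instead of raising.
    """
    head, sep, tail = line.partition(delimiter)
    if not sep:
        return None, line
    return head, tail


print(split_id("s1|Chacun est uni à l'arbre de vie."))
print(split_id("Chacun est uni à l'arbre de vie."))
```

`str.partition` always returns three parts, so there is nothing to unpack incorrectly; whether a line without an id should be accepted or rejected is a separate design choice.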

@alt131
Author

alt131 commented Apr 14, 2021

> The ValueError you got is likely from leaving the --csv command-line argument on while passing in sentences without an id field (like id|text).

No, it's because a non-standard apostrophe was used in "l`arbre". If I change it to a standard ', then it works OK.
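
One way to guard against this kind of input is to map apostrophe look-alikes to the plain ASCII apostrophe before the text reaches the tokenizer. The sketch below is an assumption about how that could be done, not something larynx or gruut is known to do; the character list and the function name are illustrative.

```python
# Characters often typed in place of an apostrophe (illustrative list):
# backtick, acute accent, right/left single quotes, modifier apostrophe.
APOSTROPHE_LIKE = "`´’‘ʼ"
_TABLE = str.maketrans({ch: "'" for ch in APOSTROPHE_LIKE})


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with a plain ASCII apostrophe."""
    return text.translate(_TABLE)


print(normalize_apostrophes("Chacun est uni à l`arbre de vie."))
```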

@zopieux

zopieux commented May 22, 2021

Just wanted to mention:

Phonemes for 'Michel est un grand ami.': ['#', 'm', 'i', 'ʃ', 'ɛ', 'l', '#', 'ɛ', '#', 'œ̃', 'n', '#', 'ɡ', 'ʁ', 'ɑ̃', 't', 'a', 'm', 'i', '#', '‖', '‖']

On this pronunciation sample, I hear a D-sound rather than the expected T-sound: est “D”un grand instead of est “T”un grand.
