Adding Catalan (4 accents) #43

Merged
merged 61 commits into from
Jul 3, 2024
Commits
07984f0
added catalan lang
Apr 6, 2023
504e96c
added catalan lang
Apr 6, 2023
d9c755a
ca
Apr 18, 2023
231b3b5
added all files from version softcatala_most_freq_words_no_accent
May 7, 2023
06b0ac9
Merge pull request #1 from fedecosta/feature/catalan
fedecosta May 7, 2023
8c13650
solved lang settings import
Jun 1, 2023
3e90b0e
change apostrophe treatment
Jun 2, 2023
e49e258
cleaned version
Jun 27, 2023
3b69dd1
Merge pull request #2 from fedecosta/feature/catalan
fedecosta Jun 27, 2023
1022e31
Merge pull request #3 from fedecosta/develop
fedecosta Jun 27, 2023
d550384
reverted gitignore to enable pr
Jul 7, 2023
cbe9f6a
removed useless space
Jul 7, 2023
732f4b7
deleted data/ca folder to enable pr
Jul 7, 2023
f5fe5d4
reverted changes to enable pr
Jul 7, 2023
b7eaa2c
reverted changes to enable pr
Jul 7, 2023
e466933
reverted changes to enable pr
Jul 7, 2023
95a1c0d
reverted changes to enable pr
Jul 7, 2023
1efb1d9
reverted changes to enable pr
Jul 7, 2023
ccda06b
reverted changes to enable pr
Jul 7, 2023
b1810ef
reverted changes to enable pr
Jul 7, 2023
83c06f9
reverted changes to enable pr
Jul 7, 2023
52fecec
reverted changes to enable pr
Jul 7, 2023
41958d8
reverted changes to enable pr
Jul 7, 2023
734da67
reverted changes to enable pr
Jul 7, 2023
3c6a61f
reverted changes to enable pr
Jul 7, 2023
f082ae8
last version model
Oct 6, 2023
516565f
trivial lexicon
Oct 6, 2023
2cefac4
updated phonemes
Oct 6, 2023
e0f8332
added pre processing template
Oct 6, 2023
063b7f2
added pre and post processing
Oct 14, 2023
d37ccc4
added post process logic between words
Oct 17, 2023
d195b7b
added ca-ba files
Oct 19, 2023
9aec49d
added ca-ba files
Oct 19, 2023
b0ffd3b
added files for central and balear
Oct 19, 2023
15c86ab
cleaned data folders
Oct 19, 2023
a0a2500
add cleaned files
Oct 19, 2023
275f31c
Merge pull request #4 from fedecosta/feature/pr_ca
fedecosta Oct 19, 2023
e625a9b
upload data
Oct 19, 2023
b4e6452
Merge branch 'develop' of github.com:fedecosta/gruut into develop
Oct 19, 2023
fa12ac4
original data ignore
Oct 19, 2023
6f5f5de
removed prints and debugs
Oct 19, 2023
fd327f0
original gitignore
Oct 19, 2023
87cf9ff
original gitignore
Oct 20, 2023
f003df1
lang debugger
Oct 20, 2023
13ebd87
removed data files
Oct 20, 2023
452a78b
removed data files
Oct 20, 2023
3a71833
Merge branch 'master' into develop
fedecosta Oct 20, 2023
b6ffdd0
Merge pull request #5 from fedecosta/develop
fedecosta Oct 20, 2023
09be7b0
added ce and ba files
Oct 20, 2023
40c5eee
Merge pull request #6 from fedecosta/develop
fedecosta Oct 20, 2023
f789735
added v3 central files
Nov 30, 2023
d149c55
added v3 nordoccidental files
Nov 30, 2023
1bd8004
added v3 valencia files
Nov 30, 2023
70e551e
added v3 balear files
Nov 30, 2023
b20f79f
added all dialects constants
Nov 30, 2023
3ad3c34
added pre and post processing
Nov 30, 2023
f414455
added v3_1 files
Dec 28, 2023
28eabfe
added fix of l lambda l issue
Dec 28, 2023
64a23ab
Merge pull request #8 from fedecosta/feature/catalan_v3_1
fedecosta Dec 28, 2023
f82e680
Merge pull request #9 from fedecosta/develop
fedecosta Dec 28, 2023
ce86ad5
uncommented data gitignore
Dec 28, 2023
Binary file added data/ca-ba/g2p/model.crf
Binary file not shown.
50 changes: 50 additions & 0 deletions data/ca-ba/language.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
---

language:
  name: "Balear Catalan"
  code: "ca-ba"
  phonemes: !env "${config_dir}/phonemes.txt"
  keep_stress: true

lexicon: !env "${config_dir}/lexicon.db"

g2p:
  model: !env "${config_dir}/g2p.fst"

symbols:
  casing: "lower"
  number_regex: "^-?\\d+([,.]\\d+)*$"
  token_split: "\\s+"
  token_join: " "
  minor_breaks:
    - ","
    - ":"
    - ";"
    - "..."
  major_breaks:
    - "."
    - "?"
    - "!"
  replace:
    "[\\<\\>\\(\\)\\[\\]\"]+": ""
    "\\B'": "\""
    "'\\B": "\""
    "’": "'"
    "'": ""
    "-": ""
    "l·l": "l"
  punctuations:
    - "\""
    - "„"
    - "“"
    - "”"
    - "«"
    - "»"
    - ","
    - ":"
    - ";"
    - "."
    - "?"
    - "¿"
    - "!"
    - "¡"
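The `replace` rules in this file can be read as an ordered list of regular-expression substitutions. The following is a minimal sketch of that reading, not gruut's actual implementation: it assumes the rules run sequentially, in file order, on text that has already been lowercased (per `casing: "lower"`). The function name `apply_replacements` and the sample sentence are hypothetical.

```python
import re

# Rules copied from the `replace` section of language.yml, in file order.
# Assumption: each rule is a regex substitution applied one after another.
REPLACEMENTS = [
    (r'[\<\>\(\)\[\]"]+', ""),  # drop angle brackets, parentheses, square brackets, double quotes
    (r"\B'", '"'),              # apostrophe with no word character before it
    (r"'\B", '"'),              # apostrophe with no word character after it
    ("’", "'"),                 # typographic apostrophe becomes ASCII apostrophe
    ("'", ""),                  # remaining apostrophes are removed
    ("-", ""),                  # hyphens are removed
    ("l·l", "l"),               # geminated l (ela geminada) collapses to a single l
]


def apply_replacements(text: str) -> str:
    """Lowercase the text, then apply every rule in order."""
    text = text.lower()
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_replacements("L’home diu: (col·legi) bon-dia"))
# -> lhome diu: colegi bondia
```

Order matters under this reading: the typographic apostrophe is first normalised to `'` and only then removed by the following rule, so elided forms such as `l’home` end up as a single token.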
Binary file added data/ca-ba/lexicon.db
Binary file not shown.
44 changes: 44 additions & 0 deletions data/ca-ba/phonemes.txt
@@ -0,0 +1,44 @@
# https://en.wikipedia.org/wiki/Catalan_phonology
# Catalan phonemes

p [p]ala
b [b]ala
t [t]ela
d [d]onar
k [k]ala
ɡ [g]ala
m [m]ala
ɲ fa[ng]
β aca[b]a
ð ca[d]a
ɣ ama[g]ar
f [f]als
v a[f]ganès
s [s]ala
z ca[s]a
ʃ [x]oc
ʒ mà[g]ic
tʃ co[tx]e
dʒ me[tg]e
l [l]íquid
ʎ [ll]amp
r ca[rr]o
ɾ ca[r]a
w ve[u]en
uw ca[u]re
j ca[i]re
y [i]a[i]a
n [n]ena
ŋ pi[n]güí
ts po[ts]er
dz do[tz]e

# Vowels
i r[i]c
e c[e]c
ɛ s[e]c
a s[a]c
ɔ f[o]c
o s[ó]c
u s[u]c
ə [a]mor
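Each non-comment line of `phonemes.txt` pairs a phoneme symbol with an example word in which the matching letters are bracketed. A small sketch of how such a file could be parsed is shown below; the function name `parse_phonemes` and the returned tuple layout are assumptions made for illustration, and the sample text is a shortened excerpt of the file above.

```python
import re

# Shortened excerpt of phonemes.txt, used as sample input.
SAMPLE = """\
# Catalan phonemes

p [p]ala
tʃ co[tx]e
y [i]a[i]a

# Vowels
ə [a]mor
"""


def parse_phonemes(text):
    """Return (symbol, example, bracketed_graphemes) for each phoneme line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(maxsplit=1)
        symbol = parts[0]
        example = parts[1] if len(parts) > 1 else ""
        # Collect every bracketed letter group in the example word.
        graphemes = re.findall(r"\[([^\]]+)\]", example)
        entries.append((symbol, example, graphemes))
    return entries


for symbol, example, graphemes in parse_phonemes(SAMPLE):
    print(symbol, example, graphemes)
```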
Binary file added data/ca-ce/g2p/model.crf
Binary file not shown.
50 changes: 50 additions & 0 deletions data/ca-ce/language.yml
@@ -0,0 +1,50 @@
---

language:
  name: "Central Catalan"
  code: "ca-ce"
  phonemes: !env "${config_dir}/phonemes.txt"
  keep_stress: true

lexicon: !env "${config_dir}/lexicon.db"

g2p:
  model: !env "${config_dir}/g2p.fst"

symbols:
  casing: "lower"
  number_regex: "^-?\\d+([,.]\\d+)*$"
  token_split: "\\s+"
  token_join: " "
  minor_breaks:
    - ","
    - ":"
    - ";"
    - "..."
  major_breaks:
    - "."
    - "?"
    - "!"
  replace:
    "[\\<\\>\\(\\)\\[\\]\"]+": ""
    "\\B'": "\""
    "'\\B": "\""
    "’": "'"
    "'": ""
    "-": ""
    "l·l": "l"
  punctuations:
    - "\""
    - "„"
    - "“"
    - "”"
    - "«"
    - "»"
    - ","
    - ":"
    - ";"
    - "."
    - "?"
    - "¿"
    - "!"
    - "¡"
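The `number_regex`, `token_split` and `token_join` settings can be tried out directly with Python's `re` module. The sketch below only illustrates what the three patterns accept; how gruut itself uses them is not shown in this diff, and the helper names `tokenize` and `is_number` are hypothetical.

```python
import re

# Patterns copied from the `symbols` section (YAML escapes removed).
NUMBER_RE = re.compile(r"^-?\d+([,.]\d+)*$")
TOKEN_SPLIT_RE = re.compile(r"\s+")
TOKEN_JOIN = " "


def tokenize(text):
    """Split on runs of whitespace, dropping empty tokens."""
    return [token for token in TOKEN_SPLIT_RE.split(text.strip()) if token]


def is_number(token):
    """True when the whole token is digits with optional ',' or '.' groups."""
    return NUMBER_RE.match(token) is not None


tokens = tokenize("costa  1.234,56 euros  o -3")
numbers = [token for token in tokens if is_number(token)]

print(tokens)                   # ['costa', '1.234,56', 'euros', 'o', '-3']
print(numbers)                  # ['1.234,56', '-3']
print(TOKEN_JOIN.join(tokens))  # costa 1.234,56 euros o -3
```

The pattern accepts both `,` and `.` as group separators in any combination, so it does not by itself distinguish a decimal comma from a thousands separator.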
Binary file added data/ca-ce/lexicon.db
Binary file not shown.
44 changes: 44 additions & 0 deletions data/ca-ce/phonemes.txt
@@ -0,0 +1,44 @@
# https://en.wikipedia.org/wiki/Catalan_phonology
# Catalan phonemes

p [p]ala
b [b]ala
t [t]ela
d [d]onar
k [k]ala
ɡ [g]ala
m [m]ala
ɲ fa[ng]
β aca[b]a
ð ca[d]a
ɣ ama[g]ar
f [f]als
v a[f]ganès
s [s]ala
z ca[s]a
ʃ [x]oc
ʒ mà[g]ic
tʃ co[tx]e
dʒ me[tg]e
l [l]íquid
ʎ [ll]amp
r ca[rr]o
ɾ ca[r]a
w ve[u]en
uw ca[u]re
j ca[i]re
y [i]a[i]a
n [n]ena
ŋ pi[n]güí
ts po[ts]er
dz do[tz]e

# Vowels
i r[i]c
e c[e]c
ɛ s[e]c
a s[a]c
ɔ f[o]c
o s[ó]c
u s[u]c
ə [a]mor
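Because the inventory contains multi-character symbols (`tʃ`, `dʒ`, `ts`, `dz`, `uw`), a pronunciation written without separators has to be split with the longer symbols tried first. The following is a sketch of greedy longest-match segmentation over the inventory listed above. It is an illustration only: the function name `segment` and the input strings are assumptions, stress marks are not handled, and greedy matching cannot tell an affricate from a stop followed by a fricative.

```python
# Symbols copied from phonemes.txt (consonants first, then vowels).
PHONEMES = [
    "p", "b", "t", "d", "k", "ɡ", "m", "ɲ", "β", "ð", "ɣ", "f", "v",
    "s", "z", "ʃ", "ʒ", "tʃ", "dʒ", "l", "ʎ", "r", "ɾ", "w", "uw",
    "j", "y", "n", "ŋ", "ts", "dz",
    "i", "e", "ɛ", "a", "ɔ", "o", "u", "ə",
]

# Try longer symbols first so that "tʃ" wins over "t" followed by "ʃ".
BY_LENGTH = sorted(PHONEMES, key=len, reverse=True)


def segment(pronunciation):
    """Split a pronunciation string into inventory symbols, longest match first."""
    result = []
    position = 0
    while position < len(pronunciation):
        for symbol in BY_LENGTH:
            if pronunciation.startswith(symbol, position):
                result.append(symbol)
                position += len(symbol)
                break
        else:
            # No inventory symbol starts at this position.
            raise ValueError(
                f"unknown symbol at position {position}: {pronunciation[position]!r}"
            )
    return result


print(segment("kotʃə"))  # ['k', 'o', 'tʃ', 'ə']
```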
Binary file added data/ca-no/g2p/model.crf
Binary file not shown.
50 changes: 50 additions & 0 deletions data/ca-no/language.yml
@@ -0,0 +1,50 @@
---

language:
  name: "Nord-Occidental Catalan"
  code: "ca-no"
  phonemes: !env "${config_dir}/phonemes.txt"
  keep_stress: true

lexicon: !env "${config_dir}/lexicon.db"

g2p:
  model: !env "${config_dir}/g2p.fst"

symbols:
  casing: "lower"
  number_regex: "^-?\\d+([,.]\\d+)*$"
  token_split: "\\s+"
  token_join: " "
  minor_breaks:
    - ","
    - ":"
    - ";"
    - "..."
  major_breaks:
    - "."
    - "?"
    - "!"
  replace:
    "[\\<\\>\\(\\)\\[\\]\"]+": ""
    "\\B'": "\""
    "'\\B": "\""
    "’": "'"
    "'": ""
    "-": ""
    "l·l": "l"
  punctuations:
    - "\""
    - "„"
    - "“"
    - "”"
    - "«"
    - "»"
    - ","
    - ":"
    - ";"
    - "."
    - "?"
    - "¿"
    - "!"
    - "¡"
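The `!env "${config_dir}/..."` values name files relative to the language directory. The sketch below assumes that `${config_dir}` is a plain placeholder replaced by the directory path (the handler for the `!env` tag is not part of this diff) and compares the resolved paths with the files this diff lists as added under `data/ca-no/`. The helper `resolve` is hypothetical.

```python
from string import Template

# Values written after the `!env` tag in data/ca-no/language.yml.
REFERENCED = {
    "phonemes": "${config_dir}/phonemes.txt",
    "lexicon": "${config_dir}/lexicon.db",
    "g2p model": "${config_dir}/g2p.fst",
}

# Files this diff lists as added under data/ca-no/.
ADDED = {
    "data/ca-no/g2p/model.crf",
    "data/ca-no/language.yml",
    "data/ca-no/lexicon.db",
    "data/ca-no/phonemes.txt",
}


def resolve(value, config_dir):
    """Assumed behaviour: substitute ${config_dir} with the language directory."""
    return Template(value).substitute(config_dir=config_dir)


resolved = {name: resolve(value, "data/ca-no") for name, value in REFERENCED.items()}
missing = sorted(path for path in resolved.values() if path not in ADDED)

print(resolved)
print(missing)  # ['data/ca-no/g2p.fst']
```

Under these assumptions the configuration points at `g2p.fst`, while the diff adds `g2p/model.crf`. Whether that matters depends on how the model path is actually resolved at load time, which this page does not show.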
Binary file added data/ca-no/lexicon.db
Binary file not shown.
44 changes: 44 additions & 0 deletions data/ca-no/phonemes.txt
@@ -0,0 +1,44 @@
# https://en.wikipedia.org/wiki/Catalan_phonology
# Catalan phonemes

p [p]ala
b [b]ala
t [t]ela
d [d]onar
k [k]ala
ɡ [g]ala
m [m]ala
ɲ fa[ng]
β aca[b]a
ð ca[d]a
ɣ ama[g]ar
f [f]als
v a[f]ganès
s [s]ala
z ca[s]a
ʃ [x]oc
ʒ mà[g]ic
tʃ co[tx]e
dʒ me[tg]e
l [l]íquid
ʎ [ll]amp
r ca[rr]o
ɾ ca[r]a
w ve[u]en
uw ca[u]re
j ca[i]re
y [i]a[i]a
n [n]ena
ŋ pi[n]güí
ts po[ts]er
dz do[tz]e

# Vowels
i r[i]c
e c[e]c
ɛ s[e]c
a s[a]c
ɔ f[o]c
o s[ó]c
u s[u]c
ə [a]mor
Binary file added data/ca-va/g2p/model.crf
Binary file not shown.
50 changes: 50 additions & 0 deletions data/ca-va/language.yml
@@ -0,0 +1,50 @@
---

language:
  name: "Valencià Catalan"
  code: "ca-va"
  phonemes: !env "${config_dir}/phonemes.txt"
  keep_stress: true

lexicon: !env "${config_dir}/lexicon.db"

g2p:
  model: !env "${config_dir}/g2p.fst"

symbols:
  casing: "lower"
  number_regex: "^-?\\d+([,.]\\d+)*$"
  token_split: "\\s+"
  token_join: " "
  minor_breaks:
    - ","
    - ":"
    - ";"
    - "..."
  major_breaks:
    - "."
    - "?"
    - "!"
  replace:
    "[\\<\\>\\(\\)\\[\\]\"]+": ""
    "\\B'": "\""
    "'\\B": "\""
    "’": "'"
    "'": ""
    "-": ""
    "l·l": "l"
  punctuations:
    - "\""
    - "„"
    - "“"
    - "”"
    - "«"
    - "»"
    - ","
    - ":"
    - ";"
    - "."
    - "?"
    - "¿"
    - "!"
    - "¡"
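The `minor_breaks`, `major_breaks` and `punctuations` lists overlap: every break symbol except `...` is also listed as punctuation. The sketch below shows one way the three lists could be used to label tokens. It is an assumption-laden illustration rather than gruut's tokenizer: the function name `label_tokens`, the label strings, and the choice to let break labels take precedence over the generic punctuation label are all hypothetical.

```python
import re

# Lists copied from the `symbols` section of language.yml.
MINOR_BREAKS = [",", ":", ";", "..."]
MAJOR_BREAKS = [".", "?", "!"]
PUNCTUATIONS = ['"', "„", "“", "”", "«", "»", ",", ":", ";", ".", "?", "¿", "!", "¡"]

# Longest symbols first so that "..." is matched before a single ".".
_SYMBOLS = sorted(set(MINOR_BREAKS + MAJOR_BREAKS + PUNCTUATIONS), key=len, reverse=True)
SYMBOL_RE = re.compile("(" + "|".join(re.escape(symbol) for symbol in _SYMBOLS) + ")")


def label_tokens(text):
    """Separate symbols from words and label each token."""
    # Surround every symbol with spaces, then split on whitespace.
    padded = SYMBOL_RE.sub(r" \1 ", text.lower())
    labeled = []
    for token in padded.split():
        if token in MAJOR_BREAKS:
            kind = "major_break"
        elif token in MINOR_BREAKS:
            kind = "minor_break"
        elif token in PUNCTUATIONS:
            kind = "punctuation"
        else:
            kind = "word"
        labeled.append((token, kind))
    return labeled


for token, kind in label_tokens("Hola, què tal? «Bé»..."):
    print(token, kind)
```

Note that this naive splitting would also break a number such as `1.234` at the period, so a real pipeline would need to recognise numbers before separating punctuation.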
Binary file added data/ca-va/lexicon.db
Binary file not shown.
44 changes: 44 additions & 0 deletions data/ca-va/phonemes.txt
@@ -0,0 +1,44 @@
# https://en.wikipedia.org/wiki/Catalan_phonology
# Catalan phonemes

p [p]ala
b [b]ala
t [t]ela
d [d]onar
k [k]ala
ɡ [g]ala
m [m]ala
ɲ fa[ng]
β aca[b]a
ð ca[d]a
ɣ ama[g]ar
f [f]als
v a[f]ganès
s [s]ala
z ca[s]a
ʃ [x]oc
ʒ mà[g]ic
tʃ co[tx]e
dʒ me[tg]e
l [l]íquid
ʎ [ll]amp
r ca[rr]o
ɾ ca[r]a
w ve[u]en
uw ca[u]re
j ca[i]re
y [i]a[i]a
n [n]ena
ŋ pi[n]güí
ts po[ts]er
dz do[tz]e

# Vowels
i r[i]c
e c[e]c
ɛ s[e]c
a s[a]c
ɔ f[o]c
o s[ó]c
u s[u]c
ə [a]mor
1 change: 1 addition & 0 deletions gruut-lang-ca/LANGUAGE
@@ -0,0 +1 @@
ca-ce Catalan
3 changes: 3 additions & 0 deletions gruut-lang-ca/README.md
@@ -0,0 +1,3 @@
# gruut Catalan

Language-specific files for Catalan (ca) in [gruut](https://github.com/rhasspy/gruut)
1 change: 1 addition & 0 deletions gruut-lang-ca/gruut_lang_ca/VERSION
@@ -0,0 +1 @@
0.0.0