Commit

Fix build errors by adding support for SHARED-SMI
snomos committed Oct 2, 2023
1 parent aa6271b commit 4da3c77
Showing 3 changed files with 97 additions and 22 deletions.
2 changes: 2 additions & 0 deletions configure.ac
@@ -194,6 +194,8 @@ gt_ENABLE_TARGETS
# Add locally used shares below this line:
gt_USE_SHARED([common], [shared-mul], [giella-shared-mul])
AM_CONDITIONAL([HAVE_SHARED_COMMON], [test x$gt_SHARED_common != xfalse])
+gt_USE_SHARED([smi], [shared-smi], [giella-shared-smi])
+AM_CONDITIONAL([HAVE_SHARED_SMI], [test x$gt_SHARED_smi != xfalse])


# Locally used shares above this line
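The `AM_CONDITIONAL` test in this hunk is true whenever `gt_SHARED_smi` is anything other than the literal string `false`, including when it is empty. Below is a minimal shell sketch of how that `test` expression evaluates. It assumes (this diff does not confirm it) that `gt_USE_SHARED` sets the variable to `false` when the shared directory is not found. The helper name and the path are hypothetical.

```sh
# have_shared VALUE: mirrors `test x$gt_SHARED_smi != xfalse` from configure.ac.
# Hypothetical helper for illustration only; it is not part of the build system.
have_shared() {
    if test "x$1" != xfalse; then
        echo yes
    else
        echo no
    fi
}

have_shared false                     # prints "no"
have_shared /hypothetical/shared-smi  # prints "yes"
have_shared ""                        # prints "yes": an empty value is not "false"
```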
108 changes: 87 additions & 21 deletions docs/sju-x-sydlapsk.md
@@ -153,7 +153,7 @@ Correction rules
!! __lgRemove__ removes the language tags <sma>, <sme>, etc, before proceeding to the dep file.

* * *
-<small>This (part of) documentation was generated from [src/cg3/dependency.cg3](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/cg3/dependency.cg3)</small>
+<small>This (part of) documentation was generated from [src/cg3/dependency.cg3](https://github.com/giellalt//blob/main/src/cg3/dependency.cg3)</small>
S Y N T A C T I C F U N C T I O N S F O R S Á M I

Sámi language technology project 2003-2018, University of Tromsø #
@@ -358,6 +358,8 @@ These were the set types.

* __habGenQst__ (<hab> @<ADVL) hab for Gen; in a question sentence. Gen is located sentence initially and SUBJ is found to the right. To the right of SUBJ is copulas

+* __habRefl__ # with inf
+
* __n<titel1__ (@N<) for ("jr") or ("sr"); if first one to the left is Prop

* __n<titel2__ (@N<) for INITIAL; if first one to the left is a noun, or if to the left of you is a single letter which is part of a noun conjunction ''bustávas e ja f gáibiduvvo''
@@ -622,12 +624,12 @@ These were the set types.
The analysis give double analysis because of optional semtags. We go for the one with semtag.

* * *
-<small>This (part of) documentation was generated from [src/cg3/functions.cg3](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/cg3/functions.cg3)</small>Adjective inflection
+<small>This (part of) documentation was generated from [src/cg3/functions.cg3](https://github.com/giellalt//blob/main/src/cg3/functions.cg3)</small>Adjective inflection
The 18th century Southern Saami language adjectives compare.

* * *

-<small>This (part of) documentation was generated from [src/fst/affixes/adjectives.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/affixes/adjectives.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/affixes/adjectives.lexc](https://github.com/giellalt//blob/main/src/fst/affixes/adjectives.lexc)</small>

---

@@ -636,7 +638,7 @@ The 18th century Southern Saami language nouns inflect in number and cases.

* * *

-<small>This (part of) documentation was generated from [src/fst/affixes/nouns.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/affixes/nouns.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/affixes/nouns.lexc](https://github.com/giellalt//blob/main/src/fst/affixes/nouns.lexc)</small>

---

@@ -646,7 +648,7 @@ nouns, but perhaps with a colon (':') as separator.

* * *

-<small>This (part of) documentation was generated from [src/fst/affixes/propernouns.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/affixes/propernouns.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/affixes/propernouns.lexc](https://github.com/giellalt//blob/main/src/fst/affixes/propernouns.lexc)</small>

---

@@ -655,7 +657,7 @@ nouns, but perhaps with a colon (':') as separator.

* * *

-<small>This (part of) documentation was generated from [src/fst/affixes/symbols.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/affixes/symbols.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/affixes/symbols.lexc](https://github.com/giellalt//blob/main/src/fst/affixes/symbols.lexc)</small>

---

@@ -664,7 +666,7 @@ The 18th century Southern Saami language verbs inflect in persons.

* * *

-<small>This (part of) documentation was generated from [src/fst/affixes/verbs.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/affixes/verbs.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/affixes/verbs.lexc](https://github.com/giellalt//blob/main/src/fst/affixes/verbs.lexc)</small>

---

@@ -683,7 +685,7 @@ The 18th century Southern Saami morphophonological/twolc rules file !

* * *

-<small>This (part of) documentation was generated from [src/fst/phonology.twolc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/phonology.twolc)</small>
+<small>This (part of) documentation was generated from [src/fst/phonology.twolc](https://github.com/giellalt//blob/main/src/fst/phonology.twolc)</small>

---

@@ -762,7 +764,7 @@ word classes, or optionally from prefixes:

* * *

-<small>This (part of) documentation was generated from [src/fst/root.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/root.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/root.lexc](https://github.com/giellalt//blob/main/src/fst/root.lexc)</small>

---

@@ -771,7 +773,7 @@ Adjectives in the 18th century Southern Saami language describe the entities nou

* * *

-<small>This (part of) documentation was generated from [src/fst/stems/adjectives.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/stems/adjectives.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/stems/adjectives.lexc](https://github.com/giellalt//blob/main/src/fst/stems/adjectives.lexc)</small>

---

@@ -780,7 +782,7 @@ Nouns in the 18th century Southern Saami language refer to objects or sets of ob

* * *

-<small>This (part of) documentation was generated from [src/fst/stems/nouns.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/stems/nouns.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/stems/nouns.lexc](https://github.com/giellalt//blob/main/src/fst/stems/nouns.lexc)</small>

---

@@ -789,7 +791,7 @@ Numerals in the 18th century Southern Saami language describe a nuerical quantit

* * *

-<small>This (part of) documentation was generated from [src/fst/stems/numerals.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/stems/numerals.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/stems/numerals.lexc](https://github.com/giellalt//blob/main/src/fst/stems/numerals.lexc)</small>

---

@@ -798,7 +800,7 @@ Prefixes in the 18th century Southern Saami language are attatched to the left o

* * *

-<small>This (part of) documentation was generated from [src/fst/stems/prefixes.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/stems/prefixes.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/stems/prefixes.lexc](https://github.com/giellalt//blob/main/src/fst/stems/prefixes.lexc)</small>

---

@@ -807,7 +809,7 @@ Pronouns in the 18th century Southern Saami language are words that may replace

* * *

-<small>This (part of) documentation was generated from [src/fst/stems/pronouns.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/stems/pronouns.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/stems/pronouns.lexc](https://github.com/giellalt//blob/main/src/fst/stems/pronouns.lexc)</small>

---

@@ -816,7 +818,7 @@ Verbs in the 18th century Southern Saami language inflect for tense.

* * *

-<small>This (part of) documentation was generated from [src/fst/stems/verbs.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/stems/verbs.lexc)</small>
+<small>This (part of) documentation was generated from [src/fst/stems/verbs.lexc](https://github.com/giellalt//blob/main/src/fst/stems/verbs.lexc)</small>

---

@@ -983,7 +985,7 @@ retracted tongue root _q

* * *

-<small>This (part of) documentation was generated from [src/phonetics/txt2ipa.xfscript](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/phonetics/txt2ipa.xfscript)</small>
+<small>This (part of) documentation was generated from [src/phonetics/txt2ipa.xfscript](https://github.com/giellalt//blob/main/src/phonetics/txt2ipa.xfscript)</small>

---

@@ -1003,7 +1005,7 @@ For example:

* * *

-<small>This (part of) documentation was generated from [src/transcriptions/transcriptor-abbrevs2text.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/transcriptions/transcriptor-abbrevs2text.lexc)</small>
+<small>This (part of) documentation was generated from [src/transcriptions/transcriptor-abbrevs2text.lexc](https://github.com/giellalt//blob/main/src/transcriptions/transcriptor-abbrevs2text.lexc)</small>

---

@@ -1017,7 +1019,7 @@ For example:

* * *

-<small>This (part of) documentation was generated from [src/transcriptions/transcriptor-numbers-digit2text.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/transcriptions/transcriptor-numbers-digit2text.lexc)</small>
+<small>This (part of) documentation was generated from [src/transcriptions/transcriptor-numbers-digit2text.lexc](https://github.com/giellalt//blob/main/src/transcriptions/transcriptor-numbers-digit2text.lexc)</small>

---

@@ -1318,7 +1320,7 @@ expression **WORD - premodifiers**.
### Grammarchecker sets

* * *
-<small>This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/tools/grammarcheckers/grammarchecker.cg3)</small># Tokeniser for sju-x-sydlapsk
+<small>This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt//blob/main/tools/grammarcheckers/grammarchecker.cg3)</small># Tokeniser for sju-x-sydlapsk

Usage:
```
@@ -1377,7 +1379,7 @@ Finally we mark as a token any sequence making up a:

* * *

-<small>This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)</small>
+<small>This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt//blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)</small>

---

@@ -1440,7 +1442,71 @@ Finally we mark as a token any sequence making up a:

* * *

-<small>This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)</small>
+<small>This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt//blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)</small>
+
+---
+
+# TTS tokenisation for smj
+
+Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc)
+Then just:
+```sh
+make
+echo "ja, ja" \
+| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
+```
+
+More usage examples:
+```sh
+echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
+boasttu olmmoš, man mielde lahtuid." \
+| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
+echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
+| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
+echo "márffibiillagáffe" \
+| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
+```
+
+Pmatch documentation:
+<https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch>
+
+Characters which have analyses in the lexicon, but can appear without spaces
+before/after, that is, with no context conditions, and adjacent to words:
+* Punct contains ASCII punctuation marks
+* The symbol after m-dash is soft-hyphen `U+00AD`
+* The symbol following {•} is byte-order-mark / zero-width no-break space
+`U+FEFF`.
+
+Whitespace contains ASCII white space and
+the List contains some unicode white space characters
+* En Quad U+2000 to Zero-Width Joiner U+200d'
+* Narrow No-Break Space U+202F
+* Medium Mathematical Space U+205F
+* Word joiner U+2060
+
+Apart from what's in our morphology, there are
+1) unknown word-like forms, and
+2) unmatched strings
+We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a
+* select extended latin symbols
+* select symbols
+* various symbols from Private area (probably Microsoft),
+so far:
+* U+F0B7 for "x in box"
+
+TODO: Could use something like this, but built-in's don't include šžđčŋ:
+
+Simply give an empty reading when something is unknown:
+hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and
+remove empty analyses from other readings. Empty readings are also
+legal in CG, they get a default baseform equal to the wordform, but
+no tag to check, so it's safer to let hfst-tokenise handle them.
+
+Needs hfst-tokenise to output things differently depending on the tag they get
+
+* * *
+
+<small>This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt//blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)</small>

---
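
Every regenerated source link in this file now reads `https://github.com/giellalt//blob/main/...`, with an empty repository segment where `lang-sju-x-sydlapsk` used to be. A minimal shell sketch for counting such links in a Markdown file follows. The helper name and the sample file are hypothetical, and whether the empty segment is intended is not stated in the commit.

```sh
# count_empty_repo_links FILE: print the number of lines in FILE that contain
# a GitHub link whose repository segment is empty ("giellalt//blob/").
# Hypothetical helper for illustration only.
count_empty_repo_links() {
    grep -c 'github\.com/giellalt//blob/' "$1"
}

# Two sample lines modelled on the old and new link forms in this diff.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[src/fst/root.lexc](https://github.com/giellalt/lang-sju-x-sydlapsk/blob/main/src/fst/root.lexc)
[src/fst/root.lexc](https://github.com/giellalt//blob/main/src/fst/root.lexc)
EOF

count_empty_repo_links "$sample"   # prints 1
```

Run against `docs/sju-x-sydlapsk.md`, the same function would report how many generated links are affected.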

9 changes: 8 additions & 1 deletion src/cg3/Makefile.am
@@ -24,14 +24,21 @@ endif # WANT_SYNTAX

#################################################
#### Add language-specific build rules here: ####
+if HAVE_SHARED_SMI
+%.cg3: $(gt_SHARED_smi)/src/cg3/%.cg3
+	$(AM_V_CP)cp -f $< $@
+else
+%.cg3:
+	$(AM_V_GEN)echo "! Missing dependency for $@" > $@
+endif

##################################################################
#### END: Add local processing instructions ABOVE this line ######
##################################################################

####### Other targets: ###########
clean-local:
-	-rm -f *.bin
+	-rm -f *.bin functions.cg3 dependency.cg3 korp.cg3

####### Build rules via include: ########

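The new pattern rule has two branches, chosen once at configure time through `HAVE_SHARED_SMI`: either each `.cg3` target is copied from the shared `smi` directory, or it is created as a one-line placeholder. Below is a minimal shell sketch of that behaviour outside of make. The helper `fetch_cg3`, the temporary directory layout, and the sample file content are hypothetical; only the `cp -f` and `echo` commands are taken from the rule itself.

```sh
# fetch_cg3 HAVE_SHARED SHARED_DIR TARGET
# Hypothetical helper that imitates the two branches of the %.cg3 rule.
fetch_cg3() {
    if [ "$1" = yes ]; then
        # HAVE_SHARED_SMI branch: copy the shared source file over the target.
        cp -f "$2/src/cg3/$3" "$3"
    else
        # Fallback branch: write a one-line placeholder into the target.
        echo "! Missing dependency for $3" > "$3"
    fi
}

# Build a throwaway layout with one shared file.
workdir=$(mktemp -d)
mkdir -p "$workdir/shared-smi/src/cg3" "$workdir/build"
echo "# shared rules" > "$workdir/shared-smi/src/cg3/functions.cg3"
cd "$workdir/build" || exit 1

fetch_cg3 yes "$workdir/shared-smi" functions.cg3
fetch_cg3 no "" korp.cg3
cat functions.cg3 korp.cg3
```

Because the copied or generated files now live in the build directory, the commit also extends `clean-local` to remove `functions.cg3`, `dependency.cg3` and `korp.cg3`.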
