North sámi testdata in SMJ #103

Open
ilm024 opened this issue May 24, 2024 · 7 comments
@ilm024
Contributor

ilm024 commented May 24, 2024

SMJ make check is failing:

FAIL: accept-all-lemmas.sh
============================================================================
Testsuite summary for Giella smj 0.2.0
============================================================================
# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See tools/spellcheckers/test/fstbased/desktop/hfst/test-suite.log
Please report to feedback@divvun.no

It seems like there is North Sámi test data in SMJ:

=====================================================================================
   Giella smj 0.2.0: tools/spellcheckers/test/fstbased/desktop/hfst/test-suite.log
=====================================================================================

# TOTAL: 2
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: accept-all-lemmas.sh
==========================

"Áváhårsa" is NOT in the lexicon:
"Helmuk" is NOT in the lexicon:
"Kr.å" is NOT in the lexicon:
"Kr.m" is NOT in the lexicon:
"Mančuria" is NOT in the lexicon:
"MuVá" is NOT in the lexicon:
"Tearbmasymposia" is NOT in the lexicon:
"Vuottnánáhpe" is NOT in the lexicon:
"áhpeguollebivdár" is NOT in the lexicon:
"álggididdje" is NOT in the lexicon:
see rejected_lemmas.txt for more
FAIL accept-all-lemmas.sh (exit status: 1)
@flammie
Contributor

flammie commented Jun 4, 2024

ok, so the words are:

"áhpeguollebivdár" is NOT in the lexicon:
"álggididdje" is NOT in the lexicon:
"almasjlasj" is NOT in the lexicon:
"Áváhårsa" is NOT in the lexicon:
"avtl." is NOT in the lexicon:
"bba" is NOT in the lexicon:
"boajto" is NOT in the lexicon:
"buojk" is NOT in the lexicon:
"Četčenia" is NOT in the lexicon:
"dárrolasj" is NOT in the lexicon:
"do" is NOT in the lexicon:
"dub" is NOT in the lexicon:
"dus" is NOT in the lexicon:
"ednamlasj" is NOT in the lexicon:
"fárrolasj" is NOT in the lexicon:
"fylkkasuohkanlasj" is NOT in the lexicon:
"færtguhti" is NOT in the lexicon:
"gájkkasasj" is NOT in the lexicon:
"gáktse" is NOT in the lexicon:
"gávo" is NOT in the lexicon:
"goabbák guojmme" is NOT in the lexicon:
"guhtik guojmme" is NOT in the lexicon:
"guoktajuodevidálågåk" is NOT in the lexicon:
"guoktajuohtevidálågåk" is NOT in the lexicon:
"guollebivdár" is NOT in the lexicon:
"gånågislasj" is NOT in the lexicon:
"háldaduslasj" is NOT in the lexicon:
"Helmuk" is NOT in the lexicon:
"huom" is NOT in the lexicon:
"iesjguhti" is NOT in the lexicon:
"jav" is NOT in the lexicon:
"j.d" is NOT in the lexicon:
"jd" is NOT in the lexicon:
"jdd" is NOT in the lexicon:
"j.d.s" is NOT in the lexicon:
"j.e" is NOT in the lexicon:
"je" is NOT in the lexicon:
"jed" is NOT in the lexicon:
"j.i" is NOT in the lexicon:
"j.n.v" is NOT in the lexicon:
"j.s" is NOT in the lexicon:
"jsg." is NOT in the lexicon:
"Kr.m" is NOT in the lexicon:
"Kr.å" is NOT in the lexicon:
"labun" is NOT in the lexicon:
"lájbbár" is NOT in the lexicon:
"låbdun" is NOT in the lexicon:
"lågenan" is NOT in the lexicon:
"lågenanvuostas" is NOT in the lexicon:
"låptun" is NOT in the lexicon:
"Mančuria" is NOT in the lexicon:
"materiáladahtes" is NOT in the lexicon:
"miljo" is NOT in the lexicon:
"MuVá" is NOT in the lexicon:
"måjo" is NOT in the lexicon:
"måtso" is NOT in the lexicon:
"niellje" is NOT in the lexicon:
"nubbe nubbe" is NOT in the lexicon:
"sadj" is NOT in the lexicon:
"sahtemus" is NOT in the lexicon:
"sebrudaklasj" is NOT in the lexicon:
"su" is NOT in the lexicon:
"suohkanlasj" is NOT in the lexicon:
"såbadimahtes" is NOT in the lexicon:
"Tearbmasymposia" is NOT in the lexicon:
"tjábbámus" is NOT in the lexicon:
"ulmusjlasj" is NOT in the lexicon:
"Vuottnánáhpe" is NOT in the lexicon:
"ålleslasj" is NOT in the lexicon:
"åss" is NOT in the lexicon:

this is the distribution in lexc files:

$ for w in $(cat tools/spellcheckers/test/fstbased/desktop/hfst/rejected_lemmas.txt | sed -e 's/^"//' -e 's/" is NOT.*//') ; do egrep "^$w\+" src/fst/morphology/stems/*; done
src/fst/morphology/stems/nouns.lexc:áhpeguollebivdár+N+Err/Der+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:áhpe#guolle#bivdár GAHPER ;
src/fst/morphology/stems/nouns.lexc:álggididdje+N+Err/Der+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Hum:álggididdje ACTOR ; !No verb "álggidit", so álggididdje isn't possible
src/fst/morphology/stems/adjectives.lexc:almasjlasj+A+Err/Der+CmpN/SgN+CmpN/PlG:almasjl DÁRBULASJ ;
src/fst/morphology/stems/smj-propernouns.lexc:Áváhårsa+Use/-Spell:Ává^hårsa MARJA-plc ; !
src/fst/morphology/stems/smj-abbreviations.lexc:avtl.+N:avtalåhko ab-dot-noun-itrab ;
src/fst/morphology/stems/smj-abbreviations.lexc:bba+N:bårråmbassti ab-dot-noun-itrab ;!bårråmbassti
src/fst/morphology/stems/adjectives.lexc:boajto+A:boajto VINJO- ;
src/fst/morphology/stems/smj-abbreviations.lexc:buojk+Adv:buojk ab-dot-adv-trab ; ! buojkulvis/vissan
src/fst/morphology/stems/smj-propernouns.lexc:Četčenia+Use/-Spell+OLang/SME:Četčenia ACCRA-plc ;
src/fst/morphology/stems/nouns.lexc:dárrolasj+N+Err/Der+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:dárrol BERULASJ ; !should be dárulasj? It's more ok with dárrolasj than dárrulasj
src/fst/morphology/stems/smj-abbreviations.lexc:do+N:do ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:dub+Adv:dub ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:dus+Adv:dus ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/adjectives.lexc:ednamlasj+A+Err/Der+CmpN/SgN+CmpN/PlG:ednamladtj ÅLLAGASJ ;
src/fst/morphology/stems/adjectives.lexc:ednamlasj+A+Err/Der:ednamladtj ÅLLAGASJ ; !ulikestavleses subtsantiv får ikke -lasj derivasjon
src/fst/morphology/stems/nouns.lexc:fárrolasj+N+Err/Der+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:fárrol BERULASJ ; !should be fárulasj, but more ok with fárrolasj than fárrulasj
src/fst/morphology/stems/adjectives.lexc:fylkkasuohkanlasj+A+Err/Der:fylkka#suohkanl METÅVDÅLASJ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef:færtge%> guhtikobl ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Nom+Foc/Pos-k:fært#guhti%>k # ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Nom+Foc/Neg-k:fært#guhti%>k # ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ine+Foc/Pos-k:fært#gænºna%>k # ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ine+Foc/Neg-k:fært#gænºna%>k # ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ine+Foc/Pos-k+Use/NG:fært#gænºna%>nik # ; ! 
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ine+Foc/Neg-k+Use/NG:fært#gænºna%>nik # ; ! 
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ela+Foc/Pos-k:fært#gæssta%>k # ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ela+Foc/Neg-k:fært#gæssta%>k # ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ela+Foc/Pos-k+Use/NG:fært#gæssta%>stik # ;   ! 
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Sg+Ela+Foc/Neg-k+Use/NG:fært#gæssta%>stik # ;   ! 
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Pl+Nom+Foc/Pos-k:fært#gudi%>k # ;
src/fst/morphology/stems/pronouns.lexc:færtguhti+Pron+Indef+Pl+Nom+Foc/Neg-k:fært#gudi%>k # ;
src/fst/morphology/stems/pronouns.lexc:gájkkasasj+Pron+Indef+Err/Orth:gájkkasa juohkkahasjcase ; !
src/fst/morphology/stems/numerals.lexc:gáktse+Err/Orth+Use/-Spell+Use/Marg+Use/NG:gáktse# NLX ; !Err/Orth?
src/fst/morphology/stems/adjectives.lexc:gávo+A:gávo VINJO- ;
src/fst/morphology/stems/nouns.lexc:guojmme+N+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:guojmme MUORRA ; ! 
src/fst/morphology/stems/nouns.lexc:guojmme+N+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:guojmme MUORRA ; ! 
src/fst/morphology/stems/numerals.lexc:guoktajuodevidálågåk+Num:guok#tjuode#vidá#lågåg9 VUOSTASJ ;
src/fst/morphology/stems/numerals.lexc:guoktajuohtevidálågåk+Num:guok#tjuohte#vidá#lågåg9 VUOSTASJ ;
src/fst/morphology/stems/nouns.lexc:guollebivdár+N+Err/Der+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:guolle#bivdár GAHPER ; ! used derivation for contraced verb, when verb is even, bivddet-bivdde
src/fst/morphology/stems/adjectives.lexc:gånågislasj+A+Err/Der:gånågisladtj ÅLLAGASJ ;
src/fst/morphology/stems/adjectives.lexc:háldaduslasj+A+Err/Der:háldadusladtj ÅLLAGASJ ; !feil
src/fst/morphology/stems/smj-propernouns.lexc:Helmuk+Use/-Spell:Helmug9 LONDON-plc ; !
src/fst/morphology/stems/smj-abbreviations.lexc:huom+Adv:huom ab-dot-adv-trab ; !huomaha
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef:iesj#ge%> guhtikobl ;
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Pl+Nom+Foc/Neg-k:iesj#gudi%>k # ;
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Pl+Nom+Foc/Pos-k:iesj#gudi%>k # ;
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ela+Foc/Neg-k+Use/NG:iesj#gæstá%>stik # ;  ! 
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ela+Foc/Pos-k+Use/NG:iesj#gæstá%>stik # ;  ! 
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ela+Foc/Neg-k:iesj#gæssta%>k # ; 
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ela+Foc/Pos-k:iesj#gæssta%>k # ; 
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ine+Foc/Neg-k+Use/NG:iesj#gænºna%>nik # ; ! 
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ine+Foc/Pos-k+Use/NG:iesj#gænºna%>nik # ; ! 
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ine+Foc/Neg-k:iesj#gænºna%>k # ;
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Ine+Foc/Pos-k:iesj#gænºna%>k # ;
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Nom+Foc/Neg-k:iesj#guhti%>k # ;
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Sg+Nom+Foc/Pos-k:iesj#guhti%>k # ;
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Attr+Foc/Neg-k:iesj#guhti%>k # ; !OBS
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Attr+Foc/Pos-k:iesj#guhti%>k # ; !OBS
src/fst/morphology/stems/pronouns.lexc:iesjguhti+Pron+Indef+Attr:iesj#guhti%>k # ; ! double, harmonised with sme
src/fst/morphology/stems/smj-abbreviations.lexc:j.d+Adv:j.d ab-dot-adv-itrab ; !ja% dakkára
src/fst/morphology/stems/smj-abbreviations.lexc:jdd+Adv:jdd ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:jed+Adv:jed ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:jd+Adv:jd ab-dot-adv-itrab ;   !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:jdd+Adv:jdd ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/nouns.lexc:judos+N+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Dummytag:juhtos ÅRES ;
src/fst/morphology/stems/nouns.lexc:jådos+N+Sem/Act:jåhtos ÅRES ;
src/fst/morphology/stems/nouns.lexc:jådås+N+Sem/Dummytag:jåhtås GÁMAS ;
src/fst/morphology/stems/smj-abbreviations.lexc:j.d.s+Adv:j.d.s ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:j.e+Adv:j.e ab-dot-adv-itrab ; !hæ
src/fst/morphology/stems/smj-abbreviations.lexc:je+Adv:je ab-dot-adv-itrab ;   !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:jed+Adv:jed ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:j.i+Adv:j.i ab-dot-adv-itrab ; !ja ienep
src/fst/morphology/stems/smj-abbreviations.lexc:j.n.v+Adv:j.n.v ab-dot-adv-itrab ; !ja nav vijdábun !
src/fst/morphology/stems/smj-abbreviations.lexc:j.s+Adv:j.s ab-dot-adv-itrab ; !hæ?
src/fst/morphology/stems/smj-abbreviations.lexc:jsg.+N:julevsámegiella ab-dot-noun-itrab ;
src/fst/morphology/stems/smj-abbreviations.lexc:Kr.m+Adv+Sem/Time:Kr.m ab-dot-adv-itrab ;
src/fst/morphology/stems/smj-abbreviations.lexc:Kr.å+Adv+Sem/Time:Kr.å ab-dot-adv-itrab ;
src/fst/morphology/stems/nouns.lexc:labun+N+Sem/Dummytag+Err/Der:labun GAHPER ; ! bad der?
src/fst/morphology/stems/nouns.lexc:lájbbár+N+Err/Der+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:lájbbár GUOLLÁR ; ! !Feil, baker er "lájbbo", her har ordboksforfatterne gjort feil når de har laget en avledning
src/fst/morphology/stems/nouns.lexc:låbdun+N+Err/Der+Sem/Ctain:låbdun GAHPER ; ! contraced stems don't make NomInstr, no verb låbddot
src/fst/morphology/stems/numerals.lexc:lågenanvuostas+v1+A+Ord+Err/Orth:lågenan#vuostas VUOSTASJ ;
src/fst/morphology/stems/nouns.lexc:låptun+N+Err/Der+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Obj:bevkun GAHPER ; !låpptit can't make this derivation
src/fst/morphology/stems/nouns.lexc:låptun+N+Err/Der+Sem/Ctain:låptun GAHPER ; ! contraced stems don't make NomInstr, no verb låpptot
src/fst/morphology/stems/smj-propernouns.lexc:Mančuria+Use/-Spell+OLang/SME:Mančuria ACCRA-plc ;
src/fst/morphology/stems/adjectives.lexc:materiáladahtes+A+Err/Der:materi^álad DIEHTEMAHTES ;
src/fst/morphology/stems/smj-abbreviations.lexc:miljo+N:miljo ab-dot-num ; !millijåvnnå
src/fst/morphology/stems/smj-acronyms.lexc:MuVá+N+Prop+Sem/Org+ACR+Err/Orth:MuVá ACRO_cons ;  !  - propername according to čállinrávvagat
src/fst/morphology/stems/adjectives.lexc:måjo+A+CmpN/SgN+CmpN/PlG:måjo VINJO- ;
src/fst/morphology/stems/adjectives.lexc:måtso+A:måtso VINJO- ;
src/fst/morphology/stems/numerals.lexc:niellje+Err/Orth+Use/-Spell+Use/Marg+Use/NG:niellje# NLX ; !Err/Orth?
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord:nupp nubbecase ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Sg+Nom:nubbe%> K ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Ess:nubbe%>n K ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Sg+Ill:nubbá%>j K ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Cmp/SgGen:nuppe%> NUMERALCOMPOUNDS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Ess:nubbe%>n K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Par:nuppe%>t # ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Sg+Ill:nubbá%>j K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Sg+Nom:nubbe%> K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Attr:nubbe%> K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr:nupp nubbecase ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Ess:nubbe%>n K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Par:nuppe%>t # ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Sg+Ill:nubbá%>j K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Sg+Nom:nubbe%> K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Attr:nuppe K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef:nupp nubbecase ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord:nupp nubbecase ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Sg+Nom:nubbe%> K ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Ess:nubbe%>n K ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Sg+Ill:nubbá%>j K ;
src/fst/morphology/stems/numerals.lexc:nubbe+A+Ord+Cmp/SgGen:nuppe%> NUMERALCOMPOUNDS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Ess:nubbe%>n K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Par:nuppe%>t # ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Sg+Ill:nubbá%>j K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Sg+Nom:nubbe%> K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr+Attr:nubbe%> K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Recipr:nupp nubbecase ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Ess:nubbe%>n K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Par:nuppe%>t # ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Sg+Ill:nubbá%>j K-CONS ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Sg+Nom:nubbe%> K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef+Attr:nuppe K-VOW ;
src/fst/morphology/stems/pronouns.lexc:nubbe+Pron+Indef:nupp nubbecase ;
src/fst/morphology/stems/smj-abbreviations.lexc:sadj+A:sadj ab-dot-adj-trab ; !sadjásasj
src/fst/morphology/stems/adjectives.lexc:sahtemus+A+Err/Der:sahte TJAVGGÁMUS ; !must be sademus (generated by sadep)
src/fst/morphology/stems/adjectives.lexc:sebrudaklasj+A+Err/Der:sebrudahkaladtj ÅLLAGASJ ;
src/fst/morphology/stems/adjectives.lexc:sebrudaklasj+A+Err/Der:sebrudakladtj ÅLLAGASJ ;
src/fst/morphology/stems/smj-abbreviations.lexc:su+Adv:su ab-dot-adv-numnoab ; ! La stå ! hæ?
src/fst/morphology/stems/adjectives.lexc:suohkanlasj+A+Err/Der:suohkanl METÅVDÅLASJ;
src/fst/morphology/stems/adjectives.lexc:såbadimahtes+A+Err/Der+CmpN/SgN+CmpN/PlG:såbadim DIEHTEMAHTES ;
src/fst/morphology/stems/smj-propernouns.lexc:Tearbmasymposia+Use/-Spell+OLang/SME:Tearbmasymposia ACCRA-obj ;
src/fst/morphology/stems/adjectives.lexc:tjábbámus+A+Err/Der:tjábbá TJAVGGÁMUS ; !must be tjáppámus (generated by tjáppep)
src/fst/morphology/stems/adjectives.lexc:ulmusjlasj+A+Err/Der+CmpN/SgN+CmpN/PlG:ulmusjl DÁRBULASJ ;
src/fst/morphology/stems/smj-propernouns.lexc:Vuottnánáhpe+Use/-Spell:Vuottnán^áhpe MARJA-plc ; !
src/fst/morphology/stems/adjectives.lexc:ålleslasj+A+Err/Der:ållesl DÁRBULASJ ;
src/fst/morphology/stems/smj-abbreviations.lexc:åss+N:åss    ab-dot-noun-itrab ; !åssudahka
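
A note on the loop above: the `for w in $(...)` form splits the list on whitespace, so multi-word entries such as "nubbe nubbe" are looked up once per word, and the dot in entries such as "j.d.s" acts as a regex wildcard, which is why `judos` and `jådos` show up in the output. A minimal sketch of a stricter lookup follows, assuming the same file layout; `lookup_lemma` and `lookup_rejected` are hypothetical names, not part of the Giella infrastructure.

```shell
#!/bin/sh
# Hypothetical helper: print lexc lines that start with the given lemma,
# matched literally (no regex), so "." and "+" have no special meaning.
lookup_lemma() {
    lemma=$1
    shift
    # index() is a plain substring test. In the entries shown above a lemma
    # is followed by "+" (tags) or ":" (lower side).
    awk -v lemma="$lemma" '
        index($0, lemma "+") == 1 || index($0, lemma ":") == 1 {
            print FILENAME ":" $0
        }' "$@"
}

# Hypothetical helper: read the rejected list line by line, so that
# multi-word entries stay in one piece.
lookup_rejected() {
    list=$1
    shift
    sed -e 's/^"//' -e 's/" is NOT.*//' "$list" |
    while IFS= read -r lemma; do
        lookup_lemma "$lemma" "$@"
    done
}
```

It would be called as `lookup_rejected tools/spellcheckers/test/fstbased/desktop/hfst/rejected_lemmas.txt src/fst/morphology/stems/*.lexc`. Multi-word entries may still find nothing if the lexc source escapes the space (as in `ja% dakkára` above).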

I'm thinking we can exclude +Err/Der and ab-dot and VINJO- from testing? Then what is left is:

"færtguhti" is NOT in the lexicon:
"goabbák guojmme" is NOT in the lexicon:
"guhtik guojmme" is NOT in the lexicon:
"guoktajuodevidálågåk" is NOT in the lexicon:
"guoktajuohtevidálågåk" is NOT in the lexicon:
"iesjguhti" is NOT in the lexicon:
"lågenan" is NOT in the lexicon:
"nubbe nubbe" is NOT in the lexicon:

@snomos
Member

snomos commented Jun 4, 2024

> I'm thinking we can exclude +Err/Der and ab-dot and VINJO- from testing?

By default everything containing +Err/ should be removed from testing, so if it is not, that is a bug that needs to be investigated. Could it be that +Err/Der is not defined in root.lexc?

And it makes sense to also exclude VINJO- from testing.

@ilm024
Contributor Author

ilm024 commented Jun 4, 2024

> "færtguhti" is NOT in the lexicon:
> "goabbák guojmme" is NOT in the lexicon:
> "guhtik guojmme" is NOT in the lexicon:
> "guoktajuodevidálågåk" is NOT in the lexicon:
> "guoktajuohtevidálågåk" is NOT in the lexicon:
> "iesjguhti" is NOT in the lexicon:
> "lågenan" is NOT in the lexicon:
> "nubbe nubbe" is NOT in the lexicon:

"færtguhti" is NOT in the lexicon: > not a word, maybe "færtguhtik"
"guoktajuodevidálågåk" is NOT in the lexicon: > typo in test? "guoktatjuodevidálågåk"
"guoktajuohtevidálågåk" is NOT in the lexicon: > typo in test? "guoktatjuohtevidálågåk"
"iesjguhti" is NOT in the lexicon: > not a word, maybe "iesjguhtik"
"lågenan" is NOT in the lexicon: > not a word, works only as cmp

I don't know what to do with MWE:
"nubbe nubbe" is NOT in the lexicon:
"goabbák guojmme" is NOT in the lexicon:
"guhtik guojmme" is NOT in the lexicon:

@flammie
Contributor

flammie commented Jun 4, 2024

> > "færtguhti" is NOT in the lexicon:
> > "goabbák guojmme" is NOT in the lexicon:
> > "guhtik guojmme" is NOT in the lexicon:
> > "guoktajuodevidálågåk" is NOT in the lexicon:
> > "guoktajuohtevidálågåk" is NOT in the lexicon:
> > "iesjguhti" is NOT in the lexicon:
> > "lågenan" is NOT in the lexicon:
> > "nubbe nubbe" is NOT in the lexicon:

> "færtguhti" is NOT in the lexicon: > not a word, maybe "færtguhtik"
> "guoktajuodevidálågåk" is NOT in the lexicon: > typo in test? "guoktatjuodevidálågåk"
> "guoktajuohtevidálågåk" is NOT in the lexicon: > typo in test? "guoktatjuohtevidálågåk"
> "iesjguhti" is NOT in the lexicon: > not a word, maybe "iesjguhtik"

These might be typos in the lexc files? I.e. https://github.com/giellalt/lang-smj/blob/main/src/fst/morphology/stems/pronouns.lexc#L354-L366 https://github.com/giellalt/lang-smj/blob/main/src/fst/morphology/stems/pronouns.lexc#L433-L449 and https://github.com/giellalt/lang-smj/blob/main/src/fst/morphology/stems/numerals.lexc#L643-L650

> "lågenan" is NOT in the lexicon: > not a word, works only as cmp
> I don't know what to do with MWE:
> "nubbe nubbe" is NOT in the lexicon:
> "goabbák guojmme" is NOT in the lexicon:
> "guhtik guojmme" is NOT in the lexicon:

Mm, for now we can manually filter them from testing; I'll use ! NOT-TO-LEMMATEST in lexc comments for this. Naturally, the above-mentioned entries can be excluded with the same method if it's relevant.
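
To illustrate the marker approach, here is a minimal sketch of how entries for one lemma could be tagged. It relies only on `!` starting a lexc comment; the helper name `mark_not_to_lemmatest` is hypothetical, and the sketch prints the result to standard output rather than editing the file.

```shell
#!/bin/sh
# Hypothetical helper: append "! NOT-TO-LEMMATEST" to every entry for the
# given lemma, unless the entry already carries the marker.
mark_not_to_lemmatest() {
    lemma=$1
    file=$2
    awk -v lemma="$lemma" '
        (index($0, lemma "+") == 1 || index($0, lemma ":") == 1) &&
        index($0, "NOT-TO-LEMMATEST") == 0 {
            # Anything after "!" is a comment, so the entry itself is unchanged.
            print $0 " ! NOT-TO-LEMMATEST"
            next
        }
        { print }' "$file"
}
```

For example, `mark_not_to_lemmatest lågenan src/fst/morphology/stems/numerals.lexc > numerals.lexc.new` would write a tagged copy for review. Which file holds a given entry is an assumption to check first.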

@flammie
Contributor

flammie commented Jun 4, 2024

> > I'm thinking we can exclude +Err/Der and ab-dot and VINJO- from testing?

> By default everything containing +Err/ should be removed from testing, so if it is not, that is a bug that needs to be investigated.

That might be a good option. The current template only has: --exclude "(CmpN/Only|ShCmp|\+Cmp\/SplitR| Rreal | R | Rnoun |\+V\+|NOT-TO-LEMMATEST)", although notably, fixing the template won't apply to the most developed languages, since this line is modified in all of them.
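
As a rough sketch of what a wider pattern would do, the snippet below extends the template's alternation with the three candidates from this thread (`+Err/` tags, `ab-dot` continuation lexicons, `VINJO-`). It uses plain `grep -Ev` as a stand-in and makes no assumption about how the extract-lemmas script applies its `--exclude` argument; `EXCLUDE` and `filter_entries` are hypothetical names, and this is not the actual change made in giella-core.

```shell
#!/bin/sh
# Template pattern from above, with "\/" written as "/" (same match in ERE),
# plus three assumed additions: \+Err/, ab-dot and VINJO-.
EXCLUDE='(CmpN/Only|ShCmp|\+Cmp/SplitR| Rreal | R | Rnoun |\+V\+|NOT-TO-LEMMATEST|\+Err/|ab-dot|VINJO-)'

# Hypothetical helper: drop every lexc entry that matches the pattern.
filter_entries() {
    grep -Ev "$EXCLUDE"
}

# Entries taken from the listing earlier in this thread.
sample='boajto+A:boajto VINJO- ;
huom+Adv:huom ab-dot-adv-trab ; !huomaha
suohkanlasj+A+Err/Der:suohkanl METÅVDÅLASJ;
guojmme+N+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:guojmme MUORRA ; !'

# Only the guojmme entry survives the filter.
printf '%s\n' "$sample" | filter_entries
```

Because the alternation matches anywhere in the line, `\+Err/` covers `+Err/Der` and `+Err/Orth` alike, which fits the "everything containing +Err/" rule quoted above.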

@snomos
Member

snomos commented Jun 4, 2024

> > > I'm thinking we can exclude +Err/Der and ab-dot and VINJO- from testing?

> > By default everything containing +Err/ should be removed from testing, so if it is not, that is a bug that needs to be investigated.

> That might be a good option. The current template only has: --exclude "(CmpN/Only|ShCmp|\+Cmp\/SplitR| Rreal | R | Rnoun |\+V\+|NOT-TO-LEMMATEST)", although notably, fixing the template won't apply to the most developed languages, since this line is modified in all of them.

I see now that the relevant line in the extract lemma script is not as thorough as it should be regarding noise:

https://github.com/giellalt/giella-core/blob/f73fef326ddd9cabdd9e07f28ba5ddf71ca2d960/scripts/extract-lemmas.sh#L96

This should be fixed.

@snomos
Member

snomos commented Jun 4, 2024

> This should be fixed.

Done in giellalt/giella-core@3b766fb
