! Divvun & Giellatekno - open source grammars for the Albanian language
! Copyright © 2015 The University of Tromsø & the Norwegian Sámi Parliament
! http://giellatekno.uit.no & http://divvun.no
!
! This program is free software; you can redistribute and/or modify
! this file under the terms of the GNU General Public License as published by
! the Free Software Foundation, either version 3 of the License, or
! (at your option) any later version. The GNU General Public License
! is found at http://www.gnu.org/licenses/gpl.html. It is
! also available in the file $GTHOME/LICENSE.txt.
!
! Other licensing options are available upon request, please contact
! giellatekno@uit.no or feedback@divvun.no
! ========================================================================== !
!! # Albanian morphological analyser !
! ========================================================================== !
!! Introduction to the morphological analyser of the Albanian language.
Multichar_Symbols !!≈ # Definitions for @CODE@
!! ## Analysis symbols
! ----------------
!! The morphological analyses of word forms for the Albanian
!! language are presented in this system in terms of the following symbols.
!! (When adding new tags, please follow the existing tag standards.)
!! The parts-of-speech are:
+N +A +Adv +V !!≈
+Pron +CS +CC +Adp +Po +Pr +Interj +Pcle +Num !!≈
!! The parts of speech are further split up into:
+Prop +Pers +Dem +Interr +Refl +Recipr +Rel +Indef
+Def
+Msc
+Fem
+Neu
!! Usage restrictions are marked with the following tags:
+Err/Orth
+Use/-Spell
!! Nominals are inflected for the following numbers and cases:
+Sg +Du +Pl
+Abl
+Acc
+Dat
+Gen
+Nom
!! The comparative forms are:
+Comp +Superl
!! Numerals are classified under:
+Attr +Card
+Ord
!! Verb moods are:
+Ind +Prs +Prt +Pot +Cond +Imprt
!! Verb personal forms are:
+Sg1 +Sg2 +Sg3 +Pl1 +Pl2 +Pl3
!! Other verb forms are:
+Inf +Ger +ConNeg +ConNegII +Neg +ImprtII +PrsPrc +PrfPrc +Sup +VGen +VAbess
!! Abbreviated words are classified with:
+ABBR +ACR
+Symbol !!≈ * @CODE@ = independent symbols in the text stream, like £, €, ©
!! Special symbols are classified with:
+CLB +PUNCT +LEFT +RIGHT
!! The verbs are syntactically split according to transitivity:
+TV +IV
!! Special multiword units are analysed with:
+Multi
!! Non-dictionary words can be recognised with:
+Guess
!! Question and Focus particles:
+Qst +Foc
!! Semantic classes are marked with:
+Sem/Mal
+Sem/Fem
+Sem/Sur
+Plc
+Org
+Obj
+Ani
+Hum
+Plant
+Group
+Time
+Txt
+Route
+Measr
+Wthr
+Build
+Edu
+Veh
+Clth
!! Derivations are classified by the morphophonetic form of the suffix and by
!! the source and target parts of speech.
+V→N +V→V +V→A
+Der/xxx
%> !!≈ * @CODE@ = morpheme boundary
!! Morphophonology
! ---------------
!! To represent phonological variation in word forms, we use the following
!! symbols in the lexicon files:
{aä} {oö} {uü}
!! and the following triggers to control the variation:
{front} {back}
%^Pen !! penultimate vowel
%^A2E !! fronting/shwa change
!! ## Flag diacritics
!! We have manually optimised the structure of our lexicon using the following
!! flag diacritics to restrict morphological combinatorics: compounds with verbs
!! are only allowed if the verb is further derived into a noun:
@P.NeedNoun.ON@ !!≈ | @CODE@ | (Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@ !!≈ | @CODE@ | (Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@ !!≈ | @CODE@ | (Dis)allow compounds with verbs unless nominalised
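!! A minimal sketch of how these three flags could work together, assuming
!! hypothetical continuation lexica VerbStem, VerbToNoun and CompoundBoundary
!! (illustrative names only, not defined in this analyser; the derivational
!! suffix itself is omitted). The verb sets NeedNoun, nominalisation clears it,
!! and the compound boundary rejects any path on which it is still ON:
! LEXICON VerbStem
! @P.NeedNoun.ON@ VerbToNoun ;      ! hypothetical: mark the path as "verb, not yet nominalised"
! LEXICON VerbToNoun
! @C.NeedNoun@ CompoundBoundary ;   ! hypothetical: nominalisation clears the flag again
! LEXICON CompoundBoundary
! @D.NeedNoun.ON@ Nouns ;           ! hypothetical: compounding fails while the flag is still ON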
!!
!! For languages that allow compounding, the following flag diacritics are needed
!! to control position-based compounding restrictions for nominals. Their use is
!! handled automatically if combined with +CmpN/xxx tags. If not used, they will
!! do no harm.
@P.CmpFrst.FALSE@ !!≈ | @CODE@ | Require that words tagged as such only appear first
@D.CmpPref.TRUE@ !!≈ | @CODE@ | Block such words from entering ENDLEX
@P.CmpPref.FALSE@ !!≈ | @CODE@ | Block these words from making further compounds
@D.CmpLast.TRUE@ !!≈ | @CODE@ | Block such words from entering R
@D.CmpNone.TRUE@ !!≈ | @CODE@ | Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ !!≈ | @CODE@ | Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@ !!≈ | @CODE@ | Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ !!≈ | @CODE@ | Disallow words coming directly from root.
!!
!! Use the following flag diacritics to control downcasing of derived proper
!! nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to
!! use these flags. A ready-made regex performs the actual downcasing, provided
!! these flags are used correctly.
@U.Cap.Obl@ !!≈ | @CODE@ | Obligatory downcasing of derived names: deatnulasj.
@U.Cap.Opt@ !!≈ | @CODE@ | Optional downcasing of derived names: deatnulasj.
LEXICON Root
!! Word forms in Albanian start from the lexeme roots of the basic word
!! classes, or optionally from prefixes:
Nouns ;
Verbs ;
Adjectives ;
Pronouns ;
Numerals ;
Prefixes ;
Punctuation ;
Symbols ;
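!! For illustration, a noun stem reached from Root could continue through an
!! inflection lexicon to EndLex roughly as sketched below. The lexicon N_SHTEPI
!! and its entries are hypothetical (not part of this analyser) and the tag
!! order is illustrative; the entries only use tags declared above:
! LEXICON Nouns
! shtëpi N_SHTEPI ;
! LEXICON N_SHTEPI
! +N+Fem+Sg+Indef+Nom:  EndLex ;    ! shtëpi  'a house' (no suffix on the surface side)
! +N+Fem+Sg+Def+Nom:a   EndLex ;    ! shtëpia 'the house' (definite suffix -a)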
LEXICON EndLex
# ;
! vim: set ft=xfst-lexc: