%***************************************************************************%
% %
% Copyright (C) 1991-1998 Daniel Sleator and Davy Temperley %
% Copyright (c) 2003 Peter Szolovits and MIT. %
% Copyright (c) 2008-2014 Linas Vepstas %
% Copyright (c) 2013 Lian Ruiting %
% %
% See file "README" for information about commercial use of this system %
% %
%***************************************************************************%
% Dictionary version number is 5.3.0 (formatted as V5v3v0+)
<dictionary-version-number>: V5v3v0+;
% _ORGANIZATION OF THE DICTIONARY_
%
% I. NOUNS
% II. PRONOUNS
% III. DETERMINERS
% IV. NUMERICAL EXPRESSIONS
% V. VERBS
% A. Auxiliaries; B. Common verb types; C. complex intransitive verbs;
% D. complex intransitive verbs; E. complex verbs taking [obj] +
% [complement]; F. idiomatic verbs
% VI. PREPOSITIONS
% VII. TIME AND PLACE EXPRESSIONS
% VIII. QUESTION-WORDS AND CONJUNCTIONS
% IX. ADJECTIVES
% X. COMPARATIVES AND SUPERLATIVES
% XI. ADVERBS
% A. Mainly adjectival; B. Mainly post-verbal; C. Post-verbal/pre-verbal;
% D. Post-verbal/pre-verbal/openers; E. Post-verbal/openers;
% F. Pre-verbal/openers
% XII. MISCELLANEOUS WORDS AND PUNCTUATION
%
%
% TODO:
% Many verb simple past-tense forms include ({@E-} & A+) to
% make them adjective-like. Strictly speaking, these should probably
% be copied into words.adj.1 and treated like common adjectives, right?
%
% Many nouns in words.n.4 are treated as "mass or count". The side
% effect is that mass nouns are inconsistently treated as sometimes
% singular, sometimes plural. e.g. words.n.3 gets <noun-sub-s> &
% <noun-main-m>. This is a kind-of ugly tangle, it should really
% be sorted out so that links are properly marked as s, p or m.
% This is mostly fixed, except that some uses of <noun-main-m>
% remain, below.
% The empty word is used in the 2D array used by the parser,
% in "word slots" in which "no word" is a possibility to consider.
% When the Wordgraph is converted ("flattened") to this 2D array,
% empty words are issued whenever needed.
% FIXME: A better comment maybe.
% See also EMPTY-WORD.x for the highly-unusual situation that EMPTY-WORD
% appears in the input text.
EMPTY-WORD.zzz: ZZZ-;
% Quotation marks.
% Not yet implemented here; they behave as empty words.
% TODO: Add ' and ` also as quotation marks.
% For a list see:
% http://en.wikipedia.org/wiki/Quotation_mark_glyphs#Quotation_marks_in_Unicode
« 《 【 『 „: ZZZ-;
» 》 】 』 ` “: ZZZ-;
% For now, using ".x and ".y in the above definitions multiplies the number
% of linkages by 2^(number of "). So it is separated below.
""": ZZZ-;
% Capitalization handling (null effect for now; these behave as empty words).
1stCAP.zzz: ZZZ-;
nonCAP.zzz: ZZZ-;
% Null links. These are used to drop the requirement for certain words
% to appear during parsing. Basically, if a parse fails at a given cost,
% it is retried at a higher cost (by raising the disjunct_cost).
% Currently, two different nulls are defined: a no-det-null, and a
% costly null. The no-det-null is used to make determiners optional;
% this allows for the parsing of newspaper headlines and clipped
% technical speech (e.g. medical, engineering, where determiners are
% often dropped). The costly-null is used during panic parsing.
% Currently, both have the same cost: using a less costly null results
% in too many sentences being parsed incorrectly. Oh well.
% Default cost=4. This allows the Russian dicts to use a cost of 3 for
% various things, including regex matches for unknown words. (i.e. panic
% parsing is set to 4 at this time.)
<no-det-null>: [[[[()]]]];
<costly-null>: [[[[()]]]];
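% Illustration only (a sketch added for readers, not an entry used by
% the parser): each enclosing pair of square brackets adds 1 to the cost
% of the expression inside, so the four pairs above give a cost of 4.
% Assuming whole-number costs may be written with the same numeric
% suffix used elsewhere in this file (e.g. [NM+]1.5), a hypothetical
% equivalent spelling would be:
%   <example-null>: [()]4.0;
% The name <example-null> is made up for this note.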
% NOUNS
% The marker-entity is used to identify entity names.
% The marker-common-entity is used to identify all common nouns
% and adjectives that might appear in entity names:
% e.g. "Great Southern Federal Bank and Railroad" or "Aluminum Bahrain"
% These markers are used in the Java interfaces, to help identify entities.
<marker-entity>: XXXENTITY+;
<marker-common-entity>: XXXGIVEN+;
% The RJ links connect to "and"; the l,r prevent cross-linking
<clause-conjoin>: RJrc- or RJlc+;
% {@COd-} : "That is the man who, in Joe's opinion, we should hire"
<CLAUSE>: {({@COd-} & (C- or <clause-conjoin>)) or ({@CO-} & (Wd- & {CC+})) or [Rn-]};
<S-CLAUSE>: {({@COd-} & (C- or <clause-conjoin>)) or ({@CO-} & (Wd- & {CC+}))};
<CLAUSE-E>: {({@COd-} & (C- or <clause-conjoin>)) or ({@CO-} & (Wd- or {CC+})) or Re-};
% Post-nominal qualifiers, complete with commas, etc.
% We give these a small cost, so that they don't hide quotational
% complements (i.e. so that "blah blah blah, he said" doesn't
% get the MX link at lower cost than the CP link...)
<post-nominal-x>:
[{[B*j+]} & Xd- & (Xc+ or <costly-null>) & MX-]0.1;
<post-nominal-s>:
[{[Bsj+]} & Xd- & (Xc+ or <costly-null>) & MX-]0.1;
<post-nominal-p>:
[{[Bpj+]} & Xd- & (Xc+ or <costly-null>) & MX-]0.1;
<post-nominal-u>:
[{[Buj+]} & Xd- & (Xc+ or <costly-null>) & MX-]0.1;
% noun-main-x -- singular or plural or mass.
<noun-main-x>:
(S+ & <CLAUSE>) or SI- or J- or O-
or <post-nominal-x>
or <costly-null>;
% noun-main-s -- singular
% XXX FIXME: <noun-main-?> is often used with <noun-sub-?> and sub
% has a R+ & B+ on it. The problem here is that R+ & B+ should not
% be used with the J- here. This needs to be refactored to prevent
% this, or at least, cost it in some way.
<noun-main-s>:
(Ss+ & <CLAUSE>) or SIs- or Js- or Os-
or <post-nominal-s>
or <costly-null>;
% noun-main-p -- plural
<noun-main-p>:
(Sp+ & <CLAUSE>) or SIp- or Jp-
or Op-
or <post-nominal-p>
or <costly-null>;
% noun-main-u -- u == uncountable
% TODO: alter this to use Su+, SIu- someday. likewise Buj+
% Doing this requires adding Su- links to many entries
<noun-main-u>:
(Ss+ & <CLAUSE>) or SIs- or Ju- or Ou-
or <post-nominal-s>
or <costly-null>;
% noun-main-m -- m == mass
% TODO: get rid of this someday.
% To get rid of this, any noun that uses this needs to be split into
% two: the countable form, which will use <noun-main-s> and the
% uncountable form, which will use <noun-main-u>
<noun-main-m>:
(Ss+ & <CLAUSE>) or SIs- or Jp- or Os-
or <post-nominal-s>
or <costly-null>;
% used only for this, that.
% (Jd- & Dmu- & Os-): they have plenty of this
% (Jd- & Dmu- & {Wd-} & Ss+): "not enough of this was used"
% XXX -- is Js- ever really needed?
<noun-main-h>:
(Jd- & Dmu- & Os-)
or (Jd- & Dmu- & {Wd-} & Ss*b+)
or (Ss*b+ & <CLAUSE>) or SIs*b- or [[Js-]] or [Os-]
or <post-nominal-x>
or <costly-null>;
<noun-main2-x>:
J- or O-
or <post-nominal-x>
or <costly-null>;
<noun-main2-s>:
Js- or Os-
or <post-nominal-s>
or <costly-null>;
% Xd- or [[()]] allows parsing of "I have no idea what that is."
% without requiring comma after "idea"
<noun-main2-s-no-punc>:
Js- or Os-
or ({[Bsj+]} & (Xd- or [[()]]) & (Xc+ or <costly-null>) & MX-)
or <costly-null>;
<noun-main2-p>:
Jp- or Op-
or <post-nominal-p>
or <costly-null>;
<noun-main2-m>:
Jp- or Os-
or <post-nominal-s>
or <costly-null>;
% @M+: "The disability of John means he is slow"
<noun-sub-x>: {@M+} & {R+ & B+ & {[[@M+]]}} & {@MX+};
<noun-sub-s>: {@M+} & {R+ & Bs+ & {[[@M+]]}} & {@MXs+};
<noun-sub-p>: {@M+} & {R+ & Bp+ & {[[@M+]]}} & {@MXp+};
% [@AN-].1: add a tiny cost so that A- is preferred to AN- when there
% is a choice. This is because some nouns are also listed as adjectives,
% and we want to use the adjective version A- link in such cases.
% [@AN- & @A-] has cost so that G links are preferred.
% {[@AN-].1} & {@A- & {[[@AN-]]}};
<noun-modifiers>:
(@A- & {[[@AN-]]})
or [@AN-]0.1
or ([[@AN-].1 & @A-] & {[[@AN-]]})
or ();
<nn-modifiers>:
(@A- & {[[@AN-]]})
or [@AN-]0.1
or ([[@AN-].1 & @A-] & {[[@AN-]]});
% conjoined nouns or noun-phrases.
% The l and r prevent two nouns from hooking up directly, they
% must hook up to a conjunction (and, or) in the middle.
% SJl == connect to left
% SJr == connect to right
% SJ*s == singular
% SJ*p == plural
% SJ*u == mass
%
% M+: "gloom of night and heat will not stop me"
% The "of night" can connect to the left noun, but rarely to the right noun
% because it should then connect to the "and", not the right noun.
% but then: "neither heat nor gloom of night shall stop me"
% Looks like only a proper semantic decision can determine the correct parse here ...
%
% Add cost to M+, so that "a number of recommendations and suggestions"
% gets priority in modifying the and.j-n
<noun-and-s>: ({@M+} & SJls+) or ({[@M+]} & SJrs-);
<noun-and-p>: ({[@M+]} & SJlp+) or ({[[@M+]]} & SJrp-);
<noun-and-u>: ({[@M+]} & SJlu+) or ({[[@M+]]} & SJru-);
<noun-and-x>: ({[@M+]} & SJl+) or ({[[@M+]]} & SJr-);
<noun-and-p,u>:
({[@M+]} & SJlp+) or ({[[@M+]]} & SJrp-) or
({[@M+]} & SJlu+) or ({[[@M+]]} & SJru-);
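% Illustration only (a sketch; the word "and" is defined elsewhere and
% is assumed here to offer matching SJl- and SJr+ connectors):
%   "the hammer and sickle"
%   hammer: SJls+ links rightward to "and"
%   sickle: SJrs- links leftward to "and"
% Because one noun carries only an l connector and the other only
% an r connector, hammer and sickle cannot link to each other directly.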
<rel-clause-x>: {Rw+} & B*m+;
<rel-clause-s>: {Rw+} & Bsm+;
<rel-clause-p>: {Rw+} & Bpm+;
% TOf+ & IV+: "there is going to be a meeting", "there appears to be a bug"
% TOn+ & IV+: "there are plots to hatch", "there is a bill to sign"
% TOt+ & B+: this is one where B makes the link
<inf-verb>: IV+;
<to-verb>: TO+ & IV+;
<tof-verb>: TOf+ & IV+;
<toi-verb>: TOi+ & IV+;
<ton-verb>: TOn+ & IV+;
<too-verb>: TOo+ & IV+;
<tot-verb>: TOt+ & B+;
<subord-verb>: CV+;
<embed-verb>: Ce+ & CV+;
<subcl-verb>: Cs+ & CV+;
<advcl-verb>: Ca+ & CV+;
<fitcl-verb>: Ci+ & CV+;
<porcl-verb>: Cr+ & CV+;
<thncl-verb>: Cc+ & CV+;
% We don't handle Ct,Cta in the above, because the AF and B link plays
% the role of CV, connecting to the head-verb.
% The use of COa here needs to be carefully re-examined; it is used much too freely.
<directive-opener>:
{[[Wa-]]} &
((Xc+ & Ic+) or
({Xd-} & (Xc+ or [[()]]) & [[COa+]]));
% Just pure singular entities, no mass nouns
% The CAPITALIZED-WORDS rule is triggered by regex matching, and
% applies to all capitalized words that are not otherwise found in
% the dictionary.
% ({[[@MX+]]} & AN+) comes from postposed modifiers:
% "Codon 311 (Cys --> Ser) polymorphism"
%
% We do NOT tag these with <marker-entity>, as this messes up first-word
% processing in tokenize.c. So for example, we do *not* want "There"
% in "There they are" tagged as an entity, just because it's capitalized.
% We really do want to force the lower-case usage, because the lower case
% is in the dict, and it's the right word to use. (The only entities that
% should be tagged as such are those that are in the dicts, in their
% capitalized form, e.g. "Sue.f" female given name as opposed to "sue.v"
% verb in the sentence "Sue went to the store.")
%
% To help discourage capitalized use when the lower-case is in the dict,
% we give a slight cost to [<noun-sub-s> & (JG- or <noun-main-s>)] to
% discourage use as a common noun, so that the lower-case version can
% play this role. Likewise the cost on [AN+].
%
% The cost on AN+ also discourages crazy AN links to noun cognates of verbs:
% e.g. "The Western Railroad runs through town" -- don't want AN to runs.n.
%
% MX+ & <noun-main-s>: country names: "...went to Paris, France"
%
INITIALS <entity-singular>:
({NM+} & ({G-} & {[MG+]} &
(({DG- or [[GN-]] or [[@A- & @AN-]] or [[{@A-} & {D-}]] or ({@A-} & Jd- & Dmc-)} &
((<noun-sub-s> & (JG- or <noun-main-s>))
or <noun-and-s>
or YS+
))
or ({[[@MX+]]} & [AN+]) or G+)))
or (MXs+ & (<noun-main-s> or <noun-and-s>))
or ({@A- or G-} & {D-} & Wa-)
or <directive-opener>;
% As above, but with a tiny extra cost, so that a dictionary word is
% preferred to the regex match (i.e. for a common noun starting a
% sentence). However, the other regex matches (e.g. MC-NOUN-WORDS)
% should have a cost that is even higher (so that we take the
% capitalized version before we take any other matches.)
CAPITALIZED-WORDS: [<entity-singular>]0.05;
% Hack, see EMPTY-WORD, up top.
EMPTY-WORD.x: CAPITALIZED-WORDS;
% Capitalized words that seem to be plural (by ending with an s, etc)
% -- But not all words that end with an 's' are plural:
% e.g. Cornwallis ... and some of these can take a singular determiner:
% "a Starbucks"
PL-CAPITALIZED-WORDS:
({NM+} & {G-} & {[MG+]} &
(({DG- or [[GN-]] or [[{@A-} & ({Dmc-} or {Ds-})]] or ({@A-} & Jd- & Dmc-) } &
([<noun-sub-x> & (JG- or <noun-main-x>)]
or <noun-and-x>
or YS+
or YP+
))
or AN+
or G+))
or ({@A- or G-} & {D-} & Wa-)
or <directive-opener>;
% capitalized words ending in s
% -- hmm .. proper names not used anywhere right now, has slot for plural ... !!??
<proper-names>:
({G-} & {[MG+]} & (({DG- or [[GN-]] or [[{@A-} & {D-}]]} &
(({@MX+} & (JG- or <noun-main-s>)) or YS+ or YP+)) or AN+ or G+));
% "Tom" is a given name, but can also be a proper name, so e.g.
% "The late Mr. Tom will be missed." which needs A-, D- links
% Wa-: A single exclamation: "Tom! Hey, Tom! Oh, hello John!"
% <noun-and-s> is tricky when used with [[...]] connectors.
% Careful for bad parses of
% "This is the dog and cat Pat and I chased and ate"
% "actress Whoopi Goldberg and singer Michael Jackson attended the ceremony"
<given-names>:
{G-} & {[MG+]} &
(({DG- or [GN-]2.1 or [[{@A-} & {D-}]]} &
(({@MX+} & {NMr+} & (JG- or <noun-main-s> or <noun-and-s>))
or YS+
or YP+))
or AN+
or Wa-
or G+);
% Whole, entire entities, cannot participate in G links
% because the entire entity has already been identified.
<entity-entire>:
({DG- or [[GN-]] or [[{@A-} & {D-}]]} &
(({@MX+} & <noun-main-s>) or <noun-and-s> or YS+))
or Wa-
or AN+;
% Words that are also given names
% Cannot take A or D links.
% Art Bell Bill Bob Buck Bud
%
% The bisex dict includes names that can be given to both
% men and women.
/en/words/entities.given-bisex.sing
/en/words/entities.given-female.sing
/en/words/entities.given-male.sing:
<marker-entity> or <given-names> or <directive-opener>;
% Special handling for certain given names. These are words that have a
% lower-case analog in the dictionary, and are also used in upper-case
% form in an "idiomatic name" e.g. Vatican_City. Without the below,
% this use of "City" would prevent it from being recognized in other
% (non-idiomatic) proper name constructions, e.g. New York City.
/en/words/entities.organizations.sing:
<marker-entity> or <entity-singular>;
/en/words/entities.locations.sing:
<marker-entity> or <entity-singular>;
% words.n.4: nouns that can be mass or countable
% allocation.n allotment.n alloy.n allure.n alteration.n alternation.n
% piano.n flute.n belong here, because of "He plays piano"
%
% This class has now been eliminated: nouns are either singular, plural
% or mass. If they can be more than one, then they are listed separately
% in each class e.g. words.n.1 and/or words.n.2 and/or words.n.3, etc.
% (Only a few screwball exceptions below; should be fixed ...)
<noun-mass-count>:
<noun-modifiers> &
(({NM+} & AN+)
or ({NM+ or ({Jd-} & D*u-)} & <noun-sub-s> & (<noun-main-m> or <rel-clause-s>))
or ({NM+ or ({Jd-} & D*u-)} & <noun-and-p,u>)
or (YS+ & {D*u-})
or (GN+ & (DD- or [()]))
or Us- or ({D*u-} & Wa-));
GREEK-LETTER-AND-NUMBER pH.i x.n: <noun-mass-count>;
% Same as pattern used in words.n.4 -- mass nouns or countable nouns
<generic-singular-id>: <noun-mass-count>;
% Pattern used for words.n.2.s
% Similar to <common-noun>, but with different determiners for number
% agreement.
% ({{Dmc-} & Jd-} & Dmc-) : "I gave him a number of the cookies"
% want "Dmc-" on both to avoid "this cookies"
<generic-plural-id>:
<noun-modifiers> &
([[AN+]]
or ({NM+ or ({{Dmc-} & Jd-} & Dmc-)} &
<noun-sub-p> & (<noun-main-p> or <rel-clause-p>))
or ({NM+ or Dmc-} & <noun-and-p>)
or SJrp-
or (YP+ & {Dmc-})
or (GN+ & (DD- or [()]))
or Up-
or ({Dmc-} & Wa-));
% For YEAR-DATE year numbers
<date-id>:
NMd-
or ({EN-} & (NIfn+ or NItn-))
or NN+
or AN+
or Wa-
or ((Xd- & TY- & Xc+) or TY-)
or ({EN- or NIc-}
& (ND+
or OD-
or ({{@L+} & DD-}
& ([[Dmcn+]]
or ((<noun-sub-x> or TA-) & (JT- or IN- or [[<noun-main-x>]]))))));
% Number abbreviations: no.x No.x
% pp. paragraph, page art article
% RR rural route
No.x No..x no.x no..x Nos.x Nos..x nos.x nos..x
Nr.x Nr..x Nrs.x Nrs..x nr.x nr..x nrs.x nrs..x
Num.x Num..x num.x num..x pp.x pp..x
Art..x art..x RR..x RR.x rr..x :
(Xi+ or [[()]]) & AN+;
% Explicitly include the period at the end of the abbreviation.
Adj..x Adm..x Adv..x Asst..x Atty..x Bart..x Bldg..x Brig..x Bros..x Capt..x Cie..x
Cmdr..x Col..x Comdr..x Con..x Corp..x Cpl..x DR..x Dr..x Drs..x Ens..x Ft..x
Gen..x Gov..x Hon..x Hr..x Hosp..x HMS..x Insp..x Lieut..x Lt..x MM..x MR..x MRS..x
MS..x Maj..x Messrs..x Mlle..x Mme..x Mr..x Mrs..x Ms..x Msgr..x Mt..x Op..x
Ord..x Pfc..x Ph..x Prof..x Pvt..x Rep..x Reps..x Res..x Rev..x Rt..x
Sen..x Sens..x Sfc..x Sgt..x Sr..x St..x Supt..x Surg..x:
G+;
% Period is missing in the abbreviation! Accept, but with a cost.
Adj.x Adm.x Adv.x Asst.x Atty.x Bart.x Bldg.x Brig.x Bros.x Capt.x Cie.x
Cmdr.x Col.x Comdr.x Con.x Corp.x Cpl.x DR.x Dr.x Drs.x Ens.x Ft.x
Gen.x Gov.x Hon.x Hr.x Hosp.x HMS.x Insp.x Lieut.x Lt.x MM.x MR.x MRS.x
MS.x Maj.x Messrs.x Mlle.x Mme.x Mr.x Mrs.x Ms.x Msgr.x Mt.x Op.x
Ord.x Pfc.x Ph.x Prof.x Pvt.x Rep.x Reps.x Res.x Rev.x Rt.x
Sen.x Sens.x Sfc.x Sgt.x Sr.x St.x Supt.x Surg.x:
[[G+]];
% Street addresses, company abbreviations
St.y St..y Ave.y Ave..y Av.y Av..y Pl.y Pl..y Ct.y Ct..y Dr.y Dr..y
Gr.y Gr..y Ln.y Ln..y Rd.y Rd..y Rt.y Rt..y
Blvd.y Blvd..y Pkwy.y Pkwy..y Hwy.y Hwy..y
AG.y Assn.y Assn..y
Corp.y Corp..y Co.y Co..y Inc.y Inc..y PLC.y
Pty.y Pty..y Ltd.y Ltd..y LTD.y Bldg.y Bldg..y and_Co GmBH.y:
({[X-]} & G-) & {Xi+} & {[MG+]} &
(({DG- or [[GN-]] or [[{@A-} & {D-}]]} &
(({@MX+} & (JG- or <noun-main-s>)) or
<noun-and-s> or
YS+ or
YP+)) or
AN+ or
G+);
% Titles, e.g. Joe Blow, Esq. or Dr. Smarty Pants, Ph.D.
% Gack. See absurdly large collection at:
% http://en.wikipedia.org/wiki/List_of_post-nominal_letters
Jr.y Jr..y Sr.y Sr..y Esq.y Esq..y
AB.y A.B..y AIA.y A.I.A..y
BA.y B.A..y BFA.y B.F.A..y BS.y B.S..y BSc.y B.Sc..y
CEng.y CEng..y CFA.y CPA.y CPL.y CSV.y
DD.y D.D..y DDS.y D.D.S..y DO.y D.O..y D.Phil..y D.P.T..y
Eng.D..y
JD.y J.D..y KBE.y K.B.E..y LLD.y LL.D..y
MA.y M.A..y MBA.y M.B.A..y MD.y M.D.y MFA.y M.F.A..y
MS.y M.S..y MSc.y M.Sc..y
OFM.y
PE.y P.E..y Pfc.y Pharm.D..y
PhD.y Ph.D.y Ph.D..y
RA.y R.A..y RIBA.y R.I.B.A..y RN.y R.N..y Sgt.y
USMC.y USN.y:
{Xi+} & {Xd- & {Xc+}} & G- & {[MG+]} &
(({DG- or [[GN-]] or [[{@A-} & {D-}]]} &
(({@MX+} & (JG- or <noun-main-s>)) or
<noun-and-s> or
YS+ or
YP+)) or
AN+ or
G+);
% The generic category for strings containing a hyphen
PART-NUMBER.n
HYPHENATED-WORDS.n:
[[({@AN-} & {@A-} &
(({NM+ or D-} &
((<noun-sub-x> & (<noun-main-x> or <rel-clause-x>))
or <noun-and-x>))
or U-
or ({D-} & Wa-)))
or ((YS+ or YP+) & {@AN-} & {@A-} & {D-})]];
% NOUNS --------------------------------------------------------
% Nouns typically take determiners (a, the). The minor flags are:
% D link: determiners: D1234
% position 1 can be s, m for singular, mass
% position 2 can be c for count, u for uncountable
% position 3 can be k,m,y for comparatives, w for questions.
% position 4 can be c for consonant, v for vowel, x for long-distance.
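% Illustration only (a sketch decoding connectors that appear below,
% using the position scheme just described; '*' is a wildcard):
%   Ds**c-  : 1 = s (singular), 2 and 3 = any, 4 = c (consonant)
%   Ds**v-  : 1 = s (singular), 2 and 3 = any, 4 = v (vowel)
%   Dmu-    : 1 = m (mass), 2 = u (uncountable), 3 and 4 unspecified
% An unspecified trailing position behaves like a wildcard, so a plain
% Ds- on a noun can link to a determiner offering Ds**c+ or Ds**v+.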
% words.n.1-vowel words.n.1-const: Common nouns
% activist.n actor.n actress.n actuary.n ad.n adage.n adagio.n adapter.n
% The naked SJr- allows article to be skipped in conjunction (and,or)
% constructions ("the hammer and sickle")
%
% ({NMa+} & AN+): He takes vitamin D supplements.
%
% XXX TODO fixme: there are many gerund-like nouns in here (e.g. "reading")
% which screw things up when linking to "be" (e.g. "I have to be reading now")
% by appearing as objects (O-) connector when really the verb form (Pg-)
% is what should be happening. So rip these words out... (similar remarks for
% words.n.3)
<common-noun>:
<noun-modifiers> &
(({NMa+} & AN+)
or ((NM+ or ({[NM+]1.5} & (Ds- or <no-det-null>)))
& ((<noun-sub-s> & (<noun-main-s> or <rel-clause-s>))
or <noun-and-s>))
or SJrs-
or (YS+ & Ds-)
or (GN+ & (DD- or [()]))
or Us-
or ({Ds-} & Wa-));
% Preliminary experimental split for supporting a/an phonetic change
% for common nouns starting with vowels or consonants.
% XXX not yet fully tested; seems over-complicated.
<common-phonetic>:
(<noun-modifiers> &
(SJrs-
or (GN+ & (DD- or [()]))
or Us-
or ({Ds-} & Wa-)))
or (<nn-modifiers> &
(({NMa+} & AN+)
or ((NM+ or ({[NM+]1.5} & (Ds**x- or <no-det-null>)))
& ((<noun-sub-s> & (<noun-main-s> or <rel-clause-s>))
or <noun-and-s>))
or (YS+ & Ds**x-)
));
<common-vowel-noun>:
<common-phonetic>
or (({NMa+} & AN+)
or ((NM+ or ({[NM+]1.5} & (Ds**v- or <no-det-null>)))
& ((<noun-sub-s> & (<noun-main-s> or <rel-clause-s>))
or <noun-and-s>))
or (YS+ & Ds**v-));
<common-const-noun>:
<common-phonetic>
or (({NMa+} & AN+)
or ((NM+ or ({[NM+]1.5} & (Ds**c- or <no-det-null>)))
& ((<noun-sub-s> & (<noun-main-s> or <rel-clause-s>))
or <noun-and-s>))
or (YS+ & Ds**c-));
/en/words/words.n.1-vowel :
<marker-common-entity> or <common-vowel-noun>;
/en/words/words.n.1-const :
<marker-common-entity> or <common-const-noun>;
/en/words/words.n.1.gerund :
<marker-common-entity> or <common-noun>;
% Common plural nouns ending in "s"
% allocations.n allotments.n allowances.n alloys.n allures.n allusions.n
/en/words/words.n.2.s :
<marker-common-entity> or <generic-plural-id>;
PL-GREEK-LETTER-AND-NUMBER: <generic-plural-id>;
% plural nouns not ending in "s"
% almost exactly identical to <generic-plural-id> except that there is
% a YS+ instead of a YP+, uses a <noun-and-s> instead of <noun-and-p>
%
% {Jd-}: allows a "a flock of birds" to act as determiner.
%
% aircraft.p bacteria.p bellmen.n buffalo.p businessmen.n chairmen.n
/en/words/words.n.2.x :
<marker-common-entity> or
(<noun-modifiers> &
([[AN+]]
or (GN+ & (DD- or [()]))
or Up-
or ({Dmc-} & Wa-)
or ({NM+ or ({Jd-} & Dmc-)} &
((<noun-sub-p> & (<noun-main-p> or <rel-clause-p>)) or <noun-and-s>))
or (YS+ & {Dmc-})
));
% XXX should probably eliminate <noun-and-p,u> and replace by <noun-and-u>
% but this requires other spread-out changes
%
% ({{Dmu-} & Jd-} & Dmu-): "Drink a pint of this beer"
% XXX: perhaps the above belongs on <noun-main-u> ??? If so,
% then we should also fix up similar connectors on "these", "those", "it",
% "them" etc; see below, search for Jd-
<mass-noun>:
<noun-modifiers> &
(AN+
or (GN+ & (DD- or [()]))
or Up-
or ({Dmu-} & Wa-)
or ({NM+ or ({{Dmu-} & Jd-} & Dmu-)}
& ((<noun-sub-s> & (<noun-main-u> or <rel-clause-s>)) or <noun-and-p,u>))
or (YS+ & {Dmu-})
);
% XXX FIXME: this has only partial phonetic support. I guess the Dm+ need to
% be fixed up as well.
<mass-phonetic>:
<noun-modifiers> &
((GN+ & (DD- or [()]))
or Up-
or ({Dm-} & Wa-)
or ({NM+ or ({{Dmu-} & Jd-} & Dmu-)}
& ((<noun-sub-s> & (<noun-main-u> or <rel-clause-s>)) or <noun-and-p,u>))
or (YS+ & {Dmu-})
);
% If PH is possible, then it is preferred. See PH below for explanation.
<wantPHc>: [PHc-]-0.1 or ();
<wantPHv>: [PHv-]-0.1 or ();
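% Illustration only (a sketch of how the two macros above are scored,
% assuming the parser ranks linkages by lowest total cost):
%   [PHc-]-0.1  : making the PHc link lowers the total cost by 0.1
%   ()          : making no link costs 0
% So whenever a PHc- (or PHv-) link can be formed, the alternative that
% forms it is cheaper than the empty one and is ranked first.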
<mass-vowel-noun>:
<mass-phonetic>
or (AN+ & <wantPHv>)
or (<nn-modifiers> & AN+);
<mass-const-noun>:
<mass-phonetic>
or (AN+ & <wantPHc>)
or (<nn-modifiers> & AN+);
% nouns that are mass only
% absolutism.n absorption.n abstinence.n abundance.n academia.n
/en/words/words.n.3-vowel:
<marker-common-entity> or <mass-vowel-noun>;
/en/words/words.n.3-const:
<marker-common-entity> or <mass-const-noun>;
% Gonna treat these as mass nouns, not sure if this is correct.
% "She wished me goodnight" "She wishes you well"
adieu.n-u bye.n-u farewell.n-u fare-thee-well good-bye.n-u goodbye.n-u
good-night.n-u goodnight.n-u welcome.n-u well.n-u:
<mass-noun>;
% Want to cost this so that it doesn't interfere with given name "Tom".
tom.n-u: [<marker-common-entity> or <mass-noun>];
% Nouns that are also adjectives (e.g. red.a) and so we don't want to
% allow these to take AN+ links (we want to have red.a get used instead).
% But we do need these as nouns, so as to parse 'she prefers red'.
% However, assign a cost, so that 'her shoes are red' gets red.a (with
% the Pa link) preferred over red.n (with the O link).
%
% Doesn't seem to need a noun-and-x to make this work ...
% In other respects, these are kind-of-like mass nouns...
auburn.n black.n blue.n brown.n green.n gray.n ochre.n
pink.n purple.n red.n
tawny.n ultramarine.n umber.n yellow.n:
<marker-common-entity>
or (<noun-modifiers> & (
(GN+ & (DD- or [()]))
or Up-
or ({Dmu-} & Wa-)
or ({Dmu-} & <noun-sub-s> & ([<noun-main-m> or <rel-clause-s>]))
or (YS+ & {Dmu-})
));
% US state names and abbreviations
% NM N.M. NY N.Y. NC N.C. ND N.D. Ohio Okla.
/en/words/entities.us-states.sing:
<marker-entity>
or ({G-} & {DG- or [[GN-]] or [[{@A-} & {D-}]]} &
(({MG+} & {@MX+} & (JG- or <noun-main-s> or <noun-and-s>))
or G+
or ({[[MG+]]} & (AN+ or YS+ or YP+))))
or (Xc+ & Xd- & G- & AN+)
or Wa-;
% SINGULAR ENTITIES FOR ENTITY EXTRACTION
% This must appear after other categories so it doesn't interfere with those.
/en/words/entities.national.sing:
<marker-entity> or <entity-singular>;
% Enable parsing of "Mother likes her"
% Informal only, see formal version below.
auntie.f dad.m daddy.m granny.f
granddad.m grandpa.m grandpop.m mom.f mommy.f
pop.m papa.m poppy.m pops.m sis.f:
<entity-singular>;
% XXX FIXME: unfortunately, this doubles the number of parses for many
% things, e.g. colliding with mother.n-f
% MX-: Shem, brother of Jopheth, left the village.
aunt.f brother.m father.m grandmother.f grandfather.m mother.f
sister.f uncle.m child.s son.m daughter.f grandson.m granddaughter.f
granduncle.m grandaunt.f:
<entity-singular>
or (OF+ & {@MV+} & Xd- & Xc+ & MX*a-);
alter_ego au_pair mise_en_scene faux_pas non_sequitur fait_accompli
modus_vivendi head_of_state tour_de_force:
(<noun-modifiers> &
((Ds- & <noun-sub-s> & (<noun-main-s> or <rel-clause-s>)) or
({Ds-} & <noun-and-s>) or
Us- or
(YS+ & Ds-) or
(GN+ & (DD- or [()])))) or
AN+;
kung_fu joie_de_vivre op_art noblesse_oblige force_majeure
lese_majesty lese_majeste lèse_majesty lèse_majesté lèse-majesté leze_majesty
a_must time_of_day time_of_year top_dollar year_end
breach_of_contract sleight_of_hand power_of_attorney word_of_mouth
carte_blanche:
(<noun-modifiers> &
(({Dmu-} & <noun-sub-s> & (<noun-main-m> or <rel-clause-s>)) or
({Dmu-} & <noun-and-u>) or
Um- or
(YS+ & {Dmu-}) or
(GN+ & (DD- or [()])))) or
AN+;
% XXX FIXME plurals:
% lese_majesties or lèse_majestés
% title nouns (president, chairman)
% auditor.t bailiff.t broker.t buyer.t candidate.t captain.t cardinal.t
% Ou-: "He was made knight by the crown"
/en/words/words.n.t:
<noun-modifiers> & {@M+}
& (BIt- or (Xd- & (Xc+ or <costly-null>) & MX-) or Ou- or TI-);
% Almost identical to below.
% Ds- & {NM+} & <noun-sub-x> &.. "the number 12 is a lucky number"
% above has cost to give "a number of" priority.
number.n:
(<nn-modifiers> & (
[Ds**x- & {NM+} & <noun-sub-x> & (<noun-main-x> or B*x+)]
or ({Ds**x-} & {NM+} & <noun-and-x>)
))
or (
[Ds**c- & {NM+} & <noun-sub-x> & (<noun-main-x> or B*x+)]
or ({Ds**c-} & {NM+} & <noun-and-x>)
)
or AN+;
% Almost identical to above.
% Differing in strange ways from <common-noun>
majority.n minority.n bunch.n batch.n bulk.n handful.n group.n:
(<nn-modifiers> & (
[Ds**x- & <noun-sub-x> & (<noun-main-x> or B*x+)]
or ({Ds**x-} & <noun-and-x>)
))
or (
[Ds**c- & <noun-sub-x> & (<noun-main-x> or B*x+)]
or ({Ds**c-} & <noun-and-x>)
)
or AN+;
% Identical to <common-noun>, except that D- costs extra
<costly-common-noun>:
<noun-modifiers> &
(AN+
or ((NM+ or [[{[NM+]1.5} & (Ds- or <no-det-null>) ]] )
& ((<noun-sub-s> & (<noun-main-s> or <rel-clause-s>))
or <noun-and-s>))
or SJrs-
or (YS+ & Ds-)
or (GN+ & (DD- or [()]))
or Us-
or ({Ds-} & Wa-));
% determiner constructions, with a dangling of: "a number of", "a lot of"
% "I have a number of cookies"
% "a pride of lions" "a litter of kittens" all take determiners
% Some of these commonly modify count nouns, other mass nouns.
% {A-}: "a vast expanse" "a large flock"
% All of these "measure" nouns can also act as common nouns, but
% we want to give these a cost, so that they don't get the first choice.
/en/words/measures.1:
(OFd+ & Dm+ & {A-} & D-)
or <marker-common-entity>
or <costly-common-noun>;
% determiner constructions, with a dangling of:
% "I have bags of money"
% NIn+ needed for money amounts
% {Dmcn-}: "two kilograms of ..."
% The [<generic-plural-id>] is from words.n.2.s
/en/words/measures.2:
(OFd+ & (NIn+ or Dm+) & {A-} & {Dmcn-})
or <marker-common-entity>
or [<generic-plural-id>];
<kind-of>:
({@AN-} & @A- & U+ &
((Ds**x- & <noun-sub-s> & (<noun-main-s> or <rel-clause-s>))
or ({Ds**x-} & <noun-and-s>)
or Us-))
or (U+ &
((Ds**c- & <noun-sub-s> & (<noun-main-s> or <rel-clause-s>))
or ({Ds**c-} & <noun-and-s>)
or Us-));
% This gets a cost, so that the {Jd-} link for measures.1 is preferred.
kind_of:
[<kind-of>]
or EA+
or EE+
or Wa-;
% This gets a cost, so that the {Jd-} link for measures.1 is preferred.
type_of sort_of breed_of species_of:
[<kind-of>]
or [Us-]
or [Wa-];
% This gets a cost, so that the {Jd-} link for measures.2 is preferred.
kinds_of types_of sorts_of breeds_of species_of:
[{{@AN-} & @A-} & U+ &
(({Dmc-} & <noun-sub-p> & (<noun-main-p> or <rel-clause-p>))
or ({Dmc-} & <noun-and-p>)
or Up-)];
percent.u:
(<noun-modifiers> &
((ND- & {DD-} & <noun-sub-x> & (<noun-main-x> or B*x+)) or
(ND- & {DD-} & <noun-and-x>) or
U-)) or
(ND- & (OD- or AN+ or YS+));
% This set of disjuncts should probably be split up and refined.
% "shame.n", "crux.n" are here because they need the Ss*t connector
% to pick up "that" in "The crux of it is that we must act first."
% However, report.n and sign.n and remark.n, etc. do not seem to
% need this connector ...
%
% ({NM+} & {Ss+} & Wd-): "Hypothesis 2: The door on the left hides the prize."
% "Problem: How do you convince your customer that you are on the right path?"
%
<Dsv>: Ds**v- or Ds**x-;
<Dsc>: Ds**c- or Ds**x-;
% Vowel-only form of the below
argument.n impression.n allegation.n announcement.n assertion.n
accusation.n idea.n assumption.n implication.n
indication.n inkling.n amount.n answer.n:
<noun-modifiers> & (
AN+
or (<Dsv> & {@M+} & {(TH+ or (R+ & Bs+)) & {[[@M+]]}} & {@MXs+} &
(<noun-main2-s> or
(Ss*t+ & <CLAUSE>) or
SIs*t- or
<rel-clause-s>))
or ({<Dsv>} & <noun-and-s>)
or SJrs-
or (YS+ & <Dsv>)
or ({NM+} & {Ss+} & Wd-)
or (GN+ & (DD- or [()]))
or Us-);
% Consonant-only form of the above.
report.n sign.n conclusion.n complaint.n position.n restriction.n
notion.n remark.n proclamation.n reassurance.n saying.n possibility.n
problem.n claim.n result.n statement.n hunch.n concept.n hypothesis.n
message.n premonition.n prerequisite.n prereq.n pre-req.n pre-requisite.n
corequisite.n co-requisite.n coreq.n co-req.n truism.n fallacy.n
proposition.n prospect.n presupposition.n supposition.n finding.n
crux.n shame.n thing.n bet.n guess.n:
<noun-modifiers> & (
AN+
or (<Dsc> & {@M+} & {(TH+ or (R+ & Bs+)) & {[[@M+]]}} & {@MXs+} &
(<noun-main2-s> or
(Ss*t+ & <CLAUSE>) or
SIs*t- or
<rel-clause-s>))
or ({<Dsc>} & <noun-and-s>)
or SJrs-
or (YS+ & <Dsc>)
or ({NM+} & {Ss+} & Wd-)
or (GN+ & (DD- or [()]))
or Us-);
% Vowel form of the below
acknowledgment.n acknowledgement.n understanding.n assurance.n
awareness.n opinion.n explanation.n expectation.n insistence.n:
(<noun-modifiers> & (
({(D*u*v- or D*u*x-)} & {@M+} & {(TH+ or (R+ & Bs+)) & {[[@M+]]}} & {@MXs+} & (
<noun-main2-m>
or (Ss*t+ & <CLAUSE>)
or SIs*t-
or <rel-clause-s>))
or ({(D*u*v- or D*u*x-)} & <noun-and-u>)
or Us-
or (YS+ & {D*u-})
or (GN+ & (DD- or [()]))))
or AN+;
% Consonant form of the above.
proof.n doubt.n suspicion.n hope.n knowledge.n relief.n disclosure.n
fear.n principle.n concern.n philosophy.n risk.n threat.n conviction.n
theory.n speculation.n news.n belief.n contention.n thought.n myth.n
discovery.n rumor.n probability.n fact.n feeling.n comment.n
perception.n sense.n realization.n view.n consensus.n notification.n
rule.n danger.n warning.n suggestion.n:
(<noun-modifiers> & (
({(D*u*c- or D*u*x-)} & {@M+} & {(TH+ or (R+ & Bs+)) & {[[@M+]]}} & {@MXs+} & (
<noun-main2-m>
or (Ss*t+ & <CLAUSE>)
or SIs*t-
or <rel-clause-s>))
or ({(D*u*c- or D*u*x-)} & <noun-and-u>)
or Us-
or (YS+ & {D*u-})
or (GN+ & (DD- or [()]))))
or AN+;
evidence.n reasoning.n likelihood:
(<noun-modifiers> &
(({Dmu-} & {@M+} & {(TH+ or (R+ & Bs+)) & {[[@M+]]}} & {@MXs+} &
(<noun-main2-m> or
(Ss*t+ & <CLAUSE>) or
SIs*t- or
<rel-clause-s>)) or
({Dmu-} & <noun-and-u>) or
Up- or
(YS+ & {Dmu-}) or
(GN+ & (DD- or [()])))) or
AN+;
ideas.n opinions.n statements.n beliefs.n facts.n arguments.n
principles.n theories.n philosophies.n signs.n impressions.n
conclusions.n contentions.n complaints.n proofs.n doubts.n
suspicions.n allegations.n reports.n claims.n announcements.n
positions.n risks.n hopes.n explanations.n restrictions.n threats.n
thoughts.n myths.n feelings.n discoveries.n rumors.n comments.n
realizations.n probabilities.n remarks.n notions.n convictions.n
hunches.n assumptions.n concepts.n hypotheses.n assertions.n
expectations.n implications.n perceptions.n proclamations.n
reassurances.n fears.n sayings.n senses.n messages.n disclosures.n
accusations.n views.n concerns.n understandings.n acknowledgments.n
acknowledgements.n possibilities.n premonitions.n prerequisites.n
prereqs.n pre-reqs.n pre-requisites.n
corequisites.n co-requisites.n coreqs.n co-reqs.n
provisos.n truisms.n fallacies.n assurances.n speculations.n
propositions.n prospects.n presuppositions.n inklings.n suppositions.n
findings.n amounts.n rules.n dangers.n warnings.n indications.n
answers.n suggestions.n:
(<noun-modifiers> &
(({{Jd-} & Dmc-} & {@M+} & {(TH+ or (R+ & Bp+)) & {[[@M+]]}} & {@MXp+} &