Commit
Make JFlex-based tokenizers share more and be more consistent.
- Everything uses AbstractTokenizer.NEW_LINE
- French and Spanish add PTBLexer enum for dashes option/treatments, and delete ptb3Dashes options
- ellipsis and dashes style "ptb3" renamed to "ascii"
- extract out and unify more token regex specifications in LexCommon.tokens (e.g., PHONE, EMOJI)
- add FILENAME rule to Spanish lexer
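The enum-based dash handling and the "ptb3" to "ascii" rename described above can be pictured with a small sketch. Everything in it is hypothetical: the class name `DashStyleSketch`, the enum constants, and the choice to keep accepting the legacy "ptb3" spelling as an alias are assumptions made for illustration, not the actual PTBLexer code from this commit.

```java
import java.util.Locale;

/** Hypothetical sketch; the names here are illustrative, not the CoreNLP API. */
public class DashStyleSketch {

  /** Illustrative dash treatments; the real enum's constants may differ. */
  enum DashStyle { UNICODE, ASCII, ORIGINAL }

  /**
   * Parse an option value case-insensitively.
   * Assumption: the legacy name "ptb3" is still accepted as an alias for "ascii".
   */
  static DashStyle parse(String value) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    if (v.equals("ptb3")) {
      v = "ascii";
    }
    return DashStyle.valueOf(v.toUpperCase(Locale.ROOT));
  }

  /** Apply the chosen treatment to a single dash token. */
  static String normalize(String token, DashStyle style) {
    switch (style) {
      case ASCII:
        // Map en and em dashes to a double ASCII hyphen.
        return token.replace("\u2013", "--").replace("\u2014", "--");
      case UNICODE:
        // Map a double ASCII hyphen to an em dash.
        return token.replace("--", "\u2014");
      default:
        // ORIGINAL: leave the token untouched.
        return token;
    }
  }
}
```

Replacing a boolean flag such as ptb3Dashes with one enum-valued option lets several lexers share the same parsing and switch logic, which is in the spirit of the consistency goal stated in the commit title.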
Showing 12 changed files with 92,240 additions and 116,309 deletions.
src/edu/stanford/nlp/international/french/process/FrenchLexer.flex (568 changes: 275 additions & 293 deletions)
src/edu/stanford/nlp/international/french/process/FrenchLexer.java (22,780 changes: 12,250 additions & 10,530 deletions)
src/edu/stanford/nlp/international/spanish/process/SpanishLexer.flex (564 changes: 267 additions & 297 deletions)
src/edu/stanford/nlp/international/spanish/process/SpanishLexer.java (21,303 changes: 13,829 additions & 7,474 deletions)