Commit
reworked tokenizer howto, as docstrings in tokenizer package
1 parent 637d190
commit 37aced7
Showing 12 changed files with 407 additions and 361 deletions.
@@ -0,0 +1,121 @@

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 167, in tag.doctest
Failed example:
    print 'Accuracy: %4.1f%%' % (
        100.0 * unigram_tagger.evaluate(brown_test))
Expected:
    Accuracy: 85.4%
Got:
    Accuracy: 85.8%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 178, in tag.doctest
Failed example:
    print 'Accuracy: %4.1f%%' % (
        100.0 * unigram_tagger_2.evaluate(brown_test))
Expected:
    Accuracy: 88.0%
Got:
    Accuracy: 88.4%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 205, in tag.doctest
Failed example:
    print bigram_tagger.size()
Expected:
    3394
Got:
    3386

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 207, in tag.doctest
Failed example:
    print 'Accuracy: %4.1f%%' % (
        100.0 * bigram_tagger.evaluate(brown_test))
Expected:
    Accuracy: 89.4%
Got:
    Accuracy: 89.6%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 222, in tag.doctest
Failed example:
    print trigram_tagger.size()
Expected:
    1493
Got:
    1502

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 224, in tag.doctest
Failed example:
    print 'Accuracy: %4.1f%%' % (
        100.0 * trigram_tagger.evaluate(brown_test))
Expected:
    Accuracy: 88.8%
Got:
    Accuracy: 89.0%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 251, in tag.doctest
Failed example:
    brill_tagger = trainer.train(brown_train, max_rules=10) # doctest: +NORMALIZE_WHITESPACE
Expected:
    Training Brill tagger on 4523 sentences...
    Finding initial useful rules...
    Found 75359 useful rules.
    <BLANKLINE>
               B      |
       S   F   r   O  |        Score = Fixed - Broken
       c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
       o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
       r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
       e   d   n   r  |  e
    ------------------+-------------------------------------------------------
     354 354   0   3  | TO -> IN if the tag of the following word is 'AT'
     111 173  62   3  | NN -> VB if the tag of the preceding word is 'TO'
     110 110   0   4  | TO -> IN if the tag of the following word is 'NP'
      83 157  74   4  | NP -> NP-TL if the tag of the following word is
                      |   'NN-TL'
      73  77   4   0  | VBD -> VBN if the tag of words i-2...i-1 is 'BEDZ'
      71 116  45   3  | TO -> IN if the tag of words i+1...i+2 is 'NNS'
      65  65   0   3  | NN -> VB if the tag of the preceding word is 'MD'
      63  63   0   0  | VBD -> VBN if the tag of words i-3...i-1 is 'HVZ'
      59  62   3   2  | CS -> QL if the text of words i+1...i+3 is 'as'
      55  57   2   0  | VBD -> VBN if the tag of words i-3...i-1 is 'HVD'
Got:
    Training Brill tagger on 4523 sentences...
    Finding initial useful rules...
    Found 75299 useful rules.
    <BLANKLINE>
               B      |
       S   F   r   O  |        Score = Fixed - Broken
       c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
       o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
       r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
       e   d   n   r  |  e
    ------------------+-------------------------------------------------------
     354 354   0   3  | TO -> IN if the tag of the following word is 'AT'
     110 110   0   3  | TO -> IN if the tag of the following word is 'NP'
      91 127  36   6  | VB -> NN if the tag of words i-2...i-1 is 'AT'
      82 143  61   3  | NN -> VB if the tag of the preceding word is 'TO'
      71 116  45   2  | TO -> IN if the tag of words i+1...i+2 is 'NNS'
      66  69   3   0  | VBN -> VBD if the tag of the preceding word is
                      |   'NP'
      64 131  67   6  | NP -> NP-TL if the tag of the following word is
                      |   'NN-TL'
      59  62   3   2  | CS -> QL if the text of words i+1...i+3 is 'as'
      55  55   0   1  | NN -> VB if the tag of the preceding word is 'MD'
      55  59   4   0  | VBD -> VBN if the tag of words i-2...i-1 is 'BEDZ'

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 274, in tag.doctest
Failed example:
    print 'Accuracy: %4.1f%%' % (
        100.0 * brill_tagger.evaluate(brown_test))
Expected:
    Accuracy: 89.1%
Got:
    Accuracy: 89.5%
.
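In the Brill trainer output, the header defines Score = Fixed - Broken, and the rules are listed in order of decreasing score. The sketch below parses the rule rows of the "Got" table from the log (whitespace normalized, wrapped continuation lines dropped) and checks both properties. `parse_rows` and `check_rows` are hypothetical helpers, and the row layout (four integers, a `|`, then the rule text) is inferred from this log, not taken from NLTK's source.

```python
import re

# Score / Fixed / Broken / Other rows from the "Got" Brill table in the log.
# Wrapped continuation lines (e.g. the one holding 'NP') are omitted.
GOT_ROWS = """\
354 354 0 3 | TO -> IN if the tag of the following word is 'AT'
110 110 0 3 | TO -> IN if the tag of the following word is 'NP'
91 127 36 6 | VB -> NN if the tag of words i-2...i-1 is 'AT'
82 143 61 3 | NN -> VB if the tag of the preceding word is 'TO'
71 116 45 2 | TO -> IN if the tag of words i+1...i+2 is 'NNS'
66 69 3 0 | VBN -> VBD if the tag of the preceding word is
64 131 67 6 | NP -> NP-TL if the tag of the following word is
59 62 3 2 | CS -> QL if the text of words i+1...i+3 is 'as'
55 55 0 1 | NN -> VB if the tag of the preceding word is 'MD'
55 59 4 0 | VBD -> VBN if the tag of words i-2...i-1 is 'BEDZ'
"""

# Four integers, a '|', then the rule text (layout inferred from the log).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\|\s*(.*)$")


def parse_rows(text):
    """Return (score, fixed, broken, other, rule) tuples for each rule row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # lines without four leading integers are skipped
            score, fixed, broken, other = map(int, m.group(1, 2, 3, 4))
            rows.append((score, fixed, broken, other, m.group(5)))
    return rows


def check_rows(rows):
    """Raise if any Score != Fixed - Broken; return True if scores never increase."""
    for score, fixed, broken, _other, rule in rows:
        if score != fixed - broken:
            raise ValueError(f"bad score for {rule!r}: {score} != {fixed} - {broken}")
    scores = [row[0] for row in rows]
    return all(a >= b for a, b in zip(scores, scores[1:]))


rows = parse_rows(GOT_ROWS)
print(len(rows), check_rows(rows))  # 10 True
```

The same check also passes on the "Expected" rows: the two runs differ in which rules were learned and in their counts, not in how each score is computed.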
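Each entry in the log is Python's doctest reporting that printed output did not match the expected text exactly; an accuracy that moved from 85.4% to 85.8% is enough to fail. The sketch below reproduces that Failed example / Expected / Got report with the standard-library `doctest` module. It assumes a hard-coded `ACCURACY` (a hypothetical stand-in for `unigram_tagger.evaluate(brown_test)`, set to the 85.8% from the first failure) instead of a trained NLTK tagger, and uses Python 3 `print()` rather than the log's Python 2 print statements. The Brill example in the log already uses the `+NORMALIZE_WHITESPACE` directive; the sketch shows `+ELLIPSIS` as one possible way to tolerate small drift, as an illustration only.

```python
import doctest

# Hypothetical stand-in for tagger.evaluate(...): the 85.8% reported under
# "Got:" for the first unigram tagger in the log.
ACCURACY = 0.858

# Exact-match example: expects the older 85.4%, so it fails.
EXACT = """
>>> print('Accuracy: %4.1f%%' % (100.0 * accuracy))
Accuracy: 85.4%
"""

# Same example with ELLIPSIS: '...' matches any text, so only the
# leading '8' and the trailing '%' have to agree.
LOOSE = """
>>> print('Accuracy: %4.1f%%' % (100.0 * accuracy))  # doctest: +ELLIPSIS
Accuracy: 8...%
"""


def run(source):
    """Parse and run one doctest string; return the number of failed examples."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(source, {"accuracy": ACCURACY}, "accuracy_demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # On a mismatch the runner prints a Failed example / Expected / Got report.
    return runner.run(test).failed


exact_failures = run(EXACT)  # prints a report for 85.8% vs 85.4%; returns 1
loose_failures = run(LOOSE)  # returns 0
```

Loosening the pattern trades precision for stability: `8...%` would also accept any other value starting with 8, so a real regression could slip through.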