reworked tokenizer howto, as docstrings in tokenizer package
stevenbird committed Nov 6, 2011
1 parent 637d190 commit 37aced7
Showing 12 changed files with 407 additions and 361 deletions.
121 changes: 121 additions & 0 deletions nltk/test/tag.errs
@@ -0,0 +1,121 @@

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 167, in tag.doctest
Failed example:
print 'Accuracy: %4.1f%%' % (
100.0 * unigram_tagger.evaluate(brown_test))
Expected:
Accuracy: 85.4%
Got:
Accuracy: 85.8%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 178, in tag.doctest
Failed example:
print 'Accuracy: %4.1f%%' % (
100.0 * unigram_tagger_2.evaluate(brown_test))
Expected:
Accuracy: 88.0%
Got:
Accuracy: 88.4%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 205, in tag.doctest
Failed example:
print bigram_tagger.size()
Expected:
3394
Got:
3386

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 207, in tag.doctest
Failed example:
print 'Accuracy: %4.1f%%' % (
100.0 * bigram_tagger.evaluate(brown_test))
Expected:
Accuracy: 89.4%
Got:
Accuracy: 89.6%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 222, in tag.doctest
Failed example:
print trigram_tagger.size()
Expected:
1493
Got:
1502

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 224, in tag.doctest
Failed example:
print 'Accuracy: %4.1f%%' % (
100.0 * trigram_tagger.evaluate(brown_test))
Expected:
Accuracy: 88.8%
Got:
Accuracy: 89.0%

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 251, in tag.doctest
Failed example:
brill_tagger = trainer.train(brown_train, max_rules=10) # doctest: +NORMALIZE_WHITESPACE
Expected:
Training Brill tagger on 4523 sentences...
Finding initial useful rules...
Found 75359 useful rules.
<BLANKLINE>
           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
 354 354   0   3  | TO -> IN if the tag of the following word is 'AT'
 111 173  62   3  | NN -> VB if the tag of the preceding word is 'TO'
 110 110   0   4  | TO -> IN if the tag of the following word is 'NP'
  83 157  74   4  | NP -> NP-TL if the tag of the following word is
                  |   'NN-TL'
  73  77   4   0  | VBD -> VBN if the tag of words i-2...i-1 is 'BEDZ'
  71 116  45   3  | TO -> IN if the tag of words i+1...i+2 is 'NNS'
  65  65   0   3  | NN -> VB if the tag of the preceding word is 'MD'
  63  63   0   0  | VBD -> VBN if the tag of words i-3...i-1 is 'HVZ'
  59  62   3   2  | CS -> QL if the text of words i+1...i+3 is 'as'
  55  57   2   0  | VBD -> VBN if the tag of words i-3...i-1 is 'HVD'
Got:
Training Brill tagger on 4523 sentences...
Finding initial useful rules...
Found 75299 useful rules.
<BLANKLINE>
           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
 354 354   0   3  | TO -> IN if the tag of the following word is 'AT'
 110 110   0   3  | TO -> IN if the tag of the following word is 'NP'
  91 127  36   6  | VB -> NN if the tag of words i-2...i-1 is 'AT'
  82 143  61   3  | NN -> VB if the tag of the preceding word is 'TO'
  71 116  45   2  | TO -> IN if the tag of words i+1...i+2 is 'NNS'
  66  69   3   0  | VBN -> VBD if the tag of the preceding word is
                  |   'NP'
  64 131  67   6  | NP -> NP-TL if the tag of the following word is
                  |   'NN-TL'
  59  62   3   2  | CS -> QL if the text of words i+1...i+3 is 'as'
  55  55   0   1  | NN -> VB if the tag of the preceding word is 'MD'
  55  59   4   0  | VBD -> VBN if the tag of words i-2...i-1 is 'BEDZ'

***************************************************************************
File "/Users/sb/git/nltk/nltk/test/tag.doctest", line 274, in tag.doctest
Failed example:
print 'Accuracy: %4.1f%%' % (
100.0 * brill_tagger.evaluate(brown_test))
Expected:
Accuracy: 89.1%
Got:
Accuracy: 89.5%
.
