Originally reported by starkman (sourceforge.net user: starkmanuk) on
2008-03-09
Hello all, I am having a bit of trouble whilst trying to use the Brill
tagger. Below is my code:
from nltk.corpus import treebank
from nltk import tag
from nltk.tag import brill
from nltk.corpus import reader
from nltk.corpus.reader import TaggedCorpusReader
root = 'C:\lob'
reader = TaggedCorpusReader(root, 'a.txt', sep='/')
tagged_data = reader.tagged_sents()
nn_cd_tagger = tag.RegexpTagger([(r'^-?[0-9]+(.[0-9]+)?$', 'CD'),
(r'.*', 'NN')])
# train is the proportion of data used in training; the rest is reserved
# for testing.
print "Loading tagged data... "
cutoff = int(num_sents*train)
training_data = tagged_data[:cutoff]
gold_data = tagged_data[cutoff:num_sents]
testing_data = [[t[0] for t in sent] for sent in gold_data]
print "Done loading."
# Start Unigram tagger
print "Training unigram tagger:"
unigram_tagger = tag.UnigramTagger(training_data,
backoff=nn_cd_tagger)
if gold_data:
print " [accuracy: %f]" % tag.accuracy(unigram_tagger, gold_data)
# Start Bigram tagger
print "Training bigram tagger:"
bigram_tagger = tag.BigramTagger(training_data,
backoff=unigram_tagger)
if gold_data:
print " [accuracy: %f]" % tag.accuracy(bigram_tagger, gold_data)
# Brill tagger
templates = [
brill.SymmetricProximateTokensTemplate(brill.ProximateTagsRule, (1,1)),
brill.SymmetricProximateTokensTemplate(brill.ProximateTagsRule, (2,2)),
brill.SymmetricProximateTokensTemplate(brill.ProximateTagsRule, (1,2)),
brill.SymmetricProximateTokensTemplate(brill.ProximateTagsRule, (1,3)),
brill.SymmetricProximateTokensTemplate(brill.ProximateWordsRule, (1,1)),
brill.SymmetricProximateTokensTemplate(brill.ProximateWordsRule, (2,2)),
brill.SymmetricProximateTokensTemplate(brill.ProximateWordsRule, (1,2)),
brill.SymmetricProximateTokensTemplate(brill.ProximateWordsRule, (1,3)),
brill.ProximateTokensTemplate(brill.ProximateTagsRule, (-1, -1), (1,1)),
brill.ProximateTokensTemplate(brill.ProximateWordsRule, (-1, -1), (1,1)),
]
trainer = brill.FastBrillTaggerTrainer(bigram_tagger, templates, trace)
trainer = brill.BrillTaggerTrainer(u, templates, trace)
brill_tagger = trainer.train(training_data, max_rules, min_score)
It is of course a modification of the example Brill tagger from the API. I
receive an error when it comes to computing the last line, brill_tagger =
trainer.train(training_data, max_rules, min_score). This is the error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
brilltagger()
File "<pyshell#0>", line 80, in brilltagger
brill_tagger = trainer.train(training_data, max_rules, min_score)
File "C:\Python25\Lib\site-packages\nltk\tag\brill.py", line 869, in train
rule = self._best_rule(train_sents, test_sents, min_score)
File "C:\Python25\Lib\site-packages\nltk\tag\brill.py", line 1008, in
_best_rule
max_score = max(self._rules_by_score)
ValueError: max() arg is an empty sequence
I think I can decipher the error (i.e. the argument to max() is empty), but I
am unsure why this is occurring. I am using sections from the LOB corpus; that
part of the corpus has been modified so that NLTK can read each word and its
associated tag. This seems to work, and I can print out both the word and
its tag correctly, as with the other predefined corpora that are bundled with
NLTK. Is it possible that the text I am passing to the Brill tagger simply
has no rules?
Kind Regards,
David
Migrated from http://code.google.com/p/nltk/issues/detail?id=67
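For reference, the ValueError in the traceback can be reproduced without NLTK: Python's built-in max() raises it whenever it is given an empty container. The snippet below is only a minimal sketch of that behaviour and of one possible guard. The helper best_score and its rules_by_score argument are hypothetical stand-ins for the trainer's internal score-to-rules mapping; they are not NLTK's actual code or the fix that was committed.

```python
def best_score(rules_by_score, min_score):
    """Return the highest score, or None when no candidate rule qualifies.

    rules_by_score is a hypothetical mapping from score to candidate
    rules, assumed here for illustration only.
    """
    if not rules_by_score:
        # Calling max() on an empty container raises ValueError,
        # so bail out before reaching it.
        return None
    top = max(rules_by_score)  # iterating a dict yields its keys (the scores)
    return top if top >= min_score else None


# Reproduce the failure from the traceback in isolation.
try:
    max({})
except ValueError as err:
    message = str(err)

no_rules = best_score({}, 2)                                 # nothing to choose from
some_rules = best_score({3: ["rule A"], 1: ["rule B"]}, 2)   # best score is 3
```

If no template ever produces a candidate rule for the training sentences, a mapping like this would stay empty, which is consistent with the question above about the text having "no rules".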
earlier comments
paulbone.au said, at 2008-11-06T07:21:14.000Z:
David/starkman,
If you have a copy of the LOB corpus, particularly the a.txt file, could you provide
it so I can test this against it?
Thanks.
paulbone.au said, at 2008-11-06T07:33:20.000Z:
I've committed a fix for this but am unable to test it until I have a failing test case. I'll leave the bug open.
StevenBird1 said, at 2009-01-08T23:35:27.000Z:
Wrote to starkmanuk to request data.