Hi there, please take a look at the current code of CategorizedCorpusReader's _init. I've added an extra if block to it: if file_id in self.fileids() evaluates to true, the files are there and I can load them up just fine. I had to put it in there to be sure that the actual if file_id not in self.fileids() check was evaluating wrongly. Dunno why I was getting caught by this ValueError, but this does not look correct.
    elif self._file is not None:
        for line in self.open(self._file).readlines():
            line = line.strip()
            file_id, categories = line.split(self._delimiter, 1)
            if file_id in self.fileids():
                print 'In mapping %s found %s' % (self._file, file_id)
            if file_id not in self.fileids():
                raise ValueError('In category mapping file %s: %s '
                                 'not found' % (self._file, file_id))
            for category in categories.split(self._delimiter):
I have to comment out this whole if block in order to get it working (and it really works after that):
            #if file_id not in self.fileids():
            #    raise ValueError('In category mapping file %s: %s '
            #                     'not found' % (self._file, file_id))
How to reproduce? Honestly, I don't know... I only wonder why this if file_id in self.fileids() check behaves inconsistently. If the files are found (as per my extra if block explained above), why does the original if block keep raising the error? If commenting it out had been the wrong thing to do, things would have stopped working, but they worked OK...
Am I missing something here?
Btw, I just checked the code in Git and it's the same, don't mind my NLTK version above.
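One way to narrow this down without a full reproduction might be to run the same membership check outside the reader and collect every id instead of stopping at the first failure. The extra print only shows that some ids are found, while the ValueError is raised for the first id that is not, so both could be true for the same mapping file (this is a guess, not something confirmed here). Below is a minimal sketch of that idea. The helper name check_mapping, the sample data, the blank-line skipping, and the ' ' default delimiter are assumptions made for illustration, not NLTK code.

```python
def check_mapping(lines, fileids, delimiter=' '):
    """Split mapping lines into (found, missing) file ids.

    Mirrors the check in the snippet above, but collects every id
    instead of raising on the first one that is not found.
    """
    known = set(fileids)
    found, missing = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Assumption: blank lines carry no mapping, so skip them.
            continue
        file_id, _categories = line.split(delimiter, 1)
        if file_id in known:
            found.append(file_id)
        else:
            missing.append(file_id)
    return found, missing


# Hypothetical data, only to show the behaviour.
lines = ['a.xml pos\n', 'b.xml neg\n', 'c.xml pos\n']
fileids = ['a.xml', 'b.xml']
found, missing = check_mapping(lines, fileids)

# repr() makes stray whitespace or encoding differences visible.
print('found: %r' % (found,))
print('missing: %r' % (missing,))
```

With a real reader, the lines would come from the category mapping file and fileids from the reader's fileids() call. If missing comes back non-empty, comparing the repr of those ids against the repr of the known ids should show whether the mismatch is in the mapping file or in the reader.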
Please submit a small code sample that makes it possible to reproduce the error, thanks.
It seems it may be due to an inherited class I'm using to load up multiple categorized XML files, though the code seems vanilla and similar to the one suggested in the NLTK cookbook. You can see it here: http://stackoverflow.com/questions/6849600/does-anyone-have-a-categorized-xml-corpus-reader-for-nltk/10274179#10274179
Sorry if I was too quick to raise it as a bug, but I'm still checking whether it's a bug or not and didn't want to forget/miss it.
@caio1982 did you have a chance to check if it is a bug or not?
Not really. As I had to keep using the custom inherited class I mentioned, I sort of gave up trying to understand whether it was NLTK or the class code. I delivered my paper with this "workaround" in it some months ago and I won't go back to it until next year, so you can close the bug if you think nobody has hit this before. Thanks for checking, by the way.