

bogus check category mapping for fileids in class CategorizedCorpusReader's _init? #250

caio1982 opened this Issue · 5 comments



python --version
Python 2.6.1

print nltk.version

Hi there. Please take a look at the current code of CategorizedCorpusReader's _init, shown below with an extra if block I added for debugging. That extra check, if file_id in self.fileids(), evaluates to true: the files are there and I can load them just fine. I added it to confirm that the original check, if file_id not in self.fileids(), was evaluating incorrectly. I don't know why I keep hitting this ValueError, but it does not look correct.

        elif self._file is not None:
            for line in
                line = line.strip()
                file_id, categories = line.split(self._delimiter, 1)
                if file_id in self.fileids():
                    print 'In mapping %s found %s' % (self._file, file_id)
                if file_id not in self.fileids():
                    raise ValueError('In category mapping file %s: %s '
                                     'not found' % (self._file, file_id))
                for category in categories.split(self._delimiter):
                    self._add(file_id, category)

I have to comment out the whole original if block to get it working (and it really does work after that):

                #if file_id not in self.fileids():
                #    raise ValueError('In category mapping file %s: %s '
                #                     'not found' % (self._file, file_id))

How to reproduce? Honestly, I don't know. I only wonder why the file_id in self.fileids() test behaves inconsistently: if the files are found (as my extra if block above shows), why does the original if block keep raising the error? If commenting it out were a mistake, loading would stop working, but it works fine.
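One way to narrow this down without touching NLTK itself is to compare the ids in the mapping file against the list returned by fileids() outside the reader. The following is only a minimal sketch: find_unmatched_ids is a hypothetical helper, not part of NLTK, and it assumes the mapping lines and the known file ids are passed in as plain lists of strings. It reports each id that fails the membership test, along with any known id that differs only by case or surrounding whitespace, which is one common reason two ids that look identical fail to compare equal.

```python
def find_unmatched_ids(mapping_lines, known_fileids, delimiter=" "):
    """Return (file_id, near_matches) pairs for ids missing from known_fileids.

    Hypothetical debugging helper, not part of NLTK.
    """
    known = list(known_fileids)
    unmatched = []
    for line in mapping_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the mapping file
        # partition() avoids an unpacking error when a line has no delimiter
        file_id, _sep, _categories = line.partition(delimiter)
        if file_id in known:
            continue
        # Known ids that differ only by case or surrounding whitespace
        near = [k for k in known
                if k.strip().lower() == file_id.strip().lower()]
        unmatched.append((file_id, near))
    return unmatched


# Made-up sample data, for illustration only
sample_lines = ["a.xml news\n", "B.xml sports\n", "c.xml news\n"]
for file_id, near in find_unmatched_ids(sample_lines, ["a.xml", "b.xml"]):
    print("%r not found; near matches: %r" % (file_id, near))
```

Printing with %r shows the exact string, so stray whitespace or a different string type becomes visible. With a real reader, the second argument would be the result of its fileids() call.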

Am I missing something here?


Btw, I just checked the code in Git and it's the same, don't mind my NLTK version above.

Natural Language Toolkit member

Please submit a small code sample that makes it possible to reproduce the error, thanks.


It seems it may be due to an inherited class I'm using to load multiple categorized XML files, though the code seems vanilla and similar to the one suggested in the NLTK cookbook. You can see it here:

Sorry if I was too quick to raise it as a bug; I'm still checking whether it is one and didn't want to forget about it.

Natural Language Toolkit member

@caio1982 did you have a chance to check if it is a bug or not?


Not really. Since I had to keep using the custom inherited class I mentioned, I more or less gave up trying to work out whether the problem was in NLTK or in the class code. I delivered my paper with this "workaround" in it some months ago and I won't go back to it until next year, so feel free to close the bug if you think nobody has hit this before. Thanks for checking, by the way.
