nltk fixes #441
Merged
14 commits
606bbc8 fixed broken tadm trainer gzip encoding, and opens gzip unicode files…
d5b9c4f fixed problems in data.py so data.doctest passes; the nltk protocol n…
d8d7ea1 python 2.7 pickle compatibility
da6d489 the TODO is not needed anymore: load() supports loading zip files
7af7204 fixed bug in normalize_resource_name when dealing with empty strings
41e1dae fixed test_corpus_views.py, obsolete code was used in the test
69b0cfb fixed toolbox.doctest xml api deprecation warning
a25c733 normalize protocols properly in normalize_resource_url
fae1b81 FileSystemPathPointer shouldn't be a subclass of str, as the backward…
9897de4 fixed data.doctest now that the nltk protocol is used by default
152478c updated chat80 so it handles sqlite3 errors correctly
055425a changed nltk.data.paths back to nltk.data.path as it's documented and…
bc49847 resolve default paths for nltk.data.find at runtime in-case the user …
5c1ca98 dicts in doctests should use pprint to guarantee sortedness
Could you please elaborate more on this? What is it for?
I re-pickled the data files with Python 3 using code like the following:
If there's a collections.defaultdict in the pickle dump, Python 3.3 pickles it as UserString.defaultdict instead of collections.defaultdict. I'm not sure why this happens, but Python 2.7 and 2.6 have no defaultdict class in UserString, so I've added one there as a compatibility fix.
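One way to check which module a protocol-2 pickle will ask Python 2 to import is to walk its opcodes with the standard pickletools module. The sketch below assumes a current Python 3 interpreter and uses a hypothetical helper, global_refs. With fix_imports=False nothing is renamed, so the dump references the real collections.defaultdict; per the report above, a Python 3.3 dump made with the default fix_imports=True is where UserString.defaultdict appears instead.

```python
import pickle
import pickletools
from collections import defaultdict


def global_refs(data):
    """Return the (module, name) pairs a pickle imports via GLOBAL opcodes.

    Protocols 0-3 encode a class reference as GLOBAL, and pickletools
    reports its argument as a single "module name" string.
    """
    refs = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            refs.append((module, name))
    return refs


counts = defaultdict(int)
counts["the"] += 2

# Protocol 2 is the newest protocol Python 2 can read; fix_imports=False
# records the Python 3 module names exactly as they are.
data = pickle.dumps(counts, protocol=2, fix_imports=False)
print(global_refs(data))
# -> [('collections', 'defaultdict'), ('builtins', 'int')]
```

The first reference is the defaultdict class itself and the second is its default_factory, which defaultdict's reduce method passes as a constructor argument.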
Hmm, that's quite strange, because Python 3.3 doesn't have a UserString module.
Yes indeed. Try the following:
Then dump the pickle file:
Cool. I believe it is a Python bug.
The cause of error is here: http://hg.python.org/cpython/file/7272ef213b7c/Lib/_compat_pickle.py#l80
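The failure mode described here is what happens when a many-to-one mapping is inverted. The sketch below uses a hypothetical excerpt shaped like the Python 3.3 entries under discussion (several Python 2 modules folded into collections), not the real contents of _compat_pickle, and a hypothetical helper name, ambiguous_targets.

```python
from collections import Counter

# Hypothetical excerpt shaped like the Python 3.3 entries discussed here:
# three Python 2 modules were folded into one Python 3 module.
IMPORT_MAPPING = {
    "__builtin__": "builtins",
    "UserDict": "collections",
    "UserList": "collections",
    "UserString": "collections",
}


def ambiguous_targets(mapping):
    """Map each value reached by more than one key to the sorted list of those keys."""
    hits = Counter(mapping.values())
    return {
        target: sorted(key for key, value in mapping.items() if value == target)
        for target, count in hits.items()
        if count > 1
    }


# Naive inversion: when several keys share a value, whichever key is
# iterated last silently wins.
REVERSE_IMPORT_MAPPING = {value: key for key, value in IMPORT_MAPPING.items()}

print(ambiguous_targets(IMPORT_MAPPING))
# -> {'collections': ['UserDict', 'UserList', 'UserString']}
print(REVERSE_IMPORT_MAPPING["collections"])
# -> UserString  (on Python 3.7+, where dicts keep insertion order)
```

On Python 3.3, where string hashing is randomized by default and dicts do not preserve insertion order, which of the three names wins could differ from run to run.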
We could try fix_imports=False; it will work as long as we don't use moved stdlib functions directly. It should work if, e.g., we import urlencode from compat.py.
Another way is to create workarounds in compat.py (as you've done in your pull request). I believe we should also add defaultdict to UserList in this case, because REVERSE_IMPORT_MAPPING relies on dict order, which can vary between runs. But these workarounds are weird. Maybe it is better to patch REVERSE_IMPORT_MAPPING directly, because that makes the workaround clearer.
But I hope fix_imports=False will work.
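What fix_imports actually changes can be seen by pickling the same object both ways. Per the pickle documentation, when fix_imports is true and the protocol is below 3, Python 3 module names are rewritten to their Python 2 spellings. A minimal sketch, assuming a current Python 3 interpreter; global_opcode is a hypothetical helper that builds the raw GLOBAL bytes.

```python
import pickle


def global_opcode(module, name):
    """Bytes that protocols 0-3 write for a GLOBAL reference to module.name."""
    return b"c" + module.encode("ascii") + b"\n" + name.encode("ascii") + b"\n"


tags = {"NN", "VB"}

with_fix = pickle.dumps(tags, protocol=2, fix_imports=True)
without_fix = pickle.dumps(tags, protocol=2, fix_imports=False)

# fix_imports=True rewrites builtins.set to the Python 2 name __builtin__.set.
print(global_opcode("__builtin__", "set") in with_fix)   # True
# fix_imports=False keeps the Python 3 name, builtins.set.
print(global_opcode("builtins", "set") in without_fix)   # True
```

The trade-off is that a plain Python 2 interpreter has no builtins module, so a fix_imports=False dump loads there only if none of the names it references were moved or renamed between Python 2 and 3.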
Reported here: http://bugs.python.org/issue18473
Yup, you're right; good job finding the bug.
I was wrong that fix_imports=False will work as long as we avoid using moved stdlib functions directly (e.g. by importing urlencode from compat.py); please disregard that.
But fix_imports=False could still work, and I think we should try it first.
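A further option, not proposed in this thread, is to repair a bad class reference on the loading side instead of adding defaultdict to UserString. The sketch below is Python 3 only and uses the documented Unpickler.find_class override hook; the RENAMES table and load_remapped helper are hypothetical, and the "bad" pickle is simulated by rewriting the GLOBAL opcode rather than produced by an affected Python 3.3 build. Python 2's loaders would need their own equivalent hook, which is not shown here.

```python
import io
import pickle
from collections import defaultdict

# Hypothetical table of bad (module, name) references to rewrite on load.
RENAMES = {("UserString", "defaultdict"): ("collections", "defaultdict")}


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites known-bad class references before resolving them."""

    def find_class(self, module, name):
        module, name = RENAMES.get((module, name), (module, name))
        return super().find_class(module, name)


def load_remapped(data):
    """Load pickle bytes, applying RENAMES to every class lookup."""
    return RemappingUnpickler(io.BytesIO(data)).load()


original = defaultdict(list, {"nn": ["dog"]})
good = pickle.dumps(original, protocol=2, fix_imports=False)

# Simulate the reported Python 3.3 output by renaming the module in the
# GLOBAL opcode (protocol 2 has no length prefixes, so this stays valid).
bad = good.replace(b"ccollections\ndefaultdict\n", b"cUserString\ndefaultdict\n")

restored = load_remapped(bad)
print(restored == original, restored.default_factory is list)  # True True
```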