PunktTokenizer does not use the correct version of the pickled model on Python 3.x #2250

Closed
BLKSerene opened this issue Mar 10, 2019 · 0 comments

BLKSerene commented Mar 10, 2019

Hi, I'm trying to package my program together with NLTK and nltk_data using PyInstaller. To minimize the size of the bundled data, I removed the zip file and all the Python 2.x models in nltk_data/tokenizers/punkt, so only the PY3 folder is left.

But it seems that PunktTokenizer always uses the Python 2.x version of the pickled model, regardless of the Python version I'm using. The error message says that it can't find tokenizers/punkt/english.pickle, rather than tokenizers/punkt/PY3/english.pickle.

Conversely, removing the PY3 folder causes no error, so it seems that the Python 3 version of the pickled model is never used.
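
To make the expected behaviour concrete, here is a minimal sketch of the path selection I assumed would happen. The helper `expected_punkt_resource` is hypothetical and written only for illustration; it is not NLTK's code and does not call NLTK at all.

```python
import posixpath
import sys


def expected_punkt_resource(language="english", py_major=sys.version_info[0]):
    """Hypothetical helper: build the resource name I expected to be loaded.

    Assumption: on Python 3 the model under the PY3 subfolder should be
    preferred, and on Python 2 the model directly under tokenizers/punkt.
    """
    parts = ["tokenizers", "punkt"]
    if py_major >= 3:
        # The Python 3 pickles live in a PY3 subfolder of nltk_data.
        parts.append("PY3")
    parts.append(language + ".pickle")
    # Resource names use forward slashes on every platform, including Windows.
    return posixpath.join(*parts)


if __name__ == "__main__":
    print(expected_punkt_resource())
```

On Python 3 this yields the PY3 path, which is the file I kept, while the error message refers to the path this sketch would produce only for Python 2.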

OS: Windows 10 64-bit
Python version: 3.7.2 64-bit
NLTK version: 3.4
