panlex is offline (404) #1283

Closed
tobias-nawa opened this issue Feb 5, 2016 · 12 comments
Comments

@tobias-nawa

>>> nltk.download('panlex_lite')
[nltk_data] Downloading package panlex_lite to
[nltk_data]     /Users/username/nltk_data...
[nltk_data] Error downloading u'panlex_lite' from
[nltk_data]     <https://raw.githubusercontent.com/nltk/nltk_data/gh-
[nltk_data]     pages/packages/corpora/panlex_lite.zip>:   HTTP Error
[nltk_data]     404: Not Found
False
@derekcoleman

Is there a workaround for this?

@sorpaas

sorpaas commented Feb 10, 2016

+1

@stevenbird
Member

Please install the development version of nltk to get the required functionality (third party corpus download locations)

@erlizhou

Where can I get the development version of nltk?

@alvations
Contributor

On a Linux machine, you can try:

pip install virtualenv
mkdir new_python_environment
cd new_python_environment
virtualenv -p /usr/bin/python2.7 venv
source venv/bin/activate
git clone https://github.com/nltk/nltk.git
cd nltk
python setup.py install

But do note that this ties your environment to the develop branch, which may contain buggy code that is still under development. It might also break other software that depends on NLTK or its dependencies.

IMHO, if you don't really need the panlex corpus, you should wait until the 3rd-party download feature gets pushed into a normal release for users.


If you just need to download almost all the corpora so that you don't have to worry about which models/corpora NLTK needs, try the halt_on_error parameter:

>>> import nltk
>>> nltk.download('all', halt_on_error=False)
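
A related option, shown here only as a minimal sketch, is to request packages one at a time and record which ones fail, so that a single missing package such as panlex_lite does not hide the rest. The helper name download_each and its download parameter are hypothetical and not part of NLTK; the sketch relies only on nltk.download returning True or False, as the log at the top of this issue shows.

```python
def download_each(package_ids, download=None):
    """Try each package separately and return (succeeded, failed) lists.

    `download` is a hypothetical injection point: any callable that takes a
    package id and returns True on success. It defaults to nltk.download.
    """
    if download is None:
        import nltk  # imported lazily so the helper can be exercised without NLTK
        download = nltk.download

    succeeded, failed = [], []
    for package_id in package_ids:
        try:
            ok = download(package_id)
        except Exception:
            # Treat any unexpected exception as a failed download and move on.
            ok = False
        (succeeded if ok else failed).append(package_id)
    return succeeded, failed
```

For example, download_each(['punkt', 'wordnet', 'panlex_lite']) would be expected to list panlex_lite under the failed packages while the 404 persists.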

@stevenbird
Member

Closing (resolved in HEAD)

@henryon

henryon commented Feb 22, 2016

@stevenbird I am still getting the error; see the log below.
Is there any updated documentation or anything else? Thanks
/usr/local/bin/python2.7 -m nltk.downloader -q all -d /usr/local/share/nltk_data
[nltk_data] Error downloading u'panlex_lite' from
[nltk_data] <https://raw.githubusercontent.com/nltk/nltk_data/gh-
[nltk_data] pages/packages/corpora/panlex_lite.zip>: HTTP Error
[nltk_data] 404: Not Found

@henryon

henryon commented Feb 24, 2016

@alvations can you help with this? When I run it on the CLI I still get the error above.

@alvations
Contributor

alvations commented Feb 24, 2016

@henryon there are several options to resolve the problem.

If you don't need to use the panlex_lite corpus, then you can just use the python interpreter and do

import nltk; nltk.download('popular', halt_on_error=False, download_dir='/path/to/where/you/wanna/save/')

(That will save the data under /path/to/where/you/wanna/save/; see https://github.com/nltk/nltk/blob/develop/nltk/downloader.py#L646 for the parameters of the nltk.download function.)

If you need the panlex_lite corpus and you're willing to upgrade your NLTK, try

pip install -U nltk
python -m nltk.downloader panlex_lite

(I'm not sure whether the packaged version on https://pypi.python.org/pypi/nltk has the 3rd-party corpus download function.) Otherwise, try the develop branch (CAUTION: the develop branch may contain breaking code):

pip install -U git+https://github.com/nltk/nltk.git

or

pip install -U https://github.com/nltk/nltk/archive/develop.zip

then

python -m nltk.downloader panlex_lite

If all else fails, try downloading the panlex_lite.zip directly from http://dev.panlex.org/db/panlex_lite.zip then put it in the nltk_data directory manually.

And if the panlex link fails, I've uploaded a copy on my personal dropbox for personal use but I'm not sure whether I can redistribute it. Licensing seems to be contribution dependent in panlex.
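
The manual route mentioned above could be sketched roughly as follows. This is an illustration under stated assumptions, not NLTK's own procedure: the helper name install_corpus_zip is hypothetical, the corpora subdirectory is inferred from the packages/corpora/ path in the failing URL, and whether NLTK wants the archive unpacked as well is assumed rather than confirmed, so the sketch keeps both the zip and its extracted contents. It needs Python 3 because of os.makedirs(..., exist_ok=True).

```python
import os
import shutil
import zipfile


def install_corpus_zip(zip_path, data_dir, category="corpora"):
    """Copy a manually downloaded corpus zip into an nltk_data tree and unpack it.

    Assumption: corpora live under <data_dir>/corpora, mirroring the
    packages/corpora/ path in the download URL.
    """
    target_dir = os.path.join(data_dir, category)
    os.makedirs(target_dir, exist_ok=True)

    dest = os.path.join(target_dir, os.path.basename(zip_path))
    # Skip the copy when the zip is already sitting in the target directory.
    if os.path.abspath(zip_path) != os.path.abspath(dest):
        shutil.copy(zip_path, dest)

    # Unpack next to the zip so both the archive and its contents are present.
    with zipfile.ZipFile(dest) as archive:
        archive.extractall(target_dir)
    return target_dir
```

A call such as install_corpus_zip('panlex_lite.zip', os.path.expanduser('~/nltk_data')) would place the files under ~/nltk_data/corpora; the paths here are placeholders.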

@henryon

henryon commented Feb 24, 2016

@alvations thanks for your reply. I will respond after verifying.

@stevenbird
Member

Thanks @alvations. This functionality is not yet available in any release.
(This issue is a duplicate of #1253)

@Johnny20Ok

For me it just gets stuck at downloading the panlex_lite package. I should mention that nltk is already upgraded. My Python version is 3.5. I don't get any error message, simply nothing.
By the way, I also tried in PyCharm; still the same.
Any ideas?
