how to download corpus panlex_lite package in nltk in python #1253
Comments
Try within python:
>>> import nltk
>>> nltk.download('panlex_lite')
Or on command line:
$ python -m nltk.downloader panlex_lite
Note: It might take some time to download the data. |
Note that you need to install the development version of NLTK in order to do this. |
use this url [http://dev.panlex.org/db/panlex_lite.zip] to download it manually. |
Wait for NLTK v3.2 and please see extensive discussion on #1283 |
Hi, once panlex_lite is downloaded manually, where should I put it within nltk_data? |
Please see http://www.nltk.org/data.html |
Hi, |
@deepp I uploaded this zip file to Baidu Cloud. The link and password follow.
@XiaoZYang Thanks for the response. I downloaded the file manually from the link in your previous response. Thanks a ton |
@deepp My pleasure. Glad to help. |
You can download the panlex_lite.zip from https://dev.panlex.org/db/, and put it in "/nltk_data/corpora/" |
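For anyone scripting the manual step above, here is a minimal sketch of copying the downloaded zip into the corpora folder. The helper name `install_corpus_zip` is made up for illustration, and it assumes your data directory is `~/nltk_data` unless you pass another one; check http://www.nltk.org/data.html for where NLTK looks on your system.

```python
import shutil
from pathlib import Path


def install_corpus_zip(zip_path, nltk_data_dir=None):
    """Copy a manually downloaded corpus zip into <nltk_data>/corpora/.

    Hypothetical helper, not part of NLTK.
    """
    # Assumption: the data directory is ~/nltk_data unless one is passed in.
    base = Path(nltk_data_dir) if nltk_data_dir else Path.home() / "nltk_data"
    corpora_dir = base / "corpora"
    corpora_dir.mkdir(parents=True, exist_ok=True)

    # Keep the original file name, e.g. panlex_lite.zip.
    target = corpora_dir / Path(zip_path).name
    shutil.copy2(zip_path, target)
    return target
```

Example call: `install_corpus_zip("Downloads/panlex_lite.zip")`. Note that placing the file there does not by itself guarantee that nltk.download('all') will skip it, as later comments in this thread report.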
While downloading panlex with nltk downloader, my whole system just froze - even the caps lock indicator light on my keyboard wasn't working anymore. I've restarted my computer, tried again and the same thing happened. I'll just download the zip-file manually. |
What should I do after downloading the panlex_lite zip so that nltk.download('all') skips panlex_lite and downloads the remaining packages? I unzipped the archive, but when I try to download the remaining packages it still shows that it is downloading panlex_lite. Help, please. |
@eupherntech same issue. |
I am also facing the same issue. BTW, downloaded panlex_lite data manually. |
@eupherntech @stevealbertwong You could use |
Same issue here, even manually it takes up to 8 hours. Do something about it please! |
Based on the file mentioned above, it looks like it's a 2.2 GB file. So you might just need to hang tight and wait! One thing you can do in the meantime to get some more information is to look at the filesize and last modified time of the panlex_lite.zip file in nltk_data/corpora/ like so:
|
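The command from the comment above did not survive in this thread. As a stand-in, here is a rough Python sketch of the same idea: report the size and last modified time of the partially downloaded file. The function name `describe_download` is hypothetical, and the path in the example assumes the default `~/nltk_data` location.

```python
import time
from pathlib import Path


def describe_download(path):
    """Return (size_in_bytes, last_modified) for a file, or None if it is missing."""
    p = Path(path).expanduser()
    if not p.exists():
        return None
    info = p.stat()
    # Format the modification time in local time for easy comparison.
    modified = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
    return info.st_size, modified
```

Calling `describe_download("~/nltk_data/corpora/panlex_lite.zip")` twice, a minute or so apart, should show the size growing and the timestamp advancing if the download is still making progress.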
I'm having the same issue. I have panlex_lite successfully downloaded (from http://dev.panlex.org/db/panlex_lite.zip) and located in the correct directory, but when nltk.download() is called it tries to download it again. Is there some other file that needs to be updated to show that the corpus is in place? Please note: I would try @cimarie's suggestion, but the problem is that I'm trying to use tox to test a branch before submitting a pull request, and tox calls nltk.download internally, so I don't think I have the ability to include those options. |
I've updated the checksums, so please try again |
@stevenbird Which checksums? Anyway, it does not appear to have worked. nltk.download('all') still tries to download panlex light, even though I have put the file attached to the above link in my ~/nltk_data/corpora folder. Also of note, the downloader tries to download panlex_swadesh every time (although this is a much shorter download than panlex_lite). I noticed panlex_swadesh.zip is in the corpora folder, and attempting to unzip it manually gives Arthurs-MacBook-Pro:corpora aetilley$ unzip panlex_swadesh.zip |
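The unzip output is cut off above, so it is not clear what the error was. If you suspect a truncated or corrupt archive, a quick check with the standard library might look like the sketch below. `check_zip` is a made-up helper name, and the return strings are just one way to report the result.

```python
import zipfile


def check_zip(path):
    """Return 'ok', 'not a zip', or the name of the first corrupt member."""
    # A truncated download usually lacks the end-of-archive record,
    # so is_zipfile() returns False for it (and for missing files).
    if not zipfile.is_zipfile(path):
        return "not a zip"
    with zipfile.ZipFile(path) as zf:
        # testzip() reads every member and returns the first bad name, or None.
        bad = zf.testzip()
    return "ok" if bad is None else bad
```

Example call: `check_zip("panlex_swadesh.zip")` from inside the corpora folder. Anything other than "ok" suggests the file should be deleted and downloaded again.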
@aetilley: the checksums are published on this page (you may need to "view source"). They are from this file: https://dev.panlex.org/db/panlex_lite-20170401.zip Unfortunately I don't have the bandwidth to download it. There are two things you might try. Maybe you have already done the first, in which case the second might be worth a shot.
|
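The two suggestions from the comment above were lost from this thread. Separately, if you want to compare your local file against a published checksum yourself, here is a small sketch. It assumes the published value is an MD5 hex digest, which as far as I know is what the NLTK downloader compares against, but please verify that against the index before relying on it. `md5_of_file` is a hypothetical helper name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so a multi-gigabyte zip does not fill RAM.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest of your panlex_lite.zip differs from the published one, that mismatch could explain why the downloader keeps treating the file as out of date.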
I'm afraid that after running both of these (both successfully), nltk.download('all') still can't see panlex_lite. Again, the main problem here is that it makes it difficult to use tox. So am I the only one having this problem? |
Is Otherwise, the workaround is something like:
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed'  # Trick the index into treating panlex_lite as already installed.
>>> dler.download('all') |
More specifically, that nltk.download('all') correctly skips over all other corpora that I already have, but for some reason tries to get panlex_lite each time. Also that tox calls nltk.download('all'), so it's difficult to test locally before making a pull request. |
Hopefully, nltk/nltk_data#75 would resolve some of the issues. And after that's merged, users should be able to do |
And what will tox call? Again, I'm happy to download a large file once, but the downloader doesn't seem to see that I already have it, so it tries to download it every time. And again, if I'm the only person having this problem, then maybe it's not a problem, but I'm baffled. |
@aetilley: is this still happening? I think it should be fixed now that we've dropped panlex-lite from the NLTK corpus collection. |
Yes, tox appears to be working for me now. Sorry, I didn't catch that you had fixed that. |
I am able to download all the packages except panlex_lite. How do I download it?