Content of app/nltk_data not included in slug #356

Closed
prashnts opened this issue Jan 21, 2017 · 6 comments
@prashnts prashnts commented Jan 21, 2017

With the latest v97, data downloaded during the slug compilation step (via a post_compile hook) is not retained in the slug.

This worked fine in v85, and it still works when I pin back to v85.

Contributor

@ken-reitz ken-reitz commented Jan 23, 2017

You need to download the files to the current working directory (CWD), not to /app.
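
A minimal sketch of what that could look like in a Python helper called from the post_compile hook. The function names `nltk_data_dir` and `download_corpora` are hypothetical, the corpus names are only examples, and the sketch assumes the build's working directory is what later becomes /app at runtime:

```python
import os


def nltk_data_dir(base=None):
    """Return an nltk_data directory under `base` (default: the CWD)."""
    base = base or os.getcwd()
    return os.path.join(base, "nltk_data")


def download_corpora(corpora, target_dir):
    """Download each corpus into target_dir instead of a hard-coded /app path."""
    # Imported lazily so the path helper above works without nltk installed.
    import nltk

    os.makedirs(target_dir, exist_ok=True)
    for name in corpora:
        # download_dir tells the NLTK downloader where to place the files.
        nltk.download(name, download_dir=target_dir)


# Example call from the hook (needs nltk and network access):
# download_corpora(["stopwords", "punkt"], nltk_data_dir())
```

The point of resolving the path from `os.getcwd()` is that nothing in the script depends on where the build happens to run.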

Author

@prashnts prashnts commented Jan 23, 2017

Thanks; I've pinned to v85 in the meantime. I think it would be nice to have a warning posted somewhere about this change, so that failing builds don't come as a surprise! :)

The post_compile hook mentioned above is a fairly popular way of getting the nltk data.

Contributor

@ken-reitz ken-reitz commented Feb 15, 2017

I just added official nltk support to the buildpack!

Simply add an nltk.txt file with a list of the corpora you want installed, and everything should work as expected.
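
For illustration only, here is a sketch of how a file like that could be read and turned into a downloader invocation. It assumes one corpus identifier per line; `read_corpora_list` and `downloader_command` are hypothetical helpers, not the buildpack's actual code. The `python -m nltk.downloader -d DIR ...` form is NLTK's own command-line downloader.

```python
import sys
from pathlib import Path


def read_corpora_list(path):
    """Return the non-empty, stripped lines of an nltk.txt-style file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def downloader_command(corpora, target_dir):
    """Build an argv list for NLTK's command-line downloader."""
    return [sys.executable, "-m", "nltk.downloader", "-d", target_dir, *corpora]


# Example (running the command requires nltk and network access):
# import subprocess
# subprocess.run(downloader_command(read_corpora_list("nltk.txt"), "nltk_data"), check=True)
```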

@ken-reitz ken-reitz closed this Feb 15, 2017
Author

@prashnts prashnts commented Feb 17, 2017

Thank you! It looks like a good solution.

I'd whipped up a buildpack to download the corpora in a multi-buildpack configuration: prashnts/heroku-buildpack-textblob, which seems unnecessary now.

Contributor

@ken-reitz ken-reitz commented Apr 4, 2017

@prashnts nice!


@Vichoko Vichoko commented Jan 15, 2019

Is there something I can see in the logs to tell whether the nltk.txt file is being used and the resources are being downloaded?

I included the nltk.txt file in the root directory, with the following contents:

stopwords
punkt

Then I deployed my dockerized Django web server, specifying the build, release and run phases in a heroku.yml file, as can be seen here:

build:
  docker:
    web: Dockerfile
release:
  command:
    - python3 downloadstaticfiles.py
  image: web
run:
  web: bash -c "python3 manage.py migrate && python3 manage.py runserver 0.0.0.0:$PORT"

After deploying, I can see the following error in Heroku's log:


2019-01-15T05:23:23.773231+00:00 app[web.1]: **********************************************************************
2019-01-15T05:23:23.773232+00:00 app[web.1]:   Resource stopwords not found.
2019-01-15T05:23:23.773234+00:00 app[web.1]:   Please use the NLTK Downloader to obtain the resource:
2019-01-15T05:23:23.773236+00:00 app[web.1]:
2019-01-15T05:23:23.773237+00:00 app[web.1]:   >>> import nltk
2019-01-15T05:23:23.773239+00:00 app[web.1]:   >>> nltk.download('stopwords')
2019-01-15T05:23:23.773241+00:00 app[web.1]:
2019-01-15T05:23:23.773242+00:00 app[web.1]:   Searched in:
2019-01-15T05:23:23.773248+00:00 app[web.1]:     - '/code/nltk_data'
2019-01-15T05:23:23.773249+00:00 app[web.1]:     - '/usr/share/nltk_data'
2019-01-15T05:23:23.773251+00:00 app[web.1]:     - '/usr/local/share/nltk_data'
2019-01-15T05:23:23.773252+00:00 app[web.1]:     - '/usr/lib/nltk_data'
2019-01-15T05:23:23.773254+00:00 app[web.1]:     - '/usr/local/lib/nltk_data'
2019-01-15T05:23:23.773255+00:00 app[web.1]:     - '/usr/local/nltk_data'
2019-01-15T05:23:23.773257+00:00 app[web.1]:     - '/usr/local/share/nltk_data'
2019-01-15T05:23:23.773259+00:00 app[web.1]:     - '/usr/local/lib/nltk_data'
2019-01-15T05:23:23.773260+00:00 app[web.1]: **********************************************************************
2019-01-15T05:23:23.773261+00:00 app[web.1]:

Also, I've tried installing the resources with the nltk.download method inside the downloadstaticfiles.py script during the release phase, and got the same error.
Any other ideas?
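
One way to check this from inside the running container is to look for the resource in the same directories the error lists. The sketch below is a stdlib-only diagnostic; `find_resource` is a hypothetical helper, and the `corpora/stopwords` layout is an assumption about how the downloader arranges files on disk. With nltk installed, `nltk.data.path` gives the real search list and `nltk.data.find("corpora/stopwords")` performs the real lookup.

```python
import os


def find_resource(relative_path, search_dirs):
    """Return the first directory that holds relative_path (or its .zip), else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, relative_path)
        # Downloaded packages may be present unpacked or as a .zip archive.
        if os.path.exists(candidate) or os.path.exists(candidate + ".zip"):
            return directory
    return None


# Directories taken from the "Searched in" section of the log above.
SEARCH_DIRS = [
    "/code/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "/usr/local/nltk_data",
]

# Example: print(find_resource("corpora/stopwords", SEARCH_DIRS))
```

If this prints `None` in the web process, the data never reached the image the web dyno runs. That would be consistent with two assumptions worth verifying: a Dockerfile-based deploy does not run the buildpack that reads nltk.txt, and files written during the release phase are not carried over to the web dyno.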
