
Content of app/nltk_data not included in slug #356

prashnts opened this issue Jan 21, 2017 · 6 comments




prashnts commented Jan 21, 2017

With the latest buildpack release, v97, data downloaded during the slug compilation step (via a post_compile hook) is not retained in the slug.

This worked fine in v85, and it still works when pinned back to v85.


ken-reitz commented Jan 23, 2017

You need to download the files to the CWD, not to /app.
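A minimal Python sketch of that advice, assuming the compile step runs with the build directory as its working directory (as the comment above implies). The helper names `nltk_data_dir` and `download_corpora` are hypothetical; `nltk.download` with its `download_dir` argument is NLTK's own downloader API.

```python
from pathlib import Path
from typing import Iterable, Optional


def nltk_data_dir(base: Optional[str] = None) -> Path:
    """Return <base>/nltk_data, where base defaults to the current working directory.

    Assumption: during slug compilation the CWD is the build directory, so files
    written under it are packaged into the slug (unlike files written to /app).
    """
    root = Path(base) if base is not None else Path.cwd()
    return root / "nltk_data"


def download_corpora(corpora: Iterable[str], base: Optional[str] = None) -> Path:
    """Download each NLTK package id into nltk_data_dir(base) and return that directory."""
    import nltk  # imported lazily so nltk_data_dir() is usable without NLTK installed

    target = nltk_data_dir(base)
    target.mkdir(parents=True, exist_ok=True)
    for corpus in corpora:
        # quiet=True suppresses the per-package progress output.
        nltk.download(corpus, download_dir=str(target), quiet=True)
    return target
```

A post_compile script could call `download_corpora(["stopwords"])`, for example. If the resulting directory does not end up on NLTK's default search path at runtime, pointing the `NLTK_DATA` environment variable at it, or appending it to `nltk.data.path`, are the standard ways to tell NLTK where to look.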


prashnts commented Jan 23, 2017

Thanks; I've pinned to v85 in the meantime. It would be nice to have a warning about this change posted somewhere, so that failing builds don't come as a scary surprise! :)

The post_compile hook mentioned above is a fairly popular way of getting the NLTK data.


ken-reitz commented Feb 15, 2017

I just added official nltk support to the buildpack!

Simply add an nltk.txt file listing the corpora you want installed, and everything should work as expected.
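To make the file format concrete: assuming nltk.txt holds whitespace-separated NLTK package identifiers (typically one per line, such as `stopwords`, `punkt`, `wordnet`), the hypothetical `parse_nltk_txt` helper below shows how such a file maps onto the ids you would pass to `nltk.download`. It is a sketch of the format only, not the buildpack's actual implementation.

```python
from pathlib import Path
from typing import List

# Example contents of an nltk.txt file (assumed format: one package id per line).
EXAMPLE_NLTK_TXT = """\
stopwords
punkt
wordnet
"""


def parse_nltk_txt(text: str) -> List[str]:
    """Return the package ids in nltk.txt contents, in order, without duplicates."""
    # str.split() with no argument splits on any whitespace and drops empty strings,
    # so blank lines and trailing newlines are ignored.
    ids = text.split()
    # dict.fromkeys keeps first-seen order while removing repeats.
    return list(dict.fromkeys(ids))


def read_nltk_txt(path: str = "nltk.txt") -> List[str]:
    """Read and parse an nltk.txt file from disk."""
    return parse_nltk_txt(Path(path).read_text(encoding="utf-8"))
```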

ken-reitz closed this Feb 15, 2017

prashnts commented Feb 17, 2017

Thank you! It looks like a good solution.

I'd whipped up a buildpack to download the corpora in a multi-buildpack configuration, prashnts/heroku-buildpack-textblob, which now seems unnecessary.


ken-reitz commented Apr 4, 2017

@prashnts nice!


Vichoko commented Jan 15, 2019

Is there something I can look for in the logs to confirm that the nltk.txt file is being used and the resources are being downloaded?

I included the nltk.txt file in the root directory, with the following contents:


Then I deployed my Dockerized Django web server, specifying the build, release, and run phases in a heroku.yml file:

```yaml
build:
  docker:
    web: Dockerfile
release:
  command:
    - python3
  image: web
run:
  web: bash -c "python3 migrate && python3 runserver$PORT"
```

After deploying, I see the following error in Heroku's log:

```
2019-01-15T05:23:23.773231+00:00 app[web.1]: **********************************************************************
2019-01-15T05:23:23.773232+00:00 app[web.1]:   Resource stopwords not found.
2019-01-15T05:23:23.773234+00:00 app[web.1]:   Please use the NLTK Downloader to obtain the resource:
2019-01-15T05:23:23.773236+00:00 app[web.1]: 
2019-01-15T05:23:23.773237+00:00 app[web.1]:   >>> import nltk
2019-01-15T05:23:23.773239+00:00 app[web.1]:   >>>'stopwords')
2019-01-15T05:23:23.773241+00:00 app[web.1]:   
2019-01-15T05:23:23.773242+00:00 app[web.1]:   Searched in:
2019-01-15T05:23:23.773248+00:00 app[web.1]:     - '/code/nltk_data'
2019-01-15T05:23:23.773249+00:00 app[web.1]:     - '/usr/share/nltk_data'
2019-01-15T05:23:23.773251+00:00 app[web.1]:     - '/usr/local/share/nltk_data'
2019-01-15T05:23:23.773252+00:00 app[web.1]:     - '/usr/lib/nltk_data'
2019-01-15T05:23:23.773254+00:00 app[web.1]:     - '/usr/local/lib/nltk_data'
2019-01-15T05:23:23.773255+00:00 app[web.1]:     - '/usr/local/nltk_data'
2019-01-15T05:23:23.773257+00:00 app[web.1]:     - '/usr/local/share/nltk_data'
2019-01-15T05:23:23.773259+00:00 app[web.1]:     - '/usr/local/lib/nltk_data'
2019-01-15T05:23:23.773260+00:00 app[web.1]: **********************************************************************
2019-01-15T05:23:23.773261+00:00 app[web.1]: 
```

I've also tried installing the resources with the NLTK Downloader from a script run in the release phase, and got the same error.
Any other ideas?
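One way to answer the "how do I know" question from inside the app is to check at startup which resources NLTK can actually find, and log the search path when something is missing. A sketch, assuming NLTK is installed in the running container; `missing_resources` and the `RESOURCE_PATHS` mapping are hypothetical, while `nltk.data.find` (which raises `LookupError` for a missing resource) and `nltk.data.path` are part of NLTK. The `find` parameter exists only so the function can be exercised without NLTK.

```python
import logging
from typing import Callable, Dict, Iterable, List, Optional

logger = logging.getLogger(__name__)

# Hypothetical mapping from NLTK package ids to the resource names nltk.data.find expects.
# Ids not listed here are assumed to live under corpora/.
RESOURCE_PATHS: Dict[str, str] = {
    "stopwords": "corpora/stopwords",
    "punkt": "tokenizers/punkt",
    "wordnet": "corpora/wordnet",
}


def missing_resources(
    package_ids: Iterable[str],
    find: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Return the package ids whose resources cannot be located."""
    search_path: List[str] = []
    if find is None:
        import nltk  # lazy import; only needed when no finder is injected

        find = nltk.data.find
        search_path = list(nltk.data.path)

    missing: List[str] = []
    for pkg in package_ids:
        resource = RESOURCE_PATHS.get(pkg, f"corpora/{pkg}")
        try:
            find(resource)
        except LookupError:  # nltk.data.find raises LookupError when nothing matches
            missing.append(pkg)
            logger.warning("NLTK resource %r not found (searched: %s)", resource, search_path)
    return missing
```

Calling `missing_resources(["stopwords"])` early in the web process (or at the end of the release script) would leave an explicit warning in the logs whenever a resource is absent, along with the directories NLTK searched.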
