Develop a better system for distributing pickled models and other extra data #20
Currently, @syllog1sm's trontagger.pickle file is distributed through GitHub Releases, which lets you attach binary files to a release.
This is fine in the short term, but attached files are limited to 5 MB, which will likely be too small in the long term.
I am not sure of the best way to do this, and I welcome suggestions.
Also, installing the pickled models is a very manual process: users have to save the files to the TextBlob installation path themselves. It would be nice to automate this, possibly through a downloader module similar to NLTK's.
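A minimal sketch of what such a downloader could look like, assuming models are fetched from a fixed URL into a per-user data directory, loosely modeled on NLTK's `nltk.download()`. The `MODEL_URLS` registry, the `~/textblob_data` location, and `download_model()` are all hypothetical, and the URL is only a placeholder.

```python
import os
import shutil
import urllib.request

# Hypothetical registry: model name -> download URL (placeholder, not a real location).
MODEL_URLS = {
    "trontagger.pickle": "https://example.com/textblob/trontagger.pickle",
}


def default_data_dir():
    """Assumed per-user data directory, similar in spirit to NLTK's ~/nltk_data."""
    return os.path.join(os.path.expanduser("~"), "textblob_data")


def download_model(name, url=None, data_dir=None, force=False):
    """Fetch a model file into data_dir unless it is already there; return its path."""
    url = url or MODEL_URLS[name]
    data_dir = data_dir or default_data_dir()
    os.makedirs(data_dir, exist_ok=True)
    dest = os.path.join(data_dir, name)
    if os.path.exists(dest) and not force:
        return dest  # already installed, nothing to do
    tmp = dest + ".part"
    # Write to a temporary file first so an interrupted download
    # never leaves a truncated pickle at the final path.
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp, dest)  # atomic rename into place
    return dest
```

Renaming the `.part` file with `os.replace()` means a failed download never leaves a half-written pickle where a loader would pick it up, and repeated calls are no-ops once the file exists unless `force=True` is passed.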
Here's how I see the desiderata:
Why not just distribute the models on PyPI? The PerceptronTagger could be distributed as its own package. This would solve a number of problems:
```shell
$ pip install textblob-aptagger
```

would then let you write:

```python
from text.blob import TextBlob
from textblob_aptagger import PerceptronTagger

# Plug the separately installed tagger into TextBlob
blob = TextBlob("some text", pos_tagger=PerceptronTagger())
```
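If the tagger ships as its own package, the pickle can travel inside it (setuptools' `package_data` option is one way to include non-Python files), and the code can locate it relative to the module instead of relying on files copied into the TextBlob install path by hand. A minimal sketch of the loading side, assuming the pickle sits next to the package's `__init__.py`; `MODEL_FILENAME`, `model_path()`, and `load_model()` are hypothetical names, not the package's actual API.

```python
import os
import pickle

# Hypothetical: the model is bundled next to this module, e.g. declared in setup.py as
# package_data={"textblob_aptagger": ["trontagger.pickle"]}.
MODEL_FILENAME = "trontagger.pickle"


def model_path(package_dir=None):
    """Return the path of the pickle bundled alongside this module."""
    if package_dir is None:
        package_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(package_dir, MODEL_FILENAME)


def load_model(package_dir=None):
    """Unpickle the bundled model, raising a clear error if it is missing."""
    path = model_path(package_dir)
    if not os.path.exists(path):
        raise FileNotFoundError(
            "Model file not found at {0}; try reinstalling the package.".format(path)
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

Because `pip` installs the pickle together with the code, there is no separate manual step, and the package version pins the model version.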
I've begun restructuring TextBlob to make it easier to develop extensions (see #23).
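One way to make TextBlob amenable to extensions, as in the `pos_tagger=PerceptronTagger()` example, is to have it depend only on a small tagger interface that third-party packages implement. A self-contained sketch under that assumption; `BaseTagger`, `LookupTagger`, and `Blob` are hypothetical stand-ins, not TextBlob's actual classes, and the toy lookup tagger merely takes the place of a real trained model.

```python
from abc import ABC, abstractmethod


class BaseTagger(ABC):
    """Hypothetical interface every POS-tagger extension would implement."""

    @abstractmethod
    def tag(self, text):
        """Return a list of (word, tag) tuples for the given text."""


class LookupTagger(BaseTagger):
    """Toy tagger standing in for an extension such as a perceptron tagger."""

    def __init__(self, lexicon, default="NN"):
        self.lexicon = lexicon
        self.default = default

    def tag(self, text):
        # Look up each whitespace-separated word; fall back to the default tag.
        return [(w, self.lexicon.get(w.lower(), self.default)) for w in text.split()]


class Blob:
    """Stripped-down stand-in for TextBlob: the tagger is injected, not hard-coded."""

    def __init__(self, text, pos_tagger=None):
        if pos_tagger is not None and not isinstance(pos_tagger, BaseTagger):
            raise TypeError("pos_tagger must implement BaseTagger")
        self.text = text
        self.pos_tagger = pos_tagger or LookupTagger({})

    @property
    def tags(self):
        return self.pos_tagger.tag(self.text)
```

For example, `Blob("the cat sat", pos_tagger=LookupTagger({"the": "DT", "sat": "VBD"})).tags` gives `[('the', 'DT'), ('cat', 'NN'), ('sat', 'VBD')]`. The core library only needs to know about `BaseTagger`, so a tagger and its model can live in a separate package with its own release cycle.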