Develop #127

Merged
merged 23 commits into from May 7, 2020

@alattner (Contributor) commented May 7, 2020

No description provided.

dnouri and others added 23 commits August 29, 2019 14:32
Fix Dask hyperparam search example to work with latest sklearn&dask
Update example on how to use Dask to do grid search
An attempt to fix an issue on Travis where rpy2 is installed with
pip (in addition to conda) and appears to be buggy.
Remove julia and rpy2 from docs extra requirements
The idea is that we sometimes want to attach files to models, such as
HTML reports or the like, and in the backend, these files should be
stored separately, to allow easy access.

Here we're implementing this idea for the 'FileLike' model persister
and testing it for the 'File' subclass.  This should work for 'Rest'
and 'S3' as well, but I thought it best to add tests for those once we
have all agreed on the idea.

Usage is demonstrated in 'TestFileAttachments'.  The contract is as
follows: Use 'palladium.util.annotate' to add an arbitrary number of
attachments to the model, like so:

```
from palladium.util import annotate

annotate(model1, {'attachments/myatt.txt': 'aGV5',
                  'attachments/my2ndatt.txt': 'aG8='})
```

Note that the keys of such attachments must start with 'attachments/',
with the rest indicating a filename.  The values must be base64
encoded but converted from bytes to strings.  This is arguably a bit
awkward, but we do this because the attachments dictionary must in
general be JSON serializable, and using bytes would violate this.
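
As a minimal sketch of the encoding step described above, using only the
standard library: the helper name 'encode_attachment' is hypothetical and
not part of Palladium.  It produces the same values as in the 'annotate'
call shown earlier.

```python
import json
from base64 import b64decode, b64encode


def encode_attachment(data: bytes) -> str:
    """Base64-encode raw bytes and return the result as an ASCII string."""
    return b64encode(data).decode('ascii')


attachments = {
    'attachments/myatt.txt': encode_attachment(b'hey'),
    'attachments/my2ndatt.txt': encode_attachment(b'ho'),
}

# Strings are JSON serializable; raw bytes values would raise a TypeError.
serialized = json.dumps(attachments)

# Decoding the stored string recovers the original bytes.
original = b64decode(attachments['attachments/myatt.txt'])
```

The round trip through 'b64decode' is what a consumer of the attachment
would do to get the file contents back.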

When 'model1' is persisted, 'FileLike' will create one file for each
attachment and call them 'model-1-myatt.txt' and
'model-1-my2ndatt.txt'.  The implementation chooses to use flat files
rather than a folder to hold all attachments for a given model.  This
is done so that we do not need to add extra methods to
'FileLikeIO' (such as mkdir), which means we should get support for
other 'FileLike' implementations such as 'Rest' and 'S3' for free.
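
The flat naming scheme above could be sketched roughly as follows.  The
function 'attachment_filename' and the fixed 'model-<version>-' prefix are
assumptions made for illustration; the actual persister may derive the
name from its configured path instead.

```python
ATTACHMENT_PREFIX = 'attachments/'


def attachment_filename(version: int, key: str) -> str:
    """Map an attachment key to a flat filename for the given model version.

    Hypothetical helper: it only mirrors the 'model-1-myatt.txt' pattern
    mentioned in the text and is not Palladium's implementation.
    """
    if not key.startswith(ATTACHMENT_PREFIX):
        raise ValueError(
            f"attachment key must start with {ATTACHMENT_PREFIX!r}: {key!r}")
    name = key[len(ATTACHMENT_PREFIX):]
    return f'model-{version}-{name}'


first = attachment_filename(1, 'attachments/myatt.txt')
second = attachment_filename(1, 'attachments/my2ndatt.txt')
```

Because every attachment becomes a sibling file of the model file, no
directory has to be created, which is the point made about 'FileLikeIO'.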

Moreover, the attachments will be removed from the model's pickle and
from the metadata files, in order not to blow up the size of those.
When the model is loaded back through the model persister, the
attachments are loaded and put back into the model's metadata
dictionary.
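
To illustrate the strip-and-restore behaviour, here is a rough sketch on
plain dictionaries.  Both function names are hypothetical, and the sketch
assumes that attachments live as top-level 'attachments/...' keys in the
metadata, as in the 'annotate' call above.

```python
def split_attachments(metadata):
    """Separate attachment entries from the rest of the metadata."""
    slim = {k: v for k, v in metadata.items()
            if not k.startswith('attachments/')}
    attachments = {k: v for k, v in metadata.items()
                   if k.startswith('attachments/')}
    return slim, attachments


def restore_attachments(slim, attachments):
    """Merge attachments back into a copy of the slimmed-down metadata."""
    restored = dict(slim)
    restored.update(attachments)
    return restored


metadata = {'version': 1, 'attachments/report.html': 'aGV5'}

# On write: only 'slim' would go into the pickle and metadata files.
slim, attachments = split_attachments(metadata)

# On read: the attachments are loaded separately and merged back in.
restored = restore_attachments(slim, attachments)
```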

What's a good time to add the attachments to the model?  Use the
'write_model_decorators' pluggable decorator hook to add a decorator
that adds your attachment just before it's persisted.  A toy example:

```
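from base64 import b64encode
from palladium.util import annotate

def my_write_model_decorator(self, model):
    report = my_make_report(model)  # assume returns an HTML string
    # base64-encode, then decode to str so the value stays JSON serializable
    report_encoded = b64encode(report.encode('utf-8')).decode('ascii')
    annotate(model, {'attachments/report.html': report_encoded})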
```

Let me know what you think.  Once we've settled on the right way to do
this, we'll put this into proper docs and examples.
…k-as-factory

In configuration, use exclamation mark '!' instead of '__factory__'
…ples

Examples on how to use Keras and XGBoost with Palladium
Proposal implementation for handling model attachments
avoid loading stale metadata in S3 persister
Used https://pypi.org/project/pur/ to update the requirements.
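
For the '!' shorthand mentioned in the commit list above, the following
sketch shows what the rename amounts to in a configuration dictionary.
The converter 'use_factory_shorthand' is a hypothetical helper written for
this illustration, and the 'path' value is made up; only the
'__factory__' and '!' keys come from the commit message.

```python
def use_factory_shorthand(config):
    """Recursively rename '__factory__' keys to '!' in a nested config."""
    if isinstance(config, dict):
        return {('!' if key == '__factory__' else key):
                use_factory_shorthand(value)
                for key, value in config.items()}
    if isinstance(config, list):
        return [use_factory_shorthand(item) for item in config]
    return config


# Old style: the component class is named under '__factory__'.
old_style = {
    'model_persister': {
        '__factory__': 'palladium.persistence.File',
        'path': 'model-{version}.pickle',  # illustrative value
    },
}

# New style: the same entry, with '!' in place of '__factory__'.
new_style = use_factory_shorthand(old_style)
```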
@coveralls

Coverage Status

Coverage decreased (-9.9%) to 89.847% when pulling 99bf061 on develop into 20e369b on master.

@alattner alattner merged commit 3e7bd7d into master May 7, 2020

4 participants