Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal implementation for handling model attachments #119

Merged
merged 1 commit into from May 6, 2020

Conversation

dnouri
Copy link
Collaborator

@dnouri dnouri commented Oct 21, 2019

The idea is that we sometimes want to attach files to models, such as
HTML reports or the like, and in the backend, these files should be
stored separately, to allow easy access.

Here we're implemeting this idea for the 'FileLike' model persister
and testing it for the 'File' subclass. This should work for 'Rest'
and 'S3' as well, but I thought it's best to add tests when we all
agreed on the idea.

Usage is demonstrated in 'TestFileAttachments'. The contract is as
follows: Use 'palladium.util.annotate' to add an arbitrary number of
attachments to the model, like so:

annotate(model1, {'attachments/myatt.txt': 'aGV5',
                  'attachments/my2ndatt.txt': 'aG8='})

Note that the keys of such attachments must start with 'attachments/',
with the rest indicating a filename. The values must be base64
encoded but converted from bytes to strings. This is arguably a bit
awkward, but we do this because the attachments dictionary must in
general be JSON serializable, and using bytes would violate this.

When 'model1' is persisted, 'FileLike' will create one file for each
attachment and call them 'model-1-myatt.txt' and
'model-1-my2ndatt.txt'. The implementation chooses to use flat files
rather than a folder to hold all attachments for a given model. This
is done so that we do not need to add extra methods to
'FileLikeIO' (such as mkdir), which means we should get support for
other 'FileLike' implementations such as 'Rest' and 'S3' for free.

Moreover, the attachments will be removed from the model's pickle and
from the metadata files, in order not to blow up the size of those.
When the model is loaded back through the model persister, the
attachments are loaded and put back into the model's metadata
dictionary.

What's a good time to add the attachments to the model? Use the
'write_model_decorators' pluggable decorator hook to add a decorator
that adds your attachment just before it's persisted. A toy example:

def my_write_model_decorator(self, model):
    report = my_make_report(model)  # assume returns an HTML string
    report_encoded = b64encode(report.encode('utf-8')).decode('ascii')
    annotate(model, {'attachments/report.html': report_encoded})

Let me know what you think. Once we've settled on the right way to do
this, we'll put this into proper docs and examples.

The idea is that we sometimes want to attach files to models, such as
HTML reports or the like, and in the backend, these files should be
stored separately, to allow easy access.

Here we're implemeting this idea for the 'FileLike' model persister
and testing it for the 'File' subclass.  This should work for 'Rest'
and 'S3' as well, but I thought it's best to add tests when we all
agreed on the idea.

Usage is demonstrated in 'TestFileAttachments'.  The contract is as
follows: Use 'palladium.util.annotate' to add an arbitrary number of
attachments to the model, like so:

```
annotate(model1, {'attachments/myatt.txt': 'aGV5',
                  'attachments/my2ndatt.txt': 'aG8='})
```

Note that the keys of such attachments must start with 'attachments/',
with the rest indicating a filename.  The values must be base64
encoded but converted from bytes to strings.  This is arguably a bit
awkward, but we do this because the attachments dictionary must in
general be JSON serializable, and using bytes would violate this.

When 'model1' is persisted, 'FileLike' will create one file for each
attachment and call them 'model-1-myatt.txt' and
'model-1-my2ndatt.txt'.  The implementation chooses to use flat files
rather than a folder to hold all attachments for a given model.  This
is done so that we do not need to add extra methods to
'FileLikeIO' (such as mkdir), which means we should get support for
other 'FileLike' implementations such as 'Rest' and 'S3' for free.

Moreover, the attachments will be removed from the model's pickle and
from the metadata files, in order not to blow up the size of those.
When the model is loaded back through the model persister, the
attachments are loaded and put back into the model's metadata
dictionary.

What's a good time to add the attachments to the model?  Use the
'write_model_decorators' pluggable decorator hook to add a decorator
that adds your attachment just before it's persisted.  A toy example:

```
def my_write_model_decorator(self, model):
    report = my_make_report(model)  # assume returns an HTML string
    report_encoded = b64encode(report.encode('utf-8')).decode('ascii')
    annotate(model, {'attachments/report.html': report_encoded})
```

Let me know what you think.  Once we've settled on the right way to do
this, we'll put this into proper docs and examples.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 89.754% when pulling 60a7ded on feature/model-attachments into 28fa35d on develop.

@BenjaminBossan
Copy link

I have looked at this only superficially. Generally, I think it's very helpful for palladium to offer this. I assume Python objects can be pickle.dumps-ed into the attachments.

Two questions:

  1. Sometimes it would make more sense to attach to another object, e.g. the data loader. Is that supported?
  2. How difficult would it be to support other/change the file format for the user?

@dnouri
Copy link
Collaborator Author

dnouri commented Nov 18, 2019

I have looked at this only superficially. Generally, I think it's very helpful for palladium to offer this. I assume Python objects can be pickle.dumps-ed into the attachments.

Well, you'll have to dumps it and encodes it with b64. The reason is that the metadata is supposed to be json serializable... 🙄

Two questions:

  1. Sometimes it would make more sense to attach to another object, e.g. the data loader. Is that supported?

Not really. It's only supported to add these attachments to the model, because that's what's persisted. Let me know if you have any ideas.

  1. How difficult would it be to support other/change the file format for the user?

Not sure I understand. The file format is pretty much open to the user. The contents of those special metadata entries are decoded and saved to file as binary, so they can contain anything you want, and the filename and extension can be anything you want.

@BenjaminBossan
Copy link

Okay, thanks for clarifying.

@alattner alattner merged commit a4efa8a into develop May 6, 2020
@alattner alattner deleted the feature/model-attachments branch May 31, 2022 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants