Feature: Storing associated blobs and data with Entries #23

Open
gavento opened this issue Feb 27, 2020 · 1 comment

Comments

gavento (Contributor) commented Feb 27, 2020

Some computations have multiple outputs, and some of those are naturally files. For example, training a neural net outputs the model parameters (data or file), resulting stats (data), TF SummaryWriter logs (file), sometimes graphs or images (files), and stdout/stderr captures (data). It would be great if some of those types could also be displayed in the browser (images, text files, logs, ...).

Table / properties

Add a table for storing blobs; every currently valid Entry has associated blobs. It would make sense to also store the serialized output value there (for consistency, or to allow e.g. external blob storage).

Every blob has:

  • id
  • entry - reference to entry, M:1 (TODO: update to match the current schema)
  • data (blob)
  • name - filename (relative to the workdir), or empty (for the pickled return value), or any name without a slash (just a blob, which may still be instantiated as a file)
  • Some notion of kind/type/intent - which blobs should be displayed in the browser, which are images, which are (viewable) text files, how to highlight the text, etc. Plugins may define more (e.g. tensorboard).
    • A MIME type alone seems both too much and insufficient (e.g. for TF logs)? (But it is good for browser open/download.)
    • We could have just a tag field mixing role (full, thumbnail, ...) and type (text, json, png, jpg).
    • Or we could have both a MIME type for the type and tags for role/intent/plugin (to distinguish e.g. TF logs).
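
To make the last option concrete, here is a minimal sketch of such a table in SQLite with both a MIME type column and a tags column. The table name, the column names, the integer `entry` reference, and the comma-separated tag encoding are all assumptions made for illustration, not part of the current schema.

```python
import sqlite3

# Hypothetical schema: every name below is illustrative only.
SCHEMA = """
CREATE TABLE blobs (
    id INTEGER PRIMARY KEY,
    entry INTEGER NOT NULL,          -- reference to the owning entry (M:1)
    name TEXT NOT NULL DEFAULT '',   -- '' stands for the pickled return value
    mimetype TEXT,                   -- type, used for browser open/download
    tags TEXT NOT NULL DEFAULT '',   -- role/intent/plugin, comma-separated
    data BLOB NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_blob(conn, entry, name, data, mimetype=None, tags=()):
    """Store one blob and return its row id."""
    cur = conn.execute(
        "INSERT INTO blobs (entry, name, mimetype, tags, data) "
        "VALUES (?, ?, ?, ?, ?)",
        (entry, name, mimetype, ",".join(tags), data),
    )
    return cur.lastrowid


def blobs_with_tag(conn, entry, tag):
    """Return the names of an entry's blobs that carry the given tag."""
    rows = conn.execute(
        "SELECT name, tags FROM blobs WHERE entry = ? ORDER BY id", (entry,)
    )
    return [name for name, tags in rows if tag in tags.split(",")]
```

With this split, a TF log could carry a generic MIME type plus a `tensorboard` tag, while a preview image would carry `image/png` plus `thumbnail`.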

API

Managed through context for creation (see #22):

  • ctx.add_blob(data, name, mimetype, tags=()) - add data blob
  • ctx.add_file(path, name=None, mimetype=None, tags=()) - add an existing file

And some type-specific functions (more for text/logs, etc.):

  • ctx.add_figure(fig, name, tags=('thumbnail',)) - render and insert a Matplotlib/plotly/bokeh/... image
  • ctx.add_pickled(obj, name, tags=('pickled',)) - pickle and add an object
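
A rough sketch of how these calls might behave, using an in-memory stand-in for the context. The class name `BlobContext`, the `blobs` dictionary, defaulting `name` to the file's base name, guessing the MIME type from the file name, and the `application/octet-stream` type for pickles are all assumptions for illustration. `add_figure` is left out so the sketch needs no plotting library.

```python
import mimetypes
import pickle
from pathlib import Path


class BlobContext:
    """Hypothetical stand-in for the computation context from #22."""

    def __init__(self):
        # name -> (data, mimetype, tags); a real version would write to the DB
        self.blobs = {}

    def add_blob(self, data, name, mimetype=None, tags=()):
        """Add a data blob under a name that is unique within the entry."""
        if name in self.blobs:
            raise ValueError(f"blob {name!r} already added")
        self.blobs[name] = (bytes(data), mimetype, tuple(tags))

    def add_file(self, path, name=None, mimetype=None, tags=()):
        """Add an existing file; name and mimetype defaults are assumptions."""
        path = Path(path)
        if name is None:
            name = path.name
        if mimetype is None:
            mimetype = mimetypes.guess_type(path.name)[0]
        self.add_blob(path.read_bytes(), name, mimetype, tags)

    def add_pickled(self, obj, name, tags=("pickled",)):
        """Pickle an object and add the result as a blob."""
        self.add_blob(pickle.dumps(obj), name, "application/octet-stream", tags)
```

Routing everything through `add_blob` keeps the name-uniqueness check in one place.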

Properties and methods on Entry:

  • Entry.files - dictionary mapping name to EntryFile

EntryFile (bikesheddable) has similar properties to the table above. In addition, it has methods:

  • EntryFile.write_file(filename=None) - write as real file, returns Path object
  • EntryFile.as_file() - return a readable file-like object (SQLite supports this)
  • EntryFile.data() - return binary data
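
As a sketch of the three methods, assuming the blob content is already loaded into memory: the field names are illustrative, and the bytes live in a field called `content` only because a dataclass field named `data` would clash with the proposed `data()` method. `as_file()` here wraps the bytes in `io.BytesIO` instead of using SQLite's incremental blob I/O.

```python
import io
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


@dataclass(frozen=True)
class EntryFile:
    """Hypothetical in-memory view of one stored blob."""

    name: str
    content: bytes
    mimetype: Optional[str] = None
    tags: Tuple[str, ...] = ()

    def data(self):
        """Return the binary data."""
        return self.content

    def as_file(self):
        """Return a readable file-like object over the data."""
        return io.BytesIO(self.content)

    def write_file(self, filename=None):
        """Write the data as a real file and return its Path."""
        target = filename if filename is not None else self.name
        if not target:
            # An empty name marks the pickled return value, which has no path.
            raise ValueError("a filename is required for an unnamed blob")
        path = Path(target)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(self.content)
        return path
```

Calling `write_file()` without an argument uses the stored name, which the proposal defines as relative to the workdir.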

spirali (Owner) commented Feb 27, 2020

I agree with the idea and have no objections. If add_figure is implemented in a way that does not enforce a strong dependency on matplotlib etc., it is OK with me.

  • FYI entry has a composite primary key (builder_name, key)
  • Is "id" necessary in the table? It seems that (entry, name) should be unique.
  • I like the idea of tags, but I would rather separate it from content type.
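
One way these three points could combine, sketched in SQLite: the blob table drops the surrogate id and uses (builder_name, key, name) as its primary key, with the content type and tags in separate columns. The `entries` table here is a reduced stand-in with only the key columns, and all other table and column names are assumptions.

```python
import sqlite3

# Hypothetical schema: only (builder_name, key) comes from the discussion.
SCHEMA = """
CREATE TABLE entries (
    builder_name TEXT NOT NULL,
    key TEXT NOT NULL,
    PRIMARY KEY (builder_name, key)
);
CREATE TABLE blobs (
    builder_name TEXT NOT NULL,
    key TEXT NOT NULL,
    name TEXT NOT NULL,
    content_type TEXT,               -- kept separate from tags
    tags TEXT NOT NULL DEFAULT '',   -- comma-separated roles/intents
    data BLOB NOT NULL,
    PRIMARY KEY (builder_name, key, name),
    FOREIGN KEY (builder_name, key)
        REFERENCES entries (builder_name, key) ON DELETE CASCADE
);
"""


def make_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_blob(conn, builder_name, key, name, data, content_type=None, tags=()):
    """Insert a blob; a duplicate (entry, name) raises sqlite3.IntegrityError."""
    conn.execute(
        "INSERT INTO blobs VALUES (?, ?, ?, ?, ?, ?)",
        (builder_name, key, name, content_type, ",".join(tags), data),
    )
```

The composite primary key enforces the (entry, name) uniqueness directly, and the cascade removes an entry's blobs together with the entry.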
