Filtering of packages #497

Closed
evamaxfield opened this issue Mar 15, 2018 · 4 comments

evamaxfield commented Mar 15, 2018

There is a real desire from my team to have a MongoDB-style filtering mechanism, both for packages that are already downloaded and for packages being installed.

# package previously downloaded
is_cell = {'field': 'node_name',
          'check': 'contains',
          'value': 'cell'}

subset = pkg.filter([is_cell, ..., other_filters])
print(len(pkg))     # 200
print(len(subset))  # 30

# package to be installed
quilt.install('organization/pkg', filters=[is_cell, ..., other_filters], rename='subset_of_pkg')

A native querying system for packages would improve reproducibility between individuals using the same package; currently we are each writing our own filtering functions.
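To make the proposal concrete, here is a minimal sketch of how filter dicts of the shape above could be evaluated. It is only an illustration: nodes are modeled as plain dicts, and CHECKS, matches and filter_nodes are hypothetical names, not part of quilt's API. Only the 'contains' check appears in the proposal; the other two checks are assumptions added to show how the table might grow.

```python
# Hypothetical sketch: none of these names exist in quilt.
# Each check maps a name to a function of (actual value, expected value).
CHECKS = {
    "contains": lambda actual, expected: expected in actual,
    "equals": lambda actual, expected: actual == expected,          # assumed extra check
    "startswith": lambda actual, expected: actual.startswith(expected),  # assumed extra check
}


def matches(node, filters):
    """Return True if the node (a plain dict of attributes) passes every filter."""
    for f in filters:
        actual = node.get(f["field"])
        # A node that lacks the field cannot satisfy a filter on it.
        if actual is None or not CHECKS[f["check"]](actual, f["value"]):
            return False
    return True


def filter_nodes(nodes, filters):
    """Keep only the nodes that pass all filters (filters are AND-ed together)."""
    return [node for node in nodes if matches(node, filters)]


# Stand-in data for a package's nodes.
nodes = [
    {"node_name": "cell_001"},
    {"node_name": "cell_002"},
    {"node_name": "nucleus_001"},
]
is_cell = {"field": "node_name", "check": "contains", "value": "cell"}
subset = filter_nodes(nodes, [is_cell])
print(len(nodes), len(subset))
```

Combining the filters with AND is itself an assumption; a MongoDB-style system would presumably also need some way to express OR.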

@evamaxfield

I think this also relates to #495, because a lot of the information that could be retrieved from an inspect() call would be a good basis for filtering.

akarve self-assigned this Mar 17, 2018
akarve changed the title from "Querying of Packages" to "Filtering of packages" Mar 17, 2018

akarve commented Mar 17, 2018

We are interested in implementing this feature and would like to gather more examples. The above filters by node_name. More generally, we could support regex matching for node names. What other examples do you encounter in your workflows? The queries that we need to service will strongly affect the design.

Does it make sense to support the attachment of a metadata field to each node (e.g. a single JSON doc) and then filtering on the same?
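The two ideas in this comment, regex matching on node names and filtering on a JSON document attached to each node, could look roughly like the sketch below. Nodes are again modeled as plain dicts; filter_by_name, filter_by_metadata and the "channel" metadata key are hypothetical and only serve as an illustration.

```python
import re


def filter_by_name(nodes, pattern):
    """Keep nodes whose node_name matches the regular expression."""
    rx = re.compile(pattern)
    return [node for node in nodes if rx.search(node["node_name"])]


def filter_by_metadata(nodes, **expected):
    """Keep nodes whose attached metadata doc has every expected key/value pair."""
    return [
        node
        for node in nodes
        if all(node.get("metadata", {}).get(k) == v for k, v in expected.items())
    ]


# Stand-in nodes, each carrying a single JSON-like metadata document.
nodes = [
    {"node_name": "cell_001", "metadata": {"channel": "dna"}},
    {"node_name": "cell_002", "metadata": {"channel": "membrane"}},
    {"node_name": "readme", "metadata": {}},
]

by_name = filter_by_name(nodes, r"^cell_\d+$")
by_meta = filter_by_metadata(nodes, channel="dna")
print([n["node_name"] for n in by_name])
print([n["node_name"] for n in by_meta])
```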

evamaxfield commented Mar 19, 2018

I thought about this over the weekend and I think my ideal system would be as follows:

Introduce a new keyword, 'metadata', to the build.yml format. It works like the 'file' keyword but only accepts JSON files (json.load() must succeed). When the package is constructed, the JSON given by this field is loaded and attributes such as node_name, or others found in an inspect() call on that node, are injected into it. An example is below.

If a person provides a metadata file it would look like this.

contents:
   README:
      file: this_is_a_readme.md
      transform: id
   real_file:
      file: actual_file.tiff
      transform: id
      metadata: this_files_meta.json

The above example would append the contents of an inspect() call to the provided metadata.

If a person doesn't provide a metadata file it would look like this.

contents:
   README:
      file: this_is_a_readme.md
      transform: id
   real_file:
      file: actual_file.tiff
      transform: id

The above example would generate a base metadata file that contains the contents of an inspect() call.
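A rough sketch of the build-time behavior described in both cases follows. It assumes the result of inspect() can be represented as a dict, and that inspected attributes win when a key appears in both places; the proposal does not specify conflict handling, so that rule is an assumption. build_node_metadata and the "stain" key are hypothetical names used only for illustration.

```python
import json
import os
import tempfile


def build_node_metadata(inspected, metadata_path=None):
    """Merge an optional user-supplied JSON file with inspected node attributes.

    inspected: dict standing in for the output of an inspect() call.
    metadata_path: path given by the proposed 'metadata' keyword, or None.
    """
    metadata = {}
    if metadata_path is not None:
        with open(metadata_path) as fh:
            metadata = json.load(fh)  # raises an error on invalid JSON
        if not isinstance(metadata, dict):
            raise ValueError("metadata file must contain a JSON object")
    merged = dict(metadata)
    merged.update(inspected)  # assumption: inspected attributes override user keys
    return merged


# Case 1: a metadata file is provided (written to a temp dir for the demo).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "this_files_meta.json")
    with open(path, "w") as fh:
        json.dump({"stain": "dna"}, fh)
    with_file = build_node_metadata({"node_name": "real_file"}, path)

# Case 2: no metadata file, so only the inspected attributes are kept.
without_file = build_node_metadata({"node_name": "README"})

print(with_file)
print(without_file)
```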

Why do I like this system:
It means people could filter not only on basic quilt node attributes but also on custom metadata supplied by the package creator, all through the same system.

I would also really like a number-of-rows option: I provide a query but only want the first 10 matches. A more complex variant would be something like depth_rows: go through each group_node, filter the files present, and return only the first 10 matches for that group_node.
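The two limit options could be sketched as follows, with group nodes modeled as a dict mapping a group name to a list of file names. first_matches and first_matches_per_group are hypothetical names, and the data is made up for illustration.

```python
from itertools import islice


def first_matches(nodes, predicate, limit):
    """Return at most `limit` nodes that satisfy the predicate, in order."""
    return list(islice((n for n in nodes if predicate(n)), limit))


def first_matches_per_group(groups, predicate, limit):
    """Apply the limit separately inside each group (the 'depth_rows' idea)."""
    return {
        name: first_matches(children, predicate, limit)
        for name, children in groups.items()
    }


def is_cell(name):
    return "cell" in name


# Stand-in group nodes: 25 cells plus a notes file, and a smaller group.
groups = {
    "plate_a": ["cell_%d" % i for i in range(25)] + ["notes"],
    "plate_b": ["cell_0", "cell_1", "readme"],
}

flat = [name for children in groups.values() for name in children]
top10 = first_matches(flat, is_cell, 10)
per_group = first_matches_per_group(groups, is_cell, 10)
print(len(top10), {g: len(v) for g, v in per_group.items()})
```

Because the generator is consumed lazily through islice, filtering stops as soon as the limit is reached instead of scanning every node.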

There are some other things but I would like to hear back first on what you think before I go too deep.

akarve commented Mar 13, 2020

Quilt 3 packages now provide predicate filtering over package metadata.
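As a rough illustration of what predicate filtering looks like: to my understanding, quilt3's Package.filter takes a callable that receives the logical key and the entry and returns a bool, and entries expose their user metadata as a dict via .meta. The sketch below uses stand-in entries so it runs without quilt3 installed; the keys, the "channel" field and the is_dna_cell predicate are hypothetical, and the quilt3 documentation should be checked for the exact signature.

```python
from types import SimpleNamespace


def is_dna_cell(logical_key, entry):
    """Predicate over (logical key, entry); entry.meta is a metadata dict."""
    return "cell" in logical_key and entry.meta.get("channel") == "dna"


# Stand-in entries (not real quilt3 objects) keyed by logical key.
entries = {
    "images/cell_001.tiff": SimpleNamespace(meta={"channel": "dna"}),
    "images/cell_002.tiff": SimpleNamespace(meta={"channel": "membrane"}),
    "README.md": SimpleNamespace(meta={}),
}

selected = [key for key, entry in entries.items() if is_dna_cell(key, entry)]
print(selected)

# With quilt3 installed, the same predicate would presumably be used as:
#   subset = pkg.filter(is_dna_cell)
```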

akarve closed this as completed Mar 13, 2020