Better querying for available nodes #159
Comments
OK, I think this one is going to become necessary as we do more with data quality, etc. I don't have the bandwidth to do it quite yet, but I'm putting it on the queue.
The query API looks great. My team would find this useful to, for example, perform post-hoc analysis on projects to determine whether any model features or their inputs were tagged with particular attributes. In my industry (financial services) we often have to be able to answer: were any sensitive features used in the development of this model? This would be very easy to answer with this kind of query interface.
Awesome. Will bump this up on the queue. Would love some sample queries you'd like to perform for your use-case if you can provide them!
Right now it looks like queries would be performed on all nodes in a graph. Usually I want to know characteristics about upstream and downstream nodes for a given set of nodes. For example:
Or, conversely
The second one could be useful to determine downstream impacts of source dataset changes when, for example, one of our data providers has an issue or gets turned off. In both cases, the missing piece seems to be applying the query API to a subset of nodes rather than all of the nodes in the graph. I also haven't checked to see if nodes are hashable, but supporting set operations on node collections could be helpful here as well. I'm going to be demoing the use of Hamilton to our data governance team in the coming weeks, and that will help me come up with more usage patterns.
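A minimal sketch of what restricting a tag query to the upstream or downstream nodes of a given set might look like. It uses a toy dependency dict and plain sets of node names; `Node`, `upstream_of`, `downstream_of`, and `with_tag` are hypothetical helpers for illustration, not part of Hamilton's API, and the tag names are made up.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DAG node; not Hamilton's node class."""
    name: str
    tags: tuple = ()  # (key, value) pairs; a tuple keeps the node hashable


# Toy graph: node name -> names of its direct upstream (dependency) nodes.
DEPENDENCIES = {
    "raw_customers": [],
    "raw_prices": [],
    "age": ["raw_customers"],
    "spend": ["raw_customers", "raw_prices"],
    "risk_score": ["age", "spend"],
}

NODES = {
    "raw_customers": Node("raw_customers", (("source", "vendor_a"),)),
    "raw_prices": Node("raw_prices", (("source", "vendor_b"),)),
    "age": Node("age", (("sensitive", "true"),)),
    "spend": Node("spend"),
    "risk_score": Node("risk_score"),
}


def _reachable(start_names, edges):
    """Breadth-first walk: every name reachable from start_names via edges."""
    seen = set()
    queue = deque(start_names)
    while queue:
        current = queue.popleft()
        for neighbour in edges[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def upstream_of(names, dependencies):
    """All transitive dependencies of the given nodes."""
    return _reachable(names, dependencies)


def downstream_of(names, dependencies):
    """All nodes that transitively depend on the given nodes."""
    dependents = {name: [] for name in dependencies}
    for child, parents in dependencies.items():
        for parent in parents:
            dependents[parent].append(child)
    return _reachable(names, dependents)


def with_tag(names, nodes, key, value):
    """Keep only the names whose node carries the (key, value) tag."""
    return {name for name in names if (key, value) in nodes[name].tags}


# Were any sensitive features used upstream of this model output?
sensitive_inputs = with_tag(
    upstream_of(["risk_score"], DEPENDENCIES), NODES, "sensitive", "true"
)

# What is impacted downstream if one data provider is turned off?
impacted = downstream_of(["raw_prices"], DEPENDENCIES)
```

Because the helpers return plain sets, the usual set operations (union, intersection, difference) can combine the results of several such queries.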
@gravesee awesome, thanks for the information. Mind if we set up time to chat? I sent you a LinkedIn request to get your contact details. We'd love to help.
We are moving repositories! Please see the new version of this issue at DAGWorks-Inc/hamilton#37. Also, please give us a star/update any of your internal links. Note that everything else (slack community, pypi packages, etc...) will not change at all.
We make it easy to attach metadata to nodes, but don't yet have a natural API for querying that metadata. This could be useful if:
(1) You want to look up a set of nodes with specific tags for reporting purposes
(2) You want to look up nodes used in data quality operators (see the motivating use-case below)
(3) You want to run some subset of the DAG that relates to the way nodes are tagged.
Is your feature request related to a problem? Please describe.
When using DQ we have to query by tags, which is really ugly. E.g.
We should be able to have some utility functions here.
Describe the solution you'd like
Some combo of the following:
dr.query(tag_match={...}, name_match=r"...", module_match=r"...")
hamilton_utils.get_dq_validators(...)
or something like that. Note that this would be valuable for more than just data quality, e.g. looking up nodes by tag in general.
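A rough sketch of how a `query`-style helper with the `tag_match`, `name_match`, and `module_match` parameters proposed above might behave. This is only an illustration under assumptions: `NodeInfo` is a hypothetical metadata record rather than Hamilton's node type, the function is free-standing rather than a method on the driver, and the `dq_validator` tag is made up for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class NodeInfo:
    """Hypothetical node metadata record; not Hamilton's actual node type."""
    name: str
    module: str
    tags: Dict[str, str] = field(default_factory=dict)


def query(
    nodes: Iterable[NodeInfo],
    tag_match: Optional[Dict[str, str]] = None,
    name_match: Optional[str] = None,
    module_match: Optional[str] = None,
) -> List[NodeInfo]:
    """Return nodes that satisfy every filter that was supplied.

    tag_match: each key must be present on the node with an equal value.
    name_match / module_match: regular expressions applied with re.search.
    """
    tag_match = tag_match or {}
    results = []
    for node in nodes:
        if any(node.tags.get(key) != value for key, value in tag_match.items()):
            continue
        if name_match is not None and not re.search(name_match, node.name):
            continue
        if module_match is not None and not re.search(module_match, node.module):
            continue
        results.append(node)
    return results


nodes = [
    NodeInfo("age", "features.customer", {"owner": "risk"}),
    NodeInfo("age_validator", "features.customer",
             {"owner": "risk", "dq_validator": "true"}),  # made-up tag
    NodeInfo("price", "features.market", {"owner": "pricing"}),
]

# Look up data quality validator nodes by tag.
dq_names = [n.name for n in query(nodes, tag_match={"dq_validator": "true"})]

# Combine a name pattern with a module pattern.
customer_names = [
    n.name for n in query(nodes, name_match=r"^age", module_match=r"customer$")
]
```

Filters are combined with AND here, and an omitted filter matches everything; whether the real API should also support OR or negation is left open.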
Describe alternatives you've considered
See above
Additional context
Writing out gitbook docs...