Decorator to specify value and dag node inputs #162

wangkev · 2022-07-23T13:41:45Z

Is your feature request related to a problem? Please describe.
I am looking for a decorator to specify both value and dag node inputs. parameterized allows for value inputs, parameterized_inputs allows for dag node inputs, but there are no decorators that allow you to do both.

Describe the solution you'd like
A decorator that allows for both value and dag node inputs.

Describe alternatives you've considered
I tried just decorating a function with both decorators, but that is not supported.

Additional context

The text was updated successfully, but these errors were encountered:

skrawcz · 2022-07-23T23:49:55Z

@wangkev thanks for posting.

Do you mind providing a bit more information around your use case? Otherwise we've talked about this possibility -- we just haven't had the use case for it.

Feel free to post here -- or join us in slack to chat there.

wangkev · 2022-07-24T13:23:09Z

There are a few use cases. For example, to implement an adder or shifter function.

def add(field: pd.Series, n: int):
    return field + n

def shift(field: pd.Series, window: int):
    return field.shift(window)

We would want to parameterize field and n/window.

More generally, many pandas.Series methods take as input other scalar arguments.

elijahbenizzy · 2022-07-24T23:25:20Z

So its basically a way to do 1 function where it explodes out to multiple use-cases, right?

It's an interesting tension between repetitive code (E.G. repeating multiple input/output parameterizations) and making everything readable. A common way we've seen this is something like:

def _shift_helper(**kwargs):
    # this is a slight limitation of the `@does` API -- happy to think this through a little more
    series_arg = [item for item in kwargs.values() if isinstance(kwarg, pd.Series)]
    window_args = [item for item in kwargs.values() if isinstance(kwarg, int)] 
    return series_arg.shift(window_arg)


@does(_shift_helper)
def shift_artifact_1(field_1: pd.Series, window: int=3) -> pd.Series:
    pass

@does(_shift_helper)
def shift_artifact_2(field_2: pd.Series, window: int=5) -> pd.Series:
    pass

# And so on

This is nice as everything is written down in a function but there's a lot of added code. I think when there gets to be enough parametrizations, what you suggested would be pretty nice.

E.G. something like:

@parametrized_everything_for_lack_of_a_better_name(
    field_1=({"field" : "field_1", "shift" : 3}, "field_1 shifted by 3"),
    field_2=({"field" : "field_2", "shift" : 5}, "field_2 shifted by 5"),
def shift(field: pd.Series, window: int):
    return field.shift(window)

The cool part about this from an implementation perspective is we could simplify the code by gutting @parametrized and @parametrized_inputs , replacing it with subclasses/delegation of this new decorator.

Also, what exactly breaks with the two combined? Feels like that could be an approach as well.

wangkev · 2022-07-25T12:00:27Z

Yes, exactly. I have settled on something like the former you mentioned for now, just to have something that works. The preferred, imo, is definitely the latter, which I find is clearer and more concise.

elijahbenizzy · 2022-07-25T17:07:54Z

Awesome, I'm inclined to agree. Might give this a stab in the next week -- if I do, would you mind testing out on a branch/giving feedback?

There are a few APIs I'm thinking about -- the interesting thing is we have to specify if an input is a literal (E.G. a string), or coming from an input. One option could be...

@parametrized_full(
    field_1=({"field" : dependency("field_1"), "shift" : 3}, "field_1 shifted by 3"),
    field_2=({"field" : dependency("field_2"), "shift" : 5}, "field_2 shifted by 5"),
def shift(field: pd.Series, window: int):
    return field.shift(window)

Or its alternative:

@parametrized_full(
    field_1=({"field" : "field_1", "shift" : literal(3)}, "field_1 shifted by 3"),
    field_2=({"field" : "field_2", "shift" : literal(5)}, "field_2 shifted by 5"),
def shift(field: pd.Series, window: int):
    return field.shift(window)

Any other ideas? I'm kinda into requiring this decorator to have both:

@parametrized_full(
    field_1=({"field" : dependency("field_1"), "shift" : literal(3)}, "field_1 shifted by 3"),
    field_2=({"field" : dependency("field_2"), "shift" : literal(5)}, "field_2 shifted by 5"),
def shift(field: pd.Series, window: int):
    return field.shift(window)

The advantage of this is that it allows something really expressive, and easy delegation to @parametrized and @parametrized_inputs.

The annoying thing is that @parametrized should really be called @parametrized_values, and the new one should just be @parametrized.... Could be a good change for 2.0 but we haven't started thinking too much about that.

wangkev · 2022-07-26T00:21:15Z

Absolutely - happy to test out and give feedback!

I also kind of like explicitly specifying a node vs. a literal. Although, I do find the expression API of polars to be quite elegant (https://pola-rs.github.io/polars-book/user-guide/dsl/expressions.html). This would be in favor of the first option, with only nodes explicitly specified.

Other thing I would note is whether it would make sense to keep the parameterized docstring approach as with parameterized_inputs now. Maybe it is more flexible with an explicit docstring.

elijahbenizzy · 2022-07-30T01:58:50Z

OK, had a lot of fun with this. Not complete yet, but on its way: #163.

polars looks awesome! hadn't seen it before, but definitely want to check it out -- those transformations look quite slick. I'm in favor of specifying both upstream and literal (and this currently would break otherwise), but I'd consider otherwise.

elijahbenizzy · 2022-08-05T23:32:11Z

Hey! Ready for review -- would love your thoughts on the API! #163

skrawcz · 2022-08-06T16:56:37Z

@wangkev in case it's helpful, you can either checkout the branch and do pip install -e ., or install the branch directly from github. I believe this should work pip install "git+https://github.com/stitchfix/hamilton@full_parametrized". Instead of https you could do ssh.

wangkev · 2022-08-06T22:18:48Z

Hi -- thanks for the update, this is neat! The overall functionality seems to be exactly what we discussed. I have a few thoughts from playing around:

parameterize_values and parameterize_inputs do not like it when I pass in literal and upstream inputs. I would have expected that to work, i.e. to be able to be consistent with parameterize. Don't think it's a necessity, but did cause a bit of confusion in the beginning.
Would it make sense to have consistent naming between literal/upstream and parameterize_values/parameterize_inputs?

Thanks again for the quick update on this!

elijahbenizzy · 2022-08-07T19:27:26Z

Hi -- thanks for the update, this is neat! The overall functionality seems to be exactly what we discussed. I have a few thoughts from playing around:

parameterize_values and parameterize_inputs do not like it when I pass in literal and upstream inputs. I would have expected that to work, i.e. to be able to be consistent with parameterize. Don't think it's a necessity, but did cause a bit of confusion in the beginning.

Would it make sense to have consistent naming between literal/upstream and parameterize_values/parameterize_inputs?

Thanks again for the quick update on this!

Thanks for taking it for a spin! Re: (1) the idea was to have it be a more speciifc API, but its hard to say it shouldn't accept them? E.G. why would it not allows upstream for @parameterize_inputs?

Re (2) slightly torn -- appreciate consistency but parameterize_upstream and parameterize_literal doesn't feel particularly readable. Perhaps we should replace upstream and literal with input and value, although those feel a little less clear to me... Thoughts?

wangkev · 2022-08-08T01:26:08Z

Re: (1) the idea was to have it be a more speciifc API, but its hard to say it shouldn't accept them? E.G. why would it not allows upstream for @parameterize_inputs?

Agree.

Re (2) slightly torn -- appreciate consistency but parameterize_upstream and parameterize_literal doesn't feel particularly readable. Perhaps we should replace upstream and literal with input and value, although those feel a little less clear to me... Thoughts?

I was thinking the same thing about naming. Elsewhere in the documentation (https://github.com/stitchfix/hamilton/blob/main/basics.md?plain=1#L16), upstream/inputs is called a node, so maybe it makes sense to keep that terminology? Then it will be node vs value?

elijahbenizzy · 2022-08-08T02:21:17Z

Oh interesting. So:

{'foo' : node('foo_source'), 'bar' : value('bar')}?

Then keep @parameterize_inputs and @parameterize_values? I don't think @parameterize_node would make too much sense...

Maybe @parameterize_sources and @parameterize_values makes sense? Along with source(...) and value(...)?

source and value is what we're really changing...

Thanks for the feedback!

skrawcz · 2022-08-08T05:07:06Z

🤔 I think this is a power user feature, so node is probably better than source? Though node will clash with the package & class name we have... hmm.

We should be aiming for naming where if someone has used it a couple of times, they wont need to look at the documentation to remember what the distinctions are...

So current options are:

source & value.
source & literal.
~~input & value.~~ input would override a built in term so I think this removes this option.
~~input & literal.~~ input would override a built in term so I think this removes this option.
upstream & value.
upstream & literal.
dependency & value.
dependency & literal.
field & value.
field & literal.

Any others?

elijahbenizzy · 2022-08-08T17:14:04Z

input can be namespaced, but agreed we want to steer clear from builtins. IMO:

@parameterize_sources and {'arg' : source('arg_source')}
@parameterize_values and {'arg' : value('arg_value')}

Are internally consistent, logical, and easy to understand. inputs are not specific enough, upstream feels hard to grok, dependency is a little too long. ALthough dependency and value would work as well...

wangkev · 2022-08-09T00:59:10Z

Another option is field vs value.

elijahbenizzy · 2022-08-09T23:47:45Z

OK, so I think I'm going to go with the above:

@parameterize_sources and source(...)
@parameterize_values and value(...)

IMO its the clearest and most concise - any objections?

skrawcz added the triage label for issues that need to be triaged. label Jul 23, 2022

skrawcz added enhancement New feature or request and removed triage label for issues that need to be triaged. labels Aug 6, 2022

skrawcz assigned elijahbenizzy Aug 6, 2022

skrawcz linked a pull request Aug 6, 2022 that will close this issue

Full parametrized decorator #163

Merged

12 tasks

elijahbenizzy closed this as completed in #163 Aug 15, 2022

elijahbenizzy mentioned this issue Aug 16, 2022

ValueError with typing.Union type signature #170

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decorator to specify value and dag node inputs #162

Decorator to specify value and dag node inputs #162

wangkev commented Jul 23, 2022

skrawcz commented Jul 23, 2022

wangkev commented Jul 24, 2022 •

edited

elijahbenizzy commented Jul 24, 2022

wangkev commented Jul 25, 2022

elijahbenizzy commented Jul 25, 2022

wangkev commented Jul 26, 2022 •

edited

elijahbenizzy commented Jul 30, 2022

elijahbenizzy commented Aug 5, 2022

skrawcz commented Aug 6, 2022

wangkev commented Aug 6, 2022 •

edited

elijahbenizzy commented Aug 7, 2022 •

edited

wangkev commented Aug 8, 2022

elijahbenizzy commented Aug 8, 2022 •

edited

skrawcz commented Aug 8, 2022 •

edited

elijahbenizzy commented Aug 8, 2022

wangkev commented Aug 9, 2022

elijahbenizzy commented Aug 9, 2022

Decorator to specify value and dag node inputs #162

Decorator to specify value and dag node inputs #162

Comments

wangkev commented Jul 23, 2022

skrawcz commented Jul 23, 2022

wangkev commented Jul 24, 2022 • edited

elijahbenizzy commented Jul 24, 2022

wangkev commented Jul 25, 2022

elijahbenizzy commented Jul 25, 2022

wangkev commented Jul 26, 2022 • edited

elijahbenizzy commented Jul 30, 2022

elijahbenizzy commented Aug 5, 2022

skrawcz commented Aug 6, 2022

wangkev commented Aug 6, 2022 • edited

elijahbenizzy commented Aug 7, 2022 • edited

wangkev commented Aug 8, 2022

elijahbenizzy commented Aug 8, 2022 • edited

skrawcz commented Aug 8, 2022 • edited

elijahbenizzy commented Aug 8, 2022

wangkev commented Aug 9, 2022

elijahbenizzy commented Aug 9, 2022

wangkev commented Jul 24, 2022 •

edited

wangkev commented Jul 26, 2022 •

edited

wangkev commented Aug 6, 2022 •

edited

elijahbenizzy commented Aug 7, 2022 •

edited

elijahbenizzy commented Aug 8, 2022 •

edited

skrawcz commented Aug 8, 2022 •

edited