Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Adds parameterized_inputs #104

Merged
merged 2 commits into from
Apr 12, 2022
Merged

Adds parameterized_inputs #104

merged 2 commits into from
Apr 12, 2022

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Apr 6, 2022

This change adds paramterized_inputs decorator, which
is almost a carbon copy of paramterized_input but that it allows
any number of parameters to be mapped.

Why is it a separate class? Well for backwards compatibility
we don't want to break parameterized_input. Should we try to
consolidate them? I think so -- so we should then mark parameterized_input
as deprecated and will be removed in a 2.0 release? ✅

I should then probably update all documentation to reflect parameterized_inputs
and thus the documentation on paramterized_input to either be hidden or
non-existant? Hmm. ✅

Changes

  • Add parameterized_inputs decorator.
  • Deprecates parameterized_input decorator.
  • Adjust documentation to promote parameterized_inputs instead.

Testing

  1. unit tests pass.

Notes

Motivation is a side project and having to write out repetitive functions is laborious -- this is admittedly a power
user mode, but one I think that makes sense to have.
e.g. this looks nice and readable

from hamilton.function_modifiers import parametrized_inputs
import pandas as pd

NUMERIC_FEATURES = ['age', 'height', 'weight', 'body_mass_index', 'transportation_expense',
                    'distance_from_residence_to_work', 'service_time', 'work_load_average_per_day',
                    'hit_target']


@parametrized_inputs(**{f'{k}_mean': dict(feature=k) for k in NUMERIC_FEATURES})
def mean_computer(feature: pd.Series) -> pd.Series:
    """Average of {feature}"""
    return feature.mean()


@parametrized_inputs(**{f'{k}_std_dev': dict(feature=k) for k in NUMERIC_FEATURES})
def std_dev_computer(feature: pd.Series) -> pd.Series:
    """Standard deviation of {feature}"""
    return feature.std()


@parametrized_inputs(**{f'{k}_zero_mean': dict(feature=k, feature_mean=f'{k}_mean') for k in NUMERIC_FEATURES})
def zero_mean_computer(feature: pd.Series, feature_mean: pd.Series) -> pd.Series:
    """Creates zero mean value of {feature} by subtracting {feature_mean}"""
    return feature - feature_mean


@parametrized_inputs(**{f'{k}_zero_mean_unit_var': dict(feature_mean=f'{k}_mean', feature_std_dev=f'{k}_std_dev') 
                        for k in NUMERIC_FEATURES})
def zero_mean_unit_var_computer(feature_mean: pd.Series, feature_std_dev: pd.Series) -> pd.Series:
    """Computes zero mean unit variance of {feature_mean} with {feature_std_dev} to create {output_name}"""
    return feature_mean / feature_std_dev

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Python - local testing

  • python 3.6
  • python 3.7

@skrawcz
Copy link
Collaborator Author

skrawcz commented Apr 6, 2022

So next steps I'm thinking:

  • Change docs to default to parameterized_inputs (plural).
  • Put a deprecation warning on parameterized_input -- and log when it's used.

tests/resources/parametrized_inputs.py Outdated Show resolved Hide resolved
hamilton/function_modifiers.py Outdated Show resolved Hide resolved
hamilton/function_modifiers.py Outdated Show resolved Hide resolved
hamilton/function_modifiers.py Outdated Show resolved Hide resolved
@skrawcz skrawcz marked this pull request as ready for review April 9, 2022 06:24
@elijahbenizzy
Copy link
Collaborator

Not completely sure how I feel about this API. Definitely cleaner but it seems that its not particularly consistent with other APIs. Its dryer/slightly easier to read (which is good), but it takes quite a bit of work to figure out what's happening/parse the API. The design philosophy behind decorators is specifically to make it easy to figure out what's going on. I worry that with objects/named tuples its going to be harder to figure out the shape of the DAG.

Honestly, I wonder if that's just an inherent issue with @parametrized_inputs?

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts

decorators.md Outdated Show resolved Hide resolved
decorators.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- not convinced we want parameterization. I get it if we need later kwargs, but I think using **parameterization is going to look cleaner/be more readable.

hamilton/function_modifiers.py Show resolved Hide resolved
hamilton/function_modifiers.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

hamilton/function_modifiers.py Outdated Show resolved Hide resolved
This is a squashed commit of all the commits to create the parameterized_inputs
decorator. See the commits below that are in reverse chronological order.

Basically, this replaces `parameterized_input`, and provides a more succinct
API by using the function doc as a template, removing the need for a tuple.

We believe this is a simpler API to read/maintain — the kwargs assumption will
make it harder to extend the API, but we don’t think we need to.

——— Consolidated commits below ——

Adds format doc function to parameterized_inputs

To consolidate formatting in a single place. Want to keep
the code DRY. (+7 squashed commits)
Squashed commits:
[338c02a] Touches up documentation typos

And adds reraising original exception for formatting errors.
[f2725ec] Adds validate test for parameterized_inputs for the doc string

Doc string templates errors should be checked during validate.
That way we can throw a better error.
[4689177] Changes parameterized_inputs to use kwargs

1. So that we don't scope creep this API. Key words arguments are
assumed to be outputs. This removes a level of nesting on the API.
2. We're inline with other decorators in having the kwargs behavior, e.g. config.when.
[df571f3] Changes parameterized_inputs to take dict and format doc string

With the prior design, the issue was documentation. But after noodling on it
and prodding from Eljiah, I realized that the documentation string should just
be a template. Since you're going to be using the same function, the doc string
should be fairly generic, with only the parameters being passed changing the
meaning -- which with templatization, allows for that to come through.

Note - stylistically I think we prefer:

```python
@parametrized_inputs(
    parameterization={
        'output_123': dict(input_value_tbd1='input_1',
                           input_value_tbd2='input_2',
                           input_value_tbd3='input_3')
    }
)
```
because syntax highlighting will make it clearer what is being replaced in the
function with what input value.

to

```python
@parametrized_inputs(
    parameterization={
        'output_123': {
                 'input_value_tbd1'='input_1',
                 'input_value_tbd2='input_2',
                 'input_value_tbd3'='input_3'
        }
    }
)
```
or

```python
@parametrized_inputs(
    parameterization=dict(
        output_123={
                 'input_value_tbd1'='input_1',
                 'input_value_tbd2='input_2',
                 'input_value_tbd3'='input_3'
        }
    )
)
```
I think the last two ones are less clear/readable.

Otherwise I did have the thought of using actual functions in the mapping. It would
then be clearer to trace what is going on with an IDE.
However that brings up the possibility of people importing functions and running
into import messes... So punting on that for now. We can always add that in later
if we think that's required.
[0e21e9d] Changes parameterized_input to parameterized_inputs in decorator docs

Since parameterized_input is deprecated, we should just remove it from the
docs and instead just push parameterized_inputs.
[4379a38] Adds ParameterMapping named tuple object to adjust parameterized_inputs

People creating a dictionary of tuple to tuples is probably unwieldy. This is a power
use case, and thus it should afford a little more friction to use, and the net result of that
is that the code should become more readable.

We should preference keyword arguments everywhere here -- as that will make the code
much more readable than without it. E.g. :
```python
@parametrized_inputs(
    parameters=['input_value_tbd1', 'input_value_tbd2', 'input_value_tbd3'],
    parameter_mappings=[
        ParameterMapping(
            inputs=['input_1', 'input_2', 'input_3'],
            output_name='output_123',
            output_docs='function_with_multiple_inputs called using input_1'),
    ]
)
def func(...)

@parametrized_inputs(
    ['input_value_tbd1', 'input_value_tbd2', 'input_value_tbd3'],
    [ParameterMapping(['input_1', 'input_2', 'input_3'], 'output_123', 'function_with_multiple_inputs called using input_1')]
)
```

Adjusts tests and documentation accordingly.
[c1d5fa8] Adds parameterized_inputs

This change adds `paramterized_inputs` decorator, which
is almost a carbon copy of `paramterized_input` but that it allows
any number of parameters to be mapped.

Why is it a separate class? Well for backwards compatibility
we don't want to break parameterized_input. Should we try to
consolidate them? I think so -- so we should then mark `parameterized_input`
as deprecated and will be removed in a 2.0 release?

I should then probably update all documentation to reflect `parameterized_inputs`
and thus the documentation on `paramterized_input` to either be hidden or
non-existant? Hmm.
So that we can warn users not to use this, and instead
migrate to `parameterized_inputs` (plural).

Not sure when we'd be ready for a 2.X release, but when we
do, we should remove this decorator.
@skrawcz skrawcz merged commit b4e874b into main Apr 12, 2022
@skrawcz skrawcz deleted the add_parameterized_inputs branch April 12, 2022 17:35
gitbook-com bot pushed a commit that referenced this pull request Aug 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants