Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Parameterized extract #259

Merged
merged 8 commits into from
Jan 30, 2023
Merged

Parameterized extract #259

merged 8 commits into from
Jan 30, 2023

Conversation

elijahbenizzy
Copy link
Collaborator

[Short description explaining the high-level reason for the pull request]

Changes

How I tested this

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

This provides a convenience API for
#196. The idea is that
people want to do a parameterization and extract columns operator at
once -- this should be easy. The cool thing is that this just uses the
parameterize and extract APIs. It also has a from_df() function to allow
for passing in a dataframe to be more concise.
This provides a convenience API for
#196. The idea is that
people want to do a parameterization and extract columns operator at
once -- this should be easy. The cool thing is that this just uses the
parameterize and extract APIs. It also has a from_df() function to allow
for passing in a dataframe to be more concise.
@elijahbenizzy elijahbenizzy force-pushed the parameterized_extract branch 2 times, most recently from 757aa05 to 4ad63e7 Compare January 29, 2023 22:06
@elijahbenizzy elijahbenizzy changed the title Parameterized extract - WIP, needs more testing Parameterized extract Jan 29, 2023
This adds testing for the decorator, and changes it to a simpler naming
scheme. Currently the naming scheme appends __{i} to the node name. This
is a reserved pattern that won't be used later.
@elijahbenizzy
Copy link
Collaborator Author

OK, this is ready. Calling @parameterized_frame experimental till we get a question on it -- the API is a bit clunky but I haven't ifgured out how to handle it perfectly yet...

@skrawcz
Copy link
Collaborator

skrawcz commented Jan 30, 2023

@elijahbenizzy as discussed, I think making this live under experimental would be better short term.

Not quite production-ready yet...
Copy link
Collaborator

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable -- just a few minor things. Fix them and I approve.

decorators.md Outdated
)
```

Note that we have a double-index. Note that this is still in experimental.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note that we have a double-index. Note that this is still in experimental.
Note that we have a double-index. Note that this is still experimental, and has the possibility of being changed; we'd love feedback on this API if you end up using it!

Comment on lines +522 to +545
df = pd.DataFrame(
[
["outseries1a", "outseries2a", "inseries1a", "inseries2a", 10],
["outseries1b", "outseries2b", "inseries1b", "inseries2b", 100],
# ...
],
# Have to switch as indices have to be unique
columns=[
[
"output1",
"output2",
"input1",
"input2",
"input3",
], # configure whether column is source or value and also whether it's input ("source", "value") or output ("out")
["out", "out", "source", "source", "value"],
],
)

@parameterize_frame(df)
def my_func(input1: pd.Series, input2: pd.Series, input3: float) -> pd.DataFrame:
return pd.DataFrame(
[input1 * input2 * input3, input1 + input2 + input3]
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation off

Suggested change
df = pd.DataFrame(
[
["outseries1a", "outseries2a", "inseries1a", "inseries2a", 10],
["outseries1b", "outseries2b", "inseries1b", "inseries2b", 100],
# ...
],
# Have to switch as indices have to be unique
columns=[
[
"output1",
"output2",
"input1",
"input2",
"input3",
], # configure whether column is source or value and also whether it's input ("source", "value") or output ("out")
["out", "out", "source", "source", "value"],
],
)
@parameterize_frame(df)
def my_func(input1: pd.Series, input2: pd.Series, input3: float) -> pd.DataFrame:
return pd.DataFrame(
[input1 * input2 * input3, input1 + input2 + input3]
)
df = pd.DataFrame(
[
["outseries1a", "outseries2a", "inseries1a", "inseries2a", 10],
["outseries1b", "outseries2b", "inseries1b", "inseries2b", 100],
# ...
],
# Have to switch as indices have to be unique
columns=[
[
"output1",
"output2",
"input1",
"input2",
"input3",
], # configure whether column is source or value and also whether it's input ("source", "value") or output ("out")
["out", "out", "source", "source", "value"],
],
)
@parameterize_frame(df)
def my_func(input1: pd.Series, input2: pd.Series, input3: float) -> pd.DataFrame:
return pd.DataFrame(
[input1 * input2 * input3, input1 + input2 + input3]
)

raise ValueError(f"Invalid dep type: {dep_type}")


def _get_index_levels(index: pd.MultiIndex) -> List[list]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc string please

from hamilton.function_modifiers.expanders import ParameterizedExtract


def _get_dep_type(dep_type: str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short doc string -- was not clear that this function returns the right function modifier type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or return type annotation.

@elijahbenizzy elijahbenizzy merged commit 8913643 into main Jan 30, 2023
@elijahbenizzy elijahbenizzy deleted the parameterized_extract branch January 30, 2023 17:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants