This repository has been archived by the owner on Jul 3, 2023. It is now read-only.
Adds simple case to help motivate @extract_fields #66
Merged
Anyone trying to use Hamilton to express a modeling dataflow would currently struggle. We need a new decorator that extracts individual outputs from functions that return multiple things in something other than a dataframe.
Like it -- I think there are a lot of `extract_` decorators we might want. Some options:
1. Have a single extract decorator that can assign polymorphically to different types, so that we can add extractable types easily
2. Have an extract decorator for each type, sharing some abstraction (perhaps (1)?)
3. Be opinionated about the things we want to extract -- only allow dfs, typeddicts, matrices (maybe...)
The API to use it looks like this:

```python
@function_modifiers.extract_fields(
    {'X_train': np.ndarray, 'X_test': np.ndarray, 'y_train': np.ndarray, 'y_test': np.ndarray})
def train_test_split_func( ... , ... ) -> dict
```

I decided to go with a plain dict of `field_name` to `field_type` because that seemed the simplest thing to define. Note that we use the documentation for the original function, rather than enabling individual docstrings for the types. I think this suffices for now.

I left out TypedDict support because I didn't want to have to import typing_extensions to handle it. Also, you can't define a TypedDict class inline, so it would be more verbose, which is less than ideal. We can always add TypedDict support later.

Also punted on `Tuple` support -- that might be another decorator...
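To make the `field_name` to `field_type` mapping concrete, here is a minimal, dependency-free sketch of how a decorator of this shape could behave. It is not Hamilton's implementation: `extract_fields_sketch`, `expand_fields`, and the toy body of `train_test_split_func` are hypothetical names made up for illustration, and plain lists stand in for `np.ndarray` so the block runs on its own.

```python
from typing import Any, Callable, Dict


def extract_fields_sketch(fields: Dict[str, type]) -> Callable:
    """Hypothetical stand-in for the decorator; records which fields to expose."""

    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        # The decorated function must be annotated as returning a plain dict.
        if fn.__annotations__.get("return") is not dict:
            raise TypeError(f"{fn.__name__} must be annotated with '-> dict'")
        fn.extracted_fields = dict(fields)  # field_name -> field_type
        return fn

    return decorator


def expand_fields(fn: Callable[..., dict]) -> Dict[str, Callable[[dict], Any]]:
    """Build one extractor per declared field (assumed behaviour, for illustration)."""

    def make_extractor(name: str, expected: type) -> Callable[[dict], Any]:
        def extractor(result: dict) -> Any:
            value = result[name]
            if not isinstance(value, expected):
                raise TypeError(f"{name!r} should be {expected.__name__}")
            return value

        # Every field reuses the original function's docstring.
        extractor.__doc__ = fn.__doc__
        return extractor

    return {name: make_extractor(name, t) for name, t in fn.extracted_fields.items()}


@extract_fields_sketch({"X_train": list, "X_test": list, "y_train": list, "y_test": list})
def train_test_split_func(X: list, y: list, n_test: int) -> dict:
    """Toy split: the last n_test rows become the test set."""
    return {
        "X_train": X[:-n_test], "X_test": X[-n_test:],
        "y_train": y[:-n_test], "y_test": y[-n_test:],
    }


extractors = expand_fields(train_test_split_func)
split = train_test_split_func([1, 2, 3, 4], [0, 1, 0, 1], n_test=1)
x_test = extractors["X_test"](split)
```

Each declared field becomes its own addressable output, which is the property a downstream function needs in order to depend on `X_train` or `y_test` by name.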
This example is interesting because it shows how one might build a "bank" of Hamilton functions that do some generic modeling, while keeping it generic so that adding new contexts or running it with new models results in only a small amount of work.

The key things to get this to work are:
- different Python modules to load data. They have to output what's required to link with the my_train_evaluate_logic functions.
- config & @config.when to add the correct model function to the dataflow.

So if you want to switch between model types: easy -- change config. So if you want to switch fitting models on different data: easy -- change the data loading module.
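The two levers described above (config picks the model function, the data loading module picks the data) can be sketched without Hamilton. Everything here is a hypothetical stand-in rather than Hamilton's API: `when`, `resolve`, `run`, the `model__mean` / `model__last` functions, the `load_data_a` / `load_data_b` loaders, and the `model_type` config key are all invented for illustration.

```python
from typing import Callable, Dict, List

# node name -> candidate implementations (hypothetical registry)
_REGISTRY: Dict[str, List[Callable]] = {}


def when(**conditions: str) -> Callable:
    """Hypothetical stand-in for a config-conditional decorator."""

    def decorator(fn: Callable) -> Callable:
        node_name = fn.__name__.split("__")[0]  # model__mean -> model
        fn.conditions = conditions
        _REGISTRY.setdefault(node_name, []).append(fn)
        return fn

    return decorator


def resolve(node_name: str, config: Dict[str, str]) -> Callable:
    """Return the single implementation whose conditions match the config."""
    matches = [
        fn for fn in _REGISTRY[node_name]
        if all(config.get(key) == value for key, value in fn.conditions.items())
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {node_name!r}, got {len(matches)}")
    return matches[0]


@when(model_type="mean")
def model__mean(y_train: List[float]) -> float:
    """Toy model: predict the training mean."""
    return sum(y_train) / len(y_train)


@when(model_type="last")
def model__last(y_train: List[float]) -> float:
    """Toy model: predict the last observed value."""
    return y_train[-1]


def load_data_a() -> List[float]:
    """Stands in for one data loading module."""
    return [1.0, 2.0, 3.0]


def load_data_b() -> List[float]:
    """Stands in for a second data loading module with the same output."""
    return [10.0, 20.0]


def run(config: Dict[str, str], load_data: Callable[[], List[float]]) -> float:
    """Wire the chosen loader into the model selected by config."""
    return resolve("model", config)(load_data())


pred_mean = run({"model_type": "mean"}, load_data_a)
pred_last = run({"model_type": "last"}, load_data_b)
```

Switching model type touches only the config dict, and switching data touches only which loader is passed in; the shared logic in `run` stays unchanged.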
skrawcz changed the title from "Adds simple case to help motivate @extract_outputs" to "Adds simple case to help motivate @extract_fields" on Feb 9, 2022
@elijahbenizzy I punted on TypedDict, in favor of something slightly simpler.
Some thoughts
We have three options for the API:
1. Allow `dict`, force specification of types in decorator
2. Allow `Dict[str, TYPE]`, force everything to be the same type, just specify names in decorator
3. Allow TypedDict, specify everything in decorator
This chooses (1) -- let's think through the API a bit? IMO (2) is the most readable, but it's also limiting. Do we want to support dicts with varied value types?
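For comparison, the three return annotations under discussion might look like the sketch below. The function names, the `SplitResult` class, and the list-based field types are hypothetical, and the block assumes Python 3.8+, where `TypedDict` lives in `typing` and no `typing_extensions` import is needed.

```python
from typing import Dict, List, TypedDict, get_args, get_type_hints


# Option 1: plain dict. Field names and types must both go in the decorator.
def split_plain() -> dict:
    return {"X_train": [1.0, 2.0], "y_train": [0, 1]}


# Option 2: Dict[str, TYPE]. One value type for every field, so the
# decorator would only need the field names.
def split_uniform() -> Dict[str, List[float]]:
    return {"X_train": [1.0, 2.0], "X_test": [3.0]}


# Option 3: TypedDict. Names and types both live on the annotation, at the
# cost of declaring a separate class (it cannot be defined inline).
class SplitResult(TypedDict):
    X_train: List[float]
    y_train: List[int]


def split_typed() -> SplitResult:
    return {"X_train": [1.0, 2.0], "y_train": [0, 1]}


# What introspection can recover from each annotation:
plain_return = get_type_hints(split_plain)["return"]      # just `dict`
uniform_value_type = get_args(get_type_hints(split_uniform)["return"])[1]
typed_fields = get_type_hints(SplitResult)                # name -> type mapping
```

Option 1 exposes nothing beyond `dict` through the annotation, option 2 exposes a single shared value type, and option 3 exposes a per-field mapping, which is why only option 1 needs the full name-to-type dict passed to the decorator.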
Helps prove things work as intended!
I wonder if this is flaky somehow? Anyway, adding this to see whether CircleCI complains or not. It seems like there could be a version mismatch somewhere that causes this, e.g. between my local env and what CircleCI installs.
skrawcz force-pushed the add_generic_model_example branch from 8a799ac to b359b3b on February 9, 2022 23:51
elijahbenizzy approved these changes on Feb 10, 2022
Additions
@extract_fields