Adds simple case to help motivate @extract_fields #66

skrawcz · 2022-02-07T06:44:27Z

If someone wanted to use Hamilton to describe a modeling dataflow,
they would struggle. We need a new decorator to handle extracting
outputs from functions that return multiple things that aren't
a dataframe.

Additions

Hello world example showing how to create a model using Hamilton as a dataflow language.
Adds the decorator @extract_fields.

Testing

Unit tests & this hello world example.

Checklist

Testing checklist

Python

python 3.6
python 3.7


elijahbenizzy

Like it -- I think there are a lot of extract_ we might want. Some options:

1. Have a single extract decorator that can assign polymorphically to different types, so that we can add extractable types easily
2. Have an extract decorator for each type, sharing some abstraction (perhaps (1)?)
3. Be opinionated about the things we want to extract -- only allow dfs, typeddicts, matrices (maybe...)
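
To make option (1) concrete, here is a minimal sketch of a single polymorphic extractor built on `functools.singledispatch`. The names `extract`, `_extract_dict`, and `_extract_tuple` are hypothetical and only illustrate the idea; this is not Hamilton code, and a real decorator would also have to wire the extracted fields into the function graph.

```python
from functools import singledispatch
from typing import Any, Dict, List


@singledispatch
def extract(value: Any, names: List[str]) -> Dict[str, Any]:
    """Fallback: no extractor is registered for this output type."""
    raise TypeError(f"no extractor registered for {type(value).__name__}")


@extract.register(dict)
def _extract_dict(value: dict, names: List[str]) -> Dict[str, Any]:
    # Pull out only the requested keys; a missing key raises KeyError.
    return {name: value[name] for name in names}


@extract.register(tuple)
def _extract_tuple(value: tuple, names: List[str]) -> Dict[str, Any]:
    # Assign names positionally, insisting on a one-to-one match.
    if len(value) != len(names):
        raise ValueError("number of names must match tuple length")
    return dict(zip(names, value))


from_dict = extract({"a": 1, "b": 2}, ["a"])
from_tuple = extract((1, 2), ["x", "y"])
```

Supporting a new extractable type would then mean registering one more function, which is the "add extractable types easily" property; option (2) could reuse the same registered functions behind per-type decorators.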

examples/model_examples/scikit-learn/model_logic.py

The API to use it looks like this:

```python
@function_modifiers.extract_fields(
    {'X_train': np.ndarray, 'X_test': np.ndarray, 'y_train': np.ndarray, 'y_test': np.ndarray})
def train_test_split_func( ... , ... ) -> dict
```

I decided to go with a straight dict of `field_name` to `field_type` because that seemed the simplest thing to define. Note that we use the documentation for the original function, rather than enabling individual docstrings for the types. I think this suffices for now.

To support TypedDict, I didn't want to have to import typing_extensions to handle it. Also, you can't define a TypedDict class inline, so it would be more verbose, which is less than ideal. We can always add TypedDict support later. I also punted on `Tuple` support -- that might be another decorator...
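
As a rough illustration of the `field_name` to `field_type` idea, here is a standard-library-only sketch. `extract_fields_sketch`, `extract`, and `split_rows` are hypothetical names invented for this example; this is not the implementation in hamilton/function_modifiers.py, just one way a declared dict of fields could be recorded and then checked against a function's output.

```python
from typing import Any, Callable, Dict


def extract_fields_sketch(fields: Dict[str, type]) -> Callable:
    """Hypothetical stand-in for the decorator; not Hamilton's implementation."""

    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        # Record the declared fields so a framework could create one node per field.
        fn.declared_fields = dict(fields)
        return fn

    return decorator


def extract(fn: Callable[..., dict], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Call fn, then check and return each declared field from its dict output."""
    result = fn(*args, **kwargs)
    extracted = {}
    for name, expected_type in fn.declared_fields.items():
        if name not in result:
            raise KeyError(f"declared field {name!r} missing from output")
        if not isinstance(result[name], expected_type):
            raise TypeError(f"field {name!r} is not a {expected_type.__name__}")
        extracted[name] = result[name]
    return extracted


@extract_fields_sketch({"train": list, "test": list})
def split_rows(rows: list, test_fraction: float) -> dict:
    """Toy train/test split; the docstring is shared by both extracted fields."""
    cut = int(len(rows) * (1 - test_fraction))
    return {"train": rows[:cut], "test": rows[cut:]}


fields = extract(split_rows, [1, 2, 3, 4], 0.25)
```

Because the types live in the decorator, the function's own return annotation can stay a plain `dict`, which is the trade-off described above.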

This example is interesting because it shows how one might build a "bank" of Hamilton functions that do some generic modeling -- while keeping it generic so that adding new contexts or running it with new models results in a small amount of work. The key things to get this to work are:

- Different Python modules to load data. They have to output what's required to link with the my_train_evaluate_logic functions.
- config & @config.when to add the correct model function to the dataflow.

So if you want to switch between model types: easy -- change config. And if you want to switch to fitting models on different data: easy -- change the data loading module.
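
The config-driven switching can be sketched without Hamilton as follows. Everything here is an assumption made for illustration: `when`, `resolve`, the `clf` config key, and the `prefit_clf__*` function names are hypothetical, and the double-underscore suffix is just one possible way to let several candidate functions share a node name. The real `@config.when` may behave differently.

```python
from typing import Callable, Dict, List, Tuple

# (conditions, node name, function) for every decorated candidate.
_REGISTRY: List[Tuple[dict, str, Callable]] = []


def when(**conditions: str) -> Callable:
    """Hypothetical stand-in for a config-conditional decorator."""

    def decorator(fn: Callable) -> Callable:
        # Assumed convention: "prefit_clf__svm" provides the node "prefit_clf".
        node_name = fn.__name__.split("__")[0]
        _REGISTRY.append((conditions, node_name, fn))
        return fn

    return decorator


def resolve(config: dict) -> Dict[str, Callable]:
    """Keep only the functions whose conditions all match the config."""
    chosen = {}
    for conditions, node_name, fn in _REGISTRY:
        if all(config.get(key) == value for key, value in conditions.items()):
            chosen[node_name] = fn
    return chosen


@when(clf="svm")
def prefit_clf__svm() -> str:
    return "svm classifier"


@when(clf="logistic")
def prefit_clf__logreg() -> str:
    return "logistic regression classifier"


selected = resolve({"clf": "logistic"})
model_description = selected["prefit_clf"]()
```

Swapping the model is then a one-line config change, and swapping the data would mean pointing the driver at a different loading module that produces the same named outputs.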

skrawcz · 2022-02-09T07:43:28Z

@elijahbenizzy I punted on TypedDict, in favor of something slightly simpler.
We just need to add unit tests...

elijahbenizzy

Some thoughts
We have three options for the API:

1. Allow dict, force specification of types in decorator
2. Allow Dict[str, TYPE], force everything to be the same type, just specify names in decorator
3. Allow TypedDict, specify everything in decorator

This chooses (1) -- let's think through the API a bit? IMO (2) is the most readable but it's also limiting. Do we want to support dicts with varied value-types?
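
For comparison, here is a sketch of where the field names and types would come from under each of the three options. It assumes Python 3.8+ so that `TypedDict` and `get_args` can be imported from `typing` (on the 3.6/3.7 targets of this PR, `typing_extensions` would be needed). All function and helper names are hypothetical.

```python
from typing import Dict, List, TypedDict, get_args, get_type_hints


# Option 1: plain dict return; names and types are both given to the decorator.
def split_v1(rows: list) -> dict:
    return {"train": rows[:-1], "test": rows[-1:]}


FIELDS_V1 = {"train": list, "test": list}


# Option 2: Dict[str, TYPE] return; only names are given, the type is shared.
def split_v2(rows: list) -> Dict[str, list]:
    return {"train": rows[:-1], "test": rows[-1:]}


NAMES_V2 = ["train", "test"]


# Option 3: TypedDict return; names and types both live in the annotation.
class Split(TypedDict):
    train: list
    test: list


def split_v3(rows: list) -> Split:
    return {"train": rows[:-1], "test": rows[-1:]}


def fields_for_option_2(fn, names: List[str]) -> Dict[str, type]:
    # The value type of Dict[str, TYPE] applies to every named field.
    value_type = get_args(get_type_hints(fn)["return"])[1]
    return {name: value_type for name in names}


def fields_for_option_3(fn) -> Dict[str, type]:
    # Read the fields straight off the TypedDict class.
    return get_type_hints(get_type_hints(fn)["return"])


fields_2 = fields_for_option_2(split_v2, NAMES_V2)
fields_3 = fields_for_option_3(split_v3)
```

All three end up with the same name-to-type mapping here; the difference is whether it is written in the decorator (1), split between the decorator and the annotation (2), or held entirely in the annotation (3). Only (1) and (3) can express varied value types.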

examples/model_examples/scikit-learn/run.py

hamilton/function_modifiers.py


Adds simple case to help motivate @extract_outputs

95bf960

If someone wanted to use Hamilton to model a modeling dataflow, they would struggle. Need a new decorator to handle extracting outputs from functions that return multiple things and aren't a dataframe.

elijahbenizzy reviewed Feb 8, 2022


examples/model_examples/scikit-learn/model_logic.py (outdated)

skrawcz added 2 commits February 8, 2022 23:34

skrawcz changed the title ~~Adds simple case to help motivate @extract_outputs~~ Adds simple case to help motivate @extract_fields Feb 9, 2022

skrawcz marked this pull request as ready for review February 9, 2022 07:43

elijahbenizzy reviewed Feb 9, 2022


skrawcz added 2 commits February 9, 2022 11:38

Adds unit tests for extract_fields decorator

f5dfc77

Helps prove things work as intended!

Fixes graphviz test

b359b3b

I wonder if this is flakey somehow? Anyway adding this to see if circleci complains or not. Seems like there could be a version mismatch somewhere that causes this, i.e. my local env, versus what circleci installs, etc.

skrawcz force-pushed the add_generic_model_example branch from 8a799ac to b359b3b Compare February 9, 2022 23:51

elijahbenizzy self-requested a review February 10, 2022 00:16

elijahbenizzy approved these changes Feb 10, 2022


skrawcz merged commit 0071c65 into main Feb 10, 2022

skrawcz deleted the add_generic_model_example branch February 10, 2022 00:17

gitbook-com bot pushed a commit that referenced this pull request Feb 21, 2022

GitBook: [#66] No subject

15ec7d0
