Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Adds simple case to help motivate @extract_fields #66

Merged
merged 5 commits into from
Feb 10, 2022

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Feb 7, 2022

If someone wanted to use Hamilton to model a modeling dataflow,
they would struggle. Need a new decorator to handle extracting
outputs from functions that return multiple things and aren't
a dataframe.

Additions

  • hello world showing how to create a model using hamilton as a dataflow language.
  • adds decorator @extract_fields

Testing

  1. Unit tests & this hello world example.

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows the standards laid out in the TODO link to standards
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Python

  • python 3.6
  • python 3.7

If someone wanted to use Hamilton to model a modeling dataflow,
they would struggle. Need a new decorator to handle extracting
outputs from functions that return multiple things and aren't
a dataframe.
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like it -- I think there are a lot of extract_ we might want. Some options:

  1. Have a single extract decorator that can assign polymorphically to different types that we can add extractable types easily
  2. Have an extract decorator for each type, share some abstraction (perhaps (1)?)
  3. Be opinionated about the things we want to extract -- only allow dfs, typeddicts, matrices (maybe...)

examples/model_examples/scikit-learn/model_logic.py Outdated Show resolved Hide resolved
The API to use it looks like this:

```python

@function_modifiers.extract_fields(
    {'X_train': np.ndarray, 'X_test': np.ndarray, 'y_train': np.ndarray, 'y_test': np.ndarray})
def train_test_split_func( ... ,  ... ) -> dict
```

I decided to go with a straight dict of `field_name` to `field_type` because that seemed
the simplest thing to define. Note, we use the documentation for the original function,
rather than enabling individual doc strings for the types. I think this suffices for now.

To support TypedDict, I didn't want to have to import typing_extensions to handle
it. Also you can't inline define a TypedDict class, so it would be more verbose which
is less that ideal.

We can always add TypedDict support later. Also punted on `Tuple` support -- that
might be another decorator...
This example is interesting because it shows how one might
build a "bank" of hamilton functions that do some generic
modeling -- while keeping it generic so that adding new contexts/
running it with new models, results in a small amount of work.

The key things to get this to work are:

 - different python modules to load data. They have to output what's required to link
   with the my_train_evaluate_logic functions.
 - config & @config.when to add the correct model function to the dataflow.

So if you want to switch between model types: easy -- change config.
So if you want to switch fitting models on different data: easy -- change the data loading module.
@skrawcz skrawcz changed the title Adds simple case to help motivate @extract_outputs Adds simple case to help motivate @extract_fields Feb 9, 2022
@skrawcz
Copy link
Collaborator Author

skrawcz commented Feb 9, 2022

@elijahbenizzy I punted on TypedDict, in favor of something slightly simpler.
We just need to add unit tests...

@skrawcz skrawcz marked this pull request as ready for review February 9, 2022 07:43
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts
We have three options for the API:

  • Allow dict, force specification of types in decorator
  • Allow Dict[str, TYPE], force everything to be the same type, just specify names in decorator
  • Allow TypedDict, specify everything in decorator

This chooses (1) -- let's think through the API a bit? IMO (2) is the most readable but its also limiting. Do we want to support dicts with varied value-types?

examples/model_examples/scikit-learn/run.py Show resolved Hide resolved
examples/model_examples/scikit-learn/run.py Show resolved Hide resolved
hamilton/function_modifiers.py Show resolved Hide resolved
hamilton/function_modifiers.py Outdated Show resolved Hide resolved
hamilton/function_modifiers.py Show resolved Hide resolved
hamilton/function_modifiers.py Outdated Show resolved Hide resolved
Helps prove things work as intended!
I wonder if this is flakey somehow? Anyway adding this to see if circleci complains
or not. Seems like there could be a version mismatch somewhere that causes this,
i.e. my local env, versus what circleci installs, etc.
@skrawcz skrawcz merged commit 0071c65 into main Feb 10, 2022
@skrawcz skrawcz deleted the add_generic_model_example branch February 10, 2022 00:17
gitbook-com bot pushed a commit that referenced this pull request Feb 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants