Add pandas result builder that converts to long format #121
Comments
So this doesn't appear to be as simple as I thought it would be. The issue with going from wide to long is that you need some context to know how to collapse things. To pass that context in, you cannot have a static method, since a static method can't reference instance state. Here's some possible code -- however, it's limited in use to non-distributed/cluster computation settings.

import typing

import pandas as pd
from hamilton.base import SimplePythonDataFrameGraphAdapter  # assumed import location of the parent adapter


class SimplePythonLongFormatDataFrameGraphAdapter(SimplePythonDataFrameGraphAdapter):
    """Adapter for building a long format pandas dataframe from the result.

    There are two pandas methods that could be used:
     - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
    or
     - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html
    The user must tell this object which one to use, and provide the correct arguments.
    """

    def __init__(self, method_name: str, **method_kwargs: typing.Any):
        """
        :param method_name: the name of the pandas function to use for going from wide to long format.
            Currently "melt" and "wide_to_long" are supported.
        :param method_kwargs: the arguments, other than the dataframe, to provide for that specific method.
            For information on what arguments to pass in, see:
             - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
             - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html
        """
        if method_name not in ['melt', 'wide_to_long']:
            raise ValueError(f"Error, unknown {method_name} provided. It should be one of ['melt', 'wide_to_long']")
        self.method_name = method_name
        self.method_kwargs = method_kwargs

    def build_result(self, **outputs: typing.Dict[str, typing.Any]) -> typing.Any:
        """Builds the wide dataframe, then converts it to long format with the chosen pandas function."""
        # Let the parent adapter assemble the usual wide dataframe first.
        wide_df = super().build_result(**outputs)
        # Look up pd.melt or pd.wide_to_long by name and apply it with the stored arguments.
        pandas_method = getattr(pd, self.method_name)
        long_df = pandas_method(wide_df, **self.method_kwargs)
        del wide_df  # clean this representation up.
        return long_df
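To make the dispatch in `build_result` concrete without needing Hamilton installed, here is a minimal standalone sketch of the same idea: look up the pandas function by name and forward the stored keyword arguments. The helper name `to_long`, the constant `ALLOWED_METHODS`, and the sample columns are hypothetical and only for illustration; they are not part of Hamilton.

```python
import pandas as pd

# Hypothetical helper mirroring the adapter's build_result logic.
ALLOWED_METHODS = ("melt", "wide_to_long")


def to_long(wide_df: pd.DataFrame, method_name: str, **method_kwargs) -> pd.DataFrame:
    """Convert a wide dataframe to long format using the named pandas function."""
    if method_name not in ALLOWED_METHODS:
        raise ValueError(f"Unknown method {method_name!r}; expected one of {ALLOWED_METHODS}")
    pandas_method = getattr(pd, method_name)  # resolves to pd.melt or pd.wide_to_long
    return pandas_method(wide_df, **method_kwargs)


# Made-up wide data: one row per id, one column per (measure, year).
wide_df = pd.DataFrame(
    {
        "id": [1, 2],
        "spend_2020": [10, 20],
        "spend_2021": [11, 21],
    }
)

# melt: every non-id column becomes a (column, value) pair.
melted = to_long(wide_df, "melt", id_vars=["id"], var_name="column", value_name="value")

# wide_to_long: columns sharing the "spend" stub are split into a "year" level.
stubbed = to_long(wide_df, "wide_to_long", stubnames="spend", i="id", j="year", sep="_")
```

Note that the two functions need different context (`id_vars` versus `stubnames`/`i`/`j`), which is why the adapter has to carry those arguments on the instance rather than in a static method.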
@skrawcz I'm not sure I like the abstraction above. It's way too coupled to pandas specifics/APIs. Rather, we should come up with a pretty simple API (or multiple) that expresses what, exactly, we want.
We are moving repositories! Please see the new version of this issue at DAGWorks-Inc/hamilton#26. Also, please give us a star/update any of your internal links.
Is your feature request related to a problem? Please describe.
Hamilton works on "wide" columns, not "long" ones. However, the "tidy" data ethos holds that data should be in a long format, which does make some things easier to do.
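As a small illustration of the wide versus long distinction, the sketch below builds a wide dataframe (one column per output), melts it into long format, and pivots it back to show that both shapes carry the same information. The column names and values are made up for the example.

```python
import pandas as pd

# Wide format: one row per date, one column per variable.
wide = pd.DataFrame(
    {
        "date": ["2022-01-01", "2022-01-02"],
        "signups": [5, 7],
        "spend": [10.0, 12.5],
    }
)

# Long ("tidy") format: one row per (date, variable) observation.
long = wide.melt(id_vars="date", var_name="variable", value_name="value")

# Pivoting the long frame recovers the wide layout, indexed by date.
back = long.pivot(index="date", columns="variable", values="value")
```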
Describe the solution you'd like
Add a ResultBuilder variant that takes in how you'd want to collapse the resulting pandas dataframe.
Describe alternatives you've considered
People do this manually -- but perhaps doing it in the result builder makes more sense.
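For reference, the manual route might look roughly like the following. This is a sketch that assumes the wide result is a pandas dataframe indexed by a named index; `wide_df` is built by hand here as a stand-in for whatever the driver returned. Since `melt` discards the index by default, the index is first moved into a regular column.

```python
import pandas as pd

# Stand-in for a wide result dataframe; the "date" index name is an assumption.
wide_df = pd.DataFrame(
    {"signups": [5, 7], "spend": [10.0, 12.5]},
    index=pd.Index(["2022-01-01", "2022-01-02"], name="date"),
)

# Move the index into a column so it survives the melt, then collapse to long format.
long_df = wide_df.reset_index().melt(id_vars="date", var_name="column", value_name="value")
```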
Additional context
Prerequisites for someone picking this up: