This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Add pandas result builder that converts to long format #121

Closed
skrawcz opened this issue Apr 25, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@skrawcz
Collaborator

skrawcz commented Apr 25, 2022

Is your feature request related to a problem? Please describe.
Hamilton works on "wide" columns -- not "long" ones. However, the "tidy" data ethos holds that data should be in a long format, which does make some things easier to do.

Describe the solution you'd like
Add a ResultBuilder variant that takes in how you'd want to collapse the resulting pandas dataframe.

Describe alternatives you've considered
People do this manually -- but perhaps doing it in the result builder makes more sense.
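For context, the manual wide-to-long step might look like the sketch below. The dataframe and its column names (`date`, `spend`, `signups`) are made up for illustration; only `pd.melt` and its `id_vars`, `var_name`, and `value_name` parameters come from pandas.

```python
import pandas as pd

# Hypothetical wide output: one column per computed series.
wide_df = pd.DataFrame(
    {
        "date": ["2022-01-01", "2022-01-02"],
        "spend": [10.0, 20.0],
        "signups": [1, 4],
    }
)

# melt() keeps `date` as the identifier and stacks the remaining columns
# into (variable, value) pairs: one row per date/column combination.
long_df = pd.melt(wide_df, id_vars=["date"], var_name="variable", value_name="value")
```

Each original column name ends up in the `variable` column, so the two-row, three-column frame becomes a four-row frame with columns `date`, `variable`, and `value`.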

Additional context
Prerequisites for someone picking this up:

  • know Pandas.
  • know python.
  • can write the pandas code to go from wide to long.
  • can read the Hamilton code base to figure out where to add it.
@skrawcz added the enhancement (New feature or request) and good first issue (Good for newcomers) labels, then removed the good first issue label, on Apr 25, 2022
@skrawcz
Collaborator Author

skrawcz commented Apr 30, 2022

So this doesn't appear to be as simple as I thought it would be.

The issue with going from wide to long is that you need some context to know how to collapse things. build_result() in the ResultMixin is a static method, so it can't reference self, which means there is no way to pass that context in through it.
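The constraint can be seen in plain Python, independent of Hamilton. In this minimal sketch, `StaticBuilder` and `ConfiguredBuilder` are hypothetical stand-ins (not Hamilton classes): the static version has no access to instance state, so the subclass has to override it as an instance method to reach its stored configuration.

```python
from typing import Any, Dict


class StaticBuilder:
    """Stand-in for a mixin whose build_result() is a static method."""

    @staticmethod
    def build_result(**outputs: Any) -> Dict[str, Any]:
        # No `self` here, so per-instance configuration is unreachable.
        return dict(outputs)


class ConfiguredBuilder(StaticBuilder):
    """Overrides build_result() as an instance method to use stored config."""

    def __init__(self, suffix: str):
        self.suffix = suffix

    def build_result(self, **outputs: Any) -> Dict[str, Any]:
        base = super().build_result(**outputs)
        # The override can read self.suffix, which the static version cannot.
        return {f"{name}{self.suffix}": value for name, value in base.items()}


result = ConfiguredBuilder("_long").build_result(a=1, b=2)
```

One consequence of this design: any caller that invokes `ConfiguredBuilder.build_result(...)` on the class rather than on an instance would get a `TypeError`, because the override now requires `self`.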

Here's some possible code -- however, it's limited to settings that don't use distributed/cluster computation.

import typing

import pandas as pd
from hamilton.base import SimplePythonDataFrameGraphAdapter


class SimplePythonLongFormatDataFrameGraphAdapter(SimplePythonDataFrameGraphAdapter):
    """Adapter for building a long format pandas dataframe from the result.

    There are two pandas methods that could be used:
     - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
    or
     - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html

    The user must tell this object which one to use, and provide the correct arguments.
    """
    def __init__(self, method_name: str, **method_kwargs: typing.Any):
        """

        :param method_name: the name of the pandas function to use for going from wide to long format.
            Currently "melt" and "wide_to_long".
        :param method_kwargs: the arguments, other than the dataframe, to provide for that specific method.
            For information on what arguments to pass in, see:
             - melt() - https://pandas.pydata.org/docs/reference/api/pandas.melt.html#pandas.melt
             - wide_to_long() - https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html
        """
        if method_name not in ['melt', 'wide_to_long']:
            raise ValueError(f"Unknown method_name '{method_name}' provided. It should be one of ['melt', 'wide_to_long']")
        self.method_name = method_name
        self.method_kwargs = method_kwargs

    def build_result(self, **outputs: typing.Any) -> typing.Any:
        """Builds the wide dataframe via the parent class, then converts it to long format."""
        # Plain super() starts the lookup at the parent adapter rather than skipping past it.
        wide_df = super().build_result(**outputs)
        # Resolve pd.melt or pd.wide_to_long by name; __init__ already validated the name.
        pandas_method = getattr(pd, self.method_name)
        long_df = pandas_method(wide_df, **self.method_kwargs)
        del wide_df  # drop the local reference to the wide representation.
        return long_df
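To make the two options concrete without needing Hamilton installed, here is a standalone sketch of the same name-based dispatch applied to a toy dataframe. The helper `to_long` and the columns `id`, `sales2021`, and `sales2022` are hypothetical; the pandas calls and their keyword arguments are the documented ones.

```python
import pandas as pd

# Hypothetical wide frame: one column per year, sharing the stub "sales".
wide_df = pd.DataFrame(
    {
        "id": [1, 2],
        "sales2021": [100, 150],
        "sales2022": [110, 170],
    }
)


def to_long(df: pd.DataFrame, method_name: str, **method_kwargs) -> pd.DataFrame:
    # Same lookup as in build_result(): resolve the pandas function by name.
    pandas_method = getattr(pd, method_name)
    return pandas_method(df, **method_kwargs)


# melt(): every non-id column becomes a (column, sales) row.
melted = to_long(wide_df, "melt", id_vars=["id"], var_name="column", value_name="sales")

# wide_to_long(): splits "sales2021"/"sales2022" into stub "sales" plus a "year" suffix.
stubbed = to_long(wide_df, "wide_to_long", stubnames="sales", i="id", j="year")
```

Note how different the keyword arguments are for the two functions (`id_vars`/`var_name` versus `stubnames`/`i`/`j`), and that `wide_to_long` returns a frame indexed by `(id, year)` while `melt` returns a flat one. The adapter's caller has to know these pandas details.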

@elijahbenizzy
Collaborator

@skrawcz I'm not sure I like the abstraction above. It's way too coupled to pandas specifics/APIs. Rather, we should come up with a pretty simple API (or several) that expresses exactly what we want. melt has a massive amount of complex code, and I'm pretty sure wide_to_long just calls it and is more user-friendly. And we should be able to use similar parameters...

@elijahbenizzy
Collaborator

We are moving repositories! Please see the new version of this issue at DAGWorks-Inc/hamilton#26. Also, please give us a star/update any of your internal links.
