This repository has been archived by the owner on Jul 3, 2023. It is now read-only.
Currently, if the caller has a DataFrame structure that they are targeting, they need to ensure they match the names of the columns correctly and manually convert the Series types. If `output_columns` (or another parameter of the `execute` function) accepted a DataFrame as a template, then the output columns would match the data columns and each Series could be delivered using an `astype` conversion.
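To make the manual step concrete, here is a rough sketch of the kind of conversion I mean (the helper and column names are made up for illustration and are not existing Hamilton API):

```python
import pandas as pd

# Hypothetical helper showing what callers do by hand today: align a result
# DataFrame to a template's column names and convert each Series via astype.
def conform_to_template(result: pd.DataFrame, template: pd.DataFrame) -> pd.DataFrame:
    return result[list(template.columns)].astype(template.dtypes.to_dict())

# The request is roughly that execute() could accept such a (typically empty)
# template DataFrame and do this alignment itself.
```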
You will probably need something like a DictionaryError for the scenario where there is a column in the DataFrame template that is not in the data columns available.
There is also the option of processing compound column names from the DataFrame to map them into a more structured DataFrame. This would involve having a join character, e.g. `_`.
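As a rough illustration of the compound-name idea (assuming "more structured" means something like a MultiIndex; the column names and the join handling are hypothetical):

```python
import pandas as pd

# Flat output columns joined with "_" ...
flat = pd.DataFrame({"spend_mean": [1.0], "spend_std": [0.5], "signups_mean": [10.0]})

# ... could be split on the join character into hierarchical columns.
structured = flat.copy()
structured.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split("_", 1)) for name in flat.columns]
)
# structured["spend"] now has sub-columns "mean" and "std".
```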
@straun can you provide some code/context for a little more motivation for this? Some questions to help with that:
> they need to ensure they match the names of the columns correctly and manually convert the Series types
What is causing this mismatch in name and/or type? Can't the functions be named appropriately and return the right types?
> You will probably need something like a DictionaryError for the scenario where there is a column in the DataFrame template that is not in the data columns available.
Currently Hamilton throws a `ValueError` if you request a column that isn't defined in the function DAG. I think that should suffice for your needs?
> There is also the option of processing compound column names from the DataFrame to map them into a more structured DataFrame. This would involve having a join character, e.g. `_`.
I'm not sure I'm following. Could you provide an example of what you mean here?
Otherwise, we've advised users that in cases requiring a bit more massaging of inputs/outputs, the easier thing to do is to create a "Wrapper" driver using a delegation pattern, and to tell people to use that as the interface. With my current understanding, I think that might be a better solution for you. E.g.:
```python
from types import ModuleType
from typing import Any, Dict
import pandas as pd
from hamilton.driver import Driver

class MyCustomDriver(object):
    def __init__(self, config: Dict[str, Any], *modules: ModuleType):
        self.h_driver = Driver(config, *modules)  # delegation pattern

    def match_types(self, actual_df: pd.DataFrame, schema_df: pd.DataFrame) -> pd.DataFrame:
        """Code to make sure the actual DF matches the intended schema."""
        return ...

    def execute(self, wanted_df: pd.DataFrame) -> pd.DataFrame:
        assert wanted_df.empty, "Wanted DF must be empty"
        output_columns = list(wanted_df.columns)
        df = self.h_driver.execute(output_columns)
        return self.match_types(df, wanted_df)  # function to convert column types
```
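Usage would then look something like this (the module and column names below are placeholders for your own Hamilton functions and schema):

```python
import my_functions  # placeholder: a module containing your Hamilton functions

# An empty DataFrame carries the wanted column names and dtypes, but no rows.
template = pd.DataFrame({"spend": pd.Series(dtype="float64"),
                         "signups": pd.Series(dtype="int64")})

dr = MyCustomDriver({}, my_functions)
result = dr.execute(template)  # columns and dtypes follow the template
```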