-
Is the use of pdpipe supported if there are two or more DataFrames?
-
It depends on what exactly you mean. Do you mean applying a pipeline to several dataframes one after the other, or operations that by nature operate on two or more dataframes?
-
For example, df.drop(df[df['col1'].isin(df_other['col1'])].index); this is what I mean by using two or more DataFrames. I also want to know which method I need to use if I want to do df['col1'].fillna(0). I could not use this method through pdpipe.df; is this built-in method not supported?
-
The [...] So when you're writing [...] So I think you should read up a bit more on how attributes, submodules and properties work in Python. Additionally, this is not the kind of use that makes sense for [...] Also, if you want to get my example to work, why didn't you just use it as I gave it to you? You changed it in a way that makes no sense. Again, this is how you initialize a stage that fills NA values: nafiller = pdp.df.fillna(value=0)
-
Thank you for your answer.
-
No problem. Thank you for taking an interest in my project, and for taking the time to ask me about it. :)
-
Update: See the pipeline stage here: It gets a product-review dataframe as context, used to enrich some of the rows in the input dataframe with product sentiment features. In another file, on fit-transform the pipeline is provided with the train review dataframe: And on transform it is provided with the rollout/holdout review dataframe: If you want the same dataframe to be merged/joined to input dataframes each time, you should supply it to the
-
Processing two dataframes
If one of the dataframes is an input dataframe to a pipeline, and the other is either a static one or is completely derived from the input, then yes, pdpipe is adequate.
If, on the other hand, both dataframes are input dataframes to a pipeline, then no: currently pdpipe is not built to handle such joint operations. This is because it's really not clear how to streamline such pipelines. I have some ideas, but this basically requires multi-input pipelines that must have some merge/join stages, and possibly multiple outputs.
Using the fillna method
Regarding your second question, pdpipe.df.fillna() actually works great: