Application context objects should not be kept by default + add way to supply fit context #86

shaypal5 · 2022-02-04T13:36:40Z

pdpipe uses PdpApplicationContext objects in two ways:

As the fit_context, which should be kept as-is after a fit and is used by stages to pass one another parameters that are also needed at transform time.
As the application_context, which should be discarded once a specific application is done and is used by stages to feed context to subsequent stages. It can be extended by calling apply(context={}), fit_transform(context={}) or transform(context={}) with a dict that will be used to update the application context.

Two changes are required:

At the moment there is a single context parameter to application functions, and it is used to update both the fit context and the application context. I think there should be two parameters, one for each type of context.
At the moment the application_context is not discarded when the application is done. Fixing this is as simple as adding a self.application_context = None statement at the PdPipeline level in the couple of relevant places.
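
A minimal sketch of the two changes, using a toy class rather than pdpipe's actual code. `ToyPipeline`, the stage-callable signature, and the `fit_context` / `application_context` keyword names are assumptions made for illustration; the real `PdPipeline` may differ.

```python
class ToyPipeline:
    """Toy stand-in for a pipeline; NOT pdpipe's actual PdPipeline."""

    def __init__(self, stages):
        # Each stage is a callable: stage(data, fit_context, application_context)
        self.stages = stages
        self.fit_context = {}            # kept after fit, reused on transform
        self.application_context = None  # only alive during one application

    def _run(self, data, application_context):
        self.application_context = dict(application_context or {})
        try:
            for stage in self.stages:
                data = stage(data, self.fit_context, self.application_context)
            return data
        finally:
            # Change 2: discard the application context once the run is done.
            self.application_context = None

    def fit_transform(self, data, fit_context=None, application_context=None):
        # Change 1: two separate parameters, one per context type.
        self.fit_context = dict(fit_context or {})
        return self._run(data, application_context)

    def transform(self, data, application_context=None):
        return self._run(data, application_context)


def add_offset(data, fit_context, application_context):
    # Hypothetical stage: "offset" is a fitted parameter, "extra" is per-call.
    offset = fit_context["offset"]
    extra = application_context.get("extra", 0)
    return [x + offset + extra for x in data]


pipeline = ToyPipeline([add_offset])
fitted = pipeline.fit_transform(
    [1, 2], fit_context={"offset": 10}, application_context={"extra": 1}
)
later = pipeline.transform([1, 2])
```

After both calls, `pipeline.application_context` is `None`, while `pipeline.fit_context` still holds the fitted `offset`.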

shaypal5 · 2022-02-04T13:37:28Z

@carbonleakage does this sound interesting?

carbonleakage · 2022-02-05T14:06:43Z

@shaypal5 sounds pretty interesting, I'll start working on this!

shaypal5 · 2022-02-05T14:24:26Z

Great. I'll share code with the latest use case I had for application context, so you have some context yourself. :)

shaypal5 · 2022-02-06T12:29:34Z

See the pipeline stage here:
https://github.com/shaypal5/mba_ds_project/blob/main/mba/pipeline.py#L20

It gets a product-review dataframe as context, used to enrich some of the rows in the input dataframe with product sentiment features.

In another file, on fit-transform the pipeline is provided with the train review dataframe:
https://github.com/shaypal5/mba_ds_project/blob/main/mba/buyer.py#L112

And on transform it is provided with the rollout/holdout review dataframe:
https://github.com/shaypal5/mba_ds_project/blob/main/mba/buyer.py#L283

Now here I did something sensible and provided it as a path, but a very intuitive thing to do is to provide the dataframe object itself, in which case it will be kept(!) inside the pipeline object (because the context parameter currently also updates the fit_context attribute). What's worse, if the pipeline is serialized to be deployed to production somewhere, the entire train context dataframe is serialized with it, even though it is never used in transform!
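
The retention problem can be reproduced with a toy object and `pickle`. This is only a sketch under assumptions: `LeakyPipeline` is a hypothetical stand-in that copies the `context` argument into `fit_context`, as described above, the file name `reviews.csv` is made up, and a large `bytes` object stands in for the train review dataframe.

```python
import pickle


class LeakyPipeline:
    """Toy stand-in mimicking the described behavior; not pdpipe code."""

    def __init__(self):
        self.fit_context = {}

    def fit_transform(self, data, context=None):
        # The single `context` argument also updates the kept fit context.
        self.fit_context.update(context or {})
        return data


big_context = bytes(1_000_000)  # stands in for a large train dataframe

by_path = LeakyPipeline()
by_path.fit_transform([1, 2, 3], context={"reviews": "reviews.csv"})

by_object = LeakyPipeline()
by_object.fit_transform([1, 2, 3], context={"reviews": big_context})

# Pickle the instance state (its attribute dict), which is what gets
# serialized along with a plain object.
size_by_path = len(pickle.dumps(vars(by_path)))
size_by_object = len(pickle.dumps(vars(by_object)))
print(size_by_path, size_by_object)
```

The object that received the context by value carries the whole payload in its serialized state, while the one that received a path stays small.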

shaypal5 · 2022-02-14T13:08:07Z

Hey @carbonleakage :)

Had a chance to take a look at this yet?

carbonleakage · 2022-02-15T06:20:24Z

Hey @shaypal5 I started looking into this last week. I have not made much progress due to other commitments. I plan to do some pull requests in the coming weeks.

shaypal5 · 2022-02-15T06:43:09Z

Great. :)

Just wanted to touch base.

carbonleakage · 2022-02-17T14:35:28Z

Hello @shaypal5 I just opened pull request #89, let me know what you think.

shaypal5 · 2022-02-23T17:05:42Z

Released in v0.2.1: 🎊 🎉
https://github.com/pdpipe/pdpipe/releases/tag/v0.2.1

shaypal5 added the enhancement and good first issue labels on Feb 4, 2022

carbonleakage mentioned this issue Feb 17, 2022

Contexts split #89

Merged

shaypal5 pushed a commit that referenced this issue Feb 22, 2022

Closes #86: Discard ApplicationContext post-apply

fb406fb

shaypal5 pushed a commit that referenced this issue Feb 22, 2022

Closes #86: Discard ApplicationContext post-apply

de1ae48

shaypal5 closed this as completed in a2119d4 Feb 22, 2022
