ENH: Add a data_preload step in pandas backend #1142

Closed
wants to merge 1 commit into from


cpcloud
Member

@cpcloud cpcloud commented Aug 25, 2017

This PR adds a data_preload step to the execution pipeline of the pandas backend.

The motivation is to let clients operate on (Node, ConcreteData) pairs (along with a scope argument) before execution of a particular operation starts.

Custom data source execution is the main use case here.

As a motivating example, consider a custom data source object that can be
turned into a DataFrame.

We want to be able to operate on our custom data source, but either we don't want to redefine every operation for it, or it doesn't make sense to define operations on the custom object directly.

data_preload gives the ability to call a function on data pieces that are in scope before any execution happens. In the example above, data_preload would turn the ConcreteData object into a pandas.DataFrame.

The default behavior is a no-op.
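
To make the idea concrete, here is a rough, self-contained sketch of what a preload step could look like. It is not the code from this PR: `Node`, `CustomSource`, `register_preload`, `preload_scope` and the `(node type, data type)` registry are hypothetical stand-ins written for illustration, and the real backend may dispatch differently. Only the name `data_preload`, the (Node, ConcreteData) pairing, the scope argument and the no-op default come from the description above.

```python
import pandas as pd


# Hypothetical stand-ins; none of these classes come from the PR's diff.
class Node:
    """Placeholder for an expression node."""


class CustomSource:
    """A custom data source that can be turned into a DataFrame."""

    def __init__(self, rows):
        self.rows = rows

    def to_frame(self):
        return pd.DataFrame(self.rows)


# Assumed registry: maps (node type, data type) to a preload function.
_PRELOADERS = {}


def register_preload(node_type, data_type):
    """Register a preload function for a (node type, data type) pair."""
    def decorator(func):
        _PRELOADERS[(node_type, data_type)] = func
        return func
    return decorator


def data_preload(node, data, scope=None):
    """Return data prepared for execution, or data unchanged by default."""
    # Walk both MROs so a registration on a base class also matches subclasses.
    for node_type in type(node).__mro__:
        for data_type in type(data).__mro__:
            func = _PRELOADERS.get((node_type, data_type))
            if func is not None:
                return func(node, data, scope=scope)
    return data  # default behavior: no-op


@register_preload(Node, CustomSource)
def preload_custom_source(node, data, scope=None):
    # Convert the custom object once, before any operation executes.
    return data.to_frame()


def preload_scope(scope):
    """Apply data_preload to every (node, data) pair currently in scope."""
    return {
        node: data_preload(node, data, scope=scope)
        for node, data in scope.items()
    }


node = Node()
scope = {node: CustomSource([{"a": 1}, {"a": 2}])}
loaded = preload_scope(scope)
```

With this shape, existing DataFrame operations never need to know about `CustomSource`: anything without a registered preload function passes through untouched, and the custom object is converted before execution begins.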

This also sets the stage for multi-client execution. More on that to come.

@cpcloud cpcloud self-assigned this Aug 25, 2017
@cpcloud cpcloud added the feature (Features or general enhancements) and pandas (The pandas backend) labels Aug 25, 2017
@cpcloud cpcloud added this to the 0.11.3 milestone Aug 25, 2017
@cpcloud cpcloud changed the title ENH: Data preload step in pandas backend ENH: Add a data_preload step in pandas backend Aug 25, 2017
@cpcloud cpcloud requested a review from wesm August 25, 2017 19:10
@cpcloud
Member Author

cpcloud commented Aug 28, 2017

@wesm can you review this when you get a chance?

@wesm
Member

wesm commented Aug 28, 2017

Looking now

@cpcloud cpcloud closed this in 6968251 Aug 28, 2017
@cpcloud cpcloud deleted the data-preload branch August 28, 2017 13:42
@cpcloud
Member Author

cpcloud commented Aug 28, 2017

@wesm thanks!

Labels
feature (Features or general enhancements), pandas (The pandas backend)
2 participants