refactoring dataprep to use dask and adding sample script #1470
Hi @joamatab, this is a proof of concept that uses a Dask backend for dataprep, which would enable parallel computation and visualization of task graphs. Please see the example script I added: it generates a GDS and the accompanying task-graph visualization, which shows the computation flow.
I don't yet know how to customize the visualization so that the square nodes carry more useful information, but I think it's pretty neat that we can extract such a graph from this computation, which is still defined concisely and naturally.
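For readers unfamiliar with how a Dask backend yields both lazy computation and a task graph, here is a minimal sketch using `dask.delayed`. It is not the code from this PR: the functions `load_layer`, `grow`, and `subtract` are hypothetical stand-ins for layer operations, and a "layer" is modeled as a plain set of grid cells rather than real GDS geometry.

```python
from dask import delayed

# Hypothetical stand-ins for dataprep layer operations (not the PR's API).
# A "layer" here is just a set of (x, y) grid cells.

@delayed
def load_layer(cells):
    """Pretend to read a layer; returns its cells as a set."""
    return set(cells)

@delayed
def grow(layer, d):
    """Expand every cell by d in x and y (a crude sizing operation)."""
    return {
        (x + dx, y + dy)
        for (x, y) in layer
        for dx in range(-d, d + 1)
        for dy in range(-d, d + 1)
    }

@delayed
def subtract(a, b):
    """Boolean NOT: cells in a that are not in b."""
    return a - b

# Building the expression only records tasks; nothing runs yet.
wg = load_layer([(0, 0), (1, 0)])
slab = load_layer([(0, 0)])
cladding = subtract(grow(wg, 1), slab)

# Trigger the computation; independent branches may run in parallel.
result = cladding.compute()

# Rendering the graph needs the optional graphviz dependency, so it is
# left commented out here:
# cladding.visualize(filename="dataprep_graph.svg")
```

The expression for `cladding` reads like ordinary layer arithmetic, yet Dask keeps the dependency graph behind it, which is what `visualize` draws.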