refactoring dataprep to use dask and adding sample script #1470

Closed
wants to merge 1 commit into from

tvt173
Collaborator

@tvt173 tvt173 commented Mar 24, 2023

Hi @joamatab, this is a proof of concept that uses a dask backend for dataprep, which enables parallel computation and visualization of task graphs. Please see the example script I added: it generates both a GDS and the accompanying task-graph visualization, which details the computation flow.
[image: task-graph visualization]

I don't yet know how to customize the visualization to show more useful information in the square nodes, but I think it's pretty neat that we can extract such a graph from this computation, which is still defined concisely and naturally.
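To make the idea of a task graph concrete, here is a minimal, standard-library-only sketch. It is not the code from this PR. It assumes the dict-based graph convention that dask documents (keys mapped either to data or to a tuple of a callable followed by its arguments), and it uses sets of grid cells as a hypothetical stand-in for layout regions. The function names `grow`, `union`, `difference`, and `evaluate` are all made up for illustration.

```python
def union(a, b):
    """Boolean OR of two regions."""
    return a | b


def difference(a, b):
    """Boolean NOT: cells in a that are not in b."""
    return a - b


def grow(cells, d):
    """Size a region outward by d grid cells in every direction."""
    return frozenset(
        (x + dx, y + dy)
        for x, y in cells
        for dx in range(-d, d + 1)
        for dy in range(-d, d + 1)
    )


# Hypothetical input layers, each a set of occupied grid cells.
core = frozenset({(0, 0), (1, 0)})
slab = frozenset({(1, 0), (2, 0)})

# The task graph: a key maps to data, or to (callable, arg, arg, ...).
# String arguments that match a key refer to another node's result.
graph = {
    "core": core,
    "slab": slab,
    "core_grown": (grow, "core", 1),
    "cladding": (union, "core_grown", "slab"),
    "trench": (difference, "cladding", "core"),
}


def evaluate(graph, key, cache=None):
    """Recursively compute one key, reusing results of shared nodes."""
    cache = {} if cache is None else cache
    if key not in cache:
        task = graph[key]
        if isinstance(task, tuple) and task and callable(task[0]):
            func, *args = task
            resolved = [
                evaluate(graph, a, cache) if isinstance(a, str) and a in graph else a
                for a in args
            ]
            cache[key] = func(*resolved)
        else:
            cache[key] = task  # plain data node
    return cache[key]


trench = evaluate(graph, "trench")
```

Because the computation is described as data before it runs, a scheduler can see which nodes are independent and run them in parallel, and a plotting tool can draw the nodes and edges. With dask installed, a graph in this form can in principle be handed to one of its schedulers instead of the toy `evaluate` above; that has not been checked against this PR's code.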

@tvt173
Collaborator Author

tvt173 commented Mar 24, 2023

Also note that we need to figure out a way to serialize region objects if we want to use the multiprocessing or distributed backends for dask. Right now it just uses the multithreaded backend.
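The serialization issue arises because process-based and distributed schedulers move task inputs and results between processes, typically with pickle, while threads share memory and need no copying. One possible approach, sketched below with the standard library only, is to convert a region to plain polygon data before it crosses a process boundary and rebuild it on the other side. `NativeRegion`, `to_plain`, and `from_plain` are hypothetical stand-ins, not the real region class or any dataprep API, and the lock merely simulates native state that pickle cannot copy.

```python
import pickle
import threading


class NativeRegion:
    """Hypothetical stand-in for a region backed by unpicklable native state."""

    def __init__(self, polygons):
        # Each polygon is stored as a tuple of (x, y) points.
        self.polygons = [tuple(tuple(pt) for pt in poly) for poly in polygons]
        # A lock stands in for a native handle that pickle cannot copy.
        self._handle = threading.Lock()


def to_plain(region):
    """Reduce a region to plain, picklable data (a list of point tuples)."""
    return list(region.polygons)


def from_plain(polygons):
    """Rebuild a region from the plain data produced by to_plain."""
    return NativeRegion(polygons)


region = NativeRegion([[(0, 0), (0, 10), (10, 10), (10, 0)]])

# Pickling the object directly fails because of the simulated native handle.
try:
    pickle.dumps(region)
    native_ok = True
except (TypeError, AttributeError, pickle.PicklingError):
    native_ok = False

# Pickling the plain form works, and the region can be rebuilt afterwards.
payload = pickle.dumps(to_plain(region))
restored = from_plain(pickle.loads(payload))
```

In a dask setting, each task would accept and return the plain form, or the region type would define its own pickling hooks. dask selects the scheduler through the `scheduler` argument of `compute`, for example `scheduler="threads"` or `scheduler="processes"`, so the threaded default can stay in place until serialization is sorted out.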

@tvt173
Collaborator Author

tvt173 commented Mar 24, 2023

Also, you of course need to install dask and ipycytoscape for the demo to work.

@joamatab
Contributor

looks great! thank you Troy!

@joamatab
Contributor

merged it to master, and merged the dataprep notebook examples

@joamatab joamatab closed this Mar 24, 2023