My experience contributing to Dask over the summer of 2021.

freyam/dask-gsoc


My Accepted GSoC Application for Dask


Visualizing performance characteristics of computations (link)

Potential Mentors: Genevieve Buckley and Martin Durant

Description: Dask users are sometimes caught off guard when they have a computation that works well and then add a new step which results in an overall computation with very different performance characteristics. We'd like to build better visualization tools so that users can more easily identify possible areas of inefficiency in their computations.

Expected Skills: A good candidate would be familiar with Python, and have some experience reasoning about Python performance. Experience with HTML and JS graph visualization libraries would be a plus, but not a requirement.

Relevant community discussion

Abstract

One of Dask's primary functions is to construct task graphs and optimize them before the wrapped functions are executed lazily. By looking at how the tasks are interconnected, one can learn more about potential bottlenecks, see where parallelism can or cannot be applied, and find ways to simplify the task graph further.
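As a rough illustration of reading parallelism off a task graph, the sketch below uses a small hand-written graph in Dask's dict-based format (a key maps to a literal or to a tuple of a callable and its arguments). The helpers `dependencies` and `levels` are hypothetical names written for this example, not part of Dask; the sketch also assumes every key is a string. Tasks that land on the same level do not depend on each other, so a level containing a single task marks a step that cannot be parallelized.

```python
from operator import add, mul

# A tiny graph in Dask's dict-based format: key -> literal or (callable, *args).
graph = {
    "a": 1,
    "b": 2,
    "c": (add, "a", "b"),
    "d": (mul, "a", 10),
    "e": (add, "c", "d"),
    "f": (mul, "e", 2),
}


def dependencies(graph, key):
    """Return the keys that `key` reads from directly (hypothetical helper)."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        # Assumption: keys are strings, so only string arguments can be keys.
        return {arg for arg in task[1:] if isinstance(arg, str) and arg in graph}
    return set()


def levels(graph):
    """Group keys by depth: 0 for inputs, else 1 + the deepest dependency."""
    depth = {}

    def visit(key):
        if key not in depth:
            deps = dependencies(graph, key)
            depth[key] = 1 + max((visit(d) for d in deps), default=-1)
        return depth[key]

    for key in graph:
        visit(key)

    by_level = {}
    for key, d in depth.items():
        by_level.setdefault(d, []).append(key)
    return [sorted(by_level[d]) for d in sorted(by_level)]


task_levels = levels(graph)
for i, keys in enumerate(task_levels):
    print(f"level {i}: {keys} (up to {len(keys)} tasks in parallel)")
```

Here `c` and `d` can run side by side, while `e` and `f` form a serial tail that every other result has to pass through.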

Dask currently uses graphviz to render a task graph. The .visualize method and the dask.visualize function are simple and effective. They generate a static black-and-white image showing the blocks and their connections. However, the rendered graph becomes hard to read for complex computations and doesn't give a clear idea of how the individual tasks perform. There is scope for improvement here.

I plan to color-code the graph and highlight the room for optimization (dask.optimization). This allows the user to learn more about the program's performance characteristics and see how the computation fares at higher complexity. I also plan to introduce collapsible blocks that group smaller blocks into a single large block. This simplifies the rendered graph, especially when it contains a large number of nodes.
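The two ideas above, collapsing and color-coding, can be sketched without any plotting dependency by emitting Graphviz DOT text directly. Everything in this example is an assumption made for illustration: the task keys and edges are invented, the helpers `group_of`, `collapse` and `to_dot` are hypothetical, tasks are grouped by dropping the trailing index from a "prefix-index" key, and the color rule (highlight the largest groups) is an arbitrary stand-in for a real performance metric.

```python
from collections import Counter

# Invented task keys and dependency edges, named "prefix-index" for illustration.
edges = [
    ("load-0", "clean-0"), ("load-1", "clean-1"), ("load-2", "clean-2"),
    ("clean-0", "sum-0"), ("clean-1", "sum-0"), ("clean-2", "sum-0"),
]


def group_of(key):
    """Collapse a key to its group by dropping the trailing index."""
    return key.rsplit("-", 1)[0]


def collapse(edges):
    """Merge tasks into groups; return group sizes and deduplicated group edges."""
    tasks = {key for edge in edges for key in edge}
    sizes = Counter(group_of(key) for key in tasks)
    group_edges = sorted(
        {(group_of(s), group_of(d)) for s, d in edges if group_of(s) != group_of(d)}
    )
    return dict(sizes), group_edges


def to_dot(sizes, group_edges):
    """Render the collapsed graph as DOT text, highlighting the largest groups."""
    largest = max(sizes.values())
    lines = ["digraph {", "  node [shape=box, style=filled];"]
    for name in sorted(sizes):
        # Arbitrary color rule: the groups with the most tasks stand out.
        color = "tomato" if sizes[name] == largest else "lightgrey"
        lines.append(
            f'  "{name}" [label="{name} ({sizes[name]} tasks)", fillcolor={color}];'
        )
    for src, dst in group_edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


sizes, group_edges = collapse(edges)
dot_source = to_dot(sizes, group_edges)
print(dot_source)
```

The seven original tasks shrink to three boxes, and the printed DOT text can be passed to any Graphviz renderer. A real implementation would presumably color by measured or estimated cost rather than by group size.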

Dask Guidelines (link)

Me the me

  • Your full name
  • University / current enrollment
  • Short bio / overview of your background
  • How can we contact you?
    • Email
    • GitHub username
    • Any other username you want us to know about

Me the programmer

  • In your project proposal let us know about your programming experience.
  • What is your experience programming? Tell us about something you have created.
  • What is your experience with Python?
  • Have you ever used git or another version control system? Do you have experience using GitHub?
  • What is your experience with Open Source? Have you ever contributed to another open source project?

Project Description

  • What do you want to achieve?
  • What excites you about this project? Why did you choose it?
  • What qualifications do you have to implement your idea? Why are you suited to work on this project?
  • How much time do you plan to invest in the project before, during, and after the Summer of Code? (we will expect full time 40h/week during GSoC). If you plan to take any vacations over the summer, let us know about it here.
  • How will you break your project into smaller tasks? Please provide a schedule of how time will be spent on sub-tasks of the project over the GSoC development period. While we will not hold you to this schedule, this lets us know that you have thought about how to break the project into smaller tasks and will help us monitor your progress throughout the summer.
  • Please also note in the schedule where you could formulate a pull request. These would be points where you can have a self-contained, documented, and tested piece of functionality. Doing this at several points during the summer helps to keep code reviews manageable. A big pull request at the end of the summer will likely be hard to review and merge before the project deadline.

NumFOCUS Guidelines (link)

  • Have you communicated with the organization's mentors?
  • Have you communicated with the community?
  • Did you reference projects you coded WITH links to repos or provided code?
  • Did you provide several methods to contact you? (email, skype, mobile/phone, twitter, chat, and/or tumblr if available)
  • Did you include a preliminary project plan (before, during, after GSoC)?
  • Did you state which project you are applying for and why you think you will end up completing the project?
  • Do you have time for GSoC? This is a paid job! State that you have time in your motivation letter, and list other commitments!
  • Did you add a link to ALL your application files to a cloud hoster like GitHub or Dropbox? (easy points!)
  • Be honest! Only universal Karma points.
  • Did you create a pull request on the existing code?
  • Did you continue communication until accepted students are announced?
