Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dask races #1469

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

Dask races #1469

wants to merge 26 commits into from

Conversation

jcapriot
Copy link
Member

Summary

Updates for dask>2024.2.1

PR Checklist

  • If this is a work in progress PR, set as a Draft PR
  • Linted my code according to the style guides.
  • Added tests to verify changes to the code.
  • Added necessary documentation to any new functions/classes following the
    expect style.
  • Marked as ready for review (if this is was a draft PR), and converted
    to a Pull Request
  • Tagged @simpeg/simpeg-developers when ready for review.

What does this implement/fix?

A few things:

  1. dask recently updated how things items are hashed to create a more deterministic hash for items. For objects it will change depending on the state of the objects. Simulation objects are mutable things, thus their hash would change overtime for the same object. For the DaskMetaSimulation, we want a single simulation's hash (also a single map's hash) to be constant in time, so this PR adds uuid properties to each of those base classes that are then registered to be used by dask to hash the objects.

  2. This deterministic hashing lead to a race condition in the error testing. When objects are scattered, dereferenced, then scattered again. The garbage collector would destroy the scattered objects dereferenced from the first call after the second scatter, thus creating a case where we are trying to access a Canceled Future.

  3. This is also now a bit more rigorous about only using a single worker for each simulation operation when the simulations inadvertently live on multiple workers, by selecting the worker that had fewer simulations assigned to it.

Additional information

See dask/distributed#8576 for more information about the race condition.

This also implements one of the updates in #1444 regarding the testing environment script so that we are sure we are creating the correct environment.

Copy link

codecov bot commented May 20, 2024

Codecov Report

Attention: Patch coverage is 18.51852% with 44 lines in your changes missing coverage. Please review.

Project coverage is 46.16%. Comparing base (caa60b7) to head (d43f68d).

Current head d43f68d differs from pull request most recent head db72c43

Please upload reports for the commit db72c43 to get more accurate results.

Files Patch % Lines
simpeg/meta/dask_sim.py 12.00% 44 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1469      +/-   ##
==========================================
- Coverage   53.17%   46.16%   -7.01%     
==========================================
  Files         171      168       -3     
  Lines       26023    25758     -265     
==========================================
- Hits        13838    11892    -1946     
- Misses      12185    13866    +1681     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant