Optimise scheduler.get_comm_cost set difference (dask#6931)
wence- authored and gjoseph92 committed Oct 31, 2022
1 parent d84e5a8 commit 61e10ec
Showing 1 changed file with 11 additions and 1 deletion.
12 changes: 11 additions & 1 deletion distributed/scheduler.py
@@ -2620,7 +2620,17 @@ def get_comm_cost(self, ts: TaskState, ws: WorkerState) -> float:
         on the given worker.
         """
         dts: TaskState
-        deps: set = ts.dependencies.difference(ws.has_what)
+        deps: set
+        if 10 * len(ts.dependencies) < len(ws.has_what):
+            # In the common case where the number of dependencies is
+            # much less than the number of tasks that we have,
+            # construct the set of deps that require communication in
+            # O(len(dependencies)) rather than O(len(has_what)) time.
+            # Factor of 10 is a guess at the overhead of explicit
+            # iteration as opposed to just calling set.difference
+            deps = {dep for dep in ts.dependencies if dep not in ws.has_what}
+        else:
+            deps = ts.dependencies.difference(ws.has_what)
         nbytes: int = 0
         for dts in deps:
             nbytes += dts.nbytes
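
The size-based dispatch in this change can be tried in isolation. The following is a minimal sketch, not the scheduler's code: the function name `missing_deps`, the `factor` parameter, and the use of a dict keys view to stand in for `ws.has_what` are all assumptions made for illustration. It shows that both branches return the same set, and that the branch taken depends only on the relative sizes of the two collections.

```python
from typing import Collection, Hashable, Set


def missing_deps(
    dependencies: Set[Hashable],
    has_what: Collection[Hashable],
    factor: int = 10,  # hypothetical parameter; the commit hard-codes 10
) -> set:
    """Return the dependencies that are not present in ``has_what``.

    ``has_what`` is assumed to support fast ``in`` checks and ``len()``,
    for example a set or a dict keys view.
    """
    if factor * len(dependencies) < len(has_what):
        # Few dependencies, many held keys: probe has_what once per
        # dependency, so the work scales with len(dependencies).
        return {dep for dep in dependencies if dep not in has_what}
    # Sizes are comparable: let the built-in method do the work.
    return dependencies.difference(has_what)


# Illustrative data: three dependencies, one of which is already held.
deps = {"a", "b", "c"}
large_held = dict.fromkeys(["a"] + [f"k{i}" for i in range(100)]).keys()
small_held = dict.fromkeys(["a"]).keys()

via_comprehension = missing_deps(deps, large_held)  # 30 < 101: first branch
via_difference = missing_deps(deps, small_held)     # 30 < 1 is false: second branch
```

The threshold only selects a strategy; it never changes the result, which is why the commit can describe its factor of 10 as a guess without risking correctness.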