Commit

fix race condition in supervisor where resources for a worker were being cleaned up before setting the new assignment, potentially leading to the supervisor continuously dying
Nathan Marz committed Oct 1, 2012
1 parent 415731f commit 54fc737
Showing 1 changed file with 11 additions and 7 deletions.
18 changes: 11 additions & 7 deletions src/clj/backtype/storm/daemon/supervisor.clj
@@ -296,13 +296,7 @@
" from "
master-code-dir)
))
-;; remove any downloaded code that's no longer assigned or active
-(doseq [storm-id downloaded-storm-ids]
-(when-not (assigned-storm-ids storm-id)
-(log-message "Removing code for storm id "
-storm-id)
-(rmr (supervisor-stormdist-root conf storm-id))
-))

(log-debug "Writing new assignment "
(pr-str new-assignment))
(doseq [p (set/difference (set (keys existing-assignment))
@@ -312,6 +306,16 @@
(.put local-state
LS-LOCAL-ASSIGNMENTS
new-assignment)
+;; remove any downloaded code that's no longer assigned or active
+;; important that this happens after setting the local assignment so that
+;; synchronize-supervisor doesn't try to launch workers for which the
+;; resources don't exist
+(doseq [storm-id downloaded-storm-ids]
+(when-not (assigned-storm-ids storm-id)
+(log-message "Removing code for storm id "
+storm-id)
+(rmr (supervisor-stormdist-root conf storm-id))
+))
(.add processes-event-manager sync-processes)
)))
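
The ordering this commit establishes (write the new local assignment first, then remove unassigned code) can be illustrated with a small, self-contained sketch. It is not the supervisor's actual code: the atoms `local-assignment` and `code-on-disk`, the port `6700`, the id `"storm-a"`, and every function name below are hypothetical stand-ins for LS-LOCAL-ASSIGNMENTS and the downloaded storm code directories. The sketch runs the two steps in either order and records what a concurrent observer would see between them.

```clojure
(require '[clojure.set :as set])

;; Hypothetical in-memory stand-ins (assumptions, not Storm APIs):
;; `local-assignment` plays the role of LS-LOCAL-ASSIGNMENTS (port -> storm id),
;; `code-on-disk` the set of storm ids whose code is still downloaded.
(defn make-state []
  {:local-assignment (atom {6700 "storm-a"})
   :code-on-disk     (atom #{"storm-a"})})

(defn launchable?
  "True when every storm id in the stored assignment still has its code,
  i.e. a worker launched from the stored assignment would find its resources."
  [{:keys [local-assignment code-on-disk]}]
  (every? @code-on-disk (vals @local-assignment)))

(defn remove-unassigned-code!
  "Drops code for every storm id that is not in assigned-storm-ids."
  [{:keys [code-on-disk]} assigned-storm-ids]
  (swap! code-on-disk set/intersection assigned-storm-ids))

(defn write-assignment!
  "Stores the new assignment, replacing the old one."
  [{:keys [local-assignment]} new-assignment]
  (reset! local-assignment new-assignment))

(defn sync-with-order
  "Runs the :cleanup and :write steps in the given order on a fresh state and
  reports whether the state was launchable between the steps and afterwards."
  [order new-assignment]
  (let [state    (make-state)
        assigned (set (vals new-assignment))
        steps    {:cleanup #(remove-unassigned-code! state assigned)
                  :write   #(write-assignment! state new-assignment)}]
    ((steps (first order)))
    (let [between (launchable? state)]
      ((steps (second order)))
      {:consistent-between-steps between
       :consistent-after         (launchable? state)})))
```

Calling `(sync-with-order [:cleanup :write] {})` models the old order: between the two steps the stored assignment still names `"storm-a"` although its code is gone, which is the window in which a worker launch would fail. `(sync-with-order [:write :cleanup] {})` models the order after this commit, where the stored assignment never refers to removed code.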
