Commit
Prevent re-creating MasterContext when result is available [5.1.z] (#21076)

The issue is the following:
- When a job is finalized, a JobResult is created (1) and after that the MasterContext is deleted (2) from the masterContexts map.
- JobCoordinationService.startJobIfNotStartedOrCompleted, called from JobCoordinationService.doScanJobs, checks these in the same order: it checks the job result (3) and then calls putIfAbsent for the master context (4).

So this order of actions is possible: 3, 1, 2, 4.

At this point, we have a MasterContext for a job that already has a JobResult and whose original MasterContext has already been removed. This is then handled by completeMasterContextIfJobAlreadyCompleted (5).

The problem arises when an action using JobCoordinationService#callWithJob runs between actions 4 and 5. In the case of this test failure it returns empty metrics, because it collects metrics using MasterJobContext#collectMetrics and doesn't use the already stored metrics (it actually just returns an empty list, because the re-created MasterContext has job state NOT_RUNNING and an empty list is the initial value of jobMetrics).

We check the JobResult while holding the lock to avoid this scenario:
- We find no job result.
- Another thread creates the result and removes the master context in completeJob.
- We re-create the master context below.

The removal of the MasterContext happens while holding the lock.

Fixes #19946
Fixes #20277
HZ-997

Backport of #21048 for 5.1.z

Co-authored-by: Viliam Durina <viliam@hazelcast.com>
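
The locking idea described above can be illustrated with a minimal sketch. This is not the Hazelcast code: the class MasterContextRaceSketch, its fields, and its method bodies are hypothetical stand-ins, and the only things taken from the commit message are the numbered actions (1) to (4) and the rule that both the result check and the MasterContext removal happen under the same lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the race; names only mirror the commit message.
final class MasterContextRaceSketch {
    // Stand-ins for the stored job results and the masterContexts map.
    private final ConcurrentMap<Long, String> jobResults = new ConcurrentHashMap<>();
    private final ConcurrentMap<Long, Object> masterContexts = new ConcurrentHashMap<>();
    // Assumption: one lock guards both the result check and the context removal.
    private final ReentrantLock lock = new ReentrantLock();

    // Finalizing a job: create the result (1), then remove the context (2).
    void completeJob(long jobId, String result) {
        jobResults.put(jobId, result);            // (1)
        lock.lock();
        try {
            masterContexts.remove(jobId);         // (2) removal under the lock
        } finally {
            lock.unlock();
        }
    }

    // Scanning jobs: check the result (3) and create the context (4)
    // inside one critical section, so (1),(2) cannot slip in between.
    // Returns false when the job already has a result.
    boolean startJobIfNotStartedOrCompleted(long jobId) {
        lock.lock();
        try {
            if (jobResults.containsKey(jobId)) {  // (3) checked under the lock
                return false;                     // do not re-create the context
            }
            masterContexts.putIfAbsent(jobId, new Object()); // (4)
            return true;
        } finally {
            lock.unlock();
        }
    }

    boolean hasMasterContext(long jobId) {
        return masterContexts.containsKey(jobId);
    }

    public static void main(String[] args) {
        MasterContextRaceSketch sketch = new MasterContextRaceSketch();
        System.out.println(sketch.startJobIfNotStartedOrCompleted(42L));
        sketch.completeJob(42L, "COMPLETED");
        System.out.println(sketch.startJobIfNotStartedOrCompleted(42L));
        System.out.println(sketch.hasMasterContext(42L));
    }
}
```

In this model the result is written before the removal, so a scanner that acquires the lock after (2) is guaranteed to see the result at (3). A scanner that ran (3) and (4) earlier leaves a context that the later removal at (2) still cleans up.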