FIX - Enforced downtime state calculation after retention load #1990

geektophe · 2019-10-31T16:10:24Z

There is a race condition when the retention data is dumped to the retention backend:

- The downtime depth is calculated by incrementing or decrementing the `scheduled_downtime_depth` attribute in the `Downtime` class.
- If the `update_retention_file()` thread runs while a downtime is being processed, the value stored in the retention backend may be stale, because it is read during the `enter()` or `exit()` execution:

dt.exit()
...
STOP update_retention_file() -> value with improper value
...
dt.ref.scheduled_downtime_depth -= 1
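The interleaving above can be reproduced deterministically with a small sketch. This is not Shinken's code: the `Host` and `Downtime` classes below are simplified, hypothetical stand-ins, and the `before_decrement` hook is an assumption used only to simulate the retention thread being scheduled in the middle of `exit()`.

```python
class Host:
    """Hypothetical stand-in for a monitored object (not Shinken's class)."""

    def __init__(self):
        self.scheduled_downtime_depth = 0


class Downtime:
    """Hypothetical, simplified downtime that only tracks the depth counter."""

    def __init__(self, ref):
        self.ref = ref
        self.is_in_effect = False

    def enter(self):
        self.is_in_effect = True
        self.ref.scheduled_downtime_depth += 1

    def exit(self, before_decrement=None):
        self.is_in_effect = False
        # Simulates the retention thread running before the counter is updated.
        if before_decrement is not None:
            before_decrement()
        self.ref.scheduled_downtime_depth -= 1


host = Host()
dt = Downtime(host)
dt.enter()

snapshot = {}


def dump_retention():
    # Stand-in for update_retention_file(): copies the counter as it is now.
    snapshot["scheduled_downtime_depth"] = host.scheduled_downtime_depth


dt.exit(before_decrement=dump_retention)

live_depth = host.scheduled_downtime_depth
stored_depth = snapshot["scheduled_downtime_depth"]
print(live_depth, stored_depth)
```

In this sketch the live counter ends at 0 while the snapshot keeps 1, which is the stale value that would later be reloaded.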

As a consequence, an object's state can become inconsistent when the retention data is reloaded:

- All the downtimes of the object have exited.
- The `scheduled_downtime_depth` remains `> 0` because of the value stored in the backend.

This PR forces the downtime state to be re-evaluated when an object state is restored from the retention backend.
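A minimal sketch of the idea, assuming the depth can be derived from the downtimes attached to the object rather than trusted from the stored counter. The function names, the dict-based object layout, and the `is_in_effect` and `in_scheduled_downtime` keys are illustrative assumptions, not the code merged in this PR.

```python
def recompute_downtime_depth(downtimes):
    """Count the downtimes that are currently in effect."""
    return sum(1 for dt in downtimes if dt.get("is_in_effect"))


def restore_from_retention(obj, retained):
    """Load retained attributes, then recompute the downtime state.

    The stored scheduled_downtime_depth is deliberately overwritten,
    because it may have been dumped in the middle of enter() or exit().
    """
    obj.update(retained)
    depth = recompute_downtime_depth(obj.get("downtimes", []))
    obj["scheduled_downtime_depth"] = depth
    obj["in_scheduled_downtime"] = depth > 0
    return obj


# Retained data with a stale counter: the only downtime has already exited.
retained = {
    "scheduled_downtime_depth": 1,
    "downtimes": [{"is_in_effect": False}],
}
restored = restore_from_retention({}, retained)
print(restored["scheduled_downtime_depth"], restored["in_scheduled_downtime"])
```

Recomputing from the downtime list makes the restored state independent of when the retention dump happened, at the cost of one pass over the object's downtimes at load time.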

geektophe · 2020-01-16T10:36:00Z

@naparuba The travis configuration is broken. I fixed it in this PR.

There is a race condition when the retention data is dumped to the retention backend:

- The downtime depth is calculated by incrementing or decrementing the `scheduled_downtime_depth` attribute in the `Downtime` class.
- If the `update_retention_file()` thread runs while a downtime is being processed, the value stored in the retention backend may be stale, because it is read during the `enter()` or `exit()` execution:

dt.exit()
...
STOP update_retention_file() -> value with improper value
...
dt.ref.scheduled_downtime_depth -= 1

As a consequence, an object's state can become inconsistent when the retention data is reloaded:

- All the downtimes of the object have exited.
- The `scheduled_downtime_depth` remains `> 0` because of the value stored in the backend.

This enforces the downtime state evaluation when an object state is restored from the retention backend.

Also fixed the unit test definitions:

- Removed the 2.6 test suite (the image is no longer available in Travis).
- Fixed the test suite run condition that was preventing unit tests from being executed.

Also fixed the definition of the `maintenance_checks_enabled` parameter, which should not be loaded from retention data.

geektophe force-pushed the fix/enforce_downtime_depth_calculation branch from 8fd1e51 to 2ab04fc Compare November 8, 2019 12:14

geektophe force-pushed the fix/enforce_downtime_depth_calculation branch 6 times, most recently from be1ff09 to de3a8d8 Compare January 16, 2020 10:21

geektophe changed the title ~~Enforced downtime state after retention load~~ FIX - Enforced downtime state after retention load Jan 16, 2020

geektophe changed the title ~~FIX - Enforced downtime state after retention load~~ FIX - Enforced downtime state calculation after retention load Jan 16, 2020

geektophe force-pushed the fix/enforce_downtime_depth_calculation branch from de3a8d8 to da57100 Compare January 25, 2020 13:04

geektophe force-pushed the fix/enforce_downtime_depth_calculation branch from da57100 to 5f3525f Compare May 1, 2020 12:54

geektophe mentioned this pull request Aug 28, 2020

Shinken scheduled_downtime_depth issue -1 #1997

Open

geektophe merged commit 0e51d49 into shinken-solutions:master Mar 9, 2021
