FIX - Enforced downtime state calculation after retention load #1990

Conversation

geektophe
Collaborator

There is a race condition when the retention data is dumped to the retention backend:

  • The downtime depth is tracked by incrementing or decrementing the scheduled_downtime_depth attribute from the Downtime class.

  • If the update_retention_file() thread runs while a downtime is being processed, the value stored in the retention backend may be stale, because it can be read in the middle of the enter() or exit() execution:

    dt.exit()
    ...
    update_retention_file() runs here -> a stale value is stored
    ...
    dt.ref.scheduled_downtime_depth -= 1
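
The interleaving above can be reproduced deterministically with a small sketch. The classes below are hypothetical stand-ins that only borrow the names used in this description; they are not Shinken's actual implementation. The `between_steps` hook is an assumption added purely to emulate the retention thread running in the middle of `exit()`.

```python
# Hypothetical stand-ins, not Shinken's real classes.
class Host:
    def __init__(self):
        self.scheduled_downtime_depth = 0


class Downtime:
    def __init__(self, ref):
        self.ref = ref
        self.is_in_effect = False  # assumed flag name

    def enter(self):
        self.is_in_effect = True
        self.ref.scheduled_downtime_depth += 1

    def exit(self, between_steps=None):
        self.is_in_effect = False
        # Emulates the retention thread being scheduled at this point,
        # after the downtime is marked as exited but before the decrement.
        if between_steps is not None:
            between_steps()
        self.ref.scheduled_downtime_depth -= 1


def dump_retention(host, store):
    # Simplified dump: only the counter is saved.
    store["scheduled_downtime_depth"] = host.scheduled_downtime_depth


host = Host()
dt = Downtime(host)
retention = {}

dt.enter()
dt.exit(between_steps=lambda: dump_retention(host, retention))

live_depth = host.scheduled_downtime_depth           # counter in memory
stored_depth = retention["scheduled_downtime_depth"]  # counter in the backend
```

In this sketch the in-memory counter ends at 0 while the dumped counter is still 1, which is the inconsistency that shows up after a reload.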

As a consequence, an object's state can become inconsistent when the retention data is reloaded:

  • All the downtimes of the object have exited.
  • The scheduled_downtime_depth remains > 0 because of the value stored in the backend.

This PR forces the downtime state to be re-evaluated when an object's state is restored from the retention backend.
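
One way to picture the re-evaluation is to ignore the stored counter and recount the downtimes that are still in effect. This is a minimal sketch of the idea, not the code merged in this PR: `restore_from_retention`, the `downtimes` list, and the `is_in_effect` and `in_scheduled_downtime` attributes are assumptions made for illustration.

```python
# Hypothetical stand-ins, not Shinken's real classes.
class Downtime:
    def __init__(self, is_in_effect):
        self.is_in_effect = is_in_effect  # assumed flag name


class Host:
    def __init__(self, downtimes):
        self.downtimes = downtimes
        self.scheduled_downtime_depth = 0
        self.in_scheduled_downtime = False


def restore_from_retention(host, stored):
    """Restore the counter, then recompute it from the downtimes themselves."""
    # Value as dumped, possibly stale because of the race.
    host.scheduled_downtime_depth = stored.get("scheduled_downtime_depth", 0)
    # Re-evaluation: count the downtimes that are actually in effect.
    active = sum(1 for dt in host.downtimes if dt.is_in_effect)
    host.scheduled_downtime_depth = active
    host.in_scheduled_downtime = active > 0
    return host


# Every downtime has exited, but the backend still holds a depth of 1.
stale = {"scheduled_downtime_depth": 1}
restored = restore_from_retention(Host([Downtime(False)]), stale)

# One downtime is still running, so the depth must stay at 1.
running = restore_from_retention(Host([Downtime(True), Downtime(False)]), stale)
```

Deriving the depth from the downtimes makes the restored state independent of when the dump happened, at the cost of one pass over the object's downtimes on load.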

@geektophe geektophe force-pushed the fix/enforce_downtime_depth_calculation branch from 8fd1e51 to 2ab04fc Compare November 8, 2019 12:14
@geektophe geektophe force-pushed the fix/enforce_downtime_depth_calculation branch 6 times, most recently from be1ff09 to de3a8d8 Compare January 16, 2020 10:21
@geektophe
Collaborator Author

@naparuba The travis configuration is broken. I fixed it in this PR.

@geektophe geektophe changed the title Enforced downtime state after retention load FIX - Enforced downtime state after retention load Jan 16, 2020
@geektophe geektophe changed the title FIX - Enforced downtime state after retention load FIX - Enforced downtime state calculation after retention load Jan 16, 2020
@geektophe geektophe force-pushed the fix/enforce_downtime_depth_calculation branch from de3a8d8 to da57100 Compare January 25, 2020 13:04
There is a race condition when the retention data is dumped in the
retention backend:

- The downtime depth is calculated by incrementing or decrementing the
`scheduled_downtime_depth` attribute in the `Downtime` class
- If the `update_retention_file()` thread is run while a downtime is being
processed, the value stored in the retention backend may not be up to date
because it's read during the `enter()` or `exit()` execution:

    dt.exit()
        ...
        update_retention_file() runs here -> a stale value is stored
        ...
        dt.ref.scheduled_downtime_depth -= 1

The consequence of this particular condition is that an object state can
become inconsistent when the retention data is reloaded:

- All the downtimes of the object have exited
- The `scheduled_downtime_depth` remains `> 0` because of the value
stored in the backend.

This forces the downtime state to be re-evaluated when an object state is
restored from the retention backend.

Also fixed the unit test definitions:
- Removed the 2.6 test suite (image no longer available in Travis)
- Fixed the test suite run condition that was preventing unit tests from
being executed

Also fixed the definition of the `maintenance_checks_enabled` parameter, which
should not be loaded from retention data.
@geektophe geektophe force-pushed the fix/enforce_downtime_depth_calculation branch from da57100 to 5f3525f Compare May 1, 2020 12:54
@geektophe geektophe merged commit 0e51d49 into shinken-solutions:master Mar 9, 2021