Skip thresholds validation for remove snapshot #614
Merged: ahadas merged 1 commit into oVirt:master from barpavel:no_auto_generated_snapshot_cleanup on Aug 28, 2022
barpavel force-pushed the no_auto_generated_snapshot_cleanup branch from fdf08e5 to 670a767 on August 24, 2022 09:45
barpavel
commented
Aug 24, 2022
...ger/modules/bll/src/main/java/org/ovirt/engine/core/bll/snapshots/RemoveSnapshotCommand.java
Outdated
/ost |
barpavel force-pushed the no_auto_generated_snapshot_cleanup branch from 670a767 to d0d8f7a on August 24, 2022 13:38
barpavel changed the title from "Skip thresholds check for Live Storage Migration remove snapshot" to "Skip thresholds validation for remove snapshot" on Aug 25, 2022
barpavel force-pushed the no_auto_generated_snapshot_cleanup branch from d0d8f7a to 21599db on August 25, 2022 11:39
/ost |
ahadas
approved these changes
Aug 25, 2022
Try to remove the snapshot even when the Storage Domain is low on disk space.

Otherwise, when remove snapshot is performed at the end of a Live Storage Migration (LSM) operation, the "RemoveSnapshotCommand" fails validation and leaves a 3-chunk snapshot (7.5 GiB) unreleased. Additionally, the LSM operation is reported as failed, though the "move" part actually succeeded.

Because of the 3 extra chunks that are temporarily used during the LSM flow, we might temporarily fall below the disk space threshold. Currently, the code not only leaves almost 8 GiB of "junk" unreleased, but also leaves the Storage Domain in an unhealthy low-space state, which blocks other operations and requires manual intervention, instead of cleanly recovering from this temporary low disk space situation by performing the proper cleanup.

Before the fix:

2022-08-23 13:14:56,448+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-45) [] EVENT_ID: IRS_DISK_SPACE_LOW_ERROR(201), Critical, Low disk space. iSCSI_SD2 domain has 4 GB of free space.
2022-08-23 13:16:10,455+03 WARN [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-16) [03ef74cf-ddbe-4027-805f-02631bf96929] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName iSCSI_SD2
2022-08-23 13:16:15,555+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [03ef74cf-ddbe-4027-805f-02631bf96929] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.
2022-08-23 13:16:15,581+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-17) [03ef74cf-ddbe-4027-805f-02631bf96929] EVENT_ID: USER_MOVE_IMAGE_GROUP_FAILED_TO_DELETE_SRC_IMAGE(2,025), Possible failure while deleting iSCSI_VM1_Disk1 from the source Storage Domain iSCSI_SD2 during the move operation. The Storage Domain may be manually cleaned-up from possible leftovers (User:admin@internal-authz).

At the end "iSCSI_SD2" has 4 GiB available, which is below the 5 GiB "Critical Space Action Blocker".

After the fix "RemoveSnapshotCommand" executes successfully:

2022-08-24 11:00:43,164+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-48) [] EVENT_ID: IRS_DISK_SPACE_LOW_ERROR(201), Critical, Low disk space. iSCSI_SD2 domain has 4 GB of free space.

We fall below the threshold and then automatically recover: 7.5 GiB is returned to the Storage Domain. At the end "iSCSI_SD2" correctly has 12 GiB available, which is above the 5 GiB "Critical Space Action Blocker".

Signed-off-by: Pavel Bar <pbar@redhat.com>
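To illustrate the behaviour change described in the commit message, here is a minimal sketch of a validation step that skips the free-space threshold check for snapshot removal. It is not the actual oVirt engine code: the class name, method names, and the `skipThresholds` flag are hypothetical, and only the 5 GiB blocker value and the `ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN` reason string are taken from the description and logs above.

```java
import java.util.Optional;

/** Hypothetical sketch; names are illustrative and not the oVirt engine API. */
final class RemoveSnapshotValidationSketch {

    /** "Critical Space Action Blocker" value quoted in the commit message. */
    static final double CRITICAL_SPACE_BLOCKER_GIB = 5.0;

    /** Generic threshold validation: returns a failure reason when free space is too low. */
    static Optional<String> validateThreshold(String domain, double freeGiB) {
        if (freeGiB < CRITICAL_SPACE_BLOCKER_GIB) {
            return Optional.of(
                    "ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName " + domain);
        }
        return Optional.empty();
    }

    /**
     * Validation for snapshot removal. When skipThresholds is true (assumed to model the
     * behaviour after the fix), the removal is allowed to proceed so it can release space.
     */
    static Optional<String> validateRemoveSnapshot(String domain, double freeGiB,
                                                   boolean skipThresholds) {
        if (skipThresholds) {
            return Optional.empty();
        }
        return validateThreshold(domain, freeGiB);
    }

    public static void main(String[] args) {
        // Before the fix: 4 GiB free is below the 5 GiB blocker, so validation fails.
        System.out.println(validateRemoveSnapshot("iSCSI_SD2", 4.0, false));
        // After the fix: the threshold check is skipped and removal may proceed.
        System.out.println(validateRemoveSnapshot("iSCSI_SD2", 4.0, true));
    }
}
```

The point of the design is that removing a snapshot frees space, so blocking it on a low-space condition prevents the very cleanup that would resolve that condition.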
ahadas
force-pushed
the
no_auto_generated_snapshot_cleanup
branch
from
August 25, 2022 13:13
21599db
to
98b1b38
Compare
/ost |
1 similar comment
/ost |
ahadas
approved these changes
Aug 28, 2022
Try to remove the snapshot even when the Storage Domain is low on disk space.
Otherwise, when remove snapshot is performed at the end of a Live Storage Migration (LSM) operation, the RemoveSnapshotCommand fails validation and leaves a 3-chunk snapshot (7.5 GiB) unreleased. Additionally, the LSM operation is reported as failed, though the "move" part actually succeeded.
Because of the 3 extra chunks that are temporarily used during the LSM flow, we might temporarily fall below the disk space threshold.
Currently, the code not only leaves almost 8 GiB of "junk" unreleased, but also leaves the Storage Domain in an unhealthy low-space state, which blocks other operations and requires manual intervention, instead of quietly recovering from this temporary low disk space situation by performing the proper cleanup.
Before the fix:
At the end "iSCSI_SD2" has 4 GiB available, which is below the 5 GiB Critical Space Action Blocker.
After the fix, the RemoveSnapshotCommand executes successfully:
We fall below the threshold and then automatically recover: 7.5 GiB is returned to the Storage Domain.
At the end "iSCSI_SD2" correctly has 12 GiB available, which is above the 5 GiB Critical Space Action Blocker.
Signed-off-by: Pavel Bar <pbar@redhat.com>
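The space figures in the description can be checked with a short calculation. The sketch below is illustrative only: the class and method names are hypothetical, and the 2.5 GiB chunk size is an assumption derived from the stated "3 chunks (7.5 GiB)". Starting from the reported 4 GiB, releasing the snapshot gives 11.5 GiB, which is consistent with the roughly 12 GiB reported if the displayed values are rounded (also an assumption).

```java
/** Hypothetical sketch of the free-space arithmetic; not part of the oVirt engine. */
final class LsmSpaceAccountingSketch {

    /** Assumed chunk size: 7.5 GiB / 3 chunks, derived from the description. */
    static final double CHUNK_GIB = 2.5;

    /** "Critical Space Action Blocker" value quoted in the description. */
    static final double CRITICAL_SPACE_BLOCKER_GIB = 5.0;

    /** Free space after the temporary snapshot chunks are released. */
    static double freeAfterCleanup(double freeGiB, int releasedChunks) {
        return freeGiB + releasedChunks * CHUNK_GIB;
    }

    /** True when the domain is below the blocker and other actions would be rejected. */
    static boolean isBelowBlocker(double freeGiB) {
        return freeGiB < CRITICAL_SPACE_BLOCKER_GIB;
    }

    public static void main(String[] args) {
        double before = 4.0;                          // free space reported in the description
        double after = freeAfterCleanup(before, 3);   // snapshot removal releases 3 chunks
        System.out.println("below blocker before cleanup: " + isBelowBlocker(before));
        System.out.println("free after cleanup (GiB): " + after);
        System.out.println("below blocker after cleanup: " + isBelowBlocker(after));
    }
}
```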