[AI Generated] BugFix: Add retry loop for LUN detection in hot_add_disk_serial test#4436
[AI Generated] BugFix: Add retry loop for LUN detection in hot_add_disk_serial test#4436
Conversation
Validation Results
|
There was a problem hiding this comment.
Pull request overview
This PR hardens the storage hot-add serial test by waiting for Azure’s udev-created LUN symlink to appear after attaching a managed disk, avoiding intermittent IndexError failures when the LUN mapping hasn’t populated yet.
Changes:
- Adds a retry loop to poll
Disk.get_luns()until the new LUN appears (or times out) in_hot_add_disk_serial. - Replaces the previous single-shot set-diff indexing approach with an explicit “wait then assert” flow.
| # Retry to allow udev to create the LUN symlink after disk attach | ||
| timeout = 30 | ||
| timer = create_timer() | ||
| new_lun_keys: list = [] |
There was a problem hiding this comment.
timeout = 30 (and the implicit 1s poll interval) is a test-behavior magic number. Please either reference an existing constant/pattern in this suite or add an inline note explaining why 30s is the right bound here (e.g., matches other disk-detection waits) so future adjustments don’t regress coverage or runtime.
| f"luns_before: {linux_device_luns}, " | ||
| f"luns_after: {linux_device_luns_after}" | ||
| ) | ||
| time.sleep(1) |
There was a problem hiding this comment.
Avoid adding new time.sleep() calls in tests; they tend to increase flakiness and slow runs. Prefer a bounded wait helper like check_till_timeout(..., timeout=..., interval=...) or retry_without_exceptions(...) so waiting behavior is centralized and consistent across the repo.
| assert_that( | ||
| new_lun_keys, "New device LUN was not detected on VM" | ||
| ).is_not_empty() | ||
| linux_device_lun_diff = linux_device_luns_after[new_lun_keys[0]] |
There was a problem hiding this comment.
new_lun_keys is derived from a set difference, so its ordering is arbitrary. Indexing with [0] makes the selection non-deterministic if more than one key ever appears in the diff. Consider selecting the new entry by the expected lun value (and/or asserting exactly one new device was detected) to avoid potential flakiness.
❌ AI Test Selection — FAILED74 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
Test case details
|
9c6cb88 to
9f0e2e8
Compare
…sk_serial test After a data disk is attached, the udev symlink under /dev/disk/azure/scsi1/ may not appear immediately. The previous code called get_luns() once and indexed into the result with [0], causing an IndexError when the set difference was empty. Added a 30-second retry loop (matching existing disk detection pattern in the same file) that polls get_luns() until the new LUN appears or times out with a clear assertion error.
9f0e2e8 to
bc94ad2
Compare
|
| Count | |
|---|---|
| ✅ Passed | 26 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 5 |
| Total | 31 |
Test case details
| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| smoke_test (lisa_0_1) | ✅ PASSED | 38.103 | |
| verify_deployment_provision_synthetic_nic (lisa_0_3) | ✅ PASSED | 37.315 | |
| verify_deployment_provision_standard_ssd_disk (lisa_0_4) | ✅ PASSED | 35.685 | |
| smoke_test_check_serial_console_pattern (lisa_0_2) | ✅ PASSED | 37.752 | |
| verify_deployment_provision_ephemeral_managed_disk (lisa_0_5) | ✅ PASSED | 36.049 | |
| verify_deployment_provision_premium_disk (lisa_0_6) | ✅ PASSED | 44.904 | |
| verify_deployment_provision_sriov (lisa_0_8) | ✅ PASSED | 42.023 | |
| verify_deployment_provision_premiumv2_disk (lisa_0_7) | ✅ PASSED | 48.242 | |
| verify_vmbus_devices_channels_bsd (lisa_0_14) | ⏭️ SKIPPED | 0.000 | check skipped: OS type mismatch: ["requires [<class 'lisa.operating_system.BSD'>] but VM supports [<class 'lisa.operatin |
| verify_serial_console (lisa_0_0) | ✅ PASSED | 30.944 | |
| verify_default_targetpw (lisa_0_36) | ✅ PASSED | 2.558 | |
| verify_grub (lisa_0_37) | ✅ PASSED | 1.042 | |
| verify_network_file_configuration (lisa_0_39) | ⏭️ SKIPPED | 0.194 | skipped: unsupported distro type: <class 'lisa.operating_system.Ubuntu'> |
| verify_ifcfg_eth0 (lisa_0_40) | ⏭️ SKIPPED | 0.192 | skipped: unsupported distro type: <class 'lisa.operating_system.Ubuntu'> |
| verify_udev_rules_moved (lisa_0_41) | ⏭️ SKIPPED | 0.161 | skipped: Unsupported distro type : <class 'lisa.operating_system.Ubuntu'> |
| verify_dhcp_file_configuration (lisa_0_42) | ⏭️ SKIPPED | 0.157 | skipped: Unsupported distro type : <class 'lisa.operating_system.Ubuntu'> |
| verify_repository_installed (lisa_0_46) | ✅ PASSED | 30.750 | |
| verify_serial_console_is_enabled (lisa_0_47) | ✅ PASSED | 1.104 | |
| verify_no_pre_exist_users (lisa_0_52) | ✅ PASSED | 2.068 | |
| verify_resource_disk_file_system (lisa_0_54) | ✅ PASSED | 4.000 | |
| verify_waagent_version (lisa_0_55) | ✅ PASSED | 1.392 | |
| verify_python_version (lisa_0_56) | ✅ PASSED | 1.201 | |
| verify_openssl_version (lisa_0_57) | ✅ PASSED | 1.072 | |
| verify_azure_64bit_os (lisa_0_58) | ✅ PASSED | 1.155 | |
| verify_omi_version (lisa_0_59) | ✅ PASSED | 1.273 | |
| verify_no_swap_on_osdisk (lisa_0_60) | ✅ PASSED | 1.076 | |
| verify_essential_kernel_modules (lisa_0_61) | ✅ PASSED | 1.458 | |
| verify_l3_cache (lisa_0_33) | ✅ PASSED | 1.161 | |
| verify_cpu_count (lisa_0_34) | ✅ PASSED | 0.200 | |
| verify_dhcp_client_timeout (lisa_0_22) | ✅ PASSED | 1.310 | |
| verify_dns_name_resolution (lisa_0_71) | ✅ PASSED | 2.008 |
| def _new_lun_detected( | ||
| _baseline: dict[str, int] = linux_device_luns, | ||
| ) -> bool: | ||
| nonlocal linux_device_luns_after | ||
| linux_device_luns_after = disk.get_luns() | ||
| new_keys = set(linux_device_luns_after) - set(_baseline) | ||
| return len(new_keys) > 0 | ||
|
|
||
| # 30s matches the disk-detection timeout used in | ||
| # _hot_add_disk_parallel for partition appearance after attach. | ||
| check_till_timeout( | ||
| _new_lun_detected, | ||
| timeout_message=( | ||
| f"new LUN not detected after disk attach at lun {lun}, " | ||
| f"luns_before: {linux_device_luns}, " | ||
| f"luns_after: {linux_device_luns_after}" | ||
| ), | ||
| timeout=30, | ||
| interval=1, | ||
| ) | ||
| new_lun_keys: list[str] = list( | ||
| set(linux_device_luns_after) - set(linux_device_luns) | ||
| ) |
There was a problem hiding this comment.
The new type hints use PEP 585 built-in generics (dict[str, int], list[str]), which are not supported on Python 3.8 unless the module has from __future__ import annotations. Please either add the future import at the top of this file or switch these annotations to typing.Dict/typing.List to keep the test suite compatible with the repo’s py38 target.
| check_till_timeout( | ||
| _new_lun_detected, | ||
| timeout_message=( | ||
| f"new LUN not detected after disk attach at lun {lun}, " | ||
| f"luns_before: {linux_device_luns}, " | ||
| f"luns_after: {linux_device_luns_after}" | ||
| ), | ||
| timeout=30, | ||
| interval=1, | ||
| ) |
There was a problem hiding this comment.
timeout_message is built from an f-string at call time, so luns_after: {linux_device_luns_after} will always reflect the initial value (before retries), not the final state when the timeout occurs. Consider catching LisaTimeoutException from check_till_timeout and re-raising with a message constructed after the loop (using the last linux_device_luns_after) so the failure output is accurate for debugging.
| ), | ||
| timeout=30, | ||
| interval=1, | ||
| ) |
There was a problem hiding this comment.
interval=1 is a test-behavior magic number. Please add a short inline comment explaining why 1s is appropriate here (or use an existing constant) to make the retry/polling behavior easier to maintain.
| # verify the lun number from linux VM | ||
| # Retry to allow udev to create the LUN symlink after disk attach | ||
| linux_device_luns_after = disk.get_luns() | ||
| linux_device_lun_diff = [ | ||
| linux_device_luns_after[k] | ||
| for k in set(linux_device_luns_after) - set(linux_device_luns) | ||
| ][0] | ||
|
|
||
| def _new_lun_detected( | ||
| _baseline: dict[str, int] = linux_device_luns, | ||
| ) -> bool: |
There was a problem hiding this comment.
Consider linking the related issue/bug for traceability, since this PR is a bug fix (helps future debugging and release notes).
❌ AI Test Selection — FAILED74 test case(s) selected (view run) Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
Test case details
|
|
Is it duplicated with #4435 ? |
Summary
After a data disk is attached via hot-add, the udev symlink under
/dev/disk/azure/scsi1/may not appear immediately. The previous code calledget_luns()once and indexed into the result with[0], causing anIndexError: list index out of rangewhen the set difference was empty.Added a 30-second retry loop (matching the existing disk detection pattern in the same
_hot_add_disk_serialmethod) that pollsget_luns()until the new LUN symlink appears or times out with a clear assertion error.Validation Results