[Advance-reboot] post_reboot_analysis in the advanced-reboot test needs enhancement #11311

Open
nhe-NV opened this issue Jan 17, 2024 · 1 comment
nhe-NV commented Jan 17, 2024

Description
In advanced-reboot, after the reboot itself passes, post_reboot_analysis verifies certain events in the syslog. Sometimes the reboot is functionally fine, but the test still fails at this syslog check because the expected log lines are missing, for example due to log rate limiting or log suppression in SONiC. This part of the check should be replaced with a more reliable method.

For example:
Event LAG_READY was found 0 times, when expected exactly 9 times

def verify_required_events(duthost, event_counters, timing_data, verification_errors):
    for key in ["time_span", "offset_from_kexec"]:
        for pattern in REQUIRED_PATTERNS.get(key):
            if pattern == 'PORT_READY':
                observed_start_count = timing_data.get(
                    key, {}).get(pattern, {}).get("Start-changes-only count", 0)
            else:
                observed_start_count = timing_data.get(
                    key, {}).get(pattern, {}).get("Start count", 0)
            observed_end_count = timing_data.get(
                key, {}).get(pattern, {}).get("End count", 0)
            expected_count = event_counters.get(pattern)
            # If we're checking PORT_READY, and there are 0 port state change messages captured instead of however many
            # was expected, treat it as a success. Some platforms (Mellanox, Dell S6100) have 0, some platforms (Arista
            #  050cx3) have however many ports are up.
            if observed_start_count != expected_count and (pattern != 'PORT_READY' or observed_start_count != 0):
                verification_errors.append("FAIL: Event {} was found {} times, when expected exactly {} times".
                                           format(pattern, observed_start_count, expected_count))
            if key == "time_span" and observed_start_count != observed_end_count:
                verification_errors.append("FAIL: Event {} counters did not match. ".format(pattern) +
                                           "Started {} times, and ended {} times".
                                           format(observed_start_count, observed_end_count))
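
To illustrate how a suppressed log line turns into the failure above, here is a minimal sketch that reproduces the start-count comparison in isolation. `check_start_count` is a hypothetical helper that mirrors only the first condition in `verify_required_events`; the sample `timing_data` and `event_counters` values are invented for illustration and are not taken from a real run.

```python
def check_start_count(pattern, key, timing_data, event_counters):
    """Hypothetical helper mirroring the start-count check in verify_required_events."""
    count_field = "Start-changes-only count" if pattern == "PORT_READY" else "Start count"
    observed = timing_data.get(key, {}).get(pattern, {}).get(count_field, 0)
    expected = event_counters.get(pattern)
    # PORT_READY with 0 observed messages is tolerated; every other mismatch fails.
    if observed != expected and (pattern != "PORT_READY" or observed != 0):
        return "FAIL: Event {} was found {} times, when expected exactly {} times".format(
            pattern, observed, expected)
    return None


# Illustrative data: assume rate limiting dropped every LAG_READY line, so the
# pattern is absent from timing_data even though the LAGs actually came up.
timing_data = {"time_span": {}}
event_counters = {"LAG_READY": 9, "PORT_READY": 32}

error = check_start_count("LAG_READY", "time_span", timing_data, event_counters)
print(error)
```

With no `LAG_READY` entry in `timing_data`, the observed count defaults to 0 and the check reports a failure, which cannot be told apart from a case where the LAGs really did not come up.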

Steps to reproduce the issue:

  1. Run the advanced-reboot test (for example, test_fast_reboot).

Describe the results you received:
Failed: Advanced-reboot failure. Failed test: test_fast_reboot[], failure summary:
[('test_fast_reboot[]None', ['FAIL: Event LAG_READY was found 0 times, when expected exactly 9 times'])]

Describe the results you expected:
The post-reboot verification should use a more reliable method than checking the syslog for this information.
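
One possible direction, sketched under assumptions rather than proposed as the fix: when the syslog count does not match, cross-check against the device state after reboot instead of failing immediately. Everything here is hypothetical: `verify_lag_ready`, the `observed_lag_status` dict (port-channel name mapped to operational status), and the port-channel names. How that status would be collected from the DUT is deliberately left out.

```python
def verify_lag_ready(expected_lags, observed_lag_status, syslog_count):
    """Return a list of error strings; an empty list means the check passed.

    Hypothetical sketch: trust the syslog count when it matches the expected
    number of LAGs, otherwise fall back to the post-reboot operational state.
    """
    if syslog_count == len(expected_lags):
        return []
    errors = []
    for lag in sorted(expected_lags):
        status = observed_lag_status.get(lag, "missing")
        if status != "up":
            errors.append("FAIL: {} is {} after reboot".format(lag, status))
    return errors


# Illustrative values only: syslog reported 0 LAG_READY events.
expected = {"PortChannel101", "PortChannel102"}
all_up = {"PortChannel101": "up", "PortChannel102": "up"}
one_down = {"PortChannel101": "up", "PortChannel102": "down"}

print(verify_lag_ready(expected, all_up, syslog_count=0))
print(verify_lag_ready(expected, one_down, syslog_count=0))
```

A fallback like this only confirms the final state. It does not recover the timing information that the syslog-based analysis measures, so it would complement the existing check rather than replace the timing checks.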

Additional information you deem important:

**Output of `show version`:**

```
(paste your output here)
```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
@zhangyanzhao

@vaibhavhd could you please take a look at this issue? Thanks.
