Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pytest] Ignore specific syslog error messages when testing autorestart #2430

Merged
merged 3 commits into from
Oct 31, 2020

Conversation

yozhao101
Copy link
Contributor

@yozhao101 yozhao101 commented Oct 29, 2020

Signed-off-by: Yong Zhao yozhao@microsoft.com

Description of PR

Summary:
When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause
the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog
during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore
them are put in the comment of ignore_expected_loganalyzer_exception(...) function.

After adding these error messages into whitelist, we need do the post check to see whether all the critical processes
are alive after testing the autorestart feature.

Fixes # (issue)

Type of change

  • [x ] Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

How did you do it?

I added two more functions in pytest script. One function is used to add the error messages which can be ignored into
whitelist. Another is used to do the post check to see whether all the critical processes are alive.

How did you verify/test it?

I tested this pytest script against a physical testbed: str-dx010-acs-1. Since currently acms process in ACMS container
was not running by default until the certificate are deployed, pytest script will fail when testing the ACMS container.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

…feature and do the postcheck to make

sure all the critical processes are alive after testing.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@yozhao101 yozhao101 changed the title [pytest] Ignore some log error messages when testing the autorestart … [pytest] Ignore specific syslog error messages when testing autorestart Oct 29, 2020
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Comment on lines 273 to 278
# Sleep 20 seconds such that containers have a chance to start.
time.sleep(20)
processes_status = duthost.all_critical_process_status()
for container_name, processes in processes_status.items():
if processes["status"] is False or len(processes["exited_critical_process"]) > 0:
pytest.fail("Critical process(es) was not running after container '{}' was restarted.".format(container_name))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please convert this block into pytest_assert(wait_until(...), failure message)

Copy link
Contributor Author

@yozhao101 yozhao101 Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! Fixed.

Comment on lines 56 to 68
ignoreRegex = [
".*ERR monit.*",
".*ERR swss#orchagent.*removeLag.*",
".*ERR pmon#xcvrd.*initializeGlobalConfig.*",
".*ERR pmon#thermalctld.*Caught exception while initializing thermal manager.*",
".*ERR syncd#syncd.*driverEgressMemoryUpdate.*",
".*ERR syncd#syncd.*brcm_sai*",
".*ERR teamd#teamsyncd.*readData.*netlink reports an error=-33 on reading a netlink socket.*",
".*WARNING syncd#syncd.*saiDiscover: skipping since it cause crash.*",
".*ERR systemd.*Failed to start .* container*",
".*ERR kernel.*PortChannel.*",
".*ERR route_check.*",
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split this regex array to one array for each service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Suggestion! Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of splitting the regex into groups was so that we can ignore only a subset of messages for each container being tested. You split them into groups, but you then you simply add all the groups in unconditionally, which results in the same behavior as before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jleveque The idea is to split the ignore regex into subgroups and each subgroup represents a container. When a container was tested for example syncd, this function will be called such that the messages in subgroup of syncd will be ignored, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joe, Thanks for clarification!

pytest_assert(...) to wait containers are all restarted before doing the postcheck.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
loganalyzer.ignore_regex.extend(teamd_ignoreRegex)
loganalyzer.ignore_regex.extend(systemd_ignoreRegex)
loganalyzer.ignore_regex.extend(kernel_ignoreRegex)
loganalyzer.ignore_regex.extend(other_ignoreRegex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yozhao101 this is not exactly what I had in mind when asked for splitting. But I understand that you have to put all regex up because you are looping through all services and run test against them. you currently cannot really apply individual regex for individual service.

I'll approve this PR and let it go in. Please open a separate PR to parameterize the test so that the test will run for each service individually, that way you can apply ignore regex per service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood what you ware meaning in our meeting. I will create a separate PR to improve the ignore regex.

@lguohan lguohan merged this pull request into sonic-net:master Oct 31, 2020
yxieca pushed a commit that referenced this pull request Oct 31, 2020
…rt (#2430)

When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause
the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog 
during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore 
them are put in the comment of `ignore_expected_loganalyzer_exception(...)` function. 

After adding these error messages into whitelist, we need do the post check to see whether all the critical processes
are alive after testing the autorestart feature.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants