New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: fix raciness in stop mode tests #5526

Merged
merged 7 commits into from Jul 23, 2018

Conversation

Projects
None yet
4 participants
@zyga
Contributor

zyga commented Jul 17, 2018

This branch fixes raciness in the stop-mode tests. This test was failing for
the following reasons:

  • the registration of the signal handler was racing with the kill
  • the shell semantics for trap is error prone and was used incorrectly

The first issue is fixed with a "stamp" like file that we look for in the test.
This ensures that before we attempt to stop the service we wait for it to
initialize fully, that is to register the signal handlers.

The second issue is fixed by re-writing the test in Python 3. This requires
some more explanation. We had established signal handlers using the "trap"
shell builtin and were expecting them to run when a signal (sent by systemd as
a result of stopping the service) arrives. What we can see by running the shell
script with "strace -e %signal" indicates that shell disables trap handlers
before executing any command. A trivial test involved increasing the sleep time
to a large value and sending the expected signal at that time. The signal
handler never runs in that case.

This is fixed by writing the test service in Python and not relying on
additional processes. This has the expected semantics and works well in
practice.

Shell is tricky, more than we though, and we keep learning this the hard way.

Signed-off-by: Zygmunt Krynicki zygmunt.krynicki@canonical.com

zyga added some commits Jul 12, 2018

tests: rewrite {start,stop}-stop-mode in Python
This fixes the issue where shell disables signal handlers while running
external commands. This was causing the signal handler established via
"trap" do nothing at all while the signal was delivered during "sleep 1"
process lifetime.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: don't rely on sleep in service-stop-mode tests
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: don't IO from signal handlers
This fixes this issue:

    Traceback (most recent call last):
      File "./start-stop-mode", line 49, in <module>
	main()
      File "./start-stop-mode", line 45, in main
	time.sleep(1)
      File "./start-stop-mode", line 29, in on_signal
	print("got {}".format(signal.Signals(signum).name.lower()))
      File "./start-stop-mode", line 29, in on_signal
	print("got {}".format(signal.Signals(signum).name.lower()))
    RuntimeError: reentrant call inside <_io.BufferedWriter name='<stdout>'>

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: write a "ready" file if we cannot notify via systemd
This allows the test to either use system synchronization or hand-made
synchronization but in either case, not be racy.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

zyga added some commits Jul 17, 2018

tests: restore the magic sleep value
I've seen the light now. I can sleep.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: restore killall sleep & wait for startup
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

@zyga zyga force-pushed the zyga:fix/non-racy-stop-mode branch from 73e4479 to 26e3299 Jul 17, 2018

@mvo5 mvo5 added the Core18 label Jul 17, 2018

@codecov-io

This comment has been minimized.

codecov-io commented Jul 17, 2018

Codecov Report

Merging #5526 into master will increase coverage by 0.23%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5526      +/-   ##
==========================================
+ Coverage   79.06%   79.29%   +0.23%     
==========================================
  Files         514      514              
  Lines       38949    39492     +543     
==========================================
+ Hits        30795    31316     +521     
- Misses       5681     5688       +7     
- Partials     2473     2488      +15
Impacted Files Coverage Δ
overlord/ifacestate/implicit.go 96.42% <0%> (-3.58%) ⬇️
overlord/hookstate/hookmgr.go 75.23% <0%> (+0.93%) ⬆️
overlord/snapstate/snapstate.go 85.6% <0%> (+3.55%) ⬆️
overlord/ifacestate/helpers.go 78.79% <0%> (+7.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4dd420d...85d675d. Read the comment docs.

@zyga

This comment has been minimized.

Contributor

zyga commented Jul 19, 2018

As a general comment, the test still suffers some issues but we suspect those are in our general log extracting helpers. We've decided to merge this as is as it fixes real issues. Investigation of the log extractor bug continues in the branch referenced by mvo above.

@mvo5

mvo5 approved these changes Jul 19, 2018

Looks good, thank you! Much nicer than the original shell version.

mvo5 added a commit to mvo5/snappy that referenced this pull request Jul 19, 2018

@bboozzoo bboozzoo merged commit c6ce331 into snapcore:master Jul 23, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment