Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

t/kvs/watch_disconnect: Fix racy behavior #2019

Merged
merged 1 commit into from Feb 14, 2019

Conversation

@chu11
Copy link
Contributor

commented Feb 14, 2019

Add several loops / waits to wait for rpcs / disconnects to be
processed, to hopefully avoid several racy scenarios.

Fixes #2015

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2019

hit a #1025, but otherwise fix seems to work. I am going to restart all builders a few times to make sure it's really fixed.

/* must spin / wait for watchers to be registered */
for (i = 0; i < MAX_ITERS; i++) {
w1 = count_watchers (h) - w0;
if (w1 == rankcount)

This comment has been minimized.

Copy link
@grondo

grondo Feb 14, 2019

Contributor

could the test use a barrier here instead of looping and sleeping? I guess if the sleep (1) is only rarely invoked it isn't too big a deal...

This comment has been minimized.

Copy link
@chu11

chu11 Feb 14, 2019

Author Contributor

the test was passing all the time in the past, so hopefully the sleeps are rare :-)

Not sure a barrier will be doable. Basically, they are spins to ensure that RPCs have been sent / received / processed b/c in both cases an RPC response isn't coming back. Perhaps should use usleep() instead?

This comment has been minimized.

Copy link
@grondo

grondo Feb 14, 2019

Contributor

If the sleep is rare then it isn't too big of a deal.

This comment has been minimized.

Copy link
@chu11

chu11 Feb 14, 2019

Author Contributor

I sleep "0.1" seconds in various kvs tests, so I'll just continue the trend and convert to usleep in 0.1 second chunks.

Add several loops / waits to wait for rpcs / disconnects to be
processed, to hopefully avoid several racy scenarios.

Fixes #2015
@chu11 chu11 force-pushed the chu11:issue2015 branch from 7ce0189 to 1a13dff Feb 14, 2019
@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2019

re-pushed using usleep() instead of sleep(). I restarted the builder 3 times and never hit the race, so I think if travis passes this time (4th overall) we can declare the race fixed.

@grondo

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2019

Thanks!

@grondo

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2019

Ok, this is ready to merge then? 👍

@chu11

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2019

yup, I think it's good to go

@codecov-io

This comment has been minimized.

Copy link

commented Feb 14, 2019

Codecov Report

Merging #2019 into master will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2019      +/-   ##
==========================================
+ Coverage   80.32%   80.34%   +0.02%     
==========================================
  Files         179      179              
  Lines       28898    28898              
==========================================
+ Hits        23212    23219       +7     
+ Misses       5686     5679       -7
Impacted Files Coverage Δ
src/modules/connector-local/local.c 73.62% <0%> (-0.75%) ⬇️
src/cmd/flux-event.c 77.97% <0%> (ø) ⬆️
src/common/libflux/reactor.c 91.17% <0%> (+0.22%) ⬆️
src/cmd/flux-module.c 83.96% <0%> (+0.23%) ⬆️
src/common/libflux/handle.c 83.71% <0%> (+0.45%) ⬆️
src/connectors/local/local.c 89.28% <0%> (+0.71%) ⬆️
src/common/libflux/msg_handler.c 90.23% <0%> (+1.17%) ⬆️
src/modules/barrier/barrier.c 78.62% <0%> (+2.06%) ⬆️
src/common/libflux/ev_flux.c 100% <0%> (+2.43%) ⬆️
@grondo grondo merged commit dc1fcf7 into flux-framework:master Feb 14, 2019
4 checks passed
4 checks passed
Mergify — Summary 1 potential rule
Details
codecov/patch Coverage not affected when comparing 872fad3...1a13dff
Details
codecov/project 80.34% (+0.02%) compared to 872fad3
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
chu11 added a commit to chu11/flux-core that referenced this pull request Feb 15, 2019
Fix a racy portion of transactionmerge that could send a
ENODATA before all watch transactions have completed.

In addition, similar to PR flux-framework#2019, add a racy check to ensure
that the watcher has been setup fully before continuing the
test.

Fixes flux-framework#2021
chu11 added a commit to chu11/flux-core that referenced this pull request Feb 15, 2019
Similar to PR flux-framework#2019, add a racy check to ensure
that the watcher has been setup fully before continuing the
test.

Fixes flux-framework#2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.