Fix flaky TestServiceWithDefaultAddressPoolInit #39671

Merged on Sep 9, 2019 (2 commits)

@arkodg (Contributor) commented Aug 5, 2019

Fixes #38514 (Flaky test: TestServiceWithDefaultAddressPoolInit)

This commit replaces serviceRunningCount with
swarm.RunningTasksCount to check that the service
is running with the expected number of instances.
serviceRunningCount only checked the ServiceList
and did not check whether the tasks were running.

This adds a safe barrier before executing docker network inspect
commands against overlay networks, which swarmkit creates
asynchronously.

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>

@arkodg force-pushed the fix-flaky-addr-pool-init-test branch from 7146630 to c186d46 (August 5, 2019 21:38)
@arkodg changed the title from "Fix flaky TestServiceWithDefaultAddressPoolInit" to "[WIP] Fix flaky TestServiceWithDefaultAddressPoolInit" (Aug 5, 2019)
@arkodg force-pushed the fix-flaky-addr-pool-init-test branch 4 times, most recently from 608d40e to 5d50ffa (August 5, 2019 22:30)
@thaJeztah added this to "In progress" in Improving CI via automation (Aug 5, 2019)
@arkodg force-pushed the fix-flaky-addr-pool-init-test branch 4 times, most recently from 34b90f4 to fafea21 (August 6, 2019 19:03)
@arkodg (Contributor, Author) commented Aug 6, 2019

@thaJeztah any idea why the test case is failing?
I removed all the SwarmLeave calls, but I still see "Calling POST /v1.41/swarm/leave?force=1", which prevents the service from starting.

@arkodg force-pushed the fix-flaky-addr-pool-init-test branch from fafea21 to 53b8069 (August 6, 2019 20:07)
@arkodg (Contributor, Author) commented Aug 6, 2019

And changing the order of the operations breaks the next test, TestServiceWithDataPathPortInit.

@thaJeztah (Member) commented:

Is there a manager that failed to stop, perhaps?

error is not nil: Error response from daemon: manager stopped: failed to listen on remote API address: listen tcp 0.0.0.0:2477: bind: address already in use: initializing swarm

@arkodg force-pushed the fix-flaky-addr-pool-init-test branch from 53b8069 to c763936 (August 6, 2019 21:01)
@kolyshkin (Contributor) commented:

bind: address already in use

This means something is already listening on this port. Not sure what it means in this exact context.
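
For readers unfamiliar with this error, here is a minimal sketch of how it arises in general. It is not taken from the daemon or the test suite, and `secondListenError` is a hypothetical helper: it binds an ephemeral localhost port, then tries to bind the same address again while the first listener is still open.

```go
package main

import "net"

// secondListenError binds an ephemeral TCP port on localhost and then tries
// to bind the same address again while the first listener is still open.
// dup is the error from the second attempt; setup is non-nil only if the
// first listener could not be created at all.
func secondListenError() (dup error, setup error) {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		// Not expected while the first listener is open; close it to avoid a leak.
		second.Close()
	}
	return err, nil
}
```

If the hypothesis above holds, the same situation would occur in the test when a previous manager still holds port 2477 at the moment the next swarm is initialized.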

@arkodg force-pushed the fix-flaky-addr-pool-init-test branch 8 times, most recently from 177a832 to c76e7bb (August 7, 2019 00:17)
@arkodg force-pushed the fix-flaky-addr-pool-init-test branch from 2549161 to 952d643 (August 13, 2019 18:55)
@arkodg (Contributor, Author) commented Aug 13, 2019

Derek add label: rebuild/windowsRS1

@arkodg (Contributor, Author) commented Aug 14, 2019

Derek add label: rebuild/windowsRS1

1. This commit replaces serviceRunningCount with
swarm.RunningTasksCount to check that the service
is running with the expected number of instances.
serviceRunningCount only checked the ServiceList
and did not check whether the tasks were running.

This adds a safe barrier before executing docker network inspect
commands against overlay networks, which Swarm creates
asynchronously.

2. Make sure client connections are closed

3. Make sure every service and network name is unique

4. Make sure services and networks are cleaned up

Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
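
Item 3 of the commit message can be illustrated with a small sketch. This is an assumption about one possible approach, not the naming scheme used in the PR: `uniqueName` is a hypothetical helper that derives a resource name from the test name (for example the value of `t.Name()`), so that two tests never compete for the same service or network name.

```go
package main

import "strings"

// uniqueName builds a service or network name from a prefix and the name of
// the current test. Subtest names contain "/", which is replaced here so the
// result stays a single plain token.
func uniqueName(prefix, testName string) string {
	return prefix + "_" + strings.ReplaceAll(testName, "/", "_")
}
```

Inside a test this could be called as `uniqueName("overlay", t.Name())`. A leftover resource from one test then cannot collide with a resource created by the next one.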
@arkodg force-pushed the fix-flaky-addr-pool-init-test branch from 952d643 to f3a3ea0 (August 14, 2019 15:03)
@thaJeztah changed the title from "[WIP] Fix flaky TestServiceWithDefaultAddressPoolInit" to "Fix flaky TestServiceWithDefaultAddressPoolInit" (Aug 14, 2019)
Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
@thaJeztah (Member) left a review comment:

LGTM, thanks!

Improving CI automation moved this from "Review in progress" to "Reviewer approved" (Aug 16, 2019)
@arkodg (Contributor, Author) commented Aug 16, 2019

PTAL @kolyshkin

@thaJeztah (Member) commented:

@tiborvass PTAL

@thaJeztah (Member) commented:

ping @tiborvass @kolyshkin PTAL 🤗

poll.WaitOn(t, swarm.NoTasks(ctx, c), swarm.ServicePoll)
err = c.NetworkRemove(ctx, overlayID)
assert.NilError(t, err)
c.Close()
Review comment (Contributor):

I see one potential problem with all the assert statements here. Once any assert fails, the test execution is aborted, meaning the cleanup code above (remove/close/etc.) won't run.

Perhaps we need to change those assert. calls to check. ones. The difference is that check. won't abort test execution immediately (but will still mark the test as failed).

If it's not possible to use check. everywhere, maybe we need to do the cleanup in a defer.
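
The effect described in this comment can be reproduced without the test framework. The sketch below is an illustration under stated assumptions, not code from the PR: `cleanupRan` is a hypothetical helper, and it relies on the documented behavior that `t.FailNow` stops a test by calling `runtime.Goexit`, which still runs deferred functions.

```go
package main

import "runtime"

// cleanupRan runs a test-like body in its own goroutine and reports whether
// its cleanup step executed. failEarly simulates a fatal assertion by calling
// runtime.Goexit, which is how t.FailNow stops a test. useDefer selects
// between deferred cleanup and cleanup placed at the end of the body.
func cleanupRan(failEarly, useDefer bool) (cleaned bool) {
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last, after any deferred cleanup
		cleanup := func() { cleaned = true }
		if useDefer {
			defer cleanup()
		}
		if failEarly {
			runtime.Goexit() // skips the rest of the body, runs deferred calls
		}
		if !useDefer {
			cleanup() // never reached after an early exit
		}
	}()
	<-done
	return cleaned
}
```

With `failEarly` set, the trailing cleanup is skipped, while the deferred cleanup still runs. A non-aborting check avoids the early exit altogether, which is the other option suggested above.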

Reply from the PR author (Contributor):

Yes, but AFAIK the RootDir will be cleaned up, so it won't affect another PR. As for the current PR, the test has failed at that point, so the reason for the assert failure should be addressed before moving on to the next test.

Projects: Improving CI (Done)
Development

Successfully merging this pull request may close these issues.

Flaky test: TestServiceWithDefaultAddressPoolInit
6 participants