Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race condition when creating port bindings #1447

Merged

Conversation

ewencluley
Copy link
Contributor

@ewencluley ewencluley commented Mar 25, 2021

See #1446

Welcome feedback to my approach, to be frank I am not well versed with this code base and may have got some terminology wrong (especially in the unit test). If you can think of a better way to handle this I would love to hear it and happy to change the PR to implement.

On container startup, sometimes the port information retrieved from the
docker API is incomplete with port details being empty arrays, e.g.
Ports:{"2525/tcp":[]}. This causes an IndexOutOfBoundsException to be thrown
as the createPortBindings method assumes there will be at least one entry in
the mapping. In these cases we should retry the API call to get the more
complete port information.

Fixes #1446

@codecov
Copy link

codecov bot commented Mar 25, 2021

Codecov Report

Merging #1447 (4b7c1ef) into master (5e0ac58) will increase coverage by 0.17%.
The diff coverage is 85.00%.

@@             Coverage Diff              @@
##             master    #1447      +/-   ##
============================================
+ Coverage     59.07%   59.25%   +0.17%     
- Complexity     1989     2004      +15     
============================================
  Files           162      163       +1     
  Lines          9032     9047      +15     
  Branches       1365     1366       +1     
============================================
+ Hits           5336     5361      +25     
+ Misses         3205     3198       -7     
+ Partials        491      488       -3     
Impacted Files Coverage Δ Complexity Δ
...o/fabric8/maven/docker/model/ContainerDetails.java 65.51% <75.00%> (+0.04%) 23.00 <0.00> (+1.00)
...va/io/fabric8/maven/docker/service/RunService.java 66.43% <85.71%> (+3.79%) 46.00 <4.00> (+6.00)
...bric8/maven/docker/model/PortBindingException.java 100.00% <100.00%> (ø) 1.00 <1.00> (?)
...ic8/maven/docker/access/ContainerCreateConfig.java 82.92% <0.00%> (+1.21%) 34.00% <0.00%> (+2.00%)
...bric8/maven/docker/access/ContainerHostConfig.java 95.14% <0.00%> (+1.94%) 54.00% <0.00%> (+4.00%)
...o/fabric8/maven/docker/util/VolumeBindingUtil.java 82.35% <0.00%> (+1.96%) 23.00% <0.00%> (+1.00%)

mappedPorts.updateProperties(container.getPortBindings());
} else {
log.warn("Container %s is not running anymore, can not extract dynamic ports",containerId);
for (int attempt = 0; attempt < 5; attempt++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, Do you think we should add Thread.currentThread().isInterrupted() in this loop terminating condition here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shout. Added

@ewencluley ewencluley force-pushed the avoid-port-mapping-race-condition branch 2 times, most recently from bb884bf to 8f0f009 Compare March 25, 2021 17:42
@@ -71,7 +72,9 @@
* @since 16/06/15
*/
public class RunService {

private static final int UPDATE_MAPPED_PORTS_RETRIES = 5;
private static final int UPDATE_MAPPED_PORTS_RETRY_DELAY_MS = 500;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would recommend to go here to 250 and increase the UPDATE_MAPPED_PORTS_RETRIES

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i checked the logs with this parameters is see only one try per container. i will test it with a lower time
[WARNING] DOCKER> Failed to get port bindings for container 4d2b4ff6615c (attempt 0 of 5)
[WARNING] DOCKER> Failed to get port bindings for container 35efb753ef46 (attempt 0 of 5)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private static final int UPDATE_MAPPED_PORTS_RETRIES = 50;
private static final int UPDATE_MAPPED_PORTS_RETRY_DELAY_MS = 50;

i still see only one attemt
[WARNING] DOCKER> Failed to get port bindings for container 8c64d1409bef (attempt 0 of 50)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private static final int UPDATE_MAPPED_PORTS_RETRIES = 500;
private static final int UPDATE_MAPPED_PORTS_RETRY_DELAY_MS = 5;

[WARNING] DOCKER> Failed to get port bindings for container 0fbb04f25568 (attempt 0 of 500)
[WARNING] DOCKER> Failed to get port bindings for container 0fbb04f25568 (attempt 1 of 500)

note that a have a pretty new PC with a lot of power.

so finally i would recommend 10ms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i think the issue is it only prints out when it fails (also it is 0 indexed) so it may fail once and then pass on the 2nd attempt but will not print anything out. Will tweak this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having the delay of 250ms and 10 attempts is reasonable- it really should not make a huge difference to container start times to wait 250ms between attempts, especially as it failing to get the port data from docker is an edge case and race condition. As container start times are usually in the order of seconds or tens of seconds i think this is probably reasonable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this
private static final int UPDATE_MAPPED_PORTS_RETRY_DELAY_MS = 500;
to
private static final int UPDATE_MAPPED_PORTS_RETRY_DELAY_MS = 10;
and also adjust the UPDATE_MAPPED_PORTS_RETRIES to match again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not convinced that hammering the docker api every 10ms for (potentially) up to 500 attempts is the best idea. By setting the delay to 250ms it will wait long enough that the port information should be available after 1 retry. 250ms should not be problematic to add to container start times.

Happy to be told otherwise by the maintainers of this codebase but IMO 250ms and 10 attempts should be sufficient.

return;
} catch (PortBindingException e) {
try {
log.warn("Failed to get port bindings for container %s (attempt %d of %d)", containerId, attempt, UPDATE_MAPPED_PORTS_RETRIES);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would change this to debug, but add at the end a error log if this fails. or even better throw an exception at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think thats sensible. Have also added a debug log message when it succeeds too.

@ewencluley ewencluley force-pushed the avoid-port-mapping-race-condition branch 4 times, most recently from 1835d7e to 64dee37 Compare March 29, 2021 09:47
@ewencluley
Copy link
Contributor Author

I've updated this PR to use the failsafe library for the retries. I think it makes the code clearer than the loop with if conditions that I had. Hope the maintainers of this repo are OK with introducing this dependency.

@git9999999
Copy link

As a Polling Framework i recommend to use https://github.com/awaitility/awaitility

her you can specify a initial delay and a polling interval. here an example

await() .pollDelay(0, SECONDS) .pollInterval(5, SECONDS) .atMost(60, SECONDS) .....

IMO: for me build time is a very precious thing. Alone in my project i start 8 Docker-Images. I think we should put some effort in keep this time as short as possible. as we have seen on my PC, after10ms the container is ready, so 240ms are lost.

@ewencluley
Copy link
Contributor Author

As a Polling Framework i recommend to use https://github.com/awaitility/awaitility

her you can specify a initial delay and a polling interval. here an example

await() .pollDelay(0, SECONDS) .pollInterval(5, SECONDS) .atMost(60, SECONDS) .....

IMO: for me build time is a very precious thing. Alone in my project i start 8 Docker-Images. I think we should put some effort in keep this time as short as possible. as we have seen on my PC, after10ms the container is ready, so 240ms are lost.

I have not used Awaitility before but from a cursory look it appears to be less versatile than Failsafe while achieving the same goal. The suggestion to use Awaitility seems like a personal preference rather than having identified any deficiency with using Failsafe. In particular Awaitility appears to have several drawbacks that, while it would be possible to use, in my mind make it less desirable in this instance.

  • It does not have (as far as i can tell) a way of specifying a max number of attempts, only a timeout. This is fine for the functionality we want to achieve, however it makes writing tests harder (e.g. with a mock will return state that causes failures n times before returning state that will succeed.) It makes tests less predictable and we can see different results when debugging tests or if tests run faster/slower than expected.
  • Using untilAsserted (which seems to be the only option for running a method until it does not throw an exception) on ultimate failure it throws a ConditionTimeoutException with no reference to the original exception that caused the ultimate failure. While this maybe ok but does obscure the cause of the failure.
  • Logging each attempt would need done inside the condition (a runnable passed to the untilAsserted method) where there is no information available on the current attempt number. Failsafe provides nice onAttempt, onRetry, etc. hooks that do provide this information. I think in particular if we intend to reduce to retry interval and increase the number of attempts it will be useful from a debugging standpoint to be able to see the number of attempts done before success/failure.

Since you feel strongly enough about reducing the retry interval I am happy to do that unless there is any objection from the maintainers.

@rohanKanojia it would be good to have some input from maintainers on this PR whether it looks ok.

@ewencluley
Copy link
Contributor Author

IMO: for me build time is a very precious thing. Alone in my project i start 8 Docker-Images. I think we should put some effort in keep this time as short as possible. as we have seen on my PC, after10ms the container is ready, so 240ms are lost.

Actually come to think of this a potential compromise could be to use a backoff policy so that initially it retries very frequently but if the machine is slower it will backoff to a longer interval. Have added that as a separate commit. This way it will start waiting 10ms between retries but double its interval every attempt up to 100ms.

@ewencluley ewencluley force-pushed the avoid-port-mapping-race-condition branch from a54e0f9 to b9708be Compare March 30, 2021 10:24
@rohanKanojia
Copy link
Member

Hello, I don't have any problem with keeping failsafe 👍 . I'll give it a review again tonight and approve/merge if everything is fine.

@rohanKanojia rohanKanojia added this to the 0.35.0 milestone Mar 30, 2021
@rohanKanojia
Copy link
Member

Could you please update doc/changelog.md regarding this change? I think we can proceed with merge

On container startup, sometimes the port information retrieved from the
docker API is incomplete with port details being empty arrays, e.g.
Ports:{"2525/tcp":[]}. This causes an IndexOutOfBoundsException to be thrown
as the createPortBindings method assumes there will be at least one entry in
the mapping. In these cases we should retry the API call to get the more
complete port information.

Signed-off-by: Ewen Cluley <ewen@brandwatch.com>
@ewencluley ewencluley force-pushed the avoid-port-mapping-race-condition branch from b9708be to 4b7c1ef Compare March 31, 2021 07:57
@ewencluley
Copy link
Contributor Author

Could you please update doc/changelog.md regarding this change? I think we can proceed with merge

Thank you, done.

@ewencluley
Copy link
Contributor Author

@rohanKanojia if you're happy with the changelog can you merge it? I don't have permission to merge I don't think.

@ewencluley ewencluley closed this Apr 1, 2021
@ewencluley ewencluley reopened this Apr 1, 2021
@rohanKanojia rohanKanojia merged commit 5b91720 into fabric8io:master Apr 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition when starting container causes IndexOutOfBoundsException when binding ports
3 participants