fix (ecs): Unstable deployments for task definitions with more than one container (#5544) #4411

Merged (2 commits) on Mar 23, 2020

atyutyunnik
Contributor

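The original code never finishes the deployment and eventually times out, even though ECS has successfully deployed the task and it is running (spinnaker/spinnaker#5544). This happens when the task definition has more than one container; in the task below, only the second container reports network bindings: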

        "containers": [
            {
                "networkBindings": [],
                "networkInterfaces": []
            },
            {
                "lastStatus": "RUNNING",
                "networkBindings": [
                    {
                        "bindIP": "0.0.0.0",
                        "containerPort": 8080,
                        "hostPort": 32773,
                        "protocol": "tcp"
                    }
                ]
            }
        ]
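To illustrate one plausible failure mode consistent with the output above, here is a minimal, hypothetical sketch. It assumes (the PR does not confirm this) that the health check needs a host port from the task's containers, and it contrasts reading only the first container, which has no bindings here, with scanning every container. The `Container` and `NetworkBinding` records and both method names are illustrative stand-ins, not the AWS SDK or Spinnaker classes.

```java
import java.util.List;
import java.util.Optional;

public class HostPortLookup {
    // Hypothetical stand-ins for the ECS describeTasks model (NOT the AWS SDK classes).
    record NetworkBinding(String bindIP, int containerPort, int hostPort, String protocol) {}
    record Container(String lastStatus, List<NetworkBinding> networkBindings) {}

    // Fragile (assumed failure mode): only inspects the first container, so a container
    // without bindings listed first makes the lookup fail even though the task is running.
    static Optional<Integer> hostPortFromFirstContainer(List<Container> containers) {
        if (containers.isEmpty() || containers.get(0).networkBindings().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(containers.get(0).networkBindings().get(0).hostPort());
    }

    // Robust: returns the first host port found on any container in the task.
    static Optional<Integer> hostPortFromAnyContainer(List<Container> containers) {
        return containers.stream()
                .flatMap(c -> c.networkBindings().stream())
                .map(NetworkBinding::hostPort)
                .findFirst();
    }

    public static void main(String[] args) {
        // Mirrors the describeTasks output above: the first container has no bindings.
        List<Container> containers = List.of(
                new Container(null, List.of()),
                new Container("RUNNING",
                        List.of(new NetworkBinding("0.0.0.0", 8080, 32773, "tcp"))));

        System.out.println(hostPortFromFirstContainer(containers)); // Optional.empty
        System.out.println(hostPortFromAnyContainer(containers));   // Optional[32773]
    }
}
```

Scanning all containers makes the result independent of container ordering, which matters once a task definition has more than one container.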

atyutyunnik changed the title fix (ecs): Unstable deployments when one container fix (ecs): Unstable deployments when one task defs has more than container Mar 11, 2020
atyutyunnik changed the title fix (ecs): Unstable deployments when one task defs has more than container fix (ecs): Unstable deployments for task definitions with more than one container (#5544) Mar 11, 2020
@ezimanyi
Contributor

@atyutyunnik: Thanks for the PR! In general, our process is to open fixes against the master branch and then cherry-pick them back to the releases that need them. Can you please open this against the master branch? Thanks!

@atyutyunnik
Contributor Author

@ezimanyi, yes, I already have (see #4409), but that change is not source-compatible with release 1.18, hence this additional PR.

@ezimanyi
Contributor

Oh, thanks, I misunderstood! If the code has changed enough that the fix can't be cleanly cherry-picked to 1.18, then it's fine to open this against 1.18 to back-port the fix manually.

It will be important to cherry-pick the master change to 1.19 though, as otherwise users upgrading from 1.18 to 1.19 will get re-broken.

@allisaurus
Contributor

@atyutyunnik can you please add the unit test from #4409 to this PR as well?

@atyutyunnik
Contributor Author


@allisaurus, cherry-picking doesn't seem possible. I had to extend the unit test with:
...
then: amazonloadBalancing.describeTargetHealth({ DescribeTargetHealthRequest request ->
...

Otherwise, there would be an NPE at line 332 of TaskHealthCachingAgent.java:

if (describeTargetHealthResult.getTargetHealthDescriptions().isEmpty()) {

Now the unit test passes. Please review.
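Why the stub is needed can be sketched as follows. If the load-balancing client in the test is a Spock `Mock()`, an unstubbed `describeTargetHealth` call returns `null`, so line 332 dereferences a null result. The PR resolves this by stubbing the call in the test; the hypothetical sketch below instead shows what a defensive guard could look like. `TargetHealthResult` and `hasNoTargets` are illustrative stand-ins, not the AWS SDK's `DescribeTargetHealthResult` or the agent's actual code.

```java
import java.util.List;

public class TargetHealthGuard {
    // Hypothetical stand-in for a describe-target-health response (NOT the AWS SDK class).
    record TargetHealthResult(List<String> targetHealthDescriptions) {}

    // Treats a null result or a null/empty description list as "no targets"
    // instead of throwing a NullPointerException.
    static boolean hasNoTargets(TargetHealthResult result) {
        return result == null
                || result.targetHealthDescriptions() == null
                || result.targetHealthDescriptions().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasNoTargets(null));                                       // true
        System.out.println(hasNoTargets(new TargetHealthResult(null)));               // true
        System.out.println(hasNoTargets(new TargetHealthResult(List.of())));          // true
        System.out.println(hasNoTargets(new TargetHealthResult(List.of("healthy")))); // false
    }
}
```

Whether silently treating a null response as "no targets" is the right behavior for the caching agent is a design decision; stubbing the call in the test, as this PR does, keeps production behavior unchanged.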

allisaurus left a comment


Thanks @atyutyunnik, you're right: there was a behavior change between 1.18.x and 1.19.x that makes mocking the call to describeTargetHealth necessary here. With that addition, this LGTM!
