This repository has been archived by the owner on May 6, 2020. It is now read-only.

enable_aggregation: Set the default to true #76

Merged (1 commit) on Jan 14, 2020

@surajssd (Contributor) commented Oct 1, 2019

With the latest cert-manager, it is obligatory to enable API
aggregation on the apiserver.

This commit changes the default behavior of the installer to always
enable aggregation.
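In practice, "enabling API aggregation on the apiserver" means passing kube-apiserver the front-proxy (requestheader) flags listed in the upstream Kubernetes guide "Configure the Aggregation Layer". The following is a minimal sketch of how an installer toggle such as `enable_aggregation` could map to those flags. The `aggregation_flags` helper and the certificate paths are hypothetical and do not come from Lokomotive's actual templates.

```python
from typing import List

# Hypothetical certificate paths; a real installer chooses its own locations.
DEFAULT_CA = "/etc/kubernetes/secrets/aggregation-ca.crt"
DEFAULT_CERT = "/etc/kubernetes/secrets/aggregation-client.crt"
DEFAULT_KEY = "/etc/kubernetes/secrets/aggregation-client.key"


def aggregation_flags(
    enable_aggregation: bool = True,  # mirrors the new default proposed in this PR
    ca_file: str = DEFAULT_CA,
    cert_file: str = DEFAULT_CERT,
    key_file: str = DEFAULT_KEY,
) -> List[str]:
    """Return the kube-apiserver flags needed for the aggregation layer.

    When aggregation is disabled, no extra flags are emitted, i.e. the
    aggregation layer is left unconfigured.
    """
    if not enable_aggregation:
        return []
    return [
        # CA used to verify the front-proxy client certificate.
        f"--requestheader-client-ca-file={ca_file}",
        # Common names allowed to act as the front proxy.
        "--requestheader-allowed-names=front-proxy-client",
        # Headers used to pass the authenticated user to extension apiservers.
        "--requestheader-extra-headers-prefix=X-Remote-Extra-",
        "--requestheader-group-headers=X-Remote-Group",
        "--requestheader-username-headers=X-Remote-User",
        # Client certificate the apiserver presents when proxying requests.
        f"--proxy-client-cert-file={cert_file}",
        f"--proxy-client-key-file={key_file}",
    ]


if __name__ == "__main__":
    for flag in aggregation_flags():
        print(flag)
```

With `True` as the default, callers get all seven flags unless they explicitly opt out with `aggregation_flags(enable_aggregation=False)`. This matches the PR's intent of keeping the knob while flipping its default.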

@invidian (Contributor) left a comment

Please don't merge this until we have had an internal discussion about it.

@surajssd (Contributor, Author) commented Oct 1, 2019

Yes, waiting on it. I would like others to comment here.
cc: @rata @kosyfrances @alban @schu

@rata (Contributor) commented Oct 1, 2019

@surajssd I'm not familiar with the latest cert-manager upgrade, but are you sure it is mandatory? I know you and @invidian worked on this, but I think @invidian mentioned it was optional (like, you can use cert-manager without aggregation). Can you confirm, @invidian?

I think aggregation should be enabled by default, intuitively, because you can't use HPA without it and that is quite common. But if there is an easy way to enable aggregation on a running cluster (I know we don't support it without cluster re-creation right now, unless maybe the daemonsets are edited manually), we might be able to default to disabled and enable it "on demand".

But I'd like to research the pros and cons a little before changing the project-level default. Do you have any ideas on the trade-offs?

@invidian (Contributor) commented Oct 1, 2019

> I think aggregation should be enabled by default, intuitively, because you can't use HPA without it and that is quite common. But if there is an easy way to enable aggregation on a running cluster (I know we don't support it without cluster re-creation right now, unless maybe the daemonsets are edited manually), we might be able to default to disabled and enable it "on demand".

Good point from an architectural point of view.

> @surajssd I'm not familiar with the latest cert-manager upgrade, but are you sure it is mandatory? I know you and @invidian worked on this, but I think @invidian mentioned it was optional (like, you can use cert-manager without aggregation). Can you confirm, @invidian?

It is recommended to have the webhooks enabled for additional verification, etc., but they are not required.

@surajssd (Contributor, Author) commented Oct 3, 2019

So, is it agreed that we need aggregation enabled by default?

@rata (Contributor) commented Oct 23, 2019

@surajssd there is a knob for this already. Why would we change the default without knowing the trade-offs we are making by switching it? Is changing the default urgent for some reason I'm missing, so that we can't evaluate the decision carefully?

Like I said in my previous comment, I guess this might be the right call. But can't we do a little research on the pros and cons of this change and the trade-offs it implies? (This might be relevant security-wise, right?) I'd prefer to make an informed decision (but my opinion is not final, so ignore me if you want). I won't stop you if you think this is needed ASAP, of course.

@invidian (Contributor) commented Jan 7, 2020

@rata isn't kubernetes/kubernetes#63947 sufficient confirmation? I'd like to merge the PR.

Will merge today if there are no objections.

@rata (Contributor) commented Jan 7, 2020

@invidian not sure. The conformance tests were run recently (some months after that patch was merged) and, AFAIK, they passed. And, IIUC, we didn't enable API aggregation: https://github.com/kinvolk/lokomotive-kubernetes/tree/master/docs/conformance (look at the Terraform code).

Maybe the conformance tests were run using something older than that commit?

Having a quick look, it was run using sonobuoy 0.16.2 (which was recent at the time). There are newer releases, though: https://github.com/vmware-tanzu/sonobuoy/releases. I haven't had enough time to see how the conformance tests are really run, so I'm not sure what could cause them to pass with API aggregation disabled. Maybe that PR is unrelated? Really not sure, and I don't have the time to check that now :-/

IMHO, if the reason to merge this is that the conformance tests need it, then we need to check that they fail without this change and pass with it.

Again, I guess this is probably what we will end up doing (as I mentioned here: #76 (comment)), but I'd like to understand the reasoning and at least the minimal trade-offs this change implies. Being needed to pass the conformance tests seems a good enough reason to me, but only if it is actually needed ;)

@invidian (Contributor) commented Jan 7, 2020

We can wait until we upgrade to 1.17 then. I'm pretty sure the conformance tests will fail without aggregation enabled at that point.

@schu (Contributor) commented Jan 7, 2020

> We can wait until we upgrade to 1.17 then. I'm pretty sure the conformance tests will fail without aggregation enabled at that point.

AFAIK, conformance tests have failed without aggregation since v1.14 (see e.g. poseidon/typhoon#436), so if they didn't fail for us: did we run old tests, or not run them at all?

@invidian (Contributor) commented Jan 7, 2020

Thanks for the input @schu! From the issue you linked:

> Passing a v1.14 CNCF conformance test requires aggregation be enabled. Having an option for aggregation keeps compliance, but retains the stricter security posture on default clusters.

Speaking from experience, having aggregation disabled is a PITA. Also, if the conformance tests require it to be enabled, I think it can't be that insecure... I also couldn't find any good arguments against it except the one mentioned in the issue. The k8s documentation also does NOT warn about it.

> Enabling aggregation and extension apiservers increases the attack surface of a cluster and makes extensions a part of the control plane. Admins must scrutinize and trust any extension apiserver used.

@invidian (Contributor) commented Jan 8, 2020

$ sonobuoy results 202001081355_sonobuoy_fef8b97a-b6d6-419a-a4cb-48b1c99d97f1.tar.gz
Plugin: e2e
Status: failed
Total: 4732
Passed: 273
Failed: 3
Skipped: 4456

Failed tests:
[sig-network] Services should be able to change the type from ExternalName to NodePort [Conformance]
[sig-network] Services should be able to create a functioning NodePort service [Conformance]
[sig-api-machinery] Aggregator Should be able to support the 1.10 Sample API Server using the current Aggregator [Conformance]

Plugin: systemd-logs
Status: passed
Total: 3
Passed: 3
Failed: 0
Skipped: 0

As expected, Lokomotive without aggregation enabled is not compliant. Cluster deployed from 89c3a65.
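A summary like the one above can also be checked mechanically, e.g. as a CI gate. The following minimal sketch assumes the plain-text layout shown in this comment (`Plugin:`, `Status:`, counter lines, then an optional `Failed tests:` list). `parse_sonobuoy_summary` and `counts_consistent` are hypothetical helpers, and the text format may differ between sonobuoy versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PluginResult:
    status: str = ""
    counts: Dict[str, int] = field(default_factory=dict)
    failed_tests: List[str] = field(default_factory=list)


def parse_sonobuoy_summary(text: str) -> Dict[str, PluginResult]:
    """Parse `sonobuoy results` summary text into per-plugin results."""
    plugins: Dict[str, PluginResult] = {}
    current: Optional[PluginResult] = None
    in_failed = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Plugin:"):
            # A new plugin section starts; reset the failed-tests state.
            name = line.split(":", 1)[1].strip()
            current = plugins.setdefault(name, PluginResult())
            in_failed = False
        elif current is None:
            continue  # e.g. the "$ sonobuoy results ..." command line
        elif line == "Failed tests:":
            in_failed = True
        elif in_failed:
            current.failed_tests.append(line)
        elif line.startswith("Status:"):
            current.status = line.split(":", 1)[1].strip()
        else:
            # Counter lines such as "Total: 4732" or "Failed: 3".
            key, _, value = line.partition(":")
            if value.strip().isdigit():
                current.counts[key.strip()] = int(value)
    return plugins


def counts_consistent(result: PluginResult) -> bool:
    """Check that Passed + Failed + Skipped adds up to Total."""
    c = result.counts
    return c.get("Passed", 0) + c.get("Failed", 0) + c.get("Skipped", 0) == c.get("Total", -1)
```

Fed the output above, this reports the `e2e` plugin as failed with three failed tests, including the Aggregator one, and the counters are consistent (273 + 3 + 4456 = 4732).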

@rata (Contributor) commented Jan 9, 2020

@invidian Oh, cool, thanks! If you still have it handy, do you want to check whether they pass (or at least that the aggregator test doesn't fail) with this enabled? Or have you already checked that by setting the variable explicitly, instead of relying on the default?

If you don't have it handy, no problem.

@rata (Contributor) left a comment

LGTM, thanks for the good work! :-D

It would be good to confirm that this fixes the conformance test (at least that test) as expected, and to update the commit message with the new reasoning.

Do you still have the test environment, and is it easy to test again with the setting enabled? Or is it too late now?

Also, would you mind creating an issue to investigate the other conformance test that failed in your run?

@invidian (Contributor) commented Jan 9, 2020

Will rebase and do all that, @rata. Thanks for the LGTM.

@invidian (Contributor) commented Jan 9, 2020

> It would be good to confirm that this fixes the conformance test (at least that test) as expected, and to update the commit message with the new reasoning.

Done, see 830a777.

> Do you still have the test environment, and is it easy to test again with the setting enabled? Or is it too late now?

I tested it this morning. With aggregation enabled, the failure is gone.

> Also, would you mind creating an issue to investigate the other conformance test that failed in your run?

No, so I created #124.

@rata (Contributor) commented Jan 9, 2020

> It would be good to confirm that this fixes the conformance test (at least that test) as expected, and to update the commit message with the new reasoning.

> Done, see 830a777.

Thanks!

> Do you still have the test environment, and is it easy to test again with the setting enabled? Or is it too late now?

> I tested it this morning. With aggregation enabled, the failure is gone.

Rock! :D

> Also, would you mind creating an issue to investigate the other conformance test that failed in your run?

> No, so I created #124.

Thanks again!

@rata (Contributor) left a comment

LGTM.

Just small fixes that don't need another review. Whatever you consider best ("true" or just true) is good for me :)

Thanks again!

docs/flatcar-linux/packet.md (outdated, resolved)
digital-ocean/container-linux/kubernetes/variables.tf (outdated, resolved)

The aggregation layer is now required to pass the conformance tests,
so it should be enabled by default.

Additionally, with the latest cert-manager it is obligatory to enable API
aggregation on the apiserver.

This commit changes the behavior of the installer to enable aggregation
by default.

Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>

@invidian (Contributor)

Resolved conflicts.

@invidian merged commit 7759c43 into master on Jan 14, 2020.
@invidian deleted the surajssd/enable-aggregation branch on January 14, 2020 at 11:37.

5 participants