Cleanup AWSEBSCSIDriver setup #11610
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -239,6 +239,15 @@ func (tf *TemplateFunctions) AddTo(dest template.FuncMap, secretStore fi.SecretS
dest["EnableSQSTerminationDraining"] = func() bool { return *cluster.Spec.NodeTerminationHandler.EnableSQSTerminationDraining }
}

if cluster.Spec.CloudConfig != nil && fi.BoolValue(cluster.Spec.CloudConfig.ManageStorageClasses) {
This sort of logic shouldn't live in template functions. Template functions aren't scoped to any particular component or logic, which makes things harder to maintain. That is precisely why we have option builders.
If you mean that the fi.BoolValue(cluster.Spec.CloudConfig.ManageStorageClasses) part should be removed, I agree.
If you mean that we should always materialize AWSEBSCSIDriver.Enabled, I am not so sure.
We have to materialize it, or make the templates more complex.
The problem with the materialization now is that it triggers an unnecessary change to the bootstrap script. I think we can instead restrict what from CloudConfig is added to that.
Referencing optional settings in templates other than the one for which they were meant is not nice and makes the templates harder to maintain.
We would trade a simple template function that checks if the feature is enabled, for extra code in multiple places, for an addon that can only be enabled starting k8s 1.20 and enabled by default in 1.22.
If we move the defaulting for 1.22 into the model, that would make it better, I guess. This is why I was asking: why only for newly created clusters?
Think this was answered in office hours. Short version is that I wanted e2e to pass before enabling by default for all clusters. That work is almost done now.
Agreed, answered, and the PR was updated.
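To make the trade-off in this thread concrete, here is a minimal sketch of the option-builder approach: the builder materializes AWSEBSCSIDriver.Enabled once, so a template function only has to read it. The types below are simplified stand-ins rather than the real kops API structs, and `buildCSIOptions`, `boolValue`, and `useEBSCSIDriver` are hypothetical names chosen for illustration.

```go
package main

// Simplified stand-ins for the cluster spec types (assumption: the real
// structs carry many more fields).
type AWSEBSCSIDriver struct {
	Enabled *bool
}

type CloudConfig struct {
	AWSEBSCSIDriver *AWSEBSCSIDriver
}

type ClusterSpec struct {
	CloudConfig *CloudConfig
}

// boolValue is a nil-safe read in the spirit of a helper like fi.BoolValue:
// a nil pointer is treated as false.
func boolValue(b *bool) bool {
	return b != nil && *b
}

// buildCSIOptions is a hypothetical option builder. It fills in
// AWSEBSCSIDriver.Enabled when the user left it unset and never overrides
// an explicit choice.
func buildCSIOptions(spec *ClusterSpec, enableByDefault bool) {
	if spec.CloudConfig == nil {
		spec.CloudConfig = &CloudConfig{}
	}
	if spec.CloudConfig.AWSEBSCSIDriver == nil {
		spec.CloudConfig.AWSEBSCSIDriver = &AWSEBSCSIDriver{}
	}
	csi := spec.CloudConfig.AWSEBSCSIDriver
	if csi.Enabled == nil {
		v := enableByDefault
		csi.Enabled = &v
	}
}

// useEBSCSIDriver shows what a template-side check could reduce to; it
// stays nil-safe even if the builder has not run.
func useEBSCSIDriver(spec *ClusterSpec) bool {
	return spec.CloudConfig != nil &&
		spec.CloudConfig.AWSEBSCSIDriver != nil &&
		boolValue(spec.CloudConfig.AWSEBSCSIDriver.Enabled)
}
```

In this sketch the version-dependent default (for example, enabling from k8s 1.22) would be decided by the caller and passed in as `enableByDefault`, which keeps the defaulting logic in one place.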
4d7390a to d845db8
/retest
d845db8 to 2c1688d
2c1688d to 6f3c72b
@olemarkus Any other thoughts about this?
}

// Pulling CSI driver image
image := "k8s.gcr.io/provider-aws/aws-ebs-csi-driver:" + *csi.Version
There's value in doing the warmpull. We should keep this, but figure out why we're dereferencing nil here.
I disagree in this case. This should not have been done at all for anything other than CNI and proxy, which affect startup.
A generic approach based on container image assets would be the way forward.
If you have pending workloads with volumes, this will speed up pod startup. So while it does not contribute to node startup, it does contribute to pod scheduling time.
The nil dereferencing works because it assumes cloudup always sets this value. That is the case now, but I wouldn't exactly call this very defensive.
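For reference, a more defensive version of the `*csi.Version` dereference could look like the sketch below. It is not the code in this PR: the `AWSEBSCSIDriver` struct is a simplified stand-in and `csiDriverImage` is a hypothetical helper name. The idea is to return an error when the version is unset rather than relying on cloudup having populated it.

```go
package main

import "errors"

// AWSEBSCSIDriver is a simplified stand-in for the real config struct.
type AWSEBSCSIDriver struct {
	Enabled *bool
	Version *string
}

// csiDriverImage builds the image reference to pre-pull. It returns an
// error instead of panicking when the driver config or its version is
// missing, so the caller can skip the warm pull.
func csiDriverImage(csi *AWSEBSCSIDriver) (string, error) {
	if csi == nil || csi.Version == nil || *csi.Version == "" {
		return "", errors.New("AWSEBSCSIDriver version is not set")
	}
	return "k8s.gcr.io/provider-aws/aws-ebs-csi-driver:" + *csi.Version, nil
}
```

A caller that gets an error here could log it and continue, since the pre-pull is an optimization and not required for the node to come up.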
This is a general argument for any pre-pulled image. Realistically, the pull time for these images is very small.
As I said, I don't disagree with the idea, if done correctly. I am not ok with using it in this case and having extra logic in NodeUp for a special case.
Not quite. There are three images pulled in succession, and one of them sometimes takes up to 20 seconds.
A random example:
FirstSeen Count From Type Reason Message
<nil> <none> <none> Normal Scheduled Successfully assigned kube-system/ebs-csi-node-9sbqq to ip-172-20-98-90.eu-central-1.compute.internal
2021-06-03T17:34:17Z 1 kubelet Normal Pulling Pulling image "k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.0.0"
2021-06-03T17:34:20Z 1 kubelet Normal Pulled Successfully pulled image "k8s.gcr.io/provider-aws/aws-ebs-csi-driver:v1.0.0" in 3.187918027s
2021-06-03T17:34:21Z 1 kubelet Normal Created Created container ebs-plugin
2021-06-03T17:34:21Z 1 kubelet Normal Started Started container ebs-plugin
2021-06-03T17:34:21Z 1 kubelet Normal Pulling Pulling image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0"
2021-06-03T17:34:34Z 1 kubelet Normal Pulled Successfully pulled image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0" in 12.506681986s
2021-06-03T17:34:34Z 1 kubelet Normal Created Created container node-driver-registrar
2021-06-03T17:34:34Z 1 kubelet Normal Started Started container node-driver-registrar
2021-06-03T17:34:34Z 1 kubelet Normal Pulling Pulling image "k8s.gcr.io/sig-storage/livenessprobe:v2.2.0"
2021-06-03T17:34:35Z 1 kubelet Normal Pulled Successfully pulled image "k8s.gcr.io/sig-storage/livenessprobe:v2.2.0" in 1.094773878s
2021-06-03T17:34:35Z 1 kubelet Normal Created Created container liveness-probe
2021-06-03T17:34:36Z 1 kubelet Normal Started Started container liveness-probe
We can probably debate this forever. I doubt any of us will change our minds. Hopefully this will change in the near future to something more maintainable.
What this does:
/cc @olemarkus @rifelpet