First, you could address the findings from Oliver's review against the main branch by creating a PR that addresses his comments.
Mid
Switch to prow for head-update and pull-requests jobs. For new repos we already use prow, as it offers many possibilities such as having e2e tests and defining checks in a much better format than CI/CD black magic.
No developer docs/instructions on how to run it locally (aka local setup docs).
Switch to project to push to AR. GCR is being replaced by AR and C. Cwienk was adapting all repos to publish to AR. Looks like he missed this repo: Move from GCR to artifact registry #10
We drop vendoring from the repos. Nowadays, gardener/gardener and most of the extension repos don't have a vendor dir. As a new Project you could drop vendor dir as well: Drop vendoring #13
- you should also think about the garden runtime cluster. As far as I understand, we would like to use the same approach to scale the virtual-kube-apiserver, maybe even the gardener-apiserver. Definitely this check won't work for the virtual-kube-apiserver or gardener-apiserver.
: I am not sure reusing the prometheus Shoot access Secret is a good thing. Maybe we should rather have an own Shoot access Secret that gets created by gardenlet?: Apply review comments. Add debug support #7
No readiness and liveness probe defined in the example Deployment in example/custom-metrics-deployment.yaml
Minor
Drop the .docforge dir and switch to the central manifest. After Use central and new manifest format documentation#431, the repos no longer need to define a .docforge dir and the manifests are maintained centrally. See the linked issue for more details. Additionally, as a consequence
############# base image # TODO: Andrey: P1: Move to distroless
- +1, let's use distroless instead of alpine. It is also part of the component checklist - the component should not run as a root user, if possible.: Fix make verify #11
testIsolation metricsClientTestIsolation // Provides indirections necessary to isolate the unit during tests
: Instead of having metricsClientTestIsolation you could directly have a field that is rest.HTTPClient. When you instantiate in non-test code, you pass the real client to a constructor func such as NewMetricsClient(httpClient). When you instantiate in test code, you pass a fake/mock client.
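A minimal sketch of the constructor-injection idea from the comment above. Everything here is illustrative: the HTTPClient interface is redeclared locally to keep the example dependency-free (it is meant to mirror the single Do method of rest.HTTPClient), and MetricsClient, GetMetrics and fakeHTTPClient are hypothetical names, not the repo's actual types.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// HTTPClient is a local stand-in with the same single-method shape as
// rest.HTTPClient; *http.Client satisfies it as well.
type HTTPClient interface {
	Do(req *http.Request) (*http.Response, error)
}

// MetricsClient is a hypothetical, simplified metrics client.
type MetricsClient struct {
	httpClient HTTPClient
}

// NewMetricsClient receives the HTTP client as a dependency, so tests can
// pass a fake without any extra test-isolation indirection.
func NewMetricsClient(httpClient HTTPClient) *MetricsClient {
	return &MetricsClient{httpClient: httpClient}
}

// GetMetrics fetches the raw metrics payload from the given URL.
func (c *MetricsClient) GetMetrics(url string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", fmt.Errorf("creating http request: %w", err)
	}
	resp, err := c.httpClient.Do(req)
	if err != nil {
		return "", fmt.Errorf("making http request: %w", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading response body: %w", err)
	}
	return string(body), nil
}

// fakeHTTPClient is what test code would pass instead of a real client.
type fakeHTTPClient struct{ body string }

func (f fakeHTTPClient) Do(*http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Body:       io.NopCloser(strings.NewReader(f.body)),
	}, nil
}

func main() {
	// Non-test code would pass a real client, e.g. an *http.Client.
	client := NewMetricsClient(fakeHTTPClient{body: "apiserver_request_total 42\n"})
	out, err := client.GetMetrics("https://10.0.0.1/metrics")
	fmt.Println(out, err)
}
```

With this shape the struct has no test-only fields; the only difference between production and test code is the argument passed to the constructor.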
: Why do we need such a utility func, and why is it not possible to use fmt.Errorf natively, as in every other place in the gardener code-base?
Nits (really, really, really minor)
rm LICENSE.md. The licenses are already defined under LICENSES/ and there is the symlink LICENSE in the root of the repo: Switch to use REUSE license format #12
: In gardener/gardener and other repos we don't use import alias for "sigs.k8s.io/controller-runtime/pkg/manager". I assume you use import alias to have a variable named manager. In gardener/gardener we avoid this problem by naming the var mgr. You could follow also the same pattern: Upgrade k8s.io/* to v0.28, sigs.k8s.io/controller-runtime to v0.16 #14
# Copyright (c) 2020 SAP SE or an SAP affiliate company. All rights reserved. This file is licensed under the Apache Software License, v. 2 except as noted otherwise in the LICENSE file
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
go.mod: We usually use only 2 require blocks. One for the direct dependencies, and the other one managed by the go tool for the indirect/transitive ones. You could merge the first and the second require blocks.: Apply review comments. Add debug support #7
: There are many comments which are not very useful for the reader. I would rather suggest dropping them, as they seem to be leftovers from testing: Fix make verify #11
: the package naming conventions in golang (https://go.dev/blog/package-names): Good package names are short and clear. They are lower case, with no under_scores or mixedCaps.
: What happens when pod.Status.PodIP is empty? According to the doc string of the field, pod.Status.PodIP will be empty if not yet allocated.
[andrerun-new]: See the log entry below. It's a bit ugly - a project outsider may have a hard time figuring out what's going on. @ialidzhikov, a pod gets stuck without an IP address every now and then, right? It's not an extremely rare event? If so, I think I should add special handling for this case and log a nicer message. [under-discussion]
ERROR gardener-custom-metrics.input.scraper Kapi metrics retrieval failed {"op": "scrape", "namespace": "shoot--local--local", "pod": "kube-apiserver-5588c58789-crm72", "error": "metrics client: making http request: Get \"https:///metrics\": http: no Host in request URL"}
[andrerun-new]: The design reason is I want to keep the decision 'where to scrape', outside of the scraper. There's also a minor runtime concern - I prefer less object creation/GC churn.
3: Storing the same Pod labels repeatedly would waste a lot of memory. I see that you need the Pod labels to allow selecting metrics by object labelSelector. Maybe the whole model has to be adapted. We can for example accept that Pod labels are immutable and store them only once and not for every new metric value. [under-discussion]
: IIUC, the only benefit of running 2 replicas is that the 2nd Pod waits in "stand by" mode and, on issues with the leader replica, the "stand by" can take over faster. By faster I mean that we don't need to wait for a new Pod to be scheduled and started. Updating the Endpoint manually to influence the traffic to go to the leader replica looks hacky. We were running metrics-server for Shoots and ManagedSeeds for years with a single replica and I don't recall us having issues related to it. https://github.com/kubernetes-sigs/metrics-server/tree/master?tab=readme-ov-file#high-availability: metrics-server seems to have a real HA mode where 2 of the replicas are serving (?). We can check what they do and how. And I agree with Proposed #3 (comment) - this approach is very error-prone.
[andrerun]: The main benefit I see in the second replica is that it ties compute resources in another AZ, so it guards against AZ resource shortage disrupting failover. Overall, I have my reservations regarding the need for a second replica, considering the intended use of the component, but that was a hard requirement introduced by the GEP review process. I'll elaborate offline. [under-discussion]
I didn't manage to test the component in local setup at all (due to missing docs/instructions) but I wanted to ask how it behaves on restarts and whether the HPA acting on the custom metric is fine with it. I assume on Pod restart the leader will change and the newly elected replica won't report any metrics (or will report 0-ed metrics value). Is HPA able to deal with unavailability of the gardener-custom-metrics component?
Final notes. I didn't deep dive into non-trivial packages like ./pkg/input/metrics_scraper.
First, you could address the findings from Oliver's review against the main branch by creating a PR that addresses his comments.
Mid
sigs.k8s.io/custom-metrics-apiserver dependency (v1.28.0)
gardener-custom-metrics/go.mod
Line 21 in 392b48a
Upgrade k8s.io/* to v0.28, sigs.k8s.io/controller-runtime to v0.16 #14
gardener-custom-metrics/pkg/api/generated/openapi/openapi.go
Line 3 in 392b48a
Upgrade k8s.io/* to v0.28, sigs.k8s.io/controller-runtime to v0.16 #14
testIsolation approach to fake/mock TimeNow. Use k8s.io/utils/clock.RealClock / k8s.io/utils/clock/testing.FakeClock instead. Example: https://github.com/gardener/gardener/blob/dff43d99d2128c99f6d09116145ee48aebfd97f4/pkg/controllermanager/controller/event/reconciler.go#L38. Usages that need adaptation:
gardener-custom-metrics/pkg/metrics_provider/metrics_provider.go
Line 54 in 392b48a
gardener-custom-metrics/pkg/input/input_data_registry/input_data_registry.go
Line 168 in 392b48a
gardener-custom-metrics/pkg/input/metrics_scraper/pacemaker.go
Line 51 in 392b48a
gardener-custom-metrics/pkg/input/metrics_scraper/scrape_queue.go
Line 76 in 392b48a
gardener-custom-metrics/pkg/input/metrics_scraper/scraper.go
Lines 295 to 296 in 392b48a
gardener-custom-metrics/pkg/ha/ha_service.go
Line 29 in 392b48a
gardener-custom-metrics/pkg/util/gardener/util.go
Line 13 in 392b48a
shoot-<project-name>-<shoot-name>. See https://github.com/gardener/gardener/blob/76704c377f34cdbdf1b0d3986b243c8b67c66909/pkg/component/kubeapiserverexposure/kube_apiserver_service.go#L237-L239 and https://github.com/gardener/gardener/blob/76704c377f34cdbdf1b0d3986b243c8b67c66909/pkg/utils/gardener/shoot.go#L696-L708. You should rather adapt the check to be "has prefix shoot-": Apply review comments. Add debug support #7
gardener-custom-metrics/pkg/util/gardener/util.go
Line 13 in 392b48a
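A short sketch of the suggested "has prefix shoot-" check. The function and constant names are hypothetical, and the sample namespaces other than shoot--local--local (taken from the log entry in this issue) are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// shootNamespacePrefix is the prefix proposed in the review comment.
const shootNamespacePrefix = "shoot-"

// isShootNamespace reports whether a namespace name looks like a shoot
// control plane namespace, based on the prefix alone.
func isShootNamespace(namespace string) bool {
	return strings.HasPrefix(namespace, shootNamespacePrefix)
}

func main() {
	// "shoot-dev-demo" follows the shoot-<project-name>-<shoot-name> form.
	for _, ns := range []string{"shoot--local--local", "shoot-dev-demo", "garden", "kube-system"} {
		fmt.Printf("%s: %v\n", ns, isShootNamespace(ns))
	}
}
```

As noted elsewhere in the review, a prefix check of this kind would still not cover the virtual-kube-apiserver or gardener-apiserver in the garden runtime cluster.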
gardener-custom-metrics/pkg/input/controller/secret/actuator.go
Line 33 in 392b48a
gardener-custom-metrics/pkg/input/controller/reconciler_test.go
Line 31 in 392b48a
gardener-custom-metrics/pkg/app/common.go
Lines 10 to 17 in 392b48a
example/custom-metrics-deployment.yaml
Minor
Drop the .docforge dir and switch to the central manifest. After Use central and new manifest format documentation#431, the repos no longer need to define a .docforge dir and the manifests are maintained centrally. See the linked issue for more details. Additionally, as a consequence
gardener-custom-metrics/Makefile
Lines 104 to 106 in 392b48a
make check-docforge target. It should be no longer needed.: Fix make verify #11
make check is reporting golangci-lint findings. You could fix them.: Fix make verify #11
make format is failing that there is no test/ dir.: Fix make verify #11
make generate is not implemented (
gardener-custom-metrics/Makefile
Lines 112 to 116 in 392b48a
gardener-custom-metrics/Dockerfile
Line 9 in 392b48a
Fix make verify #11
pkg/version pkg - we usually don't define such pkg in other repos and rather reuse the k8s.io/component-base/version/verflag pkg. You should be already familiar with it as in Add support for a --version command line flag gardener-extension-runtime-gvisor#38 you used this pkg and eliminated a custom version pkg in the runtime-gvisor extension: Move from GCR to artifact registry #10
gardener-custom-metrics/pkg/input/metrics_scraper/metrics_client.go
Line 44 in 392b48a
Instead of having metricsClientTestIsolation you could directly have a field that is rest.HTTPClient. When you instantiate in non-test code, you pass the real client to a constructor func such as NewMetricsClient(httpClient). When you instantiate in test code, you pass a fake/mock client.
./pkg/util/gardener to /third_party/. Example: gardener/gardener@a1eb2fb: Drop vendoring #13
gardener-custom-metrics/cmd/gardener-custom-metrics/main.go
Lines 85 to 88 in 392b48a
gardener-custom-metrics/pkg/util/errutil/errutil.go
Lines 13 to 19 in 392b48a
fmt.Errorf natively as in every other place in the gardener code-base.
Nits (really, really, really minor)
rm LICENSE.md. The licenses are already defined under LICENSES/ and there is the symlink LICENSE in the root of the repo: Switch to use REUSE license format #12
gardener-custom-metrics/cmd/gardener-custom-metrics/main.go
Line 16 in 392b48a
mgr. You could follow also the same pattern: Upgrade k8s.io/* to v0.28, sigs.k8s.io/controller-runtime to v0.16 #14
gardener-custom-metrics/Makefile
Lines 1 to 13 in 392b48a
hack/test-e2e-local.sh but it seems not used. Let's drop it for now and introduce it if needed in the future.: Drop vendoring #13
gardener-custom-metrics/Dockerfile
Line 16 in 392b48a
gardener-custom-metrics/Dockerfile
Line 7 in 392b48a
gardener-custom-metrics/Dockerfile
Line 17 in 392b48a
VERSION file should rather contain the following version v0.1.0-dev: Apply review comments. Add debug support #7
go.mod: We usually use only 2 require blocks. One for the direct dependencies, and the other one managed by the go tool for the indirect/transitive ones. You could merge the first and the second require blocks.: Apply review comments. Add debug support #7
gardener-custom-metrics/go.mod
Lines 12 to 21 in 392b48a
Fix make verify #11
gardener-custom-metrics/go.mod
Line 115 in 392b48a
v0.23.7) in
gardener-custom-metrics/go.mod
Line 15 in 392b48a
k8s.io/client-go@v11.0.1-0.20190409021438-1a26190bd76a+incompatible gardener#6807. But I see that this repo no longer vendors github.com/gardener/gardener and in the latest versions of gardener this issue is fixed.: Drop the sigs.k8s.io/metrics-server dependency #8
gardener-custom-metrics/pkg/api/openapi-gen-dependency.go
Lines 4 to 9 in 392b48a
hack/tools.go to make it clear that this is a tool dependency: Upgrade k8s.io/* to v0.28, sigs.k8s.io/controller-runtime to v0.16 #14
gardener-custom-metrics/pkg/app/cli_options.go
Lines 104 to 105 in 392b48a
gardener-custom-metrics/pkg/metrics_provider/metrics_provider.go
Line 18 in 392b48a
Good package names are short and clear. They are lower case, with no under_scores or mixedCaps.
gardener-custom-metrics/pkg/metrics_provider/metrics_provider.go
Line 81 in 392b48a
%s/%s, the String() method of a types.NamespacedName already does this.
gardener-custom-metrics/pkg/input/input_data_registry/consumer_interface.go
Lines 13 to 19 in 392b48a
gardener-custom-metrics/pkg/input/input_data_registry/consumer_interface.go
Line 12 in 392b48a
kapi is an abbreviation that gets used somewhere in gardener/gardener code-base. Consider renaming to kubeAPIServer.
gardener-custom-metrics/pkg/input/metrics_scraper/metrics_client.go
Line 96 in 392b48a
responce -> response: Apply review comments. Add debug support #7
gardener-custom-metrics/pkg/input/metrics_scraper/metrics_client.go
Line 71 in 392b48a
metrics client: prefix from the error messages from this func. It should be rather added by the caller.
gardener-custom-metrics/pkg/input/controller/pod/actuator.go
Lines 116 to 120 in 392b48a
InjectClient are removed in the newer versions of sigs.k8s.io/controller-runtime. This should be adapted after upgrading the dependencies to the latest versions: Upgrade k8s.io/* to v0.28, sigs.k8s.io/controller-runtime to v0.16 #14
gardener-custom-metrics/pkg/input/controller/pod/predicate.go
Line 31 in 392b48a
NewKubeAPIServerPodPredicate.
gardener-custom-metrics/pkg/input/controller/constants.go
Lines 3 to 10 in 392b48a
gardener-custom-metrics/pkg/input/metrics_scraper/scrape_queue.go
Line 161 in 392b48a
dueAtTime.Sub(q.scrapePeriod).
--version flag.
Questions:
gardener-custom-metrics/pkg/input/controller/pod/actuator.go
Line 75 in 392b48a
What happens when pod.Status.PodIP is empty? According to the doc string of the field, pod.Status.PodIP will be empty if not yet allocated.
gardener-custom-metrics/pkg/input/input_data_registry/input_data_registry.go
Line 24 in 392b48a
MetricsUrl? Instead you could only store the Pod IP and construct the metrics URL when fetching the metrics.
gardener-custom-metrics/pkg/input/input_data_registry/input_data_registry.go
Line 2 in 392b48a
3: Storing the same Pod labels repeatedly would waste a lot of memory. I see that you need the Pod labels to allow selecting metrics by object labelSelector. Maybe the whole model has to be adapted. We can for example accept that Pod labels are immutable and store them only once and not for every new metric value. [under-discussion]
gardener-custom-metrics/pkg/ha/ha_service.go
Line 47 in 392b48a
Final notes. I didn't deep dive into non-trivial packages like ./pkg/input/metrics_scraper.