Copy LoadBalancerStatus from core to networking #106242
The job failures are related; it seems it needs to run some update magic.
2e4b123 to e25c1db
/remove-sig api-machinery
Should be no compat problem, I think.
mechanics and wire compatibility lgtm. one question on the godoc change on a couple fields
ping @thockin, the beginning of the cycle is the best time for this kind of change. Are you able to move this forward soon?
This type should never have been shared between Service and Ingress. The `ports` field is unfortunate, but it is needed to stay compatible.
dd5fd59 to 0153bfa
/lgtm
Upstream changes included changing Ingress.LoadBalancerStatus from corev1.LoadBalancerStatus to networkingv1.IngressLoadBalancerStatus. This required the addition of two new factory funcs in the slim client: one to convert slim.LoadBalancerIngress to networkingv1.IngressLoadBalancerIngress and another to convert LoadBalancerStatus to IngressLoadBalancerStatus.

See: kubernetes/kubernetes#106242

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
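The two conversions described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the types below are local stand-ins trimmed to a couple of fields (they are not the real slim or `k8s.io/api` types), and the function names `convertIngress` and `convertStatus` are hypothetical.

```go
package main

import "fmt"

// SlimLoadBalancerIngress and SlimLoadBalancerStatus are local stand-ins for
// the slim core types (assumption: trimmed to the IP field for brevity).
type SlimLoadBalancerIngress struct {
	IP string
}

type SlimLoadBalancerStatus struct {
	Ingress []SlimLoadBalancerIngress
}

// IngressLoadBalancerIngress and IngressLoadBalancerStatus are local stand-ins
// for the networking types that Ingress status uses after the upstream change.
type IngressLoadBalancerIngress struct {
	IP       string
	Hostname string
}

type IngressLoadBalancerStatus struct {
	Ingress []IngressLoadBalancerIngress
}

// convertIngress (hypothetical name) maps one slim entry to its
// networking counterpart, copying only the fields the stand-in carries.
func convertIngress(in SlimLoadBalancerIngress) IngressLoadBalancerIngress {
	return IngressLoadBalancerIngress{IP: in.IP}
}

// convertStatus (hypothetical name) converts the whole status entry by entry.
func convertStatus(in SlimLoadBalancerStatus) IngressLoadBalancerStatus {
	out := IngressLoadBalancerStatus{
		Ingress: make([]IngressLoadBalancerIngress, 0, len(in.Ingress)),
	}
	for _, ing := range in.Ingress {
		out.Ingress = append(out.Ingress, convertIngress(ing))
	}
	return out
}

func main() {
	slim := SlimLoadBalancerStatus{
		Ingress: []SlimLoadBalancerIngress{{IP: "192.0.2.10"}, {IP: "192.0.2.11"}},
	}
	fmt.Printf("%+v\n", convertStatus(slim))
}
```

Keeping the per-entry conversion separate from the status-level one mirrors the split into two factory funcs that the commit message mentions.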
Set operator to remove the label of a pod that existed before the node taint

1. Delete the pods whose label matches the --pod-restart-selector parameter; the default value is k8s-app=kube-dns
2. --pod-restart-selector="" removes all pods

Fixes: #21594
Signed-off-by: yanru.lv <yanru.lv@daocloud.io>

chore(deps): update base-images

Signed-off-by: Renovate Bot <bot@renovateapp.com>

images: update cilium-{runtime,builder}

Signed-off-by: Cilium Imagebot <noreply@cilium.io>

Added Rafay Systems to the user list

Rafay's Kubernetes Operations Platform uses Cilium for centralized network visibility and network policy enforcement.

Signed-off-by: Saim Safdar <59512053+Saim-Safdar@users.noreply.github.com>

Adding Polverio to user list

Signed-off-by: Stuart Preston <mail@stuartpreston.net>

gh/wf: Enable kube-proxy on some of DP conformance

* 4.19 - kube-proxy only.
* 5.4 - mixed (kube-proxy and KPR).
* >= 5.10 - KPR only.

Signed-off-by: Martynas Pumputis <m@lambda.lt>

build: Update Swagger to 0.30.3

Update Swagger to 0.30.x, which is the first version with arm64-compatible images at quay.io/goswagger/swagger. This allows Cilium APIs to be regenerated on arm64 platforms.

Otherwise straightforward, but the endpoint state in Endpoint status changed to a pointer:

    - State EndpointState `json:"state"`
    + State *EndpointState `json:"state"`

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

dnsproxy: introduce matcher cache to reference count regexes

This introduces a reference-counted cache/index for regexes, and is ready for supporting other caches as well. This means that policies with the same set of rules can reuse the underlying matcher, which saves memory since long regexes tend to occupy a lot of it.

This also makes the LRU cache for compiled regexes more or less redundant, mitigating the drawbacks of that cache.
Having a large LRU cache is useful when a lot of similar policies are in use at the same time. However, when a node has only a small set of policies inserted at any one time but churn makes new and distinct policies come and go, the LRU cache uses far more memory than its benefit justifies.

benchmark TL;DR:
Heap usage compared to an LRU size of 128 is down by ~99%, and ~92% with an LRU size of 1024, with the given test data*.

* defining heap usage as (HeapInuse after test) - (HeapInuse before setup)

    $ benchstat cache-1024-entries.txt pr-21895-without-cache.txt
    name                             old time/op          new time/op          delta
    _perEPAllow_setPortRulesForID-7  83.7s ±17%           77.4s ± 8%           ~        (p=0.310 n=5+5)

    name                             old B(HeapInUse)/op  new B(HeapInUse)/op  delta
    _perEPAllow_setPortRulesForID-7  1.79G ± 0%           0.10G ± 1%           -94.55%  (p=0.008 n=5+5)

    name                             old alloc/op         new alloc/op         delta
    _perEPAllow_setPortRulesForID-7  78.6GB ± 0%          78.6GB ± 0%          -0.00%   (p=0.008 n=5+5)

    name                             old allocs/op        new allocs/op        delta
    _perEPAllow_setPortRulesForID-7  167M ± 0%            167M ± 0%            -0.01%   (p=0.008 n=5+5)

    $ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x ./pkg/fqdn/dnsproxy

    // With this change;
    go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy
    goos: linux
    goarch: amd64
    pkg: github.com/cilium/cilium/pkg/fqdn/dnsproxy
    cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
    Benchmark_perEPAllow_setPortRulesForID_large
    Before Setup (N=1,EPs=20,cache=128)
    Alloc = 3 MiB HeapInuse = 5 MiB Sys = 20 MiB NumGC = 6
    Before Test (N=1,EPs=20,cache=128)
    Alloc = 37 MiB HeapInuse = 49 MiB Sys = 1031 MiB NumGC = 18
    After Test (N=1,EPs=20,cache=128)
    Alloc = 52 MiB HeapInuse = 62 MiB Sys = 1031 MiB NumGC = 53
    Benchmark_perEPAllow_setPortRulesForID_large-7 1 4289436000 ns/op 2559832616 B/op 34303565 allocs/op
    PASS
    ok github.com/cilium/cilium/pkg/fqdn/dnsproxy 12.574s

    // Baseline with LRU size of 1024
    goos: linux
    goarch: amd64
    pkg: github.com/cilium/cilium/pkg/fqdn/dnsproxy
    cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
    Benchmark_perEPAllow_setPortRulesForID_large
    Before Setup (N=1,EPs=20,cache=1024)
    Alloc = 3 MiB HeapInuse = 5 MiB Sys = 20 MiB NumGC = 6
    Before Test (N=1,EPs=20,cache=1024)
    Alloc = 37 MiB HeapInuse = 49 MiB Sys = 969 MiB NumGC = 18
    After Test (N=1,EPs=20,cache=1024)
    Alloc = 312 MiB HeapInuse = 357 MiB Sys = 969 MiB NumGC = 46
    Benchmark_perEPAllow_setPortRulesForID_large-7 1 5990697400 ns/op 3470756528 B/op 36068464 allocs/op
    PASS
    ok github.com/cilium/cilium/pkg/fqdn/dnsproxy 11.813s

    // Baseline with LRU size of 128
    goos: linux
    goarch: amd64
    pkg: github.com/cilium/cilium/pkg/fqdn/dnsproxy
    cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
    Benchmark_perEPAllow_setPortRulesForID_large
    Before Setup (N=1,EPs=20,cache=128)
    Alloc = 3 MiB HeapInuse = 5 MiB Sys = 19 MiB NumGC = 5
    Before Test (N=1,EPs=20,cache=128)
    Alloc = 38 MiB HeapInuse = 49 MiB Sys = 1047 MiB NumGC = 17
    After Test (N=1,EPs=20,cache=128)
    Alloc = 1763 MiB HeapInuse = 1907 MiB Sys = 2918 MiB NumGC = 44
    Benchmark_perEPAllow_setPortRulesForID_large-7 1 81058653300 ns/op 10328100808 B/op 46896658 allocs/op
    PASS
    ok github.com/cilium/cilium/pkg/fqdn/dnsproxy 93.574s

Signed-off-by: Odin Ugedal <odin@uged.al>
Signed-off-by: Odin Ugedal <ougedal@palantir.com>

dnsproxy: stop compiling regexes in json marshalling

There is no need to unmarshal regex patterns directly into compiled regexes. Instead, marshal the pattern into a string that we later compile in the dnsproxy to avoid duplicate work. This ensures that we only compile two equal patterns once, saving both memory and CPU time during startup.

Function-wise this is a no-op, as long as the regexes are valid. If they are invalid, we now unmarshal successfully into a string and gracefully ignore the provided regex if we are unable to compile it. Previously this was handled in the json unmarshaller.
Signed-off-by: Odin Ugedal <ougedal@palantir.com>

aws/eni: create ENI in known subnet by default

In the ENI mode of IPAM, when all IPs of an ENI are used, a new ENI is created to provide more IPs. Cilium chooses the subnet for this ENI based on SubnetIDs and SubnetTags, but if none are configured it needs to choose a subnet nonetheless. In this fallback case, the heuristic used was to choose the subnet in the same VPC and Availability Zone with the most available addresses. This led to surprising results, however, since the first ENI and future ENIs were not necessarily in the same subnet.

Instead, as part of node discovery, remember the subnet of the primary ENI as part of the Spec and try to create future ENIs in it as well, as long as there are sufficient addresses available.

Note that explicitly configured subnet IDs and/or tags still take precedence; this changes the unconfigured default to be a bit more reasonable.

Fixes: #20553
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

aws/eni: fix cilium operator crash on IPv6 ENI

Cilium operator would crash when being brought up in an AWS region where there was an IPv6-only ENI and no subnet filters, because it would fail to parse the ENI (logs will show "ENI has no IP address" and "Initial synchronization with instances API failed").

We work around this issue for the moment by filtering the network interfaces we fetch from AWS with 'private-ip-addresses=*', which includes all ENIs with any value in the PrivateIpAddress field. This is the field `parseENI` complains about otherwise.

In general, though, it seems that the ENI IPAM mode needs to learn to handle IPv6 ENIs. That will not be a small undertaking, so we fix the obvious bug for now.
Co-authored-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

docs: state that IPv6-only ENIs are unsupported

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

test/control-plane: Add nil checks to shutdown logic

The `agentHandle.tearDown` function always assumes it is called after its `hive` and `d` fields are set. The call to this shutdown logic is deferred. If we encounter an error before setting these fields, the deferred logic would cause a nil pointer dereference panic which masks the original error. See #22224.

This commit adds nil checks so `tearDown` can always be called.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

mtu, node: fix build on all non-linux platforms

This package is imported as a transitive dependency in cilium-cli which is built for linux, darwin and windows. Make sure the package compiles on all these platforms.

Ref. https://github.com/cilium/cilium-cli/pull/958#issuecomment-1318745049
For #16843
Signed-off-by: Tobias Klauser <tobias@cilium.io>

.github/workflows: split the image tag update in two steps

If the images are not created, because they are already available in the docker image repository, they will have an empty image digest set and the image tag replacement will wrongly use this empty digest.

Fixes: c5a778723a43 ("add auto-commit capability to build base images GH workflow")
Signed-off-by: André Martins <andre@cilium.io>

build(deps): bump github.com/spf13/viper from 1.13.0 to 1.14.0

Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.13.0 to 1.14.0.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.13.0...v1.14.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>

chore(deps): update docker.io/library/golang:1.19.3 docker digest to dc76ef0

Signed-off-by: Renovate Bot <bot@renovateapp.com>

workflows: aks: enable debug

This commit enables debug logging for all AKS workflows, which should help debug some flaky tests.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>

pkg/option: add flag for toggling stale CEP cleanup.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>

daemon: add indexer to CES watcher which indexes local CES.

Added support for an indexing informer in k8s/watchers, as well as a custom indexer func which allows maintaining an index on CESs containing local endpoints by their underlying endpoint names.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>

daemon: add cleanup for stale local ciliumendpoints.

It's possible for CiliumEndpoints to become stale where they still reference existing Pods that are no longer being managed by Cilium. In this scenario, the operator will not GC these CEPs as they have a valid pod owner reference.

This commit adds an init cleanup which cleans up stale CEPs. As well, the CEP/CES K8s watchers will mark such CEPs for deletion and a controller GC routine will periodically GC the old CEPs.

Fixes #17631
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>

daemon/cmd: make CES cleanup behaviour explicit.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>

k8s/watchers: prevent panic when cep has no network status.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>

daemon/cmd: cleanup, remove superfluous sprintf.
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>

cli: Remove yaml parser from policy trace

Signed-off-by: Rushikesh Butley <rushikeshbutley@gmail.com>

policy/trace: Remove unused yaml parsing function

Signed-off-by: Rushikesh Butley <rushikeshbutley@gmail.com>

policy/trace: Remove redundant files

Signed-off-by: Rushikesh Butley <rushikeshbutley@gmail.com>

docs: Clarify wildcards and subdomains in FQDN policies

Signed-off-by: flxman <felix.farjsjo@gmail.com>

docs: Update API rate limiter metrics to match style of other metrics

We do this by removing the extraneous "cilium_" prefix from the metrics to align with the other metric names in this file.

Signed-off-by: Chris Tarazi <chris@isovalent.com>

docs: Fix incorrect FQDN metrics which are disabled by default

These metrics were incorrectly documented as enabled by default, which confused users. Fix the docs to mention that they are disabled by default and must be enabled explicitly via --metrics.

Fixes: 1133bd5d30 ("docs: Added `Default` column in metrics details")
Fixes: https://github.com/cilium/cilium/pull/20255
Signed-off-by: Chris Tarazi <chris@isovalent.com>

Fix broken documentation URL in helm chart template

The link to the OSS documentation for policy-enforcement-modes is incorrect in the helm chart template. This is a minor fix to point to the correct documentation URL.

Signed-off-by: Navin Kukreja <navin.kukreja@isovalent.com>
Co-authored-by: Raphaël Pinson <raphael@isovalent.com>

bpf: egressgw: clarify IPSec key for tunnel encapsulation

The `encrypt_key` in handle_ipv4_from_lxc() is obtained from an IPCache lookup for the packet's `daddr`. It doesn't make sense to use this key in the context of redirecting EgressGW traffic - here the tunnel's remote endpoint is not `daddr`, but an EgressGW node.

As EgressGW and IPSec are currently mutually exclusive, we can just hard-code this parameter to 0 for now. In the future we would need to look up the IPSec key of the selected EgressGW node.
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>

build(deps): bump github.com/spf13/cobra from 1.5.0 to 1.6.1

Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.5.0 to 1.6.1.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Commits](https://github.com/spf13/cobra/compare/v1.5.0...v1.6.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/cobra
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

build(deps): bump github/codeql-action from 2.1.30 to 2.1.32

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.30 to 2.1.32.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/github/codeql-action/compare/18fe527fa8b29f134bb91f32f1a5dc5abb15ed7f...4238421316c33d73aeea2801274dd286f157c2bb)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

pkg/k8s: fallback on retrieving CiliumNode from kube-apiserver

Retrieving objects from caches can be useful to avoid needless requests to kube-apiserver. In the unlikely event that the object doesn't exist in the local cache, Cilium can try to retrieve it from kube-apiserver directly.

For this particular case, with CiliumNode, Cilium exits with a fatal error because it is unable to retrieve the CiliumNode from the cache due to subsystem initialization issues; thus we fall back on retrieving the object directly from kube-apiserver.

In this case, the subsystem initialization issue happened because the CiliumNode watcher is blocked on its event handler by the egressGatewayManager [1], which is blocked by the initialization of the identity allocator [2].
Unfortunately, the identity allocator is only initialized at a later stage, preventing the CiliumNode cache from being populated with all of its nodes.

[1] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/k8s/watchers/cilium_node.go#L56
[2] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/egressgateway/manager.go#L83

Fixes: 69e4c6974891 ("k8s: optimize API calls made to kube-apiserver")
Signed-off-by: André Martins <andre@cilium.io>

operator: fix CEP GC

When CEP was converted to an internal CEP structure, the UID field was not copied, causing the delete requests of CEPs to have their UID precondition set as empty. When kube-apiserver received this delete request it didn't delete the CEP because an empty CEP UID didn't match an existing UID.

Fixes: 6f7bf6c51f7a ("Prevent CiliumEndpoint removal by non-owning agent")
Reported-by: Bruno Custódio <bruno@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>

docker: Do not specify syntax

Not specifying the syntax starts builds faster, but relies on the default syntax being recent enough. This is currently the case, so remove the syntax references.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

images: update cilium-{runtime,builder}

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>

bugtool: Fix URL to blog.ralch.com

Signed-off-by: yanggang <gang.yang@daocloud.io>
Signed-off-by: Joe Stringer <joe@cilium.io>

docs: fix deployment resource type output

Since k8s removed support for the extensions/v1beta1 API version after 1.16, we should update the docs to the latest stable version.

Signed-off-by: cleverhu <shouping.hu@daocloud.io>

pkg/k8s: do not read k8s node annotations if they are not written

When there is an annotation in the k8s node object, the annotation `io.cilium.network.ipv4-cilium-host` is used as the CiliumInternal IP address of the CiliumNode object in [1].
Whenever Cilium updates any state in the CiliumNode, it retrieves all IP addresses from the k8s node, including the ones from annotations, and appends the local node's IP addresses, including the newly correct internal / router IP address, in [2]. Since this is a list, the annotation's IP address is always used first and all other Cilium agents will wrongly use it for any operation.

[1] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L453-L459
[2] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L474-L489

Fixes: 73d6cae2c906 ("install: default AnnotateK8sNode to false")
Signed-off-by: André Martins <andre@cilium.io>

pkg/nodediscovery: do not use Node annotations when mutating CiliumNode

When using CiliumNode, the agent's source of truth should be the agent itself and not k8s node annotations. Thus we will not use the annotations for the CiliumInternalIP address when generating a CiliumNode from the k8s Node resource.

Signed-off-by: André Martins <andre@cilium.io>

test: Fail on router IP mismatch warnings

We try to restore the router IP both from the filesystem (first) and from Kubernetes objects (as a fallback). If the two IP addresses don't match, we emit a warning.

There is no good reason for this to happen in CI so we should fail the test if that warning ever shows up. Doing so would have prevented the flake fixed by the previous commit.

Signed-off-by: Paul Chaignon <paul@cilium.io>

.github: Explicitly set build-commits job runner image version

github: Install libtinfo5 for clang in build-commits CI job

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>

docs: Update Cilium Sphinx RTD Theme reference

This updates Documentation/requirements.txt to reference a new commit hash on the theme's v1.0 branch. This will trigger an RTD build.
Signed-off-by: Stacy Kim <stacy.kim@ucla.edu>

gha: Pin ubuntu-20.04 for conformance-test-ipv6

This commit is to avoid ubuntu version drift for the runner until the proper version upgrade is done.

Signed-off-by: Tam Mach <tam.mach@cilium.io>

.github: fix bpf-checks on ubuntu-latest runner

Take the same approach as in 5f7aa03fcc7b (".github: Explicitly set build-commits job runner image version").

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>

relay: Add Go runtime metrics and process metrics

Currently the agent has a GoCollector and ProcessCollector but relay does not; this updates the relay for consistency and enhanced debuggability.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>

daemon/cmd: Fix error handling for getting proxy port

The error check handling should be done immediately after the GetProxyPort() call, in order to error out as soon as possible. This unchecked error can cascade to code integrations with the Agent and cause potentially difficult to track down behavior.

Signed-off-by: Chris Tarazi <chris@isovalent.com>

build(deps): bump go.etcd.io/etcd/client/pkg/v3 from 3.5.5 to 3.5.6

Bumps [go.etcd.io/etcd/client/pkg/v3](https://github.com/etcd-io/etcd) from 3.5.5 to 3.5.6.
- [Release notes](https://github.com/etcd-io/etcd/releases)
- [Changelog](https://github.com/etcd-io/etcd/blob/main/Dockerfile-release.amd64)
- [Commits](https://github.com/etcd-io/etcd/compare/v3.5.5...v3.5.6)

---
updated-dependencies:
- dependency-name: go.etcd.io/etcd/client/pkg/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

build(deps): bump go.etcd.io/etcd/api/v3 from 3.5.5 to 3.5.6

Bumps [go.etcd.io/etcd/api/v3](https://github.com/etcd-io/etcd) from 3.5.5 to 3.5.6.
- [Release notes](https://github.com/etcd-io/etcd/releases)
- [Changelog](https://github.com/etcd-io/etcd/blob/main/Dockerfile-release.amd64)
- [Commits](https://github.com/etcd-io/etcd/compare/v3.5.5...v3.5.6)

---
updated-dependencies:
- dependency-name: go.etcd.io/etcd/api/v3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

docs: add instructions to build the base images from external forks

When opening a PR to update the base images from external forks, the bot does not have the necessary permissions to push the changes into the fork. For those cases the developer should amend the commit locally and push the changes themselves.

Fixes: c5a778723a43 ("add auto-commit capability to build base images GH workflow")
Signed-off-by: André Martins <andre@cilium.io>

Revert "relay: Add Go runtime metrics and process metrics"

This reverts commit f0fa683870e1030707ed01b4d4b23b57b2d5c6a8.

It appears to introduce a double-initialization of metrics, causing relay initialization failures.

Signed-off-by: Joe Stringer <joe@cilium.io>

daemon/policy: Reduce overhead of policy deletion

This reduces the overhead of deleting policies, since it will now only loop through the policies in the repository once instead of twice.

We originally found this when some of our clusters started having networking problems where legitimate traffic was randomly dropped on pod startup. After a while, we tracked it down to the main cilium event loop having a bad time, and due to CPU contention, it was unable to keep up with the creation and deletion of policies in the cluster.

We grabbed a pprof, and realized that the biggest users of CPU time were "(*Daemon) policyAdd" and "(*Daemon) policyDelete".
Overall, we would have expected them to be ~equally costly, and when looking at why, we saw that "(*Daemon) policyDelete" was effectively spending double the amount of CPU time, and that it was calling both "(*Repository) SearchRLocked" and "(*Repository) DeleteByLabelsLocked" for every policy delete; and that they were both ~equally expensive.

After some more investigation, we realised that we could omit the call to "(*Repository) SearchRLocked".

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>

maps/ipcache: add key.Prefix

Same as Key.IPNet, but returns a netip.Prefix instead of *net.IPNet. This will be used in a successive commit.

Signed-off-by: Tobias Klauser <tobias@cilium.io>

daemon: convert Daemon.restoredCIDRs to netip.Prefix

This avoids conversions to/from net.IPNet when populating and accessing the restored CIDRs.

Signed-off-by: Tobias Klauser <tobias@cilium.io>

ip: remove unused IPNetToPrefix

This helper function is now unused, remove it.

Signed-off-by: Tobias Klauser <tobias@cilium.io>

ip: remove deprecated and unused GetCIDRPrefixesFromIPs

Last remaining use was removed in commit bbcadc43758b ("treewide: Switch policy CIDR handling to netip").

Signed-off-by: Tobias Klauser <tobias@cilium.io>

operator: Fix bucket width for CEP histogram to the documented values

In the CiliumEndpointSliceDensity histogram, the buckets configuration option was unset, so defaults were used (they have values from 0 to 10 as seen here: https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables). This change makes the width of the buckets 10 as documented.

BUG=254474623
Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>

operator: Adjust buckets for CEP queue delay histogram

In the CiliumEndpointSliceQueueDelay histogram, the buckets configuration option was unset, so defaults were used (https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables).
This change doubles the number of buckets and increases the end of the last bucket to 1 hour as values larger than this can be observed in large clusters.

Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>

operator: Use hand picked bucket values

It gives nicer looking values than the computed version.

Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>

workflows: aks: enable connectivity test debug logs

Signed-off-by: Gilberto Bertin <jibi@cilium.io>

workflows: aks: collect sysdumps for each failing test

Signed-off-by: Gilberto Bertin <jibi@cilium.io>

options: Disable force-local-policy-eval-at-source by default

The force-local-policy-eval-at-source flag was introduced in commit c525c755 ("bpf: Continue to enforce policy at source endpoint unless disabled"). It is enabled by default and causes Cilium to always enforce policies at the source when the destination is a local pod.

Unfortunately, this flag is also causing issues when both endpoint routes and tunneling are enabled [1] (a configuration that was not possible at the time the flag was introduced).

We have enough test coverage (L7 on multiple cloud providers) now to be able to safely disable this flag by default. We can remove it after a couple releases.

1 - https://github.com/cilium/cilium/issues/14657
Signed-off-by: Paul Chaignon <paul@cilium.io>

daemon: Deprecate force-local-policy-eval-at-source

This should never have been exposed to users in the first place. It also causes issues when set to true, as explained in the previous commit. There are other ways to control if policy enforcement happens at the source or not (enable-endpoint-routes).

Signed-off-by: Paul Chaignon <paul@cilium.io>

bugtool: add missing bpftool vtep map dump

add missing bpftool vtep map dump in cilium bugtool

Signed-off-by: Vincent Li <v.li@f5.com>

workflows: aks: bump timeout to 60m

Some test runs are timing out as each of the 2 connectivity test runs takes about 18/19 minutes. So bump the timeout to 1 hour.
Signed-off-by: Gilberto Bertin <jibi@cilium.io>

operator: preallocate cnp list backing array

The number of returned CiliumClusterwideNetworkPolicies is known in advance, so preallocating the backing array avoids reallocations during the appends.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

operator: fix typos in CNP node status gc

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

operator: use GC controller context while patching CNPs

Use the context from the GC controller to execute the update queries. This way, any pending queries are cancelled as soon as the controller context is cancelled.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

operator: clear CNP status nodes if updates disabled

When the option `disable-cnp-status-updates` is set to true, no policy enforcement update is tracked in CiliumNetworkPolicies. However, if the option was previously set to false, the field status.nodes still contains the last status of each node when the feature was turned off.

Currently, the GC in the cilium operator removes status entries only if the corresponding node has been turned off. Given that these stale updates may hinder scalability for large clusters, we clean up all those entries at startup if `disable-cnp-status-updates` is set to true.

Fixes #20231
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

operator: add a flag to skip CNP status cleaning at startup

When the option `disable-cnp-status-updates` is set to true, the operator, at startup, will garbage collect all stale status nodes updates in CNPs and CCNPs. This new option `skip-cnp-status-startup-cleaning` may be used to skip this clean up so as to speed up the operator startup.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

operator: rate limit CNP nodes status clean up

When the option `disable-cnp-status-updates` is set to true, the operator, at startup, will garbage collect all stale status nodes updates in CNPs and CCNPs.
To avoid an excessive request rate to the API server, the clean up is rate limited. The request rate per second and the maximum allowed burst of requests are controlled, respectively, by the two new options `cnp-status-cleanup-qps` and `cnp-status-cleanup-burst`.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

chore: fix typo in enableCNPWatcher comment

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>

helm: Do not create Grafana dashboards by default

The default in #21181 was true, but not everyone uses Grafana, and a comment in the previous PR already pointed out that it can cause trouble with the cilium upgrade preflight manifest.

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>

test: remove kube-proxy-replacement: probe from upstream tests

This option was removed by 691f1c33c9ad and broke all upstream tests. This commit removes this setting as well to make the tests pass.

As some tests are failing because KPR is now disabled, we need to set sessionAffinity=true to make the relevant session affinity conformance tests pass.

Fixes: 691f1c33c9ad ("daemon: Remove KPR=probe")
Signed-off-by: André Martins <andre@cilium.io>

k8s/slim: Add missing fields needed by LB-IPAM

This adds:
- metav1.Condition, metav1.ConditionStatus
- metav1.ObjectMeta.Generation
- corev1.IPFamilyPolicy
- corev1.IPFamilyPolicyType
- corev1.LoadBalancerClass
- corev1.Service.{IPFamilyPolicy, LoadBalancerClass}
- corev1.ServiceStatus.Condition

Signed-off-by: Jussi Maki <jussi@isovalent.com>

k8s/resource: Expose the underlying cache.Store in Store[T]

To make it easier to partially transition to using Resource[T], expose the underlying cache.Store. Hopefully temporary :fingerscrossed:.

Signed-off-by: Jussi Maki <jussi@isovalent.com>

k8s: Rename and reuse BGP IP Pool

This commit renames the CiliumBGPLoadBalancerIPPool CRD to CiliumLoadBalancerIPPool so it may be used for load balancers other than those that use BGP.
The IP Pool will be used by the operators LB IPAM component, and the contents of the CRD have been updated to match the new requirements. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> operator: Add LB-IPAM This commit adds the LB-IPAM feature. LB-IPAM allows users to specify a set of pools containing one or more CIDRs. Services of type LoadBalancer will receive Ingress IPs from these pools. LB-IPAM is part of the ongoing work to add service announcements to the BGP Control Plane. However, the component is designed to be generic so it can be used by other features as well. Co-authored-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> docs: Add LB-IPAM documentation This commit adds documentation for the LB-IPAM feature. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> test: Move log-gatherer image to Quay Some CI jobs are failing because we are getting rate-limited on docker.io for the log-gatherer image. André copied it to Quay and we can now use that instead of docker.io. Signed-off-by: Paul Chaignon <paul@cilium.io> hubble/metrics: Fix label ordering in Hubble TCP metrics The code setting the flag label value assumes that it's the first label in the slice. If context options are enabled, then it's not true, so one of the context labels incorrectly gets the flag value, and the flag label gets discarded. Fixes: d4d73681026b ("hubble/metrics: Replace panic in contextLabels with error log") Signed-off-by: Anna Kapuscinska <anna@isovalent.com> preflight: Fail 'validate-cnp' check for empty to/from endpoints selector Previously, 'validate-cnp' preflight check would log a verbose warning if it detected a CCNP with an empty toEndpoints/fromEndpoints selector and pass the check with the following output: time="2022-11-03T15:50:04Z" level=info msg="Validation OK!" CiliumClusterwideNetworkPolicy=test-empty-endpointselector time="2022-11-03T15:50:04Z" level=info msg="All CCNPs and CNPs valid!" 
This could be misleading and tempt the user to ignore the warning. The preflight check will now fail with the following output: time="2022-11-03T16:05:30Z" level=error msg="Unexpected validation error" CiliumClusterwideNetworkPolicy=test-empty-endpointselector error="use of empty toEndpoints/fromEndpoints selector" time="2022-11-03T16:05:30Z" level=error msg="Start hook failed" error="Found invalid CiliumClusterwideNetworkPolicy" function="cilium/cmd.validateCNPCmd.func1.1 (preflight_k8s_valid_cnp.go:41)" subsys=hive time="2022-11-03T16:05:30Z" level=info msg="Stop hook executed" duration="21.858µs" function="pkg/k8s/client.(*compositeClientset).onStop-fm (<autogenerated>:1)" subsys=hive time="2022-11-03T16:05:30Z" level=fatal msg="failed to start: Found invalid CiliumClusterwideNetworkPolicy" Fixes: #17471 Signed-off-by: Tim Horner <timothy.horner@isovalent.com> doc: fixed broken doc link in helm chart Signed-off-by: David Calvert <david@0xdc.me> resource: Fix queue entry coalescing The entries added to the resource's workqueue were added as pointers which messes up the comparisons causing coalescing to not happen. This causes TestResource_Retries to flake sometimes with: Error: Not equal: expected: 5 actual : 10 Test: TestResource_Retries Messages: expected to see 5 retries for update What happens is that the key gets requeued as &updateEntry{key}, which doesn't match with the previous one &updateEntry{key}, so it's effectively a new entry with it's own rate limiting and retry count state and thus we end up seeing more retries than expected. This fixes the issue by adding the entries by value. The comparisons of syncEntry and updateEntry are now trivially correct. The deleteEntry carries the pointer to the last known state of the deleted object, but this is fine since there can only be one such object. 
Signed-off-by: Jussi Maki <jussi@isovalent.com> build(deps): bump go.etcd.io/etcd/client/v3 from 3.5.5 to 3.5.6 Bumps [go.etcd.io/etcd/client/v3](https://github.com/etcd-io/etcd) from 3.5.5 to 3.5.6. - [Release notes](https://github.com/etcd-io/etcd/releases) - [Changelog](https://github.com/etcd-io/etcd/blob/main/Dockerfile-release.amd64) - [Commits](https://github.com/etcd-io/etcd/compare/v3.5.5...v3.5.6) --- updated-dependencies: - dependency-name: go.etcd.io/etcd/client/v3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> build(deps): bump github.com/hashicorp/consul/api from 1.15.3 to 1.17.0 Bumps [github.com/hashicorp/consul/api](https://github.com/hashicorp/consul) from 1.15.3 to 1.17.0. - [Release notes](https://github.com/hashicorp/consul/releases) - [Changelog](https://github.com/hashicorp/consul/blob/main/CHANGELOG.md) - [Commits](https://github.com/hashicorp/consul/compare/api/v1.15.3...api/v1.17.0) --- updated-dependencies: - dependency-name: github.com/hashicorp/consul/api dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Envoy: Upgrade for SNI enforcement Update Envoy image with: - websocket filters (cilium.network.websocket.client and cilium.network.websocket.server) - use upstream destination address for egress policy enforcement only if listener is an L7 LB listener. This allows listener to tunnel pod traffic while the original destination address is used for policy enforcement rather than the tunnel destination address. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> policy-api: Use Len for IsEmpty Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> policy: Use generated DeepEqual() in PerSelectorPolicy.Equal() Use generated DeepEqual() in PerSelectorPolicy.Equal() instead of reflection. 
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> policy: Factor out L7ParserType.Merge() Factor out the merging logic of L7ParserTypes and add a unit test. This makes adding new types with more complex merging logic easier in the future. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> policy: Fix comments Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> policy: Allow TLS termination and origination without L7 rules Add new L7ParserType "tls" to be used when TLS termination and/or origination is needed, and when no L7 policy is to be used. Use Envoy TCP proxy for TLS termination and/or origination in this case. artii.herokuapp.com is no more, so tests against it fail. Remove them and unquarantine the TLS test. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> envoy: Add TLS filter chains for TCP proxy Add TLS filter chains so that TLS can be used also with TCP proxy. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> proxylib: Do not log raw policies Policies may contain large sets of TLS certificates, avoid polluting the logs with them. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> envoy: Do not set AutoSNI options Cilium filters already set SNI when available, and Envoy may crash if auto_sni option is used in this case. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> build(deps): bump golang.org/x/tools from 0.2.0 to 0.3.0 Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.2.0 to 0.3.0. - [Release notes](https://github.com/golang/tools/releases) - [Commits](https://github.com/golang/tools/compare/v0.2.0...v0.3.0) --- updated-dependencies: - dependency-name: golang.org/x/tools dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> helm: Add secret permission for agent This commit is to make sure that cilium agent has required secret permission if gateway api (but not Ingress) is enabled. 
The original commit 759f7161a925b4e837338bd5c667c1abd8e59452 added the same logic for operator, but missed out agent part. The end-goal is to have ingress and gateway api as independent features, so that users can just enable only what they need. Without this change, gateway API will only work if and only if ingressController.enabled is set and default secret namespace is used (e.g. cilium-secrets). Relates: 759f7161a925b4e837338bd5c667c1abd8e59452 Signed-off-by: Tam Mach <tam.mach@cilium.io> Make fsnotify event more readable. Signed-off-by: yanggang <gang.yang@daocloud.io> datapath: remove unused ENCRYPT_NODE macro It's safe to remove this unused macro. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> bpf: nodeport: reduce scope of macaddr variables The macaddr variables are only needed when updating the neighbour map. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> bpf: nodeport: fine-tune path for delivery to local backend When delivering a packet to its selected backend, we already have a check for whether the backend is local. Also use this path when deciding whether the packet should be passed up to the stack. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> bpf: lb: remove direction argument in lb*_extract_key() It's always CT_EGRESS. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> build(deps): bump google.golang.org/grpc from 1.50.1 to 1.51.0 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.50.1 to 1.51.0. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.50.1...v1.51.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> .clomonitor: Update CLOMonitor checks exemptions Add dangerous workflow, signed releases and token permissions checks to CLOMonitor exemptions. 
Signed-off-by: Sandipan Panda <samparksandipan@gmail.com> ingestion/gateway-api: Map backend weight to model This commit is to make sure the weightage value is propagated to internal model. Relates: 58c8aff11062f944e9f3a18569c647c64edd1bc9 Reported-by: Nico Vibert <nicolas.vibert@isovalent.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> Update k8s tests and libraries to v1.26.0-rc.0 Upstream changes included changing Ingress.LoadBalancerStatus from corev1.LoadBalancerStatus to networkingv1.IngressLoadBalancerStatus. This required the addition of 2 new factory funcs to convert slim.LoadBalancerIngress to networkingv1.IngressLoadBalancerIngress and another to convert LoadBalancerStatus to IngressLoadBalancerStatus in the slim client. See: https://github.com/kubernetes/kubernetes/pull/106242 Signed-off-by: Tim Horner <timothy.horner@isovalent.com> ignore auto-generated pkg/k8s/client directories for PR reviews and codeownership Signed-off-by: Tim Horner <timothy.horner@isovalent.com> ctmap: Add missing FromL7LB flag 'FromL7LB' was not added for string conversion when it was added to the map, do it now. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Some tofqdn flags not being parsed Signed-off-by: Carlos Castro <carlos.castro@jumo.world> fqdn: dnsproxy: fix forwarding of the security identity for cluster mesh The commit 44c1def67854 wrongly forwarded only lower 16 bits of the original identity. This might corrupt identities when cluster-id is not zero (as the cluster-id is encoded in bits 16..23 of the identity) and leads to policy drops due to unknown identity, e.g. xx drop (Policy denied) flow 0xd1a7add4 to endpoint 3966, file bpf_lxc.c line 2032, , identity 47657->157516: 10.2.3.223:55853 -> 10.2.3.206:53 udp (Here the security identity 47657 doesn't exist, as it should actually be equal to 0x10000|47657 = 113193.) 
Fix this by also storing bits 16..23 of the identity in the skb mark according to the datapath ABI, i.e., skb mark should be equal to (id << 16) | (id >> 16). Fixes: 44c1def67854 ("fqdn: dnsproxy: forward the original security identity") Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Anton Protopopov <aspsk@isovalent.com> fqdn: dnsproxy: fix forwarding of the original security identity for TCP In case of TCP this is not enough to do net.Dial + setsockopt(SO_MARK), as in this case TCP SYN will have a wrong identity, e.g.: Policy verdict log: flow 0x7a95a133 local EP ID 393, remote ID 14616, proto 6, egress, action redirect, match L3-L4, 10.244.1.122:42437 -> 10.244.1.120:53 tcp SYN Policy verdict log: flow 0x907eaa19 local EP ID 458, remote ID host, proto 6, ingress, action allow, match L3-Only, 172.19.0.2:56276 -> 10.244.1.120:53 tcp SYN Here the second message has wrong identity (host). We still allow the traffic, as the origin is local host and the coredns is running on the same host, but this will not work for a remote host if ingress policy doesn't allow remote-node identity.) To fix this we need to pass a Control parameter to Dial, so that setsockopt(2) is called before the connect(2). With such a change we now see the correct identity in case of TCP: Policy verdict log: flow 0xeb7902a9 local EP ID 393, remote ID 14616, proto 6, egress, action redirect, match L3-L4, 10.244.1.122:36661 -> 10.244.1.120:53 tcp SYN Policy verdict log: flow 0x4efbc5a0 local EP ID 458, remote ID 41903, proto 6, ingress, action allow, match L3-L4, 172.19.0.2:40508 -> 10.244.1.120:53 tcp SYN Fixes: 44c1def67854 ("fqdn: dnsproxy: forward the original security identity") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> test: Remove flaking test Remove new part of TLS test that keeps flaking in most PRs. Will be added back when flaking is resolved. 
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> docs: clarifications about CNCF maintainer status Signed-off-by: Liz Rice <liz@lizrice.com> cilium, monitor: Add nat46 and nat64 drop reason Both were missing, so lets fix that. Reported-by: Joe Stringer <joe@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> cilium, monitor: Add regenerated flow api code Generated code around NAT46/64 drop reason from `make generate-hubble-api`. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> bpf: Also include ICMP traffic for L3-based NAT46/64 Our NAT46x64 engine can handle ICMP/ICMP6 packets for certain types like ICMP_ECHO, ICMP_ECHOREPLY, ICMP_DEST_UNREACH, ICMP_TIME_EXCEEDED and ICMP_PARAMETERPROB. Therefore, consider them under stateless NAT. Example with GW under XDP and tc BPF: [...] 12:13:26.269252 IP6 64:ff9b::101:102 > 64:ff9b::c0a8:20c: ICMP6, echo request, seq 1, length 64 12:13:26.269916 IP 1.1.1.2 > 192.168.2.12: ICMP echo request, id 9, seq 1, length 64 12:13:26.269950 IP 192.168.2.12 > 1.1.1.2: ICMP echo reply, id 9, seq 1, length 64 12:13:26.270582 IP6 64:ff9b::c0a8:20c > 64:ff9b::101:102: ICMP6, echo reply, seq 1, length 64 [...] Reported-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> bpf: For stateless nat both src and dst IPv6 addresses must have prefix Extend the check for both src and dst to require the prefix, and only then perform stateless NAT46x64. Reason is that when just the IPv6 dst has it, then we want to perform stateful NAT instead. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> helm: Add relabelings config to ServiceMonitors This is needed to add the node as a label to metrics, or other service discovery meta labels. 
Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> helm: Configure node label in cilium/hubble relabelings by default Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> fix: correct parsing of multi-option 'key:value's for config options This fixes support for multi-option 'key:value's used for config options when only a single top-level key-value is provided, such as '--api-rate-limit endpoint-create=rate-limit:2/s,rate-burst:4'. Fixes: #22233 Fixes: 070ded019adb ("cmd: Allow more complicated patterns in map string type.") Signed-off-by: Tim Horner <timothy.horner@isovalent.com> operator: move API server shutdown channel to cell constructor The operator API server requires a channel parameter to be used for cancellation. This channel is currently declared as a package-level variable and thus it is initialized only once at package import time. Instead, the channel is closed in a clenup function, every time the operator is about to stop. This beahvior leads to a panic when running controlplane unit tests that need to start and stop the operator repeatedly: panic: close of closed channel [recovered] panic: close of closed channel goroutine 73 [running]: testing.tRunner.func1.2({0x2e1e240, 0x396b1e0}) /usr/local/go/src/testing/testing.go:1396 +0x24e testing.tRunner.func1() /usr/local/go/src/testing/testing.go:1399 +0x39f panic({0x2e1e240, 0x396b1e0}) /usr/local/go/src/runtime/panic.go:884 +0x212 github.com/cilium/cilium/operator/cmd.doCleanup() /home/pippolo/go/src/github.com/cilium/cilium/operator/cmd/root.go:189 +0x3a github.com/cilium/cilium/operator/cmd.registerOperatorHooks.func2({0x7fbf9059f2e0?, 0xc0018dd940}) /home/pippolo/go/src/github.com/cilium/cilium/operator/cmd/root.go:130 +0x65 github.com/cilium/cilium/pkg/hive.Hook.Stop(...) 
/home/pippolo/go/src/github.com/cilium/cilium/pkg/hive/lifecycle.go:41 github.com/cilium/cilium/pkg/hive.(*DefaultLifecycle).Stop(0xc0005286c0, {0x39add48?, 0xc0000780a8?}) /home/pippolo/go/src/github.com/cilium/cilium/pkg/hive/lifecycle.go:128 +0x2ba github.com/cilium/cilium/pkg/hive.(*Hive).Stop(0xc000594380, {0x39add48, 0xc0000780a8}) /home/pippolo/go/src/github.com/cilium/cilium/pkg/hive/hive.go:247 +0x85 github.com/cilium/cilium/test/controlplane/suite.(*operatorHandle).tearDown(0xc000841500) /home/pippolo/go/src/github.com/cilium/cilium/test/controlplane/suite/operator.go:19 +0x33 github.com/cilium/cilium/test/controlplane/suite.(*ControlPlaneTest).StopOperator(...) /home/pippolo/go/src/github.com/cilium/cilium/test/controlplane/suite/testcase.go:190 To solve the issue, the commit moves the initialization of the channel in the operator cell constructor. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> test/controlplane: add cnp nodes status updates gc test Add a controlplane unit test to verify that the stale policy enforcement updates are deleted from the Status section of CNPs and CCNPs when the related startup GC is enabled. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> images/runtime, go.mod, vendor: update gops to v0.3.26 Release notes: https://github.com/google/gops/releases/tag/v0.3.26 Signed-off-by: Tobias Klauser <tobias@cilium.io> images: update cilium-{runtime,builder} Signed-off-by: Tobias Klauser <tobias@cilium.io> relay: Add Go runtime metrics and process metrics Currently the agent has a GoCollector and ProcessCollector but relay does not, this updates the relay for consistency and enhanced debuggability. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> mlh: update Jenkins jobs following 1.26 support K8s 1.26 support was added in 70252f41788028bdfeeadef4b2ed5569106b42e5. We have rotated / expanded the Jenkins test jobs as follow: - Changed: Kernel 5.4 on K8s 1.24 (instead of 1.23, triggered on `/test`). 
- Changed: Kernel 4.19 on K8s 1.25 (instead of 1.24, triggered on `/test`). - Changed: Kernel net-next on K8s 1.26 (instead of 1.25, triggered on `/test`). - Added: Kernel 4.9 on K8s 1.24 (triggered on `/test-missed-k8s`). See the Table of Truth™️ for up to date status on all trigger phrases: https://docs.google.com/spreadsheets/d/1TThkqvVZxaqLR-Ela4ZrcJ0lrTJByCqrbdCjnI32_X0/edit#gid=0 Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> bgp: BGP Control Plane modularization This work converts the BGP Control Plane controller and BGP Route Manager into hive cells, leaving as much of the existing code intact. These cells are now hooked into the agent hive directly. The daemon now takes the Controller as parameter both to preserve the behavior of setting the controller as a field value on the daemon and so the BGP controllers lifecycle events are invoked. Follow up commits can break the package up into more discrete parts to aid in testing the individual components and or mocking them out. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> option: Fix Populate entries using "viper" package. option.DaemonConfig.Populate() must use the passed in '*viper.Viper' instead of 'viper' as a package. Otherwise populated values will be zeroes. This made Cilium Agent to not wait for FQDN proxy results to be plumbed into the datapath before returning the DNS response, which caused test flakes due to test traffic possibly hitting the datapath or Envoy before policy had reached there. Fixes: #22346 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> .github: pin alpine versions to 3.16 in stable branches We don't need to update alpine docker images in stable versions so we should keep with the 3.16 version. 
Signed-off-by: André Martins <andre@cilium.io> generated eni limits for AWS Cilium does not have defined eni limits for some AWS instance types Signed-off-by: Timur Solodovnikov <tsolodov@gmail.com> docs: update roadmap for graduation application Signed-off-by: Bill Mulligan <billmulligan516@gmail.com> Co-Authored-By: Aditi Ghag <aditi@cilium.io> Revert "bgp: BGP Control Plane modularization" This reverts commit ce075dcbe38df77ff94e3a525e0d97f322333199. Control plane tests fail reliably after this commit. Signed-off-by: Joe Stringer <joe@cilium.io> add policy fuzzers Adds fuzzers that test whether cilium can crash after sanitizing a rule. To test these fuzzers locally, run go test -fuzz=FuzzTestName, for example go test -fuzz=FuzzCiliumNetworkPolicyParse Signed-off-by: AdamKorcz <adam@adalogics.com> Fix 'egressIP' field indentation Signed-off-by: yulng <wei.yang@daocloud.io> build(deps): bump cilium/little-vm-helper Bumps [cilium/little-vm-helper](https://github.com/cilium/little-vm-helper) from 9bb7d6016e00968adff49dae192a0be87d9c3aef to 83d306aeb0b731c4d29f8762f576ff484aa7a69c. - [Release notes](https://github.com/cilium/little-vm-helper/releases) - [Commits](https://github.com/cilium/little-vm-helper/compare/9bb7d6016e00968adff49dae192a0be87d9c3aef...83d306aeb0b731c4d29f8762f576ff484aa7a69c) --- updated-dependencies: - dependency-name: cilium/little-vm-helper dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> ci: force deploy connectivity test pods in successive GKE test steps Use `cilium connectivity test --force-deploy` instead of manually deleting the pods in a separate step. This follows the suggestion given in the respective issue [1] and is also used successfully in other workflows (and even further down in the GKE workflow). [1] https://github.com/cilium/cilium-cli/issues/156#issuecomment-820808129 Ref. 
https://github.com/cilium/cilium-cli/issues/156 Signed-off-by: Tobias Klauser <tobias@cilium.io> ci: update cilium-cli to v0.12.10 for master, v1.11 and v1.12 workflows v0.12.10 release notes: https://github.com/cilium/cilium-cli/releases/tag/v0.12.10 v0.12.9 release notes: https://github.com/cilium/cilium-cli/releases/tag/v0.12.9 Signed-off-by: Tobias Klauser <tobias@cilium.io> Helm: Resources option for apiserver etcd This patch adds the option to configure the resources of the init container and the container of etcd in the apiserver pods. Signed-off-by: Sven Haardiek <sven.haardiek@uni-muenster.de> bugtool: Add 'cilium policy get' and 'cilium endpoint list' Add output of 'cilium policy get' to bugtool to gain visibility to the state of the policy repository. This may help figure out if missing bpf policy map entries are due to translation from CNP to policy repository, or from policy repository to the bpf policy maps. Add 'cilium endpoint list' to get a concise summary of Cilium endpoints on the node, including their policy enforcement status. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> docs: Make ginkgo install line more specific Specify "v1.16.5 (latest ginkgo version < 2) instead of "latest", as required. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Revert "test: Remove flaking test" Reintroduce the TLS test without HTTP rules, as it turns out this test failed due to Cilium agent command line option breakage that is now fixed in master. This reverts commit a75e24b558703a4f66337ad28849c6a79240166f. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> gateway-api/model: Refactor envoy virtual host Refactor the code to generate envoy virtual host routes from HTTPRoutes. 
The new code is functionally equivalent to the previous one, but relies on some helper functions to improve readability while taking into account every different scenario: - HTTPS routes - HTTP routes with Direct Response - HTTP routes with single backend - HTTP routes with multiple load-balanced backend Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> removed lb4_services_v2 Signed-off-by: Vishal Choudhary <contactvishaltech@gmail.com> bpf: drop SVC traffic if no backend is available Resolve an issue where an outgoing packet destined for a service will not be dropped if it does not have any backends. Currently we will not return the service if there are no backends for it, meaning we will never drop a packet in this case and instead simply route it through the kernels default routes. Fixes: #21453 Signed-off-by: Michael Aspinwall <maspinwall@google.com> [jwi: wordsmith the patch description] Signed-off-by: Julian Wiedmann <jwi@isovalent.com> bpf: test: add XDP LB test for service without backend Packets that are adressed to a VIP without any backend should be dropped. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> bpf: test: add TC per-packet LB test for service without backend Packets that are adressed to a VIP without any backend should be dropped. As the VIP doesn't get translated, this currently works "by accident" if no matching allow-policy for the VIP is installed. But we actually want to happen this indepedently of policy, with a proper drop reason. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> bpf/tests: fix redundant usage of variable offset Signed-off-by: Sahid Orentino Ferdjaoui <sahid.ferdjaoui@industrialdiscipline.com> k8s: don't consider 4xx a successful interaction While a 404 Not Found or a 409 Conflict can be considered successful interactions with the k8s API, a blanket accept for all 4xx codes is problematic. 
Since LastSuccessInteraction is exclusively used as an optimisation, we should err on the cautious side: accept the potential increase in heartbeats to avoid missing being unable to effecticely communicate with the k8s API. As an example of how this can go wrong, in #20915 we have an issue around receiving 401 Unauthorized from the EKS control plane. At sufficient scale, we never see a need to run the heartbeat. Running the heartbeat, however, would close and reopen the connections on receiving a 401, and thus restore connectivity to the k8s API. We currently only use the LastSuccessInteraction to as an optimisation to not perform unnecessary k8s API heartbeats, this "metric" (possibly a misnomer) is not used or exposed and changing its semantics is acceptable. Fixes: f2998b0cc472290ec64068ec15510608778fb431 Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Co-authored-by: Sebastian Wicki <gandro@gmx.net> bgp: BGP Control Plane modularization This work converts the BGP Control Plane controller and BGP Route Manager into hive cells, leaving as much of the existing code intact. These cells are now hooked into the agent hive directly. The daemon now takes the Controller as parameter both to preserve the behavior of setting the controller as a field value on the daemon and so the BGP controllers lifecycle events are invoked. Follow up commits can break the package up into more discrete parts to aid in testing the individual components and or mocking them out. 
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> docs: Remove Google Season of Docs Signed-off-by: Bill Mulligan <billmulligan516@gmail.com> operator: Avoid spamming logs with entire identity object Printing the entire structure of the identity in the log makes the log hard to read, so only the uid of the identity object is printed Fixes: #21900 Signed-off-by: yanru.lv <yanru.lv@daocloud.io> docs: describe Cilium Feature Proposals Signed-off-by: Liz Rice <liz@lizrice.com> pkg/clustermesh: expose configuration field to set key deletion delay Exposing this configuration field will allow to tune these deletion delays in unit tests as otherwise, if they are not set, they will default to 30 seconds (defaults.NodeDeleteDelay). Signed-off-by: André Martins <andre@cilium.io> pkg/clustermesh: rewrite services_test to avoid flakes This commit removes some time.Sleep due to the inability to define lower SharedKeyDeleteDelay duration. As the SharedKeyDeleteDelay duration is now set to zero, the tests can be verify if the expected event right after the action performed in the KVStore. Signed-off-by: André Martins <andre@cilium.io> build(deps): bump cilium/little-vm-helper Bumps [cilium/little-vm-helper](https://github.com/cilium/little-vm-helper) from 83d306aeb0b731c4d29f8762f576ff484aa7a69c to 0.0.2. This release includes the previously tagged commit. - [Release notes](https://github.com/cilium/little-vm-helper/releases) - [Commits](https://github.com/cilium/little-vm-helper/compare/83d306aeb0b731c4d29f8762f576ff484aa7a69c...76cb7b131c9fa60f697af29106b529c0a423a17e) --- updated-dependencies: - dependency-name: cilium/little-vm-helper dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> hive: Fix CodeQL lints in regex The CodeQL workflow found two issues with the affected line: - https://github.com/cilium/cilium/security/code-scanning/78 - https://github.com/cilium/cilium/security/code-scanning/79 Tested by running `cilium-agent objects` locally. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> option, datapath: Move AreDevicesRequired to option package This will make it easier to reuse that helper function in other places, such as the loader in this commit. Suggested-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> install/kubernetes: Re-order lines in Makefile.values This will prevent the release manager on missing any changes that need to be done in this file. Signed-off-by: André Martins <andre@cilium.io> update AUTHORS and Documentation Signed-off-by: André Martins <andre@cilium.io> Prepare for release v1.13.0-rc3 Signed-off-by: André Martins <andre@cilium.io> Revert "Prepare for release v1.13.0-rc3" This reverts commit 98fef52be8a635656584fb390e66d4900a6a4d33. Signed-off-by: André Martins <andre@cilium.io> fix note for numWorkerThreads() Signed-off-by: yanggang <gang.yang@daocloud.io> workflow: disable tests pod-to-world and pod-to-cidr Tests using ip 1.1.1.1 have been flaky, and this causes a lot of false positives. Until a more permanent solution is found this commit disables http-to-one-one-one-one, https-to-one-one-one-one, https-to-one-one-one-one-index, cloudflare-1001, cloudflare-1111 tests Signed-off-by: Birol Bilgin <birol@cilium.io> daemon: Close the identityAllocator on shutdown Andre reports an issue where during shutdown in a unit test, the identityAllocator may be closed, then an identity allocation attempt is made by the ipcache async logic, then the ipcache is shut down. 
We could solve this issue only for unit tests by shifting the identityAllocator shutdown until after the rest of the daemon is shut down and only in the test cleanup logic. However, it seems like identityAllocator cleanup should actually be in the main daemon Close() function now that we have one. Shift it in there to close after the ipcache shuts down, so that the previously-mentioned issue is resolved both for test runs and during the real agent shutdown. Reported-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Adding/fixing DNSProxy metrics A few changes are made here: - `cilium_fqdn_semaphore_rejected_total` wasn't being updated correctly, due to an incorrect error check. This is fixed now. - Based on the discussion [here](https://github.com/cilium/cilium/pull/19992/commits/403790a7a9b778eebca1605856a8919bcae60812#r936034590), the field `scope:datapathTime` in `cilium_proxy_upstream_reply_seconds` was split into two different scopes: `policyGenerationTime` (for updating the DNS caches and policy caches) and `datapathTime` (which include the async policy map updates and the identity cache updates). Signed-off-by: Rahul Joshi <rkjoshi@google.com> .github: add PR labeler for external contributions Filtering PRs from external contributors will allow committers of the Cilium project to give more attention to those PRs and avoid them to get stale. Signed-off-by: André Martins <andre@cilium.io> hubble/metrics: ProcessFlow() is optional for metrics handlers The motivation behind this is to support types that implement OnDecodeFlows directly, where defining a ProcessFlow method would be redundant. Without this, these plugins would need to implement a no-op ProcessFlow() method, because the logic for their metrics handlers will live in OnDecodeFlows. 
Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> go.mod, vendor: update cloud provider SDK Go modules for December 2022 Monthly update (sort of, the last one was in August) of the cloud provider SDK Go modules using contrib/scripts/go-mod-update-cloud-providers.sh Signed-off-by: Tobias Klauser <tobias@cilium.io> Revert "Update start-release.sh" While doing the RC from `master` branch we don't want to commit all files in a single commit since we want to have a dedicated commit for AUTHORS update and another for the release preparation. Thus we need to revert commit 1f1926dddc860a5bca6ccc5b34f1d8c1c2a35dd9. Signed-off-by: André Martins <andre@cilium.io> contrib/release: add missing -C to select install/kubernetes directory Fixes: 147c66e1ca84 ("install/kubernetes: Re-order lines in Makefile.values") Signed-off-by: André Martins <andre@cilium.io> contrib/release: remove duplicated instructions Remove manual instructions that are already performed by the automation script. Signed-off-by: André Martins <andre@cilium.io> build(deps): bump actions/setup-go from 3.3.1 to 3.4.0 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3.3.1 to 3.4.0. - [Release notes](https://github.com/actions/setup-go/releases) - [C…
This type should never have been shared between Service and Ingress.
This is part of the base job for #97681 and replaces #101027
/kind cleanup
What this PR does / why we need it:
It copies the LoadBalancerStatus definition to the networking API group, so future modifications to LoadBalancerStatus won't affect the Ingress status (the two should not be correlated).
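
For downstream code that previously passed one status type to both Service and Ingress, the split means an explicit conversion is needed. Below is a minimal sketch of what that could look like. It uses local stand-in structs rather than the real `k8s.io/api` packages, with field sets simplified from the upstream types, and the helper `toIngressLoadBalancerStatus` is a hypothetical name, not something this PR adds.

```go
package main

import "fmt"

// Stand-ins for the core types shared by Service (simplified, not imported
// from k8s.io/api/core/v1).
type PortStatus struct {
	Port     int32
	Protocol string
	Error    *string
}

type LoadBalancerIngress struct {
	IP       string
	Hostname string
	Ports    []PortStatus
}

type LoadBalancerStatus struct {
	Ingress []LoadBalancerIngress
}

// Stand-ins for the copies in the networking API group. The shape is kept
// identical, including Ports, so the serialized form stays compatible.
type IngressPortStatus struct {
	Port     int32
	Protocol string
	Error    *string
}

type IngressLoadBalancerIngress struct {
	IP       string
	Hostname string
	Ports    []IngressPortStatus
}

type IngressLoadBalancerStatus struct {
	Ingress []IngressLoadBalancerIngress
}

// toIngressLoadBalancerStatus is a hypothetical helper that copies a
// core-style status into the networking-style status field by field.
func toIngressLoadBalancerStatus(in LoadBalancerStatus) IngressLoadBalancerStatus {
	out := IngressLoadBalancerStatus{}
	for _, ing := range in.Ingress {
		conv := IngressLoadBalancerIngress{IP: ing.IP, Hostname: ing.Hostname}
		for _, p := range ing.Ports {
			conv.Ports = append(conv.Ports, IngressPortStatus{
				Port:     p.Port,
				Protocol: p.Protocol,
				Error:    p.Error,
			})
		}
		out.Ingress = append(out.Ingress, conv)
	}
	return out
}

func main() {
	svc := LoadBalancerStatus{
		Ingress: []LoadBalancerIngress{
			{IP: "192.0.2.10", Ports: []PortStatus{{Port: 443, Protocol: "TCP"}}},
		},
	}
	fmt.Printf("%+v\n", toIngressLoadBalancerStatus(svc))
}
```

Because the two sets of types are structurally identical today, the copy is mechanical. The point of keeping them separate is that either side can later change without silently altering the other.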