
e2e tests slated for removal when we drop cloud providers #122828

Open
dims opened this issue Jan 17, 2024 · 22 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@dims
Member

dims commented Jan 17, 2024

When we compile e2e.test with the providerless tag, a bunch of tests get dropped. Here's how to generate the list of tests:

# Build e2e.test with the providerless tag and list the tests it contains
KUBE_PROVIDERLESS=y make WHAT=test/e2e/e2e.test
_output/local/go/bin/e2e.test --list-tests > list-tests-providerless.txt
# Clean, rebuild without the tag, and list again
rm -rf _output/
KUBE_PROVIDERLESS=n make WHAT=test/e2e/e2e.test
_output/local/go/bin/e2e.test --list-tests > list-tests-regular.txt
# Tests that exist only in the regular build show up as "-" lines
diff -Bbwu list-tests-regular.txt list-tests-providerless.txt

Here's the list of tests on the chopping block:

$ diff -Bbwu list-tests-regular.txt list-tests-providerless.txt | grep -E "^-" | sort
-    k8s.io/kubernetes/test/e2e/cloud/gcp/node/gpu.go:46: [sig-node] gpu Upgrade [Feature:GPUUpgrade] master upgrade should NOT disrupt gpu pod [Feature:GPUMasterUpgrade]
-    k8s.io/kubernetes/test/e2e/cloud/gcp/node/gpu.go:59: [sig-node] gpu Upgrade [Feature:GPUUpgrade] cluster upgrade should be able to run gpu pod after upgrade [Feature:GPUClusterUpgrade]
-    k8s.io/kubernetes/test/e2e/cloud/gcp/node/gpu.go:72: [sig-node] gpu Upgrade [Feature:GPUUpgrade] cluster downgrade should be able to run gpu pod after downgrade [Feature:GPUClusterDowngrade]
-    k8s.io/kubernetes/test/e2e/cloud/gcp/recreate_node.go:98: [sig-cloud-provider-gcp] Recreate [Feature:Recreate] recreate nodes and ensure they function upon restart
-    k8s.io/kubernetes/test/e2e/instrumentation/monitoring/accelerator.go:63: [sig-instrumentation] Stackdriver Monitoring should have accelerator metrics [Feature:StackdriverAcceleratorMonitoring]
-    k8s.io/kubernetes/test/e2e/network/firewall.go:212: [sig-network] Firewall rule control plane should not expose well-known ports
-    k8s.io/kubernetes/test/e2e/network/firewall.go:77: [sig-network] Firewall rule [Slow] [Serial] should create valid firewall rules for LoadBalancer type service
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:119: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:Ingress] should conform to Ingress spec
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:164: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:NEG] should conform to Ingress spec
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:179: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:NEG] should be able to switch between IG and NEG modes
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:225: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:NEG] should be able to create a ClusterIP service
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:239: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:NEG] should sync endpoints to NEG
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:283: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:NEG] rolling update backend pods should not cause service disruption
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:342: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:NEG] should sync endpoints for both Ingress-referenced NEG and standalone NEG
-    k8s.io/kubernetes/test/e2e/network/ingress_gce.go:426: [sig-network] Loadbalancing: L7 GCE [Slow] [Feature:NEG] should create NEGs for all ports with the Ingress annotation, and NEGs for the standalone annotation otherwise
-    k8s.io/kubernetes/test/e2e/network/ingress_scale.go:67: [sig-network] Loadbalancing: L7 Scalability GCE [Slow] [Serial] [Feature:IngressScale] Creating and updating ingresses should happen promptly with small/medium/large amount of ingresses
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1077: [sig-network] LoadBalancers should be able to preserve UDP traffic when server pod cycles for a LoadBalancer service on the same nodes
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1209: [sig-network] LoadBalancers should not have connectivity disruption during rolling update with externalTrafficPolicy=Cluster [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1218: [sig-network] LoadBalancers should not have connectivity disruption during rolling update with externalTrafficPolicy=Local [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1253: [sig-network] LoadBalancers ESIPP [Slow] should work for type=LoadBalancer
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1310: [sig-network] LoadBalancers ESIPP [Slow] should work for type=NodePort
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1342: [sig-network] LoadBalancers ESIPP [Slow] should only target nodes with endpoints
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1418: [sig-network] LoadBalancers ESIPP [Slow] should work from pods
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:143: [sig-network] LoadBalancers should be able to change the type and ports of a TCP service [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1476: [sig-network] LoadBalancers ESIPP [Slow] should handle updates to ExternalTrafficPolicy field
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:336: [sig-network] LoadBalancers should be able to change the type and ports of a UDP service [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:530: [sig-network] LoadBalancers should only allow access from service loadbalancer source ranges [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:614: [sig-network] LoadBalancers should be able to create an internal type load balancer [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:744: [sig-network] LoadBalancers should have session affinity work for LoadBalancer service with ESIPP on [Slow] [LinuxOnly]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:755: [sig-network] LoadBalancers should be able to switch session affinity for LoadBalancer service with ESIPP on [Slow] [LinuxOnly]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:766: [sig-network] LoadBalancers should have session affinity work for LoadBalancer service with ESIPP off [Slow] [LinuxOnly]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:777: [sig-network] LoadBalancers should be able to switch session affinity for LoadBalancer service with ESIPP off [Slow] [LinuxOnly]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:793: [sig-network] LoadBalancers should handle load balancer cleanup finalizer for service [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:825: [sig-network] LoadBalancers should be able to create LoadBalancer Service without NodePort and change it [Slow]
-    k8s.io/kubernetes/test/e2e/network/loadbalancer.go:945: [sig-network] LoadBalancers should be able to preserve UDP traffic when server pod cycles for a LoadBalancer service on different nodes
-    k8s.io/kubernetes/test/e2e/network/network_tiers.go:71: [sig-network] Services GCE [Slow] should be able to create and tear down a standard-tier load balancer [Slow]
-    k8s.io/kubernetes/test/e2e/scheduling/nvidia-gpus.go:231: [sig-scheduling] [Feature:GPUDevicePlugin] run Nvidia GPU Device Plugin tests
-    k8s.io/kubernetes/test/e2e/scheduling/nvidia-gpus.go:333: [sig-scheduling] GPUDevicePluginAcrossRecreate [Feature:Recreate] run Nvidia GPU Device Plugin tests with a recreation
-    k8s.io/kubernetes/test/e2e/storage/pd.go:137: [sig-storage] Pod Disks [Feature:StorageProvider] schedule pods each with a PD, delete pod and verify detach [Slow] for RW PD with pod delete grace period of "default (30s)"
-    k8s.io/kubernetes/test/e2e/storage/pd.go:137: [sig-storage] Pod Disks [Feature:StorageProvider] schedule pods each with a PD, delete pod and verify detach [Slow] for RW PD with pod delete grace period of "immediate (0s)"
-    k8s.io/kubernetes/test/e2e/storage/pd.go:137: [sig-storage] Pod Disks [Feature:StorageProvider] schedule pods each with a PD, delete pod and verify detach [Slow] for read-only PD with pod delete grace period of "default (30s)"
-    k8s.io/kubernetes/test/e2e/storage/pd.go:137: [sig-storage] Pod Disks [Feature:StorageProvider] schedule pods each with a PD, delete pod and verify detach [Slow] for read-only PD with pod delete grace period of "immediate (0s)"
-    k8s.io/kubernetes/test/e2e/storage/pd.go:257: [sig-storage] Pod Disks [Feature:StorageProvider] schedule a pod w/ RW PD(s) mounted to 1 or more containers, write to PD, verify content, delete pod, and repeat in rapid succession [Slow] using 1 containers and 2 PDs
-    k8s.io/kubernetes/test/e2e/storage/pd.go:257: [sig-storage] Pod Disks [Feature:StorageProvider] schedule a pod w/ RW PD(s) mounted to 1 or more containers, write to PD, verify content, delete pod, and repeat in rapid succession [Slow] using 4 containers and 1 PDs
-    k8s.io/kubernetes/test/e2e/storage/pd.go:351: [sig-storage] Pod Disks [Feature:StorageProvider] detach in a disrupted environment [Slow] [Disruptive] when node's API object is deleted
-    k8s.io/kubernetes/test/e2e/storage/pd.go:351: [sig-storage] Pod Disks [Feature:StorageProvider] detach in a disrupted environment [Slow] [Disruptive] when pod is evicted
-    k8s.io/kubernetes/test/e2e/storage/pd.go:452: [sig-storage] Pod Disks [Feature:StorageProvider] should be able to delete a non-existent PD without error
-    k8s.io/kubernetes/test/e2e/storage/pd.go:461: [sig-storage] Pod Disks [Feature:StorageProvider] [Serial] attach on previously attached volumes should work
-    k8s.io/kubernetes/test/e2e/storage/persistent_volumes-gce.go:134: [sig-storage] PersistentVolumes GCEPD [Feature:StorageProvider] should test that deleting a PVC before the pod does not cause pod deletion to fail on PD detach
-    k8s.io/kubernetes/test/e2e/storage/persistent_volumes-gce.go:151: [sig-storage] PersistentVolumes GCEPD [Feature:StorageProvider] should test that deleting the PV before the pod does not cause pod deletion to fail on PD detach
-    k8s.io/kubernetes/test/e2e/storage/persistent_volumes-gce.go:167: [sig-storage] PersistentVolumes GCEPD [Feature:StorageProvider] should test that deleting the Namespace of a PVC and Pod causes the successful detach of Persistent Disk
-    k8s.io/kubernetes/test/e2e/storage/regional_pd.go:100: [sig-storage] Regional PD RegionalPD should failover to a different zone when all nodes in one zone become unreachable [Slow] [Disruptive]
-    k8s.io/kubernetes/test/e2e/storage/regional_pd.go:82: [sig-storage] Regional PD RegionalPD should provision storage [Slow]
-    k8s.io/kubernetes/test/e2e/storage/regional_pd.go:86: [sig-storage] Regional PD RegionalPD should provision storage with delayed binding [Slow]
-    k8s.io/kubernetes/test/e2e/storage/regional_pd.go:91: [sig-storage] Regional PD RegionalPD should provision storage in the allowedTopologies [Slow]
-    k8s.io/kubernetes/test/e2e/storage/regional_pd.go:95: [sig-storage] Regional PD RegionalPD should provision storage in the allowedTopologies with delayed binding [Slow]
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:302: [sig-storage] Dynamic Provisioning DynamicProvisioner [Slow] [Feature:StorageProvider] should provision storage with non-default reclaim policy Retain
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:349: [sig-storage] Dynamic Provisioning DynamicProvisioner [Slow] [Feature:StorageProvider] should test that deleting a claim before the volume is provisioned deletes the volume.
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:395: [sig-storage] Dynamic Provisioning DynamicProvisioner [Slow] [Feature:StorageProvider] deletion should be idempotent
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:465: [sig-storage] Dynamic Provisioning DynamicProvisioner External should let an external dynamic provisioner create and delete persistent volumes [Slow]
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:529: [sig-storage] Dynamic Provisioning DynamicProvisioner Default should create and delete default persistent volumes [Slow]
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:553: [sig-storage] Dynamic Provisioning DynamicProvisioner Default should be disabled by changing the default annotation [Serial] [Disruptive]
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:590: [sig-storage] Dynamic Provisioning DynamicProvisioner Default should be disabled by removing the default annotation [Serial] [Disruptive]
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:630: [sig-storage] Dynamic Provisioning Invalid AWS KMS key should report an error and create no PV
-    k8s.io/kubernetes/test/e2e/storage/volume_provisioning.go:93: [sig-storage] Dynamic Provisioning DynamicProvisioner [Slow] [Feature:StorageProvider] should provision storage with different parameters
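
To see which SIGs the dropped tests belong to, the same diff can be tallied by tag. This is a rough sketch against the output above (the grep pattern simply matches the [sig-*] markers in the test names):

diff -Bbwu list-tests-regular.txt list-tests-providerless.txt \
  | grep -E "^-    " \
  | grep -oE '\[sig-[a-z-]+\]' \
  | sort | uniq -c | sort -rn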

@dims dims added the kind/bug Categorizes issue or PR as related to a bug. label Jan 17, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 17, 2024
@dims dims changed the title from "e2e tests slated for removal when we drop cloud providers (from e2e.test when compiled with providerless tag)" to "e2e tests slated for removal when we drop cloud providers" Jan 17, 2024
@dims
Member Author

dims commented Jan 17, 2024

/sig storage
/sig network
/sig node
/sig cloud-provider
/sig instrumentation
/sig scheduling

Please review the tests and see if you need another way/place to run the same sort of thing, so as not to lose coverage!

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 17, 2024
@dims
Member Author

dims commented Jan 17, 2024

@dims
Member Author

dims commented Jan 17, 2024

cc @elmiko @bridgetkromhout

@dims
Member Author

dims commented Jan 17, 2024

xref: kubernetes/enhancements#2395

@pohly
Contributor

pohly commented Jan 17, 2024

I'm happy to see that --list-tests is put to good use 😄

@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Jan 17, 2024
@elmiko
Contributor

elmiko commented Jan 17, 2024

we are discussing this issue at the sig cloud provider meeting today. we did start some early work on allowing providers to run external CCMs (cloud controller managers) with the core tests; in practice this looks similar to how sig storage handles the external tests for the CSI drivers.

linking a few previous discussions on this topic:
#75604
#70194

and an idea i've been hacking on: https://hackmd.io/@elmiko/BJGn1SQU3

i have made progress on this external testing topic, but have had trouble getting over the final obstacles to share a proof-of-concept. i am definitely amenable to pairing with others to help drive this testing effort home, but i do not have the bandwidth to focus on it fully at the moment.

if there are any followups proposed with sig storage or networking, i would love to join and share our status.

@msau42
Member

msau42 commented Jan 17, 2024

This is a little different from the sig-storage "external" tests: those tests abstract out the provider-specific APIs/info so that the test case can be provider-independent.

But it does limit the scenarios that can be tested, and these remaining storage test cases are the ones that cannot be easily abstracted. Ideally I would like to find a new home for them, either in sig-cloud-provider or provider-specific repos, but there also remains the issue of who is willing to own, manage and monitor the tests.

@shaneutt
Member

/assign @aojea @danwinship

@dgrisonnet
Member

for the sig-instrumentation test:
/assign @dashpole

@dashpole
Contributor

Confirmed with @bobbypage that the StackdriverAcceleratorMonitoring test can be removed.

@bowei
Member

bowei commented Feb 5, 2024

Is this going to skip the tests, or are we going to remove the code altogether (and when)?

@dims
Member Author

dims commented Feb 5, 2024

@bowei they will get removed, since they have vendored dependencies. Shooting for 1.31.

@upodroid
Member

upodroid commented Feb 5, 2024

Some of these tests are currently broken when running on kops clusters (we are planning to replace kube-up with kops clusters for e2e testing):

https://testgrid.k8s.io/sig-cluster-lifecycle-kubeup-to-kops#ci-kubernetes-e2e-cos-gce-slow-canary
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeup-to-kops#ci-kubernetes-e2e-al2023-aws-slow-canary

@kannon92
Contributor

kannon92 commented Feb 7, 2024

cc @bart0sh for GPU tests

@ndixita
Contributor

ndixita commented Feb 7, 2024

/cc @bart0sh

@ndixita
Contributor

ndixita commented Feb 7, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 7, 2024
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Needs Information in SIG Node Bugs Feb 7, 2024
@SergeyKanzhelev SergeyKanzhelev moved this from Needs Information to Triaged in SIG Node Bugs Feb 7, 2024
@AnishShah
Contributor

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Mar 25, 2024
@BenTheElder
Member

I've been escalating internally about this thread for a while ...

@seans3 and I are going to look at shifting GCP-specific tests that are not relevant for the core project out to cloud-provider-gcp to maintain coverage for GCP out of tree.

... pending which tests will stay in core and be fixed instead, and so should not be duplicated out of tree.

xref: #124338 for identifying which networking tests.

As discussed in #124338, #123714, ... some of these tests should be made independent of the cloud provider(s) rather than removing them.

@elmiko
Contributor

elmiko commented Apr 19, 2024

+1, i think it would be great if we could distill an interface for these load balancer tests. if we can understand that, we could create a generic test that each cloud provider who wants to run it can vary by implementing the interface. then we would just need a new test configuration to capture the integration of the cloud provider with the test repo and generate the code for testing.

although i realize you might be talking about the tests that don't have provider-specific functionality in them; in which case, i still agree =)

@pohly
Contributor

pohly commented Apr 22, 2024

I created

// ProviderInterface contains the implementation for certain
// provider-specific functionality.
type ProviderInterface interface {
    FrameworkBeforeEach(f *Framework)
    FrameworkAfterEach(f *Framework)

    ResizeGroup(group string, size int32) error
    GetGroupNodes(group string) ([]string, error)
    GroupSize(group string) (int, error)

    DeleteNode(node *v1.Node) error

    CreatePD(zone string) (string, error)
    DeletePD(pdName string) error
    CreateShare() (string, string, string, error)
    DeleteShare(accountName, shareName string) error

    CreatePVSource(ctx context.Context, zone, diskName string) (*v1.PersistentVolumeSource, error)
    DeletePVSource(ctx context.Context, pvSource *v1.PersistentVolumeSource) error

    CleanupServiceResources(ctx context.Context, c clientset.Interface, loadBalancerName, region, zone string)

    EnsureLoadBalancerResourcesDeleted(ctx context.Context, ip, portRange string) error
    LoadBalancerSrcRanges() []string
    EnableAndDisableInternalLB() (enable, disable func(svc *v1.Service))
}
six years ago for exactly such a purpose. The interface at that time only covered a subset of the provider-specific behavior. More could be added.
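
To make the intent concrete, here is a minimal sketch (not the framework's actual wiring; the function name and error handling are hypothetical) of a provider-neutral check that depends only on ProviderInterface, with each cloud provider supplying its own implementation out of tree:

// Sketch: a PD lifecycle check written purely against ProviderInterface.
// Assumes the usual imports (context, fmt) in the surrounding package.
func checkPDLifecycle(ctx context.Context, p ProviderInterface, zone string) error {
    // Ask the provider for a disk; the test logic itself stays generic.
    pdName, err := p.CreatePD(zone)
    if err != nil {
        return fmt.Errorf("creating PD: %w", err)
    }
    // Best-effort cleanup through the same abstraction.
    defer func() { _ = p.DeletePD(pdName) }()

    pvSource, err := p.CreatePVSource(ctx, zone, pdName)
    if err != nil {
        return fmt.Errorf("creating PV source: %w", err)
    }
    defer func() { _ = p.DeletePVSource(ctx, pvSource) }()

    // Provider-agnostic assertions (mount the PV in a pod, write, read back)
    // would go here, using only core Kubernetes APIs.
    return nil
}

A GCE- or AWS-specific variant would then reduce to an implementation of CreatePD/DeletePD and friends, which could live in the provider's own repo.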

@elmiko
Contributor

elmiko commented Apr 23, 2024

ah! excellent @pohly, i'm surprised that i missed this. thank you for sharing; it definitely seems like the right place to start iterating on the tests from.

@aojea
Member

aojea commented Apr 23, 2024

and we have cloud-provider-kind to keep coverage; kubernetes/test-infra#32495 needs lgtm
