
e2e: 009.sh can be flaky #331

Closed
johnbelamaric opened this issue Jul 28, 2023 · 5 comments
Labels: area/test-infra SIG Release Test Infra
Milestone: R2-Sprint1

Comments

@johnbelamaric
Member

In v1.0.1-beta.1, we upgraded Porch to a version that better protects against multiple clients updating a package concurrently by implementing standard Kubernetes optimistic concurrency (i.e., checking resource version).
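As a toy sketch of that mechanism (purely illustrative, not Porch code): an update is accepted only when the client presents the resourceVersion it last read, so any concurrent edit bumps the version and makes the stale client's write fail with a conflict.

```shell
# Toy model of Kubernetes optimistic concurrency, purely illustrative
# (not Porch code). A writer must present the resourceVersion it last
# read; a concurrent edit bumps the stored version, so the stale writer
# gets a conflict instead of silently overwriting.
stored_rv=1

try_update() {
  client_rv=$1
  if [ "$client_rv" -ne "$stored_rv" ]; then
    echo "conflict: the object has been modified" >&2
    return 1
  fi
  stored_rv=$((stored_rv + 1))
  return 0
}
```

A second writer that also read resourceVersion 1 would hit the same "the object has been modified" conflict that shows up in the test logs in this thread.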

The 009.sh test seems to hit this occasionally: a controller edits the SMF Draft after the 009.sh script creates it, and 009.sh then fails to save its changes. In another case, the controller created a Draft first, and 009.sh tried to copy that Draft rather than editing it directly.

We should understand why a controller is modifying the SMF package at this stage; I believe it should already be fully configured, so no controller should be touching it. However, we can work around this for now by making 009.sh more resilient:

  1. If a Draft exists, it should use that Draft instead of creating its own with copy. Alternatively, it should make sure the Draft it creates is based on the Published revision (change the way it queries for the PackageRevision).
  2. If the push, propose, or approve fails, it should run kpt pkg update and retry.
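The retry in point 2 could be sketched roughly as follows; the function name and the refresh hook are illustrative assumptions, not code from 009.sh:

```shell
# Hedged sketch of point 2 above: retry a porch operation a few times,
# refreshing the local package between attempts. The function name and
# the refresh step are illustrative assumptions, not code from 009.sh.
retry_with_refresh() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed; refreshing before retry" >&2
    # In the real script this step would be something like:
    #   kpt pkg update "$ws"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical usage (the revision variable is illustrative):
#   retry_with_refresh 3 kpt alpha rpkg propose -n default "$smf_pkg_rev"
```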
@vjayaramrh
Contributor

FYI, below is the output when 009.sh is run on Fedora 34:

$ DEBUG=true ./test-infra/e2e/tests/009.sh
+ export HOME=/home/openshift
+ HOME=/home/openshift
+ export E2EDIR=/home/openshift/test-infra/e2e
+ E2EDIR=/home/openshift/test-infra/e2e
+ export TESTDIR=/home/openshift/test-infra/e2e/tests
+ TESTDIR=/home/openshift/test-infra/e2e/tests
+ export LIBDIR=/home/openshift/test-infra/e2e/lib
+ LIBDIR=/home/openshift/test-infra/e2e/lib
+ source /home/openshift/test-infra/e2e/lib/k8s.sh
+ kubeconfig=/home/openshift/.kube/config
+ echo 'Getting kubeconfig for regional'
Getting kubeconfig for regional
++ k8s_get_capi_kubeconfig /home/openshift/.kube/config default regional
++ local kubeconfig=/home/openshift/.kube/config
++ local namespace=default
++ local cluster=regional
+++ mktemp --suffix _kubeconfig-regional
++ local file=/tmp/tmp.q3W2ixvNQe_kubeconfig-regional
++ k8s_wait_exists /home/openshift/.kube/config 600 default secret regional-kubeconfig
++ kubectl --kubeconfig /home/openshift/.kube/config -n default get secret regional-kubeconfig -o 'jsonpath={.data.value}'
++ base64 -d
++ echo /tmp/tmp.q3W2ixvNQe_kubeconfig-regional
+ cluster_kubeconfig=/tmp/tmp.q3W2ixvNQe_kubeconfig-regional
+ echo 'Getting pod for SMF in cluster regional'
Getting pod for SMF in cluster regional
++ kubectl --kubeconfig /tmp/tmp.q3W2ixvNQe_kubeconfig-regional get pods -l name=smf-regional -n free5gc-cp
++ grep smf
++ head -1
++ cut -d ' ' -f 1
+ smf_pod_id=smf-regional-5d55cb7d9b-ch4zx
+ '[' -z smf-regional-5d55cb7d9b-ch4zx ']'
+ echo 'Getting CPU for smf-regional-5d55cb7d9b-ch4zx'
Getting CPU for smf-regional-5d55cb7d9b-ch4zx
++ k8s_get_first_container_requests /tmp/tmp.q3W2ixvNQe_kubeconfig-regional free5gc-cp smf-regional-5d55cb7d9b-ch4zx cpu
++ local kubeconfig=/tmp/tmp.q3W2ixvNQe_kubeconfig-regional
++ local namespace=free5gc-cp
++ local pod_id=smf-regional-5d55cb7d9b-ch4zx
++ local resource_type=cpu
++ kubectl --kubeconfig /tmp/tmp.q3W2ixvNQe_kubeconfig-regional get pods smf-regional-5d55cb7d9b-ch4zx -n free5gc-cp -o 'jsonpath={.spec.containers[0].resources.requests.cpu}'
+ current_cpu=100m
+ echo 'Getting memory for smf-regional-5d55cb7d9b-ch4zx'
Getting memory for smf-regional-5d55cb7d9b-ch4zx
++ k8s_get_first_container_requests /tmp/tmp.q3W2ixvNQe_kubeconfig-regional free5gc-cp smf-regional-5d55cb7d9b-ch4zx memory
++ local kubeconfig=/tmp/tmp.q3W2ixvNQe_kubeconfig-regional
++ local namespace=free5gc-cp
++ local pod_id=smf-regional-5d55cb7d9b-ch4zx
++ local resource_type=memory
++ kubectl --kubeconfig /tmp/tmp.q3W2ixvNQe_kubeconfig-regional get pods smf-regional-5d55cb7d9b-ch4zx -n free5gc-cp -o 'jsonpath={.spec.containers[0].resources.requests.memory}'
+ current_memory=128Mi
+ echo 'Current CPU 100m'
Current CPU 100m
+ echo 'Current Memory 128Mi'
Current Memory 128Mi
++ kubectl --kubeconfig /home/openshift/.kube/config get packagevariant regional-free5gc-smf-regional-free5gc-smf -o 'jsonpath={.status.downstreamTargets[0].name}'
+ smf_deployment_pkg=regional-42cae84940985bdf2ecbbf93ac47f1f31fa7baa7
+ echo 'Copying regional-42cae84940985bdf2ecbbf93ac47f1f31fa7baa7'
Copying regional-42cae84940985bdf2ecbbf93ac47f1f31fa7baa7
+ ws=regional-smf-scaling
++ cut -d ' ' -f 1
++ kpt alpha rpkg copy -n default regional-42cae84940985bdf2ecbbf93ac47f1f31fa7baa7 --workspace regional-smf-scaling
Error: Internal error occurred: source revision must be published 
+ smf_pkg_rev=
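This failure matches point 1 in the opening comment: kpt alpha rpkg copy was pointed at a Draft, and the copy source must be a Published revision. A hedged sketch of the branch the script could take (the function name is illustrative; Draft/Proposed/Published are the standard Porch lifecycle values):

```shell
# Hedged sketch for point 1 of the opening comment: only copy a Published
# revision; if the downstream revision is still a Draft, edit it directly
# instead. The function name is illustrative, not code from 009.sh.
decide_action() {
  case "$1" in
    Published) echo "copy"  ;;  # safe to 'kpt alpha rpkg copy'
    Draft)     echo "edit"  ;;  # reuse the existing Draft directly
    *)         echo "retry" ;;  # e.g. Proposed: wait and re-check
  esac
}
```

The lifecycle itself could be read with something like kubectl get packagerevision NAME -o jsonpath='{.spec.lifecycle}' (the exact field path is an assumption here).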

@vjayaramrh
Contributor

FYI, below is the output when I run the 009 script on Ubuntu 20.04:

ubuntu@nephio-r1-e2e-vish-ubuntu:~$ ./test-infra/e2e/tests/009.sh
Getting kubeconfig for regional
Getting pod for SMF in cluster regional
Getting CPU for smf-regional-5756f659f-7mh8l
Getting memory for smf-regional-5756f659f-7mh8l
Current CPU 100m
Current Memory 128Mi
Copying regional-321b4715a21d6fdc9c3a6d1766f6bb8c7eb0e630
Copied to regional-93bf39647b100940286b6725387ed02329dbc2d4, pulling
Updating the capacity
[RUNNING] "gcr.io/kpt-fn/search-replace:v0.2.0"
[PASS] "gcr.io/kpt-fn/search-replace:v0.2.0" in 500ms
  Results:
    [info] spec.maxSessions: Mutated field value to "10000"
[RUNNING] "gcr.io/kpt-fn/search-replace:v0.2.0"
[PASS] "gcr.io/kpt-fn/search-replace:v0.2.0" in 500ms
  Results:
    [info] spec.maxNFConnections: Mutated field value to "50"
diff -r /tmp/regional-smf-scaling/capacity.yaml regional-smf-scaling/capacity.yaml
11,12c11,12
<   maxSessions: 500
<   maxNFConnections: 5
---
>   maxSessions: 10000
>   maxNFConnections: 50

Pushing update
[RUNNING] "gcr.io/kpt-fn/apply-replacements:v0.1.1" 
[PASS] "gcr.io/kpt-fn/apply-replacements:v0.1.1"
[RUNNING] "gcr.io/kpt-fn/apply-replacements:v0.1.1" 
[PASS] "gcr.io/kpt-fn/apply-replacements:v0.1.1"
[RUNNING] "gcr.io/kpt-fn/set-namespace:v0.4.1" 
[PASS] "gcr.io/kpt-fn/set-namespace:v0.4.1"
  Results:
    [info]: all namespaces are already "free5gc-cp". no value changed
[RUNNING] "docker.io/nephio/smf-deploy-fn:v1.0.1-beta.1" 
[PASS] "docker.io/nephio/smf-deploy-fn:v1.0.1-beta.1"
[RUNNING] "docker.io/nephio/interface-fn:v1.0.1-beta.1" 
[PASS] "docker.io/nephio/interface-fn:v1.0.1-beta.1"
[RUNNING] "docker.io/nephio/dnn-fn:v1.0.1-beta.1" 
[PASS] "docker.io/nephio/dnn-fn:v1.0.1-beta.1"
[RUNNING] "docker.io/nephio/nad-fn:v1.0.1-beta.1" 
[PASS] "docker.io/nephio/nad-fn:v1.0.1-beta.1"
[RUNNING] "docker.io/nephio/dnn-fn:v1.0.1-beta.1" 
[PASS] "docker.io/nephio/dnn-fn:v1.0.1-beta.1"
[RUNNING] "docker.io/nephio/interface-fn:v1.0.1-beta.1" 
[PASS] "docker.io/nephio/interface-fn:v1.0.1-beta.1"
[RUNNING] "docker.io/nephio/smf-deploy-fn:v1.0.1-beta.1" 
[PASS] "docker.io/nephio/smf-deploy-fn:v1.0.1-beta.1"
Proposing update
regional-93bf39647b100940286b6725387ed02329dbc2d4 failed (Internal error occurred: Operation cannot be fulfilled on packagerevisions.porch.kpt.dev "regional-93bf39647b100940286b6725387ed02329dbc2d4": the object has been modified; please apply your changes to the latest version and try again)
Error: errors:
  Internal error occurred: Operation cannot be fulfilled on packagerevisions.porch.kpt.dev "regional-93bf39647b100940286b6725387ed02329dbc2d4": the object has been modified; please apply your changes to the latest version and try again 

@johnbelamaric
Member Author

It looks like that run is based on v1.0.1-beta.1; I think the fixes for the flakiness were in v1.0.1.

@gvbalaji gvbalaji added this to the R2-Sprint1 milestone Aug 22, 2023
@johnbelamaric
Member Author

This is done.

@electrocucaracha
Member

@johnbelamaric Maybe we should consider reopening this or creating a new one, because there are still some issues with that test.

Projects
Status: Done
Development

No branches or pull requests

4 participants