
Conversation

@ozhuraki (Contributor) commented Oct 11, 2022

No description provided.

@codecov-commenter commented Oct 17, 2022

Codecov Report

Merging #1196 (0e49e48) into main (cb3b2a7) will not change coverage.
The diff coverage is n/a.

❗ Current head 0e49e48 differs from pull request most recent head ef7954c. Consider uploading reports for the commit ef7954c to get more accurate results

@@           Coverage Diff           @@
##             main    #1196   +/-   ##
=======================================
  Coverage   50.78%   50.78%           
=======================================
  Files          41       41           
  Lines        4608     4608           
=======================================
  Hits         2340     2340           
  Misses       2133     2133           
  Partials      135      135           


@ozhuraki force-pushed the e2e-operator branch 2 times, most recently from 997b405 to b90e913, on November 1, 2022 13:05
@ozhuraki changed the title from "operator: Add e2e tests for DSA, IAA, DLB, QAT Gen4, WIP" to "operator: Add e2e tests for DSA, IAA" on Nov 1, 2022
@mythi (Contributor) commented Nov 3, 2022

Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded

@ozhuraki I've seen this error for the past ~2 rounds of your pushes. Interestingly, e2e-iaa is OK... something in the operator manager deployment seems to trigger these. Just making wild guesses, but a long time ago setting the GOMAXPROCS=1 env variable for the manager helped with some CPU throttling problems.
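
Something along these lines in the manager's main() would have the same effect as the env variable (a minimal sketch, not the actual manager code):

```go
package main

import "runtime"

func main() {
	// Capping GOMAXPROCS mirrors setting GOMAXPROCS=1 in the pod
	// environment; with a low CPU limit this can reduce CFS throttling.
	runtime.GOMAXPROCS(1)

	// ... set up and start the controller manager as usual ...
}
```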

@ozhuraki force-pushed the e2e-operator branch 2 times, most recently from 7fcb573 to 432d2bd, on November 8, 2022 19:38
@ozhuraki (Contributor, Author) commented Nov 8, 2022

Changes:

  • Added a check for NFD readiness (see the sketch after this list)
  • Removed CPU limits for the operator under e2e
  • Increased the DSA, IAA webhook timeouts
  • Added GOMAXPROCS to the operator e2e deployment
  • Dropped BeforeEach, AfterEach in SGX (SGX's AfterEach deleted NFD).
    After NFD was moved to BeforeSuite, this started to interfere,
    while still occasionally producing an OK overall pass
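
The NFD readiness check is roughly along these lines (a client-go sketch; the namespace and label selector are illustrative assumptions, not necessarily the exact values in the PR):

```go
package e2e

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForNFDReady polls until all NFD pods report Ready. The namespace
// and label selector below are placeholders for illustration.
func waitForNFDReady(cs kubernetes.Interface) error {
	return wait.PollImmediate(5*time.Second, 3*time.Minute, func() (bool, error) {
		pods, err := cs.CoreV1().Pods("node-feature-discovery").List(context.TODO(),
			metav1.ListOptions{LabelSelector: "app.kubernetes.io/name=node-feature-discovery"})
		if err != nil || len(pods.Items) == 0 {
			return false, nil // not there yet; keep polling
		}
		for _, p := range pods.Items {
			ready := false
			for _, c := range p.Status.Conditions {
				if c.Type == corev1.PodReady && c.Status == corev1.ConditionTrue {
					ready = true
					break
				}
			}
			if !ready {
				return false, nil
			}
		}
		return true, nil
	})
}
```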

@mythi (Contributor) commented Nov 9, 2022

Changes:

* Added a check for NFD readiness

The NFD rework can go to a PR of its own. The same for the initImage additions to DSA/IAA resource YAMLs.

* Removed CPU limits for the operator under e2e

* Increased DSA, IAA webhook timeouts

* Added GOMAXPROCS to the operator e2e deployment

Do we understand the problem and why these (and the unconditional 2-minute sleep) are needed?

* Dropped BeforeEach, AfterEach in SGX (SGX's AfterEach deleted NFD).
  After NFD was moved to BeforeSuite, this started to interfere,
  while still occasionally producing an OK overall pass

Yes this is the right thing to do. I wonder if we have similar race conditions elsewhere...

@mythi (Contributor) left a comment

a couple of comments before the rebase.

@ozhuraki (Contributor, Author) commented

Changes:

  • Dropped a timeout after the operator deployment
  • Added a check for the webhook availability (see the sketch after this list).
    Apparently a simple curl check (--connect-timeout ...) isn't enough:
    despite the check succeeding, the webhook occasionally times out at CR creation
    (even with the e2e customizations: increased timeout, no CPU limit).
  • Dropped the operator e2e customizations
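
The gap between the two checks, as a sketch (the service address is a placeholder, not the exact name used in the PR): a bare TCP connect, which is all --connect-timeout verifies, can succeed while the webhook still can't complete a TLS round trip:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

func main() {
	// Placeholder address for illustration only.
	addr := "inteldeviceplugins-webhook-service.default.svc:443"

	// What curl --connect-timeout verifies: the socket accepts a connection.
	if c, err := net.DialTimeout("tcp", addr, 2*time.Second); err == nil {
		c.Close()
		fmt.Println("TCP connect OK (the webhook may still not serve requests)")
	}

	// Closer to what the API server needs: a completed TLS handshake.
	conn, err := tls.DialWithDialer(&net.Dialer{Timeout: 2 * time.Second},
		"tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		fmt.Println("TLS handshake failed:", err)
		return
	}
	conn.Close()
	fmt.Println("TLS handshake OK")
}
```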

@ozhuraki (Contributor, Author) commented

Changes:

  • Changed the order of the customizations applied in the default operator deployment,
    which was preventing a clean undeployment.
    Apparently this is a long-standing bug leading to sporadic errors
    on operator undeployment (even in the simplest scenario: a manual operator deploy/undeploy)
  • Added the deletion of the deployed operator resources
  • Increased terminationGracePeriodSeconds (see the sketch after this list).
    The default also results in errors on undeployment
  • Dropped the namespace from the kubectl deletion (deployments/operator/default).
    Keeping it also results in errors under k8s/e2e
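
For reference, the grace-period knob lives in the pod spec; the actual change here is YAML under deployments/operator, but expressed via the client-go types it looks like this (the value is illustrative):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Illustrative value; the Kubernetes default of 30s was not enough
	// for the manager to shut down cleanly on undeployment.
	grace := int64(60)

	spec := corev1.PodSpec{
		TerminationGracePeriodSeconds: &grace,
	}
	fmt.Printf("terminationGracePeriodSeconds: %d\n", *spec.TerminationGracePeriodSeconds)
}
```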

@mythi (Contributor) commented Nov 14, 2022

  • Changed the order of the customizations applied in the default operator deployment,
    which was preventing a clean undeployment.
    Apparently this is a long-standing bug leading to sporadic errors
    on operator undeployment (even in the simplest scenario: a manual operator deploy/undeploy)

kustomize orders its output. What is the error you are seeing? I tried kubectl kustomize with and without this change and the output YAML is the same.

@ozhuraki (Contributor, Author) commented

Changes:

  • Dropped the changes in the order of the operator customizations

@ozhuraki (Contributor, Author) commented

Changes:

  • Dropped a namespace at CR creation / deletion

@ozhuraki (Contributor, Author) commented

Changes:

  • Dropped a TLS check
  • Added AddReadyzCheck() in the operator (see the sketch after this list).
    Had a false negative with this earlier
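
The usual controller-runtime wiring for this looks roughly like the following (a sketch; the probe address is illustrative, and error handling is elided to panics):

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// Serves /readyz and /healthz; probed by the deployment's
		// readinessProbe (the port is illustrative).
		HealthProbeBindAddress: ":8081",
	})
	if err != nil {
		panic(err)
	}

	// Gate readiness on the webhook server: the pod only turns Ready once
	// the webhook endpoint can actually serve admission requests, which
	// replaces the external TLS check that was dropped above.
	if err := mgr.AddReadyzCheck("webhook", mgr.GetWebhookServer().StartedChecker()); err != nil {
		panic(err)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```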

@mythi (Contributor) left a comment

it's looking good. a couple of questions still.

Closes intel#1230

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
@ozhuraki (Contributor, Author) commented

Changes:

  • Dropped a namespace when deploying the operator
  • Removed the deletion of the namespace
  • Corrected a previous mistake with AddReadyzCheck()

@mythi (Contributor) left a comment

@ozhuraki thanks! We're finally past the Simics woes and the end result looks good.

@mythi merged commit afce0ed into intel:main on Nov 17, 2022