chore: bump to GK 3.13 #1019

Merged: 9 commits, Aug 31, 2023

Conversation

akashsinghal
Collaborator

Description

What this PR does / why we need it:

Bumps the tests and scripts to use Gatekeeper (GK) 3.13 by default.
Removes the enableExternalData flag from GK installations where the specified GK version is >= 3.11.0.
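
The version gate described above could look roughly like the sketch below. It is an illustration only, not the script changed in this PR: the helper names `version_gte` and `gk_external_data_args` are hypothetical, and it assumes that GK versions below 3.11.0 still need `enableExternalData=true` passed to Helm and that GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from the Ratify scripts.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's version ordering (-V).
version_gte() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Prints the extra Helm argument needed for the given GK version.
# Assumption: only versions below 3.11.0 need the flag set explicitly.
gk_external_data_args() {
  local gk_version="$1"
  if ! version_gte "$gk_version" "3.11.0"; then
    printf '%s\n' "--set enableExternalData=true"
  fi
}
```

For example, `gk_external_data_args 3.10.0` prints the `--set` argument, while `gk_external_data_args 3.13.0` prints nothing, so the result can be spliced directly into a `helm install` command line.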

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):

Fixes #995

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Helm Chart Change (any edit/addition/update that is necessary for changes merged to the main branch)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration

  • Test A
  • Test B

Checklist:

  • Does the affected code have corresponding tests?
  • Are the changes documented, not just with inline documentation, but also with conceptual documentation such as an overview of a new feature, or task-based documentation like a tutorial? Consider if this change should be announced on your project blog.
  • Does this introduce breaking changes that would require an announcement or bumping the major version?
  • Do all new files have appropriate license header?

Post Merge Requirements

  • MAINTAINERS: manually trigger the "Publish Package" workflow after merging any PR that indicates Helm Chart Change

@akashsinghal akashsinghal self-assigned this Aug 23, 2023
@codecov

codecov bot commented Aug 23, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (9b03e99) 58.34% compared to head (1a5d8ab) 58.34%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1019   +/-   ##
=======================================
  Coverage   58.34%   58.34%           
=======================================
  Files          93       93           
  Lines        5541     5541           
=======================================
  Hits         3233     3233           
  Misses       1992     1992           
  Partials      316      316           


@akashsinghal
Collaborator Author

Tests are failing, likely due to the new GK external data (ED) response cache, which has a TTL of 3 min. Working with the GK team to see how we can turn it off or reduce the TTL via the Helm chart.

@akashsinghal
Collaborator Author

akashsinghal commented Aug 30, 2023

TODO items to discuss:

  1. Do we still plan to support 3.11 in our testing pipeline?
  2. GK 3.13 introduced an external data provider response cache with a default TTL of 3 min. Our tests include scenarios where we deploy the same subject image twice and expect different behaviors, so we would like to reduce the TTL for GK 3.13. This is currently not exposed as a Helm parameter. I've added a check so that, for GK 3.13, the GK controller manager deployment is patched with an overridden TTL argument. GK 3.14 will support this in the Helm chart.
  3. I have some concerns about the new cache's default TTL being so large in admission scenarios. Currently, if there's a system failure (a request timeout, or a body/request parsing error), the response is not cached. This is good. But if errors occur during verification operations (such as a failed registry pull or an auth failure) and these failures are transient, the response is still cached. This is because the error is surfaced only in the verifier report and not as an actual 'error' in the response. From GK's perspective, this is a valid response (which makes sense). But we don't want a user to have to wait 3 min before trying to deploy the same image again.
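
A rough sketch of the temporary workaround in item 2, patching the controller manager deployment only for GK 3.13, might look like this. It is not the code added in this PR. The function names are hypothetical, and the flag name `--external-data-provider-response-cache-ttl`, the deployment name `gatekeeper-controller-manager`, and the container index `0` are assumptions that should be checked against the Gatekeeper release in use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from the Ratify scripts.

# Succeeds only for GK 3.13.x, where the TTL is not yet a Helm parameter.
needs_ttl_patch() {
  case "$1" in
    3.13|3.13.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints a JSON patch that appends a TTL argument to the first container.
# Assumption: the flag name below; verify it against the Gatekeeper docs.
build_ttl_patch() {
  local ttl="$1"
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--external-data-provider-response-cache-ttl=%s"}]' "$ttl"
}

# Applies the patch when needed. Requires kubectl and cluster access.
# Assumption: default deployment name in the gatekeeper-system namespace.
patch_gk_ttl() {
  local gk_version="$1" ttl="$2"
  if needs_ttl_patch "$gk_version"; then
    kubectl -n gatekeeper-system patch deployment gatekeeper-controller-manager \
      --type=json -p "$(build_ttl_patch "$ttl")"
  fi
}
```

Calling `patch_gk_ttl 3.13.0 1s` after the Helm install would append the argument and trigger a rollout of the deployment; for any other version the function is a no-op, which keeps the workaround easy to delete once the Helm chart exposes the setting.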

@akashsinghal akashsinghal marked this pull request as ready for review August 30, 2023 23:50
@akashsinghal
Collaborator Author

As discussed in community meeting on 8/30/23:

  1. Support 3 versions of GK.
  2. This is fine temporarily.
  3. Will continue to discuss.

Collaborator

@susanshi susanshi left a comment

Looks good to me, left a minor comment

@@ -28,7 +28,7 @@ SLEEP_TIME=1
assert_success
run bash -c "kubectl logs -l app.kubernetes.io/name=ratify -c ratify --tail=-1 -n gatekeeper-system | grep 'cache hit for subject descriptor'"
assert_success

sleep 3
Collaborator

Suggestion: add a comment here explaining the 3 min Gatekeeper cache TTL.

@susanshi susanshi enabled auto-merge (squash) August 31, 2023 07:24
@susanshi
Collaborator

3. ...then response will still be cached. This is because the error is surfaced only in the verifier report and not as an actual 'error' in the response.

Should we return an "error" instead of a verifier report then?

@susanshi susanshi merged commit b641d33 into ratify-project:main Aug 31, 2023
19 checks passed
bspaans pushed a commit to bspaans/ratify that referenced this pull request Oct 17, 2023
Successfully merging this pull request may close these issues.

Update test matrix to support Gatekeeper 3.13