[backport][v0.9][SURE-7469] Imagescan breaks fleet controller #2181

Closed
0xavi0 opened this issue Feb 27, 2024 · 3 comments

@0xavi0
Contributor

0xavi0 commented Feb 27, 2024

SURE-7469

backport of: #2096

Issue description:

The Fleet controller in Rancher upstream clusters crashes if the image tag does not match the expected semver range when using ImageScan in the fleet.yaml file.

="2024-01-17T18:44:44Z" level=debug msg="DesiredSet - No change(2) gitjob.cattle.io/v1, Kind=GitJob fleet-default/project-test2 for gitjobs fleet-default/project-test2"
E0117 18:44:44.351563       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1491 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x22b04a0?, 0x3e240f0})
    /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xfffffffe?})
    /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/runtime/runtime.go:49 +0x75
panic({0x22b04a0, 0x3e240f0})
    /usr/lib64/go/1.20/src/runtime/panic.go:884 +0x213
github.com/Masterminds/semver/v3.(*Version).Original(...)
    /go/pkg/mod/github.com/!masterminds/semver/v3@v3.2.1/version.go:259
github.com/rancher/fleet/internal/cmd/controller/controllers/image.semverLatest({0x3d70a50?, 0xc0026ff260?}, {0xc0058ff200, 0x2f, 0x0?})
    /go/src/github.com/rancher/fleet/internal/cmd/controller/controllers/image/image.go:501 +0x105
github.com/rancher/fleet/internal/cmd/controller/controllers/image.latestTag({0xc0075afc90?, 0xc0075afca0?}, {0xc0058ff200?, 0xc009d1a01d?, 0x44?}) 
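
The trace shows `(*Version).Original` being reached from `semverLatest` with a nil receiver, which is what one would expect if no candidate version matched the range and the "latest" pointer was never set. The following is a minimal sketch of guarding that case, not the actual Fleet code: the `version` type, `errNoMatch`, `latestMatching`, and `describe` are hypothetical names, and the candidates are assumed to be sorted in ascending order already.

```go
package main

import (
	"errors"
	"fmt"
)

// version is a hypothetical stand-in for a parsed semver value.
type version struct {
	original   string
	prerelease bool
}

// errNoMatch is a hypothetical sentinel returned when nothing matches.
var errNoMatch = errors.New("no available version matching")

// latestMatching returns the last candidate accepted by match.
// Candidates are assumed to be sorted ascending. If nothing is
// accepted, it returns an error instead of a nil *version that a
// caller might dereference.
func latestMatching(candidates []*version, match func(*version) bool) (*version, error) {
	var latest *version
	for _, c := range candidates {
		if c != nil && match(c) {
			latest = c
		}
	}
	if latest == nil {
		return nil, errNoMatch
	}
	return latest, nil
}

// describe shows the caller side: the error is checked before the
// version is used, so the nil case never reaches a field access.
func describe(candidates []*version, match func(*version) bool) string {
	v, err := latestMatching(candidates, match)
	if err != nil {
		return fmt.Sprintf("error: %v", err)
	}
	return v.original
}
```

With only pre-release candidates such as `0.0.0-44` and `0.0.0-52` and a matcher that rejects pre-releases, `describe` returns an error string rather than panicking.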

Business impact:

The Fleet controller crashes and the GitRepos are not updated.

Troubleshooting steps:

From the customer: 
In artifact registry our latest image was built 20 hours ago and has the tag 0.0.0-52. The one deployed with the initial fleet.yaml file is 0.0.0-44. This fails to update.

Actual behavior:

The Fleet controller in Rancher upstream clusters crashes if the image tag does not match the expected semver range.

Expected behavior:

The Fleet controller should not crash.

Files, logs, traces:

-GitRepo project-test2: project-test.yaml

-Gitjob log

-Fleet controller log

Additional notes:

As this comment notes (https://github.com/rancher/fleet/pull/413#issuecomment-880228630), the problem may stem from a lack of understanding of how to set the semver range parameter in the fleet.yaml manifest.

@0xavi0
Contributor Author

0xavi0 commented Feb 29, 2024

QA Template

Solution

Fixes panic issue when working with pre-release image tags and the "*" range.

Testing

Create an imagescan configuration using the "*" target and images with pre-release tags (0.0.1-10, 0.0.2-20, etc.).
The controller should not crash, and the image should not be updated even when a new one is pushed.

Additionally, you should see the message: no available version matching * in the logs
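
To make the expected outcome easier to reason about: tags such as 0.0.1-10 carry a semver pre-release part, and the test above expects the "*" range to match none of them. The sketch below is a simplified model of that rule written with the standard library only. It is not the Masterminds/semver implementation, it does not validate the numeric core, and `splitTag` and `starAccepts` are hypothetical helper names.

```go
package main

import "strings"

// splitTag separates a tag like "0.0.1-10" into its core ("0.0.1")
// and its pre-release part ("10"). Build metadata after "+" is
// dropped first. The core is not validated in this sketch.
func splitTag(tag string) (core, pre string) {
	tag, _, _ = strings.Cut(tag, "+")
	core, pre, _ = strings.Cut(tag, "-")
	return core, pre
}

// starAccepts models the behaviour the QA steps rely on: a "*" range
// accepts a tag only when it has no pre-release part.
func starAccepts(tag string) bool {
	_, pre := splitTag(tag)
	return pre == ""
}
```

Under this model, `starAccepts("0.0.1-10")` and `starAccepts("0.0.2-20")` are both false, so a repository holding only such tags has no version matching "*".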

Additional info

@mmartin24
Collaborator

Tested this issue along with @0xavi0 and @sbulage in Rancher 2.8.3-rc2 with fleet:103.1.1+up0.9.1-rc.5; the fix was not yet present in that build.

@mmartin24
Collaborator

mmartin24 commented Mar 15, 2024

Checked and working in Rancher 2.8.3-rc3 and Fleet 0.9.1-rc.6

System Information
For fresh installation

  • Rancher version: 2.8.3-rc3
  • Fleet Version: 0.9.1-rc.6
  • Kubernetes version: v1.28.7+k3s1

For upgrade installation

  • Rancher version: 2.7.9
  • Fleet Version: 0.8.0
  • Kubernetes version: v1.26.10+k3s2

Installation and Testing

Scenario 1 - Test that the Fleet controller does not crash when the image tag does not match the semver range, and check that no panic logs are found

  • Deployed a local GitRepo using an example with the "*" target range, for example this one from Xavi:
Repo: https://github.com/0xavi0/fleet-examples/
Branch: test-imagescans
Path: imagescans
  • Wait until completion.
  • Clusters and resources are ready
  • UI displays state "Error" and outputs "ImageScan test-scan is not ready: no available version matching"
  • Fleet controller pod is Ready and does not crash
➜  ~ k get pod -n cattle-fleet-system fleet-controller-7db5cc987b-55d5j 
NAME                                READY   STATUS    RESTARTS   AGE
fleet-controller-7db5cc987b-55d5j   1/1     Running   0          23m
  • No panic logs are found
  • Logs "no available version matching *" are present
➜  ~ k logs -n cattle-fleet-system fleet-controller-7db5cc987b-55d5j | grep "panic"                                  
➜  ~ k logs -n cattle-fleet-system fleet-controller-7db5cc987b-55d5j | grep "no available version matching *" | head -1 
time="2024-03-15T08:55:13Z" level=error msg="error syncing 'fleet-local/imagescan-imagescan-test-imagescans-0': handler image-scan: no available version matching *, requeuing"
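
The two log checks above can also be expressed as a small helper, for instance when the controller logs have been saved to a string. This is a minimal sketch, not part of Fleet or its test suite: `logSummary` and `summarize` are hypothetical names. Unlike the `grep` call, where the trailing `*` is read as a regular-expression quantifier, `strings.Contains` matches the literal text "no available version matching *".

```go
package main

import (
	"bufio"
	"strings"
)

// logSummary counts the two kinds of line the QA steps look for.
type logSummary struct {
	Panics  int // lines containing "panic"
	NoMatch int // lines containing the literal "no available version matching *"
}

// summarize scans the log text line by line and counts matches.
func summarize(logs string) logSummary {
	var s logSummary
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := sc.Text()
		if strings.Contains(line, "panic") {
			s.Panics++
		}
		if strings.Contains(line, "no available version matching *") {
			s.NoMatch++
		}
	}
	return s
}
```

For a healthy controller, the expectation from the steps above is `Panics == 0` and `NoMatch >= 1`.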

Screenshot:
2024-03-15_10-04

Before the fix, the following was observed:
Log error:

Ξ (dhcp117.qa.suse.cz) ~ → k logs -n cattle-fleet-system fleet-controller-56968b86b6-ck29k | grep "panic"
E0315 08:45:45.361898       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
panic({0x22af3c0, 0x3e210b0})
	/usr/lib64/go/1.20/src/runtime/panic.go:884 +0x213
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
panic({0x22af3c0, 0x3e210b0})
	/usr/lib64/go/1.20/src/runtime/panic.go:884 +0x213

Pod crashed:

Ξ (dhcp117.qa.suse.cz) ~ → k get pods -n cattle-fleet-system fleet-controller-56968b86b6-ck29k           
NAME                                READY   STATUS             RESTARTS         AGE
fleet-controller-56968b86b6-ck29k   0/1     CrashLoopBackOff   11 (4m56s ago)   63m

Screenshot:
2024-03-15_09-47

Scenario 2 - Check that the crashed pod and panic log errors are replaced by the correct logs after upgrading to a fixed version

  • Deploy Rancher 2.7.9
  • Deploy the same example as mentioned above
  • Observe error
  • Upgrade to Rancher 2.8.3-rc3
  • Wait until full completion
  • Fleet controller pod is Ready and not crashed
  • No panic errors
  • UI displays state "Error" and outputs "ImageScan test-scan is not ready: no available version matching"

