Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to GRPC sockets rather than using tcp ports #631

Merged
merged 1 commit into from
Nov 8, 2021

Conversation

saschagrunert
Copy link
Member

@saschagrunert saschagrunert commented Nov 2, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

The enricher as well as the bpf recorder require host network for cgroup
determination. This means if we already serve something on the GRPC tcp
ports, then the operator will fail to deploy.

To work around this error-case, we now rely on unix domain sockets
rather than TCP ports.

Which issue(s) this PR fixes:

None

Does this PR have test?

None

Special notes for your reviewer:

Refers to #618 (comment)

Does this PR introduce a user-facing change?

Switched to unix domain sockets for the GRPC servers.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 2, 2021
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 2, 2021
@saschagrunert saschagrunert mentioned this pull request Nov 2, 2021
7 tasks
@codecov-commenter
Copy link

codecov-commenter commented Nov 2, 2021

Codecov Report

Merging #631 (7fd255c) into master (73b1235) will increase coverage by 0.05%.
The diff coverage is 45.45%.

@@            Coverage Diff             @@
##           master     #631      +/-   ##
==========================================
+ Coverage   43.76%   43.82%   +0.05%     
==========================================
  Files          29       29              
  Lines        1565     1570       +5     
==========================================
+ Hits          685      688       +3     
- Misses        834      835       +1     
- Partials       46       47       +1     

@saschagrunert
Copy link
Member Author

/test pull-security-profiles-operator-test-e2e

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 2, 2021
@saschagrunert saschagrunert force-pushed the grpc-sockets branch 2 times, most recently from 2a6f372 to c755643 Compare November 2, 2021 15:22
@saschagrunert
Copy link
Member Author

@jhrozek @JAORMX PTAL

@JAORMX
Copy link
Contributor

JAORMX commented Nov 2, 2021

Lgtm. Is the e2e-fedora test working at all? Shall it be rekicked?

@saschagrunert
Copy link
Member Author

Lgtm. Is the e2e-fedora test working at all? Shall it be rekicked?

I re-triggered the CI, let's see why it hung.

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think it's a nice improvement, just see one inline question about permissions

internal/pkg/daemon/bpfrecorder/bpfrecorder.go Outdated Show resolved Hide resolved
@saschagrunert saschagrunert force-pushed the grpc-sockets branch 3 times, most recently from f034a91 to 4249649 Compare November 3, 2021 08:35
@saschagrunert
Copy link
Member Author

/test pull-security-profiles-operator-test-e2e

@saschagrunert
Copy link
Member Author

/test pull-security-profiles-operator-test-e2e

@saschagrunert saschagrunert force-pushed the grpc-sockets branch 2 times, most recently from 7fd255c to eee3589 Compare November 4, 2021 08:53
@saschagrunert
Copy link
Member Author

/retest
e2e-fedora seems to hang and requires more debugging

@saschagrunert saschagrunert force-pushed the grpc-sockets branch 2 times, most recently from 252ab13 to 9851e04 Compare November 4, 2021 11:41
@jhrozek
Copy link
Contributor

jhrozek commented Nov 4, 2021 via email

@saschagrunert
Copy link
Member Author

/test pull-security-profiles-operator-test-e2e

@saschagrunert
Copy link
Member Author

/retest

@jhrozek
Copy link
Contributor

jhrozek commented Nov 4, 2021

Here's what I'm seeing running the e2e tests locally. The SPO pod restarts (still trying to figure out why) and then:

$RUN kubectl logs spod-8hwd2 security-profiles-operator -nsecurity-profiles-operator
I1104 19:17:02.358924   60463 deleg.go:130] setup "msg"="starting component: spod"  "buildDate"="1980-01-01T00:00:00Z" "compiler"="gc" "gitCommit"="081a559d7cf38c42455fd8329501776b9221820a" "gitTreeState"="clean" "goVersion"="go1.16.9" "libseccomp"="2.5.1" "platform"="linux/amd64" "version"="0.4.0-dev"
I1104 19:17:03.067711   60463 deleg.go:130] controller-runtime/metrics "msg"="metrics server is starting to listen"  "addr"=":8080"
I1104 19:17:03.069201   60463 metrics.go:159] metrics "msg"="Registering metric: seccomp_profile_total"
I1104 19:17:03.069559   60463 metrics.go:159] metrics "msg"="Registering metric: seccomp_profile_audit_total"
I1104 19:17:03.069716   60463 metrics.go:159] metrics "msg"="Registering metric: seccomp_profile_error_total"
I1104 19:17:03.069861   60463 metrics.go:159] metrics "msg"="Registering metric: selinux_profile_total"
I1104 19:17:03.069999   60463 metrics.go:159] metrics "msg"="Registering metric: selinux_profile_audit_total"
I1104 19:17:03.070142   60463 metrics.go:159] metrics "msg"="Registering metric: selinux_profile_error_total"
E1104 19:17:03.070967   60463 deleg.go:144] setup "msg"="running security-profiles-operator" "error"="start metrics grpc server: create listener: listen unix /var/run/grpc/metrics.sock: bind: address already in use"

@jhrozek
Copy link
Contributor

jhrozek commented Nov 4, 2021

huh:

I1104 19:34:02.701789   57516 selinuxpolicy_controller.go:356] selinuxprofile "msg"="Removing policy file" "Request.Name"="errorlogger" "Request.Namespace"="default" "policyPath"="/etc/selinux.d/errorlogger_default.cil"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x172683e]

goroutine 383 [running]:
github.com/crossplane/crossplane-runtime/pkg/event.Warning(0x1cbb0ef, 0x19, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        github.com/crossplane/crossplane-runtime@v0.14.1-0.20210713194031-85b19c28ea88/pkg/event/event.go:63 +0x5e
sigs.k8s.io/security-profiles-operator/internal/pkg/daemon/selinuxprofile.(*ReconcileSP).Reconcile(0xc000163540, 0x1f1e038, 0xc00002d860, 0xc0007a3639, 0x7, 0xc0007a3624, 0xb, 0xc00002d860, 0xc00002d830, 0xc00054fdb0, ...)
        sigs.k8s.io/security-profiles-operator/internal/pkg/daemon/selinuxprofile/selinuxpolicy_controller.go:185 +0x6be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc000181540, 0x1f1e038, 0xc00002d830, 0xc0007a3639, 0x7, 0xc0007a3624, 0xb, 0xc00002d800, 0x0, 0x0, ...)
        sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:114 +0x247
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000181540, 0x1f1df90, 0xc000162f00, 0x1adff00, 0xc000048220)
        sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:311 +0x305
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000181540, 0x1f1df90, 0xc000162f00, 0x0)
        sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2(0xc0004b1950, 0xc000181540, 0x1f1df90, 0xc000162f00)
        sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:227 +0x6b
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        sigs.k8s.io/controller-runtime@v0.10.2/pkg/internal/controller/controller.go:223 +0x425
{"level":"info","ts":1636054442.7021089,"logger":"file-watcher","caller":"daemon/daemon.go:89","msg":"Removing policy","file":"/etc/selinux.d/errorlogger_default.cil"}
{"level":"info","ts":1636054442.8720913,"caller":"semanage/semanage.go:89","msg":"Removing last errorlogger_default module (no other errorlogger_default module exists at another priority)."}
{"level":"info","ts":1636054450.455779,"logger":"policy-installer","caller":"daemon/daemon.go:130","msg":"The operation was successful","operation":"remove - /etc/selinux.d/errorlogger_default.cil"}

@jhrozek
Copy link
Contributor

jhrozek commented Nov 4, 2021

Can you try if the tests work better if you include #640 in the PR?
(btw I am not working tomorrow, I might not be able to check any comments to that PR until Monday..)

@jhrozek
Copy link
Contributor

jhrozek commented Nov 4, 2021

oh and that PR definitely does not fix the issue related to container restart failing with address already in use, just the crash itself..

@saschagrunert
Copy link
Member Author

oh and that PR definitely does not fix the issue related to container restart failing with address already in use, just the crash itself..

We have to remove the socket if the file already exists. Good catch!

The enricher as well as the bpf recorder require host network for cgroup
determination. This means if we already serve something on the GRPC tcp
ports, then the operator will fail to deploy.

To work around this error-case, we now rely on unix domain sockets
rather than TCP ports.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 5, 2021
@saschagrunert
Copy link
Member Author

/test pull-security-profiles-operator-test-e2e

@saschagrunert
Copy link
Member Author

/retest

1 similar comment
@saschagrunert
Copy link
Member Author

/retest

@saschagrunert
Copy link
Member Author

@jhrozek tests are green now :)

@saschagrunert
Copy link
Member Author

@JAORMX @pjbgf PTAL

Copy link
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pjbgf, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [pjbgf,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ac8a762 into kubernetes-sigs:master Nov 8, 2021
@saschagrunert saschagrunert deleted the grpc-sockets branch November 8, 2021 14:38
@jhrozek
Copy link
Contributor

jhrozek commented Nov 8, 2021

/lgtm
thank you, great work on avoiding the 777 permissions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants