fix(userspace/falco): fixed grpc server shutdown. #2350

Merged (1 commit) on Jan 24, 2023

FedeDP
Contributor

@FedeDP FedeDP commented Jan 12, 2023

What type of PR is this?

/kind bug

Any specific area of the project related to this PR?

/area engine

What this PR does / why we need it:

This PR fixes the grpc server shutdown.
Hot-reload is now enabled by default when the ruleset or config is updated; this led to SIGABRTs when the gRPC server was enabled, since it was not properly shut down.

Which issue(s) this PR fixes:

Helps with (but probably does not fix) #2342

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

fix(userspace/falco): fix grpc server shutdown

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
@FedeDP
Contributor Author

FedeDP commented Jan 12, 2023

/milestone 0.34.0

@FedeDP
Contributor Author

FedeDP commented Jan 12, 2023

Example output:

Thu Jan 12 11:51:42 2023: Falco version: 0.33.1-106+1b83cdc (x86_64)
Thu Jan 12 11:51:42 2023: Falco initialized with configuration file: ../falco.yaml
Thu Jan 12 11:51:42 2023: Loading rules from file ../rules/falco_rules.local.yaml
Thu Jan 12 11:51:42 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Jan 12 11:51:42 2023: gRPC server threadiness equals to 8
Thu Jan 12 11:51:42 2023: Starting health webserver with threadiness 8, listening on port 8765
Thu Jan 12 11:51:42 2023: Enabled event sources: syscall
Thu Jan 12 11:51:42 2023: Opening capture with Kernel module
Thu Jan 12 11:51:42 2023: Starting gRPC server at unix:///run/falco/falco.sock
Thu Jan 12 11:51:44 2023: SIGHUP received, restarting...
Syscall event drop monitoring:
- event drop detected: 0 occurrences
- num times actions taken: 0
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
Thu Jan 12 11:51:45 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Thu Jan 12 11:51:45 2023: Waiting for the gRPC threads to complete
Thu Jan 12 11:51:45 2023: Draining all the remaining gRPC events
Thu Jan 12 11:51:45 2023: Shutting down gRPC server complete
Thu Jan 12 11:51:45 2023: Falco version: 0.33.1-106+1b83cdc (x86_64)
Thu Jan 12 11:51:45 2023: Falco initialized with configuration file: ../falco.yaml
Thu Jan 12 11:51:45 2023: Loading rules from file ../rules/falco_rules.local.yaml
Thu Jan 12 11:51:45 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Jan 12 11:51:45 2023: gRPC server threadiness equals to 8
Thu Jan 12 11:51:45 2023: Starting health webserver with threadiness 8, listening on port 8765
Thu Jan 12 11:51:45 2023: Enabled event sources: syscall
Thu Jan 12 11:51:45 2023: Opening capture with Kernel module
Thu Jan 12 11:51:45 2023: Starting gRPC server at unix:///run/falco/falco.sock
Thu Jan 12 11:51:50 2023: SIGHUP received, restarting...
Syscall event drop monitoring:
- event drop detected: 0 occurrences
- num times actions taken: 0
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
Thu Jan 12 11:51:51 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Thu Jan 12 11:51:51 2023: Waiting for the gRPC threads to complete
Thu Jan 12 11:51:51 2023: Draining all the remaining gRPC events
Thu Jan 12 11:51:51 2023: Shutting down gRPC server complete
Thu Jan 12 11:51:51 2023: Falco version: 0.33.1-106+1b83cdc (x86_64)
Thu Jan 12 11:51:51 2023: Falco initialized with configuration file: ../falco.yaml
Thu Jan 12 11:51:51 2023: Loading rules from file ../rules/falco_rules.local.yaml
Thu Jan 12 11:51:51 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Jan 12 11:51:51 2023: gRPC server threadiness equals to 8
Thu Jan 12 11:51:51 2023: Starting health webserver with threadiness 8, listening on port 8765
Thu Jan 12 11:51:51 2023: Enabled event sources: syscall
Thu Jan 12 11:51:51 2023: Opening capture with Kernel module
Thu Jan 12 11:51:51 2023: Starting gRPC server at unix:///run/falco/falco.sock
Thu Jan 12 11:51:53 2023: SIGHUP received, restarting...
Syscall event drop monitoring:
- event drop detected: 0 occurrences
- num times actions taken: 0
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
Thu Jan 12 11:51:53 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Thu Jan 12 11:51:53 2023: Waiting for the gRPC threads to complete
Thu Jan 12 11:51:53 2023: Draining all the remaining gRPC events
Thu Jan 12 11:51:53 2023: Shutting down gRPC server complete
Thu Jan 12 11:51:53 2023: Falco version: 0.33.1-106+1b83cdc (x86_64)
Thu Jan 12 11:51:53 2023: Falco initialized with configuration file: ../falco.yaml
Thu Jan 12 11:51:53 2023: Loading rules from file ../rules/falco_rules.local.yaml
Thu Jan 12 11:51:53 2023: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Thu Jan 12 11:51:53 2023: gRPC server threadiness equals to 8
Thu Jan 12 11:51:53 2023: Starting health webserver with threadiness 8, listening on port 8765
Thu Jan 12 11:51:53 2023: Enabled event sources: syscall
Thu Jan 12 11:51:53 2023: Opening capture with Kernel module
Thu Jan 12 11:51:53 2023: Starting gRPC server at unix:///run/falco/falco.sock
^CThu Jan 12 11:51:55 2023: SIGINT received, exiting...
Syscall event drop monitoring:
- event drop detected: 0 occurrences
- num times actions taken: 0
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
Thu Jan 12 11:51:55 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Thu Jan 12 11:51:55 2023: Waiting for the gRPC threads to complete
Thu Jan 12 11:51:55 2023: Draining all the remaining gRPC events
Thu Jan 12 11:51:55 2023: Shutting down gRPC server complete
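
The four shutdown messages in the output above suggest an ordering: stop accepting work, wait for the server threads, drain whatever is still queued, then report completion. Below is a minimal sketch of that ordering using only the C++ standard library. ToyServer, submit, processed and shutdown are hypothetical names chosen for illustration; this is neither the code changed by this PR nor the gRPC API, only a model of the sequence the log describes.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical toy server: models the shutdown ordering only.
// It is not Falco's code and does not use the gRPC API.
class ToyServer {
public:
    explicit ToyServer(std::size_t threadiness) {
        for (std::size_t i = 0; i < threadiness; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Shutting down in the destructor keeps joinable threads from
    // being destroyed, which would call std::terminate.
    ~ToyServer() { shutdown(); }

    ToyServer(const ToyServer&) = delete;
    ToyServer& operator=(const ToyServer&) = delete;

    // Queue an event; refused once shutdown has started.
    bool submit(int event) {
        std::lock_guard<std::mutex> lock(mu_);
        if (stopping_) {
            return false;
        }
        events_.push(event);
        cv_.notify_one();
        return true;
    }

    // Number of events handled by the worker threads so far.
    std::size_t processed() const {
        std::lock_guard<std::mutex> lock(mu_);
        return processed_;
    }

    // Safe to call more than once. Returns how many queued events
    // were drained without being processed.
    std::size_t shutdown() {
        // Step 1: stop accepting new work and wake every worker.
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();

        // Step 2: wait for the worker threads to complete.
        for (auto& worker : workers_) {
            worker.join();
        }
        workers_.clear();

        // Step 3: drain events that no worker picked up.
        std::size_t drained = 0;
        std::lock_guard<std::mutex> lock(mu_);
        while (!events_.empty()) {
            events_.pop();
            ++drained;
        }
        return drained;  // Step 4: shutdown complete.
    }

private:
    void worker_loop() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [this] { return stopping_ || !events_.empty(); });
            if (stopping_) {
                return;  // leave anything still queued for the drain step
            }
            events_.pop();
            ++processed_;  // stands in for real event handling
        }
    }

    mutable std::mutex mu_;
    std::condition_variable cv_;
    std::queue<int> events_;
    std::vector<std::thread> workers_;
    std::size_t processed_ = 0;
    bool stopping_ = false;
};
```

In this model every accepted event ends up either processed by a worker or counted by the drain step, and submit is refused after shutdown begins, so nothing touches the queue once the threads are gone.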

@happy-dude
Contributor

I commented on the issue ticket #2342 (comment) that we are still seeing this error.

@FedeDP
Contributor Author

FedeDP commented Jan 19, 2023

I am still unable to repro with this PR :(
I reproduced it quickly using a master build: https://app.circleci.com/pipelines/github/falcosecurity/falco/3593/workflows/29ce9bfb-73ea-4597-bc25-9f5f7aab3cca/jobs/30903/artifacts

I cannot repro using the latest build from this PR: https://app.circleci.com/pipelines/github/falcosecurity/falco/3569/workflows/dc4db6ae-3f60-4245-9125-cfc21d15949e/jobs/30666/artifacts
Nor using a clean build.

Did you make a clean build of Falco?

@FedeDP
Contributor Author

FedeDP commented Jan 19, 2023

  • One terminal runs sudo ./falco -c /home/federico/Work/falco/falco.yaml --modern-bpf -r ../..//etc/falco/falco_rules.yaml (in config there is grpc output and server enabled)
  • one terminal runs sudo ./falco-exporter

From master, as soon as I press Ctrl-C in the falco terminal:

Thu Jan 19 09:27:06 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Thu Jan 19 09:27:07 2023: Waiting for the gRPC threads to complete
Thu Jan 19 09:27:07 2023: grpc: assertion failed: grpc_server_request_registered_call( server_->server(), registered_method, &call_, &context_->deadline_, context_->client_metadata_.arr(), payload, call_cq_->cq(), notification_cq->cq(), this) == GRPC_CALL_OK
Aborted

On this PR:

Thu Jan 19 09:25:43 2023: Shutting down gRPC server. Waiting until external connections are closed by clients
Thu Jan 19 09:25:43 2023: Waiting for the gRPC threads to complete
Thu Jan 19 09:25:43 2023: Draining all the remaining gRPC events
Thu Jan 19 09:25:43 2023: Shutting down gRPC server complete

EDIT: in both cases, falco-exporter exits with no error though:

sudo ./falco-exporter
2023/01/19 09:27:05 connecting to gRPC server at unix:///run/falco/falco.sock (timeout 2m0s)
2023/01/19 09:27:05 listening on http://:9376/metrics
2023/01/19 09:27:05 connected to gRPC server, subscribing events stream
2023/01/19 09:27:05 ready
2023/01/19 09:27:07 gRPC stream closed

@happy-dude
Contributor

happy-dude commented Jan 19, 2023

The latest build was from master (as of 2 days ago) with this PR pulled in on top of it.

Can you clarify what you meant by clean build?

@FedeDP
Contributor Author

FedeDP commented Jan 19, 2023

I meant "rm -rf build/ && mkdir build && cmake"!
Or, can you try with the packages built by this PR?

@happy-dude
Contributor

I meant "rm -rf build/ && mkdir build && cmake"!

Ah; yup -- that is what I did to build + create the package.

I can take a look at the build specifically generated from this PR once the workday resumes.

@happy-dude
Contributor

happy-dude commented Jan 19, 2023

Not sure if this helps narrow the problem down, but the latest SIGABRTs (after a clean build from master @ 306f9ba and this PR pulled in) occurred on ARM64 nodes.

Truth be told, we didn't keep gRPC enabled long enough to determine whether it continues to happen on AMD64 nodes as well.

So, to sum up -- we were thinking that this PR resolved the SIGABRTs on AMD64 nodes and unfortunately are continuing on ARM64 nodes? Is there any way we can examine this further or gather more info regarding this?

If there is a set of commands and/or a conf file we can test, we would be happy to do so and report our findings.

@FedeDP
Contributor Author

FedeDP commented Jan 19, 2023

So, to sum up -- we were thinking that this PR resolved the SIGABRTs on AMD64 nodes and unfortunately are continuing on ARM64 nodes? Is there any way we can examine this further or gather more info regarding this?

That would be so strange!
So, you are not running falco-exporter right?

@happy-dude
Contributor

Sorry; we are running falco-exporter!

Some more clarification: we do not think the SIGABRT happens because of falco-exporter but when the falco service is interrupted/needs to restart because of config management software (such as during a salt highstate).

@FedeDP
Contributor Author

FedeDP commented Jan 19, 2023

Uh it's that we do not release falco-exporter for arm :D ok!

when the falco service is interrupted/needs to restart because of config management software (such as during a salt highstate).

Yep, this is what I think too

@happy-dude
Contributor

happy-dude commented Jan 19, 2023

Just to add some additional context (Fede and I had a quick chat on Slack):

Member

@leogr leogr left a comment

Although we are not sure if this fixes #2342, it helps mitigate a similar class of problems. Also, it should be the standard way to shut down a gRPC server.

Thus
/approve

PS: we will continue investigating the problem described in #2342 (which might happen only on the arm64 platform)

/approve

Contributor

@jasondellaluce jasondellaluce left a comment

/approve

@poiana
Contributor

poiana commented Jan 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP, jasondellaluce, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [FedeDP,jasondellaluce,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit e64c14a into master Jan 24, 2023
@poiana poiana deleted the fix/grpc_shutdown branch January 24, 2023 10:59
5 participants