
Falco pods crashing with "free(): corrupted unsorted chunks" #1656

Closed
ecology-chris opened this issue May 18, 2021 · 12 comments
@ecology-chris

Pods are scheduled, start up, then suddenly crash with the error: free(): corrupted unsorted chunks

This is currently happening in only one cluster.

Expected behaviour

We have the same Falco chart installed on a similar cluster with no issues.

Environment

  • Falco version: 0.27.0 (helm chart 1.7.10)
  • Cloud provider or hardware configuration: EKS
  • OS/Kernel: falco_amazonlinux2_4.14.219-161.340.amzn2.x86_64_1.ko
  • Installation method: helm

This seems to be a problem with this one particular cluster, but the error isn't giving us much to work with. Any advice or guidance is appreciated.
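Since the pods die almost immediately, the most useful lines are often in the previous container instance rather than the current one. A minimal sketch (the pod name and label selector are placeholders to adjust for your chart; `--previous` is a standard kubectl flag):

```shell
# Hedged sketch: fetch logs from the crashed (previous) Falco container.
# "falco-xxxxx" and the "app=falco" selector are placeholders; list pods first.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -l app=falco || true        # find the crashing pod's name
  kubectl logs falco-xxxxx --previous || true  # logs from the instance that crashed
  have_kubectl=yes
else
  echo "kubectl not available on this host"
  have_kubectl=no
fi
```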

@leogr
Member

leogr commented May 18, 2021

Have you tried 0.28.1?

Could you also provide the relevant part of the log, please?

@ecology-chris
Author

I just installed 0.28.1 with helm. I see the driver load, the web server come up, and a few findings on different containers, then an error like this before the pods enter a crash backoff loop and try to restart.

02:04:13.350191969: Debug Falco internal: syscall event drop. 66960 system calls dropped in last second. (ebpf_enabled=0 n_drops=66960 n_drops_buffer=66960 n_drops_bug=0 n_drops_pf=0 n_evts=281105) free(): corrupted unsorted chunks
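As a side note, the Debug line carries raw counters (`n_drops`, `n_evts`), so the drop ratio can be pulled out with standard text tools. A sketch using the exact message above:

```shell
# Parse the n_drops / n_evts counters out of a Falco "syscall event drop" line
# and compute the percentage of syscalls dropped in that second.
line='02:04:13.350191969: Debug Falco internal: syscall event drop. 66960 system calls dropped in last second. (ebpf_enabled=0 n_drops=66960 n_drops_buffer=66960 n_drops_bug=0 n_drops_pf=0 n_evts=281105)'
n_drops=$(echo "$line" | grep -o 'n_drops=[0-9]*' | cut -d= -f2)
n_evts=$(echo "$line" | grep -o 'n_evts=[0-9]*' | cut -d= -f2)
ratio=$(LC_ALL=C awk -v d="$n_drops" -v e="$n_evts" 'BEGIN { printf "%.1f", 100 * d / e }')
echo "dropped ${ratio}% of syscalls in the last second"   # -> dropped 23.8% ...
```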

@poiana

poiana commented Aug 17, 2021

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@leogr
Member

leogr commented Aug 18, 2021

Hey @ecology-chris

Sorry for the late reply. Anyway, I was not able to reproduce this issue. Could you provide more details or a reproducible setup?

@poiana

poiana commented Sep 17, 2021

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@jcdecaux-oss

Same here, but only when k8s audit logs are enabled.

Environment

Falco version: 0.29.1 (helm chart 1.15.7)
Cloud provider or hardware configuration: EKS v1.18.9-eks-d1db3c
OS/Kernel: falco_amazonlinux2_4.14.238-182.421.amzn2.x86_64_1.ko
Installation method: helm

$ k logs falco-4t6ln

  • Setting up /usr/src links from host
  • Running falco-driver-loader for: falco version=0.29.1, driver version=17f5df52a7d9ed6bb12d3b1768460def8439936d
  • Running falco-driver-loader with: driver=module, compile=yes, download=yes
  • Unloading falco module, if present
  • Trying to load a system falco module, if present
  • Looking for a falco module locally (kernel 4.14.238-182.421.amzn2.x86_64)
  • Trying to download a prebuilt falco module from https://download.falco.org/driver/17f5df52a7d9ed6bb12d3b1768460def8439936d/falco_amazonlinux2_4.14.238-182.421.amzn2.x86_64_1.ko
  • Download succeeded
  • Success: falco module found and inserted
    Mon Sep 20 15:55:06 2021: Falco version 0.29.1 (driver version 17f5df52a7d9ed6bb12d3b1768460def8439936d)
    Mon Sep 20 15:55:06 2021: Falco initialized with configuration file /etc/falco/falco.yaml
    Mon Sep 20 15:55:06 2021: Loading rules from file /etc/falco/falco_rules.yaml:
    Mon Sep 20 15:55:06 2021: Loading rules from file /etc/falco/falco_rules.local.yaml:
    Mon Sep 20 15:55:06 2021: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
    Mon Sep 20 15:55:06 2021: Loading rules from file /etc/falco/rules.d/custom-lists.yaml:
    Mon Sep 20 15:55:06 2021: Loading rules from file /etc/falco/rules.d/custom-macros.yaml:
    Mon Sep 20 15:55:06 2021: Loading rules from file /etc/falco/rules.d/custom-rules.yaml:
    Mon Sep 20 15:55:07 2021: Starting internal webserver, listening on port 8765
    free(): corrupted unsorted chunks

/var/log/messages

Sep 20 15:54:15 ip-10-235-221-131 kernel: traps: falco[28701] general protection ip:7f859b294611 sp:7ffd0db3d650 error:0 in libc-2.28.so[7f859b294000+148000]
Sep 20 15:54:15 ip-10-235-221-131 kernel: falco: deallocating consumer ffff888aff4f0000
Sep 20 15:54:15 ip-10-235-221-131 kernel: falco: no more consumers, stopping capture
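The trap line pairs the faulting instruction pointer with libc's load base (the bracketed range), so the offset inside libc can be computed and later symbolized. A sketch using the exact values above (the addr2line path in the comment is an assumption about the node's layout):

```shell
# From: "general protection ip:7f859b294611 ... in libc-2.28.so[7f859b294000+148000]"
ip=0x7f859b294611      # faulting instruction pointer
base=0x7f859b294000    # libc load base, start of the bracketed range
offset=$(printf '0x%x' $(( ip - base )))
echo "faulting offset inside libc-2.28.so: $offset"   # -> 0x611
# On the node (path is an assumption), the offset can be symbolized with e.g.:
#   addr2line -e /lib64/libc-2.28.so "$offset"
```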

@FedeDP
Contributor

FedeDP commented Sep 22, 2021

Hi @jcdecaux-oss !
Are you able to share a coredump? (if your node runs systemd, coredumpctl can help!)
Thank you very much for your efforts :)
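On a systemd node, the commands for the suggestion above might look like the following sketch (the `/tmp/falco.core` output path is a placeholder; `coredumpctl` must be present on the host):

```shell
# Hedged sketch: locate and export a Falco coredump via systemd-coredump.
if command -v coredumpctl >/dev/null 2>&1; then
  coredumpctl list falco || true                     # recent falco crashes
  coredumpctl info falco || true                     # signal, PID, stack summary
  coredumpctl dump falco -o /tmp/falco.core || true  # export for sharing or gdb
  have_coredumpctl=yes
else
  echo "coredumpctl not available on this host"
  have_coredumpctl=no
fi
```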

@poiana

poiana commented Oct 22, 2021

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

@poiana poiana closed this as completed Oct 22, 2021
@poiana

poiana commented Oct 22, 2021

@poiana: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lzaldivarkt

/reopen
Hi @FedeDP, I'm seeing the same issue on a high activity cluster, I managed to get a coredump, you can find it here:
https://motive-shared-public-files.s3.amazonaws.com/falco.coredump.gz
This doesn't happen on any other cluster with the same configuration, and we can consistently reproduce it; it's currently happening on half of this cluster's nodes.

Specs:

Version: Falco 0.30 on Kubernetes 1.20 (deployed via kOps)
OS: Debian GNU/Linux 9.13 (stretch)
Kernel: Linux ip-10-0-97-98 4.9.0-14-amd64 #1 SMP Debian 4.9.246-2 (2020-12-17) x86_64 GNU/Linux
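To get a backtrace out of a dump like the one shared above, gdb can be pointed at the matching falco binary. A sketch where the file paths are placeholders, and where the binary and glibc must match the node that produced the dump:

```shell
# Hedged sketch: batch-mode backtrace from a coredump. Paths are placeholders.
core=falco.coredump
bin=/usr/bin/falco
if [ -f "$core" ] && [ -f "$bin" ] && command -v gdb >/dev/null 2>&1; then
  gdb -batch -ex 'thread apply all bt' "$bin" "$core"
  analyzed=yes
else
  echo "coredump, falco binary, or gdb missing; skipping"
  analyzed=no
fi
```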

@poiana

poiana commented Jul 23, 2022

@lzaldivarkt: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen
Hi @FedeDP, I'm seeing the same issue on a high activity cluster, I managed to get a coredump, you can find it here:
https://motive-shared-public-files.s3.amazonaws.com/falco.coredump.gz
This doesn't happen on any other cluster with the same configuration, and we can consistently reproduce it; it's currently happening on half of this cluster's nodes.

Specs:

Version: Falco 0.30 on Kubernetes 1.20 (deployed via kOps)
OS: Debian GNU/Linux 9.13 (stretch)
Kernel: Linux ip-10-0-97-98 4.9.0-14-amd64 #1 SMP Debian 4.9.246-2 (2020-12-17) x86_64 GNU/Linux

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lzaldivarkt

Welp, I can't reopen issues, I'll open a new one.

6 participants