
GKE - Falco Logs not appearing for up to 10 minutes #1291

Closed
danpop-chainguard opened this issue Jun 30, 2020 · 9 comments · Fixed by #1296
Labels
kind/bug · triage/duplicate (indicates an issue is a duplicate of another open issue)

Comments

@danpop-chainguard

danpop-chainguard commented Jun 30, 2020

Describe the bug

GKE - Falco Logs not appearing for up to 10 minutes

How to reproduce it

  1. Deploy a GKE cluster.
  2. Use the Helm chart to install Falco (helm install falco --set ebpf.enabled=true falcosecurity/falco).
  3. Tail one of the Falco pods' logs (kubectl logs -f falco-podname).
  4. In another terminal, try to exec into a pod, or trigger another Falco rule violation.
  5. Wait for the logs to show up (up to 10 minutes). A consolidated sketch of these steps follows below.
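A minimal consolidated sketch of the repro (pod names are placeholders; an exec would typically trigger a default rule such as "Terminal shell in container"):

# Install Falco with the eBPF driver via the Helm chart
helm install falco --set ebpf.enabled=true falcosecurity/falco

# Terminal 1: follow the logs of one Falco pod (pod name is a placeholder)
kubectl logs -f falco-abc12

# Terminal 2: trigger a rule violation, e.g. spawn a shell in a running pod
# (pod name is a placeholder), then time how long the alert takes to appear
kubectl exec -it some-pod -- /bin/sh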
Expected behaviour

Logs/Falco event notifications should appear immediately, or close to it, rather than after 9-10 minutes.

Screenshots

(screenshot attached in the original issue)

Environment

  • Falco version:
    Falco version: 0.23.0
    Driver version: 96bd9bc560f67742738eb7255aeb4d03046b8045
  • System info:

{
"machine": "x86_64",
"nodename": "gke-popfalco1-default-pool-880376cf-qj8n",
"release": "4.19.109+",
"sysname": "Linux",
"version": "#1 SMP Mon Mar 16 06:27:01 PDT 2020"
}

  • Cloud provider or hardware configuration:
    GKE. Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-gke.15"} (3 nodes)

Additional context

none

@danpop-chainguard danpop-chainguard added the kind/bug Something isn't working label Jun 30, 2020
@krisnova
Contributor

Thanks for the report.

Can you share the following?

The exact Helm chart and associated commands.

Recently the community ported the chart to this resource: https://github.com/falcosecurity/charts/tree/master/falco#installing-the-chart

Can you confirm you are using the most recent version here?

How are you viewing the logs?

Are you viewing the logs on all nodes in the cluster? Something like

kubectl logs -l app=falco -f

What is your burst set to?

If you look here you can see the following:

Note that 1000 seconds is between 16 and 17 minutes.

# A throttling mechanism implemented as a token bucket limits the
# rate of falco notifications. This throttling is controlled by the following configuration
# options:
#  - rate: the number of tokens (i.e. right to send a notification)
#    gained per second. Defaults to 1.
#  - max_burst: the maximum number of tokens outstanding. Defaults to 1000.
#
# With these defaults, falco could send up to 1000 notifications after
# an initial quiet period, and then up to 1 notification per second
# afterward. It would gain the full burst back after 1000 seconds of
# no activity.

outputs:
  rate: 1
  max_burst: 1000
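If it helps, one way to double-check which throttling values the running pods actually picked up (a sketch: it assumes the chart mounts the rendered config at the default /etc/falco/falco.yaml, and the pod name is a placeholder):

# Print the outputs throttling settings from inside a running Falco pod
kubectl exec falco-abc12 -- grep -A 2 '^outputs:' /etc/falco/falco.yaml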

Probably the quickest way to debug this is to send a quick batch of alerts consecutively and see if you start getting real-time alerts.

When I demo I usually like to violate 10+ rules as quickly as possible. Usually, I do something like this:

touch /usr/bin/1
touch /usr/bin/2
touch /usr/bin/3
cat /etc/shadow
touch /etc/1
touch /proc/1
cat ~/.bashrc
touch /root/1
touch /root/2
touch /root/3
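In a Kubernetes setup these are typically run inside a pod, so the events are attributed to a container; a minimal sketch (the pod name is a placeholder, and exactly which rules fire depends on the loaded ruleset):

# Trigger several rule violations in quick succession from inside an existing pod
kubectl exec -it some-pod -- sh -c '
  touch /usr/bin/1 /usr/bin/2 /usr/bin/3
  cat /etc/shadow
  touch /root/1 /root/2 /root/3
'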

Let me know if any of that doesn't help you out, and I can spin up a GKE cluster tomorrow and tell you exactly what is going on.

@krisnova
Contributor

So I looked into the Helm chart (it looks like that is what you are using), and I think you are trying to pull logs from the same pod you are trying to trigger a rule on.

That means changing the values here, or triggering more alerts, could help get you closer to the state you are looking for.

# The rate corresponds to one message every 30 seconds.
# A burst of 10 messages.
helm install falco --set ebpf.enabled=true,outputs.rate=.03333,outputs.maxRate=10 falcosecurity/falco
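For reference, the same overrides could also live in a values file instead of --set flags (a sketch; the keys simply mirror the flags above and may differ between chart versions):

# Equivalent install using a values file
cat > falco-values.yaml <<'EOF'
ebpf:
  enabled: true
outputs:
  rate: 0.03333
  maxRate: 10
EOF

helm install falco -f falco-values.yaml falcosecurity/falco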

@danpop-chainguard
Author

Seeing the same thing when attempting a simple write to bin/, after using the Helm values listed above:

helm install falco --set ebpf.enabled=true,outputs.rate=.03333,outputs.maxRate=10 falcosecurity/falco

(screenshot attached in the original issue)

I also attempted:

touch /usr/bin/1
touch /usr/bin/2
touch /usr/bin/3
cat /etc/shadow
touch /etc/1
touch /proc/1
cat ~/.bashrc
touch /root/1
touch /root/2
touch /root/3

@danpop-chainguard
Copy link
Author

danpop-chainguard commented Jun 30, 2020

Going to revert to using /usr/bin for my demo; it came up after about 90 seconds. /etc/shadow also worked repeatedly.

Still a bit sluggish between the initial deploy and logs appearing, even with those settings.

(screenshot attached in the original issue)

@leogr
Member

leogr commented Jul 1, 2020

It seems to me that this issue duplicates:

Could you confirm that's the same problem?

Basically, the output log is delayed when Falco runs in a container in non-interactive mode (i.e. no TTY) and activity is low (very few events per second). As soon as other events arrive, the previously buffered events appear immediately (so you won't notice this problem on a system that generates many events per second).
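This is not Falco-specific: the same stdout buffering can be seen with any slow writer whose output goes to a pipe rather than a TTY. A quick illustration (assumes python3 is available; not part of the thread's repro):

# Block-buffered: stdout is a pipe, so nothing appears until the program exits
python3 -c 'import time
for i in range(5): print("event", i); time.sleep(1)' | cat

# Unbuffered (-u): each line appears as soon as it is printed,
# similar in spirit to what Falco's -U/--unbuffered option does for its own output
python3 -u -c 'import time
for i in range(5): print("event", i); time.sleep(1)' | cat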

I encountered the same problem many times, but AFAIK we don't have a fix yet.
I'm still investigating.

@leogr leogr added triage/duplicate Indicates an issue is a duplicate of other open issue. and removed kind/bug Something isn't working labels Jul 1, 2020
@leogr leogr added the kind/bug label Jul 1, 2020
@poiana poiana removed the needs-kind label Jul 1, 2020
@fntlnz
Contributor

fntlnz commented Jul 1, 2020

@danpopSD Investigated this with @leodido

This happens because GKE distinguishes between a TTY and plain stdout while flushing logs.

I achieved real-time logs by doing this:

  1. Install Falco:

helm install falco falcosecurity/falco --set ebpf.enabled=true

  2. Wait for Falco to be installed.

  3. Add tty: true to the Falco pods' containers (you can quickly apply the patch below):

kubectl patch daemonset falco --patch '{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"falco"}],"containers":[{"name":"falco","tty":true}]}}}}'

  4. Enjoy real-time logs. (A verification sketch follows below.)
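To confirm the patch actually landed (a sketch; the daemonset and container names assume the default release name falco from the install above):

# Should print "true" once the patch is applied
kubectl get daemonset falco -o jsonpath='{.spec.template.spec.containers[?(@.name=="falco")].tty}'

# The pod template changed, so the daemonset rolls its pods; wait for that to finish
kubectl rollout status daemonset/falco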

Nota bene

Related issues on kubernetes: kubernetes/kubernetes#57414

So at this point we need to make a decision: we might want to enable tty in our manifests and Helm chart. However, before doing that we need to investigate the boundaries and side effects of doing so.

We need to answer some questions like:

  • Does having a tty affect any other Falco functionality? Is it better for Falco? Is it worse?
  • Does having a tty have any performance/security impact on Falco?
  • Does the tty behave differently on different cloud providers? Maybe this works better only on GKE while it makes things worse on other providers.

@leogr
Member

leogr commented Jul 1, 2020

This happens because GKE distinguishes between a TTY and plain stdout while flushing logs.

Note that this issue is not GKE-specific. I was able to reproduce it with Docker (as explained in this comment) and with other k8s installations as well.

If we decide to enable TTY by default, we will need to update the docs regarding Docker usage too.
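For the plain Docker case, the analogous workaround would be allocating a TTY with -t (a sketch; privileged mode, the mounts, and the image tag below are illustrative and should be taken from the official Falco Docker instructions):

# -t allocates a pseudo-TTY, so Falco's stdout is line-buffered and alerts stream immediately
docker run --rm -i -t \
  --privileged \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /proc:/host/proc:ro \
  falcosecurity/falco:0.23.0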

@danpop-chainguard
Author

This #1291 (comment) addressed my issue.

@LucaGuerra
Contributor

Since this issue pops up from time to time, I'd like to add a quick tl;dr here:

  • add the -U/--unbuffered option to Falco
  • turn on the tty option in the helm chart

More information: https://falco.org/docs/alerts/#standard-output-buffering
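A sketch of both options (-U/--unbuffered is the Falco CLI flag mentioned above; "tty" as a chart value follows the comment above, but the exact key name should be verified against your chart version's values.yaml):

# Option 1: pass the unbuffered flag when invoking Falco directly,
# e.g. when running the binary or a container image yourself
falco -U          # same as: falco --unbuffered

# Option 2: with the Helm chart, run the container with a TTY
# ("tty" as a value name is an assumption here; check the chart's values.yaml)
helm upgrade --install falco falcosecurity/falco --set tty=true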
