
Falco took so long to write a JSON output log to stdout #1198

Closed
smeeklai opened this issue May 7, 2020 · 16 comments · Fixed by #1296

smeeklai commented May 7, 2020

Describe the bug

I know that Falco can take some time to evaluate and output a log once a condition is met. For example, I tried to exec into a container and found that Falco took 2-3 minutes to write that log to stdout. However, after I added several custom rules, Falco now takes very long to output the logs, or sometimes doesn't output them at all until I or the K8s master deletes the pods. I also found that the timestamp inside the log is actually not far from when the action was performed, but the time until that log appears on stdout is surprisingly long. As you can see in the screenshot, timestamp is the time reported by Falco itself (since it matches), while receiveTimestamp is the time Fluentd received the log; the difference is about ~10 minutes.

How to reproduce it

Expected behaviour

The time it takes for the logs to be written should be about the same, or not much different, compared to before I added any custom rules.

Screenshots

(screenshot: a log entry showing Falco's timestamp vs. the receiveTimestamp recorded by Fluentd, roughly 10 minutes apart)

Environment

  • GKE version: 1.14.10-gke.27
  • Falco version: 0.22.1
  • System info:
  • Cloud provider or hardware configuration: GKE
  • OS: COS
  • Kernel: 4.14.138+
  • Installation method: Kubernetes

Additional context

Container runtime version: docker://18.9.7


leogr commented May 7, 2020

Not sure, but... might it be related to the throttling configuration?
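For reference, the throttling and buffering knobs are in falco.yaml. Roughly, they look like this (illustrative values based on the stock file around the 0.22/0.23 releases; check your own falco.yaml for the exact defaults):

# falco.yaml (excerpt) -- illustrative only
json_output: true          # emit alerts as JSON lines on stdout
buffered_outputs: false    # false is supposed to flush each alert as it is produced
outputs:
  rate: 1                  # token-bucket refill rate used to throttle notifications
  max_burst: 1000          # maximum burst of notifications allowed at once
stdout_output:
  enabled: true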


smeeklai commented May 7, 2020

It could be related. I did some more experiments and found that the outputs seem to get buffered even though the buffered_outputs option is set to false. With the default throttling configuration, Falco writes outputs to stdout whenever the number of pending outputs reaches 5, regardless of how long I wait. For example, I have to issue the apk add .... command 5 times, and only then are the outputs written to stdout instantly.

So now the question becomes: why do the outputs get buffered even though the option is set to false, and how do I make Falco write outputs to stdout instantly?


smeeklai commented May 7, 2020

Setting buffered_outputs to either true or false results in the same behaviour I explained above.


fcastello commented Jun 18, 2020

Any update on this bug?
I have a similar issue and it's probably related. buffered_outputs doesn't seem to have any impact no matter what value I use. I run Falco as a container: if I start the container interactively it works OK and events show up on stdout as I generate them, but when I start the container in daemon mode the events only show up after I generate a few of them, which makes me suspect they are being buffered; if I generate just one event, it doesn't show up on stdout until I stop the container.
Also, looking at the code (I am no expert in C++), at this line: https://github.com/falcosecurity/falco/blob/master/userspace/falco/falco_outputs.cpp#L41

falco_outputs::falco_outputs(falco_engine *engine):
	m_falco_engine(engine),
	m_initialized(false),
	m_buffered(true),
	m_json_output(false),
	m_time_format_iso_8601(false),
	m_hostname("")
{

Shouldn't this value, m_buffered(true), be m_buffered(false) instead?

I have been trying to find where the buffered_output bug might be, but I am not very familiar with the codebase, so I haven't found anything yet.

Another possible place is where bool buffered_outputs = true; is set at Falco initialization:

bool buffered_outputs = true;

but I can't find where it's overridden by the config from the YAML. Anyway, I probably need to look at the code a bit more.


leogr commented Jun 18, 2020

m_buffered(true) is just the default value.

Anyway, I'm experiencing the same issue. We need to investigate.
/cc @leodido
/cc @fntlnz
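
For reference, the pattern being described is roughly the following (a minimal sketch with hypothetical names, not the actual Falco code): the constructor's initializer list only provides a fallback, and the value that matters is the one applied after falco.yaml has been parsed.

#include <iostream>

// Hypothetical illustration only -- not Falco's real classes.
class outputs_sketch
{
public:
	outputs_sketch():
		m_buffered(true) // default, used only until the config is applied
	{
	}

	// called once falco.yaml has been read
	void init(bool buffered_outputs)
	{
		m_buffered = buffered_outputs; // overrides the constructor default
	}

	bool buffered() const { return m_buffered; }

private:
	bool m_buffered;
};

int main()
{
	outputs_sketch out;
	out.init(false); // e.g. buffered_outputs: false from falco.yaml
	std::cout << std::boolalpha << out.buffered() << std::endl; // prints "false"
	return 0;
}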

@fcastello

Is there any config that will make the buffered output flush faster, at least to work around the issue? This is a problem especially for systems that generate few events.


leogr commented Jun 18, 2020

I have also tried:

 -U,--unbuffered               Turn off output buffering to configured outputs.
                               This causes every single line emitted by falco to be flushed,
                               which generates higher CPU usage but is useful when piping those outputs
                               into another process or into a script.

But the problem persists.

Since it happens only when Falco runs inside a container, that makes me think the problem persists even with output buffering disabled.

We need some time to dig more into it.


fcastello commented Jun 18, 2020

If it helps: if I start the container with an interactive TTY like the following
docker run --rm -p 8765:8765 --interactive --privileged --tty --name falco --volume /var/run/docker.sock:/host/var/run/docker.sock --volume /dev:/host/dev --volume /proc:/host/proc:ro --volume /boot:/host/boot:ro --volume /lib/modules:/host/lib/modules:ro --volume /usr:/host/usr:ro falcosecurity/falco:0.23.0
I don't notice the issue.

But if I start it as a daemon like this
docker run -d -p 8765:8765 --privileged --name falco --volume /var/run/docker.sock:/host/var/run/docker.sock --volume /dev:/host/dev --volume /proc:/host/proc:ro --volume /boot:/host/boot:ro --volume /lib/modules:/host/lib/modules:ro --volume /usr:/host/usr:ro falcosecurity/falco:0.23.0
the problem happens. Of course, I am using a vanilla configuration for the Falco container.

In this last example I generate events by spawning a shell into a container, and all the logs show up at the same time after I send ~10 to ~15 events. The weird part is that no log appears at all if I generate just one event; I waited up to an hour and it never showed up in the logs.
Should there be a timeout that flushes the messages periodically?
I'm also trying to find out why my first example works OK and the second doesn't.
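
For what it's worth, the interactive-TTY vs. daemon difference described above matches ordinary stdout buffering: with glibc-style stdio, stdout is line-buffered when attached to a terminal but fully buffered when writing to a pipe or file (as it is when a container runs detached), so short lines only appear once a few kilobytes have accumulated or the process exits. A minimal C++ sketch of that generic behaviour (not Falco code):

#include <cstdio>
#include <unistd.h>

int main()
{
	// On a terminal these lines appear one per second.
	// When stdout is redirected (e.g. `./a.out | cat`), they are held in the
	// stdio buffer and typically appear together, once the buffer fills or at exit.
	for(int i = 0; i < 5; ++i)
	{
		std::printf("event %d\n", i);
		// std::fflush(stdout); // uncommenting this forces one line out per write
		sleep(1);
	}
	return 0;
}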

@fcastello

Any news on this? I have deployed Falco with apt on my regular VMs and bare metal, but I want to deploy it to GKE, and this is an issue there since it needs to run in a container.
Is there anything I can help with?


leogr commented Jul 2, 2020

Hi @fcastello and @smeeklai

I believe I found the root cause of this problem, PTAL 👉 #1296
N.B.: my fix is intended to solve the problem only when buffered_outputs is false, since buffered output is the expected behavior when that option is true, IMHO.
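
The general shape of such a fix, sketched here only for illustration (the actual change is in #1296), is to force a flush after each formatted alert whenever buffered output is disabled:

#include <iostream>
#include <string>

// Illustrative sketch only -- see #1296 for the real change in Falco.
static void emit_alert(const std::string &formatted_alert, bool buffered)
{
	std::cout << formatted_alert << "\n";
	if(!buffered)
	{
		// With buffered_outputs set to false, every alert is pushed to
		// stdout immediately instead of waiting for the buffer to fill.
		std::cout.flush();
	}
}

int main()
{
	emit_alert("{\"rule\":\"Terminal shell in container\"}", false);
	return 0;
}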


leogr commented Jul 3, 2020

Hey @fcastello and @smeeklai

Since #1296 got merged, you can try it using the following docker image falcosecurity/falco:master

Let me know if that solved your problem!

Thanks in advance.


fcastello commented Jul 7, 2020

@leogr I can't test it yet, as the master image has errors.
I am getting this when it starts:

/docker-entrypoint.sh: line 20: syntax error in conditional expression: unexpected token `;'
/docker-entrypoint.sh: line 20: syntax error near `;'
/docker-entrypoint.sh: line 20: `if [[ ! -z "${SKIP_MODULE_LOAD}"]]; then'

So I will need to wait until someone fixes that in the master image.

leogr commented Jul 8, 2020

Hey @fcastello

I have fixed that problem with #1307 (already merged into master).
Could you try again now, please?

And sorry for the inconvenience 👼

@fcastello

Thank you @leogr I already pulled the image and started using it. The original issue is fixed for me, thank you very much!

