Falco is not showing container information. All container events are coming with name "incomplete" #716

srivastavaabhinav · 2019-07-10T00:44:20Z

What happened:

Falco is not showing container related information. All container events are coming with name "incomplete".

What you expected to happen:

Container events should come with the container information. I know that few events would come without container names due to the new design, however, even when container is running for more than an hour, still information is not present.

How to reproduce it (as minimally and precisely as possible):

In an already running container, we created a shell and then ran the command "nc". It reported the alert with the container name "incomplete". We are seeing other events as well coming out with container information.

Anything else we need to know?:

Environment:

Falco version (use falco --version): 0.15.3
System info
Cloud provider or hardware configuration: AWS/Docker/ECS
OS (e.g: cat /etc/os-release):Amazon Linux 2
Kernel (e.g. uname -a):4.14.123-111.109.amzn2.x86_64
Install tools (e.g. in kubernetes, rpm, deb, from source):rpm
Others:

The text was updated successfully, but these errors were encountered:

mstemm · 2019-07-10T19:27:12Z

Remember that with 0.15.0, docker metadata fetches are now done asynchronously, so the initial process in a container might race and not have container info at the time it runs. You can see this by running falco 0.15.3 and doing repeated calls to docker run appropriate/nc:

$ sudo falco
Tue Jul  9 18:55:22 2019: Falco initialized with configuration file /etc/falco/falco.yaml
Tue Jul  9 18:55:22 2019: Loading rules from file /etc/falco/falco_rules.yaml:
Tue Jul  9 18:55:23 2019: Loading rules from file /etc/falco/falco_rules.local.yaml:
Tue Jul  9 18:55:23 2019: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Tue Jul  9 18:55:23 2019: Starting internal webserver, listening on port 8765
18:55:26.271688772: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=08f38e671b88 container_name=incomplete image=incomplete:incomplete)
18:56:04.489592922: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=033b10771a39 container_name=heuristic_saha image=appropriate/nc:latest)
18:56:09.891225740: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=71027961b8d8 container_name=incomplete image=incomplete:incomplete)
18:56:11.484247160: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=5e00d4f5b35d container_name=incomplete image=incomplete:incomplete)
18:56:13.718492097: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=c37117e7fe0c container_name=incomplete image=incomplete:incomplete)

It sounded like your containers are long-lived and you're exec-ing into them to run nc, though, so let's try that. If I first start a container using docker run -it appropriate/nc /bin/sh, and then exec into it via docker exec -it 5aad18c686fb /bin/sh, and then run nc repeatedly, I see reliable complete output:

$ sudo falco
Tue Jul  9 18:59:47 2019: Falco initialized with configuration file /etc/falco/falco.yaml
Tue Jul  9 18:59:47 2019: Loading rules from file /etc/falco/falco_rules.yaml:
Tue Jul  9 18:59:48 2019: Loading rules from file /etc/falco/falco_rules.local.yaml:
Tue Jul  9 18:59:48 2019: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Tue Jul  9 18:59:48 2019: Starting internal webserver, listening on port 8765
18:59:49.961332036: Notice A shell was spawned in a container with an attached terminal (user=root incomplete (id=5aad18c686fb) shell=sh parent=<NA> cmdline=sh terminal=34816)
19:00:14.454585820: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=5aad18c686fb container_name=heuristic_pascal image=appropriate/nc:latest)
19:00:16.506453114: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=5aad18c686fb container_name=heuristic_pascal image=appropriate/nc:latest)
19:00:17.713325261: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=5aad18c686fb container_name=heuristic_pascal image=appropriate/nc:latest)

(the first incomplete is from the initial docker run command)

I'll try ECS next to see if it changes anything.

mstemm · 2019-07-10T19:59:50Z

I created a single-node ECS cluster of the type Linux + EC2 e.g. not fargate, and then created a task that used appropriate/nc to run long-lived. Here's the entrypoint from docker inspect:

"Entrypoint": [
                "/bin/sh",
                "-c",
                "while true; do sleep 60; done"
            ],

I then ran falco directly on the ec2 host, using a container, with this command line:

docker run \
  --interactive \
  --privileged \
  --tty \
  --name falco \
  --volume /var/run/docker.sock:/host/var/run/docker.sock \
  --volume /dev:/host/dev \
  --volume /proc:/host/proc:ro \
  --volume /boot:/host/boot:ro \
  --volume /lib/modules:/host/lib/modules:ro \
  --volume /usr:/host/usr:ro \
  falcosecurity/falco:0.15.3

Finally, I exec'd into the appropriate/nc container via docker exec -it de3ab0a71e55 /bin/sh and ran nc. I see appropriate falco alerts:

2019-07-10T19:54:28+0000: Falco initialized with configuration file /etc/falco/falco.yaml
2019-07-10T19:54:28+0000: Loading rules from file /etc/falco/falco_rules.yaml:
2019-07-10T19:54:28+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
2019-07-10T19:54:28+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
2019-07-10T19:54:29+0000: Starting internal webserver, listening on port 8765
2019-07-10T19:54:29.016459000+0000: Informational Container with sensitive mount started (user=<NA> command=container:0dac4f446ce0 ecs-agent (id=0dac4f446ce0) image=amazon/amazon-ecs-agent:latest mounts=/var/lib/ecs/data:/data::true:rprivate,/var/lib/ecs:/var/lib/ecs::true:rprivate,/run/docker/plugins:/run/docker/plugins:ro:false:rprivate,/etc/pki:/etc/pki:ro:false:rprivate,/usr/lib/docker/plugins:/usr/lib/docker/plugins:ro:false:rprivate,/lib64:/lib64:ro:false:rprivate,/var/log/ecs:/log::true:rprivate,/var/cache/ecs:/var/cache/ecs::true:rprivate,/sys/fs/cgroup:/sys/fs/cgroup::true:rprivate,/proc:/host/proc:ro:false:rprivate,/sbin:/sbin:ro:false:rprivate,/var/run:/var/run::true:rprivate,/etc/ecs:/etc/ecs::true:rprivate,/etc/docker/plugins:/etc/docker/plugins:ro:false:rprivate,/usr/lib:/usr/lib:ro:false:rprivate,/lib:/lib:ro:false:rprivate,/usr/lib64:/usr/lib64:ro:false:rprivate)
2019-07-10T19:54:29.024269000+0000: Informational Privileged container started (user=<NA> command=container:00402b3d7c11 falco (id=00402b3d7c11) image=falcosecurity/falco:0.15.3)
2019-07-10T19:55:36.276987162+0000: Notice A shell was spawned in a container with an attached terminal (user=root ecs-appropriate-nc-3-appropriate-nc-while-loop-80a684b5b9e388e9c501 (id=de3ab0a71e55) shell=sh parent=docker-runc cmdline=sh terminal=34816)
2019-07-10T19:55:37.283565213+0000: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=de3ab0a71e55 container_name=ecs-appropriate-nc-3-appropriate-nc-while-loop-80a684b5b9e388e9c501 image=appropriate/nc:latest)

So it looks like it's working to me.

Can you describe exactly how you're running falco? Are you running it on the host or as a task. If it's as a task can you paste the task definition as json? You'll want to make sure you have all the appropriate mounts set up.

fntlnz · 2019-07-11T09:54:21Z

/triage needs-information

billyshambrook · 2019-07-11T18:10:59Z

Hey @mstemm thanks for taking a look into this. We have been able to reproduce the issue with a limited number of variables.

Create an EC2 instance using the latest ECS Optimized AMI in us-east-1 - amzn2-ami-ecs-hvm-2.0.20190614-x86_64-ebs (ami-02507631a9f7bc956)
Install Falco using RHEL instructions.
Run Falco agent.
Run ubuntu docker container, install nmap and run.

This produces the following log line in /var/log/messages

Notice Network tool launched in container (user=root command=nmap parent_process=bash container_id=1aa8f3e7122c container_name=incomplete image=incomplete:incomplete)

We have observed that if the container was started before running falco agent, the image metadata will be correctly set, but it container is started after falco agent it is always incomplete.

Thanks!!

billyshambrook · 2019-07-12T14:23:31Z

Some more information I have found:

If I reboot the machine and falco starts automatically at boot, the container name is given correctly...

$ reboot
$ docker pull ubuntu
$ sudo docker run -it ubuntu /bin/bash -c 'apt update -y && apt install nmap -y && nmap'
$ sudo tail -n 100 /var/log/messages

...
Jul 12 13:59:09 ip-10-0-10-76 falco: 13:59:09.807425871: Error Package management process launched in container (user=root command=apt update -y container_id=2e6f84ff69ef container_name=incomplete image=incomplete:incomplete)
Jul 12 13:59:13 ip-10-0-10-76 falco: 13:59:13.218422550: Error Package management process launched in container (user=root command=apt install nmap -y container_id=2e6f84ff69ef container_name=sharp_proskuriakova image=ubuntu:latest)
Jul 12 13:59:16 ip-10-0-10-76 falco: 13:59:16.770537484: Notice Network tool launched in container (user=root command=nmap parent_process=<NA> container_id=2e6f84ff69ef container_name=sharp_proskuriakova image=ubuntu:latest)
...

However if I then restart falco service, I get incomplete for all events every time...

$ service falco restart
$ sudo docker run -it ubuntu /bin/bash -c 'apt update -y && apt install nmap -y && nmap'
$ sudo tail -n 100 /var/log/messages

...
Jul 12 14:04:56 ip-10-0-10-11 falco: 14:04:56.816839272: Error Package management process launched in container (user=root command=apt update -y container_id=3db836502370 container_name=incomplete image=incomplete:incomplete)
Jul 12 14:04:59 ip-10-0-10-11 falco: 14:04:59.232496198: Error Package management process launched in container (user=root command=apt install nmap -y container_id=3db836502370 container_name=incomplete image=incomplete:incomplete)
Jul 12 14:05:02 ip-10-0-10-11 falco: 14:05:02.802452727: Notice Network tool launched in container (user=root command=nmap parent_process=<NA> container_id=3db836502370 container_name=incomplete image=incomplete:incomplete)
...

Here's also the output of falco --support | jq .system_info

{
  "machine": "x86_64",
  "nodename": "ip-10-0-10-11.ec2.internal",
  "release": "4.14.123-111.109.amzn2.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Mon Jun 10 19:37:57 UTC 2019"
}

mfdii · 2019-07-15T15:43:48Z

@gnosek Is looking at this. It appears to be a bug in the container metadata fetch in the sysdig libraries.

mfdii · 2019-07-17T13:16:04Z

Ok, I've traced this down to how we daemonize the falco process. This bug doesn't happen if the --daemon option isn't passed.

Basically what's happening is that when we daemonize, and eventually kill the parent falco process, we kill the thread linsinsp creates to fetch metadata. Then the new daemonized process never respawns the metadata fetch thread.

We create the inspector object before we create the daemon in order to have any errors in creating the inspector object sent to the foregrounnd. One idea is to move the inspector->open to be executed after we daemonize but then errors will not be sent to the foreground.

I'm trying to track down the best solution.

billyshambrook · 2019-07-17T13:40:10Z

Thanks @mfdii, if we ran falco within a docker container would I be right in assuming it wouldn't be ran with the daemon flag?

mfdii · 2019-07-17T13:50:06Z

That’s correct. As a container the process is not daemonized.

…

On Jul 17, 2019, at 8:40 AM, Billy Shambrook ***@***.***> wrote: Thanks @mfdii, if we ran falco within a docker container would I be right in assuming it wouldn't be ran with the daemon flag? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

fntlnz · 2019-07-18T21:16:10Z

So, I don't think we need to fork there. Just doing the parent pid association with setsid should be fine for what we want to achieve, in that way we don't lose the context of this process (with its threads).

I will try to come up with that fix

/assign @fntlnz

fntlnz · 2019-07-18T21:29:26Z

OK tried the road above, it's not practical, probably the best option now is to make so that we reassign the parent threads or make it smart and start those threads when it's doing the daemon thing.

fntlnz · 2019-07-18T22:49:01Z

I've been able to reproduce the issue here, then found that we start the capturing before the fork is done, just moving the open logic after the fork fixed the problem. I sent a PR with a fix @mfdii @srivastavaabhinav can you please try it?

fntlnz · 2019-07-19T14:24:50Z

@srivastavaabhinav in this comment you can find the compiled rpm and deb packages to test: #731 (comment)

srivastavaabhinav · 2019-07-19T16:51:37Z

@fntlnz thanks! Let me look into it.

srivastavaabhinav added the kind/bug label Jul 10, 2019

poiana added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jul 11, 2019

poiana assigned fntlnz Jul 18, 2019

fntlnz mentioned this issue Jul 18, 2019

fix: do the inspector after forking for daemon mode #731

Merged

leodido closed this as completed in #731 Jul 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Falco is not showing container information. All container events are coming with name "incomplete" #716

Falco is not showing container information. All container events are coming with name "incomplete" #716

srivastavaabhinav commented Jul 10, 2019

mstemm commented Jul 10, 2019

mstemm commented Jul 10, 2019 •

edited

fntlnz commented Jul 11, 2019

billyshambrook commented Jul 11, 2019

billyshambrook commented Jul 12, 2019

mfdii commented Jul 15, 2019

mfdii commented Jul 17, 2019

billyshambrook commented Jul 17, 2019

mfdii commented Jul 17, 2019 via email

fntlnz commented Jul 18, 2019

fntlnz commented Jul 18, 2019

fntlnz commented Jul 18, 2019

fntlnz commented Jul 19, 2019

srivastavaabhinav commented Jul 19, 2019

Falco is not showing container information. All container events are coming with name "incomplete" #716

Falco is not showing container information. All container events are coming with name "incomplete" #716

Comments

srivastavaabhinav commented Jul 10, 2019

mstemm commented Jul 10, 2019

mstemm commented Jul 10, 2019 • edited

fntlnz commented Jul 11, 2019

billyshambrook commented Jul 11, 2019

billyshambrook commented Jul 12, 2019

mfdii commented Jul 15, 2019

mfdii commented Jul 17, 2019

billyshambrook commented Jul 17, 2019

mfdii commented Jul 17, 2019 via email

fntlnz commented Jul 18, 2019

fntlnz commented Jul 18, 2019

fntlnz commented Jul 18, 2019

fntlnz commented Jul 19, 2019

srivastavaabhinav commented Jul 19, 2019

mstemm commented Jul 10, 2019 •

edited