Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Falco is not showing container information. All container events are coming with name "incomplete" #716

Closed
srivastavaabhinav opened this issue Jul 10, 2019 · 14 comments · Fixed by #731
Assignees
Labels
kind/bug triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@srivastavaabhinav
Copy link

What happened:

Falco is not showing container related information. All container events are coming with name "incomplete".

What you expected to happen:

Container events should come with the container information. I know that few events would come without container names due to the new design, however, even when container is running for more than an hour, still information is not present.

How to reproduce it (as minimally and precisely as possible):

In an already running container, we created a shell and then ran the command "nc". It reported the alert with the container name "incomplete". We are seeing other events as well coming out with container information.

Anything else we need to know?:

Environment:

  • Falco version (use falco --version): 0.15.3
  • System info
  • Cloud provider or hardware configuration: AWS/Docker/ECS
  • OS (e.g: cat /etc/os-release):Amazon Linux 2
  • Kernel (e.g. uname -a):4.14.123-111.109.amzn2.x86_64
  • Install tools (e.g. in kubernetes, rpm, deb, from source):rpm
  • Others:
@mstemm
Copy link
Contributor

mstemm commented Jul 10, 2019

Remember that with 0.15.0, docker metadata fetches are now done asynchronously, so the initial process in a container might race and not have container info at the time it runs. You can see this by running falco 0.15.3 and doing repeated calls to docker run appropriate/nc:

$ sudo falco
Tue Jul  9 18:55:22 2019: Falco initialized with configuration file /etc/falco/falco.yaml
Tue Jul  9 18:55:22 2019: Loading rules from file /etc/falco/falco_rules.yaml:
Tue Jul  9 18:55:23 2019: Loading rules from file /etc/falco/falco_rules.local.yaml:
Tue Jul  9 18:55:23 2019: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Tue Jul  9 18:55:23 2019: Starting internal webserver, listening on port 8765
18:55:26.271688772: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=08f38e671b88 container_name=incomplete image=incomplete:incomplete)
18:56:04.489592922: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=033b10771a39 container_name=heuristic_saha image=appropriate/nc:latest)
18:56:09.891225740: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=71027961b8d8 container_name=incomplete image=incomplete:incomplete)
18:56:11.484247160: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=5e00d4f5b35d container_name=incomplete image=incomplete:incomplete)
18:56:13.718492097: Notice Network tool launched in container (user=root command=nc parent_process=<NA> container_id=c37117e7fe0c container_name=incomplete image=incomplete:incomplete)

It sounded like your containers are long-lived and you're exec-ing into them to run nc, though, so let's try that. If I first start a container using docker run -it appropriate/nc /bin/sh, and then exec into it via docker exec -it 5aad18c686fb /bin/sh, and then run nc repeatedly, I see reliable complete output:

$ sudo falco
Tue Jul  9 18:59:47 2019: Falco initialized with configuration file /etc/falco/falco.yaml
Tue Jul  9 18:59:47 2019: Loading rules from file /etc/falco/falco_rules.yaml:
Tue Jul  9 18:59:48 2019: Loading rules from file /etc/falco/falco_rules.local.yaml:
Tue Jul  9 18:59:48 2019: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Tue Jul  9 18:59:48 2019: Starting internal webserver, listening on port 8765
18:59:49.961332036: Notice A shell was spawned in a container with an attached terminal (user=root incomplete (id=5aad18c686fb) shell=sh parent=<NA> cmdline=sh terminal=34816)
19:00:14.454585820: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=5aad18c686fb container_name=heuristic_pascal image=appropriate/nc:latest)
19:00:16.506453114: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=5aad18c686fb container_name=heuristic_pascal image=appropriate/nc:latest)
19:00:17.713325261: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=5aad18c686fb container_name=heuristic_pascal image=appropriate/nc:latest)

(the first incomplete is from the initial docker run command)

I'll try ECS next to see if it changes anything.

@mstemm
Copy link
Contributor

mstemm commented Jul 10, 2019

I created a single-node ECS cluster of the type Linux + EC2 e.g. not fargate, and then created a task that used appropriate/nc to run long-lived. Here's the entrypoint from docker inspect:

"Entrypoint": [
                "/bin/sh",
                "-c",
                "while true; do sleep 60; done"
            ],

I then ran falco directly on the ec2 host, using a container, with this command line:

docker run \
  --interactive \
  --privileged \
  --tty \
  --name falco \
  --volume /var/run/docker.sock:/host/var/run/docker.sock \
  --volume /dev:/host/dev \
  --volume /proc:/host/proc:ro \
  --volume /boot:/host/boot:ro \
  --volume /lib/modules:/host/lib/modules:ro \
  --volume /usr:/host/usr:ro \
  falcosecurity/falco:0.15.3

Finally, I exec'd into the appropriate/nc container via docker exec -it de3ab0a71e55 /bin/sh and ran nc. I see appropriate falco alerts:

2019-07-10T19:54:28+0000: Falco initialized with configuration file /etc/falco/falco.yaml
2019-07-10T19:54:28+0000: Loading rules from file /etc/falco/falco_rules.yaml:
2019-07-10T19:54:28+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
2019-07-10T19:54:28+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
2019-07-10T19:54:29+0000: Starting internal webserver, listening on port 8765
2019-07-10T19:54:29.016459000+0000: Informational Container with sensitive mount started (user=<NA> command=container:0dac4f446ce0 ecs-agent (id=0dac4f446ce0) image=amazon/amazon-ecs-agent:latest mounts=/var/lib/ecs/data:/data::true:rprivate,/var/lib/ecs:/var/lib/ecs::true:rprivate,/run/docker/plugins:/run/docker/plugins:ro:false:rprivate,/etc/pki:/etc/pki:ro:false:rprivate,/usr/lib/docker/plugins:/usr/lib/docker/plugins:ro:false:rprivate,/lib64:/lib64:ro:false:rprivate,/var/log/ecs:/log::true:rprivate,/var/cache/ecs:/var/cache/ecs::true:rprivate,/sys/fs/cgroup:/sys/fs/cgroup::true:rprivate,/proc:/host/proc:ro:false:rprivate,/sbin:/sbin:ro:false:rprivate,/var/run:/var/run::true:rprivate,/etc/ecs:/etc/ecs::true:rprivate,/etc/docker/plugins:/etc/docker/plugins:ro:false:rprivate,/usr/lib:/usr/lib:ro:false:rprivate,/lib:/lib:ro:false:rprivate,/usr/lib64:/usr/lib64:ro:false:rprivate)
2019-07-10T19:54:29.024269000+0000: Informational Privileged container started (user=<NA> command=container:00402b3d7c11 falco (id=00402b3d7c11) image=falcosecurity/falco:0.15.3)
2019-07-10T19:55:36.276987162+0000: Notice A shell was spawned in a container with an attached terminal (user=root ecs-appropriate-nc-3-appropriate-nc-while-loop-80a684b5b9e388e9c501 (id=de3ab0a71e55) shell=sh parent=docker-runc cmdline=sh terminal=34816)
2019-07-10T19:55:37.283565213+0000: Notice Network tool launched in container (user=root command=nc parent_process=sh container_id=de3ab0a71e55 container_name=ecs-appropriate-nc-3-appropriate-nc-while-loop-80a684b5b9e388e9c501 image=appropriate/nc:latest)

So it looks like it's working to me.

Can you describe exactly how you're running falco? Are you running it on the host or as a task. If it's as a task can you paste the task definition as json? You'll want to make sure you have all the appropriate mounts set up.

@fntlnz
Copy link
Contributor

fntlnz commented Jul 11, 2019

/triage needs-information

@poiana poiana added the triage/needs-information Indicates an issue needs more information in order to work on it. label Jul 11, 2019
@billyshambrook
Copy link

Hey @mstemm thanks for taking a look into this. We have been able to reproduce the issue with a limited number of variables.

  1. Create an EC2 instance using the latest ECS Optimized AMI in us-east-1 - amzn2-ami-ecs-hvm-2.0.20190614-x86_64-ebs (ami-02507631a9f7bc956)

  2. Install Falco using RHEL instructions.

  3. Run Falco agent.

  4. Run ubuntu docker container, install nmap and run.

This produces the following log line in /var/log/messages

Notice Network tool launched in container (user=root command=nmap parent_process=bash container_id=1aa8f3e7122c container_name=incomplete image=incomplete:incomplete)

We have observed that if the container was started before running falco agent, the image metadata will be correctly set, but it container is started after falco agent it is always incomplete.

Thanks!!

@billyshambrook
Copy link

Some more information I have found:

If I reboot the machine and falco starts automatically at boot, the container name is given correctly...

$ reboot
$ docker pull ubuntu
$ sudo docker run -it ubuntu /bin/bash -c 'apt update -y && apt install nmap -y && nmap'
$ sudo tail -n 100 /var/log/messages

...
Jul 12 13:59:09 ip-10-0-10-76 falco: 13:59:09.807425871: Error Package management process launched in container (user=root command=apt update -y container_id=2e6f84ff69ef container_name=incomplete image=incomplete:incomplete)
Jul 12 13:59:13 ip-10-0-10-76 falco: 13:59:13.218422550: Error Package management process launched in container (user=root command=apt install nmap -y container_id=2e6f84ff69ef container_name=sharp_proskuriakova image=ubuntu:latest)
Jul 12 13:59:16 ip-10-0-10-76 falco: 13:59:16.770537484: Notice Network tool launched in container (user=root command=nmap parent_process=<NA> container_id=2e6f84ff69ef container_name=sharp_proskuriakova image=ubuntu:latest)
...

However if I then restart falco service, I get incomplete for all events every time...

$ service falco restart
$ sudo docker run -it ubuntu /bin/bash -c 'apt update -y && apt install nmap -y && nmap'
$ sudo tail -n 100 /var/log/messages

...
Jul 12 14:04:56 ip-10-0-10-11 falco: 14:04:56.816839272: Error Package management process launched in container (user=root command=apt update -y container_id=3db836502370 container_name=incomplete image=incomplete:incomplete)
Jul 12 14:04:59 ip-10-0-10-11 falco: 14:04:59.232496198: Error Package management process launched in container (user=root command=apt install nmap -y container_id=3db836502370 container_name=incomplete image=incomplete:incomplete)
Jul 12 14:05:02 ip-10-0-10-11 falco: 14:05:02.802452727: Notice Network tool launched in container (user=root command=nmap parent_process=<NA> container_id=3db836502370 container_name=incomplete image=incomplete:incomplete)
...

Here's also the output of falco --support | jq .system_info

{
  "machine": "x86_64",
  "nodename": "ip-10-0-10-11.ec2.internal",
  "release": "4.14.123-111.109.amzn2.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Mon Jun 10 19:37:57 UTC 2019"
}

@mfdii
Copy link
Member

mfdii commented Jul 15, 2019

@gnosek Is looking at this. It appears to be a bug in the container metadata fetch in the sysdig libraries.

@mfdii
Copy link
Member

mfdii commented Jul 17, 2019

Ok, I've traced this down to how we daemonize the falco process. This bug doesn't happen if the --daemon option isn't passed.

Basically what's happening is that when we daemonize, and eventually kill the parent falco process, we kill the thread linsinsp creates to fetch metadata. Then the new daemonized process never respawns the metadata fetch thread.

We create the inspector object before we create the daemon in order to have any errors in creating the inspector object sent to the foregrounnd. One idea is to move the inspector->open to be executed after we daemonize but then errors will not be sent to the foreground.

I'm trying to track down the best solution.

@billyshambrook
Copy link

Thanks @mfdii, if we ran falco within a docker container would I be right in assuming it wouldn't be ran with the daemon flag?

@mfdii
Copy link
Member

mfdii commented Jul 17, 2019 via email

@fntlnz
Copy link
Contributor

fntlnz commented Jul 18, 2019

So, I don't think we need to fork there. Just doing the parent pid association with setsid should be fine for what we want to achieve, in that way we don't lose the context of this process (with its threads).

I will try to come up with that fix

/assign @fntlnz

@fntlnz
Copy link
Contributor

fntlnz commented Jul 18, 2019

OK tried the road above, it's not practical, probably the best option now is to make so that we reassign the parent threads or make it smart and start those threads when it's doing the daemon thing.

@fntlnz
Copy link
Contributor

fntlnz commented Jul 18, 2019

I've been able to reproduce the issue here, then found that we start the capturing before the fork is done, just moving the open logic after the fork fixed the problem. I sent a PR with a fix @mfdii @srivastavaabhinav can you please try it?

@fntlnz
Copy link
Contributor

fntlnz commented Jul 19, 2019

@srivastavaabhinav in this comment you can find the compiled rpm and deb packages to test: #731 (comment)

@srivastavaabhinav
Copy link
Author

@fntlnz thanks! Let me look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug triage/needs-information Indicates an issue needs more information in order to work on it.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants