Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centos 7.7: Falco Will Only Start After Multiple Restarts #1347

Closed
kylegoch opened this issue Aug 6, 2020 · 6 comments · Fixed by #1366
Closed

Centos 7.7: Falco Will Only Start After Multiple Restarts #1347

kylegoch opened this issue Aug 6, 2020 · 6 comments · Fixed by #1366
Labels

Comments

@kylegoch
Copy link

kylegoch commented Aug 6, 2020

Describe the bug

After installing Falco and the kernel headers, Falco will only start after running systemctl restart falco several times. The log messages when Falco fails are below:

[kgochenour@<host_name> ~]$ sudo journalctl  -fu falco
-- Logs begin at Thu 2020-08-06 12:15:11 UTC. --
Aug 06 12:16:03 <host_name> falco[1508]: Thu Aug  6 12:16:03 2020: Loading rules from file /etc/falco/falco_rules.local.yaml:
Aug 06 12:16:03 <host_name> falco[1532]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 06 12:16:03 <host_name> falco[1508]: Thu Aug  6 12:16:03 2020: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 06 12:16:03 <host_name> falco[1675]: Unable to load the driver. Exiting.
Aug 06 12:16:03 <host_name> falco[1675]: Runtime error: error opening device /dev/falco0. Make sure you have root credentials and that the falco module is loaded.. Exiting.
Aug 06 12:16:03 <host_name> falco[1508]: [49B blob data]
Aug 06 12:16:03 <host_name> systemd[1]: falco.service: control process exited, code=exited status=1
Aug 06 12:16:03 <host_name> systemd[1]: Failed to start LSB: Falco syscall monitoring agent.
Aug 06 12:16:03 <host_name> systemd[1]: Unit falco.service entered failed state.
Aug 06 12:16:03 <host_name> systemd[1]: falco.service failed.

Eventually after running systemctl restart falco several times (usually once or twice), Falco will start right up and do it's thing:

[kgochenour@<host_name> ~]$ sudo systemctl restart falco
[kgochenour@<host_name> ~]$ sudo journalctl  -fu falco
-- Logs begin at Thu 2020-08-06 12:15:11 UTC. --
Aug 06 12:18:29 <host_name> falco[22141]: Starting falco: Thu Aug  6 12:18:29 2020: Falco initialized with configuration file /etc/falco/falco.yaml
Aug 06 12:18:29 <host_name> falco[22149]: Loading rules from file /etc/falco/falco_rules.yaml:
Aug 06 12:18:29 <host_name> falco[22141]: Thu Aug  6 12:18:29 2020: Loading rules from file /etc/falco/falco_rules.yaml:
Aug 06 12:18:29 <host_name> falco[22149]: Loading rules from file /etc/falco/falco_rules.local.yaml:
Aug 06 12:18:29 <host_name> falco[22141]: Thu Aug  6 12:18:29 2020: Loading rules from file /etc/falco/falco_rules.local.yaml:
Aug 06 12:18:29 <host_name> falco[22149]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 06 12:18:29 <host_name> falco[22141]: Thu Aug  6 12:18:29 2020: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 06 12:18:30 <host_name> falco[22141]: [  OK  ]
Aug 06 12:18:30 <host_name> systemd[1]: Started LSB: Falco syscall monitoring agent.
Aug 06 12:18:30 <host_name> falco[22150]: Starting internal webserver, listening on port 8765

How to reproduce it

Install Falco 0.24 per Install Docs on Centos 7.7 with kernel 3.10.0-1127.18.2.el7.x86_64

Start Falco with systemctl start falco

Falco will fail to start with error:
Runtime error: error opening device /dev/falco0. Make sure you have root credentials and that the falco module is loaded.. Exiting.

Run systemctl restart falco until Falco starts.

Expected behaviour

Ideally Falco should start up on the first systemctl start falco or on boot when enabled, rather than manual restarts.

Environment

  • Falco version: 0.24
  • System info:
{
  "machine": "x86_64",
  "nodename": "<host_name>",
  "release": "3.10.0-1127.18.2.el7.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Sun Jul 26 15:27:06 UTC 2020"
}
  • Cloud provider or hardware configuration: AWS
  • OS: CentOS 7.7
  • Kernel: 3.10.0-1127.18.2.el7.x86_64
  • Installation method: RPM via Yum repo from Install Docs

Additional context

Posted originally in the #Falco channel in Kubernetes Slack, @leogr suggested I open an issue about this. He believed it might be related to these lines:
https://github.com/falcosecurity/falco/blob/master/scripts/rpm/falco#L55-L58

@leogr
Copy link
Member

leogr commented Aug 25, 2020

Hey @kylegoch

Could you let me know if this problem has been fixed by 0.25.0, please?

Thanks in advance!

@kylegoch
Copy link
Author

@leogr I updated to 0.25.0 and I am still seeing the error.

For further explanation, Falco is installed via yum using Ansible, on to an AWS AMI. Falco is then stopped (so it doesnt record through the AMI build, then enabled to start on boot).

And same with 0.25.0, when I start an EC2 Instance with the AMI, Falco has the same error and is not started after launching the instance. But if i restart Falco, it works just fine.

Only difference between the first report of the issue and the instance I just used is slightly newer kernel: 3.10.0-1127.19.1.el7.x86_64

@leogr
Copy link
Member

leogr commented Aug 26, 2020

Hey @kylegoch

Could you attach the new log message, please?

Also, some another questions manually running modprobe falco works?
I'm asking this question since the Falco binary tries to run modprobe falco when starts if the module is not loaded, if modprobe fails it emits the following log line:

Aug 06 12:16:03 <host_name> falco[1675]: Unable to load the driver. Exiting.

(we changed a bit the log message in the 0.25.0).

I just want to understand if the problem is here or elsewhere.

BTW, I'm reopening the issue since it's not yet solved.

Thanks in advance.

@leogr leogr reopened this Aug 26, 2020
@kylegoch
Copy link
Author

kylegoch commented Aug 26, 2020

Here is the new Log message on 0.25.0:

Aug 26 13:44:20 <host_name> systemd[1]: Starting LSB: Falco syscall monitoring agent...
Aug 26 13:44:22 <host_name> falco[1621]: Falco version 0.25.0 (driver version ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7)
Aug 26 13:44:22 <host_name> falco[1604]: Starting falco: Wed Aug 26 13:44:22 2020: Falco version 0.25.0 (driver version ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7)
Aug 26 13:44:22 <host_name> falco[1621]: Falco initialized with configuration file /etc/falco/falco.yaml
Aug 26 13:44:22 <host_name> falco[1604]: Wed Aug 26 13:44:22 2020: Falco initialized with configuration file /etc/falco/falco.yaml
Aug 26 13:44:22 <host_name> falco[1621]: Loading rules from file /etc/falco/falco_rules.yaml:
Aug 26 13:44:22 <host_name> falco[1604]: Wed Aug 26 13:44:22 2020: Loading rules from file /etc/falco/falco_rules.yaml:
Aug 26 13:44:22 <host_name> falco[1621]: Loading rules from file /etc/falco/falco_rules.local.yaml:
Aug 26 13:44:22 <host_name> falco[1604]: Wed Aug 26 13:44:22 2020: Loading rules from file /etc/falco/falco_rules.local.yaml:
Aug 26 13:44:23 <host_name> falco[1621]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 26 13:44:23 <host_name> falco[1604]: Wed Aug 26 13:44:23 2020: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 26 13:44:23 <host_name> falco[1678]: Unable to load the driver.
Aug 26 13:44:23 <host_name> falco[1678]: Runtime error: error opening device /dev/falco0. Make sure you have root credentials and that the falco module is loaded.. Exiting.
Aug 26 13:44:23 <host_name> falco[1604]: [49B blob data]
Aug 26 13:44:23 <host_name> systemd[1]: falco.service: control process exited, code=exited status=1
Aug 26 13:44:23 <host_name> systemd[1]: Failed to start LSB: Falco syscall monitoring agent.
Aug 26 13:44:23 <host_name> systemd[1]: Unit falco.service entered failed state.
Aug 26 13:44:23 <host_name> systemd[1]: falco.service failed.

Running modprobe falco returns nothing (return code 0)

And just to match the opening comment. Here is the system info from Falco

{
  "machine": "x86_64",
  "nodename": "host_name",
  "release": "3.10.0-1127.19.1.el7.x86_64",
  "sysname": "Linux",
  "version": "#1 SMP Tue Aug 25 17:23:54 UTC 2020"
}

And Falco version:

Falco version: 0.25.0
Driver version: ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7

And here are the logs after running systemctl restart falco:

Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal systemd[1]: Starting LSB: Falco syscall monitoring agent...
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28214]: Falco version 0.25.0 (driver version ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7)
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28206]: Starting falco: Wed Aug 26 14:18:33 2020: Falco version 0.25.0 (driver version ae104eb20ff0198a5dcb0c91cc36c86e7c3f25c7)
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28214]: Falco initialized with configuration file /etc/falco/falco.yaml
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28206]: Wed Aug 26 14:18:33 2020: Falco initialized with configuration file /etc/falco/falco.yaml
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28214]: Loading rules from file /etc/falco/falco_rules.yaml:
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28206]: Wed Aug 26 14:18:33 2020: Loading rules from file /etc/falco/falco_rules.yaml:
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28214]: Loading rules from file /etc/falco/falco_rules.local.yaml:
Aug 26 14:18:33 ip-10-205-213-238.us-east-2.compute.internal falco[28206]: Wed Aug 26 14:18:33 2020: Loading rules from file /etc/falco/falco_rules.local.yaml:
Aug 26 14:18:34 ip-10-205-213-238.us-east-2.compute.internal falco[28214]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 26 14:18:34 ip-10-205-213-238.us-east-2.compute.internal falco[28206]: Wed Aug 26 14:18:34 2020: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Aug 26 14:18:34 ip-10-205-213-238.us-east-2.compute.internal falco[28206]: [  OK  ]
Aug 26 14:18:34 ip-10-205-213-238.us-east-2.compute.internal systemd[1]: Started LSB: Falco syscall monitoring agent.
Aug 26 14:18:34 ip-10-205-213-238.us-east-2.compute.internal falco[28215]: Starting internal webserver, listening on port 8765

@leogr
Copy link
Member

leogr commented Aug 26, 2020

Thank you!

Although manually running modprobe falco works, this 👇 confirmed that the Falco binary is not able to modprobe falco for an unknown reason:

Aug 26 13:44:23 <host_name> falco[1678]: Unable to load the driver.

At least we know what is happening though we don't know why 😄

At this point, I suppose there's something else that does not allow it to load the kernel module immediately after the boot.
We have to investigate more about that.

@kylegoch
Copy link
Author

kylegoch commented Sep 2, 2020

After working with @leogr we were able to find the cause of this issue. One was the #1366 but the issue still persisted.

We were building an AWS AMI with Packer and installing Falco during that build. We were also updating the image during that time too. As a result a kernel update was usually in those updates. However that was never an issue before because the instance would be stopped and then would be started as a new instance with the image. But installing Falco after running kernel updates was causing some weirdness.

So we continued our image build process with Packer, but added a step to restart the instance Packer uses, then installed Falco.

After that Falco would start on first boot as expected.

@kylegoch kylegoch closed this as completed Sep 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants