Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable hook support by default #855

Closed
eero-t opened this issue Aug 12, 2022 · 11 comments
Closed

Disable hook support by default #855

eero-t opened this issue Aug 12, 2022 · 11 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@eero-t
Copy link

eero-t commented Aug 12, 2022

What would you like to be added:

Disable support for hooks by default, and make minimal image the default. Add NFD worker option to enable support for hooks, for backwards compatibility. Deprecate and eventually remove support for dynamic binary + script hooks (to prepare for potential removal of also static binaries hook support).

Feature files can support all the same things while avoiding the issues plaguing hooks (listed below).

Why is this needed:

There are multiple issues with hook support: https://kubernetes-sigs.github.io/node-feature-discovery/v0.11/advanced/customization-guide.html#hooks

Bash, Perl and dynamic binaries support:

  • Security: larger image + attack surface
  • Maintenance: need to update these images to handle security issues in them / their dependencies (in addition to Go ones relevant for NFD itself), see e.g: CVE scan lists high vulnerability against latest image k8s.gcr.io/nfd/node-feature-discovery:v0.11.1 #853
  • Maintenance: unintentional(?) support also for dynamic executables which happen to link to same system libraries, that NFD users could start relying on
  • Documentation: missing on what exactly is supported in given NFD version, which Bash and Perl version, which Perl modules, and which dependency libraries
    • In theory this could be auto-generated for each release image, and diff generated against previous release

Hook support in general:

  • Security: impact of running "random" scripts / binaries in same context as NFD
  • Functionality: Whomever provides the hook, not being able to impact what is exposed to it
  • Documentation: missing on what is exposed for the hook from the system and network
  • Reliability: not being able to tighten NFD worker resource limits as resource usage of hooks is unknown
  • Reliability: additional validation needed for verifying that NFD handles + logs misbehaving hooks correctly (e.g. ones generating output forever or using too much of some other resources)
  • Conformance: binaries in /etc/ being FHS spec violation: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s07.html

External feature life-cycle management

I think feature life-cycle management would also be easier if support for hooks were disabled/dropped.

Life-cycle issues:

  • Documentation: what party is supposed to clean out the obsolete features and how?
  • Functionality: what happens if e.g. Pod adds differently named hook binary or feature file on each run, or some test Pods run year ago were obsolete the next day, but their features still persist in /etc/? NFD continues running them and slowing down with time?
  • Documentation: what happens if multiple hooks or feature files update same labels, does NFD complain loudly, or just take the last value it gets?

Potential solutions:

  • Feature files could include best-before timestamp to solve that, after which NFD could ignore or remove them. With source hooks, another file and some mapping between them would be needed for that
  • NFD could trivially reject + log (maybe even remove) "obviously too large" feature files. For binaries one cannot easily say what reasonable binary size is too large (hopefully 1GB would be obviously too large)
  • Admin could also put features directory to tmpfs to make sure obsolete stuff gets cleaned out on next reboot, but with (static) binaries in there, that could take "too much" RAM
  • Within same file / hook, last of multiple value declarations for same label could be taken (with warnings on extra ones), but IMHO between different files / hooks, duplicate labels with different values should be a clearly user visible error (with same values giving just warning)
@eero-t eero-t added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 12, 2022
@eero-t
Copy link
Author

eero-t commented Aug 12, 2022

Btw. was nfd-worker local socket ever considered for label updates instead of hooks?

@marquiz marquiz added this to the v.0.12.0 milestone Aug 15, 2022
@marquiz
Copy link
Contributor

marquiz commented Aug 15, 2022

Regarding the hooks, thanks @eero-t for starting this discussion. I basically agree with this and have been anxious about the hook support myself, especially after we introduced the minimal image. Hooks are a hack'ish burden that I'm not entirely sure that how many people are even using.

One sketch of a plan to remove them completely over multiple releases:

  1. n (v0.12): declare hooks as deprecated, introduce config option to disable them but have them still enabled by defailt
  2. n+1: disable hooks by default but have still the option to enable them
  3. n+2: drop hooks support entirely, make minimal image the default image, drop the big image or e.g. rename it to -debug

The life-cycle management of "external features" is another class of issues that is worth tracking separately. I think we can concentrate on feature files there as hooks are likely to go away 😄 @eero-t may I ask you to create a separate issue about this subject?

@eero-t
Copy link
Author

eero-t commented Aug 15, 2022

Added separate tickets for the life-time management improvements, and for adding socket API for external features.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 14, 2022
@marquiz
Copy link
Contributor

marquiz commented Nov 14, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 14, 2022
@marquiz marquiz modified the milestones: v0.13.0, v0.14.0 Dec 23, 2022
@marquiz
Copy link
Contributor

marquiz commented Dec 23, 2022

Moved to v0.14.0

@marquiz
Copy link
Contributor

marquiz commented Jan 9, 2023

Maybe for v0.13 we could actually change the base image of the default image distroless/base (making it identical to the -minimal image). We could add a -full image variant to match the current default image (with debian:bullseye-slim as the base). We could also make limit the minimal image even more, base it on scratch but I don't know if that's something of interest(?)

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 9, 2023
@marquiz
Copy link
Contributor

marquiz commented Apr 11, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 11, 2023
@marquiz
Copy link
Contributor

marquiz commented Jun 22, 2023

Fixed by #1182
/close

@k8s-ci-robot
Copy link
Contributor

@marquiz: Closing this issue.

In response to this:

Fixed by #1182
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

4 participants