Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webproxy/varnish: fix runtime/state dir issues and restart on unit changes #1104

Draft
wants to merge 3 commits into
base: fc-24.05-dev
Choose a base branch
from

Conversation

dpausp
Copy link
Member

@dpausp dpausp commented Sep 12, 2024

The first change already would fix our issues with state dir changes but the two other commits are still useful to make things more explicit and less confusing.

  1. webproxy/varnish: final version of the fix to manage the varnish state dir properly

a) use the real varnish upstream state directory
b) only reload if it's a pure VCL change, restart otherwise

For channel upgrades (e.g. when this statedir change is rolled out) this
happens in maintenance anyway. We'll provide better public docs about
the restart conditions in the near future.

  1. webproxy: varnishncsa should have a runtime dir named after the service

Using /run/varnish for the varnishncsa was quite confusing and can lead
to weird errors when varnish uses the same run time dir and restarting
one of the services clears the runtime directory for both. We don't do
that at the moment, but the varnish work dir can be changed.

  1. webproxy: explicitly set work dir for varnishncsa

Before, we relied on varnishncsa's behaviour, looking for the default
varnish work dir at /var/run/varnishd (where /var/run is linked to
/run). /run/varnishd is a symlink created by us, pointing to the real
location /var/spool/varnish/.

Setting the work dir in the varnishncsa command line easier to
understand and less error-prone when the state dir changes.

@flyingcircusio/release-managers

Release process

Impact:

  • [NixOS 24.05] varnish will be restarted.

Changelog:

  • webproxy/varnish: use varnish default work dir and restart varnish on unit file changes. Before, we always reloaded the service which can cause various problems when changes are not picked up. We ran into this problem when the varnish work dir changed in NixOS. To simplify things, we also override the work dir and use the varnish default of /run/varnishd instead of symlinking it to a different location. This path is checked by CLI tools like varnishadm, for example. (PL-132901).

PR release workflow (internal)

  • PR has internal ticket
  • internal issue ID (PL-…) part of branch name
  • internal issue ID mentioned in PR description text
  • ticket is on Platform agile board
  • ticket state set to Pull request ready
  • if ticket is more urgent than within the next few days, directly contact a member of the Platform team

Design notes

  • Provide a feature toggle if the change might need to be adjusted/reverted quickly depending on context. Consider whether the default should be on or off. Example: rate limiting.
  • All customer-facing features and (NixOS) options need to be discoverable from documentation. Add or update relevant documentation such that hosted and guided customers can understand it as well.

Security implications

  • Security requirements defined? (WHERE)
    • make sure that varnish is reloaded properly when config changes and that other changes to the unit file are picked up by restarting the service
  • Security requirements tested? (EVIDENCE)
    • checked on a test VM that varnish is reloaded/restarted as expected

@dpausp dpausp force-pushed the PL-132901-fix-varnish-reload branch 2 times, most recently from 173e7fc to cb0bc51 Compare September 12, 2024 15:20
@dpausp dpausp changed the title Fix varnish reload and varnishncsa crashes after state dir change webproxy/varnish: fix runtime/state dir issues, restart on unit changes Sep 12, 2024
@dpausp dpausp changed the title webproxy/varnish: fix runtime/state dir issues, restart on unit changes webproxy/varnish: fix runtime/state dir issues and restart on unit changes Sep 12, 2024
Before, we relied on varnishncsa's behaviour, looking for the default
varnish work dir at /var/run/varnishd (where /var/run is linked to
/run). /run/varnishd is a symlink created by us, pointing to the real
location /var/spool/varnish/<hostname>.

Setting the work dir in the varnishncsa command line easier to
understand and less error-prone when the state dir changes.

PL-132901
Using /run/varnish for the varnishncsa was quite confusing and can lead
to weird errors when varnish uses the same run time dir and restarting
one of the services clears the runtime directory for both. We don't do
that at the moment, but the varnish work dir can be changed.

PL-132901
…e dir properly

a) use the real varnish upstream state directory
b) only reload if it's a pure VCL change, restart otherwise

For channel upgrades (e.g. when this statedir change is rolled out) this
happens in maintenance anyway. We'll provide better public docs about
the restart conditions in the near future.

PL-132901
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants