Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SDS file watching docs #10979

Closed
snowp opened this issue Apr 28, 2020 · 14 comments · Fixed by #13721 or #19974
Closed

Improve SDS file watching docs #10979

snowp opened this issue Apr 28, 2020 · 14 comments · Fixed by #13721 or #19974
Assignees

Comments

@snowp
Copy link
Contributor

snowp commented Apr 28, 2020

The comments in sds_api.cc imply that we're watching the directory to allow for atomic updates of the cert/key (https://github.com/envoyproxy/envoy/blob/master/source/common/secret/sds_api.cc#L66-L67), but from testing this by referencing certs in /secrets/current/cert.crt and repointing the current symlink to another directory, no file watches are triggered.

I'm not sure if this is a bug, inconsistent docs or perhaps I'm just misunderstanding the feature. Applying the following diff made this behave the way I expected it to:

diff --git a/source/common/secret/sds_api.cc b/source/common/secret/sds_api.cc
index deab859ad..476d1c25b 100644
--- a/source/common/secret/sds_api.cc
+++ b/source/common/secret/sds_api.cc
@@ -66,7 +66,7 @@ void SdsApi::onConfigUpdate(const Protobuf::RepeatedPtrField<ProtobufWkt::Any>&
         // Watch for directory instead of file. This allows users to do atomic renames
         // on directory level (e.g. Kubernetes secret update).
         const auto result = api_.fileSystem().splitPathFromFilename(filename);
-        watcher_->addWatch(absl::StrCat(result.directory_, "/"),
+        watcher_->addWatch(result.directory_,
                            Filesystem::Watcher::Events::MovedTo, [this](uint32_t) {
                              uint64_t new_hash = getHashForFiles();
                              if (new_hash != files_hash_) {

cc @lizan @tsaarni

@tsaarni
Copy link
Member

tsaarni commented Apr 28, 2020

The current behaviour is intentional. The basis is how Kubernetes secret volume mounts and atomic secret updates happen:

Assume we have a secret with two files file1 and file2 and that the secret volume is mounted at /secret-mountpoint/.

The initial state of the directory, files and symlinks are set up by Kubernetes like this:

/secret-mountpoint/file1                # symbolic link to ..data/file1
/secret-mountpoint/file2                # symbolic link to ..data/file2
/secret-mountpoint/..data               # symbolic link to ..timestamp-1
/secret-mountpoint/..timestamp-1        # directory
/secret-mountpoint/..timestamp-1/file1  # initial version of file1
/secret-mountpoint/..timestamp-1/file2  # initial version of file2 

The directory ..timestamp-1 contains the initial version of the actual files. After secret update, new directory ..timestamp-2 gets created with new versions of the files:

/secret-mountpoint/file1                # symbolic link to ..data/file1
/secret-mountpoint/file2                # symbolic link to ..data/file2
/secret-mountpoint/..data               # symbolic link to ..timestamp-2
/secret-mountpoint/..timestamp-2        # new directory
/secret-mountpoint/..timestamp-2/file1  # new version of file1
/secret-mountpoint/..timestamp-2/file2  # new version of file2

The watch can only catch this by watching moves in directory /secret-mountpoint/.

In your case, appending / at the end will trigger inotify watch to be set for directory /secrets/current/. But since /secrets/current/ is a symlink, the inotify watch will look at the target directory instead - the old directory which will see no updates anymore. When removing the slash, I think the inotify watch was set to /secrets/ instead, so I guess it catches the move of symbolic link. This would not work with how Kubernetes does the atomic move.

You can see more detailed discussion here #9359 (comment) and a test case that shows the currently expected behaviour

TEST_F(WatcherImplTest, SymlinkAtomicRename) {

@snowp
Copy link
Contributor Author

snowp commented Apr 28, 2020

So are we actually watching for any move within the directory, not moves to the directory itself? We're watching the parent directory for moves to any files within the directory and then re-reading the secrets when that happens?

It feels a bit non-intuitive (and under-documented). I wonder if it would make sense to add a config option to the SDS resource that indicates what kind of semantics should be used when doing file watches? @envoyproxy/api-shepherds

@mattklein123
Copy link
Member

It feels a bit non-intuitive (and under-documented). I wonder if it would make sense to add a config option to the SDS resource that indicates what kind of semantics should be used when doing file watches? @envoyproxy/api-shepherds

At minimum let's please start by getting the behavior well documented. After that we can pursue additional config/changes if needed?

@snowp
Copy link
Contributor Author

snowp commented Apr 28, 2020

Yeah that sounds like a good first step, I imagine most people would be able to work with the current behavior with proper documentation.

@tsaarni
Copy link
Member

tsaarni commented Apr 28, 2020

So are we actually watching for any move within the directory, not moves to the directory itself?

We are watching all moves in/to/from that directory.

In your example, the watched directory would be /secrets/current/, but since current was a symlink instead of a directory, inotify will follow that and watch the target directory instead. I believe there was no moves in that old directory anymore?

Currently the documentation says Envoy will only watch the file path for moves.
Isn't this still true? This is of course very short version of all that happens underneath, and does not explain why current cannot be a symlink.

@snowp
Copy link
Contributor Author

snowp commented Apr 28, 2020

I don't think the current documentation is wrong per se, but rather incomplete and makes it hard for an operator to understand what is required for performing atomic updates to certs. It would be great if we had an example (outside of unit tests) that explained the setup necessary.

Not all use cases use Kuberentes, so ideally we document the way it works with some detail so that operators using other deployment models can understand how to set this up.

@mattklein123
Copy link
Member

See here for the docs we have on the runtime config swap: https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/runtime#updating-runtime-values-via-symbolic-link-swap. It would be great to have good docs/examples of this in either the operations guide or the config docs somewhere.

@snowp snowp changed the title SDS file watching semantics Improve SDS file watching docs Apr 28, 2020
@tsaarni
Copy link
Member

tsaarni commented Apr 28, 2020

Example method for certificate rotation specifically is of course good to have in the docs. I wonder should we also replace the runtime config swap method link in ConfigSource documentation and point to a new method instead? Symlink swap method was there in the doc even before we added certificate/key watches for SDS but I'm bit puzzled if that method works for any ConfigSource path watches? I think that symlink swap method is incompatible with "Envoy will only watch the file path for moves" since inotify would follow the symlink and keep watching moves only in the old directory (and therefore misses the symlink swap).

@mattklein123
Copy link
Member

The swap method documented definitely works for file based runtime, but I'm not sure if it works for config sources or not. It would be good to clean this up and perhaps have a docs section in operations (https://www.envoyproxy.io/docs/envoy/latest/operations/operations) around file swapping and the various use cases, and then link there as needed?

@mattklein123 mattklein123 added no stalebot Disables stalebot from closing an issue help wanted Needs help! and removed no stalebot Disables stalebot from closing an issue labels Apr 28, 2020
@ldemailly
Copy link

I am trying to get CDS file based watch to work with a configmap and failing so far (because it seems the directory change isn't detected) - any idea what I am doing wrong?

apiVersion: v1
data:
  clusters.yaml: |+
    resources:
      - '@type': type.googleapis.com/envoy.api.v2.Cluster
        name: service_cluster
        hosts:
          socket_address:
            address: service
            port_value: 15111
        connect_timeout: 18s
        type: strict_dns

  envoy.yaml: |+
    admin:
      access_log_path: /tmp/admin_access.log
      profile_path: /tmp/envoy.prof
      address:
        socket_address: { address: 0.0.0.0, port_value: 9000 }

    static_resources:
      listeners:
        - address:
            socket_address:
              address: 0.0.0.0
              port_value: 8080
          filter_chains:
          - filters:
            - name: envoy.filters.network.tcp_proxy
              config:
                stat_prefix: ingress_tcp
                cluster: service_cluster

    dynamic_resources:
      cds_config:
        path: "/etc/envoy/clusters.yaml"

    node:
      cluster: envoy-internal
      id: envoy-internal

kind: ConfigMap
metadata:
  creationTimestamp: null
  name: envoy-config

and

apiVersion: v1
kind: Pod
metadata:
  name: envoy
spec:
  containers:
    - name: proxy
      image: envoyproxy/envoy-alpine-debug:v1.14-latest
      ports:
        - name: admin
          containerPort: 9000
          protocol: TCP
        - name: proxy
          containerPort: 8080
          protocol: TCP
      volumeMounts:
        - name: config-volume
          mountPath: /etc/envoy
  volumes:
    - name: config-volume
      configMap:
        name: envoy-config

changes to config map don't get detected

@ldemailly
Copy link

ldemailly commented Jul 10, 2020

tried path: "/etc/envoy/..data/clusters.yaml" but that doesn't work either

[2020-07-10 19:05:30.541][1][debug][file] [source/common/filesystem/inotify/watcher_impl.cc:47] added watch for directory: '/etc/envoy/..data' file: 'clusters.yaml' fd: 1

@tsaarni
Copy link
Member

tsaarni commented Jul 10, 2020

If you would have a real file clusters.yaml in directory /etc/envoy/ and then moved new version of the file over it, then I think the watch will work. But this is not what happens in Kubernetes, since it uses symlink swap approach. So I afraid the current code will not work for CDS cluster configs on ConfigMaps.

Note that the topic discussed in this issue is specifically certificate and key file hot-reload, which was added in PR #10163. It was a special case since it was new logic and it could be implemented in a way that works with Kubernetes symlink swap approach.

If you are interested, the fundamental problem is described in "Problem 1" here #9359 (comment). In your case the code expects to see inotify MOVE_TO event in directory /etc/envoy for file clusters.yaml. That kind of event is never emitted in case of symlink swap approach. Changing path to /etc/envoy/..data/clusters.yaml does not help either, because then inotify will follow the symlink and actually watch for events in directory /etc/envoy/..timestampX (where timestampX is a time when the ConfifMap volume was last updated). Kubernetes will never update files in /etc/envoy/..timestampX, instead it will create a new directory at every update. There's quite a lot of details in #9359 about the problem. Fixing it would require specifically supporting symlink swap, which would then break the current move approach.

@ldemailly
Copy link

@tsaarni thank you for the detailed reply (and I realize I'm hijacking a bit a SDS thread but I would hope all file based *DS would work the same) - in the same vein, we claim envoy is built to work well with kubernetes, so it seems supporting/detecting the way kubernetes does atomic updates would be beneficial (and I do realize many people use full fledged control planes like istio but I believe there are valid use case for file base boostrap/simpler setup, yet allowing the basic of 1990s like kill -1 for config reload / some way to do dynamic config out of the box, using the standard image)

So I have 2 choices if I understand, a) write a wrapper script that duplicates a watch for change and moves the files into a writeable copy that envoy detects; b) find how to hack the envoy code to support k8s configmap's way of doing atomic changes.

I think b) would be better - do we agree it's worth doing?

htuch added a commit that referenced this issue Nov 13, 2020
There are a few limitations in our existing support for symlink-based
key rotation:

We don't atomically resolve symlinks, so a single snapshot might have
inconsistent symlink resolutions for different watched files.
Watches are on parent directories, e.g. for /foo/bar/baz on /foo/bar,
which doesn't support common key rotation schemes were /foo/new/baz
is rotated via a mv -Tf /foo/new /foo/bar.
The solution is to provide a structured WatchedDirectory for Secrets to
opt into when monitoring DataSources. SDS will used WatchedDirectory
to setup the inotify watch instead of the DataSource path. On update, it will
read key/cert twice, verifying file content hash consistency.

Risk level: Low (opt-in feature)
Testing: Unit and integration tests added.

Fixes #13663
Fixes #10979
Fixes #13370

Signed-off-by: Harvey Tuch <htuch@google.com>
qqustc pushed a commit to qqustc/envoy that referenced this issue Nov 24, 2020
…3721)

There are a few limitations in our existing support for symlink-based
key rotation:

We don't atomically resolve symlinks, so a single snapshot might have
inconsistent symlink resolutions for different watched files.
Watches are on parent directories, e.g. for /foo/bar/baz on /foo/bar,
which doesn't support common key rotation schemes were /foo/new/baz
is rotated via a mv -Tf /foo/new /foo/bar.
The solution is to provide a structured WatchedDirectory for Secrets to
opt into when monitoring DataSources. SDS will used WatchedDirectory
to setup the inotify watch instead of the DataSource path. On update, it will
read key/cert twice, verifying file content hash consistency.

Risk level: Low (opt-in feature)
Testing: Unit and integration tests added.

Fixes envoyproxy#13663
Fixes envoyproxy#10979
Fixes envoyproxy#13370

Signed-off-by: Harvey Tuch <htuch@google.com>
Signed-off-by: Qin Qin <qqin@google.com>
@mattklein123 mattklein123 self-assigned this Feb 11, 2022
@mattklein123
Copy link
Member

I ran into this issue also trying to send RTDS over a k8s config map. I'm going to fix this. I discussed with @htuch and we think the right solution is to:

  1. Deprecate ConfigSource.path since it's just a string
  2. Replace with a new FileConfigSource message which requires a path, but also has an option watched directory override that can be set. This will allow the watch to be on a directory instead of the file, with the file reloading when the watched directory is moved to.

This should allow us to solve this in the general case for any xDS.

cc @snowp

@mattklein123 mattklein123 reopened this Feb 11, 2022
mattklein123 pushed a commit that referenced this issue Feb 15, 2022
For xDS over the file system, sometimes more control is required over
what directory/file is watched for symbolic link swaps. Specifically,
in order to deliver xDS over a Kubernetes ConfigMap, this extra
configuration is required.

Fixes #10979

Signed-off-by: Matt Klein <mklein@lyft.com>
mattklein123 pushed a commit that referenced this issue Feb 17, 2022
For xDS over the file system, sometimes more control is required over
what directory/file is watched for symbolic link swaps. Specifically,
in order to deliver xDS over a Kubernetes ConfigMap, this extra
configuration is required.

Fixes #10979

Signed-off-by: Matt Klein <mklein@lyft.com>
lizan pushed a commit to envoyproxy/data-plane-api that referenced this issue Feb 17, 2022
For xDS over the file system, sometimes more control is required over
what directory/file is watched for symbolic link swaps. Specifically,
in order to deliver xDS over a Kubernetes ConfigMap, this extra
configuration is required.

Fixes envoyproxy/envoy#10979

Signed-off-by: Matt Klein <mklein@lyft.com>

Mirrored from https://github.com/envoyproxy/envoy @ 8670309bce9a488ccfc04a87d0c4367ca59c4179
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants