Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: load and cache sysInfo on initialization #43130

Merged
merged 1 commit into from
Feb 18, 2022

Conversation

thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Jan 7, 2022

The daemon.RawSysInfo() function can be a heavy operation, as it collects
information about all cgroups on the host, networking, AppArmor, Seccomp, etc.

While looking at our code, I noticed that various parts in the code call this
function, potentially even multiple times per container, for example, it is
called from:

  • verifyPlatformContainerSettings()
  • oci.WithCgroups() if the daemon has cpu-rt-period or cpu-rt-runtime configured
  • in ContainerDecoder.DecodeConfig(), which is called on boith container create and container commit

Given that this information is not expected to change during the daemon's
lifecycle, and various information coming from this (such as seccomp and
apparmor status) was already cached, we may as well load it once, and cache
the results in the daemon instance.

This patch updates daemon.RawSysInfo() to use a sync.Once() so that
it's only executed once for the daemon's lifecycle.

For sake of completenes: I considered updating daemon.RawSysInfo() to use a sync.Once() instead (to make this automatic on first use), but (for now) decided against it, as it could complicate some tests, and for regular use of the daemon, the daemon would be constructed using daemon.NewDaemon().

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah

This comment was marked as outdated.

daemon/seccomp_linux.go Outdated Show resolved Hide resolved
@thaJeztah
Copy link
Member Author

Seeing a lot of failures like this on Windows now 😞

=== RUN   TestDockerSuite/TestAPIImagesHistory
    docker_api_images_test.go:126: assertion failed: 
        Command:  d:\CI\PR-43130\7\binary\docker.exe build -t test-api-images-history -
        ExitCode: 1
        Error:    exit status 1
        Stdout:   Sending build context to Docker daemon  2.048kB

        Step 1/2 : FROM busybox
         ---> 0b198bce8e39
        Step 2/2 : ENV FOO bar
         ---> Running in d8617b112fd5
        
        Stderr:   rename D:\CI\PR-43130\7\daemon\image\windowsfilter\layerdb\tmp\write-set-3947302945 D:\CI\PR-43130\7\daemon\image\windowsfilter\layerdb\sha256\de62b214663d85a4dc21df186b9b469263701ff02a585cabb1bbf31662af5c75: Access is denied.
        
        
        Failures:
        ExitCode was 1 expected 0
        Expected no error
    --- FAIL: TestDockerSuite/TestAPIImagesHistory (0.80s)

@thaJeztah
Copy link
Member Author

win-2022; TestSlowStdinClosing: known flaky test; #43012

=== FAIL: github.com/docker/docker/integration-cli TestDockerSuite/TestSlowStdinClosing (0.97s)
    docker_cli_run_test.go:4177: assertion failed: error is not nil: exit status 127
    --- FAIL: TestDockerSuite/TestSlowStdinClosing (0.97s)

unit-tests: TestFollowLogsHandleDecodeErr: known flaky (being worked on in #43105)

=== FAIL: daemon/logger/loggerutils TestFollowLogsHandleDecodeErr (0.02s)
time="2022-01-11T09:24:41Z" level=error msg="Log reader notified that it must re-open log file, some log data may not be streamed to the client." file=/tmp/TestFollowLogsHandleDecodeErr4202986584
    logfile_test.go:317: assertion failed: 5 (int) != 6 (dec.resetCount int)

The `daemon.RawSysInfo()` function can be a heavy operation, as it collects
information about all cgroups on the host, networking, AppArmor, Seccomp, etc.

While looking at our code, I noticed that various parts in the code call this
function, potentially even _multiple times_ per container, for example, it is
called from:

- `verifyPlatformContainerSettings()`
- `oci.WithCgroups()` if the daemon has `cpu-rt-period` or `cpu-rt-runtime` configured
- in `ContainerDecoder.DecodeConfig()`, which is called on boith `container create` and `container commit`

Given that this information is not expected to change during the daemon's
lifecycle, and various information coming from this (such as seccomp and
apparmor status) was already cached, we may as well load it once, and cache
the results in the daemon instance.

This patch updates `daemon.RawSysInfo()` to use a `sync.Once()` so that
it's only executed once for the daemon's lifecycle.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Copy link
Member Author

thaJeztah commented Jan 20, 2022

Flakiness during build; looks like there may be some issues with the SUSE repositories

failed to load cache key:
Get "https://download.opensuse.org/repositories/devel:/tools:/criu/Debian_11/Release.key":
dial tcp 195.135.221.134:443: connect: connection refused
script returned exit code 1

Interesting; somehow a test running where it doesn't support devicemapper? Or just some leftover state from a previous run?

=== RUN   TestDevmapperIncreaseLoopBackSize
time="2022-01-19T20:38:44Z" level=error msg="Udev sync is not supported. This will lead to data loss and unexpected behavior. Install a more recent version of libdevmapper or select a different storage driver. For more information, see https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options" storage-driver=devicemapper
time="2022-01-19T20:38:44Z" level=warning msg="Usage of loopback devices is strongly discouraged for production use. Please use `--storage-opt dm.thinpooldev` or use `man dockerd` to refer to dm.thinpooldev section." storage-driver=devicemapper
time="2022-01-19T20:38:44Z" level=info msg="Creating filesystem xfs on device docker-0:64-17359-base, mkfs args: [/dev/mapper/docker-0:64-17359-base]" storage-driver=devicemapper
time="2022-01-19T20:38:44Z" level=info msg="Successfully created filesystem xfs on device docker-0:64-17359-base" storage-driver=devicemapper
time="2022-01-19T20:38:44Z" level=warning msg="unmountAndDeactivate: open /tmp/docker-graphtest-1787891524/devicemapper/mnt: no such file or directory" storage-driver=devicemapper
time="2022-01-19T20:38:44Z" level=error msg="Udev sync is not supported. This will lead to data loss and unexpected behavior. Install a more recent version of libdevmapper or select a different storage driver. For more information, see https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options" storage-driver=devicemapper
time="2022-01-19T20:38:44Z" level=warning msg="Usage of loopback devices is strongly discouraged for production use. Please use `--storage-opt dm.thinpooldev` or use `man dockerd` to refer to dm.thinpooldev section." storage-driver=devicemapper
time="2022-01-19T20:38:44Z" level=warning msg="Base device already exists and has filesystem xfs on it. User specified filesystem  will be ignored." storage-driver=devicemapper
    devmapper_test.go:124: data or metadata loop back size is incorrect
time="2022-01-19T20:38:44Z" level=warning msg="unmountAndDeactivate: open /tmp/docker-graphtest-1787891524/devicemapper/mnt: no such file or directory" storage-driver=devicemapper
--- FAIL: TestDevmapperIncreaseLoopBackSize (0.19s)
Base device already exists and has filesystem xfs on it.
User specified filesystem  will be ignored."
..
devmapper_test.go:124: data or metadata loop back size is incorrect

@thaJeztah
Copy link
Member Author

Windows failure is a known flaky test (we need to look into that one) #43012

=== RUN   TestDockerSuite/TestSlowStdinClosing
    docker_cli_run_test.go:4177: assertion failed: error is not nil: exit status 127
    --- FAIL: TestDockerSuite/TestSlowStdinClosing (4.55s)

@thaJeztah
Copy link
Member Author

@AkihiroSuda ptal

@thaJeztah
Copy link
Member Author

@AkihiroSuda ptal 🤗

@thaJeztah
Copy link
Member Author

@AkihiroSuda PTAL

@AkihiroSuda AkihiroSuda merged commit 54d35c0 into moby:master Feb 18, 2022
@thaJeztah thaJeztah deleted the daemon_cache_sysinfo branch February 18, 2022 06:59
@thaJeztah
Copy link
Member Author

Thanks! (sorry for the repeated ping; had some other changes in this area that may conflict, so wanted to get this one in if possible)

@thaJeztah
Copy link
Member Author

This may have caused a regression; first time the daemon is started, docker info now shows the wrong information for networking;

WARNING: IPv4 forwarding is disabled
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Checking on the host shows the correct status;

cat /proc/sys/net/ipv4/ip_forward
1
cat /proc/sys/net/bridge/bridge-nf-call-iptables
1
cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
1

After restarting the service (systemctl restart docker.service), values are correct.

I suspect the reason for that is that the SystemInfo is read during daemon startup (and uses a sync.Once()), but the daemon actually modifies those values, and first time the daemon is started, those changes have not yet been set;

For example;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants