Add network namespace pool support #3107

Merged — 4 commits into moby:master on Sep 15, 2022
Conversation

@aaronlehmann (Collaborator):
This adds a netNSPoolSize pool option which allows setting a target network namespace pool size. buildkitd will create this number of network namespaces at startup (without blocking). When a container execution finishes, its network namespace is returned to the pool. If the pool goes above the target size, there is a grace period to allow network namespaces to be reused; if this passes without reuse, the extra namespaces are released.

@aaronlehmann (Collaborator, Author):
I was also wondering if there should be some kind of orderly shutdown where we release the network namespaces in the pool when buildkitd shuts down (in addition to cleaning up on restart). It doesn't look like there's a mechanism right now to block the shutdown on cleanup activities like this, and I think in some situations it could block shutdown for a long time, so I'm not sure if it makes sense or not.

@coryb (Collaborator) commented Sep 10, 2022:
lgtm, should probably add the new config option to the example buildkitd.toml: https://github.com/moby/buildkit/blob/master/docs/buildkitd.toml.md

@aaronlehmann (Collaborator, Author):
> should probably add the new config option to the example buildkitd.toml

Good point, added.
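For reference, the added entry in the example buildkitd.toml might look roughly like this. The key name and placement here are assumptions based on this thread (including the rename away from a netns-specific name discussed below); check docs/buildkitd.toml.md for the exact form:

```toml
[worker.oci]
  # Target size of the pre-created CNI network namespace pool
  # (assumed key name for this sketch)
  cniPoolSize = 16

[worker.containerd]
  cniPoolSize = 16
```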

@tonistiigi (Member):
> I was also wondering if there should be some kind of orderly shutdown where we release the network namespaces in the pool when buildkitd shuts down (in addition to cleaning up on restart).

Yes, if buildkitd is shut down gracefully then it should not leave any netns mounted. The cleanup on boot is only for the case where the daemon got killed with SIGKILL and no cleanup was possible. We can create a cleanup function that gets called on defer, which would release the items available in the pool, ideally wait for all the other mounted ones to be released as well, and detect if any of these paths is misbehaving (assuming everything is correct, the code path that asked for the netns also cleans up after itself, but we can double-check).

@@ -208,6 +213,9 @@ func applyContainerdFlags(c *cli.Context, cfg *config.Config) error {
if c.GlobalIsSet("containerd-cni-config-path") {
cfg.Workers.Containerd.NetworkConfig.CNIConfigPath = c.GlobalString("containerd-cni-config-path")
}
if c.GlobalIsSet("containerd-netns-pool-size") {
Review comment (Member):

With the current implementation, isn't it more like a cni-pool-size?

Reply (Collaborator, Author):

Renamed.

pool.mu.Lock()
if pool.actualSize > pool.targetSize && ns.lastUsed.Equal(putTime) {
for i, poolNS := range pool.available {
if poolNS == ns {
Review comment (Member):

I'm not sure this is quite correct. You do the actualSize > pool.targetSize comparison when deciding whether to add a cleanup func, and also inside the cleanup, but I don't think there is a guarantee that these conditions are the same by the time each of these places gets called for a specific ns.

At least it seems safer if we skip this check and just always call cleanUpToDefaultSize(), which is a no-op if available is not greater than size.

Reply (Collaborator, Author):
The namespaces are reused in FIFO order, so I think this is correct as written. If we were not above the target size at put time, it will be some other namespace that brings us above the target size, and that's the one we'll want to release if it hasn't been reused after 5 minutes.

But thinking about this some more, I don't think we want the FIFO order. It means we'll cycle through the namespaces and it will be hard to shrink back to the target size if there's constant activity. LIFO seems better. So I've rewritten this to use LIFO and align more closely to your suggestion.
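The difference can be shown with a small simulation: under constant Get/Put activity, FIFO ordering cycles through every pooled namespace so none stays idle long enough to age out, while LIFO keeps reusing the top of the stack and leaves the extras idle so they can be released. This is an illustrative sketch, not buildkit code:

```go
package main

import "fmt"

func main() {
	// Simulate a pool holding namespaces A, B, C (one above target),
	// with one namespace checked out and returned per iteration.

	// FIFO: Get takes from the front, Put appends to the back, so every
	// namespace keeps getting reused and none can age out.
	fifo := []string{"A", "B", "C"}
	usedFIFO := map[string]bool{}
	for i := 0; i < 6; i++ {
		ns := fifo[0]
		fifo = append(fifo[1:], ns) // put back at the end
		usedFIFO[ns] = true
	}

	// LIFO: Get and Put both work at the top of the stack, so the
	// namespaces at the bottom stay idle and can be released once
	// their grace period passes.
	lifo := []string{"A", "B", "C"}
	usedLIFO := map[string]bool{}
	for i := 0; i < 6; i++ {
		ns := lifo[len(lifo)-1]
		lifo = lifo[:len(lifo)-1]
		usedLIFO[ns] = true
		lifo = append(lifo, ns) // put back on top
	}

	fmt.Println("FIFO touched:", len(usedFIFO), "namespaces") // FIFO touched: 3 namespaces
	fmt.Println("LIFO touched:", len(usedLIFO), "namespaces") // LIFO touched: 1 namespaces
}
```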

@aaronlehmann (Collaborator, Author):
> Yes, if buildkitd is shut down gracefully then it should not leave any netns mounted. The cleanup on boot is only for the case where the daemon got killed with sigkill and no cleanup was possible. We can create a cleanup function that gets called on defer that would release the items available in the pool

I couldn't find a place to put this defer, so I tried a different approach where cniProvider will launch a cleanup when the main application context is cancelled. This will not block graceful shutdown (which has pros and cons). Feel free to point me in a different direction.

@aaronlehmann (Collaborator, Author):
> I couldn't find a place to put this defer, so I tried a different approach where cniProvider will launch a cleanup when the main application context is cancelled. This will not block graceful shutdown (which has pros and cons). Feel free to point me in a different direction.

Hmm, this doesn't seem to work well in practice because buildkitd exits before cniProvider has a chance to do much cleanup. I could create a blocking Close method on cniProvider, but I'm not sure where to call this. Any suggestions?

@tonistiigi (Member):
Yeah, it looks like this requires a somewhat bigger change. I think we need to pass it through from WorkerController -> Worker -> NetProviders and add a blocking cleanup method for each one. There might be something else in this call path that we have missed and might need cleanup as well.

A quicker patch would be to let the gRPC server shut down and then just do a hardcoded cleanup, maybe just looking for mountpoints under the state folder and unmounting them directly without too much CNI-specific logic. But I think the first solution is better.
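A minimal sketch of the first approach, with hypothetical types standing in for WorkerController, Worker, and the network providers (the real buildkit signatures differ):

```go
package main

import "fmt"

// NetProvider is a hypothetical interface for anything that can block
// until its network resources (e.g. pooled namespaces) are released.
type NetProvider interface{ Close() error }

// Worker closes each of its network providers in turn.
type Worker struct{ providers []NetProvider }

func (w *Worker) Close() error {
	for _, p := range w.providers {
		if err := p.Close(); err != nil {
			return err
		}
	}
	return nil
}

// WorkerController propagates Close down to every worker.
type WorkerController struct{ workers []*Worker }

func (c *WorkerController) Close() error {
	for _, w := range c.workers {
		if err := w.Close(); err != nil {
			return err
		}
	}
	return nil
}

// cniProvider stands in for the CNI provider; its Close blocks until
// the pooled namespaces are released.
type cniProvider struct{ pooled int }

func (p *cniProvider) Close() error {
	fmt.Printf("released %d pooled namespaces\n", p.pooled)
	p.pooled = 0
	return nil
}

func main() {
	ctrl := &WorkerController{workers: []*Worker{
		{providers: []NetProvider{&cniProvider{pooled: 3}}},
	}}
	// In buildkitd's main this would be a `defer ctrl.Close()`, so
	// graceful shutdown blocks until cleanup completes.
	defer ctrl.Close()
}
```

Because Close is blocking all the way down the chain, the defer in main cannot return before every pooled namespace has been released, unlike the context-cancellation approach above.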

Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
@aaronlehmann (Collaborator, Author):
Updated to create these Close methods and close the Controller with a defer in main.

@tonistiigi merged commit 81b6ff2 into moby:master on Sep 15, 2022.
// We can't use the pool for namespaces that need a custom hostname.
// We also avoid using it on windows because we don't have a cleanup
// mechanism for Windows yet.
if hostname == "" || runtime.GOOS == "windows" {
Review comment (Member):

Isn't this supposed to be && runtime.GOOS != "windows"? If not, then I think the comment is confusing.
