-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add api for stats all containers #25361
Conversation
nice!! |
7dab337
to
4fa3ee9
Compare
Thanks for the work @WeiZhang555 :) Much appreciated that will save a bit of cpu on those servers with 100+ containers :) I was wondering if we could potentially add the names of the containers in the response, I guess id is fine but it is just more convenient if you have a script pulling this endpoint regularly rather than having to rely on some other external mapping or have to do a second api call to /containers/json. I guess for most people the usage of that API endpoint is to get metrics in order to present them nicely, since a container in itself has a short lifetime you most likely graph the name of your container rather than its id. |
@rileysaurus if you have some time can you kick the tires on this PR? |
ba69704
to
35143fb
Compare
I think it's inappropriate to put names in response, there's no space for docker client showing the container names, and ID is already a unique key for identifying a container. |
35143fb
to
e8b844e
Compare
@WeiZhang555 there's an open pull request to add |
@WeiZhang555 @thaJeztah this PR is interesting (#24987) but it looks like it is doing the formatting on the client side, not as a response to the API call though. |
@jeanpralo correct; both PR's solve a different issue; this PR implements the current stats (all containers) server side, but doesn't change the output; I think that changing the output belongs in #24987, which could build on top of this PR (and add extra fields to the API response. I think having both merged / implemented could be realistic for the next release (famous last words) |
@jeanpralo Just as @thaJeztah said, these two PRs are solving different issues. Combining these two will give what you want, but need to solve the code conflict, I believe there will be lots of. |
Yes, this would conflict with #25015. If there is a valid reason for doing this instead of multiple calls in parallel (which this does seem to be), then we need to figure out a good way of doing batching. There is no agreed upon way of doing batch operations with a REST API, and it's often a bit messy. See also #24724 where I explain in some more detail. Having more endpoints at
(1) sounds like a big chunk of work, but (2) seems workable. Anybody got any other ideas? |
Also |
I tested this PR out with 250 containers. I see only a negligible change in CPU usage, and just a slight shifting of the CPU usage from dockerd to docker-containerd. I've testing with both requesting a stream of stats and just a single sample of the stats (once a second). I'm running a ubuntu 14.04 VM on a mid 2015 Macbook Pro. Here is the output of 12.0: 13.0-dev: |
be7a30c
to
95d0c31
Compare
@rileysaurus I just tested this with 1000 containers, I agree with you that the CPU usage didn't change much on daemon side, but I didn't see the shifting of the CPU usage from dockerd to docker-containerd. Interestingly, I noticed that the client CPU usage is much more lower with this PR. hardware for client: 12 CPU, 125G Mem
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< client2(from upstream master):
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Below is the performance data I got: 1. deducation of CPU usage on Client sideclient1(from this PR):
client2(from upstream):
2. less resource consumption on daemon sideThis PR will save TCP connections, File descriptors and Goroutines, which all will influence numbers of clients daemon can serve. client1(from this PR):
client2(from upstream):
This PR can save 999 TCP connections, 3000 Goroutines and 1000 File Descriptors when stats 1000 containers! 3. this PR guarantee client always get all stats numbers.client1: (from this PR)
client2:(from upstream)
This is because old client trys to get data of containers one by one but can't guarantee it get correct numbers while displaying, but client from this PR will only make one request, and daemon will batch all data and send it to client once, which will guarantee client can always get all the data without care of synchronization. That's all my point, correct me if I'm wrong @rileysaurus 😄 Also ping @docker-maintainers, please tell me what do you think of this change. |
In general, I'm +1 for this PR; I think this would also make it easier to (for example) implement a w.r.t. design, I think we need to have a look at the endpoint (#25361 (comment)), to prevent future conflicts, all of |
ping @bfirsh @WeiZhang555 any preference for one of those options? |
I vote to I'm not quite used to format with special name of what do you think? @thaJeztah @bfirsh @justincormack |
I can't think of any examples of this off the top of my head, but we do have some pretty clear conventions in place that we can follow. I am -1 on I am -0 about I prefer |
@bfirsh Sound reasonable. Then it looks like Any more suggestions from any one? |
I don't expect multiple entities to be included in a single stats stream (e.g. containers, and services in the same stream), so having it under I can't really put my finger on it why but not fond of
It looks like |
The reason I picked I've looked into #25015, the suggestion is good, but there're some hard problems on the backward compatibility, so I don't expect it(API renaming) will take place soon. Considering this, I'll re-propose the Involve @calavera for API name discussion, I think we need his approval to merge the new API into |
I don't think a new endpoint at |
Overall it seems good. One question, if you look at the output you've shown (e.g. #25361 (comment) ) you'll see that some containers have |
@duglin Thanks 😄 I don't think it's what you said. Note: only part of client changed, I'll do another rebase to resolve conflicts. |
When i run the old stuff i don't see any --'s but with this pr i get a lot of them. That doesn't seem right. |
@duglin Weird, I get the opposite result. How did you test it, maybe I miss something... This is my test result with 1000 running containers (run Test with master branch Test with this PR Result of |
Perhaps its related to the stopped containers. For me, the old stuff shows me:
and the new stuff shows me:
All zeros vs all --'s |
@duglin I see, you are right 👍 I'll fix it during the rebasing~ |
@WeiZhang555 one thing I just realised we should take into account is that, starting with docker 1.13, the client is able to talk to older daemon versions; that means that when talking to a daemon that not supports the "all" stats endpoint, it should still use the old approach |
6bf6aa1
to
06a0e1d
Compare
Add a new API for docker stats all containers, and refactored docker client to make use of new API. There're some obvious benefits from this commit: 1. it saves lots of TCP connections when stats bundles of containers. 2. the client side is expected to gain higher performance. 3. client codes are simplier and easier to maintain. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
This commit did several things: 1. adds filter support for API `/containers/-/stats`, we reuse `ps` filters for this, so currently container stats API can also support all filters from `ps` command. 2. `docker stats --all` will bind to default `/containers/-/stats` without filters. 3. `docker stats` is implemented based on `/containers/-/stats` with filter `status=running` Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
This commit redirects API `/containers/{name:*}/stats` to handler of `/containers/-/stats`, now both stating one container and stating all containers will be handled by same backend function. For stating one container, we will first get all stating data for all containers, and filter them by name to get specific one. This also fixes bug of concurrent map write Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
With new `/containers/{name:*}/stats` handler, one windows test case hangs, this commit fixes the hanging issue. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
06a0e1d
to
9940b1a
Compare
For consistency, we need to set "0" as result instead of "--" for non-running containers. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
@duglin Your issue is addressed. @thaJeztah Dammit, this is really a terrible problem. I can fix it, but it requires lots of work, and lots of duplicate codes. I'm afraid I need to refactor again and the client implementation will be ugly afterwards. 😢 |
Ok, so I have to say I've pretty much never been +1 for this, just a Is this solving a real, urgent problem? I think stats in general has a ton of room for improvement as even the stats collection is extremely inefficient. So is this really improving things? At some point we'll have container stats exposed in the metrics API, which I think is much more suitable for this kind of thing. I think adding this will lead to tremendous regret later (even for reasons beyond the metrics API). |
I'm against expanding the scope of the stats API. It is very inefficient and I am not sure it could be streamlined. I'd prefer we focus our efforts on expanding prometheus support. |
I see that overall decision is -1 now. Bringing a lot of complexity doesn't sound very cool(for stats in particular). Let's start from refactoring and maybe return to this later. |
Bumping this issue. Has this issue been revisited? There is a feature I'm working on where I need to pull stats for all the containers. I would rather not make a restful call for each and every container as there could be hundreds of containers running on a particular machine. Can we add a parameter to the restful call for pulling all container stats (or adding a new restful api)? I'm sure this would be useful for other people as I don't think I'm the only person running into this issue. |
@thaJeztah ping on this? |
^^ conversation related to the above ongoing in #22052 |
Currently fetching container stats is very slow as each request takes up to 2 seconds. To improve the fetching time if lots of containers are around, this creates the rrequests in parallel. The main downside is that this opens lots of connections. This fix should only temporary until the bulk api is available: moby/moby#25361 (cherry picked from commit f1ecd31)
Fixes #22052
Add a new API for docker stats all containers, and refactored docker
client to make use of new API.
There're some obvious benefits from this commit:
Signed-off-by: Zhang Wei zhangwei555@huawei.com
ping @runcom @thaJeztah
/cc @vincentwoo @jeanpralo Is this what you want?
TODO: