
C++ gRPC server implementation spawns uncontrolled number of threads #25145

Closed
lieroz opened this issue Jan 13, 2021 · 8 comments

Comments

@lieroz

lieroz commented Jan 13, 2021

Is your feature request related to a problem? Please describe.

We tried using gRPC in one of our projects, a simple microservice that computes math models and returns results to the user. We found that while running, the service can spawn a large number of threads, most of which sit waiting on gpr_cv_wait. After further investigation, we traced this to the current ThreadManager implementation, specifically the MainWorkLoop function (there is a comment in the source about this). Uncontrolled thread creation causes huge memory consumption when using the standard glibc allocator:
[screenshot: memory usage under the standard glibc allocator]

We decided to profile the app with gperftools, which requires tcmalloc to make memory measurements, and accidentally found that the memory footprint is several times lower than with the standard Linux allocator:
[screenshot: memory usage under tcmalloc]

Here is a post that addresses this problem: https://habr.com/en/company/mailru/blog/534414/.

Describe the solution you'd like

As of now, SetResourceQuota won't limit thread spawning, only the number of threads active at any given moment. How about having ThreadManager manage an underlying thread pool whose size the user can set with an integer? That would resolve several problems:

  • the heavy operations of thread creation and deletion go away;
  • the service can return RESOURCE_EXHAUSTED right away if all threads in the pool are busy;
  • there won't be a huge, unpredictable memory footprint when using the standard allocator, because the number of threads is fixed; heavy thread creation produces lots of memory arenas (here is an article on how the glibc allocator works: https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/).

Describe alternatives you've considered

Heavy thread creation can't be avoided with the current implementation; tcmalloc can be used to work around the memory-consumption problems.

@markdroth
Member

Esun and/or Vijay, can you please take a look at this? What I find particularly interesting is that they continued to have this problem even when switching to the C++ async API, which I didn't think used the ThreadManager at all, so I'm not sure what's going on here.

@lieroz
Author

lieroz commented Jan 14, 2021

Maybe this will help a bit. As I understood while browsing the source code: in src/cpp/server/server_builder.cc:220, in the BuildAndStart method, one initial completion queue is always created and passed to the Server class. Then in its constructor the sync_server_cqs_ != nullptr check passes and sync_req_mgrs_ gets initialized with a single SyncRequestThreadManager instance. Then Start(grpc::ServerCompletionQueue** cqs, size_t num_cqs) calls SyncRequestThreadManager's Start method, which in turn calls ThreadManager's Initialize.
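For context, the sync server's poller thread counts can at least be tuned through ServerBuilder. A minimal sketch (not compiled here; assumes a standard gRPC C++ build, and `service` is a placeholder for a real service implementation). Note that, as discussed above, the quota bounds concurrently active threads but not the churn of creating and destroying them:

```cpp
#include <memory>

#include <grpcpp/grpcpp.h>
#include <grpcpp/resource_quota.h>

void RunServer(grpc::Service* service) {
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051",
                           grpc::InsecureServerCredentials());
  builder.RegisterService(service);

  // Bound the number of polling threads per completion queue.
  builder.SetSyncServerOption(
      grpc::ServerBuilder::SyncServerOption::MIN_POLLERS, 1);
  builder.SetSyncServerOption(
      grpc::ServerBuilder::SyncServerOption::MAX_POLLERS, 4);

  // Caps concurrently active threads, but does not prevent the server
  // from repeatedly creating and destroying them.
  grpc::ResourceQuota quota("server_quota");
  quota.SetMaxThreads(64);
  builder.SetResourceQuota(quota);

  std::unique_ptr<grpc::Server> server = builder.BuildAndStart();
  server->Wait();
}
```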

@vjpai
Member

vjpai commented Jan 15, 2021

Can you mention your platform? I believe I've seen a similar concern before, which was on the Mac, but I'd like to know your platform to give you the most accurate information.

@lieroz
Author

lieroz commented Jan 15, 2021

It is CentOS 7.

@rockyzhengwu

rockyzhengwu commented May 7, 2021

I had the same problem.

@ctiller ctiller assigned ctiller and unassigned vjpai May 25, 2021
@DmitriyDN

DmitriyDN commented Aug 13, 2021

Is there any news regarding this? We are also facing this issue :(

@energygreek

energygreek commented Feb 7, 2022

@lieroz the number of arenas is limited, so what's the problem with memory?

Here's a quote from the article you referenced:

Number of arena’s: In above example, we saw main thread contains main arena and thread 1 contains its own thread arena. So can there be a one to one mapping between threads and arena, irrespective of number of threads? Certainly not. An insane application can contain more number of threads (than number of cores), in such a case, having one arena per thread becomes bit expensive and useless. Hence for this reason, application’s arena limit is based on number of cores present in the system.

For 32 bit systems:
Number of arena = 2 * number of cores.
For 64 bit systems:
Number of arena = 8 * number of cores.

@lieroz
Author

lieroz commented Feb 7, 2022

The number is limited, but a single thread can use quite a bit of memory, and in this case usage is roughly equal across threads, because each request does almost the same work. For example, if each thread consumes 20 MB, then with 56 CPUs (in this case) the arena count is 56 * 8 = 448; multiplied by 20 MB that is roughly 9 GB of RAM marked by the allocator as used. So the memory is reused, but the footprint is large. That is not the case with tcmalloc, because it has a thread-local cache for small objects and an intermediate free list for reusable memory. And the glibc allocator mostly doesn't return memory to the OS until the process exits.
When you run this in a k8s cluster with limited memory and CPU per pod, the CPU-count syscall still reports all of the host's CPUs, so you end up with 400+ arenas against 4 GB of free memory for your container, and it gets OOM-killed.
Also, this problem occurred on the 3.10 kernel, which is quite old; when tested on a newer 5.x kernel there were no such problems. I guess the current server implementation works for most people, and to my mind Google uses tcmalloc for all their projects.
I think this issue can be closed.

@lieroz lieroz closed this as completed Feb 7, 2022