com.mongodb.MongoWaitQueueFullException with Webflux + reactive Mongo driver #22775

Closed
vpatil1311 opened this issue Apr 9, 2019 · 4 comments
Labels
status: invalid An issue that we don't feel is valid

Comments

vpatil1311 commented Apr 9, 2019

I am trying to load test a simple Spring WebFlux application with Gatling. The application is built with 'spring-boot-starter-webflux' and 'spring-boot-starter-data-mongodb-reactive'. It simply reads a MongoDB document by a unique field. I inject concurrent users with Gatling using setUp(scn.inject(atOnceUsers(userCount)).protocols(httpConf))

I start the MongoDB instances as shown below.

Replica

mongod --replSet rs0 --port 27020 --bind_ip localhost,somehostname,some_ip --dbpath C:\mongo\data\db0 --smallfiles --oplogSize 128
mongod --replSet rs0 --port 27021 --bind_ip localhost,somehostname,some_ip --dbpath C:\mongo\data\db1 --smallfiles --oplogSize 128
mongod --replSet rs0 --port 27022 --bind_ip localhost,somehostname,some_ip --dbpath C:\mongo\data\db2 --smallfiles --oplogSize 128

standalone

mongod --port 27018 --bind_ip localhost,somehostname,some_ip --dbpath C:\mongo\data\db9 --smallfiles --oplogSize 128

Setup 1: The application runs on a Windows desktop (Intel i5, 16 GB RAM), MongoDB runs as a 3-node replica set on a Windows laptop (Intel i7, 16 GB RAM), and the Gatling load test scripts also run on the desktop. Both the application and the Gatling scripts on the desktop are containerized. The wait queue size is 500 by default; I have raised it to 1000 using waitQueueMultiple.
I get com.mongodb.MongoWaitQueueFullException at a concurrency of only 3000.
Setup 2: Same as Setup 1, but MongoDB runs in standalone mode.
I get com.mongodb.MongoWaitQueueFullException at a concurrency of only 3000.
Setup 3: The application, MongoDB (standalone mode) and the Gatling load test scripts all run on the desktop. All are containerized and connected through a bridge network. The wait queue size is again raised from the default 500 to 1000. This setup works fine up to a concurrency of 10000. I understand that network latency plays no role here, hence the better performance.
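
For readers following the numbers, here is a minimal sketch of how the wait queue limit appears to be derived in the 3.x-era drivers: the pool size multiplied by waitQueueMultiple. It assumes driver defaults of maxPoolSize=100 and waitQueueMultiple=5, which is consistent with the default of 500 mentioned above. The class and method names are illustrative only (not driver API), and the host and port are placeholders.

```java
final class WaitQueueMath {
    // Assumed driver defaults for the 3.x-era connection pool.
    static final int DEFAULT_MAX_POOL_SIZE = 100;
    static final int DEFAULT_WAIT_QUEUE_MULTIPLE = 5;

    /** Maximum number of operations allowed to wait for a pooled connection. */
    static int waitQueueCapacity(int maxPoolSize, int waitQueueMultiple) {
        return maxPoolSize * waitQueueMultiple;
    }

    /** Builds a connection string carrying both options (host and port are placeholders). */
    static String connectionString(String hostAndPort, int maxPoolSize, int waitQueueMultiple) {
        return "mongodb://" + hostAndPort
                + "/?maxPoolSize=" + maxPoolSize
                + "&waitQueueMultiple=" + waitQueueMultiple;
    }

    public static void main(String[] args) {
        // Default settings: 100 * 5 = 500 waiters.
        System.out.println(waitQueueCapacity(DEFAULT_MAX_POOL_SIZE, DEFAULT_WAIT_QUEUE_MULTIPLE));
        // Raising the multiple to 10 gives the 1000 used in the setups above.
        System.out.println(waitQueueCapacity(DEFAULT_MAX_POOL_SIZE, 10));
        System.out.println(connectionString("localhost:27018", 100, 10));
    }
}
```

Under the same assumptions, waitQueueMultiple=100 would allow 10,000 waiters.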

I have the following questions:

  1. How can I resolve this exception, apart from increasing the queue size?

  2. Why is there a MongoDB performance difference between standalone and replica mode?
    As mentioned above, Setups 1 and 2 raise the exception at a concurrency of ~3000 users. I repeat these tests at random times of the day. At some points in time the standalone setup performs extremely well and the application scales up to 48000 concurrent users with no exception thrown. I checked that MongoDB is receiving the requests by watching mongostat/mongotop, and confirmed it by running db.adminCommand("top") before and after each test. I can confirm that the read counts increase by the number of concurrent users I used for the test. My only worry is that, at the same time, MongoDB in replica mode (Setup 1) does not show the better performance; it continues to throw the exception at a concurrency of only 3000. Why does replica mode not perform as well as standalone mode? The application code is the same and the Gatling script is the same.

I am using mongodb-driver-reactivestreams 1.9.2 and MongoDB 4.0.8.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Apr 9, 2019
@rstoyanchev
Contributor

This is related to #22332, so see the discussion there. Can you provide some information about what the controller methods do, so we can understand how many queries are made per request, etc.? /cc @mp911de

@vpatil1311
Author

vpatil1311 commented Apr 10, 2019

@rstoyanchev thanks for the reply.

My controller is simple: it just takes a string field as a path variable and uses ReactiveMongoRepository to find the document in MongoDB.

```java
@GetMapping("/{connectId}")
public Mono<ResponseEntity> getCustomerByConnectId(@PathVariable("connectId") String connectId) {

    try {
        return service.getUserEligibility(connectId)
                .map(ResponseEntity::ok)
                .defaultIfEmpty(ResponseEntity.notFound().build());

    } catch (SomeException e) {
        log.info("Error during retrieval of customer : {}", e.getMessage());
    }
    return Mono.just(ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).build());

}
```

I did go through #22332, but I need some more clarification. I see that in #22332 the reporter ultimately increased the Mongo driver wait queue size to pass the tests.
I tried the same using waitQueueMultiple=100, but I see that the WebFlux implementation used more CPU and memory than the Tomcat (MVC) one. I am using Docker for Windows to containerize and used docker stats to watch the usage.

I tested both MVC and WebFlux with 10K concurrent users and waitQueueMultiple=100, and I see the following resource usage:

WebFlux
For ~16 sec CPU utilization was above 100%, peaking at 197%
Memory spike was from 534 MB to 617 MB = 83 MB

MVC
For ~22 sec CPU utilization was above 100%, peaking at 179%
Memory spike was from 643 MB to 669 MB = 26 MB

If I increase the concurrent users to 15K, the reactive Mongo driver fails with com.mongodb.MongoWaitQueueFullException even with waitQueueMultiple=100. So increasing the queue size is not a solution for this issue. I see that Tomcat runs fine even with 40K concurrent users; it just uses more threads to serve the concurrent requests.
How can I fix this issue? Is it the case that my application can only handle 10K concurrent users, and even that at the cost of more memory and CPU utilization?
I might be missing something very basic; please help me understand how to achieve the 'do more with less' tagline of WebFlux.

@mp911de
Member

mp911de commented Apr 10, 2019

The root cause of MongoWaitQueueFullException is that queries do not return fast enough to keep up with the incoming load. This may be because your MongoDB servers are overloaded, or because MongoDB response times are too slow for your desired load.
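
As a rough illustration of "not fast enough", Little's law says the average number of operations in flight equals the arrival rate times the average latency. The sketch below applies it with hypothetical numbers (20,000 queries per second, 5 ms versus 100 ms per query) against an assumed pool of 100 connections and a wait queue of 1000. The class and its methods are illustrative, not part of any driver.

```java
final class InFlightEstimate {
    /** Little's law: average operations in flight = arrival rate x average latency. */
    static double inFlight(double requestsPerSecond, double avgLatencySeconds) {
        return requestsPerSecond * avgLatencySeconds;
    }

    /** True when in-flight demand exceeds busy connections plus queued waiters. */
    static boolean overflows(double inFlight, int maxPoolSize, int waitQueueCapacity) {
        return inFlight > maxPoolSize + waitQueueCapacity;
    }

    public static void main(String[] args) {
        int pool = 100;    // assumed driver default pool size
        int queue = 1000;  // wait queue size used in the setups in this thread
        // Hypothetical load of 20,000 queries per second.
        double fast = inFlight(20_000, 0.005); // about 100 in flight at 5 ms per query
        double slow = inFlight(20_000, 0.100); // about 2,000 in flight at 100 ms per query
        System.out.println("fast overflows=" + overflows(fast, pool, queue));
        System.out.println("slow overflows=" + overflows(slow, pool, queue));
    }
}
```

With the same arrival rate, only the slower query time pushes demand past the pool plus the wait queue, which is the point at which the driver would start rejecting operations.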

An increase in CPU usage with WebFlux and reactive MongoDB is exactly what you should see. Because all I/O is non-blocking, the CPU is able to perform more work with fewer context switches between threads.

With Tomcat you have a natural barrier: If you run out of threads, then you cannot issue more queries. With WebFlux, that's different. Threads are not a limiting factor.
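
The barrier can be sketched with a simplified model, which is not the driver's actual algorithm: a burst of simultaneous queries is first capped by the number of caller threads, then split into running, queued and rejected. The model assumes a pool of 100 connections, a wait queue of 10,000 (waitQueueMultiple=100) and 200 worker threads for the blocking server (Tomcat's default). All names are hypothetical.

```java
import java.util.Arrays;

final class BurstModel {
    /**
     * Splits a burst of simultaneous requests into {running, queued, rejected}.
     * callerLimit models the worker threads of a blocking server; pass
     * Integer.MAX_VALUE for a non-blocking server with no such limit.
     */
    static int[] offerBurst(int requests, int callerLimit, int maxPoolSize, int waitQueueCapacity) {
        // A blocking server can only issue as many queries at once as it has worker threads.
        int issued = Math.min(requests, callerLimit);
        int running = Math.min(issued, maxPoolSize);
        int queued = Math.min(issued - running, waitQueueCapacity);
        int rejected = issued - running - queued;
        return new int[] {running, queued, rejected};
    }

    public static void main(String[] args) {
        // Non-blocking, 10K at once: everything fits in pool + queue.
        System.out.println(Arrays.toString(offerBurst(10_000, Integer.MAX_VALUE, 100, 10_000)));
        // Non-blocking, 15K at once: the queue overflows.
        System.out.println(Arrays.toString(offerBurst(15_000, Integer.MAX_VALUE, 100, 10_000)));
        // Blocking with 200 threads, 40K at once: only 200 queries are ever issued together.
        System.out.println(Arrays.toString(offerBurst(40_000, 200, 100, 10_000)));
    }
}
```

In this model the blocking server never overflows the queue because the remaining requests wait in the servlet container instead of in the driver, which is consistent with the behaviour reported above.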

How to fix this issue ?

From my perspective you can do the following things:

  • Distribute load across multiple WebFlux servers
  • Reduce load
  • Reduce MongoDB query times so that MongoDB queries complete faster
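
One way to read "reduce load" is to cap the number of queries in flight inside the application and fail fast beyond that cap, rather than letting the driver's wait queue overflow. Below is a minimal sketch using only the JDK (a Semaphore around CompletableFuture-returning operations). The InFlightLimiter class is hypothetical and not part of Spring or the MongoDB driver; in a Reactor pipeline, operators that bound concurrency (for example flatMap with a concurrency argument) may serve a similar purpose.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Runs the async operation if a permit is free; otherwise fails fast without blocking. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> operation) {
        if (!permits.tryAcquire()) {
            CompletableFuture<T> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(new IllegalStateException("too many requests in flight"));
            return rejected;
        }
        CompletableFuture<T> result;
        try {
            result = operation.get();
        } catch (RuntimeException e) {
            // The operation never started, so give the permit back immediately.
            permits.release();
            throw e;
        }
        // Release the permit when the operation completes, successfully or not.
        return result.whenComplete((value, error) -> permits.release());
    }

    /** Number of operations that could still be started right now. */
    int available() {
        return permits.availablePermits();
    }
}
```

A caller would map the fast failure to something like an HTTP 503 so that clients back off; choosing the cap at or below pool size plus wait queue size keeps the driver's queue from filling up.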

do more with less

The expanded version is: Perform more work (higher CPU usage) with fewer resources (threads).

@vpatil1311
Author

@mp911de thanks for the details and sorry for the delay. I am a newbie and was trying to digest and experiment with your suggestions. I have a decent understanding of the topic now, so we can close the issue. Thanks @rstoyanchev

@rstoyanchev rstoyanchev added status: invalid An issue that we don't feel is valid and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Apr 12, 2019