
@timvaillancourt timvaillancourt commented Aug 18, 2017

On many systems I've seen mongodb_exporter use very high CPU % under PMM 1.2.x. This PR aims to reduce the work the mgo driver does and to cut the number of connections and the log noise caused by reconnecting on each scrape.

  1. Disable cursor pre-fetching (not needed for the exporter's simple queries; it wastes resources fetching 50 docs we won't use).
  2. Limit the connection pool size to 1 by default. I added the flag -mongodb.max-connections to control the pool limit. All other requests wait for the pool instead of creating more connections. Each additional MongoDB connection adds CPU load due to background pings, etc., so it's best to restrict the pool to a small number in this simple exporter.
  3. Re-use the database connection to MongoDB. Currently we reconnect to the DB on every single scrape, which forces MongoDB to create a new connection (OS thread), set up buffers, etc., and to log a lot of noise to the logfiles. With this change there is a single MongoDB connection for the runtime of the exporter, and mgo reconnects by itself if there is a failure. A .Close() method was added to cleanly close the session handle at the end of the program. Locking was added around the session due to race conditions.
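
To illustrate point 2, here is a minimal sketch of a bounded pool in which extra callers wait instead of opening new connections. It uses only the Go standard library and does not reproduce mgo's pool or the PR's code: `conn`, `pool`, `newPool` and `scrapeAll` are hypothetical names, and only the flag name and its default of 1 come from the description above.

```go
package main

import (
	"flag"
	"fmt"
	"sync"
)

// conn stands in for a MongoDB connection; it is a placeholder, not an mgo type.
type conn struct{ id int }

// pool hands out at most `limit` connections. Extra callers block in Get
// until a connection is returned with Put.
type pool struct {
	free chan *conn
}

func newPool(limit int) *pool {
	p := &pool{free: make(chan *conn, limit)}
	for i := 0; i < limit; i++ {
		p.free <- &conn{id: i}
	}
	return p
}

func (p *pool) Get() *conn  { return <-p.free }
func (p *pool) Put(c *conn) { p.free <- c }

// scrapeAll runs n concurrent "scrapes" and returns the highest number of
// connections that were checked out at the same time.
func scrapeAll(p *pool, n int) int {
	var (
		mu      sync.Mutex
		inUse   int
		maxSeen int
		wg      sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c := p.Get() // waits here when the pool is exhausted
			mu.Lock()
			inUse++
			if inUse > maxSeen {
				maxSeen = inUse
			}
			mu.Unlock()
			// A real exporter would run its queries here.
			mu.Lock()
			inUse--
			mu.Unlock()
			p.Put(c)
		}()
	}
	wg.Wait()
	return maxSeen
}

func main() {
	// Flag name and default are taken from the PR description.
	maxConns := flag.Int("mongodb.max-connections", 1, "max size of the connection pool")
	flag.Parse()
	fmt.Println("peak connections in use:", scrapeAll(newPool(*maxConns), 20))
}
```

With the default limit the peak never exceeds one connection, no matter how many scrapes arrive at once. In mgo itself the equivalent knob is presumably `Session.SetPoolLimit`.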

…l size and re-use database connection to MongoDB
@timvaillancourt
Author

I ran a test with ~100 GET requests to the /metrics endpoint on 'master' and on this PR's branch. Go reports roughly 25% less CPU time used with these changes (0.53 s vs. 0.71 s):

### PR79 branch
[tim@centos7 ~]$ CNT=0; while [ $CNT -le 100 ]; do curl -so /dev/null http://localhost:9216/metrics; CNT=$((CNT+1)); done
[tim@centos7 ~]$ curl -s http://localhost:9216/metrics|grep process_cpu_seconds_total
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.53

### master branch
[tim@centos7 ~]$ CNT=0; while [ $CNT -le 100 ]; do curl -so /dev/null http://localhost:9216/metrics; CNT=$((CNT+1)); done
[tim@centos7 ~]$ curl -s http://localhost:9216/metrics|grep process_cpu_seconds_total
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.71
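
For anyone repeating the comparison, the arithmetic can be sketched in Go as below. This is an illustration, not part of the PR: `metricValue` and `reduction` are hypothetical helpers, and the two input strings simply replay the output shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue returns the value of the first unlabeled sample called `name`
// in Prometheus text-format output.
func metricValue(body, name string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines, "# HELP"/"# TYPE" comments and other metrics.
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		return strconv.ParseFloat(fields[1], 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("metric %q not found", name)
}

// reduction returns how much smaller `after` is than `before`, in percent.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	const name = "process_cpu_seconds_total"
	master := "# TYPE process_cpu_seconds_total counter\nprocess_cpu_seconds_total 0.71\n"
	branch := "# TYPE process_cpu_seconds_total counter\nprocess_cpu_seconds_total 0.53\n"

	before, err := metricValue(master, name)
	if err != nil {
		panic(err)
	}
	after, err := metricValue(branch, name)
	if err != nil {
		panic(err)
	}
	fmt.Printf("CPU time reduced by %.1f%%\n", reduction(before, after))
}
```

Note that `process_cpu_seconds_total` is cumulative since process start, so the comparison only holds if each exporter is freshly started before its run.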

@timvaillancourt
Author

I removed the .Ping() on the mgo.Session that I had initially added, since mgo handles this for us. After this commit, the CI system detected a race condition in MongodbCollector.getSession: the session could be created in many goroutines at once. I fixed it with a sync.Mutex lock around session creation.
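
A minimal sketch of that kind of fix, assuming a lazily created session guarded by a sync.Mutex. The `session` and `collector` types and the `concurrentDials` helper are hypothetical stand-ins, not the exporter's actual code, and no real connection is made.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a placeholder for *mgo.Session.
type session struct{ id int }

// collector is a simplified, hypothetical stand-in for MongodbCollector.
type collector struct {
	mu    sync.Mutex
	sess  *session
	dials int // number of times a session was created; for illustration only
}

// getSession creates the session on first use and returns the same one
// afterwards. The mutex ensures only one goroutine performs the creation.
func (c *collector) getSession() *session {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.sess == nil {
		c.dials++
		c.sess = &session{id: c.dials}
	}
	return c.sess
}

// concurrentDials calls getSession from n goroutines at once and returns
// how many sessions were created.
func concurrentDials(n int) int {
	c := &collector{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.getSession()
		}()
	}
	wg.Wait()
	return c.dials
}

func main() {
	fmt.Println("sessions created:", concurrentDials(100))
}
```

Without the lock, two goroutines could both observe a nil session and each create one, which is the sort of race `go test -race` reports.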

@timvaillancourt
Author

I ran some ab (Apache Bench) tests and noticed a deadlock in the .getSession() logic under concurrency. This was fixed by returning a .Copy() of the database session (an approach stolen from the SessionProvider in mongo-tools/common/db). This should be the final change for this PR.
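
The copy-per-caller pattern can be sketched roughly as follows. This is a standard-library illustration only: `provider`, `session` and `runScrapes` are hypothetical names, the `Close` method merely mimics the shape of mgo's, and the sketch does not try to reproduce the original deadlock.

```go
package main

import (
	"fmt"
	"sync"
)

// provider owns one long-lived master session and hands out copies of it.
type provider struct {
	mu     sync.Mutex
	master *session
	dials  int // times the master session was created
	live   int // copies handed out and not yet closed
}

// session is a placeholder for *mgo.Session.
type session struct{ p *provider }

// Close releases a copy; the master session stays open for later scrapes.
func (s *session) Close() {
	s.p.mu.Lock()
	s.p.live--
	s.p.mu.Unlock()
}

// getSession creates the master session once, then returns a fresh copy.
// The lock is held only for the creation and the copy, never while the
// caller is using its session.
func (p *provider) getSession() *session {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.master == nil {
		p.dials++
		p.master = &session{p: p}
	}
	p.live++
	return &session{p: p} // stands in for p.master.Copy()
}

// runScrapes simulates n concurrent scrapes, each with its own copy.
func runScrapes(p *provider, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s := p.getSession()
			defer s.Close()
			// A real exporter would run its queries on s here.
		}()
	}
	wg.Wait()
}

func main() {
	p := &provider{}
	runScrapes(p, 100)
	fmt.Println("dials:", p.dials, "open copies:", p.live)
}
```

Each caller closes its own copy when it is done, so the shared master session is never handed directly to concurrent users.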

@percona percona deleted a comment from codecov bot Aug 21, 2017
@percona percona deleted a comment from codecov bot Aug 21, 2017
@AlekSi AlekSi closed this Oct 17, 2017
@AlekSi AlekSi changed the base branch from develop to master October 17, 2017 08:13
@AlekSi AlekSi reopened this Oct 17, 2017
@timvaillancourt
Author

@AlekSi / @michaelcoburn: is anything holding this PR back? I think it would be very beneficial to roll this out ASAP.

@AlekSi
Contributor

AlekSi commented Nov 29, 2017

It is currently planned for PMM 1.6.

@AlekSi AlekSi changed the base branch from master to PMM-1764-small-improvements January 4, 2018 11:51
@percona percona deleted a comment from codecov bot Jan 4, 2018
@percona percona deleted a comment from codecov bot Jan 4, 2018
@AlekSi AlekSi merged commit b836caa into percona:PMM-1764-small-improvements Jan 4, 2018
@AlekSi
Contributor

AlekSi commented Jan 4, 2018

@timvaillancourt Do you think we should also set FailFast?


2 participants