Prometheus cannot send > ~25k/s to Influxdb #1199

Closed
dan-cleinmark opened this Issue Nov 4, 2015 · 4 comments


dan-cleinmark commented Nov 4, 2015

Following up from IRC -

I have a 4-node InfluxDB cluster behind an ELB. When I configure a single Prometheus instance to send to this cluster, I can send a maximum of around 25k samples per second before the remote storage queue blocks and samples start being dropped. Increasing `maxSamplesPerSend` (10 -> 50) and `maxConcurrentSends` (100 -> 1000) in `storage/remote/queue_manager.go` didn't seem to help (master...dan-cleinmark:increase_remote_maxSamplesPerSend).

The bottleneck does not seem to be on the InfluxDB side, since I'm able to send 75k samples per second to the same cluster in aggregate (15k/s from each of 5 Prometheus instances). Looking at the ELB stats while running 0086d48, I'm seeing ~2300 HTTP requests/minute through the ELB with an average latency of ~75ms.

Are there other parameters that could be tweaked to improve throughput when sending to InfluxDB from a single Prometheus instance?
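
For illustration, here is the gist of the change I tried, simplified down to the two values quoted above; the real `storage/remote/queue_manager.go` defines more constants and the queue manager logic around them is omitted:

```go
// Simplified sketch of the experiment described above: raising the
// remote-write batching knobs in storage/remote/queue_manager.go.
// Only the two constants mentioned in this issue are shown.
const (
	// Samples packed into a single HTTP request to the remote storage.
	maxSamplesPerSend = 50 // raised from 10
	// Requests to the remote storage allowed in flight concurrently.
	maxConcurrentSends = 1000 // raised from 100
)
```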

dan-cleinmark commented Nov 4, 2015

Running without the changes to `storage/remote/queue_manager.go` (revision: e91d85b) gives the same result: a single Prometheus instance is bottlenecked at ~25k samples/s, while 5 Prometheus instances can send to the same InfluxDB cluster at ~75k/s in aggregate. The ELB in front of the InfluxDB cluster reports ~40k HTTP requests/minute with a latency of ~70ms.
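
As a rough sanity check, throughput on this path is bounded by concurrent requests × samples per request ÷ request latency. A quick sketch with the ~75ms latency reported above (the knob values plugged in are the raised ones from my branch, purely to show where that naive bound sits; it ignores queueing, encoding and ELB overhead):

```go
package main

import "fmt"

// Crude upper bound on samples/s for a latency-limited remote-write path:
// concurrent requests × samples per request ÷ request latency.
func maxSamplesPerSecond(concurrentSends, samplesPerSend int, latency float64) float64 {
	return float64(concurrentSends*samplesPerSend) / latency
}

func main() {
	latency := 0.075 // ~75ms average request latency reported by the ELB

	// With the raised values quoted in this issue (50 samples per send,
	// 1000 concurrent sends), the naive bound is far above the ~25k/s
	// ceiling actually observed.
	fmt.Printf("naive bound: %.0f samples/s\n", maxSamplesPerSecond(1000, 50, latency)) // ≈ 666,667
}
```

Since that naive bound is far above what a single instance actually achieves, the limit presumably sits elsewhere in the send path, which would fit with the raised constants not helping.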

brian-brazil added the bug label Dec 16, 2015

fabxc commented Apr 6, 2016

I think it's fair to say that we won't pursue improving the existing InfluxDB integration any further, especially in light of InfluxDB clustering now being closed source.
The future focus will be on a generic write path. Someone will likely provide an InfluxDB integration via that path, and it will hopefully be more performant.

fabxc closed this Apr 6, 2016

brian-brazil commented Apr 6, 2016

I note that I managed to get 100k/s out of the generic write path, so it's unclear that there's a problem here on the Prometheus end.

lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators Mar 24, 2019
