InfluxDBStore: point batching during Collect is suboptimal #132
#139 made a huge difference for us: prior to that change we saw 1 req/sec cause ChunkedCollector to go over its 50ms threshold and spill trace data. I've confirmed just now that after #139 we can handle 75 req/sec before hitting that threshold, a 75x improvement. Good job @chris-ramon!
Issue
Ideally, `InfluxDBStore.Collect` is as fast as reasonably possible. Right now, with `examples/cmd/webapp-influxdb` I've noticed `Collect` times in the range of 60-200ms (just eyeballing it, I could be off by a bit).

If `Collect` cannot complete in under 50ms we lose trace data, because we cannot let trace data build up in memory forever (that would be a memory leak), nor can it block pending HTTP requests. 50ms is, ideally, an upper time bound for `Collect` (hopefully most `Collect` calls are much quicker).

To measure this, I've added some hacky timing debug information to `influxdb_store.go` and changed the `webapp-influxdb` command. You can try my test branch `issue131`, or see my changes here: https://github.com/sourcegraph/appdash/compare/issue131 (note: this branch is just PR #127 and #131 merged, then f552611 applied on top).
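The timing debug information is essentially just wrapping the expensive call and logging its duration. A minimal, generic sketch of that approach (not the actual code on the branch; the `timed` helper is hypothetical):

```go
package main

import (
	"log"
	"time"
)

// timed runs fn and logs how long it took. The test branch adds similar
// (hackier) instrumentation directly inside influxdb_store.go around the
// InfluxDB write.
func timed(label string, fn func() error) error {
	start := time.Now()
	err := fn()
	log.Printf("%s took %s", label, time.Since(start))
	return err
}

func main() {
	// Stand-in for the InfluxDB write performed during Collect.
	_ = timed("in.con.Write", func() error {
		time.Sleep(80 * time.Millisecond)
		return nil
	})
}
```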
Reproducing

Run the example app cleanly:
Then, using the vegeta HTTP load testing tool, perform 1 HTTP request/sec for 8s:
You should observe some logs that look like:
Possible solution
Note that `in.con.Write` takes most of the time spent during `Collect`, i.e. the `Collect` function itself is not very expensive, but writing to InfluxDB via `in.con.Write` is!

I think this is because `Collect` is inherently a very small operation; at most it will be writing a single InfluxDB data point. Consider our code: we only write a single point to InfluxDB, and this becomes very expensive because InfluxDB cannot handle small writes very efficiently and adds a large overhead (50-200ms) to them. However, from my tests InfluxDB can write a very large number of points (500+, i.e. a batch of points) in almost the same amount of time (50-200ms).
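To illustrate the effect, here is a self-contained sketch. The `point` and `fakeClient` types are stand-ins rather than the real InfluxDB client, and the 80ms sleep models the roughly fixed per-write overhead observed above:

```go
package main

import (
	"fmt"
	"time"
)

// point and fakeClient stand in for the real InfluxDB client types; Write
// models the fixed per-request overhead, largely independent of how many
// points are in the batch.
type point struct{ name string }

type fakeClient struct{}

func (fakeClient) Write(pts []point) error {
	time.Sleep(80 * time.Millisecond)
	return nil
}

func main() {
	con := fakeClient{}

	// Current behavior: one Write per Collect, so every collected span pays
	// the full write overhead on its own.
	start := time.Now()
	for i := 0; i < 10; i++ {
		_ = con.Write([]point{{name: "span"}})
	}
	fmt.Println("10 single-point writes:", time.Since(start))

	// Batched: the same 10 points in one Write cost roughly one overhead.
	start = time.Now()
	_ = con.Write(make([]point, 10))
	fmt.Println("1 batched write of 10 points:", time.Since(start))
}
```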
I think the solution here is to make `InfluxDBStore.Collect` append to an internal slice, so that it queues up an entire batch of points, and then after some period of time writes them to InfluxDB in a background goroutine. Important aspects would be:
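Ignoring those details for a moment, a rough sketch of the core batching idea could look something like the following. The names and types here are hypothetical (not a proposed final implementation), and it omits things like bounding memory, flushing on shutdown, and surfacing write errors:

```go
package batchstore

import (
	"sync"
	"time"
)

// point stands in for an InfluxDB data point; writer stands in for the
// InfluxDB client connection used by InfluxDBStore (in.con).
type point struct{ name string }

type writer interface {
	Write(pts []point) error
}

// batchingStore queues points in memory and flushes them from a background
// goroutine, so Collect itself never waits on an InfluxDB write.
type batchingStore struct {
	mu      sync.Mutex
	pending []point
	con     writer
}

func newBatchingStore(con writer, flushEvery time.Duration) *batchingStore {
	s := &batchingStore{con: con}
	go func() {
		for range time.Tick(flushEvery) {
			s.flush()
		}
	}()
	return s
}

// Collect only appends to the in-memory batch.
func (s *batchingStore) Collect(p point) {
	s.mu.Lock()
	s.pending = append(s.pending, p)
	s.mu.Unlock()
}

// flush writes everything queued so far as a single batched InfluxDB write.
func (s *batchingStore) flush() {
	s.mu.Lock()
	batch := s.pending
	s.pending = nil
	s.mu.Unlock()
	if len(batch) == 0 {
		return
	}
	_ = s.con.Write(batch) // error handling/retry omitted in this sketch
}
```

Flushing on a size threshold as well as a timer would keep memory bounded under heavy load while still amortizing the per-write overhead.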