Takes too long to load 1 datapoint into 3000 distinct series #375

Closed
samuraraujo opened this issue Mar 29, 2014 · 19 comments

Comments

@samuraraujo

I have 3000 series with two columns each (value, timestamp), and I try to load 1 datapoint into each series every 20s via the REST API. The task takes many minutes to complete; I would expect it to finish in a few seconds.

So, I have a question: how many distinct series does InfluxDB support? Thousands? Hundreds of thousands? Millions?

Are there any benchmarks measuring the time it takes to load data into InfluxDB?

Do you have a sample or tutorial on modelling large-scale web analytics data in InfluxDB?

@toddboom
Contributor

@samuraraujo The problem with only writing a single point on each HTTP request is that you'll end up spending a majority of each request in HTTP overhead, so even if it's relatively quick (20ms total), 3000 points will still take 60 seconds, and I would expect that's a best case scenario. For optimum throughput, you should batch up the points into a single request, like this:

[
  {
    "name": "series_one",
    "columns": ["value", "time"],
    "points": [
      [99, 123412341234]
    ]
  },
  {
    "name": "series_two",
    "columns": ["value", "time"],
    "points": [
      [33, 432143214321]
    ]
  }
]
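
For example, if that array were saved to a file (the file name batch.json and the database name metrics below are just placeholders), the whole batch could be written in one request with something like:

curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root' -d @batch.json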

Are you using a library to write the data in, or are you hitting the API directly?

Regarding your question about the maximum number of series, InfluxDB is designed to handle millions of time series relatively easily.

@samuraraujo
Author

Todd, I am sending the datapoints in a single batch. More precisely, all 3000 datapoints are sent at once via the HTTP API. Each datapoint belongs to a distinct series.

Is it really normal for InfluxDB to take minutes to handle this request?

You wrote: "InfluxDB is designed to handle millions of time series relatively easily."
Do you have a benchmark that shows InfluxDB's performance?

Note that I am using a single machine with 8GB of memory.

@toddboom
Contributor

@samuraraujo Writing a single batch of 3000 points should take less than a second. When running tests locally with points in 10k batches, we see point throughput of > 100k/sec sustained. Adding the overhead of a remote server would reduce that, but not drastically so.
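
As a rough sanity check on your own hardware, you could time a single batched write from the shell (the file name batch.json and database name metrics are just placeholders for your own batch file and database):

time curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root&time_precision=s' -d @batch.json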

Can you send over any sample data or sample code that you're using to write the points? How much data are you including with each point?

We don't have any benchmarks that specifically address number of time series, but the series names function as an index, so the performance will remain high even as the number of series grows.

@samuraraujo
Author

Todd, here is a sample of the data that I am trying to add to influx. Do you see anything that could cause the delay that I reported?

[{"name"=>"metric.news.item.b167cafa4d700a938800e9d72507ba206c49f480.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.a6c2333267bea34350b0dec33d38989dd11d05cf.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.289cadefe3d93953aba1dbae71099dc1093b910c.view", "columns"=>["value", "time"], "points"=>[[3, 1396038465]]},
{"name"=>"metric.news.item.7b3874298fa2d95c0651828d1d13fb6f20b0ce5f.view", "columns"=>["value", "time"], "points"=>[[2, 1396038465]]},
{"name"=>"metric.news.item.02d171b106caf3584dbfb56fb5aaca2027cabe2c.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.f4edf7ec88215ec1c03b1c3f19ff788a18ceecbc.view", "columns"=>["value", "time"], "points"=>[[6, 1396038465]]},
{"name"=>"metric.news.item.4c35dc637aff4eec9c8e3f77a2d67a2bab998d4d.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.dc667f337ba0a0b8e44af171419f15a5312d0762.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.062005b04be79af3fa28d55416498085a9961e98.ctr", "columns"=>["value", "time"], "points"=>[[0.029411764705882353, 1396038465]]},
{"name"=>"metric.news.item.5f87b2e08af7e58eecc5b5080f20c275f08f5386.ctr", "columns"=>["value", "time"], "points"=>[[0.5, 1396038465]]},
{"name"=>"metric.news.item.ad048a2726f48ad29b5461c3115801bab741d606.ctr", "columns"=>["value", "time"], "points"=>[[0.25, 1396038465]]},
{"name"=>"metric.news.item.7203f60774ce2df6e27983c6b0d666ea2bce7421.ctr", "columns"=>["value", "time"], "points"=>[[0.3333333333333333, 1396038465]]}
{"name"=>"metric.news.item.208b4d79a681e2e6f63985a8d76a65e1f3b0fded.ctr", "columns"=>["value", "time"], "points"=>[[0.5, 1396038465]]},
{"name"=>"metric.news.item.4c32036d5d38e4280079df17b6853ac88da0e32f.ctr", "columns"=>["value", "time"], "points"=>[[1.0, 1396038465]]},
{"name"=>"metric.news.item.2805823d54a43dd1c45ec75f74441adccda38a84.ctr", "columns"=>["value", "time"], "points"=>[[0.010752688172043012, 1396038465]]}
]

@samuraraujo
Author

Todd, you can download the file with the entire data here:
https://drive.google.com/file/d/0Bwci5MBqp2fyOUNVTHUzUmRWb2s/edit?usp=sharing

@toddboom
Contributor

@samuraraujo This is great, thanks! With some local testing, I wasn't able to reproduce the large delay you were seeing, but I will try this dataset and let you know what I see. Could you also send along the code or command you're issuing to actually write the data?

@samuraraujo
Author

Todd, I observe the same delay using the command:

curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root&time_precision=s' -d @log.txt

The code is not the issue.

@toddboom
Contributor

@samuraraujo Alright, I just tried your dataset and the same curl command you provided above and the request returns in 2-3 seconds. Can you give me any more information about the machine you're running this on? Also, what is the behavior of the process while the query is running (cpu %, memory utilization, etc.)? Is there anything out of the ordinary in the log file?

@toddboom
Contributor

@samuraraujo Could you also send over a copy of your config file?

@samuraraujo
Author

Here are the machine details:
Linux zwks02 3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Below is the machine status while running the command.
https://drive.google.com/file/d/0Bwci5MBqp2fyN0tTUDZhTUhJVVk/edit?usp=sharing
https://drive.google.com/file/d/0Bwci5MBqp2fySVVkcDljckZ5QVk/edit?usp=sharing

Notice that I did not change the default config file.

@toddboom
Contributor

@samuraraujo OK, it sounds like something else is going on. The CPU utilization should be much higher during writes, so something else is likely causing the bottleneck. Can you run the following command while the writes are happening and send over some of the output?

iostat -mx 1

Thanks!

@samuraraujo
Author

Here is the output:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   163,00    0,00  103,00     0,00     2,56    50,95     2,41   23,38    0,00   23,38   9,63  99,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,53    0,00    0,76   32,24    0,00   62,47

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   151,00    0,00   82,00     0,00     3,28    81,95     1,69   20,49    0,00   20,49  12,10  99,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,55    0,00    0,51   28,28    0,00   66,67

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   160,00    0,00   96,00     0,00     2,51    53,50     1,58   16,21    0,00   16,21  10,38  99,60

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,28    0,00    0,51   35,86    0,00   60,35

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   135,00    0,00   81,00     0,00     2,36    59,75     1,68   21,09    0,00   21,09  12,35 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,82    0,00    0,25   30,28    0,00   65,65

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   192,00    0,00  109,00     0,00     3,48    65,39     1,58   14,50    0,00   14,50   9,17 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,85    0,00    1,02   32,49    0,00   59,64

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   146,00    0,00   91,00     0,00     3,38    76,13     1,55   16,66    0,00   16,66  10,90  99,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,82    0,00    0,25   28,93    0,00   65,99

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   155,00    0,00   89,00     0,00     3,27    75,15     1,54   17,44    0,00   17,44  11,19  99,60

@jvshahid
Contributor

Hey @samuraraujo, can you run sudo iotop -a? That will give you a sorted list of processes doing I/O on your machine. It could be InfluxDB, but I'm curious whether there's another process writing to your device and slowing InfluxDB down.

@samuraraujo
Author

Hi Shahid, here is the output:

https://drive.google.com/file/d/0Bwci5MBqp2fyeDFYX3dxbWc1VG8/edit?usp=sharing

As soon as I run the command below, the I/O grows from 0.0% to 67%.

curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root&time_precision=s' -d @log_json.txt

@jvshahid
Contributor

It looks like the filesystem is doing a lot of I/O. This is probably because we break up the request per time series (we have a story to fix this: #310) and because the default config flushes the WAL to disk on every request. So with your current config, you're flushing to disk 3000 times per request. To verify my theory, can you change your config to match the following:

flush-after = 10000 # the number of writes after which wal will be flushed, 0 for flushing on every write
bookmark-after = 10000 # the number of writes after which a bookmark will be created

# the number of writes after which an index entry is created pointing
# to the offset of the first request, default to 1k
index-after = 10000

# the number of requests per one log file, if new requests came in a
# new log file will be created
requests-per-logfile = 100000
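
(For context: these keys sit in the write-ahead-log section of config.toml. The sketch below follows the 0.5-era sample config, so the section name and directory path are assumptions that may differ in your version; restart the influxdb process after editing for the change to take effect.)

# assumed layout, based on the 0.5-era sample config.toml
[wal]
dir = "/tmp/influxdb/development/wal"  # sample default; use your own WAL directory
flush-after = 10000
bookmark-after = 10000
index-after = 10000
requests-per-logfile = 100000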

@samuraraujo
Author

Now it works!!! Very fast!

Congrats!

Can you point me to the documentation that explains these parameters in more detail?

@jvshahid
Contributor

Improving the documentation is at the top of our to-do list; we'll update it in the next few weeks. Currently we have very poor documentation when it comes to configuration options. In fact, there is no documentation other than the comments in the config file itself.

@jvshahid
Contributor

I'll go ahead and close this issue, since it's been resolved.
