Takes too long to load 1 datapoint into 3000 distinct series #375

Closed
samuraraujo opened this issue Mar 29, 2014 · 19 comments

Comments

@samuraraujo

I have 3000 series with two columns each (value, timestamp), and I try to load 1 datapoint into each series every 20s via the REST API. The task takes many minutes to complete; I would expect it to finish in a few seconds.

So, I have a question: how many distinct series does InfluxDB support? Thousands? Hundreds of thousands? Millions?

Are there any benchmarks measuring the time it takes to load data into InfluxDB?

Do you have a sample or tutorial on modelling large-scale web analytics data in InfluxDB?

@toddboom
Contributor

@samuraraujo The problem with only writing a single point on each HTTP request is that you'll end up spending a majority of each request in HTTP overhead, so even if it's relatively quick (20ms total), 3000 points will still take 60 seconds, and I would expect that's a best case scenario. For optimum throughput, you should batch up the points into a single request, like this:

[
  {
    "name": "series_one",
    "columns": ["value", "time"],
    "points": [
      [99, 123412341234]
    ]
  },
  {
    "name": "series_two",
    "columns": ["value", "time"],
    "points": [
      [33, 432143214321]
    ]
  }
]
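
For example, if that array were saved to a file (the file name batch.json and the database name metrics below are just placeholders), the whole batch could be written in one request with something like:

curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root' -d @batch.json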

Are you using a library to write the data in, or are you hitting the API directly?

Regarding your question about the maximum number of series, InfluxDB is designed to handle millions of time series relatively easily.

@samuraraujo
Author

Todd, I am sending the datapoints in a single batch. More precisely, all 3000 datapoints are sent at once via the HTTP API. Each datapoint belongs to a distinct series.

Is it really normal for InfluxDB to take minutes to handle this request?

You wrote: "InfluxDB is designed to handle millions of time series relatively easily."
Do you have a benchmark that shows InfluxDB's performance?

Note that I am using a single machine with 8GB of memory.

@toddboom
Contributor

@samuraraujo Writing a single batch of 3000 points should take less than a second. When running tests locally with points in 10k batches, we see point throughput of > 100k/sec sustained. Adding the overhead of a remote server would reduce that, but not drastically so.
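
As a rough sanity check on your own hardware, you could time a single batched write from the shell (the file name batch.json and database name metrics are just placeholders for your own batch file and database):

time curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root&time_precision=s' -d @batch.json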

Can you send over any sample data or sample code that you're using to write the points? How much data are you including with each point?

We don't have any benchmarks that specifically address number of time series, but the series names function as an index, so the performance will remain high even as the number of series grows.

@samuraraujo
Author

Todd, here is a sample of the data that I am trying to add to influx. Do you see anything that could cause the delay that I reported?

[{"name"=>"metric.news.item.b167cafa4d700a938800e9d72507ba206c49f480.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.a6c2333267bea34350b0dec33d38989dd11d05cf.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.289cadefe3d93953aba1dbae71099dc1093b910c.view", "columns"=>["value", "time"], "points"=>[[3, 1396038465]]},
{"name"=>"metric.news.item.7b3874298fa2d95c0651828d1d13fb6f20b0ce5f.view", "columns"=>["value", "time"], "points"=>[[2, 1396038465]]},
{"name"=>"metric.news.item.02d171b106caf3584dbfb56fb5aaca2027cabe2c.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.f4edf7ec88215ec1c03b1c3f19ff788a18ceecbc.view", "columns"=>["value", "time"], "points"=>[[6, 1396038465]]},
{"name"=>"metric.news.item.4c35dc637aff4eec9c8e3f77a2d67a2bab998d4d.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.dc667f337ba0a0b8e44af171419f15a5312d0762.view", "columns"=>["value", "time"], "points"=>[[1, 1396038465]]},
{"name"=>"metric.news.item.062005b04be79af3fa28d55416498085a9961e98.ctr", "columns"=>["value", "time"], "points"=>[[0.029411764705882353, 1396038465]]},
{"name"=>"metric.news.item.5f87b2e08af7e58eecc5b5080f20c275f08f5386.ctr", "columns"=>["value", "time"], "points"=>[[0.5, 1396038465]]},
{"name"=>"metric.news.item.ad048a2726f48ad29b5461c3115801bab741d606.ctr", "columns"=>["value", "time"], "points"=>[[0.25, 1396038465]]},
{"name"=>"metric.news.item.7203f60774ce2df6e27983c6b0d666ea2bce7421.ctr", "columns"=>["value", "time"], "points"=>[[0.3333333333333333, 1396038465]]}
{"name"=>"metric.news.item.208b4d79a681e2e6f63985a8d76a65e1f3b0fded.ctr", "columns"=>["value", "time"], "points"=>[[0.5, 1396038465]]},
{"name"=>"metric.news.item.4c32036d5d38e4280079df17b6853ac88da0e32f.ctr", "columns"=>["value", "time"], "points"=>[[1.0, 1396038465]]},
{"name"=>"metric.news.item.2805823d54a43dd1c45ec75f74441adccda38a84.ctr", "columns"=>["value", "time"], "points"=>[[0.010752688172043012, 1396038465]]}
]

@samuraraujo
Author

Todd, you can download the file with the entire data here:
https://drive.google.com/file/d/0Bwci5MBqp2fyOUNVTHUzUmRWb2s/edit?usp=sharing

@toddboom
Contributor

@samuraraujo This is great, thanks! With some local testing, I wasn't able to reproduce the large delay you were seeing, but I will try this dataset and let you know what I see. Could you also send along the code or command you're issuing to actually write the data?

@samuraraujo
Author

Todd, I observe the same delay using the command:

curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root&time_precision=s' -d @log.txt

The code is not the issue.

@toddboom
Contributor

@samuraraujo Alright, I just tried your dataset and the same curl command you provided above and the request returns in 2-3 seconds. Can you give me any more information about the machine you're running this on? Also, what is the behavior of the process while the query is running (cpu %, memory utilization, etc.)? Is there anything out of the ordinary in the log file?

@toddboom
Contributor

@samuraraujo Could you also send over a copy of your config file?

@samuraraujo
Author

Here are the machine details:
Linux zwks02 3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Below is the machine status while running the command.
https://drive.google.com/file/d/0Bwci5MBqp2fyN0tTUDZhTUhJVVk/edit?usp=sharing
https://drive.google.com/file/d/0Bwci5MBqp2fySVVkcDljckZ5QVk/edit?usp=sharing

Notice that I did not change the default config file.

@toddboom
Contributor

@samuraraujo OK, it sounds like something else is going on. The CPU utilization should be much higher during writes, so something else is likely causing the bottleneck. Can you run the following command while the writes are happening and send over some of the output?

iostat -mx 1

Thanks!

@samuraraujo
Author

Here is the output:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   163,00    0,00  103,00     0,00     2,56    50,95     2,41   23,38    0,00   23,38   9,63  99,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,53    0,00    0,76   32,24    0,00   62,47

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   151,00    0,00   82,00     0,00     3,28    81,95     1,69   20,49    0,00   20,49  12,10  99,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,55    0,00    0,51   28,28    0,00   66,67

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   160,00    0,00   96,00     0,00     2,51    53,50     1,58   16,21    0,00   16,21  10,38  99,60

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,28    0,00    0,51   35,86    0,00   60,35

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   135,00    0,00   81,00     0,00     2,36    59,75     1,68   21,09    0,00   21,09  12,35 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,82    0,00    0,25   30,28    0,00   65,65

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   192,00    0,00  109,00     0,00     3,48    65,39     1,58   14,50    0,00   14,50   9,17 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,85    0,00    1,02   32,49    0,00   59,64

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   146,00    0,00   91,00     0,00     3,38    76,13     1,55   16,66    0,00   16,66  10,90  99,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,82    0,00    0,25   28,93    0,00   65,99

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00   155,00    0,00   89,00     0,00     3,27    75,15     1,54   17,44    0,00   17,44  11,19  99,60

@jvshahid
Contributor

Hey @samuraraujo, can you run sudo iotop -a? That will give you a sorted list of processes doing I/O on your machine. It could be InfluxDB, but I'm curious whether there's another process writing to your device and slowing InfluxDB down.

@samuraraujo
Author

Hi Shahid, here is the output:

https://drive.google.com/file/d/0Bwci5MBqp2fyeDFYX3dxbWc1VG8/edit?usp=sharing

As soon as I run the command below, the I/O grows from 0.0% to 67%.

curl -X POST 'http://localhost:8086/db/metrics/series?u=root&p=root&time_precision=s' -d @log_json.txt

@jvshahid
Contributor

It looks like the filesystem is doing a lot of I/O. This is probably because we break up the request per time series (we have a story to fix this: #310) and because the default config flushes the WAL to disk on every request. So with your current config, you're flushing to disk 3000 times per request. To verify my theory, can you change your config to match the following:

flush-after = 10000 # the number of writes after which wal will be flushed, 0 for flushing on every write
bookmark-after = 10000 # the number of writes after which a bookmark will be created

# the number of writes after which an index entry is created pointing
# to the offset of the first request, default to 1k
index-after = 10000

# the number of requests per one log file, if new requests came in a
# new log file will be created
requests-per-logfile = 100000
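
(For context: these keys sit in the write-ahead-log section of config.toml. The sketch below follows the 0.5-era sample config, so the section name and directory path are assumptions that may differ in your version; restart the influxdb process after editing for the change to take effect.)

# assumed layout, based on the 0.5-era sample config.toml
[wal]
dir = "/tmp/influxdb/development/wal"  # sample default; use your own WAL directory
flush-after = 10000
bookmark-after = 10000
index-after = 10000
requests-per-logfile = 100000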

@samuraraujo
Author

Now it works!!! Very fast!

Congrats!

Can you point me to the documentation that explains these parameters in more detail?

@jvshahid
Contributor

Improving the documentation is at the top of our to-do list; we'll update it in the next few weeks. Currently we have very poor documentation when it comes to configuration options. In fact, there is no documentation other than the comments in the config file itself.

@jvshahid
Contributor

I'll go ahead and close this issue, since it's been resolved.
