High memory consumption #6015

Closed
bogdanveres opened this issue Mar 15, 2016 · 18 comments

Comments

@bogdanveres

I have a Core i5 server with 8 GB of RAM and 31 devices that insert data into InfluxDB via curl
(curl -i -XPOST http://192.168.122.189:8086/write?db=HWSTATS --data-binary "PC_01,component=TestCase value=1"). Each device sends 34 POST requests per minute. That is 34 requests * 60 minutes = 2040 requests/hour per device, or 31 devices * 2040 requests = 63240 requests/hour from all devices. I can't drop the database because I need all the data. The retention policy is set to infinite. Is there a way to reduce the memory consumption?

Here is a screenshot from htop:
[Screenshot: htop]

I already restarted the InfluxDB service to free the memory and also deleted the WAL folder, but the memory consumption is still high.

Some data:
sudo du -sh /var/lib/influxdb/wal/
10M
sudo du -sh /var/lib/influxdb
406M
sudo du -sh /var/lib/influxdb/data
495M

Here is the config file.
influxdb.zip

The only changes that I've made to the default config file are:
cache-snapshot-memory-size = 26214400 and
cache-snapshot-write-cold-duration = "1m"

Is there a way to clean the memory? I'm also using Grafana with 31 dashboards (one for each device). The refresh rate for each dashboard is set to 1 minute.

InfluxDB v0.10.0

@zstyblik

@bogdanveres, please provide the version of InfluxDB, as it will be useful to the devs.

@bogdanveres
Author

@zstyblik The InfluxDB version was added. Thank you!

@jonseymour
Contributor

@bogdanveres does the memory usage change if you temporarily shut down the Grafana dashboards? (I am wondering whether the memory usage is related to the Grafana query load.)

@bogdanveres
Author

@jonseymour Grafana is turned off. Memory consumption is at 3807 MB. Only one device is inserting data into InfluxDB right now. I also restarted the InfluxDB service, but nothing changed. I also changed the following value in the config file:

cache-max-memory-size = 524288000

@jonseymour
Contributor

So from 5526 MB to 3807 MB when Grafana was switched off? That still seems quite high. How many points are being logged per request? My environment logs 75 points per second, which is about 1 point per device per second. However, I have a batching process in front of InfluxDB that batches the writes together, so the number of requests per second is a lot lower. My server typically sits at around 525 MB RES. It only ever climbs into the 2-3 GB range if I run outrageous queries that scan the entire time range.

You might also consider using the memory stats in the _internal runtime measurement to see if you can correlate increases in memory usage with particular load profiles.

@jonseymour
Contributor

Also, this setting is probably a bit low: cache-snapshot-write-cold-duration = "1m". It will make the snapshot algorithm work quite hard (e.g. doing 60x more work than it normally would), which in turn creates a lot of secondary compaction load.

0.10.0 has a few thread safety issues in that part of the code that have been fixed in v0.10.3. It might be worth upgrading if you can, although I can't think of anything between v0.10.0 and v0.10.3 that would have a substantial impact on memory footprint.

@bogdanveres
Author

@jonseymour the memory consumption went down when I turned off the devices. Right now only two devices are running and inserting data into InfluxDB, and the memory consumption is at 5196 MB. I also updated to v0.10.3. Grafana is running, but I turned off the auto refresh feature for all dashboards. I kept the config file from v0.10.1. Is there anything different in the new one? Thank you.

@jonseymour
Contributor

@bogdanveres There were no config changes between v0.10.1 and v0.10.3

Can you share the output of this query, run against the _internal database?

select * from runtime where time > now() - 15m

Could you point Grafana at the _internal.runtime measurement and grab a graph of the HeapInUse statistic over time?
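
For anyone without Grafana at hand, the same HeapInUse series can be pulled over the HTTP /query endpoint. This is only a minimal Python sketch: the helper names (build_query_url, extract_column, fetch_heap_in_use) are made up for illustration, and it assumes the server listens on localhost:8086 and returns the usual results/series/columns/values JSON layout.

```python
import json
import urllib.parse
import urllib.request

# The query suggested above, run against the _internal database.
RUNTIME_QUERY = "select * from runtime where time > now() - 15m"


def build_query_url(base_url, db, query):
    """Return the /query URL for one InfluxQL statement."""
    params = urllib.parse.urlencode({"db": db, "q": query})
    return f"{base_url}/query?{params}"


def extract_column(response_text, column):
    """Return (time, value) pairs for `column` from a /query JSON response.

    Series that do not contain the column are skipped, so an empty list
    means the column was not found.
    """
    payload = json.loads(response_text)
    pairs = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            columns = series["columns"]
            if column not in columns or "time" not in columns:
                continue
            t_idx = columns.index("time")
            c_idx = columns.index(column)
            for row in series["values"]:
                pairs.append((row[t_idx], row[c_idx]))
    return pairs


def fetch_heap_in_use(base_url="http://localhost:8086"):
    """Fetch the last 15 minutes of HeapInUse (assumed host and port)."""
    url = build_query_url(base_url, "_internal", RUNTIME_QUERY)
    with urllib.request.urlopen(url) as resp:
        return extract_column(resp.read().decode("utf-8"), "HeapInUse")
```

Calling fetch_heap_in_use() against a running server should give a list of (timestamp, bytes) pairs that can be compared with the load at each point in time.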

Also, do you have any continuous queries defined?

@bogdanveres
Author

@jonseymour I don't have continuous queries defined.
Here is a snapshot for heap in use from Grafana:

https://snapshot.raintank.io/dashboard/snapshot/kcFgJ0GF1X36nfrG69f6gknHk4ohmXgr

_internal.zip

The results for the query that you suggested are attached (file was saved using notepad++). Thank you again.

@jonseymour
Contributor

I am sorry, @bogdanveres, I am still at a loss to explain why your server is using so much RAM when it is in a relatively idle state.

Are you able to disconnect all devices and all grafana dashboards and restart the server? The objective of this experiment would be to understand the amount of memory used when the server is in a completely idle rest state. Could you please attach the logs captured from the server during the restart, a grafana screen cap of the HeapInUse statistic covering the restart period and also the output of the following curl command?

curl -s http://localhost:8086/debug/vars
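
The /debug/vars output is JSON, so the memory figures can be picked out with a few lines of Python. This is a rough sketch under one assumption: that the output contains the "memstats" object that Go's expvar package publishes, with Go's own field spellings such as HeapInuse. The function name summarize_memstats is made up for illustration.

```python
import json

BYTES_PER_MB = 1024 * 1024


def summarize_memstats(debug_vars_text,
                       fields=("HeapInuse", "HeapIdle", "HeapReleased", "Sys")):
    """Return the selected memstats fields converted from bytes to MB.

    Returns an empty dict when no "memstats" object is present, since its
    presence in the output is an assumption, not something this thread confirms.
    """
    payload = json.loads(debug_vars_text)
    memstats = payload.get("memstats")
    if not isinstance(memstats, dict):
        return {}
    return {
        name: memstats[name] / BYTES_PER_MB
        for name in fields
        if name in memstats
    }
```

Saving the curl output to a file and passing its contents to summarize_memstats gives a quick view of how much heap is in use versus idle while the server is at rest.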

@bogdanveres
Author

@jonseymour here is a snapshot from grafana that covers the restart sequence:

https://snapshot.raintank.io/dashboard/snapshot/zAnBenTppEPQn5Hqt97dRhKRYjwqdRhC

Log files and output for the curl command are attached. All devices were turned off. Please let me know if you need other log files.
LOG_Files.zip

@jonseymour
Contributor

Thanks for those. Can you also send me the InfluxDB server log (e.g. the output of the influxd process)?

@bogdanveres
Author

@jonseymour The InfluxDB server log files were added. I also changed how the data is inserted into InfluxDB. We now use this library: https://github.com/AdysTech/InfluxDB.Client.Net. I don't know whether this changes anything. Thanks.

Later edit: I've done some investigation, and the memory consumption starts to increase when I navigate through the Grafana dashboards. Maybe this will help you.

Influx_logs.zip

@jonseymour
Contributor

@bogdanveres

Can you tell me how many points are written by each request? Is it only one?

My view is that the issues you are experiencing are caused by the relatively low number of points being written per write request. From my analysis of the 0317 logs, there are typically up to 68 requests active at one time. Each of these will be consuming some server resources.

If you can find a way to batch the points so that a larger number of points is written per write request, the number of concurrent requests being processed by the server will go down, and this should reduce the memory footprint substantially.
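
As an illustration of the batching idea, here is a small Python sketch that joins several line-protocol points with newlines and sends them in a single POST to /write. The helper names (format_point, build_batch, write_batch) are hypothetical, and the sketch assumes plain identifiers and numeric field values like those in the original curl command, so it does no escaping of spaces or commas.

```python
import urllib.parse
import urllib.request


def format_point(measurement, tags, fields):
    """Format one point as 'measurement,tag=value field=value'."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part}"


def build_batch(points):
    """Join (measurement, tags, fields) tuples into one newline-separated body."""
    return "\n".join(format_point(m, t, f) for m, t, f in points)


def write_batch(base_url, db, body):
    """POST a whole batch in one request and return the HTTP status code."""
    params = urllib.parse.urlencode({"db": db})
    request = urllib.request.Request(
        f"{base_url}/write?{params}",
        data=body.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

With this approach a device would collect its per-minute readings into one list, call build_batch once, and issue a single write_batch call instead of one request per point.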

@bogdanveres
Author

@jonseymour
Yesterday I created a backup of our database. After the backup completed, I dropped the database, and the memory consumption started to decrease. I also changed how the data is inserted into the database: it is now inserted in batches. The problem is that I have a lot of dashboards in Grafana, and the memory consumption starts to increase when I look over some graphs and change the date range. When I run out of memory, I can't insert new data into the database. I don't understand why the memory is not released when no graphs are being generated (the refresh rate in Grafana is OFF). Current HW stats from the InfluxDB machine:

Mem: 305/7839 MB (data is inserted from 11 devices - 2 requests/min per device | 1320 requests/hour from 11 devices)

sudo du -sh /var/lib/influxdb/wal/
38MB
sudo du -sh /var/lib/influxdb/
52MB
sudo du -sh /var/lib/influxdb/data/
14MB

Retention policy: infinite.
No continuous queries.
For each device I have a dashboard in Grafana.
InfluxDB v0.10.3

@joelegasse
Contributor

@bogdanveres Are you able to try this with the 0.11 release candidate? The query engine has been replaced since 0.10, and might help in your scenario.

@bogdanveres
Author

@joelegasse I'll try to update on Friday. Right now we have some tests in progress and I can't stop the execution. I also asked our IT guys to upgrade the server memory to 32 GB. Is the config file in 0.11 different from the one in 0.10? Thanks for your support.

@bogdanveres
Author

Memory consumption dropped to 400 MB with the latest version, 0.12.0. Thank you.


4 participants