
High memory usage #5440

Closed
tahmasebi opened this issue Jan 25, 2016 · 38 comments

@tahmasebi

Hi,
I installed InfluxDB on an Ubuntu server with 4 GB of memory and used Python's requests module to write 10M points into the database.
The Python script inserts 20k points per second successfully, but memory usage keeps climbing until InfluxDB is using 97% of RAM.
After that I can't run even a simple query like "select * from srcipm limit 1".
Even after the write process finishes, InfluxDB doesn't release the memory.

Details:

  • Server: 64-bit Ubuntu Server 15.10, 1-core CPU, 4 GB RAM
  • InfluxDB Version 0.9.6

Write request in Python:

requests.post("http://192.168.1.104:8086/write?db=mydb", "srcipm,device_id=4,input_snmp=2,output_snmp=3,direction=1,ip=192.168.1.1 bytes=25478")

Where is the problem? Should I configure something, or is my request wrong?

@desa
Contributor

desa commented Jan 25, 2016

@tahmasebi what do you mean by "I can't query a simple request like select * from srcipm limit 1"? Does the query not return? Does the instance OOM?

@tahmasebi
Author

@mjdesa, the query takes a very long time to execute. One time it didn't return anything for 30 minutes, so I cancelled it with Ctrl+C. Even after restarting the InfluxDB service manually (because of the memory usage), the query returns results very slowly and memory fills up again.

@desa
Contributor

desa commented Jan 25, 2016

How many cores does the machine have?

@tahmasebi
Author

1 core. When I run a query, CPU usage goes to 100% for 2 or 3 seconds and then returns to under 5%. The problem I can see in htop is InfluxDB's memory usage at 97-98%.

@desa
Contributor

desa commented Jan 25, 2016

How many unique series are you writing to the database?

@tahmasebi
Author

How do I get the number of series? I'm new to InfluxDB.
I don't have access to the machine now; I'll check it tomorrow at the company. I'm sorry.

@desa
Contributor

desa commented Jan 25, 2016

No need to be sorry. :)

The query show series will give you all of the series that are stored in a database.

@tahmasebi
Author

@mjdesa , I used show series to find the number of series in my db, but it displays a list of all the series, so I used Python to count them like this:

SHOW SERIES FROM srcipm
len(result['results'][0]['series'][0]['values'])

and I get 153K series. Is that too many series for 10M points?
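
For reference, a fuller version of that counting snippet might look like this (a sketch assuming the HTTP query endpoint and database name used earlier in the thread):

import requests

# Run SHOW SERIES over the HTTP API and count the returned rows.
resp = requests.get("http://192.168.1.104:8086/query",
                    params={"db": "mydb", "q": "SHOW SERIES FROM srcipm"})
result = resp.json()
print(len(result["results"][0]["series"][0]["values"]))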

@desa
Contributor

desa commented Jan 26, 2016

@tahmasebi No, you should be well within your limits. We've had problems in the past with 0.9.6 consuming too much memory. Can you run the test against 0.10.0beta2 and see if Influx behaves more stably?

https://influxdb.s3.amazonaws.com/influxdb_0.10.0-0.beta2_amd64.deb
https://influxdb.s3.amazonaws.com/influxdb-0.10.0-0.beta2.x86_64.rpm
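
For example, on Ubuntu the .deb package can be fetched and installed with something like:

wget https://influxdb.s3.amazonaws.com/influxdb_0.10.0-0.beta2_amd64.deb
sudo dpkg -i influxdb_0.10.0-0.beta2_amd64.deb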

@tahmasebi
Author

Sure, I'll test this version of InfluxDB and let you know tomorrow. (There's a time difference :) )
Thank you.

@tahmasebi
Author

Hi @mjdesa ,
I removed InfluxDB from Ubuntu with apt-get remove influxdb and then installed the package you linked above.

The database from the previous version still exists, but when I execute a query on the old db it returns an error: ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout.
So I created a new db and wrote 10M points to it. The write process was much slower than before (6k/s), and this time memory went up to 74%.

I took a screenshot of htop and iotop for more details.

[screenshot: htop and iotop output]

@lpc921

lpc921 commented Feb 25, 2016

Try reducing cache-snapshot-memory-size and cache-snapshot-write-cold-duration in the configuration.
I set mine to cache-snapshot-memory-size = 2621440 and cache-snapshot-write-cold-duration = "1m" during massive writes.
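
For reference, a minimal sketch of where these settings live in influxdb.conf, assuming the [data] section of a 0.10+/TSM configuration, with the values quoted above:

[data]
  # Snapshot the in-memory write cache to disk sooner during heavy writes.
  cache-snapshot-memory-size = 2621440
  cache-snapshot-write-cold-duration = "1m"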

@fffw

fffw commented Mar 4, 2016

FYI, I resolved ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout by increasing shard-mapper-timeout under the [cluster] section.
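
For reference, a minimal sketch of that setting in influxdb.conf (the 20s value here is only an example, not a recommendation):

[cluster]
  shard-mapper-timeout = "20s"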

@desa
Contributor

desa commented Mar 4, 2016

@tahmasebi Are you still experiencing this problem?

@adilbaig

I am brand new to InfluxDB and ran into this problem.

I started by inserting 6928 price points for one stock. An example data point looks like this:

INSERT price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

Then I ran a query like this:

SELECT ric, close, volume FROM price WHERE ric = 'VOD.L' LIMIT 1;

This killed my laptop by consuming all 16 GB of RAM.

@dy1901

dy1901 commented Jun 12, 2016

I'm building a stock system similar to @adilbaig's, and running into exactly the same problem with InfluxDB 0.13.0.

When I tried to insert about 4k trading points all at once using a batch insert via the HTTP API, InfluxDB took all of my 8 GB of memory plus all of the swap space, which made the whole system deathly slow.

Is there a way to limit the total memory usage?

My OS is Ubuntu 16.04 LTS.

@desa
Contributor

desa commented Jun 13, 2016

@adilbaig In the example point you listed

price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

ric, open, close, and volume are all tags, meaning that both the key and the value are treated as strings. Additionally, they will be indexed, and they will presumably have an absurdly high cardinality. This is most likely why you're seeing so much memory used. Try changing open, close, and volume to be fields instead.
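
For illustration, the same point with open, close, and volume moved into fields (only ric kept as a tag) would look roughly like this in line protocol:

price,ric=VOD.L open=12.3,close=12.3,volume=1200,dummy=1 12345644687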

desa closed this as completed Jun 13, 2016
desa reopened this Jun 13, 2016
@desa
Contributor

desa commented Jun 13, 2016

@dy1901 What does your schema look like?

@DavidSoong128

What is the current resolution of this question? If someone knows, please tell me, thank you.

@desa
Contributor

desa commented Jun 20, 2016

@DavidSoong128 what specifically would you like to know?

@DavidSoong128

@mjdesa Thank you for your reply.
I installed InfluxDB v0.13.0 on a Linux server with 16 GB of memory and use Java requests to write some points into the db, but free memory keeps shrinking (it has consumed as much as 11 GB), and then the only thing I can do is restart InfluxDB. I think it is the same problem as @dy1901's.

@desa
Contributor

desa commented Jun 21, 2016

@DavidSoong128
I have a few more questions:

  • Can you give some examples of the data that you're writing in line protocol?
  • About how long was InfluxDB running before you hit this problem?
  • Did you have any queries running when you hit this problem? If so, what were they?

@DavidSoong128

DavidSoong128 commented Jun 22, 2016

  1. First question: the points look like this: tps is the measurement, appName and serviceName are tags, and tpsAmount and currency are fields.
     Example in line protocol: tps,appName=order,serviceName=addOrder tpsAmount=2000,currency=300 1466557284000
  2. Second question: memory is consumed as soon as I insert data into the db; the more data, the more memory is consumed.
  3. Third question: there are no queries running; I just insert some data as a test.

    @Test(enabled = true)
    public void maxWritePointsPerformance() {
        String dbName = "d";
        this.influxDB.createDatabase(dbName);
        // Batch writes: flush after 100,000 points or every 60 seconds.
        this.influxDB.enableBatch(100000, 60, TimeUnit.SECONDS);

        Stopwatch watch = Stopwatch.createStarted();
        // Write 2,000,000 points, each with a single field and no tags.
        for (int i = 0; i < 2000000; i++) {
            Point point = Point.measurement("s").addField("v", 1.0).build();
            this.influxDB.write(dbName, "default", point);
        }
        System.out.println("5Mio points:" + watch);
        this.influxDB.deleteDatabase(dbName);
    }

This test code is copied from https://github.com/influxdata/influxdb-java and causes the same problem.

@nickjones

I'm having a similar problem: memory usage grows until it consumes the entire machine and never really drops back. There are three main streams of incoming data, written with the InfluxDB Go client to different databases:

  • 50 points/sec in batches of 1000
  • Telegraf data at 10s intervals for system metrics of a few hosts
  • 84 points/sec in batches of 10000

Occasionally a Grafana client will run 5-20 queries to draw charts, but it isn't a constant request rate. We've recently started to consume considerable swap space. The node has 128 GB of RAM (usually 80-98% of memory used) and has been up for 30 days.

@carbolymer

carbolymer commented Jul 17, 2016

I'm having the exact same issues as everyone else here. During inserts my memory usage jumps over 8 GB and then Influx throws an error about a memory allocation failure (my VM is limited to ~8 GB of RAM). I've tried the settings proposed by @lpc921 in his post here:

  cache-snapshot-memory-size = 262144
  cache-snapshot-write-cold-duration = "0h1m0s"
  compact-full-write-cold-duration = "0h1m0s"

It didn't change a thing.
I am using your official docker image: influxdb:0.13-alpine

Guys, seriously. This issue has been open for half a year. For me this is a critical issue which rules out using Influx in production environments.

EDIT:
I've prepared docker containers which demonstrate this bug in influxdb 0.13. You can find them here: https://github.com/carbolymer/influxdb-large-memory-proof
EDIT2:
The same happens for the 1.0.0-beta2-alpine image.
EDIT3:
Many thanks to @jwilder for the advice! The latest commit on master of https://github.com/carbolymer/influxdb-large-memory-proof contains a working solution.

@jwilder
Contributor

jwilder commented Jul 18, 2016

@carbolymer It looks like you have sparse data (stock prices), which ends up creating hundreds of small shards. In your Docker sample, I'd recommend increasing the shard group duration on your default retention policy after creating the databases.

For example, running the following before writing data:

alter retention policy default on no_memory shard group duration 520w

will change the shard group duration to 10y, which should reduce the number of shards from ~1500 to 4.

I would also suggest setting cache-snapshot-write-cold-duration = "10s". You should not need to change compact-full-write-cold-duration or cache-snapshot-memory-size from their default values, though.
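
As a sketch, the corresponding line in the configuration file (shown here under [data], where the cache-snapshot settings quoted earlier in the thread live, and leaving the other two settings at their defaults) would be something like:

[data]
  cache-snapshot-write-cold-duration = "10s"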

@adampl

adampl commented Jul 18, 2016

@carbolymer My server has 48 GB of RAM and 24 high-end CPUs, but it's still not enough for InfluxDB (which has a 30 GB RAM limit) with just several tens of thousands of daily series (several GB of data). Some queries end with a timeout; others end because Influx reaches the memory limit and crashes. I'm beginning to deeply regret this choice... I have no idea what to do now.

@jwilder
Contributor

jwilder commented Jul 18, 2016

@adampl What kind of data are you writing, and what is writing it? I suspect you have an issue with your schema, but grabbing some profiles when memory is high would help to diagnose it:

curl -o heap.txt "localhost:8086/debug/pprof/heap?debug=2"
curl -o goroutine.txt "localhost:8086/debug/pprof/goroutine?debug=2"
curl -o block.txt "localhost:8086/debug/pprof/block?debug=2"

Also, can you attach the output of the following:

influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt

@adampl

adampl commented Jul 18, 2016

@jwilder In my case the data is very sparse - just one point a day in each series - so I've dropped the entire database in order to try that trick with shard duration, and now the data is being loaded again. If the timeouts and crashes don't disappear, I'll provide you with the diagnostics.

@adampl

adampl commented Jul 19, 2016

@jwilder Increasing the shard duration indeed helped (I set it to 1000w) - now I don't get OOMs, as it takes "only" 5 GB and doesn't go up.

Still, requests covering all of the measurement's data take a long time to complete (15 seconds) - much longer than simply reading all of the measurement's rows from a text file, filtering them by tags, and aggregating in Python in a single process (2 seconds).

@jwilder
Contributor

jwilder commented Jul 19, 2016

@adampl What version are you running?

@adampl

adampl commented Jul 19, 2016

Version 0.13 on CentOS 7

@jwilder
Contributor

jwilder commented Jul 19, 2016

@adampl I'd suggest upgrading to the 1.0beta3 release or the latest nightly. There have been many query optimizations since 0.13.

@adampl

adampl commented Jul 20, 2016

OK, I'll give it a try. Meanwhile, please look into #6994, which is a very serious functional bug IMHO.

@carbolymer

@jwilder Many thanks! It helped.

@DavidSoong128

@carbolymer Have you solved the problem? Please share the solution, thank you.

@carbolymer

@DavidSoong128, yes. This worked for me: #5440 (comment)

You can find a working configuration in the latest commit on master at https://github.com/carbolymer/influxdb-large-memory-proof

jwilder closed this as completed Aug 4, 2016
@DavidSoong128

@carbolymer OK, thank you for your reply. I will do some tests. If there are any more questions, I'll ask again.

gmauro added a commit to usegalaxy-eu/infrastructure-playbook that referenced this issue Jun 9, 2020
picked some suggestions from influxdata/influxdb#5440 to try to reduce high memory usage from influxdb