
High memory usage #5440

Closed
tahmasebi opened this issue Jan 25, 2016 · 38 comments

@tahmasebi

Hi,
I installed InfluxDB on an Ubuntu server with 4 GB of memory and used Python's requests module to write 10M points into the database.
The Python script inserts 20k points per second successfully, but memory usage keeps climbing until InfluxDB is using 97% of RAM.
After that I can't run even a simple query like "select * from srcipm limit 1".
Even after the write process finishes, InfluxDB doesn't release the memory.

Details:

  • Server: 64-bit Ubuntu Server 15.10, 1-core CPU, 4 GB RAM
  • InfluxDB Version 0.9.6

Write request in Python:

requests.post("http://192.168.1.104:8086/write?db=mydb", "srcipm,device_id=4,input_snmp=2,output_snmp=3,direction=1,ip=192.168.1.1 bytes=25478")

Where is the problem? Should I configure something, or is my request wrong?

@desa
Contributor

desa commented Jan 25, 2016

@tahmasebi what do you mean by "I can't query a simple request like select * from srcipm limit 1"? Does the query not return? Does the instance OOM?

@tahmasebi
Author

@mjdesa, the query takes a very long time to execute. One time it didn't return anything for 30 minutes, so I cancelled it with Ctrl+C. Even after restarting the InfluxDB service manually (because of the memory usage), the query returns results very slowly and memory fills up again.

@desa
Contributor

desa commented Jan 25, 2016

How many cores does the machine have?

@tahmasebi
Author

1 core. When I run a query, CPU usage goes to 100% for 2 or 3 seconds and then returns to under 5%. The problem I can see in htop is InfluxDB's memory usage at 97-98%.

@desa
Contributor

desa commented Jan 25, 2016

How many unique series are you writing to the database?

@tahmasebi
Author

How do I get the number of series? I'm new to InfluxDB.
I don't have access to the machine now; I'll check it tomorrow at the company. I'm sorry.

@desa
Contributor

desa commented Jan 25, 2016

No need to be sorry. :)

The query show series will give you all of the series that are stored in a database.

@tahmasebi
Author

@mjdesa , I used show series to find the number of series in my db, but it displays a list of all the series, so I used Python to count them like this:

SHOW SERIES FROM srcipm
len(result['results'][0]['series'][0]['values'])

and I get 153K series. Is that too many series for 10M points?
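
For reference, a fuller version of that counting snippet might look like this (a sketch assuming the HTTP query endpoint and database name used earlier in the thread):

import requests

# Run SHOW SERIES over the HTTP API and count the returned rows.
resp = requests.get("http://192.168.1.104:8086/query",
                    params={"db": "mydb", "q": "SHOW SERIES FROM srcipm"})
result = resp.json()
print(len(result["results"][0]["series"][0]["values"]))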

@desa
Contributor

desa commented Jan 26, 2016

@tahmasebi No, you should be well within your limits. We've had problems in the past with 0.9.6 consuming too much memory. Can you run the test against 0.10.0beta2 and see if Influx behaves more stably?

https://influxdb.s3.amazonaws.com/influxdb_0.10.0-0.beta2_amd64.deb
https://influxdb.s3.amazonaws.com/influxdb-0.10.0-0.beta2.x86_64.rpm
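
For example, on Ubuntu the .deb package can be fetched and installed with something like:

wget https://influxdb.s3.amazonaws.com/influxdb_0.10.0-0.beta2_amd64.deb
sudo dpkg -i influxdb_0.10.0-0.beta2_amd64.deb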

@tahmasebi
Author

Sure, I'll test this version of InfluxDB and let you know tomorrow. (There's a time difference :) )
Thank you.

@tahmasebi
Author

Hi @mjdesa ,
I removed InfluxDB from Ubuntu with apt-get remove influxdb and then installed the package you linked above.

The database from the previous version still exists, but when I execute a query on the old db it returns an error: ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout.
So I created a new db and wrote 10M points to it. The write process was much slower than before (6k/s), and this time memory went up to 74%.

I took a screenshot of htop and iotop for more details.

[screenshot: htop and iotop output]

@lpc921

lpc921 commented Feb 25, 2016

Try reducing cache-snapshot-memory-size and cache-snapshot-write-cold-duration in the configuration.
I set mine to cache-snapshot-memory-size = 2621440 and cache-snapshot-write-cold-duration = "1m" during massive writes.
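
For reference, a minimal sketch of where these settings live in influxdb.conf, assuming the [data] section of a 0.10+/TSM configuration, with the values quoted above:

[data]
  # Snapshot the in-memory write cache to disk sooner during heavy writes.
  cache-snapshot-memory-size = 2621440
  cache-snapshot-write-cold-duration = "1m"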

@fffw

fffw commented Mar 4, 2016

FYI, I resolved ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout by increasing shard-mapper-timeout under the [cluster] section.
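
For reference, a minimal sketch of that setting in influxdb.conf (the 20s value here is only an example, not a recommendation):

[cluster]
  shard-mapper-timeout = "20s"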

@desa
Contributor

desa commented Mar 4, 2016

@tahmasebi Are you still experiencing this problem?

@adilbaig

I am brand new to InfluxDB and ran into this problem.

I started by inserting 6928 price points for one stock. An example data point looks like this:

INSERT price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

Then I ran a query like this:

SELECT ric, close, volume FROM price WHERE ric = 'VOD.L' LIMIT 1;

This killed my laptop by consuming all 16 GB of RAM.

@dy1901

dy1901 commented Jun 12, 2016

I'm building a stock system similar to @adilbaig's, and running into exactly the same problem with InfluxDB 0.13.0.

When I tried to insert about 4k trading points all at once using a batch insert via the HTTP API, InfluxDB took all of my 8 GB of memory plus all of the swap space, which made the whole system deathly slow.

Is there a way to limit the total memory usage?

My OS is Ubuntu 16.04 LTS.

@desa
Contributor

desa commented Jun 13, 2016

@adilbaig In the example point you listed

price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

ric, open, close, and volume are all tags, meaning that both the key and the value are treated as strings. Additionally, they will be indexed, and they will presumably have an absurdly high cardinality. This is most likely why you're seeing so much memory used. Try changing open, close, and volume to be fields instead.
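
For illustration, the same point with open, close, and volume moved into fields (only ric kept as a tag) would look roughly like this in line protocol:

price,ric=VOD.L open=12.3,close=12.3,volume=1200,dummy=1 12345644687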

desa closed this as completed Jun 13, 2016
desa reopened this Jun 13, 2016
@desa
Contributor

desa commented Jun 13, 2016

@dy1901 What does your schema look like?

@DavidSoong128

What is the current resolution of this question? If someone knows, please tell me, thank you.

@desa
Contributor

desa commented Jun 20, 2016

@DavidSoong128 what specifically would you like to know?

@DavidSoong128

@mjdesa Thank you for your reply.
I installed InfluxDB v0.13.0 on a Linux server with 16 GB of memory and use Java requests to write some points into the db, but free memory keeps shrinking (it has consumed as much as 11 GB), and then the only thing I can do is restart InfluxDB. I think it is the same problem as @dy1901's.

@desa
Contributor

desa commented Jun 21, 2016

@DavidSoong128
I have a few more questions:

  • Can you give some examples of the data that you're writing in line protocol?
  • About how long was InfluxDB running before you hit this problem?
  • Did you have any queries running when you hit this problem? If so, what were they?

@DavidSoong128

DavidSoong128 commented Jun 22, 2016

  1. First question: the points look like this: tps is the measurement, appName and serviceName are tags, and tpsAmount and currency are fields.
     Example in line protocol: tps,appName=order,serviceName=addOrder tpsAmount=2000,currency=300 1466557284000
  2. Second question: memory is consumed as soon as I insert data into the db; the more data, the more memory is consumed.
  3. Third question: there are no queries running; I just insert some data as a test.

    @Test(enabled = true)
    public void maxWritePointsPerformance() {
        String dbName = "d";
        this.influxDB.createDatabase(dbName);
        // Batch writes: flush after 100,000 points or every 60 seconds.
        this.influxDB.enableBatch(100000, 60, TimeUnit.SECONDS);

        Stopwatch watch = Stopwatch.createStarted();
        // Write 2,000,000 points, each with a single field and no tags.
        for (int i = 0; i < 2000000; i++) {
            Point point = Point.measurement("s").addField("v", 1.0).build();
            this.influxDB.write(dbName, "default", point);
        }
        System.out.println("5Mio points:" + watch);
        this.influxDB.deleteDatabase(dbName);
    }

This test code is copied from https://github.com/influxdata/influxdb-java and causes the same problem.

@nickjones

I'm having a similar problem: memory usage grows until it consumes the entire machine and never really drops back. There are three main streams of incoming data, written with the InfluxDB Go client to different databases:

  • 50 points/sec in batches of 1000
  • Telegraf data at 10s intervals for system metrics of a few hosts
  • 84 points/sec in batches of 10000

Occasionally a Grafana client will run 5-20 queries to draw charts, but it isn't a constant request rate. We've recently started to consume considerable swap space. The node has 128 GB of RAM (usually 80-98% of memory used) and has been up for 30 days.

@carbolymer

carbolymer commented Jul 17, 2016

I'm having the exact same issues as everyone else here. During inserts my memory usage jumps over 8 GB and then Influx throws an error about a memory allocation failure (my VM is limited to ~8 GB of RAM). I've tried the settings proposed by @lpc921 in his post here:

  cache-snapshot-memory-size = 262144
  cache-snapshot-write-cold-duration = "0h1m0s"
  compact-full-write-cold-duration = "0h1m0s"

It didn't change a thing.
I am using your official docker image: influxdb:0.13-alpine

Guys, seriously. This issue has been open for half a year. For me this is a critical issue which rules out using Influx in production environments.

EDIT:
I've prepared docker containers which demonstrate this bug in influxdb 0.13. You can find them here: https://github.com/carbolymer/influxdb-large-memory-proof
EDIT2:
The same happens for the 1.0.0-beta2-alpine image.
EDIT3:
Many thanks to @jwilder for the advice! The latest commit on master of https://github.com/carbolymer/influxdb-large-memory-proof contains a working solution.

@jwilder
Contributor

jwilder commented Jul 18, 2016

@carbolymer It looks like you have sparse data (stock prices), which ends up creating hundreds of small shards. In your Docker sample, I'd recommend increasing the shard group duration on your default retention policy after creating the databases.

For example, running the following before writing data:

alter retention policy default on no_memory shard group duration 520w

will change the shard group duration to 10y, which should reduce the number of shards from ~1500 to 4.

I would also suggest setting cache-snapshot-write-cold-duration = "10s". You should not need to change compact-full-write-cold-duration or cache-snapshot-memory-size from their default values, though.
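
As a sketch, the corresponding line in the configuration file (shown here under [data], where the cache-snapshot settings quoted earlier in the thread live, and leaving the other two settings at their defaults) would be something like:

[data]
  cache-snapshot-write-cold-duration = "10s"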

@adampl

adampl commented Jul 18, 2016

@carbolymer My server has 48 GB of RAM and 24 high-end CPUs, but it's still not enough for InfluxDB (which has a 30 GB RAM limit) with just several tens of thousands of daily series (several GB of data). Some queries end with a timeout; others end because Influx reaches the memory limit and crashes. I'm beginning to deeply regret this choice... I have no idea what to do now.

@jwilder
Contributor

jwilder commented Jul 18, 2016

@adampl What kind of data are you writing, and what is writing it? I suspect you have an issue with your schema, but grabbing some profiles when memory is high would help to diagnose it:

curl -o heap.txt "localhost:8086/debug/pprof/heap?debug=2"
curl -o goroutine.txt "localhost:8086/debug/pprof/goroutine?debug=2"
curl -o block.txt "localhost:8086/debug/pprof/block?debug=2"

Also, can you attach the output of the following:

influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt

@adampl

adampl commented Jul 18, 2016

@jwilder In my case the data is very sparse - just one point a day in each series - so I've dropped the entire database in order to try that trick with shard duration, and now the data is being loaded again. If the timeouts and crashes don't disappear, I'll provide you with the diagnostics.

@adampl

adampl commented Jul 19, 2016

@jwilder Increasing the shard duration indeed helped (I set it to 1000w) - now I don't get OOMs, as it takes "only" 5 GB and doesn't go up.

Still, requests covering all of the measurement's data take a long time to complete (15 seconds) - much longer than simply reading all of the measurement's rows from a text file, filtering them by tags, and aggregating in Python in a single process (2 seconds).

@jwilder
Contributor

jwilder commented Jul 19, 2016

@adampl What version are you running?

@adampl

adampl commented Jul 19, 2016

Version 0.13 on CentOS 7

@jwilder
Contributor

jwilder commented Jul 19, 2016

@adampl I'd suggest upgrading to the 1.0beta3 release or the latest nightly. There have been many query optimizations since 0.13.

@adampl

adampl commented Jul 20, 2016

OK, I'll give it a try. Meanwhile, please look into #6994, which is a very serious functional bug IMHO.

@carbolymer

@jwilder Many thanks! It helped.

@DavidSoong128

@carbolymer Have you solved the problem? Please share the solution, thank you.

@carbolymer

@DavidSoong128, yes. This worked for me: #5440 (comment)

You can find a working configuration in the latest commit on master at https://github.com/carbolymer/influxdb-large-memory-proof

jwilder closed this as completed Aug 4, 2016
@DavidSoong128

@carbolymer OK, thank you for your reply. I will do some tests. If there are any more questions, I'll ask again.

gmauro added a commit to usegalaxy-eu/infrastructure-playbook that referenced this issue Jun 9, 2020
picked some suggestions from influxdata/influxdb#5440 to try to reduce high memory usage from influxdb