
High memory usage 0.12.2 #6513

Closed
freeseacher opened this issue Apr 29, 2016 · 14 comments

Comments

freeseacher commented Apr 29, 2016

Bug report

System info: version 0.12.2, RHEL 7.2

Steps to reproduce:

  1. Start the system.
  2. Wait for the system to start (takes about 4 minutes).
  3. Start the pooler.
  4. Observe:

ps wavx | grep influx

6940 ? Sl 25:19 109030 5031 120035932 *74729508* 75.6 /usr/bin/influxd -config /etc/influxdb/influxdb.conf

Expected behavior: memory usage under 20G
Actual behavior: memory usage above 75G
With

go tool pprof /bin/influxd http://localhost:8086/debug/pprof/heap

I got the result in the attached file: usage.zip
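(In case it is useful: such a heap profile can also be summarized interactively; top10 and web below are standard pprof commands, nothing specific to Influx.)

go tool pprof /bin/influxd http://localhost:8086/debug/pprof/heap
(pprof) top10    # functions holding the most in-use heap
(pprof) web      # needs graphviz; opens the allocation graph in a browser
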
My config is just like the default one: config.zip

Stats and other details: stats.zip, logs.zip

free -m
              total        used        free      shared  buff/cache   available
Mem:          96507       82178       13470          96         858       13564
Swap:         54271       16075       38196

Please help me understand why the memory usage is so extremely high.

jwilder (Contributor) commented Apr 29, 2016

What does free -m show?

jwilder (Contributor) commented Apr 29, 2016

It looks like you have about 5M series. Are you running any queries? If so, what are they?

freeseacher (Author) commented:

No, I am just adding new metrics.
After several minutes of running the pooler I start getting lines like this in the logs:

2016-04-29 21:14:14,939 [discovery] Failed to spool collected metrics to 10.36.129.4:8086: HTTP 599: Failed connect to 10.36.129.4:8086; Connection refused

jwilder (Contributor) commented Apr 29, 2016

Leaking connections possibly? Are your writes batched? How big are the batches? How many writers?

freeseacher (Author) commented:

Oh, I see. At that moment there is a stack trace in my Influx logs:
logs.txt

jwilder (Contributor) commented Apr 29, 2016

What is your writer process [discovery]? If it's written in Go and uses the v2 client, there was a bug with leaked connections that was recently fixed.

freeseacher (Author) commented Apr 29, 2016

No. It's written in Python and I have lots of them:
4 servers, each with 18 processes with 187 threads. Not every thread works with metrics, of course.
I am unsure about the batch size, but of course the writes are batched.

I will provide information about the batch size shortly.
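For illustration, a batched write from Python looks roughly like this (a minimal sketch with the influxdb-python client, which is not necessarily what our writer uses; host, database, measurement and batch size are placeholders):

from influxdb import InfluxDBClient

# one long-lived client per process; it reuses a single HTTP session,
# which keeps the number of open connections bounded
client = InfluxDBClient(host='10.36.129.4', port=8086, database='metrics')

points = [
    {
        'measurement': 'iface_load',                       # placeholder measurement
        'tags': {'object': 'router-01', 'iface': 'eth0'},  # placeholder tags
        'fields': {'in_bps': 123456.0},
    },
    # ... everything collected during one spool interval
]

# write_points() sends the list in one request, or splits it into several
# requests when batch_size is set (5000 is an arbitrary example value)
client.write_points(points, batch_size=5000)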

dvolodin7 commented Apr 30, 2016

The discovery process uses libcurl to spool the metrics. Each discovery process (72 in total) spools all collected metrics every 250 ms in a single batch.

At the moment we collect ~100 metrics from 50k objects every 300 seconds, and we plan to increase the number of objects up to 250k.

So the expected size of a batch will be about 4,200 lines.
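As a back-of-the-envelope check of that figure (my assumption: the ~4,200 lines is the total written per 250 ms spool cycle):

# rough arithmetic behind the expected batch size
objects = 50000            # monitored objects today
metrics_per_object = 100   # ~100 metrics per object
poll_interval_s = 300      # each object is polled every 300 seconds
spool_interval_s = 0.25    # discovery spools every 250 ms

points_per_second = objects * metrics_per_object / poll_interval_s  # ~16,700
lines_per_spool = points_per_second * spool_interval_s              # ~4,170
print(round(points_per_second), round(lines_per_spool))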

freeseacher (Author) commented:

OK, it looks like we found and fixed the problem with our discovery process.

I am unsure whether I have to open another issue or not, but the problem looks the same: high memory usage.
pic

But probably the reason is different.
the_log.txt
It looks like Influx eats all the memory while compacting the database.

Then, after a restart, it begins to compact the DB again, and while that is running Influx is still not able to accept metrics. So:

  1. restart time is awful
  2. compacting the DB eats all the memory.

I think the first problem is architectural and can't be fixed soon, but can I handle the second one somehow?

jwilder (Contributor) commented May 2, 2016

The restart time is a known issue with some datasets. The issue to follow is #6250: there is lock contention when reloading the in-memory index that slows some datasets down a lot.

For the compactions, are you overwriting points or writing to series in the past? Can you provide some sample data that you're writing?
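(To illustrate the question: "overwriting" means writing a point with the same measurement, tag set and timestamp as an existing point, which replaces its field values; "writing in the past" means sending explicit timestamps well behind now. A hypothetical sketch with the influxdb-python client and made-up names:)

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='metrics')

point = {
    'measurement': 'iface_load',       # made-up measurement
    'tags': {'object': 'router-01'},   # made-up tag set
    'fields': {'in_bps': 1.0},
    'time': '2016-04-29T21:00:00Z',
}
client.write_points([point])

# same series and timestamp, new field value -> overwrites the earlier point
point['fields'] = {'in_bps': 2.0}
client.write_points([point])

# explicit timestamp far behind "now" -> a write into the past (older shards)
point['time'] = '2016-03-01T00:00:00Z'
client.write_points([point])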

whitelynx commented:

Possible duplicate of #6243?

jwilder (Contributor) commented May 3, 2016

@freeseacher When your heap starts to spike, would you be able to grab a snapshot of the heap and goroutines and attach the output here?

curl -o heap.txt http://localhost:8086/debug/pprof/heap?debug=1
curl -o goroutine.txt http://localhost:8086/debug/pprof/goroutine?debug=1

jwilder added this to the 1.0.0 milestone May 3, 2016
jwilder (Contributor) commented May 13, 2016

#6618 should help startup time somewhat.

@freeseacher Are you able to grab heap and goroutine profiles using the commands above when your heap starts to spike?

jwilder modified the milestones: 1.0.0 beta, 1.0.0 May 26, 2016
jwilder (Contributor) commented Jun 1, 2016

Should be fixed via #6653 and #6743. If there are still issues, please let us know.

jwilder closed this as completed Jun 1, 2016