
High memory usage 0.12.2 #6513

Closed
freeseacher opened this issue Apr 29, 2016 · 14 comments

Comments

freeseacher commented Apr 29, 2016

Bug report

System info: version 0.12.2, RHEL 7.2

Steps to reproduce:

  1. Start the system.
  2. Wait for the system to start (takes about 4 minutes).
  3. Start the pooler.
  4. Observe:

ps wavx | grep influx

6940 ? Sl 25:19 109030 5031 120035932 *74729508* 75.6 /usr/bin/influxd -config /etc/influxdb/influxdb.conf

Expected behavior: memory usage under 20G
Actual behavior: memory usage above 75G
With

go tool pprof /bin/influxd http://localhost:8086/debug/pprof/heap

I got the result in the attached file: usage.zip
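(In case it is useful: such a heap profile can also be summarized interactively; top10 and web below are standard pprof commands, nothing specific to Influx.)

go tool pprof /bin/influxd http://localhost:8086/debug/pprof/heap
(pprof) top10    # functions holding the most in-use heap
(pprof) web      # needs graphviz; opens the allocation graph in a browser
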
My config is just like the default one: config.zip

Stats and other details: stats.zip, logs.zip

free -m
              total        used        free      shared  buff/cache   available
Mem:          96507       82178       13470          96         858       13564
Swap:         54271       16075       38196

Please help me understand why the memory usage is so extremely high.

jwilder (Contributor) commented Apr 29, 2016

What does free -m show?

jwilder (Contributor) commented Apr 29, 2016

It looks like you have about 5M series. Are you running any queries? If so, what are they?

freeseacher (Author) commented:

No, I am just adding new metrics.
After several minutes of running the pooler I start getting lines like this in the logs:

2016-04-29 21:14:14,939 [discovery] Failed to spool collected metrics to 10.36.129.4:8086: HTTP 599: Failed connect to 10.36.129.4:8086; Connection refused

jwilder (Contributor) commented Apr 29, 2016

Leaking connections possibly? Are your writes batched? How big are the batches? How many writers?

freeseacher (Author) commented:

Oh, I see. At that moment there is a stack trace in my Influx logs:
logs.txt

jwilder (Contributor) commented Apr 29, 2016

What is your writer process [discovery]? If it's written in Go and uses the v2 client, there was a bug with leaked connections that was recently fixed.

freeseacher (Author) commented Apr 29, 2016

No. It's written in Python and I have lots of them:
4 servers, each with 18 processes with 187 threads. Not every thread works with metrics, of course.
I am unsure about the batch size, but of course the writes are batched.

I will provide information about the batch size shortly.
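For illustration, a batched write from Python looks roughly like this (a minimal sketch with the influxdb-python client, which is not necessarily what our writer uses; host, database, measurement and batch size are placeholders):

from influxdb import InfluxDBClient

# one long-lived client per process; it reuses a single HTTP session,
# which keeps the number of open connections bounded
client = InfluxDBClient(host='10.36.129.4', port=8086, database='metrics')

points = [
    {
        'measurement': 'iface_load',                       # placeholder measurement
        'tags': {'object': 'router-01', 'iface': 'eth0'},  # placeholder tags
        'fields': {'in_bps': 123456.0},
    },
    # ... everything collected during one spool interval
]

# write_points() sends the list in one request, or splits it into several
# requests when batch_size is set (5000 is an arbitrary example value)
client.write_points(points, batch_size=5000)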

dvolodin7 commented Apr 30, 2016

The discovery process uses libcurl to spool the metrics. Each discovery process (72 in total) spools all collected metrics every 250 ms in a single batch.

At the moment we collect ~100 metrics from 50k objects every 300 seconds, and we plan to increase the number of objects up to 250k.

So the expected size of a batch will be about 4,200 lines.
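As a back-of-the-envelope check of that figure (my assumption: the ~4,200 lines is the total written per 250 ms spool cycle):

# rough arithmetic behind the expected batch size
objects = 50000            # monitored objects today
metrics_per_object = 100   # ~100 metrics per object
poll_interval_s = 300      # each object is polled every 300 seconds
spool_interval_s = 0.25    # discovery spools every 250 ms

points_per_second = objects * metrics_per_object / poll_interval_s  # ~16,700
lines_per_spool = points_per_second * spool_interval_s              # ~4,170
print(round(points_per_second), round(lines_per_spool))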

freeseacher (Author) commented:

OK, it looks like we found and fixed the problem with our discovery process.

I am unsure whether I have to open another issue or not, but the problem looks the same: high memory usage.
pic

But probably the reason is different.
the_log.txt
It looks like Influx eats all the memory while compacting the database.

Then, after a restart, it begins to compact the DB again, and while that is running Influx is still not able to accept metrics. So:

  1. restart time is awful
  2. compacting the DB eats all the memory.

I think the first problem is architectural and can't be fixed soon, but can I handle the second one somehow?

jwilder (Contributor) commented May 2, 2016

The restart time is a known issue with some datasets. The issue to follow is #6250: there is lock contention when reloading the in-memory index that slows some datasets down a lot.

For the compactions, are you overwriting points or writing to series in the past? Can you provide some sample data that you're writing?
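(To illustrate the question: "overwriting" means writing a point with the same measurement, tag set and timestamp as an existing point, which replaces its field values; "writing in the past" means sending explicit timestamps well behind now. A hypothetical sketch with the influxdb-python client and made-up names:)

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='metrics')

point = {
    'measurement': 'iface_load',       # made-up measurement
    'tags': {'object': 'router-01'},   # made-up tag set
    'fields': {'in_bps': 1.0},
    'time': '2016-04-29T21:00:00Z',
}
client.write_points([point])

# same series and timestamp, new field value -> overwrites the earlier point
point['fields'] = {'in_bps': 2.0}
client.write_points([point])

# explicit timestamp far behind "now" -> a write into the past (older shards)
point['time'] = '2016-03-01T00:00:00Z'
client.write_points([point])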

whitelynx commented:

Possible duplicate of #6243?

jwilder (Contributor) commented May 3, 2016

@freeseacher When your heap starts to spike, would you be able to grab a snapshot of the heap and goroutines and attach the output here?

curl -o heap.txt http://localhost:8086/debug/pprof/heap?debug=1
curl -o goroutine.txt http://localhost:8086/debug/pprof/goroutine?debug=1

jwilder added this to the 1.0.0 milestone May 3, 2016
jwilder (Contributor) commented May 13, 2016

#6618 should help startup time somewhat.

@freeseacher Are you able to grab heap and goroutine profiles using the commands above when your heap starts to spike?

jwilder modified the milestones: 1.0.0 beta, 1.0.0 May 26, 2016
jwilder (Contributor) commented Jun 1, 2016

Should be fixed via #6653 and #6743. If there are still issues, please let us know.

jwilder closed this as completed Jun 1, 2016