
Shard deletion triggers infinite memory usage due to cache lock contention #7226

Closed
stevenh opened this issue Aug 28, 2016 · 24 comments

@stevenh (Contributor) commented Aug 28, 2016

Bug report

InfluxDB: 1.0.0-beta3
OS: Linux 3.10.0-327.el7.x86_64

Steps to reproduce:

  1. Ingest data into InfluxDB at a decent rate
  2. Wait for 7 day retention policy to trigger
  3. Watch memory rise indefinitely

Expected behaviour:
Compaction should have little impact on memory

Actual behaviour:
Memory increases until it triggers an out-of-memory condition

Additional info:
Looking into the details from pprof, it's clear that the issue is serious contention on the cache lock, enough so that all queries (read/write) to InfluxDB stall almost indefinitely while compaction takes place.

When incoming requests fall behind, the problem spirals out of control, with more and more memory used for query goroutines.

When we captured it we had ~100k goroutines waiting on the cache lock in tsm1.newIntegerAscendingCursor, accounting for 129GB of active memory; this is very clear to see in heap.pdf below.
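
For context on how the pile-up happens: goroutines taking a read lock on a sync.RWMutex queue behind any goroutine holding (or waiting for) the write lock, so a long write-side hold parks every incoming cursor. A minimal illustration of that shape (plain Go, not InfluxDB code):

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	var wg sync.WaitGroup

	mu.Lock() // simulate a slow write-side operation holding the lock

	// Every "reader" parks on RLock until the writer releases the lock,
	// the same shape as cursors piling up on the cache lock.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.RLock()
			mu.RUnlock()
		}()
	}

	time.Sleep(100 * time.Millisecond)
	fmt.Println("readers parked; releasing the write lock")
	mu.Unlock()

	wg.Wait()
	fmt.Println("all readers completed")
}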

In our case we have a master/slave setup for reads, with this node being the slave, so it's only taking writes from influx-relay unless the primary is down, which it wasn't; we therefore believe all of this is write traffic, not read traffic.

Once we realised this, we disabled writes by taking the node out of influx-relay, and it recovered in ~30 minutes, even from 312GB. At the same point we saw a query complete which had taken 13555 minutes.


Machine Spec

  • 2 x Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
  • 168GB RAM
  • Disk provided by Purestorage SSD-backed storage over 10Gbps iSCSI

Logs:
block.txt
goroutine2.txt
heap2.txt
vars.txt
iostat.txt
heap.pdf


@stevenh (Contributor, Author) commented Aug 28, 2016

Just put in a PR which should help with this: #7228

@stevenh (Contributor, Author) commented Aug 29, 2016

Another thing that could help significantly is introducing the concept of handlers. This would allow the amount of contention on the lock to be controlled, for example by having 10 write handlers, 1 snapshot handler and 50 read handlers.

Each type would be processed by sending requests to a buffered channel which the relevant handlers read from and process. If the buffer is full, the action could be rejected, preventing excessive backlog build-up, which in turn would ensure that memory usage stays bounded.
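
A rough sketch of what I mean (purely illustrative; the names, sizes and types are made up and not InfluxDB APIs): a fixed pool of handlers drains a bounded channel, and submissions are rejected once the buffer is full, so the backlog, and therefore memory, stays bounded.

package main

import (
	"errors"
	"fmt"
	"sync"
)

// writeReq stands in for a batch of points to be written.
type writeReq struct{ points []string }

var errBusy = errors.New("write queue full")

// writePool is a fixed number of handlers reading from one bounded channel.
type writePool struct {
	queue chan writeReq
	wg    sync.WaitGroup
}

func newWritePool(workers, depth int, process func(writeReq)) *writePool {
	p := &writePool{queue: make(chan writeReq, depth)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for req := range p.queue {
				process(req)
			}
		}()
	}
	return p
}

// Submit enqueues a request, or rejects it immediately when the buffer is
// full, so callers back off instead of piling up in memory.
func (p *writePool) Submit(req writeReq) error {
	select {
	case p.queue <- req:
		return nil
	default:
		return errBusy
	}
}

func (p *writePool) Close() {
	close(p.queue)
	p.wg.Wait()
}

func main() {
	pool := newWritePool(10, 100, func(r writeReq) {
		fmt.Println("processed", len(r.points), "points")
	})
	if err := pool.Submit(writeReq{points: []string{"cpu value=1"}}); err != nil {
		fmt.Println("rejected:", err)
	}
	pool.Close()
}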

In addition to this, sharding the cache and looking to eliminate the impact of GC should also be done, as the current cache is likely to cause significant GC pauses due to its use of pointers; see http://allegro.tech/2016/03/writing-fast-cache-service-in-go.html for details.
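
For the sharding part, the idea is roughly the following (a sketch only, not the tsm1 cache): keys hash to one of N shards, each with its own lock, so writes to different series stop contending on a single mutex. Addressing the GC side would additionally need pointer-free storage along the lines of the linked article, which this sketch does not attempt.

package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 256

// shard is an independently locked slice of the key space.
type shard struct {
	mu   sync.RWMutex
	data map[string][]byte
}

type shardedCache struct {
	shards [shardCount]*shard
}

func newShardedCache() *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i] = &shard{data: make(map[string][]byte)}
	}
	return c
}

// shardFor hashes the key so contention spreads across shardCount locks
// instead of one global lock.
func (c *shardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[h.Sum32()%shardCount]
}

func (c *shardedCache) Put(key string, value []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	s.data[key] = value
	s.mu.Unlock()
}

func (c *shardedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	v, ok := s.data[key]
	s.mu.RUnlock()
	return v, ok
}

func main() {
	c := newShardedCache()
	c.Put("cpu,host=server01", []byte("98.2"))
	if v, ok := c.Get("cpu,host=server01"); ok {
		fmt.Println(string(v))
	}
}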

@jackzampolin (Contributor) commented Sep 12, 2016

@stevenh Have you tested out 1.0 yet? There was a pull merged which may help with this: #7258. There is also some in flight work that might help: #7165

@stevenh (Contributor, Author) commented Sep 12, 2016

Yes, we're currently testing 1.0, but we haven't let a compaction happen yet.

Given the description of the problem in #7258, I don't believe that's the cause of this: it describes an irrecoverable situation, whereas stopping the incoming data allowed our issue to recover, so it seems unlikely to be the cause.

@stevenh (Contributor, Author) commented Sep 12, 2016

Also, I've just read through #7165 and it doesn't seem to be related; neither of these seems to address the underlying issue of the high lock contention or the data race which will cause invalid cache size restriction.

@jwilder (Contributor) commented Sep 13, 2016

@stevenh What size of writes are you sending? Are you batching writes or just sending many small ones? Can you provide some sample writes in line protocol format?
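
For reference, a batched write in line protocol posted to the 1.x HTTP /write endpoint looks roughly like the sketch below (the measurement, tag, field and database names are made up):

package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	// One point per line: measurement,tags fields timestamp (ns).
	batch := strings.Join([]string{
		"cpu,host=server01,region=us-west usage_idle=98.2 1472342400000000000",
		"cpu,host=server02,region=us-west usage_idle=87.5 1472342400000000000",
	}, "\n")

	// Sending many points per POST is far cheaper than many tiny writes.
	resp, err := http.Post(
		"http://localhost:8086/write?db=mydb&precision=ns",
		"text/plain",
		strings.NewReader(batch),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("write status:", resp.Status)
}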

Also, compactions are not the same thing as retention policy expiring shards. Compactions happen all the time while writes are occurring on a shard. Snapshot compactions are the only ones that lock the cache, but they do not hold onto the locks while the compactions are run.

The large heap allocations from queries look like they may be querying against a large number of series, which may indicate a cardinality issue with your tags or a query that is not selective enough. There are also a number of goroutines gzipping responses, which is odd. Perhaps a slow query reader?

You may want to try enabling some of these limits to help mitigate some of the issues as well as pinpoint any problem queries:

[http]
  max-connection-limit = 20

[coordinator]
  max-concurrent-queries = 20
  query-timeout = "1m"
  log-queries-after = "30s"
  max-select-series = 100000

@liv3d commented Sep 15, 2016

@jwilder I'll add the settings to our setup later on today :)


In the meantime, I grabbed a backup of production data, loaded it into 1.0 in our dev environment, and set the retention down to 7 days. After ~30 minutes the retention kicked in and deleted ~271,000 series (kind of expected). From our graphing, the retention took 8 minutes and 30 seconds to run, but during this time it completely stopped writing data to both the _internal database and the other(s).

[graphs: heap graph, tsm metrics, writes]

I'm going to retest this shortly and grab heap, goroutine, blocks and vars while the node isn't graphing.

@liv3d commented Sep 15, 2016

block.txt
goroutine.txt
heap.txt
vars.txt

I managed to grab these while reducing the retention policy on the same backup we used during the test yesterday.

@jwilder (Contributor) commented Sep 15, 2016

@liv3d I think I see part of the issue in one of those traces now. When the shard deletion kicks in, the shard is Closed and a write lock is held during this time. The Close takes a while, and during that time stats collection gets blocked, which is why you don't see any data in _internal.

Not sure why other writes are getting blocked yet, though.
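
The usual way to avoid that kind of stall, sketched very roughly below (not the actual tsdb code), is to detach the shard under a short lock so nothing new routes to it, and then do the slow Close/unindex work without holding the lock:

package main

import "sync"

// Shard stands in for a tsdb shard; Close represents the slow
// close/unindex work.
type Shard struct{ id uint64 }

func (s *Shard) Close() error { return nil }

// Store stands in for the shard store guarded by one lock.
type Store struct {
	mu     sync.RWMutex
	shards map[uint64]*Shard
}

func (st *Store) DeleteShard(id uint64) error {
	// Short critical section: detach the shard so stats collection,
	// queries and writes no longer see it.
	st.mu.Lock()
	sh, ok := st.shards[id]
	if !ok {
		st.mu.Unlock()
		return nil
	}
	delete(st.shards, id)
	st.mu.Unlock()

	// The slow work runs outside the store lock, so everything else keeps going.
	return sh.Close()
}

func main() {
	st := &Store{shards: map[uint64]*Shard{1: {id: 1}}}
	_ = st.DeleteShard(1)
}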

@jwilder (Contributor) commented Sep 16, 2016

@liv3d I have a possible fix for this in #7314. Would you be able to test that PR out in your environment to see if it resolves this issue?

@stevenh (Contributor, Author) commented Sep 16, 2016

With regards to other writes getting blocked, the initial traces indicated that they weren't totally blocked; they were just progressing VERY slowly due to the cache lock contention.

@liv3d commented Sep 16, 2016

@jwilder running that build, I get panics when querying _internal or doing SHOW STATS - these are attached here

@jwilder (Contributor) commented Sep 16, 2016

@liv3d I think that panic is due to a recent change in master unrelated to this. I'll rebase the PR off of the 1.0 branch while we track down the cause of that panic.

@jwilder (Contributor) commented Sep 16, 2016

The panic is a regression due to #7296

@jwilder (Contributor) commented Sep 16, 2016

@liv3d The fix for that panic is in master now. If you pull down the updated #7314 PR, it should have that fix in there now.

@liv3d commented Sep 16, 2016

@jwilder I'm not convinced that panic was unrelated. I tried SHOW STATS again and it still panics.

[danof@influxdb-1.labs ~]$ influxd version
InfluxDB v1.0.0 (git: jw-delete-shard 0063ca4842a83eb0290c06c3e48b9783c63d1869)
[danof@influxdb-1.labs ~]$ influx -execute "SHOW STATS"
ERR: SHOW STATS [panic:assignment to entry in nil map]
Warning: It is possible this error is due to not setting a database.
Please set a database with the command "use <database>".
SHOW STATS [panic:assignment to entry in nil map]
[danof@influxdb-1.labs ~]$

My box on 1.0 GA works:

[danof@influxdb-2.labs ~]$ influxd version
InfluxDB v1.0.0 (git: master 37992377a55fbc138b2c01edd4deffed64b53989)
[danof@influxdb-2.labs ~]$ influx -execute "SHOW STATS"
name: cq
--------
queryFail   queryOk
0       13972
...

jwilder added a commit that referenced this issue Sep 16, 2016
When deleting a shard, the shard is locked and then removed from the
index.  Removal from the index can be slow if there are a lot of
series.  During this time, the shard is still expected to exist by
the meta store and tsdb store so stats collections, queries and writes
could all be run on this shard while it's locked.  This can cause everything
to lock up until the unindexing completes and the shard can be unlocked.

Fixes #7226
@jwilder (Contributor) commented Sep 16, 2016

@liv3d That's actually a different panic in the stats collection for subscribers. It was introduced in #7177 and is now fixed with #7318. I rebased on top of that fix now. master has many changes for 1.1 right now so it's more unstable. Can you test again?

@liv3d commented Sep 16, 2016

The good news @jwilder is I can indeed run SHOW STATS. I'm going to grab some food and retest in a bit.

@liv3d commented Sep 16, 2016

@jwilder This looks way better from my quick test of altering the retention:

[graph: series drop]

I'm also seeing memory usage at roughly a tenth of what it was on 1.0 GA.

[graph: memory]

@stevenh We'll need to do more testing during the week, but this actually looks good

@jwilder self-assigned this Sep 16, 2016
@jwilder (Contributor) commented Sep 20, 2016

@liv3d @stevenh Have you had a chance to test further?

@liv3d commented Sep 21, 2016

@jwilder I've done some more testing and I'm pretty happy with the change. Hopefully this can make the next release.
I did notice that during the shard retention inserts go from ~0.09s to ~2.50s; however, none of them failed.

@jwilder changed the title from "Compaction triggers infinite memory usage due to cache lock contention" to "Shard deletion triggers infinite memory usage due to cache lock contention" Sep 21, 2016
@dswarbrick

I'm a little bit late to the party on this one, but I've also been having problems dropping series / measurements on a roughly 75 GB database with just under 2M series. I tried to drop a measurement earlier today, which took a while, but eventually the command returned to the prompt. However, in the following minutes, influxd started to eat a ton of memory (32 GB + 4 GB swap) and eventually crashed. I've been very hesitant about deleting stuff because this behaviour certainly is not new.

Here's hoping that this patch improves things.

@jwilder (Contributor) commented Sep 21, 2016

@dswarbrick #7165 may also help you out. If you are able to test it out, that would be very helpful.

@jwilder (Contributor) commented Sep 21, 2016

That PR needs to be rebased w/ master though.

jwilder added a commit that referenced this issue Sep 23, 2016
@timhallinflux modified the milestones: 1.1.0, 1.0.1 Dec 19, 2016