
Fix memory leak of retained HTTP write payloads #7832

Merged
merged 3 commits on Jan 13, 2017

Conversation

mark-rushakoff
Contributor

@mark-rushakoff mark-rushakoff commented Jan 13, 2017

This leak seems to have been introduced in 8aa224b and is
present in 1.1.0 and 1.1.1.

When points were parsed from HTTP payloads, their tags and fields
referred to subslices of the request body; if any tag set introduced a
new series, those tags were then stored in the in-memory series
index objects, preventing the HTTP body from being garbage collected. If
there were no new series in the payload, the request body would be
garbage collected as usual.
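
To make the retention concrete, here is a minimal Go sketch (not InfluxDB code) of how a subslice pins its parent buffer:

// A subslice shares its parent's backing array, so retaining the
// subslice keeps the entire array reachable by the garbage collector.
package main

func firstTag(body []byte) []byte {
	return body[:8] // aliases body's backing array
}

func main() {
	body := make([]byte, 10<<20) // e.g. a 10 MB HTTP request body
	tag := firstTag(body)        // imagine this stored in a long-lived index
	_ = tag                      // while tag is reachable, all 10 MB stay resident
}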

Now, we clone the tags before we store them in the index. This is an
imperfect fix because the Point still holds references to the original
tags, and the Point's field iterator also refers to the payload buffer.
However, the current write code path does not retain references to the
Point or its fields; and this change will likely be obsoleted when TSI
is introduced.
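
For illustration, cloning a tag amounts to copying its key and value into freshly allocated byte slices. A rough sketch, assuming the models package's Tag{Key, Value []byte} shape (illustrative, not necessarily the exact merged code):

// Clone copies Key and Value into new allocations so the returned Tag
// no longer aliases the parsed request body.
func (t Tag) Clone() Tag {
	other := Tag{
		Key:   make([]byte, len(t.Key)),
		Value: make([]byte, len(t.Value)),
	}
	copy(other.Key, t.Key)
	copy(other.Value, t.Value)
	return other
}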

This change likely fixes #7827, #7810, #7778, and perhaps others.

  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated

@mark-rushakoff
Contributor Author

If anyone is running into this issue now and needs a workaround until this is merged or released, the best option is to keep the HTTP payload as small as possible for writes that introduce a new series. Writes that only affect existing series should not contribute to the leak. Once new series have been written, I believe the only way to release the memory of the associated HTTP payloads is to restart the server.

Contributor

@jwilder left a comment

Tested locally and verified that this releases the allocations now.

@joelegasse
Contributor

If we're concerned about total allocations, this will create a lot of smaller allocations. We could instead perform one large allocation for cloning Tags, rather than cloning each tag Key/Value separately. It would be a two-pass Clone(), and there wouldn't be any Clone() defined for a single Tag, but given the usage and goal, I think that would be OK.
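
A hedged sketch of that two-pass approach, assuming Tags is a slice of Tag{Key, Value []byte} (illustrative only, not the merged code): the first pass sizes a single backing buffer, and the second copies every key and value into it.

// Clone copies all keys and values into one backing allocation.
func (a Tags) Clone() Tags {
	if len(a) == 0 {
		return nil
	}

	// Pass 1: compute the size of the single backing allocation.
	var n int
	for _, t := range a {
		n += len(t.Key) + len(t.Value)
	}

	// Pass 2: copy each key and value into the shared buffer, using
	// full slice expressions so the subslices cannot grow into each other.
	buf := make([]byte, 0, n)
	others := make(Tags, len(a))
	for i, t := range a {
		start := len(buf)
		buf = append(buf, t.Key...)
		others[i].Key = buf[start:len(buf):len(buf)]

		start = len(buf)
		buf = append(buf, t.Value...)
		others[i].Value = buf[start:len(buf):len(buf)]
	}
	return others
}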

@mark-rushakoff
Contributor Author

@joelegasse I see what you mean. We have a couple options for a more efficient Tags.Clone:

  • One backing slice containing all deduplicated keys and values
    • Advantage: one allocation
    • Disadvantage: that one allocation stays in memory until all referring series are deleted
  • One backing slice for deduplicated keys, one backing slice for deduplicated values
    • Unlikely to be any better than the previous option
  • One backing slice for deduplicated keys, one backing slice per deduplicated value:
    • Advantage: any individual value slice can be freed if all series containing that value are dropped
    • Disadvantage: potentially many small allocations (though perhaps not that many in practice, and likely still fewer than in the implementation in cdbdd15)

Ultimately, it would probably be best if tsdb.NewSeries were instead a method on tsdb.DatabaseIndex, so that the backing slices for tags' keys and values could all be shared. I'll look into this approach tomorrow, and if it seems too complex to get into 1.2, we can probably fall back to the third option above (sketched below).
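
For concreteness, a rough sketch of that third option, again assuming the Tag{Key, Value []byte} shape (hypothetical, not the merged code): all keys share one backing slice, while each value gets its own allocation so it can be released independently when the series referencing it are dropped.

func (a Tags) Clone() Tags {
	if len(a) == 0 {
		return nil
	}

	// Size one shared backing slice for all keys.
	var n int
	for _, t := range a {
		n += len(t.Key)
	}
	keys := make([]byte, 0, n)

	others := make(Tags, len(a))
	for i, t := range a {
		start := len(keys)
		keys = append(keys, t.Key...)
		others[i].Key = keys[start:len(keys):len(keys)]

		// Each value is copied into its own allocation so it can be
		// freed once all series containing it are deleted.
		v := make([]byte, len(t.Value))
		copy(v, t.Value)
		others[i].Value = v
	}
	return others
}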

Contributor

@e-dard left a comment

Just a couple of thoughts, really, but I think the tags.Clone() call does need to be moved.

// If the tags were created by models.ParsePointsWithPrecision,
// they refer to subslices of the buffer containing the line protocol.
// Clone the tags so that we don't hold a reference to that buffer,
// which would prevent it from being garbage collected.
tags = tags.Clone()

This is going to cause unnecessary allocations a lot of the time because CreateSeriesIndexIfNotExists only holds onto the subsliced tags if a new series needs to be created.

I think it would be better to push tags through to NewSeries as-is, and then clone the tags inside CreateSeriesIndexIfNotExists only when AddSeries is called. Maybe you could just do something like:

series.Tags = series.Tags.Clone()
m.AddSeries(series)
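
Sketched in context with simplified names (the real CreateSeriesIndexIfNotExists does more bookkeeping; this only shows where the clone would sit):

func (d *DatabaseIndex) CreateSeriesIndexIfNotExists(name string, series *Series) *Series {
	// Fast path: the series already exists, the parsed tags are never
	// retained, and no clone is needed.
	if s := d.series[series.Key]; s != nil {
		return s
	}

	m := d.CreateMeasurementIndexIfNotExists(name)

	// Slow path: the series is about to be stored in the index, so
	// detach its tags from the request buffer first.
	series.Tags = series.Tags.Clone()
	m.AddSeries(series)
	d.series[series.Key] = series
	return series
}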

//
// Tags associated with a Point created by ParsePointsWithPrecision will hold references to the byte slice that was parsed.
// Use Clone to create Tags with new byte slices that do not refer to the argument to ParsePointsWithPrecision.
func (a Tags) Clone() Tags {

Just a suggestion: a is not at risk of holding onto anything if we replace all the references it contains. Therefore I don't think we need to allocate a new Tags value here. We can just do a[i] = a[i].Clone(), which will probably make this method significantly faster.

Whether it's still appropriate to call this Clone with that implementation, I'm not sure. 😄
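
In other words, something like this sketch of the in-place variant (the receiver's backing array is reused; only the per-Tag key/value copies are allocated):

func (a Tags) Clone() Tags {
	// Replace each element with a deep copy; the Tags slice itself
	// is reused rather than reallocated.
	for i := range a {
		a[i] = a[i].Clone()
	}
	return a
}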

This change delays Tag cloning until a new series is found, and will
only clone Tags acquired from `ParsePoints...` and not those referencing
the mmap-ed files (TSM) that are created on startup.
Successfully merging this pull request may close these issues:

  • Performance deterioration after upgrading to 1.1

4 participants