
[tsm1] Data older than a few minutes is not showing up in queries + memory is leaking #4354

Closed
KiNgMaR opened this issue Oct 7, 2015 · 30 comments


@KiNgMaR

KiNgMaR commented Oct 7, 2015

The new storage engine looks very promising so far. Memory usage is still going up non-stop (writing over UDP exclusively), but slower than before. Overall the system seems more stable and under much less pressure.

However, after a few minutes, points that have been written and were previously showing up in queries disappear. All WHERE time >= now() - 1h queries only show data for approx. the last 2-10 minutes. This affects CQs and normal queries. It can even be reproduced using a simple SELECT * FROM blah query without any time constraints.
The data returned by this type of query behaves as follows: new points are appended and show up. Then, every few minutes, the starting point (i.e. the timestamp of the first point returned) moves some minutes into the future.

P.S. Thanks everyone, you're doing a great job!

@jwilder
Contributor

jwilder commented Oct 7, 2015

What are your retention policies set to?

@otoolep
Contributor

otoolep commented Oct 7, 2015

Yes, please let us know what the retention policies are.

It would be great to have a small program (in any language) that allows us to a) recreate your database and retention policies and b) write the data pattern you have.

@KiNgMaR
Author

KiNgMaR commented Oct 8, 2015

Retention policies are as follows:

create database "xxx_nonprod"
create database "xxx_prod"
alter retention policy "default" on "xxx_nonprod" duration 7d
alter retention policy "default" on "xxx_prod" duration 31d

There may have been points written between create database and alter retention policy; could that be a problem?

I will try to reproduce on a clean database and/or with a code example.
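
For what it's worth, a minimal sketch of such a reproduction writer could look like the following (all assumptions: a UDP line-protocol listener on localhost:8089 pointed at the target database; the measurement, tag and interval are illustrative rather than my real data pattern):

package main

// Hypothetical reproduction writer: sends one line-protocol point per second
// over UDP. Host, port, measurement and tag names are assumptions and need to
// be adjusted to the real setup and data pattern.
import (
    "fmt"
    "net"
    "time"
)

func main() {
    conn, err := net.Dial("udp", "localhost:8089")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    for {
        // InfluxDB 0.9 line protocol: measurement,tag=value field=value timestamp(ns)
        line := fmt.Sprintf("cpu,host=test value=%d %d\n", 42, time.Now().UnixNano())
        if _, err := conn.Write([]byte(line)); err != nil {
            fmt.Println("write failed:", err)
        }
        time.Sleep(time.Second)
    }
}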

@otoolep
Contributor

otoolep commented Oct 8, 2015

@KiNgMaR -- that may be. To be clear, when points are first written shards are created and the end-time of the shard is determined by the retention policy duration at that time. Subsequently changing the retention policy duration does not change the end-time of pre-existing shards, only shards created after the ALTER command.
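
In other words, for the new duration to apply everywhere, the ALTER has to happen before the first points are written, e.g. (the same commands as above, repeated only to make the ordering explicit; only shards created by writes after the ALTER pick up the 7d duration):

create database "xxx_nonprod"
alter retention policy "default" on "xxx_nonprod" duration 7d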

@KiNgMaR
Author

KiNgMaR commented Oct 8, 2015

So I stopped InfluxDB, deleted all DBs, shards etc. and disabled the UDP listeners. Started InfluxDB back up, created the DBs, altered the default retention policies and checked that no shards had been created yet. Shut down InfluxDB, re-enabled the UDP listeners... but the issue remains. This log file should illustrate it: https://gist.github.com/KiNgMaR/d3419fde0b5f8c78d88e The shard properties look correct to me.

If needed I should probably be able to provide a node.js script to reproduce from scratch...

@pkittenis

I am also seeing this, and it seems to be caused by wal flushes. The retention period is the default (infinite), so I doubt the retention period is related in any way.

DB version is 0.9.5-nightly-64cef0f, branch master, commit 64cef0fa63487e4b675cbb0b59e83b493e88ff08

After messages like the ones below, data older than the time of the flush no longer appears in queries:

[tsm1wal] 2015/10/08 14:17:56 idle flush of /var/opt/influxdb/data/graphite/default/5 with 54 keys and 54 total values of 864 bytes
[tsm1wal] 2015/10/08 14:17:56 idle flush of /var/opt/influxdb/data/_internal/monitor/4 with 52 keys and 312 total values of 4992 bytes
[tsm1wal] 2015/10/08 14:17:56 idle flush of /var/opt/influxdb/data/graphite/default/2 with 83 keys and 504 total values of 8064 bytes
[tsm1wal] 2015/10/08 14:18:28 memory flush of /var/opt/influxdb/data/graphite/default/1 with 104109 keys and 328749 total values of 5259984 bytes
select  value from "development.amers.b00583.fix_eng.undefined.alpqfix01.os.cpu.idle" where time > now() - 1h
development.amers.b00583.fix_eng.undefined.alpqfix01.os.cpu.idle
time    value
2015-10-08T14:18:15Z    97
2015-10-08T14:18:17Z    97
2015-10-08T14:18:20Z    97

Then, a few minutes later, another flush:

[tsm1wal] 2015/10/08 14:25:56 idle flush of /var/opt/influxdb/data/_internal/monitor/4 with 52 keys and 312 total values of 4992 bytes
[tsm1wal] 2015/10/08 14:25:56 idle flush of /var/opt/influxdb/data/graphite/default/5 with 54 keys and 54 total values of 864 bytes
[tsm1wal] 2015/10/08 14:25:56 idle flush of /var/opt/influxdb/data/graphite/default/2 with 83 keys and 504 total values of 8064 bytes
[tsm1wal] 2015/10/08 14:26:12 memory flush of /var/opt/influxdb/data/graphite/default/1 with 127105 keys and 328706 total values of 5259296 bytes

And queries now only return data from 14:25 onwards.

select value from "development.amers.b00583.fix_eng.undefined.alpqfix01.os.cpu.idle" where time > now() - 1h
development.amers.b00583.fix_eng.undefined.alpqfix01.os.cpu.idle
time    value
2015-10-08T14:25:52Z    

Do not have a 100% reproducible test yet, will post back when I have one.

What I have noticed is that sometimes a log message like the one below is printed following a wal flush:

[tsm1] 2015/10/08 14:22:53 rewriting index file /var/opt/influxdb/data/graphite/default/1/0000064.tsm1 with /var/opt/influxdb/data/graphite/default/1/0000065.tsm1

However, this message is not printed when data disappears following a wal flush.

I would presume that, under certain circumstances, the merging of existing indexes during the index rewrite does not happen after a wal flush, which causes data to 'disappear' from the index and not show up in queries.

Hope this helps.

@KiNgMaR
Author

KiNgMaR commented Oct 9, 2015

I updated to the fresh nightly (bc569e8) and the issue appears to be resolved for me. 😃 However I browsed the commits and didn't find anything that struck me as related, so we probably have to keep an eye on it...

@pkittenis can you also test with the fresh build?

@otoolep
Contributor

otoolep commented Oct 9, 2015

Are you sure you are running the tsm1 engine? Just have to be sure it isn't something simple that fixed it, like inadvertently switching back to bz1. :-)

@KiNgMaR
Author

KiNgMaR commented Oct 9, 2015

# grep 'engine =' /etc/opt/influxdb/influxdb.conf
engine = "tsm1"

# grep wal /var/log/influxdb/influxd.log | tail -n1
[tsm1wal] 2015/10/09 08:44:49 memory flush of /var/opt/influxdb/data/xxx_prod/default/2 with 140876 keys and 330139 total values of 5282224 bytes

LGTM 👍

@jhofeditz

I had this same issue after wiping all data and upgrading to f1e0c59 nightly. Data disappeared after 5-10 minutes even though I changed to tsm1 (or thought I did). Even saw tsm1 messages in the log:
[tsm1wal] 2015/10/12 06:56:41 /ssd/influxdb/data/_internal/monitor/36 flush to index took 1.817µs
[tsm1wal] 2015/10/12 06:57:31 idle flush of /ssd/influxdb/data/_internal/monitor/36 with 34 keys and 166 total values of 2656 bytes

I then moved my 0.9.4 config and used a new 0.9.5 config. Then I noticed that I had neglected to uncomment the engine line: # engine = "tsm1". D'oh! Fixed it, wiped the data, and restarted. It's been working exactly as expected for the last few hours now.

Just putting this out there in case someone else has a similar issue.

@KiNgMaR
Author

KiNgMaR commented Oct 13, 2015

Just upgraded to the f0efd1f nightly, cleaned all databases and the issue has reappeared for me. :(

Looking at the log files, the time window is resetting whenever the WAL flushes to disk:

[tsm1wal] 2015/10/13 07:31:02 memory flush of /var/opt/influxdb/data/xxx_prod/default/2
          with 101755 keys and 336873 total values of 5389968 bytes

Possibly related, this is also when the process is adding some 100 MB to its RSS - which I described here: #4096 (comment) - so it looks like the WAL flush and/or the "join" with indexed data is causing the trouble I am currently experiencing. How can we get to the bottom of this?

@KiNgMaR KiNgMaR changed the title [tsm1] Data older than a few minutes is not showing up in queries [tsm1] Data older than a few minutes is not showing up in queries + memory is leaking Oct 13, 2015
@pkittenis

so it looks like the WAL flush and/or the "join" with indexed data is causing the trouble I am currently experiencing

Exactly my thoughts as well - see comment above.

I am working on a reproducible test; I have one, but it does not reproduce the issue 100% of the time.

@pkittenis

Hi @otoolep ,

Can you please provide an email address that I can send a DB dump exhibiting this issue to? The dump is 25 MB, and though the data is not sensitive, I do not want to publish it publicly.

I have been unable to replicate it in testing, but the issue immediately starts appearing in our live test environment with real clients sending data to InfluxDB using tsm1.

Needless to say it's a blocker for moving ahead with the new engine :)

Here are example queries that demonstrate the issue.

Query:

select value from "development.amers.b001287.cmcm.undefined.ouy-ingestion.os.nfs.nfs_call" where time > '2015-10-14T14:00:52Z'

First response:

development.amers.b001287.cmcm.undefined.ouy-ingestion.os.nfs.nfs_call
time    value
2015-10-14T14:54:52Z     0

1min later, second response:

Success! (no results to display)

Previous data has 'disappeared'. Several more data points that were written and seen before the first query have also disappeared. Judging by the size of the DB dump, the data still looks to be there, but it does not show up in queries.

I am fairly confident this is an issue with WAL writes and the joining of index files. The InfluxDB logs during writes in our test environment contain no rewriting index file <..> messages, unlike in local testing, where I have been unable to replicate this.

Eg:

[tsm1wal] 2015/10/14 15:22:32 idle flush of /var/opt/influxdb/data/graphite/default/2 with 83 keys and 504 total values of 8064 bytes
[tsm1wal] 2015/10/14 15:22:32 idle flush of /var/opt/influxdb/data/graphite/default/3 with 54 keys and 378 total values of 6048 bytes
[query] 2015/10/14 15:22:32 SELECT value FROM "graphite"."default"."development.amers.b001287.cmcm.undefined.ouy-ingestion.os.nfs.nfs_call" WHERE time > '2015-10-14T14:00:52Z'

You can see wal flushes, without any rewriting index message, followed by the second query above that returned no data.

Compare this to the log messages while load testing and exercising the wal:

[tsm1wal] 2015/10/14 15:31:08 memory flush of /var/opt/influxdb/data/graphite/default/1 with 1000 keys and 329000 total values of 5264000 bytes
[tsm1] 2015/10/14 15:31:08 rewriting index file /var/opt/influxdb/data/graphite/default/1/0000415.tsm1 with /var/opt/influxdb/data/graphite/default/1/0000416.tsm1
[tsm1wal] 2015/10/14 15:31:11 /var/opt/influxdb/data/graphite/default/1 flush to index took 2.236247905s

@otoolep
Contributor

otoolep commented Oct 14, 2015

@pauldix is actually looking at all tsm1 issues, so may be interested in your data.

@beckettsean -- can you provide the upload details for CloudWok, so we can get the data from @pkittenis ?

@rossmcdonald
Contributor

@pkittenis If you would like to send us your data, you can upload it to our S3 bucket here:

https://www.cloudwok.com/cu/gh4354

@pkittenis

Thanks @rossmcdonald - DB dump uploaded.

@KiNgMaR
Author

KiNgMaR commented Oct 20, 2015

I checked my log files from the last few days (running with a very light workload to avoid going OOM) and I absolutely never see any rewriting index file entries in my log files. My configuration is this: https://gist.github.com/KiNgMaR/2f0ba7ca6af1f83f39bc

Furthermore, I don't have any .tsm1 files on disk. Only .wal files... wut?!

@otoolep @pauldix: Any suggestions?

@pauldix
Member

pauldix commented Oct 20, 2015

@KiNgMaR: What does an ls of the directory look like?

@KiNgMaR
Author

KiNgMaR commented Oct 20, 2015

@KiNgMaR
Author

KiNgMaR commented Oct 21, 2015

I think something may be going wrong on the file or filesystem level. Looking at the following piece of code from wal.go, I think that an error is occurring in Write() in tsm1.go, but it's never logged. The return value from flush() in wal.go seems to be discarded as well. This should probably be addressed?

if l.LoggingEnabled {
    // ...
    l.logger.Printf("%s flush of %s with %d keys and %d total values of %d bytes\n", ftype, l.path, len(l.flushCache), valueCount, flushSize)
}

startTime := time.Now()
if err := l.IndexWriter.Write(l.flushCache, mfc, scc); err != nil {
    // *** SHOULD LOG HERE?! ***
    return err
}
if l.LoggingEnabled {
    l.logger.Printf("%s flush to index took %s\n", l.path, time.Since(startTime))
}

In my log file, I can see the %s flush of %s with %d keys and %d total values of %d bytes messages, but not the %s flush to index took %s ones...

I will try to come up with a pull request to log the error unless someone else beats me to it... :)
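
For illustration, the kind of change I have in mind might look roughly like this (a sketch only, not the actual contents of any PR):

startTime := time.Now()
if err := l.IndexWriter.Write(l.flushCache, mfc, scc); err != nil {
    if l.LoggingEnabled {
        // Surface the error instead of silently returning it up the stack.
        l.logger.Printf("failed to flush to index: %v", err)
    }
    return err
}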

@jwilder
Contributor

jwilder commented Oct 21, 2015

@KiNgMaR Nice catch. #4529 adds logging for those errors.

@KiNgMaR
Author

KiNgMaR commented Oct 21, 2015

Awesome, @jwilder! Can't wait to try the nightly tomorrow / whenever it's merged :)

@KiNgMaR
Author

KiNgMaR commented Oct 22, 2015

My issue originates from replaceCompressedFile in tsm1.go. For some reason, the Remove call in the following lines results in a Permission Denied error:

if err := os.Remove(name); err != nil && !os.IsNotExist(err) {
    return err
}

Weirdly enough, this does not seem to happen in cleanupMetafile. Looking at the implementation of os.Remove in Go, an unlink is performed and, if it fails, is followed by an rmdir syscall. In this case, the unlink would most probably return ENOENT or similar. However, if the subsequent rmdir returned something other than ENOTDIR (e.g. EACCES), the error from rmdir would be the one returned to the application, and os.IsNotExist would then be false.
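
For context, os.Remove on Unix looks roughly like this (simplified from the Go standard library, not a verbatim copy), which shows how the rmdir error can end up masking the unlink result:

func Remove(name string) error {
    // Try unlink first; if it fails, try rmdir, since the syscall interface
    // forces us to know whether name is a file or a directory.
    e := syscall.Unlink(name)
    if e == nil {
        return nil
    }
    e1 := syscall.Rmdir(name)
    if e1 == nil {
        return nil
    }
    // Both failed: rmdir(file) returns ENOTDIR, so use that to decide which
    // error is the real one. Any other rmdir error (e.g. EACCES) wins here.
    if e1 != syscall.ENOTDIR {
        e = e1
    }
    return &PathError{Op: "remove", Path: name, Err: e}
}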

My data directory lives on NFS, so maybe that is what happens.

According to the rename syscall manpage, the os.Remove may not be necessary at all:

   If newpath already exists, it will be atomically replaced (subject to
   a few conditions; see ERRORS below), so that there is no point at
   which another process attempting to access newpath will find it
   missing.
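
In code, the suggestion would amount to dropping the Remove and relying on the rename alone, roughly like this (a hypothetical sketch; tmpName is an assumed name, not taken from tsm1.go):

// rename(2) atomically replaces the destination if it already exists,
// so no preceding os.Remove(name) should be needed.
if err := os.Rename(tmpName, name); err != nil {
    return err
}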

Thoughts?

@pkittenis

I am testing last night's build and it is looking more stable: I no longer see data loss in queries so far, and rewriting index <..> lines have started appearing in the logs.

Will keep you updated if anything changes.

@pauldix
Member

pauldix commented Oct 22, 2015

@pkittenis I just merged #4530 which is a fairly important fix so it'll be interesting to hear what happens with a build that has that.

@KiNgMaR I'm not sure. You're saying we can just take out the Remove because the rename will replace the old file? Permission denied seems like a really odd error to get here.

@jwilder
Contributor

jwilder commented Oct 22, 2015

@pauldix @KiNgMaR Looking at replaceCompressedFile, the Remove call is not needed. We can remove it.

@pkittenis

FWIW, I also saw permission denied errors from one instance, which went away after I wiped the data on it.

[tsm1wal] 2015/10/22 10:12:28 idle flush of /var/opt/influxdb/data/_internal/monitor/2 with 46 keys and 276 total values of 4416 bytes
[tsm1wal] 2015/10/22 10:12:28 failed to flush to index: remove ids: permission denied
[tsm1] 2015/10/22 10:12:28 PerformMaintenance: WAL flush failed: remove ids: permission denied
[tsm1wal] 2015/10/22 10:12:28 idle flush of /var/opt/influxdb/data/graphite/default/3 with 54 keys and 54 total values of 864 bytes
[tsm1wal] 2015/10/22 10:12:28 failed to flush to index: remove ids: permission denied
[tsm1] 2015/10/22 10:12:28 PerformMaintenance: WAL flush failed: remove ids: permission denied

I did not investigate too hard and it has not re-occurred since.

Last night's build has been stable so far with no data loss - will test with #4530 included next.

@jwilder
Contributor

jwilder commented Oct 22, 2015

@KiNgMaR @pkittenis Are you installing upgrades using the nightly packages or building manually? There may be a bug with permissions in the packaging script.

@KiNgMaR
Author

KiNgMaR commented Oct 26, 2015

Fixed by #4555! Woohooo!! 💯

I am still keeping an eye on memory usage; I will report back in a new issue if it keeps growing.

@jwilder: I use the nightly rpm builds for CentOS 6. E.g.

    Running Transaction
      Updating   : influxdb-0.9.5_nightly_9633410-1.x86_64                                                          1/2
    ln: creating symbolic link `/opt/influxdb/influx_inspect': File exists
    ln: creating symbolic link `/opt/influxdb/influx_stress': File exists
    chown: cannot dereference `/opt/influxdb/influx_inspect': No such file or directory
    chown: cannot dereference `/opt/influxdb/influx_stress': No such file or directory
      Cleanup    : influxdb-0.9.5_nightly_3a5044d-1.x86_64                                                          2/2
      Verifying  : influxdb-0.9.5_nightly_9633410-1.x86_64                                                          1/2
      Verifying  : influxdb-0.9.5_nightly_3a5044d-1.x86_64                                                          2/2

    Updated:
      influxdb.x86_64 0:0.9.5_nightly_9633410-1

@KiNgMaR KiNgMaR closed this as completed Oct 26, 2015
@pkittenis

I can also confirm the issue appears resolved with the changes from #4530 included.

In general, the latest nightly builds seem much more stable. Thanks for your hard work.
