InfluxDB unresponsive due to corrupted TSM after disk disaster #9949

Closed
sbengo opened this Issue Jun 8, 2018 · 2 comments

3 participants

sbengo commented Jun 8, 2018

Hi,

InfluxDB: 1.5.2
OS: RHEL 7.4
TSI: enabled

Case

Yesterday one of our disk enclosures went down because of an electrical problem. This affected our InfluxDB data and corrupted one TSM file.

Once the disk was recovered, we rebooted the host. InfluxDB tried to start without success and consumed a large amount of resources, apparently because of the corrupted TSM file.

Note that the data in the following graphs is stored in another database, so it does not come from the unresponsive host. The null values in the graphs appear because the host was unresponsive.

[Image: graphs not preserved]

As the log shows, InfluxDB kept trying to open the same file, eating all available memory until the host started to swap.

influxd[1443]: ts=2018-06-07T14:26:39.319658Z lvl=error msg="[500] - \"[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart\"" log_id=08YeLc3W000 service=httpd
influxd[1443]: ts=2018-06-07T14:26:39.350716Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001672-000000002.tsm id=1 duration=2743.436ms
influxd[1443]: ts=2018-06-07T14:26:39.353078Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001664-000000005.tsm id=0 duration=2745.847ms
influxd[1443]: ts=2018-06-07T14:26:39.398017Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001688-000000002.tsm id=3 duration=2790.524ms
influxd[1443]: ts=2018-06-07T14:26:40.147845Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001672-000000002.tsm id=1 duration=1242.800ms
influxd[1443]: ts=2018-06-07T14:26:40.147813Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001688-000000002.tsm id=3 duration=1242.651ms
influxd[1443]: ts=2018-06-07T14:26:40.618681Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001680-000000002.tsm id=2 duration=1713.570ms
influxd[1443]: ts=2018-06-07T14:26:41.012754Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm id=4 duration=289.077ms
influxd[1443]: ts=2018-06-07T14:26:41.013105Z lvl=info msg="Write failed" log_id=08YeLc3W000 service=write shard=54 error="[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart"
influxd[1443]: ts=2018-06-07T14:26:41.064104Z lvl=error msg="[500] - \"[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart\"" log_id=08YeLc3W000 service=httpd
influxd[1443]: ts=2018-06-07T14:26:42.368331Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001664-000000005.tsm id=0 duration=3463.341ms
influxd[1443]: ts=2018-06-07T14:26:43.491397Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001664-000000005.tsm id=0 duration=2767.749ms
influxd[1443]: ts=2018-06-07T14:26:43.534201Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001688-000000002.tsm id=3 duration=2810.384ms
influxd[1443]: ts=2018-06-07T14:26:43.567208Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001672-000000002.tsm id=1 duration=2843.515ms
influxd[1443]: ts=2018-06-07T14:26:43.569437Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001680-000000002.tsm id=2 duration=2845.673ms
influxd[1443]: ts=2018-06-07T14:26:43.881832Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm id=4 duration=24.403ms
influxd[1443]: ts=2018-06-07T14:26:43.882121Z lvl=info msg="Write failed" log_id=08YeLc3W000 service=write shard=54 error="[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart"
influxd[1443]: ts=2018-06-07T14:26:43.882673Z lvl=error msg="[500] - \"[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart\"" log_id=08YeLc3W000 service=httpd
influxd[1443]: ts=2018-06-07T14:26:44.091045Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001672-000000002.tsm id=1 duration=233.563ms
influxd[1443]: ts=2018-06-07T14:26:44.092702Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001688-000000002.tsm id=3 duration=235.178ms
influxd[1443]: ts=2018-06-07T14:26:45.271294Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm id=4 duration=951.119ms
influxd[1443]: ts=2018-06-07T14:26:45.311097Z lvl=info msg="Write failed" log_id=08YeLc3W000 service=write shard=54 error="[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart"
influxd[1443]: ts=2018-06-07T14:26:45.311617Z lvl=error msg="[500] - \"[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart\"" log_id=08YeLc3W000 service=httpd
influxd[1443]: ts=2018-06-07T14:26:45.316940Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001664-000000005.tsm id=0 duration=1459.524ms
influxd[1443]: ts=2018-06-07T14:26:45.352201Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001680-000000002.tsm id=2 duration=1494.726ms
influxd[1443]: ts=2018-06-07T14:26:47.089930Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm id=4 duration=1116.344ms
influxd[1443]: ts=2018-06-07T14:26:47.090293Z lvl=info msg="Write failed" log_id=08YeLc3W000 service=write shard=54 error="[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart"
influxd[1443]: ts=2018-06-07T14:26:47.114677Z lvl=error msg="[500] - \"[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart\"" log_id=08YeLc3W000 service=httpd
influxd[1443]: ts=2018-06-07T14:26:47.128135Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001688-000000002.tsm id=3 duration=2807.898ms
influxd[1443]: ts=2018-06-07T14:26:47.146797Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001664-000000005.tsm id=0 duration=2826.660ms
influxd[1443]: ts=2018-06-07T14:26:47.149069Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001672-000000002.tsm id=1 duration=2828.904ms
influxd[1443]: ts=2018-06-07T14:26:47.294449Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001664-000000005.tsm id=0 duration=204.076ms
influxd[1443]: ts=2018-06-07T14:26:47.305505Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001688-000000002.tsm id=3 duration=215.087ms
influxd[1443]: ts=2018-06-07T14:26:47.630979Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001680-000000002.tsm id=2 duration=540.695ms
influxd[1443]: ts=2018-06-07T14:26:48.469392Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm id=4 duration=348.846ms
influxd[1443]: ts=2018-06-07T14:26:48.532995Z lvl=info msg="Write failed" log_id=08YeLc3W000 service=write shard=54 error="[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart"
influxd[1443]: ts=2018-06-07T14:26:48.533921Z lvl=error msg="[500] - \"[shard 54] error opening memory map for file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm: mmapAccessor: invalid indexStart\"" log_id=08YeLc3W000 service=httpd
influxd[1443]: ts=2018-06-07T14:26:48.532924Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001680-000000002.tsm id=2 duration=4212.728ms
influxd[1443]: ts=2018-06-07T14:26:50.993309Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001672-000000002.tsm id=1 duration=3903.157ms
influxd[1443]: ts=2018-06-07T14:26:51.113610Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001672-000000002.tsm id=1 duration=2992.970ms
influxd[1443]: ts=2018-06-07T14:26:51.145787Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001688-000000002.tsm id=3 duration=3025.049ms
influxd[1443]: ts=2018-06-07T14:26:51.162725Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001680-000000002.tsm id=2 duration=3042.159ms
influxd[1443]: ts=2018-06-07T14:26:51.467095Z lvl=info msg="Opened file" log_id=08YeLc3W000 engine=tsm1 service=filestore path=/store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm id=4 duration=7.074ms
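The `mmapAccessor: invalid indexStart` error appears to come from a sanity check on the TSM file footer. The Go sketch below illustrates that kind of check under assumptions about the InfluxDB 1.x TSM layout that we have not verified against 1.5.2 specifically: a 5-byte header (4-byte magic number `0x16D116D1` plus a version byte of `1`), then the data blocks and the index, then an 8-byte big-endian footer holding the offset at which the index starts. `validateTSM` and its error messages are hypothetical, not InfluxDB code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed TSM 1.x layout: 4-byte magic + 1-byte version header, 8-byte footer.
const (
	tsmMagic   uint32 = 0x16D116D1
	tsmVersion byte   = 1
	headerSize        = 5
	footerSize        = 8
)

var errInvalidIndexStart = errors.New("invalid indexStart")

// validateTSM checks the header and the footer's index offset of an
// in-memory TSM file. It does not decode the index or the data blocks.
func validateTSM(b []byte) error {
	if len(b) < headerSize+footerSize {
		return fmt.Errorf("file too small: %d bytes", len(b))
	}
	if m := binary.BigEndian.Uint32(b[0:4]); m != tsmMagic {
		return fmt.Errorf("bad magic number %#x", m)
	}
	if v := b[4]; v != tsmVersion {
		return fmt.Errorf("unsupported version %d", v)
	}
	// The last 8 bytes hold the offset at which the index begins.
	footerPos := len(b) - footerSize
	indexStart := binary.BigEndian.Uint64(b[footerPos:])
	// The index must start after the header and before the footer.
	if indexStart < headerSize || indexStart >= uint64(footerPos) {
		return errInvalidIndexStart
	}
	return nil
}

func main() {
	// Synthetic file: header, one byte standing in for blocks/index, footer = 5.
	good := []byte{0x16, 0xD1, 0x16, 0xD1, 1, 0xAA, 0, 0, 0, 0, 0, 0, 0, 5}
	fmt.Println(validateTSM(good)) // prints "<nil>"

	// Same file with a garbage footer pointing far past the end of the file.
	bad := []byte{0x16, 0xD1, 0x16, 0xD1, 1, 0xAA, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}
	fmt.Println(validateTSM(bad)) // prints "invalid indexStart"
}
```

A write torn by a power loss can leave the footer holding garbage, which is one plausible way to end up with the error seen in the log.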

To work around the problem, we did the following:

  1. Ran influx_inspect dumptsm -dir /store/influxdata
    The other shards were reported as OK, but the process stopped with the following message, without naming the failing shard: mmapAccessor: invalid indexStart

  2. Stopped InfluxDB and moved the TSM file
    At 17:20, we stopped InfluxDB and moved the corrupted TSM file /store/influxdata/data/db_metrics/1y/54/000001696-000000002.tsm into another directory.

  3. Started InfluxDB
    We started InfluxDB again and it ran normally, except that the data in the moved TSM file was lost.

Expected Behaviour

  • InfluxDB should skip or discard corrupted TSM files and start normally, logging an error.
  • The influx_inspect dumptsm tool should print the path of the failing TSM file.

Actual Behaviour

  • InfluxDB becomes unresponsive and consumes massive resources, with apparently unbounded memory growth.
  • The influx_inspect dumptsm tool does not report the failing shard, only the error.

@benbjohnson benbjohnson self-assigned this Jun 26, 2018

Contributor

benbjohnson commented Jun 27, 2018

@sbengo This issue has been fixed in 1.5.3: #9800

Author

sbengo commented Jun 28, 2018

Thanks for the info, @benbjohnson. We will update our InfluxDB to 1.5.3.

Best regards
