Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tsm1]: panic: runtime error: index out of range #4365

Closed
dsw6 opened this issue Oct 7, 2015 · 26 comments
Closed

[tsm1]: panic: runtime error: index out of range #4365

dsw6 opened this issue Oct 7, 2015 · 26 comments

Comments

@dsw6
Copy link

dsw6 commented Oct 7, 2015

Installed the nightly 9.5 build (version 0.9.5-nightly-6f80d69), configured the tsm1 data store and started running my test program. InfluxDB is running on a Ubuntu 13.04 VM with 4 cores, 3G ram, 20G hd.

The test program batch inserts 15K - 30K points every 5 seconds. Each point has 7 tag values, each tag has a cardinality of 10. Two continuous queries are configured to run every 5m and 1h, down sampling the data into 5m and 1h buckets.

Influx ingested the batch inserts without problems, but memory consumption continued to grow until the process memory reached about 94% of system memory. Causal observation (using top) show memory appeared to increase as the continuous queries executed.

When process memory reached 94% of system memory, the system became unresponsive and a 500 error was received on the batch write. I gracefully restarted the system and influx failed to start. Subsequent attempts to start influx all fail with the same panic.

Below is a log snippet showing the panic.

2015/10/07 15:59:57 InfluxDB starting, version 0.9.5-nightly-6f80d69, branch master, commit 6f80d690dd9001cce860ce4a2f625a97d70587ed, built '2015-10-07T04:00:42+0000'
2015/10/07 15:59:57 Go version go1.5.1, GOMAXPROCS set to 4
2015/10/07 15:59:57 Using configuration at: /etc/opt/influxdb/influxdb.conf
[metastore] 2015/10/07 15:59:57 Using data dir: /var/opt/influxdb/meta
[metastore] 2015/10/07 15:59:57 Skipping cluster join: already member of cluster: nodeId=1 raftEnabled=true peers=[localhost:8088]
[metastore] 2015/10/07 15:59:57 Node at localhost:8088 [Follower]
[metastore] 2015/10/07 15:59:59 Node at localhost:8088 [Leader]. peers=[localhost:8088]
[monitor] 2015/10/07 15:59:59 Starting monitor system
[monitor] 2015/10/07 15:59:59 'build' registered for diagnostics monitoring
[monitor] 2015/10/07 15:59:59 'runtime' registered for diagnostics monitoring
[monitor] 2015/10/07 15:59:59 'network' registered for diagnostics monitoring
[monitor] 2015/10/07 15:59:59 'system' registered for diagnostics monitoring
[store] 2015/10/07 15:59:59 Using data dir: /var/opt/influxdb/data
[metastore] 2015/10/07 15:59:59 Updated node id=1 hostname=localhost:8088
[monitor] 2015/10/07 15:59:59 Storing statistics in database '_internal' retention policy '', at interval 10s
[tsm1wal] 2015/10/07 15:59:59 tsm1 WAL starting with 5242880 flush memory size threshold and 104857600 max memory size threshold
[tsm1wal] 2015/10/07 15:59:59 tsm1 WAL writing to /var/opt/influxdb/data/_internal/monitor/1
[tsm1wal] 2015/10/07 15:59:59 startup flush of /var/opt/influxdb/data/_internal/monitor/1 with 47 keys and 358 total values of 5728 bytes
[tsm1] 2015/10/07 15:59:59 rewriting index file /var/opt/influxdb/data/_internal/monitor/1/0000166.tsm1 with /var/opt/influxdb/data/_internal/monitor/1/0000168.tsm1
[tsm1] 2015/10/07 15:59:59 rewriting index file /var/opt/influxdb/data/_internal/monitor/1/0000148.tsm1 with /var/opt/influxdb/data/_internal/monitor/1/0000169.tsm1
panic: runtime error: index out of range

goroutine 60 [running]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.Values.DecodeSameTypeBlock(0xc820361e80, 0x0, 0x0, 0x7f2b14f79e3a, 0x3c, 0x72b4a, 0x0, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/encoding.go:88 +0x284
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).DecodeAndCombine(0xc820222240, 0xc820361e80, 0x0, 0x0, 0x7f2b14f79e3a, 0x3c, 0x72b4a, 0xc8202de000, 0x0, 0x4e20, ...)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1648 +0xba
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).rewriteFile(0xc820222240, 0xc82000cfc0, 0xc82028e5a0, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1107 +0x208f
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write.func1(0xc820222240, 0xc82028aeb0, 0xc82029ce00, 0xc82028aec0, 0xc82000cfc0, 0xc82028e5a0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:392 +0x3f
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:398 +0x69b

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc82028aecc)
    /root/.gvm/gos/go1.5.1/src/runtime/sema.go:43 +0x26
sync.(*WaitGroup).Wait(0xc82028aec0)
    /root/.gvm/gos/go1.5.1/src/sync/waitgroup.go:126 +0xb4
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write(0xc820222240, 0xc82028e030, 0xc82021b0e0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:436 +0x731
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).flush(0xc820222120, 0x4, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:617 +0xab1
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).readAndFlushWAL(0xc820222120, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:326 +0x1e8
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).Open(0xc820222120, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:155 +0x40c
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Open(0xc820222240, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:261 +0x62a
github.com/influxdb/influxdb/tsdb.(*Shard).Open.func1(0xc8201ffa20, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/shard.go:123 +0x365
github.com/influxdb/influxdb/tsdb.(*Shard).Open(0xc8201ffa20, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/shard.go:100 +0x2d
github.com/influxdb/influxdb/tsdb.(*Store).loadShards(0xc820168140, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/store.go:301 +0xd2c
github.com/influxdb/influxdb/tsdb.(*Store).Open(0xc820168140, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/store.go:369 +0x35a
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1(0xc82016b560, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:365 +0xe08
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open(0xc82016b560, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:311 +0x2d
github.com/influxdb/influxdb/cmd/influxd/run.(*Command).Run(0xc820148000, 0xc82000a0b0, 0x4, 0x4, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/run/command.go:121 +0xe27
main.(*Main).Run(0xc8200f9f00, 0xc82000a0b0, 0x4, 0x4, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/main.go:86 +0x6a1
main.main()
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/main.go:46 +0x29b

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:1696 +0x1

goroutine 5 [syscall]:
os/signal.loop()
    /root/.gvm/gos/go1.5.1/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
    /root/.gvm/gos/go1.5.1/src/os/signal/signal_unix.go:28 +0x37

goroutine 34 [IO wait]:
net.runtime_pollWait(0x7f2b16001070, 0x72, 0xc82000e150)
    /root/.gvm/gos/go1.5.1/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc82013e290, 0x72, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc82013e290, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).accept(0xc82013e230, 0x0, 0x7f2b18e96788, 0xc820011020)
    /root/.gvm/gos/go1.5.1/src/net/fd_unix.go:408 +0x27c
net.(*TCPListener).AcceptTCP(0xc82013c068, 0x0, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/net/tcpsock_posix.go:254 +0x4d
net.(*TCPListener).Accept(0xc82013c068, 0x0, 0x0, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/net/tcpsock_posix.go:264 +0x3d
github.com/influxdb/influxdb/tcp.(*Mux).Serve(0xc820139100, 0x7f2b160000f0, 0xc82013c068, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tcp/mux.go:48 +0x4b
created by github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:349 +0xafb

goroutine 35 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc82013c070, 0x0, 0x0, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x6f
github.com/influxdb/influxdb/meta.(*raftLayer).Accept(0xc8201954a0, 0x0, 0x0, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/store.go:2066 +0x51
github.com/hashicorp/raft.(*NetworkTransport).listen(0xc82013e2a0)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/net_transport.go:346 +0x79
created by github.com/hashicorp/raft.NewNetworkTransport
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/net_transport.go:138 +0x284

goroutine 36 [select]:
github.com/hashicorp/raft.(*Raft).leaderLoop(0xc8201ae000)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:849 +0xcce
github.com/hashicorp/raft.(*Raft).runLeader(0xc8201ae000)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:821 +0x936
github.com/hashicorp/raft.(*Raft).run(0xc8201ae000)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:602 +0xbb
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.run)-fm()
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:252 +0x20
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc8201ae000, 0xc8201a68d0)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/state.go:152 +0x44
created by github.com/hashicorp/raft.(*raftState).goFunc
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/state.go:153 +0x4d

goroutine 37 [select]:
github.com/hashicorp/raft.(*Raft).runFSM(0xc8201ae000)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:509 +0x10eb
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runFSM)-fm()
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:253 +0x20
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc8201ae000, 0xc8201a68e0)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/state.go:152 +0x44
created by github.com/hashicorp/raft.(*raftState).goFunc
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/state.go:153 +0x4d

goroutine 38 [select]:
github.com/hashicorp/raft.(*Raft).runSnapshots(0xc8201ae000)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:1636 +0x493
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runSnapshots)-fm()
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/raft.go:254 +0x20
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc8201ae000, 0xc8201a68f0)
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/state.go:152 +0x44
created by github.com/hashicorp/raft.(*raftState).goFunc
    /tmp/tmp.52T7DyVJmO/src/github.com/hashicorp/raft/state.go:153 +0x4d

goroutine 39 [select]:
github.com/influxdb/influxdb/meta.(*localRaft).logLeaderChanges(0xc820146d20)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/state.go:187 +0x4f5
created by github.com/influxdb/influxdb/meta.(*localRaft).open
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/state.go:177 +0x1344

goroutine 40 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc82013c078, 0x0, 0x0, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x6f
github.com/influxdb/influxdb/meta.(*Store).serveExecListener(0xc820168280)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/store.go:622 +0xa2
created by github.com/influxdb/influxdb/meta.(*Store).Open
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/store.go:239 +0x364

goroutine 41 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc82013c080, 0x0, 0x0, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x6f
github.com/influxdb/influxdb/meta.(*Store).serveRPCListener(0xc820168280)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/store.go:726 +0xa2
created by github.com/influxdb/influxdb/meta.(*Store).Open
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/store.go:242 +0x3b6

goroutine 7 [select]:
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).monitorErrorChan(0xc82016b560, 0xc820136480)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:495 +0x145
created by github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:355 +0xc7f

goroutine 8 [select]:
github.com/influxdb/influxdb/meta.(*Store).WaitForLeader(0xc820168280, 0x6fc23ac00, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/meta/store.go:556 +0x3fc
github.com/influxdb/influxdb/monitor.(*Monitor).storeStatistics(0xc82019a000)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/monitor/service.go:341 +0x2dd
created by github.com/influxdb/influxdb/monitor.(*Monitor).Open
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/monitor/service.go:138 +0x349

goroutine 9 [runnable]:
syscall.Syscall(0x107, 0xffffffffffffff9c, 0xc820208540, 0x0, 0x0, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.unlinkat(0xffffffffffffff9c, 0xc820208500, 0x3d, 0x0, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/syscall/zsyscall_linux_amd64.go:102 +0x8c
syscall.Unlink(0xc820208500, 0x3d, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/syscall/syscall_linux.go:81 +0x49
os.Remove(0xc820208500, 0x3d, 0x0, 0x0)
    /root/.gvm/gos/go1.5.1/src/os/file_unix.go:286 +0x3b
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).removeCheckpoint(0xc820222240, 0xc8201d4140, 0x37, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1770 +0x1b4
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).writeIndexAndGetDataFile(0xc820222240, 0xc8202b4018, 0x140b05f0a712c181, 0x140b0612e882cf86, 0xc8202ac000, 0x2c, 0x2c, 0xc8202c60b0, 0x2c, 0x2c, ...)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:765 +0xa62
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).rewriteFile(0xc820222240, 0xc82000d680, 0xc82028e240, 0x0, 0x0)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1149 +0xde9
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write.func1(0xc820222240, 0xc82028aeb0, 0xc82029ce00, 0xc82028aec0, 0xc82000d680, 0xc82028e240)
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:392 +0x3f
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write
    /tmp/tmp.52T7DyVJmO/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:398 +0x69b
@jwilder
Copy link
Contributor

jwilder commented Oct 7, 2015

@pauldix Looks like DecodeSameTypeBlock is being called on an zero length Values slice which is causing the panic. https://github.com/influxdb/influxdb/blob/master/tsdb/engine/tsm1/encoding.go#L88

We could prevent this panic in DecodeSameTypeBlock, but not sure if something in the engine is calling this when it should not.

@syepes
Copy link

syepes commented Oct 8, 2015

I am also facing the same issue.
InfluxDB was running for 12 hours without any issues, it just got the following panic and now it wont start.

2015/10/08 12:40:14 InfluxDB starting, version 0.9, branch unknown, commit 93f1512, built unknown
2015/10/08 12:40:14 Go version go1.5.1, GOMAXPROCS set to 8
2015/10/08 12:40:14 Using configuration at: /opt/influxdb/influxd.conf
[metastore] 2015/10/08 12:40:14 Using data dir: /opt/influxdb/meta
[metastore] 2015/10/08 12:40:14 Node at node:8088 [Follower]
[metastore] 2015/10/08 12:40:14 Skipping cluster join: already member of cluster: nodeId=1 raftEnabled=true peers=[node:8088]
[metastore] 2015/10/08 12:40:15 Node at node:8088 [Leader]. peers=[node:8088]
[metastore] 2015/10/08 12:40:15 Updated node id=1 hostname=node:8088
[monitor] 2015/10/08 12:40:15 Starting monitor system
[monitor] 2015/10/08 12:40:15 'build' registered for diagnostics monitoring
[monitor] 2015/10/08 12:40:15 'runtime' registered for diagnostics monitoring
[monitor] 2015/10/08 12:40:15 'network' registered for diagnostics monitoring
[monitor] 2015/10/08 12:40:15 'system' registered for diagnostics monitoring
[store] 2015/10/08 12:40:15 Using data dir: /opt/influxdb/data
[monitor] 2015/10/08 12:40:15 Storing statistics in database '_internal' retention policy '', at interval 10s
[tsm1wal] 2015/10/08 12:40:15 tsm1 WAL starting with 5242880 flush memory size threshold and 104857600 max memory size threshold
[tsm1wal] 2015/10/08 12:40:15 tsm1 WAL writing to /opt/influxdb/data/ESXi/raw/2
[tsm1wal] 2015/10/08 12:40:15 startup flush of /opt/influxdb/data/ESXi/raw/2 with 0 keys and 0 total values of 0 bytes
[tsm1wal] 2015/10/08 12:40:15 /opt/influxdb/data/ESXi/raw/2 flush to index took 4.539µs
[tsm1wal] 2015/10/08 12:40:19 tsm1 WAL starting with 5242880 flush memory size threshold and 104857600 max memory size threshold
[tsm1wal] 2015/10/08 12:40:19 tsm1 WAL writing to /opt/influxdb/data/ESXi/raw/3
[tsm1wal] 2015/10/08 12:40:19 startup flush of /opt/influxdb/data/ESXi/raw/3 with 0 keys and 0 total values of 0 bytes
[tsm1wal] 2015/10/08 12:40:19 /opt/influxdb/data/ESXi/raw/3 flush to index took 3.631µs
[tsm1wal] 2015/10/08 12:40:21 tsm1 WAL starting with 5242880 flush memory size threshold and 104857600 max memory size threshold
[tsm1wal] 2015/10/08 12:40:21 tsm1 WAL writing to /opt/influxdb/data/ESXi/raw/4
[tsm1wal] 2015/10/08 12:40:24 startup flush of /opt/influxdb/data/ESXi/raw/4 with 8382 keys and 452628 total values of 7242048 bytes
[tsm1] 2015/10/08 12:40:28 rewriting index file /opt/influxdb/data/ESXi/raw/4/0001478.tsm1 with /opt/influxdb/data/ESXi/raw/4/0002062.tsm1
[tsm1] 2015/10/08 12:40:28 rewriting index file /opt/influxdb/data/ESXi/raw/4/0001957.tsm1 with /opt/influxdb/data/ESXi/raw/4/0002063.tsm1
panic: runtime error: index out of range

goroutine 309 [running]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.Values.DecodeSameTypeBlock(0xc822155300, 0x0, 0x10, 0x17586288fa, 0x5f, 0x52e2a58, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/encoding.go:88 +0x284
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).DecodeAndCombine(0xc8311cbd40, 0xc822155300, 0x0, 0x10, 0x17586288fa, 0x5f, 0x52e2a58, 0xc82b165000, 0x0, 0x4e20, ...)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1648 +0xba
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).rewriteFile(0xc8311cbd40, 0xc831900b40, 0xc8318d3cb0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1107 +0x208f
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write.func3(0xc8311cbd40, 0xc8253391b0, 0xc82981eb00, 0xc8253391f0, 0xc831900b40, 0xc8318d3cb0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:415 +0x3f
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:421 +0x95c

goroutine 1 [runnable]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).filterDataBetweenTimes(0xc8311cbd40, 0xc831902030, 0x140b011175838000, 0x140b088b53b19802, 0xc82981eb00)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:942 +0x1de
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write(0xc8311cbd40, 0xc8330d82a0, 0xc8319035c0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:421 +0x8f9
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).flush(0xc8311cbc20, 0x4, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:617 +0xab1
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).readAndFlushWAL(0xc8311cbc20, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:326 +0x1e8
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).Open(0xc8311cbc20, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:155 +0x40c
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Open(0xc8311cbd40, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:261 +0x62a
github.com/influxdb/influxdb/tsdb.(*Shard).Open.func1(0xc82b188000, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/shard.go:123 +0x365
github.com/influxdb/influxdb/tsdb.(*Shard).Open(0xc82b188000, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/shard.go:100 +0x2d
github.com/influxdb/influxdb/tsdb.(*Store).loadShards(0xc820168140, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/store.go:301 +0xd2c
github.com/influxdb/influxdb/tsdb.(*Store).Open(0xc820168140, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/store.go:369 +0x35a
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1(0xc82016b9e0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:365 +0xe08
github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open(0xc82016b9e0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:311 +0x2d
github.com/influxdb/influxdb/cmd/influxd/run.(*Command).Run(0xc820014090, 0xc82008a010, 0x1, 0x1, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/run/command.go:121 +0xe27
main.(*Main).Run(0xc82011bf00, 0xc82008a010, 0x1, 0x1, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:86 +0x6a1
main.main()
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:46 +0x29b

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 20 [syscall]:
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x37

goroutine 24 [select]:
github.com/hashicorp/raft.(*Raft).leaderLoop(0xc8200bc1c0)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:849 +0xcce
github.com/hashicorp/raft.(*Raft).runLeader(0xc8200bc1c0)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:821 +0x936
github.com/hashicorp/raft.(*Raft).run(0xc8200bc1c0)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:602 +0xbb
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.run)-fm()
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:252 +0x20
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc8200bc1c0, 0xc820126620)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/state.go:152 +0x44
created by github.com/hashicorp/raft.(*raftState).goFunc
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/state.go:153 +0x4d

goroutine 23 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc820088088, 0x0, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x6f
github.com/influxdb/influxdb/meta.(*raftLayer).Accept(0xc8201b2150, 0x0, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/meta/store.go:2066 +0x51
github.com/hashicorp/raft.(*NetworkTransport).listen(0xc820071110)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/net_transport.go:346 +0x79
created by github.com/hashicorp/raft.NewNetworkTransport
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/net_transport.go:138 +0x284

goroutine 22 [IO wait]:
net.runtime_pollWait(0x8018c1ff8, 0x72, 0xc8200760a0)
        /usr/local/go/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc820071100, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc820071100, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).accept(0xc8200710a0, 0x0, 0x8018c2238, 0xc820124400)
        /usr/local/go/src/net/fd_unix.go:408 +0x27c
net.(*TCPListener).AcceptTCP(0xc820088080, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:254 +0x4d
net.(*TCPListener).Accept(0xc820088080, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:264 +0x3d
github.com/influxdb/influxdb/tcp.(*Mux).Serve(0xc820073800, 0x801885740, 0xc820088080, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tcp/mux.go:48 +0x4b
created by github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:349 +0xafb

goroutine 25 [select]:
github.com/hashicorp/raft.(*Raft).runFSM(0xc8200bc1c0)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:509 +0x10eb
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runFSM)-fm()
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:253 +0x20
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc8200bc1c0, 0xc820126630)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/state.go:152 +0x44
created by github.com/hashicorp/raft.(*raftState).goFunc
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/state.go:153 +0x4d

goroutine 26 [select]:
github.com/hashicorp/raft.(*Raft).runSnapshots(0xc8200bc1c0)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:1636 +0x493
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.runSnapshots)-fm()
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/raft.go:254 +0x20
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc8200bc1c0, 0xc820126640)
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/state.go:152 +0x44
created by github.com/hashicorp/raft.(*raftState).goFunc
        /opt/gcollectors/src/go/src/github.com/hashicorp/raft/state.go:153 +0x4d

goroutine 27 [select]:
github.com/influxdb/influxdb/meta.(*localRaft).logLeaderChanges(0xc82008c960)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/meta/state.go:187 +0x4f5
created by github.com/influxdb/influxdb/meta.(*localRaft).open
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/meta/state.go:177 +0x1344

goroutine 28 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc820088090, 0x0, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x6f
github.com/influxdb/influxdb/meta.(*Store).serveExecListener(0xc820168280)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/meta/store.go:622 +0xa2
created by github.com/influxdb/influxdb/meta.(*Store).Open
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/meta/store.go:239 +0x364

goroutine 29 [chan receive]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc820088098, 0x0, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x6f
github.com/influxdb/influxdb/meta.(*Store).serveRPCListener(0xc820168280)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/meta/store.go:726 +0xa2
created by github.com/influxdb/influxdb/meta.(*Store).Open
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/meta/store.go:242 +0x3b6


goroutine 51 [select]:
github.com/influxdb/influxdb/cluster.(*PointsWriter).WritePoints(0xc820012f00, 0xc82b5b81c0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:237 +0x586
github.com/influxdb/influxdb/monitor.(*Monitor).storeStatistics(0xc820198000)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/monitor/service.go:379 +0xd7e
created by github.com/influxdb/influxdb/monitor.(*Monitor).Open
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/monitor/service.go:138 +0x349

goroutine 7 [semacquire]:
sync.runtime_Semacquire(0xc82016814c)
        /usr/local/go/src/runtime/sema.go:43 +0x26
sync.(*RWMutex).RLock(0xc820168140)
        /usr/local/go/src/sync/rwmutex.go:36 +0x58
github.com/influxdb/influxdb/tsdb.(*Store).WriteToShard(0xc820168140, 0x5, 0xc825339e70, 0x1, 0x1, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/store.go:380 +0x3c
github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard.func1(0xc820012f00, 0xc82000fec0, 0x9, 0xc82532c490, 0x7, 0xc8322d2240, 0x1, 0x5, 0x1, 0xc825339e70, ...)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:274 +0x108
created by github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:306 +0x1c5

goroutine 6 [select]:
github.com/influxdb/influxdb/cluster.(*PointsWriter).writeToShard(0xc820012f00, 0xc82981ec60, 0xc82000fec0, 0x9, 0xc82532c490, 0x7, 0x1, 0xc825339e70, 0x1, 0x1, ...)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:313 +0x8f8
github.com/influxdb/influxdb/cluster.(*PointsWriter).WritePoints.func1(0xc8322d2180, 0xc820012f00, 0xc82b5b81c0, 0xc82981ec60, 0xc82000fec0, 0x9, 0xc82532c490, 0x7, 0xc825339e70, 0x1, ...)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:232 +0x97
created by github.com/influxdb/influxdb/cluster.(*PointsWriter).WritePoints
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cluster/points_writer.go:233 +0x3ab

goroutine 307 [runnable]:
syscall.Syscall(0x4, 0x1da, 0x16f9bd3055, 0x3d, 0x3d, 0x3d, 0x0)
        /usr/local/go/src/syscall/asm_freebsd_amd64.s:21 +0x5
syscall.write(0x1da, 0x16f9bd3055, 0x3d, 0xc49b3b, 0xc8201f95d8, 0x0, 0x0)
        /usr/local/go/src/syscall/zsyscall_freebsd_amd64.go:1270 +0x5f
syscall.Write(0x1da, 0x16f9bd3055, 0x3d, 0xc49b3b, 0x0, 0x0, 0x0)
        /usr/local/go/src/syscall/syscall_unix.go:176 +0x4d
os.(*File).write(0xc82024ef48, 0x16f9bd3055, 0x3d, 0xc49b3b, 0x0, 0x0, 0x0)
        /usr/local/go/src/os/file_unix.go:232 +0xd0
os.(*File).Write(0xc82024ef48, 0x16f9bd3055, 0x3d, 0xc49b3b, 0x65, 0x0, 0x0)
        /usr/local/go/src/os/file.go:139 +0x8a
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).rewriteFile(0xc8311cbd40, 0xc8318e3f80, 0xc8318d3bf0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1060 +0x13db
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write.func3(0xc8311cbd40, 0xc8253391b0, 0xc82981eb00, 0xc8253391f0, 0xc8318e3f80, 0xc8318d3bf0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:415 +0x3f
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:421 +0x95c

@syepes
Copy link

syepes commented Oct 9, 2015

FYI, I have applied the patch d3a483e and I am still unable to start InflxuDB. Now I am getting and "unable to determine block type" panic.

Is the tsm1 data file corrupted?

[tsm1] 2015/10/09 08:30:08 rewriting index file /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001957.tsm1 with /opt/graphite/storage/influxdb/data/ESXi/raw/4/0002063.tsm1 
panic: unable to determine block type

goroutine 310 [running]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.Values.DecodeSameTypeBlock(0xc829684f00, 0x0, 0x10, 0x17592288fa, 0x5f, 0x52e2a58, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/encoding.go:97 +0xbf
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).DecodeAndCombine(0xc8311b9c20, 0xc829684f00, 0x0, 0x10, 0x17592288fa, 0x5f, 0x52e2a58, 0xc82adf9000, 0x0, 0x4e20, ...)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1648 +0xba
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).rewriteFile(0xc8311b9c20, 0xc8318eeae0, 0xc8318a2900, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1107 +0x208f
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write.func3(0xc8311b9c20, 0xc8262b8770, 0xc8327575a0, 0xc8262b8780, 0xc8318eeae0, 0xc8318a2900)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:415 +0x3f
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:421 +0x95c

goroutine 1 [runnable]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).filterDataBetweenTimes(0xc8311b9c20, 0xc8318f0000, 0x140b088b53b19802, 0x140b011175838000, 0xc8327575a0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:942 +0x1de
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write(0xc8311b9c20, 0xc8319e00c0, 0xc8318f15f0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:412 +0x857
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).flush(0xc8311b9b00, 0x4, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:617 +0xab1
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).readAndFlushWAL(0xc8311b9b00, 0x0, 0x0)

@pauldix
Copy link
Member

pauldix commented Oct 9, 2015

@syepes how big is /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001957.tsm1?

@syepes
Copy link

syepes commented Oct 9, 2015

@pauldix , It not that big, can I just remove this file?

-rw-r--r--  1 root  influxd    83M Oct  8 01:38 /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001957.tsm1

@pauldix
Copy link
Member

pauldix commented Oct 9, 2015

@syepes you can remove the file, but then you'll lose everything that was in it. Can you grep your log for that file name?

Also, would it be possible to zip that up along with the ids, fields, series, and collisions files?

@syepes
Copy link

syepes commented Oct 9, 2015

Ok, no problem. Where should i upload it?

jwilder added a commit that referenced this issue Oct 9, 2015
If DecodeSameTypeBlock is called on on an empty Values slice, it would
panic with an index out of bounds error.  This func can actually be removed
because DecodeBlock can determine what type of values are encoded already.

This will still panic if the block cannot be decoded due to other reasons.

Fixes #4365
@beckettsean
Copy link
Contributor

@syepes you can upload your files using this link: http://cwok.me/cu/ghi4365

@syepes
Copy link

syepes commented Oct 9, 2015

@beckettsean I have just uploaded the requested files
Just started the collecting process using a fresh InfluxDB lets see how long it resists ;-)

@syepes
Copy link

syepes commented Oct 10, 2015

@beckettsean I think this issue is not totally resolved, after starting a fresh InfluxDB (d8d87d9) this morning I have gotten another "index out of range" panic:

[tsm1] 2015/10/10 09:39:09 Compaction of /opt/graphite/storage/influxdb/data/ESXi/raw/4 took 2m41.842953993s
[tsm1] 2015/10/10 09:39:09 rewriting index file /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001538.tsm1 with /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001547.tsm1
[tsm1] 2015/10/10 09:39:10 rewriting index file /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001545.tsm1 with /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001548.tsm1
[tsm1] 2015/10/10 09:39:10 rewriting index file /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001546.tsm1 with /opt/graphite/storage/influxdb/data/ESXi/raw/4/0001549.tsm1
panic: runtime error: index out of range

goroutine 51838 [running]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.Values.MinTime(0xc831f25e80, 0x0, 0x0, 0x140bc1fd7ec37000)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/encoding.go:65 +0x75
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).DecodeAndCombine(0xc852af0240, 0xc831f25e80, 0x0, 0x0, 0x9edd5035a, 0x3c, 0x9d0a0aa, 0xc83e1e2000, 0x0, 0x4e20, ...)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1668 +0x6bd
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).rewriteFile(0xc852af0240, 0xc8451a13e0, 0xc855897560, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1111 +0x208f
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write.func3(0xc852af0240, 0xc856eb0c60, 0xc85e2ea700, 0xc856eb0c70, 0xc8451a13e0, 0xc855897560)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:415 +0x3f
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:421 +0x95c

goroutine 1 [chan receive, 517 minutes]:
main.(*Main).Run(0xc820053f10, 0xc82008a010, 0x1, 0x1, 0x0, 0x0)
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:96 +0xa39
main.main()
        /opt/gcollectors/src/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:46 +0x29b

goroutine 17 [syscall, 517 minutes, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 20 [syscall, 517 minutes]:
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x37
# ls -l /opt/graphite/storage/influxdb/data/ESXi/raw/4/*00015*
-rw-r--r--  1 root  influxd   1.0M Oct 10 02:57 0000157.tsm1
-rw-r--r--  1 root  influxd   6.5M Oct 10 02:57 0000158.tsm1
-rw-r--r--  1 root  influxd    12M Oct 10 02:57 0000159.tsm1
-rw-r--r--  1 root  influxd   5.3M Oct 10 09:24 0001500.tsm1
-rw-r--r--  1 root  influxd    13M Oct 10 09:24 0001501.tsm1
-rw-r--r--  1 root  influxd   5.3M Oct 10 09:27 0001502.tsm1
-rw-r--r--  1 root  influxd    11M Oct 10 09:27 0001503.tsm1
-rw-r--r--  1 root  influxd   1.7M Oct 10 09:29 0001511.tsm1
-rw-r--r--  1 root  influxd   7.1M Oct 10 09:29 0001512.tsm1
-rw-r--r--  1 root  influxd    13M Oct 10 09:29 0001513.tsm1
-rw-r--r--  1 root  influxd    11M Oct 10 09:33 0001540.tsm1
-rw-r--r--  1 root  influxd    13M Oct 10 09:33 0001542.tsm1
-rw-r--r--  1 root  influxd    11M Oct 10 09:34 0001544.tsm1
-rw-r--r--  1 root  influxd   6.6M Oct 10 09:36 0001545.tsm1
-rw-r--r--  1 root  influxd   157M Oct 10 09:39 0001546.tsm1
-rw-r--r--  1 root  influxd   445K Oct 10 09:39 0001547.tsm1
-rw-r--r--  1 root  influxd   2.4M Oct 10 09:39 0001548.tsm1
-rw-r--r--  1 root  influxd     0B Oct 10 09:39 0001548.tsm1.check
-rw-r--r--  1 root  influxd    13K Oct 10 09:39 0001549.tsm1
-rw-r--r--  1 root  influxd     0B Oct 10 09:39 0001549.tsm1.check

@beckettsean
Copy link
Contributor

@jwilder

@beckettsean beckettsean reopened this Oct 10, 2015
@pauldix
Copy link
Member

pauldix commented Oct 10, 2015

This is probably fixed by #4402. Although you'll have to start with fresh data.

@syepes
Copy link

syepes commented Oct 10, 2015

@pauldix Thanks, no problem I will try that out when it gets merged.

@jwilder
Copy link
Contributor

jwilder commented Oct 16, 2015

#4402 was merged.

@daviesalex
Copy link
Contributor

We were seeing this panic fairly frequently; we will test with a new nightly next week and report back.

@Issif
Copy link

Issif commented Oct 19, 2015

Hi,

I still have problem, I tested with

version 0.9.5-nightly-d9f0413, branch master, commit d9f04132ef567bb9671690e4db226ff3dab9feb5, built '2015-10-17T04:00:42+0000'

I'm using a single node with almost 10k points/s.

Thanks.

@pauldix
Copy link
Member

pauldix commented Oct 19, 2015

@Issif did you start with a fresh database?

@Issif
Copy link

Issif commented Oct 19, 2015

I did. I erased everything in /wal and /data as you adviced in your previous answer.

Thanks.

@pauldix
Copy link
Member

pauldix commented Oct 19, 2015

@Issif What does ls look like in the shard directory for the one that's causing a panic? Any chance you could share your files?

@jwilder jwilder removed their assignment Oct 19, 2015
jwilder added a commit that referenced this issue Oct 19, 2015
When rewriting a tsm file, a panice on the Values slice could happen
if there were no values in the slice and the conditions of the rewrite
causes DecodeAndCombine to be called with the empty slice.  This could
happen is the sizes of the points new values was equal to
the MaxPointsInBlock config options and there were no future blocks after
the current one being written.

When this happens, DecodeAndCombine returns a zero length remaining values
slice which is passed back into DecodeAndCombine one last time.  In this case,
we now just return the original block since there is nothing new to combine.

Fixes #4444 #4365
@jwilder
Copy link
Contributor

jwilder commented Oct 19, 2015

Fixed via #4504

@jwilder jwilder closed this as completed Oct 19, 2015
@daviesalex
Copy link
Contributor

FWIW, we upgraded about 8 hours ago and have not had a crash since. We were previously seeing crashes far more frequently than this so #4402 looks good.

We have not yet tried to restart a node, which also triggered this panic. We will try that with #4504 included in the build.

@jwilder
Copy link
Contributor

jwilder commented Oct 19, 2015

@daviesalex Let us know if you hit it after #4504. I was able to reproduce it with #4402 so it was not completely fixed by that patch.

@Issif
Copy link

Issif commented Oct 20, 2015

Hi,

I still have this issue, with

version 0.9.5-nightly-66a31ee, branch master, commit 66a31ee9bcd9b5e0f5cc7a2f0b2ceda17f386e49, built '2015-10-20T04:00:43+0000'

My configuration : https://gist.github.com/Issif/0893c15e2cd75156d694

Error logs : https://gist.github.com/Issif/f9b77a2f2aab812696b2

And a tree in shard directory :

.
└── [4.0K Oct 20 10:18]  default
    ├── [4.0K Oct 20 10:18]  1
    │   ├── [ 18M Oct 20 10:18]  0000001.tsm1
    │   ├── [ 45K Oct 20 10:18]  0000002.tsm1
    │   ├── [5.9K Oct 20 10:18]  0000003.tsm1
    │   ├── [ 36K Oct 20 10:18]  0000004.tsm1
    │   ├── [   0 Oct 20 10:18]  0000004.tsm1.check
    │   ├── [2.1M Oct 20 10:18]  _00007.wal
    │   ├── [2.2M Oct 20 10:18]  _00008.wal
    │   ├── [2.0M Oct 20 10:18]  _00009.wal
    │   ├── [2.2M Oct 20 10:18]  _00010.wal
    │   ├── [2.2M Oct 20 10:18]  _00011.wal
    │   ├── [1.6M Oct 20 10:18]  _00012.wal
    │   ├── [2.1M Oct 20 10:18]  _00013.wal
    │   ├── [1.7M Oct 20 10:18]  _00014.wal
    │   ├── [ 988 Oct 20 10:18]  fields
    │   ├── [ 14M Oct 20 10:18]  ids
    │   └── [ 12M Oct 20 10:18]  series
    └── [4.0K Oct 20 10:18]  3
        ├── [5.5K Oct 20 10:18]  0000001.tsm1
        ├── [ 200 Oct 20 10:18]  fields
        ├── [3.0K Oct 20 10:18]  ids
        └── [2.7K Oct 20 10:18]  series

It crashes after few seconds each time (started with no data again).

@jwilder
Copy link
Contributor

jwilder commented Oct 20, 2015

@lssif looks like a slightly different panic this time. I'll take a look. Thanks.

@daviesalex
Copy link
Contributor

Overnight we saw the panic that you all expected (in tsm1.Values.MinTime), will upgrade and re-test. Will test with #4504 included.

[http] 2015/10/19 17:27:14 7.132.32.32 - - [19/Oct/2015:17:27:14 -0500] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 8c86118e-76b0-11e5-afd7-000000000000 46.168465ms
[tcp] 2015/10/19 17:27:14 accept remote connection from 7.128.12.181:36706
[tsm1] 2015/10/19 17:27:15 rewriting index file /data/influxdb-data/metrics/telegraf/default/4/0000499.tsm1 with /data/influxdb-data/metrics/telegraf/default/4/0000500.tsm1
[http] 2015/10/19 17:27:15 7.132.32.32 - - [19/Oct/2015:17:27:13 -0500] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 8bc45b2c-76b0-11e5-adfc-000000000000 1.779592096s
panic: runtime error: index out of range

goroutine 32839546 [running]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.Values.MinTime(0xd15e604380, 0x0, 0x0, 0x140eb36458c64c00)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/encoding.go:65 +0x75
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).DecodeAndCombine(0xc8201dcd80, 0xd15e604380, 0x0, 0x0, 0x7ffcabc2d282, 0x2f, 0x979734a5, 0xd43613c000, 0x0, 0x4e20, ...)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1668 +0x6bd
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).rewriteFile(0xc8201dcd80, 0xd94bd45ce0, 0xcb903de420, 0x0, 0x0)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:1111 +0x220b
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write.func3(0xc8201dcd80, 0xd6971943e0, 0xcaf7fa3ca0, 0xd6971943f0, 0xd94bd45ce0, 0xcb903de420)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:415 +0x3f
created by github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:421 +0x95c

goroutine 1 [chan receive, 427 minutes]:
main.(*Main).Run(0xc8206eff10, 0xc82000a060, 0x4, 0x4, 0x0, 0x0)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/cmd/influxd/main.go:96 +0xa39
main.main()
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/cmd/influxd/main.go:46 +0x29b

goroutine 17 [syscall, 427 minutes, locked to thread]:
runtime.goexit()
        /root/.gvm/gos/go1.5.1/src/runtime/asm_amd64.s:1696 +0x1

goroutine 5 [syscall, 427 minutes]:
os/signal.loop()
        /root/.gvm/gos/go1.5.1/src/os/signal/signal_unix.go:22 +0x18
created by os/signal.init.1
        /root/.gvm/gos/go1.5.1/src/os/signal/signal_unix.go:28 +0x37

goroutine 34 [chan receive, 81 minutes]:
github.com/influxdb/influxdb/tcp.(*listener).Accept(0xc8201760a8, 0x0, 0x0, 0x0, 0x0)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/tcp/mux.go:129 +0x6f
github.com/influxdb/influxdb/meta.(*Store).serveExecListener(0xc8201aa280)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/meta/store.go:622 +0xa2
created by github.com/influxdb/influxdb/meta.(*Store).Open
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/meta/store.go:239 +0x364

goroutine 19 [runnable]:
net.runtime_pollWait(0x7ffff4f7b430, 0x72, 0xc820010160)
        /root/.gvm/gos/go1.5.1/src/runtime/netpoll.go:157 +0x60
net.(*pollDesc).Wait(0xc820198290, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.5.1/src/net/fd_poll_runtime.go:73 +0x3a
net.(*pollDesc).WaitRead(0xc820198290, 0x0, 0x0)
        /root/.gvm/gos/go1.5.1/src/net/fd_poll_runtime.go:78 +0x36
net.(*netFD).accept(0xc820198230, 0x0, 0x7ffff6fc1760, 0xd8977cce60)
        /root/.gvm/gos/go1.5.1/src/net/fd_unix.go:408 +0x27c
net.(*TCPListener).AcceptTCP(0xc820176098, 0xcf37e5eec8, 0x0, 0x0)
        /root/.gvm/gos/go1.5.1/src/net/tcpsock_posix.go:254 +0x4d
net.(*TCPListener).Accept(0xc820176098, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.5.1/src/net/tcpsock_posix.go:264 +0x3d
github.com/influxdb/influxdb/tcp.(*Mux).Serve(0xc8201734c0, 0x7ffff4f7a488, 0xc820176098, 0x0, 0x0)
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/tcp/mux.go:48 +0x4b
created by github.com/influxdb/influxdb/cmd/influxd/run.(*Server).Open.func1
        /tmp/tmp.Zr5dXhOkZV/src/github.com/influxdb/influxdb/cmd/influxd/run/server.go:365 +0xafb

@otoolep
Copy link
Contributor

otoolep commented Oct 20, 2015

#4513 opened to track new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants