Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.8.2 crashes soon after starting #1040

Closed
wants to merge 2 commits into from
Closed

v0.8.2 crashes soon after starting #1040

wants to merge 2 commits into from

Conversation

jvshahid
Copy link
Contributor

@dgnorton
Copy link
Contributor Author

To build v0.8.2, I had to...

cd /tmp
rm -rf rocksdb.influxdb.amd64
git clone https://github.com/facebook/rocksdb rocksdb.influxdb.amd64
cd rocksdb.influxdb.amd64
git checkout rocksdb-3.4
make

cd ~/go/src/github.com/influxdb/rocksdb
git checkout a2c58a9

cd ~/go/src/github.com/influxdb/influxdb
git checkout v0.8.2
make build

@dgnorton
Copy link
Contributor Author

+---------------------------------------------+
|  _____        __ _            _____  ____   |
| |_   _|      / _| |          |  __ \|  _ \  |
|   | |  _ __ | |_| |_   ___  _| |  | | |_) | |
|   | | | '_ \|  _| | | | \ \/ / |  | |  _ <  |
|  _| |_| | | | | | | |_| |>  <| |__| | |_) | |
| |_____|_| |_|_| |_|\__,_/_/\_\_____/|____/  |
+---------------------------------------------+

[10/16/14 15:30:09] [INFO] Redirectoring logging to stdout
[10/16/14 15:30:09] [INFO] Starting Influx Server dev bound to 0.0.0.0...
[10/16/14 15:30:09] [INFO] Opening database at /tmp/influxdb/development/db
[10/16/14 15:30:09] [INFO] Opening wal in /tmp/influxdb/development/wal
[10/16/14 15:30:09] [INFO] Ssl will be disabled since the ssl port or certificate path weren't set
[10/16/14 15:30:09] [INFO] Initializing Raft HTTP server
[10/16/14 15:30:09] [INFO] Raft Server Listening at 0.0.0.0:8090
[10/16/14 15:30:09] [INFO] Initializing Raft Server: http://maple:8090
[10/16/14 15:30:09] [INFO] Recovered from log
[10/16/14 15:30:09] [INFO] Waiting for local server to be added
[10/16/14 15:30:10] [INFO] (raft:238de28c887f29ef) Selected as leader. Starting leader loop.
[10/16/14 15:30:11] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:12] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:13] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:14] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:15] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:16] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:17] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:18] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:19] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:20] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:21] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:22] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:23] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:24] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:25] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:26] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:27] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:28] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:29] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:30] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:31] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:32] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:33] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:34] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:35] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:36] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:37] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:38] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:39] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:40] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:41] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:42] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:43] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:44] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:45] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:46] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:47] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:48] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:49] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:50] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:51] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:52] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:53] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:54] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:55] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:56] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:57] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:58] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:30:59] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:00] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:01] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:02] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:03] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:04] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:05] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:06] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:07] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:08] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 15:31:09] [DEBG] Testing if we should compact the raft logs
panic: runtime error: index out of range

goroutine 26 [running]:
runtime.panic(0xbfd4e0, 0x12106dc)
    /home/dgnorton/src/go/src/pkg/runtime/panic.c:279 +0xf5
github.com/influxdb/influxdb/_vendor/raft.(*Log).commitInfo(0xc2080c9500, 0x0, 0x0)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/_vendor/raft/log.go:300 +0x33a
github.com/influxdb/influxdb/_vendor/raft.(*server).TakeSnapshot(0xc20808eea0, 0x0, 0x0)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/_vendor/raft/server.go:1192 +0x2aa
github.com/influxdb/influxdb/coordinator.(*RaftServer).ForceLogCompaction(0xc208023810, 0x0, 0x0)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/coordinator/raft_server.go:334 +0x6a
github.com/influxdb/influxdb/coordinator.(*RaftServer).CompactLog(0xc208023810)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/coordinator/raft_server.go:359 +0x2cf
created by github.com/influxdb/influxdb/coordinator.(*RaftServer).startRaft
    /home/dgnorton/go/src/github.com/influxdb/influxdb/coordinator/raft_server.go:401 +0x487

goroutine 16 [chan receive]:
github.com/influxdb/influxdb/server.(*Server).ListenAndServe(0xc20802c000, 0x0, 0x0)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/server/server.go:93 +0x114
main.main()
    /home/dgnorton/go/src/github.com/influxdb/influxdb/daemon/influxd.go:166 +0xec7

goroutine 19 [finalizer wait, 1 minutes]:
runtime.park(0x50d290, 0x1228400, 0x1212989)
    /home/dgnorton/src/go/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x1228400, 0x1212989)
    /home/dgnorton/src/go/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
    /home/dgnorton/src/go/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
    /home/dgnorton/src/go/src/pkg/runtime/proc.c:1445

goroutine 20 [syscall, 1 minutes]:
os/signal.loop()
    /home/dgnorton/src/go/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
    /home/dgnorton/src/go/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 21 [chan receive, 1 minutes]:
code.google.com/p/log4go.ConsoleLogWriter.run(0xc20806e000, 0x7f9b854e8440, 0xc208038008)
    /home/dgnorton/go/src/code.google.com/p/log4go/termlog.go:27 +0x79
created by code.google.com/p/log4go.NewConsoleLogWriter
    /home/dgnorton/go/src/code.google.com/p/log4go/termlog.go:19 +0x68

goroutine 17 [syscall, 1 minutes]:
runtime.goexit()
    /home/dgnorton/src/go/src/pkg/runtime/proc.c:1445

goroutine 22 [chan receive]:
code.google.com/p/log4go.ConsoleLogWriter.run(0xc20806f760, 0x7f9b854e8440, 0xc208038008)
    /home/dgnorton/go/src/code.google.com/p/log4go/termlog.go:27 +0x79
created by code.google.com/p/log4go.NewConsoleLogWriter
    /home/dgnorton/go/src/code.google.com/p/log4go/termlog.go:19 +0x68

goroutine 23 [chan receive, 1 minutes]:
github.com/influxdb/influxdb/wal.(*WAL).processEntries(0xc208028770)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/wal/wal.go:252 +0x64
created by github.com/influxdb/influxdb/wal.NewWAL
    /home/dgnorton/go/src/github.com/influxdb/influxdb/wal/wal.go:103 +0xa53

goroutine 24 [sleep, 1 minutes]:
time.Sleep(0x8bb2c97000)
    /home/dgnorton/src/go/src/pkg/runtime/time.goc:39 +0x31
github.com/influxdb/influxdb/cluster.func·001()
    /home/dgnorton/go/src/github.com/influxdb/influxdb/cluster/cluster_configuration.go:170 +0x3d
created by github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).CreateFutureShardsAutomaticallyBeforeTimeComes
    /home/dgnorton/go/src/github.com/influxdb/influxdb/cluster/cluster_configuration.go:177 +0x63

goroutine 49 [chan receive, 1 minutes]:
main.waitForSignals(0x7f9b854ea768, 0xc20802c000)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/daemon/null_profiler.go:23 +0x14c
created by main.startProfiler
    /home/dgnorton/go/src/github.com/influxdb/influxdb/daemon/null_profiler.go:15 +0x4b

goroutine 50 [IO wait, 1 minutes]:
net.runtime_pollWait(0x7f9b854eb790, 0x72, 0x0)
    /home/dgnorton/src/go/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc2080c8060, 0x72, 0x0, 0x0)
    /home/dgnorton/src/go/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc2080c8060, 0x0, 0x0)
    /home/dgnorton/src/go/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc2080c8000, 0xd95130, 0x0, 0x7f9b854e82b8, 0xb)
    /home/dgnorton/src/go/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc208038038, 0xc2080c2028, 0x0, 0x0)
    /home/dgnorton/src/go/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc208038038, 0x0, 0x0, 0x0, 0x0)
    /home/dgnorton/src/go/src/pkg/net/tcpsock_posix.go:244 +0x4b
net/http.(*Server).Serve(0xc2080043c0, 0x7f9b854ea7e8, 0xc208038038, 0x0, 0x0)
    /home/dgnorton/src/go/src/pkg/net/http/server.go:1698 +0x91
github.com/influxdb/influxdb/coordinator.func·008()
    /home/dgnorton/go/src/github.com/influxdb/influxdb/coordinator/raft_server.go:579 +0x3a
created by github.com/influxdb/influxdb/coordinator.(*RaftServer).Serve
    /home/dgnorton/go/src/github.com/influxdb/influxdb/coordinator/raft_server.go:583 +0x516

goroutine 25 [select]:
github.com/influxdb/influxdb/_vendor/raft.(*server).leaderLoop(0xc20808eea0)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/_vendor/raft/server.go:834 +0x6c5
github.com/influxdb/influxdb/_vendor/raft.(*server).loop(0xc20808eea0)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/_vendor/raft/server.go:609 +0x3d1
github.com/influxdb/influxdb/_vendor/raft.func·007()
    /home/dgnorton/go/src/github.com/influxdb/influxdb/_vendor/raft/server.go:470 +0x5d
created by github.com/influxdb/influxdb/_vendor/raft.(*server).Start
    /home/dgnorton/go/src/github.com/influxdb/influxdb/_vendor/raft/server.go:471 +0x3b8

goroutine 34 [select]:
github.com/influxdb/influxdb/coordinator.(*RaftServer).raftLeaderLoop(0xc208023810, 0xc209c2e140, 0xc209c2e100)
    /home/dgnorton/go/src/github.com/influxdb/influxdb/coordinator/raft_server.go:460 +0x34c
created by github.com/influxdb/influxdb/coordinator.(*RaftServer).raftEventHandler
    /home/dgnorton/go/src/github.com/influxdb/influxdb/coordinator/raft_server.go:448 +0x268

@dgnorton dgnorton changed the title v0.8.2 server won't start v0.8.2 crashes soon after starting Oct 16, 2014
@dgnorton
Copy link
Contributor Author

Same stack trace with v0.8.3 and master.

@dgnorton
Copy link
Contributor Author

Duplicate of #936.

@dgnorton
Copy link
Contributor Author

With Trace enabled in raft there are a few interesting tidbits that get logged...

[raft]21:43:41.956589 [238de28c887f29ef Term:0] readConf.open  /tmp/influxdb/development/raft/conf
[raft]21:43:41.956659 update.commit.index  0
[raft]21:43:41.956668 log.open.open  /tmp/influxdb/development/raft/log
[raft]21:43:41.956676 log.open.exist  /tmp/influxdb/development/raft/log
[raft]21:43:41.956708 open.log.append log index  1300033
[raft]21:43:41.956723 open.log.append log index  1300034

<snip> thousands of lines </snip>

[raft]21:43:42.616504 open.log.append log index  1369410
[raft]21:43:42.616510 open.log.append log index  1369411
[raft]21:43:42.616517 open.log.append: finish
[raft]21:43:42.616520 open.log.recovery number of log  69379
[raft]21:43:42.616533 [238de28c887f29ef Term:1403] start from previous saved state
[raft]21:43:42.616538 Name: 238de28c887f29ef, State: follower, Term: 1403, CommitedIndex: 0
[10/16/14 21:43:42] [INFO] Recovered from log
[10/16/14 21:43:42] [INFO] Waiting for local server to be added
un  follower
[raft]21:43:44.077757 [238de28c887f29ef Term:1403] server.loop.run  candidate
[raft]21:43:44.077956 [238de28c887f29ef Term:1404] server.candidate.recv.enough.votes
[raft]21:43:44.078063 [238de28c887f29ef Term:1404] server.loop.run  leader
[raft]21:43:44.078096 [238de28c887f29ef Term:1404] leaderLoop.set.PrevIndex to  1369411
[raft]21:43:44.078172 [238de28c887f29ef Term:1404] server.command.process
[raft]21:43:44.078282 raft.Log: Commit index 1369412 set back to  69380
[raft]21:43:44.078451 setCommitIndex.set.result index: 1, entries index: 0
[raft]21:43:44.078509 setCommitIndex.set.result index: 2, entries index: 1

<snip> thousands of lines </snip>

[raft]21:43:44.888281 setCommitIndex.set.result index: 69379, entries index: 69378
[raft]21:43:44.888284 setCommitIndex.set.result index: 69380, entries index: 69379
[raft]21:43:44.888295 [238de28c887f29ef Term:1404] commit index  1369412
[10/16/14 21:43:45] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 21:43:46] [DEBG] (raft:238de28c887f29ef) Executing leader loop.

<snip /> 

[10/16/14 21:44:42] [DEBG] (raft:238de28c887f29ef) Executing leader loop.
[10/16/14 21:44:42] [DEBG] Testing if we should compact the raft logs
[raft]21:44:42.617022 commitInfo.get.[ 1369412 / 0 ]
[10/16/14 21:44:43] [DEBG] (raft:238de28c887f29ef) Executing leader loop.

<snip />

@dgnorton
Copy link
Contributor Author

  • open func in raft/log.go reads each entry from the file and appends it to an array, starting at index 0
  • each entry has an index value stored in it (populated from the log file)
  • processCommand func in raft/server.go does commitIndex := s.log.currentIndex() followed by s.log.setCommitIndex(commitIndex)
  • currentIndex func returns the index value stored in the last entry in the array, which is much larger than the number of entries in the array, so the next time commitInfo is called it panics when using this large value as an index into the entries array

@benbjohnson thoughts? What are the indexes stored in the log?

@benbjohnson
Copy link
Contributor

@dgnorton It sounds like the Log.startIndex is not getting set during snapshot based on this line:

[raft]21:44:42.617022 commitInfo.get.[ 1369412 / 0 ]

The startIndex should represent the index of the first log entry since snapshot (or 0 if no snapshot has been taken).

It looks like the startIndex only ever gets set in two places:

tbh, I didn't work on the snapshotting in go-raft so I'm not as familiar with it. Let me know if this helps or if you need me to dig around further.

// save writes the snapshot to file.
func (ss *Snapshot) save() error {
tx := goback.Begin()
defer tx.Rollback()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is a little bit counter intuitive. The functions passed to tx.Exec are called when the transaction aborts not when the transaction commits. I understand these are cleanup steps, but I think the api should make this obvious, right now i think it's confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants