[ERROR] thread 'raftstore-4' panicked '[region 2] 6 unexpected raft log index: last_index 522662 < applied_index 522679 #4731

Open
lotaku opened this Issue Oct 10, 2017 · 4 comments

@lotaku

lotaku commented Oct 10, 2017

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    My KVM hosts went down last night. When I restarted tikv-server (sh /home/tidb/deploy/scripts/run_tikv.sh), I got the errors below.

  2. What did you expect to see?
    TiKV starts successfully.

  3. What did you see instead?
```
    2017/10/10 11:18:27.905 config.rs:704: [INFO] kernel parameters net.core.somaxconn: 32768
    2017/10/10 11:18:27.905 config.rs:704: [INFO] kernel parameters net.ipv4.tcp_syncookies: 0
    2017/10/10 11:18:27.905 tikv-server.rs:136: [WARN] Limit("kernel parameters vm.swappiness got 30, expect 0")
    2017/10/10 11:18:27.927 util.rs:430: [INFO] connect to PD leader "http://10.3.1.6:2379"
    2017/10/10 11:18:27.927 util.rs:368: [INFO] All PD endpoints are consistent: ["10.3.1.2:2379", "10.3.1.4:2379", "10.3.1.6:2379"]
    2017/10/10 11:18:27.929 tikv-server.rs:478: [INFO] connect to PD cluster 6468143079661514751
    2017/10/10 11:18:27.973 mod.rs:475: [INFO] storage RaftKv started.
    2017/10/10 11:18:27.973 mod.rs:203: [INFO] starting working thread: store address resolve worker
    2017/10/10 11:18:28.017 node.rs:331: [INFO] start raft store 4 thread
    2017/10/10 11:18:28.019 peer.rs:275: [INFO] [region 2] create peer with id 6

2017/10/10 11:20:05.544 panic_hook.rs:99: [ERROR] thread 'raftstore-4' panicked '[region 2] 6 unexpected raft log index: last_index 522662 < applied_index 522679' at "src/raftstore/store/peer_storage.rs:483"
stack backtrace:
0: 0x7f3d91c4f90e - backtrace::backtrace::libunwind::trace
at /home/jenkins/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.2.3/src/backtrace/libunwind.rs:54
- backtrace::backtrace::trace
at /home/jenkins/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.2.3/src/backtrace/mod.rs:70
1: 0x7f3d91c50093 - backtrace::capture::{{impl}}::new
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/target/release/build/backtrace-9ec19ed81b12399b/out/capture.rs:79
2: 0x7f3d91b406c5 - tikv::util::panic_hook::set_exit_hook::{{closure}}
at src/util/panic_hook.rs:98
3: 0x7f3d92238816 - std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:611
4: 0x7f3d922386a4 - std::panicking::begin_panic<alloc::string::String>
at /checkout/src/libstd/panicking.rs:571
5: 0x7f3d922385a9 - std::panicking::begin_panic_fmt
at /checkout/src/libstd/panicking.rs:521
6: 0x7f3d91bab99f - tikv::raftstore::store::peer_storage::{{impl}}::new
at src/raftstore/store/peer_storage.rs:483
7: 0x7f3d9193d28f - tikv::raftstore::store::peer::{{impl}}::new<tikv::server::transport::ServerTransport<tikv::server::transport::ServerRaftStoreRouter, tikv::server::resolve::PdStoreAddrResolver>,tikv::pd::client::RpcClient>
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/raftstore/store/peer.rs:307
8: 0x7f3d91940f3c - tikv::raftstore::store::peer::{{impl}}::create<tikv::server::transport::ServerTransport<tikv::server::transport::ServerRaftStoreRouter, tikv::server::resolve::PdStoreAddrResolver>,tikv::pd::client::RpcClient>
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/raftstore/store/peer.rs:280
9: 0x7f3d9196f8d0 - tikv::raftstore::store::store::{{impl}}::init::{{closure}}<tikv::server::transport::ServerTransport<tikv::server::transport::ServerRaftStoreRouter, tikv::server::resolve::PdStoreAddrResolver>,tikv::pd::client::RpcClient>
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/raftstore/store/store.rs:288
- tikv::raftstore::store::engine::scan_impl
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/raftstore/store/engine.rs:289
- tikv::raftstore::store::engine::Iterable::scan_cf<rocksdb::rocksdb::DB,closure>
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/raftstore/store/engine.rs:265
- tikv::raftstore::store::store::{{impl}}::init<tikv::server::transport::ServerTransport<tikv::server::transport::ServerRaftStoreRouter, tikv::server::resolve::PdStoreAddrResolver>,tikv::pd::client::RpcClient>
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/raftstore/store/store.rs:249
- tikv::raftstore::store::store::{{impl}}::new<tikv::server::transport::ServerTransport<tikv::server::transport::ServerRaftStoreRouter, tikv::server::resolve::PdStoreAddrResolver>,tikv::pd::client::RpcClient>
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/raftstore/store/store.rs:229
10: 0x7f3d918995c9 - tikv::server::node::{{impl}}::start_store::{{closure}}<tikv::pd::client::RpcClient,tikv::server::transport::ServerTransport<tikv::server::transport::ServerRaftStoreRouter, tikv::server::resolve::PdStoreAddrResolver>>
at /home/jenkins/workspace/build_unportable_tikv_master/go/src/github.com/pingcap/tikv/src/server/node.rs:349
- std::sys_common::backtrace::__rust_begin_short_backtrace<closure,()>
at /checkout/src/libstd/sys_common/backtrace.rs:136
11: 0x7f3d918ab02d - std::thread::{{impl}}::spawn::{{closure}}::{{closure}}<closure,()>
at /checkout/src/libstd/thread/mod.rs:364
- std::panic::{{impl}}::call_once<(),closure>
at /checkout/src/libstd/panic.rs:296
- std::panicking::try::do_call<std::panic::AssertUnwindSafe,()>
at /checkout/src/libstd/panicking.rs:479
12: 0x7f3d9223f92c - panic_unwind::__rust_maybe_catch_panic
at /checkout/src/libpanic_unwind/lib.rs:98
13: 0x7f3d91948845 - std::panicking::try<(),std::panic::AssertUnwindSafe>
at /checkout/src/libstd/panicking.rs:458
- std::panic::catch_unwind<std::panic::AssertUnwindSafe,()>
at /checkout/src/libstd/panic.rs:361
- std::thread::{{impl}}::spawn::{{closure}}<closure,()>
at /checkout/src/libstd/thread/mod.rs:363
- alloc::boxed::{{impl}}::call_box<(),closure>
at /checkout/src/liballoc/boxed.rs:682
14: 0x7f3d922373fb - alloc::boxed::{{impl}}::call_once<(),()>
at /checkout/src/liballoc/boxed.rs:692
- std::sys_common::thread::start_thread
at /checkout/src/libstd/sys_common/thread.rs:21
- std::sys::imp::thread::{{impl}}::new::thread_start
at /checkout/src/libstd/sys/unix/thread.rs:84
15: 0x7f3d90e2de24 - start_thread
16: 0x7f3d9094534c - __clone
17: 0x0 -
```

  4. What version of TiDB are you using (tidb-server -V)?
    bin/tidb-server -V
    Release Version: 0.9.0
    Git Commit Hash: fe31f4b
    Git Commit Branch: master
    UTC Build Time: 2017-09-20 01:56:02
@shenli

Member

shenli commented Oct 10, 2017

@lotaku Thanks for your feedback!
@BusyJay PTAL

@BusyJay

Member

BusyJay commented Oct 10, 2017

Because your machine suffered a power failure, some data has been lost.

If the region has only one or two replicas, then in theory the store can be restarted by setting the last index to the applied index. We don't provide a tool to do that yet, but one may be available in the coming weeks.

However, if the region has three or more replicas, it's not safe to edit the last index, since doing so may lead to two leaders in the same term. You can use pd-ctl to delete the failed store and then add a new store. We are also working on support for deleting just one peer from the failed store.
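For illustration, a minimal pd-ctl sketch, assuming you point it at one of your PD endpoints (e.g. 10.3.1.6:2379 from the log above) and look up the failed store's ID first; `<store_id>` is a placeholder, not a value taken from this cluster:

```sh
# Connect the interactive pd-ctl shell to one PD endpoint of this cluster.
pd-ctl -u http://10.3.1.6:2379

# Inside the pd-ctl shell:
store                    # list all stores and note the ID of the failed TiKV
store delete <store_id>  # mark the failed store for removal
```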

Setting the TiKV configuration raftstore.sync-log to true (the default value) prevents this kind of data loss completely, but may hurt performance.
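As a reference, a minimal sketch of how that setting might appear in a tikv.toml (only the relevant lines are shown):

```toml
[raftstore]
# Sync the raft log to disk before acknowledging a write; with this off,
# a power failure can drop recently written log entries.
sync-log = true
```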

@lotaku

lotaku commented Oct 10, 2017

@BusyJay Thanks for your quick reply!

I set raftstore.sync-log from false to true on my three TiKV servers (10.3.1.1, 10.3.1.3, 10.3.1.5), but I still get the same error.

  • command to start TiKV: sh /home/tidb/deploy/scripts/run_tikv.sh
  • file edited: /home/tidb/deploy/conf/tikv.toml
  • the relevant part of my inventory is below:

```ini
# TiDB Cluster Part

[tidb_servers]
10.3.1.2
10.3.1.6

[tikv_servers]
10.3.1.1
10.3.1.3
10.3.1.5

[pd_servers]
10.3.1.2
10.3.1.4
10.3.1.6

[spark_master]

[spark_slaves]

# Monitoring Part

[monitoring_servers]
10.4.1.1

[grafana_servers]
10.4.1.1
```

@BusyJay

Member

BusyJay commented Oct 10, 2017

Setting it to true can prevent data loss in the first place, but it can't do anything for your current situation, since the loss has already happened.

@morgo morgo added the question label Oct 29, 2018
