This repository has been archived by the owner on Oct 7, 2023. It is now read-only.

mesos xchk failure #4

Closed
heyitsanthony opened this issue Nov 17, 2016 · 1 comment


@heyitsanthony

Mesos xchk failure when running goreman -f scripts/Procfile.mesos.xchk start:

$ grep -i xchk zketcd.xchk 
I1116 22:23:19.634919   24947 conn.go:131] sendXchk Xid:1479363800 ZXid:2 Resp:0xc4201ac050
I1116 22:23:19.636225   24947 conn.go:131] sendXchk Xid:1479363801 ZXid:2 Resp:0xc4201993b0
I1116 22:23:19.654726   24947 conn.go:131] sendXchk Xid:1479363802 ZXid:3 Resp:&{Path:/mesos}
I1116 22:23:19.664422   24947 conn.go:131] sendXchk Xid:1479363803 ZXid:4 Resp:0xc4201ac050
I1116 22:23:19.667112   24947 conn.go:131] sendXchk Xid:1479363804 ZXid:4 Resp:&{Children:[]}
I1116 22:23:19.687624   24947 conn.go:112] xchkSendOOB response {Type:4 State:3 Path:/mesos}
W1116 22:23:19.687727   24947 zk.go:326] xchk failed (path mismatch)
I1116 22:23:19.687738   24947 conn.go:131] sendXchk Xid:1479363805 ZXid:5 Resp:&{Path:/mesos/json.info_0000000000}
I1116 22:23:19.693699   24947 conn.go:131] sendXchk Xid:1479363806 ZXid:5 Resp:&{Children:[json.info_0000000000]}
I1116 22:23:19.695883   24947 conn.go:131] sendXchk Xid:1479363807 ZXid:5 Resp:&{Children:[json.info_0000000000]}
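
To make the "path mismatch" warning above easier to read, here is a minimal sketch of the kind of field-by-field comparison a cross-check performs. It is not the code in zk.go or conn.go. The `watcherEvent` type, the `crossCheck` function, and the oracle's path value are hypothetical. Only the field names and the candidate's values are taken from the `xchkSendOOB response {Type:4 State:3 Path:/mesos}` log line.

```go
package main

import "fmt"

// watcherEvent mirrors the fields printed in the xchkSendOOB log line.
// Hypothetical type for illustration only.
type watcherEvent struct {
	Type  int32
	State int32
	Path  string
}

// crossCheck compares the candidate's event with the oracle's event and
// reports the first field that differs. Hypothetical helper.
func crossCheck(candidate, oracle watcherEvent) error {
	switch {
	case candidate.Type != oracle.Type:
		return fmt.Errorf("xchk failed (type mismatch): %d vs %d", candidate.Type, oracle.Type)
	case candidate.State != oracle.State:
		return fmt.Errorf("xchk failed (state mismatch): %d vs %d", candidate.State, oracle.State)
	case candidate.Path != oracle.Path:
		return fmt.Errorf("xchk failed (path mismatch): %q vs %q", candidate.Path, oracle.Path)
	}
	return nil
}

func main() {
	// Candidate values come from the log; the oracle path is made up.
	candidate := watcherEvent{Type: 4, State: 3, Path: "/mesos"}
	oracle := watcherEvent{Type: 4, State: 3, Path: "/other"}
	if err := crossCheck(candidate, oracle); err != nil {
		fmt.Println(err)
	}
}
```

The log does not show which path the other side reported, so the sketch cannot say which of the two responses was wrong.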
@wegel

wegel commented Nov 17, 2016

Without -zkbridge and -oracle, e.g.:

etcd: killall -9 etcd; rm -rf *.etcd && etcd --name infra1 --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:2380 --initial-advertise-peer-urls http://127.0.0.1:2380 --initial-cluster-token etcd-cluster-1 --initial-cluster 'infra1=http://127.0.0.1:2380' --initial-cluster-state new --enable-pprof

zketcd: killall -9 zetcd; sleep 3s && ./cmd/zetcd/zetcd -endpoint http://localhost:2379 -zkaddr 127.0.0.1:2181 -logtostderr -v 10 2>zketcd.xchk

mesos: docker kill `docker ps | egrep "mesos" | awk ' { print $1 } '`; sleep 10s; docker run -d --net=host -e MESOS_PORT=5050 -e MESOS_ZK=zk://localhost:2181/mesos -e MESOS_QUORUM=1 -e MESOS_REGISTRY=in_memory -e MESOS_LOG_DIR=/var/log/mesos -e MESOS_WORK_DIR=/var/tmp/mesos -v "$(pwd)/log/mesos:/var/log/mesos" -v "$(pwd)/tmp/mesos:/var/tmp/mesos" mesosphere/mesos-master:1.1.01.1.0-2.0.107.ubuntu1404

mesos-master gives:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1117 16:08:19.960755     1 main.cpp:263] Build: 2016-11-16 01:30:23 by ubuntu
I1117 16:08:19.960856     1 main.cpp:264] Version: 1.1.0
I1117 16:08:19.960861     1 main.cpp:267] Git tag: 1.1.0
I1117 16:08:19.960865     1 main.cpp:271] Git SHA: a44b077ea0df54b77f05550979e1e97f39b15873
I1117 16:08:19.962015     1 logging.cpp:194] INFO level logging started!
I1117 16:08:19.962368     1 main.cpp:370] Using 'HierarchicalDRF' allocator
2016-11-17 16:08:19,962:1(0x7f2ff7b59700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-11-17 16:08:19,962:1(0x7f2ff7b59700):ZOO_INFO@log_env@730: Client environment:host.name=beta
2016-11-17 16:08:19,962:1(0x7f2ff6b57700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-11-17 16:08:19,962:1(0x7f2ff6b57700):ZOO_INFO@log_env@730: Client environment:host.name=beta
2016-11-17 16:08:19,962:1(0x7f2ff6b57700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-11-17 16:08:19,962:1(0x7f2ff6b57700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-47-generic
2016-11-17 16:08:19,962:1(0x7f2ff6b57700):ZOO_INFO@log_env@739: Client environment:os.version=#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
2016-11-17 16:08:19,962:1(0x7f2ff7b59700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-11-17 16:08:19,962:1(0x7f2ff7b59700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-47-generic
2016-11-17 16:08:19,962:1(0x7f2ff7b59700):ZOO_INFO@log_env@739: Client environment:os.version=#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016
2016-11-17 16:08:19,963:1(0x7f2ff7b59700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-11-17 16:08:19,963:1(0x7f2ff6b57700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2016-11-17 16:08:19,963:1(0x7f2ff7b59700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-11-17 16:08:19,963:1(0x7f2ff7b59700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-11-17 16:08:19,963:1(0x7f2ff6b57700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-11-17 16:08:19,963:1(0x7f2ff7b59700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f30018bc200 sessionId=0 sessionPasswd=<null> context=0x7f2fc8000a90 flags=0
2016-11-17 16:08:19,963:1(0x7f2ff6b57700):ZOO_INFO@log_env@767: Client environment:user.dir=/
2016-11-17 16:08:19,963:1(0x7f2ff6b57700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f30018bc200 sessionId=0 sessionPasswd=<null> context=0x7f2fd40013b0 flags=0
2016-11-17 16:08:19,963:1(0x7f2fdf7fe700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
2016-11-17 16:08:19,963:1(0x7f2fdffff700):ZOO_INFO@check_events@1728: initiated connection to server [127.0.0.1:2181]
I1117 16:08:19.964062     1 master.cpp:380] Master e7ad0e69-673f-48ce-a39b-405a01d5c27a (beta) started on 127.0.1.1:5050
I1117 16:08:19.964079     1 master.cpp:382] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --port="5050" --quiet="false" --quorum="1" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/tmp/mesos" --zk="zk://localhost:2181/mesos" --zk_session_timeout="10secs"
I1117 16:08:19.964226     1 master.cpp:434] Master allowing unauthenticated frameworks to register
I1117 16:08:19.964237     1 master.cpp:448] Master allowing unauthenticated agents to register
I1117 16:08:19.964246     1 master.cpp:462] Master allowing HTTP frameworks to register without authentication
I1117 16:08:19.964269     1 master.cpp:504] Using default 'crammd5' authenticator
W1117 16:08:19.964282     1 authenticator.cpp:512] No credentials provided, authentication requests will be refused
I1117 16:08:19.964403     1 authenticator.cpp:519] Initializing server SASL
2016-11-17 16:08:19,966:1(0x7f2fdffff700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x75cc58730b98ab17, negotiated timeout=10000
I1117 16:08:19.966476     8 group.cpp:340] Group process (zookeeper-group(1)@127.0.1.1:5050) connected to ZooKeeper
I1117 16:08:19.966511     8 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1117 16:08:19.966531     8 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
2016-11-17 16:08:19,967:1(0x7f2fdf7fe700):ZOO_INFO@check_events@1775: session establishment complete on server [127.0.0.1:2181], sessionId=0x75cc58730b98ab19, negotiated timeout=10000
I1117 16:08:19.968055    11 group.cpp:340] Group process (zookeeper-group(2)@127.0.1.1:5050) connected to ZooKeeper
I1117 16:08:19.968083    11 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1117 16:08:19.968093    11 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I1117 16:08:19.968611    14 master.cpp:1951] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I1117 16:08:19.968691    10 contender.cpp:152] Joining the ZK group
F1117 16:08:19.983996     8 group.cpp:636] CHECK_SOME(sequence): Failed to convert '' to number 
*** Check failure stack trace: ***
I1117 16:08:19.988173    13 detector.cpp:152] Detected a new leader: (id='0')
I1117 16:08:19.988421    14 group.cpp:697] Trying to get '/mesos/json.info_0000000000' in ZooKeeper
    @     0x7f3001e8b15d  google::LogMessage::Fail()
I1117 16:08:19.990743    14 zookeeper.cpp:259] A new leading master (UPID=master@127.0.1.1:5050) is detected
I1117 16:08:19.990936    14 master.cpp:2017] Elected as the leading master!
I1117 16:08:19.990972    14 master.cpp:1560] Recovering from registrar
I1117 16:08:19.993191     9 registrar.cpp:362] Successfully fetched the registry (0B) in 2.102016ms
I1117 16:08:19.993322     9 registrar.cpp:461] Applied 1 operations in 10333ns; attempting to update the registry
    @     0x7f3001e8cf8d  google::LogMessage::SendToLog()
I1117 16:08:19.995750    10 registrar.cpp:506] Successfully updated the registry in 2.32704ms
I1117 16:08:19.995883    10 registrar.cpp:392] Successfully recovered registrar
I1117 16:08:19.996093    10 master.cpp:1676] Recovered 0 agents from the registry (108B); allowing 10mins for agents to re-register
    @     0x7f3001e8ad4c  google::LogMessage::Flush()
    @     0x7f3001e8d889  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f30018c9933  zookeeper::GroupProcess::doJoin()
    @     0x7f30018ca9fd  zookeeper::GroupProcess::join()
    @     0x7f30018db787  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN9zookeeper5Group10MembershipENS5_12GroupProcessERKSsRK6OptionISsESsSC_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSJ_FSH_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
    @     0x7f3001e10451  process::ProcessManager::resume()
    @     0x7f3001e10757  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f300045ca60  (unknown)
    @     0x7f2fffc79184  start_thread
    @     0x7f2fff9a637d  (unknown)
*** Aborted at 1479398900 (unix time) try "date -d @1479398900" if you are using GNU date ***
PC: @     0x7f2fff8e6177 (unknown)
*** SIGSEGV (@0x0) received by PID 1 (TID 0x7f2ff8b5b700) from PID 0; stack trace: ***
    @     0x7f2fffc81330 (unknown)
    @     0x7f2fff8e6177 (unknown)
    @     0x7f3001e93409 google::DumpStackTraceAndExit()
    @     0x7f3001e8b15d google::LogMessage::Fail()
    @     0x7f3001e8cf8d google::LogMessage::SendToLog()
    @     0x7f3001e8ad4c google::LogMessage::Flush()
    @     0x7f3001e8d889 google::LogMessageFatal::~LogMessageFatal()
    @     0x7f30018c9933 zookeeper::GroupProcess::doJoin()
    @     0x7f30018ca9fd zookeeper::GroupProcess::join()
    @     0x7f30018db787 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN9zookeeper5Group10MembershipENS5_12GroupProcessERKSsRK6OptionISsESsSC_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSJ_FSH_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
    @     0x7f3001e10451 process::ProcessManager::resume()
    @     0x7f3001e10757 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f300045ca60 (unknown)
    @     0x7f2fffc79184 start_thread
    @     0x7f2fff9a637d (unknown)

This is similar to what I've been getting on my actual test bed.
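
The fatal `CHECK_SOME(sequence): Failed to convert '' to number` in `GroupProcess::doJoin()` suggests the client tried to read a sequence number out of the path returned by a sequential create and found nothing to parse. Below is a minimal sketch of that kind of parsing, assuming the sequence is whatever remains of the last path element after a `<label>_` prefix is stripped. `sequenceFromPath` is a hypothetical helper, not Mesos or zetcd code. The `json.info` label and the first path come from the logs above; the second path is a made-up input that leaves nothing after the label.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// sequenceFromPath extracts the numeric suffix from the path returned by a
// sequential create, after stripping an optional "<label>_" prefix from the
// last path element. Hypothetical helper for illustration only.
func sequenceFromPath(created, label string) (int32, error) {
	node := path.Base(created)
	if label != "" {
		node = strings.TrimPrefix(node, label+"_")
	}
	n, err := strconv.ParseInt(node, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("failed to convert %q to number: %w", node, err)
	}
	return int32(n), nil
}

func main() {
	paths := []string{
		"/mesos/json.info_0000000000", // path seen in the logs above
		"/mesos/json.info_",           // made-up path with no sequence suffix
	}
	for _, p := range paths {
		seq, err := sequenceFromPath(p, "json.info")
		fmt.Println(p, seq, err)
	}
}
```

Under this assumption, the crash would be explained by a create response whose path carries no sequence digits, but the logs in this comment do not include that response, so this remains a guess.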
