Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mount hung when use nfs-ganesha+cephfs #414

Closed
huaizong opened this issue Apr 17, 2019 · 4 comments
Closed

mount hung when use nfs-ganesha+cephfs #414

huaizong opened this issue Apr 17, 2019 · 4 comments

Comments

@huaizong
Copy link

huaizong commented Apr 17, 2019

  • nfs-ganesha: V2.7.3
  • ceph-version: v14.2.0
  • mount client os version: 5.0.5-200.fc29.x86_64 Fedora release 29
Thread 248 (Thread 0x7f695c3d2700 (LWP 6715)):
#0  0x00007f696b3cc2fe in pthread_rwlock_wrlock () from /usr/lib64/libpthread.so.0
#1  0x00000000005389a1 in mdcache_lru_unref_chunk (chunk=0x7f6937ca8080, locked=false) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:2075
#2  0x0000000000536d0f in chunk_lru_run_lane (lane=5) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1499
#3  0x000000000053707d in chunk_lru_run (ctx=0x7f695d40ef80) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1553
#4  0x0000000000510136 in fridgethr_start_routine (arg=0x7f695d40ef80) at /www/work/nfs-ganesha/src/support/fridgethr.c:550
#5  0x00007f696b3c8e25 in start_thread () from /usr/lib64/libpthread.so.0
#6  0x00007f696a995bad in clone () from /usr/lib64/libc.so.6

Thread 195 (Thread 0x7f6959549700 (LWP 87469)):
#0  0x00007f696b3cc2fe in pthread_rwlock_wrlock () from /usr/lib64/libpthread.so.0
#1  0x0000000000541380 in mdcache_create_handle (exp_hdl=0x7f695d46e700, fh_desc=0x7f6959545c50, handle=0x7f6959545c48, attrs_out=0x0) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1583
#2  0x0000000000476974 in nfs4_mds_putfh (data=0x7f6959546540) at /www/work/nfs-ganesha/src/Protocols/NFS/nfs4_op_putfh.c:211
#3  0x0000000000476b60 in nfs4_op_putfh (op=0x7f6938140250, data=0x7f6959546540, resp=0x7f69380734a0) at /www/work/nfs-ganesha/src/Protocols/NFS/nfs4_op_putfh.c:281
#4  0x000000000045e45e in nfs4_Compound (arg=0x7f69380c3f08, req=0x7f69380c3800, res=0x7f693822bf40) at /www/work/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:942
#5  0x000000000045b168 in nfs_rpc_process_request (reqdata=0x7f69380c3800) at /www/work/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1328
#6  0x000000000045b8b0 in nfs_rpc_valid_NFS (req=0x7f69380c3800) at /www/work/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1538
#7  0x00007f696cbfe094 in svc_vc_decode (req=0x7f69380c3800) at /www/work/nfs-ganesha/src/libntirpc/src/svc_vc.c:827
#8  0x000000000044e24b in nfs_rpc_decode_request (xprt=0x7f695d4b4c00, xdrs=0x7f6938098180) at /www/work/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#9  0x00007f696cbfdfa5 in svc_vc_recv (xprt=0x7f695d4b4c00) at /www/work/nfs-ganesha/src/libntirpc/src/svc_vc.c:800
#10 0x00007f696cbfa6c4 in svc_rqst_xprt_task (wpe=0x7f695d4b4e18) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:769
#11 0x00007f696cbfab24 in svc_rqst_epoll_events (sr_rec=0x7f695d4ca110, n_events=1) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:941
#12 0x00007f696cbfadbd in svc_rqst_epoll_loop (sr_rec=0x7f695d4ca110) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1014
#13 0x00007f696cbfae73 in svc_rqst_run_task (wpe=0x7f695d4ca110) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1050
#14 0x00007f696cc03856 in work_pool_thread (arg=0x7f69344a0ca0) at /www/work/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#15 0x00007f696b3c8e25 in start_thread () from /usr/lib64/libpthread.so.0
#16 0x00007f696a995bad in clone () from /usr/lib64/libc.so.6

Thread 193 (Thread 0x7f695ad7b700 (LWP 87623)):
#0  0x00007f696b3cf51d in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1  0x00007f696b3cae1b in _L_lock_812 () from /usr/lib64/libpthread.so.0
#2  0x00007f696b3cace8 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#3  0x000000000047ee61 in nfs4_op_sequence (op=0x7f6934177c00, data=0x7f695ad78540, resp=0x7f6934177b60) at /www/work/nfs-ganesha/src/Protocols/NFS/nfs4_op_sequence.c:118
#4  0x000000000045e45e in nfs4_Compound (arg=0x7f6934084f08, req=0x7f6934084800, res=0x7f6934047140) at /www/work/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:942
#5  0x000000000045b168 in nfs_rpc_process_request (reqdata=0x7f6934084800) at /www/work/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1328
#6  0x000000000045b8b0 in nfs_rpc_valid_NFS (req=0x7f6934084800) at /www/work/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1538
#7  0x00007f696cbfe094 in svc_vc_decode (req=0x7f6934084800) at /www/work/nfs-ganesha/src/libntirpc/src/svc_vc.c:827
#8  0x000000000044e24b in nfs_rpc_decode_request (xprt=0x7f695d4b4c00, xdrs=0x7f6934040180) at /www/work/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#9  0x00007f696cbfdfa5 in svc_vc_recv (xprt=0x7f695d4b4c00) at /www/work/nfs-ganesha/src/libntirpc/src/svc_vc.c:800
#10 0x00007f696cbfa6c4 in svc_rqst_xprt_task (wpe=0x7f695d4b4e18) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:769
#11 0x00007f696cbfab24 in svc_rqst_epoll_events (sr_rec=0x7f695d4ca110, n_events=1) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:941
#12 0x00007f696cbfadbd in svc_rqst_epoll_loop (sr_rec=0x7f695d4ca110) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1014
#13 0x00007f696cbfae73 in svc_rqst_run_task (wpe=0x7f695d4ca110) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1050
#14 0x00007f696cc03856 in work_pool_thread (arg=0x7f693719d840) at /www/work/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#15 0x00007f696b3c8e25 in start_thread () from /usr/lib64/libpthread.so.0
#16 0x00007f696a995bad in clone () from /usr/lib64/libc.so.6

15/04/2019 02:10:13 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 99864, old: 99863; deq new: 99862, old: 99862
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[dbus_heartbeat] gsh_dbus_thread :DBUS :EVENT :shutdown
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :MAIN :EVENT :Stopping delayed executor.
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :MAIN :EVENT :Delayed executor stopped.
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :MAIN :EVENT :Stopping state asynchronous request thread
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :THREAD :EVENT :State asynchronous request system shut down.
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :MAIN :EVENT :Unregistering ports used by NFS service
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :MAIN :EVENT :Shutting down RPC services
15/04/2019 09:57:33 : epoch 5cab1a37 : ceph-mon-1 : ganesha.nfsd-3597238[Admin] do_shutdown :MAIN :EVENT :Stopping worker threads
15/04/2019 09:57:33 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167294[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.7.2-ceph.01
15/04/2019 09:57:33 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
15/04/2019 09:57:33 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
15/04/2019 09:57:33 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
15/04/2019 09:57:33 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] rados_ng_init :CLIENT ID :EVENT :Rados kv store init done
15/04/2019 09:57:33 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+ep
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] posix2fsal_error :FSAL :CRIT :Mapping 108(default) to ERR_FSAL_SERVERFAULT
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0)
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
15/04/2019 10:02:13 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[dbus_heartbeat] gsh_dbus_thread :DBUS :EVENT :shutdown
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :MAIN :EVENT :Stopping delayed executor.
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :MAIN :EVENT :Delayed executor stopped.
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :MAIN :EVENT :Stopping state asynchronous request thread
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :THREAD :EVENT :State asynchronous request system shut down.
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :MAIN :EVENT :Unregistering ports used by NFS service
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :MAIN :EVENT :Shutting down RPC services
15/04/2019 11:12:29 : epoch 5cb3e50d : ceph-mon-1 : ganesha.nfsd-4167298[Admin] do_shutdown :MAIN :EVENT :Stopping worker threads
15/04/2019 11:12:36 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175653[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 2.7.2-ceph.01
15/04/2019 11:12:36 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
15/04/2019 11:12:36 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
15/04/2019 11:12:36 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
15/04/2019 11:12:36 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] rados_ng_init :CLIENT ID :EVENT :Rados kv store init done
15/04/2019 11:12:36 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+ep
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] posix2fsal_error :FSAL :CRIT :Mapping 108(default) to ERR_FSAL_SERVERFAULT
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0)
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
15/04/2019 11:12:58 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
15/04/2019 11:13:18 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
15/04/2019 14:14:34 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[dbus_heartbeat] gsh_dbus_thread :DBUS :EVENT :shutdown
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :MAIN :EVENT :Stopping delayed executor.
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :MAIN :EVENT :Delayed executor stopped.
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :MAIN :EVENT :Stopping state asynchronous request thread
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :THREAD :EVENT :State asynchronous request system shut down.
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :MAIN :EVENT :Unregistering ports used by NFS service
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :MAIN :EVENT :Shutting down RPC services
15/04/2019 14:14:35 : epoch 5cb3f6a4 : ceph-mon-1 : ganesha.nfsd-4175656[Admin] do_shutdown :MAIN :EVENT :Stopping worker threads
15/04/2019 14:15:21 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6709[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version V2.7.3-0-g2356c38, built at Apr 15 2019 14:13:47 on ceph-mon-1
15/04/2019 14:15:21 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
15/04/2019 14:15:21 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
15/04/2019 14:15:21 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
15/04/2019 14:15:22 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] rados_ng_init :CLIENT ID :EVENT :Rados kv store init done
15/04/2019 14:15:22 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+ep
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] posix2fsal_error :FSAL :CRIT :Mapping 108(default) to ERR_FSAL_SERVERFAULT
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0)
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
15/04/2019 14:15:48 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
15/04/2019 14:16:58 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
15/04/2019 14:24:33 : epoch 5cb42179 : ceph-mon-1 : ganesha.nfsd-6712[svc_19] posix2fsal_error :FSAL :CRIT :Mapping 108(default) to ERR_FSAL_SERVERFAULT
@huaizong
Copy link
Author

this is full pstack result
pstack.gz

@dang
Copy link
Contributor

dang commented Apr 17, 2019

The relevant thread is this one, which is waiting in Ceph while holding the content_lock:
Thread 197 (Thread 0x7f694dbfa700 (LWP 81856)):
#0 0x00007f696b3cc995 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1 0x00007f69510ef30b in Client::wait_on_context_list(std::list<Context*, std::allocator<Context*> >&) () from /usr/lib64/libcephfs.so.2
#2 0x00007f6951125bb5 in Client::make_request(MetaRequest*, UserPerm const&, boost::intrusive_ptr, bool, int, ceph::buffer::v14_2_0::list*) () from /usr/lib64/libcephfs.so.2
#3 0x00007f695112468a in Client::_getattr(Inode*, int, UserPerm const&, bool) () from /usr/lib64/libcephfs.so.2
#4 0x00007f695113f483 in Client::_ll_getattr(Inode*, int, UserPerm const&) () from /usr/lib64/libcephfs.so.2
#5 0x00007f695113fb11 in Client::ll_getattrx(Inode*, ceph_statx*, unsigned int, unsigned int, UserPerm const&) () from /usr/lib64/libcephfs.so.2
#6 0x00007f69545ef18c in fsal_ceph_ll_getattr (cmount=0x7f69591d4140, in=0x7f694a4ab400, stx=0x7f694dbf63b0, want=8191, creds=0x7f694dbf77d0) at /www/work/nfs-ganesha/src/FSAL/FSAL_CEPH/statx_compat.h:80
#7 0x00007f69545f0cd0 in ceph_fsal_getattrs (handle_pub=0x7f69592b3cc0, attrs=0x7f694dbf64b0) at /www/work/nfs-ganesha/src/FSAL/FSAL_CEPH/handle.c:634
#8 0x000000000053e8be in mdcache_refresh_attrs (entry=0x7f695d427600, need_acl=false, need_fslocations=false, invalidate=true) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:836
#9 0x000000000053ef24 in mdcache_getattrs (obj_hdl=0x7f695d427638, attrs_out=0x7f694dbf6650) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:903
#10 0x0000000000433c7c in populate_dirent (name=0x7f69592843b0 "public", obj=0x7f695d428538, attrs=0x7f695d4285c8, dir_state=0x7f694dbf6a70, cookie=3) at /www/work/nfs-ganesha/src/FSAL/fsal_helper.c:1028
#11 0x0000000000550b86 in mdcache_readdir_chunked (directory=0x7f695d428000, whence=0, dir_state=0x7f694dbf6a70, cb=0x433724 <populate_dirent>, attrmask=122830, eod_met=0x7f694dbf6c9a) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:3135
#12 0x000000000053dfa3 in mdcache_readdir (dir_hdl=0x7f695d428038, whence=0x7f694dbf6a50, dir_state=0x7f694dbf6a70, cb=0x433724 <populate_dirent>, attrmask=122830, eod_met=0x7f694dbf6c9a) at /www/work/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:559
#13 0x0000000000434037 in fsal_readdir (directory=0x7f695d428038, cookie=0, nbfound=0x7f694dbf6c8c, eod_met=0x7f694dbf6c9a, attrmask=122830, cb=0x4793e2 <nfs4_readdir_callback>, opaque=0x7f694dbf6c30) at /www/work/nfs-ganesha/src/FSAL/fsal_helper.c:1164
#14 0x000000000047aa7a in nfs4_op_readdir (op=0x7f69385652e0, data=0x7f694dbf7540, resp=0x7f693856c340) at /www/work/nfs-ganesha/src/Protocols/NFS/nfs4_op_readdir.c:664
#15 0x000000000045e45e in nfs4_Compound (arg=0x7f693848b708, req=0x7f693848b000, res=0x7f693842ea00) at /www/work/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:942
#16 0x000000000045b168 in nfs_rpc_process_request (reqdata=0x7f693848b000) at /www/work/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1328
#17 0x000000000045b8b0 in nfs_rpc_valid_NFS (req=0x7f693848b000) at /www/work/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1538
#18 0x00007f696cbfe094 in svc_vc_decode (req=0x7f693848b000) at /www/work/nfs-ganesha/src/libntirpc/src/svc_vc.c:827
#19 0x000000000044e24b in nfs_rpc_decode_request (xprt=0x7f695d4b4c00, xdrs=0x7f6938406180) at /www/work/nfs-ganesha/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1345
#20 0x00007f696cbfdfa5 in svc_vc_recv (xprt=0x7f695d4b4c00) at /www/work/nfs-ganesha/src/libntirpc/src/svc_vc.c:800
#21 0x00007f696cbfa6c4 in svc_rqst_xprt_task (wpe=0x7f695d4b4e18) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:769
#22 0x00007f696cbfab24 in svc_rqst_epoll_events (sr_rec=0x7f695d4ca110, n_events=1) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:941
#23 0x00007f696cbfadbd in svc_rqst_epoll_loop (sr_rec=0x7f695d4ca110) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1014
#24 0x00007f696cbfae73 in svc_rqst_run_task (wpe=0x7f695d4ca110) at /www/work/nfs-ganesha/src/libntirpc/src/svc_rqst.c:1050
#25 0x00007f696cc03856 in work_pool_thread (arg=0x7f69381388e0) at /www/work/nfs-ganesha/src/libntirpc/src/work_pool.c:181
#26 0x00007f696b3c8e25 in start_thread () from /usr/lib64/libpthread.so.0
#27 0x00007f696a995bad in clone () from /usr/lib64/libc.so.6

I'll ping Jeff to take a look at it.

@jtlayton
Copy link
Contributor

make_request calls wait_on_context_list to wait on the session to the MDS to be established.

errno 108, at least on x86_64 is:

/usr/include/asm-generic/errno.h:#define ESHUTDOWN 108 /* Cannot send after transport endpoint shutdown */

...all of that seems to point toward a problem communicating with the Ceph MDS. You may want to track down the ceph client logs for the ganesha server, and see whether they offer some hint as to what is going wrong.

@huaizong
Copy link
Author

make_request calls wait_on_context_list to wait on the session to the MDS to be established.

errno 108, at least on x86_64 is:

/usr/include/asm-generic/errno.h:#define ESHUTDOWN 108 /* Cannot send after transport endpoint shutdown */

...all of that seems to point toward a problem communicating with the Ceph MDS. You may want to track down the ceph client logs for the ganesha server, and see whether they offer some hint as to what is going wrong.

thank you for reply. The ceph cluster is in unnormal state, maybe it was the root cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants