Replace MD5 usage to enable FIPS support #230

Closed
nixpanic opened this Issue Jun 8, 2017 · 51 comments

Comments

@nixpanic
Member

nixpanic commented Jun 8, 2017

When FIPS is enforced on a system, MD5 functions are prevented from being used (calling them will cause a crash). We should consider moving to a stronger hashing algorithm to meet the FIPS standard.

@nixpanic

Member

nixpanic commented Jun 8, 2017

The md5 functions are used in at least two cases:

  1. generation of filenames for sockets that are used by different daemons
  2. rchecksum FOP for self-heal

Neither of these should be much affected by the internal change from MD5 to another hash.

@nixpanic

Member

nixpanic commented Jun 8, 2017

This has also been reported as a bug long ago: Glusterd crashes and core-dumps when starting a volume in FIPS mode.

@nixpanic nixpanic added this to FA: Technical Debt in Focus Areas Jun 8, 2017

@kshlm

Member

kshlm commented Jun 8, 2017

The change would be easy enough. But it would be a backward incompatible change, and would require a restart of the whole cluster and clients to ensure it works correctly.

@atinmu

Contributor

atinmu commented Jun 8, 2017

@kshlm Why do you think we need to take the entire cluster offline in this case? If we upgrade the nodes one after another, all the socket file paths will be regenerated when glusterd restarts. From the glusterd perspective I don't see any issue here, or am I missing something? I'd appreciate it if you could clarify. Thanks!

@kshlm

Member

kshlm commented Jun 8, 2017

AFR compares the checksums returned from the bricks during heal. Using different algorithms will give different hashes even when the data is the same. So a rolling upgrade isn't possible.

I'm guessing AFR would function as long as the replica set returns consistent hashes, so we could possibly do a rolling upgrade by taking down replica sets. But that would affect data availability, so it's better to do a full offline upgrade and avoid any problems.

@christopher-t-mooney

christopher-t-mooney commented Aug 16, 2017

Is there any roadmap for this change to be implemented? I am currently working on a project where FIPS compliance is required and even though the issue can be documented to initially deploy the system, I will need to provide a timeline for rectification of the issue.

If there is no plan to do this currently, could someone point me to where the correction should be made? I may have to fork the code and do our own builds to satisfy this requirement if there are no short term plans to change this problematic behavior.

As another data point, I am currently seeing crashes in glusterd while FIPS mode is on and SSL is enabled when one node has been started and another glusterd node tries to connect to the cluster. The connecting node will always crash while connecting. If I stop all of the daemons and start the daemon on the crashed node, it will start, but then all other node daemons will crash when trying to connect. It would appear to be a crash during the initial SSL connections.

Server node:
[2017-08-16 18:28:46.066381] I [socket.c:351:ssl_setup_connection] 0-socket.management: SSL verification succeeded (client: 10.179.4.2:49151)
[2017-08-16 18:28:46.068091] I [MSGID: 106163] [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31000
[2017-08-16 18:28:46.098663] I [MSGID: 106490] [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 3a4bf869-58a9-4fc6-bc8c-0addb371c7aa
[2017-08-16 18:28:47.369990] E [socket.c:3217:socket_connect] 0-management: connection attempt on failed, (Connection refused)
[2017-08-16 18:28:47.370426] E [socket.c:358:ssl_setup_connection] 0-management: SSL connect error (client: )
[2017-08-16 18:28:47.370448] E [socket.c:2454:socket_poller] 0-management: client setup failed
[2017-08-16 18:28:47.389509] I [MSGID: 106493] [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to 10.179.4.2 (0), ret: 0, op_ret: 0
[2017-08-16 18:28:47.401067] I [socket.c:348:ssl_setup_connection] 0-management: peer CN = 10.179.4.2
[2017-08-16 18:28:47.401083] I [socket.c:351:ssl_setup_connection] 0-management: SSL verification succeeded (client: )
[2017-08-16 18:28:47.439594] I [MSGID: 106492] [glusterd-handler.c:2784:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 3a4bf869-58a9-4fc6-bc8c-0addb371c7aa
[2017-08-16 18:28:47.464547] I [MSGID: 106502] [glusterd-handler.c:2829:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2017-08-16 18:28:47.726262] W [socket.c:593:__socket_rwv] 0-management: readv on 10.179.4.2:24007 failed (No data available)
[2017-08-16 18:28:47.726276] E [socket.c:2567:socket_poller] 0-management: poll error on socket
[2017-08-16 18:28:47.726282] E [socket.c:2567:socket_poller] 0-socket.management: poll error on socket
[2017-08-16 18:28:47.726323] I [MSGID: 106004] [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer <10.179.4.2> (<3a4bf869-58a9-4fc6-bc8c-0addb371c7aa>), in state , has disconnected from glusterd.

Client Node:
[2017-08-16 18:28:45.884020] I [MSGID: 106513] [glusterd-store.c:2197:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 31000
[2017-08-16 18:28:46.033506] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-08-16 18:28:46.033622] I [MSGID: 106498] [glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
[2017-08-16 18:28:46.033671] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-08-16 18:28:46.033691] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-08-16 18:28:46.033760] I [socket.c:4165:socket_init] 0-management: SSL support on the I/O path is ENABLED
[2017-08-16 18:28:46.033765] I [socket.c:4168:socket_init] 0-management: SSL support for glusterd is ENABLED
[2017-08-16 18:28:46.033769] I [socket.c:4185:socket_init] 0-management: using private polling thread
[2017-08-16 18:28:46.036567] E [socket.c:3217:socket_connect] 0-management: connection attempt on failed, (Connection refused)
[2017-08-16 18:28:46.036630] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-08-16 18:28:46.036692] I [socket.c:4165:socket_init] 0-management: SSL support on the I/O path is ENABLED
[2017-08-16 18:28:46.036698] I [socket.c:4168:socket_init] 0-management: SSL support for glusterd is ENABLED
[2017-08-16 18:28:46.036702] I [socket.c:4185:socket_init] 0-management: using private polling thread
[2017-08-16 18:28:46.036855] E [socket.c:358:ssl_setup_connection] 0-management: SSL connect error (client: )
[2017-08-16 18:28:46.036880] E [socket.c:2454:socket_poller] 0-management: client setup failed
[2017-08-16 18:28:46.036623] W [MSGID: 106062] [glusterd-handler.c:3466:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2017-08-16 18:28:46.036921] I [MSGID: 106004] [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer <10.179.4.25> (<5509950c-3670-47dc-8d4b-1b326c2e76a5>), in state , has disconnected from glusterd.
[2017-08-16 18:28:46.037124] E [MSGID: 106155] [glusterd-utils.c:296:glusterd_unlock] 0-management: Cluster lock not held!
[2017-08-16 18:28:46.040347] I [MSGID: 106544] [glusterd.c:158:glusterd_uuid_init] 0-management: retrieved UUID: 3a4bf869-58a9-4fc6-bc8c-0addb371c7aa
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option transport.socket.ssl-enabled on
4: option rpc-auth.auth-glusterfs on
5: option rpc-auth.auth-unix on
6: option rpc-auth.auth-null on
7: option rpc-auth-allow-insecure on
8: option transport.socket.listen-backlog 128
9: option event-threads 1
10: option ping-timeout 0
11: option transport.socket.read-fail-log off
12: option transport.socket.keepalive-interval 2
13: option transport.socket.keepalive-time 10
14: option transport-type rdma
15: option working-directory /var/lib/glusterd
16: end-volume
17:
+------------------------------------------------------------------------------+
[2017-08-16 18:28:46.040562] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-08-16 18:28:46.067861] I [socket.c:348:ssl_setup_connection] 0-management: peer CN = 10.179.4.31
[2017-08-16 18:28:46.067877] I [socket.c:351:ssl_setup_connection] 0-management: SSL verification succeeded (client: )
[2017-08-16 18:28:47.391101] I [MSGID: 106493] [glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 81c51483-6d2a-4d65-8e0d-2306afa2bf1c, host: 10.179.4.31, port: 0
[2017-08-16 18:28:47.402255] I [socket.c:348:ssl_setup_connection] 0-socket.management: peer CN = 10.179.4.31
[2017-08-16 18:28:47.402269] I [socket.c:351:ssl_setup_connection] 0-socket.management: SSL verification succeeded (client: 10.179.4.31:49151)
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2017-08-16 18:28:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.10.3
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f106f63c4d0]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f106f645dd4]
/lib64/libc.so.6(+0x35250)[0x7f106dd20250]
/lib64/libc.so.6(gsignal+0x37)[0x7f106dd201d7]
/lib64/libc.so.6(abort+0x148)[0x7f106dd218c8]
/lib64/libcrypto.so.10(+0x68baf)[0x7f106e114baf]
/lib64/libcrypto.so.10(MD5_Init+0x49)[0x7f106e11b339]
/lib64/libcrypto.so.10(MD5+0x39)[0x7f106e11b379]
/lib64/libglusterfs.so.0(md5_wrapper+0x47)[0x7f106f6411d7]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x47534)[0x7f106417e534]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x11d608)[0x7f1064254608]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x11e780)[0x7f1064255780]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x11f3f5)[0x7f10642563f5]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x12022a)[0x7f106425722a]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x4ef83)[0x7f1064185f83]
/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x62ee4)[0x7f1064199ee4]
/lib64/libglusterfs.so.0(synctask_wrap+0x10)[0x7f106f673750]
/lib64/libc.so.6(+0x46cf0)[0x7f106dd31cf0]

@christopher-t-mooney


christopher-t-mooney commented Sep 14, 2017

@kshlm, I have not heard anything on this issue in nearly a month. Is there any plan to address it? If not, could someone please point me to the right place in the code to fix the problem?

@jduncan-rva


jduncan-rva commented Sep 27, 2017

+1 for a timeline estimate. I have a similar situation involving Gluster as persistent storage on OpenShift.

@kshlm

Member

kshlm commented Sep 27, 2017

@christopher-t-mooney We hope to make this change in the 4.0 release of Gluster. While it might be simple enough to change the hash function, upgrading to a version with a different hash will require a full restart of the cluster. Since upgrading to Gluster 4.0 will require a full cluster restart anyway, it is a good place to make this change.

@aravindavk

Contributor

aravindavk commented Sep 27, 2017

How about providing different hashing functionality for AFR as a volume option? By default it can be MD5 in the 3.x series; if anyone is interested in using a different hash (SHA-256), it can be configured via a config option. A full cluster restart is required only if a different hash is configured. In Gluster 4.0, we can completely remove the MD5 code and support only SHA-256.

In all the other non-crypto use cases, we can migrate to xxhash (already part of Gluster since 3.12, introduced as part of the gfid-to-path feature).

@amarts


Member

amarts commented Sep 27, 2017

@kshlm valid reason! @christopher-t-mooney you can expect it to be 'completely' FIPS compliant by Gluster 4.0.

@aravindavk good idea, but the 3.12 branch wouldn't take any new option. The nearest release vehicle is 3.13, but for that it makes sense to have it as an option.

@itisravi

Contributor

itisravi commented Sep 27, 2017

I think it is better to do it in one shot for 4.0 rather than exposing options that become useless later on, especially since a user might not care about which checksum method is used to calculate what needs to be healed and will only be concerned with the heal itself completing as soon as possible (which is where xxhash might be a good option, since it is faster).

@amarts


Member

amarts commented Sep 27, 2017

Considering we are close to 4.0, I am fine with the proposal. We just need to make sure that the patch gets merged after the release-3.13 branch-out.

Good to start working on it, and keep it ready?

@christopher-t-mooney


christopher-t-mooney commented Sep 27, 2017

Wow! Excellent! Do you have any idea as to a date range for the release of 4.0? Will it be within the next few months, or are we talking 6-12 months?

@amarts


Member

amarts commented Sep 27, 2017

The schedule is updated here: https://www.gluster.org/release-schedule/

Right now, we are speaking of making the bits free of MD5 by 4.0, but if you are not using Replicate, the required changes may come in 3.13 itself.

@christopher-t-mooney


christopher-t-mooney commented Sep 27, 2017

We are using replication rather heavily, so we will wait for 4.0. Thank you all so much for making this change, we really appreciate it. I will get the 4.0 upgrade on our slate for January. Is there any chance that I will be able to get a hold of a nightly build when this is implemented to help with testing?

@amarts


Member

amarts commented Sep 27, 2017

Yes, the patches to resolve it will have a link to this GitHub issue (or we will update here on GitHub once the patch is available). You will have the source to test it at least 1-2 months in advance.

@amarts amarts added this to the Release 4.0 (STM) milestone Sep 27, 2017

@amarts amarts added this to Release 4.0 in Releases Sep 27, 2017

kshlm added a commit to kshlm/glusterd2 that referenced this issue Oct 6, 2017

Replace md5 with xxhash
MD5 will be phased out of Gluster, and xxhash is the most popular
replacement for non-security related uses. So GD2 should as well.

See: gluster/glusterfs#230

@jdarcy jdarcy added the starter task label Nov 11, 2017

@amarts

Member

amarts commented Nov 21, 2017

@itisravi @kotreshhr can we complete this by the 4.0 branch-out? It would be great to do it now.

@kotreshhr

Contributor

kotreshhr commented Nov 21, 2017

Yup. I will take care of the non-AFR changes.

@gluster-ant


Collaborator

gluster-ant commented Feb 20, 2018

A patch https://review.gluster.org/19596 has been posted that references this issue.
Commit message: posix/afr: handle backward compatibility for rchecksum fop

@gluster-ant


Collaborator

gluster-ant commented Feb 21, 2018

A patch https://review.gluster.org/19596 has been posted that references this issue.
Commit message: posix/afr: handle backward compatibility for rchecksum fop

mscherer pushed a commit that referenced this issue Feb 21, 2018

posix/afr: handle backward compatibility for rchecksum fop
Added a volume option 'fips-mode-rchecksum' tied to op version 4.
If not set, rchecksum fop will use MD5 instead of SHA256.

updates: #230
Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
(cherry picked from commit 6daa653)
@ShyamsundarR

Contributor

ShyamsundarR commented Mar 20, 2018

@aravindavk @kotreshhr snapshot still has one pending use of MD5, as noted in the 4.0.0 release notes. Are you chasing this down? If not, we may need to assign this to the appropriate person.

@atinmu (Contributor) commented Jun 24, 2018

> snapview server: gfid generation

svs_uuid_generate () in xlators/features/snapview-server/src/snapview-server-helpers.c has to be modified to use sha256. Is that the only pending work left to bring this issue to closure?

@raghavendrabhat Could you send a patch for this?

ShyamsundarR added this to Release 5 in Releases on Jul 17, 2018

@raghavendrabhat (Member) commented Aug 22, 2018

@atinmu I am working on it. But IIUC sha256 returns a checksum of 32 bytes, while a gfid needs only 16 bytes. Checking with xxh64 as well.

@aravindavk (Contributor) commented Aug 23, 2018

> I am working on it. But IIUC sha256 returns a checksum of 32 bytes and for gfid 16 bytes is needed. Checking with xxh64 as well.

Use xxhash itself, but append the same value to get 16 bytes if the default hash is not sufficient. For example, if the hash is 3922156d896abb4f (8 bytes), append the same hash to get 16 bytes, 3922156d896abb4f3922156d896abb4f, which matches a uuid.

@raghavendrabhat @atinmu @amarts

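As a rough illustration of the append-the-hash suggestion (not the actual glusterfs code; `hashlib.blake2b` is used here only as a stand-in for xxh64, which is not in the Python standard library):

```python
import hashlib
import uuid

def gfid_from_name(name: str) -> uuid.UUID:
    # 8-byte digest standing in for xxh64 (xxhash itself is a third-party module)
    h = hashlib.blake2b(name.encode(), digest_size=8).digest()
    # Append the digest to itself to reach the 16 bytes a uuid requires
    return uuid.UUID(bytes=h + h)

gfid = gfid_from_name("myvol:00000000-0000-0000-0000-000000000001")
print(gfid)  # deterministic and uuid-shaped; first and second 8 bytes are identical
```

The result is predictable for a given input, which is all that is needed here.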
@amarts (Member) commented Aug 23, 2018

Considering it is required only for the lifetime of the mount process, it can work. Also, since all we want is a predictable hash, it is fine by me.

@ShyamsundarR (Contributor) commented Aug 23, 2018

The need is to avoid GFID collision using close to 16 bytes of data (as in the GFID case). The code here used to hash the volname (or snap-volname) + GFID to get the new GFID (@raghavendrabhat correct me if I am wrong). Now using xxhash, it may be better to hash the volname and GFID separately and use that to generate the required 16 bytes.

Even further, to avoid all GFIDs of the same volname having the same 8-byte header in the GFID, is there other information that can be used to fuzz the same? Like 8 bytes of volname+GFID, and the next 8 bytes from GFID+something_else?

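One possible reading of the separate-hash idea, sketched in Python (illustrative only; function names are made up, and `hashlib.blake2b` again stands in for xxh64):

```python
import hashlib
import uuid

def _h8(data: str) -> bytes:
    # 8-byte digest standing in for xxh64
    return hashlib.blake2b(data.encode(), digest_size=8).digest()

def snap_gfid(volname: str, gfid: str) -> uuid.UUID:
    # First 8 bytes from volname+gfid, next 8 from gfid+volname, so
    # two gfids of the same volume do not share a common 8-byte prefix
    return uuid.UUID(bytes=_h8(volname + gfid) + _h8(gfid + volname))
```

Because the gfid feeds into both halves, distinct files of the same volume differ in the leading bytes as well.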
@gluster-ant (Collaborator) commented Aug 24, 2018

A patch https://review.gluster.org/20983 has been posted that references this issue.

features/uss: Use sha256 to generate gfid instead of md5sum

  • This is to ensure FIPS support
  • Also changed the signature of svs_uuid_generate to
    get xlator argument

Change-Id: Ide66573125dd74122430cccc4c4dc2a376d642a2
fixes: #230
Signed-off-by: Raghavendra Manjunath raghavendr@redhat.com

@raghavendrabhat (Member) commented Aug 24, 2018

Patch https://review.gluster.org/#/c/glusterfs/+/20983/ has been submitted for review.

Used sha256 to generate the hash.

@ShyamsundarR (Contributor) commented Aug 24, 2018

> The problem is that xxhash generates only numerals in its hash, not letters. Below is an example of the gfid generated using xxhash:
> [2018-08-22 19:18:26.256380] I [snapview-server-helpers.c:351:svs_uuid_generate] 0-tmp-snapview-server: the uuid generated is 66316361-3639-3232-3065-396338626634
> [2018-08-22 19:18:26.994256] I [snapview-server-helpers.c:351:svs_uuid_generate] 0-tmp-snapview-server: the uuid generated is 37663733-3836-6264-6333-636236636262
> We need some mechanism to generate a gfid that resembles a uuid.

The above happens due to the following:

  • gf_xxh64_wrapper returns the hex representation of the 8-byte hash (hence 16 bytes)
  • This hex representation therefore contains only characters in the range 0-9 and a-f
  • Further converting this using uuid_utoa (or uuid_parse) will hence return a 32-byte character array with just the numbers in the above 16-character (0-9, a-f) range

This is the reason you see the generated GFID in a limited form.

Posting this here for the future, in case others assume the wrapper is limited in some form. The limitation is that it returns the hex representation, not the raw hash of the input string.

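The effect described above can be reproduced in a few lines of Python (illustrative only, with `hashlib.blake2b` standing in for xxh64): feeding the 16 hex *characters* of an 8-byte digest into the uuid, instead of the raw digest bytes, means every uuid byte is the ASCII code of '0'-'9' or 'a'-'f', which is exactly the limited-looking output in the logs.

```python
import hashlib
import uuid

name = b"myvol/some-file"

# 8-byte digest (stand-in for xxh64, which is third-party in Python)
digest = hashlib.blake2b(name, digest_size=8)

# Wrong: use the 16 hex characters as the 16 uuid bytes.
# Every byte is then 0x30-0x39 ('0'-'9') or 0x61-0x66 ('a'-'f').
bad = uuid.UUID(bytes=digest.hexdigest().encode())

# Better: use the raw digest bytes (doubled here to reach 16 bytes)
good = uuid.UUID(bytes=digest.digest() * 2)

print(bad)
print(good)
```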
@amarts (Member) commented Aug 31, 2018

The 'Spec' part for this github issue is already described above, and we are solving it the same way.

There is no 'Doc' for this, other than mentioning in our release notes that we fixed the last of the md5sum usage.

@ShyamsundarR let me know what you think is a good spec and doc for this github issue for assigning the labels.

@ShyamsundarR (Contributor) commented Aug 31, 2018

The release notes were already updated in 4.0, so part or all of the doc is in there.

For the Spec part, I am thinking about how to prevent future usage of such non-compliant code; maybe leave a note in the developer docs in the code base? And, of course, code reviews.

@amarts (Member) commented Aug 31, 2018

> The Spec part I am thinking how to prevent future usage of such non-compliant code, maybe leave a note in the developer docs in the code base? and, of course code reviews.

Reviews are fine... but in the meantime, I am thinking we should get a smoke test for the same?

gluster-ant pushed a commit that referenced this issue Sep 5, 2018

features/uss: Use xxh64 to generate gfid instead of md5sum
* This is to ensure FIPS support
* Also changed the signature of svs_uuid_generate to
  get xlator argument
* Added xxh64 wrapper functions in common-utils to
  generate gfid using xxh64
  - Those wrapper functions can be used by other xlators
    as well to generate gfids using xxh64. But as of now
    snapview-server is going to be the only consumer.

Change-Id: Ide66573125dd74122430cccc4c4dc2a376d642a2
Updates: #230
Signed-off-by: Raghavendra Manjunath <raghavendra@redhat.com>
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
@gluster-ant (Collaborator) commented Sep 10, 2018

A patch https://review.gluster.org/21122 has been posted that references this issue.

features/snapview-server: remove the ret variable

Used only once to store the return value of a function.

Change-Id: Ib2b9db847b5f652ce46f220297d9c7c3503eaea2
fixes: #230
Signed-off-by: Raghavendra Bhat raghavendra@redhat.com

amarts added a commit to amarts/glusterfs that referenced this issue Sep 11, 2018

fips: Replace md5sum usage to enable fips support
md5sum is not fips compliant. Using xxhash64 instead of
md5sum for socket file generation in glusterd and
changelog to enable fips support.

NOTE: md5sum is 128 bit hash. xxhash used is 64 bit.

Updates: gluster#230
Change-Id: I1bf2ea05905b9151cd29fa951f903685ab0dc84c
Signed-off-by: Kotresh HR <khiremat@redhat.com>

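The socket-file scheme the commit above describes can be sketched roughly as follows (illustrative Python only; the paths and identifier are made up, and `hashlib.blake2b` stands in for xxhash64, which glusterd uses in C):

```python
import hashlib
import os

def socket_path(run_dir: str, daemon_id: str) -> str:
    # A 64-bit digest rendered as 16 hex chars keeps the unix-socket
    # path short and well under the ~108-byte sun_path limit
    h = hashlib.blake2b(daemon_id.encode(), digest_size=8).hexdigest()
    return os.path.join(run_dir, h + ".socket")

p = socket_path("/var/run/gluster", "gluster/glustershd")
print(p)
```

Note the digest here is 64-bit, matching the commit's remark that xxhash64 is weaker than the 128-bit MD5 it replaces; for naming socket files that is sufficient.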
amarts added a commit to amarts/glusterfs that referenced this issue Sep 11, 2018

rchecksum/fips: Replace MD5 usage to enable fips support
rchecksum uses MD5 which is not fips compliant. Hence
using sha256 for the same.

Updates: gluster#230
Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278
Signed-off-by: Kotresh HR <khiremat@redhat.com>

amarts added a commit to amarts/glusterfs that referenced this issue Sep 11, 2018

fips/geo-rep: Replace MD5 with SHA256
MD5 is not fips compliant. Hence replacing
with SHA256.

NOTE:
The hash is used to form the ctl_path for the ssh connection.
The length of ctl_path for ssh connection should not be > 108.
ssh fails with ctl_path too long if it is so. But when rsync
is piped to ssh, it is not taking > 90. rsync is failing with
error number 12. Hence using first 32 bytes of hash. Hash
collision doesn't matter as only one sock file is created
per directory.

Change-Id: I58aeb32a80b5422f6ac0188cf33fbecccbf08ae7
Updates: gluster#230
Signed-off-by: Kotresh HR <khiremat@redhat.com>

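Since geo-rep is Python, the truncation described in the note above can be sketched directly (illustrative; the function name and the URL format are assumptions, not the actual geo-rep code):

```python
import hashlib

def ctl_path_suffix(url: str) -> str:
    # A full sha256 hexdigest is 64 chars; the commit notes the ssh
    # ctl_path must stay short, so only the first 32 chars are kept.
    # Truncation is acceptable here: only one sock file is created
    # per directory, so a hash collision has no practical effect.
    return hashlib.sha256(url.encode()).hexdigest()[:32]

suffix = ctl_path_suffix("ssh://geoaccount@remote:gluster://localhost:slavevol")
print(suffix, len(suffix))
```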
amarts added a commit to amarts/glusterfs that referenced this issue Sep 11, 2018

posix/afr: handle backward compatibility for rchecksum fop
Added a volume option 'fips-mode-rchecksum' tied to op version 4.
If not set, rchecksum fop will use MD5 instead of SHA256.

updates: gluster#230
Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2
Signed-off-by: Ravishankar N <ravishankar@redhat.com>

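The backward-compatibility scheme above amounts to selecting the checksum algorithm by the volume option (illustrative Python, not the actual posix xlator code; the function signature is an assumption, and note that on a FIPS-enforced system the MD5 branch itself may be unavailable):

```python
import hashlib

def rchecksum(data: bytes, fips_mode_rchecksum: bool) -> bytes:
    # With 'fips-mode-rchecksum' unset (older clusters), fall back to
    # MD5 so old and new bricks compute comparable checksums; with it
    # set, use the FIPS-allowed SHA256.
    algo = hashlib.sha256 if fips_mode_rchecksum else hashlib.md5
    return algo(data).digest()

print(len(rchecksum(b"block", False)))  # 16-byte MD5 digest
print(len(rchecksum(b"block", True)))   # 32-byte SHA256 digest
```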
amarts added a commit to amarts/glusterfs that referenced this issue Sep 11, 2018

features/uss: Use xxh64 to generate gfid instead of md5sum
* This is to ensure FIPS support
* Also changed the signature of svs_uuid_generate to
  get xlator argument
* Added xxh64 wrapper functions in common-utils to
  generate gfid using xxh64
  - Those wrapper functions can be used by other xlators
    as well to generate gfids using xxh64. But as of now
    snapview-server is going to be the only consumer.

Change-Id: Ide66573125dd74122430cccc4c4dc2a376d642a2
Updates: gluster#230
Signed-off-by: Raghavendra Manjunath <raghavendra@redhat.com>
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>

amarts added a commit to amarts/glusterfs that referenced this issue Sep 11, 2018

features/snapview-server: remove the ret variable
Used only once to store the return value of a function.

Change-Id: Ib2b9db847b5f652ce46f220297d9c7c3503eaea2
fixes: gluster#230
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>