Does the set-lock implementation in ceph_lock_op2 have a bug? #205

JYang1986 · 2017-09-06T06:52:34Z

node1: From client1, I set a byte-range lock on 1.txt in the Ceph cluster through the nfs-ganesha1 server.
node2: Then, from client2 on node2, setting a lock on the same range of 1.txt through the nfs-ganesha2 server also succeeds.
I then looked at the implementation of ceph_lock_op2, and it differs somewhat from glusterfs_lock_op2.

ceph_lock_op2

fsal_status_t ceph_lock_op2(struct fsal_obj_handle *obj_hdl,
			    struct state_t *state,
			    void *owner,
			    fsal_lock_op_t lock_op,
			    fsal_lock_param_t *request_lock,
			    fsal_lock_param_t *conflicting_lock)
{
	struct handle *myself = container_of(obj_hdl, struct handle, handle);
	struct flock lock_args;
...
	if (lock_op == FSAL_OP_LOCKT) {
		retval = ceph_ll_getlk(export->cmount, my_fd, &lock_args,
				       (uint64_t) owner);  
	} else {
		retval = ceph_ll_setlk(export->cmount, my_fd, &lock_args,
				       (uint64_t) owner, false); /* reporter's note: retval is -11 (-EAGAIN) here */
	}

	if (retval < 0) {
		LogDebug(COMPONENT_FSAL,
			 "%s returned %d %s",
			 lock_op == FSAL_OP_LOCKT
				? "ceph_ll_getlk" : "ceph_ll_setlk",
			 -retval, strerror(-retval));

		if (conflicting_lock != NULL) {
			/* Get the conflicting lock */
			retval = ceph_ll_getlk(export->cmount, my_fd,
					       &lock_args, (uint64_t) owner); /* reporter's note: this getlk succeeds, so retval is overwritten with 0 */

			if (retval < 0) {
				LogCrit(COMPONENT_FSAL,
					"After failing a lock request, I couldn't even get the details of who owns the lock, error %d %s",
					-retval, strerror(-retval));
				goto err; /* reporter's note: the set-lock failed, yet the function returns 0; is this OK? */
			}

			if (conflicting_lock != NULL) {
				conflicting_lock->lock_length = lock_args.l_len;
				conflicting_lock->lock_start = lock_args.l_start;
				conflicting_lock->lock_type = lock_args.l_type;
			}
		}

		goto err;
	}
....
 err:

	if (closefd)
		(void) ceph_ll_close(myself->export->cmount, my_fd);

	if (has_lock)
		PTHREAD_RWLOCK_unlock(&obj_hdl->obj_lock);

	return ceph2fsal_error(retval);  
}

glusterfs_lock_op2

static fsal_status_t glusterfs_lock_op2(struct fsal_obj_handle *obj_hdl,
					struct state_t *state,
					void *p_owner,
					fsal_lock_op_t lock_op,
					fsal_lock_param_t *request_lock,
					fsal_lock_param_t *conflicting_lock)
{
	struct flock lock_args;
...

	retval = glfs_posix_lock(my_fd.glfd, fcntl_comm, &lock_args);

	if (retval /* && lock_op == FSAL_OP_LOCK */) {
		retval = errno;
		int rc = 0; /* reporter's note: gluster uses a temporary rc to hold the getlk return value */

		LogDebug(COMPONENT_FSAL,
			 "fcntl returned %d %s",
			 retval, strerror(retval));

		if (conflicting_lock != NULL) {
			/* Get the conflicting lock */
			rc = glfs_posix_lock(my_fd.glfd, F_GETLK,
						 &lock_args);

			if (rc) { /* reporter's note: only on error is retval reset from errno */
				retval = errno; /* we lose the initial error */
				LogCrit(COMPONENT_FSAL,
					"After failing a lock request, I couldn't even get the details of who owns the lock.");
				goto err;
			}

			conflicting_lock->lock_length = lock_args.l_len;
			conflicting_lock->lock_start = lock_args.l_start;
			conflicting_lock->lock_type = lock_args.l_type;
		}

		goto err;
	}
...

 err:
	SET_GLUSTER_CREDS(glfs_export, NULL, NULL, 0, NULL);

	if (closefd)
		glusterfs_close_my_fd(&my_fd);

	if (has_lock)
		PTHREAD_RWLOCK_unlock(&obj_hdl->obj_lock);

	return fsalstat(posix2fsal_error(retval), retval);
}


ffilz · 2017-09-06T13:53:35Z

Hmm, I don't see a substantial difference. There is one simplification that could be made to ceph_lock_op2 for readability: the conflicting_lock != NULL test is repeated inside a block that is already guarded by conflicting_lock != NULL (but of course the compiler will have optimized that out...).

There could be a bug in ceph_ll_setlk...

JYang1986 · 2017-09-07T01:05:26Z

@ffilz
I mean that ceph_ll_setlk has a bug. I filed this on the wrong site...

jtlayton · 2017-09-08T14:19:28Z

@Saber-Yang can you open a bug at https://tracker.ceph.com ? I'll plan to look at this soon if so.

JYang1986 · 2017-09-09T03:44:42Z

@jtlayton
Hello jtlayton, I have another question. When the first ceph_ll_getlk fails and conflicting_lock != NULL, why is ceph_ll_getlk called again? Is this required?

JYang1986 · 2017-09-16T01:48:59Z

@jtlayton
Is this a Ceph bug or an nfs-ganesha bug? I have opened a bug:
http://tracker.ceph.com/issues/21413

jtlayton · 2017-09-16T10:57:27Z

On Fri, 2017-09-15 at 18:49 -0700, Saber-Yang wrote: @jtlayton Is this ceph's bug or nfs-ganesha bug? I have open a bug http://tracker.ceph.com/issues/21413

Thanks for opening the bug. I'm not sure yet where the bug is -- we'll need to do some tests with ceph alone to see if the locking there actually works. I'll grab the bug, but no idea when I'll have time to work on it. Thanks, -- Jeff Layton <jlayton@redhat.com>

ffilz · 2017-09-18T13:40:42Z

@jlayton - note that nfs-ganesha src/tools/multilock has ml_cephfs_client.c, which directly drives libcephfs for locking.

jtlayton · 2017-10-06T18:37:11Z

@Saber-Yang was correct initially -- this is a bug in FSAL_CEPH code. The getlk call ends up clobbering the -EAGAIN from the earlier setlk failure. Gerrit review request here:

https://review.gerrithub.io/#/c/381714/

If a lock is denied, the code will call getlk to get the conflicting lock info. That action then clobbers the return code and makes the lock appear to be a success. Also, no need to check conflicting_lock twice here. See: #205 Change-Id: Ibfc8ca92bec84518573f425131ce969479ae15dd Signed-off-by: Jeff Layton <jlayton@redhat.com>
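For readers following along, here is a minimal sketch of the clobbering described above and of one way to avoid it. It is not the actual patch from the Gerrit review. The functions stub_setlk and stub_getlk are hypothetical stand-ins for ceph_ll_setlk and ceph_ll_getlk (the real calls take a mount, a file handle, a struct flock and an owner), and lock_clobbering and lock_preserving are illustrative names only.

```c
#include <errno.h>

/* Hypothetical stubs: the lock request is denied, the conflict lookup works. */
static int stub_setlk(void) { return -EAGAIN; }
static int stub_getlk(void) { return 0; }

/* Pattern from the report: the getlk result is stored in retval,
 * so the -EAGAIN from the denied lock is overwritten with 0. */
int lock_clobbering(int want_conflict)
{
	int retval = stub_setlk();

	if (retval < 0 && want_conflict)
		retval = stub_getlk(); /* original error is lost here */

	return retval;
}

/* Pattern similar to glusterfs_lock_op2: a separate rc holds the getlk
 * result, and retval is replaced only when getlk itself fails. */
int lock_preserving(int want_conflict)
{
	int retval = stub_setlk();

	if (retval < 0 && want_conflict) {
		int rc = stub_getlk();

		if (rc < 0)
			retval = rc; /* getlk failed: report that error instead */
	}

	return retval;
}
```

With these stubs, lock_clobbering(1) returns 0, so the denied lock looks like a success, while lock_preserving(1) returns -EAGAIN.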

JYang1986 · 2017-10-08T07:35:25Z

https://review.gerrithub.io/#/c/381714/ is OK. I will close this issue.

If a lock is denied, the code will call getlk to get the conflicting lock info. That action then clobbers the return code and makes the lock appear to be a success. Also, no need to check conflicting_lock twice here. See: nfs-ganesha/nfs-ganesha#205 Change-Id: Ibfc8ca92bec84518573f425131ce969479ae15dd Signed-off-by: Jeff Layton <jlayton@redhat.com> (cherry picked from commit d9f0536) Resolves: rhbz#1500669

If a lock is denied, the code will call getlk to get the conflicting lock info. That action then clobbers the return code and makes the lock appear to be a success. Also, no need to check conflicting_lock twice here. See: nfs-ganesha#205 Change-Id: Ibfc8ca92bec84518573f425131ce969479ae15dd Signed-off-by: Jeff Layton <jlayton@redhat.com> (cherry picked from commit d9f0536)

JYang1986 closed this as completed Oct 8, 2017
