Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dentry fop serializer xlator on brick stack #397

Closed
raghavendrahg opened this issue Jan 22, 2018 · 4 comments
Closed

Dentry fop serializer xlator on brick stack #397

raghavendrahg opened this issue Jan 22, 2018 · 4 comments

Comments

@raghavendrahg
Copy link
Member

raghavendrahg commented Jan 22, 2018

Problems addressed by this xlator :

  • To prevent race between parallel operations that create a new inode and a lookup on the dentry pointing to inode while creating or healing gfid handle.
    Fops like mkdir/create/symlink/mknod that create a new gfid handle and lookup that heals a gfid handle (by creating a new one if gfid is not present) must be serialized to ensure atomicity while creating or healing gfid handles. Consider a fresh lookup to find existence of a path whose gfid is not set yet (the inode is being created). If we let fresh lookup to assign a gfid, we would end up with a random gfid. However, this is not acceptable when the same inode is created on other bricks by cluster xlators like afr, dht, disperse. Currently storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (create class of fops and lookup), we eliminate the race altogether and hence we don't need heuristic based decision making.

  • Staleness of dentries
    This causes exponential increase in inode-linking (with a dentry) time for any inode in the subtree of the directory pointed by stale dentry. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially.

    Cause : Stale dentry is created because of race between two operations:

    • dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create
    • dentry unlinking due to various operations like rmdir, rename, unlink.
  • Non atomicity of operations like rename between modifying path and gfid handle
    Note that in storage/posix, following two operations are not atomic during rename (src, dst):

    • renaming path src to dst
    • associating handle of src - before rename say src-gfid - with dst post rename of path

    So, if there are multiple renames on same file, we might end up having a gfid-handle pointing to a non-existent path. This problem can happen only for directories as handles are hardlinks for regular files. However, this problem is currently solved (even without having DFS) as DHT makes sure there are no parallel renames on same directory. When DFS was initially thought of, this problem was very painful one as DHT lacked synchronization for operations like rename on directories. Documenting this just as a historical tidbit, which may not have much relevance in the present.

Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock.

With this patch, the feature is optional, enable it by running:

gluster volume set $volname features.sdfs enable

Also the feature is tested for a month without issues in the experiemental branch for all the regression.

Some earlier email conversations on this issue can be found here and here.

@raghavendrahg raghavendrahg added this to the Release 4.0 (STM) milestone Jan 22, 2018
@gluster-ant
Copy link
Collaborator

A patch https://review.gluster.org/13451 has been posted that references this issue.
Commit message: dentry fop serializer: added new server side xlator for dentry fop serialization

@ShyamsundarR ShyamsundarR added this to Release 4.0 in Releases Jan 25, 2018
@gluster-ant
Copy link
Collaborator

A patch https://review.gluster.org/19340 has been posted that references this issue.
Commit message: dentry fop serializer: added new server side xlator for dentry fop serialization

mscherer pushed a commit that referenced this issue Jan 30, 2018
…rialization

Problems addressed by this xlator :

[1]. To prevent race between parallel mkdir,mkdir and lookup etc.

Fops like mkdir/create, lookup, rename, unlink, link that happen on a
particular dentry must be serialized to ensure atomicity.

Another possible case can be a fresh lookup to find existance of a path
whose gfid is not set yet. Further, storage/posix employs a ctime based
heuristic 'is_fresh_file' (interval time is less than 1 second of current
time) to check fresh-ness of file. With serialization of these two fops
(lookup & mkdir), we eliminate the race altogether.

[2]. Staleness of dentries

This causes exponential increase in traversal time for any inode in the
subtree of the directory pointed by stale dentry.

Cause :  Stale dentry is created because of following two operations:

      a. dentry creation due to inode_link, done during operations like
         lookup, mkdir, create, mknod, symlink, create and
      b. dentry unlinking due to various operations like rmdir, rename,
         unlink.

       The reason is __inode_link uses __is_dentry_cyclic, which explores
       all possible path to avoid cyclic link formation during inode
       linkage. __is_dentry_cyclic explores stale-dentry(ies) and its
       all ancestors which is increases traversing time exponentially.

Implementation : To acheive this all fops on dentry must take entry locks
before they proceed, once they have acquired locks, they perform the fop
and then release the lock.

Some documentation from email conversation:
[1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html

[2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html

With this patch, the feature is optional, enable it by running:

 `gluster volume set $volname features.sdfs enable`

Also the feature is tested for a month without issues in the
experiemental branch for all the regression.

Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
Fixes: #397
Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
@gluster-ant
Copy link
Collaborator

A patch https://review.gluster.org/19444 has been posted that references this issue.
Commit message: sdfs: crash fixes

@gluster-ant
Copy link
Collaborator

A patch https://review.gluster.org/19445 has been posted that references this issue.
Commit message: sdfs: crash fixes

mscherer pushed a commit that referenced this issue Feb 2, 2018
* from the patch which got tested in experimental branch, there
  was a code cleanup involved, which missed setting of a local
  variable, which led to crash immediately after enabling the
  feature.
* added a sanity test case to validate all the fops of sdfs.

Updates: #397

Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92
Signed-off-by: Amar Tumballi <amarts@redhat.com>
mscherer pushed a commit that referenced this issue Feb 2, 2018
* from the patch which got tested in experimental branch, there
  was a code cleanup involved, which missed setting of a local
  variable, which led to crash immediately after enabling the
  feature.
* added a sanity test case to validate all the fops of sdfs.

Updates: #397

Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92
BUG: 1541117
Signed-off-by: Amar Tumballi <amarts@redhat.com>
gluster-ant pushed a commit that referenced this issue Apr 30, 2018
There have been known races between fops which add a dentry (like
lookup, create, mknod etc) and fops that remove a dentry (like rename,
unlink, rmdir etc) due to which stale dentries are left out in inode
table even though the dentry doesn't exist on backend. For eg.,
consider a lookup (parent/bname) and unlink (parent/bname) racing in
the following order:

* lookup hits storage/posix and finds that dentry exists
* unlink removes the dentry on storage/posix
* unlink reaches protocol/server where the dentry (parent/bname) is
  unlinked from the inode
* lookup reaches protocol/server and creates a dentry (parent/bname)
  on the inode

Now we've a stale dentry (parent/bname) associated with the inode in
itable. This situation is bad for fops like link, create etc which
invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST
even though there is no such dentry on backend fs. This issue can be
solved in two ways:

* Enable "dentry fop serializer" xlator [1].
  # gluster volume set features.sdfs on

* Make sure resolver does a lookup on backend when it finds a dentry
  in itable and validates the state of itable.
   - If a dentry is not found, unlink those stale dentries from itable
     and continue with fop
   - If dentry is found, fail the fop with EEXIST

This patch implements second solution as sdfs is not enabled by
default in brick xlator stack. Once sdfs is enabled by default, this
patch can be reverted.

[1] #397

Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a
updates: bz#1543279
BUG: 1543279
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
…rialization

Problems addressed by this xlator :

[1]. To prevent race between parallel mkdir,mkdir and lookup etc.

Fops like mkdir/create, lookup, rename, unlink, link that happen on a
particular dentry must be serialized to ensure atomicity.

Another possible case can be a fresh lookup to find existance of a path
whose gfid is not set yet. Further, storage/posix employs a ctime based
heuristic 'is_fresh_file' (interval time is less than 1 second of current
time) to check fresh-ness of file. With serialization of these two fops
(lookup & mkdir), we eliminate the race altogether.

[2]. Staleness of dentries

This causes exponential increase in traversal time for any inode in the
subtree of the directory pointed by stale dentry.

Cause :  Stale dentry is created because of following two operations:

      a. dentry creation due to inode_link, done during operations like
         lookup, mkdir, create, mknod, symlink, create and
      b. dentry unlinking due to various operations like rmdir, rename,
         unlink.

       The reason is __inode_link uses __is_dentry_cyclic, which explores
       all possible path to avoid cyclic link formation during inode
       linkage. __is_dentry_cyclic explores stale-dentry(ies) and its
       all ancestors which is increases traversing time exponentially.

Implementation : To acheive this all fops on dentry must take entry locks
before they proceed, once they have acquired locks, they perform the fop
and then release the lock.

Some documentation from email conversation:
[1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html

[2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html

With this patch, the feature is optional, enable it by running:

 `gluster volume set $volname features.sdfs enable`

Also the feature is tested for a month without issues in the
experiemental branch for all the regression.

Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
Fixes: gluster#397
BUG: 1304962
Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
amarts added a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
* from the patch which got tested in experimental branch, there
  was a code cleanup involved, which missed setting of a local
  variable, which led to crash immediately after enabling the
  feature.
* added a sanity test case to validate all the fops of sdfs.

Updates: gluster#397

Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92
Signed-off-by: Amar Tumballi <amarts@redhat.com>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
There have been known races between fops which add a dentry (like
lookup, create, mknod etc) and fops that remove a dentry (like rename,
unlink, rmdir etc) due to which stale dentries are left out in inode
table even though the dentry doesn't exist on backend. For eg.,
consider a lookup (parent/bname) and unlink (parent/bname) racing in
the following order:

* lookup hits storage/posix and finds that dentry exists
* unlink removes the dentry on storage/posix
* unlink reaches protocol/server where the dentry (parent/bname) is
  unlinked from the inode
* lookup reaches protocol/server and creates a dentry (parent/bname)
  on the inode

Now we've a stale dentry (parent/bname) associated with the inode in
itable. This situation is bad for fops like link, create etc which
invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST
even though there is no such dentry on backend fs. This issue can be
solved in two ways:

* Enable "dentry fop serializer" xlator [1].
  # gluster volume set features.sdfs on

* Make sure resolver does a lookup on backend when it finds a dentry
  in itable and validates the state of itable.
   - If a dentry is not found, unlink those stale dentries from itable
     and continue with fop
   - If dentry is found, fail the fop with EEXIST

This patch implements second solution as sdfs is not enabled by
default in brick xlator stack. Once sdfs is enabled by default, this
patch can be reverted.

[1] gluster#397

Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a
updates: bz#1543279
BUG: 1543279
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
guihecheng pushed a commit to guihecheng/glusterfs that referenced this issue Nov 13, 2019
There have been known races between fops which add a dentry (like
lookup, create, mknod etc) and fops that remove a dentry (like rename,
unlink, rmdir etc) due to which stale dentries are left out in inode
table even though the dentry doesn't exist on backend. For eg.,
consider a lookup (parent/bname) and unlink (parent/bname) racing in
the following order:

* lookup hits storage/posix and finds that dentry exists
* unlink removes the dentry on storage/posix
* unlink reaches protocol/server where the dentry (parent/bname) is
  unlinked from the inode
* lookup reaches protocol/server and creates a dentry (parent/bname)
  on the inode

Now we've a stale dentry (parent/bname) associated with the inode in
itable. This situation is bad for fops like link, create etc which
invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST
even though there is no such dentry on backend fs. This issue can be
solved in two ways:

* Enable "dentry fop serializer" xlator [1].
  # gluster volume set features.sdfs on

* Make sure resolver does a lookup on backend when it finds a dentry
  in itable and validates the state of itable.
   - If a dentry is not found, unlink those stale dentries from itable
     and continue with fop
   - If dentry is found, fail the fop with EEXIST

This patch implements second solution as sdfs is not enabled by
default in brick xlator stack. Once sdfs is enabled by default, this
patch can be reverted.

[1] gluster#397

>Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a
>updates: bz#1543279
>BUG: 1543279
>Signed-off-by: Raghavendra G <rgowdapp@redhat.com>

upstream patch: https://review.gluster.org/19732
BUG: 1488120
Change-Id: I78df4f6751e5db1fc659ee15d92b76d1486455f0
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://code.engineering.redhat.com/gerrit/138152
Tested-by: RHGS Build Bot <nigelb@redhat.com>
Reviewed-by: Nithya Balachandran <nbalacha@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants