Dentry fop serializer xlator on brick stack #397

raghavendrahg · 2018-01-22T08:38:46Z

Problems addressed by this xlator :

To prevent race between parallel operations that create a new inode and a lookup on the dentry pointing to inode while creating or healing gfid handle.
Fops like mkdir/create/symlink/mknod that create a new gfid handle and lookup that heals a gfid handle (by creating a new one if gfid is not present) must be serialized to ensure atomicity while creating or healing gfid handles. Consider a fresh lookup to find existence of a path whose gfid is not set yet (the inode is being created). If we let fresh lookup to assign a gfid, we would end up with a random gfid. However, this is not acceptable when the same inode is created on other bricks by cluster xlators like afr, dht, disperse. Currently storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (create class of fops and lookup), we eliminate the race altogether and hence we don't need heuristic based decision making.
Staleness of dentries
This causes exponential increase in inode-linking (with a dentry) time for any inode in the subtree of the directory pointed by stale dentry. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially.

Cause : Stale dentry is created because of race between two operations:
- dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create
- dentry unlinking due to various operations like rmdir, rename, unlink.
Non atomicity of operations like rename between modifying path and gfid handle
Note that in storage/posix, following two operations are not atomic during rename (src, dst):
- renaming path src to dst
- associating handle of src - before rename say src-gfid - with dst post rename of path
So, if there are multiple renames on same file, we might end up having a gfid-handle pointing to a non-existent path. This problem can happen only for directories as handles are hardlinks for regular files. However, this problem is currently solved (even without having DFS) as DHT makes sure there are no parallel renames on same directory. When DFS was initially thought of, this problem was very painful one as DHT lacked synchronization for operations like rename on directories. Documenting this just as a historical tidbit, which may not have much relevance in the present.

Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock.

With this patch, the feature is optional, enable it by running:

gluster volume set $volname features.sdfs enable

Also the feature is tested for a month without issues in the experiemental branch for all the regression.

Some earlier email conversations on this issue can be found here and here.

The text was updated successfully, but these errors were encountered:

gluster-ant · 2018-01-22T09:09:12Z

A patch https://review.gluster.org/13451 has been posted that references this issue.
Commit message: dentry fop serializer: added new server side xlator for dentry fop serialization

gluster-ant · 2018-01-26T11:38:11Z

A patch https://review.gluster.org/19340 has been posted that references this issue.
Commit message: dentry fop serializer: added new server side xlator for dentry fop serialization

…rialization Problems addressed by this xlator : [1]. To prevent race between parallel mkdir,mkdir and lookup etc. Fops like mkdir/create, lookup, rename, unlink, link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case can be a fresh lookup to find existance of a path whose gfid is not set yet. Further, storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (lookup & mkdir), we eliminate the race altogether. [2]. Staleness of dentries This causes exponential increase in traversal time for any inode in the subtree of the directory pointed by stale dentry. Cause : Stale dentry is created because of following two operations: a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create and b. dentry unlinking due to various operations like rmdir, rename, unlink. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially. Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock. Some documentation from email conversation: [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html With this patch, the feature is optional, enable it by running: `gluster volume set $volname features.sdfs enable` Also the feature is tested for a month without issues in the experiemental branch for all the regression. Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b Fixes: #397 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>

gluster-ant · 2018-02-01T17:51:32Z

A patch https://review.gluster.org/19444 has been posted that references this issue.
Commit message: sdfs: crash fixes

gluster-ant · 2018-02-01T18:21:44Z

A patch https://review.gluster.org/19445 has been posted that references this issue.
Commit message: sdfs: crash fixes

* from the patch which got tested in experimental branch, there was a code cleanup involved, which missed setting of a local variable, which led to crash immediately after enabling the feature. * added a sanity test case to validate all the fops of sdfs. Updates: #397 Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92 Signed-off-by: Amar Tumballi <amarts@redhat.com>

* from the patch which got tested in experimental branch, there was a code cleanup involved, which missed setting of a local variable, which led to crash immediately after enabling the feature. * added a sanity test case to validate all the fops of sdfs. Updates: #397 Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92 BUG: 1541117 Signed-off-by: Amar Tumballi <amarts@redhat.com>

There have been known races between fops which add a dentry (like lookup, create, mknod etc) and fops that remove a dentry (like rename, unlink, rmdir etc) due to which stale dentries are left out in inode table even though the dentry doesn't exist on backend. For eg., consider a lookup (parent/bname) and unlink (parent/bname) racing in the following order: * lookup hits storage/posix and finds that dentry exists * unlink removes the dentry on storage/posix * unlink reaches protocol/server where the dentry (parent/bname) is unlinked from the inode * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode Now we've a stale dentry (parent/bname) associated with the inode in itable. This situation is bad for fops like link, create etc which invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on backend fs. This issue can be solved in two ways: * Enable "dentry fop serializer" xlator [1]. # gluster volume set features.sdfs on * Make sure resolver does a lookup on backend when it finds a dentry in itable and validates the state of itable. - If a dentry is not found, unlink those stale dentries from itable and continue with fop - If dentry is found, fail the fop with EEXIST This patch implements second solution as sdfs is not enabled by default in brick xlator stack. Once sdfs is enabled by default, this patch can be reverted. [1] #397 Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>

…rialization Problems addressed by this xlator : [1]. To prevent race between parallel mkdir,mkdir and lookup etc. Fops like mkdir/create, lookup, rename, unlink, link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case can be a fresh lookup to find existance of a path whose gfid is not set yet. Further, storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (lookup & mkdir), we eliminate the race altogether. [2]. Staleness of dentries This causes exponential increase in traversal time for any inode in the subtree of the directory pointed by stale dentry. Cause : Stale dentry is created because of following two operations: a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create and b. dentry unlinking due to various operations like rmdir, rename, unlink. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially. Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock. Some documentation from email conversation: [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html With this patch, the feature is optional, enable it by running: `gluster volume set $volname features.sdfs enable` Also the feature is tested for a month without issues in the experiemental branch for all the regression. Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b Fixes: gluster#397 BUG: 1304962 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>

* from the patch which got tested in experimental branch, there was a code cleanup involved, which missed setting of a local variable, which led to crash immediately after enabling the feature. * added a sanity test case to validate all the fops of sdfs. Updates: gluster#397 Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92 Signed-off-by: Amar Tumballi <amarts@redhat.com>

There have been known races between fops which add a dentry (like lookup, create, mknod etc) and fops that remove a dentry (like rename, unlink, rmdir etc) due to which stale dentries are left out in inode table even though the dentry doesn't exist on backend. For eg., consider a lookup (parent/bname) and unlink (parent/bname) racing in the following order: * lookup hits storage/posix and finds that dentry exists * unlink removes the dentry on storage/posix * unlink reaches protocol/server where the dentry (parent/bname) is unlinked from the inode * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode Now we've a stale dentry (parent/bname) associated with the inode in itable. This situation is bad for fops like link, create etc which invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on backend fs. This issue can be solved in two ways: * Enable "dentry fop serializer" xlator [1]. # gluster volume set features.sdfs on * Make sure resolver does a lookup on backend when it finds a dentry in itable and validates the state of itable. - If a dentry is not found, unlink those stale dentries from itable and continue with fop - If dentry is found, fail the fop with EEXIST This patch implements second solution as sdfs is not enabled by default in brick xlator stack. Once sdfs is enabled by default, this patch can be reverted. [1] gluster#397 Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>

There have been known races between fops which add a dentry (like lookup, create, mknod etc) and fops that remove a dentry (like rename, unlink, rmdir etc) due to which stale dentries are left out in inode table even though the dentry doesn't exist on backend. For eg., consider a lookup (parent/bname) and unlink (parent/bname) racing in the following order: * lookup hits storage/posix and finds that dentry exists * unlink removes the dentry on storage/posix * unlink reaches protocol/server where the dentry (parent/bname) is unlinked from the inode * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode Now we've a stale dentry (parent/bname) associated with the inode in itable. This situation is bad for fops like link, create etc which invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on backend fs. This issue can be solved in two ways: * Enable "dentry fop serializer" xlator [1]. # gluster volume set features.sdfs on * Make sure resolver does a lookup on backend when it finds a dentry in itable and validates the state of itable. - If a dentry is not found, unlink those stale dentries from itable and continue with fop - If dentry is found, fail the fop with EEXIST This patch implements second solution as sdfs is not enabled by default in brick xlator stack. Once sdfs is enabled by default, this patch can be reverted. [1] gluster#397 >Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a >updates: bz#1543279 >BUG: 1543279 >Signed-off-by: Raghavendra G <rgowdapp@redhat.com> upstream patch: https://review.gluster.org/19732 BUG: 1488120 Change-Id: I78df4f6751e5db1fc659ee15d92b76d1486455f0 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/138152 Tested-by: RHGS Build Bot <nigelb@redhat.com> Reviewed-by: Nithya Balachandran <nbalacha@redhat.com>

raghavendrahg added this to the Release 4.0 (STM) milestone Jan 22, 2018

mscherer closed this as completed in 7886803 Jan 24, 2018

ShyamsundarR added this to Release 4.0 in Releases Jan 25, 2018

ShyamsundarR mentioned this issue Jan 29, 2018

Testing md links ShyamsundarR/glusterfs#8

Open

raghavendrahg mentioned this issue Mar 19, 2018

Turn on Dentry fop serializer by default in brick stack #421

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dentry fop serializer xlator on brick stack #397

Dentry fop serializer xlator on brick stack #397

raghavendrahg commented Jan 22, 2018 •

edited

Loading

gluster-ant commented Jan 22, 2018

gluster-ant commented Jan 26, 2018

gluster-ant commented Feb 1, 2018

gluster-ant commented Feb 1, 2018

Dentry fop serializer xlator on brick stack #397

Dentry fop serializer xlator on brick stack #397

Comments

raghavendrahg commented Jan 22, 2018 • edited Loading

gluster-ant commented Jan 22, 2018

gluster-ant commented Jan 26, 2018

gluster-ant commented Feb 1, 2018

gluster-ant commented Feb 1, 2018

raghavendrahg commented Jan 22, 2018 •

edited

Loading