-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dentry fop serializer xlator on brick stack #397
Comments
A patch https://review.gluster.org/13451 has been posted that references this issue. |
A patch https://review.gluster.org/19340 has been posted that references this issue. |
…rialization Problems addressed by this xlator : [1]. To prevent race between parallel mkdir,mkdir and lookup etc. Fops like mkdir/create, lookup, rename, unlink, link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case can be a fresh lookup to find existance of a path whose gfid is not set yet. Further, storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (lookup & mkdir), we eliminate the race altogether. [2]. Staleness of dentries This causes exponential increase in traversal time for any inode in the subtree of the directory pointed by stale dentry. Cause : Stale dentry is created because of following two operations: a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create and b. dentry unlinking due to various operations like rmdir, rename, unlink. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially. Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock. Some documentation from email conversation: [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html With this patch, the feature is optional, enable it by running: `gluster volume set $volname features.sdfs enable` Also the feature is tested for a month without issues in the experiemental branch for all the regression. Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b Fixes: #397 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
A patch https://review.gluster.org/19444 has been posted that references this issue. |
A patch https://review.gluster.org/19445 has been posted that references this issue. |
* from the patch which got tested in experimental branch, there was a code cleanup involved, which missed setting of a local variable, which led to crash immediately after enabling the feature. * added a sanity test case to validate all the fops of sdfs. Updates: #397 Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* from the patch which got tested in experimental branch, there was a code cleanup involved, which missed setting of a local variable, which led to crash immediately after enabling the feature. * added a sanity test case to validate all the fops of sdfs. Updates: #397 Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92 BUG: 1541117 Signed-off-by: Amar Tumballi <amarts@redhat.com>
There have been known races between fops which add a dentry (like lookup, create, mknod etc) and fops that remove a dentry (like rename, unlink, rmdir etc) due to which stale dentries are left out in inode table even though the dentry doesn't exist on backend. For eg., consider a lookup (parent/bname) and unlink (parent/bname) racing in the following order: * lookup hits storage/posix and finds that dentry exists * unlink removes the dentry on storage/posix * unlink reaches protocol/server where the dentry (parent/bname) is unlinked from the inode * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode Now we've a stale dentry (parent/bname) associated with the inode in itable. This situation is bad for fops like link, create etc which invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on backend fs. This issue can be solved in two ways: * Enable "dentry fop serializer" xlator [1]. # gluster volume set features.sdfs on * Make sure resolver does a lookup on backend when it finds a dentry in itable and validates the state of itable. - If a dentry is not found, unlink those stale dentries from itable and continue with fop - If dentry is found, fail the fop with EEXIST This patch implements second solution as sdfs is not enabled by default in brick xlator stack. Once sdfs is enabled by default, this patch can be reverted. [1] #397 Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
…rialization Problems addressed by this xlator : [1]. To prevent race between parallel mkdir,mkdir and lookup etc. Fops like mkdir/create, lookup, rename, unlink, link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case can be a fresh lookup to find existance of a path whose gfid is not set yet. Further, storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (lookup & mkdir), we eliminate the race altogether. [2]. Staleness of dentries This causes exponential increase in traversal time for any inode in the subtree of the directory pointed by stale dentry. Cause : Stale dentry is created because of following two operations: a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create and b. dentry unlinking due to various operations like rmdir, rename, unlink. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially. Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock. Some documentation from email conversation: [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html With this patch, the feature is optional, enable it by running: `gluster volume set $volname features.sdfs enable` Also the feature is tested for a month without issues in the experiemental branch for all the regression. Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b Fixes: gluster#397 BUG: 1304962 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* from the patch which got tested in experimental branch, there was a code cleanup involved, which missed setting of a local variable, which led to crash immediately after enabling the feature. * added a sanity test case to validate all the fops of sdfs. Updates: gluster#397 Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92 Signed-off-by: Amar Tumballi <amarts@redhat.com>
There have been known races between fops which add a dentry (like lookup, create, mknod etc) and fops that remove a dentry (like rename, unlink, rmdir etc) due to which stale dentries are left out in inode table even though the dentry doesn't exist on backend. For eg., consider a lookup (parent/bname) and unlink (parent/bname) racing in the following order: * lookup hits storage/posix and finds that dentry exists * unlink removes the dentry on storage/posix * unlink reaches protocol/server where the dentry (parent/bname) is unlinked from the inode * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode Now we've a stale dentry (parent/bname) associated with the inode in itable. This situation is bad for fops like link, create etc which invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on backend fs. This issue can be solved in two ways: * Enable "dentry fop serializer" xlator [1]. # gluster volume set features.sdfs on * Make sure resolver does a lookup on backend when it finds a dentry in itable and validates the state of itable. - If a dentry is not found, unlink those stale dentries from itable and continue with fop - If dentry is found, fail the fop with EEXIST This patch implements second solution as sdfs is not enabled by default in brick xlator stack. Once sdfs is enabled by default, this patch can be reverted. [1] gluster#397 Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
There have been known races between fops which add a dentry (like lookup, create, mknod etc) and fops that remove a dentry (like rename, unlink, rmdir etc) due to which stale dentries are left out in inode table even though the dentry doesn't exist on backend. For eg., consider a lookup (parent/bname) and unlink (parent/bname) racing in the following order: * lookup hits storage/posix and finds that dentry exists * unlink removes the dentry on storage/posix * unlink reaches protocol/server where the dentry (parent/bname) is unlinked from the inode * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode Now we've a stale dentry (parent/bname) associated with the inode in itable. This situation is bad for fops like link, create etc which invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on backend fs. This issue can be solved in two ways: * Enable "dentry fop serializer" xlator [1]. # gluster volume set features.sdfs on * Make sure resolver does a lookup on backend when it finds a dentry in itable and validates the state of itable. - If a dentry is not found, unlink those stale dentries from itable and continue with fop - If dentry is found, fail the fop with EEXIST This patch implements second solution as sdfs is not enabled by default in brick xlator stack. Once sdfs is enabled by default, this patch can be reverted. [1] gluster#397 >Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a >updates: bz#1543279 >BUG: 1543279 >Signed-off-by: Raghavendra G <rgowdapp@redhat.com> upstream patch: https://review.gluster.org/19732 BUG: 1488120 Change-Id: I78df4f6751e5db1fc659ee15d92b76d1486455f0 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: https://code.engineering.redhat.com/gerrit/138152 Tested-by: RHGS Build Bot <nigelb@redhat.com> Reviewed-by: Nithya Balachandran <nbalacha@redhat.com>
Problems addressed by this xlator :
To prevent race between parallel operations that create a new inode and a lookup on the dentry pointing to inode while creating or healing gfid handle.
Fops like mkdir/create/symlink/mknod that create a new gfid handle and lookup that heals a gfid handle (by creating a new one if gfid is not present) must be serialized to ensure atomicity while creating or healing gfid handles. Consider a fresh lookup to find existence of a path whose gfid is not set yet (the inode is being created). If we let fresh lookup to assign a gfid, we would end up with a random gfid. However, this is not acceptable when the same inode is created on other bricks by cluster xlators like afr, dht, disperse. Currently storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (create class of fops and lookup), we eliminate the race altogether and hence we don't need heuristic based decision making.
Staleness of dentries
This causes exponential increase in inode-linking (with a dentry) time for any inode in the subtree of the directory pointed by stale dentry. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially.
Cause : Stale dentry is created because of race between two operations:
Non atomicity of operations like rename between modifying path and gfid handle
Note that in storage/posix, following two operations are not atomic during rename (src, dst):
So, if there are multiple renames on same file, we might end up having a gfid-handle pointing to a non-existent path. This problem can happen only for directories as handles are hardlinks for regular files. However, this problem is currently solved (even without having DFS) as DHT makes sure there are no parallel renames on same directory. When DFS was initially thought of, this problem was very painful one as DHT lacked synchronization for operations like rename on directories. Documenting this just as a historical tidbit, which may not have much relevance in the present.
Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock.
With this patch, the feature is optional, enable it by running:
gluster volume set $volname features.sdfs enable
Also the feature is tested for a month without issues in the experiemental branch for all the regression.
Some earlier email conversations on this issue can be found here and here.
The text was updated successfully, but these errors were encountered: