inode: implement namespace at inode level #1763

amarts · 2020-11-06T11:08:18Z

With this PR, especially on the brick-side graph, the inode table is
properly set up with a 'namespace' inode reference. This is not guaranteed
on the fuse/client-side graph because of subdir mounts.

To get the 'namespace' for a given inode, all one needs to do is
check the ns_inode pointer in the inode structure.

Currently only special mounts with PID < 0 can set the namespace
attribute, and in lookup, if the namespace attribute is present, we set
the variable in the inode.

By default, ns_inode is set to the 'root' inode when an inode gets created.

Fixes: #1757
Change-Id: I69157e388538ea5d4b4e45d543575a04ee9ef221
Signed-off-by: Amar Tumballi <amar@kadalu.io>
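
A minimal sketch of the behaviour described above, using a toy structure rather than the real `inode_t`. Every name here (`toy_inode_t`, `toy_inode_init`, `toy_inode_mark_namespace`, `toy_inode_namespace`) is hypothetical, and the sketch assumes that a directory carrying the namespace attribute points `ns_inode` at itself; the patch may do this differently.

```c
#include <stddef.h>

/* Toy stand-in for the real inode structure; only the fields needed here. */
typedef struct toy_inode {
    struct toy_inode *ns_inode; /* namespace this inode belongs to */
    const char *name;
} toy_inode_t;

/* A new inode starts in the root namespace.
 * The root itself is created with root == NULL and points at itself. */
void
toy_inode_init(toy_inode_t *inode, toy_inode_t *root, const char *name)
{
    inode->name = name;
    inode->ns_inode = root ? root : inode;
}

/* Lookup saw the namespace attribute on this directory
 * (assumption: the directory then becomes its own namespace). */
void
toy_inode_mark_namespace(toy_inode_t *inode)
{
    inode->ns_inode = inode;
}

/* Finding the namespace is a single pointer read. */
toy_inode_t *
toy_inode_namespace(const toy_inode_t *inode)
{
    return inode->ns_inode;
}
```

With this shape, a freshly created `/abcd` reports the root as its namespace until a lookup marks it, after which it reports itself.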

amarts · 2020-11-08T10:54:31Z

/run regression

amarts · 2020-11-09T05:08:41Z

/run full regression

amarts · 2020-11-09T10:40:24Z

/run regression

amarts · 2020-11-09T18:07:47Z

/run regression

libglusterfs/src/glusterfs/inode.h

xlators/protocol/server/src/server-common.c

amarts · 2020-11-10T17:28:30Z

/run regression

pranithk · 2020-11-16T12:15:38Z

Some doubts:

Who sets namespace xattr on the directory?
Can namespace be nested?

amarts · 2020-11-16T12:56:32Z

Who sets namespace xattr on the directory?

Ideally, the management layer. Depending on the option (if it needs namespace or not). An example here: https://github.com/amarts/glusterfs_fork/blob/simple-quota_v2/extras/hook-scripts/set/post/S61simple-quota.sh

As a safeguard so that not just any random user can set the flag, I am planning to check for pid < 0 as a limit.

Can namespace be nested?

For now, the implementation I have in mind allows nesting. The only catch is that performing a rename / hardlink across namespaces would be problematic, and not ideal.

I guess if everyone agrees, we can just add that check in inode_link(), but I didn't want to bring that check now.
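
A hedged sketch of what such a check could look like. This is not code from the patch: `toy_inode_t` and `toy_check_same_namespace` are hypothetical, and `EXDEV` is only an assumption about which errno would suit a cross-namespace rename or hardlink.

```c
#include <errno.h>
#include <stddef.h>

/* Toy inode carrying only the namespace pointer. */
typedef struct toy_inode {
    struct toy_inode *ns_inode;
} toy_inode_t;

/* Returns 0 when source and destination parents share a namespace,
 * EXDEV (illustrative choice) when they do not. */
int
toy_check_same_namespace(const toy_inode_t *src_parent,
                         const toy_inode_t *dst_parent)
{
    if (src_parent->ns_inode != dst_parent->ns_inode)
        return EXDEV;
    return 0;
}
```

Comparing the two `ns_inode` pointers is enough here because every inode already carries a reference to its namespace, so no tree walk is needed.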

pranithk · 2020-11-16T13:13:05Z

Who sets namespace xattr on the directory?

Ideally, the management layer. Depending on the option (if it needs namespace or not). An example here: https://github.com/amarts/glusterfs_fork/blob/simple-quota_v2/extras/hook-scripts/set/post/S61simple-quota.sh

As a safeguard so that not just any random user can set the flag, I am planning to check for pid < 0 as a limit.

Can namespace be nested?

For now, the implementation I have in mind allows nesting. The only catch is that performing a rename / hardlink across namespaces would be problematic, and not ideal.

I personally understood namespace + quota to be a private share from a big volume. Just like we carve out logical-disks using LVM from a pool of physical disks. Is that not the case?

I guess if everyone agrees, we can just add that check in inode_link(), but I didn't want to bring that check now.

amarts · 2020-11-16T13:18:42Z

I personally understood namespace + quota to be a private share from a big volume. Just like we carve out logical-disks using LVM from a pool of physical disks. Is that not the case?

Yes, it is the case. But there is one difference. GlusterFS never had a concept of namespace, so everything below '/' was treated as a single tree. With this introduction, because we keep a namespace reference in every inode, it becomes more like the xfs project quota feature, where any operation is permitted within a single project, but no hardlinks/renames are allowed across projects. Even the accounting is separate, i.e., assume the below:

/ and /abcd and /abcd/xyz (treat each of these as projects in xfs). If there is a 100MB file inside /abcd/xyz, it is not counted in the accounting of / or /abcd.

So technically, each of those shares works as a separate filesystem, instead of all of them being one.
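
The accounting rule in the example above can be sketched as follows. The types and the function (`toy_ns_t`, `toy_file_t`, `toy_account_write`) are hypothetical and are not part of the patch; the sketch only assumes that usage is charged to the file's own namespace and to nothing above it.

```c
#include <stddef.h>
#include <stdint.h>

/* One accounting bucket per namespace (compare: one xfs project). */
typedef struct toy_ns {
    const char *path;
    uint64_t used_bytes;
} toy_ns_t;

typedef struct toy_file {
    toy_ns_t *ns;  /* nearest enclosing namespace of the file */
    uint64_t size;
} toy_file_t;

/* Charge a write to the file's own namespace only.
 * Enclosing namespaces such as / or /abcd are deliberately not touched. */
void
toy_account_write(toy_file_t *file, uint64_t bytes)
{
    file->size += bytes;
    file->ns->used_bytes += bytes;
}
```

Writing 100MB to a file whose namespace is `/abcd/xyz` leaves the counters for `/` and `/abcd` at zero.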

amarts · 2020-11-19T09:57:19Z

/run regression

amarts · 2020-11-20T06:08:05Z

@pranithk now, namespace gets set in client side also!

amarts · 2021-02-23T10:15:45Z

/run regression

amarts · 2021-02-23T17:14:38Z

/run regression

xhernandez

The management in the client side seems a bit weak to me. We may have multiple clients working at the same time, and the namespace can be changed by anyone of them at any time. This change is not consistently identified by the other clients, probably causing inconsistencies in the data related to namespaces.

Another issue that I see is that on graph switch we send lookups in the new graph in a random order (using gfid-based lookups). It seems quite hard to correctly set the namespaces for each inode in this scenario. It seems to me there could be a lot of races with the running requests (even assuming that the namespace is updated recursively, which it is not right now).

libglusterfs/src/inode.c

xhernandez · 2021-11-12T11:08:29Z

libglusterfs/src/inode.c

        /* pick the old dentry */
-        dentry = __inode_unlink(inode, srcdir, srcname);
+        if (linked_inode) {


If this is NULL, shouldn't we return an error ?

It has been some time since I tested this, but without this fix there was a crash when a cross-namespace rename was attempted (this may be due to the review comment you gave on line 212; will retest). I also added this because there are many cases of return NULL; in __inode_link(), in which case I felt we shouldn't proceed with __inode_unlink() of the source. (Can this be a different bug?)

My comment was not meant to remove this check, but if this check fails, can we continue without returning an error ?

However, after second thought I think it's fine.

Will keep it as is, but add a reference to this discussion as a comment in the code.

libglusterfs/src/inode.c

xlators/mount/fuse/src/fuse-bridge.c

xlators/protocol/server/src/server-rpc-fops_v2.c

xhernandez · 2021-11-12T14:24:06Z

xlators/protocol/server/src/server-rpc-fops_v2.c

+                       "dict set (namespace) failed (path: %s), continuing",
+                       state->loc.path);
+            }
+            if (state->loc.path && (state->loc.path[0] == '<')) {


I think we shouldn't allow lookups based only on gfid. Kernel doesn't do that and it's a basic thing to keep any kind of hierarchical structure consistent. Last time I checked, I think the only place where we really need gfid-based lookups is on a graph switch, but even in this case we could reimplement it in a way that gfid lookups are not really needed.

Anyway this is another discussion and something for another patch.

xhernandez · 2021-11-12T14:24:38Z

xlators/protocol/server/src/server-rpc-fops_v2.c

+                /* This is a lookup on gfid : get full-path */
+                ret = dict_set_int32(xdata, "get-full-path", 1);
+                if (ret) {
+                    gf_msg_debug(


Can we continue in case of failure ?

As I mentioned, it's for information right now, and I made the log INFO for now. I am working on a recursive lookup for handling gfid-based lookups. It should be finished before the next major release, so I believe this is good to get in.

xhernandez · 2021-11-12T14:40:55Z

xlators/storage/posix/src/posix-entry-ops.c

+       leaving it as TODO. Good to have logic of resolving GFID only access
+       to a path for many other features too. But initial version can just
+       be knowning that we are hitting the scenario in certain usecases */
+    if ((op_ret == 0) && (dict_get_sizen(xdata, "get-full-path"))) {


Couldn't we reuse one of the already existing implementations ? I've found many ways to get something similar, like:

trusted.glusterfs.pathinfo

trusted.pathinfo

glusterfs.gfid2path

glusterfs.gfidtopath

glusterfs.ancestry.path

Yes. That was the plan. I implemented this to check how many of these requests I receive in posix. In the last ~8 months of pushing it out with kadalu releases, I have seen 2 instances of this log, meaning we need to fix it, but the frequency is not high. IMO we need to take a call on whether we should allow gfid-based lookups at all, and then proceed. For this PR, shall I keep it as is, given that this is not a complete implementation? Or I will keep just a TODO for now.

That's fine for now.

amarts · 2021-11-13T06:56:48Z

The management in the client side seems a bit weak to me. We may have multiple clients working at the same time, and the namespace can be changed by anyone of them at any time. This change is not consistently identified by the other clients, probably causing inconsistencies in the data related to namespaces.

I agree with client side issues. I did it as a part of 'completion'. But this feature and need is very much on server side. If you agree, I can get the feature only on the server side, thus avoiding issues on client side. Shall I break this PR into 2 - one which handles it on server side and inode changes, another which takes care of client side initializations (which can be merged later depending on need).

I will address other comments separately. About the gfid-based lookup, yes, there is an issue so far. If we actually decide to keep the hierarchy even when a gfid-based lookup is done, we can achieve it by recursive lookups on the path (obtained from the gfid2path xattr). I can do that as a separate PR.

xhernandez · 2021-11-23T09:17:07Z

The management in the client side seems a bit weak to me. We may have multiple clients working at the same time, and the namespace can be changed by anyone of them at any time. This change is not consistently identified by the other clients, probably causing inconsistencies in the data related to namespaces.

I agree with client side issues. I did it as a part of 'completion'. But this feature and need is very much on server side. If you agree, I can get the feature only on the server side, thus avoiding issues on client side. Shall I break this PR into 2 - one which handles it on server side and inode changes, another which takes care of client side initializations (which can be merged later depending on need).

As long as we don't create any new feature that depends on client side namespace, it's fine to keep it as it's now. I only wanted to note that we may have several issues with it in the client side. If we are aware of them and agree to fix them before using namespaces in the client side, that's perfectly fine with me.

I will address other comments separately. About the gfid-based lookup, yes, there is an issue so far. If we actually decide to keep the hierarchy even when a gfid-based lookup is done, we can achieve it by recursive lookups on the path (obtained from the gfid2path xattr). I can do that as a separate PR.

I think it would be great to remove them because they cause a lot of trouble in things like this. The gfid-lookups in fuse can easily be replaced by regular lookups. However, I'm not 100% sure whether we need them for anything else. @pranithk are you aware of other uses of gfid-lookups that cannot be replaced by regular lookups?

Change-Id: I453e15ec6e04fc88386ec6a479f0a1e12ea48d12 Signed-off-by: Amar Tumballi <amar@kadalu.io>

amarts · 2021-11-26T06:09:47Z

/run regression

xhernandez

The patch looks good. However I have two points that I would like to consider:

In the brick side the namespace is only set for lookups and setxattrs. Shouldn't we do the same for create, mkdir, mknod and symlink ?
When sharding is used, all shards may belong to a different namespace than the base shard. Shouldn't we handle this case ?

amarts · 2021-11-26T10:58:50Z

The patch looks good. However I have two points that I would like to consider:

In the brick side the namespace is only set for lookups and setxattrs. Shouldn't we do the same for create, mkdir, mknod and symlink ?

When sharding is used, all shards may belong to a different namespace than the base shard. Shouldn't we handle this case ?

inode_link() is called during that time, so the namespace for those inodes is set based on the parent itself (this would apply to all shard files too).

xhernandez · 2021-11-26T11:52:23Z

The patch looks good. However I have two points that I would like to consider:

In the brick side the namespace is only set for lookups and setxattrs. Shouldn't we do the same for create, mkdir, mknod and symlink ?

When sharding is used, all shards may belong to a different namespace than the base shard. Shouldn't we handle this case ?

inode_link() is called during that time, so the namespace for those inodes is set based on the parent itself (this would apply to all shard files too).

You are right for the create/mkdir/mknod/symlink part. I missed it... :-/

However that's not valid for shard files because the parent directory of a shard file is different than the parent directory of the base shard, so these files may belong to different namespaces (basically .shard and all its contents will belong to the root namespace, while the base shard can belong to any other namespace). I think all shards should belong to the same namespace, but I don't see how we can do that without adding "shard intelligence" into the brick side, which I don't like very much either...
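
The mismatch can be illustrated with a small sketch of the inherit-from-parent rule discussed above. `toy_inode_t` and `toy_inode_link` are hypothetical stand-ins, not the real `inode_link()`.

```c
#include <stddef.h>

/* Toy inode carrying only the namespace pointer. */
typedef struct toy_inode {
    struct toy_inode *ns_inode;
} toy_inode_t;

/* Link-time rule from the discussion: a child takes the namespace
 * of the directory it is linked under. */
void
toy_inode_link(toy_inode_t *child, const toy_inode_t *parent)
{
    child->ns_inode = parent->ns_inode;
}
```

If `/ns1` is a namespace directory, a base file linked under `/ns1` ends up in `ns1`, while its shards, which are linked under `/.shard`, end up in the root namespace.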

Do you have any solution in mind or shard won't be supported with namespaces for the time being ?

amarts · 2021-11-29T06:49:09Z

Do you have any solution in mind or shard won't be supported with namespaces for the time being ?

Here are my thoughts on this:

We will not support 'shard' with namespace (for now, ie, in devel branch)
Work on pending enhancements, namely:

support for complete inode tree (dentry) resolution of gfid on brick process. (server-resolve.c)
support for out of the tree namespaces (ie, any file can have namespace depending on what the 'create' fop requestor asks). - this should resolve shard issue.

Have some suggestions from @xhernandez on how we can go about it, and I also will think more on it and propose the design for next step in an issue and take it forward.

This PR can go in so that the many features which need in-tree namespaces can use them immediately.

xhernandez · 2021-11-29T09:05:09Z

@pranithk are you also ok with this PR ? if so, feel free to merge it.

pranithk · 2021-12-05T14:27:07Z

xlators/mount/fuse/src/fuse-bridge.c

@@ -4079,6 +4126,21 @@ fuse_setxattr_resume(fuse_state_t *state)
    state->fd = fd_lookup(state->loc.inode, state->finh->pid);
 #endif /* GF_TEST_FFOP */

+    if (dict_get_sizen(state->xattr, GF_NAMESPACE_KEY)) {


Should this check be in removexattr as well? Shouldn't this check be present in gfapi part too?

I haven't handled removexattr(), as removing a namespace in a running process is tricky, i.e., we would have to recursively reset the namespace pointer in the whole inode tree under that namespace.

My idea was that even if there is a gfapi client, the namespace setting always happens through a fuse client (and a special one at that, i.e., pid < 0). Hence I haven't handled setxattr in gfapi.
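
A minimal sketch of the pid < 0 gate mentioned above, under stated assumptions: `toy_check_namespace_setxattr` is a hypothetical helper, and `EPERM` is an illustrative errno rather than the one the patch returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Allow the namespace key to be set only by special clients,
 * identified in this sketch by a negative pid. */
int
toy_check_namespace_setxattr(pid_t client_pid, bool sets_namespace_key)
{
    if (sets_namespace_key && client_pid >= 0)
        return EPERM; /* illustrative errno, an assumption */
    return 0;
}
```

Requests that do not touch the namespace key pass through unchanged, whatever the pid of the caller.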

pranithk · 2021-12-05T14:38:13Z

xlators/protocol/server/src/server-rpc-fops_v2.c

+                   state->loc.path);
+            state->resolve.op_ret = -1;
+            state->resolve.op_errno = ENOMEM;
+            goto err;


xdata will leak here I think, add if (xdata) dict_unref() in err
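
The suggested pattern can be sketched with a toy reference-counted dictionary. `toy_dict_t`, `toy_dict_new`, `toy_dict_unref`, `toy_lookup_resume` and the `toy_live_dicts` counter are all hypothetical and exist only to make the leak observable; they are not the real dict API.

```c
#include <stdlib.h>

typedef struct toy_dict {
    int refcount;
} toy_dict_t;

/* Number of dictionaries currently allocated, to make leaks visible. */
static int toy_live_dicts = 0;

toy_dict_t *
toy_dict_new(void)
{
    toy_dict_t *d = calloc(1, sizeof(*d));
    if (d) {
        d->refcount = 1;
        toy_live_dicts++;
    }
    return d;
}

void
toy_dict_unref(toy_dict_t *d)
{
    if (d && --d->refcount == 0) {
        free(d);
        toy_live_dicts--;
    }
}

/* fail_midway simulates a failure after xdata was allocated. */
int
toy_lookup_resume(int fail_midway)
{
    int ret = -1;
    toy_dict_t *xdata = toy_dict_new();

    if (!xdata)
        goto err;
    if (fail_midway)
        goto err;
    ret = 0;
    /* fall through: the reference is dropped on both paths */
err:
    if (xdata)
        toy_dict_unref(xdata);
    return ret;
}
```

Because the unref sits at the label, the early `goto err` no longer leaves a live dictionary behind.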

pranithk · 2021-12-05T14:40:50Z

xlators/protocol/server/src/server-rpc-fops_v2.c

+        dict_get_sizen(state->dict, GF_NAMESPACE_KEY)) {
+        gf_smsg("server", GF_LOG_ERROR, 0, PS_MSG_SETXATTR_INFO, "path=%s",
+                state->loc.path, "key=%s", GF_NAMESPACE_KEY, NULL);
+        ret = -1;


Remove the 'ret = -1' line; it is confusing. I had to check what SERVER_REQ_SET_ERROR() does to ret.

pranithk · 2021-12-05T14:42:42Z

xlators/protocol/server/src/server-rpc-fops_v2.c

+        ret = -1;
+        SERVER_REQ_SET_ERROR(req, ret);
+        goto out;
+    }


What should happen in removexattr?

As explained in another comment, I haven't done removexattr(), as it involves doing things beyond the scope of just one inode.

In that case, until we handle it, do you want to give some errno like ENOTSUP or something?

Done. Please check if that's fine. I added a test case for that and will run the tests anyway.

pranithk · 2021-12-05T14:47:06Z

libglusterfs/src/inode.c

@@ -362,6 +363,7 @@ __inode_ctx_free(inode_t *inode)
 static void
 __inode_destroy(inode_t *inode)


I think we should s/__inode_destroy/inode_destroy/g. I thought there was a deadlock.

There is no inode_destroy(), hence I will leave this part for later.

Change-Id: Iba16bf356909b6275af0956949c3d3d302d45081

So that we treat it as an error in removexattr Change-Id: I6579e268124bec01e89b2fe7cb31134c786bf88f Signed-off-by: Amar Tumballi <amar@kadalu.io>

amarts · 2021-12-06T07:43:08Z

/run regression

pranithk · 2021-12-07T04:39:50Z

xlators/protocol/server/src/server-rpc-fops_v2.c

@@ -2740,6 +2769,15 @@ server4_removexattr_resume(call_frame_t *frame, xlator_t *bound_xl)
    if (state->resolve.op_ret != 0)
        goto err;

+    if (dict_get_sizen(state->xdata, GF_NAMESPACE_KEY) ||
+        !strncmp(GF_NAMESPACE_KEY, state->name, sizeof(GF_NAMESPACE_KEY))) {


This should be strcmp. strncmp will only compare the prefix.

pranithk · 2021-12-07T04:39:59Z

xlators/protocol/server/src/server-rpc-fops_v2.c

@@ -2760,6 +2798,15 @@ server4_fremovexattr_resume(call_frame_t *frame, xlator_t *bound_xl)
    if (state->resolve.op_ret != 0)
        goto err;

+    if (dict_get_sizen(state->xdata, GF_NAMESPACE_KEY) ||
+        !strncmp(GF_NAMESPACE_KEY, state->name, sizeof(GF_NAMESPACE_KEY))) {


This should be strcmp. strncmp will only compare the prefix.

Oh sorry you used sizeof, this should be fine.
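
A short sketch of why the `sizeof` form behaves like an exact match. The key value below is an assumption used only for illustration, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed value, for illustration only. */
#define TOY_NAMESPACE_KEY "trusted.glusterfs.namespace"

/* sizeof on a string literal counts the trailing '\0', so the
 * terminator is compared too and only the exact key matches. */
bool
toy_is_namespace_key(const char *name)
{
    return strncmp(TOY_NAMESPACE_KEY, name, sizeof(TOY_NAMESPACE_KEY)) == 0;
}

/* strlen excludes the '\0', so this variant matches any name that
 * merely starts with the key. */
bool
toy_is_namespace_prefix(const char *name)
{
    return strncmp(TOY_NAMESPACE_KEY, name, strlen(TOY_NAMESPACE_KEY)) == 0;
}
```

A name such as the key followed by `.extra` passes the prefix variant but fails the `sizeof` variant, which is the distinction raised in the review.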

xhernandez · 2021-12-10T09:19:18Z

/recheck smoke

The brick process crashes due to a stack overflow while unreffing the namespace inode; the ns inode was introduced by the patch (gluster#1763).
Solution: __inode_destroy calls inode_unref, which in turn calls inode_unref again, becoming a recursive call that eventually crashes the brick process. To avoid the crash, call only __inode_ref for the namespace inode.
Fixes: gluster#4295
Change-Id: If5deb06b726a5e7dfedd2784bddcef81e6e5d7d9
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

The brick process crashes due to a stack overflow while unreffing the namespace inode; the ns inode was introduced by the patch (gluster#1763).
Solution: __inode_destroy calls inode_unref, which in turn calls inode_unref again, becoming a recursive call that eventually crashes the brick process. To avoid the crash, call only __inode_ref for the namespace inode.
> Fixes: gluster#4295
> Change-Id: If5deb06b726a5e7dfedd2784bddcef81e6e5d7d9
> Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
> (Cherry picked from commit 80ecbba)
> (Reviewed on upstream link gluster#4302)
Fixes: gluster#4295
Change-Id: If5deb06b726a5e7dfedd2784bddcef81e6e5d7d9
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

amarts added FA: Technical Debt CB: libglusterfs labels Nov 6, 2020

amarts requested review from pranithk and xhernandez November 6, 2020 11:08

amarts mentioned this pull request Nov 6, 2020

Simple quota: based on namespace #1750

Merged

mykaul reviewed Nov 9, 2020

libglusterfs/src/glusterfs/inode.h (outdated)

mykaul reviewed Nov 9, 2020

xlators/protocol/server/src/server-common.c (outdated)

amarts force-pushed the i1757_namespace branch from 27a7e0d to 5f3bba1 Compare November 10, 2020 06:54

amarts added this to the Release 9 milestone Nov 14, 2020

amarts force-pushed the i1757_namespace branch 3 times, most recently from 4226c40 to fb5f1b2 Compare November 19, 2020 09:16

amarts requested a review from mohit84 November 23, 2020 03:55

amarts requested a review from csabahenk December 4, 2020 04:46

amarts force-pushed the i1757_namespace branch from fb5f1b2 to 4b95be0 Compare February 22, 2021 18:15

amarts force-pushed the i1757_namespace branch 2 times, most recently from 5d7af58 to c1e5d88 Compare August 16, 2021 10:54

msaju modified the milestones: Release 9, Gluster 11 Nov 9, 2021

xhernandez reviewed Nov 12, 2021

address review comments

80ef623

Change-Id: I453e15ec6e04fc88386ec6a479f0a1e12ea48d12 Signed-off-by: Amar Tumballi <amar@kadalu.io>

xhernandez reviewed Nov 26, 2021

xhernandez previously approved these changes Nov 29, 2021

pranithk requested changes Dec 5, 2021

addressed review comments from pranith

e3343bc

Change-Id: Iba16bf356909b6275af0956949c3d3d302d45081

amarts dismissed xhernandez’s stale review via e3343bc December 6, 2021 01:49

check for namespace key in removexattr

366fb10

So that we treat it as an error in removexattr Change-Id: I6579e268124bec01e89b2fe7cb31134c786bf88f Signed-off-by: Amar Tumballi <amar@kadalu.io>

pranithk requested changes Dec 7, 2021

pranithk approved these changes Dec 7, 2021

xhernandez approved these changes Dec 10, 2021

xhernandez merged commit 063720d into gluster:devel Dec 10, 2021

pranithk mentioned this pull request Dec 11, 2021

Alternatives for quotas in RHGS3.5.3 #3030

Closed

Deltik mentioned this pull request Jan 10, 2024

Infinite recursion segmentation fault involving inode_unref() and xlators/features/bit-rot/src/stub/bit-rot-stub.c #4295

Closed

mohit84 mentioned this pull request Jan 30, 2024

core: brick process is getting SIGSEGV during inode_unref #4302

Merged

mohit84 mentioned this pull request Feb 1, 2024

core: brick process is getting SIGSEGV during inode_unref #4303

Open
