
[storage]: Front-end Attach / Detach Namespace Proto-Bufs #177

Open
mkalderon opened this issue Oct 31, 2022 · 23 comments · May be fixed by #237
Labels
Storage APIs or code related to storage area

Comments

@mkalderon
Contributor

Need to add support for attaching / detaching controllers from a namespace.
We can take this; just raising it as an issue in case someone has prior thoughts on this.
@glimchb
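
For concreteness, here is a minimal sketch of the kind of attach/detach call being requested, written in Go against a purely hypothetical client interface (none of these names are existing OPI definitions):

```go
package frontend

import "context"

// AttachNamespaceRequest is a hypothetical request: make an existing
// namespace visible through one specific front-end controller.
type AttachNamespaceRequest struct {
	ControllerID string // front-end controller, e.g. one PF/VF
	NamespaceID  string // namespace already created in the subsystem
}

// FrontendNvmeClient is an assumed client shape for the proposed RPCs.
// Attach adds the namespace to the controller's active NS list and
// Detach removes it; both would raise a Namespace Attribute Changed
// async event so the host re-enumerates.
type FrontendNvmeClient interface {
	AttachNamespace(ctx context.Context, req *AttachNamespaceRequest) error
	DetachNamespace(ctx context.Context, req *AttachNamespaceRequest) error
}

// Example: expose namespace "ns-1" through controller "ctrl-vf3" only.
func attachExample(ctx context.Context, c FrontendNvmeClient) error {
	return c.AttachNamespace(ctx, &AttachNamespaceRequest{
		ControllerID: "ctrl-vf3",
		NamespaceID:  "ns-1",
	})
}
```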

@glimchb glimchb added the Storage APIs or code related to storage area label Oct 31, 2022
@glimchb
Member

glimchb commented Oct 31, 2022

Isn't NVMeNamespaceCreate the same as Attach? We can rename it if that helps...

@glimchb
Member

glimchb commented Oct 31, 2022

I opened #178 as well for enable/disable NS...

@mkalderon
Contributor Author

Not quite. When a namespace is created it isn't attached to a controller. We're providing APIs to create subsystems/namespaces/controllers, etc., so it's not like the back-end, where you get this information along with the attachment information.
We need a way to attach a namespace to a controller. The same namespace can also be attached to several controllers.

Having said that, looking at your additional issue with enable/disable, maybe it's somehow related and provides another way to make this attachment. In enable/disable, are you referring to enabling a namespace for a specific controller?

@glimchb
Member

glimchb commented Oct 31, 2022

In enable/disable, are you referring to enabling a namespace for a specific controller?

yes

@benlwalker

benlwalker commented Oct 31, 2022

A controller is a "view" or a "session" between a host and a subsystem. A subsystem contains the namespaces. I don't think it makes any sense in the NVMe model to attach a namespace to a controller. Can you outline the use case here? Is this for private namespaces? Or is this some dual-port device case where the PCI endpoints see different namespaces?

Also, it does make sense for NVMe over fabrics to set up visibility rules based on the hostnqn via a call we don't have in the API yet. But I don't think that's what this issue is talking about.

@glimchb
Member

glimchb commented Oct 31, 2022

If we have 1 subsystem with 1 namespace and 10 controllers, each on a separate PF/VF, that we want to pass through to 10 VMs...
how would you describe the operation of exposing/attaching a namespace to a controller (PF/VF)?

@benlwalker

The best approach is to create 1 subsystem containing 1 namespace for each PF/VF. You do not want the VMs to share state of any kind, and if they share a subsystem there is shared state in there, even if you filter out shared namespaces from view.
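
As a sketch of that layout (identifiers hypothetical), the 10-VM example above would become ten fully isolated triples rather than one shared subsystem:

```go
package frontend

// vfLayout pairs each PF/VF with its own subsystem and namespace, so
// no two VMs share any subsystem-level state (NSSR, NS management,
// NSID space).
type vfLayout struct {
	Subsystem  string
	Namespace  string
	Controller string // bound to exactly one PF/VF
}

// One entry per VM; each subsystem contains exactly one namespace,
// backed by the same (or a different) back-end volume.
var layout = []vfLayout{
	{Subsystem: "subsys-0", Namespace: "ns-0", Controller: "vf-0"},
	{Subsystem: "subsys-1", Namespace: "ns-1", Controller: "vf-1"},
	// ... through vf-9 for the 10 VMs in the example above
}
```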

@mkalderon
Contributor Author

There may be cases where you would want the host "view" to match a backend, with shared namespaces, etc.
I think the proto-bufs should provide this functionality; a prescribed "best approach" limits the administrator.

@benlwalker

Unfortunately, that's in direct opposition to the defined OPI view of separate "front", "middle", and "back" ends. The front-end is entirely virtualized and emulated, and back/middle-end volumes are added as namespaces. The front-end and the host never see anything about back-end subsystems or controllers, and in fact the back-end may not even be NVMe. Things like NVMe admin commands that come into the front-end are not simply passed through to some "middle" or "back" end object, because there isn't a middle/back-end object that corresponds to a controller/subsystem - only volumes.

In order to think through solutions, I think we'll need a lot more detail on the motivating use case for the host "view" to match some backend. Are you able to provide that?

@glimchb
Member

glimchb commented Oct 31, 2022

The best approach is to create 1 subsystem containing 1 namespace for each PF/VF. You do not want the VMs to share state of any kind, and if they share a subsystem there is shared state in there, even if you filter out shared namespaces from view.

I'm not arguing about which is the best approach. I think the API should allow sharing a subsystem and not limit the user who wants to do it. Even if we don't see the use case today, it is reasonable to allow this.

@sburla-marvell
Contributor

A subsystem can have namespaces that are not attached to any controller, and similarly a subsystem can have a namespace that is attached to multiple controllers. This is all still the front-end view (nothing to do with the backend). This means that there should be an ability to attach/detach a namespace to/from a controller. There is also the ability to attach or detach namespaces from NVMe controllers using an async event at runtime (Namespace Attribute Changed). This event causes the host to re-enumerate the namespaces. This is a feature most of our customers use.
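
For reference, the runtime mechanism referred to here is the standard NVMe Asynchronous Event Notification; a sketch of the relevant constants (values from the NVMe base specification):

```go
package frontend

// NVMe Asynchronous Event completion fields (NVMe base spec): the
// device posts a Notice-type event, the host reads the Changed
// Namespace List log page, then re-issues Identify to pick up the
// namespace attribute change.
const (
	aenTypeNotice        = 0x2  // Asynchronous Event Type: Notice
	aenInfoNsAttrChanged = 0x00 // Notice: Namespace Attribute Changed
	logPageChangedNsList = 0x04 // Changed Namespace List log page ID
)
```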

@benlwalker

I think there may be some mixing of very different concepts going on here. These API definitions are not doing the equivalent of the NVMe Namespace Attach/Detach commands. Those NVMe commands look at a list of namespaces already attached to a subsystem and then add a reference, via an NSID, to a controller's view. That's not OPI's job - OPI is the thing that put the namespaces into the subsystem to begin with.

So again, can you clarify what the actual use case is for this feature? An example flow and justification of some kind, even very high level? I'm certainly not advocating that we don't support all sorts of dynamic hot insert/removal of namespaces with all of the appropriate async events, nor am I advocating that we preclude support for any specific NVMe namespace management commands. OPI is definitely going to have to support all of that stuff. I'm really focused just on the need to set up mappings between controllers and namespaces during provisioning, based on the controller identity itself (as opposed to based on a hostnqn), and only in the front-end for NVMe devices. It's a very narrow thing I'm advocating against currently.

The reason I'd like to avoid adding support for this is relatively straightforward:

  • Anyone sharing subsystems between tenants has to be very careful not to expose shared state. You have to disable NSSR, likely disable namespace management (because the NSIDs are global and you can run into conflicts), and possibly other things. Especially dangerous are new things added to the spec later on.
  • There's an easy alternative design (make more subsystems) that's safer and better.
  • We should try to avoid adding APIs just because the spec supports it or we'll end up with 100+ calls and too much complexity. We should add calls whenever there's a real use case for an xPU that someone presents. I don't even propose we be very strict or scrutinize the use cases all that closely - just some reasonable justification is probably a good enough bar for me.

@sburla-marvell
Contributor

This is not about the optional namespace management support from the host side. I am not actually sure we should let the host do namespace management in this case anyway, as we want the DPU to be in control. This is about how the DPU provisions new namespaces to a controller at runtime. Unfortunately, the only way the spec supports this is through the host side (using namespace management commands). But the same mechanism lets the DPU side also do this by issuing an async event. The high-level use case for this feature is to provide additional storage to a tenant without the need to burn another VF/PF resource.

@benlwalker

Can't what you're describing be accomplished by adding a namespace to a subsystem instead of adding it to the controller? I believe it can if the default behavior of a controller is to automatically see all namespaces in the subsystem.

This is really the core of this discussion. What you outlined above can be done with the calls that already exist. What's the new/extra functionality needed that warrants an extra step to attach/detach from specific controllers?

@sburla-marvell
Contributor

A subsystem can have multiple controllers and multiple namespaces. Not every controller in a subsystem needs to have every namespace in the subsystem active in its CNS:02 list. Even if you added a namespace to a subsystem, the host is not able to see it right away. It would not be able to enumerate the new namespace until it issues the Identify (CNS:02) command again (which generally means a reboot/NVMe driver reload). The above API lets the host see a new namespace that is added to the subsystem become active for a controller at runtime without having to reload the driver.

@benlwalker

A subsystem can have multiple controllers and multiple namespaces. Not every controller in a subsystem needs to have every namespace in the subsystem active in its CNS:02 list.

My line of questioning isn't whether you can or cannot do this. My line of questioning has been focused on why you want to do this. I've provided some reasons above that you probably don't, and instead should have made two subsystems. What's the use case where this is the superior solution to using two subsystems?

Even if you added a namespace to a subsystem, the host is not able to see it right away. It would not be able to enumerate the new namespace until it issues the Identify (CNS:02) command again (which generally means a reboot/NVMe driver reload). The above API lets the host see a new namespace that is added to the subsystem become active for a controller at runtime without having to reload the driver.

This may be the core misconception. I'm advocating that all namespaces in a subsystem (for the front-end) are surfaced through all associated controllers immediately and automatically when they're added to or removed from the subsystem. So adding a namespace does in fact generate the asynchronous event notifications, which causes the driver to issue the identify namespace commands and see the new namespace without being reloaded. Is there some problem with this behavior at least being the default?
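
A sketch of that default behavior, using the same kind of hypothetical client as above (names assumed): the caller adds the namespace at the subsystem level, and the device raises the async event on every associated controller:

```go
package frontend

import "context"

// CreateNamespaceRequest is hypothetical: namespaces are scoped to a
// subsystem, never to an individual controller.
type CreateNamespaceRequest struct {
	SubsystemID string // front-end subsystem to add the namespace to
	VolumeID    string // back/middle-end volume to expose
}

type SubsystemClient interface {
	CreateNamespace(ctx context.Context, req *CreateNamespaceRequest) error
}

// addNamespace exposes a volume to every controller in the subsystem.
// The device is expected to raise a Namespace Attribute Changed event
// on each controller, so hosts re-enumerate without a driver reload.
func addNamespace(ctx context.Context, c SubsystemClient, subsys, volume string) error {
	return c.CreateNamespace(ctx, &CreateNamespaceRequest{
		SubsystemID: subsys,
		VolumeID:    volume,
	})
}
```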

@sburla-marvell
Contributor

I'm advocating that all namespaces in a subsystem (for the front-end) are surfaced through all associated controllers immediately and automatically when they're added to or removed from the subsystem. So adding a namespace does in fact generate the asynchronous event notifications, which causes the driver to issue the identify namespace commands and see the new namespace without being reloaded. Is there some problem with this behavior at least being the default?

Ok. In that case, the only comment I have is that it is more restrictive than what the spec allows for.

@glimchb
Member

glimchb commented Nov 2, 2022

I agree we can't / shouldn't implement the entire spec. If there is a real need, we can add new RPCs and new fields in existing RPCs. But I wouldn't want to go in the direction where we just add more things without clear value... it will be hard to support long term.

@sburla-marvell
Contributor

Also, I wonder how native multipathing would be supported for the front end if every host controller ends up with its own subsystem, as is being advocated here. If the same namespace is exposed to the host (considered the same due to reporting the same unique identifier in "Identify Namespace") through 2 different controllers which are not in the same subsystem, the host NVMe driver will end up creating 2 block devices, which defeats the purpose.

@benlwalker

benlwalker commented Nov 2, 2022

I'm not advocating that every controller has its own subsystem. I'm advocating that every controller associated with a given subsystem has the same view of the namespaces in that subsystem. This is a very minor narrowing of the flexibility which yields a large reduction in overall complexity. For example, in my proposed system, the common steps to configure a front-end device would be:

  1. Create N back-end/middle-end volumes
  2. Create a subsystem
  3. Add the volumes from step 1 to the subsystem from step 2.
  4. Create N controllers, one for each PF/VF that will see the subsystem created in step 2. Usually, there is just one controller.

But if we add the ability for each controller to see a different set of namespaces, the steps would become:

  1. Create N back-end/middle-end volumes
  2. Create a subsystem
  3. Create N controllers, one for each PF/VF that will see the subsystem created in step 2.
  4. For each controller, add the subset of volumes from step 1 that it should see. This subset can be different for every controller. Allocate the NSIDs such that they don't conflict with any other controller.

If N is anything but 1, is it clear how the second option suddenly becomes much more complex than the first? And I've provided a few reasons that the complexity isn't actually adding any value - there are other ways to accomplish almost every use case that maintain the simpler provisioning flow. A sketch of the first flow follows below.
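
As a sketch of that first (simpler) flow, again with hypothetical client calls rather than actual OPI RPCs:

```go
package frontend

import "context"

// ProvisionClient is an assumed client shape for the three resource
// types used in the simple flow.
type ProvisionClient interface {
	CreateSubsystem(ctx context.Context, nqn string) (subsysID string, err error)
	CreateNamespace(ctx context.Context, subsysID, volumeID string) error
	CreateController(ctx context.Context, subsysID, pfvf string) error
}

// provisionSimple follows steps 1-4 above: one subsystem, N volumes
// added as namespaces at the subsystem level, then one controller per
// PF/VF. No per-controller attach step is needed, because every
// controller sees all namespaces in its subsystem.
func provisionSimple(ctx context.Context, c ProvisionClient, volumes, pfvfs []string) error {
	subsys, err := c.CreateSubsystem(ctx, "nqn.2022-10.io.opi:subsys-0")
	if err != nil {
		return err
	}
	for _, vol := range volumes {
		if err := c.CreateNamespace(ctx, subsys, vol); err != nil {
			return err
		}
	}
	for _, pf := range pfvfs {
		if err := c.CreateController(ctx, subsys, pf); err != nil {
			return err
		}
	}
	return nil
}
```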

@glimchb
Member

glimchb commented Sep 12, 2023

Link to DOCA for reference: nvme_controller_attach_ns and nvme_controller_detach_ns
https://docs.nvidia.com/networking/display/bluefield3snaplatest/SNAP+RPC+Commands#SNAPRPCCommands-nvme_controller_attach_nsnvme_controller_attach_ns

@benlwalker

Almost a year later, my opinion remains that OPI should not support "private" namespaces. In other words, all controllers in a subsystem shall see the same set of namespaces. If a namespace is added to a subsystem, all controllers in the subsystem are notified. The normal/common case is one controller per subsystem (although we'd still allow multiple controllers for "dual-port" type setups).

This also means that the normal configuration for these DPU products is to create one subsystem per VF (as opposed to one giant subsystem for the whole device). This has other really important ramifications - it makes it much simpler to live migrate a controller between two subsystems if there is only one controller in it, for example. It also makes it clear what other components in the system are impacted when you request a subsystem reset (just your VF). I really do believe it's the only reasonable design choice to make, so OPI is safe to be opinionated here.

glimchb assigned benlwalker and unassigned mkalderon Sep 12, 2023
@sburla-marvell
Contributor

@mkalderon is on leave. As I stated earlier, I am OK with the restriction as long as there is a way for a subsystem to have multiple controllers and multipathing. The Marvell SDK also supports the attach/detach semantics.
