Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing for nvme-subsystem devices #158

Merged
merged 1 commit into from Oct 8, 2020
Merged

Conversation

dannf
Copy link
Contributor

@dannf dannf commented Aug 17, 2020

nvme-subsystem devices have a link that looks like:
../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1

However, the current code expects an additional /nvme0/ component, i.e.:
../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0/nvme0n1

Fix this by adding a separate branch for parsing nvme-subsystem devices.

Resolves github issue #157.

Signed-off-by: dann frazier dann.frazier@canonical.com

Copy link

@xnox xnox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please ellaborate which kernels you are checking?

On my system, I do see the extra /nvme0/ dir in both under sys/devices/virual and under /sys/devices/pci*

find /sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0/ /sys/devices/pci0000\:00/0000\:00\:1d.0/0000\:04\:00.0/nvme/nvme0/ -name nvme0n1
/sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0/nvme0n1
/sys/devices/pci0000:00/0000:00:1d.0/0000:04:00.0/nvme/nvme0/nvme0n1

With v5.4 kernel. Note that i think directories changed slightly after kernel gained nvme multipath support.

Which platform / kernel are you on that doesn't have /nvme0/ subdir ?

(need more information)

@dannf
Copy link
Contributor Author

dannf commented Aug 27, 2020

Thanks for the review @xnox . I'm also on (Ubuntu) 5.4 - the platform is an Nvidia A100. I'm aware of some sysfs differences related to enabling NVMe multipath - for example seeing things like: ./pci0000:d7/0000:d7:00.0/0000:d8:00.0/0000:d9:01.0/0000:db:00.0/nvme/nvme9/nvme8n1 - but I'm not familiar with that impacting behavior as we see here.

fyi, my assumption that the nvme-subsys0/nvme0/nvme0n1 case wasn't actually a thing was based on the existing comments in the modified file:

/*
 * support for NVMe devices
 *
 * /sys/dev/block/$major:$minor looks like:
 * 259:0 -> ../../devices/pci0000:00/0000:00:1d.0/0000:05:00.0/nvme/nvme0/nvme0n1
 * 259:1 -> ../../devices/pci0000:00/0000:00:1d.0/0000:05:00.0/nvme/nvme0/nvme0n1/nvme0n1p1
 * or:
 * 259:0 ->../../devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1
 * 259:1 ->../../devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1/nvme0n1p1
 * or:
 * 259:5 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
 * 259:6 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p1
 *

which suggests that nvme-subsys0/nvme0n1 is a thing and nvme-subsys0/nvme0/nvme0n1 is not. I also see similar paths as you do on my laptop. However, on that system, I see no links under /sys/dev/block/ that point to those nvme-subsys paths. Do you see any on your system? Those are the paths that efivar is parsing here and, on the A100, those point to the nvme-subsys0/nvme0n1 style paths. Of course, I could always add support for both if we find that it is possible to have links under /sys/dev/block/ to nvme-subsys0/nvme0/nvme0n1 paths.

@dannf
Copy link
Contributor Author

dannf commented Sep 3, 2020

Turns out that multipath does have an impact as @xnox suggested. With multipath enabled I have the nvme-subsys links:

$ readlink /sys/dev/block/* | grep nvme
../../devices/pci0000:20/0000:20:03.3/0000:23:00.0/nvme/nvme2/nvme2n1
../../devices/pci0000:20/0000:20:03.2/0000:22:00.0/nvme/nvme1/nvme1n1
../../devices/virtual/nvme-subsystem/nvme-subsys3/nvme3n1
../../devices/virtual/nvme-subsystem/nvme-subsys4/nvme4n1
../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
../../devices/pci0000:20/0000:20:03.2/0000:22:00.0/nvme/nvme1/nvme1n1/nvme1n1p1
../../devices/pci0000:20/0000:20:03.2/0000:22:00.0/nvme/nvme1/nvme1n1/nvme1n1p2
../../devices/virtual/nvme-subsystem/nvme-subsys5/nvme5n1

But not when it is disabled:

$ readlink /sys/dev/block/* | grep nvme
../../devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/0000:04:14.0/0000:09:00.0/nvme/nvme0/nvme0n1
../../devices/pci0000:20/0000:20:03.2/0000:22:00.0/nvme/nvme1/nvme1n1
../../devices/pci0000:20/0000:20:03.2/0000:22:00.0/nvme/nvme1/nvme1n1/nvme1n1p1
../../devices/pci0000:20/0000:20:03.2/0000:22:00.0/nvme/nvme1/nvme1n1/nvme1n1p2
../../devices/pci0000:b0/0000:b0:01.1/0000:b1:00.0/0000:b2:08.0/0000:be:00.0/0000:bf:04.0/0000:c8:00.0/nvme/nvme5/nvme5n1
../../devices/pci0000:20/0000:20:03.3/0000:23:00.0/nvme/nvme2/nvme2n1
../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/0000:82:00.0/0000:83:00.0/0000:84:14.0/0000:8a:00.0/nvme/nvme4/nvme4n1
../../devices/pci0000:40/0000:40:01.1/0000:41:00.0/0000:42:08.0/0000:50:00.0/0000:51:00.0/0000:52:00.0/nvme/nvme3/nvme3n1

As this patch modifies only nvme-subsys behavior, it should support both cases.

@dannf
Copy link
Contributor Author

dannf commented Sep 8, 2020

Force pushed a fix for a commit message typo

@dannf
Copy link
Contributor Author

dannf commented Sep 21, 2020

I confirmed with one of the NVMe driver maintainers that my assertion in the commit that the extra /nvme0/ component in nvme-subsys devices is not a thing:

On Mon, Sep 21, 2020 at 8:40 AM Christoph Hellwig <hch@lst.de> wrote:
>
> Hi Dann,
>
> On Thu, Sep 17, 2020 at 06:47:15PM -0600, dann frazier wrote:
> > Hi Keith, Jens, Christoph & Sagi,
> >
> > As the nvme maintainers, I wonder if one of you could confirm my
> > assertion in the following PR for libefivar:
> >   https://github.com/rhboot/efivar/pull/158
> >
> > Specifically, that we should never see symlinks under /sys/dev/block
> > that target paths like this - whether or not multipath is enabled -
> > as it currently expects:
> > ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0/nvme0n1
> >
> > I've yet to see a situation where the "/nvme0/" component exists and
> > would like to confirm that dropping that is not a regression.
>
> I have no idea where that component would come from.  You'd either see
>
> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0/nvme0n1
>
> with a shared namespace
>
> ../devices/virtual/nvme-fabrics/ctl/nvme0/nvme0n1
>
> with a non-shared fabrics namespace, or
>
> ../devices/pci0000:00/0000:00:03.0/nvme/nvme0/nvme0n1
>
> for a non-shared PCIe namespace

@superm1
Copy link

superm1 commented Sep 21, 2020

@vathpela can you look this over please?

nvme-subsystem devices have a link that looks like:
  ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1

However, the current code expects an additional /nvme0/ component, i.e.:
  ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0/nvme0n1

Fix this by adding a separate branch for parsing nvme-subsystem devices.

Resolves github issue rhboot#157.

Signed-off-by: dann frazier <dann.frazier@canonical.com>
@dannf
Copy link
Contributor Author

dannf commented Oct 8, 2020

Apparently force pushing dropped the discussion thread from this PR (still in commit a38b00b though). To capture that here - I force pushed to add a debug statement requested by @vathpela . The requested comment change appears to be unnecessary as the paths in the comment look correct - this just fixes the code to match.

@vathpela vathpela merged commit 4e12f99 into rhboot:master Oct 8, 2020
@dannf dannf deleted the gh157 branch October 8, 2020 14:33
@xnox
Copy link

xnox commented Jan 12, 2021

Hi, I am experiencing this issue on CentOS 8, when booted off Ubuntu's v5.8 kernel and trying to use efibootmgr against nvme drive with multipath in the centos 8 chroot which has efivar 37-4.el8.

I'm not sure how to report this properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants