Improve eGPU detection #48

Closed
hertg opened this issue Aug 3, 2020 · 12 comments · Fixed by #49 or #85
@hertg
Owner

hertg commented Aug 3, 2020

Currently, the eGPU detection is solely based on the PCI ID, and there are no further checks about the connected device.
This can cause false positives in the eGPU detection if another device (e.g., an ordinary Thunderbolt 3 docking station) is connected instead of the eGPU.

I'm using my notebook at home with an eGPU, and at my workplace with a TB3 Docking Station.
When booting up the system at work, it falsely detects the dock as the eGPU, since the dock's USB controller gets connected on the same bus ID as my eGPU at home. This causes the display manager to throw an error and requires the following manual steps on another tty on every boot.

sudo egpu-switcher switch internal
sudo systemctl restart display-manager
@hertg hertg added the enhancement Enhance an already existing feature label Aug 3, 2020
@hertg hertg self-assigned this Aug 3, 2020
toumorokoshi added a commit to toumorokoshi/egpu-switcher that referenced this issue Aug 6, 2020
Fixes hertg#48

Most GPUs register as "VGA compatible controller" in lspci. Adding this
string to the regex search eliminates false positives for other devices
that use the same PCI-E BusID, such as non-VGA devices connected via
non-eGPU docks.
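
A minimal sketch of the check described in the commit message above, assuming plain `lspci` output without a PCI domain prefix. The function name `is_vga_at_bus` is hypothetical, and it uses awk string matching rather than the regex in the project's actual code.

```shell
# Hypothetical helper (not the project's actual code).
# stdin: output of `lspci`; $1: bus ID in lspci notation, e.g. "08:00.0".
# Succeeds only if the device at that bus ID is a "VGA compatible controller".
is_vga_at_bus() {
  awk -v id="$1" '
    $1 == id && index($0, "VGA compatible controller") { found = 1 }
    END { exit !found }
  '
}

# Example use (assumes lspci is installed):
#   lspci | is_vga_at_bus "08:00.0" && echo "GPU found at 08:00.0"
```

With this check, a dock's USB controller that lands on the same bus ID no longer counts as a match.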
@hertg hertg closed this as completed in #49 Aug 7, 2020
@xabolcs
Contributor

xabolcs commented Aug 8, 2020

And how about saving the configured pid and vid, and (optionally) searching for those?

It would improve my user experience, since my eGPU box is sometimes connected directly and sometimes through a TB3 dock.

@hertg
Owner Author

hertg commented Aug 9, 2020

@xabolcs
What exactly do you mean by "saving the configured pid and vid"?

I've actually thought about the idea to register identifiers of the GPUs, rather than just the PCI-BUSID. This would also make it possible to use multiple external GPUs with different configurations in the future. But it would probably require integrating boltctl, which would add another dependency.

@xabolcs
Contributor

xabolcs commented Aug 9, 2020

> I've actually thought about the idea to register identifiers of the GPUs, rather than just the PCI-BUSID.

I meant exactly that! Saving the PCI vendor and device IDs (see lspci's -n or -nn options) and using those for detection too.

$ sudo egpu-switcher config

Found 3 possible GPUs...

  1: NVIDIA Corporation GM108M [GeForce MX130] (rev a2) (2:0:0)   # <--- dGPU
  2: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (8:0:0)  # <--- eGPU, connected directly
  3: Intel Corporation Device 3e9b (0:2:0)                        # <--- iGPU

Would you like to define a specific INTERNAL GPU? (not recommended) [y/N]: 
Choose your preferred EXTERNAL GPU [1-3]: 

In the above example, the question is about eGPUs, not about their connection! 😉

In my use case, egpu-switcher could save that I chose the GTX 970 at bus 8:0:0. At the next boot it would look for the GTX 970 (more precisely: 10de:13c2) at 8:0:0, and if it doesn't find it there, it would try to find the GTX 970 somewhere else.
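
The lookup proposed above could be sketched roughly like this. It is only an illustration, not egpu-switcher's implementation: the function name `find_gpu_bus` is hypothetical, and it assumes `lspci -n` output in the usual `bus class: vendor:device` layout without a PCI domain prefix.

```shell
# Hypothetical helper (not part of egpu-switcher).
# stdin: output of `lspci -n`, e.g. "08:00.0 0300: 10de:13c2 (rev a1)"
# $1: saved bus ID in lspci notation, $2: saved vendor:device ID.
# Prints the saved bus ID if the device is still there, otherwise the first
# other bus ID that carries the same vendor:device ID. Fails if none is found.
find_gpu_bus() {
  awk -v bus="$1" -v id="$2" '
    $3 == id {
      if ($1 == bus) preferred = $1
      else if (fallback == "") fallback = $1
    }
    END {
      if (preferred != "") print preferred
      else if (fallback != "") print fallback
      else exit 1
    }
  '
}

# Example use (assumes lspci is installed):
#   lspci -n | find_gpu_bus "08:00.0" "10de:13c2"
```

One caveat with this approach: two identical cards share the same vendor:device ID, so the fallback simply picks the first one it sees.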

@hertg
Owner Author

hertg commented Aug 10, 2020

I must have missed the -n option when I scrolled through the lspci manpages yesterday.
That sounds good, I will look into that when I've got the time for the next iteration.

But I might still look into the boltctl integration, not necessarily for the vendor/product ID detection but to take a closer look at the bootup process and solve some timing issues that still sometimes occur on my system.

Just out of curiosity: Did you specifically authorize your eGPU or did you turn off the Thunderbolt Security completely to make egpu-switcher work?

@hertg hertg reopened this Aug 10, 2020
@xabolcs
Contributor

xabolcs commented Aug 10, 2020

> I must have missed the -n option when I scrolled through the lspci manpages yesterday.

Because I edited my comment. 😅

> Just out of curiosity: Did you specifically authorize your eGPU or did you turn off the Thunderbolt Security completely to make egpu-switcher work?

Sure, I authorized it ... but only once: I authorized it on Windows 10, and Ubuntu 18.04 worked on the first try.
I changed the BIOS authorization setting to "User" or similar, and authorized it manually in the OS.

@hertg
Owner Author

hertg commented Aug 10, 2020

And you didn't experience any timing issues? Did you enable the Pre-Boot ACL by any chance?
On my machine it sometimes takes quite a long time for the eGPU to actually connect. So when egpu-switcher switch auto runs at bootup, the eGPU is not always there yet.

But that's a different issue I need to look into. Would be great if I could get it to work reliably even on the Secure Thunderbolt Security Level. 😄
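
One generic way to work around a late-connecting device is to retry the detection a few times before giving up. The sketch below is only an illustration of that idea, not something egpu-switcher is known to do: `wait_for` is a hypothetical helper, and `egpu_present` in the usage comment stands in for whatever detection check is used.

```shell
# Hypothetical helper: run a check command up to N times.
# $1: maximum number of attempts, $2: seconds to sleep after a failed attempt,
# remaining arguments: the command to run as the check.
# Returns 0 as soon as the check succeeds, 1 if every attempt fails.
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example use (egpu_present is a placeholder for a real detection check):
#   wait_for 10 1 egpu_present && egpu-switcher switch auto
```

The trade-off is a slower boot when the eGPU is genuinely absent, since all attempts have to run out first.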

@xabolcs
Contributor

xabolcs commented Aug 10, 2020

No timing issues here.

I don't know what Pre-Boot ACL means to you, but my TB3 BIOS settings are:

  • Thunderbolt 🙃
  • Enable Thunderbolt Boot Support
  • Enable Thunderbolt (and PCIe behind TBT) Pre-boot modules
  • Security:
    ( ) No Security
    (*) User Authorization
    ( ) Secure Connect
    ( ) Display Port and USB Only
Secure Connect:

The Thunderbolt(TM) adapter port will only allow connection to devices that have been configured with a shared key.

As I don't know what the shared key is in this TB3 context (or how to set it up), I'm using "User Authorization".

@hertg
Owner Author

hertg commented Aug 10, 2020

Pre-Boot ACL stands for "Pre-Boot Access Control List", which essentially allows TB devices to connect before the OS is booted (useful if you want to use your docked keyboard when entering the passphrase for your device decryption and such). Therefore the eGPU will be connected much earlier in the boot-process than when Pre-Boot ACL is disabled.
I believe that Pre-Boot ACL is enabled on your system via the Enable Thunderbolt (and PCIe behind TBT) Pre-boot modules setting. If you disabled that, you would probably run into timing issues. What's the manufacturer of your device?

As for Secure Connect, this is just an additional layer of security: a secret key is stored on the TB device and checked before each connection. This should mitigate the risk that an attacker with physical access to your computer connects a malicious TB3 device that spoofs the UUID of an already authorized device. Spoofing is pretty easy, since the UUID can be read by anyone with physical access to the device. If you have a newer version of boltctl, it probably already supports Secure Connect, so AFAIK the setup process is pretty much the same as for User authorization. But I think not every TB3 device necessarily supports Secure Connect.

But I need to look deeper into that anyway. It would be cool if egpu-switcher worked on devices that disable Pre-Boot ACL and set TB Security to Secure, because that's the configuration I'd like to use on my computer. 😛
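
To see which security level the firmware actually applied, the Linux kernel exposes it in sysfs as a `security` attribute of each Thunderbolt domain. A small sketch for reading it follows. The function name `tb_security_level` is hypothetical, `domain0` is an assumption (a machine may have several domains), and the labels in parentheses are the BIOS names quoted earlier in this thread, which vary by vendor.

```shell
# Hypothetical helper: print the Thunderbolt security level of one domain.
# $1: optional path to the sysfs "security" file; defaults to domain0,
#     which is an assumption about the machine it runs on.
tb_security_level() {
  file="${1:-/sys/bus/thunderbolt/devices/domain0/security}"
  [ -r "$file" ] || return 1
  level=$(cat "$file")
  case "$level" in
    none)   echo "none (No Security)" ;;
    user)   echo "user (User Authorization)" ;;
    secure) echo "secure (Secure Connect)" ;;
    dponly) echo "dponly (Display Port and USB Only)" ;;
    *)      echo "$level" ;;
  esac
}
```

Calling `tb_security_level` without arguments on a machine with a Thunderbolt controller prints one of the lines above, and it fails quietly when the file does not exist.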

@xabolcs
Contributor

xabolcs commented Aug 10, 2020

I tried "Secure Connect" and it breaks my plug-and-play dual-boot environment, as Windows 10 and Ubuntu 18.04 use different keys! 🙃

Rolling back to "User Authorization"

@hertg
Owner Author

hertg commented Aug 10, 2020

There shouldn't be much of a difference between a dual-boot computer and using two entirely different computers, right? It should be possible to connect a device via Secure Connect in a dual-boot scenario. Otherwise you could only connect with exactly one computer to the TB3 device, which would probably be mentioned somewhere if that was the case for Secure Connect. 😅

Did you also disable the Pre-Boot ACL? Because setting TB Security to Secure and still having the Pre-Boot ACL enabled doesn't make a lot of sense, as far as I understand. Additionally, this might also be what is causing trouble in the dual-boot scenario, since both OSes try to write to the bootACL and get out of sync (?). I believe that a dual-boot with two Linux systems would probably work, since boltd keeps a journal of the changes it makes to the ACL so it doesn't get out of sync.

You could have a look at what happens if you change to Secure and disable the Pre-Boot ACL. That will most likely cause the timing issues in egpu-switcher that I mentioned, but it might be good to test, just to see if it changes anything.
I will certainly test that on my machine in more detail when I take the time to look further into the issue.

@xabolcs
Contributor

xabolcs commented Aug 10, 2020

Will test it later.

I don't want to disable Pre-Boot ACL (Enable Thunderbolt (and PCIe behind TBT) Pre-boot modules for me), because then I wouldn't be able to select my awesome OSes through rEFInd: I use my DELL laptop closed and docked (display and keyboard through the TB3 dock / eGPU box).

I don't know what's happening in Secure Connect mode, but authorizing the device on Windows 10 caused an authorization error on Ubuntu 18.04 ... and vice versa.

Anyway, as I need pre-boot stuff due to my use case, the Secure Connect mode will not be my preferred policy mode. 😀

@keighrim

In my use case, the bus ID of the eGPU is different (off by one) when it's plugged in pre-boot (3a:00.0) than when it's plugged in post-boot (3b:00.0). This is probably because my notebook automatically disables the iGPU when the eGPU is plugged in pre-boot. As a result, the xorg.conf.egpu file generated during a post-boot eGPU connection fails to load the proper GPU when used with a pre-boot connection (and remember, the iGPU is disabled by the BIOS in that case), which eventually leaves me with no display manager at all. I had to edit the is_egpu_connected() function to grep the correct bus ID and sed it into the xorg.conf.egpu file on the fly.
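
The workaround described above could look roughly like the sketch below. It is not the actual patch: both function names are hypothetical, and it assumes bus IDs in lspci notation without a PCI domain prefix. The conversion is needed because lspci prints bus IDs in hexadecimal, while Xorg's BusID option expects decimal values in the form `PCI:bus:device:function`.

```shell
# Hypothetical helpers (not the actual patch).

# Convert an lspci bus ID such as "3a:00.0" (hexadecimal) into the decimal
# form used by Xorg's BusID option, here "PCI:58:0:0".
lspci_to_xorg_busid() {
  bus=${1%%:*}     # "3a"
  rest=${1#*:}     # "00.0"
  dev=${rest%%.*}  # "00"
  fn=${rest#*.}    # "0"
  printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# stdin: contents of an Xorg config file; $1: the new BusID value.
# Prints the config with the quoted value of every BusID line replaced.
set_busid() {
  sed "s|^\([[:space:]]*BusID[[:space:]]*\)\"[^\"]*\"|\1\"$1\"|"
}

# Example use (the file name is a placeholder):
#   set_busid "$(lspci_to_xorg_busid 3a:00.0)" < xorg.conf.egpu
```

Rewriting the BusID each time the eGPU is detected avoids depending on a bus number that shifts between pre-boot and post-boot connections.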
