
UCX PML is pre-empting the usNIC BTL #8489

Closed
jsquyres opened this issue Feb 17, 2021 · 5 comments · Fixed by #8496

@jsquyres
Member

During the weekly Webex today, it became evident that there are differing views on when the UCX PML is to be used by default. NOTE: All of the discussion below is about the default case -- any component / transport can be selected via MCA params / CLI options / etc., of course.
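For reference, forcing a specific PML/BTL by hand looks something like this (illustrative command lines only; `./a.out` is a placeholder, and the shared-memory BTL is named `vader` in the v4.x series):

```
# Force the UCX PML for a run
mpirun --mca pml ucx ./a.out

# Force the OB1 PML with an explicit BTL list (TCP + shared memory + self)
mpirun --mca pml ob1 --mca btl tcp,vader,self ./a.out

# The same selection can be made via environment variables
export OMPI_MCA_pml=ucx
```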

  • My understanding of the UCX PML was that it was only to be used when IB or RoCE devices were discovered on the system.
  • @Akshay-Venkatesh was surprised by this, and thought that the UCX PML should be used whenever it could be used. This includes systems with only TCP and/or shared memory.

After looking into this a little bit today, here's what I found:

  • The UCX PML has a priority of 51.
    • The UCX PML will report that it can run if any of its transports can be used.
    • The UCX PML does not exclude any of its transports, so it will report that it can be used even when only TCP and/or shared memory endpoints are available.
  • The CM PML sets a priority based on whether any MTL can run.
    • The OFI MTL, for example, sets a priority of 25.
    • The CM PML will report that it can run if any of its MTLs can run.
    • The OFI MTL specifically excludes the following libfabric providers: shm,sockets,tcp,udp,rstream
  • The OB1 PML sets a priority of 20.
    • The OB1 PML will always report that it can run, probably on the assumption that everywhere has TCP and/or shared memory.

(all of these priorities are MCA params and can be overridden, of course)
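A sketch of how to inspect those defaults and override them at run time (param names as found in my build; double-check the `ompi_info` output against yours):

```
# Show the UCX and OB1 PML priorities (51 and 20, respectively, by default)
ompi_info --param pml ucx --level 9 | grep priority
ompi_info --param pml ob1 --level 9 | grep priority

# Show the OFI MTL's libfabric provider exclusion list
ompi_info --param mtl ofi --level 9 | grep provider_exclude

# Drop the UCX PML below OB1 for a single run
mpirun --mca pml_ucx_priority 10 ./a.out
```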

Effectively, it looks like this setup means that the UCX PML is used for all cases, because just about every system has TCP.

For example, on a usNIC-based system, if the UCX PML is available, it is selected by default (because it sees TCP available); OB1 is therefore excluded, and the usNIC BTL is never used.
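As a workaround on such a system, the UCX PML can be taken out of the running by hand (illustrative; assumes a usNIC-capable build):

```
# Explicitly request OB1 so the usNIC BTL is eligible again
mpirun --mca pml ob1 --mca btl usnic,vader,self ./a.out

# Or simply exclude the UCX PML from selection
mpirun --mca pml ^ucx ./a.out
```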

That being said, for reasons I don't quite understand, on an EFA-based system, the UCX PML errors/fails to open the EFA device and therefore excludes itself from selection. CM+OFI MTL then take over and run, as expected.
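For anyone who wants to check what actually gets selected on their system, the base-framework verbosity knobs show the selection decisions (the exact output format varies by version):

```
# Print PML/MTL selection decisions at startup
mpirun --mca pml_base_verbose 100 --mca mtl_base_verbose 100 -np 2 ./a.out 2>&1 | grep -i select
```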

I do not know what happens on machines with other networks not supported by UCX. However:

  1. This is unacceptable for usNIC.
  2. I am also greatly surprised to discover that UCX has, by default, effectively taken over TCP and shared memory handling in non-IB/non-RoCE networks.

Am I clueless to not have realized that this is happening? ☹️ I'm curious to know if others are aware of this UCX PML behavior.

@rhc54
Contributor

rhc54 commented Feb 17, 2021

Say what?? Absolutely not - TCP was supposed to default to the TCP BTL, as it has done for many years. Users would certainly be surprised to find it wasn't.

@Akshay-Venkatesh
Contributor

Cc @yosefe

@jsquyres
Member Author

FYI @open-mpi/ucx

@rhc54
Contributor

rhc54 commented Feb 17, 2021

To be clear, the only time UCX was supposed to be the default is when we are on Mellanox hardware. Otherwise, we were supposed to default to (a) the vendor's BTL/MTL (e.g., usnic) and then (b) to the BTLs. UCX was never supposed to be the default everywhere, just like OFI isn't the default (even though it too supports TCP).

@mwheinz

mwheinz commented Feb 24, 2021

I have to admit - I was under the same impression as Ralph. Do we know when the behavior changed?
