TCP BTL does not support Linux virtual interfaces #160

Open
ompiteam opened this issue Oct 1, 2014 · 6 comments

@ompiteam
Contributor

ompiteam commented Oct 1, 2014

If you create a virtual ethernet device in Linux, the TCP BTL gets confused.

This is because the Linux kernel will use the same kernel index for both interfaces -- the TCP BTL fundamentally assumes that all interfaces will have a unique kernel index (we use that kernel index for indexing and unique identification in modex data). This is clearly a bad assumption.
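To make the collision concrete, here is a minimal Python sketch. It is not Open MPI code; the interface names, index values, and addresses are made up for illustration. It shows how a table keyed by kernel index loses an entry when an alias shares its parent's index, while a table keyed by a per-process position does not.

```python
# Hypothetical (name, kernel_index, address) records; all values are made up.
interfaces = [
    ("eth0", 2, "10.0.0.1"),
    ("eth0:0", 2, "192.168.100.1"),  # alias reports the same kernel index as eth0
    ("eth1", 3, "10.0.1.1"),
]

# Keyed by kernel index: the second eth0 record overwrites the first.
by_kernel_index = {kindex: (name, addr) for name, kindex, addr in interfaces}

# Keyed by position in the list: every record keeps its own slot.
by_position = {pos: (name, addr) for pos, (name, _, addr) in enumerate(interfaces)}

print(len(by_kernel_index), len(by_position))
```

Which of the two colliding records survives depends on insertion order, which is one way the kind of confusion described above could arise.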

I chatted with Ralph about this on the phone: we're wondering why the kernel index was used at all. Why not use the OPAL IF index? That *is* unique (in a process), and is suitable for both indexing and identification in modex data.

Ralph is going to revamp the OPAL IF interface soon, anyway (e.g., convert it from a list to an array) and will likely be removing all the kernel index stuff. This will force changing the TCP BTL to use the OPAL IF index (instead of the kernel index). This will likely solve the problem.

Once we fix this, perhaps Bart at Atipa can test it for us. He ran into the issue because he has eth0:0 on his cluster head node to talk to the IPMI network. He doesn't usually run MPI jobs on the head node, but he did this once and ran into hangs/badness, and I helped diagnose the issue. :-)

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Imported from trac issue 3339. Created by jsquyres on 2012-10-02T17:20:21, last modified: 2014-04-18T22:36:27

@ompiteam ompiteam added this to the Open MPI 1.8.4 milestone Oct 1, 2014
@ompiteam ompiteam added the bug label Oct 1, 2014
@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by roye on 2014-04-18 22:36:27:

Hello,
I'm working on a project that uses virtual ethernet devices on servers. This configuration causes issues when running MPI jobs, as mentioned above.
To work around the issue, we used tap interfaces (tunctl) to handle the virtual IP needed by the server. The MPI jobs work fine with the tap interfaces.
I don't know if there is an impact on performance, but it seems to work fine.

@jsquyres jsquyres modified the milestones: Open MPI 1.9, Open MPI 1.8.4 Dec 3, 2014
@jsquyres
Member

jsquyres commented Dec 3, 2014

This is clearly not going to happen in the v1.8 series.

@rhc54 may well be doing this revamp-the-OPAL-IF-interface work, but only for v1.9 or beyond.

yosefe pushed a commit to yosefe/ompi that referenced this issue Mar 5, 2015
Fix performance regression caused by enabling opal thread support
@jsquyres jsquyres modified the milestones: Open MPI 2.X, Open MPI v2.0.0 Jun 25, 2015
@rhc54 rhc54 closed this as completed Jan 27, 2017
@bwbarrett
Member

Since George and I are mucking in this part of the code, I'm going to re-open this issue.

@bwbarrett bwbarrett reopened this Oct 19, 2018
@bwbarrett
Member

The reason we use kindex instead of index is that index is a way of enumerating the addresses associated with the interfaces on the host, but kindex is (supposed to be) the way to enumerate actual interfaces. So let's say we had a platform with two devices (A and B) and each has two addresses (A1, A2, B3, B4). That means we'll have 4 unique indexes, but only 2 unique kindexes. There's no advantage to spreading traffic across IPs on the same interface, which is why the BTL looks at kindexes instead of indexes.
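The A/B example can be written out as a small Python sketch. The record layout and the numeric values are assumptions for illustration only; they do not mirror the actual OPAL IF structures.

```python
from collections import namedtuple

# Hypothetical record: one entry per address, as in the example above.
IfEntry = namedtuple("IfEntry", ["index", "kindex", "device", "address"])

entries = [
    IfEntry(index=0, kindex=1, device="A", address="A1"),
    IfEntry(index=1, kindex=1, device="A", address="A2"),
    IfEntry(index=2, kindex=2, device="B", address="B3"),
    IfEntry(index=3, kindex=2, device="B", address="B4"),
]

unique_indexes = {e.index for e in entries}    # one per address
unique_kindexes = {e.kindex for e in entries}  # one per device

# Group addresses by device, which is the view the BTL wants for spreading traffic.
addresses_by_kindex = {}
for e in entries:
    addresses_by_kindex.setdefault(e.kindex, []).append(e.address)

print(len(unique_indexes), len(unique_kindexes), addresses_by_kindex)
```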

That, however, isn't why btl_tcp_if_include and btl_tcp_if_exclude don't work around the case. That's because the TCP BTL currently behaves poorly (and has since this ticket was opened) around both of those options. We use the strings in if_include and if_exclude to pick a device (kindex) which has an interface that matches if_include or isn't disabled by if_exclude, but then ignore which address passed that test and pick the first address associated with the kindex. I added a big comment about this as part of #5938 and hope to fix that in the near future.
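A rough Python sketch of that selection behavior follows. The function names, the record layout, and the addresses are hypothetical, and only name matching is modeled; this is not the TCP BTL code. The first function matches an entry against the include list but then returns the first address on the same kindex. The second returns the address that actually matched.

```python
from typing import Iterable, NamedTuple, Optional

class IfEntry(NamedTuple):
    name: str      # interface name, e.g. "eth0:0"
    kindex: int    # kernel index (shared by an alias and its parent)
    address: str

def pick_first_on_device(entries: Iterable[IfEntry], include: set) -> Optional[str]:
    """Match by name, then ignore the match and take the device's first address."""
    entries = list(entries)
    for entry in entries:
        if entry.name in include:
            for candidate in entries:
                if candidate.kindex == entry.kindex:
                    return candidate.address
    return None

def pick_matching(entries: Iterable[IfEntry], include: set) -> Optional[str]:
    """Return the address of the entry that actually passed the include test."""
    for entry in entries:
        if entry.name in include:
            return entry.address
    return None

entries = [
    IfEntry("eth0", 2, "10.0.0.1"),
    IfEntry("eth0:0", 2, "192.168.100.1"),
]

print(pick_first_on_device(entries, {"eth0:0"}))  # address of eth0, not eth0:0
print(pick_matching(entries, {"eth0:0"}))         # address of eth0:0
```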

@bwbarrett
Member

The kindex behavior is also a bit wrong on some platforms with regard to IPv6. On Linux, the kindexes are consistent across the IPv4 and IPv6 addresses (that is, we could build the same output as ip addr li). On MacOS, however, the kindexes are not consistent. So if every interface has both an IPv4 and an IPv6 address, Open MPI thinks there are twice as many interfaces as there actually are.

We never actually use the kindex in a call to the kernel. One potential solution to both the MacOS IPv6 problem and the virtual interfaces problem is to build our own kindexes based on ifname. It would be an O(N^2) operation, but hopefully N would be sufficiently small that it doesn't matter.
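A minimal sketch of the ifname idea, in Python for brevity. The function name is hypothetical and this is only one way the proposal could look, not an implementation from the Open MPI tree. Each name is compared against the names seen so far, which gives the quadratic cost mentioned above.

```python
def assign_name_indexes(ifnames):
    """Give each distinct interface name its own index, reusing it for repeats.

    A linear scan over the names seen so far is done for every entry, so the
    total cost is O(N^2) in the number of entries.
    """
    seen = []     # distinct names, in first-seen order
    indexes = []  # one synthetic index per input entry
    for name in ifnames:
        for idx, known in enumerate(seen):
            if known == name:
                indexes.append(idx)
                break
        else:
            seen.append(name)
            indexes.append(len(seen) - 1)
    return indexes

# Same name for the IPv4 and IPv6 entries: they collapse to one interface.
print(assign_name_indexes(["en0", "en0", "en1", "en1"]))

# An alias has a different name than its parent, so it gets its own index.
print(assign_name_indexes(["eth0", "eth0:0", "eth1"]))
```

Under this scheme, entries that share a name share an index regardless of what the kernel reports, and an alias such as eth0:0 no longer collides with eth0.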

@bwbarrett bwbarrett assigned bwbarrett and unassigned rhc54 Oct 19, 2018
@bwbarrett bwbarrett removed this from the v2.x milestone Oct 19, 2018

4 participants