mpirun cannot start in GitHub codespaces due to their host name scheme #9321

Closed
tchayen opened this issue Aug 27, 2021 · 2 comments

tchayen commented Aug 27, 2021

Background information

What version of Open MPI are you using?

4.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

apt install libopenmpi-dev openmpi-bin openmpi-common openmpi-doc

Please describe the system on which you are running

Docker container of GitHub codespaces.


Details of the problem

Hi! I am trying to set up Open MPI in a Docker dev container for GitHub Codespaces (I'm not sure how much sense that makes, but the provided VM reports 4 CPU cores, so it should be as viable as on most PCs), and mpirun fails with the following error:

--------------------------------------------------------------------------
While trying to create a regular expression of the node names
used in this application, the regex parser has detected the
presence of an illegal character in the following node name:

  node:  codespaces_3620e8

Node names must be composed of a combination of ascii letters,
digits, dots, and the hyphen ('-') character. See the following
for an explanation:

  https://en.wikipedia.org/wiki/Hostname

Please correct the error and try again.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An internal error has occurred in ORTE:

[[59499,0],0] FORCE-TERMINATE AT (null):1 - error base/plm_base_launch_support.c(553)

This is something that should be reported to the developers.
--------------------------------------------------------------------------

I guess it comes from this line and, as the error says, it has something to do with how GitHub picks the hostname. I looked for an option to override the default name in the config but didn't find one. I also tried changing it with the hostname and hostnamectl commands, but had no luck there either.
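
The rule quoted in the error message (ASCII letters, digits, dots, and hyphens only) can be checked locally before launching anything. The following is a minimal sketch, not Open MPI's own parser: the function name `is_legal_node_name` is hypothetical, and `codespaces-3620e8` is an invented name used only for contrast with the one from the error output.

```shell
# Hypothetical helper (not part of Open MPI): succeeds only if the name is
# non-empty and contains nothing but ASCII letters, digits, dots, and hyphens,
# which is the character set listed in the error message.
is_legal_node_name() {
    case "$1" in
        "" | *[!A-Za-z0-9.-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Compare the node name from the error output with an invented hyphenated variant.
for name in codespaces_3620e8 codespaces-3620e8; do
    if is_legal_node_name "$name"; then
        echo "$name: only legal characters"
    else
        echo "$name: contains an illegal character"
    fi
done
```

Calling `is_legal_node_name "$(uname -n)"` would apply the same check to the current machine's node name. The underscore in `codespaces_3620e8` is the character that fails it.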

Is this something you consider worth investigating? I am happy to provide more information if needed.

@jjhursey (Member)

By default in the v4.0.x and v4.1.x series, a manual regular expression of the hostnames is used when communicating the host list to all of the daemons. The regex is useful to compress that list to improve efficiency. However, when it encounters irregular names the default mechanism can be inefficient (and in extreme cases incorrect).

In the v4.0.x and v4.1.x series, we introduced a naive component (PR #6915) that just sends the list without trying to parse them for scenarios where the default regex fails. In the master and v5.x series, we removed the regex framework and replaced it with a zlib compression mechanism to sidestep the whole issue.

In your case, ORTE is encountering a hostname that does not conform to the standard model. Though I would hope that GitHub fixes that generally, you can try to use the naive component to work around the issue in the short term.

Try adding the following MCA parameter to the mpirun command line: -mca regx naive
Or set the following environment variable: OMPI_MCA_regx=naive
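
Both forms of the workaround can be sketched in shell as follows. The `-mca regx naive` flag and the `OMPI_MCA_regx` variable are taken from the comment above; the wrapper name `mpirun_naive` is hypothetical and exists only for illustration.

```shell
# Environment-variable form: applies to every later mpirun started from this shell.
export OMPI_MCA_regx=naive

# Command-line form, wrapped in a hypothetical helper that forwards all of its
# arguments to mpirun after the MCA parameter.
mpirun_naive() {
    mpirun -mca regx naive "$@"
}
```

With this in place, something like `mpirun_naive -np 4 ./my_mpi_app` would launch with the naive component, where `./my_mpi_app` and the process count are placeholders. Either form should be enough on its own. To keep the setting across new terminals, the `export` line could go in a shell startup file such as `~/.bashrc`.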


tchayen commented Aug 27, 2021

Thanks! It helps with the issue.

tchayen closed this as completed Aug 27, 2021