gethostbyname error on macos 10.15 (github actions vm) #4710
Comments
Could you try the latest version from https://www.mpich.org/downloads/, either v3.3.2 or v3.4a3?
Sorry, I had a typo in the version number; this is using v3.3.2, which should be the latest stable release. I'm working on testing the pre-release version, but it's not packaged, so it's taking longer.
You could try v3.4a3, which is very close to the edge.
What's the timeline for the 3.4 release?
The plan is to release 3.4b in a couple of months, then 3.4 another month after.
running
The root issue seems to be that macOS sets an arbitrary local hostname that can't be resolved by
We were running inside a GitHub Actions macOS image (for CI), which I believe is a VM, although it's not clear how they instantiate the running environment. They do create an arbitrary hostname, and sadly I believe it can also change at runtime. Similar issues have been reported for Azure Pipelines (which I believe shares backing infrastructure with GitHub Actions), and for the macOS hostname changing at runtime on GitHub Actions.
Although I know this is not a typical deployed use case for MPI / MPICH, being able to run MPI tests in one of the widely used CI environments is very useful, so I'm willing to help out in any way possible to make it easier for future users (maybe by adding some notes to the documentation, with examples, if there is a workaround).
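As a possible starting point for such a note: one workaround sometimes used on CI runners is to map the runner's hostname to the loopback address in /etc/hosts before launching mpiexec. Nobody in this thread has confirmed that it works for this failure, so treat the following as a rough sketch. The helper name `hosts_entry` and the choice of 127.0.0.1 are assumptions made for illustration; the script only formats the line and does not touch /etc/hosts itself.

```python
import socket


def hosts_entry(hostname, address="127.0.0.1"):
    """Format one /etc/hosts line mapping `hostname` to `address`.

    Hypothetical helper: the loopback default is an assumption, not
    something MPICH or GitHub Actions prescribes.
    """
    if not hostname or any(ch.isspace() for ch in hostname):
        raise ValueError(f"not a usable hostname: {hostname!r}")
    return f"{address}\t{hostname}"


if __name__ == "__main__":
    # Only print the line; appending it to /etc/hosts needs root
    # privileges and is left to the CI step that calls this script.
    print(hosts_entry(socket.gethostname()))
```

If the hostname really does change while the job is running, an entry written at the start of the job would go stale, so the line would have to be regenerated right before each MPI launch.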
@jakebolewski I believe this commit now in
@jakebolewski reminder to confirm if this issue has been resolved, if you have time. Thanks.
I'm working on building the artifacts to test it now that a new beta release has been tagged.
This issue is stale. If it is still relevant, please re-open it. |
MPICH 3.4 should have fixed pmodels/mpich#4710
We are seeing an issue on a GitHub Actions macOS 10.15 VM (with MPICH 3.3.2) where calling MPI.Init() fails:
Fatal error in MPI_Init: Invalid group, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, Mac-1594849612293 (errno 0)
(unknown)(): Invalid group
SingleStackUtils: Error During Test at /Users/runner/work/ClimateMachine.jl/ClimateMachine.jl/test/testhelpers.jl:16
Test threw exception
Expression: mpiexec() do cmd
run(
$cmd $oversubscribe -n $ntasks $(Base.julia_cmd()) --startup-file=no --project=$(Base.active_project()) $file
)true
end
failed process: Process(
/Users/runner/.julia/artifacts/848ee2ddce903941ae946cd49f63eac561bd636d/bin/mpiexec -n 1 /Users/runner/hostedtoolcache/julia/1.4.2/x64/bin/julia -Cnative -J/Users/runner/work/ClimateMachine.jl/ClimateMachine.jl/ClimateMachine.so --check-bounds=yes -g1 --startup-file=no --project=/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/jl_pVc3V7/Project.toml /Users/runner/work/ClimateMachine.jl/ClimateMachine.jl/test/Utilities/SingleStackUtils/ssu_tests.jl
, ProcessExited(8)) [8]
https://github.com/CliMA/ClimateMachine.jl/runs/875221461#step:8:169
Reading the source code, it seems like defining MPICH_INTERFACE_HOSTNAME as localhost should override gethostbyname behavior, but unfortunately this doesn't seem to fix the issue.
It seems that the replacement of gethostbyname with getaddrinfo in the upcoming 3.4 release might fix the issue.
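To check from inside a CI job whether the runner's hostname resolves at all, a small diagnostic along these lines might help. It is a sketch in Python, not the code path MPICH takes: Python's `socket.getaddrinfo` goes through the platform resolver, so it can only show whether the name is resolvable, not which C call fails. The function name `resolve_addresses` is made up for this example.

```python
import socket


def resolve_addresses(hostname):
    """Return the sorted addresses `hostname` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = socket.gethostname()
    addresses = resolve_addresses(name)
    if addresses:
        print(f"{name} resolves to {addresses}")
    else:
        print(f"{name} does not resolve")
```

Running this as a step just before the MPI tests would show whether a hostname such as the one in the error stack is resolvable at that moment.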
#2889
Ref: JuliaParallel/MPI.jl#407