Cannot Build OpenMPI Out of Source Tree with ACFL (armclang) #11737

Closed
antoine-morvan opened this issue Jun 6, 2023 · 9 comments

Comments

@antoine-morvan

antoine-morvan commented Jun 6, 2023

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

Tried with 4.1.4 and 4.1.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

build from source (see script below)

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

N/A

Please describe the system on which you are running

  • aarch64-1
    • Operating system/version: Red Hat Enterprise Linux release 8.7 (Ootpa)
    • Computer hardware: Ampere Altra Q8030 (ARM Neoverse N1)
  • aarch64-2
    • Operating system/version: Red Hat Enterprise Linux release 8.3 (Ootpa)
    • Computer hardware: Fujitsu A64FX
  • x86_64
    • Operating system/version: Red Hat Enterprise Linux release 8.6 (Ootpa)
    • Computer hardware: AMD EPYC 7763
  • Network type: unrelated

Details of the problem

Hello,

I am building Open MPI on some ARM systems and spent some time tracking down why it did not build with ACFL (Arm Compiler for Linux, i.e. armclang). It turns out that out-of-source-tree builds fail with this particular compiler family.

I tried on both x86_64 and aarch64 systems (see above for processor details), with the following compiler suites:

  • GCC 13.1 (both archs)
  • LLVM 16.0.4 (both archs)
  • ACFL 22.1 & 23.04.1 (aarch64 only)
  • AOCC 4.0 (x86_64 only)

I tried both in-source-tree and out-of-source-tree builds on all of these configurations. Only the ACFL out-of-source-tree build fails. The FAQ says this should work and does not mention any restriction for ACFL: https://www.open-mpi.org/faq/?category=building#vpath-parallel-build

The error message looks like this:

make[2]: Entering directory '/home_nfs/bmorvana/software/Linux/Ampere_Q8030_1s_80c_1t_edr_256GB_3200/acfl-22.1/openmpi-4.1.4/build_openmpi/ompi/mpi/fortran/mpiext-use-mpi'
  PPFC     mpi-ext-module.lo
F90-F-0906-Can't find include file /home_nfs/bmorvana/software/Linux/Ampere_Q8030_1s_80c_1t_edr_256GB_3200/acfl-22.1/openmpi-4.1.4/openmpi-4.1.4/ompi/mpiext/pcollreq/mpif-h/mpiext_pcollreq_mpifh.h (mpi-ext-module.F90: 29)
F90/aarch64 Linux FlangArm F90  - 1.5 2017-05-01: compilation aborted
make[2]: *** [Makefile:1802: mpi-ext-module.lo] Error 1
make[2]: Leaving directory '/home_nfs/bmorvana/software/Linux/Ampere_Q8030_1s_80c_1t_edr_256GB_3200/acfl-22.1/openmpi-4.1.4/build_openmpi/ompi/mpi/fortran/mpiext-use-mpi'

Below is a script to reproduce the problem. Please edit the large 'Load Compiler' case statement to match your own compiler setup, and change BUILD_TYPE or OPENMPI_VERSION to test different situations.

#!/usr/bin/env bash
set -e -u -o pipefail
# Quote the expansions so a script path containing spaces still resolves
SCRIPT_DIR=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")

########################
### Parameters
########################

# gcc acfl-22.1 acfl-23.04.1
COMPILER=${1:-acfl-22.1}

# IN_SOURCE_TREE OUT_OF_SOURCE_TREE 
BUILD_TYPE=OUT_OF_SOURCE_TREE

# 4.1.4 4.1.5
OPENMPI_VERSION=4.1.4

########################
### Load Compiler
########################

echo " -- Load compiler"
case $COMPILER in
    acfl-22.1)
        echo " -- Loading ARM Compiler 22.1"
        ACFL_ROOT="/home_nfs/bmorvana/software/Linux/aarch64/default/acfl-22.1/prefix"
        module use ${ACFL_ROOT}/modulefiles
        module load acfl/22.1
        
        export CC=armclang
        export CXX=armclang++
        export FC=armflang
        ;;
    acfl-23.04.1)
        echo " -- Loading ARM Compiler 23.04.1"
        ACFL_ROOT="/home_nfs/bmorvana/software/Linux/aarch64/default/acfl-23.04.1/prefix"
        module use ${ACFL_ROOT}/modulefiles
        module load acfl/23.04.1

        export CC=armclang
        export CXX=armclang++
        export FC=armflang
        ;;
    gcc)
        echo " -- Loading GCC 13.1.0"
        GCC_ROOT="/home_nfs/bmorvana/software/Linux/aarch64/default/gcc-13.1.0/prefix"
        export PATH=${GCC_ROOT}/bin:${PATH:-}
        export LD_LIBRARY_PATH=${GCC_ROOT}/lib:${GCC_ROOT}/lib64:${LD_LIBRARY_PATH:-}

        export CC=gcc
        export CXX=g++
        export FC=gfortran
        ;;
    *) echo "Error unsupported compiler '$COMPILER'" && exit 1 ;;
esac

########################
### Prepare
########################

DEST=${SCRIPT_DIR}/${COMPILER/:/-}/openmpi-${OPENMPI_VERSION}/
[ ! -d ${DEST} ] && mkdir -p ${DEST}
DIR=$(readlink -f ${DEST})

PREFIX=${PREFIX:-${DIR}/prefix}

########################
### Fetch
########################

echo " -- Fetch & extract OpenMPI $OPENMPI_VERSION"

OPENMPI_FOLDER=openmpi-${OPENMPI_VERSION}
OPENMPI_ARCHIVE=${OPENMPI_FOLDER}.tar.bz2
OPENMPI_URL=https://download.open-mpi.org/release/open-mpi/v${OPENMPI_VERSION:0:3}/${OPENMPI_ARCHIVE}

if [ ! -d ${DIR}/${OPENMPI_FOLDER} ]; then
    # [ ! -f ${DIR}/${OPENMPI_ARCHIVE} ] && cp ~/Downloads/${OPENMPI_ARCHIVE} ${DIR}/${OPENMPI_ARCHIVE}
    [ ! -f ${DIR}/${OPENMPI_ARCHIVE} ] && wget -c -q -O ${DIR}/${OPENMPI_ARCHIVE} ${OPENMPI_URL}
    (cd ${DIR} && tar xf ${DIR}/${OPENMPI_ARCHIVE})
fi

########################
### Build
########################

case $BUILD_TYPE in
    IN_SOURCE_TREE)
        BUILD_DIR=${DIR}/${OPENMPI_FOLDER}
        ;;
    OUT_OF_SOURCE_TREE)
        BUILD_DIR=${DIR}/build_openmpi
        mkdir -p $BUILD_DIR
        ;;
    *) echo "Error: unsupported build type '$BUILD_TYPE'" && exit 1 ;;
esac

if [ ! -f ${PREFIX}/bin/ompi_info ]; then
    ##
    ## Configure & build
    ##
    [ ! -f ${BUILD_DIR}/Makefile ] && \
        (cd ${BUILD_DIR}/ && \
            ${DIR}/${OPENMPI_FOLDER}/configure \
                --prefix=${PREFIX} \
        )
    (cd ${BUILD_DIR} && make -j $(nproc))
    (cd ${BUILD_DIR} && make -j $(nproc) install)
else
    echo "Skip OpenMPI"
fi

Best.

@jsquyres
Member

jsquyres commented Jun 6, 2023

I don't know if we've ever tried building with armclang, particularly for Fortran. So it's possible that something went wrong here.

Can you invoke make V=1 so that we can see the exact Fortran compilation line that was invoked that generated the error? It's possible that a command line flag is missing, or somesuch. It couldn't find mpiext_pcollreq_mpifh.h, which typically means there's either a missing or incorrect CLI option to the Fortran compiler that identifies where header files live.

Can you also send the stdout/stderr from invoking configure, as well as the resulting config.log?
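
For anyone collecting the same information, here is a minimal sketch of one way to capture the configure and verbose make output into files. The helper name run_logged and the log file names are illustrative assumptions, not part of Open MPI; make V=1 is the verbose switch requested above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a command, showing its stdout+stderr and also saving them to LOG.
# With pipefail set, a failure of the command is not masked by tee.
run_logged() {
    local log=$1
    shift
    "$@" 2>&1 | tee "${log}"
}

# Example, run from the build directory (paths and file names are placeholders):
#   run_logged configure.out ../openmpi-4.1.4/configure --prefix="${PREFIX}"
#   run_logged make.out make V=1
```

The resulting files, together with config.log from the build directory, are what is being asked for here.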

@antoine-morvan
Author

Sure, here are the 2 logs (from a build on the Q8030 node with acfl 23.04.1)
01_config.log
02_make.log
Let me know if you need a make log from a clean folder.

@jsquyres
Member

jsquyres commented Jun 6, 2023

Can you send the contents of the ompi/mpi/fortran/mpiext-use-mpi/mpi-ext-module.F90 file from your build tree?

It should have 2 #include directives in it -- the first one is for mpiext_pcollreq_mpifh.h. It should have a valid path pointing to that file -- either absolute or relative. Does that file exist?

If it does exist, can you artificially make the path to that file shorter (e.g., even if it's through a new sym link that you create)? I wonder if the preprocessor is barfing if the length of that filename is too long...? Seems like a long shot, but I've seen Fortran compilers do weird things with max line and/or filename lengths.

@ggouaillardet
Contributor

That looks like a compiler bug (!)

armflang fails to compile a file containing #include "/.../header.h", i.e. when an absolute path is used, but it works just fine with a relative path.

In your case, a simple workaround (I did not try it, but you get the idea) is to replace

${DIR}/${OPENMPI_FOLDER}/configure with ../${OPENMPI_FOLDER}/configure

Another, uglier workaround would be to add -I/ to FCFLAGS.

I will have a look at LLVM's flang and report this issue to ARM and/or LLVM folks
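
A minimal sketch of the relative-path workaround, assuming GNU coreutils realpath (for --relative-to) is available. The function names are hypothetical helpers, not part of Open MPI, and this has only been reasoned through, not run against armflang:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the path of SRC_DIR as seen from BUILD_DIR (both must exist).
# Relies on GNU coreutils `realpath --relative-to`.
relative_src_path() {
    local src_dir=$1 build_dir=$2
    realpath --relative-to="${build_dir}" "${src_dir}"
}

# Invoke configure from BUILD_DIR through a relative path instead of an
# absolute one. Any extra arguments are passed through to configure.
configure_with_relative_path() {
    local src_dir=$1 build_dir=$2
    shift 2
    local rel
    rel=$(relative_src_path "${src_dir}" "${build_dir}")
    (cd "${build_dir}" && "${rel}/configure" "$@")
}

# Example with the variables of the reproduction script:
#   configure_with_relative_path "${DIR}/${OPENMPI_FOLDER}" "${BUILD_DIR}" --prefix="${PREFIX}"
```

With the directory layout of the reproduction script, the computed path is ../openmpi-4.1.4, so this is equivalent to calling ../${OPENMPI_FOLDER}/configure by hand. The -I/ variant would instead keep the absolute configure path and add FCFLAGS=-I/ to the configure arguments.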

@antoine-morvan
Author

Can you send the contents of the ompi/mpi/fortran/mpiext-use-mpi/mpi-ext-module.F90 file from your build tree?

It should have 2 #include directives in it -- the first one is for mpiext_pcollreq_mpifh.h. It should have a valid path pointing to that file -- either absolute or relative. Does that file exist?

If it does exist, can you artificially make the path to that file shorter (e.g., even if it's through a new sym link that you create)? I wonder if the preprocessor is barfing if the length of that filename is too long...? Seems like a long shot, but I've seen Fortran compilers do weird things with max line and/or filename lengths.

The content of that file is below. Both included files exist (shown in a screenshot attached to the original comment).

! -*- fortran -*-
! $HEADER$
!
! *** THIS FILE IS AUTOMATICALLY GENERATED!
! *** Any manual edits will be lost!
!
#include "ompi/mpi/fortran/configure-fortran-output.h"

module mpi_ext
!     Even though this is not a useful parameter (cannot be used as a
!     preprocessor catch) define it to keep the linker from complaining
!     during the build.
      integer OMPI_HAVE_MPI_EXT
      parameter (OMPI_HAVE_MPI_EXT=1)
!
!
!     Enabled Extension: affinity
!     No "use mpi" bindings available
!

!
!     Enabled Extension: cuda
!     No "use mpi" bindings available
!

!
!     Enabled Extension: pcollreq
!
#include "/home_nfs/bmorvana/test/openmpi_acfl_reproduce/acfl-23.04.1/openmpi-4.1.4/openmpi-4.1.4/ompi/mpiext/pcollreq/mpif-h/mpiext_pcollreq_mpifh.h"
#include "/home_nfs/bmorvana/test/openmpi_acfl_reproduce/acfl-23.04.1/openmpi-4.1.4/openmpi-4.1.4/ompi/mpiext/pcollreq/use-mpi/mpiext_pcollreq_usempi.h"

!
end module mpi_ext

@antoine-morvan
Author

That looks like a compiler bug (!)

armflang fails to compile a file containing #include "/.../header.h", i.e. when an absolute path is used, but it works just fine with a relative path.

In your case, a simple workaround (I did not try it, but you get the idea) is to replace

${DIR}/${OPENMPI_FOLDER}/configure with ../${OPENMPI_FOLDER}/configure

Another, uglier workaround would be to add -I/ to FCFLAGS.

I will have a look at LLVM's flang and report this issue to ARM and/or LLVM folks

I confirm that the workaround using ../${OPENMPI_FOLDER}/configure leads to a successful build.

@jsquyres
Member

jsquyres commented Jun 7, 2023

That looks like a compiler bug (!)

@ggouaillardet Are you saying you tested yourself, and armflang is failing when you:

#include "/foo/bar.h"

but armflang succeeds with:

#include "../../foo/bar.h"

?

If so, yes, that sure feels like a compiler/preprocessor bug.

One more thing to test, though, would be to check if the filename length has anything to do with this. E.g.:

! Does this fail?
#include "/some/really/long/path/to/a/really/deep/subdirectory/in/the/wastelands/of/a/gigantic/filesystem/on/your/disk/foo.h"
! Does this succeed?
#include "/foo/bar.h"

@antoine-morvan A simple workaround for you might be to use a relative path to invoke configure instead of an absolute path.
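
The absolute-versus-relative comparison described above can be sketched as a small standalone reproducer. This is an illustrative sketch only: the function names, file names, and header content are made up for the example, and the compile step assumes the compiler named in FC (defaulting to armflang) is on the PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a header plus two preprocessed Fortran sources into WORK_DIR:
#   abs_include.F90 includes the header through its absolute path,
#   rel_include.F90 includes the same header through a relative path.
write_include_repro() {
    mkdir -p "$1/inc"
    local work_dir kind path
    work_dir=$(readlink -f "$1")
    echo "      integer, parameter :: repro_value = 1" > "${work_dir}/inc/repro.h"
    for kind in abs rel; do
        if [ "${kind}" = abs ]; then
            path="${work_dir}/inc/repro.h"
        else
            path="inc/repro.h"
        fi
        {
            echo "module repro_${kind}"
            echo "#include \"${path}\""
            echo "end module repro_${kind}"
        } > "${work_dir}/${kind}_include.F90"
    done
}

# Compile both files with FC (default: armflang) and report each result.
compile_include_repro() {
    local work_dir=$1 fc=${FC:-armflang} kind
    for kind in abs rel; do
        if (cd "${work_dir}" && "${fc}" -c "${kind}_include.F90"); then
            echo "${kind}: compiled"
        else
            echo "${kind}: FAILED"
        fi
    done
}

# Example:
#   write_include_repro /tmp/include_repro
#   FC=armflang compile_include_repro /tmp/include_repro
```

To probe the path-length question separately, WORK_DIR can be chosen once as a very short path and once as a deeply nested one, which changes only the length of the absolute #include.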

@ggouaillardet
Contributor

@jsquyres yes, I did reproduce the problem exactly as you described. Path length is not an issue here.

FWIW, under strace, I can see the builtin (?) preprocessor append /foo/bar.h to the include directories (!)
So unless this preprocessor intentionally does not support absolute paths, I agree this is likely a bug, and I will report it to ARM.
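
A rough sketch of how such an strace check could be done, for readers who want to repeat the observation. The helper names and the file name repro.F90 are hypothetical, and which syscalls show up depends on the compiler, so this only lists whatever quoted paths strace recorded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Trace the file-related syscalls of a command (and its children) into
# TRACE_LOG. The command may fail; the trace is still written.
trace_file_access() {
    local trace_log=$1
    shift
    strace -f -e trace=file -o "${trace_log}" "$@" || true
}

# List the distinct quoted paths in TRACE_LOG that end with NAME.
# NAME is used as a grep regular expression, so escape dots if needed.
paths_tried_for() {
    local trace_log=$1 name=$2
    # grep exits 1 when nothing matches; treat that as an empty result.
    { grep -o "\"[^\"]*${name}\"" "${trace_log}" || true; } | sort -u
}

# Example (repro.F90 is a placeholder file with an absolute #include):
#   trace_file_access trace.log armflang -c repro.F90
#   paths_tried_for trace.log 'bar\.h'
```

If the preprocessor really does prepend include directories to the absolute path, the listed paths should show the header name behind an unexpected directory prefix.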

@jsquyres
Member

jsquyres commented Jun 7, 2023

... I agree this is likely a bug and I will report this to ARM.

Thanks @ggouaillardet.

I'm therefore going to close this issue, since it appears to be an armflang bug.

@jsquyres jsquyres closed this as completed Jun 7, 2023