Crash in PAHM SCHISM integration compiled with GNU on Docker #8

Closed
SorooshMani-NOAA opened this issue Jun 14, 2022 · 13 comments

SorooshMani-NOAA commented Jun 14, 2022

When I try to run SCHISM with parametric wind forcing, I get the following error during initialization:

(gdb) where
#0  0x00007efc640efcb0 in ?? ()
#1  <signal handler called>
#2  __memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:413
#3  0x00007efc66f9231b in _gfortran_compare_string () from /lib/x86_64-linux-gnu/libgfortran.so.5
#4  0x00005617db48e6d1 in sortutils::stringlexcomp (_str2=<optimized out>, _str1=<optimized out>,
    msensitive=.TRUE., str2=..., str1=...)
    at /home/ondemand-user/app/schism/src/Core/PaHM/SortUtils.F90:1460
#5  sortutils::indexxstring (arr1d=<incomplete type>, idx1d=..., status=0, casesens=.TRUE.,
    _arr1d=_arr1d@entry=10) at /home/ondemand-user/app/schism/src/Core/PaHM/SortUtils.F90:522
#6  0x00005617db48395d in parwind::readcsvbesttrackfile ()
    at /home/ondemand-user/app/schism/src/Core/PaHM/parwind.F90:389
#7  0x00005617db3867a4 in schism_init (iorder=0, indir=..., iths=0, ntime=16848,
    _indir=_indir@entry=2) at /home/ondemand-user/app/schism/src/Hydro/schism_init.F90:7096
#8  0x00005617db33ae2b in schism_init0 (ntime=16848, iths=0)
    at /home/ondemand-user/app/schism/src/Driver/schism_driver.F90:145
#9  schism_init0 (ntime=16848, iths=0)
    at /home/ondemand-user/app/schism/src/Driver/schism_driver.F90:138
#10 schism_main () at /home/ondemand-user/app/schism/src/Driver/schism_driver.F90:129
#11 0x00005617db33af04 in schism_driver ()
    at /home/ondemand-user/app/schism/src/Driver/schism_driver.F90:112
#12 0x00005617db33ac5f in main (argc=argc@entry=2, argv=0x7fffa747b08c)
    at /home/ondemand-user/app/schism/src/Driver/schism_driver.F90:77
#13 0x00007efc66a26fd0 in __libc_start_call_main (main=main@entry=0x5617db33ac40 <main>,
    argc=argc@entry=2, argv=argv@entry=0x7fffa7479398) at ../sysdeps/nptl/libc_start_call_main.h:58
#14 0x00007efc66a2707d in __libc_start_main_impl (main=0x5617db33ac40 <main>, argc=2,
    argv=0x7fffa7479398, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffa7479388) at ../csu/libc-start.c:409
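
For reference, a backtrace like the one above can also be captured non-interactively. The sketch below is an assumption about the workflow, not necessarily how this trace was produced: it prints a gdb command line that runs the program in batch mode and dumps the stack when the process stops. The helper name `gdb_bt_cmd` is hypothetical; the gdb options used (`-batch`, `-ex`, `--args`) are standard.

```shell
# Hypothetical helper: print a gdb command line that runs a program until it
# stops (e.g. on SIGSEGV) and then prints the backtrace, like `run` + `where`.
gdb_bt_cmd() {
    # $1 = executable; remaining arguments are passed to the program unchanged
    _exe=$1
    shift
    printf 'gdb -batch -ex run -ex bt --args %s' "$_exe"
    # Append each program argument, separated by a space
    for _arg in "$@"; do
        printf ' %s' "$_arg"
    done
    printf '\n'
}

# Example: the executable and argument used in this report
gdb_bt_cmd pschism_PAHM_TVD-VL 4
```

To debug under MPI, the printed command can be prefixed with `mpirun -np 1` so that a single rank runs under gdb.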

I run into this issue when SCHISM is compiled with the GNU compilers. Here is the Dockerfile I use:

FROM ubuntu:impish
 
# Create a non-root user
ARG username=ondemand-user
ARG uid=1000
ARG gid=100
ARG ioprefix=/app/io
ENV USER $username
ENV UID $uid
ENV GID $gid
ENV HOME /home/$USER

# Get necessary packages
RUN apt-get update && apt-get upgrade -y && apt-get install -y \
   git \
   gcc \
   g++ \
   gfortran \
   make \
   cmake \
   openmpi-bin libopenmpi-dev \
   libhdf5-dev \
   libnetcdf-dev libnetcdf-mpi-dev libnetcdff-dev \
   python
 
 # New user
 RUN adduser --disabled-password --gecos "Non-root user" --uid $UID --home $HOME $USER
 
 # Create a project directory inside user home
 ENV PROJECT_DIR $HOME/app
 RUN mkdir -p $PROJECT_DIR
 WORKDIR $PROJECT_DIR
 
ARG SCHISM_COMMIT=73fe54b

RUN git clone https://github.com/schism-dev/schism.git && \
   git -C schism checkout $SCHISM_COMMIT && \
   mkdir -p schism/build && \
   PREV_PWD=$PWD && \
   cd schism/build && \
   cmake ../src/ \
   -DNetCDF_Fortran_LIBRARY=$(nc-config --prefix)/lib/x86_64-linux-gnu/libnetcdff.so \
   -DNetCDF_C_LIBRARY=$(nc-config --prefix)/lib/x86_64-linux-gnu/libnetcdf.so \
   -DNetCDF_INCLUDE_DIR=$(nc-config --prefix)/include \
   -DUSE_PAHM=TRUE \
   -DCMAKE_Fortran_FLAGS_RELEASE="-O2 -ffree-line-length-none -fallow-argument-mismatch" \
   -DCMAKE_BUILD_TYPE="RelWithDebInfo" && \
   make -j8 && \
   mv bin/* -t /usr/bin/ && \
   cd ${PREV_PWD} && \
   rm -rf schism
 
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y tzdata
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y expect
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y vim

# Helper scripts
RUN chown -R $UID:$GID $PROJECT_DIR

# Volume mount points
RUN mkdir -p $ioprefix/output
RUN mkdir -p $ioprefix/input

USER $USER

ENTRYPOINT [ "bash" ]

and I run it using

mpirun -np 8 pschism_PAHM_TVD-VL 4

from inside the container
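
One detail in the Dockerfile above that may be worth checking: CMake applies `CMAKE_Fortran_FLAGS_<CONFIG>` only to the matching build type, so flags placed in `CMAKE_Fortran_FLAGS_RELEASE` are not used by a `RelWithDebInfo` build unless the project's own CMake files copy them over (whether SCHISM's do so is not verified here). The sketch below derives the per-configuration variable name from the build type. The helper name `fortran_flags_var` and the flag string are illustrative assumptions.

```shell
# Hypothetical helper: map a CMake build type to the name of its
# per-configuration Fortran flags variable.
fortran_flags_var() {
    # CMake uses the upper-cased configuration name as the variable suffix
    _cfg=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf 'CMAKE_Fortran_FLAGS_%s\n' "$_cfg"
}

build_type=RelWithDebInfo
# Illustrative flags; -g is included because setting the variable replaces
# CMake's default flags for that configuration
flags="-O2 -g -ffree-line-length-none -fallow-argument-mismatch"

# Print cmake options that keep the build type and the flags variable consistent
printf '%s\n' "-DCMAKE_BUILD_TYPE=$build_type"
printf '%s\n' "-D$(fortran_flags_var "$build_type")=\"$flags\""
```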

Note that to start the container with the SCHISM inputs available, you need to add the SYS_PTRACE capability and bind-mount the input directory:

docker run --rm -it -v/path/to/schism/input:/home/ondemand-user/app/io/input_dir --cap-add SYS_PTRACE --entrypoint bash MY_CONTAINER_ID

SorooshMani-NOAA added the bug label Jun 14, 2022

@SorooshMani-NOAA
Contributor Author

@pvelissariou1, when you have the time, let's talk about this issue. I'm not sure whether it is related to how I set up my Dockerfile, to the compiler version, or to something else. It used to work when your code was first added to SCHISM. I recently rebuilt the Docker image I was using, and I keep running into this issue.

@SorooshMani-NOAA
Contributor Author

Note that if I try this with the intel/2021.3.0 compilers, I do not see this issue.

@SorooshMani-NOAA
Contributor Author

@josephzhang8 should I create a parallel issue in the schism repo for this?

@josephzhang8

josephzhang8 commented Jun 14, 2022 via email

@SorooshMani-NOAA
Contributor Author

@pvelissariou1 as I mentioned in the email, I was unsuccessful running SCHISM-PAHM with either of the compilers in panvelissariou1/coastalapp:devel. My Dockerfile for compilation is:

FROM panvelissariou1/coastalapp:devel

# Create a project directory inside user home
ENV PROJECT_DIR /app
RUN mkdir -p $PROJECT_DIR
WORKDIR $PROJECT_DIR

SHELL ["/bin/bash", "-l", "-c"]

RUN pyvenv-3.6 /usr/local

RUN git clone https://github.com/schism-dev/schism.git && \
    git -C schism checkout e2b943a && \
    mkdir -p schism/build && \
    module load gnu/9.2.1 netcdf openmpi && \
    export PARMETIS_DIR=$PWD/schism/src/ParMetis-4.0.3/ && \
    PREV_PWD=$PWD && \
    cd ${PARMETIS_DIR} && \
    make && \
    cd ${PREV_PWD} && \
    PREV_PWD=$PWD && \
    cd schism/build && \
    cmake ../src/ \
        -DCMAKE_Fortran_COMPILER=mpifort \
        -DCMAKE_C_COMPILER=mpicc \
        -DNetCDF_Fortran_LIBRARY=$(nc-config --prefix)/lib/libnetcdff.so \
        -DNetCDF_C_LIBRARY=$(nc-config --prefix)/lib/libnetcdf.so \
        -DNetCDF_INCLUDE_DIR=$(nc-config --prefix)/include \
        -DUSE_PAHM=TRUE \
        -DCMAKE_Fortran_FLAGS_RELEASE="-O2 -ffree-line-length-none" && \
    make -j8 && \
    mv bin/* -t /usr/bin/ && \
    cd ${PREV_PWD} && \
    rm -rf schism

I built this image, opened a shell in the container, loaded the same modules as above, and ran mpirun -np 8 --allow-run-as-root pschism_PAHM_TVD-VL 4. Note that to run the container I also added the SYS_PTRACE capability and mounted the directory where my SCHISM setup resides. I will share the Docker build log as well (it includes the cmake log).

@SorooshMani-NOAA
Contributor Author

The segfault I get is at:

Thread 1 "pschism_PAHM_TV" received signal SIGSEGV, Segmentation fault.
0x00007f8e34e3e68a in __memcmp_avx2_movbe () from /lib64/libc.so.6
(gdb) where
#0  0x00007f8e34e3e68a in __memcmp_avx2_movbe () from /lib64/libc.so.6
#1  0x00007f8e369d5adf in _gfortran_compare_string ()
   from /lib64/libgfortran.so.5
#2  0x0000000000551656 in __sortutils_MOD_stringlexcomp ()
#3  0x00000000005547c3 in __sortutils_MOD_indexxstring ()
#4  0x0000000000549819 in __parwind_MOD_readcsvbesttrackfile ()
#5  0x000000000044ea83 in schism_init_ ()
#6  0x0000000000404df5 in schism_main_ ()
#7  0x0000000000404ee4 in MAIN__ ()
#8  0x0000000000404c3d in main ()
(gdb)

This is the same crash location as in the first backtrace. The build log is attached:
schism_pahm_build.log
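
The second backtrace carries no file or line information, which is consistent with a build that lacks debug symbols. As a hedged suggestion rather than something tried in this thread, a debug build with gfortran's runtime checks may help localize the fault around `stringlexcomp`. The helper name `with_debug_fflags` and the base flags are illustrative; `-g`, `-fcheck=bounds`, and `-fbacktrace` are standard gfortran options.

```shell
# Hypothetical helper: append gfortran debugging options to a base flag string.
with_debug_fflags() {
    # -g             : emit debug symbols so gdb can show file and line
    # -fcheck=bounds : add runtime array bounds checks
    # -fbacktrace    : print a backtrace on fatal runtime errors
    printf '%s -g -fcheck=bounds -fbacktrace\n' "$1"
}

# Example: flags for an unoptimized debug build
with_debug_fflags "-O0 -ffree-line-length-none"
```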

@pvelissariou1
Collaborator

pvelissariou1 commented Jul 12, 2022 via email

@josephzhang8

josephzhang8 commented Jul 12, 2022 via email

@pvelissariou1
Collaborator

pvelissariou1 commented Jul 12, 2022 via email

@josephzhang8

josephzhang8 commented Jul 12, 2022 via email

@platipodium

There are a few argument/rank mismatches (see the attached files) that newer versions of GFortran treat as errors (unless the flag -fallow-argument-mismatch is passed to the compiler).

These cannot be avoided in our code because the argument mismatches are raised by the MPI_ calls, which accept arguments of varying type. So unless this is fixed upstream (in the MPI code), we have to use the -fallow-argument-mismatch flag with GCC >= 10.
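
Because `-fallow-argument-mismatch` was introduced in GCC 10 and older gfortran releases do not recognize it, a build script may want to add it conditionally. The following is a minimal sketch under that assumption; the helper name `extra_fflags_for_gcc` and the base flags are illustrative, not taken from the SCHISM build system.

```shell
# Hypothetical helper: print the extra gfortran flag needed for a given
# GCC major version (prints nothing for versions older than 10).
extra_fflags_for_gcc() {
    if [ "$1" -ge 10 ]; then
        printf '%s\n' "-fallow-argument-mismatch"
    fi
}

# Usage, guarded so the snippet also runs where gfortran is not installed
if command -v gfortran >/dev/null 2>&1; then
    # -dumpversion prints e.g. "11" or "9.2.1"; keep only the major number
    major=$(gfortran -dumpversion | cut -d. -f1)
    fflags="-O2 -ffree-line-length-none $(extra_fflags_for_gcc "$major")"
    printf 'Fortran flags: %s\n' "$fflags"
fi
```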

@pvelissariou1
Collaborator

pvelissariou1 commented Jul 14, 2022 via email

@pvelissariou1
Collaborator

Issue closed: the SCHISM/PAHM bugs are fixed, and it compiles fine now.


4 participants