Error with Fujitsu clang mode on Fugaku #661

Closed

tj9726 opened this issue Oct 29, 2023 · 5 comments
Labels: installation (compilation, installation)

tj9726 commented Oct 29, 2023

I am having a problem running Smilei on Fugaku with the Fujitsu compiler in clang mode.
I am using the latest version of Smilei (4.8) and the latest version of the Fujitsu compiler (4.10.0).
The results were the same with some older versions of the Fujitsu compiler.

The compile job script and the environment I use are the following.
Job script:

#!/bin/sh
#PJM -N "compile"
#PJM -L "node=1"
#PJM -L "rscgrp=small"
#PJM -L "elapse=1:00:00"
#PJM -x PJM_LLIO_GFSCACHE=/vol0004

source ~/env/smilei_env

make -j 48 config=no_mpi_tm machine=fugaku_fujitsu_cm

smilei_env:

. /vol0004/apps/oss/spack/share/spack/setup-env.sh

spack load /7q66snj # python
spack load /aqtfct2 # hdf5
spack load /23rofh6 # py-numpy

export SMILEICXX=mpiFCC
export HDF5_ROOT=/vol0004/apps/oss/spack-v0.19/opt/spack/linux-rhel8-a64fx/fj-4.8.1/hdf5-1.12.2-aqtfct2uhggoct2rp6am4iw6adumk5vt/
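
A small sanity check of this environment (a sketch; it assumes the /hash arguments above still match installed packages and that h5c++ comes from the loaded hdf5):

```bash
# The hashes used with `spack load` must match an installed package
# (they change whenever Fugaku's Spack tree is updated)
spack find -xl python
spack find -xl hdf5

# h5c++ should point inside the HDF5_ROOT exported above
which h5c++
```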

Smilei compiles; however, a simple test problem with a uniform plasma crashes immediately after "Time-Loop started:".
The output files indicate that the initialization is complete.
I did not find any informative error messages.

I have tried trad mode with
make -j 4 config=no_mpi_tm machine=fugaku_fujitsu_tm
(make -j 48 fails with Make Error 9)
This compiles and runs without any errors.

I also tried the GCC compiler on Fugaku, which also works (even with -j 48).

Would you help me with the clang mode?

mccoys commented Oct 29, 2023

Hi. You say there is no informative error message, but something did appear, right? Did it not say what kind of error it was?

tj9726 commented Oct 29, 2023

Hi. Sorry, I missed some of the output files.
The standard output I get is:

                    _            _
  ___           _  | |        _  \ \   Version : 4.8-1-g0293ceb2b-master
 / __|  _ __   (_) | |  ___  (_)  | |   
 \__ \ | '  \   _  | | / -_)  _   | |
 |___/ |_|_|_| |_| |_| \___| |_|  | |  
                                 /_/    
 
 

 Reading the simulation parameters
 --------------------------------------------------------------------------------
 HDF5 version 1.12.2
 Python version 3.10.8
	 Parsing pyinit.py
	 Parsing 4.8-1-g0293ceb2b-master
	 Parsing pyprofiles.py
	 Parsing test.py
	 Parsing pycontrol.py
	 Check for function preprocess()
	 python preprocess function does not exist
	 Calling python _smilei_check
	 Calling python _prepare_checkpoint_dir
	 Calling python _keep_python_running() :
CAREFUL: Patches distribution: hilbertian
 

 Geometry: 2Dcartesian
 --------------------------------------------------------------------------------
	 Interpolation order : 2
	 Maxwell solver : Yee
	 simulation duration = 100.000000,   total number of iterations = 2000
	 timestep = 0.050000 = 0.707107 x CFL,   time resolution = 20.000000
	 Grid length: 40, 40
	 Cell length: 0.1, 0.1, 0
	 Number of cells: 400, 400
	 Spatial resolution: 10, 10
 

 Electromagnetic boundary conditions
 --------------------------------------------------------------------------------
	 xmin periodic
	 xmax periodic
	 ymin periodic
	 ymax periodic
 

 Vectorization: 
 --------------------------------------------------------------------------------
	 Mode: off
	 Calling python writeInfo
 

 Initializing MPI
 --------------------------------------------------------------------------------
	 applied topology for periodic BCs in x-direction
	 applied topology for periodic BCs in y-direction
	 MPI_THREAD_MULTIPLE not enabled
	 Number of MPI processes: 4
	 Number of threads per MPI process : 12
	 OpenMP task parallelization not activated
 
	 Number of patches: 16 x 16
	 Number of cells in one patch: 25 x 25
	 Dynamic load balancing: never
 

 Initializing the restart environment
 --------------------------------------------------------------------------------
 
 
 

 Initializing species
 --------------------------------------------------------------------------------
	 
	 Creating Species #0: electron
		 > Pusher: boris
		 > Boundary conditions: periodic periodic periodic periodic
		 > Density profile: 2D built-in profile `constant` (value: 1.000000)
	 
	 Creating Species #1: positron
		 > Pusher: boris
		 > Boundary conditions: periodic periodic periodic periodic
		 > Density profile: 2D built-in profile `constant` (value: 1.000000)
 

 Initializing Patches
 --------------------------------------------------------------------------------
	 First patch created
		 Approximately 10% of patches created
		 Approximately 20% of patches created
		 Approximately 30% of patches created
		 Approximately 40% of patches created
		 Approximately 50% of patches created
		 Approximately 60% of patches created
		 Approximately 70% of patches created
		 Approximately 80% of patches created
		 Approximately 90% of patches created
	 All patches created
 

 Creating Diagnostics, antennas, and external fields
 --------------------------------------------------------------------------------
 

 finalize MPI
 --------------------------------------------------------------------------------
	 Done creating diagnostics, antennas, and external fields
 

 Minimum memory consumption (does not include all temporary buffers)
 --------------------------------------------------------------------------------
              Particles: Master 976 MB;   Max 976 MB;   Global 3.81 GB
                 Fields: Master 5 MB;   Max 5 MB;   Global 0.0223 GB
            scalars.txt: Master 0 MB;   Max 0 MB;   Global 0 GB
 

 Initial fields setup
 --------------------------------------------------------------------------------
	 Solving Poisson at time t = 0
 

 Initializing E field through Poisson solver
 --------------------------------------------------------------------------------
	 Poisson solver converged at iteration: 0, relative err is ctrl = 0.000000 x 1e-14
	 Poisson equation solved. Maximum err = 0.000000 at i= -1
 Time in Poisson : 0.009117
	 Applying external fields at time t = 0
	 Applying prescribed fields at time t = 0
	 Applying antennas at time t = 0
 

 Open files & initialize diagnostics
 --------------------------------------------------------------------------------
 

 Running diags at time t = 0
 --------------------------------------------------------------------------------
 

 Species creation summary
 --------------------------------------------------------------------------------
		 Species 0 (electron) created with 40960000 particles
		 Species 1 (positron) created with 40960000 particles
 

 Expected disk usage (approximate)
 --------------------------------------------------------------------------------
	 WARNING: disk usage by non-uniform particles maybe strongly underestimated,
	    especially when particles are created at runtime (ionization, pair generation, etc.)
	 
	 Expected disk usage for diagnostics:
		 File scalars.txt: 8.98 K
	 Total disk usage for diagnostics: 8.98 K
	 
 

 Keeping or closing the python runtime environment
 --------------------------------------------------------------------------------
	 Checking for cleanup() function:
	 python cleanup function does not exist
	 Closing Python
 

 Time-Loop started: number of time-steps n_time = 2000
 --------------------------------------------------------------------------------
CAREFUL: The following `push time` assumes a global number of 48 cores (hyperthreading is unknown)
    timestep       sim time   cpu time [s]   (    diff [s] )   push time [ns]

This is why I thought the initialization was complete but the run crashed immediately after entering the main loop.
The standard outputs are:

Stack trace (most recent call last):
#11   Object "smilei", at 0x46d233, in 
#10   Object "/lib64/libc.so.6", at 0x400006844383, in __libc_start_main
#9    Object "smilei", at 0x93805b, in main
#8    Object "/opt/FJSVxtclanga/tcsds-ssl2-latest/lib64/libfjomp.so", at 0x4000041ab073, in __kmpc_fork_call
#7    Object "/opt/FJSVxtclanga/tcsds-ssl2-latest/lib64/libfjomp.so", at 0x4000041b85cb, in __kmp_fork_call
#6    Object "/opt/FJSVxtclanga/tcsds-ssl2-latest/lib64/libfjomp.so", at 0x4000041b7603, in 
#5    Object "/opt/FJSVxtclanga/tcsds-ssl2-latest/lib64/libfjomp.so", at 0x4000042145ff, in __kmp_invoke_microtask
#4    Object "smilei", at 0x93a343, in 
#3    Object "smilei", at 0x7ed4d7, in VectorPatch::dynamics(Params&, SmileiMPI*, SimWindow*, RadiationTables&, MultiphotonBreitWheelerTables&, double, Timers&, int)
#2    Object "smilei", at 0x7ed877, in VectorPatch::dynamicsWithoutTasks(Params&, SmileiMPI*, SimWindow*, RadiationTables&, MultiphotonBreitWheelerTables&, double, Timers&, int)
#1    Object "smilei", at 0x97876b, in Species::dynamics(double, unsigned int, ElectroMagn*, Params&, bool, PartWalls*, Patch*, SmileiMPI*, RadiationTables&, MultiphotonBreitWheelerTables&)
#0    Object "smilei", at 0x91cca4, in PusherBoris::operator()(Particles&, SmileiMPI*, int, int, int, int)
Segmentation fault (Address not mapped to object [(nil)])

for one process, and only the line `Stack trace (most recent call last):` for the others (these were the only messages I had seen before).
The system output is: [WARN] PLE 0610 plexec The process terminated with the signal.(rank=1)(nid=0x03010004)(sig=11)
I have tried multiple times and the crash always happens at the same point.
And again, there were no problems with the Fujitsu trad mode or with GCC.
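
A minimal sketch of a possible next step, assuming Smilei's usual `debug` config keyword can be combined with `no_mpi_tm` for this machine file, would be a debug rebuild so the stack trace shows file and line numbers:

```bash
# Debug rebuild in clang mode (assumption: the standard Smilei "debug" config
# keyword works here; it lowers optimization, so only use it for this small test)
make clean
make -j 48 config="no_mpi_tm debug" machine=fugaku_fujitsu_cm
```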

xxirii commented Nov 7, 2023

Hello,

I worked on adapting Smilei to Fugaku a few years ago. Unfortunately, I can't access the system anymore.

Is compiling in clang mode really important for you, since you already have two working solutions?

I can suggest compiling with the armclang compiler instead of the Fujitsu compiler; it used to give us decent performance on A64FX. The Fujitsu compiler was behind at the time, but it may have caught up since then. This would be a third alternative.

Unfortunately, I can't do much more, especially without more error output. Perhaps you can ask the system support.

tj9726 commented Nov 8, 2023

Hi,

I was checking the performance of different compilers for future large-scale simulations.
I read your paper and thought that clang mode might perform better than trad mode.

Did you need anything special to compile when you tested a few years ago?
If you did not, then newer versions of the compiler could be the reason.

I will proceed with what is available (or ask Fujitsu developers).

Thank you for your advice.

xxirii commented Nov 20, 2023

Sorry for my late reply. Please find below the configuration I used for my tests:

Spack Env


Example of configuration for ssh:
```bash
Host fugaku
   Hostname login.fugaku.r-ccs.riken.jp
   ForwardX11 yes
   ForwardAgent yes
   User <your login>
   ServerAliveInterval 60
   Compression yes
```

II. Environment

The login nodes are Intel processors. You should either use cross compilation or compile on a compute node.
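
As a minimal sketch of that choice (assumption: the `px`-suffixed wrappers such as `mpiFCCpx` are the cross compilers used from the login nodes, while the plain `mpiFCC` used in the report above runs natively on the compute nodes):

```bash
# Cross compilation from an Intel login node (assumption: *px wrappers target A64FX)
export SMILEICXX=mpiFCCpx

# Native compilation inside a compute-node job, as in the report above
# export SMILEICXX=mpiFCC
```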

On compute nodes

The Fugaku supercomputer relies on Spack for its most advanced libraries and tools.
You first have to source the Spack environment:

. /vol0004/apps/oss/spack/share/spack/setup-env.sh

You can check available libraries by doing:

spack find -xl <lib name>

Check for the latest library versions from time to time, because the Spack installation is regularly updated.
For instance:

spack find -xl python
spack find -xl hdf5

The environment file then looks like this:

. /vol0004/apps/oss/spack/share/spack/setup-env.sh
#Python
spack load /7sz6cn4
#Numpy
spack load /q6rre3p
#HDF5
spack load /l53s4lp

export SMILEICXX=mpiFCCpx
export HDF5_ROOT=/vol0004/apps/oss/spack-v0.16.2/opt/spack/linux-rhel8-a64fx/fj-4.6.1/hdf5-1.10.7-hza6f4rwqjon62z4q7a6vavtrkafvz35/

Use which h5c++ to get the path to the HDF5 library you use.
Save this configuration in a file that you can source in job scripts.
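
For example, a small helper along these lines (assumption: `h5c++` sits in `$HDF5_ROOT/bin`, which is the usual layout of a Spack-installed HDF5):

```bash
# Derive HDF5_ROOT from the h5c++ wrapper of the currently loaded hdf5 package
export HDF5_ROOT=$(dirname "$(dirname "$(which h5c++)")")
echo "HDF5_ROOT = $HDF5_ROOT"
```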

III. Compilation

Trad mode

The trad mode compiles the code with the traditional Fujitsu flags.
In this case, we use the machine file fugaku_fujitsu_tm.

#!/bin/sh -x
#PJM -N  "smilei"
#PJM -L  "node=1"                          # Assign 1 node
#PJM -L  "rscgrp=small"                    # Specify resource group
#PJM -L  "elapse=00:30:00"                 # Elapsed time limit 30 minutes
#PJM -x PJM_LLIO_GFSCACHE=/vol0004
#PJM -s

source ~/env/smilei_env

mpiFCC -show

make -j 48 config="verbose" machine="fugaku_fujitsu_tm"

See this page for more information: https://www.fugaku.r-ccs.riken.jp/doc_root/en/user_guides/lang_latest/FujitsuCompiler/C%2B%2B/tradmode.html

Clang mode

In clang mode, the Fujitsu compiler uses the same flags as the Clang compiler.
The flag -Nclang has to be provided.
In this case, we use the machine file fugaku_fujitsu_cm.

#!/bin/sh -x
#PJM -N  "smilei"
#PJM -L  "node=1"                          # Assign 1 node
#PJM -L  "rscgrp=small"                    # Specify resource group
#PJM -L  "elapse=00:30:00"                 # Elapsed time limit 30 minutes
#PJM -x PJM_LLIO_GFSCACHE=/vol0004
#PJM -s

source ~/env/smilei_env

mpiFCC -show

make -j 48 config="verbose" machine="fugaku_fujitsu_cm"

See this page for more information: https://www.fugaku.r-ccs.riken.jp/doc_root/en/user_guides/lang_latest/FujitsuCompiler/C%2B%2B/clangmode.html
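
A quick, hedged sanity check that -Nclang is actually part of this configuration (the machine-file path below is an assumption about the Smilei source layout):

```bash
# The clang-mode machine file should pass -Nclang to the compiler;
# the path is an assumption, adjust it to your Smilei checkout
grep -n "Nclang" scripts/compile_tools/machine/fugaku_fujitsu_cm

# The MPI wrapper command line can be inspected as in the job scripts above
mpiFCC -show
```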

IV. Execution

Single node execution

#!/bin/bash
#PJM -L "node=1"                  # 1 node
#PJM -L "rscgrp=small"            # Specify resource group
#PJM -L "elapse=10:00"
#PJM --mpi "max-proc-per-node=4"  # Upper limit on the number of MPI processes created per node
#PJM -x PJM_LLIO_GFSCACHE=/vol0004
#PJM -s

source ~/env/smilei_env

export PLE_MPI_STD_EMPTYFILE=off # Do not create a file if there is no output to stdout/stderr.
export OMP_NUM_THREADS=12
export OMP_SCHEDULE="static"

rm *.out.*
rm *.err.*

cp ~/smilei/develop-mat/smilei .
cp ../template.py input.py

# execute job
mpiexec -n 4 ./smilei input.py               # Execute with 4 MPI processes
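
For larger runs, a hedged multi-node variant of the same script (assumption: keeping 4 MPI ranks and 12 OpenMP threads per node to match the 48 cores of an A64FX node, and scaling the total rank count with the number of nodes):

```bash
#!/bin/bash
#PJM -L "node=16"                 # Hypothetical node count for a larger run
#PJM -L "rscgrp=small"            # Specify resource group
#PJM -L "elapse=10:00"
#PJM --mpi "max-proc-per-node=4"  # 4 MPI ranks per node
#PJM -x PJM_LLIO_GFSCACHE=/vol0004
#PJM -s

source ~/env/smilei_env

export PLE_MPI_STD_EMPTYFILE=off  # Do not create empty stdout/stderr files
export OMP_NUM_THREADS=12         # 4 ranks x 12 threads = 48 cores per node
export OMP_SCHEDULE="static"

# 16 nodes x 4 ranks per node = 64 MPI processes in total
mpiexec -n 64 ./smilei input.py
```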

mccoys closed this as completed Nov 29, 2023