Include PET number in error messages if using ESMF #925
Signed-off-by: Lizzie Lundgren <elundgren@seas.harvard.edu>
@jimmielin, this PR prints the core number when GEOS-Chem encounters an error, but my implementation only works with MAPL/ESMF. Would it be helpful to make this usable by other MPI models too, e.g. WRF?
Hi Lizzie, yes, I can add this functionality for WRF and CESM. What would be the best way to add some code to a PR I don't own? Sorry, I'm a bit unfamiliar with this situation… Another note: I vaguely remember that Input_Opt stores the CPU ID in a GC run with MPI. It would be nice to pull the ID from there, since the ESMF/WRF/CESM-specific code could then be abstracted away from GC_ERROR and put in the coupler, which feeds the CPU ID into Input_Opt. Would this be possible? I remember passing in this parameter to GEOS-Chem when coupling to WRF, but it was a long time ago. Thanks!
Thanks for the extremely fast response! Yes, I thought of passing Input_Opt but wanted to get in a quick fix for GEOS. Passing in Input_Opt is more involved since it would need to be added to the argument list everywhere, so I just extracted the ID locally the way it is set for MAPL/ESMF. How are you setting it for WRF? Since GC_ERROR is only called when there is a problem, it seems preferable to extract the ID when GC_ERROR is called rather than add arguments everywhere, but I'm open to a counter-opinion on that.
Thinking more about this, maybe we could use pure MPI code rather than the ESMF/MAPL wrapper code, which probably does this under the hood.
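For illustration, the pure-MPI idea could look roughly like the sketch below. The module and function names are hypothetical and not part of GEOS-Chem, and it assumes the standard `mpi` Fortran module is available. It also assumes that the rank in `MPI_COMM_WORLD` (or in a communicator passed by the caller) is a useful stand-in for the PET number, which may not hold if the host model runs GEOS-Chem on a sub-communicator.

```fortran
module pet_rank_sketch
  ! Hypothetical helper (not GEOS-Chem code): look up the calling
  ! process's rank using plain MPI calls only.
  use mpi
  implicit none
  private
  public :: get_rank_or_default

contains

  function get_rank_or_default(comm) result(rank)
    ! Optional communicator; MPI_COMM_WORLD is assumed if absent.
    integer, intent(in), optional :: comm
    integer :: rank
    integer :: this_comm, ierr
    logical :: is_initialized, is_finalized

    ! -1 signals "rank unknown" to the caller.
    rank = -1

    ! MPI_Initialized and MPI_Finalized may be called at any time,
    ! so an error handler can use them safely before touching MPI.
    call MPI_Initialized(is_initialized, ierr)
    if (.not. is_initialized) return
    call MPI_Finalized(is_finalized, ierr)
    if (is_finalized) return

    this_comm = MPI_COMM_WORLD
    if (present(comm)) this_comm = comm

    call MPI_Comm_rank(this_comm, rank, ierr)
    if (ierr /= MPI_SUCCESS) rank = -1
  end function get_rank_or_default

end module pet_rank_sketch
```

An error routine could call `get_rank_or_default()` only when an error actually occurs, which avoids threading a new argument through every caller. A return value of -1 means the rank could not be determined, for example in a run without MPI.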
I'm setting this in WRF using WRF's own routines. I think CESM does the same. So it might be useful to find a pure MPI code, then decide on the code path based on the

WRF:

```fortran
use module_dm

! WRF DM (MPI) Parallel Information - is master process?
logical :: Am_I_Root
integer :: WRF_DM_MyProc, WRF_DM_NProc, WRF_DM_Comm

if(wrf_dm_on_monitor()) then
  Am_I_Root = .true.
else
  Am_I_Root = .false.
endif

call wrf_get_nproc(WRF_DM_NProc)
call wrf_get_myproc(WRF_DM_MyProc)
call wrf_get_dm_communicator(WRF_DM_Comm)

! Pass some HPC Information to Input_Opt...
Global_Input_Opt%isMPI   = .true.
Global_Input_Opt%amIRoot = Am_I_Root
Global_Input_Opt%thisCPU = WRF_DM_MyProc
Global_Input_Opt%numCPUs = WRF_DM_NProc
Global_Input_Opt%MPIComm = WRF_DM_Comm
```

CESM-GC:

```fortran
use spmd_utils, only : MasterProc, myCPU=>Iam, nCPUs=>npes

Input_Opt%thisCPU = myCPU
Input_Opt%amIRoot = MasterProc
```
Thanks for including those. I'm going to make a feature request to extend the PET number in error messages to configurations that use MPI but not ESMF.
This is a companion PR to go with updates to HEMCO that make error handling clearer and more robust when using MPI. However, this update can also be applied without those HEMCO updates. The update for GEOS-Chem is simply to write the CPU number when writing error messages. If all threads print the same traceback via GC_ERROR, the many messages can be parsed more easily by isolating the output of a single CPU. If the error occurs on only one thread, or on a subset of threads, the CPU number makes that easier to identify.
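As a rough sketch of the idea described above, the following shows one way an error message could be tagged with the CPU number. The module name, function name, and message layout are all hypothetical and are not taken from GC_ERROR. A negative CPU number is assumed here to mean "unknown", in which case no tag is added.

```fortran
module error_prefix_sketch
  ! Hypothetical helper (not GEOS-Chem code): build an error line
  ! that carries the CPU (PET) number of the reporting process.
  implicit none
  private
  public :: format_error

contains

  function format_error(msg, this_cpu) result(line)
    character(len=*), intent(in) :: msg
    integer,          intent(in) :: this_cpu   ! negative means unknown
    character(len=:), allocatable :: line
    character(len=16) :: tag

    if (this_cpu >= 0) then
       ! I0 writes the integer with no leading blanks.
       write(tag, '(I0)') this_cpu
       line = 'ERROR [PET ' // trim(tag) // ']: ' // trim(msg)
    else
       line = 'ERROR: ' // trim(msg)
    end if
  end function format_error

end module error_prefix_sketch
```

With a tag like this on every line, the output of one process can be isolated from an interleaved log by searching for its tag (for example `[PET 12]`), and an error raised on only a subset of processes shows up as a short list of distinct tags.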