coastal_ike_shinnecock_atm2sch2ww3 hangs with GNU compiler #3

Open
uturuncoglu opened this issue Mar 2, 2024 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@uturuncoglu
Collaborator

The coastal_ike_shinnecock_atm2sch2ww3 configuration hangs with the GNU compiler on Hercules. The coastal_ike_shinnecock_atm2sch test case runs without any issue. An example run directory is /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_2495275/coastal_ike_shinnecock_atm2sch2ww3_intel. I also tried to attach gdb to the hung processes, but that hangs as well. It could be a system issue, but it needs to be investigated further.
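
Since an interactive gdb session can itself hang on these processes, one option is to run gdb in batch mode under a timeout. The following is only a minimal sketch, not the procedure used in this issue: the helper names `gdb_backtrace_cmd` and `dump_backtrace` and the 60-second timeout are assumptions made for illustration. It relies on the standard gdb options `-p`, `-batch`, and `-ex`.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a non-interactive gdb command that prints all thread backtraces."""
    # -p attaches to a running process, -batch exits after the commands finish,
    # -ex runs one gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtrace(pid, timeout_s=60):
    """Run gdb against one PID (hypothetical helper).

    Returns gdb's standard output, or None if gdb did not finish in time.
    """
    try:
        result = subprocess.run(
            gdb_backtrace_cmd(pid),
            capture_output=True,
            text=True,
            timeout=timeout_s,  # assumed value; tune for the platform
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

A `None` result for every PID on a node would suggest that the problem lies in attaching the debugger rather than in the model itself.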

@uturuncoglu
Collaborator Author

uturuncoglu commented Mar 26, 2024

@josephzhang8 I also debugged this one. It hangs without reporting any problem in the error log, and attaching gdb to the processes gave no clue either. I only see the following message in standard output:

67:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
67:  Please set JGS_USE_JACOBI .eqv. .true.
68:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
68:  Please set JGS_USE_JACOBI .eqv. .true.

Do you have any idea? Thanks.
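
To check whether a message like this comes from a few tasks or from all of them, the output can be grouped by its numeric prefix. This is a rough sketch under one assumption: that the leading `67:` and `68:` are task labels added by the job launcher. The function name `messages_by_rank` is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "67:  some message" (assumed "<task>: <text>" layout).
LINE_RE = re.compile(r"^\s*(\d+):\s+(.*\S)\s*$")


def messages_by_rank(log_text):
    """Map each distinct message to the sorted list of task labels that printed it."""
    ranks_by_message = defaultdict(set)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            ranks_by_message[match.group(2)].add(int(match.group(1)))
    return {msg: sorted(ranks) for msg, ranks in ranks_by_message.items()}


# Sample copied from the standard output quoted in this comment.
sample = """\
67:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
67:  Please set JGS_USE_JACOBI .eqv. .true.
68:  B_JGS_BLOCK_GAUSS_SEIDEL is used but the Jacobi solver is not choosen
68:  Please set JGS_USE_JACOBI .eqv. .true.
"""

for message, ranks in messages_by_rank(sample).items():
    print(ranks, message)
```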

@josephzhang8

This is a message from the wave module (WWM). Can you follow the instruction there in wwminput.nml?

@josephzhang8

@uturuncoglu: for WWM, it's also best to enable the initialize-to-zero flag: -finit-local-zero (for GNU).

@uturuncoglu
Collaborator Author

@josephzhang8 Okay. Let me look at it more carefully. We are using WW3 from the UFS Weather Model and its build, so I don't think there is an issue with the build and flags, since it is already used by various applications without any issue. I'll update you if I find something new. Thanks for your help.

@uturuncoglu
Collaborator Author

@josephzhang8 Okay. GNU is working fine, but I still have an issue with the GNU+DEBUG combination. It seems to be getting stuck outside of SCHISM. So maybe Hercules is not the right platform to test this combination, or maybe it is just too slow to show progress. I also attached gdb to the processes, but there is not much information. Some processes are stuck in the broadcast from ESMF. Anyway, I'll keep this issue open for now, and maybe I can try another platform that supports GNU, like NCAR's Derecho.

#9  0x0000000000abea8f in ESMCI::VMK::broadcast(void*, int, int) ()
#10 0x0000000000a09dde in ESMCI::broadcastInfo(ESMCI::Info*, int, ESMCI::VM const&) ()
#11 0x0000000000af48da in ESMC_InfoBaseSyncDo ()
#12 0x0000000000af60d4 in ESMC_InfoBaseSync ()
#13 0x0000000000821584 in __esmf_infosyncmod_MOD_esmf_infosyncgridcomp ()
#14 0x0000000000423296 in __esmf_attributemod_MOD_esmf_attributeupdategridcomp ()
#15 0x0000000000973748 in __nuopc_driver_MOD_consistentcomponentattributes ()
#16 0x0000000000973c29 in __nuopc_driver_MOD_loopmodelcompsattributeupdate ()
#17 0x0000000000975cd2 in __nuopc_driver_MOD_initializeipdv02p3 ()
#18 0x00000000009aa154 in __nuopc_driver_MOD_initializegeneric ()

@uturuncoglu
Collaborator Author

Current status of the issue: the configuration runs with the GNU compiler on Hercules, but the GNU+DEBUG combination seems to hang. So this might be a platform issue. I will try another platform to see whether I can reproduce it there.

@janahaddad
Collaborator

@pvelissariou1 Is there any update on the hotfix testing you did here: schism-dev#31?

@uturuncoglu
Collaborator Author

@pvelissariou1 @yunfangsun If you don't mind, could you test this configuration (coastal_ike_shinnecock_atm2sch2ww3) on Hera? (I have no access to that machine.) It would be nice to run with both Intel and GNU to see what happens. I tried on Derecho, but I think the GNU installation there has some issue, and the UFS Weather Model only uses Hera and Hercules for GNU testing. I'll also try to run on Orion on my side. BTW, you might want to sync the input directory from Hercules since it has some changes.

@uturuncoglu
Collaborator Author

@janahaddad @pvelissariou1 @yunfangsun rt.sh is not working on Orion. This was probably introduced by the OS/system update on Orion. Maybe it is not supported anymore; I am not sure. I opened a ticket on the UFS Weather Model side: ufs-community/ufs-weather-model#2365. If we cannot reproduce these errors, we might close this issue and reopen it if we see a similar issue. My tests on Hercules run fine. Maybe Hera testing could give more insight.

Projects
Status: Backlog