-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specified UNIT in FLUSH is not connected with GNU #2
Comments
@hga007 This is just trying to compile RT with GNU compiler. There might be a bug in cmeps cap that we want to fix since it is not working with GNU at this point. I wonder if you test cmeps cap with GNU compiler in your end? Probably, we did not test it before. This is not urgent but we would like to compile the RT with both GNU and Intel without any issue. |
@uturuncoglu: I use my JEDI stack-Spack (ifort) with UFS support on my computer. I don't have one for gfortran. So, all my runs are with ifort. I dislike gfortran very much because It doesn't work well with the TotalView debugger. They build the *.o files without full support, which makes it very difficult for them in the debugger. |
@hga007 Okay that explains why it is working. It would be nice to test with GNU to catch these kind of issues. Anyway, I'll look at it and update you. |
@hga007 I traced this little bit more and put print statements and here is the output,
The code enters |
@hga007 I put couple of print statement to |
@hga007 Okay. I found it. Following statement is changing the value. https://github.com/myroms/roms/blob/b7a47408ba224a4703d9122c33995eee3c5ed06c/ROMS/Utility/inp_par.F#L94 |
@hga007 Maybe we need to disable |
@hga007 BTW, I think that changing stdout in the |
@hga007 Okay. I could go further without
So, probably there is a bug in ROMS side about handling these unit numbers. I checked the unit number 77 and it appears in the code like following,
So, it seems that we have also issue with |
@uturuncoglu: Sorry, this week I am swamped. Andy Moore is visiting to work on very difficult ROMS-JEDI codes, and we have four days to make progress on various issues. The issue here is that we split the standard output for every coupled component in the native coupling scheme. Otherwise, tracking each component's very verbose standard output will take a lot of work. Thus, I open arbitrary Fortran units for the components I control and leave the stdout=6 to the atmosphere component. Is the UFS using those Fortran units? This logic always worked for me. Is it a compiler bug? The compiler may dislike initializing and rewriting variables in the module. Notice that I am not using parameter statements for those units. For years, we have done that with Ifort and gfortran. Since ifort will be deprecated this year, we are moving to ifx and icx. I started coding a ROMS-to-ROMS interpolator using the ESMF/NUOPC interpolation utility with RegridStore. I think I sent you an email about it, but I may have an additional question about the RouteHandle for 3D fields where the interpolation is level-by-level. |
@hga007 No worries. It is not urgent but needs to be resolved since ROMS coupling is not working with GNU compiler. I am not sure about component in UFS side that use those units but I think implementation needs to be prune from it. In ESMF there is a specific call to find the empty unit number. I think rather than using separate files like log.roms and log.coupler etc., the NUOPC cap needs to direct all the log to PET files using ESMF calls. This will eliminate these kind of issues that we could have in the future. As I see from the implementation, the log units are changed outside of the cap. It might be nice to move those to the cap since it is related with the coupling. I think this is not a compiler bug and it seems that a logical issue in the management of this files and also only GNU catches it. I think ROMS mode internally should use stdout (6) for all the log and then cap needs to use ESMF call to write the log to PET files. This will split the log with the model and the coupling interface. Sorry, I missed you mail and forgot to replay. I'll directly reply to your mail regarding to use ESMF for the pre/post processing. |
Here is the update from Hernan:
So, I'll try that branch and let him know about the results. |
@ufuk: I updated to the latest UFS-coastal and I got the following compiling error with gfortran:
If we replace /bin/gfortran with mpif90 in that command, it compiles because the wrapper knows where to find mpi.mod. Is there an error somewhere in the CMake files? |
@hga007 Thanks for testing. Let me checkin my side. I am syncing ufs coastal with ufs weather model at this point. Once that is done. I'll work on this. I'll keep you updated. |
@hga007 BTW, I think you are testing in your local cluster. Right? I am wondering if I could reproduce it on any other supported platform like Orion or Hercules. |
This is bizarre. Why does the compiler that we use affect the ESMF clock configuration? It doesn't make any sense!
I don't have more time to look into this, and I am running out of ideas. |
@hga007 Sorry. I could not get back to this. I am trying to sync ufs-coastal and fix couple of issues at this point. I don't this it is issue related with compiler or ESMF. There might be bug in ROMS side that is appearing under GNU. Once I finalize the work, I'll look at ROMS as a next thing. I could also test this in already supported system, so we could see what happens on Hercules. I also noticed that even if files are fine and checked with NCAR's cprnc tool, the baseline is still failing when it is checked with nc compare tool. So, I also need to look at that one. |
@hga007 I test your branch on MSU's Hercules using GNU compiler. It compiles without any issue in my case. So, the error that you get with mpi use statement could be specific in your environment. I am not sure what is wrong at this point but the run crashes with the following error,
and the last log that I am seeing in out file is I also tried to run with Intel compiler to be sure it is working with Intel. This is also failed with same error. At least, following hash (currently used by ufs-coastal) is fine with Intel.
So, let me know if I need any change to reproduce your error. Maybe I could fix it if I could reproduce the issue. |
@uturuncoglu: The coupling with gfortran compiles and runs with the old (Oct 2023) and new versions of ufs-coastal. My issues were solved by cleaning the YAML parser while compiling aggressively with debugging -g flags. The time clock mismatch was solved by initializing a logical variable that is false in ifort and true in gfortran. Uninitialized logical switches in gfortran have peculiar behavior. Since our Spack-Stack uses openMPI 5, we can't debug gfortran in TotalView. It was so annoying, and I had to put many PRINT statements in place to solve the issues I was having. The old version of the UFS runs successfully to the end with both gfortran and ifort for CDEPS and CMEPS configurations. However, ROMS blows up after 300 timesteps with the new UFS code for all cases with both compilers. Something has changed in the configuration. I am still using the Oct 2023 configuration to run successfully. I have not figured out what changed since I didn't run your regression scripts. If you regenerate those scripts (datm_in, datm.streams, model_configure, and ufs.configure) I can try again with the new UFS code. We need better descriptions of those configuration scripts for some of us who are not interested in running the regression test but are interested in applications outside of NOAA computers. The build_ufs.csh or build_ufs.sh we created is much better and works well for us. I have yet to load all those changes to the github.com/myroms/roms develop branch. They are still in feature/stdinp if you want to try it. Since October 2023, the UFS has undergone many changes, and I need help tracking what is relevant to the ROMS applications. |
@hga007 okay. Thanks. Let me try with the branch again to see what happens. I wonder if I need to put any extra cpp flag in the UFS level cmake build layer? |
@hga007 I am still getting following error with your branch,
Anyway, if you would like to try with newer version of the ufs-coastal, we could meet and discuss about it but at this point GNU is not working for ROMS under ufs-coastal. |
@uturuncoglu, Would you happen to have more information about that error? Maybe you'll need to compile full debugging flags to get more information. I don't get that error at all with gfortran. Yes, we can chat to see if we can figure out how to resolve this problem. |
@hga007 There is no any other error message. Next week Wednesday afternoon around 1PM MT is fine for me. If it is okay, I could setup a call. |
@uturuncoglu: Yes, it works for me. Thank you! |
@hga007 Okay. I have just sent the invitation. |
@hga007 Okay. I am back to this issue. I have just tested ROMS head of develop and it is working without any issue with Intel. I'll tested your branch https://github.com/myroms/roms/tree/feature/stdinp with Intel first to see what happens. If it works, I'll test with GNU. |
@hga007 I am looking differences in this branch (feature/stdinp) and it seems that it has lots of changes that are not related with the log issue. Is it possible to have light weight branch that has only fix for the issue that we are seeing under GNU. BTW, we could isolate the issue if we have under UFS Coastal. Let me know what you think? If you point me the files that you fixed for the issue, I could try to create another branch based on develop and test it in my side. |
@uturuncoglu: The changes are due to GNU to fix the issue with the standard input unit that works with gfortran now. It was more involved than I thought. |
@hga007 It would be nice to split developments to meaningful chunks to find the issue. Maybe the problem that you see is not related with the fix that we need for this issue. So, rather than having a brach that modifies 111 files. Let's do it in small chunks. |
@hga007 Okay. If all those changes are required for GNU fix then let me try one more time under UFS Coastal to see what happens. |
@hga007 I think I don't need to add any specific switch to compile for this fix but let me know if I need. I am building model with |
@hga007 Still same, your fix branch is giving following error and dies,
The last thing that I am seeing is that it is entering |
@hga007 I am compiling with |
@uturuncoglu: I cannot do that now because it will break a lot of stuff. Most of the changes are due to the intrinsic FLUSH routine related to how the standard input is managed in ROMS, which is associated with the fix for gfortran. It cannot be split. Unless we forget about gfortran and force ROMS not to support it, which is radical, we cannot do it for the UFS. This issue is associated with the coupling strategy that does not use the native framework. I convinced myself that the issues I had were due to multiple changes in CMEPS that you cannot revert. It concerns the interpolation next to the land/sea mask. It works for me for the much older UFS version that I saved. I spent days in the debugger and compared the input with the ones you provided, but I had yet to be successful. It is a peculiar problem. I can tell you that my changes work for me, but not for you. There may be something that our computers ignore, but you get issues with your computers. I can try again to see if that is still the case. Like I said, it has been a while since I've tried. |
@hga007 I think I got the trace by runnit with DDT. Here is the one,
It seems that it is failing on |
@hga007 There is a |
@uturuncoglu: Okay, I will look at it in the TotalView debugger. |
Hi Ufuk,
I try in the debugger, to see what is going on. There is something that we can try by commenting that CALL to mp_bcast. Use attached file. Then, we take it from there. Perhaps, this call is not needed. It is intended to report failure to creating the standard output file.
Here we are creating the file log.roms where the standard output is written. It is only be supposed be used only by the Master PET. Please recompile to check if this solves your problem..
Thanks, H
From: Ufuk Turunçoğlu ***@***.***>
Date: Friday, July 5, 2024 at 6:28 PM
To: oceanmodeling/roms ***@***.***>
Cc: Hernan Arango ***@***.***>, Mention ***@***.***>
Subject: Re: [oceanmodeling/roms] Specified UNIT in FLUSH is not connected with GNU (Issue #2)
@hga007<https://github.com/hga007> There is a CALL mp_bcasti (1, 1, io_err) call in that line which I don't know what is doing. At least we have some lead in here.
—
Reply to this email directly, view it on GitHub<#2 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AFP6TAHHH6NHQ5LCRWL6MCLZK4MW5AVCNFSM6AAAAABDA6PNAGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJRGQ3TCMJQGQ>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
-- adding last mail from Hernan, I made stdinp_mod.F and stdout_mod.F more robust. I removed the mp_bcast that you were having trouble with. Please give it a try with both ifort and gfortran. I think that it will work for you now. Good luck, H |
@hga007 Good news. Intel is passing with your branch ( |
@hga007 I have just checked and GNU compiler is also fine now under UFS. I also created baseline for GNU on Hercules and it is reproducing itself. So, I think |
@janahaddad @pvelissariou1 @saeed-moghimi-noaa @hga007 I have just sync the ROMS fork under UFS Coastal and pointing the head of develop that has GNU fix. I also added extra ROMS GNU test to |
@hga007 As a part of debugging #1, I decided to run the same configuration in different machine (Hercules). The configuration crash with same way using Intel compiler. Then, I tried to compile the configuration with GNU to see what happens but the code crashed with following error,
I checked it and it seems it is related
stdout
file. It writes thelog.roms
but then fails with this error. I just wonder if you tried to run this configuration in your end with gnu compiler?BTW, my run directory on Hercules: /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_3847567/coastal_irene_atm2roms_gnu
The text was updated successfully, but these errors were encountered: