[BUG/ISSUE] GCHP c48 runs on AWS within Docker container die within 1 hour #14
Comments
Same issue when running it natively on the AMI?
Stop the instance, change its type to
So I ran again in the container on an r5.24xlarge instance and now I get this error:
... etc...
So it would appear to be an issue internal to MAPL. Or I might have run out of disk space, although I requested 500 GB. Also, I had run on the AMI itself earlier at c48 and had similar crashes to the
That's a new message, though:
I haven't seen this before...
There are some references to this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=20257 It was a bug in gfortran that was supposed to be fixed in 4.1, but who knows.
This issue seems to have been caused by an out-of-bounds error in the Olson landmap module, as described in geoschem/GCHP#13 (comment)
Interesting! Why isn't this happening at c24? 🤔 Can c48 run on AWS now?
So what appears to be happening is that the Olson landmap is not being read in properly. This happens in the code where State_Met%LandTypeFrac is populated from the OLSON pointers from ExtData. I'm not sure why this is happening, but it may be a MAPL issue. The OLSON data is read in by the custom code in MAPL that reads the fraction of each grid box (the "F:int" feature). So while you can run on the cloud with the quick fix, I would avoid doing that until we understand the root cause of why State_Met%LandTypeFrac is all zero.
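As a rough illustration of the kind of sanity check that would catch an all-zero State_Met%LandTypeFrac early, here is a minimal Python sketch. It is not GEOS-Chem or MAPL code: the function name `check_land_type_fractions`, the list-of-lists layout (one row per grid box, one column per land type), and the expectation that the fractions in each grid box sum to about 1 are all assumptions made for the example.

```python
from typing import List, Tuple


def check_land_type_fractions(
    frac: List[List[float]], tol: float = 1e-6
) -> Tuple[List[int], List[int]]:
    """Flag suspicious grid boxes in a land-type fraction table.

    frac[i][t] is the (hypothetical) fraction of grid box i covered by
    land type t. Returns two lists of grid-box indices:
      - boxes where every fraction is exactly zero
      - boxes where the fractions are nonzero but do not sum to ~1
    """
    all_zero: List[int] = []
    bad_sum: List[int] = []
    for i, cell in enumerate(frac):
        if all(f == 0.0 for f in cell):
            # Nothing was read in for this box at all.
            all_zero.append(i)
        elif abs(sum(cell) - 1.0) > tol:
            # Assumption: valid fractions should add up to 1.
            bad_sum.append(i)
    return all_zero, bad_sum


if __name__ == "__main__":
    demo = [
        [0.25, 0.75, 0.0],  # valid: sums to 1
        [0.0, 0.0, 0.0],    # all zero, as described for LandTypeFrac
        [0.5, 0.25, 0.0],   # nonzero but sums to 0.75
    ]
    zero_boxes, bad_boxes = check_land_type_fractions(demo)
    print("all-zero boxes:", zero_boxes)
    print("boxes not summing to 1:", bad_boxes)
```

A check like this, run right after the land map is read, would fail at the point of the bad read rather than later as an out-of-bounds access.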
I ran a GCHP c48 simulation on the AWS cloud using
and it died after an hour.
In runConfig.sh:
The Docker commands were:
Tail end of log file:
I then commented out SpeciesConc_avg from the HISTORY.rc file and re-ran.
Now, the only diagnostic active was SpeciesConc_inst. This also died at 1 hour:
This message:
might be indicative of an out-of-bounds error, perhaps where we deallocate arrays (or fields of State_* objects).