POD Reduced Basis Surrogate Example Fails in Parallel #27698
Unanswered
rfryeSigma
asked this question in
Q&A Modules: General
Replies: 1 comment
The POD Reduced Basis Surrogate Example fails when run in parallel.
The example code in `moose/modules/stochastic_tools/examples/surrogates/pod_rb/2d_multireg` has `.i` files for a full_order model to run on 1000 samples, a trainer on 100 samples, and a surrogate emulating the full_order model on the same 1000 samples. When properly run, the outputs of the full_order and the surrogate models agree to about 5 decimal digits.

The example codes cannot be run in place using the code generated by the `Makefile` in the stochastic_tools module because the `MaterialReaction` and `PODSurrogateTester` codes are not included. They are provided only in the `stochastic_tools/test` folder. I created a MOOSE app with `stork`, modified its `Makefile` for `STOCHASTIC_TOOLS := yes`, and copied the supplementary test code to the app's `include` and `src` folders. With these modifications, I was able to run all of the `.i` files with a single processor on an M1 Mac and on an AWS Linux partition.

When I try to run the trainer with `mpiexec -n 8`, it runs through the first half of the training and reports errors in the second half. I include the output from the end of the first half through the failure:

This error does not show up when `num_rows` in the trainer is less than 15 or for certain other values like 50 and 60. With `num_rows = 15`, it fails when run in parallel with 3 processors.
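
To map which combinations of `num_rows` and processor count fail, a small helper along these lines could generate the command lines to try. This is only a sketch under assumptions: the executable name `./my_app-opt`, the input file name `trainer.i`, and the parameter path `Samplers/sample/num_rows` are hypothetical placeholders that would need to match the actual app and input file. It also assumes the app accepts MOOSE-style `Block/param=value` overrides on the command line. The script only prints the commands; it does not run them.

```python
import shlex
from itertools import product


def build_commands(executable, input_file, row_param, num_rows_values, num_procs_values):
    """Return one mpiexec command (as an argument list) per (procs, rows) pair."""
    commands = []
    for n_procs, n_rows in product(num_procs_values, num_rows_values):
        commands.append([
            "mpiexec", "-n", str(n_procs),
            executable, "-i", input_file,
            # Command-line override of the sampler row count (assumed syntax).
            f"{row_param}={n_rows}",
        ])
    return commands


if __name__ == "__main__":
    cmds = build_commands(
        executable="./my_app-opt",             # hypothetical app name
        input_file="trainer.i",                # hypothetical input file name
        row_param="Samplers/sample/num_rows",  # hypothetical parameter path
        num_rows_values=[14, 15, 50, 60],      # values mentioned in the report
        num_procs_values=[1, 3, 8],            # processor counts mentioned in the report
    )
    for cmd in cmds:
        # Print a copy-pasteable shell command for each combination.
        print(shlex.join(cmd))
```

Running each printed command and noting which ones fail would give a pass/fail table to attach to the report.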