Make sure no two evaluators evaluate the same field #399

bartgol · 2018-12-07T15:39:51Z

Following trilinos/Trilinos#4005, we should now make sure that each field is evaluated by exactly one evaluator (1). This is currently a problem in at least Aeras and LandIce. It's going to take some work to clean these up, but I do think it's worth it. It was me who opened the issue requesting this feature, since I already spent 4h of debugging (twice!) to find out I had two evaluators evaluating the same field.

In most cases (and I think all the failing tests on the dashboard that fail for this reason), the two evaluators are in fact two copies of the same evaluator. In this case, it would be safe to overlook the repetition, and allow PHX to discard one of the two. However, there are cases (as I said, it happened to me twice), where two different evaluators evaluate the same field. Perhaps you refactored the code, but left a piece of the old one in, and now you have ev1 and ev2 both evaluating the same field. Or even worse: you always had two evaluators evaluating almost the same field: the two fields just differ because of their dimensions, but changing mesh element type (or the vector dimension) will make them clash, and you would not notice it.

In short, I think that, despite being annoying to fix things that basically worked, this is after all a useful improvement to the library. I will take care of fixing all the LandIce tests in the next few days. @ikalash do you think you have time to look at the Aeras ones? If it is the same source of error as in LandIce, there is probably an interpolation evaluator that is registered multiple times.

(1) Note: contributed fields are ok.

jewatkins · 2018-12-07T16:47:43Z

It looks like this was the issue I was having in #392.

ikalash · 2018-12-07T18:11:31Z

Yes I can look at Aeras. Seeing what you did @bartgol for LandIce should be helpful.

A more general comment: I think we need to fix the nightlies before we proceed with the Albany refactor. I hope to make some progress on this next week. I am sorry to keep holding things up with the refactor... but the testing is important and the more code we push w/o fixing lingering issues, the more difficult it will be to fix things later.

bartgol · 2018-12-10T16:15:44Z

I'm working at fixing LandIce_StokesFOBase.hpp, which is where the evaluators that cause problems are registered. I hope to put a mini-PR up in the next few days.

ikalash · 2018-12-10T18:15:22Z

Thanks @bartgol . I am looking at this now. Can you please expand on what you think is the problem when the error messages such as the following, where ev1 = ev2 in your notation:

Error: PHX::DagManager::registerEvaluator() - The field "Flow State:double:<Cell,Node,Dim>(150,9,4)" that is evaluated by the evaluator named "Gather Solution<Residual>" is already evaluated by another registered evaluator named "Gather Solution<Residual>".

or

 Error: PHX::DagManager::registerEvaluator() - The field "Coord Vec:double:<Cell,QuadPoint,Dim>(750,8,3)" that is evaluated by the evaluator named "MapToPhysicalFrame<Residual>" is already evaluated by another registered evaluator named "MapToPhysicalFrame<Residual>".

You are saying we must have duplicate evaluators without realizing it?

ikalash · 2018-12-10T18:22:04Z

This affects also some LCM, ATO and SCOREC tests. Underlying them are issues with mechanics-like problem dependencies.

ikalash · 2018-12-10T18:35:33Z

@bartgol never mind my previous question - I figured it out. The fixes should be straightforward I think for LCM and hopefully Aeras.

3 LCM tests still fail due to DAG problems: 69 - MechanicsWithHydrogenParallel_SERIAL (Failed) 70 - MechanicsWithHydrogenOrthogonal_SERIAL (Failed) 71 - MechanicsWithTemperatureTransientHeat2D (Failed)

This should fix all ShallowWater problems except: Aeras_TC5_noAD_RungeKutta4

Should fix all failing Aeras::XZHydrostatic tests.

bartgol · 2018-12-10T20:07:12Z

If the two evaluators are the same, the issue is more like a warning: in practice, we could ignore it. However, there's a risk that a more real problem could sneak in, in which different evaluators evaluate the same field (which is more likely to be wrong).

I'm taking care of LandIce, so if you take care of Aeras and LCM, we should be good.

Aeras::Hydrostatic tests should be back up now.

All Aeras tests should be running correctly from now on.

Should fix MechanicsWithTemperatureTransientHeat2D problem.

ikalash · 2018-12-11T03:41:08Z

Ok, I have fixed the Aeras tests and most of the LCM tests, but I am having some difficulty with the ATO tests. The following tests fail due to this issue:

ATO:FixedField
ATO:Homogenize_2D
ATOT:Homogenize_2D
ATO:MultiPhys_Homogenize_2D
ATOT:MultiPhys_Homogenize_2D
ATO:2Matl_Homog
ATOT:2Matl_Homog

For the first one, the error is as follows:

 Error: PHX::DagManager::registerEvaluator() - The field "Rho0:double:<Cell,Node>(47,8)" that is evaluated by the evaluator named "Gather Nodal Parameter" is already evaluated by another registered evaluator named "Gather Nodal Parameter".

(http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=4138034&build=79111). I thought the issue was multiple instances of

ev = Teuchos::rcp(new PHAL::GatherScalarNodalParameter<EvalT,PHAL::AlbanyTraits>(*p, dl) );

but I'm only finding one such call in ATO_Utils_Def.hpp.

For the rest of them, it is this:

 Error: PHX::DagManager::registerEvaluator() - The field "Stress xx_ave :double:<Dummy>(0)" that is evaluated by the evaluator named "Save Field Stress to Cell State Stress xx_ave Residual" is already evaluated by another registered evaluator named "Save Field Stress to Cell State Stress xx_ave Residual".

(http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=4138040&build=79111) . I thought the issue was multiple instances of SaveCellStateField calls in LinearElasticityProblem.hpp, but I'm not seeing the duplication. Also the string in the name seems to be messed up (Stress xx_ave).

@jrobbin , as the owner of ATO, if you'd like to keep ATO in the main branch of Albany, can you please have a look at and try to fix these issues?

ikalash · 2018-12-11T18:54:31Z

@mperego : could you please look at resolving this issue for the following tests, which I believe you were the original author of:

SteadyHeatConstrainedOpt2D_Conductivity_Dist_Param
SteadyHeatConstrainedOpt2D_Conductivity_Dist_ParamT

The problem is that Thermal Conductivity is evaluated by the call:

ev = evalUtils.constructGatherScalarNodalParameter(stateName,fieldName);

as well as

  ev = Teuchos::rcp(new PHAL::ConvertFieldTypePSTtoST<EvalT,PHAL::AlbanyTraits>(*p));

(the second call is supposed to convert the field from ParamScalarT to ScalarT). I thought a simple fix would be to feed ConvertFieldTypePSTtoST and Input and output field name, giving the ScalarT version of "Thermal Conductivity" something else, but there was a similar error that the field was evaluated by multiple evaluators.

ikalash · 2018-12-13T16:44:02Z

@bartgol that's great, thanks Luca!

jewatkins · 2018-12-13T16:45:36Z

@bartgol Any progress on the LandIce tests? I was going to run some GIS cases. I could probably take a look at them but I didn't want to work on something you may already have a fix for.

This should take care of all the ATO tests.

bartgol · 2018-12-13T17:01:04Z

@jewatkins I just fixed the ATO tests, and was about to switch to the LandIce ones. I have a branch for that, but things are a bit more complicated there, due to the recent refactor in the problem classes. I am working at a fix, which I think is a flexible and general solution going forward. But it may take another few days.

In the short term, you can rebuild trilinos setting

-D Phalanx_ALLOW_MULTIPLE_EVALUATORS_FOR_SAME_FIELD:BOOL=OFF

This should not cause your whole trilinos to rebuild. Luckily, you should only get a few phalanx files to rebuild, so it's not a terrible short-term solution.

jewatkins · 2018-12-13T17:03:57Z

@bartgol Okay thanks!

Now problems need to specify some info about the fields for which they need interpolation utility evaluators. The StokesFOBase class then proceeds to automatically construct all these evaluators. This allow to build interpolation utilities only once, needed after recent changes in Phalanx (see issue sandialabs#399(.

ikalash · 2018-12-20T22:22:40Z

Important correction to one of the posts above: to allow for multiple evaluators one actually wants to set

-D Phalanx_ALLOW_MULTIPLE_EVALUATORS_FOR_SAME_FIELD:BOOL=ON

mperego · 2018-12-21T06:45:16Z

Ah, thanks, I overlooked that as well.

bartgol · 2019-01-11T19:28:58Z

This should have been fixed by #412 . Reopen if nighties disagree.

ikalash · 2019-01-12T18:26:32Z

It looks like LandIce tests are failing still with this PR going in on my workstation, e.g.,

http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=80266

Also all the CISM-Albany tests are failing as well:

http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=80265

It's somewhat odd b/c on other machines things seem to be OK, though there are some spot failures as well, e.g.,

FO_GIS_GisCoupledThickness
FO_GIS_GisCoupledThicknessTpetra
FO_Thermo

in the AlbanyIntel build (http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=80276).

bartgol · 2019-01-14T18:08:52Z

It is odd. It may be related to the funky X_DEPENDS_ON_Y options. I did not fully tested them on Friday; I tested them in December, and I naively thought the merge of master into the branch would not change the behavior. Maybe I was wrong. I'm checking right now on my machine.

ikalash · 2019-01-14T18:30:42Z

I don't believe I have that option on yet (what is it again?). I can try to turn it on to see what happens. Also, if you have SRN access, I can give you access to my machine where the nightlies run so you can have a look for yourself at what might be going on.

bartgol · 2019-01-14T18:38:28Z

This build has "-DENABLE_PARAMETERS_DEPEND_ON_SOLUTION:BOOL=ON". I haven't checked the other builds, I thought I'd start with the first one you linked. I don't think it mattered though (on my workstation it passes).

I used gcc 7.2 though. I need to check what happens with 8.2, like in your build. I doubt that really matters, but who knows.

bartgol · 2019-01-14T18:39:47Z

Also, I do have SRN access. If you give me access to the machine, I can have a direct look.

bartgol · 2019-01-14T18:45:45Z

Oh DANG! For some reason on my machine I reconfigured trilinos to allow multiple evaluators to evaluate the same field. That's why everything passed. I'm rebuilding trilinos, and testing again. Sorry about that.

ikalash · 2019-01-14T19:02:23Z

That would explain it. No worries, it happens :). I would think you should be able to reproduce the problem w/ gcc 7 but in case it is needed in the future, I have given you an account on my workstation, camobap.ca.sandia.gov. Please go ahead and try to log on. Your password is your kerberos. The nightlies can be found here: /home/ikalash/nightlyCDash.

bartgol · 2019-01-17T00:49:16Z

Just an update: it's taking me a bit longer than I thought. However, I think some changes I'm making will be necessary when (if) we start putting multiple physics (velocity, temperature, hydrology, thickness,...) together in the same execution. Right now we have some fixes, that allow two specific sets of physics to run together, but things are a bit shaky, and especially when activating the funky X_depends_on_Y options, small changes can introduce quite a big headache.

I am currently trying to make the StokesFOBase class capable to handle coupled physics scenarios, while also being compliant with the recent change in Phalanx.

ikalash · 2019-01-17T16:59:06Z

Thanks for the update. Do you have an ETA for fixing this? I am wondering in particular if I should set -D Phalanx_ALLOW_MULTIPLE_EVALUATORS_FOR_SAME_FIELD:BOOL=ON in my nightlies until this gets fixed so that new failures do not slip through the cracks.

bartgol · 2019-01-17T21:06:47Z

Yeah, you can do that for now. We can set it back to OFF later.

ikalash · 2019-01-18T03:34:58Z

Ok, I set it to "ON" for now.

bartgol self-assigned this Dec 7, 2018

bartgol added build Phalanx Stuff related to Phalanx infrastructure labels Dec 7, 2018

bartgol added this to the Clean build milestone Dec 7, 2018

bartgol changed the title ~~Make sure~~ Make sure no two evaluators evaluate the same field Dec 7, 2018

ikalash self-assigned this Dec 10, 2018

ikalash added a commit that referenced this issue Dec 10, 2018

Aeras: progress towards issue #399.

de2fc36

This should fix all ShallowWater problems except: Aeras_TC5_noAD_RungeKutta4

ikalash added a commit that referenced this issue Dec 10, 2018

Aeras: progress towards issue #399.

14f77f4

Should fix all failing Aeras::XZHydrostatic tests.

ikalash added a commit that referenced this issue Dec 10, 2018

Aeras: progress towards issue #399.

2f648a9

Aeras::Hydrostatic tests should be back up now.

ikalash added a commit that referenced this issue Dec 10, 2018

Aeras: progress towards issue #399.

dbf8b1d

All Aeras tests should be running correctly from now on.

ikalash added a commit that referenced this issue Dec 10, 2018

Aeras: progress towards #399.

c8b820b

ikalash added a commit that referenced this issue Dec 10, 2018

LCM: progress towards #399.

2c1dea9

Should fix MechanicsWithTemperatureTransientHeat2D problem.

ikalash assigned jrobbin Dec 11, 2018

ikalash assigned mperego, ikalash, jrobbin and bartgol and unassigned bartgol, ikalash and jrobbin Dec 11, 2018

bartgol added a commit that referenced this issue Dec 13, 2018

Progress towards fixing #399

adc72de

This should take care of all the ATO tests.

bartgol mentioned this issue Dec 18, 2018

Fix 399 #412

Merged

bartgol closed this as completed Jan 11, 2019

ikalash reopened this Jan 12, 2019

bartgol mentioned this issue Jan 24, 2019

Fix 399 part 2 #424

Merged

bartgol closed this as completed in #424 Jan 28, 2019

ikalash mentioned this issue Apr 29, 2020

Floating point exceptions in Albany LCM material models showing up in Clang build sandialabs/LCM#6

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sure no two evaluators evaluate the same field #399

Make sure no two evaluators evaluate the same field #399

bartgol commented Dec 7, 2018

jewatkins commented Dec 7, 2018

ikalash commented Dec 7, 2018

bartgol commented Dec 10, 2018

ikalash commented Dec 10, 2018

ikalash commented Dec 10, 2018

ikalash commented Dec 10, 2018

bartgol commented Dec 10, 2018

ikalash commented Dec 11, 2018

ikalash commented Dec 11, 2018

ikalash commented Dec 13, 2018

jewatkins commented Dec 13, 2018

bartgol commented Dec 13, 2018

jewatkins commented Dec 13, 2018

ikalash commented Dec 20, 2018

mperego commented Dec 21, 2018 via email •

edited

Loading

bartgol commented Jan 11, 2019 •

edited

Loading

ikalash commented Jan 12, 2019

bartgol commented Jan 14, 2019

ikalash commented Jan 14, 2019

bartgol commented Jan 14, 2019

bartgol commented Jan 14, 2019

bartgol commented Jan 14, 2019

ikalash commented Jan 14, 2019

bartgol commented Jan 17, 2019

ikalash commented Jan 17, 2019

bartgol commented Jan 17, 2019

ikalash commented Jan 18, 2019

Make sure no two evaluators evaluate the same field #399

Make sure no two evaluators evaluate the same field #399

Comments

bartgol commented Dec 7, 2018

jewatkins commented Dec 7, 2018

ikalash commented Dec 7, 2018

bartgol commented Dec 10, 2018

ikalash commented Dec 10, 2018

ikalash commented Dec 10, 2018

ikalash commented Dec 10, 2018

bartgol commented Dec 10, 2018

ikalash commented Dec 11, 2018

ikalash commented Dec 11, 2018

ikalash commented Dec 13, 2018

jewatkins commented Dec 13, 2018

bartgol commented Dec 13, 2018

jewatkins commented Dec 13, 2018

ikalash commented Dec 20, 2018

mperego commented Dec 21, 2018 via email • edited Loading

bartgol commented Jan 11, 2019 • edited Loading

ikalash commented Jan 12, 2019

bartgol commented Jan 14, 2019

ikalash commented Jan 14, 2019

bartgol commented Jan 14, 2019

bartgol commented Jan 14, 2019

bartgol commented Jan 14, 2019

ikalash commented Jan 14, 2019

bartgol commented Jan 17, 2019

ikalash commented Jan 17, 2019

bartgol commented Jan 17, 2019

ikalash commented Jan 18, 2019

mperego commented Dec 21, 2018 via email •

edited

Loading

bartgol commented Jan 11, 2019 •

edited

Loading