-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure no two evaluators evaluate the same field #399
Comments
It looks like this was the issue I was having in #392. |
Yes I can look at Aeras. Seeing what you did @bartgol for LandIce should be helpful. A more general comment: I think we need to fix the nightlies before we proceed with the Albany refactor. I hope to make some progress on this next week. I am sorry to keep holding things up with the refactor... but the testing is important and the more code we push w/o fixing lingering issues, the more difficult it will be to fix things later. |
I'm working at fixing |
Thanks @bartgol . I am looking at this now. Can you please expand on what you think is the problem when the error messages such as the following, where ev1 = ev2 in your notation:
or
You are saying we must have duplicate evaluators without realizing it? |
This affects also some LCM, ATO and SCOREC tests. Underlying them are issues with mechanics-like problem dependencies. |
@bartgol never mind my previous question - I figured it out. The fixes should be straightforward I think for LCM and hopefully Aeras. |
3 LCM tests still fail due to DAG problems: 69 - MechanicsWithHydrogenParallel_SERIAL (Failed) 70 - MechanicsWithHydrogenOrthogonal_SERIAL (Failed) 71 - MechanicsWithTemperatureTransientHeat2D (Failed)
This should fix all ShallowWater problems except: Aeras_TC5_noAD_RungeKutta4
Should fix all failing Aeras::XZHydrostatic tests.
If the two evaluators are the same, the issue is more like a warning: in practice, we could ignore it. However, there's a risk that a more real problem could sneak in, in which different evaluators evaluate the same field (which is more likely to be wrong). I'm taking care of LandIce, so if you take care of Aeras and LCM, we should be good. |
Aeras::Hydrostatic tests should be back up now.
All Aeras tests should be running correctly from now on.
Should fix MechanicsWithTemperatureTransientHeat2D problem.
Ok, I have fixed the Aeras tests and most of the LCM tests, but I am having some difficulty with the ATO tests. The following tests fail due to this issue: ATO:FixedField For the first one, the error is as follows:
(http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=4138034&build=79111). I thought the issue was multiple instances of
but I'm only finding one such call in ATO_Utils_Def.hpp. For the rest of them, it is this:
(http://cdash.sandia.gov/CDash-2-3-0/testDetails.php?test=4138040&build=79111) . I thought the issue was multiple instances of SaveCellStateField calls in LinearElasticityProblem.hpp, but I'm not seeing the duplication. Also the string in the name seems to be messed up (Stress xx_ave). @jrobbin , as the owner of ATO, if you'd like to keep ATO in the main branch of Albany, can you please have a look at and try to fix these issues? |
@mperego : could you please look at resolving this issue for the following tests, which I believe you were the original author of: SteadyHeatConstrainedOpt2D_Conductivity_Dist_Param The problem is that Thermal Conductivity is evaluated by the call:
as well as
(the second call is supposed to convert the field from ParamScalarT to ScalarT). I thought a simple fix would be to feed ConvertFieldTypePSTtoST and Input and output field name, giving the ScalarT version of "Thermal Conductivity" something else, but there was a similar error that the field was evaluated by multiple evaluators. |
@bartgol that's great, thanks Luca! |
@bartgol Any progress on the LandIce tests? I was going to run some GIS cases. I could probably take a look at them but I didn't want to work on something you may already have a fix for. |
This should take care of all the ATO tests.
@jewatkins I just fixed the ATO tests, and was about to switch to the LandIce ones. I have a branch for that, but things are a bit more complicated there, due to the recent refactor in the problem classes. I am working at a fix, which I think is a flexible and general solution going forward. But it may take another few days. In the short term, you can rebuild trilinos setting
This should not cause your whole trilinos to rebuild. Luckily, you should only get a few phalanx files to rebuild, so it's not a terrible short-term solution. |
@bartgol Okay thanks! |
Now problems need to specify some info about the fields for which they need interpolation utility evaluators. The StokesFOBase class then proceeds to automatically construct all these evaluators. This allow to build interpolation utilities only once, needed after recent changes in Phalanx (see issue sandialabs#399(.
Now problems need to specify some info about the fields for which they need interpolation utility evaluators. The StokesFOBase class then proceeds to automatically construct all these evaluators. This allow to build interpolation utilities only once, needed after recent changes in Phalanx (see issue sandialabs#399(.
Important correction to one of the posts above: to allow for multiple evaluators one actually wants to set
|
Ah, thanks, I overlooked that as well.
|
This should have been fixed by #412 . Reopen if nighties disagree. |
It looks like LandIce tests are failing still with this PR going in on my workstation, e.g., http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=80266 Also all the CISM-Albany tests are failing as well: http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=80265 It's somewhat odd b/c on other machines things seem to be OK, though there are some spot failures as well, e.g., FO_GIS_GisCoupledThickness in the AlbanyIntel build (http://cdash.sandia.gov/CDash-2-3-0/viewTest.php?onlyfailed&buildid=80276). |
It is odd. It may be related to the funky X_DEPENDS_ON_Y options. I did not fully tested them on Friday; I tested them in December, and I naively thought the merge of master into the branch would not change the behavior. Maybe I was wrong. I'm checking right now on my machine. |
I don't believe I have that option on yet (what is it again?). I can try to turn it on to see what happens. Also, if you have SRN access, I can give you access to my machine where the nightlies run so you can have a look for yourself at what might be going on. |
This build has I used gcc 7.2 though. I need to check what happens with 8.2, like in your build. I doubt that really matters, but who knows. |
Also, I do have SRN access. If you give me access to the machine, I can have a direct look. |
Oh DANG! For some reason on my machine I reconfigured trilinos to allow multiple evaluators to evaluate the same field. That's why everything passed. I'm rebuilding trilinos, and testing again. Sorry about that. |
That would explain it. No worries, it happens :). I would think you should be able to reproduce the problem w/ gcc 7 but in case it is needed in the future, I have given you an account on my workstation, camobap.ca.sandia.gov. Please go ahead and try to log on. Your password is your kerberos. The nightlies can be found here: |
Just an update: it's taking me a bit longer than I thought. However, I think some changes I'm making will be necessary when (if) we start putting multiple physics (velocity, temperature, hydrology, thickness,...) together in the same execution. Right now we have some fixes, that allow two specific sets of physics to run together, but things are a bit shaky, and especially when activating the funky X_depends_on_Y options, small changes can introduce quite a big headache. I am currently trying to make the StokesFOBase class capable to handle coupled physics scenarios, while also being compliant with the recent change in Phalanx. |
Thanks for the update. Do you have an ETA for fixing this? I am wondering in particular if I should set |
Yeah, you can do that for now. We can set it back to OFF later. |
Ok, I set it to "ON" for now. |
Following trilinos/Trilinos#4005, we should now make sure that each field is evaluated by exactly one evaluator (1). This is currently a problem in at least Aeras and LandIce. It's going to take some work to clean these up, but I do think it's worth it. It was me who opened the issue requesting this feature, since I already spent 4h of debugging (twice!) to find out I had two evaluators evaluating the same field.
In most cases (and I think all the failing tests on the dashboard that fail for this reason), the two evaluators are in fact two copies of the same evaluator. In this case, it would be safe to overlook the repetition, and allow PHX to discard one of the two. However, there are cases (as I said, it happened to me twice), where two different evaluators evaluate the same field. Perhaps you refactored the code, but left a piece of the old one in, and now you have ev1 and ev2 both evaluating the same field. Or even worse: you always had two evaluators evaluating almost the same field: the two fields just differ because of their dimensions, but changing mesh element type (or the vector dimension) will make them clash, and you would not notice it.
In short, I think that, despite being annoying to fix things that basically worked, this is after all a useful improvement to the library. I will take care of fixing all the LandIce tests in the next few days. @ikalash do you think you have time to look at the Aeras ones? If it is the same source of error as in LandIce, there is probably an interpolation evaluator that is registered multiple times.
(1) Note: contributed fields are ok.
The text was updated successfully, but these errors were encountered: