FIX: Model/calculate: Piecewise branching performance and extrapolation #431
Codecov Report
| Coverage Diff | develop | #431 | +/- |
|---|---|---|---|
| Coverage | 90.30% | 90.48% | +0.18% |
| Files | 50 | 50 | |
| Lines | 7774 | 7871 | +97 |
| Hits | 7020 | 7122 | +102 |
| Misses | 754 | 749 | -5 |
... and 3 files with indirect coverage changes
I didn't manage to get to root cause, but I did verify that you can unwrap Piecewise for every energy contribution except the ordering contribution without hitting the Mac-only failure in I have added a test,
@bocklund ping
One hesitation I have for this PR is that the mixed behavior between expressions that do extrapolate outside of temperature limits automatically and expressions that don't will make it more difficult to debug cases where users are doing calculations outside of temperature limits. I think this PR is valuable and probably the sensible solution is to extrapolate everywhere. Is it too big of an effort or scope change to do that here? Does it need a separate issue?
For adjusting extrapolation behavior, there are two approaches:
@bocklund This PR now extrapolates all temperature bounds, including multi-branch piecewise expressions. This uses the Model-based extrapolation approach referenced above. It ended up being a smaller delta than I expected, and the performance seems to be fine (though feel free to test on complex databases). The small performance impact makes sense to me in retrospect, as we expect fewer than 100 You can look at the added test to see how you could test the performance while staying on the same branch. In practice, I'm not sure
What's here seems reasonable to me. Although I agree that it's technically correct, I haven't seen a use case in the wild that used or relied on temperature limits in a useful way. I'm not too worried about going out of our way to support fully correct non-extrapolation for now (YAGNI).
Complex multi-sublattice phases with lots of parameters and using the magnetic ordering model can challenge pycalphad's Just-In-Time (JIT) compiler, especially for computation of second derivatives. In the worst case, the compiler will hang indefinitely and consume RAM until the process is killed.
The reason for this is the increase in algorithmic complexity that comes from having deep piecewise temperature branching in the `Model` object's representation of the Gibbs energy. However, a common case for the piecewise parameter description is that there is really only one nonzero "branch" for the entire temperature range. While TDB formally wraps every parameter in a piecewise, the `Model` object is free to discard trivial branches at build time. That is the approach used in the patch for this PR.
This PR includes a test for such a difficult case, where the `Model` object for the corresponding phase has a Gibbs energy Hessian that cannot be built by the `develop` JIT compiler. The patch is able to reduce the number of Piecewise nodes in the Gibbs energy's abstract syntax tree by more than half. For the sake of efficiency, instead of a full correctness test we only test that the number of nodes is reduced by half.
In addition, this PR includes a change to the point sampling algorithm in `calculate`. Currently the sampler (when `fixed_grid` is `True`) tries to add additional points between all pairwise combinations of endmembers. For certain multi-component, multi-sublattice phases, there can be thousands of endmembers and, thus, millions of endmember pairs. The proposed change detects this case; when there are more than 100,000 points to be added, the algorithm only adds up to the maximum specified by the constant. All endmembers are still added, and this change does not affect the random sampling portion of the algorithm.
Finally, in addition to trivial branch elimination, the `Model` class is now updated to extrapolate the lower and upper temperature bounds of all piecewise expressions to negative and positive infinity. This brings pycalphad into line with how TC will extrapolate outside of temperature bounds for parameters. Note that limits specified by the `TEMPERATURE-LIMITS` TDB command are still not enforced and it is still possible for users to compute in non-physical regions of a database, but this was always possible; this change will allow for compatibility with a greater number of legacy databases that rely on the extrapolation behavior.
For certain pathological cases, this will reduce memory consumption by over 90% and resolve classes of memory errors for users attempting multi-component calculations with complex databases.
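As a rough illustration of the two `Model` changes described above (bound extrapolation and trivial branch elimination), here is a minimal plain-Python sketch. It is not the code in this patch: the `Branch` tuple and the helpers `extrapolate_bounds`, `unwrap_trivial`, and `evaluate` are hypothetical stand-ins for the symbolic piecewise expressions that pycalphad actually manipulates, and the sketch assumes the branches are contiguous in temperature.

```python
import math
from typing import Callable, List, Tuple, Union

# Hypothetical stand-in for a TDB piecewise parameter: each branch is
# (function of T, lower T bound, upper T bound). Not pycalphad's data model.
Branch = Tuple[Callable[[float], float], float, float]


def extrapolate_bounds(branches: List[Branch]) -> List[Branch]:
    """Extend the lowest bound to -inf and the highest bound to +inf."""
    if not branches:
        return []
    ordered = sorted(branches, key=lambda b: b[1])
    first_fn, _, first_hi = ordered[0]
    ordered[0] = (first_fn, -math.inf, first_hi)
    # Re-read the last entry so a single branch gets both bounds extended.
    last_fn, last_lo, _ = ordered[-1]
    ordered[-1] = (last_fn, last_lo, math.inf)
    return ordered


def unwrap_trivial(
    branches: List[Branch],
) -> Union[Callable[[float], float], List[Branch]]:
    """Return the bare function when a single branch covers every temperature."""
    extended = extrapolate_bounds(branches)
    if len(extended) == 1:
        return extended[0][0]  # no branching left to evaluate
    return extended


def evaluate(param, temperature: float) -> float:
    """Evaluate either an unwrapped function or a list of branches."""
    if callable(param):
        return param(temperature)
    for fn, lo, hi in param:
        if lo <= temperature < hi:
            return fn(temperature)
    raise ValueError("no branch covers this temperature")


# A parameter with one real branch collapses to a plain function.
single = unwrap_trivial([(lambda T: 2.0 * T, 298.15, 6000.0)])
# A two-branch parameter keeps its branching but now extrapolates.
double = unwrap_trivial([
    (lambda T: 1.0, 298.15, 1000.0),
    (lambda T: 2.0, 1000.0, 6000.0),
])
```

In this sketch, `single` is an ordinary callable that can be evaluated below 298.15 K, while `double` remains a two-branch list whose outer branches now extend to negative and positive infinity. The performance benefit in the real patch comes from the first case: fewer piecewise nodes for the JIT compiler to differentiate.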