Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Binary phase diagram mapping #209

merged 131 commits into from Jul 18, 2019

ENH: Binary phase diagram mapping #209

merged 131 commits into from Jul 18, 2019


Copy link

@bocklund bocklund commented Apr 18, 2019

This is a first look at an improved method of doing phase diagram mapping that I'm opening for some initial feedback.

This improves on the efficiency of the brute force equilibrium calculations that are implemented now for binary phase diagrams.

Main features

  • No regressions. Visually, this should produce exactly the same results as the current brute force method. It's just faster.
  • In theory (not implemented), every calculation temperature can be computed in parallel
  • Some (trivial and optional) connecting of ZPF boundaries from the same two phase region is supported. Includes ability to close miscibility gaps (with a straight line) if the compositions are within a tolerance.
  • Mappings in different regions (with different conditions) can be easily composed through the ZPFBoundarySets object.
  • We can (in principle) test the plotting algorithm now (by comparing ZPFBoundarySets or matplotlib line collections). No tests are implemented here.

Currently, all the code is in the pycalphad.mapping module, however since there are no theoretical regressions, I think we can just drop in replace the mapping calculation and plotting for binplot.
If we decide to move forward, I will reorganize the modules under pycalphad.plot. It just made sense to keep things separate while I was still developing the approach.


There are two main sources for the performance bump compared to the brute force method, where every broadcasted condition is calculated:

  1. We avoid almost all single phase calculations by finding two phase regions using a convex hull
  2. Each two phase region only is calculated one time

Therefore, the most significant speed bumps are going to come from systems that have wide regions of single phase or two phases present. The slowest will have many small two phase regions. Assuming convex hull calculations are much slower than equilibrium calculations, this should never be slower than the brute force method.

~Since equilibrium calculations are looped, there is some small performance bump on the table to run build_callables and pass around the results (Al-Zn goes from 19s to 15s). ~

There are some performance improvements and optimizations in this branch, with help from @richardotis.

  1. EquilibriumResult instead of Datasets are implemented now. There are no slow xarray objects in the core, and EquilibriumResult wraps the Dataset constructor so we can just replace it at the end. A to_xarray=True argument is added to equilibrium in case users want to control whether or not an xarray dataset is created.
  2. Removes chunking code (it was part of the above, it's basically never used since #101.
  3. lower_convex_hull now users EquilibriumResult, removes force_indep_align (@richardotis anything we need to do for this?)
  4. Overhead of calling equilibrium in a tight loop is bypassed by calling starting_point and equilibrium directly, using the grid that was used for the convex hull


All images from the new binplot_map.

Timings do not include the JIT time and are dependent on hardware. Callables are not pre-built and passed for these.


19s for binplot_map
4min 35s (275s) for binplot



1m, 41s for binplot_map


32s for binplot_map


1m, 14s for binplot_map

Further discussion

The main idea of the algorithm is the following:

  1. For each temperature:
  2. Compute a convex hull, set X=0
  3. Try to find a two phase region (miscibility gaps are detected by a site fraction tolerance between composition sets)
    I. If no two phase regions are found; T+=dT (go to 1)
    II. If a two phase region is found: set X=mean(composition sets)
  4. Perform an equilibrium calculation
    I. If one phase is found: X+=dX; go to 3.
    II. If two phases are found: Save the composition sets; X+=dX; go to 4

Two assumptions are made in several places in this approach:

  1. The diagrams are T-X
  2. Binary systems only are assumed

The T-X assumption is fairly core to the code here. If we wanted to do isothermal calculations, e.g. isothermal ternaries, that would require a new mapping calculation function and significant changes to the ZPFBoundarySets object (maybe a new object would be more appropriate?).

We could get rid of the binary assumption and enable plotting of isopleths (including pseudo-binaries!). This would require

  1. Supporting the ability to take in composition conditions as expressions with 1 composition degree of freedom.
  2. The composition used everywhere in this code no longer matches up to a specific component, but more of a "reaction coordinate" of 1D compositions.
  3. Math to project the equilibrium hyperplanes (and compositions) to the reaction coordinate.

Most of the plotting code (ZPFBoundarySets, TwoPhaseRegions) are relatively decoupled from the binary assumption, and it's just a matter of creating the CompSet2D objects in the mapping function to be in terms of the phases and compositions projected into the reaction coordinate.
If we eventually migrate to NP-conditions approach by following the ZPF lines, the code should look very similar and I think the changes would be fairly trivial to make.

@bocklund bocklund added this to the 0.8 milestone Apr 22, 2019
bocklund and others added 28 commits May 28, 2019
…r to resolve a SymEngine compatibility issue
@bocklund bocklund requested a review from richardotis Jun 11, 2019
Copy link

@richardotis richardotis commented Jun 12, 2019

Removing _merge_property_slices from should help test coverage a bit.

Copy link
Collaborator Author

@bocklund bocklund commented Jun 12, 2019

What about just removing dask altogether? I don't think anyone has been able to use it successfully in at least 2 years, since that's when #101 was opened.

Copy link

@richardotis richardotis commented Jun 12, 2019

Okay. If it's a small change it can happen here; otherwise it should be its own PR.

Copy link
Collaborator Author

@bocklund bocklund commented Jun 12, 2019

Dask/distributed/dill/cloudpickle are removed

Copy link

@richardotis richardotis commented Jun 20, 2019

Can we improve test coverage for ZPFBoundarySets at all? It's okay for most of the actual plotting functions to remain uncovered, since we don't need to test that matplotlib works, but there are several important-looking methods in there that could easily regress without tests.

Copy link
Collaborator Author

@bocklund bocklund commented Jul 18, 2019

@richardotis Tests for TwoPhaseRegion and ZPFBoundarySets added and appear to be on their way to passing

@bocklund bocklund merged commit d9d67eb into develop Jul 18, 2019
0 of 4 checks passed
richardotis pushed a commit that referenced this issue Jul 19, 2019
See #209 for a complete discussion of this PR. 

Core changes are as follows:
* Implementation of a mapping algorithm for binary phase diagrams
** Significant performance improvement due to the algorithm with no regressions. Visually, this should produce exactly the same results as the current brute force method. It's just faster.
** ZPFBoundarySets, BinaryCompset, and CompsetPair objects introduced to facilitate storing and transforming the composition sets corresponding to the zero phase boundaries that are calculated here.
* dask is now removed from pycalphad. It was rarely being used by anyone and according to #101, it was not able to be used successfully by default. This closes #101 and updates the parallelism section in the FAQ. Also makes debugging and the code simpler since we don't pass through a confusing dask redirection layer or delayed objects. Drops dask and dill from requirements.
* Creating and accessing xarray datasets has been a performance issue for quite some time. This creates a thin interop layer, `LightDataset`, which creates the NumPy arrays directly and has a similar access structure (without the `sel`, `loc`, and virtually all other xarray methods). pycalphad now uses LightDataset objects internally, which greatly avoids overhead in tight-looped operations. The LightDataset preserves the inputs from construction and can be coerced to xarray Datasets through a method call, which is the default for calculate and equilibrium (users see no external change), but an option exists for those functions to return LightDataset objects for tight loops.
* Removes pickle hack code which is no longer necessary for supported parallelism and Python versions.
* Regex performance optimization for chemical formula parsing (improve Species parsing performance)
* Fix LaTeX printing bug in pycalphad variables objects (@richardotis)
@bocklund bocklund deleted the mapping branch Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet

Successfully merging this pull request may close these issues.

None yet

2 participants