equilibrium: Vectorizing the refinement step #37

Closed
richardotis opened this Issue Feb 16, 2016 · 4 comments

@richardotis
Collaborator

richardotis commented Feb 16, 2016

We're probably leaving a 10-100x speedup on the table by not vectorizing the refinement (aka "Solve (3/3)") step of the equilibrium calculation. It's not reasonable to expect users to wait an hour or more to perform mapping calculations or compute a phase diagram. The challenge is that the current serial algorithm is much, much easier to reason about, and it will take some careful thinking to get all the indexing operations correct in a vectorized version.

Here is a raw dump of notes about how it could work.

First I need to initialize the Hessian matrix and the b vector
Let's try some very conservative upper bounds
max_phases = len([c for c in comps if c != 'VA'])
max_num_vars = max_internal_dof*max_phases + max_phases
num_constraints = num_mass_bals + max_sublattices*max_phases
Note that we need an extra 'max_phases' dimension because the fancy indexing is applied in parallel:
a phase participating in the same equilibrium multiple times would otherwise index the same condition simultaneously, which would break the scatter.
The solution is to use phase_idx as an extra dimension, then sum over that dimension before copying the result into l_hessian.
Hessian size will be (*conds, max_phases, max_num_vars+num_constraints, max_num_vars+num_constraints)
b vector size will be (*conds, max_phases, max_num_vars+num_constraints)
Initialize l_multipliers as zeros of shape (*conds, num_constraints)
Set l_multipliers[*conds, :num_mass_bals] = chemical potentials (the multipliers for the mass balance constraints)
Initialize Hessian as stacked identity matrix
Initialize b vector as stacked zero vector
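
Here's a minimal NumPy sketch of this initialization. The component list, conditions grid, and dof bounds are toy stand-ins, not real inputs:

```python
import numpy as np

# Toy stand-ins for the real problem inputs (all assumptions)
comps = ['AL', 'FE', 'VA']      # hypothetical component list
conds_shape = (11, 21)          # e.g. a grid of (P, T) conditions
max_internal_dof = 4            # assumed max site-fraction dof of any phase
max_sublattices = 2             # assumed

# Conservative upper bounds, as above
max_phases = len([c for c in comps if c != 'VA'])
max_num_vars = max_internal_dof * max_phases + max_phases
num_mass_bals = max_phases      # one mass balance per non-VA component
num_constraints = num_mass_bals + max_sublattices * max_phases

# Extra max_phases axis: each participating phase scatters into its own slot,
# and the axis is summed out at solve time to avoid index collisions
kkt_dim = max_num_vars + num_constraints
l_hessian = np.zeros(conds_shape + (max_phases, kkt_dim, kkt_dim))
l_hessian[...] = np.eye(kkt_dim)          # stacked identity at every condition
b = np.zeros(conds_shape + (max_phases, kkt_dim))
l_multipliers = np.zeros(conds_shape + (num_constraints,))
l_multipliers[..., :num_mass_bals] = 0.0  # would hold current chemical potential estimates
```
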
Now we need to fill out the Hessian and b at all conditions, for the current iteration
We know the phases participating at all conditions
We know the best guess site fractions for all participating phases at all conditions
For each phase, make a flattened list of (P, T, *sitefracs) to compute -- filter out converged conditions
   Also track the conditions multi-index for each phase computation
   Also track the var_idx we can write sitefrac-related dof to, and also the phase_idx for phasefrac-related dof
   So it should probably be three flattened lists of equal length
   Pretty easy to compute G, G' and G'' for the flattened list -- where do we copy the results to?
   first_var_idx = index of first site fraction variable for a given phase
   last_var_idx = index of last site fraction variable for a given phase
    Hessian matrix (don't forget to clip the P and T derivatives):
    Sitefrac/sitefrac derivatives
      Copy phasefrac[cond_multi_index, phase_idx] * G'' to [cond_multi_index, phase_idx, np.index_exp[var_idx:last_var_idx, var_idx:last_var_idx]]
    Phasefrac/sitefrac derivatives
      Copy G' (clipped) to [cond_multi_index, phase_idx, max_num_vars-max_phases+phase_idx, var_idx:last_var_idx] and to the transposed block [cond_multi_index, phase_idx, var_idx:last_var_idx, max_num_vars-max_phases+phase_idx]
    Phasefrac/phasefrac derivative
       [cond_multi_index, phase_idx, max_num_vars-max_phases+phase_idx, max_num_vars-max_phases+phase_idx] = 0 (needed because we initialize as the identity matrix)
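
A sketch of this per-phase Hessian scatter, using toy sizes and random placeholder arrays for G'' and G' (the real values would come from the energy evaluations):

```python
import numpy as np

# Toy layout: 2 phases max, 3 site-fraction dof for this phase
max_phases, max_internal_dof = 2, 3
max_num_vars = max_internal_dof * max_phases + max_phases   # = 8
num_constraints = max_phases + 2 * max_phases               # mass bals + sublattices
kkt_dim = max_num_vars + num_constraints
n_pts = 5                                   # flattened, non-converged conditions
l_hessian = np.zeros((n_pts, max_phases, kkt_dim, kkt_dim)) + np.eye(kkt_dim)

cond_multi_index = (np.arange(n_pts),)      # multi-index into the conds grid
phase_idx, first_var_idx, phase_dof = 0, 0, 3
last_var_idx = first_var_idx + phase_dof
phasefrac = np.random.rand(n_pts)           # current phase-fraction guesses
G_hess = np.random.rand(n_pts, phase_dof, phase_dof)  # G'' (P, T derivatives clipped)
G_grad = np.random.rand(n_pts, phase_dof)             # G' (clipped)

idx = cond_multi_index + (phase_idx,)
pf_col = max_num_vars - max_phases + phase_idx   # this phase's phase-fraction column
# Sitefrac/sitefrac block
l_hessian[idx + np.index_exp[first_var_idx:last_var_idx,
                             first_var_idx:last_var_idx]] = phasefrac[:, None, None] * G_hess
# Phasefrac/sitefrac cross terms (both symmetric positions)
l_hessian[idx + np.index_exp[pf_col, first_var_idx:last_var_idx]] = G_grad
l_hessian[idx + np.index_exp[first_var_idx:last_var_idx, pf_col]] = G_grad
# Phasefrac/phasefrac entry: zero out the identity initialization
l_hessian[idx + (pf_col, pf_col)] = 0.0
```
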
    Constraint Jacobian:
    Need to track constraint_offset for each flattened condition multi-index (each participating phase increases it)
    Initialize constraint_jac as zeros of shape (*conds, max_phases, num_constraints, max_num_vars)
    Initialize l_constraints as zeros of shape (*conds, max_phases, num_constraints)
    Reset var_idx = first_var_idx
    Site fraction balances
      for idx in range(len(dbf.phases[name].sublattices)):
          active_in_subl = set(dbf.phases[name].constituents[idx]).intersection(comps)
          constraint_jac[cond_multi_index, phase_idx, constraint_offset, var_idx:var_idx+len(active_in_subl)] = 1
          l_constraints[cond_multi_index, phase_idx, constraint_offset] = -(np.sum(site_fracs[..., var_idx:var_idx + len(active_in_subl)], axis=-1) - 1)
          var_idx += len(active_in_subl)
          constraint_offset += 1
    Mass balances
      Basically the same as the existing code, with some indexing differences (remember max_phases dimension)
    Copy -constraint_jac.swapaxes(-2,-1) to l_hessian[cond_multi_index, phase_idx, :max_num_vars, max_num_vars:]
    Copy constraint_jac to l_hessian[cond_multi_index, phase_idx, max_num_vars:, :max_num_vars]
    (the extra phase axis is summed out at solve time, below)
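
A sketch of this constraint assembly; the sublattice layout is a hypothetical stand-in for dbf.phases[name].sublattices/constituents, and the row ordering of the constraints is an assumption:

```python
import numpy as np

max_phases, max_num_vars, num_constraints = 2, 8, 6
n_pts = 5
cond_multi_index = (np.arange(n_pts),)
phase_idx, first_var_idx = 0, 0
kkt_dim = max_num_vars + num_constraints
l_hessian = np.zeros((n_pts, max_phases, kkt_dim, kkt_dim)) + np.eye(kkt_dim)
constraint_jac = np.zeros((n_pts, max_phases, num_constraints, max_num_vars))
l_constraints = np.zeros((n_pts, max_phases, num_constraints))
site_fracs = np.random.rand(n_pts, max_num_vars - max_phases)

# Hypothetical sublattice layout: number of active species per sublattice,
# standing in for len(set(constituents[idx]) & set(comps))
active_per_subl = [2, 1]
var_idx = first_var_idx
constraint_offset = 0    # assumed: site-fraction balances occupy the leading rows
for n_active in active_per_subl:
    cols = slice(var_idx, var_idx + n_active)
    constraint_jac[cond_multi_index + (phase_idx, constraint_offset, cols)] = 1
    l_constraints[cond_multi_index + (phase_idx, constraint_offset)] = \
        -(np.sum(site_fracs[..., cols], axis=-1) - 1)
    var_idx += n_active
    constraint_offset += 1

# Scatter J and -J^T into this phase's KKT slot (phase axis summed at solve time)
jac = constraint_jac[cond_multi_index + (phase_idx,)]
l_hessian[cond_multi_index + (phase_idx, slice(None, max_num_vars),
                              slice(max_num_vars, None))] = -jac.swapaxes(-2, -1)
l_hessian[cond_multi_index + (phase_idx, slice(max_num_vars, None),
                              slice(None, max_num_vars))] = jac
```
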
    Gradient term:
    Reset var_idx = first_var_idx
    Initialize the gradient term with the b vector's shape
    Copy -phasefrac[cond_multi_index, phase_idx] * G' (clipped) to [cond_multi_index, phase_idx, var_idx:var_idx+phase_dof[name]]
    Copy -G to [cond_multi_index, phase_idx, max_num_vars-max_phases+phase_idx]
    Add np.einsum('...i, ...i', constraint_jac.swapaxes(-2,-1), l_multipliers[cond_multi_index, None, :]) to [cond_multi_index, phase_idx, :max_num_vars]
    Copy l_constraints to [cond_multi_index, phase_idx, max_num_vars:]
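
And the matching right-hand-side assembly for one phase (placeholder arrays throughout; note the J^T·λ term accumulates onto the variable block rather than overwriting it):

```python
import numpy as np

max_phases, max_num_vars, num_constraints = 2, 8, 6
n_pts, phase_dof = 5, 3
cond_multi_index = (np.arange(n_pts),)
phase_idx, first_var_idx = 0, 0

b = np.zeros((n_pts, max_phases, max_num_vars + num_constraints))
G = np.random.rand(n_pts)                     # molar Gibbs energy of this phase
G_grad = np.random.rand(n_pts, phase_dof)     # G' (clipped)
phasefrac = np.random.rand(n_pts)
constraint_jac = np.random.rand(n_pts, max_phases, num_constraints, max_num_vars)
l_constraints = np.random.rand(n_pts, max_phases, num_constraints)
l_multipliers = np.random.rand(n_pts, num_constraints)

idx = cond_multi_index + (phase_idx,)
b[idx + (slice(first_var_idx, first_var_idx + phase_dof),)] = -phasefrac[:, None] * G_grad
b[idx + (max_num_vars - max_phases + phase_idx,)] = -G
# J^T . lambda, accumulated over the whole variable block
b[idx + (slice(None, max_num_vars),)] += np.einsum(
    '...ij,...i->...j', constraint_jac[idx], l_multipliers)
b[idx + (slice(max_num_vars, None),)] = l_constraints[idx]
```
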

  Compute step = np.linalg.solve(l_hessian.sum(axis=-3), gradient_term.sum(axis=-2))
  step should have the shape (*conds, max_num_vars+num_constraints)
  Now, about the backtracking line search
  This is moderately annoying because we need to compute GM for each condition set at several different alphas
  Initialize alpha as float ones of shape (*conds)
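
A sketch of the batched solve and per-condition step sizes; the sufficient-decrease test itself is elided. The explicit trailing column axis keeps np.linalg.solve's broadcasting unambiguous for a stacked right-hand side:

```python
import numpy as np

conds_shape, max_phases, kkt_dim = (11, 21), 2, 14
# Stand-ins for the assembled per-phase KKT blocks and right-hand sides
l_hessian = np.zeros(conds_shape + (max_phases, kkt_dim, kkt_dim)) + np.eye(kkt_dim)
gradient_term = np.random.rand(*(conds_shape + (max_phases, kkt_dim)))

# Sum out the per-phase axis, then solve every condition's system at once
rhs = gradient_term.sum(axis=-2)[..., None]            # (*conds, kkt_dim, 1)
step = np.linalg.solve(l_hessian.sum(axis=-3), rhs)[..., 0]
assert step.shape == conds_shape + (kkt_dim,)

# Backtracking line search: one independent step size per condition; alphas
# would be halved wherever the trial GM fails a sufficient-decrease check
alpha = np.ones(conds_shape)
```
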

@richardotis richardotis added this to the 0.4 milestone Feb 16, 2016

@richardotis


Collaborator

richardotis commented Apr 20, 2016

We have pretty good performance with the current sequential solver, with point calculations typically taking less than 5 seconds. The trick is getting more of the steps to run in parallel. I'm leaning toward a rewrite of the refinement loop in Cython to make the whole thing multi-threaded.

@richardotis


Collaborator

richardotis commented May 2, 2016

Another possibility, which would complicate Python 2 compatibility: https://gist.github.com/mangecoeur/9540178

@richardotis


Collaborator

richardotis commented May 16, 2016

The new ufuncify-based backend is providing very nice solver performance at minimal compilation cost. If we could reliably parallelize the function constructor for each phase, that would be pretty close to ideal. Writing the refinement loop in Cython could provide even better performance, but I need to perform profiling after the compilation step is cleaned up, because it's difficult to get good numbers on the rest of the code when compilation is such a significant bottleneck.
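
For context, this is roughly the sympy machinery a ufuncify-based backend builds on (toy expression, not pycalphad's actual energy functions):

```python
import numpy as np
from sympy import symbols
from sympy.utilities.autowrap import ufuncify

x, y = symbols('x y')
# Compile a broadcasting NumPy ufunc for a toy expression; the real backend
# would do this once per phase's Gibbs energy expression (and its derivatives)
fast_energy = ufuncify((x, y), x**2 + y)
print(fast_energy(np.linspace(0.0, 1.0, 5), 300.0))
```
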

@richardotis


Collaborator

richardotis commented Aug 3, 2016

With multi-core processing enabled by dask (essentially just leaning on multiprocessing, with better pickling), I think this is essentially done.
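
For reference, one way the dask-based parallelism can look. refine_chunk is a hypothetical stand-in for the per-chunk refinement, and the scheduler keyword shown is the modern spelling (in 2016 this was dask.multiprocessing.get):

```python
import dask
from dask import delayed

@delayed
def refine_chunk(conditions):
    # hypothetical stand-in for refining one chunk of equilibrium conditions
    return [c * 2 for c in conditions]

chunks = [[1, 2], [3, 4], [5, 6]]
# The 'processes' scheduler runs tasks via multiprocessing, with cloudpickle
# providing the "better pickling" mentioned above
results = dask.compute(*[refine_chunk(c) for c in chunks], scheduler='processes')
print(results)
```
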

@richardotis richardotis closed this Aug 3, 2016
