# equilibrium: Vectorizing the refinement step #37

Closed
opened this issue Feb 16, 2016 · 4 comments
### richardotis commented Feb 16, 2016

We're leaving probably a 10-100x speedup on the table by not vectorizing the refinement (aka "Solve (3/3)") step in the equilibrium calculation. It's not reasonable to expect users to wait an hour or more to perform mapping calculations or compute a phase diagram. The challenge is that the current serial algorithm is much, much easier to reason about, and it will take some careful thinking to get all the indexing operations correct. Here is a raw dump of notes about how it could work.

```
First I need to initialize the Hessian matrix and the b vector.
Let's try some very conservative upper bounds:
    max_phases = len([c for c in comps if c != 'VA'])
    max_num_vars = max_internal_dof*max_phases + max_phases
    max_constraints = num_mass_bals + max_sublattices*max_phases

CONSIDER THAT we need to include an extra 'max_phases' dimension because fancy
indexing is done in parallel. A phase participating in the same equilibrium
multiple times will index the same condition simultaneously -- this will break.
The solution is to use the phase_idx as an extra dimension, then sum over that
dimension before copying it over to l_hessian.

Hessian size will be (*conds, max_phases, max_num_vars+num_constraints, max_num_vars+num_constraints)
b vector size will be (*conds, max_phases, max_num_vars+num_constraints)

Initialize l_multipliers as zeros of shape (*conds, num_constraints)
Set l_multipliers[*conds, :max_phases] = chemical potentials
Initialize Hessian as stacked identity matrix
Initialize b vector as stacked zero vector

Now we need to fill out the Hessian and b at all conditions, for the current iteration.
We know the phases participating at all conditions.
We know the best guess site fractions for all participating phases at all conditions.
For each phase, make a flattened list of (P, T, *sitefracs) to compute -- filter out converged conditions.
Also track the conditions multi-index for each phase computation.
Also track the var_idx we can write sitefrac-related dof to, and also the
phase_idx for phasefrac-related dof.
So it should probably be three flattened lists of equal length.
Pretty easy to compute G, G' and G'' for the flattened list -- where do we copy the results to?
    first_var_idx = index of first site fraction variable for a given phase
    last_var_idx = index of last site fraction variable for a given phase

Hessian matrix (don't forget to clip P and T derivatives):
  Sitefrac/sitefrac derivatives:
    Copy phasefrac[cond_multi_index, phase_idx] * G'' to
        [cond_multi_index, np.index_exp[var_idx:last_var_idx, var_idx:last_var_idx]]
  Phasefrac/sitefrac derivatives:
    Copy G' (clipped) to [cond_multi_index, last_var_idx+phase_idx, var_idx:last_var_idx]
        and [cond_multi_index, var_idx:last_var_idx, last_var_idx+phase_idx]
  Phasefrac/phasefrac derivative:
    [cond_multi_index, last_var_idx+phase_idx, last_var_idx+phase_idx] = 0
        (need to do this since we initialize as identity matrix)

Constraint Jacobian:
  Need to track constraint_offset for each flattened condition multi_index
  (each participating phase increases it).
  Initialize constraint_jac as zeros of shape (*conds, max_phases, max_constraints, max_num_vars)
  Initialize l_constraints as zeros of shape (*conds, max_phases, max_constraints)
  Reset var_idx = first_var_idx
  Site fraction balances:
    for idx in range(len(dbf.phases[name].sublattices)):
        active_in_subl = set(dbf.phases[name].constituents[idx]).intersection(comps)
        constraint_jac[cond_multi_index, phase_idx, constraint_offset,
                       var_idx:last_var_idx+len(active_in_subl)] = 1
        l_constraints[cond_multi_index, phase_idx, constraint_offset] = \
            -(np.sum(site_fracs[..., var_idx:var_idx + len(active_in_subl)], axis=-1) - 1)
        constraint_offset += 1
  Mass balances:
    Basically the same as the existing code, with some indexing differences
    (remember the max_phases dimension).
  Copy -constraint_jac.swapaxes(-2,-1).sum(axis=-3) to
      l_hessian[cond_multi_index, phase_idx, :max_num_vars, max_num_vars:]
  Copy constraint_jac.sum(axis=-3) to
      l_hessian[cond_multi_index, phase_idx, max_num_vars:, :max_num_vars]

Gradient term:
  Reset var_idx = first_var_idx
  Initialize as b vector shape
  Copy -phasefrac[cond_multi_index, phase_idx] * G' (clipped) to
      [cond_multi_index, phase_idx, var_idx:var_idx+phase_dof[name]]
  Copy -G to [cond_multi_index, phase_idx, max_num_vars-max_phases+phase_idx]
  Copy np.einsum('...i, ...i', constraint_jac.swapaxes(-2,-1), l_multipliers[cond_multi_index, None, :])
      to [cond_multi_index, phase_idx, :max_num_vars]
  Copy l_constraints to [cond_multi_index, phase_idx, max_num_vars:]

Compute step = np.linalg.solve(l_hessian.sum(axis=-3), gradient_term.sum(axis=-2))
step should have the shape (*conds, max_num_vars+num_constraints)

Now, about the backtracking line search:
This is moderately annoying because we're going to need to compute GM for each
condition set for different alpha's.
Initialize alpha as float ones of shape (*conds)
```
added this to the 0.4 milestone Feb 16, 2016

### richardotis commented Apr 20, 2016

We have pretty good performance with the current sequential solver, with point calculations typically taking less than 5 seconds. The trick is getting more of the steps to be performed in parallel. I'm leaning toward a rewrite of the refinement loop in Cython to make the whole thing multi-threaded.

### richardotis commented May 2, 2016

 Another possibility, which would complicate Python 2 compatibility: https://gist.github.com/mangecoeur/9540178

### richardotis commented May 16, 2016

 The new ufuncify-based backend is providing very nice solver performance at minimal compilation cost. If we could reliably parallelize the function constructor for each phase, that would be pretty close to ideal. Writing the refinement loop in Cython could provide even better performance, but I need to perform profiling after the compilation step is cleaned up, because it's difficult to get good numbers on the rest of the code when compilation is such a significant bottleneck.
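`ufuncify` (from `sympy.utilities.autowrap`) compiles a symbolic expression into a native NumPy ufunc, which is presumably what the backend above builds per phase. Since `ufuncify` requires a compiler toolchain, the sketch below uses `lambdify` as a portable stand-in for the same idea: turning a symbolic energy expression into a fast callable that broadcasts over condition arrays. The two-variable expression is a hypothetical miniature of a real Gibbs energy.

```python
import numpy as np
import sympy as sp

# Hypothetical ideal-mixing term standing in for a real per-phase Gibbs energy.
T, y = sp.symbols('T y')
expr = 8.314 * T * (y * sp.log(y) + (1 - y) * sp.log(1 - y))

# lambdify plays the role of ufuncify here: both produce a callable that
# evaluates the expression elementwise over numpy arrays (ufuncify additionally
# compiles it to native code, which is where the solver speedup comes from).
g = sp.lambdify((T, y), expr, modules='numpy')

temps = np.array([300.0, 600.0])
fracs = np.array([[0.1], [0.5]])   # broadcasts against temps
print(g(temps, fracs).shape)       # (2, 2): one energy per (composition, T) pair
```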

### richardotis commented Aug 3, 2016

 With multi-core processing enabled by dask (essentially just leaning on multiprocessing, with better pickling), I think this is essentially done.