## Extracting and Storing States From Monte Carlo Runs
In this notebook, we'll extract the states (ASE supercells) stored during our Monte Carlo runs (see "job_script.sb" for the example Slurm job).

We'll save the states as numpy arrays, which can then be used as initial states for Kinetic Monte Carlo calculations.

Typically, as shown in our example job script, we do multiple Monte Carlo runs (also called Monte Carlo trajectories), each starting from an independent random state. Since the ultimate aim is to train machine learning models, we divide states from these trajectories equally into training and validation states.
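
As a minimal sketch of the equal split described above, assuming trajectories are labeled 1 to 16 as in the example run used later in this notebook. The helper name `split_trajectories` is hypothetical and is not part of the notebook's code.

```python
def split_trajectories(n_traj):
    """Split trajectory IDs 1..n_traj into equal training and validation halves."""
    if n_traj % 2 != 0:
        raise ValueError("need an even number of trajectories for an equal split")
    ids = list(range(1, n_traj + 1))
    half = n_traj // 2
    # First half for training, second half for validation
    return ids[:half], ids[half:]


train_ids, val_ids = split_trajectories(16)
print("training trajectories:  ", train_ids)
print("validation trajectories:", val_ids)
```

Splitting by whole trajectories, rather than by individual samples, keeps states from the same Markov chain out of both sets at once.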

In [1]:
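import pickle

import h5py  # needed below to read the crystal data file
import numpy as np
from ase.build import make_supercell
# `crystal` here is onsager's module; importing ase.spacegroup.crystal under
# the same name would be shadowed by this import, so it is left out.
from onsager import crystal, supercell
from tqdm import tqdm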

In [2]:
elems = ["Co", "Ni", "Cr", "Fe", "Mn"]
elemsToIndices = {"Co":0, "Ni":1, "Cr":2, "Fe":3, "Mn":4}
elemsToNum = {}
for elemInd, el in enumerate(elems):
    elemsToNum[el] = elemInd + 1

In [3]:
# load crystal data
with h5py.File("../../CrysDat_FCC/CrystData.h5", "r") as fl:
    lattice = np.array(fl["Lattice_basis_vectors"])
    superlatt = np.array(fl["SuperLatt"])
    SiteIndtoR = np.array(fl["SiteIndToR"])
    RtoSiteInd = np.array(fl["RToSiteInd"])

crys = crystal.Crystal.FCC(a0 = 1.0, chemistry=["A"])
assert np.allclose(lattice, crys.lattice)

superFCC_onsg = supercell.ClusterSupercell(crys, superlatt)
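
The extraction function below maps each atom's Cartesian position back to integer lattice coordinates before asking the onsager supercell for its site index. The following is a minimal numpy-only sketch of that conversion, assuming the usual primitive FCC lattice vectors and the lattice parameter 3.59 used as the default later on. The helper `cart_to_lattice` is hypothetical and only illustrates the idea.

```python
import numpy as np

a0 = 3.59  # lattice parameter (assumed, matches the default used in get_states)

# Primitive FCC lattice vectors as columns. The matrix is symmetric,
# so the row/column convention makes no difference for this lattice.
prim = a0 * np.array([[0.0, 0.5, 0.5],
                      [0.5, 0.0, 0.5],
                      [0.5, 0.5, 0.0]])


def cart_to_lattice(pos, latt):
    """Return integer lattice coordinates R such that latt @ R == pos."""
    R = np.linalg.solve(latt, pos)
    R_int = np.rint(R).astype(int)
    # Fail loudly if the position does not sit on a lattice site
    assert np.allclose(R, R_int), "position is not on a lattice site"
    return R_int


R_true = np.array([2, -1, 3])
pos = prim @ R_true              # Cartesian position of that lattice site
R = cart_to_lattice(pos, prim)
print(R)
```

Rounding before the integer cast matters here: floating-point error in the solve can leave a value slightly below an integer, which a bare cast would truncate to the wrong site.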

## Now we go ahead and extract the states

In [5]:
def get_states(Temp, lowerTrajId, upperTrajId, startSamp, EndSamp, MC_interval, Nsites, a0=3.59):
    """
    Function to extract ASE supercells, check them and store into numpy arrays.
    :param: Temp - the temperature.
    
    For the following inputs, please also refer to example job script "job_script.sb" as well as the next
    cell in this notebook.
    
    :param: lowerTrajId - the index of the first trajectory to gather states from.
    :param: upperTrajId - the index of the last trajectory to gather states from.
    
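    :param: startSamp - the Metropolis step of the first stored state after thermalization
                        (10200 in the example run, since 10000 equilibration steps were done and
                        samples were gathered every 200 steps thereafter).
    :param: EndSamp - the last Metropolis step (60000 in the example run).
    :param: MC_interval - the interval (in Metropolis steps) at which states were stored
                          (200 for the example run).
    :param: Nsites - the number of sites, including the vacancy site (512 in the example run).
    :param: a0 - the lattice parameter of the stored ASE supercells (default 3.59).

    :return: states - integer array of shape (number of states, Nsites) holding the species
                      label of every site, with 0 at the vacancy site.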
    """
    
    N_traj = upperTrajId - lowerTrajId + 1
    N_samps_per_traj = (EndSamp - startSamp) // MC_interval + 1
    total_states = N_traj * N_samps_per_traj  # only one Job directory (Job = 1) is read per trajectory
    print("Gathering total {} states from trajectories {} to {} "
          "for {} K.".format(total_states, lowerTrajId, upperTrajId, Temp))
    
    # initialize the state array
    # Remember the supercells have had the (0., 0., 0.) site deleted for the vacancy
    states = np.zeros((total_states, Nsites), dtype=np.int8)
    
    total = 0
    
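    # Expected number of atoms of each element, in the order of `elems`
    # (511 atoms in total: Nsites minus the vacancy)
    counts = np.array([103, 102, 102, 102, 102])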
    Job = 1
    for traj in tqdm(range(lowerTrajId, upperTrajId + 1), ncols=65, position=0, leave=True):

        dr = "{0}_{1}/{0}_{1}_{2}/chkpt/".format(Temp,Job,traj)

        for samp in range(startSamp, EndSamp + 1, MC_interval):

            fileName = dr+"supercell_{}.pkl".format(samp)

            with open(fileName, "rb") as fl:
                superFCC = pickle.load(fl)


            # check the supercell composition
            elemCounts = np.zeros(len(elems), dtype=int)
            for at_Ind in range(len(superFCC)):
                elem = superFCC[at_Ind].symbol
                idx = elemsToIndices[elem]
                elemCounts[idx] += 1

            # Check that the atom counts are correct
            assert np.array_equal(elemCounts, counts)

            # Check that the supercells are always consistent with onsager and store occupancies
            a = superFCC.cell[:]/8
            assert np.allclose(superFCC.cell[:], superFCC_onsg.lattice * a0)
            assert np.allclose(superFCC.cell[:]/8, superFCC_onsg.crys.lattice * a0)
            assert len(superFCC) == Nsites - 1
            occs = np.zeros(len(superFCC), dtype=np.int8)
            for site in range(len(superFCC)):
                assert not np.allclose(superFCC[site].position, 0.)
                Rs  = np.dot(np.linalg.inv(a), superFCC[site].position)
                Rsite = Rs.round(0).astype(int)
                siteInd, _ = superFCC_onsg.index(Rsite, (0, 0))
                assert siteInd == superFCC[site].index + 1, "{} {} {} {}".format(Rs, Rsite, siteInd,
                                                                              superFCC[site].index)
                assert siteInd > 0
                occs[site] = elemsToNum[superFCC[site].symbol]

            states[total, 1:] = occs[:]
            total += 1
                
    print("Gathered total {0} states from trajectories {1} to {2}.".format(total, lowerTrajId, upperTrajId))
    
    return states
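
As a quick check of the sample arithmetic in `get_states`, the sketch below counts how many checkpoint files the inner loop reads per trajectory for the example run values (10200, 60000, 200), and how many it reads over 8 trajectories. The lowercase variable names are illustrative only.

```python
start_samp, end_samp, mc_interval = 10200, 60000, 200
n_traj = 8

# Same formula as N_samps_per_traj in get_states
n_samps_per_traj = (end_samp - start_samp) // mc_interval + 1

# The inner loop visits exactly these Metropolis steps
steps = list(range(start_samp, end_samp + 1, mc_interval))
assert len(steps) == n_samps_per_traj

print(n_samps_per_traj)           # 250 files per trajectory
print(n_traj * n_samps_per_traj)  # 2000 files over 8 trajectories
```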

In [None]:
T = 1073
# Gather states from trajectories 1 to 8 for the training set
# We have simulated 16 trajectories. The first 8 will be training states for our machine learning
# later on, and the last 8 will be for validation.
statesTrain = get_states(T, 1, 8, 10200, 60000, 200, SiteIndtoR.shape[0])
statesVal = get_states(T, 9, 16, 10200, 60000, 200, SiteIndtoR.shape[0])

# Stack the training and validation states into a single array
statesAll = np.concatenate((statesTrain, statesVal), axis=0)

# statesAll will be used as initial states for Kinetic Monte Carlo simulations.
np.save("statesAll_{}.npy".format(T), statesAll)
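
The sketch below shows the array layout on synthetic data: each row is one state, column 0 is the vacancy site (label 0), and the remaining columns hold species labels 1 to 5. It stacks two such arrays and checks that they survive a round trip through `np.save` and `np.load`. The random states, the helper `fake_states`, and the file name `states_demo.npy` are assumptions made for illustration and are not Monte Carlo output.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_sites = 512  # as in the example run


def fake_states(n_states):
    """Build random stand-in states: 0 at the vacancy site, 1..5 elsewhere."""
    states = np.zeros((n_states, n_sites), dtype=np.int8)
    states[:, 1:] = rng.integers(1, 6, size=(n_states, n_sites - 1), dtype=np.int8)
    return states


states_train = fake_states(4)
states_val = fake_states(4)

# Training states first, validation states after
states_all = np.concatenate((states_train, states_val), axis=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "states_demo.npy")
    np.save(path, states_all)
    loaded = np.load(path)

print(loaded.shape, loaded.dtype)
```

Because the training rows come first, a downstream script can recover the two sets by slicing at the number of training states.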