# HTSOHM workflow 0.9.1

This notebook walks a user through generating a seed population of hypothetical porous materials and successively mutating libraries of those materials until the method has converged, that is, until the structure-property space has been sampled sufficiently uniformly.

In [1]:
HTSOHM_dir = '~/HTSOHM-dev' #specifies HTSOHM directory
import sys
sys.path.insert(0, './bin') #adds HTSOHM modules to Python

number_of_atom_types = 4
number_of_materials = 50

bins = 5

## Generate seed population 

Now that the number of atom-types and population size have been specified, the seed can be generated.

In [None]:
from generate import *

generate(number_of_materials,
         number_of_atom_types)

## Screen seed

Now that the seed population has been generated, each material can be screened using RASPA. Because the number of materials is typically large, the work is distributed across a computing cluster. For each material, methane loading (see `${HTSOHM}/bin/ch4simNOHVF` for the RASPA input file), void fraction (see `${HTSOHM}/bin/HVsim`), and surface area (see `${HTSOHM}/bin/SAsim`) are calculated. Raw data are saved in `${HTSOHM}/data/`.

<b>DEV NOTE:</b> ln 17, screen.sh<br>
`export HTSOHM_DIR=${HOME}/HTSOHM-dev`<br>
`export LIB_DIR=${HOME}/HTSOHM-dev/$LIBRARY_DIR`<br>

Also, materials per core reduced from 100 to 20.

In [None]:
%run ./bin/screen.ipy

screen(HTSOHM_dir, 'gen0', 0, number_of_materials)
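
How a library might be divided into per-core batches can be sketched as follows. This is only an illustration, not the logic in `screen.sh`: the helper name `batch_ranges` and its arguments are hypothetical, and the default batch size of 20 is taken from the note above.

```python
def batch_ranges(first_id, n_materials, mats_per_core=20):
    """Split material indices into contiguous (start, stop) batches.

    Hypothetical helper: screen.sh may divide the work differently.
    """
    last_id = first_id + n_materials
    return [(start, min(start + mats_per_core, last_id))
            for start in range(first_id, last_id, mats_per_core)]


# 50 seed materials at 20 per core need three batches; the last is smaller.
print(batch_ranges(0, 50))
```

The `first_id` argument mirrors the third argument passed to `screen` above, so a later generation would simply start at a higher index.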

## Collect output

Data for the entire library is now divided among different directories, one for each node used to carry out the calculations. All of this data is then collected into three files, one for each of the properties calculated: `ch4_abs_cc_cc.txt`, `HVdata2col.txt`, and `SAdata_m2_cc.txt`. These data files can be found in `${HTSOHM}/data/` and in the directory containing all materials from that library.

<b>DEV NOTE:</b> rmdata is commented out...<br>
`find_missing` only checks names, not values

In [None]:
%run ./bin/cat_data.ipy
from find_missing import *

prep_gen0(HTSOHM_dir, number_of_materials)
find_missing('gen0')
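
Because `find_missing` compares names only, a stricter check could also confirm that every value parses as a finite number. The sketch below assumes each data file holds two whitespace-separated columns (material name, then value); that layout is a guess, and `find_bad_rows` is a hypothetical helper rather than part of HTSOHM.

```python
import math


def find_bad_rows(lines, expected_names):
    """Return (missing, invalid) material names for one data file.

    Assumes each line is '<name> <value>'; this layout is a guess.
    """
    seen = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or truncated line: treated as missing
        name, raw = parts[0], parts[1]
        try:
            seen[name] = float(raw)
        except ValueError:
            seen[name] = math.nan  # present but unparseable
    missing = [n for n in expected_names if n not in seen]
    invalid = [n for n in expected_names
               if n in seen and not math.isfinite(seen[n])]
    return missing, invalid


# Made-up rows: MAT-2 is absent and MAT-1 has a non-finite value.
expected = ['MAT-%d' % i for i in range(4)]
rows = ['MAT-0 0.41', 'MAT-1 nan', 'MAT-3 0.37']
print(find_bad_rows(rows, expected))
```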

## Bin library, select parents

Now that structure-properties have been calculated for the seed population, the library can be binned, that is, sorted in three dimensions (methane loading, surface area, and void fraction), so that the materials with the rarest combinations of structure-properties can be selected to <i>parent</i> new materials.

In [None]:
from binning import *

p_dir = 'gen0'
#bins = 5
n_child = number_of_materials

bin_count_gen0, bin_IDs_gen0 = bin3d('gen0', bins)

p_list_gen0 = pick_parents('gen0',
                           bin_count_gen0,
                           bin_IDs_gen0,
                           n_child)
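
One way to picture the binning and parent selection is shown below. It is a sketch under stated assumptions, not the code in the `binning` module: bins are taken to be equal-width between each property's minimum and maximum, and parents are drawn with probability inversely proportional to the population of their bin. The names `bin3d_sketch` and `pick_parents_sketch` are hypothetical, and the property values are random placeholders.

```python
import numpy as np


def bin3d_sketch(props, bins):
    """Assign each material to a bin in 3D property space.

    props: (n_materials, 3) array of loading, surface area, void fraction.
    Returns per-material bin indices and a (bins, bins, bins) count array.
    """
    props = np.asarray(props, dtype=float)
    lo, hi = props.min(axis=0), props.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / bins, 1.0)  # avoid divide-by-zero
    idx = np.floor((props - lo) / width).astype(int)
    idx = np.clip(idx, 0, bins - 1)  # the maximum falls in the last bin
    counts = np.zeros((bins,) * 3, dtype=int)
    np.add.at(counts, tuple(idx.T), 1)
    return idx, counts


def pick_parents_sketch(idx, counts, n_child, rng):
    """Draw parents, favouring materials that sit in sparse bins."""
    weights = 1.0 / counts[tuple(idx.T)]
    weights /= weights.sum()
    return rng.choice(len(idx), size=n_child, p=weights)


rng = np.random.default_rng(0)
props = rng.random((50, 3))  # placeholder data, not real screening output
idx, counts = bin3d_sketch(props, bins=5)
parents = pick_parents_sketch(idx, counts, n_child=50, rng=rng)
print(counts.sum(), parents.shape)
```

Sampling with replacement means a material in a sparsely populated bin can parent several children, which is the behaviour the text describes for rare combinations.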

## <i>Dummy</i> test

Because properties are calculated using Monte Carlo (MC) methods, a particular data point (combination of properties) may appear to be unique when it is in fact a statistical anomaly. To guard against this, each parent is re-screened 5 times so that the results can be compared with the original calculation.

In [None]:
%run ./bin/dummy_screen.ipy

dummy_screen(HTSOHM_dir, 'gen0')

<b>DEV NOTE:</b> add find missing step???

In [None]:
%run ./bin/dummy_test.ipy

dummy_test(HTSOHM_dir, 'gen0')
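
The comparison behind a dummy test can be sketched as follows. The acceptance rule used here, that the mean of the re-screened values must lie within a relative tolerance of the original, is an assumption for illustration; `dummy_test.ipy` may apply a different criterion. `passes_dummy_test` and `tolerance` are hypothetical names, and the numbers are made up.

```python
from statistics import mean


def passes_dummy_test(original, reruns, tolerance=0.05):
    """Check that re-screened values reproduce the original result.

    Assumed rule: the mean of the reruns must lie within a relative
    tolerance of the original value.
    """
    if original == 0:
        # Relative error is undefined, so fall back to an absolute check.
        return all(abs(r) <= tolerance for r in reruns)
    return abs(mean(reruns) - original) / abs(original) <= tolerance


# Five re-screenings of one parent's methane loading (made-up numbers).
print(passes_dummy_test(100.0, [98.0, 101.0, 99.5, 102.0, 100.5]))
print(passes_dummy_test(100.0, [60.0, 62.0, 58.0, 61.0, 59.0]))
```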

# First generation 

## Mutate `gen0`, create `gen1`

In [None]:
mutation_strength = 0.2

from mutate import *

firstS('gen0', mutation_strength)
mutate('gen0', number_of_atom_types, 'gen1')
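
How a mutation strength might act on a single parameter can be sketched as below. The update rule is an assumption and not necessarily what `mutate` does: the child value moves a fraction `strength` of the way from the parent value toward a random point in the allowed range. `mutate_value` and the bounds are hypothetical.

```python
import random


def mutate_value(parent, lower, upper, strength, rng=random):
    """Move a parameter part of the way toward a random in-range value.

    Assumed rule: child = parent + strength * (target - parent), where
    target is drawn uniformly from [lower, upper].
    """
    target = rng.uniform(lower, upper)
    return parent + strength * (target - parent)


rng = random.Random(1)  # seeded so the sketch is repeatable
child = mutate_value(parent=0.5, lower=0.0, upper=1.0, strength=0.2, rng=rng)
print(child)
```

Under this rule a strength of 0 returns the parent unchanged and a strength of 1 replaces it with a fresh random value, so 0.2 keeps children close to their parents.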

## Screen `gen1`

In [None]:
%run ./bin/screen.ipy

screen(HTSOHM_dir,
       'gen1',
       1 * number_of_materials,
       number_of_materials)

## Collect `gen1` output

In [None]:
%run ./bin/cat_data.ipy
from find_missing import *

prep4mut(HTSOHM_dir,
         'gen1',
         1 * number_of_materials,
         number_of_materials,
         'gen0',
         'tgen1')

find_missing('tgen1')

## Bin `tgen1`, select parents 

In [2]:
from binning import *

p_dir = 'tgen1'
n_child = number_of_materials

bin_count, bin_IDs = bin3d(p_dir, bins)

p_list_tgen1 = pick_parents(p_dir,
                           bin_count,
                           bin_IDs,
                           n_child)

## `tgen1` dummy test

In [3]:
%run ./bin/dummy_screen.ipy

dummy_screen(HTSOHM_dir, 'tgen1')

/ihome/cwilmer/ark111/HTSOHM-dev/bin
Screening dummies...
See terminal for status.

Jobs submitted!
/ihome/cwilmer/ark111/HTSOHM-dev


In [None]:
%run ./bin/dummy_test.ipy

dummy_test(HTSOHM_dir, 'tgen1')

# Second generation

## Calculate mutation strength(s)

In [48]:
import numpy as np
import os
from binning import *

p_dir = 'tgen1'
gp_dir = 'gen' + str(int(p_dir[-1]) - 1)
#bins <---input variable

#def calc_S(p_dir):

s_0 = np.genfromtxt('gen0/s_list.txt', usecols=0, dtype=float)
s0 = s_0[0]

p_list = np.genfromtxt(p_dir + '/p_list.txt', usecols=0, dtype=str)
p_As = np.genfromtxt(p_dir + '/p_list.txt', usecols=1,
                       dtype=str)
p_Bs = np.genfromtxt(p_dir + '/p_list.txt', usecols=2,
                       dtype=str)
p_Cs = np.genfromtxt(p_dir + '/p_list.txt', usecols=3,
                       dtype=str)

p_bins = []
for i in range(len(p_As)):
    pos = [int(p_As[i][1:-1]),
           int(p_Bs[i][:-1]),
           int(p_Cs[i][:-1])]
    p_bins.append(pos)

gp_list = np.genfromtxt(gp_dir + '/p_list.txt', usecols=0, dtype=str)
gp_As = np.genfromtxt(gp_dir + '/p_list.txt', usecols=1,
                       dtype=str)
gp_Bs = np.genfromtxt(gp_dir + '/p_list.txt', usecols=2,
                       dtype=str)
gp_Cs = np.genfromtxt(gp_dir + '/p_list.txt', usecols=3,
                       dtype=str)

gp_bins = []
for i in range(len(gp_As)):
    pos = [int(gp_As[i][1:-1]),
           int(gp_Bs[i][:-1]),
           int(gp_Cs[i][:-1])]
    gp_bins.append(pos)

gp_s = np.genfromtxt(gp_dir + '/s_list.txt')

s_file = open(os.path.abspath(p_dir) + '/s_list.txt', 'w')

bin_list = []
for i in p_bins:
    if i in gp_bins:
        if i not in bin_list:
            bin_list.append(i)
            
skip = len(gp_dir) + 1
names = [x[0][skip:] for x in os.walk(gp_dir)][1:]
n_gp = len(names)

p_children = []

for i in range(len(gp_list)):
    if gp_bins[i] in bin_list:
        p_children.append('MAT-' + str(i))

#p_ID_array = bin3d('tgen1', bins)




In [49]:
print(p_children)

['MAT-2', 'MAT-4', 'MAT-5', 'MAT-6', 'MAT-7', 'MAT-8', 'MAT-9', 'MAT-12', 'MAT-15', 'MAT-16', 'MAT-17', 'MAT-19', 'MAT-20', 'MAT-24', 'MAT-27', 'MAT-28', 'MAT-33', 'MAT-34', 'MAT-36', 'MAT-37', 'MAT-38', 'MAT-39', 'MAT-40', 'MAT-41', 'MAT-42', 'MAT-43', 'MAT-45', 'MAT-46', 'MAT-48']
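
The string slicing in the mutation-strength cell above implies that each row of `p_list.txt` holds a material name followed by a bracketed bin position, for example `MAT-3 [1, 2, 3]`. Assuming that layout (it is inferred from the code, not confirmed), a row can be parsed in one step; `parse_p_list_line` is a hypothetical helper.

```python
import re

# A name, then three integers inside square brackets.
_ROW = re.compile(r'^(\S+)\s+\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*$')


def parse_p_list_line(line):
    """Return (name, [a, b, c]) for one row of an assumed p_list.txt layout."""
    match = _ROW.match(line.strip())
    if match is None:
        raise ValueError('unrecognised p_list row: %r' % line)
    name = match.group(1)
    return name, [int(g) for g in match.group(2, 3, 4)]


print(parse_p_list_line('MAT-3 [1, 2, 3]'))
```

Raising on an unrecognised row makes a malformed file fail immediately, whereas the slicing approach would only fail if the leftover characters did not convert to an integer.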


## Mutate `tgen1`, create `gen2`

In [None]:
mutation_strength = 0.2

from mutate import *

firstS('gen0', mutation_strength)
mutate('tgen1', number_of_atom_types, 'gen2')