# Generate Relaxed Conformers
The previous notebook generated conformers without reducing their energy afterwards. Here, we both generate conformers and relax them. This notebook starts from the `SchNetPack` db file produced in the last step to avoid having to regenerate coordinates.

In [12]:
from jcesr_ml.coordinates import generate_conformers
from jcesr_ml.schnetpack import make_schnetpack_data
from jcesr_ml.benchmark import load_benchmark_data
from multiprocessing import Pool, TimeoutError
from tqdm import tqdm_notebook as tqdm
from ase.io.xyz import write_xyz
from io import StringIO
import pickle as pkl

## Get Training Data
We modify the DataFrame so that code from the previous notebook can be reused. We load both the gold-standard training data (with properties in DataFrame format) and the `SchNetPack` db version of the dataset (with the generated atomic coordinates).

In [2]:
train_data, _ = load_benchmark_data()

Load in the SchNetPack db

In [3]:
with open('train_dataset.pkl', 'rb') as fp:
    old_db = pkl.load(fp)

Get the list of properties to store

In [4]:
properties = old_db.required_properties

## Get Generated Coordinates
Get the generated coordinates in the current training set (`old_db`) and store them in the DataFrame

In [5]:
xyz_gen = []
for i in tqdm(range(len(old_db))):
    fp = StringIO()
    write_xyz(fp, old_db.get_atoms(i))
    xyz_gen.append(fp.getvalue())
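
The cell above captures a file-style writer's output in memory by passing it a `StringIO` buffer. Below is a minimal, standard-library-only sketch of that pattern. `write_xyz_sketch` and `to_xyz_string` are hypothetical stand-ins written for illustration, not the ASE function, and the column formatting is an assumption rather than what ASE emits.

```python
from io import StringIO


def write_xyz_sketch(fileobj, symbols, positions, comment=""):
    """Hypothetical stand-in for a file-based XYZ writer (not ASE's write_xyz)."""
    # Line 1: atom count, line 2: free-form comment, then one atom per line
    fileobj.write(f"{len(symbols)}\n{comment}\n")
    for sym, (x, y, z) in zip(symbols, positions):
        fileobj.write(f"{sym:<2s} {x:15.8f} {y:15.8f} {z:15.8f}\n")


def to_xyz_string(symbols, positions):
    """Run the writer against an in-memory buffer and return the text."""
    buffer = StringIO()
    write_xyz_sketch(buffer, symbols, positions)
    return buffer.getvalue()


water = to_xyz_string(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)],
)
```

Because the writer only needs an object with a `write` method, no temporary files are created, which matters when looping over a whole dataset.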

As a quick sanity check that the structures are in the same order, verify that the atom counts (the first line of each XYZ string) match

In [6]:
for x, y in zip(xyz_gen, train_data['xyz']):
    assert x.split("\n")[0] == y.split("\n")[0]
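
Matching atom counts is a necessary condition, not a sufficient one: two different molecules with the same number of atoms would pass, and `zip` silently stops at the shorter sequence. A slightly stricter check could also compare the element composition and the list lengths. The sketch below assumes the standard XYZ layout (atom count, comment line, then one atom per line starting with the element symbol); `xyz_composition` and `same_order` are hypothetical helper names, not part of `jcesr_ml`.

```python
def xyz_composition(xyz):
    """Return (atom count, sorted element symbols) for one XYZ string."""
    lines = xyz.strip().split("\n")
    n_atoms = int(lines[0])
    # Skip the count and comment lines; the first token of each atom line is the element
    symbols = [line.split()[0] for line in lines[2:2 + n_atoms]]
    # Sort so that a different atom ordering within a molecule still matches
    return n_atoms, tuple(sorted(symbols))


def same_order(xyz_a, xyz_b):
    """True if both lists have equal length and matching compositions entry by entry."""
    if len(xyz_a) != len(xyz_b):
        return False
    return all(xyz_composition(a) == xyz_composition(b) for a, b in zip(xyz_a, xyz_b))
```

In the notebook this would be called as `same_order(xyz_gen, list(train_data['xyz']))`. It still cannot distinguish isomers, so it remains a sanity check rather than proof of alignment.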

Store the structures in the DataFrame

In [7]:
train_data['xyz_gen'] = xyz_gen

## Generate Relaxed Conformers
Follow the same procedure as the last notebook (GA), but relax the coordinates with a force field after generation.

In [8]:
timeout = 60  # Seconds to wait on each result; long enough that nearly everything finishes

In [9]:
%%time
with Pool() as p:
    # Submit one relaxation task per generated structure
    futures = [p.apply_async(generate_conformers, (x,), {'relax': True}) for x in train_data['xyz_gen']]

    # Collect the results in submission order
    conformers = []
    failures = 0
    for future, xyz in tqdm(list(zip(futures, train_data['xyz_gen']))):
        try:
            res = future.get(timeout)
        except TimeoutError:
            # Fall back to the unrelaxed generated structure as the only conformer
            failures += 1
            res = [xyz]
        conformers.append(res)
    print('Number of failures:', failures)
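
The submit-then-collect pattern above can be tried in isolation with a toy task. This sketch swaps in `multiprocessing.pool.ThreadPool` for the process-based `Pool` so that it runs in any environment; it exposes the same `apply_async` and `get(timeout)` interface. `slow_square` and `run_with_fallback` are hypothetical names standing in for `generate_conformers` and the loop in the cell. Note that `get(timeout)` only stops waiting and does not cancel the task, and because results are collected in order, later tasks effectively get more than `timeout` seconds of wall time.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def slow_square(x, delay=0.0):
    """Toy task: sleep for `delay` seconds, then return a one-element list."""
    time.sleep(delay)
    return [x * x]


def run_with_fallback(inputs, delays, timeout):
    """Submit every task, then fall back to [input] for any that time out."""
    results, failures = [], 0
    with ThreadPool(processes=4) as pool:
        futures = [
            pool.apply_async(slow_square, (x,), {"delay": d})
            for x, d in zip(inputs, delays)
        ]
        for future, x in zip(futures, inputs):
            try:
                res = future.get(timeout)
            except TimeoutError:
                # Same fallback as the notebook: keep the original input
                failures += 1
                res = [x]
            results.append(res)
    return results, failures


results, failures = run_with_fallback([2, 3, 4], [0.0, 1.0, 0.0], timeout=0.2)
```

Here the middle task sleeps past the timeout, so its slot holds the untouched input, just as a timed-out molecule keeps its unrelaxed structure in `conformers`.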


Number of failures: 2
CPU times: user 1min 1s, sys: 34.7 s, total: 1min 35s
Wall time: 6h 6min 52s


Add the results to a DataFrame

In [10]:
train_data['conformers'] = conformers

## Make an ASE Database
Make another ASE database, using the relaxed conformers

In [13]:
db = make_schnetpack_data(train_data, 'train_dataset_relaxed_confs.db', properties=properties,
                          xyz_col='xyz_gen', conformers='conformers')
with open('train_dataset_relaxed_confs.pkl', 'wb') as fp:
    pkl.dump(db, fp)