Optimization, metadynamics, and ab initio molecular dynamics (AIMD) calculations were performed on a number of
molecules from the ChemSpider dataset. To use this data, the calculation outputs were converted into the MSet format.

In [1]:
import scripts.make_dataset as MD

The output files from a calculation can be passed directly to the MSet converter script. If the optimization
failed to converge, or the calculation failed in some other way, None is returned.

In [2]:
failed_opt_filename = "/mnt/sdb1/adriscoll/chemspider_data/outputs/expanded_opt/1/112682.out"
opt_filename = "/mnt/sdb1/adriscoll/chemspider_data/outputs/expanded_opt/8/827.out"

The script prints each file as it reads it, and reports when it was unable to build an MSet.

In [3]:
no_mset = MD.read_opt_data(failed_opt_filename)
opt_mset = MD.read_opt_data(opt_filename)

Reading  /mnt/sdb1/adriscoll/chemspider_data/outputs/expanded_opt/1/112682.out
Hit EOF on  /mnt/sdb1/adriscoll/chemspider_data/outputs/expanded_opt/1/112682.out
No MSet built for  /mnt/sdb1/adriscoll/chemspider_data/outputs/expanded_opt/1/112682.out
Reading  /mnt/sdb1/adriscoll/chemspider_data/outputs/expanded_opt/8/827.out


As shown below, None is returned for a failed calculation, while a MoleculeSet object is returned for
a successful calculation.

In [4]:
print(no_mset)
print(opt_mset)

None
<tensorchem.dataset.molecule.MoleculeSet object at 0x7f1c8eaa3bb0>
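
Because a failed calculation yields None rather than raising an exception, a batch conversion needs an explicit check on each result. The following is a minimal sketch of that pattern and is not part of the converter script: `convert_all` and `fake_reader` are hypothetical names, and `fake_reader` is a stand-in so that the example runs without the calculation files. In practice, `MD.read_opt_data` would be passed as `reader`.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def convert_all(
    paths: Iterable[str],
    reader: Callable[[str], Optional[object]],
) -> Tuple[Dict[str, object], List[str]]:
    """Run `reader` on every path and separate successes from failures."""
    converted: Dict[str, object] = {}
    failed: List[str] = []
    for path in paths:
        result = reader(path)
        if result is None:
            # The reader signals a failed calculation by returning None.
            failed.append(path)
        else:
            converted[path] = result
    return converted, failed


def fake_reader(path: str) -> Optional[object]:
    """Hypothetical stand-in: pretends that files named 'bad*' failed."""
    name = path.rsplit("/", 1)[-1]
    return None if name.startswith("bad") else {"source": path}


converted, failed = convert_all(["a/ok.out", "b/bad.out"], fake_reader)
print(sorted(converted))
print(failed)
```

Keeping the failed paths in their own list makes it easy to rerun or inspect those calculations later.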


To build the MSet, the script extracts the calculated values from the output file and passes them to the
build_new_geom function. This function requires atomic numbers, coordinates, energies, forces, dipoles, quadrupoles,
charges, and the method and basis with which the calculation was performed.
To demonstrate how this function works, I will rebuild the geometry of the MSet I just loaded.

Obtaining these values requires iterating over all of the geometries in the MSet. I will first demonstrate using
only the first geometry.

In [5]:
geom = opt_mset.geometries[0]

The atomic numbers and coordinates are easy to get, because they are exposed as properties of the geometry.

In [6]:
print(geom.at_nums)
print(geom.coords)

(6, 8, 6, 8, 6, 6, 6, 6, 6, 6, 17, 8, 17, 8, 1, 1, 1, 1, 1, 1, 1, 1)
((-2.8374752876, 2.0505241302, -0.8318563326), (-1.4551501021, 1.7115817562, -0.7035574341), (-1.2297722286, 0.4532286916, -0.2557724741), (-2.086139958, -0.3690486078, 0.034292272), (0.2312808329, 0.1814652696, -0.1749558753), (1.1865502757, 1.1961939529, -0.3283593834), (2.5485460941, 0.8956955547, -0.239945276), (2.9657118214, -0.415350595, 9.11032e-05), (2.0120200346, -1.4252548913, 0.1540165558), (0.6506380728, -1.1332727372, 0.0703617122), (2.5080032526, -3.0522101084, 0.4450849904), (4.3045936207, -0.6991476528, 0.0824421485), (3.685567935, 2.1799571079, -0.4330112439), (-2.3870299233, -2.4440483151, 1.7697932568), (-2.9012899521, 3.0738057051, -1.2121267375), (-3.3324978919, 2.0102778871, 0.1432542713), (-3.3310047128, 1.3838129152, -1.5458967085), (0.8808350806, 2.2238165155, -0.5119442284), (-0.0799073494, -1.9307489294, 0.1962563472), (4.3992877628, -1.6532307729, 0.2632733391), (-2.3741607877, -1.696123950

The energy, dipole, and quadrupole values are stored in the labels dictionary of the geometry. To obtain them, I
look up the key under which each is stored; the energy is stored under 'potential'. The method and basis can be
accessed in the same way.

In [7]:
print(geom.labels)

{'potential': [(-1530.9113020, wB97X-D, 6-311g**)], 'dipole': [([-1.9385000, 2.0072000, -1.6489000], wB97X-D, 6-311g**)], 'quadrupole': [([-86.0920000, -13.5810000, -93.1845000, 3.8711000, 1.9911000, -98.3867000], wB97X-D, 6-311g**)]}


Getting the charges and forces takes more work. They are stored on each atom of the geometry, in the atom's labels
property. To obtain them, I again look up the key under which each is stored. The method and basis are also
stored here.

In [8]:
print(geom.atoms[0].labels)

{'charge': [(-0.0000, mulliken, wB97X-D, 6-311g**)], 'forces': [([-0.0128345, 0.0111656, -0.0029599], wB97X-D, 6-311g**)]}
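
The two printouts above suggest a two-level layout: per-geometry labels on the geometry, and per-atom labels on each atom. The sketch below mimics that layout with plain dictionaries and tuples to show how per-atom values can be gathered into one list per geometry. The dictionaries and the `first_value` helper are hypothetical stand-ins, not tensorchem objects; the real label objects are read through `export_json()`, as in the next cell. Only the first atom's forces are taken from the output above, and the second atom's values are placeholders.

```python
# Hypothetical stand-in for one geometry: plain dicts instead of tensorchem objects.
example_geometry = {
    "labels": {"potential": [(-1530.9113020, "wB97X-D", "6-311g**")]},
    "atoms": [
        {"labels": {"forces": [([-0.0128345, 0.0111656, -0.0029599], "wB97X-D", "6-311g**")]}},
        # Placeholder values, not calculation results.
        {"labels": {"forces": [([0.0, 0.0, 0.0], "wB97X-D", "6-311g**")]}},
    ],
}


def first_value(labels, key):
    """Return the value part of the first label stored under `key`."""
    return labels[key][0][0]


# Geometry-level value: one number per geometry.
example_energy = first_value(example_geometry["labels"], "potential")

# Atom-level values: one entry per atom, collected into a list for the geometry.
example_forces = [
    first_value(atom["labels"], "forces") for atom in example_geometry["atoms"]
]

print(example_energy)
print(len(example_forces), len(example_forces[0]))
```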


Below, I collect all of the values needed to build a geometry, across every geometry in the MSet, and store them as variables.

In [9]:
atomic_nums = [geom.at_nums for geom in opt_mset.geometries]
coords = [geom.coords for geom in opt_mset.geometries]
energy = [geom.labels['potential'][0].export_json()[-1] for geom in opt_mset.geometries]
forces = [[atom.labels['forces'][0].export_json()[-1] for atom in geom.atoms] for geom in opt_mset.geometries]
dipole = [geom.labels['dipole'][0].export_json()[-1] for geom in opt_mset.geometries]
quadrupole = [geom.labels['quadrupole'][0].export_json()[-1] for geom in opt_mset.geometries]
charges = [[atom.labels['charge'][0].export_json()[-1] for atom in geom.atoms] for geom in opt_mset.geometries]
method = opt_mset.geometries[0].labels['potential'][0].export_json()[0].split('.')[1]
basis = opt_mset.geometries[0].labels['potential'][0].export_json()[0].split('.')[2]
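
Before handing these lists to the builder, a quick consistency check can catch a geometry whose per-atom lists have the wrong length. The helper below is a hypothetical sketch and is not part of `scripts.make_dataset`; it assumes the nesting used in the cell above, with one entry per geometry and, for coordinates, forces, and charges, one entry per atom.

```python
def check_lengths(atomic_nums, coords, energy, forces, charges):
    """Raise ValueError if per-geometry or per-atom lengths disagree."""
    n_geoms = len(atomic_nums)

    # Every top-level list should have one entry per geometry.
    for name, values in (("coords", coords), ("energy", energy),
                         ("forces", forces), ("charges", charges)):
        if len(values) != n_geoms:
            raise ValueError(
                f"{name}: expected {n_geoms} geometries, got {len(values)}"
            )

    # Within each geometry, per-atom lists should match the atom count.
    for i, nums in enumerate(atomic_nums):
        for name, values in (("coords", coords[i]), ("forces", forces[i]),
                             ("charges", charges[i])):
            if len(values) != len(nums):
                raise ValueError(
                    f"geometry {i}, {name}: expected {len(nums)} atoms, "
                    f"got {len(values)}"
                )
    return n_geoms


# Toy data: one two-atom geometry (placeholder values, not calculation results).
n_checked = check_lengths(
    atomic_nums=[(1, 1)],
    coords=[((0.0, 0.0, 0.0), (0.0, 0.0, 0.74))],
    energy=[-1.0],
    forces=[[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]],
    charges=[[0.0, 0.0]],
)
print(n_checked)
```

With the variables from the cell above, the call would be `check_lengths(atomic_nums, coords, energy, forces, charges)`.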

The geometry is built and printed below. Once the values needed to build a geometry have been extracted from the
calculation files, this function makes it easy to turn these molecules into MSets.

In [10]:
opt_mset_geom = MD.build_new_geom(atomic_nums, coords, energy, forces, dipole, quadrupole, charges, method, basis)
print(opt_mset_geom)

22

C   -2.843818    2.240668   -0.744255
O   -1.483822    1.810053   -0.654941
C   -1.285000    0.596463   -0.141562
O   -2.194845   -0.111959    0.224881
C    0.153497    0.234354   -0.083623
C    1.136167    1.109810   -0.544769
C    2.467673    0.752318   -0.483691
C    2.859908   -0.486704    0.037371
C    1.850312   -1.344331    0.491532
C    0.514308   -1.004941    0.440324
Cl    2.336269   -2.896019    1.136618
O    4.157302   -0.795009    0.077287
Cl    3.690859    1.841809   -1.062634
O   -2.101156   -2.708443    1.438117
H   -2.805403    3.233859   -1.184616
H   -3.298390    2.275553    0.246683
H   -3.418158    1.563110   -1.377451
H    0.858538    2.071665   -0.952023
H   -0.235850   -1.698378    0.809717
H    4.262544   -1.668779    0.469623
H   -2.329818   -1.813748    1.156470
H   -2.269359   -3.241815    0.660736
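
The printed geometry resembles the XYZ file layout: an atom count, a comment line (blank here), and then one line per atom. As a minimal sketch of producing such a block from atomic numbers and coordinates, the function below uses a small lookup table covering only the elements in this molecule. `to_xyz` and `SYMBOLS` are hypothetical names, not tensorchem functions, and the column widths are only a guess at the formatting above.

```python
SYMBOLS = {1: "H", 6: "C", 8: "O", 17: "Cl"}  # only the elements used here


def to_xyz(atomic_nums, coords, comment=""):
    """Format one geometry as an XYZ-style text block."""
    lines = [str(len(atomic_nums)), comment]
    for number, (x, y, z) in zip(atomic_nums, coords):
        lines.append(f"{SYMBOLS[number]} {x:11.6f} {y:11.6f} {z:11.6f}")
    return "\n".join(lines)


# First two atoms of the geometry printed above.
xyz_text = to_xyz(
    (6, 8),
    ((-2.843818, 2.240668, -0.744255), (-1.483822, 1.810053, -0.654941)),
)
print(xyz_text)
```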
