# Optimizing molecular structure


The initial structure can be provided manually or constructed from a SMILES string (see the [previous](wf_build) section). Here we do the latter to illustrate a complete structure optimization workflow, using methanol as the example.

The structure is refined stepwise: first with the UFF force field (RDKit), then with the semiempirical GFN2-xTB method, and finally at the quantum chemical level. To minimize computational cost, the final optimization uses a minimal level of theory (HF with the STO-3G basis set). For practical calculations, DFT or MP2 with a larger basis set should be used.

```python
import py3Dmol as p3d
from rdkit import Chem
from rdkit.Chem import AllChem
import veloxchem as vlx
import numpy as np
```



## Initial structure

We obtain the initial (UFF-optimized) structure from RDKit, using the SMILES string `CO`.

```python
def smilestoxyz(smiles, opt=True, return_noH=False):
    """Build a 3D structure from a SMILES string and return it in XYZ format."""
    mol_bare = Chem.MolFromSmiles(smiles)
    mol_full = Chem.AddHs(mol_bare)            # add explicit hydrogens
    AllChem.EmbedMolecule(mol_full)            # generate initial 3D coordinates
    if opt:
        AllChem.UFFOptimizeMolecule(mol_full)  # pre-optimize with the UFF force field
    if return_noH:
        return Chem.MolToXYZBlock(mol_full), Chem.RemoveHs(mol_full)
    return Chem.MolToXYZBlock(mol_full)

methanol_uff = smilestoxyz('CO')
```

## xTB optimization

Next, set up the xTB driver and perform the structure optimization.

```python
# Read the UFF structure and set up a GFN2-xTB structure optimization
methanol = vlx.Molecule.from_xyz_string(methanol_uff)
xtb_drv = vlx.XtbDriver()
xtb_drv.set_method('gfn2')
xtb_grad_drv = vlx.XtbGradientDriver(xtb_drv)
xtb_opt_drv = vlx.OptimizationDriver(xtb_grad_drv)
xtb_opt = xtb_opt_drv.compute(methanol)  # the optimized molecule
```

```
                    Optimization Driver Setup

         Coordinate System       :    TRIC
         Constraints             :    No
         Max. Number of Steps    :    300
         Transition State        :    No
         Hessian                 :    never

* Info * Computing energy and gradient...

          ...................................................
          :                      SETUP                      :
          :.................................................:
          :  # basis functions                  12          :
          :  # atomic orbitals                  12          :
          :  # shells                            8          :
          :  # electrons                        14          :
          :  max. iterations                   280          :
          :  Hamiltonian                  GFN2-xTB          :
          :  restarted?                       true          :
          :  GBSA solvation                  false          :
          :  PC potential                    false          :
          :  electronic temp.          300.0000000     K    :
          :  accuracy                    1.0000000          :

* Info * Computing energy and gradient...

          ...................................................
          :                      SETUP                      :
          :.................................................:
          :  # basis functions                  12          :
          :  # atomic orbitals                  12          :
          :  # shells                            8          :
          :  # electrons                        14          :
          :  max. iterations                   280          :
          :  Hamiltonian                  GFN2-xTB          :
          :  restarted?                       true          :
          :  GBSA solvation                  false          :
          :  PC potential                    false          :
          :  electronic temp.          300.0000000     K    :
          :  accuracy                    1.0000000          :

*** Time spent in Optimization Driver: 0.45 sec
```


Convert the optimized geometry to XYZ format. VeloxChem stores coordinates in bohr, so they are converted to Å:

```python
def toxyz(molecule):
    """Return the geometry of a VeloxChem molecule as an XYZ string (in Å)."""
    from veloxchem.veloxchemlib import ChemicalElement
    from veloxchem.veloxchemlib import bohr_in_angstroms

    # VeloxChem stores coordinates in bohr; convert them to Å
    elem_ids = molecule.elem_ids_to_numpy()
    xs = molecule.x_to_numpy() * bohr_in_angstroms()
    ys = molecule.y_to_numpy() * bohr_in_angstroms()
    zs = molecule.z_to_numpy() * bohr_in_angstroms()

    # XYZ format: number of atoms, an (empty) comment line, then one line per atom
    xyz = "%d\n\n" % molecule.number_of_atoms()
    for elem_id, x, y, z in zip(elem_ids, xs, ys, zs):
        elem = ChemicalElement()
        elem.set_atom_type(elem_id)
        xyz += "%6s %22.12f %22.12f %22.12f\n" % (elem.get_name(), x, y, z)
    return xyz

methanol_xtb = toxyz(xtb_opt)
```
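The `toxyz` helper above depends on VeloxChem internals. As a library-independent sketch of the same conversion, the hypothetical helpers `coords_to_xyz` and `xyz_to_coords` below write and read the XYZ format (atom count, comment line, then one `symbol x y z` line per atom in Å) using only NumPy. The Bohr radius is taken as the CODATA 2018 value, which is an assumption here and may differ in the last digits from the constant VeloxChem uses.

```python
import numpy as np

# Bohr radius in Å (CODATA 2018); assumed value for this illustration
BOHR_IN_ANGSTROM = 0.529177210903

def coords_to_xyz(symbols, coords_bohr):
    """Format element symbols and Cartesian coordinates (in bohr) as an XYZ string in Å."""
    coords_ang = np.asarray(coords_bohr, dtype=float) * BOHR_IN_ANGSTROM
    lines = [str(len(symbols)), ""]  # atom count, then an empty comment line
    for sym, (x, y, z) in zip(symbols, coords_ang):
        lines.append("%6s %22.12f %22.12f %22.12f" % (sym, x, y, z))
    return "\n".join(lines) + "\n"

def xyz_to_coords(xyz):
    """Parse an XYZ string into a list of element symbols and an (N, 3) array in Å."""
    lines = xyz.splitlines()
    natoms = int(lines[0])
    symbols, coords = [], []
    for line in lines[2:2 + natoms]:
        sym, x, y, z = line.split()[:4]
        symbols.append(sym)
        coords.append([float(x), float(y), float(z)])
    return symbols, np.array(coords)

# Hypothetical O-H fragment with a 1.8 bohr bond along z
xyz = coords_to_xyz(["O", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.8]])
print(xyz)
symbols, coords = xyz_to_coords(xyz)
```

The string produced by `coords_to_xyz` has the same layout as the one returned by `toxyz`, so it can be passed to `vlx.Molecule.from_xyz_string` or RDKit's `MolFromXYZBlock` in the same way.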

## HF optimization

Set up the SCF and optimization drivers and perform the final optimization, using the xTB structure as the starting geometry. The code below is shown but not executed here; instead, the resulting HF geometry is given directly as an XYZ string in the next cell.

```python
# Start from the xTB geometry and optimize at the HF/STO-3G level
molecule = vlx.Molecule.from_xyz_string(methanol_xtb)
basis = vlx.MolecularBasis.read(molecule, 'STO-3G')
scf_drv = vlx.ScfRestrictedDriver()
scf_results = scf_drv.compute(molecule, basis)  # initial SCF calculation
grad_drv = vlx.ScfGradientDriver(scf_drv)
opt_drv = vlx.OptimizationDriver(grad_drv)
opt_molecule = opt_drv.compute(molecule, basis)

methanol_hf = toxyz(opt_molecule)
```

```python
# HF/STO-3G geometry (in Å) obtained with the code above
methanol_hf = """6

     C        -0.355338295061         0.015832253296         0.022623869829
     O         0.957130783496         0.544496756944        -0.204029530782
     H        -0.479565645795        -0.368664416163         1.040410925550
     H        -0.607609510888        -0.787365989746        -0.677656188432
     H        -1.069087493005         0.828785331979        -0.122302379295
     H         1.554469486523        -0.233087881063        -0.059049636847
"""
```

## Comparison

Visualize the UFF, xTB and HF structures side by side using py3Dmol:

```python
# One viewer per structure: UFF (left), xTB (middle), HF (right)
viewer = p3d.view(viewergrid=(1, 3), width=500, height=200, linked=False)
viewer.addModel(methanol_uff, 'xyz', viewer=(0, 0))
viewer.addModel(methanol_xtb, 'xyz', viewer=(0, 1))
viewer.addModel(methanol_hf, 'xyz', viewer=(0, 2))
viewer.setViewStyle({"style": "outline", "color": "black", "width": 0.1})
viewer.setStyle({"stick": {}})
viewer.show()
```
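Besides visual inspection, the conformation can be checked numerically. The sketch below computes the H-O-C-H dihedral angles of the HF geometry with the standard projection formula; `dihedral` is a hypothetical helper written for illustration, and the coordinates are those of `methanol_hf`, rounded to six decimals. In a staggered conformer, one methyl hydrogen is anti (about 180°) to the hydroxyl hydrogen and the other two are gauche (about ±60°).

```python
import numpy as np

# HF/STO-3G coordinates (Å) of methanol from `methanol_hf`, rounded to six decimals
coords = np.array([
    [-0.355338,  0.015832,  0.022624],   # 0: C
    [ 0.957131,  0.544497, -0.204030],   # 1: O
    [-0.479566, -0.368664,  1.040411],   # 2: H (methyl)
    [-0.607610, -0.787366, -0.677656],   # 3: H (methyl)
    [-1.069087,  0.828785, -0.122302],   # 4: H (methyl)
    [ 1.554469, -0.233088, -0.059050],   # 5: H (hydroxyl)
])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the central bond
    b2 = p3 - p2
    # project the outer bond vectors onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# H(hydroxyl)-O-C-H(methyl) dihedrals for the three methyl hydrogens
angles = [dihedral(coords[5], coords[1], coords[0], coords[i]) for i in (2, 3, 4)]
print(np.round(angles, 1))
```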

These structures are all very similar; for example, all of them have a staggered H-O-C-H dihedral. To see the differences between the structures more clearly, we calculate their distance matrices using functionality from RDKit:

```python
# Distance matrices (in Å) for the three structures
print('From UFF:')
m_uff = AllChem.MolFromXYZBlock(methanol_uff)
dm_uff = AllChem.Get3DDistanceMatrix(m_uff)
print(np.around(dm_uff, 3))
print()
print('From xTB:')
m_xtb = AllChem.MolFromXYZBlock(methanol_xtb)
dm_xtb = AllChem.Get3DDistanceMatrix(m_xtb)
print(np.around(dm_xtb, 3))
print()
print('From HF:')
m_hf = AllChem.MolFromXYZBlock(methanol_hf)
dm_hf = AllChem.Get3DDistanceMatrix(m_hf)
print(np.around(dm_hf, 3))
```

```
From UFF:
[[0.    1.398 1.112 1.112 1.109 1.925]
 [1.398 0.    2.065 2.065 2.05  0.992]
 [1.112 2.065 0.    1.818 1.806 2.291]
 [1.112 2.065 1.818 0.    1.806 2.291]
 [1.109 2.05  1.806 1.806 0.    2.855]
 [1.925 0.992 2.291 2.291 2.855 0.   ]]

From xTB:
[[0.    1.406 1.096 1.096 1.088 1.946]
 [1.406 0.    2.078 2.078 2.008 0.963]
 [1.096 2.078 0.    1.783 1.777 2.343]
 [1.096 2.078 1.783 0.    1.777 2.343]
 [1.088 2.008 1.777 1.777 0.    2.817]
 [1.946 0.963 2.343 2.343 2.817 0.   ]]

From HF:
[[0.    1.433 1.095 1.095 1.091 1.928]
 [1.433 0.    2.109 2.109 2.048 0.991]
 [1.095 2.109 0.    1.773 1.77  2.316]
 [1.095 2.109 1.773 0.    1.77  2.316]
 [1.091 2.048 1.77  1.77  0.    2.831]
 [1.928 0.991 2.316 2.316 2.831 0.   ]]
```


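This gives the distances between all pairs of atoms in Å. Rows and columns follow the atom order of the XYZ data: carbon (1), oxygen (2), the three methyl hydrogens (3 to 5) and the hydroxyl hydrogen (6).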
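Each entry of a distance matrix is $D_{ij} = \lVert \mathbf{r}_i - \mathbf{r}_j \rVert$. As a sketch of what this computes (not RDKit's actual implementation), the hypothetical helper `distance_matrix` below builds the same matrix with NumPy broadcasting. Applied to the HF coordinates, rounded to six decimals, it reproduces the HF matrix above to within rounding.

```python
import numpy as np

def distance_matrix(coords):
    """Return the (N, N) matrix of Euclidean distances between the rows of `coords`."""
    coords = np.asarray(coords, dtype=float)
    # (N, 1, 3) - (1, N, 3) broadcasts to an (N, N, 3) array of difference vectors
    diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

# HF/STO-3G coordinates (Å) from `methanol_hf`, rounded; order C, O, H, H, H, H
coords_hf = np.array([
    [-0.355338,  0.015832,  0.022624],
    [ 0.957131,  0.544497, -0.204030],
    [-0.479566, -0.368664,  1.040411],
    [-0.607610, -0.787366, -0.677656],
    [-1.069087,  0.828785, -0.122302],
    [ 1.554469, -0.233088, -0.059050],
])

dm = distance_matrix(coords_hf)
print(np.around(dm, 3))
```

The matrix is symmetric with zeros on the diagonal, so only the upper triangle carries independent information.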

Since the atom order is the same in all three structures, the distance matrices can be subtracted directly, and the largest absolute difference identified:

```python
print('Difference between HF and xTB:')
print(np.around(dm_hf - dm_xtb, 3))
print()
print('Maximal absolute difference between HF and xTB:')
print(f'{np.max(np.abs(dm_hf - dm_xtb)):.4f} Å')
```

```
Difference between HF and xTB:
[[ 0.     0.027 -0.    -0.     0.003 -0.019]
 [ 0.027  0.     0.031  0.031  0.04   0.028]
 [-0.     0.031  0.    -0.01  -0.007 -0.027]
 [-0.     0.031 -0.01   0.    -0.007 -0.027]
 [ 0.003  0.04  -0.007 -0.007  0.     0.014]
 [-0.019  0.028 -0.027 -0.027  0.014  0.   ]]

Maximal absolute difference between HF and xTB:
0.0395 Å
```
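The atom pair responsible for the maximum can also be located programmatically, with `np.argmax` on the absolute differences followed by `np.unravel_index` to turn the flat index back into a (row, column) pair. The sketch below uses the rounded HF and xTB matrices printed above (stored under new names so that `dm_hf` and `dm_xtb` are not overwritten); in the notebook, the same two lines can be applied to `dm_hf - dm_xtb` directly.

```python
import numpy as np

# HF and xTB distance matrices (Å) as printed above, rounded to three decimals
dm_hf_printed = np.array([
    [0.,    1.433, 1.095, 1.095, 1.091, 1.928],
    [1.433, 0.,    2.109, 2.109, 2.048, 0.991],
    [1.095, 2.109, 0.,    1.773, 1.77,  2.316],
    [1.095, 2.109, 1.773, 0.,    1.77,  2.316],
    [1.091, 2.048, 1.77,  1.77,  0.,    2.831],
    [1.928, 0.991, 2.316, 2.316, 2.831, 0.   ],
])
dm_xtb_printed = np.array([
    [0.,    1.406, 1.096, 1.096, 1.088, 1.946],
    [1.406, 0.,    2.078, 2.078, 2.008, 0.963],
    [1.096, 2.078, 0.,    1.783, 1.777, 2.343],
    [1.096, 2.078, 1.783, 0.,    1.777, 2.343],
    [1.088, 2.008, 1.777, 1.777, 0.,    2.817],
    [1.946, 0.963, 2.343, 2.343, 2.817, 0.   ],
])

diff = dm_hf_printed - dm_xtb_printed
# flat index of the largest absolute difference, converted back to (row, column)
i, j = np.unravel_index(np.argmax(np.abs(diff)), diff.shape)
print(f'Largest |difference|: {abs(diff[i, j]):.3f} Å between atoms {i + 1} and {j + 1}')
```

Because the matrix is symmetric, the maximum occurs twice; `np.argmax` returns the first occurrence, i.e. the entry in the upper triangle.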


The largest absolute difference is thus 0.04 Å, between atoms 2 and 5: the oxygen and the methyl (CH$_3$) hydrogen anti to the hydroxyl hydrogen, which are 2.05 Å apart in the HF structure. Since small differences in bond lengths and angles accumulate across the molecule, distant atom pairs tend to show larger absolute differences, so it may be more informative to look at *relative* differences:

```python
# Add a large number to the diagonal so that the zero self-distances do not
# cause a division by zero (0/0 = nan); the diagonal of the ratio then becomes ~0
dm_tmp = dm_hf + 1e10 * np.eye(len(dm_hf))

print('Relative difference between HF and xTB:')
print(np.around((dm_hf - dm_xtb) / dm_tmp, 3))
print()
print('Maximal relative difference between HF and xTB:')
print(f'{np.max(np.abs((dm_hf - dm_xtb) / dm_tmp)):.4f}')
```

```
Relative difference between HF and xTB:
[[ 0.     0.019 -0.    -0.     0.003 -0.01 ]
 [ 0.019  0.     0.015  0.015  0.019  0.029]
 [-0.     0.015  0.    -0.006 -0.004 -0.012]
 [-0.     0.015 -0.006  0.    -0.004 -0.012]
 [ 0.003  0.019 -0.004 -0.004  0.     0.005]
 [-0.01   0.029 -0.012 -0.012  0.005  0.   ]]

Maximal relative difference between HF and xTB:
0.0286
```


Now the largest *relative* difference, about 2.9 %, is found between atoms 2 and 6, i.e. the oxygen and the hydrogen bonded to it. The largest relative deviation between the HF and xTB structures is thus in the O-H bond length (0.991 Å versus 0.963 Å), and deviations in bond lengths are in most cases more significant than deviations in distances between non-bonded atoms.
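The motivation for relative differences can be made concrete with a hypothetical test case: if a structure is uniformly expanded by 2 %, every interatomic distance grows by the same factor, so the absolute differences are largest for the most distant atom pair, while the relative differences are 0.02 for every pair. The sketch below (helper names hypothetical) also shows an alternative to adding a large number to the diagonal: `np.divide` with `out` and `where` arguments, which skips the zero self-distances and leaves the diagonal at zero.

```python
import numpy as np

def relative_difference(dm_a, dm_b):
    """Element-wise (dm_a - dm_b) / dm_a, set to zero where dm_a is zero (the diagonal)."""
    diff = dm_a - dm_b
    return np.divide(diff, dm_a, out=np.zeros_like(diff), where=dm_a > 0)

def distance_matrix(coords):
    """All pairwise Euclidean distances between the rows of `coords`."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# HF coordinates (Å, rounded) and a hypothetical copy uniformly scaled by 2 %
coords_a = np.array([
    [-0.355338,  0.015832,  0.022624],
    [ 0.957131,  0.544497, -0.204030],
    [-0.479566, -0.368664,  1.040411],
    [-0.607610, -0.787366, -0.677656],
    [-1.069087,  0.828785, -0.122302],
    [ 1.554469, -0.233088, -0.059050],
])
coords_b = 1.02 * coords_a
dm_a = distance_matrix(coords_a)
dm_b = distance_matrix(coords_b)

abs_diff = np.abs(dm_b - dm_a)
rel_diff = np.abs(relative_difference(dm_a, dm_b))
print(f'max absolute difference: {abs_diff.max():.4f} Å')
print(f'max relative difference: {rel_diff.max():.4f}')
```

Here the maximal absolute difference appears for the two hydrogens that are farthest apart, even though no bond is distorted more than any other, whereas the relative differences correctly report the same 2 % change for every pair.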