# 2 - Specifying New/More Complex Ligands

So far (from 1-Introduction/Overview.ipynb) we know how to specify basic inputs and understand some of the basic outputs from Architector.

What about new or unknown systems including more complex ligands?

Architector includes tools for handling these cases manually, along with some SMILES utilities!

In this tutorial we will learn:

**(A)** How to manually identify coordination sites of new ligands for generation in Architector.

**(B)** How to automatically and manually identify ligand types (geometries!).

**(C)** How to use internal commands to simplify inputs for more complex coordination environments!

## For (A), we need a challenge. Let's try a [La(Terpyridine)<sub>3</sub>]<sup>3+</sup> complex.

But what is the SMILES for Terpyridine (Terpy, for short), and how is it coordinated to a metal center?

The SMILES can be found on [Wikipedia](https://en.wikipedia.org/wiki/Terpyridine), giving: "c1ccnc(c1)c2cccc(n2)c3ccccn3"

However, what are the coordinating atoms?

Here, we turn to useful routines included in Architector:

In [1]:
import architector
from architector import (build_complex, # Build routine
                         view_structures, # Visualization
                         smiles2Atoms) # Smiles utility to ASE atoms

We will also initialize the metal and ligand smiles for La/Terpy:

In [2]:
terpy_smiles = 'c1ccnc(c1)c2cccc(n2)c3ccccn3'
metal = 'La'

Next, the smiles2Atoms utility converts our terpy smiles to [ASE atoms](https://wiki.fysik.dtu.dk/ase/ase/atoms.html) for visualization purposes.

In [3]:
terpy_atoms = smiles2Atoms(terpy_smiles)

### Next, we visualize with labelled indices for identification of ligand-metal coordinating atoms (CAs)

We already know the view_structures command, but a couple of additional parameters are useful here:

**(i)** The labelinds=True option overlays the exact atom indices as used by Architector

**(ii)** The size of the visualization can be changed using the w (width) and h (height) parameters (default is 200x200)

With these two additions we can visualize the ligand structure for identification of CAs:

In [4]:
view_structures(terpy_atoms,labelinds=True,w=500,h=500) 

### Visually, we can identify the CAs as the nitrogen atoms (blue atoms) at indices 3, 11, and 17.

We can now save these indices for building the complexes!

In [5]:
terpy_coordList = [3,11,17]
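
The visual assignment can also be cross-checked in code. The sketch below is a minimal, standalone illustration: it hard-codes the heavy-atom symbols in the order they appear in the terpy SMILES, and it assumes (consistent with the indices seen above, but not verified against Architector internals) that atom indices follow SMILES order with hydrogens appended afterwards. The helper `indices_of` is a hypothetical name, not part of Architector.

```python
# Heavy-atom element symbols in the order they appear in the terpy SMILES
# 'c1ccnc(c1)c2cccc(n2)c3ccccn3' (hydrogens omitted for brevity).
# Assumption: with Architector loaded, the full symbol list could instead
# be taken from terpy_atoms.get_chemical_symbols() (a standard ASE method).
terpy_heavy_symbols = list('CCCNCCCCCCCNCCCCCN')


def indices_of(symbols, element):
    """Return the 0-based indices of every atom whose symbol is `element`."""
    return [i for i, sym in enumerate(symbols) if sym == element]


nitrogen_indices = indices_of(terpy_heavy_symbols, 'N')
print(nitrogen_indices)  # [3, 11, 17]
```

Not every nitrogen in a ligand necessarily coordinates, so a filter like this only narrows down candidates; the labelled visualization remains the final check.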

## Now, for (B), identifying ligand types, we have 2 different methods:

**(i)** Automatically

**(ii)** Manually

For **(i)**, all we need to do is input ligand dictionaries without a specified ligType! So we functionally already have enough information to generate the [La(Terpyridine)<sub>3</sub>]<sup>3+</sup> complex!

In [6]:
terpy_ligand_dict = {'smiles':terpy_smiles,
                    'coordList':terpy_coordList}

And the full input dictionary (including 3 terpy ligands!):

In [7]:
inputDict = {'core':{'metal':metal,'coreCN':9},
            'ligands':[terpy_ligand_dict]*3,
            'parameters':{'assemble_method':'GFN-FF', # Switch to GFN-FF for faster assembly,
                          # but still using GFN2-xTB for the final relaxation. Will have more printout.
                          'n_conformers':2, # Test 2 different conformers
                          'return_only_1':True # Return just one
                         }}
inputDict # Print out full input Dictionary

{'core': {'metal': 'La', 'coreCN': 9},
 'ligands': [{'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3',
   'coordList': [3, 11, 17]},
  {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3', 'coordList': [3, 11, 17]},
  {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3', 'coordList': [3, 11, 17]}],
 'parameters': {'assemble_method': 'GFN-FF',
  'n_conformers': 2,
  'return_only_1': True}}
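
Before launching a build that takes minutes, it can help to confirm that the ligands actually fill the requested coordination number. The following is a minimal sketch under the assumption that each ligand occupies one site per entry in its 'coordList'; `total_denticity` and `example_input` are hypothetical names used only for illustration, and the dictionary is re-created here so the snippet runs on its own.

```python
# Standalone copy of the relevant parts of the input dictionary.
terpy_ligand = {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3',
                'coordList': [3, 11, 17]}
example_input = {'core': {'metal': 'La', 'coreCN': 9},
                 'ligands': [terpy_ligand] * 3}


def total_denticity(input_dict):
    """Sum the number of coordinating atoms over all listed ligands."""
    return sum(len(lig['coordList']) for lig in input_dict['ligands'])


occupied = total_denticity(example_input)
target = example_input['core']['coreCN']
print(occupied, target)  # 9 9
```

Here three tridentate ligands give 3 x 3 = 9 occupied sites, matching 'coreCN'.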

Looks good! Now we build the complex using Architector - Note that this might take a couple of minutes:

In [8]:
out = build_complex(inputDict) # Might take a couple minutes

ligType not specified for c1ccnc(c1)c2cccc(n2)c3ccccn3 - testing ligand placement to determine ligType!
Assigning lig c1ccnc(c1)c2cccc(n2)c3ccccn3 to ligType tri_mer!

          CN  :   150.00000
          rep :   500.00000
          disp:  2500.00000
          HB1 :   250.00000
          HB2 :   450.00000

          Pauling EN used:
          Z : 1  EN :  2.20
          Z : 6  EN :  2.55
          Z : 7  EN :  3.04
          Z :57  EN :  1.10
          electric field strengths (au): 0.000

           ------------------------------------------------- 
          |           Force Field Initialization            |
           ------------------------------------------------- 

          distances ...
          ----------------------------------------
          generating topology and atomic info file ...
          pair mat ...
          computing topology distances matrix with Floyd-Warshall algo ...
          making topology EEQ charges ...
          #fragments for EEQ constrain: 1
     

          pair mat ...
          computing topology distances matrix with Floyd-Warshall algo ...
          making topology EEQ charges ...
          #fragments for EEQ constrain: 1
          rings ...
          # BATM   964
          # H in HB   22
          doing iterative Hueckel for 2 subsystem(s) ...

  atom   neighbors  erfCN metchar sp-hybrid imet pi  qest     coordinates
    1  La      6    4.40   0.02         0    2    0   0.430    0.000000    0.000000    0.000000
    2  C       3    2.82   0.00         2    0    1   0.003   -3.371034    5.828949    7.329424
    3  C       3    2.80   0.00         2    0    1   0.007   -4.364569    3.414277    7.145246
    4  C       3    2.95   0.00         2    0    1   0.023   -3.484941    1.885367    5.228205
    5  N       3    2.79   0.00         2    0    1  -0.130   -1.721624    2.613544    3.537055
    6  C       3    3.07   0.00         2    0    1   0.053   -0.703862    5.003327    3.678234
    7  C       3    2.81   0.00         2 

          pair mat ...
          computing topology distances matrix with Floyd-Warshall algo ...
          making topology EEQ charges ...
          #fragments for EEQ constrain: 1
          rings ...
          # BATM   1842
          # H in HB   33
          doing iterative Hueckel for 3 subsystem(s) ...

  atom   neighbors  erfCN metchar sp-hybrid imet pi  qest     coordinates
    1  La      9    4.41   0.00         0    2    0   0.441    0.000000    0.000000    0.000000
    2  C       3    2.82   0.00         2    0    1  -0.007   -3.371034    5.828949    7.329424
    3  C       3    2.80   0.00         2    0    1  -0.001   -4.364569    3.414277    7.145246
    4  C       3    2.95   0.00         2    0    1   0.018   -3.484941    1.885367    5.228205
    5  N       3    2.79   0.00         2    0    1  -0.150   -1.721624    2.613544    3.537055
    6  C       3    3.06   0.00         2    0    1   0.047   -0.703862    5.003327    3.678234
    7  C       3    2.81   0.00         2


          CN  :   150.00000
          rep :   500.00000
          disp:  2500.00000
          HB1 :   250.00000
          HB2 :   450.00000

          Pauling EN used:
          Z : 1  EN :  2.20
          Z : 6  EN :  2.55
          Z : 7  EN :  3.04
          Z :57  EN :  1.10
          electric field strengths (au): 0.000

           ------------------------------------------------- 
          |           Force Field Initialization            |
           ------------------------------------------------- 

          distances ...
          ----------------------------------------
          generating topology and atomic info file ...
          pair mat ...
          computing topology distances matrix with Floyd-Warshall algo ...
          making topology EEQ charges ...
          #fragments for EEQ constrain: 1
          ----------------------------------------
          generating topology and atomic info file ...
          pair mat ...
          computing topology distances matr

                 Step     Time          Energy         fmax
*Force-consistent energies used in optimization.
LBFGSLineSearch:    0 11:40:08    -3794.330140*       1.5397
LBFGSLineSearch:    1 11:40:08    -3794.699693*       1.2723
LBFGSLineSearch:    2 11:40:08    -3795.122227*       1.6351
LBFGSLineSearch:    3 11:40:09    -3795.461648*       0.7608
LBFGSLineSearch:    4 11:40:09    -3795.589232*       0.5554
LBFGSLineSearch:    5 11:40:09    -3795.747708*       0.4832
LBFGSLineSearch:    6 11:40:09    -3795.901508*       0.5108
LBFGSLineSearch:    7 11:40:09    -3796.052275*       0.5745
LBFGSLineSearch:    8 11:40:10    -3796.185933*       0.4913
LBFGSLineSearch:    9 11:40:10    -3796.293014*       0.4360
LBFGSLineSearch:   10 11:40:10    -3796.387877*       0.5030
LBFGSLineSearch:   11 11:40:10    -3796.520067*       0.6068
LBFGSLineSearch:   12 11:40:11    -3796.738611*       0.9924
LBFGSLineSearch:   13 11:40:11    -3796.842997*       0.4462
LBFGSLineSearch:   14 11:40:11    -37

And we can again visualize the structures:

In [9]:
view_structures(out)

### Should look great!

However, this took a bit of time.

What was the ligand type assigned automatically? It is in the output text of the build_complex cell, and it should be "tri_mer". This is short for [tridentate meridional](https://www.coursehero.com/study-guides/introchem/isomers-in-coordination-compounds/), which we likely could have identified manually!
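
To make the meridional/facial distinction concrete: in an idealized octahedron, a meridional tridentate ligand has two donors trans to each other (about 180°), while a facial one has all three donors mutually cis (about 90°). The sketch below classifies three metal-to-donor vectors on that basis. It is a toy illustration only; the function names and the 150° cutoff are assumptions made here, not Architector's assignment procedure, and real chelates deviate from these ideal angles.

```python
import math


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def classify_tridentate(donors, trans_cutoff=150.0):
    """Label three metal->donor vectors (metal at origin) as 'mer' or 'fac'.

    trans_cutoff is an arbitrary illustrative threshold in degrees.
    """
    angles = [angle_deg(donors[i], donors[j])
              for i in range(3) for j in range(i + 1, 3)]
    return 'mer' if max(angles) > trans_cutoff else 'fac'


# Idealized octahedral sites: two trans donors plus one cis donor ...
mer_sites = [(1, 0, 0), (0, 1, 0), (-1, 0, 0)]
# ... versus three mutually cis donors on one face.
fac_sites = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

mer_label = classify_tridentate(mer_sites)
fac_label = classify_tridentate(fac_sites)
print(mer_label, fac_label)  # mer fac
```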

To identify the ligand type **(ii)** manually, the documentation includes a tool for visualizing all ligand types, which we replicate here for tridentates:

In [10]:
import pandas as pd # Pandas is used to read in the reference data
import numpy as np # Numpy is used for selecting from the database
import architector # Architector is used for importing the filepath to the reference data

In [11]:
# Pull out the datapath for the ligand reference structures:
import os # Used to build the path in an OS-independent way
ref_data_path = os.path.join(os.path.dirname(architector.__file__), 'data', 'angle_stats_datasource.csv')
ref_data_path

'/Users/mgt16/software/architector/architector/data/angle_stats_datasource.csv'

The utility needs a defined denticity. Our ligand has 3 CAs, so it is tridentate!

In [12]:
denticity = 3 

### Now, we can read in and visualize the data

In [13]:
# Read in reference data for examples.
ligdf = pd.read_csv(ref_data_path)
# Show the reference data!
print('Showing examples of each ligand label!')
print('Note that "m" indicates the metal in each - some will not show if M-L bonds are longer than cutoff radii.')
print('####################################################################################')
ligtypes = ligdf.geotype_label.value_counts().index.values
cns = [ligdf[ligdf.geotype_label == x].cn.values[0] for x in ligtypes]
order = np.argsort(cns)
for i in order:
    if cns[i] == denticity: # Only Pick out Tri Dentates
        print("Ligand label - 'ligType':", "'" + ligtypes[i] + "'")
        print('Ligand denticity: ', int(cns[i]))
        # Sample 4 structures matching these labels:
        tdf = ligdf[ligdf.geotype_label == ligtypes[i]].sample(4,random_state=42) 
        # Visualize the structures:
        view_structures(tdf.xyz_structure,labels=['m']*4)
        print('####################################################################################')

Showing examples of each ligand label!
Note that "m" indicates the metal in each - some will not show if M-L bonds are longer than cutoff radii.
####################################################################################
Ligand label - 'ligType': 'tri_fac'
Ligand denticity:  3


####################################################################################
Ligand label - 'ligType': 'tri_mer_bent'
Ligand denticity:  3


####################################################################################
Ligand label - 'ligType': 'tri_mer'
Ligand denticity:  3


####################################################################################


## Here, we can manually see that "tri_mer" or "tri_mer_bent" are possible labels for terpy!

Now we can add this information to the terpy ligands dictionary manually to accelerate generation:

In [14]:
import copy

terpy_lig_dict_copy = copy.deepcopy(terpy_ligand_dict) # Copy terpy ligand dict

terpy_lig_dict_copy['ligType'] = 'tri_mer' # Add ligType manually!

And copy the inputDict to update with manual label:

In [15]:
new_inputDict = copy.deepcopy(inputDict) # Copy inputDict

new_inputDict['ligands'] = [terpy_lig_dict_copy]*3 # Update ligands field with new terpy_dict
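
One Python detail worth keeping in mind here: multiplying a list that contains a dictionary repeats references to the same dictionary rather than creating copies. That is harmless in this tutorial because all three ligands are meant to be identical, but it matters if the ligands are later edited individually. The sketch below uses hypothetical variable names and plain Python only.

```python
import copy

ligand = {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3', 'coordList': [3, 11, 17]}

shared = [ligand] * 3                                    # three references to ONE dict
independent = [copy.deepcopy(ligand) for _ in range(3)]  # three separate dicts

# Label only the first entry of each list.
shared[0]['ligType'] = 'tri_mer'
independent[0]['ligType'] = 'tri_mer'

shared_has_label = ['ligType' in lig for lig in shared]
independent_has_label = ['ligType' in lig for lig in independent]
print(shared_has_label)       # [True, True, True]
print(independent_has_label)  # [True, False, False]
```

If different ligTypes or coordLists are needed per ligand, building the list from deep copies avoids accidental edits to all entries at once.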

Finally, rebuild the complex. Note that this will still likely be a bit slow, since lanthanides tend to take longer with xTB.

In [16]:
newout = build_complex(new_inputDict) # Still might take a couple minutes


          CN  :   150.00000
          rep :   500.00000
          disp:  2500.00000
          HB1 :   250.00000
          HB2 :   450.00000

          Pauling EN used:
          Z : 1  EN :  2.20
          Z : 6  EN :  2.55
          Z : 7  EN :  3.04
          Z :57  EN :  1.10
          electric field strengths (au): 0.000

           ------------------------------------------------- 
          |           Force Field Initialization            |
           ------------------------------------------------- 

          distances ...
          ----------------------------------------
          generating topology and atomic info file ...
          pair mat ...
          computing topology distances matrix with Floyd-Warshall algo ...
          making topology EEQ charges ...
          #fragments for EEQ constrain: 1
          ----------------------------------------
          generating topology and atomic info file ...
          pair mat ...
          computing topology distances matr



                 Step     Time          Energy         fmax
*Force-consistent energies used in optimization.
LBFGSLineSearch:    0 11:41:03    -3794.330140*       1.5397
LBFGSLineSearch:    1 11:41:03    -3794.699693*       1.2723
LBFGSLineSearch:    2 11:41:04    -3795.122227*       1.6351
LBFGSLineSearch:    3 11:41:04    -3795.461648*       0.7608
LBFGSLineSearch:    4 11:41:04    -3795.589232*       0.5554
LBFGSLineSearch:    5 11:41:05    -3795.747708*       0.4832
LBFGSLineSearch:    6 11:41:05    -3795.901508*       0.5108
LBFGSLineSearch:    7 11:41:05    -3796.052275*       0.5745
LBFGSLineSearch:    8 11:41:05    -3796.185933*       0.4913
LBFGSLineSearch:    9 11:41:05    -3796.293014*       0.4360
LBFGSLineSearch:   10 11:41:05    -3796.387877*       0.5030
LBFGSLineSearch:   11 11:41:06    -3796.520067*       0.6068
LBFGSLineSearch:   12 11:41:06    -3796.738611*       0.9924
LBFGSLineSearch:   13 11:41:06    -3796.842997*       0.4462
LBFGSLineSearch:   14 11:41:07    -37

Visualization should reveal the same (or near-identical) output structure:

In [17]:
view_structures(newout)

## For (C), we can avoid manually specifying that 3 terpy ligands fill the coordination environment

This is done with a simple parameter addition:

In [18]:
new_inputDict # print the dictionary for reference

{'core': {'metal': 'La', 'coreCN': 9, 'smiles': '[La]'},
 'ligands': [{'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3',
   'coordList': [3, 11, 17],
   'ligType': 'tri_mer'},
  {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3',
   'coordList': [3, 11, 17],
   'ligType': 'tri_mer'},
  {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3',
   'coordList': [3, 11, 17],
   'ligType': 'tri_mer'}],
 'parameters': {'assemble_method': 'GFN-FF',
  'n_conformers': 2,
  'return_only_1': True,
  'is_actinide': False,
  'original_metal': 'La'}}

We update the ligands definition to hold only a single copy of terpy_lig_dict_copy, and add the parameter 'fill_ligand' to indicate that the ligand used to fill the rest of the coordination sphere should be the first ligand (index 0), i.e. terpy!

In [19]:
new_inputDict['ligands'] = [terpy_lig_dict_copy]
new_inputDict['parameters']['fill_ligand'] = 0
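
Conceptually, filling the coordination sphere is simple arithmetic: the sites left open after the listed ligands are divided by the denticity of the fill ligand. The sketch below shows that bookkeeping for our case. It is an illustration under that assumption only; `copies_needed` is a hypothetical helper and does not reproduce Architector's internal logic.

```python
def copies_needed(core_cn, ligand_denticities, fill_index):
    """Return (extra copies of the fill ligand, sites left unfilled).

    core_cn: target coordination number of the metal.
    ligand_denticities: denticity of each explicitly listed ligand.
    fill_index: position of the fill ligand in that list.
    """
    occupied = sum(ligand_denticities)
    remaining = core_cn - occupied
    fill_denticity = ligand_denticities[fill_index]
    return divmod(remaining, fill_denticity)


# One tridentate terpy listed, coreCN of 9, terpy (index 0) as fill ligand.
extra_copies, unfilled_sites = copies_needed(9, [3], 0)
print(extra_copies, unfilled_sites)  # 2 0
```

Two additional terpy copies bring the total to three ligands and nine occupied sites, the same complex that was specified explicitly before.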

We can also request the complexes to not be relaxed to save additional time with the parameter 'relax' set to False. This will result in slightly less accurate geometries, so be a bit more careful here:

In [20]:
new_inputDict['parameters']['relax'] = False
new_inputDict

{'core': {'metal': 'La', 'coreCN': 9, 'smiles': '[La]'},
 'ligands': [{'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3',
   'coordList': [3, 11, 17],
   'ligType': 'tri_mer'}],
 'parameters': {'assemble_method': 'GFN-FF',
  'n_conformers': 2,
  'return_only_1': True,
  'is_actinide': False,
  'original_metal': 'La',
  'fill_ligand': 0,
  'relax': False}}

Looks good, and definitely simpler than the initial version of the inputDict that we created! Now on to building (again)!

In [21]:
newout1 = build_complex(new_inputDict) # Still might take a couple minutes


          CN  :   150.00000
          rep :   500.00000
          disp:  2500.00000
          HB1 :   250.00000
          HB2 :   450.00000

          Pauling EN used:
          Z : 1  EN :  2.20
          Z : 6  EN :  2.55
          Z : 7  EN :  3.04
          Z :57  EN :  1.10
          electric field strengths (au): 0.000

           ------------------------------------------------- 
          |           Force Field Initialization            |
           ------------------------------------------------- 

          distances ...
          ----------------------------------------
          generating topology and atomic info file ...
          pair mat ...
          computing topology distances matrix with Floyd-Warshall algo ...
          making topology EEQ charges ...
          #fragments for EEQ constrain: 1
          ----------------------------------------
          generating topology and atomic info file ...
          pair mat ...
          computing topology distances matr


  atom   neighbors  erfCN metchar sp-hybrid imet pi  qest     coordinates
    1  La      6    4.40   0.02         0    2    0   0.430   -0.000000    0.000000    0.000000
    2  C       3    2.82   0.00         2    0    1   0.003   -3.371034    5.828949    7.329424
    3  C       3    2.80   0.00         2    0    1   0.007   -4.364569    3.414277    7.145246
    4  C       3    2.95   0.00         2    0    1   0.023   -3.484941    1.885367    5.228205
    5  N       3    2.79   0.00         2    0    1  -0.130   -1.721624    2.613544    3.537055
    6  C       3    3.06   0.00         2    0    1   0.053   -0.703862    5.003327    3.678234
    7  C       3    2.81   0.00         2    0    1   0.001   -1.540722    6.626304    5.597595
    8  C       3    3.06   0.00         2    0    1   0.053    1.292372    5.826842    1.763210
    9  C       3    2.81   0.00         2    0    1   0.002    2.412179    8.207841    1.777594
   10  C       3    2.81   0.00         2    0    1   0.001  



          rings ...
          # BATM   1842
          # H in HB   33
          doing iterative Hueckel for 3 subsystem(s) ...

  atom   neighbors  erfCN metchar sp-hybrid imet pi  qest     coordinates
    1  La      9    4.41   0.00         0    2    0   0.441   -0.000000    0.000000    0.000000
    2  C       3    2.82   0.00         2    0    1  -0.007   -3.371034    5.828949    7.329424
    3  C       3    2.80   0.00         2    0    1  -0.001   -4.364569    3.414277    7.145246
    4  C       3    2.95   0.00         2    0    1   0.018   -3.484941    1.885367    5.228205
    5  N       3    2.79   0.00         2    0    1  -0.150   -1.721624    2.613544    3.537055
    6  C       3    3.06   0.00         2    0    1   0.047   -0.703862    5.003327    3.678234
    7  C       3    2.81   0.00         2    0    1  -0.004   -1.540722    6.626304    5.597595
    8  C       3    3.06   0.00         2    0    1   0.048    1.292372    5.826842    1.763210
    9  C       3    2.81   0.00

In [22]:
view_structures(newout1)

# Conclusions!

In this tutorial we learned:

**(A)** How to manually identify coordination sites of new ligands for generation in Architector.

**(B)** How to automatically and manually identify ligand types (geometries!).

**(C)** How to use internal commands to simplify inputs for more complex coordination environments!