# 2 - Specifying New/More Complex Ligands

So far (from 1-Introduction/Overview.ipynb) we know how to specify basic inputs and understand some of the basic outputs from Architector.

What about new or unknown systems, including more complex ligands?

Architector includes tools for handling these cases manually, along with some SMILES utilities!

In this tutorial we will learn:

**(A)** How to manually identify coordination sites of new ligands for generation in Architector.

**(B)** How to automatically and manually identify ligand types (geometries!).

**(C)** How to use internal commands to simplify inputs for more complex coordination environments!

## For (A), we need a challenge. Let's try a [La(Terpyridine)<sub>3</sub>]<sup>3+</sup> complex.

But what is the SMILES for Terpyridine (Terpy, for short), and how is it coordinated to a metal center?

The SMILES can be tracked down on [Wikipedia: here](https://en.wikipedia.org/wiki/Terpyridine), giving: "c1ccnc(c1)c2cccc(n2)c3ccccn3"

However, what are the coordinating atoms?

Here, we turn to useful routines included in Architector:

In [None]:
import architector
from architector import (build_complex, # Build routine
                         view_structures, # Visualization
                         smiles2Atoms) # Smiles utility to ASE atoms

We will also initialize the metal and ligand smiles for La/Terpy:

In [None]:
terpy_smiles = 'c1ccnc(c1)c2cccc(n2)c3ccccn3'
metal = 'La'

Next, the smiles2Atoms utility converts our terpy smiles to [ASE atoms](https://wiki.fysik.dtu.dk/ase/ase/atoms.html) for visualization purposes.

In [None]:
terpy_atoms = smiles2Atoms(terpy_smiles)

### Next, we visualize with labelled indices for identification of ligand-metal coordinating atoms (CAs)

We already know the view_structures command, but there are a couple additional parameters that can be useful for this:

**(i)** The labelinds=True option adds overlays with the exact indices of the atoms as used by Architector

**(ii)** The size of the visualization can be adjusted using the w (width) and h (height) parameters (default is 200x200)

With these two additions we can visualize the ligand structure for identification of CAs:

In [None]:
view_structures(terpy_atoms,labelinds=True,w=500,h=500) 

### Visually, we can identify that the CAs will be the nitrogen atoms (blue atoms) at indices 3, 11, and 17.

We can now save these indices for building the complexes!

In [None]:
terpy_coordList = [3,11,17]
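
As a quick cross-check on the visual assignment, the sketch below lists the positions of nitrogen atoms in the order they are written in the SMILES string. It assumes that the heavy-atom indices shown by view_structures follow the SMILES writing order (which agrees with the indices read off above for terpy, but is not guaranteed in general). The helper `heavy_atom_symbols` is a hypothetical, simplified tokenizer and not part of Architector, so the labelled visualization remains the authoritative check.

```python
import re

# One atom token: a bracket atom, a two-letter halogen, or a single-letter
# organic-subset atom (upper case, or lower case for aromatic atoms).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

def heavy_atom_symbols(smiles):
    """Hypothetical helper: element symbols in the order written in the SMILES."""
    symbols = []
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            # Bracket atom: skip an optional isotope number, then read the element.
            symbol = re.match(r"\[\d*([A-Za-z][a-z]?)", token).group(1)
        else:
            symbol = token
        symbols.append(symbol.capitalize())
    return symbols

terpy_smiles = 'c1ccnc(c1)c2cccc(n2)c3ccccn3'
symbols = heavy_atom_symbols(terpy_smiles)

# Assumption: nitrogen atoms are the candidate coordinating atoms for terpy.
nitrogen_indices = [i for i, s in enumerate(symbols) if s == 'N']
print(nitrogen_indices)  # [3, 11, 17]
```

This only narrows down candidates by element; whether a given nitrogen actually binds the metal still has to be judged from the structure.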

## Now, for (B), identifying ligand types, we have 2 different methods:

**(i)** Automatically

**(ii)** Manually

For **(i)**, all we need to do is input ligand dictionaries without a specified ligType! So we functionally already have enough information to generate the [La(Terpyridine)<sub>3</sub>]<sup>3+</sup> complex!

In [None]:
terpy_ligand_dict = {'smiles':terpy_smiles,
                    'coordList':terpy_coordList}

And the full input dictionary (including 3 terpy ligands!):

In [None]:
inputDict = {'core':{'metal':metal,'coreCN':9},
            'ligands':[terpy_ligand_dict]*3,
            'parameters':{'assemble_method':'GFN-FF', # Switch to GFN-FF for faster assembly,
                          # but still use GFN2-xTB for the final relaxation. Will have more printout.
                          'n_conformers':2, # Test 2 different conformers
                          'return_only_1':True # Return just one
                         }}
inputDict # Print out full input Dictionary
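
Before launching a slow build, it can help to confirm that the listed ligands account for every coordination site. The sketch below is a minimal sanity check, assuming each ligand occupies one site per entry in its coordList; `occupied_sites` is a hypothetical helper, not an Architector function, and the dictionaries are stand-in copies of the inputs above so the block runs on its own.

```python
# Stand-in copies of the tutorial inputs so this block is self-contained.
terpy_ligand = {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3',
                'coordList': [3, 11, 17]}
example_input = {'core': {'metal': 'La', 'coreCN': 9},
                 'ligands': [terpy_ligand] * 3}

def occupied_sites(input_dict):
    """Hypothetical helper: count sites claimed by the listed ligands."""
    return sum(len(lig['coordList']) for lig in input_dict['ligands'])

n_sites = occupied_sites(example_input)
core_cn = example_input['core']['coreCN']
print(f"{n_sites} of {core_cn} coordination sites are filled")
```

Here three tridentate terpy ligands account for all 9 sites requested by coreCN.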

Looks good! Now we build the complex using Architector - Note that this might take a couple of minutes:

In [None]:
out = build_complex(inputDict) # Might take a couple minutes

And we can again visualize the structures:

In [None]:
view_structures(out)

### Should look great!

However, this took a bit of time.

What was the ligand type assigned automatically? It is in the output text of the build_complex cell, and it should be "tri_mer". This is short for [tridentate meridional](https://www.coursehero.com/study-guides/introchem/isomers-in-coordination-compounds/), which we likely could have identified manually!

To do **(ii)**, manual identification, we have a tool in the documentation for visualizing all ligand types that we are replicating here for tridentates:
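
To make the "meridional" label concrete: in an idealized meridional arrangement the two outer donor atoms sit opposite each other across the metal, while in a facial arrangement all three donors are about 90° apart. The sketch below computes the largest donor-metal-donor angle for idealized octahedral site positions. The coordinates and helper names are illustrative assumptions, and this is not how Architector assigns ligType.

```python
import math
from itertools import combinations

def angle_deg(center, a, b):
    """Angle a-center-b in degrees."""
    va = [p - c for p, c in zip(a, center)]
    vb = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # Guard against rounding error
    return math.degrees(math.acos(cos_angle))

def largest_donor_angle(metal_xyz, donor_xyzs):
    """Largest donor-metal-donor angle over all donor pairs."""
    return max(angle_deg(metal_xyz, a, b) for a, b in combinations(donor_xyzs, 2))

metal_xyz = (0.0, 0.0, 0.0)
# Idealized octahedral sites (unit distance), for illustration only:
mer_donors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]  # Three sites in one plane
fac_donors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]   # Three sites on one face

mer_angle = largest_donor_angle(metal_xyz, mer_donors)
fac_angle = largest_donor_angle(metal_xyz, fac_donors)
print(round(mer_angle), round(fac_angle))
```

Real complexes deviate from these ideal angles, so a large outer-donor angle is a hint rather than a definition.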

In [None]:
import pandas as pd # Pandas is used to read in the reference data
import numpy as np # Numpy is used for selecting from the database
import architector # Architector is used for importing the filepath to the reference data

In [None]:
# Pull out the datapath for the ligand reference structures:
import os # Used to build the path portably across operating systems
ref_data_path = os.path.join(os.path.dirname(architector.__file__), 'data', 'angle_stats_datasource.csv')
ref_data_path

For the utility we need a defined denticity - since we have a ligand with 3 CAs - it is tridentate!

In [None]:
denticity = 3

### Now, we can read in and visualize the data

In [None]:
# Read in reference data for examples.
ligdf = pd.read_csv(ref_data_path)
# Show the reference data!
print('Showing examples of each ligand label!')
print('Note that "m" indicates the metal in each - some will not show if M-L bonds are longer than cutoff radii.')
print('####################################################################################')
ligtypes = ligdf.geotype_label.value_counts().index.values
cns = [ligdf[ligdf.geotype_label == x].cn.values[0] for x in ligtypes]
order = np.argsort(cns)
for i in order:
    if cns[i] == denticity: # Only Pick out Tri Dentates
        print("Ligand label - 'ligType':", "'" + ligtypes[i] + "'")
        print('Ligand denticity: ', int(cns[i]))
        # Sample 4 structures matching these labels:
        tdf = ligdf[ligdf.geotype_label == ligtypes[i]].sample(4,random_state=42) 
        # Visualize the structures:
        view_structures(tdf.xyz_structure,labels=['m']*4)
        print('####################################################################################')

## Here, we can manually see that "tri_mer" or "tri_mer_bent" are possible labels for terpy!

Now we can add this information to the terpy ligands dictionary manually to accelerate generation:

In [None]:
import copy

terpy_lig_dict_copy = copy.deepcopy(terpy_ligand_dict) # Copy terpy ligand dict

terpy_lig_dict_copy['ligType'] = 'tri_mer' # Add ligType manually!

And copy the inputDict to update with manual label:

In [None]:
new_inputDict = copy.deepcopy(inputDict) # Copy inputDict

new_inputDict['ligands'] = [terpy_lig_dict_copy]*3 # Update ligands field with new terpy_dict
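
One Python detail worth knowing about the `[...]*3` pattern: list multiplication repeats references to the same dictionary rather than making copies. That is harmless here because all three terpy ligands are meant to be identical, but the sketch below (plain Python, using stand-in variable names) shows the difference in case the ligands ever need to be edited individually.

```python
import copy

ligand = {'smiles': 'c1ccnc(c1)c2cccc(n2)c3ccccn3', 'coordList': [3, 11, 17]}

shared = [ligand] * 3                                    # Three references to ONE dict
independent = [copy.deepcopy(ligand) for _ in range(3)]  # Three separate dicts

# Editing the first entry of each list:
shared[0]['ligType'] = 'tri_mer'
independent[0]['ligType'] = 'tri_mer'

print('ligType' in shared[2])       # True: every entry is the same object
print('ligType' in independent[2])  # False: the other copies are untouched
```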

Finally, rebuild the complex. Note that this will still likely be a bit slow, since lanthanides tend to take longer with xTB.

In [None]:
newout = build_complex(new_inputDict) # Still might take a couple minutes

Visualization should reveal the same (or near-identical) output structure:

In [None]:
view_structures(newout)

## For (C), we can reduce the necessity of manually specifying that 3 terpy ligands are filling the coordination environment

This is done with a simple parameter addition:

In [None]:
new_inputDict # print the dictionary for reference

We update the ligands definition to hold only a single copy of terpy_lig_dict_copy, and add the parameter 'fill_ligand' to indicate that the ligand used to fill the coordination sphere should be the first ligand (index 0), which is terpy!

In [None]:
new_inputDict['ligands'] = [terpy_lig_dict_copy]
new_inputDict['parameters']['fill_ligand'] = 0
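
The arithmetic behind filling the coordination sphere can be sketched as follows. This only illustrates the counting for a single ligand type and is not Architector's internal logic; `fill_copies` is a hypothetical helper, and how Architector treats any leftover sites is not covered here.

```python
def fill_copies(core_cn, denticity):
    """Hypothetical helper: (whole copies that fit, sites left over)."""
    return divmod(core_cn, denticity)

# La with coreCN = 9 and a tridentate terpy ligand:
copies, leftover = fill_copies(core_cn=9, denticity=3)
print(copies, leftover)  # 3 0
```

So a single terpy entry plus 'fill_ligand' describes the same three-terpy complex as the explicit list of three ligands.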

We can also request the complexes to not be relaxed to save additional time with the parameter 'relax' set to False. This will result in slightly less accurate geometries, so be a bit more careful here:

In [None]:
new_inputDict['parameters']['relax'] = False
new_inputDict

Looks good, and definitely simpler than the initial version of the inputDict that we created! Now onto building (again)!

In [None]:
newout1 = build_complex(new_inputDict) # Still might take a couple minutes

In [None]:
view_structures(newout1)

# Conclusions!

In this tutorial we learned:

**(A)** How to manually identify coordination sites of new ligands for generation in Architector.

**(B)** How to automatically and manually identify ligand types (geometries!).

**(C)** How to use internal commands to simplify inputs for more complex coordination environments!