# Welcome to **metal_complex**

- ## **Introduction**

The study of complexes is an important facet of chemistry. Chemical complexes have many fields of application, such as catalysis, bioorganic chemistry and organometallic chemistry. Complexes are usually formed from a central metal bound to several ligands. Although this may seem simple, complexes are sometimes difficult to study: ligands can be large molecules, which makes it hard to visualize the complex and to study its properties. A key property in the study of complexes is the oxidation state of the central metal, which can influence both the 3D structure and the properties of the complex (for example, the oxidation state of a metal can determine whether the complex is high or low spin and thus which UV-visible transitions are possible).

This report details the development and execution of a program whose purpose is to simplify the study of complexes by allowing the 3D visualization of the complexes, obtaining the oxidation state of the central metal and calculating the molar mass of the complex. 


- ## **Objectives**

The aim of this project was to develop a program that can:
1. Build an interactive 3D structure of a complex from SMILES inputs
2. Determine the oxidation state of the central metal atom of the complex
3. Determine the molecular weight of sophisticated complexes

- ## **Tools and data used**

JupyterLab was the environment used to write the notebooks and the code.

The following packages were used and are required to run the code:

- pandas: a software library to shape and manipulate the data.
- rdkit: a software library for molecular modeling, analysis and design.
- py3Dmol: a widget to embed an interactive 3D molecular viewer in a notebook.
- tkinter: a module for creating and managing graphical user interfaces.
- re (RegEx): a module that implements regular expression search.

The following data were used: 

- ligands_misc_info.csv: used to find the bonding atom of each ligand and the corresponding ligand number. The data can be found at the following URL: [ligands_misc_info.csv](https://raw.githubusercontent.com/hkneiding/tmQMg-L/main/ligands_misc_info.csv)
- ligands_fingerprints.csv: used to find the charge of each ligand. The data can be found at the following URL: [ligands_fingerprints.csv](https://raw.githubusercontent.com/hkneiding/tmQMg-L/main/ligands_fingerprints.csv)
- oxydation_states_métaux.csv: used to check whether the computed oxidation state of the metal is possible. The data can be found at the following URL: [oxydation states métaux](https://github.com/sermetsim/metal_complex/blob/main/data/oxydation%20states%20m%C3%A9taux.csv)

- ## **Creation Process**

### 1. Model the 3D metal complex

The first and most difficult step was to model the complex in 3D. The problem encountered was that the py3Dmol library could not draw the given complex correctly, as the structure was too large and intricate. To resolve this, a function determining the coordinates of each atom of each ligand was written. However, a problem occurred when trying to connect these atoms: the program would sometimes connect the wrong atoms together or violate valence rules (e.g., a hydrogen atom with two bonds).

Therefore, another solution was found. Instead of connecting them directly in 3D, the ligands and the metal were grouped in the same 2D plane. With the help of a database, the coordinates of the linkage atoms of the ligands could be determined and bound to the metal. Once the final complex was formed in 2D, it could easily be transformed into 3D using RDKit functions.
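
The idea of joining a ligand to the metal before generating 3D coordinates can be illustrated with a minimal RDKit sketch. It is not the package's implementation: the helper name `attach_ligand`, the use of a dative bond, and the choice of donor atom are assumptions made for illustration only.

```python
from rdkit import Chem


def attach_ligand(metal_smiles, ligand_smiles, donor_idx):
    """Combine one ligand with a metal atom and bond the donor atom to the metal.

    Hypothetical helper: donor_idx is the index of the bonding atom in the
    ligand, counted without hydrogens.
    """
    metal = Chem.MolFromSmiles(metal_smiles)
    ligand = Chem.MolFromSmiles(ligand_smiles)

    # CombineMols keeps the atoms of the first molecule first, so the metal
    # is atom 0 and every ligand index is shifted by the metal atom count.
    combined = Chem.RWMol(Chem.CombineMols(metal, ligand))
    offset = metal.GetNumAtoms()

    # Assumption: the ligand donates an electron pair, modeled as a dative
    # bond pointing from the donor atom to the metal.
    combined.AddBond(donor_idx + offset, 0, Chem.BondType.DATIVE)

    mol = combined.GetMol()
    Chem.SanitizeMol(mol)
    return mol


# One ethylenediamine nitrogen bound to iron
complex_mol = attach_ligand('[Fe]', 'NCCN', donor_idx=0)
print(Chem.MolToSmiles(complex_mol))
```

Repeating the bonding step for every donor atom of every ligand would give the full connectivity, after which 3D coordinates can be requested from RDKit. Embedding can fail for metal complexes, so its return value should be checked.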

### 2. Calculate the molecular weight and oxidation state

The second step involved calculating the molecular weight and oxidation state. For the molecular weight, each SMILES string (the metal and every ligand) was converted into an RDKit mol object with *Chem.MolFromSmiles()*, from which its molecular weight was computed. The molecular weights of the individual SMILES were then summed to obtain the total molecular weight.
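
As an illustration, the summation can be sketched as follows. In RDKit, `Chem.MolFromSmiles()` only parses the SMILES; the weight itself comes from a descriptor such as `Descriptors.MolWt`. Whether the package uses this exact descriptor is an assumption, and `total_molecular_weight` is a hypothetical name, not a function of the package.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def total_molecular_weight(ligand_smiles, metal_smiles):
    """Sum the average molecular weights of the metal and of every ligand."""
    total = 0.0
    for smiles in [metal_smiles, *ligand_smiles]:
        mol = Chem.MolFromSmiles(smiles)  # parse the SMILES into a mol object
        if mol is None:
            raise ValueError(f"Could not parse SMILES: {smiles!r}")
        total += Descriptors.MolWt(mol)  # implicit hydrogens are included
    return round(total, 3)


# Iron with three ethylenediamine ligands
print(total_molecular_weight(['NCCN', 'NCCN', 'NCCN'], '[Fe]'))
```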

For the oxidation state, the total charge of the ligands was determined with the help of a database containing the charges of various ligands. The oxidation state of the metal is then the total charge of the complex minus the total charge of the ligands. However, a problem arose in a few cases where the metal preferred one oxidation state over another, even though it resulted in a charged complex instead of a neutral one (e.g., PtCl4, which prefers to be negatively charged even though Pt can have an oxidation state of IV). To address this issue, the total charge of the complex must be known and is therefore requested from the user.
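
The oxidation state follows from a simple charge balance: the charge of the complex minus the sum of the ligand charges. The sketch below illustrates that arithmetic and an optional plausibility check. The function name and the set of allowed states are illustrative assumptions, not the package's API or the contents of its CSV file.

```python
def oxidation_state_from_charges(complex_charge, ligand_charges, allowed_states=None):
    """Return the metal oxidation state implied by the charge balance.

    allowed_states is an optional collection of plausible oxidation states
    for the metal (hypothetical stand-in for a lookup table).
    """
    state = complex_charge - sum(ligand_charges)
    if allowed_states is not None and state not in allowed_states:
        raise ValueError(f"Oxidation state {state:+d} is not plausible for this metal")
    return state


# [PtCl4]2-: four chloride ligands of charge -1, overall charge -2
print(oxidation_state_from_charges(-2, [-1, -1, -1, -1]))

# [Fe(en)3]3+: three neutral ethylenediamine ligands, overall charge +3
illustrative_fe_states = {2, 3}  # assumption: small illustrative subset only
print(oxidation_state_from_charges(3, [0, 0, 0], illustrative_fe_states))
```

With an overall charge of -2, the same four chlorides give Pt(II), whereas a neutral PtCl4 would give Pt(IV). This is why the total charge has to be supplied by the user.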

### 3. Interface 

A Tkinter interface was created to allow the user to input their desired ligands and metal (as SMILES), as well as the global charge of the final complex. It also allows the user to choose what information should be returned (the 3D complex, the molecular weight, the oxidation state, or all of them). The results of the calculations are displayed within the interface; the 3D visualization, however, is rendered in the Jupyter Notebook, because the interactive view could not be embedded in the Tkinter interface.


- ## **What can this Python package achieve?**

### Initialisation

First of all, the functions must be imported.

In [1]:
import sys
import os
notebook_path = os.getcwd()
src_path = os.path.abspath(os.path.join(notebook_path, "../src/metal_complex"))
sys.path.insert(0, src_path)
from metal_functions import *

### <u>Model a 3D metal complex</u>
This package offers a few functions that are useful to chemists. The first is the 3D visualization of a metal complex, as the package name suggests. The following code shows a complex with coordination number six, $[Fe(en)_{3}]^{3+}$:

In [5]:
list_of_ligand_SMILES = ['N([H])([H])C([H])([H])C([H])([H])N([H])[H]','N([H])([H])C([H])([H])C([H])([H])N([H])[H]','N([H])([H])C([H])([H])C([H])([H])N([H])[H]']
metal_SMILES = '[Fe]'
metal_complex(list_of_ligand_SMILES, metal_SMILES)

#### Errors
As you can see, the molecule is interactive. RDKit does not recognize some atoms (`error UFFTYPER`), but this does not affect the viewer. However, this function does not work with all ligands: firstly, it can return an impossible molecule with bad geometry, and secondly, the result may change if the code is run twice:

In [8]:
list_of_ligand_SMILES = ['[Cl]','[Cl]','[Cl]','[Cl]','[Cl]','[Cl]']
metal_SMILES = '[Ni]'
metal_complex(list_of_ligand_SMILES, metal_SMILES)

In this example, the molecule is not octahedral and the Ni-Cl bonds have different lengths.  
This is not the only problem: in the data used, <u>[ligands_misc_info.csv](https://raw.githubusercontent.com/hkneiding/tmQMg-L/main/ligands_misc_info.csv)</u>, some ligands have missing entries. For example, octane does not appear in the data, so the call returns an error because some functions cannot work without a correct idx_list.

In [9]:
list_of_ligand_SMILES = ['[C]([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]']
metal_complex(list_of_ligand_SMILES, metal_SMILES)

#### Test functions

Another issue with these functions concerns testing. For example, the function smiles_to_ligand transforms a list of SMILES into a list of mol objects, but the same SMILES does not yield the same mol object twice:

In [9]:
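from rdkit import Chem  # explicit import, in case the wildcard import above does not expose Chem

mol1 = Chem.MolFromSmiles('C')
mol2 = Chem.MolFromSmiles('C')
print(mol1, mol2)
# The two objects are distinct, so they do not compare equal
if mol1 != mol2:
    print('Not the same mol object')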

As a consequence, the following functions (smiles_to_ligand, create_molecule_in_3D, metal_complex and show_complex) could not be tested with a direct equality check, because two mol objects built from the same SMILES are distinct objects and the exact object returned cannot be predicted.
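
One possible workaround, which is not part of the package, is to compare a derived, deterministic representation instead of the objects themselves. The sketch below compares canonical SMILES; `same_molecule` is a hypothetical helper name.

```python
from rdkit import Chem


def same_molecule(mol_a, mol_b):
    """Compare two mol objects by canonical SMILES instead of object identity."""
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)


mol1 = Chem.MolFromSmiles('C')
mol2 = Chem.MolFromSmiles('C')
print(same_molecule(mol1, mol2))

# Different spellings of the same molecule (ethanol) also compare equal
print(same_molecule(Chem.MolFromSmiles('OCC'), Chem.MolFromSmiles('CCO')))
```

A test could then assert on the canonical SMILES, the atom count or the bond count of the returned mol object. This checks connectivity only, not the 3D geometry.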

### <u>Give the oxidation state</u>

This package can calculate the oxidation state of the metal center of a complex. The following code calculates it for $[Fe(en)_{3}]^{3+}$:

In [9]:
ligands = ['N([H])([H])C([H])([H])C([H])([H])N([H])[H]','N([H])([H])C([H])([H])C([H])([H])N([H])[H]','N([H])([H])C([H])([H])C([H])([H])N([H])[H]']
metal_SMILES = '[Fe]'
charge = 3
print(metal_oxydation_state(charge, total_charge_of_the_ligands(smile_to_number(ligands)), metal_SMILES))

This program can also tell whether the oxidation state of the metal is impossible. The following code tries to calculate the oxidation state of the iron in the complex $[Fe(en)_{3}]^{9+}$. This complex does not exist, because such an oxidation state is impossible for iron, and the program should report it:

In [13]:
ligands = ['N([H])([H])C([H])([H])C([H])([H])N([H])[H]','N([H])([H])C([H])([H])C([H])([H])N([H])[H]','N([H])([H])C([H])([H])C([H])([H])N([H])[H]']
metal_SMILES = '[Fe]'
charge = 9
print(metal_oxydation_state(charge, total_charge_of_the_ligands(smile_to_number(ligands)), metal_SMILES))

#### Errors
Nevertheless, some ligands are not in the database, so errors can occur, as can be seen here with octane:

In [11]:
ligands = ['[C]([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]', '[C]([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]', '[C]([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]', '[C]([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]', '[C]([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]', '[C]([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]']
metal_SMILES = '[Fe]'
charge = 0
print(metal_oxydation_state(charge, total_charge_of_the_ligands(smile_to_number(ligands)), metal_SMILES))

Here the program replies "Sorry your ligand is invalid" although the SMILES are correct. The program fails to distinguish between a valid SMILES whose ligand is not in the database and a SMILES that is simply invalid.
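
The two cases could be told apart by checking the SMILES with RDKit before looking it up, since `Chem.MolFromSmiles()` returns `None` for a string it cannot parse. This is only a sketch: `classify_ligand` and the small `known` set are hypothetical stand-ins, not the package's functions or the tmQMg-L data.

```python
from rdkit import Chem


def classify_ligand(smiles, known_canonical_smiles):
    """Tell an unparsable SMILES apart from a valid ligand missing from the database."""
    mol = Chem.MolFromSmiles(smiles)  # None when the SMILES cannot be parsed
    if mol is None:
        return "invalid SMILES"
    if Chem.MolToSmiles(mol) not in known_canonical_smiles:
        return "valid SMILES, ligand not in database"
    return "known ligand"


# Hypothetical stand-in for the ligand database, stored as canonical SMILES
known = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in ["NCCN", "[Cl-]"]}

for candidate in ["NCCN", "CCCCCCCC", "C(C"]:
    print(candidate, "->", classify_ligand(candidate, known))
```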

### <u>Give the molecular mass</u>

A simple function in the package determines the mass of the complex from the SMILES of all the ligands and of the metal.  
The result is rounded to 3 decimals and is returned even if the molecule is not displayed correctly by the <u>metal_complex</u> function.

In [24]:
list_of_ligand_SMILES = ['[Cl]','[Cl]','[Cl]','[Cl]','[Cl]','[Cl]']
metal_SMILES = '[Ni]'
print(calculate_MO(list_of_ligand_SMILES, metal_SMILES))

list_of_ligand_SMILES = ['c1ccccc1','c1ccccc1','c1ccccc1','c1ccccc1']
metal_SMILES = '[Pt]'
print(calculate_MO(list_of_ligand_SMILES, metal_SMILES))

- ## **Tkinter interface**

A Tkinter interface has been created (file: <u>[src/metal_complex/interface_project.py](https://github.com/sermetsim/metal_complex/blob/main/src/metal_complex/interface_project.py)</u>). When the script is run an interface appears with the following elements:
- 6 ligand entries
- 1 metal SMILES entry
- 1 total complex charge entry
- 1 'Submit' button
- 3 checkboxes

By selecting the corresponding checkboxes, the oxidation state of the metal and the molecular mass of the complex can be shown directly in the interface. However, the 3D metal complex cannot be shown directly in the interface because it is an interactive 3D object.

Here is the code to run the interface:

In [5]:
%run ../src/metal_complex/interface_project.py

- ## **Challenges faced**

### <u>Create a 3D metal complex</u>

At the beginning, the aim was to convert the SMILES of a metal complex directly into a mol object. However, these types of molecules do not always have a SMILES.  
The next step was to retrieve the XYZ files from the <u>[tmQMg-L](https://github.com/hkneiding/tmQMg-L)</u> dataset and to create bonds between atoms that are close to each other. But another problem arose: some hydrogen atoms were close enough to the metal to be bonded to it, which caused many valence problems.  
Finally, the best approach was to use the Chem.CombineMols(mol1, mol2) function of RDKit to combine the ligand mol objects with the metal atom through the MolBlock file, and to create 3D coordinates with AllChem.EmbedMolecule(mol).

With this method, the only requirement was to know which atoms bond to the metal. The previous dataset also provides this information, but its bonding-atom indices count the H atoms, which do not appear explicitly in the MolBlock. Therefore, the function simplify_idx was used to obtain the indices of the atoms without counting the H atoms.
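
The index conversion can be illustrated with a small pure-Python sketch. It is not the package's simplify_idx: the function name `heavy_atom_index` and the atom ordering below are hypothetical, and it assumes that the heavy atoms keep their relative order once the hydrogens are removed.

```python
def heavy_atom_index(symbols, index_with_h):
    """Map an atom index that counts hydrogens to the index among non-hydrogen atoms.

    symbols lists the element symbols in the order of the hydrogen-inclusive
    numbering (hypothetical input format).
    """
    if symbols[index_with_h] == 'H':
        raise ValueError("The bonding atom is a hydrogen and has no heavy-atom index")
    # Count the non-hydrogen atoms that come before the requested atom
    return sum(1 for symbol in symbols[:index_with_h] if symbol != 'H')


# Ethylenediamine with explicit hydrogens, in an illustrative atom order
symbols = ['N', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'N', 'H', 'H']

# The two nitrogen donors sit at positions 0 and 9 when hydrogens are counted
print([heavy_atom_index(symbols, i) for i in (0, 9)])
```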

Of course, many minor issues came up during the coding, such as understanding how to manually create a bond between two atoms or how to extract the relevant information from the given dataset.

### <u>Find the oxidation state</u>

The difficulty in this part was to find a usable database of the possible oxidation states of metals. A GitHub repository on the subject was found, but its data was too hard to manipulate, so a .csv file was created to meet the needs of the project. This file is oxydation_states_métaux.csv.

- ## **Conclusion**

This Python package simplifies the study of chemical complexes by providing tools for 3D visualization, molecular weight calculation, and oxidation state determination. The Tkinter interface allows users to input ligands and metals (as SMILES) and choose the desired output, enhancing usability.

Despite challenges in modeling complex structures, the approach of transforming 2D planes into 3D models using RDKit functions proved effective. Limitations include the constraints of the database and the inability to create interactive 3D images directly in Tkinter, which are instead rendered in a Jupyter Notebook.

Overall, the project successfully aids in visualizing and analyzing chemical complexes. Future improvements could enhance model accuracy and expand the ligand database to reduce errors. This tool is valuable for chemists in both research and education.

- ## **Further development**

A future development of this program would be to determine the number of valence electrons of the central metal in order to predict which reactions are possible with the complex. By concentrating on the field of organometallic chemistry, it would be possible to determine which of the following reactions are possible or most likely:

- Ligand association
- Ligand dissociation
- Oxidative addition
- Reductive elimination
- Migratory insertion
- β-H elimination

This would be useful for studying certain reactions and the catalytic properties of complexes and metals.  
Another possible development would be to show the 3D metal complex directly in the Tkinter interface or, if that is not possible with Tkinter, in another interface.