# 1. Introduction:

The aim of this project was the visualisation and analysis of organic molecules. In this context, the relative reactivity of molecules and their respective active sites seemed an interesting feature to analyze. Electrophilicity and nucleophilicity are fundamental concepts in organic chemistry that describe the reactivity of molecules. Electrophilicity refers to the ability of a molecule or ion to accept an electron pair, making it an electron-loving species (electrophile). Electrophiles typically have a positive charge, partial positive charge, or an electron-deficient atom, making them attracted to electron-rich regions. On the other hand, nucleophilicity describes the ability of a molecule or ion to donate an electron pair, making it an electron-rich species (nucleophile). Nucleophiles are usually negatively charged or have lone pairs of electrons, such as anions, amines, and alcohols. The interaction between nucleophiles and electrophiles drives many chemical reactions, particularly in organic synthesis, where nucleophiles attack electrophiles to form new bonds. 

# 2. Project Functionality, Results and Limitations:

## Ranking

### Functionality and Results

The first feature ranks a list of SMILES strings by reactivity. Each SMILES string is read by RDKit, which generates 3D coordinates and writes them to an XYZ file. From this file, the global descriptor requested by the user (E for electrophilicity or N for nucleophilicity) is extracted for each molecule. The molecules are then ranked in order of decreasing reactivity. Each calculation is repeated several times and the results are averaged. The global descriptors generated by morfeus for electrophilicity and nucleophilicity are calculated, respectively, as follows:

<div style="text-align: center;">
    <img src="https://raw.githubusercontent.com/fracaludo/RankChem/main/notebooks/images/glob_desc.png" alt="Global Descriptors" width ="200">
</div>

Where *IP* is the ionization potential and *EA* the electron affinity.
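For readers who cannot load the image, the two expressions can be sketched in a few lines of Python. This sketch assumes the definitions given in the morfeus documentation for its xtb interface, namely an electrophilicity index of (IP + EA)² / (8 (IP − EA)) and a nucleophilicity of −IP. The function names and the numerical values are purely illustrative and are not part of RankChem or morfeus.

```python
def electrophilicity(ip: float, ea: float) -> float:
    """Global electrophilicity index from IP and EA (same energy unit for both)."""
    if ip == ea:
        raise ValueError("IP and EA must differ, otherwise the index is undefined")
    return (ip + ea) ** 2 / (8 * (ip - ea))


def nucleophilicity(ip: float) -> float:
    """Global nucleophilicity, assumed here to be the negative of IP."""
    return -ip


# Arbitrary illustrative values, not computed for any real molecule
example_ip, example_ea = 10.0, 2.0
omega = electrophilicity(example_ip, example_ea)
n_index = nucleophilicity(example_ip)
print(f"electrophilicity = {omega}, nucleophilicity = {n_index}")
```

With these definitions, a lower ionization potential gives a higher (less negative) nucleophilicity, which is why the ranking sorts in decreasing order of the descriptor.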

Parameters:
- List of SMILES strings
- Descriptor type (E or N)
- Number of iterations used for averaging

Output:
- Ranked SMILES strings
- Corresponding global descriptor values
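The averaging and ranking steps can be illustrated independently of xtb. The sketch below is one possible way to do it, not the actual RankChem implementation: `average_descriptor`, `rank_by_descriptor` and `fake_calculator` are hypothetical names, and the stand-in calculator simply returns the length of the SMILES string so that the example runs without any chemistry package.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple


def average_descriptor(
    smiles_list: List[str],
    compute_once: Callable[[str], float],
    iterations: int,
) -> Dict[str, float]:
    """Call the calculator `iterations` times per molecule and average the values."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return {
        smiles: mean(compute_once(smiles) for _ in range(iterations))
        for smiles in smiles_list
    }


def rank_by_descriptor(averages: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort molecules by decreasing descriptor value."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def fake_calculator(smiles: str) -> float:
    """Hypothetical stand-in for a single descriptor calculation."""
    return float(len(smiles))


averages = average_descriptor(["CC=O", "CC(=O)C", "O=COC"], fake_calculator, iterations=3)
ranking = rank_by_descriptor(averages)
for smiles, value in ranking:
    print(f"{smiles}: {value}")
```

Passing the calculator as an argument keeps the averaging logic testable on its own, which helps to separate ranking bugs from variations in the underlying descriptor values.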

```python
# The alias RC matches the calls below
import RankChem as RC

smiles_list = ["CC=O", "CC(=O)C", "O=COC", "CN(C)C(C)=O"]

# Choose the descriptor type: 'N' for nucleophilicity, 'E' for electrophilicity
descriptor_type = 'E'  # or 'N'

# Define the number of iterations for averaging the descriptor values
iterations = 10

# Calculate the descriptors for the list of SMILES strings
descriptors = RC.calculate_descriptor(smiles_list, descriptor_type, iterations)

# Rank the molecules based on their descriptor values
ranked_descriptors = RC.rank_descriptors(descriptors)

# Print the ranked list of molecules
print("Ranked list of molecules based on their descriptors:")
for smiles, descriptor in ranked_descriptors:
    print(f"{smiles}: {descriptor}")
```



### Limitations

The ranking feature is not reliable: processing the same SMILES input several times can yield different rankings. Several factors may contribute to this. One hypothesis is that the global descriptors calculated by morfeus or xtb-python are incorrect; this seems unlikely, however, as both packages are widely used and should be reliable. In addition, RDKit places each molecule in its own coordinate basis, which makes direct comparisons challenging and potentially inaccurate. Finally, every molecule has several conformers, and different conformers may give different descriptor values and hence inconsistent rankings.


Due to time constraints, it was decided to accept the code as it is, even though it may not always return the correct ranking.

## Highlighting

### Functionality and Results
The second feature highlights the nucleophilic and electrophilic sites of a molecule. Given the SMILES of a molecule, an XYZ file is generated; morfeus, through xtb, reads this file and calculates a Fukui coefficient for every atom in the molecule, storing the values in a dictionary. According to the morfeus documentation, the Fukui coefficients are determined by a finite-difference approach using the atomic charges from xtb. More information on these calculations is given in the morfeus background on xtb electronic parameters (https://github.com/digital-chemistry-laboratory/morfeus.git). However, the Fukui coefficients varied every time the code was run. It was therefore decided to iterate several times over the same XYZ file, generate one Fukui dictionary per iteration, and average the values for each atom. The atom with the highest average value is taken as the most electrophilic or nucleophilic site, and its index is used to highlight it on the 3D representation of the molecule. Despite the many iterations, the atom with the highest nucleophilic Fukui coefficient often appeared to be the wrong one. For instance acetaldehyde was chosen as electrophile, the provided result XXXX.

Parameters:
XXX

Output:
- 3D interactive visualization with highlighted site
- Fukui average values for all atoms
- Highest average Fukui value
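The per-atom averaging and the selection of the highlighted site can be sketched as follows. This is a minimal illustration, not the project's code: it assumes that every run yields a dictionary mapping an atom index to its Fukui coefficient, and the names `average_fukui` and `most_reactive_atom` as well as the numbers are invented for the example.

```python
from typing import Dict, List, Tuple


def average_fukui(runs: List[Dict[int, float]]) -> Dict[int, float]:
    """Average the Fukui coefficient of each atom over several runs."""
    if not runs:
        raise ValueError("at least one run is required")
    atoms = runs[0].keys()
    for run in runs:
        if run.keys() != atoms:
            raise ValueError("all runs must cover the same atom indices")
    return {atom: sum(run[atom] for run in runs) / len(runs) for atom in atoms}


def most_reactive_atom(averaged: Dict[int, float]) -> Tuple[int, float]:
    """Return the atom index with the highest average coefficient and its value."""
    atom = max(averaged, key=averaged.get)
    return atom, averaged[atom]


# Invented values for a three-atom example, two runs
runs = [
    {1: 0.10, 2: 0.40, 3: 0.50},
    {1: 0.20, 2: 0.30, 3: 0.50},
]
averaged = average_fukui(runs)
best_atom, best_value = most_reactive_atom(averaged)
print(f"atom {best_atom} has the highest average Fukui value ({best_value})")
```

Whether the indices are 0-based or 1-based depends on the package that produces the dictionaries, so the index should be converted explicitly before it is passed to the visualization code.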

In [None]:
from ....

### Limitations

Again, the highlighting feature does not reliably highlight the correct sites. When the N site is requested, it usually (although not always) returns the correct atom. When the E site is requested, however, it nearly always highlights the N site instead. Inspecting the Fukui values of all atoms shows that the atom with the highest electrophilic Fukui value is often also the atom with the highest nucleophilic Fukui value. The indexing therefore appears to be correct, which suggests that the values returned by xtb and morfeus are unreliable.

Another source of error could be the generated coordinates. RDKit generates different coordinates every time the code is run. As the XYZ files are generated from the coordinates, and the xtb calculations from the XYZ files, the results varied from run to run. Several versions of the code were therefore tested in which N different conformers were generated, and the coordinates, XYZ files and xtb calculations were produced for every conformer. An average Fukui value for both E and N was then calculated for each atom of the molecule. This should have given consistent E and N values, but this method did not give reliable and stable results either.

The most probable explanation for the highlighting errors is that something is not working correctly in the morfeus or xtb packages. This, however, means that the issue is out of our hands. One could contact the package authors for further information.

Again, due to time constraints, it was decided to accept the code as it is, despite it not always highlighting the intended atom.


# 3. Challenges:

While working on this project we faced many problems.

Firstly, installing xtb-python was more challenging than expected. After many installation attempts with the assistants, we discovered that xtb-python is only available for macOS and Linux, whereas we both use Windows. Linux therefore had to be installed on both our computers, which was not a straightforward process. Understanding how Linux works and linking it to our Windows files was not easy either. (Ludovica's computer literally crashed due to the size of Linux.)

Then, the coding started. The randomness of the XTB results ...

Finally the correct highlighting of the E and N sites ...

Bibliography:

Nápoles-Duarte JM, Biswas A, Parker MI, Palomares-Baez JP, Chávez-Rojo MA and Rodríguez-Valdez LM (2022) Stmol: A component for building interactive molecular visualizations within streamlit web-applications. Front. Mol. Biosci. 9:990846. doi: 10.3389/fmolb.2022.990846




The highlighting error could also be due to an error in the atom indexing. This is unlikely, however: we tested this hypothesis several times and it never led to a better result.




An interface was then created in which the user enters the SMILES of a molecule and chooses whether the nucleophilic or the electrophilic site should be highlighted. The number of iterations and the visualization style can also be chosen. The interface then displays the 3D molecule with the chosen site (E or N) highlighted. The sidebar of the interface also contains links to our repository and useful documentation.