# Molecular Descriptors and Fingerprints in RDKit

---

## Table of contents

- [What are Molecular Descriptors?](#What-are-Molecular-Descriptors?)
- [Calculating Molecular Descriptors](#Calculating-Molecular-Descriptors)
- [Mordred Descriptors](#Mordred-Descriptors)


## What are Molecular Descriptors?

---
Molecular descriptors are numerical representations of chemical structures and properties. RDKit provides a wide range of descriptors that can be used for applications such as QSAR modeling (quantitative structure-activity relationship modeling, a computational approach that predicts a compound's biological activity or toxicity from its chemical structure), virtual screening, and property prediction.

## Calculating Molecular Descriptors
---
You can calculate molecular descriptors using the `Descriptors` module in RDKit. Here's an example:

In [1]:
from rdkit import Chem
from rdkit.Chem import Descriptors


# Create a molecule from SMILES
mol = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

# Calculate descriptors
molecular_weight = Descriptors.MolWt(mol)
# HeavyAtomMolWt is the molecular weight ignoring hydrogens, not an atom count
heavy_atom_weight = Descriptors.HeavyAtomMolWt(mol)
hydrogen_acceptors = Descriptors.NumHAcceptors(mol)
hydrogen_donors = Descriptors.NumHDonors(mol)

print(f"Molecular weight: {molecular_weight:.2f}")
print(f"Heavy atom molecular weight: {heavy_atom_weight:.2f}")
print(f"Hydrogen acceptors: {hydrogen_acceptors}")
print(f"Hydrogen donors: {hydrogen_donors}")

Molecular weight: 194.19
Heavy atom molecular weight: 184.11
Hydrogen acceptors: 6
Hydrogen donors: 0
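
As a sanity check, both weights can be reproduced by hand from caffeine's molecular formula, C8H10N4O2. The sketch below is plain Python and does not call RDKit; the rounded atomic weights in `ATOMIC_WEIGHT` and the helper `formula_weight` are illustrative assumptions, not RDKit internals. It also shows why `HeavyAtomMolWt` returns a weight of about 184.11 and not an atom count: it is the molecular weight with the hydrogens left out.

```python
# Rounded standard atomic weights in g/mol (assumed values for illustration)
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Caffeine, C8H10N4O2, written as element -> atom count
caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}


def formula_weight(formula, include_hydrogens=True):
    """Sum atomic weights over a formula given as {element: count}."""
    return sum(
        ATOMIC_WEIGHT[element] * count
        for element, count in formula.items()
        if include_hydrogens or element != "H"
    )


mol_wt = formula_weight(caffeine)
heavy_wt = formula_weight(caffeine, include_hydrogens=False)
heavy_count = sum(n for element, n in caffeine.items() if element != "H")

print(f"Molecular weight: {mol_wt:.2f}")               # 194.19
print(f"Heavy atom molecular weight: {heavy_wt:.2f}")  # 184.11
print(f"Heavy atom count: {heavy_count}")              # 14
```

The difference between the two weights, about 10.08, is exactly the mass of the ten hydrogen atoms.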


RDKit provides many other descriptors, including:

* `Descriptors.NumRotatableBonds(mol)`: Number of rotatable bonds
* `Descriptors.HeavyAtomCount(mol)`: Number of heavy atoms
* `Descriptors.MolLogP(mol)`: LogP value
* `Descriptors.TPSA(mol)`: Topological polar surface area
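
Here's a minimal sketch that applies these four functions to the caffeine molecule from the example above. It assumes RDKit is installed. No output values are quoted for `MolLogP` and `TPSA`, because they are model estimates that may differ slightly between RDKit versions.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Caffeine, the same SMILES used earlier in this section
mol = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

# Collect the descriptors in a dict so they are easy to print or tabulate
properties = {
    "Rotatable bonds": Descriptors.NumRotatableBonds(mol),
    "Heavy atoms": Descriptors.HeavyAtomCount(mol),
    "LogP (Crippen estimate)": Descriptors.MolLogP(mol),
    "TPSA": Descriptors.TPSA(mol),
}

for name, value in properties.items():
    print(f"{name}: {value}")
```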

**LogP** is the base-10 logarithm of the partition coefficient of a solute between octanol and water at near-infinite dilution. It is a crucial parameter in chemistry, particularly in drug discovery and development. LogP measures the preference of an uncharged compound to dissolve in either water or an organic solvent such as octanol.
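
Written as a formula, logP = log10([solute] in octanol / [solute] in water). The small sketch below uses made-up equilibrium concentrations; the function name `log_p` and the numbers are illustrative assumptions. A positive value means the compound prefers octanol, a negative value means it prefers water. Note that `Descriptors.MolLogP` does not measure this quantity: it estimates it from the structure using the Wildman-Crippen atom-contribution method.

```python
import math


def log_p(conc_octanol, conc_water):
    """Return log10 of the octanol/water concentration ratio.

    Both concentrations must use the same units and refer to the
    neutral (uncharged) form of the solute at equilibrium.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Hypothetical measurements, for illustration only
lipophilic = log_p(conc_octanol=100.0, conc_water=1.0)   # prefers octanol
hydrophilic = log_p(conc_octanol=1.0, conc_water=100.0)  # prefers water

print(f"Lipophilic example:  logP = {lipophilic:+.1f}")
print(f"Hydrophilic example: logP = {hydrophilic:+.1f}")
```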

You can explore the complete list of available descriptors (217 in the RDKit version used here) in [Greg Landrum's](https://greglandrum.github.io/rdkit-blog/posts/2022-12-23-descriptor-tutorial.html) blog post.

Let's list the first ten of them.

In [2]:
from rdkit.Chem import Descriptors
# Each entry is a (name, function) tuple
for descriptor in Descriptors._descList[:10]:
    print(descriptor)
print(f"Total number of Descriptors: {len(Descriptors._descList)}")

('MaxAbsEStateIndex', <function MaxAbsEStateIndex at 0x10ba16140>)
('MaxEStateIndex', <function MaxEStateIndex at 0x10ba15fe0>)
('MinAbsEStateIndex', <function MinAbsEStateIndex at 0x10ba161f0>)
('MinEStateIndex', <function MinEStateIndex at 0x10ba16090>)
('qed', <function qed at 0x10ba654e0>)
('SPS', <function SPS at 0x10ba65a60>)
('MolWt', <function <lambda> at 0x10ba65850>)
('HeavyAtomMolWt', <function HeavyAtomMolWt at 0x10ba65900>)
('ExactMolWt', <function <lambda> at 0x10ba661f0>)
('NumValenceElectrons', <function NumValenceElectrons at 0x10ba662a0>)
Total number of Descriptors: 217
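
To evaluate every descriptor in that list for one molecule, recent RDKit releases provide `Descriptors.CalcMolDescriptors`, which returns a dictionary keyed by descriptor name. Here's a minimal sketch, assuming an RDKit version that includes this helper (with older versions you would loop over `Descriptors._descList` yourself):

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")  # caffeine

# One call evaluates every (name, function) pair in the descriptor list
all_descriptors = Descriptors.CalcMolDescriptors(mol)

print(f"Computed {len(all_descriptors)} descriptors")
for name in ("MolWt", "HeavyAtomMolWt", "NumHAcceptors", "NumHDonors"):
    print(f"{name}: {all_descriptors[name]}")
```

A dictionary like this converts directly into one row of a table, which is convenient when building a descriptor matrix for many molecules.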


## Mordred Descriptors
---

[Mordred](https://github.com/mordred-descriptor/mordred?tab=readme-ov-file) is a comprehensive, open-source chemical descriptor calculation tool designed for use in cheminformatics, drug discovery, and materials science. It can compute a vast array of descriptors based on molecular structure, ranging from simple aromatic atom and bond counts to more complex 3D geometry calculations.

## Overview
Chemical descriptors are numerical values that represent chemical information encoded within a molecular structure. These descriptors play a crucial role in the quantitative analysis of chemical compounds, enabling researchers to apply statistical and machine learning models to predict physicochemical properties, biological activity, and more.

## Types of Descriptors

Mordred calculates over a thousand descriptors, which can be broadly categorized into several types:

* **Constitutional Descriptors**: Basic counts of atoms, bonds, molecular size, and other simple characteristics of the molecule's constitution.
* **Topological Descriptors**: Derived from the molecule's graph representation, including connectivity, branching, and molecular shape indices.
* **Geometrical Descriptors**: Related to the 3D structure of the molecule, including distances, angles, and molecular volume.
* **Electronic Descriptors**: Reflect the electronic distribution and potential energy surface characteristics, often used in quantum chemical calculations.
* **Hydrophobicity and Hydrophilicity Descriptors**: Capture the molecule's partitioning behavior between hydrophobic and hydrophilic environments, which is critical in drug design for understanding solubility and permeability.
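
To make the topological category concrete, the sketch below computes the Wiener index, a classic topological descriptor defined as the sum of shortest-path distances (counted in bonds) between all pairs of heavy atoms. This is a plain-Python toy on hand-written adjacency lists, not Mordred's implementation; the helper name `wiener_index` and the two example graphs are illustrative assumptions. It shows how branching changes the value even when the atom count, a constitutional descriptor, stays the same.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives bond-count distances from `start`
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # each pair was counted once from each end


# Hydrogen-suppressed carbon skeletons, both C4H10
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # C-C-C-C chain
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}  # central C with 3 methyls

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

Both molecules have four heavy atoms, but the more compact, branched isobutane has the smaller index.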



## Installation and Usage

Mordred is easy to install using pip:

In [3]:
!pip install mordred



In [4]:
# pip has no "update" command; packages are upgraded with "install --upgrade"
!pip install --upgrade numpy


To use Mordred, you need a molecule object from RDKit. Here's a simple example:

In [8]:
from mordred import Calculator, descriptors
from rdkit import Chem

# Create a molecule object
mol = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

# Create a calculator with all 2D descriptors (3D descriptors are skipped)
calc = Calculator(descriptors, ignore_3D=True)

# Calculate the descriptors for the molecule
result = calc(mol)

# Convert the result to a dictionary
descriptor_dict = result.asdict()
descriptor_dict

{'ABC': <mordred.error.Error at 0x107c1c880>,
 'ABCGG': <mordred.error.Error at 0x10d591a80>,
 'nAcid': 0,
 'nBase': 0,
 'SpAbs_A': 17.66822897660583,
 'SpMax_A': 2.504976898746481,
 'SpDiam_A': 4.932792612901732,
 'SpAD_A': 17.668228976605825,
 'SpMAD_A': 1.2620163554718447,
 'LogEE_A': 3.586589811148481,
 'VE1_A': 3.456084231045242,
 'VE2_A': 0.24686315936037445,
 'VE3_A': 1.5766084596749441,
 'VR1_A': 57.28665037374297,
 'VR2_A': 4.091903598124498,
 'VR3_A': 4.384539855427741,
 'nAromAtom': 9,
 'nAromBond': 10,
 'nAtom': 24,
 'nHeavyAtom': 14,
 'nSpiro': 0,
 'nBridgehead': 0,
 'nHetero': 6,
 'nH': 10,
 'nB': 0,
 'nC': 8,
 'nN': 4,
 'nO': 2,
 'nS': 0,
 'nP': 0,
 'nF': 0,
 'nCl': 0,
 'nBr': 0,
 'nI': 0,
 'nX': 0,
 'ATS0dv': 248.0,
 'ATS1dv': 245.0,
 'ATS2dv': 384.0,
 'ATS3dv': 353.0,
 'ATS4dv': 278.0,
 'ATS5dv': 68.0,
 'ATS6dv': 6.0,
 'ATS7dv': 0.0,
 'ATS8dv': 0.0,
 'ATS0d': 86.0,
 'ATS1d': 105.0,
 'ATS2d': 172.0,
 'ATS3d': 167.0,
 'ATS4d': 146.0,
 'ATS5d': 92.0,
 'ATS6d': 51.0,
 'ATS
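
Entries such as `'ABC': <mordred.error.Error ...>` in the output above mark descriptors that Mordred could not compute for this molecule. Such entries usually have to be filtered out before the dictionary is passed to a statistical or machine learning model. Here's a minimal sketch in plain Python: the `FailedDescriptor` class is a hypothetical stand-in for Mordred's error objects so that the example runs without Mordred, and the numeric values are copied from the output above.

```python
from numbers import Real


class FailedDescriptor:
    """Hypothetical stand-in for a Mordred error object."""


def numeric_only(descriptor_dict):
    """Keep entries whose value is a real number (int, float, or bool)."""
    return {
        name: value
        for name, value in descriptor_dict.items()
        if isinstance(value, Real)
    }


# A few entries shaped like the dictionary printed above
descriptor_dict = {
    "ABC": FailedDescriptor(),
    "ABCGG": FailedDescriptor(),
    "nAcid": 0,
    "SpAbs_A": 17.66822897660583,
    "nHeavyAtom": 14,
}

clean = numeric_only(descriptor_dict)
print(sorted(clean))  # ['SpAbs_A', 'nAcid', 'nHeavyAtom']
```

The same function should work on the dictionary returned by `result.asdict()`, since the error objects are not numbers. Filtering by type keeps the sketch independent of Mordred's own error classes.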
