## Workflow plan

 - Create a list of SMARTS strings for the bond types reported in reference 10.1021/ed085p532 (Table 2).
 - Develop an RDKit-based query tool that checks whether the molecules in the current SDF file contain a specified bond type.
 - If a given bond type is found in a molecule, the constitutive correction for that bond type is included.
 - Constitutive corrections are added alongside the atomic Pascal constants (the procedure for an unknown molecule).
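
The last step of the plan can be sketched in plain Python: the estimate is the sum of the atomic Pascal constants, weighted by atom counts, plus the constitutive corrections of every matched bond type. This is only a minimal sketch. The function name `estimate_diamagnetic_susceptibility`, the dictionary layout, and the atomic values are assumptions made for illustration, and the numbers should be checked against the tables of the reference before use. Only the triazine correction of -1.4 is taken from this notebook's output.

```python
from typing import Dict, Iterable

# Hypothetical lookup of atomic Pascal constants (assumed units: 1e-6 emu/mol).
# Placeholder values for illustration; verify against the reference tables.
ATOMIC_CONSTANTS: Dict[str, float] = {
    "C": -6.00,
    "H": -2.93,
    "N_ring": -4.61,
}


def estimate_diamagnetic_susceptibility(
    atom_counts: Dict[str, int],
    constitutive_corrections: Iterable[float],
) -> float:
    """Sum atomic contributions and add one correction per matched bond type."""
    atomic_part = sum(
        ATOMIC_CONSTANTS[symbol] * count for symbol, count in atom_counts.items()
    )
    return atomic_part + sum(constitutive_corrections)


# 1,2,3-triazine (C3H3N3) with the single triazine correction of -1.4.
chi_triazine = estimate_diamagnetic_susceptibility(
    atom_counts={"C": 3, "H": 3, "N_ring": 3},
    constitutive_corrections=[-1.4],
)
print(f"Estimated diamagnetic susceptibility: {chi_triazine:.2f}")
```

Keeping the corrections as a separate list means the substructure query only has to return the corrections of the matched bond types, and the summation stays independent of RDKit.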

### Materials:
 - [Substructure Filtering in RDKit](https://www.youtube.com/watch?v=Z1PrErlmTGI)
 - [SMARTS notation](https://en.wikipedia.org/wiki/SMILES_arbitrary_target_specification)

## Compounds loader

In [1]:
from pathlib import Path
from typing import Any

import pytest
from rdkit import Chem
from rdkit.Chem import Mol, MolToSmarts, RemoveHs

from src import DIAMAG_COMPOUND_CONSTITUTIVE_CORR_SUBDIR
from src.constants.common_molecules import COMMON_MOLECULES
from src.core.compound import MBCompound
from src.core.molecule import MBMolecule
from src.constants.bonds import DIAMAG_RELEVANT_BONDS
from src.loader import SDFLoader


### Constitutive corrections for relevant bond types

In [2]:
compound: MBCompound = SDFLoader.Load(
    "1,2,3-triazine.sdf", subdir=DIAMAG_COMPOUND_CONSTITUTIVE_CORR_SUBDIR
)

mol: MBMolecule = compound.GetMols(to_rdkit=False)[0]
print(f'SDF loaded first molecule SMARTS: {mol.smarts}')

# Check the molecule against every diamagnetically relevant bond type
# and report each pattern that matches.
match = False
for idx, diamag_relevant_bond in enumerate(DIAMAG_RELEVANT_BONDS):
    if mol.HasSubstructMatch(smarts=diamag_relevant_bond.SMARTS):
        match = True
        print(f'{idx}: Match: "{diamag_relevant_bond.formula}": {diamag_relevant_bond}')

if not match:
    print("No match found.")

SDF loaded first molecule SMARTS: [#7]1:[#7]:[#6]:[#6]:[#6]:[#7]:1
61: Match: "triazine": DiamagRelevantBond(sdf_file='1,2,3-triazine.sdf', constitutive_corr=-1.4, formula='triazine', SMARTS='[$(n1nnccc1),$(n1nccnc1),$(n1cncnc1),$([nH+]1nnccc1),$(n1[nH+]nccc1),$(n1n[nH+]ccc1),$([nH+]1nccnc1),$(n1[nH+]ccnc1),$(n1ncc[nH+]c1),$([nH+]1cncnc1),$(n1c[nH+]cnc1),$(n1cnc[nH+]c1)]', description="Assumes the same constant for three triazine isomers and thier monoprotonated states. This must be noted in Software's MANUAL.")
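
For comparison, the same kind of check can be sketched with plain RDKit calls, without the project's `MBMolecule` wrapper. This is a minimal sketch and not necessarily how the wrapper is implemented. The helper `find_bond_types` and the `BOND_SMARTS` dictionary are hypothetical names, the SMARTS is only the neutral subset of the triazine pattern printed above, and the molecules are built from SMILES rather than loaded from an SDF file.

```python
from typing import Dict, List

from rdkit import Chem

# Hypothetical name -> SMARTS table; neutral triazine isomers only.
BOND_SMARTS: Dict[str, str] = {
    "triazine": "[$(n1nnccc1),$(n1nccnc1),$(n1cncnc1)]",
}


def find_bond_types(mol: Chem.Mol, patterns: Dict[str, str]) -> List[str]:
    """Return the names of all patterns that occur in the molecule."""
    hits = []
    for name, smarts in patterns.items():
        query = Chem.MolFromSmarts(smarts)
        if query is None:
            # MolFromSmarts returns None when the SMARTS cannot be parsed.
            raise ValueError(f"Invalid SMARTS for {name!r}: {smarts}")
        if mol.HasSubstructMatch(query):
            hits.append(name)
    return hits


triazine = Chem.MolFromSmiles("n1nnccc1")  # 1,2,3-triazine
benzene = Chem.MolFromSmiles("c1ccccc1")   # contains no ring nitrogen

print(find_bond_types(triazine, BOND_SMARTS))
print(find_bond_types(benzene, BOND_SMARTS))
```

The recursive `$(...)` form matches a single ring nitrogen whose environment fits one of the isomers, so a match indicates that the ring is present without enumerating all six ring atoms in the result.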
