## Workflow plan

 - Create a list of SMILES strings for the molecules reported in reference 10.1021/ed085p532 (Tables 3–5).
 - Develop an RDKit-based query tool that checks whether the current SDF file contains the molecules from the SMILES list.
 - If a given molecule is found in the SDF file, the diamagnetic contribution for that species will be included.
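
The lookup step in the plan above can be sketched as a set-membership test on canonical SMILES. This is a minimal illustration, not the project's implementation: `REFERENCE_SMILES`, `find_known_species`, and the `canonicalize` hook are hypothetical names, and the three entries are SMILES strings that already appear in the loader output of this notebook rather than the full list from Tables 3–5. The sketch assumes both sides were canonicalized by the same toolkit; with RDKit available, the hook could be set to something like `lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))`.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical reference table: canonical SMILES -> species label.
# The SMILES strings are copied from the loader output in this notebook;
# the real table would be built from Tables 3-5 of the reference.
REFERENCE_SMILES: Dict[str, str] = {
    "CC(=O)[O-]": "acetate",
    "O=C([O-])C(=O)[O-]": "oxalate",
    "c1ccc(-c2ccccn2)nc1": "2,2'-bipyridine",
}


def find_known_species(
    smiles_from_sdf: Iterable[str],
    reference: Dict[str, str] = REFERENCE_SMILES,
    canonicalize: Callable[[str], str] = lambda s: s,
) -> List[str]:
    """Return the sorted labels of reference species found among the SDF molecules.

    `canonicalize` defaults to the identity, which assumes the input strings
    are already canonical SMILES (assumption, see the lead-in).
    """
    canonical_reference = {canonicalize(smi): label for smi, label in reference.items()}
    seen = {canonicalize(smi) for smi in smiles_from_sdf}
    return sorted(label for smi, label in canonical_reference.items() if smi in seen)


# Duplicates collapse in the set; the cyclopentadienyl anion is not in the table.
found = find_known_species(["CC(=O)[O-]", "CC(=O)[O-]", "c1cc[cH-]c1"])
print(found)
```

Comparing canonical strings keeps the lookup cheap, but it only works if every molecule is written in the same canonical form; a substructure or graph-based match would be needed otherwise.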

### Materials:
 - [Crack the Code: Mastering SMILES Notation](https://www.youtube.com/watch?v=QRLaIARxP30)
 - [OpenSMILES community - the SMILES notation specification](http://opensmiles.org/)
 - [Structure-to-SMILES converter](https://www.rcsb.org/chemical-sketch)
 - [PubChem - SMILES library](https://pubchem.ncbi.nlm.nih.gov/)

## Compounds loader

In [14]:
from pathlib import Path

from rdkit import Chem

from src import SDF_TEST_DIR
from src.loader import SDFLoader

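# Names of all SDF files in the test directory
sdf_files = [p.name for p in SDF_TEST_DIR.glob("*.sdf")]

# Load each file with the project's SDFLoader
compounds = [SDFLoader.Load(filename, "tests") for filename in sdf_files]

canonical_smiles = []
for compound in compounds:
    # SMILES of every molecule stored in this SDF file
    group = []
    for mol in compound.GetMols(to_rdkit=False):
        group.append(mol.ToSmiles())
    # A set of length 1 means all molecules in the file give the same SMILES.
    # `mol` is the last molecule from the inner loop, so this assumes each
    # file contains at least one molecule.
    print(
        f"len: {len(set(group))}, file: {mol.source_file}, set: {set(group)}, group: {group}"
    )

# print(canonical_smiles)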

len: 1, file: acac-.sdf, set: {'CC(=O)C=C(C)[O-]'}, group: ['CC(=O)C=C(C)[O-]', 'CC(=O)C=C(C)[O-]', 'CC(=O)C=C(C)[O-]', 'CC(=O)C=C(C)[O-]']
len: 1, file: AcO-.sdf, set: {'CC(=O)[O-]'}, group: ['CC(=O)[O-]', 'CC(=O)[O-]', 'CC(=O)[O-]']
len: 1, file: AsO43-.sdf, set: {'O=[As]([O-])([O-])[O-]'}, group: ['O=[As]([O-])([O-])[O-]', 'O=[As]([O-])([O-])[O-]', 'O=[As]([O-])([O-])[O-]', 'O=[As]([O-])([O-])[O-]', 'O=[As]([O-])([O-])[O-]']
len: 1, file: bipy.sdf, set: {'c1ccc(-c2ccccn2)nc1'}, group: ['c1ccc(-c2ccccn2)nc1', 'c1ccc(-c2ccccn2)nc1', 'c1ccc(-c2ccccn2)nc1', 'c1ccc(-c2ccccn2)nc1', 'c1ccc(-c2ccccn2)nc1']
len: 1, file: BrO3-.sdf, set: {'[O-][Br+2]([O-])[O-]'}, group: ['[O-][Br+2]([O-])[O-]', '[O-][Br+2]([O-])[O-]', '[O-][Br+2]([O-])[O-]', '[O-][Br+2]([O-])[O-]']
len: 1, file: C2O42-.sdf, set: {'O=C([O-])C(=O)[O-]'}, group: ['O=C([O-])C(=O)[O-]', 'O=C([O-])C(=O)[O-]', 'O=C([O-])C(=O)[O-]', 'O=C([O-])C(=O)[O-]']
len: 1, file: C5H5-.sdf, set: {'c1cc[cH-]c1'}, group: ['c1cc[cH-]c1', 'c1cc[cH-]