In [3]:
from pyCheckmol import CheckMol

- The `cm.functionalGroupASbitvector` function generates the bit vector of functional groups (FGs), which can be used in QSPR/QSAR modeling.
- Each position encodes the presence or absence of a particular FG. The positions follow the ordering of the checkmol table, i.e., the first position indicates the presence/absence of a cation and the last position (204) corresponds to the alpha-hydroxyacid.


In [4]:
smi = 'CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=C(C=C3)O)N)C(=O)O)C'
cm = CheckMol()
cm.functionalGroupASbitvector(smi)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
       1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 0., 1., 1., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.,
       0.])
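
To make the vector easier to read, the set positions can be listed explicitly. The following is only a minimal sketch in plain Python: the `observed` list is transcribed by hand from the output above, the helper `set_bits` is a hypothetical name (not part of pyCheckmol), and a stand-in list replaces the real array so that the snippet runs without checkmol installed. The same helper should also accept the NumPy array returned by `cm.functionalGroupASbitvector(smi)`.

```python
# 0-based indices of the entries equal to 1 in the output above
# (transcribed by hand; an assumption of this sketch, not library output).
observed = [22, 27, 34, 47, 48, 49, 75, 76, 80, 82, 83, 84, 201, 202]

# Stand-in for the 205-element vector printed above.
bitvector = [0.0] * 205
for i in observed:
    bitvector[i] = 1.0


def set_bits(vector):
    """Return the 0-based indices of the nonzero entries (hypothetical helper)."""
    return [i for i, bit in enumerate(vector) if bit != 0]


present = set_bits(bitvector)
print(present)
```

In this particular output, the 0-based indices of the set bits appear to coincide with the checkmol functional group numbers (for example 22 for thiohemiaminal and 34 for phenol), so no offset seems to be needed when looking them up in the checkmol table.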

- The `cm.functionalGroupSmiles` function returns the functional group information as a pandas.DataFrame, a dictionary, or just the list of functional group codes, depending on the parameters `justFGcode` and `returnDataframe`.

In [5]:
smi = 'CC1(C(N2C(S1)C(C2=O)NC(=O)C(C3=CC=C(C=C3)O)N)C(=O)O)C'
cm = CheckMol()
res = cm.functionalGroupSmiles(smiles=smi, isString=True, generate3D=False, justFGcode=False, returnDataframe=True, deleteTMP=False)
res

   Functional Group            Frequency  Atom Position  Functional Group Number  Code
0  thiohemiaminal              1          5              22                       C2NSHC10
1  hydroxy compound            1          20             27                       O1H00000
2  phenol                      1          20             34                       O1H1A000
3  amine                       1          21             47                       N1C00000
4  prim. amine                 1          21             48                       N1C10000
5  prim. aliphat. amine        1          21             49                       N1C1C000
6  carboxylic acid deriv.      3          8,11,22        75                       C3O20000
7  carboxylic acid             1          22             76                       C3O2H000
8  carboxylic acid amide       2          8,11           80                       C3ONC000
9  carboxylic acid sec. amide  1          11             82                       C3ONC200
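
The tabular result can then be queried like any other table. As a rough illustration, the sketch below uses plain Python on rows transcribed by hand from the table above, so it runs without pyCheckmol or pandas; the variable names are hypothetical, and with a real DataFrame the same lookups would normally be done with pandas filtering instead.

```python
# Rows transcribed by hand from the table above:
# (functional group, frequency, functional group number, code)
rows = [
    ("thiohemiaminal", 1, 22, "C2NSHC10"),
    ("hydroxy compound", 1, 27, "O1H00000"),
    ("phenol", 1, 34, "O1H1A000"),
    ("amine", 1, 47, "N1C00000"),
    ("prim. amine", 1, 48, "N1C10000"),
    ("prim. aliphat. amine", 1, 49, "N1C1C000"),
    ("carboxylic acid deriv.", 3, 75, "C3O20000"),
    ("carboxylic acid", 1, 76, "C3O2H000"),
    ("carboxylic acid amide", 2, 80, "C3ONC000"),
    ("carboxylic acid sec. amide", 1, 82, "C3ONC200"),
]

# Look up a group name from its functional group number.
name_by_number = {number: name for name, _, number, _ in rows}

# Groups that occur more than once in the molecule.
repeated = [(name, freq) for name, freq, _, _ in rows if freq > 1]

print(name_by_number[34])
print(repeated)
```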


- The full checkmol output for the molecule and its functional groups can be viewed with the following command:

In [6]:
print(cm.information_)

Molecule name: 
atoms: 25  bonds: 27  rings: 4
   1 C  C3     0.0000   -2.4049    0.0000  (1 heavy-atom neighbors, Hexp: 0 Htot: 3)
   2 C  C3     0.5000   -1.5388    0.0000  (4 heavy-atom neighbors, Hexp: 0 Htot: 0)
   3 C  C3     1.3090   -0.9511    0.0000  (3 heavy-atom neighbors, Hexp: 0 Htot: 1)
   4 N  NAM    1.0000    0.0000    0.0000  (3 heavy-atom neighbors, Hexp: 0 Htot: 0)
   5 C  C3    -0.0000   -0.0000    0.0000  (3 heavy-atom neighbors, Hexp: 0 Htot: 1)
   6 S  S3    -0.3090   -0.9511    0.0000  (2 heavy-atom neighbors, Hexp: 0 Htot: 0)
   7 C  C3    -0.0000    1.0000    0.0000  (3 heavy-atom neighbors, Hexp: 0 Htot: 1)
   8 C  C2     1.0000    1.0000    0.0000  (3 heavy-atom neighbors, Hexp: 0 Htot: 0)
   9 O  O2     1.7071    1.7071    0.0000  (1 heavy-atom neighbors, Hexp: 0 Htot: 0)
  10 N  NAM   -0.7071    1.7071    0.0000  (2 heavy-atom neighbors, Hexp: 0 Htot: 1)
  11 C  C2    -1.6730    1.4483    0.0000  (3 heavy-atom neighbors, Hexp: 0 Htot: 0)
  12 O  O2    -2.3