The descriptors were introduced in the paper "A widely applicable set of descriptors", published by Paul Labute in the Journal of Molecular Graphics and Modelling back in 2000. [Here's a link](https://dx.doi.org/10.1016/S1093-3263(00)00068-1). Random aside: I'm a bit surprised to see that JMGM still exists... it used to be on my standard reading list back in the day, but I haven't thought about it in years. :-)

I won't get deeply into the motivation and derivation (read the paper for that), but Paul wanted to come up with a set of descriptors that would be generally useful for QSAR studies. He published three sets of related descriptors, `SlogP_VSAX`, `SMR_VSAX`, and `PEOE_VSAX` (where X is the bin number), which are all based on the same idea: you calculate the contribution of each atom in the molecule to a molecular property (either LogP, MR, or the partial charge) along with the contribution of each atom to an approximate molecular surface area measure (this is the VSA part), assign the atoms to bins based on their property contributions, and then sum up the VSA contributions of the atoms in each bin.

Sounds complicated, but it isn't. Here's a simple example using methylamine, `CN`:

| atom | LogP contribution | VSA contribution |
|---|---------|-------|
| C | -0.2035 | 7.048 |
| N | -1.019  | 5.734 |


The boundaries of the LogP bins for the `SlogP_VSA` descriptor that the RDKit uses are:
```
[-0.4, -0.2, 0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6]
```
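So the N (LogP contribution -1.019, below the first boundary of -0.4) adds its VSA contribution of 5.734 to bin 1, and the C (-0.2035, between -0.4 and -0.2) adds 7.048 to bin 2. For this descriptor the bins are labelled from 1, so the 11 boundaries define 12 bins, `SlogP_VSA1` through `SlogP_VSA12`.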
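
To make the bin assignment concrete, here's a minimal sketch in plain Python (no RDKit needed). It assumes the bins are half-open, `lower <= x < upper`, which is the convention shown in the `SMR_VSA3` docstring later in this post; I'm assuming `SlogP_VSA` follows it too. `SLOGP_BOUNDS` and `bin_label` are illustrative names, not RDKit API:

```python
from bisect import bisect_right

# SlogP_VSA bin boundaries quoted above
SLOGP_BOUNDS = [-0.4, -0.2, 0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6]

def bin_label(value, bounds=SLOGP_BOUNDS):
    """1-based bin label for a property contribution.

    Assumes half-open bins (lower <= x < upper): label 1 is x < bounds[0]
    and the last label, len(bounds) + 1, is x >= bounds[-1].
    """
    return bisect_right(bounds, value) + 1

print(bin_label(-1.019))   # N of methylamine -> 1
print(bin_label(-0.2035))  # C of methylamine -> 2
```

`bisect_right` sends a value sitting exactly on a boundary into the upper bin, which is what the half-open convention requires.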

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from rdkit.Chem import Descriptors
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import Crippen
import rdkit
print(rdkit.__version__)
```
```
2022.09.4
```


```python
# all 12 SlogP_VSA values for methylamine at once
rdMolDescriptors.SlogP_VSA_(Chem.MolFromSmiles('CN'))
```
```
[5.733667477162185,
 7.04767198267719,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]
```

```python
# per-atom (logP, MR) contributions, in atom order: C, N
rdMolDescriptors._CalcCrippenContribs(Chem.MolFromSmiles('CN'))
```
```
[(-0.2035, 2.753), (-1.019, 2.262)]
```

```python
# per-atom Labute ASA (VSA) contributions; element [0] holds the atom contributions
list(rdMolDescriptors._CalcLabuteASAContribs(Chem.MolFromSmiles('CN'))[0])
```
```
[7.04767198267719, 5.733667477162185]
```
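
Combining those two per-atom lists with the binning reproduces the `SlogP_VSA_` vector above. This is a sketch of the idea, not RDKit's implementation: `vsa_vector` is a hypothetical helper, the half-open bins are an assumption, and the numbers are copied from the outputs above so the snippet runs without RDKit:

```python
from bisect import bisect_right

SLOGP_BOUNDS = [-0.4, -0.2, 0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6]

def vsa_vector(logp_contribs, asa_contribs, bounds=SLOGP_BOUNDS):
    """Sum per-atom ASA contributions into len(bounds) + 1 half-open bins."""
    totals = [0.0] * (len(bounds) + 1)
    for logp, asa in zip(logp_contribs, asa_contribs):
        # each atom's LogP contribution picks the bin its ASA is added to
        totals[bisect_right(bounds, logp)] += asa
    return totals

# methylamine, atom order C, N (copied from the outputs above)
crippen = [(-0.2035, 2.753), (-1.019, 2.262)]
asa = [7.04767198267719, 5.733667477162185]
print(vsa_vector([logp for logp, _mr in crippen], asa))
```

Swapping in the MR values and the `SMR_VSA` boundaries would give the corresponding MR-based vector in the same way.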

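Now a different question: which atom types can contribute to a particular bin? Take `SMR_VSA3` as an example; its docstring gives the MR range it covers:

```python
print(Descriptors.SMR_VSA3.__doc__)
```
```
MOE MR VSA Descriptor 3 ( 1.82 <= x <  2.24)
```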
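
The range in that docstring can also be pulled out programmatically. This is a small sketch with a hypothetical helper, `bin_bounds_from_doc`; the regular expression assumes the `( lower <= x < upper)` layout shown above, and accepting `inf`/`-inf` for the open-ended first and last bins is a guess about how those docstrings are written:

```python
import re

# Matches e.g. "( 1.82 <= x <  2.24)"; the -inf/inf alternatives are an assumption
_RANGE_RE = re.compile(r'\(\s*(-?inf|-?[\d.]+)\s*<=?\s*x\s*<\s*(-?inf|-?[\d.]+)\s*\)')

def bin_bounds_from_doc(doc):
    """Return (lower, upper) parsed from a VSA descriptor docstring, or None."""
    m = _RANGE_RE.search(doc)
    if m is None:
        return None
    # float() understands 'inf' and '-inf' as well as ordinary numbers
    return float(m.group(1)), float(m.group(2))

print(bin_bounds_from_doc("MOE MR VSA Descriptor 3 ( 1.82 <= x <  2.24)"))
```

With RDKit available, the parsed pair could then be passed to `find_contribs_for_bin`, after checking that the helper did not return `None`.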


```python
# Crippen atom-type data: name, SMARTS, logP contribution, MR contribution, note
# from: https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Descriptors/Crippen.cpp#L194
rdkit_data="C1	[CH4]	0.1441	2.503	\n"\
"C1	[CH3]C	0.1441	2.503	\n"\
"C1	[CH2](C)C	0.1441	2.503	\n"\
"C2	[CH](C)(C)C	0	2.433	\n"\
"C2	[C](C)(C)(C)C	0	2.433	\n"\
"C3	[CH3][N,O,P,S,F,Cl,Br,I]	-0.2035	2.753	\n"\
"C3	[CH2X4]([N,O,P,S,F,Cl,Br,I])[A;!#1]	-0.2035	2.753	\n"\
"C4	[CH1X4]([N,O,P,S,F,Cl,Br,I])([A;!#1])[A;!#1]	-0.2051	2.731	\n"\
"C4	[CH0X4]([N,O,P,S,F,Cl,Br,I])([A;!#1])([A;!#1])[A;!#1]	-0.2051	"\
"2.731	\n"\
"C5	[C]=[!C;A;!#1]	-0.2783	5.007	\n"\
"C6	[CH2]=C	0.1551	3.513	\n"\
"C6	[CH1](=C)[A;!#1]	0.1551	3.513	\n"\
"C6	[CH0](=C)([A;!#1])[A;!#1]	0.1551	3.513	\n"\
"C6	[C](=C)=C	0.1551	3.513	\n"\
"C7	[CX2]#[A;!#1]	0.0017	3.888	\n"\
"C8	[CH3]c	0.08452	2.464	\n"\
"C9	[CH3]a	-0.1444	2.412	\n"\
"C10	[CH2X4]a	-0.0516	2.488	\n"\
"C11	[CHX4]a	0.1193	2.582	\n"\
"C12	[CH0X4]a	-0.0967	2.576	\n"\
"C13	[cH0]-[A;!C;!N;!O;!S;!F;!Cl;!Br;!I;!#1]	-0.5443	4.041	\n"\
"C14	[c][#9]	0	3.257	\n"\
"C15	[c][#17]	0.245	3.564	\n"\
"C16	[c][#35]	0.198	3.18	\n"\
"C17	[c][#53]	0	3.104	\n"\
"C18	[cH]	0.1581	3.35	\n"\
"C19	[c](:a)(:a):a	0.2955	4.346	\n"\
"C20	[c](:a)(:a)-a	0.2713	3.904	\n"\
"C21	[c](:a)(:a)-C	0.136	3.509	\n"\
"C22	[c](:a)(:a)-N	0.4619	4.067	\n"\
"C23	[c](:a)(:a)-O	0.5437	3.853	\n"\
"C24	[c](:a)(:a)-S	0.1893	2.673	\n"\
"C25	[c](:a)(:a)=[C,N,O]	-0.8186	3.135	\n"\
"C26	[C](=C)(a)[A;!#1]	0.264	4.305	\n"\
"C26	[C](=C)(c)a	0.264	4.305	\n"\
"C26	[CH1](=C)a	0.264	4.305	\n"\
"C26	[C]=c	0.264	4.305	\n"\
"C27	[CX4][A;!C;!N;!O;!P;!S;!F;!Cl;!Br;!I;!#1]	0.2148	"\
"2.693	"\
"\n"\
"CS	[#6]	0.08129	3.243	\n"\
"H1	[#1][#6,#1]	0.123	1.057	\n"\
"H2	[#1]O[CX4,c]	-0.2677	1.395	\n"\
"H2	[#1]O[!#6;!#7;!#8;!#16]	-0.2677	1.395	\n"\
"H2	[#1][!#6;!#7;!#8]	-0.2677	1.395	\n"\
"H3	[#1][#7]	0.2142	0.9627	\n"\
"H3	[#1]O[#7]	0.2142	0.9627	\n"\
"H4	[#1]OC=[#6,#7,O,S]	0.298	1.805	\n"\
"H4	[#1]O[O,S]	0.298	1.805	\n"\
"HS	[#1]	0.1125	1.112	\n"\
"N1	[NH2+0][A;!#1]	-1.019	2.262	\n"\
"N2	[NH+0]([A;!#1])[A;!#1]	-0.7096	2.173	\n"\
"N3	[NH2+0]a	-1.027	2.827	\n"\
"N4	[NH1+0]([!#1;A,a])a	-0.5188	3	\n"\
"N5	[NH+0]=[!#1;A,a]	0.08387	1.757	\n"\
"N6	[N+0](=[!#1;A,a])[!#1;A,a]	0.1836	2.428	\n"\
"N7	[N+0]([A;!#1])([A;!#1])[A;!#1]	-0.3187	1.839	\n"\
"N8	[N+0](a)([!#1;A,a])[A;!#1]	-0.4458	2.819	\n"\
"N8	[N+0](a)(a)a	-0.4458	2.819	\n"\
"N9	[N+0]#[A;!#1]	0.01508	1.725	\n"\
"N10	[NH3,NH2,NH;+,+2,+3]	-1.95		\n"\
"N11	[n+0]	-0.3239	2.202	\n"\
"N12	[n;+,+2,+3]	-1.119		\n"\
"N13	[NH0;+,+2,+3]([A;!#1])([A;!#1])([A;!#1])[A;!#1]	-0.3396	"\
"0.2604	"\
"\n"\
"N13	[NH0;+,+2,+3](=[A;!#1])([A;!#1])[!#1;A,a]	-0.3396	"\
"0.2604	"\
"\n"\
"N13	[NH0;+,+2,+3](=[#6])=[#7]	-0.3396	0.2604	\n"\
"N14	[N;+,+2,+3]#[A;!#1]	0.2887	3.359	\n"\
"N14	[N;-,-2,-3]	0.2887	3.359	\n"\
"N14	[N;+,+2,+3](=[N;-,-2,-3])=N	0.2887	3.359	\n"\
"NS	[#7]	-0.4806	2.134	\n"\
"O1	[o]	0.1552	1.08	\n"\
"O2	[OH,OH2]	-0.2893	0.8238	\n"\
"O3	[O]([A;!#1])[A;!#1]	-0.0684	1.085	\n"\
"O4	[O](a)[!#1;A,a]	-0.4195	1.182	\n"\
"O5	[O]=[#7,#8]	0.0335	3.367	\n"\
"O5	[OX1;-,-2,-3][#7]	0.0335	3.367	\n"\
"O6	[OX1;-,-2,-2][#16]	-0.3339	0.7774	\n"\
"O6	[O;-0]=[#16;-0]	-0.3339	0.7774	\n"\
"O12	[O-]C(=O)	-1.326		\"order flip here "\
"intentional\"\n"\
"O7	[OX1;-,-2,-3][!#1;!N;!S]	-1.189	0	\n"\
"O8	[O]=c	0.1788	3.135	\n"\
"O9	[O]=[CH]C	-0.1526	0	\n"\
"O9	[O]=C(C)([A;!#1])	-0.1526	0	\n"\
"O9	[O]=[CH][N,O]	-0.1526	0	\n"\
"O9	[O]=[CH2]	-0.1526	0	\n"\
"O9	[O]=[CX2]=O	-0.1526	0	\n"\
"O10	[O]=[CH]c	0.1129	0.2215	\n"\
"O10	[O]=C([C,c])[a;!#1]	0.1129	0.2215	\n"\
"O10	[O]=C(c)[A;!#1]	0.1129	0.2215	\n"\
"O11	[O]=C([!#1;!#6])[!#1;!#6]	0.4833	0.389	\n"\
"OS	[#8]	-0.1188	0.6865	\n"\
"F	[#9-0]	0.4202	1.108	\n"\
"Cl	[#17-0]	0.6895	5.853	\n"\
"Br	[#35-0]	0.8456	8.927	\n"\
"I	[#53-0]	0.8857	14.02	\n"\
"Hal	[#9,#17,#35,#53;-]	-2.996		\n"\
"Hal	[#53;+,+2,+3]	-2.996		\n"\
"Hal	[+;#3,#11,#19,#37,#55]	-2.996		\"Footnote h indicates "\
"these should be here?\"\n"\
"P	[#15]	0.8612	6.92	\n"\
"S2	[S;-,-2,-3,-4,+1,+2,+3,+5,+6]	-0.0024	7.365	\"Order flip "\
"here is intentional\"\n"\
"S2	[S-0]=[N,O,P,S]	-0.0024	7.365	\"Expanded definition of "\
"(pseudo-)ionic S\"\n"\
"S1	[S;A]	0.6482	7.591	\"Order flip here is intentional\"\n"\
"S3	[s;a]	0.6237	6.691	\n"\
"Me1	[#3,#11,#19,#37,#55]	-0.3808	5.754	\n"\
"Me1	[#4,#12,#20,#38,#56]	-0.3808	5.754	\n"\
"Me1	[#5,#13,#31,#49,#81]	-0.3808	5.754	\n"\
"Me1	[#14,#32,#50,#82]	-0.3808	5.754	\n"\
"Me1	[#33,#51,#83]	-0.3808	5.754	\n"\
"Me1	[#34,#52,#84]	-0.3808	5.754	\n"\
"Me2	[#21,#22,#23,#24,#25,#26,#27,#28,#29,#30]	-0.0025	"\
"	"\
"\n"\
"Me2	[#39,#40,#41,#42,#43,#44,#45,#46,#47,#48]	-0.0025	"\
"	"\
"\n"\
"Me2	[#72,#73,#74,#75,#76,#77,#78,#79,#80]	-0.0025		"
```

```python
from collections import namedtuple

CrippenTuple = namedtuple('CrippenTuple',('name','smarts','logp_contrib','mr_contrib','note'))
# one row per non-empty line, with tab-separated columns
lines = [x.split('\t') for x in rdkit_data.split('\n') if x]

crippenData = []
for entry in lines:
    entry[2] = float(entry[2])                        # logP contribution
    entry[3] = float(entry[3]) if entry[3] else None  # some types have no MR value
    crippenData.append(CrippenTuple(*entry))
print(crippenData[:3])
```
```
[CrippenTuple(name='C1', smarts='[CH4]', logp_contrib=0.1441, mr_contrib=2.503, note=''), CrippenTuple(name='C1', smarts='[CH3]C', logp_contrib=0.1441, mr_contrib=2.503, note=''), CrippenTuple(name='C1', smarts='[CH2](C)C', logp_contrib=0.1441, mr_contrib=2.503, note='')]
```
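
Splitting on tabs works here because every row in this string has exactly five tab-separated fields. A slightly more defensive sketch uses the standard library's `csv` module instead (my substitution, not what the notebook does): it skips blank lines, pads rows with missing trailing columns, and strips the quotes around the notes. `parse_crippen_table` and the three-row `sample` are illustrative:

```python
import csv
import io
from collections import namedtuple

CrippenTuple = namedtuple('CrippenTuple', ('name', 'smarts', 'logp_contrib', 'mr_contrib', 'note'))

def parse_crippen_table(text):
    """Parse tab-separated rows: name, SMARTS, logP, MR (may be blank), note."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter='\t'):
        if not fields:                          # blank line
            continue
        fields += [''] * (5 - len(fields))      # pad rows lacking trailing columns
        name, smarts, logp, mr, note = fields[:5]
        rows.append(CrippenTuple(name, smarts, float(logp),
                                 float(mr) if mr else None, note))
    return rows

sample = ("C1\t[CH4]\t0.1441\t2.503\t\n"
          "N10\t[NH3,NH2,NH;+,+2,+3]\t-1.95\t\t\n"
          "O12\t[O-]C(=O)\t-1.326\t\t\"order flip here intentional\"\n")
for row in parse_crippen_table(sample):
    print(row)
```

Unlike the plain split, the `csv` reader removes the surrounding double quotes, so the O12 note comes back as `order flip here intentional`.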


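```python
import re

def find_contribs_for_bin(lower, upper, crippenData=crippenData, which='mr_contrib'):
    """Return the atom types whose `which` contribution satisfies lower <= x < upper."""
    res = []
    for tpl in crippenData:
        v = getattr(tpl, which)
        if v is not None and lower <= v < upper:  # half-open, like the docstring
            res.append(tpl)
    return res

def find_tuples_for_atom(symbol, crippenData=crippenData):
    """Return the atom types for an element, matched by type name (e.g. N1..N14, NS)
    or by an atomic-number query such as [#7] in the first atom of the SMARTS."""
    res = []
    anum = Chem.GetPeriodicTable().GetAtomicNumber(symbol)
    for tpl in crippenData:
        # exact name match, so 'C' doesn't pick up 'Cl' and 'H' doesn't pick up 'Hal'
        name_hit = re.fullmatch(rf'{symbol}(\d+|S)?', tpl.name)
        smarts_hit = re.match(rf'\[[^\]]*#{anum}[^0-9]', tpl.smarts)
        if name_hit or smarts_hit:
            res.append(tpl)
    return res
```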

```python
find_contribs_for_bin(1.82, 2.24)
```
```
[CrippenTuple(name='N2', smarts='[NH+0]([A;!#1])[A;!#1]', logp_contrib=-0.7096, mr_contrib=2.173, note=''),
 CrippenTuple(name='N7', smarts='[N+0]([A;!#1])([A;!#1])[A;!#1]', logp_contrib=-0.3187, mr_contrib=1.839, note=''),
 CrippenTuple(name='N11', smarts='[n+0]', logp_contrib=-0.3239, mr_contrib=2.202, note=''),
 CrippenTuple(name='NS', smarts='[#7]', logp_contrib=-0.4806, mr_contrib=2.134, note='')]
```

So every atom type in that bin is a nitrogen, but not every nitrogen type lands there:

```python
find_tuples_for_atom('N')
```
```
[CrippenTuple(name='N1', smarts='[NH2+0][A;!#1]', logp_contrib=-1.019, mr_contrib=2.262, note=''),
 CrippenTuple(name='N2', smarts='[NH+0]([A;!#1])[A;!#1]', logp_contrib=-0.7096, mr_contrib=2.173, note=''),
 CrippenTuple(name='N3', smarts='[NH2+0]a', logp_contrib=-1.027, mr_contrib=2.827, note=''),
 CrippenTuple(name='N4', smarts='[NH1+0]([!#1;A,a])a', logp_contrib=-0.5188, mr_contrib=3.0, note=''),
 CrippenTuple(name='N5', smarts='[NH+0]=[!#1;A,a]', logp_contrib=0.08387, mr_contrib=1.757, note=''),
 CrippenTuple(name='N6', smarts='[N+0](=[!#1;A,a])[!#1;A,a]', logp_contrib=0.1836, mr_contrib=2.428, note=''),
 CrippenTuple(name='N7', smarts='[N+0]([A;!#1])([A;!#1])[A;!#1]', logp_contrib=-0.3187, mr_contrib=1.839, note=''),
 CrippenTuple(name='N8', smarts='[N+0](a)([!#1;A,a])[A;!#1]', logp_contrib=-0.4458, mr_contrib=2.819, note=''),
 CrippenTuple(name='N8', smarts='[N+0](a)(a)a', logp_contrib=-0.4458, mr_contrib=2.819, note=''),
 CrippenTuple(name='N9', smarts='[N+0]#[A;!#1]', logp_cont
```
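
To see the split concretely, here's a plain-Python sketch that checks the MR contributions of the N types listed above against the `SMR_VSA3` range. It uses one entry per type name, with the values copied from the output, and the half-open interval from the docstring:

```python
# (type name, MR contribution) for the N types listed above
n_types = [('N1', 2.262), ('N2', 2.173), ('N3', 2.827), ('N4', 3.0),
           ('N5', 1.757), ('N6', 2.428), ('N7', 1.839), ('N8', 2.819)]

LOWER, UPPER = 1.82, 2.24  # SMR_VSA3 range: 1.82 <= x < 2.24

inside = [name for name, mr in n_types if LOWER <= mr < UPPER]
outside = [name for name, mr in n_types if not LOWER <= mr < UPPER]
print('in SMR_VSA3:', inside)
print('elsewhere:  ', outside)
```

N1 (2.262) just misses the upper edge and N5 (1.757) falls below the lower one, which is why only N2 and N7 from this part of the list end up in the bin.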

```python
# dimethylamine: the N is type N2 (MR contribution 2.173), inside the SMR_VSA3 range
m = Chem.MolFromSmiles('CNC')
rdMolDescriptors._CalcCrippenContribs(m)
```
```
[(-0.2035, 2.753), (-0.7096, 2.173), (-0.2035, 2.753)]
```

```python
list(rdMolDescriptors._CalcLabuteASAContribs(m)[0])
```
```
[7.04767198267719, 5.316788604006331, 7.04767198267719]
```

```python
# equals the ASA contribution of the N atom alone
Descriptors.SMR_VSA3(m)
```
```
5.316788604006331
```
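
So `SMR_VSA3` here is just the ASA contribution of the single atom whose MR contribution falls in the bin. A minimal sketch of that single-bin sum, using a hypothetical helper `single_bin_vsa` and the dimethylamine numbers copied from the outputs above (so it runs without RDKit):

```python
def single_bin_vsa(mr_contribs, asa_contribs, lower, upper):
    """Sum the ASA contributions of atoms whose MR contribution is in [lower, upper)."""
    return sum(asa for mr, asa in zip(mr_contribs, asa_contribs)
               if lower <= mr < upper)

# dimethylamine (CNC), atom order C, N, C
mr = [2.753, 2.173, 2.753]
asa = [7.04767198267719, 5.316788604006331, 7.04767198267719]
print(single_bin_vsa(mr, asa, 1.82, 2.24))  # only the N (MR 2.173) counts
```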

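```python
# pyridine: the aromatic n is type N11 (MR contribution 2.202), inside the SMR_VSA3 range
pyridine = Chem.MolFromSmiles('n1ccccc1')
Descriptors.SMR_VSA3(pyridine)
```
```
4.9839785209472085
```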

```python
# the first value (the n atom) matches SMR_VSA3 above
list(rdMolDescriptors._CalcLabuteASAContribs(pyridine)[0])
```
```
[4.9839785209472085,
 6.196843571613076,
 6.06636706846161,
 6.06636706846161,
 6.06636706846161,
 6.196843571613076]
```

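```python
# trimethylamine: the N is type N7 (MR contribution 1.839), also inside the SMR_VSA3 range
list(rdMolDescriptors._CalcLabuteASAContribs(Chem.MolFromSmiles('N(C)(C)C'))[0])
```
```
[4.899909730850478, 7.04767198267719, 7.04767198267719, 7.04767198267719]
```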
