In [1]:
import sys
sys.path.append('..')
import chemex as cx
import chemex.web
from itertools import islice
import pandas as pd
from pandas import DataFrame

# Experimental and calculated properties
Many properties displayed on ChemSpider pages aren't accessible through the ChemSpider web API. (As far as I can tell!)

`cx.web.cs_scrape_properties(csid, [props])` retrieves properties from the main "Properties" tab of a given ChemSpider page, plus the contents of the "EPI Suite" tab (see below). If you know exactly which properties you want, the optional second argument can be a list of those properties. See `cx.web.cs_default_props` for an example.

The function returns an instance of [OrderedMultiDict](https://boltons.readthedocs.org/en/latest/dictutils.html), which preserves the order in which values were added and keeps multiple values for the same key. These objects can be a little more annoying to handle with pandas than an ordinary Python `dict`, because *all* values are lists (even single values).
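
To make "all values are lists" concrete, here is a minimal sketch that collapses each list into a single table-friendly cell. It uses a plain `dict` of lists as a stand-in for the scraped result (an assumption, so that it runs without ChemSpider access), and the helper name `flatten_values` is hypothetical, not part of `chemex`.

```python
# Stand-in for a scraped result: every value is a list, even single ones.
# (Plain dict used for illustration; the real object is an OrderedMultiDict.)
scraped = {
    'CSID': [4471],
    'Experimental Gravity': ['1.3 g/mL Alfa Aesar A17662'],
    'Predicted Melting Point': ['63 °C TCI', '63 °C TCI H0266'],
}


def flatten_values(mapping, sep='; '):
    """Hypothetical helper: unwrap single-item lists, join longer ones."""
    flat = {}
    for key, values in mapping.items():
        if len(values) == 1:
            flat[key] = values[0]
        else:
            flat[key] = sep.join(str(v) for v in values)
    return flat


flat = flatten_values(scraped)
print(flat['CSID'])
print(flat['Predicted Melting Point'])
```

Joining values into one string loses the separation between sources, so this is probably only suitable for display, not for further analysis.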

In [2]:
csid1 = 4471
data1 = cx.web.cs_scrape_properties(csid1)

What properties do we have, and how many values does each one have?

In [3]:
df1 = DataFrame([data1], index=['Values'], columns=data1.keys()).transpose()
df1['Number of values'] = df1['Values'].apply(len)
df1

Unnamed: 0,Values,Number of values
CSID,[4471],1
Experimental Melting Point,"[63 °C TCI H0266, 62-65 °C Alfa Aesar, 64 °C J...",6
Experimental Boiling Point,[150 deg C / 5 mm (346.3046 °C / 760 mmHg)\r\n...,2
Experimental LogP,"[3.641 Vitas-M STK057962, 2.9758 Synthon-Lab ...",2
Experimental Flash Point,"[216 °C Alfa Aesar, 216 °C Alfa Aesar, 100 °C ...",4
Experimental Gravity,[1.3 g/mL Alfa Aesar A17662],1
Predicted Melting Point,"[63 °C TCI, 63 °C TCI H0266]",2
Safety,"[26-37 Alfa Aesar A17662, 36/37/38 Alfa Aesar ...",12
Compound Source,[synthetic Microsource \r\n [01500...,1
Bio Activity,[Oxybenzone(Eusolex 4360; Escalol 567) is an o...,1


Properties can and do have multiple values.

In [4]:
data1.getlist('Experimental Melting Point')

['63 °C TCI H0266',
 '62-65 °C Alfa Aesar',
 '64 °C Jean-Claude Bradley Open Melting Point Dataset 2707',
 '65.5 °C Jean-Claude Bradley Open Melting Point Dataset 21678',
 '63 °C Biosynth Q-200287',
 '62-65 °C Alfa Aesar A17662']
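
Each value is free text (a number, a unit, then the data source), so some parsing is needed before the numbers can be used. The sketch below is one possible approach, not something `chemex` provides: the hypothetical helper `parse_celsius` assumes the string starts with either a single number or a `low-high` range followed by `°C`, and returns `None` for anything else.

```python
import re

# Matches "63 °C ...", "62-65 °C ..." and "-6 °C ..." at the start of a string.
_CELSIUS = re.compile(
    r'^\s*(-?\d+(?:\.\d+)?)'          # first number, optionally negative
    r'(?:\s*-\s*(-?\d+(?:\.\d+)?))?'  # optional upper end of a range
    r'\s*°C'
)


def parse_celsius(text):
    """Return the temperature in °C (midpoint for a range), or None."""
    match = _CELSIUS.match(text)
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return (low + high) / 2


melting_points = [
    '63 °C TCI H0266',
    '62-65 °C Alfa Aesar',
    '65.5 °C Jean-Claude Bradley Open Melting Point Dataset 21678',
]
parsed = [parse_celsius(value) for value in melting_points]
print(parsed)
```

Taking the midpoint of a range is a design choice made here for simplicity; keeping both ends may be more appropriate for some uses.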

Because the function looks for properties based on the structure of the HTML page (rather than a pre-defined set of keywords), it can return different numbers of properties/values for different chemicals.

In [5]:
data2 = cx.web.cs_scrape_properties(5889)
print('{0}: {1} properties\n{2}: {3} properties'.format(data1.get('CSID'), len(data1.keys()),
                                                        data2.get('CSID'), len(data2.keys())))

4471: 35 properties
5889: 49 properties
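
To see *which* properties differ between two results, and not just how many, set operations on the keys are enough. This is a small sketch using plain dicts as stand-ins for the two scraped results; the keys come from the tables in this notebook and the values are shortened.

```python
# Stand-ins for two scraped results (plain dicts of lists, values shortened).
result_a = {
    'CSID': [4471],
    'Experimental LogP': ['3.641'],
    'Predicted Melting Point': ['63 °C TCI'],
}
result_b = {
    'CSID': [5889],
    'Experimental LogP': ['0.9'],
    'Toxicity': ['ORL-RAT LD50 250 mg kg-1'],
}

keys_a, keys_b = set(result_a), set(result_b)
shared = sorted(keys_a & keys_b)   # properties present in both results
only_a = sorted(keys_a - keys_b)   # present only in the first result
only_b = sorted(keys_b - keys_a)   # present only in the second result

print('shared:', shared)
print('only in first:', only_a)
print('only in second:', only_b)
```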


## Combining multiple results

In [6]:
datalist1 = [data1, data2]
index = [d.get('CSID') for d in datalist1]
DataFrame(datalist1, index=index)

Unnamed: 0,ACD/#Freely Rotating Bonds,ACD/#H bond acceptors,ACD/#H bond donors,ACD/#Rule of 5 Violations,ACD/BCF (pH 5.5),ACD/BCF (pH 7.4),ACD/Boiling Point,ACD/Density,ACD/Enthalpy of Vaporization,ACD/Flash Point,...,Predicted Melting Point,Retention Index (Kovats),Retention Index (Lee),Retention Index (Linear),Retention Index (Normal Alkane),Safety,Stability,Symptoms,Target Organs,Toxicity
4471,[3],[3],[1],[0],[519.93],[309.75],[370.3±27.0 °C at 760 mmHg],[1.2±0.1 g/cm3],[64.1±3.0 kJ/mol],[140.5±17.2 °C],...,"[63 °C TCI, 63 °C TCI H0266]",[2012 (estimated with error: 89) NIST Spectra ...,,[1938 (Program type: Ramp; Column cl... (show ...,,"[26-37 Alfa Aesar A17662, 36/37/38 Alfa Aesar ...",,,,
5889,[0],[1],[2],[0],[4.06],[4.58],[184.4±0.0 °C at 760 mmHg],[1.0±0.1 g/cm3],[42.4±0.0 kJ/mol],[70.0±0.0 °C],...,,[992 (estimated with error: 83) NIST Spectra m...,[154.3 (Program type: Ramp; Column cl... (show...,[939.2 (Program type: Ramp; Column cl... (show...,[947 (Program type: Isothermal; Col... (show m...,[23/24/25-40-41-43-48/23/24/25-68-50 Alfa Aesa...,"[Stable. Incompatible with oxidizing agents, b...","[Headache, lassitude (weakness, exhaustion), d...","[Blood, cardiovascular system, eyes, liver, ki...","[ORL-RAT LD50 250 mg kg-1 , ORL-MUS LD50 464..."


## Getting information for multiple chemicals at once using a generator
With the generator `cx.web.cs_properties_gen` you can also specify a list of properties of interest just as above. If you don't, it will return everything it retrieves from the page.

In [7]:
csid_list = [4471, 5889, 8677, 20939]
multi_data = cx.web.cs_properties_gen(csid_list, cx.web.cs_default_props)
datalist2 = list(islice(multi_data, None))
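
`islice(multi_data, None)` consumes the whole generator, so it is equivalent to `list(multi_data)`. Passing a number instead stops early. The sketch below illustrates this with a hypothetical stand-in generator, `fake_properties_gen`, on the assumption that `cx.web.cs_properties_gen` behaves like an ordinary Python generator and only does work when an item is requested.

```python
from itertools import islice


def fake_properties_gen(csids):
    """Hypothetical stand-in for a scraping generator: one result per CSID."""
    for csid in csids:
        print('fetching', csid)  # a real generator would request the page here
        yield {'CSID': [csid]}


gen = fake_properties_gen([4471, 5889, 8677, 20939])
first_two = list(islice(gen, 2))  # only the first two items are produced
rest = list(gen)                  # the remaining items are produced here
```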

In [8]:
DataFrame(datalist2, index=csid_list)

Unnamed: 0,ACD/BCF (pH 7.4),ACD/Boiling Point,ACD/Flash Point,ACD/KOC (pH 7.4),ACD/LogP,ACD/Vapour Pressure,CSID,EPI Suite,Experimental Boiling Point,Experimental LogP,Experimental Melting Point,Experimental Solubility,Experimental Vapor Pressure
4471,[309.75],[370.3±27.0 °C at 760 mmHg],[140.5±17.2 °C],[1818.30],[3.64],[0.0±0.9 mmHg at 25°C],[4471],[Predicted data is generated using the US Envi...,[150 deg C / 5 mm (346.3046 °C / 760 mmHg)\r\n...,"[3.641 Vitas-M STK057962, 2.9758 Synthon-Lab ...","[63 °C TCI H0266, 62-65 °C Alfa Aesar, 64 °C J...",,
5889,[4.58],[184.4±0.0 °C at 760 mmHg],[70.0±0.0 °C],[103.36],[0.94],[0.7±0.3 mmHg at 25°C],[5889],[Predicted data is generated using the US Envi...,"[183-184 °C Alfa Aesar, 363 F (183.8889 °C)\r\...",[0.9 Egon Willighagen http://dx.doi.org/10.102...,"[-6 °C Alfa Aesar, -6 °C Oxford University Che...","[4% NIOSH BW6650000, Soluble in water Alfa Aes...",[0.6 mmHg NIOSH BW6650000]
8677,[],[],[],[],[],[],[8677],[None],,,"[206 °C Alfa Aesar, 204-207 °C Oxford Universi...",,
20939,[1306.66],[175.4±20.0 °C at 760 mmHg],[42.8±0.0 °C],[5917.62],[4.45],[1.5±0.2 mmHg at 25°C],[20939],[Predicted data is generated using the US Envi...,"[170-180 °C Alfa Aesar, 175-177 °C Food and Ag...",,"[-40 °C LKT Labs \r\n [L3250], -74...",[Insoluble in water. LKT Labs \r\n ...,
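
Note that "nothing found" shows up in three different ways in the row for 8677: an empty list, `[None]`, and an empty cell. The sketch below shows one way to treat them uniformly. It uses a plain dict as a stand-in for that row, with `None` where the table shows an empty cell (an assumption; a NaN produced by pandas would need an extra check), and the helper `is_missing` is hypothetical.

```python
def is_missing(cell):
    """True for None, an empty list, or a list containing only None."""
    if cell is None:
        return True
    if isinstance(cell, list):
        return all(item is None for item in cell)
    return False


# Stand-in for the 8677 row above (values shortened).
row_8677 = {
    'ACD/LogP': [],
    'CSID': [8677],
    'EPI Suite': [None],
    'Experimental LogP': None,
    'Experimental Melting Point': ['206 °C Alfa Aesar'],
}

available = [key for key, cell in row_8677.items() if not is_missing(cell)]
print(available)
```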


# EPI Suite results
[EPI Suite](http://www2.epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface) is a software package for estimating environmental fate and properties of chemicals (it also looks up experimentally measured properties from a database). It only runs on Windows, but ChemSpider conveniently stores EPI Suite results for many chemicals. They aren't exposed through the web API, but they appear in a tab on the compound page as a text blob. 

If you use `cx.web.cs_scrape_properties` and the information is available, you can get this EPI Suite blob as the property `'EPI Suite'`.

In [9]:
d = cx.web.cs_scrape_properties(592, props=['EPI Suite'])
for i in d.get('EPI Suite').split('\n'):
    print(i)

Predicted data is generated using the US Environmental Protection Agency’s EPISuite

                        
 Log Octanol-Water Partition Coef (SRC):
    Log Kow (KOWWIN v1.67 estimate) =  -0.65
    Log Kow (Exper. database match) =  -0.72
       Exper. Ref:  Hansch,C et al. (1995)
    Log Kow (Exper. database match) =  -0.72
       Exper. Ref:  Hansch,C et al. (1995)

 Boiling Pt, Melting Pt, Vapor Pressure Estimations (MPBPWIN v1.42):
    Boiling Pt (deg C):  204.20  (Adapted Stein & Brown method)
    Melting Pt (deg C):  22.66  (Mean or Weighted MP)
    VP(mm Hg,25 deg C):  0.0286  (Modified Grain method)
    MP  (exp database):  52.8 deg C
    BP  (exp database):  122 @ 14.5 mm Hg deg C
    VP  (exp database):  8.14E-02 mm Hg at 25 deg C
    Subcooled liquid VP: 0.153 mm Hg (25 deg C, exp database VP )

 Water Solubility Estimate from Log Kow (WSKOW v1.41):
    Water Solubility at 25 deg C (mg/L):  1e+006
       log Kow used: -0.72 (expkow database)
       no-melting pt equation 

The function `cx.web.epi_suite_values` will extract a few of the specific values as an `OrderedMultiDict`. (This is a rough attempt at text processing; help would be appreciated.)

In [10]:
epi_omd = cx.web.epi_suite_values(d['EPI Suite'])
DataFrame([epi_omd], index=['EPI Suite estimate'], columns=epi_omd.keys()).transpose()

Unnamed: 0,EPI Suite estimate
Log Kow (KOWWIN v1.67 estimate),[-0.65]
Log Kow (Exper. database match),[-0.72]
Henrys LC [VP/WSol estimate using EPI values],[3.390E-009 atm-m3/mole]
Log Koa (KOAWIN v1.10 estimate),[4.615]
Log Koa (experimental database),[None]
Ready Biodegradability Prediction,[YES]
Log BCF from regression-based method,[0.500 (BCF = 3.162)]
Level III Fugacity Model,[\n Mass Amount Half-Life Emis...
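
For anyone who wants to experiment with the text processing, here is a minimal sketch of one way to pull `key = value` lines out of the blob. It is not the implementation of `cx.web.epi_suite_values`; the function name `parse_key_value_lines` and the regular expression are assumptions made for illustration, and the sample text is copied from the output shown earlier.

```python
import re

sample = """\
 Log Octanol-Water Partition Coef (SRC):
    Log Kow (KOWWIN v1.67 estimate) =  -0.65
    Log Kow (Exper. database match) =  -0.72
       Exper. Ref:  Hansch,C et al. (1995)
    Log Kow (Exper. database match) =  -0.72
"""

# A line of the form "<key> = <value>", ignoring surrounding whitespace.
KEY_VALUE = re.compile(r'^\s*(?P<key>[^=]+?)\s*=\s*(?P<value>\S.*?)\s*$')


def parse_key_value_lines(text):
    """Collect 'key = value' lines into a dict of lists, in order of appearance."""
    found = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            found.setdefault(match.group('key'), []).append(match.group('value'))
    return found


parsed = parse_key_value_lines(sample)
for key, values in parsed.items():
    print(key, values)
```

Unlike the table above, this sketch keeps a repeated line as a repeated value, and it ignores lines that use a colon instead of `=`.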
