In [1]:
import skchem
import pandas as pd
pd.options.display.max_rows = 10

# Pipelining

`scikit-chem` extends the scikit-learn `Pipeline` object to support filtering. It is initialized with a list of Transformer objects.

In [10]:
pipeline = skchem.pipeline.Pipeline([
        skchem.standardizers.ChemAxonStandardizer(keep_failed=True),
        skchem.forcefields.UFF(),
        skchem.filters.OrganicFilter(),
        skchem.descriptors.MorganFeaturizer()])

The pipeline applies each step in turn to the objects it is given, calling the highest-priority method that each step implements, in the order `transform_filter` > `filter` > `transform`.
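The priority rule can be illustrated with a small, library-independent sketch. This is not the `scikit-chem` implementation: `MiniPipeline` and the three step classes below are hypothetical names invented for illustration, and plain strings stand in for molecules. It only shows one way a pipeline could pick the highest-priority method available on each step.

```python
class MiniPipeline:
    """Toy pipeline (hypothetical, not skchem's Pipeline)."""

    # Method names in decreasing priority, as described in the text.
    PRIORITY = ("transform_filter", "filter", "transform")

    def __init__(self, steps):
        self.steps = list(steps)

    def transform_filter(self, items):
        for step in self.steps:
            for name in self.PRIORITY:
                method = getattr(step, name, None)
                if callable(method):
                    # Use the first (highest-priority) method found.
                    items = method(items)
                    break
            else:
                raise TypeError(f"{type(step).__name__} implements no usable method")
        return items


class StripAndDropEmpty:
    """Implements both methods; the pipeline should prefer transform_filter."""

    def transform(self, items):
        return [s.strip() for s in items]

    def transform_filter(self, items):
        stripped = [s.strip() for s in items]
        return [s for s in stripped if s]


class DropShort:
    """Filter-only step: keeps strings of at least three characters."""

    def filter(self, items):
        return [s for s in items if len(s) >= 3]


class Upper:
    """Transform-only step."""

    def transform(self, items):
        return [s.upper() for s in items]


pipeline = MiniPipeline([StripAndDropEmpty(), DropShort(), Upper()])
result = pipeline.transform_filter(["  cc ", "   ", "ccc", "c1ccccc1 "])
print(result)
```

Because `StripAndDropEmpty` implements both `transform` and `transform_filter`, the toy pipeline calls `transform_filter`, so the whitespace-only entry is removed rather than passed on as an empty string.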

For example, our pipeline can transform sodium acetate all the way to fingerprints:

In [11]:
mol = skchem.Mol.from_smiles('CC(=O)[O-].[Na+]')

In [4]:
pipeline.transform_filter(mol)

morgan_fp_idx
0       0
1       0
2       0
3       0
4       0
       ..
2043    0
2044    0
2045    0
2046    0
2047    0
Name: MorganFeaturizer, dtype: uint8

It also works on collections of molecules:

In [8]:
mols = skchem.read_smiles('https://archive.org/download/scikit-chem_example_files/example.smi', name_column=1).squeeze(); mols

1
ethane                          <Mol: CC>
propane                        <Mol: CCC>
benzene                   <Mol: c1ccccc1>
sodium acetate    <Mol: CC(=O)[O-].[Na+]>
serine                <Mol: NC(CO)C(=O)O>
Name: structure, dtype: object

In [9]:
pipeline.transform_filter(mols)

ChemAxonStandardizer: 100% (5 of 5) |##########################################| Elapsed Time: 0:00:02 Time: 0:00:02
UFF: 100% (5 of 5) |###########################################################| Elapsed Time: 0:00:00 Time: 0:00:00
OrganicFilter: 100% (5 of 5) |#################################################| Elapsed Time: 0:00:00 Time: 0:00:00
MorganFeaturizer: 100% (5 of 5) |##############################################| Elapsed Time: 0:00:00 Time: 0:00:00


| morgan_fp_idx | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 2038 | 2039 | 2040 | 2041 | 2042 | 2043 | 2044 | 2045 | 2046 | 2047 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ethane | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| propane | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| benzene | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| sodium acetate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| serine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
