# Chemprop

This tutorial walks through Chemprop, which implements message passing neural networks for molecular property prediction as described in the paper "Analyzing Learned Molecular Representations for Property Prediction" (https://bit.ly/2RZn4nd) and as used in the paper "A Deep Learning Approach to Antibiotic Discovery" (https://bit.ly/3weQFYM).

Credit: https://github.com/chemprop/chemprop

Google Colab: https://colab.research.google.com/drive/1oxgEZZEI75pCOSNPEuNiJ019YDakLD7F?usp=sharing

In [None]:
# Installing from source
!git clone https://github.com/chemprop/chemprop.git
%cd chemprop
!pip install -e .

In [None]:
# Install RDKit
!pip install rdkit-pypi==2021.3.1.5

In [None]:
# Extract the data archive that ships with the GitHub repo
!tar -zxvf data.tar.gz

In [9]:
# To train a model on the Tox21 classification dataset and save checkpoints, run
!chemprop_train --data_path data/tox21.csv --dataset_type classification --save_dir tox21_checkpoints

Command line
python /usr/local/bin/chemprop_train --data_path data/tox21.csv --dataset_type classification --save_dir tox21_checkpoints
Args
{'activation': 'ReLU',
 'aggregation': 'mean',
 'aggregation_norm': 100,
 'atom_descriptor_scaling': True,
 'atom_descriptors': None,
 'atom_descriptors_path': None,
 'atom_descriptors_size': 0,
 'atom_features_size': 0,
 'atom_messages': False,
 'batch_size': 50,
 'bias': False,
 'bond_feature_scaling': True,
 'bond_features_path': None,
 'bond_features_size': 0,
 'cache_cutoff': 10000,
 'checkpoint_dir': None,
 'checkpoint_path': None,
 'checkpoint_paths': None,
 'class_balance': False,
 'config_path': None,
 'crossval_index_dir': None,
 'crossval_index_file': None,
 'crossval_index_sets': None,
 'cuda': True,
 'data_path': 'data/tox21.csv',
 'dataset_type': 'classification',
 'depth': 3,
 'device': device(type='cuda'),
 'dropout': 0.0,
 'empty_cache': False,
 'ensemble_size': 1,
 'epochs': 30,
 'explicit_h': False,
 'extra_metrics': [],
 'featu

In [10]:
# To make predictions, run
!chemprop_predict --test_path data/tox21.csv --checkpoint_dir tox21_checkpoints --preds_path tox21_preds.csv

Loading training args
Loading data
7831it [00:00, 152485.10it/s]
100% 7831/7831 [00:00<00:00, 322553.22it/s]
Validating SMILES
Test size = 7,831
Predicting with an ensemble of 1 models
Loading pretrained parameter "encoder.encoder.0.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.0.W_i.weight".
Loading pretrained parameter "encoder.encoder.0.W_h.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Moving model to cuda

  0% 0/157 [00:00<?, ?it/s]
  1% 1/157 [00:00<01:04,  2.40it/s]
  3% 5/157 [00:00<00:12, 11.71it/s]
  6% 9/157 [00:00<00:10, 14.07it/s]
 10% 16/157 [00:00<00:05, 25.72it/s]
 13% 20/157 [00:01<00:05
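
Once `tox21_preds.csv` has been written, the predicted probabilities can be turned into hard labels. The sketch below is a minimal illustration, assuming the predictions file has a `smiles` column followed by one probability column per task. The 0.5 threshold, the helper name `binarize_predictions`, the column names `task_a` and `task_b`, and the two sample rows are all illustrative placeholders, not real model output.

```python
import csv
import io


def binarize_predictions(csv_file, threshold=0.5, smiles_column="smiles"):
    """Map each probability column to a 0/1 label; empty cells become None."""
    labeled = []
    for row in csv.DictReader(csv_file):
        labels = {}
        for task, value in row.items():
            if task == smiles_column:
                continue
            if value in ("", None):
                labels[task] = None  # no prediction available for this task
            else:
                labels[task] = int(float(value) >= threshold)
        labeled.append((row[smiles_column], labels))
    return labeled


# Made-up rows standing in for tox21_preds.csv (assumed layout, not real predictions).
sample = io.StringIO(
    "smiles,task_a,task_b\n"
    "CCO,0.02,0.71\n"
    "c1ccccc1,0.55,\n"
)
labeled = binarize_predictions(sample)
for smiles, labels in labeled:
    print(smiles, labels)
```

To try this on the real file, pass an open file handle instead of the in-memory sample, for example `open("tox21_preds.csv", newline="")`, and adjust `smiles_column` if the header differs.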

In [11]:
# To load a trained model and encode molecules as latent fingerprint vectors, run
!chemprop_fingerprint --test_path data/tox21.csv --checkpoint_dir tox21_checkpoints --preds_path tox21_fingerprint.csv

Loading training args
Loading data
7831it [00:00, 149780.86it/s]
100% 7831/7831 [00:00<00:00, 311149.79it/s]
Validating SMILES
Test size = 7,831
Encoding smiles into a fingerprint vector from a single model
Loading pretrained parameter "encoder.encoder.0.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.0.W_i.weight".
Loading pretrained parameter "encoder.encoder.0.W_h.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.weight".
Loading pretrained parameter "encoder.encoder.0.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Moving model to cuda
Saving predictions to tox21_fingerprint.csv
Elapsed time = 0:00:13
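
A common use of the saved fingerprints is comparing molecules in the learned latent space. The following is a rough sketch, assuming `tox21_fingerprint.csv` has a `smiles` column followed by numeric fingerprint columns. The helpers `load_fingerprints` and `cosine_similarity`, the `fp_*` column names, and the three short vectors are hypothetical stand-ins used only to keep the example self-contained.

```python
import csv
import io
import math


def load_fingerprints(csv_file, smiles_column="smiles"):
    """Return {smiles: [float, ...]} from a CSV of fingerprint columns."""
    fingerprints = {}
    for row in csv.DictReader(csv_file):
        smiles = row.pop(smiles_column)
        fingerprints[smiles] = [float(v) for v in row.values()]
    return fingerprints


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up three-dimensional vectors standing in for real fingerprints.
sample = io.StringIO(
    "smiles,fp_0,fp_1,fp_2\n"
    "CCO,1.0,0.0,2.0\n"
    "CCN,2.0,0.0,4.0\n"
    "c1ccccc1,0.0,3.0,0.0\n"
)
fps = load_fingerprints(sample)
sim_close = cosine_similarity(fps["CCO"], fps["CCN"])
sim_far = cosine_similarity(fps["CCO"], fps["c1ccccc1"])
print(sim_close, sim_far)
```

Cosine similarity is only one reasonable choice here. Euclidean distance or a nearest-neighbour search over the same vectors would follow the same loading step.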


In [12]:
# To interpret the predictions of a trained model, run
!chemprop_interpret --data_path data/tox21.csv --checkpoint_dir tox21_checkpoints/fold_0/ --property_id 1

Streaming output truncated to the last 5000 lines.
['CNC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](CC(=O)NO)CC(C)C'],0.033,,
['C[Se]CC[C@H](N)C(=O)O'],0.020,,
['O=C(O[C@@H]1C[C@@H]2C[C@@H]3C[C@H](C1)N2CC3=O)c1c[nH]c2ccccc12'],0.036,,
['CCCCCCCCOc1ccccc1C(=O)Nc1ccc(C(=O)OCC[N+](C)(CC)CC)cc1'],0.040,,
['O[C@@H]1[C@H](O)CN2CCC[C@@H](O)[C@H]12'],0.069,,
['CCCC(=O)O[C@]1(C(=O)CCl)[C@@H](C)C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)C(=O)C[C@@]21C'],0.968,C[C@]12C=CC(=O)C=C1CC[CH2:1][C:1]2,0.856
['C=CC[C@@H]1C=C(C)C[C@H](C)C[C@H](OC)[C@H]2O[C@@](O)(C(=O)C(=O)N3CCCC[C@H]3C(=O)O[C@H](/C(C)=C/[C@@H]3CC[C@@H](O)[C@H](OC)C3)[C@H](C)[C@@H](O)CC1=O)[C@H](C)C[C@@H]2OC'],0.065,,
['Nc1ncnc2c1ncn2[C@@H]1O[C@H](COP(=O)(O)O)[C@@H](O)[C@H]1O'],0.020,,
['CCOC(=O)N(C)C(=O)CSP(=S)(OCC)OCC'],0.003,,
['CC/C=C\\CC/C=C/CO'],0.004,,
['C=C(C)CCC[C@H](C)CCO'],0.011,,
['CCCCCCCCC1CCC(=O)O1'],0.022,,
['C=CC(C)(CCC=C(C)C)OC(=O)c1ccccc1'],0.007,,
['CC1CCc2nccnc21'],0.004,,
['CCCCCCCCCCCCCCC1CO1'],0.015,,
[
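
Each line of the interpretation output above has four comma-separated fields, and the last two are empty for most molecules. The sketch below parses such lines and keeps only those that carry a rationale. The field names `score`, `rationale`, and `rationale_score` are labels chosen here from the shape of the output, not names confirmed by Chemprop's documentation, and the parser assumes the last three fields contain no commas. The sample lines are copied from the output above.

```python
import ast


def parse_interpret_line(line):
    """Split one output line into its four fields; empty fields become None."""
    smiles_repr, score, rationale, rationale_score = line.rstrip("\n").rsplit(",", 3)
    return {
        # The first field is printed as a Python list literal, e.g. ['CCO'].
        "smiles": ast.literal_eval(smiles_repr),
        "score": float(score),
        "rationale": rationale or None,
        "rationale_score": float(rationale_score) if rationale_score else None,
    }


lines = [
    r"['C[Se]CC[C@H](N)C(=O)O'],0.020,,",
    r"['CCCC(=O)O[C@]1(C(=O)CCl)[C@@H](C)C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)C(=O)C[C@@]21C'],0.968,C[C@]12C=CC(=O)C=C1CC[CH2:1][C:1]2,0.856",
    r"['CC/C=C\\CC/C=C/CO'],0.004,,",
]
parsed = [parse_interpret_line(line) for line in lines]

# Keep only the molecules for which a rationale substructure was reported.
with_rationale = [p for p in parsed if p["rationale"] is not None]
for p in with_rationale:
    print(p["smiles"][0], p["score"], p["rationale"], p["rationale_score"])
```

Using `ast.literal_eval` for the first field also undoes the doubled backslash that appears in SMILES such as `CC/C=C\\CC/C=C/CO`.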