## Deep Inverse Reinforcement Learning for Structural Evolution of Small Molecules

ABSTRACT: The size and quality of chemical libraries to the drug discovery pipeline are crucial for developing
new drugs or repurposing existing drugs. Existing techniques such as combinatorial organic synthesis and High-Throughput Screening usually make the process extraordinarily tough and complicated
since the search space of synthetically feasible drugs is exorbitantly huge. While reinforcement
learning has been mostly exploited in the literature for generating novel compounds, the requirement of designing a reward function that succinctly represents the learning objective could prove
daunting in certain complex domains. Generative Adversarial Network-based methods also mostly
discard the discriminator after training and could be hard to train. In this study, we propose a framework for training a compound generator and learning a transferable reward function based on the
entropy maximization inverse reinforcement learning paradigm. We show from our experiments
that the inverse reinforcement learning route offers a rational alternative for generating chemical
compounds in domains where reward function engineering may be less appealing or impossible
while data exhibiting the desired objective is readily available.

Link to paper: https://arxiv.org/pdf/2008.11804v2.pdf

Credit: https://github.com/bbrighttaer/irelease

In [1]:
# Clone the repository and cd into directory
!git clone https://github.com/bbrighttaer/irelease.git
%cd irelease

Cloning into 'irelease'...
remote: Enumerating objects: 2268, done.
remote: Counting objects: 100% (209/209), done.
remote: Compressing objects: 100% (165/165), done.
remote: Total 2268 (delta 131), reused 118 (delta 44), pack-reused 2059
Receiving objects: 100% (2268/2268), 120.11 MiB | 30.61 MiB/s, done.
Resolving deltas: 100% (1683/1683), done.
Checking out files: 100% (277/277), done.
/content/irelease


In [None]:
# Install dependencies / requirements
!pip install gym==0.15.6 rdkit-pypi==2021.3.1.5 ptan==0.6 xgboost==0.90

# Install soek module
!git clone https://github.com/bbrighttaer/soek.git
%cd /content/irelease/soek
!python setup.py install
%cd /content/irelease/proj
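
Because the install command above pins exact versions, it can help to confirm what actually ended up in the environment before running the scripts. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `check_pins` are hypothetical and are not part of the irelease repository.

```python
from importlib import metadata

# Pins copied from the pip command above.
PINS = {
    "gym": "0.15.6",
    "rdkit-pypi": "2021.3.1.5",
    "ptan": "0.6",
    "xgboost": "0.90",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, get_version=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        found = get_version(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package is present at the requested version.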

### Pretraining
The Stack-RNN model used in our work can be pretrained with the following command:

In [None]:
# Copy the irelease package next to the scripts in proj/
!cp -a /content/irelease/irelease /content/irelease/proj/
%cd /content/irelease/proj
!python pretrain_rnn.py --data ../data/chembl.smi

### Evaluation Functions
#### DRD2 Activity

The evaluation function for the DRD2 experiment is an RNN classifier trained with the BCE loss function. The following is the command to train the model using 5-fold cross validation:

In [None]:
!python expert_rnn_bin.py --data_file ../data/drd2_bin_balanced.csv --cv

After training, the evaluation can be done using:

In [None]:
!python expert_rnn_bin.py --data_file ../data/drd2_bin_balanced.csv --cv --eval --eval_model_dir ./model_dir/expert_rnn_bin/

The value of the `--eval_model_dir` flag is a directory which contains the 5 models saved from the CV training stage.
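
The two ingredients named in this section, 5-fold cross validation and the BCE loss, can be sketched in plain Python. This is an illustration only and not the code in `expert_rnn_bin.py`; the function names `kfold_indices` and `bce_loss` are hypothetical, and the split below does not shuffle, whereas real runs typically shuffle the data first.

```python
import math


def kfold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

Each of the five splits yields one trained model, which is consistent with the five saved models that `--eval_model_dir` is expected to contain.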

#### LogP
The evaluation function for the LogP optimization experiment is an RNN model trained using the MSE loss function. The following command invokes training:

In [None]:
!python expert_rnn_reg.py --data_file ../data/logP_labels.csv --cv

After training, the evaluation can be done using:

In [None]:
!python expert_rnn_reg.py --data_file ../data/logP_labels.csv --cv --eval --eval_model_dir ./model_dir/expert_rnn_reg/
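
For reference, the MSE objective mentioned above and a simple way to summarize per-fold scores can be written as follows. This is a sketch under stated assumptions: `mse_loss` and `summarize_folds` are hypothetical helper names, and how `expert_rnn_reg.py` actually aggregates its cross-validation scores is not shown here.

```python
from statistics import mean, stdev


def mse_loss(y_true, y_pred):
    """Mean squared error between target values and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize_folds(fold_scores):
    """Return (mean, sample standard deviation) of the per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)
```

Reporting the spread alongside the mean makes it easier to see whether the folds agree with one another.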

#### JAK2
We trained XGBoost models for the JAK2 maximization experiment. The same XGBoost models were used for the JAK2 minimization experiment, as mentioned in the paper.

The following invokes the training process:

In [None]:
!python expert_xgb_reg.py --data_file ../data/jak2_data.csv --cv

After training, the evaluation can be done using:

In [None]:
!python expert_xgb_reg.py --data_file ../data/jak2_data.csv --cv --eval --eval_model_dir ./model_dir/expert_xgb_reg/

### Training
The following files are used for PPO training for both DIRL and IRL:

- DRD2 Activity: `ppo_rl_drd2.py`
- LogP Optimization: `ppo_rl_logp.py`
- JAK2 Maximization: `ppo_rl_jak2_minmax.py`
- JAK2 Minimization: `ppo_rl_jak2_min.py`

For DRL training, the following files are used:
 
- DRD2 Activity: `reinforce_rl_drd2.py`
- LogP Optimization: `reinforce_rl_logp.py`
- JAK2 Maximization: `reinforce_rl_jak2_minmax.py`
- JAK2 Minimization: `ppo_rl_jak2_min.py`

These files mostly share command line flags for training. For instance, to train
a generator with the DRD2 demonstrations (DIRL) the following command could be used:

In [None]:
!python ppo_rl_drd2.py  --exp_name drd2 --demo ../data/drd2_active_filtered.smi --unbiased ../data/unbiased_smiles.smi --prior_data ../data/chembl.smi --pretrained_model irelease_prior.mod

For DRL, add the `--use_true_reward` flag:

In [None]:
!python ppo_rl_drd2.py  --exp_name drd2 --demo ../data/drd2_active_filtered.smi --unbiased ../data/unbiased_smiles.smi --prior_data ../data/chembl.smi --pretrained_model irelease_prior.mod --use_true_reward
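
Since the two commands above differ only in the `--use_true_reward` flag, the argument list can be assembled programmatically when launching several runs. The sketch below uses only the flags and paths shown above; the function `build_train_command` is a hypothetical helper, not something provided by the repository.

```python
import shlex


def build_train_command(script, exp_name, demo, unbiased, prior_data,
                        pretrained_model, use_true_reward=False):
    """Build the argument list for one of the training scripts."""
    args = [
        "python", script,
        "--exp_name", exp_name,
        "--demo", demo,
        "--unbiased", unbiased,
        "--prior_data", prior_data,
        "--pretrained_model", pretrained_model,
    ]
    if use_true_reward:
        # Same flag as in the second command above.
        args.append("--use_true_reward")
    return args


cmd = build_train_command(
    "ppo_rl_drd2.py", "drd2",
    "../data/drd2_active_filtered.smi",
    "../data/unbiased_smiles.smi",
    "../data/chembl.smi",
    "irelease_prior.mod",
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from within `proj/`.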

### Compound Sampling
Assuming the training phase produces the model `biased_generator.mod`, compound
samples, in the form of SMILES, could be generated using:

In [None]:
!python pretrain_rnn.py --data ../data/chembl.smi --eval --eval_model_name biased_generator.mod --num_smiles 1000

The `--num_smiles` flag controls the number of SMILES (valid and invalid) that would be sampled from the
generator.

After the generation, a JSON file is produced which contains valid and invalid
SMILES. In our experiments, we process this `.json` file using 
[smiles_worker.py](https://github.com/bbrighttaer/irelease/blob/master/proj/smiles_worker.py) to save the valid SMILES into a CSV file. 
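
The JSON-to-CSV step can be approximated with the standard library. The sketch below is not `smiles_worker.py`: it assumes, purely for illustration, that the JSON file stores the valid SMILES as a flat list under a `"valid"` key, and the function name `valid_smiles_to_csv` is hypothetical. The real file layout may differ, so check the sample file before adapting this.

```python
import csv
import json


def valid_smiles_to_csv(json_path, csv_path, valid_key="valid"):
    """Copy the SMILES stored under `valid_key` into a one-column CSV.

    The key name and the flat-list layout are assumptions for illustration.
    Returns the number of SMILES written.
    """
    with open(json_path) as fh:
        data = json.load(fh)
    smiles = data[valid_key]
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["smiles"])  # header row
        writer.writerows([s] for s in smiles)
    return len(smiles)
```

Dividing the returned count by the value passed to `--num_smiles` gives the fraction of sampled strings that were valid.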

A sample JSON file produced after SMILES generation is
[here](https://github.com/bbrighttaer/irelease/blob/master/proj/analysis/DRD2_activity_smiles_biased_ppo_grl_eval.json).
The corresponding processed CSV file, containing the valid SMILES and
the evaluation function's predictions, is
[here](https://github.com/bbrighttaer/irelease/blob/master/proj/analysis/DRD2_activity_smiles_biased_ppo_grl_eval.csv).