## Use a GNN to Predict an ADMET Property

A demonstration of submitting to the TDC ADMET Caco2_Wang benchmark.

### Step 1: Load the benchmark dataset.

In [1]:
from tdc import BenchmarkGroup
group = BenchmarkGroup(name = 'ADMET_Group', path = 'data/')
benchmark = group.get('Caco2_Wang')

train_val, test = benchmark['train_val'], benchmark['test']

Found local copy...


### Step 2: Train Your Models With Five Runs

As an example, we use [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose), a scikit-learn-style deep learning library for drug discovery.
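
Each of the five runs in this step passes a different seed when splitting the `train_val` pool into training and validation sets. The sketch below illustrates the idea of a reproducible, seed-dependent split using only the standard library. It is a simplified stand-in: `seeded_train_valid_split` is a hypothetical helper, the plain random shuffle is an assumption (the benchmark's `'default'` split type may group molecules differently), and the 1/8 validation fraction is inferred from the 637/91 sizes printed in the output of this step.

```python
import random


def seeded_train_valid_split(items, valid_frac=0.125, seed=1):
    """Shuffle a copy of `items` with a fixed seed and cut it into (train, valid).

    Hypothetical helper for illustration only; not part of TDC.
    """
    rng = random.Random(seed)          # local RNG, so global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_frac)
    return shuffled[n_valid:], shuffled[:n_valid]


# 728 placeholder molecule IDs, matching the size of the train_val pool in the logs.
pool = [f"mol_{i}" for i in range(728)]

# One split per seed, mirroring the five-seed loop in this step.
splits = {seed: seeded_train_valid_split(pool, seed=seed) for seed in [1, 2, 3, 4, 5]}
train, valid = splits[1]
print(len(train), len(valid))
```

Because the seed fully determines the shuffle, rerunning with the same seed reproduces the same split, while different seeds give different validation sets drawn from the same pool.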

In [3]:
from DeepPurpose import CompoundPred as models
from DeepPurpose.utils import data_process, generate_config

drug_encoding = 'MPNN'
prediction_runs = []

for seed in [1, 2, 3, 4, 5]:
    ### Generate Different Train, Valid Splits Given Seed ###
    train, valid = group.get_train_valid_split(benchmark = benchmark['name'], split_type = 'default', seed = seed)
    
    ### Train the Model on Train, Valid Set ###
    train = data_process(X_drug = train.Drug.values, y = train.Y.values, drug_encoding = drug_encoding, split_method='no_split')
    val = data_process(X_drug = valid.Drug.values, y = valid.Y.values, drug_encoding = drug_encoding, split_method='no_split')
    test = data_process(X_drug = benchmark['test'].Drug.values, y = benchmark['test'].Y.values, drug_encoding = drug_encoding, split_method='no_split')

    config = generate_config(drug_encoding = drug_encoding, train_epoch = 10, LR = 0.001, batch_size = 128)
    model = models.model_initialize(**config)
    model.train(train, val, test, verbose = False)
    
    ### Generate Predictions on the Test Set ###
    y_pred = model.predict(test)
    prediction_runs.append({benchmark['name']: y_pred})

generating training, validation splits...
100%|██████████| 728/728 [00:00<00:00, 1355.81it/s]


Drug Property Prediction Mode...
in total: 637 drugs
encoding drug...
unique drugs: 634
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 91 drugs
encoding drug...
unique drugs: 91
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 182 drugs
encoding drug...
unique drugs: 181
do not do train/test split on the data for already splitted data
predicting...


generating training, validation splits...
100%|██████████| 728/728 [00:00<00:00, 1300.04it/s]


Drug Property Prediction Mode...
in total: 637 drugs
encoding drug...
unique drugs: 635
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 91 drugs
encoding drug...
unique drugs: 90
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 182 drugs
encoding drug...
unique drugs: 181
do not do train/test split on the data for already splitted data
predicting...


generating training, validation splits...
100%|██████████| 728/728 [00:00<00:00, 1245.62it/s]


Drug Property Prediction Mode...
in total: 637 drugs
encoding drug...
unique drugs: 634
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 91 drugs
encoding drug...
unique drugs: 91
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 182 drugs
encoding drug...
unique drugs: 181
do not do train/test split on the data for already splitted data
predicting...


generating training, validation splits...
100%|██████████| 728/728 [00:00<00:00, 1257.85it/s]


Drug Property Prediction Mode...
in total: 637 drugs
encoding drug...
unique drugs: 634
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 91 drugs
encoding drug...
unique drugs: 91
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 182 drugs
encoding drug...
unique drugs: 181
do not do train/test split on the data for already splitted data
predicting...


generating training, validation splits...
100%|██████████| 728/728 [00:00<00:00, 1274.03it/s]


Drug Property Prediction Mode...
in total: 637 drugs
encoding drug...
unique drugs: 635
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 91 drugs
encoding drug...
unique drugs: 90
do not do train/test split on the data for already splitted data
Drug Property Prediction Mode...
in total: 182 drugs
encoding drug...
unique drugs: 181
do not do train/test split on the data for already splitted data
predicting...
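
The loop above builds `prediction_runs` as a list with one dictionary per seed, each mapping the benchmark name to that run's test-set predictions. A minimal sketch of that structure, with a sanity check that could be run before evaluation, is shown here. The helpers `mock_predictions` and `check_prediction_runs` are hypothetical, the prediction values are random placeholders rather than model outputs, and the test-set size of 182 is taken from the log above.

```python
import random


def mock_predictions(n_test, seed):
    """Stand-in for model.predict(test): n_test placeholder floats (not real outputs)."""
    rng = random.Random(seed)
    return [rng.uniform(-8.0, -4.0) for _ in range(n_test)]


def check_prediction_runs(runs, name, n_test):
    """Check that every run is {name: predictions} with one prediction per test molecule.

    Returns the number of runs; raises ValueError on the first malformed run.
    """
    for i, run in enumerate(runs):
        if set(run) != {name}:
            raise ValueError(f"run {i}: expected only key {name!r}, got {sorted(run)}")
        if len(run[name]) != n_test:
            raise ValueError(f"run {i}: expected {n_test} predictions, got {len(run[name])}")
    return len(runs)


name = "caco2_wang"   # benchmark key, as it appears in the evaluation output
n_test = 182          # test-set size reported in the log

prediction_runs = [{name: mock_predictions(n_test, seed)} for seed in [1, 2, 3, 4, 5]]
n_runs = check_prediction_runs(prediction_runs, name, n_test)
print(n_runs, "runs,", n_test, "predictions each")
```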


### Step 3: Evaluate the test set predictions with the pre-specified TDC evaluator

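`evaluate_many` takes the list of per-seed predictions and returns, for each benchmark, the mean and standard deviation of the test metric (MAE for Caco2_Wang) across the five runs.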

In [4]:
group.evaluate_many(prediction_runs)

{'caco2_wang': [0.64, 0.028]}
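
To make the two reported numbers concrete, the following sketch computes a mean absolute error per run and then summarizes the runs as `[mean, std]`. It is an illustration under stated assumptions, not TDC's implementation: `mae` and `summarize_runs` are hypothetical helpers, the labels and predictions are made-up toy values, and the use of the population standard deviation and rounding to three decimals are assumptions.

```python
from statistics import mean, pstdev


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize_runs(y_true, runs, ndigits=3):
    """Return [mean, std] of the per-run MAE (population std is an assumption)."""
    scores = [mae(y_true, y_pred) for y_pred in runs]
    return [round(mean(scores), ndigits), round(pstdev(scores), ndigits)]


# Toy labels and two toy runs; values are illustrative only.
y_true = [-5.0, -4.5, -6.0, -5.5]
runs = [
    [-5.5, -4.0, -6.5, -5.0],  # every absolute error is 0.5, so MAE = 0.5
    [-4.0, -5.5, -5.0, -6.5],  # every absolute error is 1.0, so MAE = 1.0
]

summary = summarize_runs(y_true, runs)
print(summary)  # [0.75, 0.25]
```

A small standard deviation relative to the mean, as in the `[0.64, 0.028]` result above, indicates that performance is stable across seeds.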

### Step 4: Copy the above results and submit them to [this form](https://forms.gle/HYupGaV7WDuutbr9A).

That's it! Your results will appear on the [leaderboard website](https://tdcommons.ai/benchmark/admet_group/01caco2/) soon.