# Ligand ADMET and Potency (Property Prediction)

The [ADMET](https://polarishub.io/competitions/asap-discovery/antiviral-admet-2025) and [Potency](https://polarishub.io/competitions/asap-discovery/antiviral-potency-2025) challenges of the [ASAP Discovery competition](https://polarishub.io/blog/antiviral-competition) are property prediction tasks. Given the SMILES (or, more precisely, the CXSMILES) of a molecule, you are asked to predict its numerical properties. This is a relatively straightforward application of ML, and this notebook will quickly get you up and running!

To begin with, choose one of the two challenges! The code will look the same for both. 

In [1]:
CHALLENGE = "antiviral-potency-2025"  # or: "antiviral-admet-2025"

## Load the competition

Let's first load the competition from Polaris.

Make sure you are logged in! If not, simply run `polaris login` and follow the instructions. 

In [2]:
import polaris as po


competition = po.load_competition(f"asap-discovery/{CHALLENGE}")



As suggested in the logs, we'll cache the dataset. This is not strictly necessary, but it speeds up later steps.

In [3]:
competition.cache()

'/Users/joshuarose/Library/Caches/polaris/datasets/1d9f43c7-7449-48ec-bfd7-50c7ec5ce863'

Let's get the train and test set and take a look at the data structure.

In [4]:
train, test = competition.get_train_test_split()

In [10]:
print(type(train))
train[0]

<class 'polaris.dataset._subset.Subset'>


('COC[C@]1(C)C(=O)N(C2=CN=CC3=CC=CC=C23)C(=O)N1C |&1:3|',
 {'pIC50 (SARS-CoV-2 Mpro)': nan, 'pIC50 (MERS-CoV Mpro)': 4.19})
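Each training item is a pair: a CXSMILES string and a dictionary of target values, where `nan` marks a target that was not measured for that molecule. The trailing `|&1:3|` is the CXSMILES extension block, which here carries stereochemistry group information. The sketch below uses only the standard library and copies the item shown above; `split_cxsmiles` and `measured_targets` are hypothetical helper names, not part of Polaris.

```python
import math

# An item shaped like the `train[0]` output above: (CXSMILES, {target: value}).
item = (
    "COC[C@]1(C)C(=O)N(C2=CN=CC3=CC=CC=C23)C(=O)N1C |&1:3|",
    {"pIC50 (SARS-CoV-2 Mpro)": float("nan"), "pIC50 (MERS-CoV Mpro)": 4.19},
)


def split_cxsmiles(cxsmiles):
    """Split a CXSMILES string into its plain SMILES part and the extension block."""
    core, _, extension = cxsmiles.partition(" ")
    return core, extension.strip()


def measured_targets(targets):
    """Keep only the targets that have a measured (non-NaN) value."""
    return {name: value for name, value in targets.items() if not math.isnan(value)}


cxsmiles, targets = item
core_smiles, extension = split_cxsmiles(cxsmiles)
labels = measured_targets(targets)

print(core_smiles)
print(extension)
print(labels)
```

In practice you would usually pass the full CXSMILES to your cheminformatics toolkit rather than discard the extension, since it can change how stereocenters are interpreted.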

In [11]:
test[0]

'C=CC(=O)NC1=CC=CC(N(CC2=CC=CC(Cl)=C2)C(=O)CC2=CN=CC3=CC=CC=C23)=C1'

## Build a model
Next, we'll train a simple baseline model using scikit-learn. 

You'll notice that the challenge has multiple targets.

In [16]:
train.target_cols

['pIC50 (MERS-CoV Mpro)', 'pIC50 (SARS-CoV-2 Mpro)']

An interesting idea would be to build a multi-task model to leverage shared information across tasks.

For the sake of simplicity, however, we'll simply build a model per target here. 
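If you do want to explore the multi-task route later, one ingredient you will likely need is a loss that skips missing labels, because not every molecule has a measurement for every target. The following is a minimal NumPy sketch of such a masked mean squared error. `masked_mse` is a hypothetical helper and the arrays are made-up numbers; it is not the competition's evaluation metric.

```python
import numpy as np


def masked_mse(y_true, y_pred):
    """Mean squared error over the entries of y_true that are not NaN."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)
    # Zero out the error wherever the label is missing.
    diff = np.where(mask, y_pred - y_true, 0.0)
    # Guard against division by zero when no label is present at all.
    return float((diff ** 2).sum() / max(int(mask.sum()), 1))


# Rows are molecules, columns are targets; NaN means "not measured".
y_true = np.array([[np.nan, 4.0], [5.0, 4.5]])
y_pred = np.array([[5.0, 4.5], [5.5, 4.5]])

loss = masked_mse(y_true, y_pred)
print(loss)
```

Only the three measured entries contribute to the average, so a prediction for an unmeasured target neither helps nor hurts the loss.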

In [12]:
train.X
train.y

{'pIC50 (SARS-CoV-2 Mpro)': array([ nan, 5.29,  nan, ...,  nan, 5.06,  nan]),
 'pIC50 (MERS-CoV Mpro)': array([4.19, 4.92, 4.73, ..., 4.22, 4.4 , 4.22])}
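The `nan` entries above show that label coverage differs per target. Before training, it can help to count how many molecules actually have a measurement for each one. The sketch below assumes a dictionary of NumPy arrays shaped like `train.y`; the values are a small made-up stand-in, not the real data.

```python
import numpy as np

# Hypothetical stand-in for `train.y`: one array per target, NaN = not measured.
y = {
    "pIC50 (SARS-CoV-2 Mpro)": np.array([np.nan, 5.29, np.nan, 5.06, np.nan]),
    "pIC50 (MERS-CoV Mpro)": np.array([4.19, 4.92, 4.73, 4.22, 4.40]),
}

coverage = {}
for name, values in y.items():
    mask = ~np.isnan(values)  # True where a label exists
    coverage[name] = (int(mask.sum()), len(values))
    print(f"{name}: {coverage[name][0]} of {coverage[name][1]} molecules labelled")
```

The same boolean mask is what the per-target training loop uses to select rows with a label.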

In [16]:
import datamol as dm
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Prepare the input data. We'll use Datamol to compute ECFP fingerprints for both the train and test sets.
X_train = np.array([dm.to_fp(dm.to_mol(smi)) for smi in train.X])
X_test = np.array([dm.to_fp(dm.to_mol(smi)) for smi in test.X])

y_pred = {}

# For each of the targets...
for tgt in competition.target_cols:
    # We get the training targets
    # Note that we need to mask out NaNs since the multi-task matrix is sparse.
    y_true = train.y[tgt]
    mask = ~np.isnan(y_true)

    # We'll train a simple baseline model
    model = GradientBoostingRegressor()
    model.fit(X_train[mask], y_true[mask])

    # And then use that to predict the targets for the test set
    y_pred[tgt] = model.predict(X_test)
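The loop above trains on all labelled rows and predicts the test set directly, so it gives no feedback on model quality. One option is to hold out part of the training data and score each per-target model locally. The sketch below does this on synthetic data: the feature matrix, the target names `target_a` and `target_b`, and the choice of mean absolute error are all assumptions for illustration, not the competition's data or metric.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: binary fingerprint-like features and two noisy targets.
X = rng.integers(0, 2, size=(200, 64)).astype(float)
y = {
    "target_a": 4.0 + 0.3 * X[:, :5].sum(axis=1) + rng.normal(0, 0.1, 200),
    "target_b": 5.0 + 0.2 * X[:, 5:10].sum(axis=1) + rng.normal(0, 0.1, 200),
}
# Blank out roughly half of target_a to mimic the sparse multi-task matrix.
y["target_a"][rng.random(200) < 0.5] = np.nan

val_mae = {}
for tgt, y_true in y.items():
    # Keep only the rows that have a label for this target.
    mask = ~np.isnan(y_true)
    X_fit, X_val, y_fit, y_val = train_test_split(
        X[mask], y_true[mask], test_size=0.2, random_state=0
    )
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X_fit, y_fit)
    val_mae[tgt] = mean_absolute_error(y_val, model.predict(X_val))

print(val_mae)
```

A random split like this tends to be optimistic for molecular data, because close analogues can land on both sides of the split. A scaffold-based or time-based split is a common alternative if you want a stricter estimate.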

## Submit your predictions
Submitting your predictions to the competition is simple.
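Before uploading, it may be worth checking that the predictions dictionary has one entry per target and one finite value per test molecule. The sketch below shows one way to do that; `check_predictions` is a hypothetical helper, not part of the Polaris API, and the example values are made up.

```python
import numpy as np


def check_predictions(y_pred, target_cols, n_test):
    """Hypothetical helper: return a list of problems found in a predictions dict."""
    problems = []
    for tgt in target_cols:
        if tgt not in y_pred:
            problems.append(f"missing target: {tgt}")
            continue
        values = np.asarray(y_pred[tgt], dtype=float)
        if values.shape != (n_test,):
            problems.append(f"{tgt}: expected shape ({n_test},), got {values.shape}")
        elif not np.all(np.isfinite(values)):
            problems.append(f"{tgt}: contains NaN or infinite values")
    return problems


# Toy example with made-up numbers.
targets = ["pIC50 (MERS-CoV Mpro)", "pIC50 (SARS-CoV-2 Mpro)"]
good = {t: np.full(3, 5.0) for t in targets}
bad = {targets[0]: np.array([5.0, np.nan, 4.2])}

good_problems = check_predictions(good, targets, n_test=3)
bad_problems = check_predictions(bad, targets, n_test=3)
print(good_problems)
print(bad_problems)
```

With the objects from this notebook, the call would look like `check_predictions(y_pred, competition.target_cols, len(X_test))`.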

In [11]:
competition.submit_predictions(
    predictions=y_pred,
    prediction_name="my-first-predictions",
    prediction_owner="cwognum",  # Replace with your own Polaris username.
    report_url="https://www.example.com",
    # The below metadata is optional, but recommended.
    github_url="https://github.com/polaris-hub/polaris",
    description="Just testing the Polaris API here!",
    tags=["tutorial"],
    user_attributes={"Framework": "Scikit-learn", "Method": "Gradient Boosting"}
)

✅ SUCCESS: Your competition predictions have been successfully uploaded to the Hub for evaluation.
 




For the ASAP competition, we will only evaluate your latest submission. 

The results will only be disclosed after the competition ends.

The End.