Model interpretability roadmap #35

miquelduranfrigola · 2024-02-22T12:41:08Z

Adding an interpretability module to ZairaChem

Background

This project is related to @HellenNamulinda's MSc thesis at Makerere University. The thesis is co-supervised by Dr. Joyce Nakatumba-Nabende. At the moment, ZairaChem does not have any explainable AI (XAI) capabilities. The goal of this project is to develop an automated tool for model interpretability that can be incorporated into ZairaChem. While there are many approaches for chemistry, here we will focus on the following:

A limited set of molecular descriptors: We need to focus on descriptors that a medicinal chemist would understand. Therefore, we will focus on a set of commonly used descriptors. This post can offer some guidance.
XGBoost/CatBoost regression: These are tree-based methods that work well in many scenarios, especially for regression. Automatic hyperparameter tuning can be achieved with Optuna.
Shapley analysis: Shapley values work naturally well with tree-based methods. We will focus on this approach to interpretability.
We will validate the tool in the context of a dataset donated by Medicines for Malaria Venture (MMV) to Ersilia.

Objectives

To develop a standalone Python tool for chemistry specifically oriented to Shapley value analysis of molecular descriptors. The tool is called xai4chem.
To apply the tool to the MMV dataset.
To incorporate the tool into ZairaChem.

Steps

FAQ

Where do we create issues?

Most issues related to this work should be created in the xai4chem repository. When we reach a point of integration to ZairaChem, we can create issue there correspondingly.

Is there a more comprehensive description of the project available?

Yes. This is part of @HellenNamulinda 's MSc project and she is writing a thesis accordingly. A project proposal document is already available.

HellenNamulinda · 2024-03-19T10:03:37Z

Progress Updates

Created a python package structure for xai4chem, Use a Python package boilerplate xai4chem#1
Created a class to calculate molecular descriptors, Calculate molecular descriptors using datamol xai4chem#5
For the start, we are using the default 22 features of datamol descriptors.
Others will be experimented and tracked in the issue; Create a class to calculate molecular descriptors using fit and transform logic xai4chem#3
Implemented a regressor class with the option of XGBoost or CatBoost. This is for training, evaluating, explaining the model.
Hyperparameter optimization is done using optuna during training.
To evaluate the model, the evaluate method is called and metrics such as Mean Squared Error, Root Mean Squared Error, Mean Absolute Error, R-squared Score, and Explained Variance Score are saved in a json file.
To interprete the model, the explain method method is used. Shapely values are calculated and currently three plots are saved; a summary_plot( barplot showing average of shapely values), a beeswarm plot showing shap values of all the inputs, and a waterfall plot as summary for one input(This shows the shapley values for each feature).
We will explore mapping shapley values unto the molecular graphs, and this will be tracked in the isuue; Applying Shapley Value Analysis: Visualization Plots and Number of Features to Consider xai4chem#7

Next

Outline steps to run the tool as python package

miquelduranfrigola · 2024-03-20T18:35:04Z

Thanks @HellenNamulinda - this is useful.

We did not discuss much about optuna. Does it work as expected for you? Do you get better results than using XGBoost with default parameters?

HellenNamulinda · 2024-03-22T23:52:35Z

Hello @miquelduranfrigola,
I apologize for the delay in providing this update.

I've compared XGBoost's performance with default parameters against those optimized by Optuna. Surprisingly, the default parameters seem to yield better results, with an R2 of 0.50 compared to around 0.3 achieved with Optuna. I will add these findings in the notebook that I will uipload later today.

It's worth noting that the parameters optimized by Optuna can vary in each study, introducing some uncertainty in the results. I may need to adjust the search space to align more closely with the default parameters.

Additionally, I've observed that training a CatBoost model is consistently slower, which prolongs the optimization process with Optuna.

However, I believe Optuna is still valuable. It's essential to carefully define the search parameters to achieve optimal results.

miquelduranfrigola · 2024-03-23T05:44:18Z

Thanks @HellenNamulinda , this is useful. I agree we need to use optuna. We'll have to play a bit with the search space, then, and perhaps increase the number of iterations.

HellenNamulinda · 2024-03-25T10:08:32Z

Progress Updates

The steps for using xai4chem as a python package are outlined in the README.md, Update ReadME.md with installation and usage instructions, and code improvements xai4chem#8
After adjusting the search space, the performance improved to R2 > 0.50. This can be seen in the notebook. Finding better parameters requires time, and hence increasing the number of trials(n_trials) can help to get better params.

Next
From the meeting,

Rename the train method to fit. So that we have fit, predict, evaluate and explain.
It would be better to reduce the args for the regressor methods. Forexample, the output folder should be defined when instantiating the regressor class.
Explore Zero Shot AutoML for mining good hyperparameters.
Include feature selection. This can start by increasing the number of features(datamol descriptors), Create a class to calculate molecular descriptors using fit and transform logic xai4chem#3 (comment), and then 2D mordred descriptors.

miquelduranfrigola · 2024-03-25T14:44:45Z

Thanks @HellenNamulinda , all next steps sound good to me.

For the zero-shot, feel free to explore this: https://microsoft.github.io/FLAML/docs/Use-Cases/Zero-Shot-AutoML/
Good to know that, with more time, you could get to a competitive solution with optuna.
For feature selection, feel free to explore this: https://github.com/AutoViML/featurewiz
2D mordred is less prioritary if a wider datamol descriptor is available.

HellenNamulinda · 2024-04-17T06:40:40Z

From the meetings,

We will continue testing with the 3 types of descriptors that were added. That's; datamol, RDKit and Mordred.. Fingerprints will be explored later.
While featurewiz was added for feature selection, the final number of features can't be determined by the user. So, the logic will be to specify the k-features. If featurewiz is applied and feafures selected are less than the desired number of features, (say 100), we don't use featurewiz. If the number of features returned by featurewiz are morethan k(100), we then select the top k.

@miquelduranfrigola,
From the experiments using the MMV dataset. with feature selection, mordred gives an r2_score of 0.39 and mae of 11.73, compared to rdkit r2_score of 0.35 and mae of 12.24.
Also in the TDC benchmarks, the performance of the pipeline is promising. With the performance on some datasets placing in top 3 in the TDC leaderboards.
More information in the sldes.

We agreed to experiment and compare performance, rdkit descriptors without feature selection(using all the descriptors).

Note: All the experiments were done done using catboost with default parameters.

Trying zero-shot, XGBoost's performance on the pf_3d7_ic50 data improved from r2_score: 0.64 to r2_score: 0.71.
Am going to finalize testing this and we use zero-shot in place for optuna.

HellenNamulinda · 2024-05-14T06:33:40Z

From last week,
We agreed to use morgan fingerprints. And this was implemented(ersilia-os/xai4chem@d2ff580).

Also, use Zero-shot AutoML(ersilia-os/xai4chem@f5d5ad8). but FLAML zero-shot only supports XGBoost and not Catboost.

HellenNamulinda · 2024-05-28T06:45:04Z

To be able to interprete other trained models besides the regression models developed using xai4chem, it was best to have the explain_model as a separate module(independent of the regressor).

With the explain_module, interpretability plots can be generated even for trained classification models.

HellenNamulinda · 2024-06-04T08:13:47Z

Hello @miquelduranfrigola,
From the meeting, we have reviewed everything that has been implemented in xai4chem.

In our pipeline, we choose features to be any of the three descriptors(Small(datamol), Mid-size (RDKit), and Large (Mordred)) or the count-based morgan fingerprints.
Also, Feature selection automatically selects the relevant k features during training, if the value of k is given.

For interpratability, we are currently saving three interpretability plots; barplot, beeswarm plot and a waterfall plot for the first data sample(this can be generated for other samples).

All the other usuage details are documented in the README.

Some pending concerns
Save test results as csv file(not joblib)
Save shapley values as csv file(not only interpretability plots)
Add interpretation onto chemical structures for morgan fingerprints?

Benchmark
With xai4chem pipeline on PPBR_AZ dataset(Plasma Protein Binding Rate(%)): Using RDKit descriptors, the MAE of 7.618 and r2=0.3401 places 2nd in the leaderboard.

The MMV Data:
We started with a small set(LDH assay: 4816 samples). The performance is ranging between 0.36 and 0.40(r2_score) and 12.10 and 11.53 (MEA).

And this brings us to combining descriptors and fingerprints?
What would you advise on Feature Maps as input features -2D(descriptors and fingerprints). I haven't yet implemented it, but it is something I have started looking at next.

miquelduranfrigola · 2024-06-04T15:39:29Z

Thanks @HellenNamulinda — very informative.

Let's first close the pending concerns and then we will look into blending or not descriptors and fingerprints.

HellenNamulinda · 2024-06-18T07:29:34Z

@miquelduranfrigola,
For the test results, we are saving a csv file containing the smiles strings and the model output values.
Where as for the interpretability results csv file, we are saving the descriptors/fingerprints and the shapely values.

This week, I'm working on mapping interpretation(shapely values) unto chemical structures for fingerprint features.

HellenNamulinda · 2024-07-25T06:20:26Z

Hello @miquelduranfrigola,

Over the past weeks we have been modelling the MMV data(both the small set(LDH assay), and large set(Luminescence assay), first as a regression problem and then as a classification problem.

Regression(see slides)

Small Set

Our initial results showed that Morgan fingerprints with reduced number of features(100) did not significantly compromise the model's performance with a better R2 score and mean absolute error (MAE). This indicated that a smaller, more interpretable set of features could be used without losing much predictive power.

Large set

Despite using the same descriptors and feature selections, the R2 scores were generally low, indicating that our models could not adequately explain the variability in the data.

From this, we had to try classification instead of regression, and compare classification performance metrics.

Classification(slides with performance)

We experimented with a cutoff of 30%(and 40%). In both, the dataset was extremely imbalanced.
For the small set, the dataset had a class ratio of 1:10.
For the large set, we performed random undersampling to handle class imbalance, maintaining ratios of 1:5 and 1:10.

The default prediction threshold (0.5) generally provided high precision but low recall, indicating that while the models were good at identifying active compounds, they missed many true positives.
Calibrating the prediction threshold using the Youden Index and maximum FPR(5%(0.05) and 10%(0.1)), significantly improved recall and F1 scores.

Regarding adapting regression to zairachem, you had mentioned that the second option might be used if the classification performance stands out.

miquelduranfrigola · 2024-07-25T09:38:12Z

Hi @HellenNamulinda this is a great summary.

Before including it to ZairaChem, let's focus on packaging xai4chem nicely, including updating the README file if necessary.
I think it is great that both regression and classification are implemented. While regression might not have worked in the chosen example, it is clear that regression will be useful in many settings, including surrogate modeling in ZairaChem.

To me, the most important step now is to get a nice report at the end of the run. The more I think about it, the more I realize that we may want to run interpretability from multiple scopes, for example, physicochemistry, fingerprints, etc, in independent runs. Can you please list here which are the descriptors that are fully implemented already? Thanks!

HellenNamulinda · 2024-07-30T03:30:05Z

Hello @miquelduranfrigola,
Running from the different scopes works. Unlike the fingerprints that we are mapping back into molecules, the pysiocochemical properties(as features are all catered for).

As for fingerprints, 2 options are currently supported(rdkit or morgan).

As before, xai4chem has 3 descriptor types, that's datamol(small), rdkit(medium size) and Mordred(large size). Also, 2 fingerprints types are used (morgan and rdkit). There is a slight difference in how these two fingerprint features are mapped back to molecules, and it has already been implemented.

As for the nice report, would a canvas be a good way? but there are several interpretability plots generated.

bar plot and beeswarm plot
scatter plots for 5 top features
waterfall plots for 5 samples
top bits are drawn for 5 samples (if fingerprints were used)
shap_highlights for each of 5 samples (if fingerprints were used)

miquelduranfrigola · 2024-07-30T06:05:19Z

Hi @HellenNamulinda this sounds good to me. Thanks for the update.
I think a canvas would be a good way for reporting. Your suggested plots look good. I would add, at least:

AUROC or Correlation plot for classification and regression, respectively.

miquelduranfrigola assigned HellenNamulinda Feb 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model interpretability roadmap #35

Model interpretability roadmap #35

miquelduranfrigola commented Feb 22, 2024 •

edited

Loading

HellenNamulinda commented Mar 19, 2024 •

edited

Loading

miquelduranfrigola commented Mar 20, 2024

HellenNamulinda commented Mar 22, 2024 •

edited

Loading

miquelduranfrigola commented Mar 23, 2024

HellenNamulinda commented Mar 25, 2024

miquelduranfrigola commented Mar 25, 2024

HellenNamulinda commented Apr 17, 2024

HellenNamulinda commented May 14, 2024

HellenNamulinda commented May 28, 2024

HellenNamulinda commented Jun 4, 2024

miquelduranfrigola commented Jun 4, 2024

HellenNamulinda commented Jun 18, 2024 •

edited

Loading

HellenNamulinda commented Jul 25, 2024

miquelduranfrigola commented Jul 25, 2024

HellenNamulinda commented Jul 30, 2024

miquelduranfrigola commented Jul 30, 2024

Model interpretability roadmap #35

Model interpretability roadmap #35

Comments

miquelduranfrigola commented Feb 22, 2024 • edited Loading

Adding an interpretability module to ZairaChem

Background

Objectives

Steps

FAQ

Where do we create issues?

Is there a more comprehensive description of the project available?

HellenNamulinda commented Mar 19, 2024 • edited Loading

miquelduranfrigola commented Mar 20, 2024

HellenNamulinda commented Mar 22, 2024 • edited Loading

miquelduranfrigola commented Mar 23, 2024

HellenNamulinda commented Mar 25, 2024

miquelduranfrigola commented Mar 25, 2024

HellenNamulinda commented Apr 17, 2024

HellenNamulinda commented May 14, 2024

HellenNamulinda commented May 28, 2024

HellenNamulinda commented Jun 4, 2024

miquelduranfrigola commented Jun 4, 2024

HellenNamulinda commented Jun 18, 2024 • edited Loading

HellenNamulinda commented Jul 25, 2024

miquelduranfrigola commented Jul 25, 2024

HellenNamulinda commented Jul 30, 2024

miquelduranfrigola commented Jul 30, 2024

miquelduranfrigola commented Feb 22, 2024 •

edited

Loading

HellenNamulinda commented Mar 19, 2024 •

edited

Loading

HellenNamulinda commented Mar 22, 2024 •

edited

Loading

HellenNamulinda commented Jun 18, 2024 •

edited

Loading