Machine learning pipeline for predicting molecular toxicity.

Chemical Toxicity Prediction

Overview

This project explores the application of traditional machine learning and deep learning to predicting molecular toxicity. It generates about 20 different molecular representations and compares their performance across a wide range of models. The code was used for my master's thesis, "Quantitative structure-activity relationship and machine learning".

Features

  • Uses TensorFlow and Keras for building and training deep neural networks.
  • Implements hyperparameter tuning using the Hyperband algorithm.
  • Supports cross-validation, ensemble modeling, various dimensionality reduction techniques, and evaluation of key metrics.
  • Performs extensive cleaning and preprocessing of chemical data.
  • Offers a wide variety of molecular representations and models.
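
As an illustration of the cross-validation mentioned above, the following is a minimal sketch of stratified k-fold splitting in plain Python. It is not taken from this repository: the function name `stratified_kfold`, its parameters, and the toy labels are all assumptions made for the example. Stratification is shown because it keeps the share of each class similar in every fold, which matters when one class is rare.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs that keep class ratios similar per fold.

    Hypothetical helper for illustration; not part of this repository.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share of it.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, val_idx


# Toy labels (made-up data): 1 = toxic, 0 = non-toxic, 20% positives.
toy_labels = [1] * 10 + [0] * 40
splits = list(stratified_kfold(toy_labels, n_splits=5))
for train_idx, val_idx in splits:
    positives = sum(toy_labels[i] for i in val_idx)
    print(f"val size={len(val_idx)}, positives={positives}")
```

With these toy labels, each validation fold holds 10 samples, 2 of them positive, so every fold mirrors the overall class ratio.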

Installation

  1. Clone the repository, install all dependencies into a virtual environment, activate it, and select the pure-Python Protocol Buffers implementation:

    git clone https://github.com/nierja/tox.git
    cd tox
    ./lib/initialize_venv.sh
    source ./lib/TOX_GPU_VENV/bin/activate
    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

Usage

  1. Generate desired descriptors for the training, validation, and test datasets by running:

    python3 src/descriptor_generation/generate.py

    To generate all descriptors for all targets, uncomment their names in generate.py.

  2. Tune the hyperparameters and train the machine learning and deep learning models:

    python3 src/DL/Tox21_tuner.py
    python3 src/ML/ML.py

  3. Model parameters can be set as command-line arguments:

    python3 src/DL/Tox21_tuner.py --target=NR-AR --NN_type=DNN --n_layers=4 --fp=maccs
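
For readers unfamiliar with this style of invocation, the sketch below shows how flags like the ones above could be parsed with Python's standard `argparse` module. Only the four flag names come from the command shown; the defaults, types, and help texts are assumptions, and the actual argument handling in Tox21_tuner.py may differ.

```python
import argparse


def build_parser():
    """Build a parser for the four flags shown above (hypothetical defaults)."""
    parser = argparse.ArgumentParser(description="Illustrative tuner CLI sketch")
    # Flag names follow the README command; defaults and types are assumed.
    parser.add_argument("--target", default="NR-AR", type=str,
                        help="Toxicity target to predict")
    parser.add_argument("--NN_type", default="DNN", type=str,
                        help="Type of neural network")
    parser.add_argument("--n_layers", default=4, type=int,
                        help="Number of layers")
    parser.add_argument("--fp", default="maccs", type=str,
                        help="Molecular representation to use")
    return parser


# Parse the same arguments as in the command above, passed as a list
# so the example runs without a real command line.
args = build_parser().parse_args(
    ["--target=NR-AR", "--NN_type=DNN", "--n_layers=4", "--fp=maccs"]
)
print(args.target, args.NN_type, args.n_layers, args.fp)
```

Declaring `--n_layers` with `type=int` means the value arrives as an integer rather than a string, so it can be used directly when building a model.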

Results

This repository contains code for building a pipeline for toxicity prediction. Various 2D and 3D fingerprints are generated using RDKit and Mordred, and their suitability as molecular representations can be compared on a large variety of machine-learning and deep-learning models.

Both the deep learning and the traditional machine learning models are applied to the Tox21 and Ames Mutagenicity datasets. Their performance is evaluated against recently published toxicity prediction models using the AUC-ROC metric and, for certain toxicity targets, improves on those models. For further information, please see the master's thesis "Quantitative structure-activity relationship and machine learning".
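
To make the AUC-ROC metric concrete, here is a small self-contained sketch that computes it from binary labels and predicted scores using the rank-based (Mann-Whitney) formulation. It is an illustration only, not the evaluation code used in this repository; the function name `roc_auc` and the toy inputs are assumptions, and labels are assumed to be encoded as 0 and 1.

```python
def roc_auc(y_true, y_score):
    """AUC-ROC: probability that a random positive outranks a random negative.

    Hypothetical helper for illustration; expects labels encoded as 0/1.
    """
    pairs = sorted(zip(y_score, y_true))
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative")

    # Give tied scores their average 1-based rank, so a tie counts as half a win.
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # Mann-Whitney U statistic, normalised by the number of positive-negative pairs.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy example (made-up scores): three of the four positive-negative pairs
# are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of toxic from non-toxic molecules.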
