KinoML

License: MIT

KinoML is a modular and extensible framework for machine learning (ML) in small molecule drug discovery with a special focus on kinases. It enables users to easily:

  1. Access and download data: from online data sources, such as ChEMBL or PubChem, as well as from their own files, with a focus on data availability and immutability.
  2. Featurize data: so that it is ML-readable. KinoML offers a wide variety of featurization schemes, from ligand-only to ligand:kinase complexes.
  3. Run structure-based experiments: using KinoML's implemented models, with a special focus on reproducibility.

The purpose of KinoML is to help users conduct ML kinase experiments, from data collection to model evaluation. Tutorials on how to use KinoML as well as working examples showcasing how to use KinoML to perform experiments end-to-end can be found here. Note that despite KinoML's focus being on kinases, it can be applied to any protein system. For more detailed instructions, please refer to the Documentation.

A KinoML workflow to achieve points 1, 2 and 3 is illustrated in the following image:

KinoML object model
Fig. 1: KinoML workflow overview. Colors represent objects of the same class.
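
As a rough illustration of the three steps (obtain data, featurize, fit a model), the sketch below uses only the Python standard library. It does not use KinoML's API: every name in it (`Measurement`, `load_measurements`, `featurize`, `fit_line`) is hypothetical, and the numbers are toy values, not real measurements.

```python
from dataclasses import dataclass
from statistics import mean

# NOTE: all names here are hypothetical and for illustration only;
# none of this is KinoML's API, and the values are made up.


@dataclass(frozen=True)  # frozen: a record cannot be modified once created
class Measurement:
    ligand_smiles: str
    kinase_name: str
    p_affinity: float  # toy affinity value on a log scale


def load_measurements():
    """Step 1: obtain data (here a tiny hard-coded toy dataset)."""
    return (
        Measurement("CCO", "KINASE_A", 4.0),
        Measurement("CCCCO", "KINASE_A", 5.0),
        Measurement("CCCCCCO", "KINASE_A", 6.0),
    )


def featurize(measurement):
    """Step 2: ligand-only featurization (crude: count 'C' characters)."""
    return float(measurement.ligand_smiles.count("C"))


def fit_line(xs, ys):
    """Step 3: fit y = slope * x + intercept by ordinary least squares."""
    x_mean, y_mean = mean(xs), mean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_mean - slope * x_mean


data = load_measurements()
features = [featurize(m) for m in data]
targets = [m.p_affinity for m in data]
slope, intercept = fit_line(features, targets)
predictions = [slope * x + intercept for x in features]
print(slope, intercept, predictions)
```

The frozen dataclass mirrors the emphasis on data immutability: once loaded, a record cannot be altered by later steps. The actual KinoML objects and models are described in the tutorials and the Documentation.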

Notice

Please be aware that this code is work in progress and is not guaranteed to provide the expected results. The API can change at any time without warning.

Installation

KinoML and its dependencies can be installed via conda/mamba.

git clone https://github.com/openkinome/kinoml.git  # clone the repo
cd kinoml  # change directory to local copy of repo
mamba env create -n kinoml -f devtools/conda-envs/test_env.yaml
conda activate kinoml  
python -m pip install git+https://github.com/openkinome/kinoml.git 
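
After the steps above, one way to confirm that the package is visible to the active environment is a small standard-library check. This is a minimal sketch; the helper name `is_installed` is hypothetical, and it only tests that the package can be located, not that every optional dependency works.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Run this inside the activated "kinoml" environment.
print("kinoml importable:", is_installed("kinoml"))
```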

Usage

The tutorials folder is divided into two parts:

  1. Getting started: the notebooks in this folder aim to give the user an understanding of how to use KinoML to: (1) access and download data, (2) featurize data, and (3) run a (simple) ML model on the featurized data obtained with KinoML to predict ligand binding affinity. Additionally, this folder contains notebooks that explain the KinoML object model and how to access the different objects, as well as notebooks showcasing all the different featurizers implemented within KinoML and how to use each of them.

  2. Experiments: this folder contains four individual structure-based experiments to predict ligand binding affinity. All experiments use KinoML to obtain the data, featurize it, and train and evaluate an ML model implemented within the kinoml.ml class. The purpose of these experiments is to provide usage examples of KinoML for conducting end-to-end structure-based kinase experiments.

⚠️ You will need a valid OpenEye License for the structural featurizers of the tutorials to work. For the Schrodinger featurizers tutorial you will also need a Schrodinger License!
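
Before running the structural tutorials, it can help to check that a license file is discoverable. The sketch below assumes the OpenEye toolkits locate the license through the `OE_LICENSE` environment variable; this assumption comes from general OpenEye usage, not from this README, so consult the OpenEye documentation for the authoritative lookup rules. The helper name `find_openeye_license` is hypothetical.

```python
import os
from pathlib import Path


def find_openeye_license(env=None):
    """Return the license path named by OE_LICENSE, or None if unset or missing.

    Assumption: the license location is given by the OE_LICENSE variable.
    This checks only that the file exists, not that the license is valid.
    """
    env = os.environ if env is None else env
    value = env.get("OE_LICENSE")
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_file() else None


license_path = find_openeye_license()
if license_path is None:
    print("No license file found via OE_LICENSE")
else:
    print("License file found at", license_path)
```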

Users interested in more KinoML usage examples can check out the other repositories under the OpenKinome initiative. Two that may be of particular interest are:

  • kinodata: repository with ready-to-use kinase-focused datasets from ChEMBL, as well as tutorials explaining how to process kinase data for ML applications.
  • experiments-binding-affinity: more advanced and reproducible ML experiments using KinoML.

Copyright (c) 2019, OpenKinome

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.1.