Prediction of small molecules as potent inhibitors of various cancer targets using Artificial Intelligence (AI)
CancerAI is a Python-based framework for machine learning and deep learning in drug discovery. Its features streamline a range of drug discovery and chemoinformatics tasks. It is built on scikit-learn, NumPy, pandas, Matplotlib, Seaborn, PaDEL and other libraries, and supports both building custom machine learning and deep learning models and using existing ones. It also uses RDKit to calculate ADMET properties.
- Requirements
- Installation
- Getting Started
- Data Collection from ChEMBL
- Load dataset from csv
- Load dataset from sdf
- Exploratory data analysis (EDA)
- Compound Featurization
- Feature Selection
- Unsupervised Exploration
- Data Split
- Build, train and evaluate a model
- Hyperparameter tuning
- Feature Importance (Shap Values)
- Unbalanced Datasets
- Pipelines
- Pipeline Optimization
- About Us
- Citing CancerAI
- Related Publications
- License
To run this project, you need the following Python libraries installed:
- scikit-learn
- numpy
- pandas
- matplotlib
- seaborn
- PaDELpy
- rdkit
Please use the requirements.txt file to install all the necessary libraries.
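To confirm that the libraries listed above are importable before running the project, a small check such as the following may help. This is a minimal sketch, not part of CancerAI: `REQUIRED` and `missing_packages` are hypothetical names, and the mapping from pip package names to import names (for example `scikit-learn` is imported as `sklearn`) is an assumption to adjust if your environment differs.

```python
import importlib.util

# Hypothetical mapping: pip package name -> import name
REQUIRED = {
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "padelpy": "padelpy",
    "rdkit": "rdkit",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Checking with `find_spec` avoids actually importing heavy libraries such as RDKit just to test whether they are present.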
Instructions for installation...
Instructions for installation using pip...
Instructions for manual installation...
Introduction or overview of getting started section...
The [ChEMBL Database](https://www.ebi.ac.uk/chembl/) compiles curated bioactivity data on more than 2 million compounds, drawn from over 88,000 documents and 1.6 million assays. The dataset covers 15,000 targets, 2,000 cells, and 45,000 indications. The data provided is current as of January 2, 2024 (ChEMBL version 33). The `chembl_webresource_client` library must be installed before data can be downloaded from ChEMBL 33.
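The download step can be sketched as follows, assuming the `new_client` interface of `chembl_webresource_client` and that pandas is used to hold the results. `fetch_activities`, `clean_activities` and the `KEEP` column list are hypothetical names chosen for this example, not part of CancerAI, and the columns kept are one reasonable choice among many. The client is imported inside `fetch_activities` so that the cleaning step can run without network access.

```python
import pandas as pd

# Hypothetical selection of fields from each ChEMBL activity record
KEEP = [
    "molecule_chembl_id",
    "canonical_smiles",
    "standard_type",
    "standard_relation",
    "standard_value",
    "standard_units",
]


def fetch_activities(target_chembl_id, standard_type="IC50"):
    """Query ChEMBL for bioactivities measured against one target.

    Requires `pip install chembl_webresource_client` and network access.
    """
    # Imported here so clean_activities() works without the client installed
    from chembl_webresource_client.new_client import new_client

    query = new_client.activity.filter(
        target_chembl_id=target_chembl_id, standard_type=standard_type
    )
    return list(query)  # each record is a dict


def clean_activities(records):
    """Keep selected columns and drop rows without a SMILES or numeric value."""
    df = pd.DataFrame(records).reindex(columns=KEEP)
    # standard_value arrives as text; non-numeric entries become NaN
    df["standard_value"] = pd.to_numeric(df["standard_value"], errors="coerce")
    df = df.dropna(subset=["canonical_smiles", "standard_value"])
    return df.reset_index(drop=True)


# Example (needs network access; CHEMBL203 is EGFR, used only as an example):
# df = clean_activities(fetch_activities("CHEMBL203"))
# df.to_csv("chembl203_ic50.csv", index=False)
```

Separating the network query from the cleaning step keeps the cleaning logic easy to test and reuse on previously saved records.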
Instructions or code for loading dataset from CSV...
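As an illustration only, a CSV dataset could be loaded with pandas along these lines. The helper name `load_csv_dataset` and the default column names `canonical_smiles` and `standard_value` are assumptions for this sketch; adjust them to match the columns in your own file.

```python
import pandas as pd


def load_csv_dataset(path, smiles_col="canonical_smiles", target_col="standard_value"):
    """Load a compound table from CSV, keeping rows usable for modelling.

    Column names are hypothetical defaults; pass your own if they differ.
    """
    df = pd.read_csv(path)
    missing = {smiles_col, target_col} - set(df.columns)
    if missing:
        raise KeyError(f"CSV is missing required column(s): {sorted(missing)}")
    # Drop rows lacking a structure or an activity value
    return df.dropna(subset=[smiles_col, target_col]).reset_index(drop=True)
```

Failing early on missing columns gives a clearer error than a `KeyError` raised later during featurization.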
Instructions or code for loading dataset from SDF...
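One possible way to load an SDF file, assuming RDKit (already listed in the requirements) is used as the reader, is sketched below. `load_sdf_dataset` is a hypothetical helper name, and storing a canonical SMILES column named `smiles` alongside the SD data fields is a design choice made for this example.

```python
import pandas as pd
from rdkit import Chem


def load_sdf_dataset(path):
    """Read an SDF file into a DataFrame with one row per parseable molecule.

    Each row holds the molecule's SD data fields plus a canonical SMILES.
    """
    rows = []
    # SDMolSupplier yields None for records RDKit fails to parse
    for mol in Chem.SDMolSupplier(path):
        if mol is None:
            continue
        row = mol.GetPropsAsDict()  # SD data fields as a dict
        row["smiles"] = Chem.MolToSmiles(mol)
        rows.append(row)
    return pd.DataFrame(rows)
```

Skipping unparseable records keeps the load from failing on a single malformed entry; if you need to know how many were dropped, compare the row count with the number of records in the file.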