MMSyn is a multimodal deep learning framework that predicts synergistic drug combinations by integrating cell line and drug features.
cell lines
- ONeil_31_gsva_1329_dim.csv - pathway scores of cell lines
- gene expression data, downloaded from CCLE (the Cancer Cell Line Encyclopedia)
- DNA copy number data, downloaded from CCLE
drug
- 38_drug_smiles.csv - SMILES (Simplified Molecular Input Line Entry System) strings of the drugs
dataset
- drug_pair_cell_line_triple.csv - the effects of 538 pairwise drug combinations in 30 cell lines
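As a quick sanity check on the label file, the sketch below counts distinct drugs, unordered drug pairs, and cell lines in a triple file. The column names (`drug_a`, `drug_b`, `cell_line`, `synergy`) and the sample values are hypothetical, since the actual header of drug_pair_cell_line_triple.csv is not described here; adjust them to match the real file.

```python
import csv
import io


def summarize_triples(csv_file, drug_cols=("drug_a", "drug_b"), cell_col="cell_line"):
    """Count distinct drugs, unordered drug pairs, and cell lines in a triple file.

    The default column names are assumptions, not the repository's real header.
    """
    drugs, pairs, cells = set(), set(), set()
    for row in csv.DictReader(csv_file):
        a, b = row[drug_cols[0]], row[drug_cols[1]]
        drugs.update((a, b))
        pairs.add(frozenset((a, b)))  # treat (A, B) and (B, A) as the same pair
        cells.add(row[cell_col])
    return {"drugs": len(drugs), "drug_pairs": len(pairs), "cell_lines": len(cells)}


# Tiny in-memory sample with made-up values, standing in for the real file.
sample = io.StringIO(
    "drug_a,drug_b,cell_line,synergy\n"
    "D1,D2,C1,3.5\n"
    "D2,D1,C2,-1.0\n"
    "D1,D3,C1,0.2\n"
)
summary = summarize_triples(sample)
```

For the real data, pass an open file handle instead of the `io.StringIO` sample, for example `summarize_triples(open("drug_pair_cell_line_triple.csv", newline=""))`.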
The code has been tested under Python 3.7. The required packages are as follows:
- pytorch == 2.0.0+cu118
- numpy == 1.26.0
- sklearn == 1.0.2
- networkx == 2.8.4
- pandas == 1.2.4
- rdkit == 2023.3.1
- torch_geometric == 2.3.0
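To confirm that an environment matches these pins, a small check along the following lines may help. It is a sketch, not part of the repository: the distribution names (for example `torch` for pytorch and `scikit-learn` for sklearn) are assumptions about how the packages were installed, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata

# Pins copied from the list above. The keys are assumed distribution names,
# which can differ from the import names (torch vs. pytorch, scikit-learn vs. sklearn).
REQUIRED = {
    "torch": "2.0.0",
    "numpy": "1.26.0",
    "scikit-learn": "1.0.2",
    "networkx": "2.8.4",
    "pandas": "1.2.4",
    "rdkit": "2023.3.1",
    "torch_geometric": "2.3.0",
}


def check_versions(required):
    """Return {package: (expected, found)} for every pin that does not match.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, expected in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local version suffixes, so "2.0.0+cu118" matches "2.0.0".
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems
```

Calling `check_versions(REQUIRED)` returns an empty dictionary when every pin is satisfied; otherwise each entry shows the expected and installed version.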
- cell_autoencoder.py: learn low-dimensional representations from high-dimensional cell line features
- dataset.py: the dataset objects generated by PyG
- metric.py: evaluation metric functions
- model.py: details of MMSyn model
- preprocess.py: load data and convert it to PyTorch format
- pubchemfp.py: generate PubChem fingerprints for the drugs
- simles2graph.py: convert SMILES strings to molecular graphs
- train.py: train the model and make predictions
- trainer.py: training and evaluation functions
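The file list does not say which metrics metric.py implements. As an illustration only, and assuming synergy is predicted as a continuous score, evaluation functions of this kind could look like the sketch below. The function names are hypothetical and do not come from the repository.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted scores."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)
```

`pearson` divides by zero if either input is constant, so a real implementation would need to guard against that case.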
- Install the dependencies, including torch 2.0, torch_geometric, sklearn, rdkit, and networkx.
- Run cell_autoencoder.py to reduce the dimensionality of the DNA copy number data and gene expression data.
- Run preprocess.py to convert the label data and feature data into PyTorch format.
- Run train.py for training and prediction.
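The three run steps above can be chained from a single script. The sketch below assumes that each script runs without command-line arguments and is launched from the repository root; neither assumption is confirmed here, and the helper names are hypothetical.

```python
import subprocess
import sys

# Script names and their order are taken from the steps above.
PIPELINE = ["cell_autoencoder.py", "preprocess.py", "train.py"]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Return one argv list per step, in execution order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in turn; raise CalledProcessError on the first failure."""
    for argv in commands:
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_pipeline(build_commands())` then executes the steps in order. Because `check=True` stops at the first failing script, preprocessing and training never start from incomplete autoencoder output.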