diff --git a/tutorials/multiomics-cancer-classification/extend-reading/extension-tasks.md b/tutorials/multiomics-cancer-classification/extend-reading/extension-tasks.md
index e062875..50b38e9 100644
--- a/tutorials/multiomics-cancer-classification/extend-reading/extension-tasks.md
+++ b/tutorials/multiomics-cancer-classification/extend-reading/extension-tasks.md
@@ -3,7 +3,14 @@
 Users can try to set the `cfg.DATASET.NUM_MODALITIES=1` to try only using single mRNA experission for prediction and compare its results with the ones using all three modalities.
 
 ## Task 2 - Try another dataset
+To demonstrate the generalizability of the model, we also provide a dataset for Alzheimer's Disease, [**ROSMAP**](https://www.synapse.org/Synapse:syn3219045) [1,2]
+Besides, we also provide a configuration file for **ROSMAP**, named [`configs/ROSMAP.yaml`](https://github.com/pykale/embc-mmai25/blob/main/tutorials/multiomics-cancer-classification/configs/ROSMAP.yaml).
+
 To try ROSMAP dataset, replace `"experiments/BRCA.yaml"` with `"experiments/ROSMAP.yaml"` in the following line under Configuration section and run the pipeline again.
 ```python
 cfg.merge_from_file("experiments/BRCA.yaml")
 ```
+## Reference
+[1] Bennett, D. A., Buchman, A. S., Boyle, P. A., Barnes, L. L., Wilson, R. S., & Schneider, J. A. (2018). Religious orders study and rush memory and aging project. Journal of Alzheimer's disease, 64(s1), S161-S189.
+
+[2] De Jager, P.L.; Ma, Y.; McCabe, C.; Xu, J.; Vardarajan, B.N.; Felsky, D.; Klein, H.U.; White, C.C.; Peters, M.A.; Lodgson, B.; et al. (2018). A multi-omic atlas of the human frontal cortex for aging and Alzheimer's disease research.
Scientific Data 5, 1-13 diff --git a/tutorials/multiomics-cancer-classification/images/mogonet-pykale-api.png b/tutorials/multiomics-cancer-classification/images/mogonet-pykale-api.png index 7e2e257..411bfed 100644 Binary files a/tutorials/multiomics-cancer-classification/images/mogonet-pykale-api.png and b/tutorials/multiomics-cancer-classification/images/mogonet-pykale-api.png differ diff --git a/tutorials/multiomics-cancer-classification/tutorial-cancer.ipynb b/tutorials/multiomics-cancer-classification/tutorial-cancer.ipynb index a695829..bf16e38 100644 --- a/tutorials/multiomics-cancer-classification/tutorial-cancer.ipynb +++ b/tutorials/multiomics-cancer-classification/tutorial-cancer.ipynb @@ -17,13 +17,13 @@ "source": [ "In this tutorial, we will use a [**M**ulti-**O**mics **G**raph c**O**nvolutional **NET**works (MOGONET) by **Wang et al. (Nature Communication, 2021)**](https://www.nature.com/articles/s41467-021-23774-w) [1] pipeline implemented in `PyKale` [2] to integrate **patient multiomics data** for **cancer classification**.\n", "\n", - "We will work with multiomics data from two datasets: [**BRCA** of TCGA](https://www.cancerimagingarchive.net/collection/tcga-brca/) [3] and [**ROSMAP**](https://www.synapse.org/Synapse:syn3219045) [4,5]. The BRCA dataset has five subtypes, while the ROSMAP dataset has only two. Three omics modalities will be used: mRNA expression, DNA methylation, and miRNA expression.\n", + "We will work with multiomics data from [**BRCA** of TCGA](https://www.cancerimagingarchive.net/collection/tcga-brca/) [3], which has five subtypes as the labels of classification. 
Three omics modalities will be used: mRNA expression, DNA methylation, and miRNA expression.\n", "\n", "The multimodal approach used in this tutorial involves **late fusion**, where a cross-omics tensor is constructed for the prediction probability fusion across three omics modalities.\n", "\n", "The main tasks of this tutorial are:\n", "\n", - "- Load BRCA or ROSMAP dataset.\n", + "- Load BRCA dataset.\n", "- Define a MOGONET model.\n", "- Train and evaluate the MOGONET model on the multiomics data.\n", "- Obtain the feature importance and visualize the interpretation of the model." @@ -46,17 +46,7 @@ "execution_count": null, "id": "551867b5", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "fatal: destination path 'mmai-tutorials' already exists and is not an empty directory.\n", - "/content/mmai-tutorials/tutorials/multiomics-cancer-classification\n", - "Changed working directory to: /content/mmai-tutorials/tutorials/multiomics-cancer-classification\n" - ] - } - ], + "outputs": [], "source": [ "import os\n", "\n", @@ -97,18 +87,7 @@ "execution_count": null, "id": "6050d5b4", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", - " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", - " Preparing metadata (pyproject.toml) ... 
\u001b[?25l\u001b[?25hdone\n", - "pykale, gdown, nilearn, and yacs installed successfully β \n" - ] - } - ], + "outputs": [], "source": [ "%pip install --quiet \\\n", " \"pykale[example]@git+https://github.com/pykale/pykale@main\" \\\n", @@ -165,14 +144,6 @@ "cfg.merge_from_file(\"configs/BRCA.yaml\")" ] }, - { - "cell_type": "markdown", - "id": "71add965", - "metadata": {}, - "source": [ - "Besides, we also provide a configuration file for another dataset **ROSMAP**, named [`configs/ROSMAP.yaml`](https://github.com/pykale/embc-mmai25/blob/main/tutorials/multiomics-cancer-classification/configs/ROSMAP.yaml). Users can try with this dataset later." - ] - }, { "cell_type": "markdown", "id": "66a1eb4b", @@ -215,35 +186,7 @@ "execution_count": null, "id": "f85914b1", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "DATASET:\n", - " NAME: TCGA_BRCA\n", - " NUM_CLASSES: 5\n", - " NUM_MODALITIES: 3\n", - " RANDOM_SPLIT: False\n", - " ROOT: dataset/\n", - " URL: https://github.com/pykale/data/raw/main/multiomics/TCGA_BRCA.zip\n", - "MODEL:\n", - " EDGE_PER_NODE: 10\n", - " EQUAL_WEIGHT: False\n", - " GCN_DROPOUT_RATE: 0.5\n", - " GCN_HIDDEN_DIM: [400, 400, 200]\n", - " GCN_LR: 0.0005\n", - " GCN_LR_PRETRAIN: 0.001\n", - " VCDN_LR: 0.001\n", - "OUTPUT:\n", - " OUT_DIR: ./outputs\n", - "SOLVER:\n", - " MAX_EPOCHS: 500\n", - " MAX_EPOCHS_PRETRAIN: 100\n", - " SEED: 2023\n" - ] - } - ], + "outputs": [], "source": [ "print(cfg)" ] @@ -255,7 +198,7 @@ "source": [ "## Step 1: Data Loading and Preparation\n", "\n", - "We use two multiomics benchmarks in this tutorial, BRCA and ROSMAP, which have been provided by the authors of MOGONET paper in [their repository](https://github.com/txWang/MOGONET).\n", + "We use the multiomics benchmark **BRCA** in this tutorial, which have been provided by the authors of MOGONET paper in [their repository](https://github.com/txWang/MOGONET).\n", "\n", "If users are interested in more details 
regarding **data organization, downloading, loading, and pre-processing**, please refer to the [Data page](https://pykale.github.io/mmai-tutorials/tutorials/multiomics-cancer-classification/extend-reading/data.html) of the tutorial." ] @@ -350,27 +293,7 @@ "execution_count": null, "id": "676ebd93", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Dataset info:\n", - " number of modalities: 3\n", - " number of classes: 5\n", - "\n", - " modality | total samples | num train | num test | num features\n", - " -----------------------------------------------------------------\n", - " 1 | 875 | 612 | 263 | 1000 \n", - " 2 | 875 | 612 | 263 | 1000 \n", - " 3 | 875 | 612 | 263 | 503 \n", - " -----------------------------------------------------------------\n", - "\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "print(multiomics_data)" ] @@ -418,48 +341,7 @@ "execution_count": null, "id": "da221bd6", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "Model info:\n", - " Unimodal encoder:\n", - " (1) MogonetGCN(\n", - " (conv1): MogonetGCNConv(1000, 400)\n", - " (conv2): MogonetGCNConv(400, 400)\n", - " (conv3): MogonetGCNConv(400, 200)\n", - ") (2) MogonetGCN(\n", - " (conv1): MogonetGCNConv(1000, 400)\n", - " (conv2): MogonetGCNConv(400, 400)\n", - " (conv3): MogonetGCNConv(400, 200)\n", - ") (3) MogonetGCN(\n", - " (conv1): MogonetGCNConv(503, 400)\n", - " (conv2): MogonetGCNConv(400, 400)\n", - " (conv3): MogonetGCNConv(400, 200)\n", - ")\n", - "\n", - " Unimodal decoder:\n", - " (1) LinearClassifier(\n", - " (fc): Linear(in_features=200, out_features=5, bias=True)\n", - ") (2) LinearClassifier(\n", - " (fc): Linear(in_features=200, out_features=5, bias=True)\n", - ") (3) LinearClassifier(\n", - " (fc): Linear(in_features=200, out_features=5, bias=True)\n", - ")\n", - "\n", - " Multimodal decoder:\n", - " VCDN(\n", - " (model): Sequential(\n", - " (0): 
Linear(in_features=125, out_features=125, bias=True)\n", - " (1): LeakyReLU(negative_slope=0.25)\n", - " (2): Linear(in_features=125, out_features=5, bias=True)\n", - " )\n", - ")\n" - ] - } - ], + "outputs": [], "source": [ "print(mogonet_model)" ] @@ -489,18 +371,7 @@ "execution_count": null, "id": "7383c5c1", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.utilities.rank_zero:π‘ Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.\n", - "INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True\n", - "INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores\n", - "INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs\n" - ] - } - ], + "outputs": [], "source": [ "import pytorch_lightning as pl\n", "\n", @@ -530,36 +401,7 @@ "execution_count": null, "id": "2b42b719", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "69bed99b2d194cdd9c32a523820d8579", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Training: | | 0/? 
[00:00, ?it/s]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=100` reached.\n" - ] - } - ], + "outputs": [], "source": [ "trainer_pretrain.fit(network)" ] @@ -580,18 +422,7 @@ "execution_count": null, "id": "e94b710d", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.utilities.rank_zero:π‘ Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.\n", - "INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True\n", - "INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores\n", - "INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs\n" - ] - } - ], + "outputs": [], "source": [ "network = mogonet_model.get_model(pretrain=False)\n", "trainer = pl.Trainer(\n", @@ -620,36 +451,7 @@ "execution_count": null, "id": "b3e66c8f", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "305496c13a134426ab2261926a17c518", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Training: | | 0/? 
[00:00, ?it/s]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=500` reached.\n" - ] - } - ], + "outputs": [], "source": [ "trainer.fit(network)" ] @@ -668,66 +470,7 @@ "execution_count": null, "id": "019e2e7b", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "9ebbf6fd784944a39d374b60e4c53024", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Testing: | | 0/? [00:00, ?it/s]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
βββββββββββββββββββββββββββββ³ββββββββββββββββββββββββββββ\n", - "β Test metric β DataLoader 0 β\n", - "β‘ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ©\n", - "β Accuracy β 0.8019999861717224 β\n", - "β F1 macro β 0.6880000233650208 β\n", - "β F1 weighted β 0.7699999809265137 β\n", - "βββββββββββββββββββββββββββββ΄ββββββββββββββββββββββββββββ\n", - "\n" - ], - "text/plain": [ - "βββββββββββββββββββββββββββββ³ββββββββββββββββββββββββββββ\n", - "β\u001b[1m \u001b[0m\u001b[1m Test metric \u001b[0m\u001b[1m \u001b[0mβ\u001b[1m \u001b[0m\u001b[1m DataLoader 0 \u001b[0m\u001b[1m \u001b[0mβ\n", - "β‘ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ©\n", - "β\u001b[36m \u001b[0m\u001b[36m Accuracy \u001b[0m\u001b[36m \u001b[0mβ\u001b[35m \u001b[0m\u001b[35m 0.8019999861717224 \u001b[0m\u001b[35m \u001b[0mβ\n", - "β\u001b[36m \u001b[0m\u001b[36m F1 macro \u001b[0m\u001b[36m \u001b[0mβ\u001b[35m \u001b[0m\u001b[35m 0.6880000233650208 \u001b[0m\u001b[35m \u001b[0mβ\n", - "β\u001b[36m \u001b[0m\u001b[36m F1 weighted \u001b[0m\u001b[36m \u001b[0mβ\u001b[35m \u001b[0m\u001b[35m 0.7699999809265137 \u001b[0m\u001b[35m \u001b[0mβ\n", - "βββββββββββββββββββββββββββββ΄ββββββββββββββββββββββββββββ\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "[{'Accuracy': 0.8019999861717224,\n", - " 'F1 weighted': 0.7699999809265137,\n", - " 'F1 macro': 0.6880000233650208}]" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "trainer.test(network)" ] @@ -750,18 +493,7 @@ "execution_count": null, "id": "f061dd93", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO:pytorch_lightning.utilities.rank_zero:π‘ Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model 
registry.\n", - "INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True\n", - "INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores\n", - "INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs\n" - ] - } - ], + "outputs": [], "source": [ "from kale.interpret.model_weights import select_top_features_by_masking\n", "import pytorch_lightning as pl\n", @@ -817,13 +549,7 @@ "execution_count": null, "id": "2dd9e5e3", "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [] - } - ], + "outputs": [], "source": [ "f1_key = \"F1\" if multiomics_data.num_classes == 2 else \"F1 macro\"\n", "df_featimp_top = select_top_features_by_masking(\n", @@ -849,45 +575,7 @@ "execution_count": null, "id": "c984bdb1", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Rank\tFeature name \tOmics\tImportance\n", - " 1\tMSLN|10232 \t 0\t21.0000\n", - " 2\thsa-mir-9-2 \t 2\t17.6050\n", - " 3\thsa-mir-9-1 \t 2\t16.0960\n", - " 4\thsa-mir-203 \t 2\t15.0900\n", - " 5\tABCC11|85320 \t 0\t13.0000\n", - " 6\tTMEM207 \t 1\t13.0000\n", - " 7\tHOXD11 \t 1\t13.0000\n", - " 8\tKRTAP3-1 \t 1\t13.0000\n", - " 9\tOR1J4 \t 1\t13.0000\n", - " 10\tGPR37L1 \t 1\t13.0000\n", - " 11\thsa-mir-2115 \t 2\t11.5690\n", - " 12\thsa-mir-187 \t 2\t11.5690\n", - " 13\thsa-let-7a-3 \t 2\t9.5570\n", - " 14\thsa-let-7f-2 \t 2\t9.0540\n", - " 15\thsa-mir-205 \t 2\t8.5510\n", - " 16\thsa-mir-551b \t 2\t8.5510\n", - " 17\tANKRD45|339416 \t 0\t8.0000\n", - " 18\tNOTCH1|4851 \t 0\t8.0000\n", - " 19\tMDGA2|161357 \t 0\t8.0000\n", - " 20\tARHGEF4|50649 \t 0\t8.0000\n", - " 21\tCRHR1|1394 \t 0\t8.0000\n", - " 22\tCXCL3|2921 \t 0\t8.0000\n", - " 23\tCSDA|8531 \t 0\t8.0000\n", - " 24\tPI3|5266 \t 0\t8.0000\n", - " 25\tSLC43A3|29015 \t 0\t8.0000\n", - " 26\tTRIML2|205860 \t 0\t8.0000\n", - " 27\tRDH10|157506 \t 0\t8.0000\n", - " 28\tIFFO2|126917 \t 0\t8.0000\n", - " 
29\tISL2|64843 \t 0\t8.0000\n", - " 30\tFGFBP1|9982 \t 0\t8.0000\n" - ] - } - ], + "outputs": [], "source": [ "print(\"{:>4}\\t{:<20}\\t{:>5}\\t{}\".format(\"Rank\", \"Feature name\", \"Omics\", \"Importance\"))\n", "for rank, row in enumerate(df_featimp_top.itertuples(index=False), 1):\n", @@ -908,10 +596,7 @@ "[3] Lingle, W., Erickson, B. J., Zuley, M. L., Jarosz, R., Bonaccio, E., Filippini, J., Net, J. M., Levi, L., Morris, E. A., Figler, G. G., Elnajjar, P., Kirk, S., Lee, Y., Giger, M., & Gruszauskas, N. (2016). The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA) (Version 3) [Data set]. The Cancer Imaging Archive.\n", "\n", "\n", - "\n", - "[4] Bennett, D. A., Buchman, A. S., Boyle, P. A., Barnes, L. L., Wilson, R. S., & Schneider, J. A. (2018). Religious orders study and rush memory and aging project. Journal of Alzheimerβs disease, 64(s1), S161-S189.\n", - "\n", - "[5] De Jager, P.L.; Ma, Y.; McCabe, C.; Xu, J.; Vardarajan, B.N.; Felsky, D.; Klein, H.U.; White, C.C.; Peters, M.A.; Lodgson, B.; et al. (2018). A multi-omic atlas of the human frontal cortex for aging and Alzheimerβs disease research. Scientific Data 5, 1-13" + "\n" ] } ], diff --git a/workshop/intro.md b/workshop/intro.md index 61f5d82..e4fc057 100644 --- a/workshop/intro.md +++ b/workshop/intro.md @@ -21,25 +21,25 @@ This workshop covers the tutorials for **four biomedical applications** using th **Tutorial Topics:** 1. **Brain Disorder Diagnosis** - - **Dataset**: ABIDE (Autism Brain Imaging Data Exchange) + - **Dataset**: [ABIDE (Autism Brain Imaging Data Exchange)](https://fcon_1000.projects.nitrc.org/indi/abide/) - **Modalities**: Neuroimaging (fMRI) and phenotypic features (e.g., age, gender, IQ) - **Task**: Use neuroimaging and phenotypic data for autism classification - **Multimodal approach**: Regularization - using phenotypic features to regularize feature embedding for reducing the phenotypic effect (e.g. 
site effect) in neuroimaging data to improve cross-site classification performance. 2. **Cardiovascular Disease Assessment** - - **Dataset**: MIMIC Chest X-rays and ECG signals + - **Dataset**: MIMIC [Chest X-rays](https://physionet.org/content/mimic-cxr/2.1.0/) and [ECG signals](https://physionet.org/content/mimic-iv-ecg/1.0/) - **Modalities**: Chest X-ray images and ECG signals - **Task**: Integrate imaging and physiological signals for classifying health and cardiothoracic abnormalities - **Multimodal approach**: Hybrid fusion - combining CXR and ECG at feature and decision level for improved classification. 3. **Cancer Classification** - - **Dataset**: TCGA (The Cancer Genome Atlas) + - **Dataset**: [TCGA (The Cancer Genome Atlas)](https://www.cancerimagingarchive.net/collection/tcga-brca/) - **Modalities**: DNA methylation, mRNA expression, and miRNA expression. - **Task**: Combine genomics and transcriptomics data for cancer classification - **Multimodal approach**: Late fusion - cross-omics tensor for probability fusion. 4. **Drug–Target Interaction Prediction** - - **Dataset**: BindingDB and BioSNAP + - **Dataset**: [BindingDB](https://www.bindingdb.org/rwd/bind/index.jsp) and [BioSNAP](https://snap.stanford.edu/biodata/) - **Modalities**: Protein structures (3D) and molecular graphs (SMILES) - **Task**: Predict molecular interactions from structural and textual features - **Multimodal approach**: Interaction - bilinear interaction between protein and molecular embedding.