<a href="https://colab.research.google.com/github/maragraziani/interpretAI_DigiPath/blob/main/data_setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This notebook helps you set up the data for the experiments.
### You are encouraged to do this **before the workshop**.

An easy way to get everything set up is to connect the Colab notebook to your Google Drive (GDrive) and store the data there. 
This notebook will guide you through the steps one by one. 

Start by running the code cell below. It generates a link asking for permission to connect to your GDrive. Log in with your Google account, accept the request, and copy and paste the generated code into the text box.

If you do not want to connect Colab to your GDrive, remember to manually upload the data folder from the GitHub repository (path/) to the runtime filesystem.

The cells in this notebook download the files directly into your Drive folder. 

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import os
# Create the working folder on Drive; exist_ok avoids an error if it already exists
os.makedirs('/content/drive/MyDrive/CNNinterpret', exist_ok=True)
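
For readers who skip the Drive connection, the same folder can be created on the runtime disk instead. The following is a minimal sketch, not part of the original notebook: the helper name `resolve_data_dir` and the fallback location `/content` are assumptions chosen for illustration.

```python
import os


def resolve_data_dir(drive_root="/content/drive/MyDrive",
                     folder="CNNinterpret",
                     fallback_root="/content"):
    """Return the data folder, on Drive if it is mounted, else on the runtime disk.

    The fallback root is an assumption; files stored there are lost
    when the Colab runtime is recycled.
    """
    # A mounted Drive shows up as a regular directory under /content/drive
    root = drive_root if os.path.isdir(drive_root) else fallback_root
    data_dir = os.path.join(root, folder)
    os.makedirs(data_dir, exist_ok=True)
    return data_dir
```

Calling `resolve_data_dir()` once at the top of a session and reusing the returned path keeps the later cells independent of where the data actually lives.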

This demo uses the Camelyon dataset [1]. 

To facilitate research on these challenging data, Bastiaan S. Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling released PatchCamelyon (PCam), a subset of images with fixed splits, for benchmarking [2].
**You can add the PCam data to your GDrive account by selecting the option "create a shortcut to my Drive" at this link:** https://drive.google.com/drive/folders/1gHou49cA1s5vua2V5L98Lt8TiWA3FrKB?usp=sharing
Otherwise, you can download the data by following this notebook. You are strongly encouraged to perform the download before the workshop, as it may take several minutes.

NOTE: you will need to run some of the download cells below even if you already downloaded the data beforehand. These will be highlighted in the code. 

[1] Ehteshami Bejnordi et al. "Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer". JAMA: The Journal of the American Medical Association, 318(22), 2199–2210. doi:10.1001/jama.2017.14585

[2] B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, M. Welling. "Rotation Equivariant CNNs for Digital Pathology". arXiv:1806.03962
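
Because some cells may be run more than once, it can help to skip files that are already in place. The sketch below is one possible way to do that in plain Python instead of `wget`. The helper name `download_if_missing` is hypothetical, and treating an already-decompressed file as "present" is an assumption about the desired behavior.

```python
import os
import shutil
import urllib.request


def download_if_missing(url, dest_dir):
    """Download url into dest_dir unless the file is already there.

    Returns True if a download happened, False if it was skipped.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(url))
    # For a .gz archive, an already-decompressed copy also counts as present
    unpacked = dest[:-3] if dest.endswith(".gz") else dest
    if os.path.exists(dest) or os.path.exists(unpacked):
        return False
    # Write to a temporary name first so an interrupted download
    # never leaves a half-written file under the final name
    partial = dest + ".part"
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, dest)
    return True
```

For example, `download_if_missing("https://zenodo.org/record/4669282/files/elephant.jpg", "/content/drive/MyDrive/CNNinterpret")` would fetch the image on the first call and return `False` on later calls.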

In [4]:
!wget https://zenodo.org/record/4669282/files/camelyonpatch_level_2_split_valid_y.h5.gz
!mv camelyonpatch_level_2_split_valid_y.h5.gz /content/drive/MyDrive/CNNinterpret/
!gunzip /content/drive/MyDrive/CNNinterpret/camelyonpatch_level_2_split_valid_y.h5.gz

--2021-04-07 12:38:22--  https://zenodo.org/record/4669282/files/camelyonpatch_level_2_split_valid_y.h5.gz
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3038 (3.0K) [application/octet-stream]
Saving to: ‘camelyonpatch_level_2_split_valid_y.h5.gz’


2021-04-07 12:38:23 (413 MB/s) - ‘camelyonpatch_level_2_split_valid_y.h5.gz’ saved [3038/3038]



In [6]:
!wget https://zenodo.org/record/2546921/files/camelyonpatch_level_2_split_valid_x.h5.gz
!mv camelyonpatch_level_2_split_valid_x.h5.gz /content/drive/MyDrive/CNNinterpret/
!gunzip /content/drive/MyDrive/CNNinterpret/camelyonpatch_level_2_split_valid_x.h5.gz

--2021-04-07 12:44:06--  https://zenodo.org/record/2546921/files/camelyonpatch_level_2_split_valid_x.h5.gz
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 805965320 (769M) [application/octet-stream]
Saving to: ‘camelyonpatch_level_2_split_valid_x.h5.gz’


2021-04-07 12:46:30 (5.43 MB/s) - ‘camelyonpatch_level_2_split_valid_x.h5.gz’ saved [805965320/805965320]
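
An interrupted download or decompression of a file this large can leave a corrupted `.h5` behind. As a rough sanity check, the decompression can be done in Python and the result tested for the 8-byte HDF5 file signature. This is only a sketch: the function names are hypothetical, and the check assumes the signature sits at the very start of the file (no user block), which does not prove the contents are complete.

```python
import gzip
import os
import shutil

# Signature that opens an HDF5 file when no user block precedes it
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def gunzip_file(gz_path):
    """Decompress gz_path next to itself, delete the archive, return the new path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz")
    out_path = gz_path[:-3]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so memory use stays small
    os.remove(gz_path)  # mirrors the default behavior of the gunzip command
    return out_path


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

If `looks_like_hdf5` returns `False` for a decompressed PCam file, downloading that file again is probably the simplest fix.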



In [7]:
!wget https://zenodo.org/record/4669282/files/elephant.jpg
!mv elephant.jpg /content/drive/MyDrive/CNNinterpret/
!wget https://zenodo.org/record/4669282/files/hummingbird.jpg
!mv hummingbird.jpg /content/drive/MyDrive/CNNinterpret/
!wget https://zenodo.org/record/4669282/files/normalizers.py
!mv normalizers.py /content/drive/MyDrive/CNNinterpret/
!wget https://zenodo.org/record/4669282/files/normalizing_patch.npy
!mv normalizing_patch.npy /content/drive/MyDrive/CNNinterpret/
!wget https://zenodo.org/record/4669282/files/weights.h5
!mv weights.h5 /content/drive/MyDrive/CNNinterpret/

--2021-04-07 12:51:47--  https://zenodo.org/record/4669282/files/elephant.jpg
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1167136 (1.1M) [image/jpeg]
Saving to: ‘elephant.jpg’


2021-04-07 12:51:49 (1.09 MB/s) - ‘elephant.jpg’ saved [1167136/1167136]

--2021-04-07 12:51:49--  https://zenodo.org/record/4669282/files/hummingbird.jpg
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21401 (21K) [image/jpeg]
Saving to: ‘hummingbird.jpg’


2021-04-07 12:51:51 (117 KB/s) - ‘hummingbird.jpg’ saved [21401/21401]

--2021-04-07 12:51:51--  https://zenodo.org/record/4669282/files/normalizers.py
Resolving zenodo.org (zenodo.org)... 137.138.76.77
Connecting to zenodo.org (zenodo.org)|137.138.76.77|:443... connected.
HTTP request sent, awaiting

The files should now be all set for the experiments. Remember to mount your Drive folder again in the Colab notebook before running the experiments (the code for this is already included where needed).
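
A quick way to confirm that everything is in place is to list which expected files are absent from the folder. The sketch below uses the file names downloaded in this notebook (after decompression); the helper name `missing_files` is hypothetical.

```python
import os

# File names taken from the download cells of this notebook
EXPECTED_FILES = [
    "camelyonpatch_level_2_split_valid_x.h5",
    "camelyonpatch_level_2_split_valid_y.h5",
    "elephant.jpg",
    "hummingbird.jpg",
    "normalizers.py",
    "normalizing_patch.npy",
    "weights.h5",
]


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in folder."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]
```

Running `missing_files("/content/drive/MyDrive/CNNinterpret")` after mounting Drive should return an empty list if the setup succeeded.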
