[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/monte-flora/explain_tutorial/blob/main/src/tutorial_notebooks/Notebook00_Download_Data_and_Models.ipynb)

### 1. Download the Data 

The three datasets and the pre-fit machine learning models are available on [Zenodo](https://doi.org/10.5281/zenodo.8136709). The code cells in this notebook download the datasets and models and unzip them. Note that the Zenodo download can take up to 10 minutes. If you are using Google Colab, the code also uploads the data to your Google Drive.
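
For readers who want to see roughly what the download step involves, here is a minimal sketch that calls the `zenodo_get` command-line tool (installed in the first code cell) through `subprocess`. It is not the implementation of `fetch_zenodo`, which may work differently. The helper names `build_zenodo_command` and `download_record` are hypothetical and exist only for this illustration, and the sketch assumes `zenodo_get` writes files into the current working directory by default.

```python
import subprocess
from pathlib import Path

# DOI taken from the Zenodo link above.
ZENODO_DOI = "10.5281/zenodo.8136709"


def build_zenodo_command(doi=ZENODO_DOI):
    """Return the zenodo_get command (as an argument list) for a DOI."""
    return ["zenodo_get", doi]


def download_record(output_dir=".", doi=ZENODO_DOI):
    """Download every file of the Zenodo record into output_dir.

    Hypothetical helper: runs zenodo_get with output_dir as the working
    directory, so the files land there. Raises CalledProcessError if the
    command exits with a non-zero status.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_zenodo_command(doi), cwd=output_dir, check=True)
    return output_dir
```

Nothing is downloaded until `download_record` is called, for example `download_record('zenodo_download')`, where the folder name is a placeholder.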

### 2. Unzip the Data

This step unzips the datasets and models, creating directories named "datasets" and "models" that contain the CSV-format datasets and the corresponding models, respectively. The files are unzipped in the same location where the data was downloaded. We recommend transferring the data and models to another location. You will then use those paths in the other notebooks to load the data and models.
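
The recommended transfer can be sketched with the standard library. This example assumes the unzipped `datasets` and `models` folders sit directly inside the download directory, as described above. The function name `relocate_tutorial_data` and the target directory are hypothetical.

```python
import shutil
from pathlib import Path


def relocate_tutorial_data(download_dir, target_dir):
    """Move the unzipped 'datasets' and 'models' folders into target_dir.

    Returns a dict mapping each folder name to its new path.
    """
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    new_paths = {}
    for folder in ("datasets", "models"):
        source = download_dir / folder
        destination = target_dir / folder
        if not source.is_dir():
            raise FileNotFoundError(f"Expected an unzipped folder at {source}")
        if destination.exists():
            # Avoid nesting the folder inside an existing one of the same name.
            raise FileExistsError(f"{destination} already exists")
        shutil.move(str(source), str(destination))
        new_paths[folder] = destination
    return new_paths
```

A call such as `relocate_tutorial_data('.', '/path/to/tutorial_data')` returns the new folder locations, which can then be used as the data paths in the other notebooks.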


In [1]:
%pip install zenodo_get PyDrive httplib2==0.15.0

In [2]:
import os
import sys
from glob import glob


def using_colab():
    """Return True when the notebook is running inside Google Colab."""
    try:
        import google.colab  # noqa: F401
        return True
    except ImportError:
        return False

if using_colab():
    # On Google Colab, clone the explain_tutorial repo first.
    # Otherwise, the code assumes these notebooks are being run
    # from their original directory structure.
    !git clone https://github.com/monte-flora/explain_tutorial
    sys.path.append('explain_tutorial')   
else:
    from os.path import dirname
    path = dirname(dirname(os.getcwd()))
    sys.path.append(path)

# src is the source directory of the explain_tutorial repo. 
from src.io.io import fetch_zenodo
from src.io.colab_io import GoogleDriveIO

fetch_zenodo()

# On Google Colab, store the downloaded Zenodo data on Google Drive
# for future use.
if using_colab():
    uploader = GoogleDriveIO()
    
    dataset_names = ['lightning', 'road_surface', 'severe_wind']
    dataset_paths = [f'/content/datasets/{name}_dataset.csv' for name in dataset_names]

    for path, name in zip(dataset_paths, dataset_names):
        print(f'Uploading {name} dataset ({path}) to Google Drive...')
        uploader.upload(path, title=f'{name} dataset')

    model_paths = glob('/content/models/*')
    for path in model_paths:
        if 'Isotonic' not in path:
            if 'JTTI' in path:
                name = 'road_surface'
            elif 'NN' in path:
                name = 'lightning'
            else:
                name = 'severe_wind'
            print(f'Uploading {name} model ({path}) to Google Drive...')
            uploader.upload(path, title=f'{name} model')