# TORNET: Loading and Saving Data

This notebook downloads TORNET data from Zenodo, converts it to a PyTorch/TORNET dataloader class, and saves the result to a pickle file for later reuse. The goal is to download and generate the data only once. Later notebooks can then load the data by unpickling the .pt files in PyTorch.
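
The generate-once, reload-later pattern described above can be sketched as follows. This is a minimal illustration that uses the standard-library `pickle` module as a stand-in for `torch.save` and `torch.load`. The helper name `load_or_build` and its `build_fn` argument are hypothetical and not part of TORNET.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return the cached object if it exists, else build it and cache it.

    `load_or_build` and `build_fn` are hypothetical names for illustration.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Later runs: skip the expensive step and unpickle the saved object.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # First run: do the expensive work once, then save the result.
    obj = build_fn()
    with cache_path.open("wb") as f:
        pickle.dump(obj, f)
    return obj
```

As with any pickle-based format, only files from a trusted source should be unpickled.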

The following datasets are available, one per year. Each contains a compressed archive with the folders train and test. The 2013 archive also contains catalog.csv, which is required for initial data loading.

- Tornet 2013 (3 GB) and catalog: https://zenodo.org/doi/10.5281/zenodo.10558658

- Tornet 2014 (15 GB): https://zenodo.org/doi/10.5281/zenodo.10558838

- Tornet 2015 (17 GB): https://zenodo.org/doi/10.5281/zenodo.10558853

- Tornet 2016 (16 GB): https://zenodo.org/doi/10.5281/zenodo.10565458

- Tornet 2017 (15 GB): https://zenodo.org/doi/10.5281/zenodo.10565489

- Tornet 2018 (12 GB): https://zenodo.org/doi/10.5281/zenodo.10565514

- Tornet 2019 (18 GB): https://zenodo.org/doi/10.5281/zenodo.10565535

- Tornet 2020 (17 GB): https://zenodo.org/doi/10.5281/zenodo.10565581

- Tornet 2021 (18 GB): https://zenodo.org/doi/10.5281/zenodo.10565670

- Tornet 2022 (19 GB): https://zenodo.org/doi/10.5281/zenodo.10565691
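
Before downloading, it can help to estimate how much disk space a selection of years needs. The sketch below copies the DOIs and download sizes from the list above into a dictionary. The names `TORNET_ARCHIVES`, `zenodo_url`, and `total_size_gb` are hypothetical helpers, and the sizes are the approximate download sizes listed here, not the extracted sizes.

```python
# Year -> (DOI, approximate download size in GB), copied from the list above.
TORNET_ARCHIVES = {
    2013: ("10.5281/zenodo.10558658", 3),
    2014: ("10.5281/zenodo.10558838", 15),
    2015: ("10.5281/zenodo.10558853", 17),
    2016: ("10.5281/zenodo.10565458", 16),
    2017: ("10.5281/zenodo.10565489", 15),
    2018: ("10.5281/zenodo.10565514", 12),
    2019: ("10.5281/zenodo.10565535", 18),
    2020: ("10.5281/zenodo.10565581", 17),
    2021: ("10.5281/zenodo.10565670", 18),
    2022: ("10.5281/zenodo.10565691", 19),
}


def zenodo_url(year):
    """Return the Zenodo DOI URL for one year (hypothetical helper)."""
    try:
        doi, _size = TORNET_ARCHIVES[int(year)]
    except KeyError:
        raise ValueError(f"No TORNET archive listed for year {year!r}") from None
    return f"https://zenodo.org/doi/{doi}"


def total_size_gb(years):
    """Sum the listed download sizes for the given years (hypothetical helper)."""
    return sum(TORNET_ARCHIVES[int(y)][1] for y in years)
```

For example, `total_size_gb(range(2013, 2023))` adds up the sizes of all ten archives.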

In [3]:
import subprocess
import os

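def fetch_data(years, catalog=True, unzip=True):
    """Download the TORNET archives for the given years from Zenodo.

    If catalog is True, the 2013 record (which holds catalog.csv) is also
    fetched. If unzip is True, every .gz archive in the working directory
    is extracted afterwards.
    """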
    available_years = ["2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021", "2022"]
    dois = [
        "10.5281/zenodo.10558658", "10.5281/zenodo.10558838",
        "10.5281/zenodo.10558853", "10.5281/zenodo.10565458",
        "10.5281/zenodo.10565489", "10.5281/zenodo.10565514",
        "10.5281/zenodo.10565535", "10.5281/zenodo.10565581",
        "10.5281/zenodo.10565670", "10.5281/zenodo.10565691",
    ]


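    # Keep only the requested years that are available on Zenodo.
    selected_years = [str(y) for y in years if str(y) in available_years]

    # The 2013 record holds catalog.csv; fetch it once if 2013 was not requested.
    if catalog and selected_years and "2013" not in selected_years:
        subprocess.run(["zenodo_get", dois[0]], check=True)

    for year in selected_years:
        idx = available_years.index(year)
        subprocess.run(["zenodo_get", dois[idx]], check=True)

    print("Selected years received: ", selected_years)

    if unzip:
        for fo in os.listdir():
            if fo.endswith(".gz"):
                print("Unzipping: ", fo)
                # Pass the command as a list; a plain string needs shell=True.
                subprocess.run(["tar", "-xvf", fo], check=True)

        print("Data unzipped: ", selected_years)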
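
The extraction step above shells out to `tar`, which may not be available on every system. A minimal alternative sketch using the standard-library `tarfile` module is shown below. It assumes the downloaded archives are gzip-compressed tarballs named `*.tar.gz`; the function `extract_archives` is a hypothetical helper, not part of TORNET or `zenodo_get`.

```python
import tarfile
from pathlib import Path


def extract_archives(directory=".", dest=None):
    """Extract every .tar.gz archive found in `directory` (hypothetical helper).

    Returns the names of the archives that were extracted.
    """
    directory = Path(directory)
    dest = Path(dest) if dest is not None else directory
    extracted = []
    for archive in sorted(directory.glob("*.tar.gz")):
        print("Unzipping:", archive.name)
        # Only extract archives from a trusted source such as the Zenodo records.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_archives()` with no arguments extracts into the current working directory, which mirrors what the `tar -xvf` call does.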

        

In [None]:
fetch_data([2013])
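
After the download finishes, a quick check that the expected entries exist can catch an interrupted transfer or a failed extraction early. The sketch below assumes the archive extracts the train and test folders and catalog.csv directly into the working directory; if they land in a subfolder, pass that folder as `root`. The function `missing_entries` is a hypothetical helper.

```python
from pathlib import Path


def missing_entries(root=".", need_catalog=True):
    """List expected TORNET entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    expected = ["train", "test"]
    if need_catalog:
        # catalog.csv ships with the 2013 record and is needed for initial loading.
        expected.append("catalog.csv")
    return [name for name in expected if not (root / name).exists()]
```

An empty list means all expected entries were found.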