<a href="https://colab.research.google.com/github/nick11roberts/AutoML-Decathlon-hackathon/blob/main/automl_decathlon_starter_kit_hackathon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AutoML Decathlon Hackathon Notebook

This notebook creates a self-contained environment for you to get a better understanding of how to work with the data, how your methods should be formatted, and how the competition pipeline interfaces with each.

There are two main steps of the competition pipeline:


1.   Ingestion - the datasets are loaded and the model is created, trained, and used to generate predictions
2.   Scoring - the generated predictions and true target outputs are used to calculate the final scores of the model on each task. The scores are based on loss functions that vary by task, but each is defined such that a *lower* score indicates better performance.

7 out of the 10 competition tasks are included in the setup for this notebook.

As you look through this notebook, keep in mind that what you mainly need to implement is the `Model` class in `model.py`. We recommend that you first look through each element of the `sample_code_submission` directory and especially the code example `model.py`. There are additional examples in the `simple_baseline_models` directory.

A short description of the elements of `sample_code_submission` is below. Further details will be explained as you step through this notebook.


1.   `metadata`: not relevant in this notebook environment, but required for official competition submissions through CodaLab; do not remove or edit it.
2.   `tasks_to_run.yaml`: specifies a subset of the tasks to run the method on. If it is not included, the pipeline will attempt to run on all 10 tasks.
3.   `model.py`: where you will implement your method. It contains the `Model` class which is used in the pipeline, and has 3 mandatory methods: `__init__`, `train`, and `test`.
4.    You may also include any other files your method needs in the directory along with the 3 elements above.
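To make the shape of the `Model` interface concrete, here is a minimal sketch of the three mandatory methods. The argument names follow the calls made later in this notebook (`Model(md_train)`, `train(..., val_dataset, val_metadata, remaining_time_budget)`, `test(..., remaining_time_budget)`). Everything else is an assumption for illustration: the all-zero predictions, the use of nested lists in place of the array returned by the baselines, the reliance on `len(dataset)`, and the `ToyMetadata` stand-in.

```python
import math


class Model:
    """Bare-bones skeleton: predicts all zeros so the interface is visible."""

    def __init__(self, metadata):
        # The metadata describes the task; get_output_shape() is the accessor
        # used later in this notebook.
        self.metadata = metadata
        self.n_outputs = math.prod(metadata.get_output_shape())
        self.trained = False

    def train(self, dataset, val_dataset=None, val_metadata=None,
              remaining_time_budget=None):
        # Fit your method here, stopping before the time budget runs out.
        self.trained = True

    def test(self, dataset, remaining_time_budget=None):
        # Return one prediction row per test sample (placeholder: all zeros).
        return [[0.0] * self.n_outputs for _ in range(len(dataset))]


class ToyMetadata:
    """Hypothetical stand-in for the metadata object passed by the pipeline."""

    def __init__(self, output_shape):
        self._output_shape = output_shape

    def get_output_shape(self):
        return self._output_shape


model = Model(ToyMetadata((88,)))
model.train([None] * 5, remaining_time_budget=200)
predictions = model.test([None] * 3, remaining_time_budget=200)
print(len(predictions), len(predictions[0]))  # 3 88
```

For a working reference, the example `model.py` in `sample_code_submission` and the models in `simple_baseline_models` remain the authoritative starting point.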




# (1) Setup

---

Run these cells to set up the code environment and download the data.
Do not change or remove any of the existing commands in this section.

In [None]:
# Getting starter kit code
!git clone -b hackathon https://github.com/cxxz/automl_decathlon_starter_kit.git

Cloning into 'automl_decathlon_starter_kit'...
remote: Enumerating objects: 298, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 298 (delta 28), reused 1 (delta 0), pack-reused 240
Receiving objects: 100% (298/298), 196.13 KiB | 17.83 MiB/s, done.
Resolving deltas: 100% (159/159), done.


In [None]:
# Getting datasets, creates a dev_public directory in the required format
%cd automl_decathlon_starter_kit/
!wget https://storage.googleapis.com/decathlon_test/dev_public.zip
!unzip dev_public.zip
!rm dev_public.zip

/content/automl_decathlon_starter_kit
--2022-09-30 16:53:06--  https://storage.googleapis.com/decathlon_test/dev_public.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.200.128, 74.125.68.128, 74.125.24.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.200.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1152556537 (1.1G) [application/x-zip-compressed]
Saving to: ‘dev_public.zip’


2022-09-30 16:53:28 (50.3 MB/s) - ‘dev_public.zip’ saved [1152556537/1152556537]

Archive:  dev_public.zip
   creating: dev_public/md/
   creating: dev_public/md/cosmic/
  inflating: dev_public/md/cosmic/test_metadata.json  
  inflating: dev_public/md/cosmic/train_metadata.json  
   creating: dev_public/md/crypto/
  inflating: dev_public/md/crypto/test_metadata.json  
  inflating: dev_public/md/crypto/train_metadata.json  
   creating: dev_public/md/deepsea/
  inflating: dev_public/md/deepsea/test_metadata.json  
  inflating: dev

In [None]:
# Installing dependency
!pip install xgboost==1.6.1
# Feel free to add any others that you may want to use


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting xgboost==1.6.1
  Downloading xgboost-1.6.1-py3-none-manylinux2014_x86_64.whl (192.9 MB)
     |████████████████████████████████| 192.9 MB 71 kB/s
Installing collected packages: xgboost
  Attempting uninstall: xgboost
    Found existing installation: xgboost 0.90
    Uninstalling xgboost-0.90:
      Successfully uninstalled xgboost-0.90
Successfully installed xgboost-1.6.1


# (2) Pipeline Toy Examples

---

This section will walk you through the competition pipeline with descriptions and small segmented examples, only operating on a single task.



Define some variables and copy the chosen model directory for testing purposes.

In [None]:
from os.path import join

%load_ext autoreload
%autoreload 2

'''
Choose 1 dataset to run toy examples on. 
Feel free to change this to any one of the included tasks.
'''
dataset = 'nottingham'

# directories used by the toy examples
baseline_dir = 'simple_baseline_models/'
test_dir = 'test_model'
inges_dir = 'ingestion/'
score_dir = 'scoring/'

# make the model, ingestion, scoring, and baseline code importable
from sys import path
for d in (test_dir, inges_dir, score_dir, baseline_dir):
    path.append(d)

'''
Choose 1 baseline model to run toy examples with.
Feel free to change this to any one of the included baselines, or you can test your own
'''
model_simple = join(baseline_dir, 'decathlon_xgb', '.') # choose one simple baseline model; change this if needed

!mkdir -p $test_dir
!cp -r $model_simple $test_dir # copy the model directory

## (2.1) Ingestion; Dataset Loading

---

The first step of the ingestion process is to load the data. A custom `DecathlonDataset` class, which extends the PyTorch `Dataset`, has been implemented within the pipeline. Each instance also holds metadata, such as the input shape, output shape, number of samples, and task type (single-label classification, multi-label classification, or regression). This metadata is passed to the `Model`.

This is already implemented in the ingestion code, so you do not need to change or add anything.


In [None]:
from dev_datasets import DecathlonDataset, extract_metadata

train_dataset = DecathlonDataset(dataset, './dev_public', 'train')
test_dataset = DecathlonDataset(dataset, './dev_public', 'test')

md_train = extract_metadata(train_dataset)
md_test = extract_metadata(test_dataset)
print("Dataset path: ", md_train.get_dataset_name())
print("Input shape: ",  md_train.get_tensor_shape())
print("Output shape:", md_train.get_output_shape())
print("Dataset size: ",  md_train.size())

Dataset path:  nottingham
Input shape:  (1792, 88, 1, 1)
Output shape: (88,)
Dataset size:  693


A function similar to the one below is already implemented within the `Model` class for you to use for training and testing.

In [None]:
from torch.utils.data import DataLoader

'''
A similar function is implemented within the Model class.
'''
def get_dataloader(dataset, batch_size, split):
    """Get the PyTorch dataloader.

    Args:
        dataset: a DecathlonDataset instance
        batch_size: batch size to use unless the dataset requires a specific one
        split: "train" (shuffled) or "test" (original order)

    Return:
        dataloader: PyTorch DataLoader
    """
    # Fail early instead of hitting an undefined variable on an unknown split
    if split not in ("train", "test"):
        raise ValueError(f"split must be 'train' or 'test', got {split!r}")
    return DataLoader(
        dataset,
        dataset.required_batch_size or batch_size,
        shuffle=(split == "train"),
        drop_last=False,
        collate_fn=dataset.collate_fn,
    )


batch_size = 1
train_loader = get_dataloader(train_dataset, batch_size, 'train')
test_loader = get_dataloader(test_dataset, batch_size, 'test')

In [None]:
# Print the input and label shapes of the first 10 test batches, collecting all labels
labels = []
for x, y in test_loader:
    if len(labels) < 10:
        print(x.shape, y.shape)
    label = y.tolist()
    labels += label

torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])
torch.Size([1, 1792, 88, 1, 1]) torch.Size([1, 88])


## (2.2) Ingestion; Creating and Training the Model

---

Within the ingestion process, the metadata of the dataset is used to initialize the `Model` instance. Your implementation of the `Model` class determines how the metadata of the task will affect aspects of your method, such as architecture, size, etc.
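As a small illustration of using the metadata to size a model, the sketch below derives a flattened input dimension and an output dimension from the shapes the metadata reports. The accessor names `get_tensor_shape`, `get_output_shape`, and `size` are the ones used in section (2.1); the `ToyMetadata` class and the `flat_dims` helper are hypothetical stand-ins so that the example runs on its own.

```python
import math


class ToyMetadata:
    """Hypothetical stand-in exposing the accessors used in section (2.1)."""

    def __init__(self, tensor_shape, output_shape, n_samples):
        self._tensor_shape = tensor_shape
        self._output_shape = output_shape
        self._n_samples = n_samples

    def get_tensor_shape(self):
        return self._tensor_shape

    def get_output_shape(self):
        return self._output_shape

    def size(self):
        return self._n_samples


def flat_dims(metadata):
    """Return (flattened input size, flattened output size) for one sample."""
    n_in = math.prod(metadata.get_tensor_shape())
    n_out = math.prod(metadata.get_output_shape())
    return n_in, n_out


# Shapes reported for the nottingham task in section (2.1)
md = ToyMetadata((1792, 88, 1, 1), (88,), 693)
print(flat_dims(md))  # (157696, 88)
```

These numbers agree with the `(693, 157696) (693, 88)` shapes that the XGBoost baseline prints when it trains on this task. A method that works on flat feature vectors could size its first and last layers from these two values, while a convolutional method would more likely keep the original tensor shape.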

In [None]:
%load_ext autoreload
%autoreload 2

# instantiate the model
from model import Model
M = Model(md_train) # pass the metadata of the dataset

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Device Found =  cuda 
Moving Model and Data into the device...


INPUT SHAPE =  (88, 1792, 1, 1)


Train the model for a certain time budget.

The `train` function of the `Model` class takes a `DecathlonDataset` instance for the training data and a `remaining_time_budget` as arguments. Additionally, there may be an optional validation dataset and corresponding metadata passed to `train`, in the case where the task has a special, pre-made validation split. In most cases, however, the validation data is not pre-prepared and you should create your own train/validation splits for model selection within the function.
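One simple way to create such a split is to shuffle the sample indices with a fixed seed and hold out a fraction for validation. The sketch below uses only the standard library; the helper name `split_indices`, the 20% validation fraction, and the seed are illustrative choices, not part of the starter kit.

```python
import random


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices) from a seeded random shuffle."""
    if n_samples < 2:
        raise ValueError("need at least 2 samples to split")
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Keep at least one sample on each side of the split
    n_val = min(n_samples - 1, max(1, round(n_samples * val_fraction)))
    return indices[n_val:], indices[:n_val]


# 693 is the nottingham training set size printed in section (2.1)
train_idx, val_idx = split_indices(693, val_fraction=0.2, seed=0)
print(len(train_idx), len(val_idx))  # 554 139
```

With PyTorch, the resulting index lists can be passed to `torch.utils.data.Subset(train_dataset, train_idx)`. Note that `Subset` does not copy custom attributes such as `collate_fn` or `required_batch_size`, so those would still need to be read from the original dataset.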

In the cell below, the time budget is purely illustrative.
It is passed to the model's mandatory `train` function, which is where any logic for respecting the budget must be implemented; without it, your method risks timing out in the true ingestion procedure.

This logic is not implemented in the baselines, and the notebook cell below that calls `train` enforces no time-out.
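One possible pattern for respecting the budget, assuming training can be broken into repeatable steps such as epochs, is to stop as soon as another step of the longest duration seen so far would overrun a deadline. The sketch below is not taken from the starter kit: `train_within_budget`, `run_one_epoch`, `reserve_for_test`, and the injectable `clock` are hypothetical names, and the fraction of the budget reserved for testing is an arbitrary choice.

```python
import time


def train_within_budget(run_one_epoch, remaining_time_budget,
                        reserve_for_test=0.1, max_epochs=100,
                        clock=time.monotonic):
    """Run epochs until the next one would likely exceed the deadline.

    Returns (epochs_completed, seconds_elapsed).
    """
    start = clock()
    # Keep a fraction of the budget back for generating predictions
    deadline = start + remaining_time_budget * (1.0 - reserve_for_test)
    longest_epoch = 0.0
    epochs_done = 0
    for _ in range(max_epochs):
        now = clock()
        # Pessimistic estimate: the next epoch takes as long as the slowest so far
        if now + longest_epoch > deadline:
            break
        run_one_epoch()
        longest_epoch = max(longest_epoch, clock() - now)
        epochs_done += 1
    return epochs_done, clock() - start


# Tiny demonstration with a dummy "epoch" that just sleeps
epochs, elapsed = train_within_budget(lambda: time.sleep(0.01),
                                      remaining_time_budget=0.2)
print(f"{epochs} epochs in {elapsed:.2f} s")
```

The first epoch always runs, because no duration estimate exists yet; a method whose single epoch can exceed the whole budget would need a finer-grained check, for example once per batch.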

As a reminder, the call to `train` is already implemented in the ingestion code, so you do not need to change anything in the starter kit code.

In [None]:
time_budget = 200
M.train(train_dataset, val_dataset=None, val_metadata=None, remaining_time_budget=time_budget)

(693, 157696) (693, 88)
[0]	validation_0-logloss:0.13629
[1]	validation_0-logloss:0.05228
[2]	validation_0-logloss:0.02623
[3]	validation_0-logloss:0.01772
[4]	validation_0-logloss:0.01516
[5]	validation_0-logloss:0.01415
[6]	validation_0-logloss:0.01418
[7]	validation_0-logloss:0.01448
[8]	validation_0-logloss:0.01476
[9]	validation_0-logloss:0.01497
[10]	validation_0-logloss:0.01517
2022-09-30 16:57:02,352 INFO model.py: 135.02 sec used for xgboost. Total time used for training: 135.02 sec. 


## (2.3) Testing and Scoring the Model

Testing is still a part of the ingestion process. The model's mandatory `test` method is called to generate and save the predictions on the test dataset.

In the ingestion process, the time remaining after `train` is passed to the model's mandatory `test` function as `remaining_time_budget`. The baselines do not use it, and the cell below simply reuses `time_budget`. During a local test or an actual submission, the time required for testing counts toward the overall time budget, so your method should leave enough time for generating predictions.

In [None]:
# get prediction by calling test method
prediction = M.test(test_dataset, remaining_time_budget=time_budget)
print(prediction.shape)
print(prediction[0])

2022-09-30 16:57:09,920 INFO model.py: Begin testing...
2022-09-30 16:57:10,297 INFO model.py: [+] Successfully made one prediction. 0.38 sec used. Total time used for testing: 0.38 sec. 
(174, 88)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In the scoring process (which follows the entire ingestion process), the saved predictions and true outputs are read, then used to calculate the score per task. The tasks' score types differ - for example, one may be based on AUROC while another is based on negative log likelihood loss - but they are all defined such that **a lower score indicates better performance**. 
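To illustrate the "lower is better" convention, a higher-is-better metric such as accuracy can be reported as `1 - accuracy`. The sketch below does this for exact-match accuracy over rows of labels. It is only an illustration: it is not the formula used by `decathlon_scorer`, and the `error_rate` helper and the toy data are assumptions.

```python
def error_rate(solution, prediction):
    """Return 1 - exact-match accuracy, so that 0.0 is a perfect score."""
    if len(solution) != len(prediction):
        raise ValueError("solution and prediction must have the same length")
    if len(solution) == 0:
        raise ValueError("cannot score an empty solution")
    # A row counts as correct only if every label in it matches
    correct = sum(
        1 for s, p in zip(solution, prediction) if list(s) == list(p)
    )
    return 1.0 - correct / len(solution)


toy_solution = [[0, 1], [1, 0], [0, 0], [1, 1]]
toy_prediction = [[0, 1], [1, 0], [0, 1], [1, 1]]  # third row is wrong
print(error_rate(toy_solution, toy_prediction))  # 0.25
print(error_rate(toy_solution, toy_solution))    # 0.0
```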

In [None]:
'''
Quick test of get_solution from score.py
''' 

from score import get_solution

solution = get_solution("dev_public", dataset)
print(solution.shape)
print(solution)

2022-09-30 16:57:18,719 INFO score.py: solution shape=(174, 88)
(174, 88)
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]


In [None]:
from score import decathlon_scorer

score = decathlon_scorer(solution, prediction, dataset)
print("Score: ", score)

Score:  0.26123303174972534


Once your method has a score on all 10 tasks, these task scores are compared with other submissions and baselines to determine the final AUP metric. We will not calculate any AUPs in this notebook, since AUP is by definition a relative metric that depends on other submissions. If you would like to know more, visit the CodaLab page, where details are provided.

# (3) Pipeline Local Test

The command below simulates a run of the selected model through the ingestion and scoring processes. Use this to check your own method's results, and make sure the arguments are properly specified.

Datasets can be specified in `tasks_to_run.yaml` in the model directory. 

For this local test, we recommend that you always include this file; otherwise, the pipeline will attempt to run all 10 tasks, which will clutter the output. We also recommend that you start with one or two of the smallest tasks and work your way up to the subset provided for the hackathon.

The printed output will contain both the ingestion logs and scoring logs for use in debugging. The `model.py` file you implement should contain a logger which you can use to output desired information to the ingestion log. 
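If you are writing `model.py` from scratch, a logger can be set up with the standard `logging` module. The sketch below is an assumption about how such a logger might look, not a copy of the starter kit's code: the `get_logger` helper, the logger name, and the format string (chosen to resemble the `INFO model.py: ...` lines shown in this notebook) are illustrative.

```python
import logging
import sys


def get_logger(name="model", verbosity="INFO"):
    """Return a logger that writes timestamped messages to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, verbosity))
    # Attach the handler only once, even if this function is called again
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(filename)s: %(message)s"
        ))
        logger.addHandler(handler)
    # Avoid duplicate lines if the root logger also has handlers
    logger.propagate = False
    return logger


logger = get_logger()
logger.info("Begin training...")
```

Messages below the chosen level, such as `logger.debug(...)` at the `INFO` level, are dropped, which keeps the ingestion log readable when several tasks are run back to back.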

Here the `time_budget` argument (in seconds) does matter: set it to a reasonable value for your experiments, or the ingestion will treat your method as timed out and give you bad scores for the affected tasks.

In [None]:
!python run_local_test.py --code_dir=./test_model --dataset_dir=./dev_public --time_budget=2000

2022-09-30 16:57:41 INFO run_local_test.py: ##################################################
2022-09-30 16:57:41 INFO run_local_test.py: Begin running local test using
2022-09-30 16:57:41 INFO run_local_test.py: code_dir = test_model
2022-09-30 16:57:41 INFO run_local_test.py: dataset_dir = dev_public
2022-09-30 16:57:41 INFO run_local_test.py: ##################################################
2022-09-30 16:57:43,775 INFO ingestion.py: Found user-specified task list: navierstokes spherical ninapro deepsea nottingham crypto ember
2022-09-30 16:57:43,775 INFO ingestion.py: Starting ingestion for navierstokes
2022-09-30 16:57:43,776 INFO ingestion.py: Starting ingestion for navierstokes, this has a time constraint of 2000.0 s.
2022-09-30 16:57:43,776 INFO ingestion.py: ************************************************
2022-09-30 16:57:43,776 INFO ingestion.py: ******** Processing dataset Navierstokes ********
2022-09-30 16:57:43,776 INFO ingestion.py: ***********************************