<img align="left" height="100" width="300" src="./src/readme/logo_white.svg">

# How to generate predictions with Modulos AutoML

This Jupyter notebook describes how to generate predictions with the Modulos AutoML solution. It introduces the online client and the batch client.

**⚠️ Please make sure to install the necessary requirements first. For more information, please have a look at the README/Installation section.**

# Online Client

In [1]:
import online_client as oc

The online client performs predictions on a single sample. It can also be used for multiple samples or a stream of data by predicting one sample at a time. Instead of providing the data in a `.tar` file, as the batch client requires, you provide each sample as a Python dictionary.

**Recommendation:** We recommend using the online client when you have a stream of input data.
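The sample-wise handling of a stream can be sketched as a small generator. This is a minimal illustration, not part of the solution code: `predict_stream` and `dummy_predict` are hypothetical names, and the stand-in predictor only exists so that the sketch runs without the solution installed.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Sample = Dict[str, Any]


def predict_stream(
    samples: Iterable[Sample],
    predict_fn: Callable[[Sample], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield one prediction per incoming sample, in arrival order."""
    for sample in samples:
        yield predict_fn(sample)


# Hypothetical stand-in predictor (assumption: the real predictor
# returns a dictionary for each sample dictionary).
def dummy_predict(sample: Sample) -> Dict[str, Any]:
    return {"sample_id": sample["sample_ids_generated"], "prediction": 0.0}


stream = [{"sample_ids_generated": "210"}, {"sample_ids_generated": "211"}]
results = list(predict_stream(stream, dummy_predict))
```

With the solution installed, `predict_fn` could wrap the online client call, for example `lambda s: oc.main(sample_dict=s, tmp_dir=oc.TMP_DIR)`, assuming `oc.main` returns the prediction dictionary.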

### Example data point (replace with your data)

This is what the input dictionary should look like for one example data point. You can replace this dictionary with your own data.

In [2]:
sample_dict = {
    'gi': 1.2780000000000022,
    'gk': 4.847000000000001,
    'gr': 0.7530000000000001,
    'gw1': 6.8450000000000015,
    'gw2': 7.9849999999999985,
    'gz': 1.7840000000000025,
    'ij': 1.6529999999999987,
    'ik': 3.5689999999999986,
    'iw1': 5.5669999999999975,
    'iw2': 6.706999999999997,
    'iz': 0.5060000000000002,
    'jw1': 3.914,
    'jw2': 5.053999999999999,
    'kw1': 1.998,
    'kw2': 3.137999999999998,
    'ri': 0.5250000000000021,
    'rw1': 6.0920000000000005,
    'rw2': 7.231999999999997,
    'rz': 1.0310000000000024,
    'sample_ids_generated': '210',
    'ug': 1.4840000000000049,
    'ui': 2.7620000000000084,
    'uj': 4.415000000000006,
    'uk': 6.331000000000008,
    'ur': 2.237000000000005,
    'uw1': 8.329000000000006,
    'uw2': 9.469000000000005,
    'uz': 3.268000000000008,
    'w1w2': 1.1399999999999988,
    'zj': 1.1469999999999985,
    'zk': 3.062999999999999,
    'zw1': 5.060999999999998,
    'zw2': 6.200999999999998
}
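Before passing your own dictionary to the client, it can help to check that it has the expected keys and value types. The helper below is a hypothetical sketch, not part of the solution: `find_sample_problems` is an invented name, and it assumes that every feature except the ID field is numeric, which holds for the example above but may not hold for your dataset.

```python
import math
from typing import Any, Dict, Iterable, List


def find_sample_problems(
    sample: Dict[str, Any],
    expected_keys: Iterable[str],
    id_key: str = "sample_ids_generated",
) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    expected = set(expected_keys)
    missing = sorted(expected - set(sample))
    unexpected = sorted(set(sample) - expected)
    if missing:
        problems.append(f"missing keys: {missing}")
    if unexpected:
        problems.append(f"unexpected keys: {unexpected}")
    for key in sorted(sample):
        if key == id_key:
            continue  # Assumption: the ID field is a string, not a feature.
        value = sample[key]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number:
            problems.append(f"non-numeric value for {key!r}")
        elif not math.isfinite(value):
            problems.append(f"non-finite value for {key!r}")
    return problems


expected = ["gi", "gk", "sample_ids_generated"]
ok = find_sample_problems(
    {"gi": 1.278, "gk": 4.847, "sample_ids_generated": "210"}, expected
)
bad = find_sample_problems({"gi": "n/a", "sample_ids_generated": "210"}, expected)
```

Here `ok` is an empty list, while `bad` reports the missing `gk` key and the non-numeric `gi` value. For the full example you would pass all feature names of your dataset as `expected_keys`.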

### Run online prediction script

The online client takes a Python dictionary containing a single sample as input and outputs the predictions as a dictionary. It does so by performing the following steps:
* Creating a temporary directory to save and store intermediate calculations.
* Saving the input Python dictionary as an HDF5 version of the dataset (this step will likely be removed in a future version).
* Running the feature extractor on the data.
* Running the model to get predictions.
* Transforming the predictions into a dictionary.
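
The flow of these steps can be sketched as follows. This is only an illustration under stated assumptions: the internals of the online client are not shown in this notebook, `run_pipeline`, `extract_features`, and `predict` are hypothetical placeholders, and JSON is used as a stand-in for the HDF5 file so that the sketch needs only the standard library.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict


def run_pipeline(
    sample: Dict[str, Any],
    extract_features: Callable[[Dict[str, Any]], Any],
    predict: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Run the five steps on one sample using placeholder components."""
    # Step 1: temporary directory, removed automatically on exit.
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Step 2: persist the input (JSON stand-in for the HDF5 file).
        sample_path = os.path.join(tmp_dir, "sample.json")
        with open(sample_path, "w") as f:
            json.dump(sample, f)
        with open(sample_path) as f:
            stored = json.load(f)
        # Step 3: feature extraction (placeholder callable).
        features = extract_features(stored)
        # Step 4: model prediction (placeholder callable).
        raw_prediction = predict(features)
        # Step 5: wrap the result in a dictionary.
        return {"predictions": raw_prediction}


result = run_pipeline(
    {"gi": 1.0, "gk": 2.0},
    extract_features=lambda s: [s["gi"], s["gk"]],
    predict=lambda feats: sum(feats),
)
```

The placeholder callables make the order of the steps visible without claiming anything about the actual feature extractor or model.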

In [3]:
oc.main(sample_dict=sample_dict, tmp_dir=oc.TMP_DIR)

# Batch Client

In [4]:
import batch_client as bc 
import os
import shutil

The batch client lets you get predictions for a large number of samples without changing how you provide the data to the solution. You package your data with the same structure that you used for the data upload, and you are ready to go.

**Recommendation:** We recommend using the batch client if you would like to quickly generate predictions for a full dataset or if you would like to ensure that the solution works correctly. 

### Set path variables

* **path_to_tar:** Path to the tar file. The tar file must contain the same kind of data and must be packed in the same way as the dataset that was uploaded to the platform. The example dataset provided here was split off from the validation set of the original uploaded dataset. Please replace it with your own dataset to generate new predictions.
* **path_to_tmp:** Path to temporary data folder.

In [5]:
path_to_tar = "src/sample_dataset/generated_dataset.tar"
path_to_tmp = os.path.join(bc.DEFAULT_OUTPUT_DIR, "tmp_data_dir")
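
To confirm that your own archive is packed like the uploaded dataset, you can list its members with Python's standard `tarfile` module and compare them with the original upload. The helper `list_tar_members` is a hypothetical name, and the throwaway archive with a single `data.csv` file is only there to make the sketch runnable; it does not reflect the required dataset layout.

```python
import os
import tarfile
import tempfile
from typing import List


def list_tar_members(path_to_tar: str) -> List[str]:
    """Return the sorted member names of a tar archive."""
    # is_tarfile raises an OSError subclass if the file does not exist.
    if not tarfile.is_tarfile(path_to_tar):
        raise ValueError(f"{path_to_tar} is not a readable tar archive")
    with tarfile.open(path_to_tar) as tar:
        return sorted(tar.getnames())


# Build a throwaway archive so the sketch runs on its own.
with tempfile.TemporaryDirectory() as workdir:
    csv_path = os.path.join(workdir, "data.csv")
    with open(csv_path, "w") as f:
        f.write("gi,gk\n1.278,4.847\n")
    demo_tar = os.path.join(workdir, "demo.tar")
    with tarfile.open(demo_tar, "w") as tar:
        tar.add(csv_path, arcname="data.csv")
    members = list_tar_members(demo_tar)
```

In this notebook you would call `list_tar_members(path_to_tar)` instead and compare the result with the member list of the dataset you uploaded.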

### Run the batch client 

Remove temporary files from a previous run:

In [6]:
if os.path.exists(path_to_tmp):
    shutil.rmtree(path_to_tmp)

The batch client takes a `.tar` file as input and outputs predictions in the same format as the training labels. It does so by performing the following steps:
* Creating a temporary directory to save and store intermediate calculations.
* Converting the `.tar` dataset into an internal format (an HDF5 file), which is saved in `path_to_hdf5_data`.
* Running the feature extractor on the data.
* Running the model to get predictions.
* Saving the predictions in the same format that the labels had when training on the platform.

In [7]:
bc.main(dataset_path=path_to_tar, output_dir_user="", verbose=True,
        keep_tmp=True)

### Look at the predictions

In [8]:
from IPython.display import HTML
from modulos_utils.solution_utils import jupyter_utils as ju

In [9]:
displayer = ju.JupyterDisplayer.construct(base_dir=bc.FILE_DIR)
HTML(displayer.show())

### Clean Up

Uncomment the lines below to delete the entire `output_batch_client` folder, including all predictions.

In [10]:
# import shutil
# shutil.rmtree(bc.DEFAULT_OUTPUT_DIR)

© Modulos AG 2019-2022. All rights reserved.