#### Tracking your datasets for the data science lab (examples)


#### Initializing your tracker (PyPads)
First, install pypads-padre, which pulls in pypads as a dependency:

    pip install pypads-padre

In [1]:
from pypads.app.base import PyPads
from dotenv import load_dotenv
tracker = PyPads(uri="http://mlflow.padre-lab.eu", autostart=False)

In [None]:
- configure the environment variables
- load the datasets and check that loading works
- run random tests on the datasets with splitting

In [1]:
#### 3D MNIST example
import h5py
import numpy as np

tracker.start_track(experiment_name="3D MNIST")
path = "data/3d-mnist/full_dataset_vectors.h5"

@tracker.decorators.dataset(name="3DMNIST", target_col=[-1])
def load_3d_mnist(path):
    """
    The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition.

    Accurate 3D point clouds can nowadays be acquired easily and cheaply from different sources:

     - RGB-D devices: Google Tango, Microsoft Kinect, etc.
     - Lidar.
     - 3D reconstruction from multiple images.

    However, there is a lack of large 3D datasets (you can find a good one here based on triangular meshes); it is especially hard to find datasets based on point clouds (which are the raw output of every 3D sensing device).

    This dataset contains 3D point clouds generated from the original images of the MNIST dataset, to give people used to working with 2D datasets (images) a familiar introduction to 3D.

    The full dataset is split into four arrays:

    X_train (10000, 4096)
    y_train (10000,)
    X_test (2000, 4096)
    y_test (2000,)

    """
    with h5py.File(path, "r") as hf:
        X_train, y_train = hf["X_train"][:], hf["y_train"][:]
        X_test, y_test = hf["X_test"][:], hf["y_test"][:]
    # Append the labels as the last column so that target_col=[-1] points at them
    train_data = np.concatenate([X_train, y_train.reshape(-1, 1)], axis=1)
    test_data = np.concatenate([X_test, y_test.reshape(-1, 1)], axis=1)
    # Stack train and test rows into a single (12000, 4097) array
    data = np.concatenate([train_data, test_data], axis=0)
    return data

data = load_3d_mnist(path)

(array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 array([5, 5, 0, ..., 1, 2, 2]),
 array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 array([7, 7, 2, ..., 8, 9, 9]))
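
The loader above returns one array with the label in the last column. The following is a minimal sketch of how that array could be separated back into features and labels and then split at random, as the notes at the top suggest. The helper name `split_features_target`, the synthetic stand-in data, and the split ratio are assumptions made for illustration. They are not part of PyPads or of the original notebook.

```python
import numpy as np


def split_features_target(data, target_col=-1):
    """Split a combined 2D array into (X, y) using the target column index."""
    y = data[:, target_col]
    X = np.delete(data, target_col, axis=1)
    return X, y


# Synthetic stand-in with the same layout as the loader's return value:
# feature columns first, label in the last column.
rng = np.random.default_rng(0)
X_demo = rng.random((12, 8))
y_demo = rng.integers(0, 10, size=12)
data_demo = np.concatenate([X_demo, y_demo.reshape(-1, 1)], axis=1)

X, y = split_features_target(data_demo)

# Random train/test split over the rows (hypothetical 5:1 ratio)
perm = rng.permutation(len(data_demo))
n_test = len(data_demo) // 6
test_idx, train_idx = perm[:n_test], perm[n_test:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

Concatenating float features with integer labels yields a float array, so the labels may need `y.astype(int)` before being passed to a classifier.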