# Validation example

First we import the data, model and validation modules.

The `laborecommender.data` module provides the tools for data preparation, the `laborecommender.model` module provides the model itself, and the `laborecommender.validation` module provides the tools for validation: scoring functions, cross-validation and grid search.

In [1]:
import sklearn.model_selection
import laborecommender.data
import laborecommender.model
import laborecommender.validation

We prepare the dataset from the sample data extracted from MIMIC-III. The dataset must be a list of tuples (or lists), where each inner tuple is a bag of laboratory tests requested at the same time. We then split the bags into training and test subsets and build a supervised dataset (inputs and targets) from the test bags.

In [2]:
bags = laborecommender.data.get_bags_from_mimic()
train_bags, test_bags = sklearn.model_selection.train_test_split(bags, random_state=11)
test_bags_x, test_bags_y = laborecommender.data.make_supervised_dataset(test_bags)
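
To make the expected input format concrete, here is a minimal sketch of a list of bags and a random split, using only the standard library. The test names and the helper `split_bags` are hypothetical and serve only as illustration; they are not part of `laborecommender`, and the real split above is done by scikit-learn.

```python
import random

# Toy data: each tuple is one "bag" of tests requested at the same time.
# The test names are illustrative and are not taken from MIMIC-III.
toy_bags = [
    ("glucose", "sodium", "potassium"),
    ("hemoglobin", "hematocrit", "platelets"),
    ("glucose", "creatinine"),
    ("sodium", "potassium", "creatinine", "urea"),
]


def split_bags(bags, test_fraction=0.25, seed=11):
    """Shuffle a copy of the bags and split it into (train, test) lists."""
    shuffled = list(bags)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


toy_train, toy_test = split_bags(toy_bags)
print(len(toy_train), len(toy_test))
```

Whole bags are assigned to one side of the split or the other, so tests requested together always stay together.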

To optimize the hyperparameters, we run a grid search over a parameter grid.

In [3]:
param_grid = {
    "metric": ["jaccard", "minkowski"],
    "k": [15, 20, 40, 50, 80, 100],
}
grid_search_results = laborecommender.validation.grid_search(
    param_grid,
    laborecommender.model.LaboRecommender(),
    train_bags,
    laborecommender.validation.mean_average_precision
)
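
As a rough illustration of what a grid search does, the sketch below expands a parameter grid into every combination of values and picks the combination with the highest score. The helpers `expand_grid` and `toy_score` are hypothetical; in particular, `toy_score` is an arbitrary stand-in for a real validation score, and none of this is claimed to be how `laborecommender.validation.grid_search` is implemented.

```python
import itertools


def expand_grid(param_grid):
    """Yield one dict per combination of the values in param_grid."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    for values in itertools.product(*value_lists):
        yield dict(zip(names, values))


def toy_score(params):
    """Arbitrary stand-in for a validation score (higher is better)."""
    penalty = 0.5 if params["metric"] == "minkowski" else 0.0
    return -params["k"] - penalty


example_grid = {
    "metric": ["jaccard", "minkowski"],
    "k": [15, 20, 40, 50, 80, 100],
}

candidates = list(expand_grid(example_grid))
print(len(candidates))  # 2 metrics x 6 values of k = 12 combinations

# Keep the candidate with the highest (toy) score.
best_params = max(candidates, key=toy_score)
print(best_params)
```

The number of candidates is the product of the list lengths in the grid, so each extra value multiplies the number of models that have to be trained and scored.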

These are the best hyperparameters and the best result extracted from the grid search.

In [4]:
grid_search_results["best_params"]

{'k': 20, 'metric': 'jaccard'}

In [5]:
grid_search_results["best_mean_result"]

0.941950738599854

We train a model with the best hyperparameters and measure its performance on the test subset.

In [6]:
lr = laborecommender.model.LaboRecommender(**grid_search_results["best_params"])
lr.fit(train_bags)
predicted = lr.predict(test_bags_x)
laborecommender.validation.mean_average_precision(test_bags_y, predicted)

0.9315320181309321
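
For readers unfamiliar with the metric, the sketch below computes mean average precision under one common definition: for each query, average the precision at every rank where a relevant item appears, then average those values over all queries. The function names are hypothetical, each ranked list is assumed to contain no duplicates, and the definition used by `laborecommender.validation.mean_average_precision` may differ in its details.

```python
def toy_average_precision(relevant, ranked):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def toy_mean_average_precision(relevant_lists, ranked_lists):
    """Mean of the per-query average precision values."""
    scores = [
        toy_average_precision(relevant, ranked)
        for relevant, ranked in zip(relevant_lists, ranked_lists)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Two toy queries: the true items and the ranked recommendations for each.
true_items = [["a", "c"], ["x"]]
recommended = [["a", "b", "c"], ["y", "x"]]
toy_map = toy_mean_average_precision(true_items, recommended)
print(round(toy_map, 4))
```

In the first query the relevant items appear at ranks 1 and 3, giving (1/1 + 2/3) / 2; in the second the single relevant item appears at rank 2, giving 1/2. A score of 1.0 would mean every relevant item is ranked ahead of every irrelevant one.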