# Cleaning Up ML Model Directories

As you run more experimental campaigns, trained models accumulate. If you are working locally, you may start to run out of disk space. The model utilities (`nucml.model.utilities`) can delete every saved model from the hard drive except the best ones in terms of training, validation, and testing performance.

In [1]:
# Prototype
import sys
sys.path.append("..")

In [10]:
import pandas as pd

import nucml.model.utilities as model_utils

## Reading in ML Results

For the utilities to work, the DataFrame must contain the following columns:

- train_mae
- test_mae
- val_mae
- model_path
- scaler_path

If any one of these columns is missing, the utilities will not work.
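Checking for the required columns before calling the utilities can save a failed run. The following is a minimal sketch of such a check; `REQUIRED_COLUMNS` and `missing_required_columns` are hypothetical names introduced here for illustration and are not part of `nucml`, and the demo frame is made up.

```python
import pandas as pd

# Columns listed above as required by the utilities.
REQUIRED_COLUMNS = ["train_mae", "test_mae", "val_mae", "model_path", "scaler_path"]


def missing_required_columns(results: pd.DataFrame) -> list:
    """Return the required column names that are absent from `results`."""
    return [col for col in REQUIRED_COLUMNS if col not in results.columns]


# Made-up frame for illustration only; it deliberately lacks `scaler_path`.
demo = pd.DataFrame({
    "train_mae": [0.03],
    "test_mae": [0.12],
    "val_mae": [0.12],
    "model_path": ["model_a.joblib"],
})
print(missing_required_columns(demo))
```

An empty list means the DataFrame has every column named above.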

In [5]:
knn_results = pd.read_csv("1_KNN/knn_results_B0.csv")

In [7]:
knn_results.head()

Unnamed: 0,id,distance_metric,mt_strategy,normalizer,train_mae,train_mse,train_evs,train_mae_m,train_r2,val_mae,...,val_r2,test_mae,test_mse,test_evs,test_mae_m,test_r2,model_path,training_time,scaler_path,run_name
0,13,euclidean,one_hot,standard,0.02588,0.010936,0.985814,0.0,0.985814,0.118292,...,0.915686,0.118816,0.063698,0.914577,0.042778,0.914573,E:\ML_Models_EXFOR\KNN_B0\k13_distance_euclide...,3671.971,E:\ML_Models_EXFOR\KNN_B0\k13_distance_euclide...,k13_distance_euclidean_standard_one_hot_B0_v1
1,9,euclidean,one_hot,standard,0.025931,0.010999,0.985734,0.0,0.985734,0.118136,...,0.917756,0.118899,0.064408,0.913698,0.041733,0.913698,E:\ML_Models_EXFOR\KNN_B0\k9_distance_euclidea...,4992.854609,E:\ML_Models_EXFOR\KNN_B0\k9_distance_euclidea...,k9_distance_euclidean_standard_one_hot_B0_v1
2,1,euclidean,one_hot,standard,0.02906,0.021444,0.972555,0.0,0.972555,0.140604,...,0.885059,0.140549,0.091576,0.882881,0.047418,0.88288,E:\ML_Models_EXFOR\KNN_B0\k1_distance_euclidea...,6258.326797,E:\ML_Models_EXFOR\KNN_B0\k1_distance_euclidea...,k1_distance_euclidean_standard_one_hot_B0_v1
3,5,euclidean,one_hot,standard,0.026062,0.011286,0.985346,0.0,0.985346,0.119661,...,0.9119,0.119958,0.067081,0.911042,0.041299,0.91104,E:\ML_Models_EXFOR\KNN_B0\k5_distance_euclidea...,6381.527638,E:\ML_Models_EXFOR\KNN_B0\k5_distance_euclidea...,k5_distance_euclidean_standard_one_hot_B0_v1
4,17,euclidean,one_hot,standard,0.025844,0.010874,0.985899,0.0,0.985899,0.119776,...,0.912522,0.11974,0.064122,0.913774,0.043261,0.913773,E:\ML_Models_EXFOR\KNN_B0\k17_distance_euclide...,6478.077649,E:\ML_Models_EXFOR\KNN_B0\k17_distance_euclide...,k17_distance_euclidean_standard_one_hot_B0_v1


## Filtering the Best Models

The `get_best_models_df()` utility selects the best models in terms of performance on the training, validation, and testing sets. The `tag` column in its output records the set on which each model was selected.

In [11]:
best_models = model_utils.get_best_models_df(knn_results)

In [12]:
best_models

Unnamed: 0,id,distance_metric,mt_strategy,normalizer,train_mae,train_mse,train_evs,train_mae_m,train_r2,val_mae,...,test_mae,test_mse,test_evs,test_mae_m,test_r2,model_path,training_time,scaler_path,run_name,tag
75,20,euclidean,one_hot,quantilenormal,0.025789,0.010846,0.985945,0.0,0.985945,0.12077,...,0.120936,0.064027,0.913956,0.04392,0.913956,E:\ML_Models_EXFOR\KNN_B0\k20_distance_euclide...,4424.581484,E:\ML_Models_EXFOR\KNN_B0\k20_distance_euclide...,k20_distance_euclidean_quantilenormal_one_hot_...,Train
78,20,manhattan,one_hot,quantilenormal,0.025789,0.010846,0.985945,0.0,0.985945,0.119448,...,0.119632,0.062456,0.91616,0.043589,0.91616,E:\ML_Models_EXFOR\KNN_B0\k20_distance_manhatt...,3465.984211,E:\ML_Models_EXFOR\KNN_B0\k20_distance_manhatt...,k20_distance_manhattan_quantilenormal_one_hot_...,Train
248,13,manhattan,one_hot,minmax,0.02585,0.010894,0.98587,0.0,0.98587,0.115603,...,0.115871,0.059485,0.921356,0.041835,0.921356,E:\ML_Models_EXFOR\KNN_B0\k13_distance_manhatt...,49135.074445,E:\ML_Models_EXFOR\KNN_B0\k13_distance_manhatt...,k13_distance_manhattan_minmax_one_hot_B0_v2,Val
276,8,manhattan,one_hot,minmax,0.025926,0.011033,0.985687,0.0,0.985687,0.116317,...,0.115661,0.058934,0.921851,0.04082,0.921849,E:\ML_Models_EXFOR\KNN_B0\k8_distance_manhatta...,45342.842937,E:\ML_Models_EXFOR\KNN_B0\k8_distance_manhatta...,k8_distance_manhattan_minmax_one_hot_B0_v2,Test
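To make the selection concrete, the sketch below shows one way such a filter could work, assuming "best" means the lowest MAE on each split and that ties are kept (the output above has two rows tagged `Train` with the same `train_mae`). This is an illustration only, not the actual `nucml` implementation; `select_best_by_mae` and the demo data are hypothetical.

```python
import pandas as pd


def select_best_by_mae(results: pd.DataFrame) -> pd.DataFrame:
    """Keep the rows with the lowest MAE on each split, tagged by split (ties kept)."""
    picks = []
    for column, tag in [("train_mae", "Train"), ("val_mae", "Val"), ("test_mae", "Test")]:
        # Select every row that matches the minimum so tied models are all kept.
        best = results[results[column] == results[column].min()].copy()
        best["tag"] = tag
        picks.append(best)
    return pd.concat(picks)


# Made-up results for illustration only.
demo = pd.DataFrame({
    "run_name": ["a", "b", "c", "d"],
    "train_mae": [0.10, 0.10, 0.20, 0.30],
    "val_mae": [0.50, 0.40, 0.30, 0.60],
    "test_mae": [0.50, 0.40, 0.45, 0.35],
})
best_demo = select_best_by_mae(demo)
print(best_demo[["run_name", "tag"]])
```

In this sketch, a model that is best on more than one split would appear once per split, each time with a different tag.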


## Deleting Unnecessary Models

The models selected above are protected from deletion. Every other model listed in the results DataFrame will be deleted.

**NOTE: This action is irreversible.**

In [17]:
print("The following function will delete {} model/scaler pairs".format(len(knn_results) - len(best_models)))

The following function will delete 276 model/scaler pairs
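Because the deletion cannot be undone, it can help to list the affected files first. The sketch below is a dry run under the assumption that the files to remove are those named in `model_path` and `scaler_path`; how `cleanup_model_dir` itself decides what to delete is not shown here. `paths_to_delete` and the demo paths are hypothetical.

```python
import pandas as pd


def paths_to_delete(results: pd.DataFrame, keep: pd.DataFrame) -> list:
    """List model and scaler paths in `results` that are not protected by `keep`."""
    protected = set(keep["model_path"]) | set(keep["scaler_path"])
    candidates = pd.concat([results["model_path"], results["scaler_path"]])
    # Set difference also protects a scaler shared by a kept and a discarded model.
    return sorted(set(candidates) - protected)


# Made-up paths for illustration only; nothing is deleted by this sketch.
demo_results = pd.DataFrame({
    "model_path": ["m1.joblib", "m2.joblib", "m3.joblib"],
    "scaler_path": ["s1.pkl", "s2.pkl", "s3.pkl"],
})
demo_keep = demo_results.iloc[[0]]
for path in paths_to_delete(demo_results, demo_keep):
    print("would delete:", path)
```

Reviewing this list before running the real cleanup is a cheap safeguard against removing a model you meant to keep.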


In [12]:
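# Left commented out to prevent accidental deletion; uncomment to run.
# The directory is assumed to be the one holding the KNN models listed in knn_results.
# model_utils.cleanup_model_dir(knn_results, "E:/ML_Models_EXFOR/KNN_B0/")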