# Description of the CMag_Calibration_ML Folder


## Code

The directory contains Python and Matlab code for the project. Some Python libraries used in the project are listed below:

    import sys
    import pandas
    import numpy as np
    import keras
    import tensorflow as tf
    import sklearn

    print("Python version:", sys.version)
    print("pandas version:", pandas.__version__)
    print("numpy version:", np.__version__)
    print("keras version:", keras.__version__)
    print("tensorflow version:", tf.__version__)
    print("sklearn version:", sklearn.__version__)

Output:

    Python version: 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 14:01:38)
    [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
    pandas version: 0.23.4
    numpy version: 1.15.4
    keras version: 2.2.4
    tensorflow version: 1.12.0
    sklearn version: 0.20.1


### baseline_model 
Contains a notebook that generates a training dataset from the sensor grid data for retraining the baseline model, and the Matlab files for running the baseline model.

- magnetic_model: Matlab folders for the baseline model

    - CalibratedSystem_CardioMag_orginal_files
    
    Original Matlab files provided by the MSRL, ETH.
    
    - CalibrateSystem_CardioMag_retrained_w_SensorGrid
    
    The Matlab file `recalibrated_X_test.m` uses the recalibrated linear baseline model to make predictions on the testing dataset `X_test.mat` and saves them as `recalibrated_y_pred.mat`.
    
    The Matlab file `metrolab.m` uses the recalibrated linear baseline model to make predictions on the metrolab dataset `metrolab_unified_units.csv` and saves them as `metrolab_y_pred.mat`.
    
- `Generating_training_set_for_baseline_linear_model.ipynb`: Python notebook that generates a training dataset for the baseline model from `train.npy` by selecting the samples whose current vectors lie in the range [-5, 5] A.
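
The current-range selection described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `select_low_current_samples`, the `current_cols` argument and the toy column layout are all assumptions, since the column layout of `train.npy` is not documented here.

```python
import numpy as np

def select_low_current_samples(data, current_cols, limit=5.0):
    """Keep rows whose current entries all lie within [-limit, limit] amperes."""
    currents = data[:, current_cols]
    mask = np.all(np.abs(currents) <= limit, axis=1)
    return data[mask]

# Toy array with a hypothetical layout: 3 position columns, then 2 current columns.
toy = np.array([
    [0.0, 0.0, 0.0,  1.0, -4.5],
    [0.1, 0.0, 0.0,  6.0,  0.0],   # dropped: one current exceeds 5 A
    [0.0, 0.1, 0.0, -5.0,  5.0],   # kept: the range is inclusive
])
subset = select_low_current_samples(toy, current_cols=slice(3, 5))
print(subset.shape)
```

With the real data, `current_cols` would have to point at whichever columns hold the coil currents.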


### functions (Python)
- `file_handler.py`: contains methods to load the raw sensor grid data (NumPy file) and the metrolab data (CSV file). Used in `01_Data_Visualization.ipynb` and `02_Dataset_Preparation.ipynb`.

- `functions_plot.py`: contains a method to plot the 3D coordinates of the data. Used in `01_Data_Visualization.ipynb` and `02_Dataset_Preparation.ipynb`.
    
- `functions.py`: contains methods to
    1. load training and testing data
    2. perform feature scaling based on the training dataset
    3. save an object
    4. load an object

    Used in `03_Train_RandomForest_GridSearch.py`, `04_Train_ANN_keras.ipynb`, `07_PerformanceEvaluation_General.ipynb`, `08_PerformanceEvaluation_SystemSpecific.ipynb`, `09_Error_Location_Distribution.ipynb` and `10_Small_Training_Set.ipynb`.
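
As an illustration of the scaling and save/load steps listed above, the sketch below fits standardization statistics on the training set only and reuses them for the test set, then round-trips the scaler through `pickle`. The helper names (`fit_scaler`, `apply_scaler`, `save_object`, `load_object`) and the choice of zero-mean, unit-variance scaling are assumptions; the actual `functions.py` may use a different scaler or file format.

```python
import os
import pickle
import tempfile

import numpy as np

def fit_scaler(X_train):
    """Compute per-feature mean and standard deviation from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return {"mean": mean, "std": std}

def apply_scaler(scaler, X):
    """Standardize X with statistics that were fitted on the training set."""
    return (X - scaler["mean"]) / scaler["std"]

def save_object(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_object(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Synthetic stand-ins for the real training and testing features.
rng = np.random.RandomState(0)
X_train = rng.normal(loc=3.0, scale=2.0, size=(100, 4))
X_test = rng.normal(loc=3.0, scale=2.0, size=(10, 4))

scaler = fit_scaler(X_train)
path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
save_object(scaler, path)
restored = load_object(path)

X_train_scaled = apply_scaler(restored, X_train)
X_test_scaled = apply_scaler(restored, X_test)
```

Fitting on the training set alone keeps information about the test set out of the preprocessing.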


### Metrolab_Testing
- `01_metrolab.ipynb`: preprocesses the metrolab data by removing samples with NaN magnetic field measurements (`metrolab_remove_nan.csv`) and converting all measurements in the metrolab dataset to SI units (`metrolab_unified_units.csv`).
- `02_Testing_w_Metrolab.ipynb`: tests the models trained on the sensor grid dataset (baseline, RF and ANN) on the metrolab dataset and compares the results using the R2 and RMSE metrics.
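
The two preprocessing steps of `01_metrolab.ipynb` might look roughly like the sketch below. The column names (`x_mm`, `Bx_mT`) and the source units (millimetres, millitesla) are hypothetical, chosen only to make the example concrete; the real CSV layout and units may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical raw measurements; the real metrolab columns and units may differ.
raw = pd.DataFrame({
    "x_mm": [0.0, 10.0, 20.0],
    "Bx_mT": [1.5, np.nan, -2.0],
})

# Step 1: drop samples whose field measurement is NaN.
clean = raw.dropna(subset=["Bx_mT"]).reset_index(drop=True)

# Step 2: convert to SI units (millimetres -> metres, millitesla -> tesla).
si = pd.DataFrame({
    "x_m": clean["x_mm"] * 1e-3,
    "Bx_T": clean["Bx_mT"] * 1e-3,
})
print(si)
```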


### 01_Data_Visualization.ipynb
Visualizes the sensor grid dataset, performs a statistical analysis of it and plots a feature correlation heatmap.

### 02_Dataset_Preparation.ipynb
Splits the data into training and testing sets with a 9:1 ratio and checks the location and current distributions in the two sets. All data used by the machine learning models are saved in SI units, i.e. locations are in metres, currents in amperes and fields in tesla.
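
A 9:1 split of this kind can be sketched with a random permutation of the row indices. The function name `split_train_test`, the seed and the toy data are assumptions; the notebook may shuffle or split differently.

```python
import numpy as np

def split_train_test(data, test_fraction=0.1, seed=0):
    """Shuffle the rows and hold out `test_fraction` of them for testing."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(data))
    n_test = int(round(test_fraction * len(data)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return data[train_idx], data[test_idx]

# Toy dataset with 100 samples and 2 columns.
data = np.arange(200, dtype=float).reshape(100, 2)
train, test = split_train_test(data)
print(train.shape, test.shape)
```

Comparing histograms of locations and currents in `train` and `test` is then one way to check that both sets cover the same ranges.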

### 03x_Train_Random_Forest.ipynb
Trains an RF regression model and examines the effect of the training set size and of some RF hyperparameters on prediction performance.

### 03_Train_RandomForest_GridSearch.py
Trains an RF regression model and uses grid search to select the hyperparameters. The model is saved as `Models/RF/GridSearch_RFmulti.pkl` and the predictions on the testing dataset are saved as `Models/RF/GridSearch_RF_predictions.npy`.
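
For readers unfamiliar with the approach, a grid search over RF hyperparameters with scikit-learn can be sketched as below. The synthetic data, the parameter grid, the number of folds and the scoring metric are all assumptions for illustration and are not the values used in the script.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem with two targets (multi-output regression).
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 4))
y = np.column_stack([X[:, 0] * X[:, 1], np.sin(3 * X[:, 2])])

# Illustrative grid only; the script's actual grid is not documented here.
param_grid = {"n_estimators": [10, 20], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_rf = search.best_estimator_   # refitted on all of X, y with the best parameters
predictions = best_rf.predict(X)
print(search.best_params_)
```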

### 04_Train_ANN_keras.ipynb
Trains an ANN model whose structure is defined in the "build model structure" cell. Outputs are saved in the `Models/ANN` folder. To use the trained model for prediction, see the cells below 'Testing'.

### 05_Saturated_MPEM.ipynb
Uses the saturation-corrected MPEM and outputs its predictions on the test set. Requires the MPEM package to be installed.

### 06_CNN.ipynb
Gets the predictions from deep-fluids for the regular (CNN) and divergence-free (CNN-DF) networks. The notebook outputs the predictions in a format that can be used in the performance evaluation and also makes plots specific to these methods.

### 07_PerformanceEvaluation_General.ipynb
Compares the linear baseline, RF and ANN models using the R2 and RMSE metrics.
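
For reference, the two metrics can be computed as in the sketch below. Here both are pooled over all samples and field components, which is an assumption: the notebook may instead average per-component scores, as `sklearn.metrics.r2_score` does by default for multi-output data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error pooled over all samples and components."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination, pooled over all components."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example: every prediction is off by 0.5.
y_true = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_pred = y_true + 0.5

print(rmse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))    # 0.90625
```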

### 08_PerformanceEvaluation_SystemSpecific.ipynb
Plots the R2 and RMSE metrics stratified by current.
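
One way to stratify a metric by current is to bin the samples and evaluate each bin separately, as sketched below for RMSE. Binning on the largest absolute coil current of each sample is an assumption made for this example, as are the function name `rmse_by_current_bin` and the bin edges; the notebook may stratify on a different quantity.

```python
import numpy as np

def rmse_by_current_bin(currents, y_true, y_pred, bin_edges):
    """Return {(low, high): RMSE} for samples whose max |current| falls in [low, high)."""
    max_abs = np.max(np.abs(currents), axis=1)
    bin_index = np.digitize(max_abs, bin_edges)
    results = {}
    for b in range(1, len(bin_edges)):
        mask = bin_index == b
        if mask.any():  # skip empty bins
            err = y_true[mask] - y_pred[mask]
            results[(bin_edges[b - 1], bin_edges[b])] = float(np.sqrt(np.mean(err ** 2)))
    return results

# Toy data: two low-current samples and two high-current samples.
currents = np.array([[1.0, -2.0], [3.0, 0.0], [-12.0, 4.0], [8.0, 15.0]])
y_true = np.zeros((4, 3))
y_pred = y_true + np.array([[0.1], [0.1], [0.3], [0.3]])

stratified = rmse_by_current_bin(currents, y_true, y_pred, [0.0, 5.0, 10.0, 20.0])
print(stratified)
```

The same loop works for R2 by swapping the metric inside the bin.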

### 09_Error_Location_Distribution.ipynb
Plots the spatial distribution of the prediction errors of the linear baseline, RF and ANN models.

### 10_Small_Training_Set.ipynb
Plots the testing results of the RF and ANN models when trained with small training datasets.


## Data

`cmag_data`: raw data of the sensor grid and metrolab datasets.

`train.npy` and `test.npy` are the training and testing data of the sensor grid dataset, split with a ratio of 9:1 by the `Code/02_Dataset_Preparation.ipynb` notebook.

## Figures
Figures produced by `Code/08_PerformanceEvaluation_SystemSpecific.ipynb`, `Code/09_Error_Location_Distribution.ipynb` and `Code/10_Small_Training_Set.ipynb`.

## Models

- ANN: includes the trained ANN model `model.hdf5`, the predictions made on the sensor grid testing dataset `predictions_ANN.npy`, the training log of the model `trainHistoryDict.pickle` and a training CSV file `training.csv`.

- RF: includes the predictions made on the sensor grid testing dataset `GridSearch_RF_predictions.npy`. The RF model itself is not included since it takes several GB of space.

- S-MPEM: The linear MPEM model is included as a baseline. The saturation parameters are all hardcoded in `05_Saturated_MPEM.ipynb`.

- Small_training_set: includes the ANN models from the small training dataset experiment and their predictions on the original testing dataset.
