**MSc Computational Physics AUTh**<br>
**Academic Year: 2024-2025**<br>
**Master's Thesis**<br>

**Thesis Title:**<br>  
# **"Reconstruction of the EoSs of Exotic Stars using ML and ANNs regression models"**

**Implemented by: Ioannis Stergakis**<br>
**AEM: 4439**<br>

**Jupyter Notebook: JN4a**<br>
**Name: "train_test_dtree_regress.ipynb"**<br>

**Description:**<br> 
**Training and testing the `DecisionTreeRegressor` algorithm:**<br>
**1. Performing grid search to determine the best hyperparameters**<br>
**2. Performing cross-validation so that the model generalizes to unseen data**<br>
**3. Assessing the accuracy of the best model using different scorers and metrics**
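
The three steps listed above can be sketched with plain scikit-learn calls. This is only an illustrative sketch: the notebook itself delegates the work to `regression_ML(...).train_test(...)` from `data_analysis_ES_ML`, whose internals are not shown here, so the synthetic data, the reduced grid, and all variable names below are assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 4 features, 2 strictly positive targets
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.5, size=(300, 4))
Y = np.column_stack([100.0 * X.sum(axis=1), 200.0 * X[:, 0] + 50.0])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Steps 1 and 2: grid search with 5-fold cross-validation, scored by MSLE
small_grid = {"max_depth": [5, 10], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    small_grid,
    scoring="neg_mean_squared_log_error",  # scikit-learn maximizes, hence the sign
    cv=KFold(n_splits=5),
    n_jobs=1,
)
search.fit(X_train, Y_train)
best_cv_msle = -search.best_score_

# Step 3: assess the refitted best model on held-out data, one value per target
Y_pred = search.best_estimator_.predict(X_test)
test_msle = mean_squared_log_error(Y_test, Y_pred, multioutput="raw_values")
test_mse = mean_squared_error(Y_test, Y_pred, multioutput="raw_values")

print(search.best_params_, best_cv_msle, test_msle, test_mse)
```

MSLE requires non-negative targets and predictions; a regression tree predicts averages of training targets, so positive targets yield positive predictions.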


**Abbreviations:**<br>
**1. NS -> Neutron Star**<br>
**2. QS -> Quark Star**<br>
**3. ML -> Machine Learning**

In [1]:
# Importing useful modules
import joblib
from data_analysis_ES_ML import *
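
`joblib` is commonly used to persist fitted scikit-learn objects to disk. How `train_test` uses its `filesave` argument is not shown in this notebook, so the following is only a minimal sketch of a dump/load round trip, with a hypothetical dictionary and file name standing in for the real grid-search object.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a fitted grid-search result (assumption)
grid_info = {
    "best_params": {"max_depth": 20, "min_samples_leaf": 5},
    "best_score": 0.0133,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_grid_info.pkl")  # hypothetical file name
    joblib.dump(grid_info, path)    # serialize the object to disk
    restored = joblib.load(path)    # read it back

print(restored == grid_info)
```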

In [None]:
# Defining the grid of hyperparameter values for the 'DecisionTree' regressor
dtree_grid = {
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5],
    'max_features': [None, 'sqrt', 'log2'],
    'criterion': ['squared_error', 'friedman_mse']
}
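
The size of this grid determines the cost of the search. As a quick check, the number of candidates is the product of the list lengths, and each candidate is fitted once per cross-validation fold. The sketch below assumes 5 folds, as used in the runs that follow; the variable names other than `dtree_grid` are illustrative.

```python
from itertools import product
from math import prod

dtree_grid = {
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5],
    'max_features': [None, 'sqrt', 'log2'],
    'criterion': ['squared_error', 'friedman_mse']
}

# Number of hyperparameter combinations: 4 * 3 * 3 * 3 * 2
n_candidates = prod(len(values) for values in dtree_grid.values())

# Each candidate is fitted once per fold
n_splits = 5
n_fits = n_candidates * n_splits

# Enumerating the combinations explicitly gives the same count
combinations = list(product(*dtree_grid.values()))

print(n_candidates, n_fits)  # 216 1080
```

These numbers agree with the "Fitting 5 folds for each of 216 candidates, totalling 1080 fits" line in the fitting logs.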

# 1. Neutron Stars

## **1.1 Using 8 M-R points**

### A. Predicting the Values of the Energy at the Center $E_c$

#### ->Using rowwise-shuffled data (linear)

In [4]:
# Showing the datasets
# regression_ML(filename="linNS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [5]:
# Training and testing a decision tree regressor: grid search with 5-fold cross-validation and the MSLE scorer
regression_ML("linNS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).train_test("dtree",5,dtree_grid,"msle",cores_par=18,filesave="linNS_dtree_grid_enrg_16X_rwsh")

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'max_depth': [None, 5, 10, 20],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 5],
 'max_features': [None, 'sqrt', 'log2'],
 'criterion': ['squared_error', 'friedman_mse']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 216 candidates, totalling 1080 fits


The fitting process has been completed
Elapsed fitting time: 0.0'18.76"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   DecisionTreeRegressor(criterion='friedman_mse', max_depth=20,
                      min_samples_leaf=5)
Best parameters:   {'criterion': 'friedman_mse', 'max_depth': 20, 'max_features': None, 'min_samples_leaf': 5, 'min_samples_split': 2}
Best cross-validation score (msle):   0.013328019432616847



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 256.85589842,  350.1849122 ,  446.29446662, ..., 1328.83392699,
        1452.21209613, 1572.51598943],
       [ 275.28291961,  377.76502824,  485.01755628, ..., 1359.69844189,
        1470.65019716, 1578.99532156],
       [ 274.54533321,  357.73467588,  443.72133807, ..., 1374.282678  ,
        1494.6445647 , 1610.92359564],
       ...,
       [ 309.90083648,  418.2458448 ,  566.24805208, ..., 1601.66962714,
        1733.22672701, 1860.75137517],
       [ 264.72569444,  355.95066401,  489.40730082, ..., 1571.64656026,
        1686.14416692, 1795.66940371],
       [ 277.88777678,  394.51480727,  493.96609271, ..., 1653.02467615,
        1822.135442  , 1985.84617967]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
0,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
1,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
2,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
3,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
4,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
...,...,...,...,...,...,...,...,...,...,...,...,...
24195,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24196,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24197,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24198,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.01140553 0.00898726 0.00741785 0.00607863 0.00549459 0.00307573
 0.00240078 0.0024799  0.00325652 0.00438866 0.00564157 0.00689119]
Uniform average
0.005626515886352498
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1354.51164995  2143.37531418  2701.24121236  2986.30437472
  3202.19394158  2857.20704431  2932.40772425  3995.39806356
  6997.03519908 12237.22436678 19887.77293603 30060.04958984]
Uniform average
7612.893451386112
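
For reference, and assuming the metrics follow scikit-learn's `mean_squared_log_error`, the MSLE of one target column is the mean of $(\ln(1+y) - \ln(1+\hat{y}))^2$ over the samples, and the "uniform average" is the plain mean over the columns. The sketch below implements this in pure Python on made-up two-column data (not taken from the notebook), then checks that averaging the twelve raw training MSLE values printed above reproduces the reported uniform average up to rounding.

```python
import math

def msle_per_column(y_true, y_pred):
    """MSLE for each target column; rows are samples, columns are targets."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    return [
        sum(
            (math.log1p(y_true[i][j]) - math.log1p(y_pred[i][j])) ** 2
            for i in range(n_rows)
        ) / n_rows
        for j in range(n_cols)
    ]

# Made-up example values (assumption)
y_true = [[276.6, 372.9], [302.1, 384.0]]
y_pred = [[256.9, 350.2], [309.9, 418.2]]
raw = msle_per_column(y_true, y_pred)
uniform_average = sum(raw) / len(raw)

# Consistency check on the training-set MSLE values printed above
reported_raw = [
    0.01140553, 0.00898726, 0.00741785, 0.00607863, 0.00549459, 0.00307573,
    0.00240078, 0.0024799, 0.00325652, 0.00438866, 0.00564157, 0.00689119,
]
reported_average = 0.005626515886352498
recomputed = sum(reported_raw) / len(reported_raw)

print(abs(recomputed - reported_average) < 1e-7)
```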



>Prediction metrics (using the actual test

array([[ 227.94816475,  304.08112846,  399.9912266 , ..., 1032.08131278,
        1132.08131278, 1232.08131278],
       [ 213.98199117,  277.98557267,  351.60769939, ..., 1025.71113017,
        1125.71113017, 1225.71113017],
       [ 221.82847236,  298.75727594,  396.611166  , ..., 1102.07706813,
        1202.09704212, 1302.09704212],
       ...,
       [ 270.63313713,  390.29686685,  485.20852959, ..., 1154.56127279,
        1254.56127279, 1354.56127279],
       [ 271.24162377,  416.73799916,  517.65693776, ..., 1309.43925457,
        1412.05362053, 1512.24306333],
       [ 300.28470119,  392.14470967,  509.49941399, ..., 1394.93183774,
        1509.17711705, 1621.65390449]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
24200,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24201,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24202,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24203,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24204,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
...,...,...,...,...,...,...,...,...,...,...,...,...
30295,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30296,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30297,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30298,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.01522516 0.01319349 0.01144128 0.01024228 0.01054358 0.00811935
 0.00892608 0.01060817 0.01316264 0.01657261 0.02011225 0.02351084]
Uniform average
0.01347147770625754
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 2053.44821839  3225.72783051  3877.05486335  4416.6675227
  5442.68920798  6602.030442    9770.69041845 15436.20923503
 24307.49517682 37357.40563162 54789.86169045 76841.76880676]
Uniform average
20343.420753671653
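
One simple way to read the two metric blocks above together is the ratio of test error to training error: a ratio well above 1 indicates that the tree fits the training set more closely than unseen data. A minimal sketch using the uniform averages printed above (the variable names are illustrative):

```python
# Uniform averages copied from the output above
train_msle = 0.005626515886352498
test_msle = 0.01347147770625754
train_mse = 7612.893451386112
test_mse = 20343.420753671653

# Test-to-train error ratios; 1.0 would mean no generalization gap
msle_ratio = test_msle / train_msle
mse_ratio = test_mse / train_mse

print(f"MSLE ratio: {msle_ratio:.2f}, MSE ratio: {mse_ratio:.2f}")
```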



>Saving the grid search info:
The grid sear
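
The row indices in the tables above hint at how the split is organized: training rows start at index 0, test rows start at index 24200, and consecutive rows share identical target values. This is consistent with the `samples_per_EOS=100` samples of each EoS being kept together on one side of the split. The sketch below reproduces that boundary under two assumptions that the notebook does not confirm: a total of 303 EoSs (inferred from the index range) and a test-set size rounded up with `ceil`. The function name is hypothetical.

```python
import math

def split_rows_by_eos(n_eos, samples_per_eos, test_ratio):
    """Return (train_rows, test_rows) with whole EoSs kept on one side of the split."""
    n_test_eos = math.ceil(test_ratio * n_eos)  # assumption: round up
    n_train_eos = n_eos - n_test_eos
    boundary = n_train_eos * samples_per_eos
    return range(0, boundary), range(boundary, n_eos * samples_per_eos)

# 303 EoSs is inferred from the index range, not stated in the notebook
train_rows, test_rows = split_rows_by_eos(n_eos=303, samples_per_eos=100, test_ratio=0.2)

print(train_rows[-1], test_rows[0], len(test_rows))
```

Keeping all samples of one EoS on the same side means the test set contains only EoSs the model has never seen during training.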

## **1.2 Using 16 M-R points**

### A. Predicting the Values of the Energy at the Center $E_c$

#### ->Using rowwise-shuffled data (linear)

In [7]:
# Showing the datasets
# regression_ML(filename="linNS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [8]:
# Training and testing a decision tree regressor: grid search with 5-fold cross-validation and the MSLE scorer
regression_ML("linNS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).train_test("dtree",5,dtree_grid,"msle",cores_par=18,filesave="linNS_dtree_grid_enrg_32X_rwsh")

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'max_depth': [None, 5, 10, 20],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 5],
 'max_features': [None, 'sqrt', 'log2'],
 'criterion': ['squared_error', 'friedman_mse']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 216 candidates, totalling 1080 fits


The fitting process has been completed
Elapsed fitting time: 0.0'29.41"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   DecisionTreeRegressor(criterion='friedman_mse', max_depth=20,
                      min_samples_leaf=5)
Best parameters:   {'criterion': 'friedman_mse', 'max_depth': 20, 'max_features': None, 'min_samples_leaf': 5, 'min_samples_split': 2}
Best cross-validation score (msle):   0.015870389934795164



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 289.52908574,  389.05831958,  496.80613436, ..., 1394.84255082,
        1509.33949902, 1621.73993702],
       [ 281.07275592,  368.55903178,  453.32154937, ..., 1416.03703728,
        1560.50187999, 1702.85569646],
       [ 254.69369779,  350.66560857,  448.30178428, ..., 1264.35846644,
        1374.87493081, 1484.09049109],
       ...,
       [ 254.9849055 ,  363.96227701,  471.71410595, ..., 1590.51049682,
        1708.55472707, 1820.39059147],
       [ 296.06700862,  442.02157192,  567.36966412, ..., 1389.52689739,
        1492.91258313, 1594.68882405],
       [ 283.68666549,  380.29123997,  512.77891299, ..., 1605.5174975 ,
        1739.29672809, 1867.65202148]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
0,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
1,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
2,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
3,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
4,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
...,...,...,...,...,...,...,...,...,...,...,...,...
24195,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24196,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24197,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24198,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.01145026 0.00922916 0.00845227 0.00738294 0.00689557 0.00388165
 0.00294657 0.00286719 0.00362442 0.00481777 0.00618541 0.00757686]
Uniform average
0.006275838447599478
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1360.83479177  2164.34674061  2970.18808072  3473.24056652
  3847.75921664  3496.45184391  3569.49935389  4672.20894757
  7967.09380941 13810.47954285 22429.65134229 33959.40189303]
Uniform average
8643.429677434273
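
Because MSE is expressed in squared units of the target, its square root (RMSE) gives a typical error in the same units as $E_c$, which is easier to compare with the values in the tables. A small sketch using the training-set MSE values printed above (the variable names are illustrative):

```python
import math

# Raw per-column MSE values copied from the output above
mse_raw = [
    1360.83479177, 2164.34674061, 2970.18808072, 3473.24056652,
    3847.75921664, 3496.45184391, 3569.49935389, 4672.20894757,
    7967.09380941, 13810.47954285, 22429.65134229, 33959.40189303,
]

# RMSE per target column, in the same units as the targets
rmse_raw = [math.sqrt(value) for value in mse_raw]

# RMSE corresponding to the uniform-average MSE
# (not the same as the mean of the per-column RMSE values)
rmse_of_average = math.sqrt(sum(mse_raw) / len(mse_raw))

print([round(value, 1) for value in rmse_raw], round(rmse_of_average, 1))
```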



>Prediction metrics (using the actual test

array([[ 220.25917612,  286.45718721,  355.97473522, ..., 1004.95936737,
        1104.95936737, 1204.95936737],
       [ 212.02037087,  275.33819313,  350.24300069, ..., 1020.36976935,
        1120.36976935, 1220.36976935],
       [ 226.73252311,  296.22822176,  367.77286272, ..., 1048.42682018,
        1148.42682018, 1248.42682018],
       ...,
       [ 276.30989389,  396.24552259,  484.87073784, ..., 1473.014241  ,
        1667.32663249, 1865.27967629],
       [ 257.03606073,  361.51288913,  464.99486326, ..., 1284.36447797,
        1387.03581612, 1487.42498981],
       [ 267.17549875,  375.84207917,  480.3524385 , ..., 1280.69926195,
        1390.6865225 , 1499.46668344]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
24200,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24201,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24202,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24203,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24204,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
...,...,...,...,...,...,...,...,...,...,...,...,...
30295,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30296,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30297,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30298,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.01432591 0.01353561 0.01306437 0.01258837 0.01356375 0.01189072
 0.01329085 0.0152526  0.0179151  0.02130668 0.02473227 0.02796591]
Uniform average
0.016619346005581303
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1911.99470953  3288.39384712  4255.25671125  5120.94345815
  6625.23960468  8921.30968017 13538.0018896  20818.26339421
 31314.22306353 45921.48830358 64769.8523293  88064.20953556]
Uniform average
24545.764710557763
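
Comparing this run with the 8-point run in Section 1.1, the test MSLE is higher with 16 M-R points (32 features) than with 8 M-R points (16 features), at least for this grid and this single split. The relative change can be computed from the two uniform averages reported in the outputs (the variable names are illustrative):

```python
# Uniform-average test MSLE values copied from the outputs
test_msle_8_points = 0.01347147770625754    # Section 1.1, 16 features
test_msle_16_points = 0.016619346005581303  # Section 1.2, 32 features

# Positive value: the 16-point model has the larger test error
relative_change = (test_msle_16_points - test_msle_8_points) / test_msle_8_points

print(f"Relative change in test MSLE: {relative_change:+.1%}")
```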



>Saving the grid search info:
The grid se

# 2. Quark Stars

## **2.1 Using 8 M-R points**

### A. Predicting the Values of the Energy at the Center $E_c$

#### ->Using rowwise-shuffled data

In [5]:
# Showing the datasets
# regression_ML(filename="QS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [13]:
# Training and testing a decision tree regressor: grid search with 5-fold cross-validation and the MSLE scorer
regression_ML("QS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).train_test("dtree",5,dtree_grid,"msle",cores_par=18,filesave="QS_dtree_grid_enrg_16X_rwsh")

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'max_depth': [None, 5, 10, 20],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 5],
 'max_features': [None, 'sqrt', 'log2'],
 'criterion': ['squared_error', 'friedman_mse']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 216 candidates, totalling 1080 fits


The fitting process has been completed
Elapsed fitting time: 1.0'10.91"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   DecisionTreeRegressor(criterion='friedman_mse', max_depth=20,
                      min_samples_leaf=5, min_samples_split=5)
Best parameters:   {'criterion': 'friedman_mse', 'max_depth': 20, 'max_features': None, 'min_samples_leaf': 5, 'min_samples_split': 5}
Best cross-validation score (msle):   0.006572604494729249



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 515.23169348,  771.89588142, 1059.10083198, ..., 3094.98825699,
        3387.73466627, 3680.76934413],
       [ 537.71202647,  793.59808291, 1079.69164035, ..., 3107.27851248,
        3398.92587294, 3690.88681087],
       [ 616.61395182,  838.73257105, 1091.4783667 , ..., 2945.29797518,
        3216.91343129, 3489.59729951],
       ...,
       [ 576.12925847,  834.81608496, 1123.64233689, ..., 3165.39965738,
        3458.68405537, 3752.22133284],
       [ 594.26077129,  801.52128276, 1040.67123483, ..., 2831.02951942,
        3095.72881382, 3361.8010048 ],
       [ 456.53020053,  696.91276728,  968.71874199, ..., 2928.38783898,
        3212.54611773, 3497.33672557]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
0,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
1,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
2,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
3,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
4,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
...,...,...,...,...,...,...,...,...,...,...,...,...
71195,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933
71196,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933
71197,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933
71198,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00892692 0.00244339 0.00154209 0.00156018 0.00168961 0.00179638
 0.0018643  0.00189944 0.00191045 0.00190453 0.00188703 0.00186177]
Uniform average
0.0024405080091033627
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1803.90627056  1225.6051918   1391.31351278  2177.16818498
  3435.4553572   5069.12847891  7010.45199148  9209.93195335
 11630.09617033 14241.76094648 17021.66853731 19950.92626184]
Uniform average
7847.284404751724



>Prediction metrics (using the actual tes

array([[ 562.54545697,  752.58587568,  974.36035632, ..., 2671.65197099,
        2925.48298991, 3181.08077279],
       [ 499.69195937,  677.38673142,  887.04468406, ..., 2521.96913493,
        2768.6609533 , 3017.39475484],
       [ 599.15602165,  781.51010998,  994.68045266, ..., 2639.16860834,
        2886.44016874, 3135.67056296],
       ...,
       [ 413.16498746,  574.03516668,  766.7253085 , ..., 2310.94225546,
        2547.05941985, 2785.60462933],
       [ 407.97027194,  581.79554664,  788.62040034, ..., 2415.31681962,
        2661.40671066, 2909.59883329],
       [ 409.93692065,  602.05647254,  826.19553752, ..., 2538.18883681,
        2793.85099268, 3051.22636824]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
71200,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71201,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71202,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71203,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71204,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
...,...,...,...,...,...,...,...,...,...,...,...,...
89095,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89096,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89097,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89098,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.0142241  0.00437586 0.00430134 0.00513855 0.00576667 0.00613707
 0.00631991 0.00637857 0.00635735 0.00628553 0.00618226 0.0060601 ]
Uniform average
0.006460609199078299
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 3428.77225274  2565.47109603  3887.1455185   6954.19123428
 11347.51982178 16792.51710286 23097.98011878 30124.49339096
 37766.69444971 45942.65203941 54587.1578438  63647.3026551 ]
Uniform average
25011.824793663356
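
The best cross-validation score and the held-out test score can be compared directly, assuming both are uniform-average MSLE values. When the two agree closely, the cross-validation estimate was a reasonable predictor of performance on unseen EoSs. A minimal sketch using the two numbers reported above for this run (the variable names are illustrative):

```python
# Values copied from the output above (Quark Stars, 8 M-R points)
best_cv_msle = 0.006572604494729249
test_msle = 0.006460609199078299

# Relative difference of the test score with respect to the CV estimate
relative_difference = (test_msle - best_cv_msle) / best_cv_msle

print(f"Test vs. CV MSLE: {relative_difference:+.2%}")
```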



>Saving the grid search info:
The grid se

## **2.2 Using 16 M-R points**

### A. Predicting the Values of the Energy at the Center $E_c$

#### ->Using rowwise-shuffled data

In [15]:
# Showing the datasets
# regression_ML(filename="QS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [16]:
# Training and testing a decision tree regressor: grid search with 5-fold cross-validation and the MSLE scorer
regression_ML("QS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).train_test("dtree",5,dtree_grid,"msle",cores_par=18,filesave="QS_dtree_grid_enrg_32X_rwsh")

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'max_depth': [None, 5, 10, 20],
 'min_samples_split': [2, 5, 10],
 'min_samples_leaf': [1, 2, 5],
 'max_features': [None, 'sqrt', 'log2'],
 'criterion': ['squared_error', 'friedman_mse']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 216 candidates, totalling 1080 fits


The fitting process has been completed
Elapsed fitting time: 1.0'54.10"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   DecisionTreeRegressor(criterion='friedman_mse', max_depth=20,
                      max_features='log2', min_samples_leaf=5)
Best parameters:   {'criterion': 'friedman_mse', 'max_depth': 20, 'max_features': 'log2', 'min_samples_leaf': 5, 'min_samples_split': 2}
Best cross-validation score (msle):   0.007367574304733888
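
The fit count reported in the fitting overview follows directly from the size of the hyperparameter grid. As a quick sanity check, the sketch below recomputes it with the standard library only. The grid is copied from the estimator info printed above and `n_splits = 5` from the cross-validation settings; the variable names are illustrative and not taken from `data_analysis_ES_ML`.

```python
from itertools import product

# Hyperparameter grid, copied from the ESTIMATOR INFO block of the log
dtree_grid = {
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 5],
    'max_features': [None, 'sqrt', 'log2'],
    'criterion': ['squared_error', 'friedman_mse'],
}

# Number of KFold splits, as stated in the CROSS-VALIDATION SETTINGS block
n_splits = 5

# Each candidate is one combination with a single value per hyperparameter
candidates = list(product(*dtree_grid.values()))
n_candidates = len(candidates)

# Every candidate is fitted once per fold
n_fits = n_candidates * n_splits

print(n_candidates, n_fits)  # 216 1080
```

The product 4 x 3 x 3 x 3 x 2 gives the 216 candidates, and five folds each give the 1080 fits shown in the log.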



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 521.33333333,  791.33333333, 1091.33333333, ..., 3191.33333333,
        3491.33333333, 3791.33333333],
       [ 585.28098197,  837.37559977, 1119.89214187, ..., 3130.29040298,
        3420.04638575, 3710.19800155],
       [ 482.10519881,  723.81202633,  996.80195346, ..., 2962.00565183,
        3246.77302357, 3532.14691908],
       ...,
       [ 561.38096185,  815.54251894, 1099.89927165, ..., 3118.35952033,
        3408.95767752, 3699.90986598],
       [ 616.72624715,  822.39216724, 1058.81571595, ..., 2826.90682948,
        3088.59607787, 3351.72247367],
       [ 578.80545025,  822.46271806, 1096.71072466, ..., 3064.30339953,
        3349.13394471, 3634.54628713]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
0,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
1,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
2,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
3,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
4,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.00000,3168.000000,3468.000000,3768.00000
...,...,...,...,...,...,...,...,...,...,...,...,...
71195,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933
71196,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933
71197,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933
71198,656.904809,847.712595,1069.496358,1298.722089,1533.599451,1772.958204,2015.983628,2262.081497,2510.80261,2761.797602,3014.788388,3269.54933


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.0102943  0.00297331 0.00215449 0.0022935  0.00249855 0.00264372
 0.00272583 0.00276019 0.00276133 0.00274016 0.00270431 0.00265909]
Uniform average
0.0032673989152961834
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 2098.93274317  1469.96722139  1863.79600066  3091.9988733
  4953.88589407  7318.21095486 10093.65740558 13213.65779744
 16627.90615117 20297.2818194  24190.64607949 28282.73047841]
Uniform average
11125.222618245054
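
The "Raw values" and "Uniform average" entries above are related in a simple way: there is one error per output column, and the uniform average is their unweighted mean. The following is a minimal, dependency-free sketch of that relationship for the MSLE, assuming the usual definition based on `log(1 + y)`; it is not the code used in `data_analysis_ES_ML`, and the toy arrays are made up for illustration.

```python
import math

def msle_raw_values(y_true, y_pred):
    """Mean squared log error per output column (inputs are lists of rows)."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    raw = []
    for j in range(n_cols):
        total = 0.0
        for i in range(n_rows):
            diff = math.log1p(y_true[i][j]) - math.log1p(y_pred[i][j])
            total += diff * diff
        raw.append(total / n_rows)
    return raw

def uniform_average(raw):
    """Unweighted mean of the per-column errors."""
    return sum(raw) / len(raw)

# Toy example with two output columns (values are hypothetical)
y_true = [[500.0, 770.0], [650.0, 850.0]]
y_pred = [[520.0, 790.0], [600.0, 840.0]]
raw = msle_raw_values(y_true, y_pred)
print(raw, uniform_average(raw))

# Consistency check on the train-set MSLE values printed in the log above
reported_raw = [0.0102943, 0.00297331, 0.00215449, 0.0022935,
                0.00249855, 0.00264372, 0.00272583, 0.00276019,
                0.00276133, 0.00274016, 0.00270431, 0.00265909]
reported_avg = 0.0032673989152961834
consistent = abs(uniform_average(reported_raw) - reported_avg) < 1e-7
print(consistent)
```

With scikit-learn, the same two quantities should be obtainable from `mean_squared_log_error` by passing `multioutput='raw_values'` or `multioutput='uniform_average'`.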



>Prediction metrics (using the actual tes

array([[ 496.83809931,  704.02446438,  943.48327395, ..., 2737.61503612,
        3002.8542633 , 3269.45676479],
       [ 398.1115471 ,  650.46003824,  934.70006749, ..., 2961.96486466,
        3254.03343626, 3546.44724861],
       [ 412.30259876,  638.43230704,  897.07884619, ..., 2793.83760708,
        3071.03373799, 3349.15537084],
       ...,
       [ 532.89664241,  736.10408618,  971.52767985, ..., 2743.47993358,
        3006.07730825, 3270.12622753],
       [ 514.74772494,  704.29432202,  926.10509879, ..., 2627.18796477,
        2881.65323009, 3137.88326934],
       [ 456.22745105,  665.35320695,  907.53108674, ..., 2721.044446  ,
        2988.79542036, 3257.84524414]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
71200,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71201,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71202,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71203,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
71204,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.934630,2786.462071,3036.011170
...,...,...,...,...,...,...,...,...,...,...,...,...
89095,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89096,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89097,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89098,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.230470,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.01438535 0.0050889  0.00528396 0.00621397 0.00685926 0.00721321
 0.00736466 0.00738579 0.00732522 0.00721441 0.00707357 0.00691571]
Uniform average
0.007360334684093872
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 3374.42490467  2727.00686649  4507.18846921  8192.44767797
 13322.4086764  19596.52175021 26805.97538869 34798.73278402
 43460.0119206  52700.62727908 62449.64136453 72649.5257396 ]
Uniform average
28715.376068456604
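
One way to read the two metric blocks together is to compare the uniform averages on the train and test sets, along with the best cross-validation score. The sketch below does only that arithmetic on the numbers printed in this log. The variable names are illustrative, and how large a gap should count as overfitting is a judgment the sketch does not make.

```python
# Uniform averages and best CV score, copied from the log output above
msle_train = 0.0032673989152961834
msle_test = 0.007360334684093872
msle_cv = 0.007367574304733888
mse_train = 11125.222618245054
mse_test = 28715.376068456604

# Test-to-train error ratios: values well above 1 indicate that the model
# fits the training data more closely than unseen data
msle_ratio = msle_test / msle_train
mse_ratio = mse_test / mse_train

# Relative difference between the cross-validation estimate and the test score
cv_rel_diff = abs(msle_cv - msle_test) / msle_test

print(f"MSLE test/train ratio: {msle_ratio:.2f}")
print(f"MSE test/train ratio:  {mse_ratio:.2f}")
print(f"CV vs test MSLE relative difference: {cv_rel_diff:.2%}")
```

For this run the test error is roughly 2.3 times (MSLE) and 2.6 times (MSE) the train error, while the cross-validation MSLE agrees with the test MSLE to within about 0.1%.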



>Saving the grid search info:
The grid se