**MSc Computational Physics AUTh**<br>
**Academic Year: 2024-2025**<br>
**Master's Thesis**<br>

**Thesis Title:**<br>  
# **"Reconstruction of the EoSs of Exotic Stars using ML and ANNs regression models"**

**Implemented by: Ioannis Stergakis**<br>
**AEM: 4439**<br>

**Jupyter Notebook: JN4b**<br>
**Name: "train_test_rf_regress.ipynb"**<br>

**Description:**<br> 
**Training and testing the `RandomForestRegressor` algorithm:**<br>
**1. Performing grid search to determine the best hyperparameters**<br>
**2. Performing cross-validation to optimize the model for future unseen data**<br>
**3. Assessing the accuracy of the best model using different scorers and metrics**


**Abbreviations:**<br>
**1. NS -> Neutron Star**<br>
**2. QS -> Quark Star**<br>
**3. ML -> Machine Learning**
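
The three steps in the description (grid search, cross-validation, scoring) can be sketched with plain scikit-learn. This is a minimal sketch under stated assumptions, not the `regression_ML` implementation: the synthetic data, the tiny grid `small_grid`, and the variable names are illustrative only. The 5-fold `KFold`, the MSLE scorer, and `random_state=45` mirror the settings reported in the outputs of this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, mean_squared_log_error
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data (assumption): 200 samples, 4 features, 3 positive targets
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.5, size=(200, 4))
Y = np.column_stack([X.sum(axis=1), X[:, 0] * X[:, 1] + 1.0, X[:, 2] ** 2 + 1.0])

# Deliberately tiny grid (assumption) so the sketch runs in seconds
small_grid = {"n_estimators": [10, 20], "min_samples_leaf": [5, 10]}

# MSLE is an error, so greater_is_better=False makes the search minimize it
msle_scorer = make_scorer(mean_squared_log_error, greater_is_better=False)

search = GridSearchCV(
    RandomForestRegressor(random_state=45),
    param_grid=small_grid,
    scoring=msle_scorer,
    cv=KFold(n_splits=5),
    n_jobs=1,
)
search.fit(X, Y)

best_params = search.best_params_
best_cv_msle = -search.best_score_  # undo the sign flip applied by the scorer
print(best_params, best_cv_msle)
```

Because the scorer negates the error, `best_score_` is negative and has to be flipped back before it can be compared with an MSLE computed directly on a held-out set.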

In [1]:
# Importing useful modules
import joblib
from data_analysis_ES_ML import *

In [None]:
# Defining the grid of hyperparameter values for the 'RandomForest' regressor
rf_grid = {
    'n_estimators': [25, 50],
    'max_depth': [None, 10, 20],
    'min_samples_split': [20, 40],
    'min_samples_leaf': [10, 12, 14],
    'max_features': [None, 'sqrt', 'log2'],
}
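
The size of the search follows directly from this grid. As a small check, the sketch below counts the candidates with scikit-learn's `ParameterGrid` and multiplies by the number of folds; the value of 5 folds is the one passed to `train_test` in the cells of this notebook, and the names `n_candidates`, `n_folds`, and `n_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

rf_grid = {
    'n_estimators': [25, 50],
    'max_depth': [None, 10, 20],
    'min_samples_split': [20, 40],
    'min_samples_leaf': [10, 12, 14],
    'max_features': [None, 'sqrt', 'log2'],
}

# One candidate per combination of values: 2 * 3 * 2 * 3 * 3
n_candidates = len(ParameterGrid(rf_grid))
n_folds = 5
n_fits = n_candidates * n_folds
print(n_candidates, n_fits)  # → 108 540
```

These numbers agree with the line "Fitting 5 folds for each of 108 candidates, totalling 540 fits" printed by each run.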

# 1. Neutron Stars

## **1.1 Using 8 M-R points**

### A. Predicting Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data (linear)

In [4]:
# Showing the datasets
# regression_ML(filename="linNS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()
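
How `regression_ML` uses `test_ratio` and `samples_per_EOS` to split the data is not shown in this notebook. The printed row indices (train rows shown up to 24198, test rows starting at 24200) are consistent with a contiguous cut at a multiple of 100 rows. One way such a split could be done, purely as an assumption, is to keep whole EoS blocks together so that samples of one EoS never land in both sets. The helper `split_by_eos` below is hypothetical and is not part of `data_analysis_ES_ML`.

```python
import numpy as np

def split_by_eos(n_rows, samples_per_eos, test_ratio):
    """Return (train_idx, test_idx) with whole EoS blocks kept together.

    Assumes rows are stored in contiguous blocks of `samples_per_eos`
    rows per EoS; this layout is an assumption, not read from the module.
    """
    n_eos = n_rows // samples_per_eos
    n_train_eos = round(n_eos * (1.0 - test_ratio))
    cut = n_train_eos * samples_per_eos
    idx = np.arange(n_eos * samples_per_eos)
    return idx[:cut], idx[cut:]

train_idx, test_idx = split_by_eos(n_rows=1000, samples_per_eos=100, test_ratio=0.2)
print(len(train_idx), len(test_idx))  # → 800 200
```

Splitting by block rather than by row keeps the test set made of EoSs the model has never seen, which is the stricter test of generalization.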

In [5]:
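# Building a regression model: random forest ("rf"), 5-fold cross-validation,
# grid search over rf_grid, MSLE as the cross-validation scorer, 18 CPU cores
regression_ML("linNS_reg_data_pp8mr8s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test("rf", 5, rf_grid, "msle", cores_par=18, filesave="linNS_rf_grid_enrg_16X_rwsh")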

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'n_estimators': [25, 50],
 'max_depth': [None, 10, 20],
 'min_samples_split': [20, 40],
 'min_samples_leaf': [10, 12, 14],
 'max_features': [None, 'sqrt', 'log2']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 108 candidates, totalling 540 fits


The fitting process has been completed
Elapsed fitting time: 3.0'21.53"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   RandomForestRegressor(max_features=None, min_samples_leaf=10,
                      min_samples_split=20, n_estimators=50, random_state=45)
Best parameters:   {'max_depth': None, 'max_features': None, 'min_samples_leaf': 10, 'min_samples_split': 20, 'n_estimators': 50}
Best cross-validation score (msle):   0.007373547434258438



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 284.5823743 ,  386.21986679,  483.10073626, ..., 1362.71962896,
        1492.15845835, 1619.96983333],
       [ 284.08515337,  392.18142585,  492.40342252, ..., 1386.6190157 ,
        1513.11143716, 1637.19827786],
       [ 282.84872798,  378.10560036,  472.20333036, ..., 1292.36544852,
        1410.10417232, 1526.41797169],
       ...,
       [ 282.42872075,  405.24283715,  556.25044913, ..., 1544.68251266,
        1663.98535332, 1779.7540141 ],
       [ 281.99839254,  396.2131556 ,  524.72680785, ..., 1596.78904736,
        1734.53822725, 1867.71240675],
       [ 274.03862429,  394.79123788,  506.42952887, ..., 1559.35870059,
        1703.08386098, 1842.61233325]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
0,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
1,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
2,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
3,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
4,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
...,...,...,...,...,...,...,...,...,...,...,...,...
24195,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24196,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24197,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24198,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00907371 0.00640202 0.00529042 0.00440508 0.00421369 0.00260272
 0.00230011 0.00270025 0.0037101  0.00499019 0.00631178 0.00757262]
Uniform average
0.0049643907512966195
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1078.14101299  1549.10366949  1915.05238428  2155.16434296
  2421.53880243  2391.78237129  2782.33350532  4302.07826198
  7954.60128061 14114.32154055 23015.51122522 34836.87303827]
Uniform average
8209.708452950186



>Prediction metrics (using the actual tes

array([[ 214.62682361,  280.91924156,  365.90434467, ..., 1045.60002689,
        1149.33528468, 1252.80123335],
       [ 219.40477782,  286.46985238,  363.24182724, ..., 1022.92174421,
        1126.80074387, 1230.47900459],
       [ 217.04306781,  285.04316168,  370.01613431, ..., 1040.52445446,
        1143.8686187 , 1246.96351954],
       ...,
       [ 278.39856199,  388.51754936,  489.55058562, ..., 1268.46419354,
        1376.79162708, 1484.05309037],
       [ 284.74926421,  397.08516419,  499.24010124, ..., 1356.25439254,
        1472.29789457, 1586.34613899],
       [ 284.18945825,  400.7565591 ,  507.77775304, ..., 1364.93152288,
        1486.01290436, 1605.53427127]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
24200,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24201,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24202,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24203,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24204,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
...,...,...,...,...,...,...,...,...,...,...,...,...
30295,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30296,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30297,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30298,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00899962 0.00798051 0.00734336 0.00623562 0.00636395 0.00409256
 0.00466695 0.00564949 0.00720416 0.0095249  0.01199487 0.01438595]
Uniform average
0.007870161540872939
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1267.29376891  1966.29703607  2267.25419356  2429.10252343
  3038.79247896  3236.35188932  4948.39370836  8047.43219191
 12856.39328213 20240.25601146 30360.43194883 43414.31968167]
Uniform average
11172.693226218134



>Saving the grid search info:
The grid se
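
The MSLE and MSE blocks in the output above each list "Raw values" (one error per $E_c$ column) and a "Uniform average" (their plain mean). The sketch below reproduces that layout with scikit-learn's `multioutput` option on toy data, and then checks the averaging against the raw train MSLE values printed above. The toy arrays are assumptions for illustration; only `printed_raw` is copied from the notebook output.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

# Toy data (assumption): 4 samples, 3 target columns, all positive as MSLE requires
y_true = np.array([[270.0, 370.0, 480.0],
                   [300.0, 380.0, 510.0],
                   [230.0, 300.0, 360.0],
                   [285.0, 440.0, 530.0]])
y_pred = y_true * np.array([1.02, 0.97, 1.05])  # fixed relative error per column

msle_raw = mean_squared_log_error(y_true, y_pred, multioutput="raw_values")
msle_avg = mean_squared_log_error(y_true, y_pred, multioutput="uniform_average")
mse_raw = mean_squared_error(y_true, y_pred, multioutput="raw_values")
mse_avg = mean_squared_error(y_true, y_pred, multioutput="uniform_average")

# MSLE per column by hand: mean over samples of (log(1 + y) - log(1 + y_hat))**2
msle_manual = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2, axis=0)

print("MSLE raw:", msle_raw, "average:", msle_avg)
print("MSE raw:", mse_raw, "average:", mse_avg)

# Raw train MSLE values as printed above; their mean is the "Uniform average"
printed_raw = np.array([0.00907371, 0.00640202, 0.00529042, 0.00440508,
                        0.00421369, 0.00260272, 0.00230011, 0.00270025,
                        0.0037101, 0.00499019, 0.00631178, 0.00757262])
printed_avg = printed_raw.mean()
print("Mean of printed raw MSLE:", printed_avg)
```

MSLE compares logarithms, so it behaves like a relative error and stays comparable across columns, whereas MSE scales with the magnitude of the target. That is why the printed MSE grows steadily toward the high-$E_c$ columns while the MSLE stays within the same order of magnitude.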

## **1.2 Using 16 M-R points**

### A. Predicting Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data (linear)

In [7]:
# Showing the datasets
# regression_ML(filename="linNS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [8]:
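# Building a regression model: random forest ("rf"), 5-fold cross-validation,
# grid search over rf_grid, MSLE as the cross-validation scorer, 18 CPU cores
regression_ML("linNS_reg_data_pp8mr16s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test("rf", 5, rf_grid, "msle", cores_par=18, filesave="linNS_rf_grid_enrg_32X_rwsh")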

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'n_estimators': [25, 50],
 'max_depth': [None, 10, 20],
 'min_samples_split': [20, 40],
 'min_samples_leaf': [10, 12, 14],
 'max_features': [None, 'sqrt', 'log2']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 108 candidates, totalling 540 fits


The fitting process has been completed
Elapsed fitting time: 6.0'1.46"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   RandomForestRegressor(max_features=None, min_samples_leaf=10,
                      min_samples_split=20, n_estimators=50, random_state=45)
Best parameters:   {'max_depth': None, 'max_features': None, 'min_samples_leaf': 10, 'min_samples_split': 20, 'n_estimators': 50}
Best cross-validation score (msle):   0.007824334585199207



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 287.97981418,  400.53301906,  504.97368876, ..., 1348.67422056,
        1462.30521493, 1573.87010221],
       [ 283.61660825,  385.42419788,  481.70505378, ..., 1298.95683589,
        1417.99274096, 1535.65545164],
       [ 283.59128079,  388.96622626,  487.80676304, ..., 1347.29868174,
        1468.82679245, 1588.64809663],
       ...,
       [ 284.36051176,  402.26408004,  521.68986353, ..., 1524.81472887,
        1656.93081279, 1785.48285752],
       [ 286.5300373 ,  420.97540818,  567.53166516, ..., 1553.1278996 ,
        1674.0504311 , 1791.54767989],
       [ 282.70024589,  401.58422026,  526.53023174, ..., 1552.39170895,
        1685.63382137, 1814.95797041]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
0,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
1,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
2,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
3,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
4,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
...,...,...,...,...,...,...,...,...,...,...,...,...
24195,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24196,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24197,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24198,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00864185 0.00615652 0.00556155 0.00495011 0.00491284 0.00302049
 0.00245218 0.00266039 0.00360911 0.00489472 0.00626066 0.00758665]
Uniform average
0.005058922773853361
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1016.67476534  1465.9093757   1908.55915805  2254.22773992
  2627.41737526  2580.70122379  2753.40994412  3986.97636498
  7469.94059159 13600.13770004 22637.30979456 34776.13737239]
Uniform average
8089.783450478685



>Prediction metrics (using the actual test

array([[ 220.33540509,  295.16918159,  388.64888467, ..., 1101.56584272,
        1209.16362985, 1316.41883538],
       [ 215.15060805,  283.69891441,  372.5878731 , ..., 1042.57561002,
        1146.35682406, 1250.00655343],
       [ 218.4579426 ,  289.41526545,  378.50980211, ..., 1069.87659946,
        1175.30196101, 1280.35261164],
       ...,
       [ 290.3200287 ,  396.8630598 ,  495.4920815 , ..., 1290.16671396,
        1401.15170447, 1510.76695822],
       [ 279.33701149,  383.35069783,  483.68334048, ..., 1337.90715504,
        1460.49831863, 1581.56317244],
       [ 278.06639836,  384.60024915,  487.21638198, ..., 1340.07538488,
        1459.75416702, 1577.71745713]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
24200,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24201,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24202,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24203,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24204,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
...,...,...,...,...,...,...,...,...,...,...,...,...
30295,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30296,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30297,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30298,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00811455 0.00734833 0.00741727 0.00676001 0.00724308 0.00502615
 0.00583252 0.00693587 0.00860954 0.01102831 0.01356657 0.01600635]
Uniform average
0.008657379779233613
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1133.97917166  1827.38983034  2133.51800331  2382.07241284
  3189.31615755  3511.81399138  5516.83765552  9009.59815741
 14351.25699371 22377.92633325 33216.28094491 47056.12296734]
Uniform average
12142.176051601571



>Saving the grid search info:
The grid se
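
Each run ends by saving the grid search info, and `joblib` is imported at the top of the notebook. What exactly `train_test` writes, and where, is not visible here. As a hedged illustration only, a fitted `GridSearchCV` object can be stored and restored with `joblib.dump` and `joblib.load` as follows; the file name `example_rf_grid.pkl`, the temporary directory, and the toy data are assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Toy regression problem (assumption), only to have something fitted to save
rng = np.random.default_rng(1)
X = rng.uniform(size=(60, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 1.0

search = GridSearchCV(RandomForestRegressor(random_state=45),
                      {"n_estimators": [5, 10]}, cv=3)
search.fit(X, y)

# Hypothetical file name and location; the notebook's real save path is not shown
path = os.path.join(tempfile.mkdtemp(), "example_rf_grid.pkl")
joblib.dump(search, path)

# Reload and confirm the restored object behaves like the original
restored = joblib.load(path)
same_params = restored.best_params_ == search.best_params_
same_preds = np.allclose(restored.predict(X), search.predict(X))
print(same_params, same_preds)
```

A restored object keeps `best_estimator_`, `best_params_`, and `cv_results_`, so the search does not have to be repeated to inspect or reuse the model.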

# 2. Quark Stars

## **2.1 Using 8 M-R points**

### A. Predicting Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data

In [10]:
# Showing the datasets
# regression_ML(filename="QS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [11]:
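# Building a regression model: random forest ("rf"), 5-fold cross-validation,
# grid search over rf_grid, MSLE as the cross-validation scorer, 18 CPU cores
regression_ML("QS_reg_data_pp8mr8s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test("rf", 5, rf_grid, "msle", cores_par=18, filesave="QS_rf_grid_enrg_16X_rwsh")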

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'n_estimators': [25, 50],
 'max_depth': [None, 10, 20],
 'min_samples_split': [20, 40],
 'min_samples_leaf': [10, 12, 14],
 'max_features': [None, 'sqrt', 'log2']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 108 candidates, totalling 540 fits


The fitting process has been completed
Elapsed fitting time: 14.0'12.55"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   RandomForestRegressor(max_features=None, min_samples_leaf=10,
                      min_samples_split=20, n_estimators=50, random_state=45)
Best parameters:   {'max_depth': None, 'max_features': None, 'min_samples_leaf': 10, 'min_samples_split': 20, 'n_estimators': 50}
Best cross-validation score (msle):   0.0037241165556840517



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 533.9603857 ,  795.03443815, 1086.39251772, ..., 3142.50327217,
        3437.50365367, 3732.69885461],
       [ 530.87111078,  785.2356862 , 1070.03475067, ..., 3092.38761258,
        3383.51698677, 3674.98911805],
       [ 583.07346429,  810.00802495, 1067.83257951, ..., 2950.19542286,
        3225.20808595, 3501.17237645],
       ...,
       [ 497.02737266,  698.30020958,  931.51198609, ..., 2690.39271454,
        2951.42039397, 3213.95352043],
       [ 499.75244182,  693.5184231 ,  919.24078052, ..., 2638.56804488,
        2895.00738511, 3153.12193111],
       [ 416.23321858,  623.01422002,  862.3826084 , ..., 2657.7949    ,
        2923.26680088, 3190.10193991]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
0,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
1,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
2,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
3,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
4,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
...,...,...,...,...,...,...,...,...,...,...,...,...
71295,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71296,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71297,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71298,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00544532 0.00134516 0.00127028 0.00159782 0.00186019 0.00202694
 0.00212096 0.00216527 0.00217662 0.00216635 0.00214205 0.00210879]
Uniform average
0.0022021467702552284
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1205.26581033   777.91657858  1130.25722615  2118.95132663
  3593.00995066  5452.9711275   7629.31831766 10071.21536045
 12740.15401991 15606.13317637 18645.23762999 21838.03700493]
Uniform average
8400.70562743132



>Prediction metrics (using the actual test

array([[ 770.59486266, 1040.51297963, 1340.43109247, ..., 3439.99547335,
        3739.94469739, 4039.89576772],
       [ 832.00976311, 1102.00976311, 1402.00976311, ..., 3502.00976311,
        3802.00976311, 4102.00976311],
       [ 948.18224272, 1218.18224272, 1518.18224272, ..., 3618.18224272,
        3918.18224272, 4218.18224272],
       ...,
       [ 404.2291493 ,  599.50697149,  827.68096602, ..., 2565.96859327,
        2824.9218773 , 3085.49140886],
       [ 418.65066018,  606.07527036,  826.56805521, ..., 2525.65002351,
        2780.13517281, 3036.41098267],
       [ 417.66001145,  619.07907442,  853.32801182, ..., 2623.12448475,
        2885.69620979, 3149.74592911]])

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
71300,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71301,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71302,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71303,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71304,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
...,...,...,...,...,...,...,...,...,...,...,...,...
89095,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89096,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89097,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89098,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00716509 0.00187384 0.00219759 0.00289634 0.00338953 0.00368505
 0.00384271 0.00391003 0.00391912 0.00389091 0.00383906 0.00377249]
Uniform average
0.0036984816808192087
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1753.57749917  1167.55182626  1965.56720914  3869.44066988
  6613.31063323 10023.40534433 13978.81286509 18391.37037244
 23194.43275915 28336.15820639 33775.27109484 39478.26864978]
Uniform average
15212.263927476044



>Saving the grid search info:
The grid s
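
For each of the three runs above the notebook prints one uniform-average MSLE on the training set (the "overfitting metrics") and one on the test set, next to the best cross-validation score. A quick way to read them side by side is the test/train ratio. The sketch below only copies the printed values; the ratio itself is an illustrative summary chosen here, not a quantity the notebook computes.

```python
# Uniform-average MSLE values copied from the three runs printed above
runs = {
    "NS,  8 M-R points": {"cv": 0.007373547434258438,
                          "train": 0.0049643907512966195,
                          "test": 0.007870161540872939},
    "NS, 16 M-R points": {"cv": 0.007824334585199207,
                          "train": 0.005058922773853361,
                          "test": 0.008657379779233613},
    "QS,  8 M-R points": {"cv": 0.0037241165556840517,
                          "train": 0.0022021467702552284,
                          "test": 0.0036984816808192087},
}

ratios = {}
for name, m in runs.items():
    # A ratio above 1 means the model fits the training rows better than unseen rows
    ratios[name] = m["test"] / m["train"]
    print(f"{name}: CV={m['cv']:.5f}  train={m['train']:.5f}  "
          f"test={m['test']:.5f}  test/train={ratios[name]:.2f}")
```

In all three runs the test error is larger than the train error and lies close to the cross-validation score, which suggests that the cross-validation estimate is a reasonable guide to performance on the held-out EoSs.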

## **2.2 Using 16 M-R points**

### A. Predicting Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data

In [13]:
# Showing the datasets
# regression_ML(filename="QS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [14]:
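# Building a regression model: random forest ("rf"), 5-fold cross-validation,
# grid search over rf_grid, MSLE as the cross-validation scorer, 18 CPU cores
regression_ML("QS_reg_data_pp8mr16s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test("rf", 5, rf_grid, "msle", cores_par=18, filesave="QS_rf_grid_enrg_32X_rwsh")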

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'n_estimators': [25, 50],
 'max_depth': [None, 10, 20],
 'min_samples_split': [20, 40],
 'min_samples_leaf': [10, 12, 14],
 'max_features': [None, 'sqrt', 'log2']}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 108 candidates, totalling 540 fits


The fitting process has been completed
Elapsed fitting time: 25.0'11.60"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   RandomForestRegressor(max_features=None, min_samples_leaf=10,
                      min_samples_split=20, n_estimators=50, random_state=45)
Best parameters:   {'max_depth': None, 'max_features': None, 'min_samples_leaf': 10, 'min_samples_split': 20, 'n_estimators': 50}
Best cross-validation score (msle):   0.0036099754639829116



>Overfitting metrics (using the train dataset as test dataset)
>> PREDICTIONS AND REAL VALUES:
-------------------------------------------------------------------------------------------------------------------
Predictions of "enrg"


array([[ 550.64061479,  792.17773772, 1064.43249203, ..., 3022.06409114,
        3305.75951596, 3590.08019728],
       [ 557.85944666,  810.40674146, 1093.2938985 , ..., 3104.81592683,
        3394.64778245, 3684.86579283],
       [ 550.60287077,  799.64004243, 1079.20414627, ..., 3074.40892467,
        3362.41392286, 3650.87985423],
       ...,
       [ 421.4795133 ,  629.51936417,  870.23976503, ..., 2673.26547714,
        2939.64044732, 3207.34703681],
       [ 473.96830539,  690.17996533,  938.20517716, ..., 2774.45332162,
        3044.4055754 , 3315.52746379],
       [ 527.12048391,  731.31787375,  967.17905659, ..., 2738.24895936,
        3000.59395407, 3264.38503182]])

Actual values of "enrg"


| | E_c(10) | E_c(100) | E_c(200) | E_c(300) | E_c(400) | E_c(500) | E_c(600) | E_c(700) | E_c(800) | E_c(900) | E_c(1000) | E_c(1100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 498.000000 | 768.000000 | 1068.000000 | 1368.000000 | 1668.000000 | 1968.000000 | 2268.000000 | 2568.000000 | 2868.000000 | 3168.00000 | 3468.000000 | 3768.00000 |
| 1 | 498.000000 | 768.000000 | 1068.000000 | 1368.000000 | 1668.000000 | 1968.000000 | 2268.000000 | 2568.000000 | 2868.000000 | 3168.00000 | 3468.000000 | 3768.00000 |
| 2 | 498.000000 | 768.000000 | 1068.000000 | 1368.000000 | 1668.000000 | 1968.000000 | 2268.000000 | 2568.000000 | 2868.000000 | 3168.00000 | 3468.000000 | 3768.00000 |
| 3 | 498.000000 | 768.000000 | 1068.000000 | 1368.000000 | 1668.000000 | 1968.000000 | 2268.000000 | 2568.000000 | 2868.000000 | 3168.00000 | 3468.000000 | 3768.00000 |
| 4 | 498.000000 | 768.000000 | 1068.000000 | 1368.000000 | 1668.000000 | 1968.000000 | 2268.000000 | 2568.000000 | 2868.000000 | 3168.00000 | 3468.000000 | 3768.00000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 71295 | 508.005692 | 686.744667 | 897.469813 | 1117.208404 | 1343.680995 | 1575.432417 | 1811.466499 | 2051.065804 | 2293.693308 | 2538.93463 | 2786.462071 | 3036.01117 |
| 71296 | 508.005692 | 686.744667 | 897.469813 | 1117.208404 | 1343.680995 | 1575.432417 | 1811.466499 | 2051.065804 | 2293.693308 | 2538.93463 | 2786.462071 | 3036.01117 |
| 71297 | 508.005692 | 686.744667 | 897.469813 | 1117.208404 | 1343.680995 | 1575.432417 | 1811.466499 | 2051.065804 | 2293.693308 | 2538.93463 | 2786.462071 | 3036.01117 |
| 71298 | 508.005692 | 686.744667 | 897.469813 | 1117.208404 | 1343.680995 | 1575.432417 | 1811.466499 | 2051.065804 | 2293.693308 | 2538.93463 | 2786.462071 | 3036.01117 |


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00496222 0.00098852 0.00096776 0.00132686 0.00161328 0.00179997
 0.00191096 0.0019699  0.00199399 0.0019949  0.00198048 0.001956  ]
Uniform average
0.0019554030163968835
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1019.60746961   526.7855593    777.57059563  1643.42153303
  2978.86468734  4687.92924582  6703.46851863  8976.34101909
 11469.29963458 14153.31166499 17005.22465858 20006.22168545]
Uniform average
7495.670522671096
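The "Raw values" and "Uniform average" labels match the `multioutput` options of scikit-learn's metric functions: one error per response column, and the unweighted mean of those errors. The sketch below first shows this on a toy two-output example (illustrative values, not thesis data), then checks that the mean of the train MSLE raw values printed above agrees with the reported uniform average up to rounding.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Toy two-output example (hypothetical values)
y_true = np.array([[500.0, 800.0], [520.0, 700.0], [480.0, 760.0]])
y_pred = np.array([[550.0, 790.0], [510.0, 720.0], [470.0, 800.0]])

raw = mean_squared_log_error(y_true, y_pred, multioutput="raw_values")
avg = mean_squared_log_error(y_true, y_pred, multioutput="uniform_average")

# The uniform average is the plain mean of the per-column errors
print(raw, avg, np.isclose(raw.mean(), avg))

# Train MSLE raw values copied from the log (rounded to 8 decimals there)
train_msle_raw = np.array([
    0.00496222, 0.00098852, 0.00096776, 0.00132686, 0.00161328, 0.00179997,
    0.00191096, 0.0019699, 0.00199399, 0.0019949, 0.00198048, 0.001956,
])
train_msle_avg = train_msle_raw.mean()
print(train_msle_avg)
```

The first column, E_c(10), has by far the largest MSLE but a comparatively small MSE. This is expected: MSLE weighs relative errors, while MSE grows with the magnitude of the target.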



>Prediction metrics (using the actual test dataset)

array([[ 776.00154048, 1043.93858617, 1341.90037276, ..., 3431.26394624,
        3730.03650491, 4028.85514086],
       [ 873.71614621, 1143.71614621, 1443.71614621, ..., 3543.71614621,
        3843.71614621, 4143.71614621],
       [ 815.88394804, 1085.88394804, 1385.88394804, ..., 3485.88394804,
        3785.88394804, 4085.88394804],
       ...,
       [ 416.51469191,  625.42079955,  866.84836345, ..., 2672.56694224,
        2939.19977626, 3207.14883266],
       [ 444.93921298,  653.98318347,  895.37298525, ..., 2699.88573644,
        2966.32782659, 3234.08774922],
       [ 399.21771425,  623.55518211,  880.31572864, ..., 2766.24615517,
        3042.13485112, 3318.99153641]])

Actual values of "enrg"


| | E_c(10) | E_c(100) | E_c(200) | E_c(300) | E_c(400) | E_c(500) | E_c(600) | E_c(700) | E_c(800) | E_c(900) | E_c(1000) | E_c(1100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 71300 | 852.000000 | 1122.000000 | 1422.000000 | 1722.000000 | 2022.000000 | 2322.000000 | 2622.00000 | 2922.000000 | 3222.000000 | 3522.000000 | 3822.000000 | 4122.000000 |
| 71301 | 852.000000 | 1122.000000 | 1422.000000 | 1722.000000 | 2022.000000 | 2322.000000 | 2622.00000 | 2922.000000 | 3222.000000 | 3522.000000 | 3822.000000 | 4122.000000 |
| 71302 | 852.000000 | 1122.000000 | 1422.000000 | 1722.000000 | 2022.000000 | 2322.000000 | 2622.00000 | 2922.000000 | 3222.000000 | 3522.000000 | 3822.000000 | 4122.000000 |
| 71303 | 852.000000 | 1122.000000 | 1422.000000 | 1722.000000 | 2022.000000 | 2322.000000 | 2622.00000 | 2922.000000 | 3222.000000 | 3522.000000 | 3822.000000 | 4122.000000 |
| 71304 | 852.000000 | 1122.000000 | 1422.000000 | 1722.000000 | 2022.000000 | 2322.000000 | 2622.00000 | 2922.000000 | 3222.000000 | 3522.000000 | 3822.000000 | 4122.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 89095 | 471.758594 | 637.295412 | 834.061872 | 1040.614857 | 1254.607242 | 1474.502678 | 1699.23047 | 1928.009096 | 2160.247412 | 2395.485434 | 2633.356884 | 2873.564462 |
| 89096 | 471.758594 | 637.295412 | 834.061872 | 1040.614857 | 1254.607242 | 1474.502678 | 1699.23047 | 1928.009096 | 2160.247412 | 2395.485434 | 2633.356884 | 2873.564462 |
| 89097 | 471.758594 | 637.295412 | 834.061872 | 1040.614857 | 1254.607242 | 1474.502678 | 1699.23047 | 1928.009096 | 2160.247412 | 2395.485434 | 2633.356884 | 2873.564462 |
| 89098 | 471.758594 | 637.295412 | 834.061872 | 1040.614857 | 1254.607242 | 1474.502678 | 1699.23047 | 1928.009096 | 2160.247412 | 2395.485434 | 2633.356884 | 2873.564462 |


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00650281 0.00146464 0.00198916 0.00279515 0.00334952 0.00368211
 0.0038632  0.00394578 0.00396495 0.00394346 0.00389605 0.00383237]
Uniform average
0.0036024344575427817
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1464.39062442   815.82174588  1586.45483779  3493.57870694
  6263.49626956  9717.52697213 13731.41719699 18214.59930173
 23098.62595079 28330.26191114 33867.12711866 39674.82541474]
Uniform average
15021.510504231655
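One way to read the train and test blocks together is to compare their uniform averages. The sketch below uses only the numbers reported above. The test-to-train ratio and the root-mean-squared error are diagnostics chosen here for illustration and are not part of the original output.

```python
import math

# Uniform averages copied from the log
train = {"msle": 0.0019554030163968835, "mse": 7495.670522671096}
test = {"msle": 0.0036024344575427817, "mse": 15021.510504231655}
cv_msle = 0.0036099754639829116  # best cross-validation score (msle)

# Test-to-train ratios: values well above 1 suggest some overfitting
ratios = {name: test[name] / train[name] for name in ("msle", "mse")}
for name, ratio in ratios.items():
    print(f"{name}: test/train = {ratio:.2f}")

# RMSE is in the same units as E_c, which makes the MSE easier to interpret
rmse_train = math.sqrt(train["mse"])
rmse_test = math.sqrt(test["mse"])
print(f"RMSE train = {rmse_train:.1f}, RMSE test = {rmse_test:.1f}")

# Gap between the held-out test MSLE and the cross-validation estimate
cv_gap = abs(test["msle"] - cv_msle)
print(f"|test MSLE - CV MSLE| = {cv_gap:.2e}")
```

The test errors are roughly 1.8 (MSLE) and 2.0 (MSE) times the train errors, while the test MSLE is very close to the cross-validation score. This suggests that the cross-validation estimate was a reasonable predictor of performance on the held-out data.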



>Saving the grid search info:
The grid s