**MSc Computational Physics AUTh**<br>
**Academic Year: 2024-2025**<br>
**Master's Thesis**<br>

**Thesis Title:**<br>  
# **"Reconstruction of the EoSs of Exotic Stars using ML and ANNs regression models"**

**Implemented by: Ioannis Stergakis**<br>
**AEM: 4439**<br>

**Jupyter Notebook: JN4d**<br>
**Name: "train_test_xgboost_regress.ipynb"**<br>

**Description:**<br> 
**Training and testing the `XGBRegressor` algorithm:**<br>
**1. Performing grid search to determine the best hyperparameters**<br>
**2. Performing cross-validation so that the selected model generalizes to unseen data**<br>
**3. Assessing the accuracy of the best model using different scorers and metrics**
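
Step 2 of the description relies on K-fold cross-validation, which the notebook's helper class runs internally. As a rough illustration of the idea only, the sketch below generates the index splits in plain Python. `kfold_indices` is a hypothetical helper written for this example (it is not part of `data_analysis_ES_ML`), and it assumes contiguous, unshuffled folds, which is the default behaviour of scikit-learn's `KFold`.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, val_idx) index lists for contiguous, unshuffled K-fold splits."""
    # Fold sizes differ by at most one sample: the first `extra` folds get one more
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val_idx = list(range(start, stop))
        # Everything outside the validation window is used for training
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


# Toy usage: 12 samples split into 5 folds
for fold, (train_idx, val_idx) in enumerate(kfold_indices(12, n_splits=5)):
    print(f"fold {fold}: validate on {val_idx}, train on {len(train_idx)} samples")
```

Each sample lands in exactly one validation fold, so every candidate model is scored on data it was not fitted on in that fold.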


**Abbreviations:**<br>
**1. NS -> Neutron Star**<br>
**2. QS -> Quark Star**<br>
**3. ML -> Machine Learning**

In [1]:
# Importing useful modules
import joblib
from data_analysis_ES_ML import *

In [2]:
# Defining the grid of hyperparameter values for the Extreme Gradient Boosting (XGBoost) regressor
xgboost_grid = {
    'estimator__n_estimators': [50, 100],
    'estimator__max_depth': [3, 5, 7],
    'estimator__learning_rate': [0.05, 0.1],
    'estimator__subsample': [0.7, 1.0],
    'estimator__colsample_bytree': [0.7, 1.0],
    'estimator__reg_alpha': [0.1],
    'estimator__reg_lambda': [1.0, 5.0]
}
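
As a quick sanity check on the cost of the search, the number of candidate models is the product of the list lengths in the grid. The sketch below counts them with the standard library only. The grid is repeated so that the snippet is self-contained, and `n_splits = 5` is an assumption taken from the second positional argument of the `train_test` calls in this notebook. The `estimator__` prefix is the usual scikit-learn convention for addressing parameters of a regressor wrapped inside a `MultiOutputRegressor`.

```python
from itertools import product

# Same grid as in the cell above, repeated here so the snippet runs on its own
xgboost_grid = {
    'estimator__n_estimators': [50, 100],
    'estimator__max_depth': [3, 5, 7],
    'estimator__learning_rate': [0.05, 0.1],
    'estimator__subsample': [0.7, 1.0],
    'estimator__colsample_bytree': [0.7, 1.0],
    'estimator__reg_alpha': [0.1],
    'estimator__reg_lambda': [1.0, 5.0],
}

# One candidate model per combination of one value from each hyperparameter list
candidates = [
    dict(zip(xgboost_grid, values))
    for values in product(*xgboost_grid.values())
]
n_candidates = len(candidates)

n_splits = 5  # assumption: number of cross-validation folds passed to train_test
n_fits = n_candidates * n_splits

print(f"{n_candidates} candidates x {n_splits} folds = {n_fits} fits")
```

The counts agree with the "Fitting 5 folds for each of 96 candidates, totalling 480 fits" message printed during the grid searches.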

# 1. Neutron Stars

## **1.1 Using 8 M-R points**

### A. Predicting the Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data (linear)

In [10]:
# Showing the datasets
# regression_ML(filename="linNS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [5]:
# Building the regression model: grid search over xgboost_grid with 5-fold cross-validation and the MSLE scorer
regression_ML("linNS_reg_data_pp8mr8s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="linNS_xgboost_grid_enrg_16X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 2.0'47.39"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=0.7, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                             

array([[ 286.8935 ,  374.32007,  456.37204, ..., 1435.5051 , 1552.0219 ,
        1661.6101 ],
       [ 285.18607,  378.94077,  466.0175 , ..., 1370.9202 , 1546.0654 ,
        1658.6085 ],
       [ 287.17715,  379.3942 ,  470.90524, ..., 1356.495  , 1375.1211 ,
        1579.6726 ],
       ...,
       [ 304.04718,  414.64432,  556.6867 , ..., 1547.6646 , 1685.2179 ,
        1821.3439 ],
       [ 293.76996,  406.90704,  535.0577 , ..., 1631.5476 , 1764.6345 ,
        1869.6688 ],
       [ 298.3553 ,  393.88345,  485.88367, ..., 1548.1301 , 1696.1045 ,
        1814.5828 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
0,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
1,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
2,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
3,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
4,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
...,...,...,...,...,...,...,...,...,...,...,...,...
24195,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24196,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24197,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24198,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00286374 0.00257844 0.00268322 0.00226524 0.00231231 0.00153365
 0.00141944 0.0018991  0.002775   0.00379664 0.00481968 0.00572676]
Uniform average
0.0028894350464511976
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  326.96497856   567.21753378   889.12777649  1085.77470532
  1296.55339203  1449.01682789  1808.61303532  3170.19051534
  6089.95553803 10890.2053967  17520.11590474 25890.40695203]
Uniform average
5915.345213018355



>Prediction metrics (using the actual tes

array([[ 210.36832,  271.32462,  355.95264, ..., 1055.9961 , 1138.6624 ,
        1186.95   ],
       [ 216.99313,  285.74426,  361.8172 , ..., 1032.7137 , 1075.4167 ,
        1220.1196 ],
       [ 220.05807,  295.3465 ,  390.02948, ..., 1018.3957 , 1121.6222 ,
        1206.2467 ],
       ...,
       [ 301.81223,  429.5653 ,  509.39572, ..., 1315.0792 , 1387.156  ,
        1491.2053 ],
       [ 278.0481 ,  426.88855,  506.58267, ..., 1336.1425 , 1463.218  ,
        1628.1062 ],
       [ 295.6065 ,  407.01062,  512.6787 , ..., 1350.6019 , 1447.9973 ,
        1624.3774 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
24200,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24201,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24202,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24203,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24204,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
...,...,...,...,...,...,...,...,...,...,...,...,...
30295,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30296,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30297,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30298,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00432606 0.00435426 0.00461305 0.00473827 0.005404   0.00320919
 0.00353181 0.00456723 0.00627482 0.00869044 0.01126649 0.01402084]
Uniform average
0.006249705372236475
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  605.5781699   1230.3355798   1401.59176934  1854.99753718
  2608.0673019   2672.55183658  3934.42170427  6663.15196875
 11308.99901988 18441.42246568 28430.26315004 42132.16187603]
Uniform average
10106.961864946237



>Saving the grid search info:
The grid se
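
The "Raw values" and "Uniform average" lines in the output above report one error per $E_c$ column and their plain mean. The sketch below shows how such numbers can be computed, assuming the standard definitions (MSLE as the mean of squared differences of $\log(1+y)$, matching scikit-learn's `multioutput="raw_values"` and `"uniform_average"` conventions). The function names and the toy arrays are hypothetical and are not taken from `data_analysis_ES_ML`.

```python
import math


def msle_raw_values(y_true, y_pred):
    """Per-column mean squared log error for row-major lists of equal shape."""
    n_rows, n_cols = len(y_true), len(y_true[0])
    return [
        sum((math.log1p(y_true[i][j]) - math.log1p(y_pred[i][j])) ** 2
            for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]


def mse_raw_values(y_true, y_pred):
    """Per-column mean squared error for row-major lists of equal shape."""
    n_rows, n_cols = len(y_true), len(y_true[0])
    return [
        sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]


def uniform_average(raw_values):
    """Plain (unweighted) mean over the output columns."""
    return sum(raw_values) / len(raw_values)


# Toy example with made-up numbers: 2 samples, 2 output columns
y_true = [[100.0, 200.0], [110.0, 220.0]]
y_pred = [[90.0, 210.0], [120.0, 200.0]]
print("MSLE per column:", msle_raw_values(y_true, y_pred))
print("MSE per column: ", mse_raw_values(y_true, y_pred))

# Training-set MSLE raw values as printed above (8 M-R points, neutron stars)
train_msle_raw = [0.00286374, 0.00257844, 0.00268322, 0.00226524,
                  0.00231231, 0.00153365, 0.00141944, 0.0018991,
                  0.002775, 0.00379664, 0.00481968, 0.00572676]
print("Uniform average:", uniform_average(train_msle_raw))
```

Averaging the twelve printed raw values reproduces the reported uniform average up to the rounding of the printed digits.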

## **1.2 Using 16 M-R points**

### A. Predicting the Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data (linear)

In [7]:
# Showing the datasets
# regression_ML(filename="linNS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [8]:
# Building the regression model: grid search over xgboost_grid with 5-fold cross-validation and the MSLE scorer
regression_ML("linNS_reg_data_pp8mr16s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="linNS_xgboost_grid_enrg_32X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 5.0'27.67"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=1.0, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                             

array([[ 290.94965,  387.45053,  481.96133, ..., 1339.3997 , 1454.7833 ,
        1561.4854 ],
       [ 287.41766,  370.04355,  465.98477, ..., 1425.8751 , 1428.8866 ,
        1545.124  ],
       [ 284.94742,  377.2187 ,  456.80475, ..., 1408.9957 , 1429.9406 ,
        1656.3656 ],
       ...,
       [ 295.02316,  400.54987,  515.0721 , ..., 1540.8253 , 1675.2787 ,
        1786.8441 ],
       [ 289.7933 ,  397.63562,  556.2255 , ..., 1564.8241 , 1680.5406 ,
        1815.0515 ],
       [ 293.47955,  393.7537 ,  522.71136, ..., 1628.5128 , 1736.3585 ,
        1864.4922 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
0,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
1,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
2,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
3,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
4,276.583390,372.852857,479.016150,557.218753,622.024504,820.705714,975.002096,1108.032300,1228.148143,1339.437507,1444.243512,1544.706350
...,...,...,...,...,...,...,...,...,...,...,...,...
24195,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24196,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24197,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665
24198,302.075586,383.981829,510.689681,639.227324,745.339833,976.073292,1152.853575,1303.783182,1439.014389,1563.516329,1680.137249,1790.679665


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00247723 0.00192369 0.00197343 0.00187509 0.00196579 0.0011255
 0.00089211 0.00124354 0.0020085  0.00277622 0.00363604 0.00446309]
Uniform average
0.0021966868276673108
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  283.10508375   419.76310518   606.98103858   799.04479328
   998.66856086  1043.36982147  1145.44031663  2100.5571126
  4487.60084489  8044.83744237 13378.83162278 20368.90196394]
Uniform average
4473.091808861888



>Prediction metrics (using the actual test 

array([[ 216.56573,  287.44388,  354.38113, ..., 1027.1427 , 1204.1008 ,
        1286.9844 ],
       [ 216.6762 ,  294.19913,  395.91455, ...,  975.0877 , 1051.5972 ,
        1145.5674 ],
       [ 216.95819,  286.98038,  375.75156, ..., 1036.6152 , 1128.7327 ,
        1327.3743 ],
       ...,
       [ 291.6469 ,  402.5614 ,  483.22934, ..., 1352.1096 , 1377.7792 ,
        1556.5219 ],
       [ 282.50873,  430.5035 ,  501.62308, ..., 1286.0724 , 1406.2153 ,
        1483.8138 ],
       [ 280.67517,  384.28253,  495.64542, ..., 1350.9207 , 1434.2455 ,
        1695.7902 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
24200,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24201,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24202,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24203,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
24204,229.674954,299.164609,362.525289,407.755484,446.613232,599.701360,722.015264,829.584262,929.945992,1029.945992,1129.945992,1229.945992
...,...,...,...,...,...,...,...,...,...,...,...,...
30295,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30296,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30297,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602
30298,284.859202,439.667175,529.611941,592.667261,643.331503,811.114901,964.023356,1095.948633,1215.131408,1325.605161,1429.681834,1529.890602


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00400813 0.00382107 0.00439887 0.00474215 0.00569878 0.00298048
 0.00352851 0.00455808 0.00643008 0.00907178 0.01173135 0.0144731 ]
Uniform average
0.006286865038960321
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  555.62826981  1146.76411613  1246.15150918  1667.69273335
  2528.8089525   2365.42964474  3746.5261392   6406.1730179
 11217.59952255 18742.50727652 28984.78810269 42860.32627555]
Uniform average
10122.366296676571



>Saving the grid search info:
The grid sea
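
To put the two neutron-star runs side by side, the short sketch below compares the uniform-average MSLE on the training and test sets. The four numbers are copied from the outputs of sections 1.1 and 1.2. The test/train ratio is only an informal indicator chosen for this illustration, not a metric used by the notebook.

```python
# Uniform-average MSLE values copied from the outputs of sections 1.1 and 1.2
msle = {
    "8 M-R points": {"train": 0.0028894350464511976, "test": 0.006249705372236475},
    "16 M-R points": {"train": 0.0021966868276673108, "test": 0.006286865038960321},
}

# Ratio of test error to training error for each configuration
ratios = {label: scores["test"] / scores["train"] for label, scores in msle.items()}

for label, scores in msle.items():
    print(f"{label}: train={scores['train']:.5f}  "
          f"test={scores['test']:.5f}  test/train={ratios[label]:.2f}")
```

With these values, doubling the number of M-R points lowers the training error while the test error stays essentially unchanged, so the gap between the two widens.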

# 2. Quark Stars

## **2.1 Using 8 M-R points**

### A. Predicting the Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data

In [12]:
# Showing the datasets
# regression_ML(filename="QS_reg_data_pp8mr8s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [13]:
# Building the regression model: grid search over xgboost_grid with 5-fold cross-validation and the MSLE scorer
regression_ML("QS_reg_data_pp8mr8s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="QS_xgboost_grid_enrg_16X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 5.0'34.58"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=1.0, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                             

array([[ 503.5901 ,  754.99677, 1069.4283 , ..., 3124.9062 , 3427.5955 ,
        3757.0244 ],
       [ 528.6572 ,  783.67194, 1076.6255 , ..., 3126.0842 , 3344.9707 ,
        3663.6409 ],
       [ 593.48596,  826.1015 , 1051.8531 , ..., 2805.1052 , 3089.3599 ,
        3373.255  ],
       ...,
       [ 493.189  ,  693.9846 ,  912.1765 , ..., 2688.0889 , 2955.424  ,
        3200.7036 ],
       [ 517.3399 ,  711.0466 ,  912.8261 , ..., 2638.6526 , 2853.6262 ,
        3061.531  ],
       [ 469.2241 ,  677.09106,  904.6919 , ..., 2607.2212 , 2862.613  ,
        3141.6355 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
0,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
1,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
2,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
3,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
4,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
...,...,...,...,...,...,...,...,...,...,...,...,...
71295,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71296,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71297,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71298,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00351732 0.00101514 0.00130194 0.00167758 0.00193221 0.00208732
 0.00216332 0.00218591 0.00218688 0.00217543 0.00213821 0.00209085]
Uniform average
0.0020393442189270802
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  903.40322311   630.81209316  1223.87242966  2358.99988065
  3915.24222364  5859.70263331  8083.08400472 10521.25895935
 13228.16238775 16163.35946291 19158.49969412 22239.55276408]
Uniform average
8690.495813035635



>Prediction metrics (using the actual tes

array([[ 820.1989 , 1069.2557 , 1362.4802 , ..., 3452.0981 , 3730.075  ,
        4079.9587 ],
       [ 864.4021 , 1104.3857 , 1425.2083 , ..., 3556.3518 , 3852.2004 ,
        4146.0757 ],
       [ 948.8548 , 1197.116  , 1500.3844 , ..., 3590.5007 , 3888.5884 ,
        4193.812  ],
       ...,
       [ 407.07706,  599.3828 ,  845.6217 , ..., 2468.0356 , 2791.1816 ,
        3062.2776 ],
       [ 455.89365,  660.7975 ,  873.7101 , ..., 2612.9473 , 2800.6345 ,
        3023.464  ],
       [ 434.3397 ,  622.5791 ,  844.867  , ..., 2486.246  , 2834.5703 ,
        2981.502  ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
71300,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71301,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71302,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71303,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71304,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
...,...,...,...,...,...,...,...,...,...,...,...,...
89095,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89096,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89097,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89098,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00450825 0.00141998 0.00186608 0.00244253 0.00278515 0.00301098
 0.00310437 0.0031496  0.0031375  0.00311659 0.00306084 0.00298535]
Uniform average
0.0028822684577248414
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[ 1243.34388117   908.72622867  1665.38001155  3279.81539677
  5476.95599804  8257.95268326 11362.48713613 14893.5041248
 18680.10338362 22805.58693279 27060.79632526 31407.47133284]
Uniform average
12253.510286240546



>Saving the grid search info:
The grid se

## **2.2 Using 16 M-R points**

### A. Predicting the Energy at the Center $E_c$ Values

#### ->Using rowwise-shuffled data

In [15]:
# Showing the datasets
# regression_ML(filename="QS_reg_data_pp8mr16s100_rwshuffled.csv",mag_reg="enrg",test_ratio=0.2,samples_per_EOS=100).show_datasets()

In [16]:
# Building the regression model: grid search over xgboost_grid with 5-fold cross-validation and the MSLE scorer
regression_ML("QS_reg_data_pp8mr16s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2, samples_per_EOS=100).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="QS_xgboost_grid_enrg_32X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 10.0'24.06"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=1.0, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                            

array([[ 541.9359 ,  782.5096 , 1057.2982 , ..., 2985.0286 , 3209.016  ,
        3519.3176 ],
       [ 532.5183 ,  776.3247 , 1061.6156 , ..., 3085.1667 , 3414.7126 ,
        3714.9487 ],
       [ 534.2008 ,  792.29877, 1096.7477 , ..., 3096.684  , 3345.9873 ,
        3675.4058 ],
       ...,
       [ 440.62424,  647.1823 ,  878.57385, ..., 2668.73   , 2872.2798 ,
        3192.6516 ],
       [ 479.3142 ,  687.27075,  919.2181 , ..., 2712.035  , 2970.8716 ,
        3310.4817 ],
       [ 516.70154,  719.8428 ,  936.2204 , ..., 2678.124  , 3015.4502 ,
        3126.9038 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
0,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
1,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
2,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
3,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
4,498.000000,768.000000,1068.000000,1368.000000,1668.000000,1968.000000,2268.000000,2568.000000,2868.000000,3168.00000,3468.000000,3768.00000
...,...,...,...,...,...,...,...,...,...,...,...,...
71295,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71296,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71297,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117
71298,508.005692,686.744667,897.469813,1117.208404,1343.680995,1575.432417,1811.466499,2051.065804,2293.693308,2538.93463,2786.462071,3036.01117


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00244548 0.00072776 0.00099371 0.00131373 0.00151882 0.00164464
 0.0016994  0.00175323 0.00173863 0.0017105  0.00169833 0.001661  ]
Uniform average
0.0015754358071467765
-------------------------------------------------------------------------------------------------------------------
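For reference, the "Raw values" and "Uniform average" entries can be read as one MSLE per output column and the plain mean of those columns. The sketch below implements the usual definition, the mean of `(log(1 + y_true) - log(1 + y_pred))**2`, in pure Python. The function name `msle_per_column` and the toy numbers are hypothetical and are not taken from the dataset.

```python
import math


def msle_per_column(y_true, y_pred):
    """Column-wise mean squared log error for row-major nested lists."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    raw = []
    for j in range(n_cols):
        total = 0.0
        for i in range(n_rows):
            diff = math.log1p(y_true[i][j]) - math.log1p(y_pred[i][j])
            total += diff * diff
        raw.append(total / n_rows)
    return raw


# Hypothetical toy data: two samples, two output columns
y_true = [[500.0, 770.0], [510.0, 690.0]]
y_pred = [[500.0, 780.0], [510.0, 690.0]]

raw_values = msle_per_column(y_true, y_pred)            # one value per column
uniform_average = sum(raw_values) / len(raw_values)     # plain mean over columns
print(raw_values, uniform_average)
```

Because the error is taken on logarithms, it measures relative rather than absolute deviation, which is why the per-column MSLE stays at a similar level across columns while the MSE grows with the magnitude of the target.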
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  621.34035359   429.63492037   917.2938621   1820.3213894
  3066.62496672  4609.26717503  6367.24196617  8449.52973611
 10505.0105955  12728.30633997 15232.67549814 17701.4346349 ]
Uniform average
6870.7234531652
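As a consistency check on the numbers printed above, the sketch below recomputes each "Uniform average" as the arithmetic mean of the corresponding raw values and takes the square root of each column's MSE to obtain an RMSE in the same units as the target. The variable names are illustrative, and the raw values are copied from the log at the precision shown there.

```python
import math
from statistics import fmean

# Raw per-column values copied from the log (rounded as printed)
msle_raw = [0.00244548, 0.00072776, 0.00099371, 0.00131373, 0.00151882,
            0.00164464, 0.0016994, 0.00175323, 0.00173863, 0.0017105,
            0.00169833, 0.001661]
mse_raw = [621.34035359, 429.63492037, 917.2938621, 1820.3213894,
           3066.62496672, 4609.26717503, 6367.24196617, 8449.52973611,
           10505.0105955, 12728.30633997, 15232.67549814, 17701.4346349]

msle_uniform = fmean(msle_raw)   # should match the logged uniform average
mse_uniform = fmean(mse_raw)

# RMSE per output column, in the units of the target values
rmse_per_column = [math.sqrt(v) for v in mse_raw]

print(msle_uniform, mse_uniform)
print([round(r, 2) for r in rmse_per_column])
```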



>Prediction metrics (using the actual test d

array([[ 764.45605, 1072.6262 , 1366.082  , ..., 3469.7964 , 3779.6155 ,
        4113.8174 ],
       [ 924.8462 , 1190.2935 , 1448.4253 , ..., 3572.8213 , 3878.2268 ,
        4182.1426 ],
       [ 809.3785 , 1129.8844 , 1422.2843 , ..., 3539.7334 , 3836.6453 ,
        4139.0996 ],
       ...,
       [ 430.55484,  628.4462 ,  857.7564 , ..., 2558.2432 , 2861.1445 ,
        3248.8306 ],
       [ 446.16235,  641.493  ,  865.4948 , ..., 2570.7957 , 2748.2083 ,
        2931.1787 ],
       [ 401.25626,  632.57513,  888.389  , ..., 2775.4275 , 2993.7244 ,
        3287.477  ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
71300,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71301,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71302,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71303,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
71304,852.000000,1122.000000,1422.000000,1722.000000,2022.000000,2322.000000,2622.00000,2922.000000,3222.000000,3522.000000,3822.000000,4122.000000
...,...,...,...,...,...,...,...,...,...,...,...,...
89095,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89096,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89097,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462
89098,471.758594,637.295412,834.061872,1040.614857,1254.607242,1474.502678,1699.23047,1928.009096,2160.247412,2395.485434,2633.356884,2873.564462


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00354672 0.00113907 0.00163293 0.00216736 0.00248589 0.00266294
 0.00273323 0.00278194 0.00274313 0.00272344 0.00269119 0.00264661]
Uniform average
0.0024962042241867305
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  945.84775157   662.19476932  1353.23555131  2795.36187309
  4742.08465793  7122.4172229   9831.9927637  12970.83101945
 16069.75651558 19700.30390251 23523.90822469 27519.00670478]
Uniform average
10603.078413068604
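One way to read the two metric blocks together is to compare their uniform averages. The sketch below assumes that the first block in this output refers to the training subset and the second to the held-out test subset, as the "Prediction metrics" header suggests. The dictionary names are illustrative.

```python
# Uniform averages copied from the two metric blocks of the log
first_block = {"msle": 0.0015754358071467765, "mse": 6870.7234531652}
test_block = {"msle": 0.0024962042241867305, "mse": 10603.078413068604}

# Ratio above 1 means the error is larger on the held-out data
ratios = {name: test_block[name] / first_block[name] for name in first_block}

for name, ratio in ratios.items():
    print(f"{name}: test / first block = {ratio:.3f}")
```

Under that assumption, both metrics are roughly one and a half times larger on the test data, a gap that can be tracked when comparing this model against other regressors or grids.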



>Saving the grid search info:
The grid s