**MSc Computational Physics AUTh**<br>
**Academic Year: 2024-2025**<br>
**Master's Thesis**<br>

**Thesis Title:**<br>  
# **"Reconstruction of the EoSs of Exotic Stars using ML and ANNs regression models"**

**Implemented by: Ioannis Stergakis**<br>
**AEM: 4439**<br>

**Jupyter Notebook: JN4d**<br>
**Name: "testing_xgboost_regress.ipynb"**<br>

**Description:**<br> 
**Training and testing the `XGBRegressor` algorithm:**<br>
**1. Performing grid search to determine the best hyperparameters**<br>
**2. Performing cross-validation to improve the model's generalization to unseen data**<br>
**3. Assessing the accuracy of the best model using different scorers and metrics**


**Abbreviations:**<br>
**1. NS -> Neutron Star**<br>
**2. QS -> Quark Star**<br>
**3. ML -> Machine Learning**

In [1]:
# Importing useful modules
import joblib
from data_analysis_ES_ML import *

In [None]:
# Defining the grid of hyperparameter values for the Extreme Gradient Boosting (XGBoost) regressor
xgboost_grid = {
    'estimator__n_estimators': [50, 100],
    'estimator__max_depth': [3, 5, 7],
    'estimator__learning_rate': [0.05, 0.1],
    'estimator__subsample': [0.7, 1.0],
    'estimator__colsample_bytree': [0.7, 1.0],
    'estimator__reg_alpha': [0.1],
    'estimator__reg_lambda': [1.0, 5.0]
}
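The size of this grid can be checked by hand. The short sketch below multiplies the number of values per hyperparameter, which reproduces the "96 candidates, totalling 480 fits" reported in the grid-search logs further down when 5 folds are used. The `estimator__` prefix is scikit-learn's nested-parameter syntax: it forwards each setting to the `XGBRegressor` wrapped inside the `MultiOutputRegressor` shown in the "Best model" output. The variable names `n_candidates`, `n_folds` and `n_fits` are illustrative only.

```python
from math import prod

# Same grid as defined in the cell above
xgboost_grid = {
    'estimator__n_estimators': [50, 100],
    'estimator__max_depth': [3, 5, 7],
    'estimator__learning_rate': [0.05, 0.1],
    'estimator__subsample': [0.7, 1.0],
    'estimator__colsample_bytree': [0.7, 1.0],
    'estimator__reg_alpha': [0.1],
    'estimator__reg_lambda': [1.0, 5.0]
}

# An exhaustive grid search tries every combination of the listed values
n_candidates = prod(len(values) for values in xgboost_grid.values())

# Each candidate is refitted once per cross-validation fold
n_folds = 5
n_fits = n_candidates * n_folds

print(f"{n_candidates} candidates x {n_folds} folds = {n_fits} fits")
```

Adding a second value to `reg_alpha` would double the number of fits, so the grid is best widened one hyperparameter at a time.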

# 1. Neutron Stars

## **1.1 Using 8 M-R points**

### A. Predicting Energy at Center $E_c$ Values

#### ->Using rowwise-shuffled data (linear)

In [3]:
# Showing the datasets
# regression_ML("linNS_reg_data_pp8mr8s100_rwshuffled.csv","enrg",0.2).show_datasets()

In [8]:
# Building a regression model
regression_ML("linNS_reg_data_pp8mr8s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="linNS_xgboost_grid_enrg_16X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 2.0'57.55"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=0.7, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                             

array([[ 493.66327,  641.6864 ,  833.11285, ..., 1939.8513 , 2082.8079 ,
        2208.464  ],
       [ 216.74677,  279.45786,  350.74432, ..., 1007.48975, 1090.6816 ,
        1182.7208 ],
       [ 286.34488,  390.6052 ,  488.68073, ..., 1512.3474 , 1732.113  ,
        1818.7125 ],
       ...,
       [ 353.9394 ,  449.62973,  553.68384, ..., 1972.8875 , 2213.1292 ,
        2584.3171 ],
       [ 422.61002,  648.18896,  859.125  , ..., 2014.6012 , 2163.5923 ,
        2291.527  ],
       [ 230.24374,  314.9428 ,  401.92822, ..., 1272.6293 , 1433.292  ,
        1527.2328 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
3358,522.587758,661.261205,793.133138,904.143641,1003.864751,1301.794279,1525.711028,1714.165910,1881.085781,2033.287282,2174.677627,2307.729503
25102,229.674954,299.164609,362.525289,407.755484,444.630868,556.406872,656.406872,756.406872,856.406872,956.406872,1056.406872,1156.406872
18087,284.859202,439.667175,529.611941,592.667261,643.331503,850.302570,1096.454738,1319.666715,1528.218645,1726.357092,1916.604708,2100.605141
3036,556.566872,854.136401,1022.501693,1138.139438,1229.479503,1489.132202,1674.216890,1824.939469,1955.322384,2072.069926,2178.956657,2279.508125
3968,431.415278,617.408129,787.136485,889.967963,962.801898,1171.997297,1323.249424,1447.800384,1556.546353,1656.942131,1756.942131,1856.942131
...,...,...,...,...,...,...,...,...,...,...,...,...
12895,276.583390,351.927089,425.270805,477.194812,519.248282,644.520181,745.128457,845.128457,945.128457,1045.128457,1145.128457,1245.128457
28192,206.135510,267.396054,346.148905,405.123725,452.610739,565.656583,665.656583,765.656583,865.656583,965.656583,1065.656583,1165.656583
6012,356.348895,452.226902,544.548056,609.196790,661.093638,813.203687,1042.957823,1505.683260,1993.675851,2501.803955,3026.676756,3565.884264
6558,380.124889,694.663632,910.909023,1014.641848,1096.772706,1331.316335,1499.564923,1637.263717,1756.879436,1864.372564,1964.800405,2064.800405


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00283167 0.00259609 0.00262376 0.00223264 0.00234507 0.00150447
 0.0014788  0.00198585 0.00286376 0.00388476 0.00487641 0.00578892]
Uniform average
0.0029176838226193974
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  317.67966148   588.42599462   856.36438234  1017.93889966
  1265.84386534  1378.85471438  1853.15655     3267.99912456
  6149.44752099 10707.54361724 16834.79505662 24665.45873714]
Uniform average
5741.959010364027



>Prediction metrics (using the actual tes

array([[ 262.40964,  392.91653,  514.86554, ..., 1496.0773 , 1509.619  ,
        1680.277  ],
       [ 353.28513,  501.81992,  669.87604, ..., 2133.7083 , 2234.7188 ,
        2409.2742 ],
       [ 252.67555,  352.23822,  461.844  , ..., 1364.4457 , 1555.0438 ,
        1636.9966 ],
       ...,
       [ 235.57391,  318.47333,  408.0856 , ..., 1082.7181 , 1220.191  ,
        1315.1995 ],
       [ 261.05618,  329.06238,  401.79895, ..., 1102.5237 , 1270.9962 ,
        1390.6703 ],
       [ 228.38713,  361.77817,  604.42584, ..., 1829.4967 , 2000.6831 ,
        2098.0222 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
18734,272.415878,375.439185,514.493553,643.886097,747.590727,916.066641,1040.015594,1143.951321,1243.951321,1343.951321,1443.951321,1543.951321
4705,379.824920,521.215682,665.941597,771.194974,862.628256,1278.519222,1620.910886,1925.256511,2205.288621,2468.050089,2717.724707,2957.038444
19923,266.393108,339.113488,410.032780,495.637421,585.710257,846.315342,1004.317801,1140.298409,1262.905756,1376.372981,1483.126468,1584.750560
24404,229.674954,299.164609,362.525289,407.755484,450.599692,695.822839,868.080212,968.080212,1068.080212,1168.080212,1268.080212,1368.080212
18588,272.415878,375.439185,514.493553,643.886097,756.893082,1128.987106,1437.772193,1713.786164,1968.857585,2209.052866,2437.975733,2657.974211
...,...,...,...,...,...,...,...,...,...,...,...,...
5628,356.348895,452.226902,597.567026,745.629907,869.222015,1132.155062,1331.522600,1500.433889,1650.849941,1788.625194,1917.115339,2038.444031
12800,276.583390,351.927089,425.270805,477.194812,519.248282,644.520181,745.128457,845.128457,945.128457,1045.128457,1145.128457,1245.128457
23617,229.674954,317.430826,409.188766,465.653580,506.846421,629.771799,730.045380,830.045380,930.045380,1030.045380,1130.045380,1230.045380
20876,266.393108,339.113488,410.032780,460.331165,501.127114,622.970359,723.108070,823.108070,923.108070,1023.108070,1123.108070,1223.108070


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00473057 0.00436418 0.00410003 0.00379683 0.00404811 0.00276651
 0.00280681 0.00385083 0.0057057  0.00789917 0.01044459 0.01265524]
Uniform average
0.0055973795953590975
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  585.64413234  1120.93066385  1470.70764185  1850.45563677
  2282.21454819  2538.55538399  3458.81385703  6237.44111545
 12019.34615676 21163.04606326 34839.61381691 52291.29650175]
Uniform average
11654.838793178651



>Saving the grid search info:
The grid s
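The "Raw values" and "Uniform average" entries in the metric blocks above are related in a simple way: the raw values hold one error per $E_c$ column (12 in total) and the uniform average is their unweighted mean. This matches the `multioutput="raw_values"` and `multioutput="uniform_average"` options of scikit-learn's metric functions, although whether `regression_ML` calls them in exactly this way is an assumption, since its source is not shown here. The sketch below recomputes the average for the first MSLE block of this run.

```python
# Per-column MSLE values copied from the first MSLE block of the output above
raw_msle = [
    0.00283167, 0.00259609, 0.00262376, 0.00223264, 0.00234507, 0.00150447,
    0.0014788, 0.00198585, 0.00286376, 0.00388476, 0.00487641, 0.00578892,
]

# Unweighted mean over the 12 output columns
uniform_average = sum(raw_msle) / len(raw_msle)

print(f"Uniform average MSLE: {uniform_average:.8f}")
```

The result agrees with the logged uniform average up to the rounding of the printed raw values.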

## **1.2 Using 16 M-R points**

### A. Predicting Energy at Center $E_c$ Values

#### ->Using rowwise-shuffled data (linear)

In [5]:
# Showing the datasets
# regression_ML("linNS_reg_data_pp8mr16s100_rwshuffled.csv","enrg",0.2).show_datasets()

In [9]:
# Building a regression model
regression_ML("linNS_reg_data_pp8mr16s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="linNS_xgboost_grid_enrg_32X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------

{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 5.0'39.02"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=1.0, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                             

array([[ 481.1048 ,  657.9195 ,  830.5362 , ..., 2045.6249 , 2182.71   ,
        2263.6992 ],
       [ 223.62016,  293.5895 ,  368.29065, ...,  985.2913 , 1046.1017 ,
        1163.5386 ],
       [ 284.33755,  418.65762,  524.325  , ..., 1604.5958 , 1731.2059 ,
        1839.9335 ],
       ...,
       [ 325.89304,  432.34207,  572.9113 , ..., 1921.2827 , 2166.577  ,
        2489.6611 ],
       [ 382.83713,  653.14276,  903.95624, ..., 1889.5414 , 1978.4395 ,
        2172.5974 ],
       [ 241.87897,  317.64224,  390.50223, ..., 1188.4537 , 1245.6538 ,
        1359.4664 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
3358,522.587758,661.261205,793.133138,904.143641,1003.864751,1301.794279,1525.711028,1714.165910,1881.085781,2033.287282,2174.677627,2307.729503
25102,229.674954,299.164609,362.525289,407.755484,444.630868,556.406872,656.406872,756.406872,856.406872,956.406872,1056.406872,1156.406872
18087,284.859202,439.667175,529.611941,592.667261,643.331503,850.302570,1096.454738,1319.666715,1528.218645,1726.357092,1916.604708,2100.605141
3036,556.566872,854.136401,1022.501693,1138.139438,1229.479503,1489.132202,1674.216890,1824.939469,1955.322384,2072.069926,2178.956657,2279.508125
3968,431.415278,617.408129,787.136485,889.967963,962.801898,1171.997297,1323.249424,1447.800384,1556.546353,1656.942131,1756.942131,1856.942131
...,...,...,...,...,...,...,...,...,...,...,...,...
12895,276.583390,351.927089,425.270805,477.194812,519.248282,644.520181,745.128457,845.128457,945.128457,1045.128457,1145.128457,1245.128457
28192,206.135510,267.396054,346.148905,405.123725,452.610739,565.656583,665.656583,765.656583,865.656583,965.656583,1065.656583,1165.656583
6012,356.348895,452.226902,544.548056,609.196790,661.093638,813.203687,1042.957823,1505.683260,1993.675851,2501.803955,3026.676756,3565.884264
6558,380.124889,694.663632,910.909023,1014.641848,1096.772706,1331.316335,1499.564923,1637.263717,1756.879436,1864.372564,1964.800405,2064.800405


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.0026039  0.00204075 0.00219748 0.00198182 0.00220635 0.00116405
 0.00105602 0.00144908 0.0021897  0.00308212 0.00402754 0.00484689]
Uniform average
0.0024038063960921344
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  285.67122384   450.74739106   657.44147996   808.34381922
  1084.13654016  1056.14537335  1319.15844939  2383.20378186
  4737.68210301  8592.89750314 14182.89937178 21084.22456613]
Uniform average
4720.212633575473



>Prediction metrics (using the actual tes

array([[ 275.89886,  384.46347,  511.66406, ..., 1322.1912 , 1458.9725 ,
        1628.397  ],
       [ 386.2611 ,  545.67487,  719.738  , ..., 2214.0288 , 2394.5444 ,
        2493.0718 ],
       [ 255.34201,  352.92404,  446.309  , ..., 1515.1298 , 1667.889  ,
        1666.4211 ],
       ...,
       [ 234.936  ,  314.11807,  395.59357, ..., 1177.5887 , 1340.3473 ,
        1387.3217 ],
       [ 246.87114,  332.71466,  413.51776, ..., 1151.7083 , 1158.2906 ,
        1250.1931 ],
       [ 267.12912,  404.9013 ,  593.2627 , ..., 1832.3763 , 2054.481  ,
        2077.1816 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(25),E_c(50),E_c(75),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800)
18734,272.415878,375.439185,514.493553,643.886097,747.590727,916.066641,1040.015594,1143.951321,1243.951321,1343.951321,1443.951321,1543.951321
4705,379.824920,521.215682,665.941597,771.194974,862.628256,1278.519222,1620.910886,1925.256511,2205.288621,2468.050089,2717.724707,2957.038444
19923,266.393108,339.113488,410.032780,495.637421,585.710257,846.315342,1004.317801,1140.298409,1262.905756,1376.372981,1483.126468,1584.750560
24404,229.674954,299.164609,362.525289,407.755484,450.599692,695.822839,868.080212,968.080212,1068.080212,1168.080212,1268.080212,1368.080212
18588,272.415878,375.439185,514.493553,643.886097,756.893082,1128.987106,1437.772193,1713.786164,1968.857585,2209.052866,2437.975733,2657.974211
...,...,...,...,...,...,...,...,...,...,...,...,...
5628,356.348895,452.226902,597.567026,745.629907,869.222015,1132.155062,1331.522600,1500.433889,1650.849941,1788.625194,1917.115339,2038.444031
12800,276.583390,351.927089,425.270805,477.194812,519.248282,644.520181,745.128457,845.128457,945.128457,1045.128457,1145.128457,1245.128457
23617,229.674954,317.430826,409.188766,465.653580,506.846421,629.771799,730.045380,830.045380,930.045380,1030.045380,1130.045380,1230.045380
20876,266.393108,339.113488,410.032780,460.331165,501.127114,622.970359,723.108070,823.108070,923.108070,1023.108070,1123.108070,1223.108070


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00491249 0.00393187 0.00374358 0.00352936 0.00407043 0.00255479
 0.0024438  0.00341209 0.0052511  0.00764026 0.00997449 0.01236766]
Uniform average
0.005319326622069018
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  592.50398316  1006.08224844  1266.34127448  1577.711493
  2139.33380109  2295.3415214   2913.30724283  5291.17477861
 10703.45970239 19897.25410563 32889.67343926 50486.39618058]
Uniform average
10921.54831423706



>Saving the grid search info:
The grid searc
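In the outputs above the per-column MSE grows by almost two orders of magnitude from `E_c(10)` to `E_c(800)`, while the MSLE stays within the same order of magnitude. A likely reason is that the MSE scales with the square of the target values, whereas the MSLE measures relative deviations. The following minimal sketch illustrates this on hypothetical data (the target values and the 5% over-prediction are made up for illustration and are not taken from the datasets). The MSLE is written in the usual form, the mean of $(\ln(1+y) - \ln(1+\hat{y}))^2$.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def msle(y_true, y_pred):
    """Mean squared log error: mean of (ln(1 + y) - ln(1 + y_hat))^2."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)

# Hypothetical targets at a low and a high energy scale
low_true = [300.0, 400.0, 500.0]
high_true = [2000.0, 2500.0, 3000.0]

# Same relative error (5% over-prediction) at both scales
low_pred = [1.05 * v for v in low_true]
high_pred = [1.05 * v for v in high_true]

mse_low, mse_high = mse(low_true, low_pred), mse(high_true, high_pred)
msle_low, msle_high = msle(low_true, low_pred), msle(high_true, high_pred)

print(f"MSE : low scale = {mse_low:.1f}, high scale = {mse_high:.1f}")
print(f"MSLE: low scale = {msle_low:.5f}, high scale = {msle_high:.5f}")
```

With an identical relative error, the MSE differs strongly between the two scales while the MSLE is nearly unchanged, which is one reason to prefer MSLE as the cross-validation scorer when the targets span a wide range.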

# 2. Quark Stars

## **2.1 Using 8 M-R points**

### A. Predicting Energy at Center $E_c$ Values

#### ->Using rowwise-shuffled data

In [11]:
# Showing the datasets
# regression_ML("QS_reg_data_pp8mr8s100_rwshuffled.csv","enrg",0.2).show_datasets()

In [12]:
# Building a regression model
regression_ML("QS_reg_data_pp8mr8s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="QS_xgboost_grid_enrg_16X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  16
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------


{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 5.0'28.44"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=1.0, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                             

array([[ 706.2533 ,  973.0018 , 1284.5938 , ..., 3385.8337 , 3696.5715 ,
        4000.4858 ],
       [ 772.7114 , 1051.2861 , 1343.9392 , ..., 3429.147  , 3743.6074 ,
        4051.5698 ],
       [ 574.6101 ,  761.26746,  967.792  , ..., 2543.193  , 2828.9873 ,
        3013.7097 ],
       ...,
       [ 387.30804,  634.5704 ,  939.7519 , ..., 2977.5767 , 3333.527  ,
        3579.0835 ],
       [ 377.77908,  594.8504 ,  849.06274, ..., 2692.773  , 2972.242  ,
        3232.5784 ],
       [ 413.46127,  675.2379 ,  982.1841 , ..., 3071.4062 , 3340.1953 ,
        3637.2332 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
22993,740.000000,1010.000000,1310.000000,1610.000000,1910.000000,2210.000000,2510.000000,2810.000000,3110.000000,3410.000000,3710.000000,4010.000000
25215,788.000000,1058.000000,1358.000000,1658.000000,1958.000000,2258.000000,2558.000000,2858.000000,3158.000000,3458.000000,3758.000000,4058.000000
84960,585.603738,751.275503,946.422873,1150.377228,1361.233627,1577.688496,1798.803280,2023.875598,2252.363573,2483.838743,2717.955370,2954.429673
36666,1022.000000,1292.000000,1592.000000,1892.000000,2192.000000,2492.000000,2792.000000,3092.000000,3392.000000,3692.000000,3992.000000,4292.000000
57455,253.586613,399.454662,579.179473,771.717517,973.679929,1182.951297,1398.106032,1618.129325,1842.268362,2069.946400,2300.710001,2534.195001
...,...,...,...,...,...,...,...,...,...,...,...,...
12895,530.000000,800.000000,1100.000000,1400.000000,1700.000000,2000.000000,2300.000000,2600.000000,2900.000000,3200.000000,3500.000000,3800.000000
60960,448.606299,697.617208,978.872278,1262.872680,1548.679461,1835.791373,2123.903389,2412.813509,2702.379771,2992.497888,3283.088582,3574.089911
6012,392.000000,662.000000,962.000000,1262.000000,1562.000000,1862.000000,2162.000000,2462.000000,2762.000000,3062.000000,3362.000000,3662.000000
63107,371.632374,575.110491,814.020809,1061.127719,1313.815373,1570.619557,1830.622511,2093.203983,2357.921515,2624.446347,2892.526283,3161.962842


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00156815 0.00049144 0.00073611 0.00098453 0.0011308  0.00122742
 0.00126435 0.00128158 0.00128485 0.00126983 0.00123853 0.0012098 ]
Uniform average
0.0011406157401113001
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  398.98628158   281.86629878   630.38579539  1276.18007178
  2157.43415425  3272.95859119  4518.63026131  5938.08334972
  7496.93759597  9115.55274989 10766.89505601 12501.36536895]
Uniform average
4862.93963123551



>Prediction metrics (using the actual test

array([[ 877.5192 , 1140.0173 , 1443.6364 , ..., 3560.9958 , 3828.6099 ,
        4135.832  ],
       [ 504.03897,  687.2804 ,  860.36145, ..., 2370.7627 , 2704.4287 ,
        2948.7195 ],
       [ 417.9901 ,  639.0056 ,  907.51636, ..., 2783.6125 , 3073.333  ,
        3355.1763 ],
       ...,
       [ 733.8327 , 1007.1312 , 1314.2273 , ..., 3423.5312 , 3713.773  ,
        4022.6768 ],
       [ 282.66083,  454.01807,  679.6164 , ..., 2324.9019 , 2674.832  ,
        2847.0513 ],
       [ 159.25996,  311.2515 ,  501.14197, ..., 2101.867  , 2309.54   ,
        2546.2795 ]], dtype=float32)

Actual values of "enrg"


Unnamed: 0,E_c(10),E_c(100),E_c(200),E_c(300),E_c(400),E_c(500),E_c(600),E_c(700),E_c(800),E_c(900),E_c(1000),E_c(1100)
30455,894.000000,1164.000000,1464.000000,1764.000000,2064.000000,2364.000000,2664.000000,2964.000000,3264.000000,3564.000000,3864.000000,4164.000000
79774,510.504056,683.590590,888.096276,1101.825877,1322.531935,1548.754039,1779.479738,2013.972942,2251.678594,2492.165863,2735.092390,2980.180765
59601,402.825414,639.761740,910.281125,1185.138089,1462.843745,1742.605326,2023.941165,2306.532634,2590.155975,2874.646917,3159.880634,3445.759642
5002,372.000000,642.000000,942.000000,1242.000000,1542.000000,1842.000000,2142.000000,2442.000000,2742.000000,3042.000000,3342.000000,3642.000000
7958,432.000000,702.000000,1002.000000,1302.000000,1602.000000,1902.000000,2202.000000,2502.000000,2802.000000,3102.000000,3402.000000,3702.000000
...,...,...,...,...,...,...,...,...,...,...,...,...
86697,702.984723,895.407772,1118.459167,1348.627279,1584.240652,1824.194819,2067.714786,2314.231930,2563.314392,2814.625021,3067.894625,3322.904208
72344,469.621960,671.218624,906.579302,1149.706769,1398.333170,1651.117518,1907.188779,2165.943266,2426.942171,2689.854660,2954.423978,3220.446116
20260,682.000000,952.000000,1252.000000,1552.000000,1852.000000,2152.000000,2452.000000,2752.000000,3052.000000,3352.000000,3652.000000,3952.000000
60569,295.905026,461.884118,663.406626,876.667874,1098.262390,1326.168921,1559.071276,1796.055601,2036.456736,2279.772661,2525.613435,2773.669006


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00209718 0.00067894 0.00107965 0.00145515 0.00166509 0.00178909
 0.00186027 0.00188241 0.00188178 0.00184091 0.00180875 0.00178627]
Uniform average
0.0016521251851211065
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  540.37583452   371.91589337   845.89333364  1780.80055927
  3060.55400056  4625.62561224  6487.35967031  8543.6300424
 10773.32520585 13002.69835605 15488.45405048 18206.53563798]
Uniform average
6977.264016387886



>Saving the grid search info:
The grid sea
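A quick way to compare the runs completed so far is the ratio between the uniform-average MSLE of the test-set block and that of the first metrics block of each run. The first block appears to hold the training-set metrics, but its header is truncated in the output, so this is an assumption. The sketch below only rearranges numbers already printed above. A ratio well above 1 would suggest that the model fits the first dataset more closely than the held-out test data.

```python
# Uniform-average MSLE values copied from the logs above:
# (first metrics block, test-set block) for each completed run
runs = {
    "NS, 8 M-R points": (0.0029176838226193974, 0.0055973795953590975),
    "NS, 16 M-R points": (0.0024038063960921344, 0.005319326622069018),
    "QS, 8 M-R points": (0.0011406157401113001, 0.0016521251851211065),
}

# Ratio of test-set MSLE to first-block MSLE for each run
ratios = {name: test / first for name, (first, test) in runs.items()}

for name, ratio in ratios.items():
    print(f"{name}: test / first-block MSLE ratio = {ratio:.2f}")
```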

## **2.2 Using 16 M-R points**

### A. Predicting Energy at Center $E_c$ Values

#### ->Using rowwise-shuffled data

In [None]:
# Showing the datasets
# regression_ML("QS_reg_data_pp8mr16s100_rwshuffled.csv","enrg",0.2).show_datasets()

In [13]:
# Building a regression model
regression_ML("QS_reg_data_pp8mr16s100_rwshuffled.csv", mag_reg="enrg", test_ratio=0.2).train_test(
    "xgboost", 5, xgboost_grid, "msle", cores_par=18, filesave="QS_xgboost_grid_enrg_32X_rwsh"
)

TRAINING AND ASSESSING A MACHINE LEARNING REGRESSION MODEL


>Preliminaries
>> DATA INFO AND SCALING:
-------------------------------------------------------------------------------------------------------------------
Y (response) data type: "enrg"
Number of Y columns:  12
X (explanatory) data type: "Mass" and "Radius"
Number of X columns:  32
The scaling of the X (explanatory) data has been completed
-------------------------------------------------------------------------------------------------------------------
>> CROSS-VALIDATION SETTINGS:
-------------------------------------------------------------------------------------------------------------------
The KFold cross-validator has been initialized with 5 n_splits
The cross-validation scorer has been initialized with the "Mean_Squared_Log_Error" as metric
-------------------------------------------------------------------------------------------------------------------
>> ESTIMATOR INFO:
------------------------------------------


{'estimator__n_estimators': [50, 100],
 'estimator__max_depth': [3, 5, 7],
 'estimator__learning_rate': [0.05, 0.1],
 'estimator__subsample': [0.7, 1.0],
 'estimator__colsample_bytree': [0.7, 1.0],
 'estimator__reg_alpha': [0.1],
 'estimator__reg_lambda': [1.0, 5.0]}

-------------------------------------------------------------------------------------------------------------------
>> FITTING PROCEDURE OVERVIEW:
-------------------------------------------------------------------------------------------------------------------
The grid search has been initialized


Ongoing fitting process...
Fitting 5 folds for each of 96 candidates, totalling 480 fits


The fitting process has been completed
Elapsed fitting time: 9.0'58.97"
Available CPU cores: 18
-------------------------------------------------------------------------------------------------------------------
>> RESULTS:
-------------------------------------------------------------------------------------------------------------------
Best model:   MultiOutputRegressor(estimator=XGBRegressor(base_score=None, booster=None,
                                            callbacks=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=1.0, device=None,
                                            early_stopping_rounds=None,
                                            enable_categorical=False,
                                            eval_metric=None,
                                            feature_types=None,
                             

array([[ 736.2286 , 1007.7162 , 1323.6672 , ..., 3468.55   , 3767.7537 ,
        4058.8462 ],
       [ 754.278  , 1044.6506 , 1341.2526 , ..., 3465.8506 , 3751.5608 ,
        4061.6045 ],
       [ 568.5365 ,  758.7147 ,  960.09796, ..., 2643.576  , 2933.8308 ,
        3165.125  ],
       ...,
       [ 402.1051 ,  645.5056 ,  948.66077, ..., 3008.5947 , 3298.3113 ,
        3528.0334 ],
       [ 358.00214,  571.5511 ,  808.7052 , ..., 2661.8286 , 3030.0173 ,
        3253.1184 ],
       [ 433.11124,  676.5715 ,  954.10333, ..., 2940.4678 , 3181.7388 ,
        3460.7673 ]], dtype=float32)

Actual values of "enrg"


| Row index | E_c(10) | E_c(100) | E_c(200) | E_c(300) | E_c(400) | E_c(500) | E_c(600) | E_c(700) | E_c(800) | E_c(900) | E_c(1000) | E_c(1100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22993 | 740.000000 | 1010.000000 | 1310.000000 | 1610.000000 | 1910.000000 | 2210.000000 | 2510.000000 | 2810.000000 | 3110.000000 | 3410.000000 | 3710.000000 | 4010.000000 |
| 25215 | 788.000000 | 1058.000000 | 1358.000000 | 1658.000000 | 1958.000000 | 2258.000000 | 2558.000000 | 2858.000000 | 3158.000000 | 3458.000000 | 3758.000000 | 4058.000000 |
| 84960 | 585.603738 | 751.275503 | 946.422873 | 1150.377228 | 1361.233627 | 1577.688496 | 1798.803280 | 2023.875598 | 2252.363573 | 2483.838743 | 2717.955370 | 2954.429673 |
| 36666 | 1022.000000 | 1292.000000 | 1592.000000 | 1892.000000 | 2192.000000 | 2492.000000 | 2792.000000 | 3092.000000 | 3392.000000 | 3692.000000 | 3992.000000 | 4292.000000 |
| 57455 | 253.586613 | 399.454662 | 579.179473 | 771.717517 | 973.679929 | 1182.951297 | 1398.106032 | 1618.129325 | 1842.268362 | 2069.946400 | 2300.710001 | 2534.195001 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12895 | 530.000000 | 800.000000 | 1100.000000 | 1400.000000 | 1700.000000 | 2000.000000 | 2300.000000 | 2600.000000 | 2900.000000 | 3200.000000 | 3500.000000 | 3800.000000 |
| 60960 | 448.606299 | 697.617208 | 978.872278 | 1262.872680 | 1548.679461 | 1835.791373 | 2123.903389 | 2412.813509 | 2702.379771 | 2992.497888 | 3283.088582 | 3574.089911 |
| 6012 | 392.000000 | 662.000000 | 962.000000 | 1262.000000 | 1562.000000 | 1862.000000 | 2162.000000 | 2462.000000 | 2762.000000 | 3062.000000 | 3362.000000 | 3662.000000 |
| 63107 | 371.632374 | 575.110491 | 814.020809 | 1061.127719 | 1313.815373 | 1570.619557 | 1830.622511 | 2093.203983 | 2357.921515 | 2624.446347 | 2892.526283 | 3161.962842 |


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.0011987  0.00038251 0.00062869 0.00084786 0.00098107 0.00105994
 0.00110749 0.00109798 0.00110332 0.00109554 0.00106923 0.00104292]
Uniform average
0.0009679377990218642
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  294.44387048   212.46808196   521.11590643  1096.88785222
  1878.46523105  2843.27111858  3975.10587592  5125.56243893
  6476.29095792  7919.9501585   9342.0550522  10848.9608822 ]
Uniform average
4211.214785532428
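
The "Raw values" and "Uniform average" entries above can be reproduced in a few lines. The sketch below is a plain-Python illustration with hypothetical helper names (`per_column_msle`, `per_column_mse`), not the code used by `regression_ML`; scikit-learn's `mean_squared_log_error` and `mean_squared_error` expose the same two views through their `multioutput` argument. The toy sample takes the first three rows and three columns printed above and assumes that predicted and actual rows are listed in the same order.

```python
from math import log1p
from statistics import fmean


def per_column_msle(y_true, y_pred):
    """Mean squared log error of each output column ("raw values")."""
    n_cols = len(y_true[0])
    return [
        fmean([(log1p(t[j]) - log1p(p[j])) ** 2 for t, p in zip(y_true, y_pred)])
        for j in range(n_cols)
    ]


def per_column_mse(y_true, y_pred):
    """Mean squared error of each output column ("raw values")."""
    n_cols = len(y_true[0])
    return [
        fmean([(t[j] - p[j]) ** 2 for t, p in zip(y_true, y_pred)])
        for j in range(n_cols)
    ]


# Toy sample: first three rows, first three E_c columns of the output above
y_true = [[740.0, 1010.0, 1310.0],
          [788.0, 1058.0, 1358.0],
          [585.603738, 751.275503, 946.422873]]
y_pred = [[736.2286, 1007.7162, 1323.6672],
          [754.278, 1044.6506, 1341.2526],
          [568.5365, 758.7147, 960.09796]]

toy_msle = per_column_msle(y_true, y_pred)
toy_mse = per_column_mse(y_true, y_pred)
print("toy MSLE per column:", toy_msle, "uniform average:", fmean(toy_msle))
print("toy MSE per column: ", toy_mse, "uniform average:", fmean(toy_mse))

# The reported uniform averages are the plain means of the 12 reported raw values
msle_raw_reported = [0.0011987, 0.00038251, 0.00062869, 0.00084786,
                     0.00098107, 0.00105994, 0.00110749, 0.00109798,
                     0.00110332, 0.00109554, 0.00106923, 0.00104292]
mse_raw_reported = [294.44387048, 212.46808196, 521.11590643, 1096.88785222,
                    1878.46523105, 2843.27111858, 3975.10587592, 5125.56243893,
                    6476.29095792, 7919.9501585, 9342.0550522, 10848.9608822]
msle_avg = fmean(msle_raw_reported)
mse_avg = fmean(mse_raw_reported)
print("mean of reported MSLE raw values:", msle_avg)
print("mean of reported MSE raw values: ", mse_avg)
```

The MSE grows steadily from the first to the last column because the targets themselves grow, whereas the MSLE works on logarithms and stays at a similar level from the third column onward.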



>Prediction metrics (using the actual tes

array([[ 932.29926, 1200.1312 , 1501.4347 , ..., 3590.5251 , 3892.3032 ,
        4184.2915 ],
       [ 489.4748 ,  669.9615 ,  886.147  , ..., 2518.4534 , 2758.5964 ,
        2953.9858 ],
       [ 398.2568 ,  627.50555,  901.1772 , ..., 2818.1724 , 3100.0662 ,
        3390.0764 ],
       ...,
       [ 725.9987 ,  977.0255 , 1283.879  , ..., 3424.8835 , 3733.1216 ,
        4026.9646 ],
       [ 289.7096 ,  458.8671 ,  663.1487 , ..., 2360.6519 , 2625.2063 ,
        3013.047  ],
       [ 159.36995,  306.7691 ,  482.96332, ..., 1975.1124 , 2284.8962 ,
        2526.647  ]], dtype=float32)

Actual values of "enrg"


| Row index | E_c(10) | E_c(100) | E_c(200) | E_c(300) | E_c(400) | E_c(500) | E_c(600) | E_c(700) | E_c(800) | E_c(900) | E_c(1000) | E_c(1100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30455 | 894.000000 | 1164.000000 | 1464.000000 | 1764.000000 | 2064.000000 | 2364.000000 | 2664.000000 | 2964.000000 | 3264.000000 | 3564.000000 | 3864.000000 | 4164.000000 |
| 79774 | 510.504056 | 683.590590 | 888.096276 | 1101.825877 | 1322.531935 | 1548.754039 | 1779.479738 | 2013.972942 | 2251.678594 | 2492.165863 | 2735.092390 | 2980.180765 |
| 59601 | 402.825414 | 639.761740 | 910.281125 | 1185.138089 | 1462.843745 | 1742.605326 | 2023.941165 | 2306.532634 | 2590.155975 | 2874.646917 | 3159.880634 | 3445.759642 |
| 5002 | 372.000000 | 642.000000 | 942.000000 | 1242.000000 | 1542.000000 | 1842.000000 | 2142.000000 | 2442.000000 | 2742.000000 | 3042.000000 | 3342.000000 | 3642.000000 |
| 7958 | 432.000000 | 702.000000 | 1002.000000 | 1302.000000 | 1602.000000 | 1902.000000 | 2202.000000 | 2502.000000 | 2802.000000 | 3102.000000 | 3402.000000 | 3702.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 86697 | 702.984723 | 895.407772 | 1118.459167 | 1348.627279 | 1584.240652 | 1824.194819 | 2067.714786 | 2314.231930 | 2563.314392 | 2814.625021 | 3067.894625 | 3322.904208 |
| 72344 | 469.621960 | 671.218624 | 906.579302 | 1149.706769 | 1398.333170 | 1651.117518 | 1907.188779 | 2165.943266 | 2426.942171 | 2689.854660 | 2954.423978 | 3220.446116 |
| 20260 | 682.000000 | 952.000000 | 1252.000000 | 1552.000000 | 1852.000000 | 2152.000000 | 2452.000000 | 2752.000000 | 3052.000000 | 3352.000000 | 3652.000000 | 3952.000000 |
| 60569 | 295.905026 | 461.884118 | 663.406626 | 876.667874 | 1098.262390 | 1326.168921 | 1559.071276 | 1796.055601 | 2036.456736 | 2279.772661 | 2525.613435 | 2773.669006 |


-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED LOG ERROR (MSLE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[0.00175893 0.0006154  0.00110131 0.00147665 0.0016911  0.00181058
 0.00185116 0.00185543 0.00185403 0.00182131 0.00179907 0.00175372]
Uniform average
0.0016157229373648734
-------------------------------------------------------------------------------------------------------------------
>> MEAN SQUARED ERROR (MSE) RESULTS:
-------------------------------------------------------------------------------------------------------------------
Raw values
[  429.57787014   295.87617061   796.59694411  1750.92126939
  3034.44222193  4597.88718703  6374.83816993  8309.39433026
 10499.01300831 12753.2694217  15252.34437735 17702.17081586]
Uniform average
6816.360982217268
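
To read the two sets of uniform averages side by side, it can help to take square roots: the RMSE is in the same units as $E_c$, and the RMSLE is roughly a typical relative error when the errors are small compared with the targets. The sketch below only post-processes the numbers printed above; treating the first metrics block as the training-set result is an assumption, since its label is cut off in the output.

```python
from math import sqrt

# Uniform averages copied from the two metrics blocks above
# (first block: assumed training set; second block: test set)
msle_first, mse_first = 0.0009679377990218642, 4211.214785532428
msle_second, mse_second = 0.0016157229373648734, 6816.360982217268

rmsle_first, rmsle_second = sqrt(msle_first), sqrt(msle_second)
rmse_first, rmse_second = sqrt(mse_first), sqrt(mse_second)

print(f"first block : RMSLE = {rmsle_first:.4f}, RMSE = {rmse_first:.2f}")
print(f"second block: RMSLE = {rmsle_second:.4f}, RMSE = {rmse_second:.2f}")

# Ratio of the second block to the first, as a rough measure of the gap
msle_ratio = msle_second / msle_first
mse_ratio = mse_second / mse_first
print(f"MSLE ratio = {msle_ratio:.2f}, MSE ratio = {mse_ratio:.2f}")
```

Under these assumptions the errors in the second block are larger than in the first by a factor well below two in both metrics, which is the usual direction of the gap between data seen and not seen during fitting.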



>Saving the grid search info:
The grid se