# Initial FeO prediction

Start by importing MagmaPEC and MagmaPandas (or regular pandas):

In [1]:
import MagmaPEC as mpc
import MagmaPandas as mp

Import your melt or whole-rock data:

In [4]:
wholerock_file = "./data/wholerock.csv"

wholerock = mp.read_melt(wholerock_file, index_col=["name"])
wholerock.head()

name,SiO2,TiO2,Al2O3,MnO,MgO,CaO,Na2O,K2O,P2O5,Cr2O3,FeO,total
PI032,46.962818,3.447381,16.650124,0.174618,7.004558,9.400741,3.131253,1.290536,0.635346,0.00963,11.350806,100.057812
PI041,46.292709,3.553973,17.191874,0.170671,6.221699,9.443416,2.70111,1.076079,0.610387,0.007262,10.986336,98.255517
PI053,47.515736,3.084096,15.182294,0.169948,8.250577,9.640357,2.298223,1.013332,0.448063,0.009225,10.955438,98.567288
PI054,47.524357,3.369594,16.456587,0.174671,6.569622,9.216309,2.653994,1.367337,0.648534,0.02954,10.95366,98.964204
PI055,46.452168,3.287018,16.039806,0.169084,7.39606,9.469899,2.035204,1.072211,0.539792,0.011676,10.896147,97.369065


Create the FeOi prediction object and initialise it with your data. Make sure to remove the FeO column from the melt compositions used to predict FeOi. 

In [25]:
x = wholerock.drop(columns=["FeO"])
FeOi_predict = mpc.FeOi_prediction(x=x, FeO=wholerock["FeO"])

Select the columns in *x* that you do not want to use to predict initial FeO contents. Here we do not want to use minor elements and the totals column:

In [16]:
do_not_use = ["MnO", "P2O5", "Cr2O3", "total"]
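If you work with regular pandas instead of MagmaPandas, the same preparation could look roughly like the sketch below. It only uses `pandas.read_csv` and `DataFrame.drop`; the two inline rows are copied from the table above so that the example is self-contained, and the `candidates` list is an illustrative name, not part of MagmaPEC.

```python
import io

import pandas as pd

# Two rows copied from the whole-rock table above, inlined so the sketch runs
# without the ./data/wholerock.csv file.
csv_text = """name,SiO2,TiO2,Al2O3,MnO,MgO,CaO,Na2O,K2O,P2O5,Cr2O3,FeO,total
PI032,46.962818,3.447381,16.650124,0.174618,7.004558,9.400741,3.131253,1.290536,0.635346,0.00963,11.350806,100.057812
PI041,46.292709,3.553973,17.191874,0.170671,6.221699,9.443416,2.70111,1.076079,0.610387,0.007262,10.986336,98.255517
"""

wholerock = pd.read_csv(io.StringIO(csv_text), index_col="name")

# Separate the quantity to predict (FeO) from the candidate predictors.
FeO = wholerock["FeO"]
x = wholerock.drop(columns=["FeO"])

# Columns that should not be used as predictors.
do_not_use = ["MnO", "P2O5", "Cr2O3", "total"]

# Hypothetical helper list: the oxides that remain available for the regression.
candidates = [col for col in x.columns if col not in do_not_use]
print(candidates)
```

The remaining candidates are the same oxides that appear as columns in the model-fit table further down.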

The *calculate_model_fits* method calculates best-fit multiple linear regressions for all possible combinations of elements in *x*. For each new regression, the element whose removal results in the lowest regression F-test p-value is removed from the dataset.

It returns a dataframe with fitted coefficients and misfit statistics (RMSE, cross-validated RMSE and R<sup>2</sup>).

In [18]:
model_fits = FeOi_predict.calculate_model_fits(exclude=do_not_use)
model_fits

model,intercept,SiO2,TiO2,Al2O3,MgO,CaO,Na2O,K2O,RMSE,CV-RMSE,deltaRMSE,r2
6,24.162353,-0.18445,1.085716,-0.337465,,-0.292231,0.344686,-0.579578,0.151416,0.278102,0.126686,0.930966
5,21.878478,-0.14473,1.179186,-0.293183,,-0.29675,,-0.300763,0.165387,0.273553,0.108166,0.91764
4,22.442147,-0.168107,1.114827,-0.289476,,-0.26028,,,0.171032,0.232349,0.061317,0.911921
3,11.585125,,1.463365,-0.230722,,-0.169262,,,0.206434,0.237533,0.0311,0.871685
2,9.213242,,1.640962,-0.217524,,,,,0.253769,0.294748,0.040978,0.806092
1,5.357205,,1.725906,,,,,,0.31943,0.35567,0.03624,0.692766
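The stepwise removal of elements can be illustrated with a small backward-elimination sketch. This is not MagmaPEC's implementation: it runs on synthetic data, the function names are hypothetical, and it assumes that the F-test in question is the overall regression F-test. Under that assumption, among candidate models with the same number of predictors and samples, the lowest p-value corresponds to the lowest residual sum of squares (RSS), so RSS is used here as a stand-in to avoid a dependency on a statistics package.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of a least-squares fit y = b0 + X @ b."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def backward_elimination(X, y, names):
    """Return the predictor sets visited while dropping one column per step."""
    cols = list(range(X.shape[1]))
    steps = [[names[k] for k in cols]]
    while len(cols) > 1:
        # Fit every model that has one column removed and keep the best one
        # (lowest RSS, assumed equivalent to the lowest overall F-test p-value).
        trial = {c: rss(X[:, [k for k in cols if k != c]], y) for c in cols}
        drop = min(trial, key=trial.get)
        cols.remove(drop)
        steps.append([names[k] for k in cols])
    return steps


# Synthetic calibration data: y depends strongly on "a", weakly on "b",
# and not at all on "c".
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=50)

steps = backward_elimination(X, y, ["a", "b", "c"])
for predictors in steps:
    print(len(predictors), predictors)
```

Each entry in `steps` plays the role of one row in the table above: a model with a given number of predictors.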


RMSE and R<sup>2</sup> are both calculated on the entire calibration dataset and indicate how well the model predicts FeO. For RMSE, lower values are better, while for R<sup>2</sup>, values close to 1 are best. To check for overfitting, cross-validated RMSEs (CV-RMSE) are also calculated; a large difference between CV-RMSE and RMSE (deltaRMSE = CV-RMSE - RMSE) can indicate overfitting. As long as RMSE and R<sup>2</sup> values are acceptable, the model with the smallest deltaRMSE should be selected. Here, models 1, 2 and 3 have similarly small deltaRMSE values (0.03 to 0.04), with RMSE and R<sup>2</sup> improving from model 1 to model 3, and any of these models would work well for predicting melt FeO contents. deltaRMSE increases in models 4, 5 and 6 (0.06 to 0.13), so overfitting might be an issue there.
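To make the misfit statistics concrete, the sketch below computes RMSE, a cross-validated RMSE and their difference for a linear regression on synthetic data. The cross-validation scheme used by MagmaPEC is not stated here, so leave-one-out cross-validation is an assumption made for illustration, and all function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y = b0 + X @ b; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def rmse(y, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))


def loo_cv_rmse(X, y):
    """Leave-one-out CV: refit without sample i, then predict sample i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b0, b = fit_ols(X[keep], y[keep])
        preds[i] = b0 + X[i] @ b
    return rmse(y, preds)


# Synthetic calibration data with three predictors and some noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 10.0 + X @ np.array([1.5, -0.2, -0.2]) + rng.normal(scale=0.2, size=30)

b0, b = fit_ols(X, y)
train_rmse = rmse(y, b0 + X @ b)   # misfit on the full calibration dataset
cv_rmse = loo_cv_rmse(X, y)        # misfit on samples left out of the fit
delta_rmse = cv_rmse - train_rmse  # large values hint at overfitting

print(train_rmse, cv_rmse, delta_rmse)
```

Because each left-out sample is predicted by a model that never saw it, the cross-validated RMSE of a least-squares fit cannot be smaller than the RMSE on the full dataset; the gap grows as the model starts fitting noise.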

Next we use the results from the previous step to select our model. Here we use model 3, where TiO2, Al2O3 and CaO are used as predictors. In the *select_predictors* method, you pass your preferred model number to the *idx* parameter.

In [19]:
FeOi_predict.select_predictors(idx=3)

We can check whether the right predictors are used with the *predictors* attribute:

In [21]:
FeOi_predict.predictors

array(['TiO2', 'Al2O3', 'CaO'], dtype=object)

Coefficients of the linear regression were automatically calculated by the *select_predictors* method and we can access these with the *intercept*, *intercept_error*, *slopes* and *slopes_error* attributes.

Fitted coefficients:

In [22]:
FeOi_predict.intercept, FeOi_predict.slopes

(11.585124688327738,
 TiO2     1.463365
 Al2O3   -0.230722
 CaO     -0.169262
 dtype: float64)

and their errors as standard deviations:

In [23]:
FeOi_predict.intercept_error, FeOi_predict.slopes_error

(1.2032554136110751,
 TiO2     0.164396
 Al2O3    0.048417
 CaO      0.049363
 dtype: float64)
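The fitted coefficients define a linear model of the form FeO = intercept + sum(slope × oxide concentration). As a minimal sketch, the plain-Python example below applies the model 3 coefficients shown above to sample PI032 from the whole-rock table. The function `predict_FeO` is a hypothetical helper written for this illustration and is not part of MagmaPEC.

```python
# Coefficients of model 3, copied (rounded) from the output above.
intercept = 11.585125
slopes = {"TiO2": 1.463365, "Al2O3": -0.230722, "CaO": -0.169262}

# Composition of sample PI032 (wt.%), copied from the whole-rock table.
sample = {"TiO2": 3.447381, "Al2O3": 16.650124, "CaO": 9.400741}
FeO_measured = 11.350806


def predict_FeO(composition, intercept, slopes):
    """Hypothetical helper: evaluate intercept + sum(slope * concentration)."""
    return intercept + sum(slopes[oxide] * composition[oxide] for oxide in slopes)


FeO_predicted = predict_FeO(sample, intercept, slopes)
residual = FeO_measured - FeO_predicted

print(f"predicted FeO: {FeO_predicted:.2f} wt.%")
print(f"measured FeO:  {FeO_measured:.2f} wt.%")
print(f"residual:      {residual:.2f} wt.%")
```

For this sample the prediction is roughly 11.2 wt.% FeO, within about 0.15 wt.% of the measured value, which is in line with the RMSE reported for model 3.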

The *random_sample_coefficients* method randomly samples fitted coefficients within their errors and calculates the matching x-intercept value. This method is used internally in the Monte Carlo PEC correction model to propagate FeOi prediction errors.

In [24]:
FeOi_predict.random_sample_coefficients(n=5)

sample,TiO2,Al2O3,CaO,intercept
0,1.430664,-0.171509,-0.090637,9.96875
1,1.369141,-0.217407,-0.078552,10.804688
2,1.412109,-0.247437,-0.108765,11.453125
3,1.549805,-0.202271,-0.215332,11.28125
4,1.334961,-0.185547,-0.065186,10.257812
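The idea of sampling coefficients within their errors can be sketched with independent Gaussian draws, as below. This is a simplified illustration and not MagmaPEC's implementation: it treats every coefficient, including the intercept, as independent and normally distributed, whereas *random_sample_coefficients* calculates an intercept that matches the sampled slopes. The function name `sample_coefficients` and the `seed` parameter are assumptions made for this example.

```python
import random

# Fitted coefficients and their standard deviations, copied (rounded) from above.
intercept, intercept_error = 11.585125, 1.203255
slopes = {"TiO2": 1.463365, "Al2O3": -0.230722, "CaO": -0.169262}
slopes_error = {"TiO2": 0.164396, "Al2O3": 0.048417, "CaO": 0.049363}


def sample_coefficients(n, seed=None):
    """Hypothetical helper: draw n coefficient sets from independent Gaussians."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        row = {oxide: rng.gauss(slopes[oxide], slopes_error[oxide]) for oxide in slopes}
        # Simplification: the intercept is drawn independently here, rather than
        # recalculated to match the sampled slopes.
        row["intercept"] = rng.gauss(intercept, intercept_error)
        samples.append(row)
    return samples


samples = sample_coefficients(n=5, seed=42)
for row in samples:
    print({name: round(value, 3) for name, value in row.items()})
```

In a Monte Carlo setting, each sampled coefficient set would be used for one iteration, so that the spread in the final results reflects the uncertainty of the FeOi prediction.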
