In [27]:
import pandas as pd
import numpy as np

from scipy.stats import norm

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, RBF, RationalQuadratic
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [None]:
# Load the data from the Excel file
df = pd.read_excel(r'C:\Users\uqkmuroi\Desktop\count encode and vanillate tot.xlsx')

In [19]:
# Specify the feature columns (X: all but the last column) and the target column (y: the last column)
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

In [None]:
# Single 90:10 train/test split (disabled; K-fold cross-validation is used instead)
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

In [20]:
# setup K-fold cross validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)

In [43]:
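# Matern kernel with nu=0.01 models a very rough function; in scikit-learn, nu values
# other than 0.5, 1.5, 2.5 and inf are also considerably slower to evaluate.
model = GaussianProcessRegressor(
    kernel=Matern(length_scale=1.0, nu=0.01),
    alpha=1e-4,
    random_state=42,
)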

# Store metrics (e.g., Mean Squared Error and R²) for each fold
mse_list = []
r2_list = []

# Perform K-fold cross-validation
for train_index, test_index in kf.split(X):
    # Split the data into this fold's training and test rows
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the model on the training rows (fit() restarts from the initial kernel each time)
    model.fit(X_train, y_train)

    # Predict on the held-out rows
    y_pred = model.predict(X_test)

    # Calculate performance metrics for this fold
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    # Store the results
    mse_list.append(mse)
    r2_list.append(r2)

# Print the results for all folds
print(f"Mean Squared Error for each fold: {mse_list}")
print(f"R² score for each fold: {r2_list}")

# Calculate and print the average performance metrics across all folds
print(f"Average Mean Squared Error: {np.mean(mse_list)}")
print(f"Average R² score: {np.mean(r2_list)}")

Mean Squared Error for each fold: [np.float64(2219543.8925850224), np.float64(1731155.6128639798), np.float64(3484092.70193188), np.float64(1703488.6541680393), np.float64(1968691.5350832194)]
R² score for each fold: [0.03581308477260792, 0.06872823414253004, -0.2983630816561318, 0.16757513888236686, 0.05306982379213987]
Average Mean Squared Error: 2221394.479326428
Average R² score: 0.00536463998670258
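
An average R² near zero means the model does little better than predicting the mean of each test fold. Two things in the cells above may contribute: `StandardScaler` is imported but never applied, and with the default `normalize_y=False` scikit-learn's GP uses a zero-mean prior on a target whose errors are in the thousands. The sketch below is one possible variant, not a verified fix: it standardizes the features inside each fold, sets `normalize_y=True`, and replaces the fixed `alpha` with a learned `WhiteKernel` noise term. The helper name `cross_validate_scaled_gp`, the choice `nu=1.5`, and the `WhiteKernel` are assumptions introduced here; whether they improve the scores on this dataset is untested.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cross_validate_scaled_gp(X, y, n_splits=5, seed=42):
    """Hypothetical helper: K-fold CV of a GP fitted on standardized features."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mse_list, r2_list = [], []

    for train_index, test_index in kf.split(X):
        # Build a fresh pipeline per fold so the scaler only ever sees training rows.
        model = make_pipeline(
            StandardScaler(),
            GaussianProcessRegressor(
                # Assumed kernel: Matern(nu=1.5) plus a learned noise term.
                kernel=Matern(length_scale=1.0, nu=1.5) + WhiteKernel(noise_level=1.0),
                normalize_y=True,  # center and scale the target before fitting
                random_state=seed,
            ),
        )
        model.fit(X[train_index], y[train_index])
        y_pred = model.predict(X[test_index])

        mse_list.append(float(mean_squared_error(y[test_index], y_pred)))
        r2_list.append(float(r2_score(y[test_index], y_pred)))

    return mse_list, r2_list
```

With the notebook's arrays this would be called as `mse_list, r2_list = cross_validate_scaled_gp(X, y)`, and the per-fold and average scores printed as before.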


The expression you've provided appears to be:

\[
\text{sgn}(\nu_{\text{ref}}) \propto \frac{\Delta G_r}{RT}
\]

Here, **sgn** denotes the **sign function**, as previously mentioned. In this case, it likely indicates the direction of a signed quantity, \(\nu_{\text{ref}}\), which could be a reference reaction rate, a flux, or a velocity component. The sign function returns:

- **+1** if \(\nu_{\text{ref}} > 0\),
- **0** if \(\nu_{\text{ref}} = 0\),
- **-1** if \(\nu_{\text{ref}} < 0\).

### Explanation of the other terms:

- **\(\Delta G_r\)**: This is the **Gibbs free energy change** for a reaction. It reflects the spontaneity of a process: if \(\Delta G_r < 0\), the process is spontaneous, and if \(\Delta G_r > 0\), the process is non-spontaneous.
- **R**: The **universal gas constant**, approximately \(8.314 \, \text{J} \, \text{mol}^{-1} \, \text{K}^{-1}\).
- **T**: The **temperature**, typically in kelvins (K).

### What does the expression mean?

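The expression ties the **sign of the reference value** (\(\nu_{\text{ref}}\)) to the ratio of the **Gibbs free energy change** (\(\Delta G_r\)) to the product of the **gas constant** and **temperature** (\(RT\)). Because \(RT > 0\) for any temperature in kelvins, the sign of the ratio is simply the sign of \(\Delta G_r\), so the sign of \(\nu_{\text{ref}}\) is determined by the sign of \(\Delta G_r\). Since sgn only takes the values -1, 0, and +1, the \(\propto\) symbol is best read as a statement about signs rather than a strict proportionality.

The expression as written does not fix the sign of the proportionality constant. Assuming the usual convention that a spontaneous reaction proceeds in the forward (positive) direction, the constant is negative, i.e. \(\text{sgn}(\nu_{\text{ref}}) = -\text{sgn}(\Delta G_r)\):

- If \(\Delta G_r\) is negative (spontaneous), the sign of \(\nu_{\text{ref}}\) would be **positive**.
- If \(\Delta G_r\) is positive (non-spontaneous), the sign of \(\nu_{\text{ref}}\) would be **negative**.
- If \(\Delta G_r\) is zero, the reaction is at equilibrium and \(\nu_{\text{ref}}\) would be zero as well (no net direction).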

Thus, this expression provides a relationship between thermodynamic spontaneity (via \(\Delta G_r\)) and the direction or behavior of a process as represented by the sign of \(\nu_{\text{ref}}\).
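
As a small numerical illustration of this reading, the following sketch computes the dimensionless ratio \(\Delta G_r / RT\) and the direction it would imply. It assumes the convention that a negative \(\Delta G_r\) corresponds to a positive \(\nu_{\text{ref}}\); the function names and the example value of -5000 J/mol at 298.15 K are hypothetical and chosen only for illustration.

```python
R = 8.314  # universal gas constant in J/(mol*K), rounded


def sign(x: float) -> int:
    """Sign function: +1 for positive x, 0 for zero, -1 for negative x."""
    return (x > 0) - (x < 0)


def reduced_gibbs_energy(delta_g_r: float, temperature: float) -> float:
    """Return the dimensionless ratio delta_G_r / (R * T).

    delta_g_r is in J/mol and temperature in kelvins.
    """
    if temperature <= 0:
        raise ValueError("temperature must be a positive value in kelvins")
    return delta_g_r / (R * temperature)


def predicted_direction(delta_g_r: float, temperature: float) -> int:
    """Sign of nu_ref under the ASSUMED convention sgn(nu_ref) = -sgn(delta_G_r)."""
    return -sign(reduced_gibbs_energy(delta_g_r, temperature))


# Hypothetical example: an exergonic reaction at room temperature.
ratio = reduced_gibbs_energy(-5000.0, 298.15)      # roughly -2.02
direction = predicted_direction(-5000.0, 298.15)   # +1, i.e. forward
```

Because \(RT\) is always positive, changing the temperature rescales the ratio but never flips the predicted direction; only the sign of \(\Delta G_r\) does.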