# Near-infrared (NIR) spectroscopy

## Description

This Jupyter notebook applies machine learning techniques to predict the Modulus of Elasticity (MOE) of wood samples from a dataset obtained through near-infrared (NIR) spectroscopy. The goal is to assess whether non-destructive, inexpensive NIR measurements can serve as predictors, avoiding the slow and costly conventional MOE test.

## Imports

In [236]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

## Prepare dataset

In [237]:
df = pd.read_csv('Dataset.csv')

## Separate features and target variable

In [238]:
X = df.iloc[:, :-1].values  # X1-X692
y = df.iloc[:, -1].values   # Modulus of Elasticity (MOE)

## Split dataset into training and testing sets

In [239]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Feature scaling

In [240]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## Regular ML Methods without PCA

### Linear/Polynomial regression

In [242]:
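# Degree-2 polynomial features are generated here but are not used by the
# model below, which is fit on the scaled original features only.
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_scaled)
X_test_poly = poly.transform(X_test_scaled)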

linear_reg = LinearRegression()
linear_reg.fit(X_train_scaled, y_train)
y_train_pred = linear_reg.predict(X_train_scaled)
y_test_pred = linear_reg.predict(X_test_scaled)

In [243]:
# Evaluate performance
print("Linear Regression without PCA:")
print("R-Squared (Train):", r2_score(y_train, y_train_pred))
print("R-Squared (Test):", r2_score(y_test, y_test_pred))

Linear Regression without PCA:
R-Squared (Train): 0.999261537443521
R-Squared (Test): -3.3054466735925875e+19
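A test score this far below zero is possible because R² is defined as 1 - SS_res / SS_tot, so it has no lower bound: predictions that are far from the true values make SS_res arbitrarily large. The following is a minimal sketch of that formula in plain Python. The helper `r2_manual` and all numbers are hypothetical illustrations, not values from `Dataset.csv`.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only
y_true = [10.0, 12.0, 14.0]
good_pred = [10.5, 12.0, 13.5]        # close to the truth
mean_pred = [12.0, 12.0, 12.0]        # always predicts the mean
wild_pred = [1000.0, -500.0, 2000.0]  # extrapolates wildly

r2_good = r2_manual(y_true, good_pred)
r2_mean = r2_manual(y_true, mean_pred)
r2_wild = r2_manual(y_true, wild_pred)

print("good:", r2_good)
print("mean:", r2_mean)
print("wild:", r2_wild)
```

A model that always predicts the mean scores exactly 0, so any negative value means the model does worse than that baseline on the evaluated data.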


### Support Vector Regression (SVR)

In [244]:
svr_reg = SVR(kernel='linear')
svr_reg.fit(X_train_scaled, y_train)
y_train_pred_svr = svr_reg.predict(X_train_scaled)
y_test_pred_svr = svr_reg.predict(X_test_scaled)

In [None]:
# Evaluate performance
print("\nSupport Vector Regression without PCA:")
print("R-Squared (Train):", r2_score(y_train, y_train_pred_svr))
print("R-Squared (Test):", r2_score(y_test, y_test_pred_svr))


Support Vector Regression without PCA:
R-Squared (Train): 0.7681718441930887
R-Squared (Test): 0.4890444061345952


### Decision Tree regression

In [None]:
tree_reg = DecisionTreeRegressor(random_state=42)
tree_reg.fit(X_train_scaled, y_train)
y_train_pred_tree = tree_reg.predict(X_train_scaled)
y_test_pred_tree = tree_reg.predict(X_test_scaled)

In [None]:
# Evaluate performance
print("\nDecision Tree Regression without PCA:")
print("R-Squared (Train):", r2_score(y_train, y_train_pred_tree))
print("R-Squared (Test):", r2_score(y_test, y_test_pred_tree))


Decision Tree Regression without PCA:
R-Squared (Train): 0.999287269693166
R-Squared (Test): 0.1136546657592965


## Random Forest without PCA

In [None]:
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train_scaled, y_train)
y_train_pred_rf = rf_regressor.predict(X_train_scaled)
y_test_pred_rf = rf_regressor.predict(X_test_scaled)

In [None]:
r2_train_rf = r2_score(y_train, y_train_pred_rf)
r2_test_rf = r2_score(y_test, y_test_pred_rf)

print("Random Forest Regression:")
print("R-Squared (Train):", r2_train_rf)
print("R-Squared (Test):", r2_test_rf)

Random Forest Regression:
R-Squared (Train): 0.948756193097532
R-Squared (Test): 0.5680383908731111


## ANN without PCA

### Initializing the ANN

In [None]:
ann = tf.keras.models.Sequential()

### Adding the input layer

In [None]:
ann.add(tf.keras.layers.Input(shape=(692,)))  # Input layer for the 692 NIR features (X1-X692)

### Adding two hidden layers

In [None]:
ann.add(tf.keras.layers.Dense(128, activation='relu'))
ann.add(tf.keras.layers.Dropout(0.3))
ann.add(tf.keras.layers.Dense(64, activation='relu'))
ann.add(tf.keras.layers.Dropout(0.3))

### Adding the output layer

In [None]:
ann.add(tf.keras.layers.Dense(units=1, activation='linear'))  # Linear activation for regression

### Compiling the ANN

In [None]:
ann.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='mean_squared_error')

### Training the ANN

In [None]:
ann.fit(X_train_scaled, y_train, epochs=70, batch_size=32, validation_data=(X_test_scaled, y_test))

Epoch 1/70
...
Epoch 70/70


<keras.src.callbacks.History at 0x7fcfa70d6490>

### Evaluate the ANN

In [None]:
y_train_pred = ann.predict(X_train_scaled)
y_test_pred = ann.predict(X_test_scaled)



In [None]:
# Evaluate performance
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_test_pred)

print("\nANN Regression without PCA:")
print("R-Squared (Train):", r2_train)
print("R-Squared (Test):", r2_test)


ANN Regression without PCA:
R-Squared (Train): -0.21013493123434457
R-Squared (Test): -1.0518978045835317


## Perform PCA

In [None]:
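# Note: PCA is fit on the full, unscaled dataset before the train/test split,
# so the test samples influence the components.
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)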

### Check preserved variance

In [None]:
explained_variance_ratio = pca.explained_variance_ratio_
info_preserved = sum(explained_variance_ratio[:10]) * 100
print(f'Percentage of information preserved: {info_preserved:.2f}%')

Percentage of information preserved: 97.56%
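Because the PCA here is fit on all samples, the test set contributes to the components. A leakage-free alternative is to chain the steps in a scikit-learn `Pipeline` and fit it on the training split only. The sketch below keeps the notebook's order (PCA, then scaling, then regression) but runs on synthetic data; `X_demo`, `y_demo`, and `pca_model` are hypothetical names, and the resulting score does not reflect the MOE dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: replace with the real X and y)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 50))
y_demo = X_demo[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Every step is fit on the training split only; the test split is
# merely transformed with the fitted components and scaling.
pca_model = make_pipeline(PCA(n_components=10), StandardScaler(), LinearRegression())
pca_model.fit(X_tr, y_tr)

print("R-Squared (Train):", pca_model.score(X_tr, y_tr))
print("R-Squared (Test):", pca_model.score(X_te, y_te))
```

The same pipeline object can be passed to `cross_val_score`, which refits PCA inside each fold.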


## Split dataset into training and testing sets POST PCA

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

## Feature scaling POST PCA

In [None]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## Regular ML Methods with PCA

### Linear regression

In [None]:
linear_reg = LinearRegression()
linear_reg.fit(X_train_scaled, y_train)
y_train_pred = linear_reg.predict(X_train_scaled)
y_test_pred = linear_reg.predict(X_test_scaled)

In [None]:
# Evaluate performance
print("Linear Regression with PCA:")
print("R-Squared (Train):", r2_score(y_train, y_train_pred))
print("R-Squared (Test):", r2_score(y_test, y_test_pred))

Linear Regression with PCA:
R-Squared (Train): 0.58711343998836
R-Squared (Test): 0.4661807873411422


### Support Vector Regression (SVR)

In [None]:
svr_reg = SVR(kernel='linear')
svr_reg.fit(X_train_scaled, y_train)
y_train_pred_svr = svr_reg.predict(X_train_scaled)
y_test_pred_svr = svr_reg.predict(X_test_scaled)

In [None]:
# Evaluate performance
print("\nSupport Vector Regression with PCA:")
print("R-Squared (Train):", r2_score(y_train, y_train_pred_svr))
print("R-Squared (Test):", r2_score(y_test, y_test_pred_svr))


Support Vector Regression with PCA:
R-Squared (Train): 0.5862756200153221
R-Squared (Test): 0.46531346694390485


### Decision Tree regression

In [None]:
tree_reg = DecisionTreeRegressor(random_state=42)
tree_reg.fit(X_train_scaled, y_train)
y_train_pred_tree = tree_reg.predict(X_train_scaled)
y_test_pred_tree = tree_reg.predict(X_test_scaled)

In [None]:
# Evaluate performance
print("\nDecision Tree Regression with PCA:")
print("R-Squared (Train):", r2_score(y_train, y_train_pred_tree))
print("R-Squared (Test):", r2_score(y_test, y_test_pred_tree))


Decision Tree Regression with PCA:
R-Squared (Train): 0.999287269693166
R-Squared (Test): -0.0110425423249505


## ANN with PCA

### Initializing the ANN

In [None]:
ann = tf.keras.models.Sequential()

### Adding the input layer

In [None]:
ann.add(tf.keras.layers.Input(shape=(10,)))  # Input layer for the 10 principal components

### Adding one hidden layer

In [None]:
ann.add(tf.keras.layers.Dense(4, activation='relu'))
ann.add(tf.keras.layers.Dropout(0.3))

### Adding the output layer

In [None]:
ann.add(tf.keras.layers.Dense(units=1, activation='linear'))  # Linear activation for regression

### Compiling the ANN

In [None]:
ann.compile(optimizer='adam', loss='mean_squared_error')  # Use mean squared error for regression

### Training the ANN

In [None]:
ann.fit(X_train_scaled, y_train, epochs=50, batch_size=32, validation_data=(X_test_scaled, y_test))

Epoch 1/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x7fcfaa0bf3d0>

### Evaluate the ANN

In [None]:
y_train_pred = ann.predict(X_train_scaled)
y_test_pred = ann.predict(X_test_scaled)



In [None]:
# Evaluate performance
r2_train = r2_score(y_train, y_train_pred)
r2_test = r2_score(y_test, y_test_pred)

print("\nANN Regression with PCA:")
print("R-Squared (Train):", r2_train)
print("R-Squared (Test):", r2_test)


ANN Regression with PCA:
R-Squared (Train): -98.94416579326783
R-Squared (Test): -121.4200934549632
