# Capstone: Fatigue Life Prediction of LPBF AlSi10Mg

**Final Capstone Project (Module 24.1)**  
Professional Certificate in Machine Learning & Artificial Intelligence  
University of California, Berkeley  

**Author:** Erfan Maleki, Ph.D.

---

## 1. Introduction & Problem Statement
Fatigue failure is a key barrier to the widespread adoption of additively manufactured (AM) metals in safety-critical applications. This notebook presents an end-to-end machine learning workflow that analyzes and predicts the fatigue life of laser powder bed fusion (LPBF) AlSi10Mg as a function of surface post-processing, residual stress state, and mechanical properties.

## 2. Imports and Environment Setup

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import r2_score, mean_squared_error

sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 120
os.makedirs("figures", exist_ok=True)

## 3. Data Loading and Cleaning
Column names are normalized to lowercase snake_case, and missing values in numeric features are imputed with each column's median.

In [None]:
df = pd.read_excel("Capstone data- Fatigue of LPBF AlSi10Mg.xlsx")
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

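# Drop rows with a missing target instead of imputing the label itself.
df = df.dropna(subset=['fatigue_life']).copy()

# Impute only the numeric feature columns with their medians.
numeric_cols = df.select_dtypes(include=np.number).columns
feature_cols = numeric_cols.drop('fatigue_life')
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())

X = df[feature_cols]
y = df['fatigue_life']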
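
The median imputation above draws its statistics from the full dataset. One possible variant, sketched here on a small synthetic frame, learns the medians from the training rows only using scikit-learn's `SimpleImputer`. The column names `stress_amplitude` and `surface_roughness` and all values are hypothetical stand-ins, not the spreadsheet's actual contents.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fatigue dataset (column names are hypothetical).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "stress_amplitude": rng.uniform(50, 200, size=40),
    "surface_roughness": rng.uniform(1, 15, size=40),
})
demo.loc[[3, 17, 28], "surface_roughness"] = np.nan  # inject missing values

demo_train, demo_test = train_test_split(demo, test_size=0.25, random_state=42)

# Fit the medians on the training rows only, then apply them to both splits.
imputer = SimpleImputer(strategy="median")
train_filled = imputer.fit_transform(demo_train)
test_filled = imputer.transform(demo_test)

print("Training medians:", imputer.statistics_)
print("Remaining NaNs:", int(np.isnan(train_filled).sum() + np.isnan(test_filled).sum()))
```

Fitting the imputer after the split keeps the held-out rows from influencing the fill values, which matters most when the dataset is small.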

## 4. Exploratory Data Analysis (EDA)
**(Figures 01–09)**

In [None]:
# Histogram of the target variable (cycles to failure).
plt.hist(y, bins=30)
plt.xlabel("Fatigue Life (cycles)")
plt.ylabel("Count")
plt.title("Figure 01 – Distribution of Fatigue Life")
plt.savefig("figures/Figure_01.png")
plt.close()
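
Fatigue lives often span several orders of magnitude, so a histogram on a linear axis can be dominated by a few long-life specimens. The sketch below compares the skewness of raw and log10-transformed cycle counts. It assumes synthetic, log-normally distributed data and a hypothetical `skewness` helper; it does not use the capstone dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic cycles-to-failure, drawn log-normally purely for illustration.
cycles = 10 ** rng.normal(loc=5.5, scale=0.6, size=500)

def skewness(values):
    """Sample skewness: the third standardized moment."""
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

log_cycles = np.log10(cycles)
print(f"Skewness of raw cycles:   {skewness(cycles):.2f}")
print(f"Skewness of log10 cycles: {skewness(log_cycles):.2f}")
```

If the real target shows a similar right skew, plotting and modeling `np.log10(y)` may be worth trying as an alternative to raw cycles.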

## 5. Dimensionality Reduction
### 5.1 Principal Component Analysis (PCA)
**(Figures 10–11)**

In [None]:
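# Standardize features so that PCA is not dominated by large-magnitude variables.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Keep all components so the full explained-variance curve can be plotted.
pca = PCA()
X_pca = pca.fit_transform(X_scaled)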

plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel("Number of Components")
plt.ylabel("Cumulative Explained Variance")
plt.title("Figure 10 – PCA Explained Variance")
plt.savefig("figures/Figure_10.png")
plt.close()
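
The cumulative explained-variance curve can also be read off numerically. The following sketch finds the smallest number of components that reaches a chosen variance threshold. The synthetic features and the 95% cutoff are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic features: 3 latent factors mixed into 8 correlated columns plus noise.
latent = rng.normal(size=(200, 3))
features = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))

features_scaled = StandardScaler().fit_transform(features)
pca_demo = PCA().fit(features_scaled)

cumulative = np.cumsum(pca_demo.explained_variance_ratio_)
threshold = 0.95  # illustrative cutoff
# First index whose cumulative variance reaches the threshold, as a 1-based count.
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

print("Cumulative explained variance:", np.round(cumulative, 3))
print(f"Components needed for {threshold:.0%} variance: {n_keep}")
```

scikit-learn also accepts a float such as `PCA(n_components=0.95)`, which performs a similar selection internally.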

## 6. Regression Modeling
### 6.1 Linear Regression (Baseline)
**(Figure 21)**

In [None]:
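# Split first, then fit the scaler on the training rows only so that
# test-set statistics do not leak into the model inputs.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
split_scaler = StandardScaler().fit(X_train_raw)
X_train = split_scaler.transform(X_train_raw)
X_test = split_scaler.transform(X_test_raw)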

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

plt.scatter(y_test, y_pred_lr)
plt.xlabel("Actual Fatigue Life")
plt.ylabel("Predicted Fatigue Life")
plt.title("Figure 21 – Linear Regression: Predicted vs Actual")
plt.savefig("figures/Figure_21.png")
plt.close()

print("Linear Regression R²:", r2_score(y_test, y_pred_lr))
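
R² alone does not show how large the errors are in physical units. A minimal sketch of a reporting helper that adds RMSE is given below. The function name `report_metrics` and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def report_metrics(name, y_true, y_pred):
    """Return R² and RMSE for one model as a dict (hypothetical helper)."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return {"model": name, "r2": float(r2_score(y_true, y_pred)), "rmse": rmse}

# Synthetic linear data, used only to exercise the helper.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=120)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
baseline = LinearRegression().fit(Xd_train, yd_train)
metrics = report_metrics("Linear Regression", yd_test, baseline.predict(Xd_test))
print(metrics)
```

Calling the same helper for every model gives a table in which RMSE is expressed in the units of the target, here cycles.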

## 7. Nonlinear Modeling
### 7.1 Random Forest Regression
**(Figures 22 & 25)**

In [None]:
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)

plt.scatter(y_test, y_pred_rf)
plt.xlabel("Actual Fatigue Life")
plt.ylabel("Predicted Fatigue Life")
plt.title("Figure 22 – Random Forest: Predicted vs Actual")
plt.savefig("figures/Figure_22.png")
plt.close()

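# Impurity-based importances, sorted so the most influential feature is at the top.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values()
plt.barh(importances.index, importances.values)
plt.xlabel("Impurity-Based Importance")
plt.title("Figure 25 – Random Forest Feature Importance")
plt.tight_layout()  # keep long feature names from being clipped
plt.savefig("figures/Figure_25.png")
plt.close()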

print("Random Forest R²:", r2_score(y_test, y_pred_rf))
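
The `feature_importances_` attribute of a random forest is impurity-based and is computed from the training data. As a cross-check, permutation importance on held-out data can be used. The sketch below uses synthetic data in which only the first feature drives the target; it is not run on the capstone dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic data: only the first of three features drives the target.
X_demo = rng.normal(size=(300, 3))
y_demo = 5.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=300)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)
rf_demo = RandomForestRegressor(n_estimators=50, random_state=42).fit(Xd_train, yd_train)

# Shuffle each feature on the held-out split and record the drop in R².
result = permutation_importance(
    rf_demo, Xd_test, yd_test, n_repeats=10, random_state=42, scoring="r2"
)

for idx in np.argsort(result.importances_mean)[::-1]:
    print(f"feature_{idx}: {result.importances_mean[idx]:.3f} ± {result.importances_std[idx]:.3f}")
```

If both rankings agree on the real data, the feature-importance conclusions rest on firmer ground.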

## 8. Model Evaluation and Cross-Validation
**(Figure 33)**

R² is selected as the primary evaluation metric because it quantifies the proportion of variance in fatigue life explained by the model.
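
As a worked illustration of that definition, R² equals one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below implements it by hand and compares it with `r2_score`. The helper name `r2_manual` and the toy values are illustrative only.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Toy values chosen for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_good = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
y_mean = np.full_like(y_true, y_true.mean())

print("Close predictions:  ", r2_manual(y_true, y_good))
print("Mean-only predictor:", r2_manual(y_true, y_mean))
print("Matches r2_score:   ", np.isclose(r2_manual(y_true, y_good), r2_score(y_true, y_good)))
```

Here the residual sum of squares is 0.10 and the total sum of squares is 10, so the close predictions score about 0.99, while always predicting the mean scores 0. A model that does worse than the mean yields a negative R².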

In [None]:
cv_scores = cross_val_score(rf, X_scaled, y, cv=5, scoring='r2')
plt.hist(cv_scores)
plt.xlabel("Cross-Validated R²")
plt.title("Figure 33 – Cross-Validated R² Distribution")
plt.savefig("figures/Figure_33.png")
plt.close()
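
With an integer `cv` and a regressor, `cross_val_score` uses unshuffled K-fold splits, and the call above passes features that were scaled on the full dataset. A possible alternative, sketched here on synthetic data, wraps the scaler and the model in a pipeline and shuffles the folds. The data and the reduced `n_estimators` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic regression data standing in for the fatigue features and target.
X_demo = rng.normal(size=(150, 4))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] ** 2 + rng.normal(scale=0.3, size=150)

# The scaler is refit inside every training fold, so no fold sees test statistics.
model = make_pipeline(
    StandardScaler(), RandomForestRegressor(n_estimators=50, random_state=42)
)
folds = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_demo, y_demo, cv=folds, scoring="r2")

print("Per-fold R²:", np.round(scores, 3))
print(f"Mean ± std: {scores.mean():.3f} ± {scores.std():.3f}")
```

Shuffling matters when the rows are ordered, for example by treatment group, because unshuffled folds could then hold out an entire group at once. Reporting the mean and standard deviation summarizes five fold scores more directly than a histogram.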

## 9. Hyperparameter Optimization
**(Figure 35)**

In [None]:
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20]
}

grid = GridSearchCV(RandomForestRegressor(random_state=42), param_grid, cv=3, scoring='r2')
grid.fit(X_train, y_train)

print("Best RF Parameters:", grid.best_params_)
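
The best parameters are only part of what a grid search returns. The sketch below, run on synthetic data with a smaller, hypothetical grid, ranks every candidate from `cv_results_` and then scores the refit best model on a held-out split.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Synthetic data with one linear and one nonlinear effect.
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] + np.sin(X_demo[:, 1]) + rng.normal(scale=0.3, size=200)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    {"n_estimators": [25, 50], "max_depth": [None, 5]},  # illustrative grid
    cv=3,
    scoring="r2",
)
search.fit(Xd_train, yd_train)

# Rank every candidate by its mean cross-validated R².
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
results = pd.DataFrame(search.cv_results_)[columns].sort_values("rank_test_score")
print(results.to_string(index=False))

# With the default refit=True, best_estimator_ is retrained on all training rows.
test_r2 = r2_score(yd_test, search.best_estimator_.predict(Xd_test))
print(f"Best CV R²: {search.best_score_:.3f} | held-out test R²: {test_r2:.3f}")
```

Comparing the cross-validated score with the held-out score gives a rough indication of whether the tuned model generalizes beyond the folds used for selection.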

## 10. Key Findings
- Stress amplitude is the dominant driver of fatigue life
- Surface roughness governs crack initiation
- Residual compressive stress delays crack growth
- Nonlinear models significantly outperform linear baselines

## 11. Conclusions and Next Steps
This notebook demonstrates a reproducible, data-driven framework for fatigue-life prediction. Future work includes expanding datasets, integrating in-situ monitoring, and deploying decision-support tools.