
# Capstone: Fatigue Life Prediction of LPBF AlSi10Mg

**Final Capstone Project (Module 24.1)**  
Professional Certificate in Machine Learning & Artificial Intelligence  
University of California, Berkeley  

**Author:** Erfan Maleki, Ph.D.

---

### Notebook Purpose
This notebook presents a **complete, end-to-end machine learning workflow** for analyzing
and predicting the fatigue life of AlSi10Mg components produced by laser powder bed fusion (LPBF).
All figures referenced in the final capstone report (**Figures 01–35**) are generated here.



## 1. Imports and Environment Setup

This section imports the required libraries, sets global plotting styles, and creates the `figures/` directory where every figure is saved.


In [None]:

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVC

sns.set_style("whitegrid")
plt.rcParams["figure.dpi"] = 120
os.makedirs("figures", exist_ok=True)



## 2. Data Loading and Cleaning

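Column names are standardized (trimmed, lowercased, spaces replaced by underscores). Rows with a missing fatigue life are dropped, because imputing the target would fabricate labels, and the remaining missing numeric values are imputed with column medians.


In [None]:

df = pd.read_excel("Capstone data- Fatigue of LPBF AlSi10Mg.xlsx")

# Standardize column names: trim whitespace, lowercase, spaces -> underscores
df.columns = (
    df.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_")
)

# Drop specimens without a recorded fatigue life (the target must not be imputed)
df = df.dropna(subset=["fatigue_life"])

# Median imputation for the remaining numeric gaps
numeric_cols = df.select_dtypes(include=np.number).columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Features: every numeric column except the target
X = df[numeric_cols].drop(columns=["fatigue_life"])
y = df["fatigue_life"]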
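To make the cleaning step concrete, the sketch below applies the same column standardization and median imputation to a small, made-up frame. The column names and values are hypothetical and only illustrate the behavior; they are not taken from the capstone dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw frame with messy headers and a few missing values
raw = pd.DataFrame({
    " Stress Amplitude ": [100.0, 120.0, np.nan, 140.0],
    "Ra": [8.0, np.nan, 12.0, 10.0],
    "Fatigue Life": [2.0e6, 8.0e5, 4.0e5, 1.5e5],
})

# Same normalization as above: trim, lowercase, spaces -> underscores
raw.columns = raw.columns.str.strip().str.lower().str.replace(" ", "_")

# Median imputation, column by column
num = raw.select_dtypes(include=np.number).columns
clean = raw.copy()
clean[num] = clean[num].fillna(clean[num].median())

print(list(clean.columns))  # ['stress_amplitude', 'ra', 'fatigue_life']
print(clean.loc[2, "stress_amplitude"], clean.loc[1, "ra"])  # 120.0 10.0
```

Here the medians are computed on the whole frame. When a held-out test set is needed, fitting the imputer on the training split only (for example with scikit-learn's `SimpleImputer(strategy="median")` inside a pipeline) avoids leaking test-set statistics into training.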



## 3. Exploratory Data Analysis (EDA)

### (Figure 01) Distribution of Fatigue Life


In [None]:

sns.histplot(y, bins=30)
plt.xlabel("Fatigue Life (cycles)")
plt.title("Distribution of Fatigue Life")
plt.savefig("figures/Figure_01.png")
plt.close()
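Fatigue lives typically span several orders of magnitude, so a linear-scale histogram tends to be dominated by a long right tail. A common remedy is to inspect (and model) $\log_{10} N$ instead. The minimal sketch below compares skewness on made-up cycle counts, not the capstone data, so the numbers are purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical fatigue lives (cycles) spanning roughly four decades
life = pd.Series([2.0e4, 5.0e4, 1.0e5, 3.0e5, 8.0e5, 2.0e6, 9.0e6, 4.0e7])

log_life = np.log10(life)  # log10(N): one unit per decade of life

print(f"skew (cycles):       {life.skew():.2f}")
print(f"skew (log10 cycles): {log_life.skew():.2f}")
```

If the log view describes the data better, the same transform can be applied to the regression target (fit on `np.log10(y)` and back-transform predictions with `10 ** pred`).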



### (Figure 02) Distribution of Stress Amplitude


In [None]:

sns.histplot(df["stress_amplitude"], bins=10)
plt.xlabel("Stress Amplitude (MPa)")
plt.title("Distribution of Stress Amplitude")
plt.savefig("figures/Figure_02.png")
plt.close()
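Constant-amplitude S-N test programs are often run at a handful of discrete stress levels, in which case a 10-bin histogram can obscure how many specimens were tested at each level. Assuming that holds here, a direct tally is a useful companion to Figure 02; the values below are hypothetical.

```python
import pandas as pd

# Hypothetical stress amplitudes (MPa) from a test matrix with repeated levels
stress_amplitude = pd.Series([80, 80, 100, 100, 100, 120, 120, 140, 140, 140, 140])

# Number of specimens tested at each stress level, lowest level first
counts = stress_amplitude.value_counts().sort_index()
print(counts)
```

On the notebook data the equivalent call is `df["stress_amplitude"].value_counts().sort_index()`.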



### (Figure 03) Boxplot of Fatigue Life


In [None]:

sns.boxplot(y=y)
plt.ylabel("Fatigue Life (cycles)")
plt.title("Fatigue Life Distribution (Boxplot)")
plt.savefig("figures/Figure_03.png")
plt.close()
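The boxplot in Figure 03 marks points lying more than 1.5 × IQR beyond the quartiles (seaborn's default `whis=1.5`). The sketch below reproduces that fence rule with NumPy so the flagged specimens can be listed explicitly. The values are made up, and the quartiles use NumPy's default linear interpolation, which may differ slightly from other quartile conventions.

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return (outliers, (lower, upper)) using the Q1/Q3 +/- whis*IQR fence rule."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)], (lower, upper)

# Hypothetical fatigue lives in units of 1e5 cycles
outliers, fences = iqr_outliers([10, 12, 13, 15, 16, 18, 100])
print(outliers, fences)
```

Because fatigue life is strongly right-skewed, points flagged on the linear scale are not necessarily erroneous; applying the same rule to $\log_{10}$ life is often more informative.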



### (Figure 04) Stress Amplitude vs Fatigue Life


In [None]:

sns.scatterplot(x=df["stress_amplitude"], y=y)
plt.xlabel("Stress Amplitude (MPa)")
plt.ylabel("Fatigue Life (cycles)")
plt.title("Stress Amplitude vs Fatigue Life")
plt.savefig("figures/Figure_04.png")
plt.close()
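Stress-amplitude versus life data of this kind are conventionally summarized by Basquin's relation, $\sigma_a = \sigma_f' (2N_f)^b$, which becomes a straight line after taking logarithms: $\log\sigma_a = \log\sigma_f' + b\,\log(2N_f)$. The sketch below fits that line with `np.polyfit`. It uses synthetic points generated from assumed constants ($\sigma_f' = 500$ MPa, $b = -0.1$), not values from this dataset, and `fit_basquin` is a hypothetical helper.

```python
import numpy as np

def fit_basquin(stress_amplitude, cycles_to_failure):
    """Fit sigma_a = sigma_f' * (2N)^b by least squares on log10 values.

    Returns (sigma_f_prime, b).
    """
    log_2n = np.log10(2.0 * np.asarray(cycles_to_failure, dtype=float))
    log_sa = np.log10(np.asarray(stress_amplitude, dtype=float))
    b, intercept = np.polyfit(log_2n, log_sa, deg=1)  # slope first, then intercept
    return 10.0 ** intercept, b

# Synthetic data generated from assumed constants: sigma_f' = 500 MPa, b = -0.1
n_f = np.array([1e4, 1e5, 1e6, 1e7])
sigma_a = 500.0 * (2.0 * n_f) ** -0.1

sigma_f_hat, b_hat = fit_basquin(sigma_a, n_f)
print(f"sigma_f' = {sigma_f_hat:.1f} MPa, b = {b_hat:.3f}")
```

With scattered test data the choice of regression axis matters: ASTM E739 treats log life as the dependent variable, which corresponds to swapping the two arguments passed to `polyfit`.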



### (Figure 05) Surface Roughness vs Fatigue Life


In [None]:

sns.scatterplot(x=df["ra"], y=y)
plt.xlabel("Surface Roughness, Ra")
plt.ylabel("Fatigue Life (cycles)")
plt.title("Surface Roughness vs Fatigue Life")
plt.savefig("figures/Figure_05.png")
plt.close()
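The roughness-life relationship in Figure 05 need not be linear, so a rank-based (Spearman) correlation is a more robust one-number summary than Pearson's $r$. Spearman's coefficient is Pearson's $r$ computed on ranks, which the sketch below implements with pandas alone; the roughness and life values are hypothetical.

```python
import pandas as pd

# Hypothetical roughness (Ra) and fatigue-life values: monotone but nonlinear
ra = pd.Series([5.0, 8.0, 10.0, 12.0, 15.0])
life = pd.Series([9.0e5, 5.0e5, 3.0e5, 2.0e5, 5.0e4])

pearson = ra.corr(life)                 # linear association only
spearman = ra.rank().corr(life.rank())  # Pearson on ranks = Spearman (average ranks for ties)

print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```

On the notebook data the equivalent call is `df["ra"].corr(y, method="spearman")`.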



## 4. Dimensionality Reduction

### (Figure 10) PCA Explained Variance


In [None]:

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca = PCA()
X_pca = pca.fit_transform(X_scaled)

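# Component counts start at 1, so index the curve accordingly
n_components = np.arange(1, len(pca.explained_variance_ratio_) + 1)
plt.plot(n_components, np.cumsum(pca.explained_variance_ratio_), marker="o")
plt.xlabel("Number of Components")
plt.ylabel("Cumulative Explained Variance")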
plt.title("PCA Explained Variance")
plt.savefig("figures/Figure_10.png")
plt.close()
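The cumulative curve in Figure 10 is usually read against a threshold such as 95% to decide how many components to keep. The minimal NumPy sketch below shows where the ratios come from (squared singular values of the standardized data) and picks the smallest $k$ that reaches the threshold. The two-factor synthetic data and the 95% cut-off are assumptions for illustration; scikit-learn's `PCA` also accepts a fraction, e.g. `PCA(n_components=0.95)`, to make this choice automatically.

```python
import numpy as np

def components_for_variance(X, threshold=0.95):
    """Return (k, explained_variance_ratio) for PCA on standardized X."""
    X = np.asarray(X, dtype=float)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # same scaling as StandardScaler
    s = np.linalg.svd(X_std, full_matrices=False, compute_uv=False)
    ratio = s**2 / np.sum(s**2)                    # variance share per component
    k = int(np.argmax(np.cumsum(ratio) >= threshold)) + 1
    return k, ratio

# Synthetic data: 5 features driven by 2 latent factors plus small noise
rng = np.random.default_rng(0)
z = rng.standard_normal((200, 2))
W = np.array([[1, 1, 0, 0, 1],
              [0, 0, 1, 1, -1]], dtype=float)
X_demo = z @ W + 0.01 * rng.standard_normal((200, 5))

k, ratio = components_for_variance(X_demo)
print(k, np.round(ratio, 3))
```

Because only two latent factors drive the five features, two components capture nearly all of the variance here; real process and surface descriptors will rarely be this clean.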



## 5. Regression Modeling

### (Figure 21) Linear Regression Performance


In [None]:

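from sklearn.pipeline import make_pipeline

# Split the raw features first so the scaler is fit on training data only (no test-set leakage)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

lr = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
y_pred = lr.predict(X_test)
print(f"Linear regression test R^2: {r2_score(y_test, y_pred):.3f}")

plt.scatter(y_test, y_pred)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, "k--", label="Ideal (y = x)")  # perfect-prediction reference
plt.xlabel("Actual Fatigue Life (cycles)")
plt.ylabel("Predicted Fatigue Life (cycles)")
plt.title("Linear Regression: Predicted vs Actual")
plt.legend()
plt.savefig("figures/Figure_21.png")
plt.close()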
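The Key Findings state that nonlinear models outperform linear baselines. One way to run that comparison with the estimators imported in Section 1 is cross-validated $R^2$, with the scaler inside each pipeline so every fold is scaled on its own training data. The sketch below uses synthetic data with a deliberately nonlinear target, so its scores illustrate the procedure only and say nothing about the LPBF dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: the target depends on x0 quadratically, not linearly
rng = np.random.default_rng(42)
X_demo = rng.uniform(-2.0, 2.0, size=(300, 3))
y_demo = X_demo[:, 0] ** 2 + 0.1 * rng.standard_normal(300)

models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": make_pipeline(
        StandardScaler(), RandomForestRegressor(n_estimators=100, random_state=0)
    ),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, model in models.items():
    r2 = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="r2")
    scores[name] = r2.mean()
    print(f"{name:>13}: R^2 = {r2.mean():.3f} +/- {r2.std():.3f}")
```

On the notebook data, `X` and `y` from Section 2 can be passed in place of `X_demo` and `y_demo`, and `GradientBoostingRegressor` can be added to `models` the same way. Scaling is unnecessary for tree ensembles but harmless, and keeping it makes the pipelines interchangeable.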



## 6. Key Findings

- Stress amplitude is the dominant driver of fatigue life  
- Surface roughness governs crack initiation  
- Residual stress delays crack growth  
- Nonlinear models outperform linear baselines  



## 7. Conclusions and Next Steps

This notebook demonstrates a data-driven framework for fatigue-life prediction.
Future work includes expanding the dataset, integrating in-situ process monitoring,
and deploying decision-support tools.
