<a href="https://colab.research.google.com/github/jagadesh2006/Fatigue_Strength_ML/blob/main/Fatigue_Strength_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <b>Introduction</b>

Fatigue failure is a critical concern in structural and mechanical engineering and is commonly cited as the cause of around 90% of mechanical failures in metals. The fatigue strength of steel, the maximum stress a material can endure for a given number of cycles without failing, depends on its chemical composition, heat treatment and microstructure. Accurately predicting fatigue strength helps engineers design safer and more durable components for industries such as aerospace, automotive and construction.

<b>Fatigue Strength:</b> the maximum stress a material can withstand for a specified number of cycles before it fails under repeated loading.

# <b>Data Description</b>

This dataset records the experimental conditions of steel preparation, with features covering:

<p>• Chemical composition - %C, %Si, %Mn, %P, %S, %Ni, %Cr, %Cu, %Mo (all in wt. %)</p>

<p>• Upstream processing details - ingot size, reduction ratio, non-metallic inclusions.</p>

<p>• Heat treatment conditions - temperature, time and other process conditions for normalizing, through-hardening, carburizing-quenching and tempering processes.</p>

<p>• Mechanical properties - fatigue strength.</p>

<table border="1" cellpadding="5" cellspacing="0">
  <thead>
    <tr>
      <th>Abbreviation</th>
      <th>Property Details</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>C</td>
      <td>% Carbon</td>
    </tr>
    <tr>
      <td>Si</td>
      <td>% Silicon</td>
    </tr>
    <tr>
      <td>Mn</td>
      <td>% Manganese</td>
    </tr>
    <tr>
      <td>P</td>
      <td>% Phosphorus</td>
    </tr>
    <tr>
      <td>S</td>
      <td>% Sulphur</td>
    </tr>
    <tr>
      <td>Ni</td>
      <td>% Nickel</td>
    </tr>
    <tr>
      <td>Cr</td>
      <td>% Chromium</td>
    </tr>
    <tr>
      <td>Cu</td>
      <td>% Copper</td>
    </tr>
    <tr>
      <td>Mo</td>
      <td>% Molybdenum</td>
    </tr>
    <tr>
      <td>NT</td>
      <td>Normalizing Temperature</td>
    </tr>
    <tr>
      <td>THT</td>
      <td>Through Hardening Temperature</td>
    </tr>
    <tr>
      <td>THt</td>
      <td>Through Hardening Time</td>
    </tr>
    <tr>
      <td>THQCr</td>
      <td>Cooling Rate for Through Hardening</td>
    </tr>
    <tr>
      <td>CT</td>
      <td>Carburization Temperature</td>
    </tr>
    <tr>
      <td>Ct</td>
      <td>Carburization Time</td>
    </tr>
    <tr>
      <td>DT</td>
      <td>Diffusion Temperature</td>
    </tr>
    <tr>
      <td>Dt</td>
      <td>Diffusion Time</td>
    </tr>
    <tr>
      <td>QmT</td>
      <td>Quenching Media Temperature (for Carburization)</td>
    </tr>
    <tr>
      <td>TT</td>
      <td>Tempering Temperature</td>
    </tr>
    <tr>
      <td>Tt</td>
      <td>Tempering Time</td>
    </tr>
    <tr>
      <td>TCr</td>
      <td>Cooling Rate for Tempering</td>
    </tr>
    <tr>
      <td>RedRatio</td>
      <td>Reduction Ratio (Ingot to Bar)</td>
    </tr>
    <tr>
      <td>dA</td>
      <td>Area Proportion of Inclusions Deformed by Plastic Work</td>
    </tr>
    <tr>
      <td>dB</td>
      <td>Area Proportion of Inclusions Occurring in Discontinuous Array</td>
    </tr>
    <tr>
      <td>dC</td>
      <td>Area Proportion of Isolated Inclusions</td>
    </tr>
    <tr>
      <td>Fatigue</td>
      <td>Rotating Bending Fatigue Strength (10^7 Cycles)</td>
    </tr>
  </tbody>
</table>

# <b>Project Outline</b>

<b>Steps followed:</b>

<p>● Installing and importing all the required libraries.</p>
<p>● Downloading the data set from Kaggle.</p>
<p>● Exploratory Data Analysis.</p>
<p>● Feature Engineering.</p>
<p>● Preparing the dataset for ML training.</p>
<p>● Training & validating different models.</p>
<p>● Hyperparameter tuning of the best model.</p>
<p>● Final model.</p>
<p>● Summary.</p>

# <b>Installing & Importing required Libraries</b>

In [None]:
pip install opendatasets

Collecting opendatasets
  Downloading opendatasets-0.1.22-py3-none-any.whl.metadata (9.2 kB)
Downloading opendatasets-0.1.22-py3-none-any.whl (15 kB)
Installing collected packages: opendatasets
Successfully installed opendatasets-0.1.22


In [None]:
import pandas as pd
import numpy as np
import opendatasets as od
import os

# <b>Downloading Data</b>

In [166]:
od.download("https://www.kaggle.com/datasets/chaozhuang/steel-fatigue-strength-prediction/data")

Skipping, found downloaded files in "./steel-fatigue-strength-prediction" (use force=True to force download)


In [167]:
os.listdir("/content/steel-fatigue-strength-prediction")

['data.csv']

In [168]:
dt = pd.read_csv("/content/steel-fatigue-strength-prediction/data.csv")

• The data is downloaded from Kaggle and loaded into the DataFrame <b>dt</b>.
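The read above hard-codes Colab's <b>/content</b> directory, while the download message reports a folder relative to the working directory. A minimal sketch of building the path relative to that folder instead; the helper name <b>dataset_csv</b> and its defaults are assumptions for illustration, not part of the original notebook:

```python
from pathlib import Path

# Hypothetical helper: build the CSV path relative to the download folder
# instead of hard-coding Colab's /content directory.
def dataset_csv(base_dir=".", folder="steel-fatigue-strength-prediction", name="data.csv"):
    return Path(base_dir) / folder / name

csv_path = dataset_csv()
print(csv_path)            # path relative to the current working directory
print(csv_path.exists())   # True only once the download step has been run
```

<b>pd.read_csv</b> accepts a path object like this one, so the same cell would work both in Colab and in a local run.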

# <b>Exploratory Data Analysis</b>

In [None]:
dt.head()

Unnamed: 0,Sl. No.,NT,THT,THt,THQCr,CT,Ct,DT,Dt,QmT,...,S,Ni,Cr,Cu,Mo,RedRatio,dA,dB,dC,Fatigue
0,1,885,30,0,0,30,0.0,30.0,0.0,30,...,0.022,0.01,0.02,0.01,0.0,825,0.07,0.02,0.04,232
1,2,885,30,0,0,30,0.0,30.0,0.0,30,...,0.017,0.08,0.12,0.08,0.0,610,0.11,0.0,0.04,235
2,3,885,30,0,0,30,0.0,30.0,0.0,30,...,0.015,0.02,0.03,0.01,0.0,1270,0.07,0.02,0.0,235
3,4,885,30,0,0,30,0.0,30.0,0.0,30,...,0.024,0.01,0.02,0.01,0.0,1740,0.06,0.0,0.0,241
4,5,885,30,0,0,30,0.0,30.0,0.0,30,...,0.022,0.01,0.02,0.02,0.0,825,0.04,0.02,0.0,225


In [None]:
dt.shape

(437, 27)

• The data has 437 rows and 27 columns.

In [None]:
dt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 437 entries, 0 to 436
Data columns (total 27 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sl. No.   437 non-null    int64  
 1   NT        437 non-null    int64  
 2   THT       437 non-null    int64  
 3   THt       437 non-null    int64  
 4   THQCr     437 non-null    int64  
 5   CT        437 non-null    int64  
 6   Ct        437 non-null    float64
 7   DT        437 non-null    float64
 8   Dt        437 non-null    float64
 9   QmT       437 non-null    int64  
 10  TT        437 non-null    int64  
 11  Tt        437 non-null    int64  
 12  TCr       437 non-null    float64
 13  C         437 non-null    float64
 14  Si        437 non-null    float64
 15  Mn        437 non-null    float64
 16  P         437 non-null    float64
 17  S         437 non-null    float64
 18  Ni        437 non-null    float64
 19  Cr        437 non-null    float64
 20  Cu        437 non-null    float6

In [None]:
dt.columns

Index(['Sl. No.', 'NT', 'THT', 'THt', 'THQCr', 'CT', 'Ct', 'DT', 'Dt', 'QmT',
       'TT', 'Tt', 'TCr', 'C', 'Si', 'Mn', 'P', 'S', 'Ni', 'Cr', 'Cu', 'Mo',
       'RedRatio', 'dA', 'dB', 'dC', 'Fatigue'],
      dtype='object')

In [None]:
dt.isnull().sum()

Unnamed: 0,0
Sl. No.,0
NT,0
THT,0
THt,0
THQCr,0
CT,0
Ct,0
DT,0
Dt,0
QmT,0


There are no null values present in the data.

In [None]:
dt.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Sl. No.,437.0,219.0,126.295289,1.0,110.0,219.0,328.0,437.0
NT,437.0,872.299771,26.212073,825.0,865.0,870.0,870.0,930.0
THT,437.0,737.643021,280.036541,30.0,845.0,845.0,855.0,865.0
THt,437.0,25.949657,10.263824,0.0,30.0,30.0,30.0,30.0
THQCr,437.0,10.654462,7.841437,0.0,8.0,8.0,8.0,24.0
CT,437.0,128.855835,281.743539,30.0,30.0,30.0,30.0,930.0
Ct,437.0,40.502059,126.924697,0.0,0.0,0.0,0.0,540.0
DT,437.0,123.699844,267.128933,30.0,30.0,30.0,30.0,903.333
Dt,437.0,4.843936,15.700076,0.0,0.0,0.0,0.0,70.2
QmT,437.0,35.491991,19.419277,30.0,30.0,30.0,30.0,140.0


In [None]:
corr_matrix = dt.corr()
corr_matrix

Unnamed: 0,Sl. No.,NT,THT,THt,THQCr,CT,Ct,DT,Dt,QmT,...,S,Ni,Cr,Cu,Mo,RedRatio,dA,dB,dC,Fatigue
Sl. No.,1.0,0.599465,-0.376826,-0.371195,-0.721078,0.541596,0.491472,0.54209,0.492028,0.437887,...,-0.352364,0.497055,0.445749,0.325892,0.356818,-0.265385,-0.438273,-0.343425,-0.080257,0.714279
NT,0.599465,1.0,-0.733562,-0.744072,-0.746162,0.77414,0.704048,0.773907,0.680719,0.623976,...,-0.172272,0.349784,0.429045,0.214173,0.313724,-0.27512,-0.388146,-0.10713,-0.085675,0.649459
THT,-0.376826,-0.733562,1.0,0.999487,0.53287,-0.888675,-0.808212,-0.888408,-0.781432,-0.716294,...,-0.154367,0.01862,-0.081163,-0.123265,-0.178605,0.193575,0.012032,-0.020254,0.125157,-0.656615
THt,-0.371195,-0.744072,0.999487,1.0,0.53742,-0.889131,-0.808627,-0.888864,-0.781833,-0.716662,...,-0.152671,0.02386,-0.090569,-0.125586,-0.185783,0.194496,0.01502,-0.025964,0.125478,-0.655897
THQCr,-0.721078,-0.746162,0.53287,0.53742,1.0,-0.477836,-0.434572,-0.477693,-0.420172,-0.385148,...,0.337895,-0.262757,-0.604618,-0.310723,-0.442909,0.238366,0.518123,0.339716,0.013883,-0.553098
CT,0.541596,0.77414,-0.888675,-0.889131,-0.477836,1.0,0.909458,0.9997,0.879323,0.806025,...,0.06995,0.020457,0.202449,0.200677,0.26687,-0.245155,-0.080275,-0.045154,-0.175935,0.850296
Ct,0.491472,0.704048,-0.808212,-0.808627,-0.434572,0.909458,1.0,0.909506,0.82954,0.832438,...,0.046252,0.0354,0.170568,0.203678,0.234012,-0.221181,-0.080861,-0.047728,-0.159301,0.778942
DT,0.54209,0.773907,-0.888408,-0.888864,-0.477693,0.9997,0.909506,1.0,0.888118,0.807133,...,0.070575,0.014695,0.205448,0.202648,0.264147,-0.245533,-0.079074,-0.045767,-0.175925,0.848612
Dt,0.492028,0.680719,-0.781432,-0.781833,-0.420172,0.879323,0.82954,0.888118,1.0,0.752087,...,0.066872,-0.08485,0.264144,0.170291,0.173871,-0.222986,-0.046088,-0.034599,-0.144792,0.726105
QmT,0.437887,0.623976,-0.716294,-0.716662,-0.385148,0.806025,0.832438,0.807133,0.752087,1.0,...,0.056382,0.016489,0.163179,0.161751,0.215103,-0.197601,-0.064704,-0.036395,-0.141808,0.687954


# <b>Feature Engineering</b>

In [None]:
fatigue_corr=corr_matrix["Fatigue"]
fatigue_corr

Unnamed: 0,Fatigue
Sl. No.,0.714279
NT,0.649459
THT,-0.656615
THt,-0.655897
THQCr,-0.553098
CT,0.850296
Ct,0.778942
DT,0.848612
Dt,0.726105
QmT,0.687954


In [None]:
fatigue_corr=fatigue_corr.drop(["Sl. No.","Fatigue"])

In [None]:
fatigue_corr.loc[fatigue_corr>=0].sort_values(ascending=False)

Unnamed: 0,Fatigue
Tt,0.860337
CT,0.850296
DT,0.848612
Ct,0.778942
Dt,0.726105
QmT,0.687954
NT,0.649459
Cr,0.434295
Mo,0.403535
Cu,0.290846


• Fatigue is highly correlated with tempering time (Tt), carburization temperature (CT), diffusion temperature (DT), carburization time (Ct) & diffusion time (Dt).
<p>• Fatigue is only weakly correlated with the percentage of nickel (Ni).</p>
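By default, <b>dt.corr()</b> computes the Pearson correlation coefficient for every pair of columns. The sketch below shows that calculation for one pair. The function name <b>pearson</b> and the toy values are assumptions made up for illustration and are not rows from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values, not taken from the dataset.
tempering_time = [0, 60, 60, 120, 120]
fatigue = [230, 400, 420, 600, 640]

r = pearson(tempering_time, fatigue)
print(round(r, 3))
```

A value close to +1 means the two columns rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.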

In [None]:
features = fatigue_corr.loc[fatigue_corr>=0].index.to_list()
features

['NT', 'CT', 'Ct', 'DT', 'Dt', 'QmT', 'Tt', 'Si', 'Ni', 'Cr', 'Cu', 'Mo']
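The filter <b>fatigue_corr>=0</b> keeps every positively correlated column and drops strongly negative ones such as THT. The sketch below contrasts that rule with a selection by absolute correlation, using five of the correlation values printed above. The cutoff of 0.5 is a hypothetical value chosen for illustration, and the magnitude-based rule is not what this notebook uses.

```python
# Correlations with Fatigue copied from the output above (first five columns).
fatigue_corr = {
    "NT": 0.649459,
    "THT": -0.656615,
    "THt": -0.655897,
    "THQCr": -0.553098,
    "CT": 0.850296,
}

# Selection used in this notebook: keep non-negative correlations only.
by_sign = [name for name, r in fatigue_corr.items() if r >= 0]

# Alternative (assumption): keep features whose |r| reaches a hypothetical cutoff.
threshold = 0.5
by_magnitude = [name for name, r in fatigue_corr.items() if abs(r) >= threshold]

print(by_sign)
print(by_magnitude)
```

A strongly negative correlation carries as much linear information as a strongly positive one, so a magnitude-based rule may be worth comparing against the sign-based rule.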

# <b>Training & Validation</b>

• Separating the independent variables (features) from the dependent variable (target)

In [None]:
X = dt[features]
y = dt.Fatigue

• Defining a helper that computes the root mean squared error (RMSE)

In [None]:
def rmse(target,predictions):
    return np.sqrt(mean_squared_error(target,predictions))

• Defining a helper that prints four error metrics for the predictions: Mean Absolute Error, Mean Squared Error, Root Mean Squared Error and R Squared

In [None]:
def errors(val_y,predictions):
    mae =mean_absolute_error(val_y,predictions)
    r2 = r2_score(val_y,predictions)
    mse = mean_squared_error(val_y,predictions)
    rms= rmse(val_y,predictions)
    print("Mean Absolute Error",mae)
    print("Mean Squared Error",mse)
    print("Root Mean Squared Error",rms)
    print("R Squared",r2)
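For reference, the four metrics printed by <b>errors</b> follow the standard definitions below. The symbols are notation introduced here and do not appear in the original notebook: $y_i$ is an actual value, $\hat{y}_i$ the corresponding prediction, $\bar{y}$ the mean of the actual values and $n$ the number of samples.

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
\end{aligned}
```

MAE and RMSE are in the same units as the fatigue strength, while $R^2$ is unitless and equals 1 for a perfect fit.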

In [None]:
from sklearn.metrics import r2_score, mean_absolute_error,mean_squared_error

In [None]:
from sklearn.model_selection import train_test_split
train_X,val_X,train_y,val_y= train_test_split(X,y,test_size =0.2)

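• The data is split into a training set and a validation set.
<p>• The training set is used to fit the machine learning model, while the validation set is used to evaluate its performance on unseen data.</p>
<p>• A common split of 80% training & 20% validation is used. No <b>random_state</b> is passed, so the split (and the scores below) change from run to run.</p>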
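To illustrate what an 80/20 split does and how a fixed seed makes it repeatable, here is a minimal sketch in plain Python. The function name <b>split_indices</b> and the seed value 1 are assumptions. It rounds the validation size up, which matches the validation count <b>train_test_split</b> produces for this dataset. <b>train_test_split</b> also accepts a <b>random_state</b> argument that serves the same purpose as the seed here.

```python
import math
import random

def split_indices(n_rows, test_size=0.2, seed=1):
    """Shuffle row indices with a fixed seed and split them into train/validation."""
    n_val = math.ceil(test_size * n_rows)   # validation size rounded up
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)    # fixed seed -> same split on every run
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(437)
print(len(train_idx), len(val_idx))   # 349 88
```

With 437 rows this gives 349 training rows and 88 validation rows, and calling the function again with the same seed returns the same rows.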

<b>Model : Decision Tree</b>

In [None]:
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor(random_state = 1)
model.fit(train_X,train_y)
predictions = model.predict(val_X)

In [None]:
errors(val_y,predictions)

Mean Absolute Error 50.83522727272727
Mean Squared Error 3721.3267045454545
Root Mean Squared Error 61.00267784733269
R Squared 0.9100866929403558


<b>Model : Random Forest </b>

In [None]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(random_state = 1)
model.fit(train_X,train_y)
predictions = model.predict(val_X)

In [None]:
errors(val_y,predictions)

Mean Absolute Error 45.60695522186146
Mean Squared Error 3055.8024958508017
Root Mean Squared Error 55.279313453142684
R Squared 0.9261668405014117


<b>Model : XGBoost</b>

In [None]:
from xgboost import XGBRegressor
model= XGBRegressor(seed=1)
model.fit(train_X,train_y)
predictions = model.predict(val_X)

In [None]:
errors(val_y,predictions)

Mean Absolute Error 50.9261474609375
Mean Squared Error 3696.50634765625
Root Mean Squared Error 60.79890087539618
R Squared 0.9106863737106323


• Tree-based models (Decision Tree, Random Forest, XGBoost) are not affected by feature scaling because they split the data on value thresholds rather than on distances.
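The point about thresholds can be checked with a small sketch: a single regression split is searched on a raw feature and again on the same feature divided by 1000, and the same rows end up on each side. The function <b>best_split</b> is a simplified stand-in written for illustration, not scikit-learn's implementation, and the values are toy numbers rather than dataset rows.

```python
def best_split(xs, ys):
    """Return (threshold, left_rows) minimising the summed squared error of one split."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2   # midpoint between neighbouring feature values
        left = [i for i, x in enumerate(xs) if x <= threshold]
        right = [i for i, x in enumerate(xs) if x > threshold]
        cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, threshold, left)
    return best[1], best[2]

# Toy values, not taken from the dataset.
temperature = [30, 30, 845, 855, 930]
fatigue = [250, 260, 500, 520, 900]

t_raw, left_raw = best_split(temperature, fatigue)
t_scaled, left_scaled = best_split([x / 1000 for x in temperature], fatigue)

print(t_raw, t_scaled)
print(left_raw == left_scaled)   # the same rows fall on each side of the split
```

Only the threshold value changes with the scale; the partition of the rows, and therefore the predictions, stays the same.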

Random Forest Regressor is selected because it has the lowest error of the three models. Next, the effect of the weakly correlated features is checked.

In [None]:
new_features = ['NT', 'CT', 'Ct', 'DT', 'Dt', 'QmT', 'Tt','Si', 'Cr', 'Cu', 'Mo']

The variable <b>new_features</b> holds the selected features without Ni, which is dropped because of its low correlation with fatigue.

In [None]:
new_X = dt[features]

In [None]:
train_new_X,val_new_X,train_new_y,val_new_y= train_test_split(new_X,y,test_size =0.2)

In [None]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(random_state = 1)
model.fit(train_new_X,train_new_y)
predictions = model.predict(val_new_X)

In [None]:
errors(val_new_y,predictions)

Mean Absolute Error 35.96925477994227
Mean Squared Error 2015.2353449924399
Root Mean Squared Error 44.89137272341357
R Squared 0.9510751056574924


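Relative to the first Random Forest run, RMSE dropped by about 18.8 % (from 55.28 to 44.89). Note that <b>new_X</b> is built from <b>features</b>, so Ni is still included; the change therefore comes from the re-drawn random split rather than from dropping Ni.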
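The percentage can be reproduced from the two Random Forest RMSE values printed above. A minimal sketch follows; the helper name <b>relative_reduction</b> is an assumption.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100

rmse_first_run = 55.279313453142684    # Random Forest, first split
rmse_second_run = 44.89137272341357    # Random Forest, second split

improvement = relative_reduction(rmse_first_run, rmse_second_run)
print(f"{improvement:.2f} %")
```

Because the two runs use different random validation rows, a difference of this size can arise from the split alone.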

# <b>Hyperparameter Tuning</b>

By trial and error, setting <b>n_estimators</b> to 90 and <b>max_depth</b> to 5 gave the best result.

In [None]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators = 90,max_depth = 5,random_state = 1)
model.fit(train_new_X,train_new_y)
predictions = model.predict(val_X)

In [None]:
errors(val_y,predictions)

Mean Absolute Error 31.48630338696036
Mean Squared Error 1532.149205696009
Root Mean Squared Error 39.142677548885295
R Squared 0.9629807826803634


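Relative to the previous run, RMSE dropped by 12.80 % (from 44.89 to 39.14). These predictions are scored on <b>val_X</b> and <b>val_y</b> from the first split, which can overlap with the rows in <b>train_new_X</b>, so this figure is likely optimistic.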
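Trial and error over two hyperparameters can be organised as a small grid search. The sketch below shows only the search loop. The names <b>grid_search</b> and <b>toy_rmse</b> are assumptions, and <b>toy_rmse</b> is a stand-in constructed so that its minimum sits at the values chosen above, purely for illustration. A real scoring function would fit the model with the given settings and return its RMSE on held-out data.

```python
from itertools import product

def grid_search(evaluate, n_estimators_grid, max_depth_grid):
    """Return (best_params, best_score) for the lowest score returned by `evaluate`."""
    best_params, best_score = None, float("inf")
    for n_estimators, max_depth in product(n_estimators_grid, max_depth_grid):
        score = evaluate(n_estimators, max_depth)   # e.g. validation RMSE
        if score < best_score:
            best_params, best_score = (n_estimators, max_depth), score
    return best_params, best_score

# Stand-in scoring function (hypothetical): not a trained model.
def toy_rmse(n_estimators, max_depth):
    return abs(n_estimators - 90) / 10 + abs(max_depth - 5)

params, score = grid_search(toy_rmse, [50, 70, 90, 110], [3, 5, 7])
print(params, score)
```

Keeping the search in one loop records every combination that was tried, which makes the choice of settings easier to reproduce than manual edits to a cell.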

# <b>Final Model & Testing</b>

In [160]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators = 90,max_depth = 5,random_state = 1)
model.fit(new_X,y)

In [161]:
test_row = [
    # Row: high-strength structural steel (AISI 4140 equivalent)
    {
        'Sl. No.': 1001, 'NT': 900, 'THT': 600, 'THt': 1.5, 'THQCr': 0.35,
        'CT': 20, 'Ct': 0.3, 'DT': 150, 'Dt': 0.5, 'QmT': 50, 'TT': 580,
        'Tt': 1.5, 'TCr': 0.12, 'C': 0.40, 'Si': 0.25, 'Mn': 0.80, 'P': 0.010,
        'S': 0.005, 'Ni': 0.10, 'Cr': 1.20, 'Cu': 0.15, 'Mo': 0.30,
        'RedRatio': 0.75, 'dA': 13.5, 'dB': 8.7, 'dC': 10.8, 'Fatigue': 570.0,
    }
]

In [162]:
test_dt = pd.DataFrame(test_row)
test_dt

Unnamed: 0,Sl. No.,NT,THT,THt,THQCr,CT,Ct,DT,Dt,QmT,...,S,Ni,Cr,Cu,Mo,RedRatio,dA,dB,dC,Fatigue
0,1001,900,600,1.5,0.35,20,0.3,150,0.5,50,...,0.005,0.1,1.2,0.15,0.3,0.75,13.5,8.7,10.8,570.0


In [163]:
test_X= test_dt[features]
test_y = test_dt.Fatigue

In [164]:
predictions = model.predict(test_X)
print(predictions)

[596.26953013]


The predicted fatigue value (596.27) is 4.61 % higher than the experimental fatigue value (570).

In [165]:
errors(test_y,predictions)

Mean Absolute Error 26.26953013455409
Mean Squared Error 690.0882134902454
Root Mean Squared Error 26.26953013455409
R Squared nan
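R Squared is reported as nan here because the test set holds a single row: with one target value, the total sum of squares in the denominator of R Squared is zero. The sketch below is a hand-written version of the formula, not scikit-learn's implementation. The function name <b>r_squared</b> is an assumption, and the second call pairs three fatigue values from the first rows of the data with hypothetical predictions.

```python
import math

def r_squared(targets, predictions):
    """R^2 = 1 - SS_res / SS_tot; returns nan when SS_tot is zero."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    if ss_tot == 0:
        return math.nan   # a single sample (or constant targets) has no variance
    return 1 - ss_res / ss_tot

single = r_squared([570.0], [596.26953013])
several = r_squared([232, 235, 241], [230, 236, 240])   # predictions are made up

print(single)    # nan
print(round(several, 3))
```

For a single test row, the absolute error and the relative deviation are the meaningful figures; R Squared needs several rows with differing target values.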




# <b>Summary</b>

The data was downloaded and explored through EDA (Exploratory Data Analysis), and a few models were trained to automate the prediction of fatigue strength from metal composition, upstream processing and heat treatment.

• The dataset has 437 rows and 27 columns.

• Selected the features that correlate positively with fatigue strength.

• The data was then split into training data and validation data.

• Trained three models: Decision Tree, Random Forest and XGBoost. Random Forest performed best, and after hyperparameter tuning it gave an <b>R Squared of 0.96298</b> on the validation set.