# Gradient Boosting Machines (GBM)

## Theory and Mathematics of GBM

Gradient Boosting Machines (GBM) are a class of ensemble learning techniques that build predictive models by combining the outputs of multiple weak learners, usually decision trees. Here are the core components and mathematical details of GBM:

### Gradient Boosting Principle:
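GBM builds trees sequentially, with each new tree correcting the errors of the ensemble built so far. Training amounts to gradient descent in function space: at every iteration the loss is differentiated with respect to the current predictions, and the next tree is fit to move those predictions in the direction that reduces the loss.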

### Tree Building:
GBM employs regression trees as base learners, optimizing them to fit negative gradients of the loss function. Shallow trees are preferred to avoid overfitting while capturing essential patterns in the data.

### Model Prediction:
The final prediction is an initial constant estimate plus the sum of the outputs of all trees, each scaled by the learning rate (shrinkage) parameter. A smaller learning rate reduces the contribution of each individual tree, which usually improves generalization but requires more trees to reach the same training loss.

### Mathematical Formulation:
At each iteration \( m \), the model updates as:
\[ F_m(x) = F_{m-1}(x) + \lambda \cdot f_m(x) \]
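where \( F_m(x) \) is the ensemble after \( m \) iterations, \( f_m(x) \) is the \( m \)th weak learner (decision tree), and \( \lambda \) is the learning rate. Each \( f_m \) is fit to the pseudo-residuals \( r_{im} = -\partial L(y_i, F(x_i)) / \partial F(x_i) \), evaluated at \( F = F_{m-1} \), so every iteration takes a step that reduces the loss \( L(y, F(x)) \). For the squared-error loss \( \tfrac{1}{2}(y - F)^2 \), the pseudo-residuals are the ordinary residuals \( y_i - F_{m-1}(x_i) \).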
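The update rule can be made concrete with a small from-scratch sketch. It is not how scikit-learn, XGBoost, or LightGBM are implemented: it assumes a single input feature, squared-error loss, and one-split decision stumps as weak learners, and every function name here (`fit_stump`, `fit_gbm`, `predict_gbm`, `training_mse`) is hypothetical.

```python
# Minimal gradient boosting sketch: squared-error loss, one feature,
# decision stumps as weak learners. Illustrative only.

def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]


def predict_stump(stump, xi):
    thr, lv, rv = stump
    return lv if xi <= thr else rv


def fit_gbm(x, y, n_rounds=50, learning_rate=0.1):
    """Return (F_0, learning_rate, list_of_stumps)."""
    f0 = sum(y) / len(y)                 # F_0: best constant for squared error
    preds = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # F_m(x) = F_{m-1}(x) + lambda * f_m(x)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for pi, xi in zip(preds, x)]
    return f0, learning_rate, stumps


def predict_gbm(model, xi):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def training_mse(model, x, y):
    return sum((yi - predict_gbm(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
```

Calling `fit_gbm(x, y)` on a list of feature values `x` with at least two distinct entries returns a model whose training error should shrink as `n_rounds` grows; lowering `learning_rate` slows that decrease, which is the shrinkage effect described above.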

### Reasons for Choosing GBM

1. **Strong Predictive Power:** GBM excels in capturing complex nonlinear relationships, making it suitable for accurately predicting fatigue behavior influenced by intricate material properties and various environmental factors.

2. **Ensemble Learning Benefits:** By adding weak learners sequentially, GBM mainly reduces bias, while shrinkage, shallow trees, and subsampling keep variance in check, leading to more robust models. This is advantageous in fatigue prediction, where subtle but critical features impact crack growth rates and failure mechanisms.

3. **Flexibility and Tunability:** GBM offers flexibility in hyperparameter tuning, allowing researchers to optimize model performance for specific fatigue prediction tasks. Its adaptability to different datasets and problem complexities makes it a versatile and widely-used choice in machine learning applications.

4. **Iterative Learning Process:** The iterative nature of GBM's training process enables continuous improvement in model predictions, gradually minimizing errors and enhancing accuracy with each iteration.



# XGBoost (eXtreme Gradient Boosting)

## Theory and Mathematics of XGBoost

XGBoost is an optimized implementation of gradient boosting machines, designed for enhanced performance and efficiency. Here are the key aspects and mathematical details of XGBoost:

### Regularization Techniques:
XGBoost adds L1 and L2 regularization on the leaf weights of each tree, together with a penalty on the number of leaves, to control model complexity and prevent overfitting. These penalties shrink leaf values and discourage unnecessary splits, encouraging simpler and more generalizable models.

### Parallel Processing:
One of XGBoost's strengths is its support for parallel processing on multicore processors. Trees are still added one after another, but the search for the best split within each tree is parallelized across features, which significantly reduces computation time, especially for large datasets.

### Tree Pruning:
XGBoost prunes trees during training: a tree is grown up to the maximum depth, and splits whose loss reduction (gain) does not exceed a minimum threshold are then removed. Pruning keeps trees compact and helps prevent overfitting.

### Mathematical Formulation:
The objective function in XGBoost combines a loss function \( L(y, F(x)) \) and regularization terms \( \Omega(f) \) for each tree. The optimization process involves minimizing the following objective:
\[ \text{Objective} = \sum_{i=1}^{n} L(y_i, F(x_i)) + \sum_{k=1}^{K} \Omega(f_k) \]
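The text above does not spell out \( \Omega \). A commonly used form, and the one assumed in this sketch, is \( \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_j w_j^2 \), where \( T \) is the number of leaves and \( w_j \) are the leaf weights (the L1 term is omitted here for brevity). Under a second-order approximation of the loss, this gives closed-form expressions for the optimal leaf weight and for the gain of a candidate split. The function names below are hypothetical; `reg_lambda` and `gamma` are chosen to mirror the corresponding XGBoost parameter names.

```python
# Sketch of the second-order leaf weight and split gain under the assumed
# regularizer Omega(f) = gamma * T + 0.5 * lambda * sum(w_j ** 2).
# G and H are the sums of first and second derivatives of the loss
# over the samples that fall into a leaf.

def leaf_weight(G, H, reg_lambda=1.0):
    """Optimal weight of a leaf: w* = -G / (H + lambda)."""
    return -G / (H + reg_lambda)


def split_gain(G_left, H_left, G_right, H_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into a left and a right child."""
    def score(G, H):
        return G * G / (H + reg_lambda)

    parent = score(G_left + G_right, H_left + H_right)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right) - parent) - gamma
```

A split with a non-positive gain is a candidate for removal, which connects this formula to the pruning behavior described earlier: raising `gamma` makes more splits fall below the threshold.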

### Reasons for Choosing XGBoost

1. **Enhanced Regularization:** XGBoost's regularization techniques mitigate overfitting, crucial when dealing with complex fatigue behavior influenced by numerous factors such as material properties, loading conditions, and environmental variables.

2. **Efficiency and Scalability:** Its support for parallel processing and optimized algorithms make XGBoost efficient for processing large datasets, speeding up model training and evaluation. This efficiency is vital for handling the extensive feature sets common in fatigue prediction tasks.

3. **State-of-the-Art Performance:** XGBoost has consistently demonstrated high performance in various machine learning competitions and real-world applications, showcasing its effectiveness across domains, including fatigue prediction and material behavior modeling.



# LightGBM (Light Gradient Boosting Machine)

## Theory and Mathematics of LightGBM

LightGBM is a gradient boosting framework known for its speed, efficiency, and ability to handle large datasets. Here are the detailed aspects and mathematical formulations of LightGBM:

### Histogram-Based Splitting:
LightGBM employs histogram-based splitting, where continuous feature values are binned into discrete bins during training. This approach reduces the computational cost of finding optimal split points, making LightGBM efficient for processing large feature sets.
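A minimal sketch of the idea, under simplifying assumptions: equal-width bins, squared-error loss (so the per-bin sample count stands in for the Hessian sum), and a single feature. LightGBM's actual bin construction is more elaborate, and the function names here are hypothetical.

```python
# Histogram-based split finding for one feature. Gradients are accumulated
# per bin once, then only the bin boundaries are scanned as split candidates.

def build_histogram(values, gradients, n_bins=4):
    """Return (lo, width, grad_sum_per_bin, count_per_bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0      # avoid zero width for constant features
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)
        grad_sum[b] += g
        count[b] += 1
    return lo, width, grad_sum, count


def best_bin_split(grad_sum, count):
    """Scan bin boundaries; return (last_left_bin, score) of the best split."""
    total_g, total_n = sum(grad_sum), sum(count)
    best_bin, best_score = None, float("-inf")
    g_left, n_left = 0.0, 0
    for b in range(len(grad_sum) - 1):
        g_left += grad_sum[b]
        n_left += count[b]
        g_right, n_right = total_g - g_left, total_n - n_left
        if n_left == 0 or n_right == 0:
            continue
        score = g_left ** 2 / n_left + g_right ** 2 / n_right
        if score > best_score:
            best_bin, best_score = b, score
    return best_bin, best_score
```

The cost of the scan depends on the number of bins rather than the number of distinct feature values, which is where the savings on large datasets come from.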

### Leaf-Wise Tree Growth:
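Unlike traditional depth-wise (level-wise) growth, LightGBM grows trees leaf-wise, always splitting the leaf with the highest loss reduction. For the same number of leaves this usually reaches a lower loss and can capture complex interactions in fatigue behavior, but the resulting deep, unbalanced trees can overfit small datasets, so the number of leaves and the maximum depth should be constrained.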
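The growth strategy can be sketched as a best-first loop: evaluate the best split of every current leaf and apply only the single most beneficial one. The example below assumes one feature, squared-error loss, and exact (not histogram-based) split search; `best_split` and `grow_leaf_wise` are hypothetical names, and `num_leaves` is named after the corresponding LightGBM parameter.

```python
# Leaf-wise (best-first) tree growth on (x, y) pairs, squared-error loss.

def best_split(points):
    """Best split of one leaf as (gain, left_points, right_points), or None."""
    pts = sorted(points)
    ys = [y for _, y in pts]
    n, total = len(pts), sum(ys)
    best = None
    for k in range(1, n):
        if pts[k - 1][0] == pts[k][0]:
            continue                      # cannot split between equal x values
        left_sum = sum(ys[:k])
        right_sum = total - left_sum
        # Reduction in the sum of squared errors achieved by this split.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        if best is None or gain > best[0]:
            best = (gain, pts[:k], pts[k:])
    return best


def grow_leaf_wise(points, num_leaves=4):
    """Repeatedly split the leaf with the largest gain until num_leaves is reached."""
    leaves = [list(points)]
    while len(leaves) < num_leaves:
        candidates = [(best_split(leaf), i) for i, leaf in enumerate(leaves)]
        candidates = [(s, i) for s, i in candidates if s is not None]
        if not candidates:
            break
        (gain, left, right), i = max(candidates, key=lambda c: c[0][0])
        if gain <= 0:
            break                         # no remaining split reduces the loss
        leaves[i:i + 1] = [left, right]
    return leaves
```

A depth-wise strategy would split every leaf at a given level, whereas this loop leaves a leaf untouched when splitting it brings no gain and spends the leaf budget where the loss reduction is largest.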

### Gradient-Based One-Side Sampling (GOSS):
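LightGBM uses GOSS to reduce the number of samples examined when searching for splits. It keeps all samples with large gradients, which are the ones the current model fits poorly, randomly samples a fraction of the small-gradient samples, and reweights the sampled ones so that the estimated split gain stays approximately unbiased. This speeds up training with little loss in accuracy, which is useful for large fatigue datasets.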
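The sampling step can be sketched as follows. This is a simplified illustration, not LightGBM's code: `goss_sample` is a hypothetical function, `top_rate` and `other_rate` are named after the corresponding LightGBM parameters, and the returned weights would multiply each sample's gradient statistics when split gains are computed.

```python
import random

# Gradient-based One-Side Sampling: keep the largest-gradient samples,
# subsample the rest, and up-weight the subsample to compensate.

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {sample_index: weight} for the samples selected by GOSS."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]

    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Small-gradient samples are amplified by (1 - a) / b so that their
    # total contribution matches that of the full small-gradient set.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights
```

With the rates shown, only 30% of the samples take part in split finding for a given tree, yet every large-gradient sample is still included.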

### Mathematical Formulation:
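LightGBM minimizes the same kind of regularized objective as other gradient boosting frameworks: a loss function \( L(y, F(x)) \) plus a regularization term \( \Omega(f_k) \) for each tree. Histogram-based splitting, leaf-wise growth, and GOSS change how each tree is built, not the objective itself:
\[ \text{Objective} = \sum_{i=1}^{n} L(y_i, F(x_i)) + \sum_{k=1}^{K} \Omega(f_k) \]
When a tree is fit, the value assigned to leaf \( j \) follows from the sums of first and second derivatives of the loss over the samples in that leaf, \( G_j \) and \( H_j \), and the L2 regularization strength \( \lambda \):
\[ w_j = -\frac{G_j}{H_j + \lambda} \]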

### Reasons for Choosing LightGBM

1. **Efficient Handling of Large Datasets:** LightGBM's histogram-based splitting and leaf-wise tree growth make it highly efficient for processing extensive feature sets commonly found in fatigue prediction tasks. This efficiency significantly reduces training time and resource requirements.

2. **Focus on Poorly Fitted Samples:** GOSS keeps every sample with a large gradient and subsamples the rest, so training effort is concentrated on the observations the current model predicts worst. This reduces training time while keeping accuracy close to that of training on the full dataset, which helps when modeling complex material interactions and predicting fatigue life.

3. **Optimized Computational Resources:** LightGBM's optimized algorithms and memory usage make it suitable for deployment in resource-constrained environments, such as production systems or cloud platforms. Its ability to deliver accurate predictions with minimal computational overhead is a key advantage in real-world applications.


# Explanation, Implementation, and Assumptions

## Gradient Boosting Machines (GBM)

### Explanation of Calculations:
GBM predicts fatigue life based on material properties using ensemble learning. It calculates predicted fatigue life values, comparing them to actual values for accuracy assessment.

### Implementation and Definitions:
- **Implementation:** GBM is implemented using general-purpose libraries such as scikit-learn or specialized gradient boosting libraries such as XGBoost and LightGBM.
- **R-squared (R^2):** Measures the proportion of variance in fatigue life explained by the model. Values closer to 1 indicate a better fit.
- **Root Mean Squared Error (RMSE):** The square root of the mean squared difference between predicted and actual fatigue life, expressed in the same units as fatigue life. Lower RMSE implies higher prediction accuracy.
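Both metrics can be written out directly from their definitions. The sketch below uses only the standard library; the function names `rmse` and `r_squared` are illustrative, and in practice a library implementation would normally be used instead.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the actual values scores an R-squared of 0, and a perfect model scores 1 with an RMSE of 0. `r_squared` is undefined when all actual values are identical, since the total sum of squares is then zero.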

### Implemented Models:
- GBM (Gradient Boosting Machines)

### Assumptions:
1. **Feature Relevance:** Assumes the dataset contains highly relevant fatigue-related features influencing fatigue life predictions.
2. **Pattern Consistency:** Assumes fatigue behavior exhibits consistent patterns over the dataset, allowing GBM to learn meaningful relationships.
3. **Limited Collinearity:** Assumes features used in GBM are not highly correlated. Tree ensembles still predict well with correlated features, but correlation splits importance among them and makes feature importance scores harder to interpret.
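One simple way to check the collinearity assumption before training is to flag feature pairs whose Pearson correlation exceeds a chosen threshold. The sketch below is illustrative: the function names, the 0.9 threshold, and the feature names in the usage note are assumptions, not part of any dataset described here.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0                        # constant feature: treat as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(features, threshold=0.9):
    """Return feature-name pairs whose absolute correlation exceeds the threshold."""
    return [(f, g) for f, g in combinations(features, 2)
            if abs(pearson(features[f], features[g])) > threshold]
```

For example, passing a dictionary with hypothetical columns such as `"stress"`, `"strain"`, and `"temp"` returns the pairs that may need to be merged, dropped, or interpreted together when reading feature importances.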

## XGBoost (eXtreme Gradient Boosting)

### Explanation of Calculations:
XGBoost optimizes GBM by incorporating regularization techniques and parallel processing for improved efficiency.

### Implementation and Definitions:
- **Implementation:** Utilizes the XGBoost library, known for its efficiency and optimization capabilities in gradient boosting tasks.
- **R-squared (R^2) and RMSE:** The same evaluation metrics as for GBM are used to assess model performance.

### Assumptions:
1. **Complexity Control:** Assumes the dataset requires fine-tuned regularization to prevent overfitting, a strength of XGBoost.
2. **Parallelization Benefits:** Assumes XGBoost leverages multicore processors efficiently, improving training and prediction speeds for large datasets.
3. **Nonlinear Patterns:** Assumes fatigue behavior exhibits nonlinear relationships with features, aligning with XGBoost's ability to capture complex patterns.

## LightGBM (Light Gradient Boosting Machine)

### Explanation of Calculations:
LightGBM introduces optimization techniques like histogram-based splitting and leaf-wise tree growth for faster training.

### Implementation and Definitions:
- **Implementation:** Leverages the LightGBM library, known for its speed and efficiency in gradient boosting tasks.
- **R-squared (R^2) and RMSE:** Standard metrics used to evaluate model performance.

### Assumptions:
1. **Efficient Data Handling:** Assumes LightGBM's optimized algorithms handle large datasets efficiently, a critical factor in fatigue prediction with numerous material attributes.
2. **Sensitive Feature Selection:** Assumes LightGBM effectively prioritizes critical features impacting fatigue behavior, leading to accurate predictions.
3. **Memory Optimization:** Assumes LightGBM's memory-efficient design allows deployment in resource-constrained environments, offering scalability and practicality.

