# Binary Classification of Machine Failures

"The dataset for this competition (both train and test) was generated from a deep learning model trained on the Machine Failure Predictions."

**Sources**:

- [Kaggle challenge](https://www.kaggle.com/competitions/playground-series-s3e17/data?select=train.csv)
- [Original Dataset](https://www.kaggle.com/datasets/dineshmanikanta/machine-failure-predictions)

# Outline
- [1 - Read Data](#1)
- [2 - EDA - Exploratory Data Analysis](#2)
- [3 - Preprocessing Data Before Modeling](#3)
- [4 - Model Training](#4)
- [5 - Model Comparison](#5)
- [6 - Model Tuning](#6)
- [7 - Model Diagnostic](#7)
- [8 - Challenge Submission](#8)

In [1]:
# importing standard libraries
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns

from sklearn.preprocessing import MinMaxScaler, PowerTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import (
    train_test_split,
    GridSearchCV,
    learning_curve,
    LearningCurveDisplay,
    ShuffleSplit,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

from xgboost import XGBClassifier

In [2]:
# Define Seaborn theme parameters
theme_parameters = {
    'axes.spines.right': False,
    'axes.spines.top': False,
    'grid.alpha': 0.3,
    'axes.titlesize': 16,
    'figure.figsize': (12, 4),
}

# Set the theme
sns.set_theme(style='whitegrid',
              palette=sns.color_palette('colorblind'), 
              rc=theme_parameters)

<a name="1"></a>
# Read Data

In [4]:
# Toggle the data source: True reads from Kaggle paths, False reads the local CSV copies
read_from_kaggle = False

In [6]:
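if read_from_kaggle:
    # Kaggle input paths (left blank; set them before enabling this branch)
    # The training frame is named train_data here so both branches define the same variables
    train_data = pd.read_csv('',
                             index_col=0)
    test_data = pd.read_csv('',
                            index_col=0)
    orig_data = pd.read_csv('',
                            index_col=0)
else:
    # Local copies of the competition files and the original dataset
    train_data = pd.read_csv("./../data/machine_failure_train.csv",
                             index_col=0)
    test_data = pd.read_csv("./../data/machine_failure_test.csv",
                            index_col=0)
    orig_data = pd.read_csv("./../data/machine_failure_original.csv")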
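
Once the files are loaded, a quick sanity check helps confirm that each frame has the expected size and that train and test differ only by the target column. The sketch below is one possible way to do this and is not part of the original notebook: the helpers `summarize_frames` and `columns_only_in` are hypothetical names, and the demo frames use made-up columns so the block runs without the CSV files.

```python
import pandas as pd


def summarize_frames(frames):
    """Return one row per DataFrame: row count, column count, and missing cells."""
    rows = []
    for name, frame in frames.items():
        rows.append({
            "name": name,
            "n_rows": frame.shape[0],
            "n_cols": frame.shape[1],
            "n_missing": int(frame.isna().sum().sum()),
        })
    return pd.DataFrame(rows).set_index("name")


def columns_only_in(left, right):
    """List the columns present in `left` but absent from `right`, in order."""
    return [col for col in left.columns if col not in right.columns]


# Tiny stand-in frames so the sketch runs without the CSV files.
# The column names are illustrative assumptions, not taken from the real data.
demo_train = pd.DataFrame({"feature_a": [1.0, 2.0, None], "target": [0, 1, 0]})
demo_test = pd.DataFrame({"feature_a": [3.0, 4.0]})

summary = summarize_frames({"train": demo_train, "test": demo_test})
extra_cols = columns_only_in(demo_train, demo_test)

print(summary)
print("Columns only in train:", extra_cols)
```

With the notebook's own frames, the same helpers could be called as `summarize_frames({"train": train_data, "test": test_data, "original": orig_data})` and `columns_only_in(train_data, test_data)`; the second call would be expected to return just the target column if the files are aligned.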