# ***[TPS Apr 2022] Neural Network for Beginners***

<img src="https://deepage.net/img/post_nn_example/thumbnail.jpg" width="500">

In TPS Apr 2022, many published notebooks make predictions with neural networks such as LSTMs. However, I think they are difficult for many beginners, including myself, to understand. The goal of this notebook is to explain neural networks so that even beginners can follow along.

* Some explanations are still incomplete, so I will update this notebook from time to time.
* I've just started learning about neural networks, so if you find any mistakes, please point them out in the comments.
* I'm not good at English, so my English may be wrong in some places.

# Reference Notebook
Here are some great notebooks that I referred to when creating this notebook. Please check them out.
* [Top 1% | TPS APR 22 EDA | LSTM](https://www.kaggle.com/code/kartushovdanil/top-1-tps-apr-22-eda-lstm)
* [LSTM Baseline](https://www.kaggle.com/code/ryanbarretto/lstm-baseline)
* [Tps April Tensorflow Bi-LSTM](https://www.kaggle.com/code/hamzaghanmi/tps-april-tensorflow-bi-lstm)

# Import the data

In [None]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")

In [None]:
train_df = pd.read_csv('../input/tabular-playground-series-apr-2022/train.csv')
train_df

In [None]:
train_labels = pd.read_csv('../input/tabular-playground-series-apr-2022/train_labels.csv')
train_labels

In [None]:
test_df = pd.read_csv('../input/tabular-playground-series-apr-2022/test.csv')
test_df

# Data preprocessing

### 1. Create lag and difference features so that you can see how much each sensor changes between steps

In [None]:
features = train_df.columns.tolist()[3:]
def preprocessing(df):
    for feature in features:
        # Previous step's value within the same sequence (0 for the first step)
        df[feature + '_lag1'] = df.groupby('sequence')[feature].shift(1).fillna(0)
        # Change from the previous step
        df[feature + '_diff1'] = df[feature] - df[feature + '_lag1']

preprocessing(train_df)
preprocessing(test_df)
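To see what the lag and difference features look like, here is a small sketch on made-up data. The column name `sensor_00` and all values are assumptions chosen only for illustration. Note that the first step of each sequence has no previous value, so its lag is filled with 0 and its difference equals the raw value.

```python
import pandas as pd

# Toy data: two sequences with three steps each (hypothetical values)
toy = pd.DataFrame({
    'sequence': [0, 0, 0, 1, 1, 1],
    'sensor_00': [1.0, 3.0, 6.0, 10.0, 10.0, 7.0],
})

# Previous value within the same sequence; the first step has none, so use 0
toy['sensor_00_lag1'] = toy.groupby('sequence')['sensor_00'].shift(1).fillna(0)

# Change from the previous step
toy['sensor_00_diff1'] = toy['sensor_00'] - toy['sensor_00_lag1']

print(toy)
```

Because the shift is done per `sequence`, the last value of sequence 0 never leaks into the first step of sequence 1.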

### 2. Standard scaling

In [None]:
from sklearn.preprocessing import StandardScaler

features = train_df.columns.tolist()[3:]
sc = StandardScaler()
train_df[features] = sc.fit_transform(train_df[features])
test_df[features] = sc.transform(test_df[features])
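Standard scaling subtracts the mean and divides by the standard deviation of each feature. The sketch below does the same calculation by hand with NumPy on made-up numbers, to show why the statistics come from the training data only and are then reused for the test data. It is an illustration, not a replacement for `StandardScaler`.

```python
import numpy as np

# Hypothetical data: 3 training rows and 1 test row, 2 features
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# "fit": learn mean and standard deviation from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the same statistics to both train and test
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

After scaling, each training feature has mean 0 and standard deviation 1. The test data uses the same mean and standard deviation, so its values are on the same scale as the training data.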

### 3. Reshape data

In [None]:
groups = train_df['sequence']
labels = train_labels['state']

train_df = train_df.drop(['sequence', 'subject', 'step'], axis=1).values
train_df = train_df.reshape(-1, 60, train_df.shape[-1])

test_df = test_df.drop(['sequence', 'subject', 'step'], axis=1).values
test_df = test_df.reshape(-1, 60, test_df.shape[-1])

I was pointed to a web page that shows how to define the input shape for an LSTM. Please take a look.
[reshape input data LSTM](https://machinelearningmastery.com/reshape-input-data-long-short-term-memory-networks-keras/)
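An LSTM expects input of shape (samples, time steps, features). The following sketch shows the same reshape on a tiny made-up array with 3 steps per sequence instead of 60. It assumes, as in this dataset, that the rows are already sorted by sequence and then by step.

```python
import numpy as np

n_sequences, n_steps, n_features = 2, 3, 2  # this notebook uses 60 steps

# Flat table: one row per (sequence, step), like the CSV after dropping id columns
flat = np.arange(n_sequences * n_steps * n_features).reshape(-1, n_features)
print(flat.shape)  # (6, 2)

# 3D array: one block of n_steps rows per sequence
cube = flat.reshape(-1, n_steps, n_features)
print(cube.shape)  # (2, 3, 2)

# The second sequence is exactly rows 3..5 of the flat table
print(cube[1])
```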

# Modeling

### 1. Import libraries

In [None]:
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.utils import plot_model
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import GlobalMaxPooling1D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.layers import Concatenate, LSTM, GRU
from tensorflow.keras.layers import Bidirectional, Multiply

from sklearn.metrics import roc_auc_score

from sklearn.model_selection import GroupKFold

### 2. Set up the TPU

Neural networks take a long time to train, so we use a TPU. Open the settings menu in the Notebook editor and select ‘TPU v3-8’ in the Accelerator menu.

In [None]:
tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
tpu_strategy = tf.distribute.TPUStrategy(tpu)  # the experimental.TPUStrategy alias is deprecated

### 3. Define model

In [None]:
def BuildNN():
    with tpu_strategy.scope():
        x_input = Input(shape=(train_df.shape[-2:]))
    
        x1 = Bidirectional(LSTM(units=512, return_sequences=True))(x_input)
        x2 = Bidirectional(LSTM(units=256, return_sequences=True))(x1)
        z1 = Bidirectional(GRU(units=256, return_sequences=True))(x1)
    
        c = Concatenate(axis=2)([x2, z1])
    
        x3 = Bidirectional(LSTM(units=128, return_sequences=True))(c)
    
        x4 = GlobalMaxPooling1D()(x3)
        x5 = Dense(units=128, activation='selu')(x4)
        x_output = Dense(1, activation='sigmoid')(x5)

        model = Model(inputs=x_input, outputs=x_output, name='lstm_model')
    return model

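* **Bidirectional** : Wrapper that runs an RNN layer over the sequence both forward and backward and, by default, concatenates the two outputs.
* **LSTM** : Long Short-Term Memory, a recurrent layer that can remember information over many time steps.
* **GRU** : Gated Recurrent Unit, a recurrent layer similar to LSTM with a simpler structure.
* **Concatenate** : This takes as input a list of tensors, all of the same shape except for the concatenation axis, and returns a single tensor that is the concatenation of all inputs.
* **GlobalMaxPooling1D** : Downsamples the input representation by taking the maximum value over the time dimension.
* **Dense** : This is used to create fully connected layers, in which every output depends on every input.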
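To make Concatenate and GlobalMaxPooling1D more concrete, here is a sketch of the same operations in plain NumPy. The array sizes are made up for illustration. The real Keras layers work on tensors inside the model, but the shape logic is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical layer outputs: (batch, time steps, features)
a = rng.normal(size=(4, 60, 8))
b = rng.normal(size=(4, 60, 6))

# Concatenate(axis=2): join along the feature axis, other axes must match
c = np.concatenate([a, b], axis=2)
print(c.shape)  # (4, 60, 14)

# GlobalMaxPooling1D: keep the maximum of each feature over the time axis
pooled = c.max(axis=1)
print(pooled.shape)  # (4, 14)
```

After pooling, the time dimension is gone, so each sequence is described by one vector that the Dense layers can use.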

In [None]:
model = BuildNN()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['AUC'])

* optimizer : Algorithm for efficient loss minimization (ex. 'adam', 'RMSProp')
* loss : A function that measures the difference between the expected outcome and the outcome produced by the model; training tries to minimize it (ex. 'mean_squared_error', 'binary_crossentropy')
* metrics : Functions used to judge the performance of the model (ex. 'MAE', 'ACC', 'AUC')

In this competition, submissions are evaluated on ***area under the ROC curve*** between the predicted probability and the observed target, so AUC is the metric to monitor during training.
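The loss and the metric measure different things. The sketch below computes both by hand with NumPy on four made-up predictions. The helper names `binary_crossentropy` and `auc_by_pairs` are my own and are not library functions. AUC is calculated here as the fraction of (positive, negative) pairs in which the positive sample gets the higher probability, which is one common way to read it.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def auc_by_pairs(y_true, y_prob):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Hypothetical labels and predicted probabilities
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

print(binary_crossentropy(y_true, y_prob))
print(auc_by_pairs(y_true, y_prob))  # 0.75: three of four pairs are ordered correctly
```

AUC only depends on the order of the predictions, while binary cross-entropy also depends on how confident each probability is. That is why the model is trained on the loss but judged on AUC.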

### 4. Visualize model

In [None]:
%pip install pydot
%pip install pydotplus

In [None]:
model.summary()

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, show_shapes=True)

### 5. Training the Model

In [None]:
scores = []
test_preds = []
kf = GroupKFold(n_splits=10)

##### GroupKFold
GroupKFold keeps all samples that share a group in the same fold. Here the group is 'sequence', so the data of one sequence is never split between the training data and the validation data.
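Here is a small sketch of how GroupKFold splits data. The arrays and group ids are made up, and each group has two rows so that the effect is easy to see. In this notebook each row of the reshaped data is already one whole sequence.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: 6 samples that belong to 3 groups
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([10, 10, 20, 20, 30, 30])

gkf = GroupKFold(n_splits=3)
folds = list(gkf.split(X, y, groups))

for fold_idx, (train_idx, valid_idx) in enumerate(folds):
    print(f'Fold {fold_idx + 1}: '
          f'train groups {sorted(set(groups[train_idx]))}, '
          f'valid groups {sorted(set(groups[valid_idx]))}')
```

In every fold, the groups in the validation data do not appear in the training data.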

In [None]:
for fold_idx, (train_idx, valid_idx) in enumerate(kf.split(train_df, train_labels, groups.unique())):
    
    print('\n')
    print('*'*15, f'↓ Fold {fold_idx+1} ↓', '*'*15)
    
    # Separate into train data and validation data
    X_train, X_valid = train_df[train_idx], train_df[valid_idx]
    y_train, y_valid = labels.iloc[train_idx].values, labels.iloc[valid_idx].values
    
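    # Rebuild the model so that each fold starts from fresh weights.
    # Otherwise the weights learned in earlier folds leak into this fold's validation score.
    model = BuildNN()
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=[tf.keras.metrics.AUC(name='auc')])  # fixed name keeps 'val_auc' valid

    # Train the model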
    model.fit(X_train, y_train, 
              validation_data=(X_valid, y_valid), 
              epochs=15, 
              batch_size=1024, 
              callbacks=[EarlyStopping(monitor='val_auc', patience=7, mode='max', 
                                       restore_best_weights=True),
                         ReduceLROnPlateau(monitor='val_auc', factor=0.6, 
                                           patience=4, verbose=False)]
             )
    
    # Save score
    score = roc_auc_score(y_valid, model.predict(X_valid, batch_size=512).squeeze())
    scores.append(score)
    
    # Predict
    test_preds.append(model.predict(test_df, batch_size=512).squeeze())
    
    print(f'Fold {fold_idx+1} | Score: {score}')
    print('*'*15, f'↑ Fold {fold_idx+1} ↑', '*'*15)
    
print(f'Mean AUC over {kf.n_splits} folds: {np.mean(scores)}')

* **Batch_size** : the number of samples that are propagated through the network before each weight update.
* **Epoch** : the number of times that training works through the entire training dataset.

Training for too many epochs can cause overfitting, so I set EarlyStopping to stop training when the validation AUC has not improved for 7 epochs.
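The idea behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration and not the Keras implementation. The function name `early_stopping` and the validation scores are made up, and a patience of 3 is used to keep the example short.

```python
def early_stopping(val_scores, patience):
    """Return (epoch where training stops, epoch with the best score)."""
    best_score = float('-inf')
    best_epoch = -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            # Improvement: remember this epoch as the best one
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop
            return epoch, best_epoch
    return len(val_scores) - 1, best_epoch

# Hypothetical validation AUC after each epoch
val_auc = [0.80, 0.85, 0.88, 0.87, 0.86, 0.88, 0.85]
stopped_at, best_at = early_stopping(val_auc, patience=3)
print(stopped_at, best_at)  # 5 2
```

Training stops at epoch index 5 because the score has not beaten the best value from epoch index 2 for three epochs. With `restore_best_weights=True`, Keras then goes back to the weights from the best epoch.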

# Submission

In [None]:
submission = pd.read_csv("../input/tabular-playground-series-apr-2022/sample_submission.csv")

submission["state"] = sum(test_preds)/kf.n_splits
submission.to_csv("submission.csv", index=False)
submission