## Introduction

Hi fellow Kagglers!

This notebook is a clear and simple implementation of a neural network in TensorFlow 2.0 for the House Prices regression problem.
I then use the KerasTuner module to search for a better configuration of the network.
I hope it is helpful for those new to deep learning!<br>
<br>
Hope you enjoy! Let's dive in!
<br>
<img src="https://media.giphy.com/media/dWy2WwcB3wvX8QA1Iu/giphy.gif">


In [None]:
# Importing libraries
import pandas as pd
import numpy as np
import logging
import matplotlib.pyplot as plt
import seaborn as sns
from time import time
import datetime
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import r2_score, mean_squared_error

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
pd.options.mode.chained_assignment= None #avoid unnecessary warnings

In [None]:
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.info("test")

# To visualize all columns in the dataframe
pd.set_option('display.max_columns', None)

In [None]:
# Importing data

df= pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/train.csv')
print(df.shape)
df.head()

## Data Preprocessing

In [None]:
# def reduce_mem_usage(df, verbose=True):
#     numerics = ['int16','int32','int64','float16','float32','float64']
#     start_mem = df.memory_usage().sum() / (1024**2)
    
#     for col in df.columns:
#         col_type = df[col].dtypes
#         if col_type in numerics:
#             c_min = df[col].min()
#             c_max = df[col].max()
#             if str(col_type)[:3] == 'int':
#                 if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
#                     df[col] = df[col].astype(np.int8)
#                 elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
#                     df[col] = df[col].astype(np.int16)
#                 elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
#                     df[col] = df[col].astype(np.int32)
#                 elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
#                     df[col] = df[col].astype(np.int64)  
#             else:
#                 if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
#                     df[col] = df[col].astype(np.float16)
#                 elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
#                     df[col] = df[col].astype(np.float32)
#                 else:
#                     df[col] = df[col].astype(np.float64)    
#     end_mem = df.memory_usage().sum() / 1024**2
#     if verbose: print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(end_mem, 100 * (start_mem - end_mem) / start_mem))
#     return df

# df= reduce_mem_usage(df)

In [None]:
# 1. Excluding columns where 80% or more of the values are null
df= df.loc[:, df.isnull().sum()/len(df)<0.80]

x= df.iloc[:, 1:-1] # Dropping 'Id' and the Y feature
y= df.iloc[:,-1]

train_cols= x.columns
print(df.shape, x.shape, y.shape)

In [None]:
# 2. Looking at the Overall statistics of variables and correlation among all variables
train_stats= x.describe().transpose()
train_stats

In [None]:
#correlation matrix
corrmat = df.select_dtypes(include='number').corr()  # correlate numeric columns only
f, ax = plt.subplots(figsize=(12, 9))
# sns.heatmap(corrmat, vmax=.8, square=True)

#saleprice correlation matrix
k=10
cols= corrmat.nlargest(k, 'SalePrice')['SalePrice'].index
cm= np.corrcoef(df[cols].values.T)
sns.set(font_scale=1.25)
hm= sns.heatmap(cm, cbar=True, annot=True, square=True,
                fmt='.2f', annot_kws={'size':10},
                yticklabels=cols.values,
                xticklabels=cols.values)
plt.show()
# Conclusion: OverallQual is highly correlated with the target variable.

3. Splitting the columns by type (each type gets its own preprocessing) and splitting the data into train and test sets

In [None]:
ordinal_cols= list(x.columns[x.columns.str.contains('Yr|Year')])
print('ordinal/temporal columns:\n',ordinal_cols)
nominal_cols= list(set(x.select_dtypes(include=['object']).columns)- set(ordinal_cols))
print('nominal columns:\n', nominal_cols)
numeric_cols= list(set(x.select_dtypes(exclude=['object']).columns)- set(ordinal_cols))
print('numeric columns:\n',numeric_cols)

In [None]:
# Checking unique values
x[nominal_cols].describe().transpose()

In [None]:
x_train, x_test, y_train, y_test= train_test_split(x,y, test_size=0.20, random_state=0)

x_train.shape, x_test.shape, y_train.shape, y_test.shape

4. Missing Value Imputation

Performing simple imputation as the goal of this exercise is to focus on improvement using KerasTuner.

In [None]:
def missing_val_imputation(x, ordinal_cols,nominal_cols,numeric_cols):
    
    for col in ordinal_cols:
        x.loc[:,col]= x.loc[:,col].fillna(x.loc[:,col].mean())

    x.loc[:,nominal_cols]= x.loc[:,nominal_cols].fillna("?")
    
    for col in numeric_cols:
        x.loc[:,col]= x.loc[:,col].fillna(x.loc[:,col].mean())
#         x.loc[:,col]= x.groupby("OverallQual")[col].transform(lambda grp:grp.fillna(np.mean(grp)))

    print("All missing values are now imputed!\n",x.isnull().sum().sort_values(ascending=False))
    
    return x

In [None]:
x_train= missing_val_imputation(x_train,ordinal_cols,nominal_cols,numeric_cols)
x_test= missing_val_imputation(x_test,ordinal_cols,nominal_cols,numeric_cols)
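
The function above fills each split with that split's own means. A common alternative is to learn the fill values on the training split and reuse them everywhere. Here is a minimal sketch of that idea on toy data; `fit_fill_values` and `apply_fill_values` are hypothetical helpers and are not part of this notebook's pipeline:

```python
import pandas as pd

def fit_fill_values(train, mean_cols, nominal_cols, placeholder="?"):
    """Learn one fill value per column from the training split only."""
    fill = {col: train[col].mean() for col in mean_cols}
    fill.update({col: placeholder for col in nominal_cols})
    return fill

def apply_fill_values(frame, fill):
    """Fill missing values column by column; returns a new DataFrame."""
    return frame.fillna(value=fill)

# Toy rows (made up for illustration)
train = pd.DataFrame({"LotFrontage": [60.0, None, 80.0],
                      "GarageType": ["Attchd", None, "Detchd"]})
test = pd.DataFrame({"LotFrontage": [None, 50.0],
                     "GarageType": [None, "Attchd"]})

fill = fit_fill_values(train, mean_cols=["LotFrontage"], nominal_cols=["GarageType"])
train_filled = apply_fill_values(train, fill)
test_filled = apply_fill_values(test, fill)  # reuses the training mean
print(test_filled)
```

With this approach the test rows never influence the statistics used to fill them.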

5. Analysis and treatment of temporal variables

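Strategy: capture the time difference from the year sold (the YrSold column). This step is not applied in the cells below, which one-hot encode the nominal columns.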
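
A minimal sketch of the time-difference strategy, assuming the dataset's `YrSold`, `YearBuilt`, `YearRemodAdd`, and `GarageYrBlt` columns. The helper name `add_elapsed_years` and the toy rows are illustrative and are not part of this notebook's pipeline:

```python
import pandas as pd

def add_elapsed_years(frame, sold_col="YrSold",
                      year_cols=("YearBuilt", "YearRemodAdd", "GarageYrBlt")):
    """Replace each year column with the number of years elapsed at sale time."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    for col in year_cols:
        out[col] = out[sold_col] - out[col]
    return out

# Toy rows (made up for illustration)
toy = pd.DataFrame({
    "YrSold": [2008, 2010],
    "YearBuilt": [2003, 1976],
    "YearRemodAdd": [2003, 2001],
    "GarageYrBlt": [2003.0, 1976.0],
})
elapsed = add_elapsed_years(toy)
print(elapsed)
```

Elapsed years are often easier for a model to use than raw calendar years, because "age at sale" is comparable across houses sold in different years.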

In [None]:
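# Fitting OHE object (the default sparse output is densified in ohe_transform,
# which avoids the `sparse` argument that was renamed in newer scikit-learn releases)
ohe= OneHotEncoder(handle_unknown='ignore').fit(x_train[nominal_cols]) 

#Feature Encoding for nominal columns

def ohe_transform(x, ohe, nominal_cols):
    x_ohe= pd.DataFrame(ohe.transform(x[nominal_cols]).toarray())
    # get_feature_names_out replaces the removed get_feature_names (scikit-learn >= 1.0)
    x_ohe.columns=ohe.get_feature_names_out(nominal_cols)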

    # prepping x
    x=x.drop(nominal_cols, axis=1)
    x.reset_index(inplace=True, drop=True)
    x= x.merge(x_ohe, left_index=True, right_index=True)
    
    return x

x_train= ohe_transform(x_train, ohe, nominal_cols)
x_test= ohe_transform(x_test, ohe, nominal_cols)
x_train.shape, x_test.shape

In [None]:
# Standard Scaling
ss= StandardScaler()
x_train_ss= pd.DataFrame(ss.fit_transform(x_train))
x_test_ss= pd.DataFrame(ss.transform(x_test))
x_train.shape, x_train_ss.shape, x_test_ss.shape, y_train.shape

## Feature Selection

In [None]:
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

sel= SelectFromModel(Lasso(alpha=0.5, max_iter=3000, tol=0.005, random_state=0, warm_start= False)) # warm_start= True

# train the lasso model and select features
sel.fit(x_train_ss, y_train)

sel.get_support()

selected_feats= x_train_ss.columns[(sel.get_support())]

# print the stats
print("# of total features: ",x_train.shape[1])
print("# of selected features: ",len(selected_feats))
# print("# of rejected features: ",np.sum(sel.estimator_.coef_==0))
# print('Selected features:', selected_feats)

x_train_ss= x_train_ss[selected_feats]
x_test_ss= x_test_ss[selected_feats]

x_train_ss.shape, x_test_ss.shape

In [None]:
# Setting up a baseline model
# SalePrice is continuous, so the baseline is a linear regression
# (LogisticRegression is a classifier and would treat every price as a class)
from sklearn.linear_model import LinearRegression

linreg= LinearRegression(n_jobs=-1)

linreg.fit(x_train_ss, y_train)

y_pred_linreg= linreg.predict(x_test_ss)
y_pred_linreg_train= linreg.predict(x_train_ss)

print("Training R^2:",r2_score(y_train, y_pred_linreg_train))

print("Test R^2:",r2_score(y_test,y_pred_linreg))
# np.sqrt of the MSE gives the root mean squared error (RMSE)
print("Root mean squared error: ",np.sqrt(mean_squared_error(y_test, y_pred_linreg))) 

# Big time overfitting however our 

## Implementation of Neural network

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

# import tensorflow_docs as tfdocs
# import tensorflow_docs.plots
# import tensorflow_docs.modeling
print(tf.__version__)

In [None]:
# Building a simple neural network

def build_model():
    model=keras.Sequential([
        layers.Dense(128, activation='relu', input_shape=[len(x_train_ss.keys())]),
        layers.Dense(64, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(1)
    ])
    #No activation is used in the last layer as this is regression
    optimizer= tf.keras.optimizers.Adam(0.001)
    
    model.compile(loss='mse',
                 optimizer= optimizer,
                 metrics= ['mae', 'mse'])
    return model

In [None]:
# Build and inspect the model

model= build_model()
model.summary()

## Train the model

Train the model for up to 200 epochs (early stopping may end training sooner), and record the training and validation loss and MAE in the history object

In [None]:
# With restore_best_weights=True, the model weights from the epoch with the best value of the monitored quantity are restored. It is False here, so the weights obtained at the last step of training are used.
early_stopping_cb = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20, verbose=1, mode='min', restore_best_weights=False)

history= model.fit(
    x_train_ss, y_train,
    epochs=200,
    validation_data=(x_test_ss, y_test),
    verbose=0, #set verbose=1 for full details at every epoch
    callbacks= [early_stopping_cb])

loss, mae, mse= model.evaluate(x_test_ss, y_test, verbose=2)

print("Test-set Mean absolute error: {:5.2f}".format(mae)) # test mae- 36286

Visualize the model's training progress using the stats stored in the history object

In [None]:
plt.figure(figsize=(10,10))

plt.subplot(2,2,1)
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title('Training - Loss Function')

plt.subplot(2, 2, 2)
plt.plot(history.history['mae'], label='Mean absolute error')
plt.plot(history.history['val_mae'], label='Validation mean absolute error')
plt.legend()
plt.title('Train - MAE')

In [None]:
y_pred_test= model.predict(x_test_ss).flatten()

# a = plt.axes(aspect='equal')
plt.scatter(y_test, y_pred_test)
plt.xlabel('True Values [SalePrice]')
plt.ylabel('Predictions [SalePrice]')

lims=[0, max(y_test)]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)

#### It looks like our model predicts reasonably well. Let's take a look at the error distribution.

In [None]:
error= y_pred_test-y_test
plt.hist(error, bins=25)
plt.xlabel('Prediction Error [SalePrice]')
_=plt.ylabel('Count')

#### It's not quite Gaussian, but we might expect that because the number of samples is very small.

In [None]:
# Evaluation metrics (R^2 and RMSE):

y_pred_train= model.predict(x_train_ss).flatten()

print("Scores obtained on the train and test splits of the original x!")

print("Training R^2: ",r2_score(y_train, y_pred_train))

print("Test R^2: ",r2_score(y_test, y_pred_test))

# np.sqrt of the MSE gives the root mean squared error (RMSE)
print("Test RMSE: ",np.sqrt(mean_squared_error(y_test, y_pred_test)))

# Hence the current test R^2 is 0.48385 and the test RMSE is 59703.04
# Note- adding a dropout layer decreases the test R^2 to 0.46
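
For reference, the quantities printed above can be computed by hand. The sketch below uses made-up toy values (not results from this notebook) to show that the square root of the MSE is the RMSE, which is in the same units as the target, and that R² compares the model against a predict-the-mean baseline:

```python
import math

# Toy values, made up for illustration
y_true = [200000.0, 150000.0, 300000.0, 250000.0]
y_pred = [210000.0, 140000.0, 280000.0, 260000.0]

n = len(y_true)

# Mean squared error and its square root
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)

# R^2 = 1 - (residual sum of squares / total sum of squares)
mean_true = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse)
print("RMSE:", rmse)
print("R^2:", r2)
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets.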

## KerasTuner

Libraries such as KerasTuner make it simple to add hyperparameter optimization to a training script:<br>

As we implement our model architecture, we define the range to search over for each parameter (e.g., the number of units in each Dense layer, or the learning rate).<br>
We then define an instance of either Hyperband, RandomSearch, or BayesianOptimization.<br>
The KerasTuner package takes care of the rest, running multiple trials and keeping track of the best set of hyperparameters found.<br>

It implements hyperparameter tuning algorithms including **Bayesian hyperparameter optimization and Hyperband**. It is a handy tool for improving a model with little extra effort!

### Optimizing Neural networks through KerasTuner


In [None]:
# KerasTuner is imported as `keras_tuner` in current releases (older ones used `kerastuner`)
from keras_tuner import HyperModel, RandomSearch, Hyperband
import IPython

#### 1. Define the model for hypertuning

In [None]:
class ANNhypermodel(HyperModel):
    
    def __init__(self, input_shape):
        super().__init__()  # initialise HyperModel's own attributes
        self.input_shape= input_shape
        
    def build(self, hp):
        model= keras.Sequential()
        
        # Tune the number of units in the first Dense layer
        # Defining dense units as a close approximation to the original neural network to allow a fair comparison!
        
        
        hp_units_1= hp.Int('units_1', min_value=128, max_value= 160, step=32)
        hp_units_2= hp.Int('units_2', min_value=64, max_value= 128, step=32)
        hp_units_3= hp.Int('units_3', min_value=32, max_value= 64, step=16)

        model.add(keras.layers.Dense(units=hp_units_1, activation='relu', input_shape= self.input_shape))
        model.add(keras.layers.Dense(units=hp_units_2, activation='relu'))
        model.add(keras.layers.Dense(units=hp_units_3, activation='relu'))
        model.add(keras.layers.Dense(1))

        # Tune the learning rate for the optimizer 
        hp_learning_rate=hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log', default= 0.0005)

        model.compile(loss='mse',
                    optimizer= keras.optimizers.Adam(learning_rate=hp_learning_rate),
                    metrics= ['mae','mse']
                     )

        return model

hypermodel= ANNhypermodel(input_shape= [len(x_train_ss.keys())])

This is the same model we built earlier, except that every tuned hyperparameter now has a search space. hp.Int and hp.Float, used above, define the search space for an integer and a float hyperparameter respectively; hp.Choice (not used here) does the same for a categorical one. ‘hp’ is an instance of Keras Tuner’s HyperParameters class.
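
To make the log-scale sampling chosen for the learning rate concrete: sampling on a log scale spreads the draws evenly across orders of magnitude rather than across raw values. The snippet below is a plain-Python illustration of that idea. The helper `sample_log_uniform` is hypothetical, and this is not KerasTuner's internal implementation:

```python
import math
import random

def sample_log_uniform(rng, min_value, max_value):
    """Draw a value whose base-10 logarithm is uniform between the two bounds."""
    log_min, log_max = math.log10(min_value), math.log10(max_value)
    return 10 ** rng.uniform(log_min, log_max)

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_log_uniform(rng, 1e-4, 1e-2) for _ in range(1000)]

# Roughly half of the draws should fall in each decade: [1e-4, 1e-3) and [1e-3, 1e-2]
share_below_1e3 = sum(d < 1e-3 for d in draws) / len(draws)
print("share of draws below 1e-3:", share_below_1e3)
```

With plain uniform sampling over the same range, only about 9% of the draws would land below 1e-3, so small learning rates would rarely be tried.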

#### 2. Instantiate the tuner to perform hypertuning <br>
The Keras Tuner has four tuners available - RandomSearch, Hyperband, BayesianOptimization, and Sklearn.

The most intuitive way to perform hyperparameter tuning is to randomly sample hyperparameter combinations and test them out. This is exactly what the RandomSearch tuner does! The objective is the quantity to optimize. When it is given as a metric name, the tuner infers from that name whether this is a maximization or a minimization problem.

<pre>
MAX_TRIALS = 20
tuner= RandomSearch(hypermodel,
               objective= 'val_mse',
               max_trials= MAX_TRIALS,
               executions_per_trial= EXECUTION_PER_TRIAL,
               directory= 'random_search',
               project_name='houseprices',
               overwrite=True)
</pre>
The max_trials argument is the number of hyperparameter combinations that the tuner will test, while executions_per_trial is the number of models that are built and fit for each trial, for robustness.
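
The random-search loop itself can be sketched in plain Python. Everything below is illustrative: `toy_objective` stands in for "build a model, train it, and return the validation MSE", and `random_search` is a hypothetical helper, not the KerasTuner API:

```python
import random

def toy_objective(units, learning_rate, rng):
    """Stand-in for a noisy validation MSE; lowest near units=96, learning_rate=1e-3."""
    noise = rng.gauss(0.0, 0.01)
    return (units - 96) ** 2 / 1000.0 + (learning_rate - 1e-3) ** 2 * 1e4 + noise

def random_search(max_trials, executions_per_trial, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(max_trials):
        # One trial = one randomly sampled hyperparameter combination
        params = {
            "units": rng.choice([32, 64, 96, 128]),
            "learning_rate": rng.choice([1e-4, 1e-3, 1e-2]),
        }
        # Several executions per trial average out run-to-run noise
        scores = [toy_objective(params["units"], params["learning_rate"], rng)
                  for _ in range(executions_per_trial)]
        score = sum(scores) / len(scores)
        if best is None or score < best[0]:
            best = (score, params)
    return best

best_score, best_params = random_search(max_trials=20, executions_per_trial=2)
print(best_params, round(best_score, 4))
```

Raising `executions_per_trial` makes each trial's score more reliable at the cost of proportionally more training runs.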

**In this tutorial, we use the Hyperband tuner.** <br> Hyperband is an optimized version of random search that uses early stopping to speed up the hyperparameter tuning process. The main idea is to fit a large number of models for a small number of epochs and to continue training only the models with the best score on the validation set (here, the lowest val_mse). The max_epochs variable is the maximum number of epochs that any one model can be trained for.
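
The "train many briefly, keep only the best" idea can be sketched as a single round-by-round elimination (often called successive halving). This is a simplified illustration, not KerasTuner's Hyperband implementation. The function names, the toy learning curve, and the choice of `factor=3` are all assumptions made for the example:

```python
import random

def successive_halving(configs, score_fn, min_epochs=2, factor=3):
    """Train all configs briefly, keep the best 1/factor, then train survivors longer."""
    epochs = min_epochs
    survivors = list(configs)
    rounds = []  # (number of models trained, epochs each) per round
    while len(survivors) > 1:
        rounds.append((len(survivors), epochs))
        ranked = sorted(survivors, key=lambda cfg: score_fn(cfg, epochs))
        keep = max(1, len(ranked) // factor)
        survivors = ranked[:keep]  # lowest validation loss first
        epochs *= factor           # survivors get a larger budget
    return survivors[0], rounds

def toy_val_loss(cfg, epochs):
    # Hypothetical learning curve: loss falls with epochs toward a config-specific floor
    return cfg["floor"] + 1.0 / (epochs * cfg["speed"])

rng = random.Random(0)
configs = [{"id": i, "floor": rng.uniform(0.1, 1.0), "speed": rng.uniform(0.5, 2.0)}
           for i in range(27)]
winner, rounds = successive_halving(configs, toy_val_loss)
print(rounds)
print(winner)
```

Most of the training budget ends up being spent on the few configurations that looked promising early on.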

In [None]:
HYPERBAND_MAX_EPOCHS = 150
EXECUTION_PER_TRIAL = 2

tuner= Hyperband(hypermodel,
                   objective= 'val_mse',
                   max_epochs=HYPERBAND_MAX_EPOCHS, #Set 100+ for good results
                   executions_per_trial=EXECUTION_PER_TRIAL,
                   directory= 'hyperband',
                   project_name='houseprices',
                   overwrite=True)

# tuner.search_space_summary()

The Hyperband tuning algorithm uses adaptive resource allocation and early stopping to quickly converge on a high-performing model. This is done using a sports-championship-style bracket. The algorithm trains a large number of models for a few epochs and carries forward only the top-performing fraction of models to the next round. Hyperband determines the number of models to train in a bracket by computing 1 + log<sub>factor</sub>(max_epochs), where factor is the tuner's reduction factor, and rounding it up to the nearest integer.
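
As a worked example of the formula above, here is the arithmetic for this notebook's `max_epochs` of 150. The sketch follows the formula exactly as stated in the text and assumes a reduction factor of 3; KerasTuner's own code may round differently, so treat the result as an illustration of the formula rather than a statement about the library's internals:

```python
import math

def bracket_size_from_formula(max_epochs, factor=3):
    """Evaluate 1 + log_factor(max_epochs), rounded up to the nearest integer."""
    return math.ceil(1 + math.log(max_epochs, factor))

log_term = math.log(150, 3)  # logarithm of 150 in base 3
result = bracket_size_from_formula(150)
print("log_3(150) =", round(log_term, 3))
print("1 + log_3(150), rounded up =", result)
```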

3. Run the hyperparameter search. <br>
The arguments for the search method are the same as those used for tf.keras.Model.fit; no extra callbacks are passed here.

In [None]:
print('searching for the best params!')

t0= time()
tuner.search(x= x_train_ss,
             y= y_train,
             epochs=100,
             batch_size= 64,
             validation_data= (x_test_ss, y_test),
             verbose=0,
             callbacks= []
            )
print(time()- t0," secs")

# Retrieve the optimal hyperparameters
best_hps= tuner.get_best_hyperparameters(num_trials=1)[0]

# Retrieve the best model
best_model = tuner.get_best_models(num_models=1)[0]

In [None]:
print(f"""
The hyperparameter search is complete. The optimal number of units in the 
first densely-connected layer is {best_hps.get('units_1')},
second layer is {best_hps.get('units_2')} 
third layer is {best_hps.get('units_3')}  

and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

# Evaluate the best model.
print(best_model.metrics_names)
loss, mae, mse = best_model.evaluate(x_test_ss, y_test)
print(f'loss:{loss} mae: {mae} mse: {mse}')

4. Retrain the model with the optimal hyperparameters from the search

In [None]:
# Build the model with the optimal hyperparameters and train it on the data
tuned_model = tuner.hypermodel.build(best_hps)

# Check result using best model
t00= time()
history_tuned= tuned_model.fit(x_train_ss, y_train, 
          epochs = 200, 
          validation_data = (x_test_ss, y_test),
          verbose=0,
          callbacks= [early_stopping_cb])  # callbacks expects a list

# print(time()- t00," secs")

print("\n Using Early stopping, needed only ",len(history_tuned.history['val_mse']),"epochs to converge!")

In [None]:
y_pred_train_tuned= tuned_model.predict(x_train_ss).flatten()
y_pred_test_tuned= tuned_model.predict(x_test_ss).flatten()

print("Training R^2: ",r2_score(y_train, y_pred_train_tuned))

print("Test R^2: ",r2_score(y_test, y_pred_test_tuned))

print("Test RMSE: ",np.sqrt(mean_squared_error(y_test, y_pred_test_tuned)))

## Checking model performance on Test data

In [None]:
# Importing Test data
test_df= pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/test.csv')

# Considering same columns which are used for training
test_df= test_df[train_cols]

test_df.shape

In [None]:
# Preprocessing the test dataset
test_df= missing_val_imputation(test_df,ordinal_cols,nominal_cols,numeric_cols)
        
test_df= ohe_transform(test_df, ohe, nominal_cols)

test_df_ss= pd.DataFrame(ss.transform(test_df))

test_df_ss= test_df_ss[selected_feats]

test_df.shape, test_df_ss.shape

### Make predictions

Finally, predict the SalePrice values using the testing dataset:

In [None]:
# tuned_model_final.load_weights(checkpoint_path)

test_predictions= tuned_model.predict(test_df_ss).flatten()

## Submission

In [None]:
subm = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/sample_submission.csv')
subm.iloc[:,1]= np.array(test_predictions)
snapshot_date= datetime.datetime.today().strftime('%m_%d_%Y')

# subm.to_csv(str(snapshot_date)+'_tunedNN.csv',index=False)
subm.to_csv('tunedNN.csv',index=False)

In [None]:
# Fetch the saved file and print results here

# mysubm= pd.read_csv(f'{str(snapshot_date)}_tunedNN.csv')
mysubm= pd.read_csv('tunedNN.csv')

print(mysubm.head())

### Well, Any Conclusions?

<img src="https://media.giphy.com/media/l4FGC3sZppT0imaty/giphy.gif">
<br><br>
This notebook first introduced a simple neural network to handle a regression problem.<br>
We then tackled the painful and murky hyperparameter search using Keras Tuner, which is an easy-to-use, distributable hyperparameter optimization framework.<br>
<br>

**Here are a few learnings**: <br><br>
Mean Squared Error (MSE) is a common loss function used for regression problems (different loss functions are used for classification problems).<br>
Similarly, evaluation metrics used for regression differ from classification. <br>
When numeric input data features have values with different ranges, each feature should be scaled independently to the same range.<br>
If there is not much training data, one technique is to prefer a small network with few hidden layers to avoid overfitting.<br>
Early stopping and Dropout are useful techniques to prevent overfitting.<br>
It is a must to scale all input values before feeding them to a neural network.<br>
For simple datasets, even simple algorithms such as linear regression can give competitive results without having to dive into the complexity of a neural network.<br>
Using Keras Tuner makes it easy to define a search space and leverage included algorithms to find the best hyperparameter values. <br>

<pre>
References:
You can use these links to dive deeper into KerasTuner:
https://www.tensorflow.org/tutorials/keras/regression
https://www.tensorflow.org/tutorials/keras/keras_tuner
https://www.sicara.ai/blog/hyperparameter-tuning-keras-tuner
https://pyimagesearch.com/2021/06/07/easy-hyperparameter-tuning-with-keras-tuner-and-tensorflow/
https://neptune.ai/blog/keras-tuner-tuning-hyperparameters-deep-learning-model
</pre>

Do let me know if you have any suggestions! 
Please upvote if you found my kernel helpful!<br>