### *This notebook, intended for beginners, demonstrates a simple implementation of ElasticNet for predicting airfoil self-noise. This is a good dataset for practicing regression modelling because of its simplicity. In the latter part, a simple ANN is also implemented to see how it performs against the ElasticNet.*

 ### **Data & Libraries Import**
 #### Notice that a seed value is set because we want to be able to recreate the ANN results.

In [None]:
seed_value = 770
import os
os.environ['PYTHONHASHSEED']=str(seed_value)
import random
random.seed(seed_value)
import numpy as np
np.random.seed(seed_value)
import tensorflow as tf
tf.random.set_seed(seed_value)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy as sp
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
%matplotlib inline
import sklearn.metrics as metrics
import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
df = pd.read_csv('../input/nasa-airfoil-self-noise/NASA_airfoil_self_noise.csv',sep = ",", header = 0)
df.head()

In [None]:
df.describe()

In [None]:
df.info()

#### We can already see the data's simplicity. The goal is to predict the sound levels based on all the other variables.

### Exploratory Data Analysis

In [None]:
sns.set_palette("GnBu_d")
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (7.0, 5.0)
sns.pairplot(df,plot_kws={"s": 75}, height = 1.5)

In [None]:
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.title("Correlation Plot")
sns.heatmap(df.corr(),cmap = 'viridis')

#### The AngleAttack and SuctionSide variables are noticeably correlated. We'll see later whether their correlation is strong enough to affect the regression modelling.

In [None]:
sns.jointplot(x='AngleAttack',y='SuctionSide',data=df,
              joint_kws={"s": 200}, kind = "scatter" )

### Defining X&y

In [None]:
X = df.drop(['Sound'], axis = 1)
y = df['Sound']
# X holds the features (independent variables); y is the target (dependent variable)
print("Independent Variables")
display(X.head())
print("Dependent Variable")
display(y.to_frame().head())

### Collinearity Verification using Variance Inflation Factor

In [None]:
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant
X_numeric = X._get_numeric_data()
X_numeric = add_constant(X_numeric)
VIF_frame = pd.Series([variance_inflation_factor(X_numeric.values, i) 
               for i in range(X_numeric.shape[1])], 
              index=X_numeric.columns).to_frame()

VIF_frame.drop('const', axis = 0, inplace = True) 
VIF_frame.rename(columns={VIF_frame.columns[0]: 'VIF'},inplace = True)
# Pass axis by keyword: the positional form any(1) is not accepted by recent pandas
VIF_frame[~VIF_frame.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

#### None of the VIF values is high enough to affect the regression modelling, so let's move on to the next step.
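To make the VIF numbers less of a black box, here is a minimal sketch of the textbook definition: regress one feature on all the others (plus an intercept) and take $1/(1-R^2)$. The data below is synthetic and made up for illustration (it is not the airfoil dataset), and `vif_by_hand` is a hypothetical helper name, not part of statsmodels.

```python
import numpy as np

def vif_by_hand(features: np.ndarray, j: int) -> float:
    """VIF of column j: regress it on the other columns plus an intercept."""
    target = features[:, j]
    others = np.delete(features, j, axis=1)
    design = np.column_stack([np.ones(len(others)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    # With an intercept the residuals have zero mean, so this ratio is 1 - R^2
    r_squared = 1.0 - residuals.var() / target.var()
    return 1.0 / (1.0 - r_squared)

# Synthetic demo data (an assumption for illustration only)
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)  # nearly a copy of a, so a and c are collinear

demo = np.column_stack([a, b, c])
vifs = [vif_by_hand(demo, j) for j in range(demo.shape[1])]
print(np.round(vifs, 2))
```

In this toy setup the two nearly identical columns get a large VIF while the independent column stays close to 1, which is the pattern to look for in the table above.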

### Splitting & Scaling

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size = 0.25, random_state=823)
X_train_numeric = X_train._get_numeric_data()
X_test_numeric = X_test._get_numeric_data()
scaler = StandardScaler()
X_train_numeric_scaled = pd.DataFrame(scaler.fit_transform(X_train_numeric), 
                                      index=X_train.index,
                                      columns=X_train_numeric.columns)
X_test_numeric_scaled = pd.DataFrame(scaler.transform(X_test_numeric), 
                                     index = X_test.index, 
                                     columns=X_test_numeric.columns)
X_train.update(X_train_numeric_scaled)
X_test.update(X_test_numeric_scaled)

### Notes on Evaluation

Evaluation metrics for regression problems:

**Mean Absolute Error** (MAE) - mean of the absolute value of the errors:
$$\frac 1n\sum_{i=1}^n|y_i-\hat{y}_i|$$
**Mean Squared Error** (MSE)  - mean of the squared errors:
$$\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2$$
**Root Mean Squared Error** (RMSE) - square root of the mean of the squared errors:
$$\sqrt{\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2}$$
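As a quick sanity check on the formulas above, the three metrics can be computed by hand with NumPy. The four values below are made-up toy numbers, not predictions from this notebook.

```python
import numpy as np

# Toy values chosen only to keep the arithmetic easy to follow
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_pred           # [ 1.,  0., -2., -1.]
mae = np.mean(np.abs(errors))      # (1 + 0 + 2 + 1) / 4 = 1.0
mse = np.mean(errors ** 2)         # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)                # square root of the MSE

print(mae, mse, rmse)
```

Because RMSE is the square root of MSE, it is expressed in the same units as the target, which makes it easier to compare against MAE.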

### ElasticNetCV - Training

In [None]:
alpha = [1e-3, 1e-2, 1e-1, 1]
cv = 10
encv = ElasticNetCV(alphas = alpha, cv = cv, random_state = 1234)
encv.fit(X_train,y_train)
coeff_df = pd.DataFrame(encv.coef_,X.columns,columns=['Coefficient'])
intercept = pd.DataFrame(encv.intercept_,['Intercept'],['Coefficient'])
coeffs = pd.concat([coeff_df,intercept])
coeffs.round(3)

### ElasticNetCV - Evaluation

In [None]:
preds_test_en = encv.predict(X_test)
preds_train = encv.predict(X_train)
print("Sample Test Predictions: " + str(preds_test_en[0:5]))
print("Sample Train Predictions: " + str(preds_train[0:5]))

In [None]:
d1 = {'Test' : [metrics.mean_absolute_error(y_test, preds_test_en),
                metrics.mean_squared_error(y_test, preds_test_en),
                np.sqrt(metrics.mean_squared_error(y_test, preds_test_en))],
     'Train' : [metrics.mean_absolute_error(y_train, preds_train),
                metrics.mean_squared_error(y_train, preds_train),
               np.sqrt(metrics.mean_squared_error(y_train, preds_train))]}
m = pd.DataFrame(d1,['MAE','MSE','RMSE'])
m.style.format("{:.4f}")

In [None]:
# Our predictions
plt.scatter(y_test,preds_test_en)
# Perfect predictions
plt.plot(y_test,y_test,'lime')
plt.xlabel("Actual", fontsize = 18)
plt.ylabel("Predicted", fontsize = 18)

In [None]:
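# Reuse y_test's index so the subtraction below aligns row by row;
# a default 0..n-1 index would pair up the wrong rows and produce NaNs
preds_test_en = pd.Series(np.asarray(preds_test_en).flatten(), index=y_test.index)
plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.title("Error Distribution Plot")
sns.distplot(y_test-preds_test_en, bins = 20);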

### ANN - Training

#### Before creating the ANN, we split the train set into train and validation sets. This is necessary because the validation loss is monitored while the ANN trains.
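The validation set is what the `EarlyStopping` callback in the training cell watches (`val_loss`, with a patience of 20). The sketch below mimics that patience rule in plain Python, under the assumption that any strict decrease counts as an improvement. `early_stopping_epoch` and the loss values are hypothetical, for illustration only, and this is not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # patience was never exhausted

# Made-up validation losses
toy_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.70]
stop_epoch = early_stopping_epoch(toy_losses, patience=3)
print(stop_epoch)
```

A larger patience tolerates longer plateaus before stopping, at the cost of more epochs.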

In [None]:
X_train2, X_val, y_train2, y_val = train_test_split(X_train, y_train, 
                                                    test_size = 0.30, random_state=823)

We construct an ANN with a very simple architecture. The first layer has 5 nodes, one for each of the 5 features, followed by 2 hidden layers with 3 neurons each and an output layer with a single neuron. Note that in Keras the 5-node `Dense` layer is itself a trainable layer with weights and a ReLU activation, not a passive input layer.

Note that it is possible to get significantly better results with better architectures.
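To get a feel for how small this network is, the number of trainable parameters can be worked out by hand. This is a rough sketch that assumes 5 input features feeding `Dense` layers of 5, 3, 3 and 1 units, as in the training cell; the variable names are only illustrative.

```python
# Layer widths: 5 input features, then Dense(5), Dense(3), Dense(3), Dense(1)
layer_sizes = [5, 5, 3, 3, 1]

# A Dense layer has one weight per input-output pair plus one bias per output
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [30, 18, 12, 4]
print(total_params)      # 64
```

These counts can be compared against the output of `NN_model.summary()` after training.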

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Initialization
NN_model = Sequential()

# First Dense layer (5 units; Keras infers the 5-feature input shape at fit time)
NN_model.add(Dense(5,activation='relu'))

#Hidden Layer/s
NN_model.add(Dense(3,activation='relu'))
NN_model.add(Dense(3,activation='relu'))

# Output Layer:
NN_model.add(Dense(1))

early_stop = EarlyStopping(monitor = 'val_loss', 
                           mode ='min', 
                           verbose = 1, 
                           patience = 20)

NN_model.compile(optimizer = 'adam', loss = 'mse')

NN_model.fit(x = X_train2.values, 
             y = y_train2.values,
             validation_data = (X_val.values, y_val.values),
             epochs = 1000,
             callbacks=[early_stop])

In [None]:
NN_model.summary()

In [None]:
preds_test_nn = NN_model.predict(X_test)
# Our predictions
plt.scatter(y_test,preds_test_nn)
# Perfect predictions
plt.plot(y_test,y_test,'lime')
plt.xlabel("Actual", fontsize = 18)
plt.ylabel("Predicted", fontsize = 18)

In [None]:
preds_train = NN_model.predict(X_train)
d1 = {'Test' : [metrics.mean_absolute_error(y_test, preds_test_nn),
                metrics.mean_squared_error(y_test, preds_test_nn),
                np.sqrt(metrics.mean_squared_error(y_test, preds_test_nn))],
     'Train' : [metrics.mean_absolute_error(y_train, preds_train),
                metrics.mean_squared_error(y_train, preds_train),
               np.sqrt(metrics.mean_squared_error(y_train, preds_train))]}
m = pd.DataFrame(d1,['MAE','MSE','RMSE'])
m.style.format("{:.4f}")

In [None]:
preds_test_en = preds_test_en.values.flatten().tolist()
preds_test_nn = preds_test_nn.flatten().tolist()
sns.distplot(y_test-preds_test_nn, bins = 20,color = 'green', hist = False, label = "ANN")
sns.distplot(y_test-preds_test_en, bins = 20,color = 'yellow', hist = False, label = "ElasticNetCV")
plt.legend(prop={'size': 12})

#### For this dataset, the ANN did a slightly better job of predicting. Its MSE is only around 18, compared with MSEs of around 23 for the ElasticNetCV model. The plots show the same thing. In the actual-vs-predicted scatterplots, the ANN's points are denser around the perfect-prediction line, indicating a better fit. In the error distribution plot, where the yellow line is the ElasticNetCV model and the green line is the ANN, more of the ANN's errors lie close to zero.

#### That's it! If you find this notebook helpful, please upvote :)