### Task / Question
1. Load the given dataset into a pandas data frame.
2. Develop a Multi-Layer Perceptron (MLP) model to regress the target variable y.
3. Evaluate the model using appropriate metrics.
4. Your MLP model should have at least 3 hidden layers and 1 output layer. You can use any number of neurons in each layer; however, you should be able to justify your choice.
5. You can use any activation function in the hidden layers; however, you should be able to justify your choice. Remember that this is a regression problem, so referring to the equation *y = mx + c* might help.

#### Items to consider:
1. The dataset has 200 features and 1 target variable. Not all features are relevant to the target variable, so apply feature selection techniques to select the most relevant features.
2. Use appropriate data pre-processing techniques to prepare the data for the model.
3. You can use:
    - PyTorch
    - TensorFlow v2
    - Scikit-learn (MLPRegressor)
    - Custom implementation of MLP




### Deliverables:
1. **Jupyter notebook** with the code and results included
    - Name the notebook using the format **A-111111.ipynb**, where A is your group and 111111 is your student number. Any other file name will lead to an automatic 0 mark.
    - Do not attempt to submit your file via email. If E-Learning fails on the deadline, an alternative submission method will be provided; to avoid this chaos, simply do your work before the deadline.
    - Any slight hint of plagiarism will lead to an automatic 0 mark.

--------------------

## Data Import and Cleaning

In [33]:
# imports

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

from sklearn.neural_network import MLPRegressor


In [34]:
# loading dataset
data = pd.read_csv('data.csv')
data.sample(10)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,...,X192,X193,X194,X195,X196,X197,X198,X199,X200,y
552,1.198482,0.421901,-0.057357,-0.812432,0.002842,0.808387,-1.088023,0.467322,1.083326,-0.361963,...,0.793922,-0.594955,0.6451,0.21566,-0.614969,-0.598725,1.98343,1.6513,0.635038,264.993304
832,-0.790611,1.798816,0.507754,0.153409,0.199356,2.756274,0.274984,0.812156,-0.585318,0.422948,...,1.183591,-0.884588,0.535226,-0.804356,-0.764488,-0.426859,1.824522,0.909793,-1.356508,453.936171
884,0.23948,0.142194,0.224224,-0.550874,0.333568,0.167627,-0.289393,-0.442021,-0.872856,0.341103,...,-1.862876,0.461606,-1.080555,0.220297,0.458183,-0.282206,-2.458803,1.644938,0.602713,242.35648
22,1.230011,-1.885474,0.372863,0.693736,-0.182096,0.137533,-1.457881,0.927342,0.40078,-1.252192,...,-0.30833,-1.6532,0.248096,-1.041215,-0.309016,2.151667,1.83962,-0.929768,0.280111,-55.494916
435,0.614802,-0.135875,-0.620497,0.534948,0.674651,0.390903,0.122188,-0.443522,0.917748,1.231139,...,0.287304,1.069283,-2.43997,1.160076,-1.964114,-0.921588,0.011535,-0.393941,1.1692,146.931694
1487,1.437298,-0.481727,0.688852,-1.654683,0.921132,-0.430467,-3.469917,-0.226978,1.940819,0.058462,...,0.720803,-1.422832,-0.903205,-0.609044,-0.545022,2.371244,1.254854,1.082608,0.248875,-169.736576
732,1.221228,-0.001339,-0.458483,-1.649126,-1.230147,0.759465,0.134108,-0.341343,-0.963062,0.056984,...,1.087657,-0.808774,0.246261,-1.403427,-0.526939,-0.181756,0.288818,-0.215641,0.785585,76.400889
1347,-0.161198,0.185536,-0.478614,0.104053,-1.171813,-0.092432,-1.470237,0.353567,0.606548,0.316637,...,-1.552699,1.021248,0.725523,0.398598,-0.4382,0.932721,-0.019001,0.599213,-0.423478,114.618357
1109,0.004391,-1.346463,-1.616855,0.096739,0.593328,-1.42708,-0.180624,-0.104666,-0.143237,2.499584,...,1.393088,-0.888725,0.351978,1.114421,1.075549,-1.088599,-0.943326,-2.617266,-0.277799,-164.361068
522,-0.89613,1.310159,1.112349,-0.222219,0.364094,-2.029945,0.935192,0.111382,0.594272,0.625218,...,-0.349782,0.234437,-0.090169,0.260827,-0.845195,1.593686,-0.309679,-0.894137,0.477895,-262.052547


In [35]:
data.dropna(inplace=True)
data.sample(5)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,...,X192,X193,X194,X195,X196,X197,X198,X199,X200,y
1389,-0.289301,-1.140028,0.34763,-0.001174,-0.458486,0.322466,-0.268243,-0.286404,0.191765,-1.737272,...,1.882603,-0.894385,-0.34298,-0.538579,-0.035581,1.169177,0.463086,-0.241061,0.478705,194.080942
1342,-0.616798,-0.235736,1.412722,-0.461596,0.404542,0.074411,-0.11459,-1.487059,-1.36379,-1.518844,...,-0.531533,-1.55,0.712583,-1.508111,-0.191294,-0.711515,1.752392,0.081233,0.380195,17.9452
451,0.975304,0.105875,-0.093141,0.225171,-0.199734,-0.195351,-0.090551,0.443272,0.184787,0.858452,...,0.147098,0.726038,0.304127,-0.551583,2.137897,-0.291877,0.961311,-1.252255,-0.620422,364.910117
356,0.733106,2.163096,0.943709,-0.329514,-0.033183,-2.415496,1.739502,-0.066849,-2.157229,2.043673,...,0.28024,1.231035,1.094091,2.067874,1.290005,-1.137047,-1.176001,-0.161057,1.163689,-252.32127
246,-0.689208,-1.698772,-0.81709,-0.088058,-1.679901,-0.831272,-0.388137,1.076688,-0.896236,0.495671,...,1.645653,0.45585,-0.224992,-0.807427,0.799503,-1.931635,-0.224008,0.880623,0.341889,-111.935079
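
Before dropping rows, it can help to see how much data `dropna` would remove. The sketch below uses a small made-up frame (the column names and values are illustrative, not taken from `data.csv`) to count missing values per column and compare row counts before and after.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "X1": [0.5, np.nan, -1.2, 0.3],
    "X2": [1.1, 0.4, np.nan, -0.7],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# Missing values per column, and rows containing at least one missing value.
missing_per_column = toy.isna().sum()
rows_with_missing = int(toy.isna().any(axis=1).sum())

# dropna() returns a new frame here, so the original stays available for comparison.
cleaned = toy.dropna()

print(missing_per_column)
print(f"rows before: {len(toy)}, rows dropped: {rows_with_missing}, rows after: {len(cleaned)}")
```

If the number of dropped rows turns out to be large, imputing values instead of dropping may be worth considering.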


In [36]:
y = data['y']
y.sample(5)

1361   -221.792346
1161    290.731470
871     256.927693
340     -64.236683
739     -68.875636
Name: y, dtype: float64

In [37]:
x = data.drop(columns=['y'])
x.sample(5)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,...,X191,X192,X193,X194,X195,X196,X197,X198,X199,X200
118,-1.431263,-0.536969,-0.501747,0.305498,0.815045,-0.663575,-0.710457,-0.364703,1.513457,-0.707616,...,1.505872,0.00827,-0.191576,-0.444968,1.846716,2.239031,2.899004,-1.083305,2.788915,-1.163532
444,0.401925,-0.002621,0.544598,1.728027,0.621519,-0.657992,-0.904062,0.379202,0.084496,0.835067,...,0.763544,0.520162,-0.280931,-0.630908,0.309915,1.77151,-0.744908,0.164873,1.105228,-1.094875
377,-0.992878,0.829782,0.172035,1.366668,-1.187266,0.21656,-1.246236,-1.486613,-0.41779,1.045471,...,0.188173,-1.822095,0.071484,0.348904,0.282944,2.575357,-0.087626,0.305523,0.031095,-1.438955
889,0.242679,-0.717851,-0.323027,-0.955156,0.359084,2.913601,0.846076,1.626375,0.613599,1.300746,...,0.309821,-0.40278,-0.507388,0.556702,-0.005537,0.654223,0.208303,0.49654,0.230727,0.538992
1374,-1.822366,0.1095,0.694077,-0.552474,0.242206,0.418946,-1.702111,0.610749,-1.690544,1.763407,...,-1.842685,-1.378249,-0.30788,0.189425,-0.626897,1.506168,-0.081037,0.117088,-0.257312,0.962271


In [38]:
# train test splits

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3, shuffle=True)
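
`StandardScaler` is imported above but never applied, and the split is unseeded, so results can vary between runs. Below is a minimal sketch of a seeded split followed by scaling fitted on the training portion only. It uses random stand-in data; the `random_state=42` value and the `_demo` names are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random stand-in data: 100 rows, 5 features on very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=[1.0, 10.0, 0.1, 3.0, 50.0], size=(100, 5))
y_demo = rng.normal(size=100)

# Fixing random_state makes the split repeatable across runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, shuffle=True, random_state=42
)

# Fit the scaler on the training rows only, then reuse it on the test rows,
# so no test-set statistics leak into preprocessing.
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(6))
print(X_tr_scaled.std(axis=0).round(6))
```

The sampled rows above suggest the features are already roughly centred with unit spread, in which case scaling would change little, but checking this explicitly helps justify the pre-processing choice.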

## Feature Selection using Linear Regression

In [39]:
lr = LinearRegression()
lr.fit(x_train,y_train)

feature_coeffs = lr.coef_
feature_coeffs

array([-3.15782639e-03, -3.48218194e-03, -2.44049383e-04,  3.48866280e-03,
       -7.06460552e-04,  8.80710716e+01,  6.20334841e-03,  7.69349873e-04,
        2.92707426e-03, -2.89730479e-03,  2.22115082e-03, -3.39477650e-04,
       -2.07757312e-03, -1.24352531e-03,  1.55257344e-03, -3.62728399e-03,
        2.78169135e+01,  5.06120892e-03, -1.00894935e-03, -1.48597773e-03,
        3.99362129e-03, -2.62918462e-03,  9.90355611e-04, -1.66738983e-03,
        3.83447133e-03, -4.31962872e-04,  4.91385552e-03,  3.38119289e+00,
        2.88537446e-04, -1.32636184e-03,  1.18350719e-03, -5.24110302e-03,
        2.19606100e-03, -3.61192496e-04,  1.18905978e-03, -6.57885244e-03,
       -2.19019371e-03, -3.31756921e-03, -6.42587875e-03, -2.58682944e-04,
        2.82982186e-03, -3.87244424e-03,  4.03631401e-03, -1.54592812e-03,
       -2.09946029e-03,  8.37368908e-04,  3.43370702e-03, -1.04145071e-03,
        3.60729993e-03,  3.25681442e-03, -4.19313854e-03,  8.91101836e+01,
       -2.68955625e-04,  

In [40]:
sorted_coeffs = sorted(enumerate(feature_coeffs), key=lambda x: abs(x[1]), reverse=True)
sorted_coeffs

[(51, 89.11018356477592),
 (5, 88.07107162767785),
 (112, 76.62111358123776),
 (143, 62.800616834667224),
 (180, 60.4867086230275),
 (165, 58.65455428492043),
 (90, 36.02321044344896),
 (16, 27.816913506930547),
 (160, 13.775481282802762),
 (27, 3.381192890752722),
 (62, -0.013717706029835686),
 (126, 0.009981480212875482),
 (93, 0.008734922341807394),
 (129, 0.00869262554866368),
 (167, 0.008026258957482213),
 (107, 0.0074096643606083035),
 (152, 0.007355690008184368),
 (142, -0.007250410209803437),
 (169, 0.00722395332400394),
 (128, 0.006746983361852532),
 (127, 0.006671013809489423),
 (196, 0.006654184893911452),
 (184, 0.006649197623886494),
 (35, -0.0065788524355383515),
 (38, -0.0064258787477049495),
 (147, -0.006356948378722027),
 (154, -0.0062943312521628325),
 (6, 0.006203348405687592),
 (144, -0.006033613316359876),
 (188, -0.005970235094959309),
 (153, -0.005957879165659374),
 (113, 0.005882746905444325),
 (172, -0.005800970865406541),
 (195, 0.00575401082460214),
 (111, -0
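
The sorted list shows a sharp drop after the tenth coefficient. One hedged way to make that cutoff explicit is to look for the largest ratio between consecutive magnitudes. The sketch below hard-codes the first thirteen values from the output above (rounded), and `largest_gap_cutoff` is a hypothetical helper, not a scikit-learn function.

```python
# First thirteen |coefficient| values from the sorted output above, rounded.
magnitudes = [
    89.110, 88.071, 76.621, 62.801, 60.487, 58.655, 36.023,
    27.817, 13.775, 3.381, 0.01372, 0.00998, 0.00873,
]


def largest_gap_cutoff(sorted_magnitudes):
    """Return k such that the biggest relative drop occurs between item k-1 and item k.

    Assumes the input is sorted in descending order and contains no zeros.
    """
    ratios = [
        sorted_magnitudes[i] / sorted_magnitudes[i + 1]
        for i in range(len(sorted_magnitudes) - 1)
    ]
    # Index of the largest ratio; keeping everything up to and including it gives k items.
    return ratios.index(max(ratios)) + 1


k = largest_gap_cutoff(magnitudes)
print("features to keep:", k)
```

On these values the largest drop falls between the tenth and eleventh entries (about 3.38 versus 0.014), which is consistent with keeping ten features.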

In [41]:
# Keep the 10 features with the largest absolute coefficients;
# from the 11th onward the coefficients are below 0.02 in magnitude.
selected_features = [x.columns[i] for i, _ in sorted_coeffs[:10]]
selected_features

['X52', 'X6', 'X113', 'X144', 'X181', 'X166', 'X91', 'X17', 'X161', 'X28']
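
Ranking by coefficient magnitude is only meaningful when the features are on comparable scales, which the sampled values suggest is roughly the case here. To see the technique on data where the relevant features are known in advance, the sketch below uses `make_regression` as a synthetic stand-in (the sizes, noise level, and seed are assumptions, not properties of `data.csv`).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic data: 50 features, of which only 5 actually drive the target.
X_syn, y_syn, true_coef = make_regression(
    n_samples=500, n_features=50, n_informative=5,
    noise=0.1, coef=True, random_state=0,
)

lr_syn = LinearRegression().fit(X_syn, y_syn)

# Indices of the 5 largest coefficients by absolute value.
top5 = set(np.argsort(np.abs(lr_syn.coef_))[::-1][:5].tolist())

# Indices that make_regression marked as informative (non-zero true coefficient).
informative = set(np.flatnonzero(true_coef).tolist())

print("selected:", sorted(top5))
print("informative:", sorted(informative))
```

When the target depends linearly on a few features and noise is low, the two sets should coincide. For non-linear relationships, a linear ranking can miss relevant features, so this check is a sanity test and not a guarantee.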

In [54]:
# Restrict the train and test sets to the selected features.
# Column indexing avoids reusing `x` as a loop variable, which would overwrite the feature DataFrame.
features = x_train[selected_features]
x_testSelected = x_test[selected_features]

# features.sample(10)
x_testSelected.sample(10)

Unnamed: 0,X52,X6,X113,X144,X181,X166,X91,X17,X161,X28
474,-1.471343,-0.956339,-0.722024,0.034262,-1.201379,-0.832211,0.087529,1.011444,0.040078,2.097175
121,1.3971,2.186454,-0.347654,0.003133,-1.147956,0.833797,0.165474,0.547276,1.008831,0.006714
345,2.010647,0.561278,0.656782,0.540725,-0.251827,-0.958581,-0.628236,-0.697704,0.791645,0.955113
420,0.445199,3.006767,-1.212275,0.037149,-0.032186,0.273216,-0.257512,0.7756,2.000401,-1.585383
393,0.503396,-0.358197,0.67847,1.487509,2.145514,0.587276,1.794287,-1.494226,1.086311,0.827131
1084,-1.167801,1.488914,-0.41037,0.974631,-0.886858,-0.465955,0.153882,0.972987,1.605012,0.417313
903,0.026394,-0.044161,-1.33304,0.809876,0.547587,0.32526,-0.699058,0.461897,-0.455463,-1.245215
1082,-0.83222,1.026255,-0.110402,1.134146,-0.160426,-0.22681,-0.206112,-0.600732,-1.931564,0.969964
6,-0.801304,-0.253788,-0.001085,0.450219,-0.520041,-1.109464,-0.180564,0.565221,-2.210585,-0.512045
13,0.328279,-1.174165,1.272636,-0.427566,2.049363,0.156537,0.867779,-0.735136,2.236855,0.569347


## Creating the Multi-Layer Perceptron Model 

- ReLU was used in the hidden layers because it is cheap to compute and is less prone to vanishing gradients than saturating activations such as sigmoid or tanh.
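
To make the vanishing-gradient point concrete, the following sketch compares the derivative of a sigmoid with that of ReLU and multiplies each across three layers, matching the three hidden layers used in the model. It is a simplified illustration that ignores the weight matrices, and the helper names are made up for this example.

```python
import numpy as np


def sigmoid_grad(z):
    """Derivative of the logistic sigmoid; it peaks at 0.25 when z = 0."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (z > 0).astype(float)


# A few positive pre-activations, reused at every layer for simplicity.
z = np.array([0.5, 2.0, 5.0])

# Chain rule through three layers: the per-layer factor is multiplied three times.
sigmoid_chain = sigmoid_grad(z) ** 3
relu_chain = relu_grad(z) ** 3

print("sigmoid:", sigmoid_chain)
print("relu:   ", relu_chain)
```

Even in the best case the sigmoid factor is 0.25 per layer, so three layers scale the gradient by at most 0.25 cubed, about 0.0156, whereas ReLU passes it through unchanged for positive inputs. ReLU units still give zero gradient for negative inputs, so the problem is reduced, not eliminated.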

In [43]:


model = MLPRegressor(hidden_layer_sizes=(1000, 1000, 1000), activation='relu')
model.fit(features, y_train)
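
Since the brief asks for a justification of the layer sizes, it may help to know how many trainable parameters a given architecture has. The sketch below counts weights and biases for a fully connected network. `count_mlp_parameters` is a hypothetical helper, and the smaller `(64, 32, 16)` architecture is only an illustrative comparison, not a choice made in the original notebook.

```python
def count_mlp_parameters(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    layer_sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total


# 10 selected features feeding the architecture used above.
large = count_mlp_parameters(10, (1000, 1000, 1000))
# A much smaller three-hidden-layer alternative, for comparison only.
small = count_mlp_parameters(10, (64, 32, 16))

print(large, small)
```

With 10 inputs, the `(1000, 1000, 1000)` network has roughly two million parameters, so comparing its test score against a smaller network is one way to support or revise the choice.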

### Metrics and Testing

In [28]:
# accuracy_score is a classification metric and is not needed for regression
from sklearn.metrics import mean_squared_error, r2_score

In [55]:
y_pred = model.predict(x_testSelected)

mse = mean_squared_error(y_test,y_pred)
r2score = r2_score(y_test,y_pred)


In [56]:
print("Mean Squared Error (MSE):", mse)
print("R-squared (R2) Score:", r2score)

Mean Squared Error (MSE): 13.495221266365379
R-squared (R2) Score: 0.9996404770816795
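
MSE is expressed in squared units of `y`, so it can be easier to interpret alongside RMSE and MAE, which are in the same units as the target. Below is a minimal NumPy sketch on made-up values; the `regression_report` helper is hypothetical.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = float(np.mean(errors ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(errors))),
        "r2": 1.0 - float(np.sum(errors ** 2)) / ss_tot,
    }


# Made-up values, only to exercise the function.
report = regression_report([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(report)

# RMSE implied by the MSE printed above.
rmse_from_notebook = float(np.sqrt(13.495221266365379))
print(rmse_from_notebook)
```

For the run shown, the reported MSE corresponds to an RMSE of about 3.67 in the units of `y`. Because the split and the MLP initialisation are unseeded, these figures will differ somewhat between runs.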
