### Task / Question
1. Load the given dataset into a pandas data frame.
2. Develop a Multi-Layer Perceptron (MLP) model to regress the target variable y.
3. Evaluate the model using appropriate metrics.
4. Your MLP model should have at least 3 hidden layers and 1 output layer. You can use any number of neurons in each layer however, you should be able to justify your choice.
5. You can use any activation function in the hidden layers however, you should be able to justify your choice. Remember this is a regression problem hence referring to the equation *y = mx + c* might help.

#### Items to consider:
1. Dataset has 200 features and 1 target variable. Not all features are relevant for the target variable hence apply feature selection techniques to select the most relevant features.
2. Use appropriate data pre-processing techniques to prepare the data for the model.
3. You can use:
    - Pytorch
    - Tensorflow v2
    - Scikit-learn (MLPRegressor)
    - Custom implementation of MLP

    <br><br>



### Deliverables:
1. **Jupyter notebook** with the code and results included
    - name the notebook using the format: **A-111111.ipynb**, where A is your group and 111111 is your student number. Any other fill names will lead to an automatic 0 mark.
    - Do not attempt to submit your file via email. If E-Learning fails on the deadline, an alternative submission method will be provided; to avoid this chaos simply  do your work before the deadline.
    - Any slight hint of plagiarism will lead to an automatic 0 mark.

--------------------

## Data Import and Cleaning

In [62]:
# imports

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

from sklearn.neural_network import MLPRegressor


In [46]:
# loading dataset
data = pd.read_csv('data.csv')
data.sample(10)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,...,X192,X193,X194,X195,X196,X197,X198,X199,X200,y
960,-0.214494,-0.526423,1.737889,0.363136,-0.833223,1.463178,0.867419,0.667468,0.163676,-0.537552,...,-0.545196,0.211031,0.191622,-1.335153,-0.228597,0.853606,-1.26184,0.478774,1.584665,231.99202
698,0.607274,-0.210262,-1.533856,-1.852737,-0.438336,0.531767,-0.19932,-1.088554,-0.09751,0.378743,...,1.009495,0.206452,-0.724171,0.497313,0.057857,-1.274429,0.003254,0.87353,1.182927,14.053699
697,0.911011,0.165072,0.379275,0.503359,-0.448846,-0.781213,-0.976645,-1.695327,-0.19179,0.504117,...,-1.897659,-1.747071,0.663051,0.313583,-2.058141,0.661069,1.463502,-0.873551,-0.935928,-369.678159
338,0.09761,-0.243182,-0.598185,1.588426,-1.083978,-0.10672,0.549441,0.123725,1.198469,0.483194,...,-0.08282,-0.5043,0.002888,0.522576,-0.812748,1.408881,-0.39176,1.720578,1.060746,-139.980854
1233,0.44137,-0.786063,-0.732875,0.866461,-0.494958,-0.567279,-0.347578,-0.787476,-2.162638,-0.563442,...,0.987795,0.244834,0.978485,0.894197,-0.306384,0.073213,-0.952186,0.577228,-0.748844,-186.03477
1056,-0.013216,-0.424919,-0.681891,0.174947,0.600912,0.436029,0.836297,0.412544,-1.062567,-0.281578,...,-0.122084,0.992812,-0.111512,0.38595,0.19217,-0.71003,0.538972,-0.197348,-0.217693,292.190962
1173,0.602242,0.398334,-0.829134,-1.100999,1.990874,-0.618697,-1.130042,-1.271964,-0.601665,-1.029683,...,-0.429916,1.580374,0.431153,-0.395067,-1.62027,1.223682,0.773566,-0.487254,0.31794,-229.696688
723,0.531438,-0.483741,0.986426,1.640005,1.162261,1.285156,-0.707655,0.271801,0.818334,-0.07716,...,0.819192,-1.924285,-0.16729,0.188999,2.319434,1.0309,0.236533,-0.460329,-0.661208,-65.883302
373,0.259501,0.580974,1.186829,-0.747786,0.042188,0.672187,-0.172325,0.703192,1.142885,1.152579,...,-0.162724,-0.802573,-0.625687,-0.831675,1.418508,-1.469225,-1.108635,0.955053,0.159341,-151.175024
685,,0.27927,-1.00054,-1.041783,0.124758,0.272724,0.207792,-1.085783,0.386609,0.988844,...,0.969003,-0.954743,0.58994,-1.606696,0.262953,0.108388,-0.228276,0.055086,1.659464,48.814088


In [47]:
data.dropna(inplace=True)
data.sample(5)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,...,X192,X193,X194,X195,X196,X197,X198,X199,X200,y
677,0.731869,-1.096852,0.198551,1.108424,-0.600812,0.722716,-0.332837,-0.436027,-0.812805,-0.391231,...,0.109321,-0.82796,0.03053,-0.268029,1.521622,0.364885,0.558242,0.860434,0.513657,-142.554141
888,1.041167,-1.356877,1.170723,-0.192536,-0.010444,-1.059061,-0.552038,-0.556104,1.435185,-0.376649,...,0.610066,-0.49952,-1.536939,-1.110738,0.663073,-1.174878,0.016668,0.14402,0.429925,230.218687
743,-0.142184,0.677312,1.673367,-1.631981,0.896933,-0.736006,0.955523,0.473728,0.399525,2.004731,...,0.172301,0.444955,0.052372,0.567104,-0.47764,1.033732,-0.71429,-0.787753,-1.367362,-66.363159
178,0.515979,0.25877,-0.232026,-0.110359,-0.821613,1.278475,-0.550324,0.700158,1.495035,-0.447937,...,-1.255225,1.273991,-0.069985,-0.949091,1.504873,1.536638,1.026819,-2.021322,1.291818,146.94392
1325,-0.612028,1.702648,0.873366,0.224716,-0.406032,-1.473677,1.725432,1.325353,-0.119919,0.893863,...,-1.24546,-2.305707,-0.761352,0.269793,1.498866,1.421475,-2.102509,-0.045771,0.983663,-192.974407


In [48]:
y = data['y']
y.sample(5)

1404   -201.137022
927      41.557819
936    -101.220993
451     364.910117
317      73.397303
Name: y, dtype: float64

In [49]:
x = data.drop(columns=['y'])
x.sample(5)

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,...,X191,X192,X193,X194,X195,X196,X197,X198,X199,X200
851,-1.110527,-2.043829,-0.731589,0.873199,-1.433032,0.102718,-0.26157,-0.409058,1.133024,0.954732,...,-0.327056,0.193464,0.565812,-1.355431,1.18612,0.691204,-0.830774,-0.10653,-1.206052,2.252094
1072,0.235448,2.019877,-0.735402,1.078222,-1.94069,-0.488953,1.03684,-0.533786,0.189202,0.053152,...,-1.848562,-0.276505,1.001327,-0.068792,-0.952933,-1.563621,1.296935,1.287299,0.135743,-0.299573
1402,1.41045,1.528265,0.796285,-0.154407,1.035801,1.489065,-0.602469,-1.129376,-1.025087,-0.979529,...,1.51782,2.561999,0.029412,-1.138378,-1.488756,0.55454,-0.088691,0.427252,-0.857795,2.036639
790,0.245045,-1.028075,-0.32086,-0.589036,1.090711,1.092439,0.528472,0.980878,-1.13085,1.801174,...,1.053027,0.235578,-0.04016,0.28459,1.472368,-0.260753,-0.498578,0.448862,-0.315314,-1.601842
207,-0.561531,-0.273908,0.399256,-0.473119,0.380504,1.150283,0.144404,0.492265,-0.163101,-1.365345,...,-2.075239,0.91537,-0.305831,-1.492897,1.281821,-0.153057,0.943402,-0.805119,-0.108138,0.224261


In [54]:
# train test splits

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3, shuffle=True)

In [55]:
scaler = StandardScaler()

x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.fit_transform(x_test)

## Feature Selection using Linear Regression

In [57]:
lr = LinearRegression()
lr.fit(x_train_scaled,y_train)

feature_coeffs = lr.coef_
feature_coeffs

array([-6.36162195e-03, -2.61905579e-03,  2.62621516e-03,  6.62678597e-03,
        1.29668952e-03,  9.12643150e+01,  6.21518077e-03, -7.72436669e-04,
        6.59765792e-03, -8.07933354e-04, -3.71842455e-04, -1.50835511e-03,
        4.13605592e-04, -1.37028595e-03,  6.50879979e-03, -9.72610295e-04,
        2.72102378e+01, -1.46400778e-03, -5.02243593e-04,  1.53171161e-03,
        2.31693745e-03, -5.48509976e-03,  2.74349198e-03, -2.50256434e-03,
        2.61836051e-03,  1.51146793e-03,  3.51989936e-03,  3.41365093e+00,
        1.74428954e-03, -4.38865463e-03,  4.53538602e-03, -7.88059872e-03,
       -6.02482316e-04, -1.71154406e-03,  2.56746272e-03, -4.21485958e-03,
        2.00064442e-03, -4.58061905e-03, -4.11307331e-03, -3.56022940e-03,
        1.23912199e-03, -4.58340027e-03,  7.02367826e-03,  8.65671482e-04,
       -1.04630980e-03,  3.71063477e-03,  4.56488538e-03,  3.99541609e-04,
        5.21006034e-03,  9.77879712e-04, -5.22726142e-03,  8.98105648e+01,
       -3.36162978e-03,  

In [59]:
sorted_coeffs = sorted(enumerate(feature_coeffs), key=lambda x: abs(x[1]), reverse=True)
sorted_coeffs

[(5, 91.26431500307883),
 (51, 89.81056482903098),
 (112, 77.07848993969463),
 (143, 64.4365280468953),
 (180, 59.87450966966119),
 (165, 59.666025427008165),
 (90, 36.28250105292701),
 (16, 27.21023784832074),
 (160, 14.146589745971404),
 (27, 3.4136509293518316),
 (107, 0.010118482153932393),
 (129, 0.010100207875092915),
 (126, 0.009337950855655919),
 (184, 0.008844097141829987),
 (62, -0.008620493295929066),
 (169, 0.008473413638278693),
 (111, -0.00846729010036995),
 (88, -0.00800457608077565),
 (31, -0.00788059871978497),
 (172, -0.007539930156088559),
 (56, 0.007070376541328471),
 (42, 0.007023678255338339),
 (147, -0.006889892230146799),
 (3, 0.0066267859702655585),
 (148, 0.006621663073719475),
 (8, 0.006597657917706279),
 (188, -0.006560280909316596),
 (14, 0.0065087997879818005),
 (104, -0.006449114075764262),
 (0, -0.006361621954950521),
 (6, 0.006215180771615536),
 (174, -0.006196853999117735),
 (87, 0.006069464373945843),
 (100, -0.005968095557019382),
 (93, 0.00580453456

In [61]:
# I chose to use the 10 largest features. 
selected_features = [x.columns[i] for i, _ in sorted_coeffs[:10]]
selected_features

['X6', 'X52', 'X113', 'X144', 'X181', 'X166', 'X91', 'X17', 'X161', 'X28']

## Creating the Multi-Layer Perceptron Model 

- ReLu was used as it more computationally efficient and does not suffer from the Vanishing Gradient issue

In [76]:


model = MLPRegressor(hidden_layer_sizes=(1000, 1000, 1000), activation='relu')
model.fit(x_train_scaled, y_train)

### Metrics and Testing

In [78]:
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [81]:
y_pred = model.predict(x_test_scaled)

mean = mean_squared_error(y_test,y_pred)
r2score = r2_score(y_test,y_pred)


In [82]:
print("Mean Squared Error (MSE):", mean)
print("R-squared (R2) Score:", r2score)

Mean Squared Error (MSE): 3184.3672298031506
R-squared (R2) Score: 0.9014848996732575
