<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Pre-Processing-&amp;-Training-Data-Development" data-toc-modified-id="Pre-Processing-&amp;-Training-Data-Development-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Pre-Processing &amp; Training Data Development<a id="4_Preprocessing_Training_DataDevelopment"></a></a></span><ul class="toc-item"><li><span><a href="#Imports" data-toc-modified-id="Imports-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Imports<a id="4.1_Imports"></a></a></span></li><li><span><a href="#Loading-the-Data" data-toc-modified-id="Loading-the-Data-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Loading the Data<a id="4.2_Loading_Data"></a></a></span></li><li><span><a href="#Checking-Correlations" data-toc-modified-id="Checking-Correlations-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Checking Correlations<a id="4.3_Checking_Correlations"></a></a></span></li><li><span><a href="#Defining-Target-and-Predictors" data-toc-modified-id="Defining-Target-and-Predictors-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Defining Target and Predictors<a id="4.4_Defining_Target_Predictors"></a></a></span></li><li><span><a href="#Features-Selection" data-toc-modified-id="Features-Selection-4.5"><span class="toc-item-num">4.5&nbsp;&nbsp;</span>Features Selection<a id="4.5_Features_Selection"></a></a></span></li><li><span><a href="#Summary-of-Pre-Processing-&amp;-Training-Data-Development-Steps" data-toc-modified-id="Summary-of-Pre-Processing-&amp;-Training-Data-Development-Steps-4.6"><span class="toc-item-num">4.6&nbsp;&nbsp;</span>Summary of Pre-Processing &amp; Training Data Development Steps<a id="4.6_Preprocessing_Summary"></a></a></span></li></ul></li></ul></div>

# Pre-Processing & Training Data Development<a id='4_Preprocessing_Training_DataDevelopment'></a>

The focus of this notebook is pre-processing and training data development. The following steps are carried out:

- **Data loading** - The train and test data produced in the previous EDA phase are loaded.
- **Checking correlations** - The pairwise correlations between all features, including the target, are computed.
- **Target and Predictors Selection** - 'SalePrice' is selected as the target, and the remaining features (except 'Id') become the predictors.
- **Features selection** - The most important step: the final list of features is selected with a Lasso regression model.

The final list of features will be saved for use by the machine learning models. This phase prepares the data for the machine learning steps shown below:

![MLinfograph4.jpg](attachment:MLinfograph4.jpg)

## Imports<a id='4.1_Imports'></a>

```python
# Importing required packages and libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# For model building to select features
from sklearn.linear_model import Lasso
from sklearn.feature_selection import SelectFromModel

# Display all the columns of a DataFrame
pd.set_option('display.max_columns', None)

import warnings
warnings.simplefilter(action='ignore')
```

## Loading the Data<a id='4.2_Loading_Data'></a>

```python
# Loading the X_train and X_test CSV files from the data folder
X_train = pd.read_csv('../HousePricesPrediction/data/xtrain.csv')
X_test = pd.read_csv('../HousePricesPrediction/data/xtest.csv')

X_train.head()
```

```
,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,LotFrontage_na,MasVnrArea_na,GarageYrBlt_na
0,6.836259,0.0,0.75,0.461171,0.377048,1.0,1.0,0.333333,1.0,1.0,0.0,0.0,0.863636,0.4,1.0,0.75,0.6,0.90309,0.732487,0.014706,0.04918,0.0,0.0,1.0,1.0,0.0,0.0,0.666667,1.0,1.0,0.75,0.75,0.75,1.0,0.002835,0.666667,0.0,0.673479,0.239935,1.0,1.0,1.0,1.0,0.55976,0.0,0.0,0.52325,0.0,0.0,0.666667,0.0,0.375,0.333333,0.666667,0.643793,1.0,0.0,0.2,0.8,0.018692,1.0,0.75,0.430183,0.666667,1.0,1.0,0.116686,0.032907,0.0,0.0,0.0,0.0,0.0,0.75,1.0,0.0,0.783092,0.750187,0.666667,0.75,12.21106,0.0,0.0,0.0
1,6.487684,0.0,0.75,0.456066,0.399443,1.0,1.0,0.333333,0.333333,1.0,0.0,0.0,0.363636,0.4,1.0,0.75,0.6,0.69897,0.885622,0.360294,0.04918,0.0,0.0,0.6,0.6,0.666667,0.03375,0.666667,1.0,0.5,0.5,0.75,0.25,0.666667,0.142807,0.666667,0.0,0.114724,0.17234,1.0,1.0,1.0,1.0,0.434539,0.0,0.0,0.406196,0.333333,0.0,0.333333,0.5,0.375,0.333333,0.666667,0.47088,1.0,0.0,0.2,0.8,0.457944,0.666667,0.25,0.220028,0.666667,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1.0,0.0,0.836829,0.500249,0.666667,0.75,11.887931,0.0,0.0,0.0
2,3.828641,0.795881,0.75,0.394699,0.347082,1.0,1.0,0.0,0.333333,1.0,0.0,0.0,0.954545,0.4,1.0,1.0,0.6,0.954243,0.732487,0.036765,0.098361,1.0,0.0,0.3,0.2,0.666667,0.2575,1.0,1.0,1.0,1.0,0.75,0.25,1.0,0.080794,0.666667,0.0,0.601951,0.286743,1.0,1.0,1.0,1.0,0.627205,0.0,0.0,0.586296,0.333333,0.0,0.666667,0.0,0.25,0.333333,1.0,0.564575,1.0,0.333333,0.8,0.8,0.046729,0.666667,0.5,0.406206,0.666667,1.0,1.0,0.228705,0.149909,0.0,0.0,0.0,0.0,0.0,0.75,1.0,0.0,0.278943,1.0,0.666667,0.75,12.675764,0.0,0.0,0.0
3,7.207119,0.0,0.75,0.388581,0.493677,1.0,1.0,0.666667,0.666667,1.0,0.0,0.0,0.454545,0.4,1.0,0.75,0.6,0.845098,0.732487,0.066176,0.163934,0.0,0.0,1.0,1.0,0.0,0.0,0.666667,1.0,1.0,0.75,0.75,1.0,1.0,0.25567,0.666667,0.0,0.018114,0.242553,1.0,1.0,1.0,1.0,0.56692,0.0,0.0,0.529943,0.333333,0.0,0.666667,0.0,0.375,0.333333,0.666667,0.47088,1.0,0.333333,0.4,0.8,0.084112,0.666667,0.5,0.362482,0.666667,1.0,1.0,0.469078,0.045704,0.0,0.0,0.0,0.0,0.0,0.75,1.0,0.0,0.836829,0.250187,0.666667,0.75,12.278393,1.0,0.0,0.0
4,4.025352,0.0,0.75,0.577658,0.402702,1.0,1.0,0.333333,0.333333,1.0,0.0,0.0,0.363636,0.4,1.0,0.75,0.6,0.778151,0.732487,0.323529,0.737705,0.0,0.0,0.6,0.7,0.666667,0.17,0.333333,1.0,0.5,0.5,0.75,0.25,0.333333,0.086818,0.666667,0.0,0.434278,0.233224,1.0,0.75,1.0,1.0,0.549026,0.0,0.0,0.513216,0.0,0.0,0.666667,0.0,0.375,0.333333,0.333333,0.643793,1.0,0.333333,0.8,0.8,0.411215,0.666667,0.5,0.406206,0.666667,1.0,1.0,0.0,0.0,0.0,0.801181,0.0,0.0,0.0,0.75,1.0,0.0,0.783092,0.500249,0.666667,0.75,12.103486,0.0,0.0,0.0
```


```python
# Checking the data types again to confirm that every column is now numeric
X_train.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1314 entries, 0 to 1313
Data columns (total 84 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Id              1314 non-null   float64
 1   MSSubClass      1314 non-null   float64
 2   MSZoning        1314 non-null   float64
 3   LotFrontage     1314 non-null   float64
 4   LotArea         1314 non-null   float64
 5   Street          1314 non-null   float64
 6   Alley           1314 non-null   float64
 7   LotShape        1314 non-null   float64
 8   LandContour     1314 non-null   float64
 9   Utilities       1314 non-null   float64
 10  LotConfig       1314 non-null   float64
 11  LandSlope       1314 non-null   float64
 12  Neighborhood    1314 non-null   float64
 13  Condition1      1314 non-null   float64
 14  Condition2      1314 non-null   float64
 15  BldgType        1314 non-null   float64
 16  HouseStyle      1314 non-null   float64
 17  OverallQual     1314 non-null   f
...
```
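
The `info()` listing above is cut off after a few columns, so a programmatic check is a useful complement. The sketch below is an illustration rather than part of the original notebook: `check_numeric_no_missing` is a hypothetical helper, and `demo` is a small stand-in for `X_train` built from values in the `head()` output. It reports any non-numeric columns and any columns with missing values; both lists should be empty before modelling.

```python
import numpy as np
import pandas as pd


def check_numeric_no_missing(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (non-numeric columns, columns containing missing values)."""
    # Columns whose dtype is not numeric (e.g. object or category)
    non_numeric = df.select_dtypes(exclude=np.number).columns.tolist()
    # Columns with at least one NaN
    with_missing = df.columns[df.isnull().any()].tolist()
    return non_numeric, with_missing


# Hypothetical stand-in for X_train: all float columns, no gaps
demo = pd.DataFrame({
    "LotArea": [0.377048, 0.399443, 0.347082],
    "OverallQual": [0.90309, 0.69897, 0.954243],
    "SalePrice": [12.21106, 11.887931, 12.675764],
})

non_numeric, with_missing = check_numeric_no_missing(demo)
print("Non-numeric columns:", non_numeric)            # Non-numeric columns: []
print("Columns with missing values:", with_missing)   # Columns with missing values: []
```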

## Checking Correlations<a id='4.3_Checking_Correlations'></a>

In [4]:
# Checking correlations
X_train.corr()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice,LotFrontage_na,MasVnrArea_na,GarageYrBlt_na
Id,1.000000,0.035935,0.039426,-0.016011,-0.011624,-0.017975,0.004232,-0.030461,0.021622,-0.015503,-0.045902,0.041245,0.006023,-0.011182,0.045701,-0.000186,-0.012062,-0.013943,0.012577,0.011197,0.015181,0.045431,-0.004938,-0.004338,-0.011288,-0.013084,-0.039238,0.014617,0.023492,-0.002340,-0.027185,0.015190,0.029138,-0.011672,-0.009293,0.009501,0.011794,0.002610,-0.002884,-0.036506,-0.020154,0.008747,0.053466,0.016837,0.005860,-0.049034,0.011224,0.002138,-0.026726,-0.004610,0.022014,0.043074,-0.019304,0.031263,0.019234,-0.032089,-0.008537,0.010826,0.009288,0.010342,0.025676,-0.006516,0.006512,0.010035,-0.004648,0.003711,-0.017655,-0.000901,-0.010445,-0.032352,0.018066,0.035207,0.033550,-0.003032,0.033747,-0.000048,0.034827,-0.001180,-0.004849,0.020755,-0.006886,0.019624,0.024597,-0.004082
MSSubClass,0.035935,1.000000,-0.216252,-0.402560,-0.377494,-0.003035,-0.117224,-0.079496,-0.028740,0.031311,-0.060314,-0.035160,-0.009111,-0.038455,-0.040371,-0.192555,0.238647,0.090629,-0.039833,0.005793,-0.047038,-0.181738,-0.019759,-0.006696,-0.025584,-0.002809,0.035511,0.040708,-0.031512,0.061590,0.058444,-0.008170,0.029513,0.149138,-0.128393,0.046787,-0.089479,-0.120243,-0.286738,-0.059026,-0.004318,-0.115638,0.008025,-0.317157,0.487079,0.071442,0.203997,-0.053111,0.013345,0.190775,0.289383,0.075439,0.261703,0.005170,0.132407,-0.013875,-0.001301,0.010595,-0.075233,-0.073496,-0.008555,-0.016154,-0.083974,-0.048322,-0.068162,-0.067837,-0.003609,0.029393,0.033469,-0.047094,-0.012912,0.022632,0.024176,0.116109,0.023102,-0.013317,0.034729,-0.021744,-0.019486,-0.052975,-0.019033,-0.010525,-0.023925,0.062237
MSZoning,0.039426,-0.216252,1.000000,0.309330,0.334740,0.070270,0.220573,0.190665,0.046751,-0.010136,0.072783,0.012585,0.515240,0.126270,0.081431,0.064279,0.189795,0.271874,-0.093570,-0.460490,-0.251811,0.050424,0.037185,0.204701,0.210706,0.164331,0.126352,0.271526,0.152245,0.287636,0.247486,0.095680,0.105773,0.055376,0.175883,-0.054600,0.058681,0.036198,0.241478,0.099273,0.202444,0.255430,0.186416,0.274464,-0.019744,-0.111771,0.181750,0.138240,0.019903,0.258207,0.171005,0.096674,-0.123568,0.249519,0.141854,0.072573,0.170651,0.172251,0.344576,-0.260254,0.292510,0.261978,0.260863,0.176682,0.193489,0.297550,0.125348,0.081124,-0.242342,0.030520,0.030151,0.024583,0.024876,0.044207,0.015575,0.003181,-0.002563,0.014896,0.136153,0.139013,0.416743,0.085595,0.053177,-0.142017
LotFrontage,-0.016011,-0.402560,0.309330,1.000000,0.625884,-0.019887,0.134874,0.084794,0.040540,0.006593,0.052983,0.042931,0.219451,-0.007665,-0.056119,-0.030893,0.027801,0.179819,-0.027376,-0.075355,-0.064846,0.179875,0.094937,0.062066,0.089977,0.095655,0.126740,0.146218,0.042533,0.078239,0.108296,0.030578,0.125724,0.014332,0.151408,-0.024154,0.029962,0.154158,0.323293,0.007071,0.097352,0.040546,0.035453,0.415546,0.021637,0.018496,0.297268,0.055256,-0.014972,0.152353,-0.006460,0.256397,0.027412,0.134510,0.318707,0.037126,0.200301,0.209700,0.238537,-0.011674,0.202233,0.269962,0.316246,0.108902,0.109093,0.069844,0.057102,0.114386,0.021327,0.051760,0.049592,0.090932,0.106140,-0.038294,-0.019753,-0.001361,0.005020,-0.016278,0.119372,0.100488,0.322683,-0.110913,0.034506,-0.106912
LotArea,-0.011624,-0.377494,0.334740,0.625884,1.000000,-0.121928,0.103481,0.367711,0.137472,-0.024282,0.246825,0.269411,0.247929,0.066210,-0.041628,-0.089407,0.006430,0.162878,0.001017,-0.025958,-0.040077,0.167502,0.212850,0.015565,0.055347,0.041125,0.121238,0.111195,0.016946,0.017313,0.095011,0.037179,0.223616,-0.029290,0.237775,-0.026638,0.099606,0.065604,0.350391,-0.002788,0.055590,0.057103,0.042390,0.469923,0.076934,0.003742,0.381454,0.153567,0.035119,0.172333,0.036565,0.276531,-0.009182,0.125831,0.363855,-0.016009,0.333098,0.274923,0.248241,0.051354,0.191064,0.279318,0.329656,0.147837,0.134374,0.027842,0.191198,0.145317,0.002524,0.049143,0.097502,0.091103,0.098423,0.014421,-0.081303,0.048678,-0.015328,-0.025085,0.064715,0.074725,0.402494,0.133596,0.012340,-0.141714
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SaleCondition,0.020755,-0.052975,0.139013,0.100488,0.074725,0.047650,0.019516,0.042118,0.071892,0.079663,-0.033613,-0.064542,0.245662,0.067451,0.012776,0.107645,0.098649,0.221069,0.000814,-0.246048,-0.259392,0.059966,-0.065282,0.176998,0.162670,0.150636,0.106215,0.271591,0.084458,0.214149,0.244283,0.064317,0.087706,0.128700,0.050125,0.056315,-0.064035,0.139449,0.168698,0.005206,0.234888,0.095239,0.120513,0.100673,0.050189,-0.070699,0.124333,0.005910,-0.068796,0.174551,0.067241,-0.043271,-0.076587,0.225727,0.099161,0.032665,0.109299,0.157379,0.208999,-0.190629,0.229966,0.240582,0.242862,0.148620,0.149164,0.100703,0.054373,0.079677,-0.090880,0.002499,-0.010373,-0.071239,-0.059041,0.078133,0.010303,0.007453,0.039163,-0.014512,0.486673,1.000000,0.296165,-0.033351,0.043307,-0.146441
SalePrice,-0.006886,-0.019033,0.416743,0.322683,0.402494,0.052953,0.162679,0.291714,0.149479,0.013215,0.137222,0.041956,0.742885,0.160683,0.022712,0.170050,0.299646,0.790797,0.020492,-0.585320,-0.576395,0.199880,0.105779,0.413760,0.398806,0.433178,0.423296,0.678667,0.200436,0.550732,0.650984,0.293838,0.359811,0.399348,0.375752,0.191215,0.019412,0.209785,0.610014,0.162131,0.473535,0.357125,0.300131,0.613096,0.311080,-0.040206,0.726052,0.249521,-0.004404,0.585048,0.311636,0.202220,-0.150588,0.667120,0.534851,0.142087,0.492053,0.552340,0.573658,-0.402292,0.604082,0.691922,0.655901,0.375136,0.387957,0.316746,0.332111,0.311436,-0.154432,0.059622,0.132426,0.038471,0.036888,0.182958,0.070153,-0.018978,0.046973,-0.038865,0.267824,0.296165,1.000000,0.050472,0.057254,-0.335708
LotFrontage_na,0.019624,-0.010525,0.085595,-0.110913,0.133596,-0.003670,0.072544,0.282242,0.043326,-0.059443,0.235524,0.113897,0.049858,0.116475,0.020788,0.068031,0.045794,-0.022324,0.033967,-0.048465,0.026488,-0.025472,0.052024,-0.002783,0.000911,0.010616,0.002891,-0.035986,-0.006610,-0.007828,-0.003699,0.015621,0.045924,-0.030665,0.076592,-0.028664,0.058700,-0.115957,-0.014440,0.015499,-0.059485,0.070672,0.057622,0.032408,0.008496,-0.033098,0.024360,0.069044,0.030924,-0.015284,0.052534,0.002517,-0.069373,-0.021730,-0.025425,0.011795,0.133763,0.052960,0.123813,0.020263,0.056116,0.016417,0.016968,0.079816,0.066411,0.049069,0.102697,0.026275,-0.042713,0.033821,-0.008310,0.006164,-0.001889,0.004538,-0.064025,0.083407,-0.018127,0.028555,-0.087521,-0.033351,0.050472,1.000000,-0.001889,-0.061555
MasVnrArea_na,0.024597,-0.023925,0.053177,0.034506,0.012340,0.004186,0.016486,-0.044929,-0.007432,0.001869,-0.017880,-0.015814,0.056929,0.001796,0.006223,0.008591,0.032774,0.061377,-0.020250,-0.073951,-0.064008,-0.035663,-0.009432,0.065132,0.066573,0.007012,-0.038222,0.090719,0.021744,0.072629,0.057055,0.032346,-0.008184,0.055381,0.023875,0.018379,-0.019384,0.010963,0.028978,0.009789,0.060253,0.018365,0.019766,0.015209,0.019331,-0.008252,0.033031,0.032154,-0.016211,0.053808,0.037875,-0.002778,-0.014095,0.066931,0.010832,0.016737,0.005324,0.002732,0.048271,-0.066167,0.059472,0.036616,0.032834,0.017982,0.020889,0.020182,-0.005686,0.061000,-0.024313,-0.007511,-0.018132,-0.004533,-0.004587,0.018094,0.013119,-0.006141,0.015040,0.017878,0.047678,0.043307,0.057254,-0.001889,1.000000,-0.016545
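
With more than 80 columns, the full matrix is hard to scan. A common follow-up, sketched below and not part of the original notebook, is to rank predictors by the strength of their correlation with 'SalePrice' and to list predictor pairs that are strongly correlated with each other. The helper names, the 0.8 cutoff and the `demo` data (including the `HouseAge` column) are hypothetical.

```python
from itertools import combinations

import pandas as pd


def rank_by_target_corr(df: pd.DataFrame, target: str = "SalePrice") -> pd.Series:
    """Return each feature's Pearson correlation with `target`,
    sorted by absolute value (strongest first)."""
    corr_with_target = df.corr()[target].drop(target)
    # Sort by magnitude but keep the sign so negative relationships stay visible
    order = corr_with_target.abs().sort_values(ascending=False).index
    return corr_with_target.loc[order]


def high_corr_pairs(df: pd.DataFrame, threshold: float = 0.8,
                    target: str = "SalePrice") -> list[tuple[str, str, float]]:
    """Return (feature_a, feature_b, |r|) for predictor pairs with |r| > threshold."""
    corr = df.drop(columns=target).corr().abs()
    # combinations() visits each unordered pair of columns exactly once
    return [(a, b, float(corr.loc[a, b]))
            for a, b in combinations(corr.columns, 2)
            if corr.loc[a, b] > threshold]


# Hypothetical data: price rises with quality and falls with age
demo = pd.DataFrame({
    "OverallQual": [1, 2, 3, 4, 5],
    "HouseAge": [50, 40, 45, 20, 10],
    "MoSold": [3, 7, 1, 5, 2],
    "SalePrice": [10, 20, 30, 40, 50],
})

print(rank_by_target_corr(demo))
print(high_corr_pairs(demo))
```

On the real data, `rank_by_target_corr(X_train)` would surface the predictors most linearly related to the target, and `high_corr_pairs(X_train)` would flag near-duplicate predictors.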


## Defining Target and Predictors<a id='4.4_Defining_Target_Predictors'></a>

```python
# Defining y_train and y_test (the target) from the loaded data
y_train = X_train['SalePrice']
y_test = X_test['SalePrice']

# Dropping the Id and target columns so that only predictors remain in X_train and X_test
X_train.drop(['Id', 'SalePrice'], axis=1, inplace=True)
X_test.drop(['Id', 'SalePrice'], axis=1, inplace=True)
```
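
An equivalent and slightly more compact idiom is `DataFrame.pop`, which removes a column and returns it as a Series in one step. The sketch below applies it to small hypothetical frames with the same 'Id'/'SalePrice' layout; it is an alternative to the cell above, not what the notebook ran. The final assertion guards against the train and test sets ending up with different predictor columns.

```python
import pandas as pd

# Hypothetical stand-ins for X_train / X_test with the same layout as the notebook
X_train = pd.DataFrame({"Id": [1, 2, 3], "LotArea": [0.38, 0.40, 0.35],
                        "SalePrice": [12.21, 11.89, 12.68]})
X_test = pd.DataFrame({"Id": [4, 5], "LotArea": [0.49, 0.40],
                       "SalePrice": [12.28, 12.10]})

# pop() removes the target column from the frame and returns it as a Series
y_train = X_train.pop("SalePrice")
y_test = X_test.pop("SalePrice")

# The Id column is only a row identifier, so it is dropped from the predictors
X_train = X_train.drop(columns="Id")
X_test = X_test.drop(columns="Id")

# Train and test must expose the same predictors, in the same order
assert X_train.columns.equals(X_test.columns)
print(X_train.columns.tolist())  # ['LotArea']
```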

## Features Selection<a id='4.5_Features_Selection'></a>

```python
# Using a Lasso regression model for feature selection
# The lower the alpha value, the more features will be selected
sel_ = SelectFromModel(Lasso(alpha=0.005, random_state=0))

# Training the Lasso model with X_train and y_train
sel_.fit(X_train, y_train)
```

```
SelectFromModel(estimator=Lasso(alpha=0.005, random_state=0))
```
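
The comment in the cell above states that a lower `alpha` keeps more features. The sketch below illustrates this on synthetic data from scikit-learn's `make_regression` (an assumption; the notebook's data are not used), counting how many features `SelectFromModel` keeps at each `alpha`. Because the Lasso penalty is sensitive to feature scale, the effect of a given `alpha` also depends on how the predictors were scaled.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data: 20 features, of which only 5 actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

n_selected = {}
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    sel = SelectFromModel(Lasso(alpha=alpha, random_state=0)).fit(X, y)
    # get_support() is a boolean mask over the input columns
    n_selected[alpha] = int(sel.get_support().sum())
    print(f"alpha={alpha:>6}: {n_selected[alpha]} features kept")
```

Exact counts depend on the data; the point is the trend, with larger `alpha` values typically driving more coefficients to zero.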

```python
# Getting the names of the features selected by the model above
sel_features = X_train.columns[sel_.get_support()]

# Printing the total number of features and the number of selected features
print("Total number of features: {}".format(X_train.shape[1]))
print("Number of selected features: {}".format(len(sel_features)))
```

```
Total number of features: 82
Number of selected features: 22
```
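
For Lasso, `SelectFromModel` uses a default `threshold` of `1e-5`, so a feature is kept when the absolute value of its fitted coefficient is at least that value; in practice these are the features whose coefficients Lasso did not shrink to zero. In the notebook this leaves 22 of the 82 predictors. The sketch below, on hypothetical synthetic data from `make_regression`, checks that `get_support()` matches a mask built directly from the fitted `estimator_.coef_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic stand-in data (not the notebook's house-price data)
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

sel = SelectFromModel(Lasso(alpha=1.0, random_state=0)).fit(X, y)

# Coefficients of the Lasso model fitted inside SelectFromModel
coef = sel.estimator_.coef_
manual_mask = np.abs(coef) >= 1e-5   # default threshold for L1 models

print("Threshold used:", sel.threshold_)
print("Kept by SelectFromModel:", int(sel.get_support().sum()))
print("Masks agree:", bool(np.array_equal(manual_mask, sel.get_support())))
print("Coefficients set exactly to zero:", int((coef == 0).sum()))
```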


```python
# Printing the list of selected features
print("List of selected features is as follows: \n ", sel_features)
```

```
List of selected features is as follows: 
  Index(['MSSubClass', 'MSZoning', 'Neighborhood', 'OverallQual', 'YearRemodAdd',
       'RoofStyle', 'MasVnrType', 'ExterQual', 'BsmtQual', 'BsmtExposure',
       'HeatingQC', 'CentralAir', '1stFlrSF', 'GrLivArea', 'BsmtFullBath',
       'KitchenQual', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageFinish', 'GarageCars', 'PavedDrive'],
      dtype='object')
```
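
The selected names can then be used to keep the same columns, in the same order, in both the training and the test set, so that downstream models see identical inputs. A minimal sketch, where `sel_features`, `X_train` and `X_test` are small hypothetical stand-ins for the notebook's objects:

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's objects
sel_features = pd.Index(["OverallQual", "GrLivArea", "GarageCars"])
X_train = pd.DataFrame({"OverallQual": [0.90, 0.70], "GrLivArea": [0.52, 0.41],
                        "GarageCars": [0.75, 0.25], "PoolArea": [0.0, 0.0]})
X_test = pd.DataFrame({"OverallQual": [0.85], "GrLivArea": [0.53],
                       "GarageCars": [0.50], "PoolArea": [0.0]})

# Keep only the selected columns, in the same order, in both sets
X_train_sel = X_train[sel_features]
X_test_sel = X_test[sel_features]

assert list(X_train_sel.columns) == list(X_test_sel.columns)
print(X_train_sel.shape, X_test_sel.shape)  # (2, 3) (1, 3)
```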


```python
# Converting into a Series and then saving the data in CSV format
pd.Series(sel_features).to_csv('../HousePricesPrediction/data/selected_features.csv', index=False)
```
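
In the machine learning phase, the saved list can be read back to subset the data. With recent pandas versions, an unnamed Series is written with a header row of `0`, so the sketch below reads the single column by position rather than by name. It writes to a temporary file instead of the notebook's data folder so that it runs anywhere; the four names are the first entries of the list printed above.

```python
import os
import tempfile

import pandas as pd

# First four names from the selected-features list
sel_features = pd.Index(["MSSubClass", "MSZoning", "Neighborhood", "OverallQual"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "selected_features.csv")

    # Same call as in the notebook, just with a temporary path
    pd.Series(sel_features).to_csv(path, index=False)

    # Read the single column back by position; its header is the Series name
    features = pd.read_csv(path).iloc[:, 0].tolist()

print(features)  # ['MSSubClass', 'MSZoning', 'Neighborhood', 'OverallQual']
```

A loaded training set could then be restricted with `X_train[features]`.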

## Summary of Pre-Processing & Training Data Development Steps<a id='4.6_Preprocessing_Summary'></a>

Here is a summary of the Pre-Processing and Training Data Development steps:

- **Data verification** - After loading the data produced in the exploratory data analysis step, all features were verified to be numeric.
- **Correlations** - The pairwise correlations between all features, including the target, were checked.
- **Defining Targets and Predictors** - The target, 'SalePrice', was assigned to 'y_train' and 'y_test', and the 'Id' and 'SalePrice' columns were dropped from 'X_train' and 'X_test' so that only predictors remain.
- **Features selection** - A Lasso regression model (alpha=0.005) wrapped in scikit-learn's SelectFromModel was used to select the final list of features (22 out of 82) for the machine learning steps.
- **Data saved** - The final list of selected features was saved to 'selected_features.csv' for the machine learning phase.