#### 1. Refer to this section for a description of the problem

The __[Kaggle](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data)__ platform provides two datasets (_train.csv_, _test.csv_) containing information about houses, such as:

- _Id_ : a unique identifier for each property ;
- _MSSubClass_ : the type of dwelling involved in the property sale ;
- _MSZoning_ : the general zoning classification of the property sale ;
- _LotFrontage_ : the linear feet of street connected to the property ;
- _LotArea_ : the lot size in square feet ;
- _Street_ : the type of road access to the property ;
- _Alley_ : the type of alley access to the property ;
- _LotShape_ : the general shape of the property ;
- _LandContour_ : the flatness of the property ;
- _Utilities_ : the type of utilities available in the property ;
- _LotConfig_ : the lot configuration of the property ;	
- _LandSlope_ : the slope of the property ;
- _Neighborhood_ : the physical locations within Ames city limits ;
- _Condition1_ : the proximity to various conditions ;
- _Condition2_ : the proximity to various conditions (if more than one is present) ;
- _BldgType_ : the type of dwelling ;
- _HouseStyle_ : the style of dwelling ;
- _OverallQual_ : rates the overall material and finish of the house ;	
- _OverallCond_ : rates the overall condition of the house ;
- _YearBuilt_ : the original construction date ;
- _YearRemodAdd_ : the remodel date (same as construction date if no remodeling or additions) ;
- _RoofStyle_ : the type of roof ;
- _RoofMatl_ : the roof material ;
- _Exterior1st_ : the exterior covering on house ;
- _Exterior2nd_ : the exterior covering on house (if more than one material) ;
- _MasVnrType_ : the masonry veneer type ;
- _MasVnrArea_ : the masonry veneer area in square feet ;
- _ExterQual_ : the quality of the material on the exterior ;
- _ExterCond_ : the present condition of the material on the exterior ;
- _Foundation_ : the type of foundation ;
- _BsmtQual_ : the height of the basement ;
- _BsmtCond_ : the general condition of the basement ;
- _BsmtExposure_ : refers to walkout or garden level walls ;
- _BsmtFinType1_ : the rating of basement finished area ;
- _BsmtFinType2_ : the rating of basement finished area (if multiple types) ;
- _BsmtFinSF1_ : Type 1 finished square feet ;
- _BsmtFinSF2_ : Type 2 finished square feet ;
- _BsmtUnfSF_ : the unfinished square feet of basement area ;
- _TotalBsmtSF_ : the total square feet of basement area ;
- _Heating_ : the type of heating ;
- _HeatingQC_ : the heating quality and condition ;		
- _CentralAir_ : the central air conditioning ;
- _Electrical_ : the electrical system ;		
- _1stFlrSF_ : the first floor square feet ;
- _2ndFlrSF_ : the second floor square feet ;
- _LowQualFinSF_ : the low quality finished square feet (all floors) ;
- _GrLivArea_ : the above grade (ground) living area square feet ;
- _BsmtFullBath_ : the basement full bathrooms ;
- _BsmtHalfBath_ : the basement half bathrooms ;
- _FullBath_ : the full bathrooms above grade ;
- _HalfBath_ : the half baths above grade ;
- _BedroomAbvGr_ : the bedrooms above grade (does NOT include basement bedrooms) ;
- _KitchenAbvGr_ : the kitchens above grade ;
- _KitchenQual_ : the kitchen quality ;       	
- _TotRmsAbvGrd_ : the total rooms above grade (does not include bathrooms) ;
- _Functional_ : the home functionality (Assume typical unless deductions are warranted) ;		
- _Fireplaces_ : the number of fireplaces ;
- _FireplaceQu_ : the fireplace quality ;
- _GarageType_ : the garage location ;
- _GarageYrBlt_ : the year in which the garage was built ;
- _GarageFinish_ : the interior finish of the garage ;
- _GarageCars_ : the size of garage in car capacity ;
- _GarageArea_ : the size of garage in square feet ;
- _GarageQual_ : the garage quality ;
- _GarageCond_ : the garage condition ;
- _PavedDrive_ : the paved driveway ;
- _WoodDeckSF_ : the wood deck area in square feet ;
- _OpenPorchSF_ : the open porch area in square feet ;
- _EnclosedPorch_ : the enclosed porch area in square feet ;
- _3SsnPorch_ : the three season porch area in square feet ;
- _ScreenPorch_ : the screen porch area in square feet ;
- _PoolArea_ : the pool area in square feet ;
- _PoolQC_ : the pool quality ;
- _Fence_ : the fence quality ;
- _MiscFeature_ : the miscellaneous feature not covered in other categories ;
- _MiscVal_ : the dollar value of the miscellaneous feature ;
- _MoSold_ : the month sold (MM) ;
- _YrSold_ : the year sold (YYYY) ;
- _SaleType_ : the type of sale ;
- _SaleCondition_ : the condition of sale ;
- _SalePrice_ : the property sale price in dollars, which is available only for samples in _train.csv_.

Based on the data and the relationships identified in the _train.csv_ file, we aim to predict the sale prices of the houses listed in the _test.csv_ file.

#### 2. Import required libraries

In [1]:
import gc
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

#### 3. Set up correct path

In [2]:
windowspath__scripts = Path().resolve()
windowspath__data = windowspath__scripts.parent / "data"
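
`Path().resolve()` returns the absolute current working directory, so the two paths above are only right when the notebook is launched from the scripts folder. The sketch below is one possible sanity check, not part of the original notebook. It assumes the layout used by the loading step (`train/train.csv` and `test/test.csv` under the data folder). The helper name `find_missing_files` is hypothetical, and the demonstration runs on a throwaway directory rather than on the real data.

```python
import tempfile
from pathlib import Path


def find_missing_files(data_dir, relative_paths):
    """Return the expected files that do not exist under data_dir."""
    return [data_dir / rel for rel in relative_paths if not (data_dir / rel).is_file()]


# Demonstration on a temporary directory that mimics the assumed layout,
# with test/test.csv deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    demo_data = Path(tmp) / "data"
    (demo_data / "train").mkdir(parents=True)
    (demo_data / "train" / "train.csv").write_text("Id\n1\n")

    missing = find_missing_files(demo_data, ["train/train.csv", "test/test.csv"])
    missing_names = [p.relative_to(demo_data).as_posix() for p in missing]

print(missing_names)
```

In the notebook itself, the same helper could be called with `windowspath__data` before any file is read, so that a wrong working directory is caught early.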

#### 4. Import train.csv and test.csv files

In [4]:
df__train = pd.read_csv(filepath_or_buffer=windowspath__data / "train" / "train.csv", dtype=str)
df__test = pd.read_csv(filepath_or_buffer=windowspath__data / "test" / "test.csv", dtype=str)
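
Because both files are read with `dtype=str`, every non-missing value arrives as text, including numeric ones such as _LotArea_. The following is a minimal sketch of what that implies and of one way to convert selected columns later. The CSV content is a tiny illustrative stand-in, and the choice of columns to convert is an assumption, not the notebook's own preprocessing.

```python
import io

import pandas as pd

# Tiny stand-in for train.csv (illustrative rows only).
csv_text = "Id,LotArea,LotFrontage,Street\n1,8450,65,Pave\n2,9600,NA,Pave\n"

df_demo = pd.read_csv(io.StringIO(csv_text), dtype=str)

# With dtype=str, numbers are kept as strings...
first_lot_area = df_demo.loc[0, "LotArea"]
# ...but the default missing-value markers (such as "NA") still become NaN.
frontage_is_missing = pd.isna(df_demo.loc[1, "LotFrontage"])

# Convert the columns assumed to be numeric; unparsable entries become NaN.
numeric_cols = ["LotArea", "LotFrontage"]
for col in numeric_cols:
    df_demo[col] = pd.to_numeric(df_demo[col], errors="coerce")

print(df_demo.dtypes)
```

Reading everything as text avoids silent type guessing for code-like columns such as _MSSubClass_, at the cost of an explicit conversion step for the columns that are truly numeric.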

In [5]:
# Take a look at df__train
df__train.sample(n=3)

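|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 822 | 823 | 60 | RL |  | 12394 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 10 | 2007 | WD | Family | 225000 |
| 25 | 26 | 20 | RL | 110.0 | 14230 | Pave |  | Reg | Lvl | AllPub | ... | 0 |  |  |  | 0 | 7 | 2009 | WD | Normal | 256300 |
| 819 | 820 | 120 | RL | 44.0 | 6371 | Pave |  | IR1 | Lvl | AllPub | ... | 0 |  |  |  | 0 | 6 | 2010 | New | Partial | 224000 |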


In [6]:
# Take a look at df__test
df__test.sample(n=3)

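|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 518 | 1979 | 120 | RL | 58.0 | 10110 | Pave |  | IR1 | Lvl | AllPub | ... | 0 | 0 |  |  |  | 0 | 11 | 2008 | New | Partial |
| 566 | 2027 | 160 | FV | 30.0 | 3180 | Pave | Pave | Reg | Lvl | AllPub | ... | 0 | 0 |  |  |  | 0 | 5 | 2008 | WD | Normal |
| 279 | 1740 | 120 | FV |  | 3830 | Pave | Pave | IR1 | Lvl | AllPub | ... | 0 | 0 |  |  |  | 0 | 1 | 2009 | New | Partial |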


Apart from "SalePrice" (which we call the target) and "Id", all other features are called covariates.
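
The three roles of the columns (identifier, target, covariates) can be written down explicitly. The sketch below uses a hypothetical miniature frame in place of `df__train`; the variable names `id_col`, `target_col` and `covariate_cols` are illustrative and do not come from the notebook.

```python
import pandas as pd

# Hypothetical miniature frame playing the role of df__train.
df_demo = pd.DataFrame(
    {
        "Id": ["1", "2"],
        "LotArea": ["8450", "9600"],
        "Street": ["Pave", "Pave"],
        "SalePrice": ["208500", "181500"],
    }
)

id_col = "Id"
target_col = "SalePrice"

# Every column that is neither the identifier nor the target is a covariate.
covariate_cols = [c for c in df_demo.columns if c not in (id_col, target_col)]

print(covariate_cols)  # ['LotArea', 'Street']
```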

#### 5. Basic feature engineering (using some of the relations between covariates)

In [8]:
# Define a unified view of df__train and df__test
# (drop() and concat() both return new objects, so the original frames are left untouched)
df__houses = pd.concat([df__train.drop(columns=["SalePrice"]), df__test], ignore_index=True)

# Do not forget that samples of df__train and df__test are identified by their Id

# Take a look at df__houses
df__houses.sample(n=3)

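|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1838 | 1839 | 20 | RL | 50.0 | 4280 | Pave |  | IR1 | Lvl | AllPub | ... | 0 | 0 |  |  |  | 0 | 9 | 2009 | WD | Normal |
| 1154 | 1155 | 60 | RL |  | 13700 | Pave |  | IR1 | Lvl | AllPub | ... | 273 | 0 |  | GdPrv |  | 0 | 5 | 2008 | WD | Normal |
| 2317 | 2318 | 60 | RL | 72.0 | 8229 | Pave |  | IR1 | Lvl | AllPub | ... | 0 | 0 |  |  |  | 0 | 12 | 2007 | New | Partial |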
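
Since `ignore_index=True` discards the original row labels, the _Id_ column is what ties each row of the unified view back to its source file. The following sketch shows one way the split could be recovered after feature engineering and the target re-attached. It works on small illustrative frames, and the names `train_demo`, `test_demo` and `houses_demo` are stand-ins rather than the notebook's variables.

```python
import pandas as pd

# Small illustrative stand-ins for df__train and df__test.
train_demo = pd.DataFrame(
    {"Id": ["1", "2"], "LotArea": ["8450", "9600"], "SalePrice": ["208500", "181500"]}
)
test_demo = pd.DataFrame({"Id": ["1461", "1462"], "LotArea": ["11622", "14267"]})

# Same construction as the unified view: the target is dropped before stacking.
houses_demo = pd.concat(
    [train_demo.drop(columns=["SalePrice"]), test_demo], ignore_index=True
)

# Recover the two subsets from the Id values.
is_train = houses_demo["Id"].isin(train_demo["Id"])
train_part = houses_demo[is_train]
test_part = houses_demo[~is_train]

# Re-attach the target to the training rows; validate guards against duplicated Ids.
train_with_target = train_part.merge(
    train_demo[["Id", "SalePrice"]], on="Id", how="left", validate="one_to_one"
)

print(train_with_target)
```

Splitting by _Id_ rather than by row position keeps the procedure correct even if rows are reordered or filtered during feature engineering.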
