# Random Forest: Housing Prices Advanced Regression Techniques
The housing prices dataset is used to demonstrate more advanced regression techniques. It contains numeric columns, object (string) columns, and missing values, so this notebook uses one-hot encoding for the categorical columns, an imputer for the missing values, and a random forest for the regression itself.

The goal of this machine learning project is to predict the sale price of each house from its features.
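
As a small illustration of the imputation step, the sketch below fills missing values in a toy array with the column mean, which is the default strategy of scikit-learn's `SimpleImputer`. The array `toy_values` is made up for this example and is not taken from the housing data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two columns, each with one missing value
toy_values = np.array([[1.0, 10.0],
                       [np.nan, 20.0],
                       [3.0, np.nan]])

# The default strategy is "mean": each NaN is replaced by the mean of its column
imputer = SimpleImputer()
filled = imputer.fit_transform(toy_values)

print(filled)
```

The first column has mean 2.0 and the second has mean 15.0, so those are the values that replace the gaps.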

In [1]:
import pandas as pd

def get_data():
    #Import data
    train_data = pd.read_csv('../house-prices-advanced-regression-techniques/train.csv')
    #test_data = pd.read_csv('../house-prices-advanced-regression-techniques/test.csv')

    # Drop rows where SalePrice (the target) is missing
    train_data.dropna(axis=0, subset=['SalePrice'], inplace=True)

    y = train_data.SalePrice
    X = train_data.drop(['Id', 'SalePrice'], axis=1)

    # Keep numeric columns plus object columns with fewer than 10 unique values
    low_cardinality_cols = [cname for cname in X.columns if 
                                    X[cname].nunique() < 10 and X[cname].dtype == "object"]
    numeric_cols = [cname for cname in X.columns if 
                                    X[cname].dtype in ['int64', 'float64']]
    my_cols = numeric_cols + low_cardinality_cols
    X_predictors = X[my_cols]

    # One-hot encode the remaining categorical (object) columns
    X = pd.get_dummies(X_predictors)
    return X, y
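
To make the one-hot encoding step in `get_data` more concrete, here is a minimal sketch on a toy frame. The frame `toy` and its values are invented for illustration; only the column names are borrowed from the housing data.

```python
import pandas as pd

# Hypothetical toy frame: one numeric column and one categorical column
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Grvl", "Pave"],
})

# get_dummies leaves numeric columns alone and expands each categorical
# column into one indicator column per distinct value
encoded = pd.get_dummies(toy)

print(list(encoded.columns))
```

Each categorical column becomes as many columns as it has distinct values, which is why `get_data` keeps only the low-cardinality object columns before encoding.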
    

# Random Forest and Cross Validation

In [2]:
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.impute import SimpleImputer

X_rf, y_rf = get_data()

my_pipeline = Pipeline(steps=[('preprocessor', SimpleImputer()),
                              ('model', RandomForestRegressor(n_estimators=50, random_state=0))
                             ])
scores = -1 * cross_val_score(my_pipeline, X_rf, y_rf,
                              cv=5,
                              scoring='neg_mean_absolute_error')

print("Mean Absolute Error CV:\n", scores.mean())

Mean Absolute Error CV:
 17947.349383561643
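
The cell above multiplies the scores by -1 because scikit-learn's `neg_mean_absolute_error` scorer returns the negated error, so that higher is always better. The sketch below shows the same pipeline shape on synthetic data (the arrays `X_toy` and `y_toy` are made up, and the smaller forest is only there to keep it fast), printing the per-fold errors before averaging them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical synthetic data: the target depends mostly on the first feature
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = 2.0 * X_toy[:, 0] + rng.normal(scale=0.1, size=100)
X_toy[::10, 1] = np.nan  # remove some values so the imputer has work to do

toy_pipeline = Pipeline(steps=[
    ("preprocessor", SimpleImputer()),
    ("model", RandomForestRegressor(n_estimators=10, random_state=0)),
])

# The scorer returns negated MAE, one value per fold
raw_scores = cross_val_score(toy_pipeline, X_toy, y_toy,
                             cv=5,
                             scoring="neg_mean_absolute_error")
fold_mae = -raw_scores  # flip the sign to get ordinary, non-negative MAE

print("Per-fold MAE:", fold_mae)
print("Mean MAE:", fold_mae.mean())
```

Because the imputer sits inside the pipeline, it is refit on the training part of each fold, so the held-out fold does not influence the fill values.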
