Sample of code below.

In [12]:
#Goal: To predict Iowa home prices using the following list of predictors
#    LotArea
#    YearBuilt
#    1stFlrSF
#    2ndFlrSF
#    FullBath
#    BedroomAbvGr
#    TotRmsAbvGrd
#as suggested at https://www.kaggle.com/dansbecker/your-first-scikit-learn-model
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor

main_file_path = '../input/train.csv'
data = pd.read_csv(main_file_path)
y = data.SalePrice
model_predictors = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 
                        'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = data[model_predictors]


In [13]:
data[model_predictors].describe()

Things Noted:
The total row count is 1460. YearBuilt is treated as a numeric column rather than a categorical one.

In [14]:
y.describe()

Things Noted: 
The total row count is 1460. The average price is about $180K. The maximum price is $755K, far above the average, which might skew the results.
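
The skew concern can be checked directly. Below is a minimal sketch that uses a small made-up price series as a stand-in for `y` (the values are illustrative, not taken from the dataset). A positive `skew()` and a mean well above the median both point to a long right tail.

```python
import pandas as pd

# Hypothetical stand-in for y = data.SalePrice; these values are made up
prices = pd.Series([120_000, 135_000, 150_000, 163_000, 180_000, 210_000, 755_000])

skewness = prices.skew()          # > 0 indicates a long right tail
mean_price = prices.mean()
median_price = prices.median()

print("skew:", round(skewness, 2))
print("mean:", round(mean_price), "median:", round(median_price))
```

On the real data the same calls would be made on `y` itself.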

In [15]:
# Decision Tree model 
# split data into training and validation data, for both predictors and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

def get_mae(max_leaf_nodes, predictors_train, predictors_val, targ_train, targ_val):
    # define the model to be used
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    # fit the regressor using the training predictors and target
    model.fit(predictors_train, targ_train)
    # predict with the fitted regressor on the validation set
    preds_val = model.predict(predictors_val)
    # calculate the mean absolute error for the model on the validation set
    mae = mean_absolute_error(targ_val, preds_val)
    return mae


# compare MAE with differing values of max_leaf_nodes
for max_leaf_nodes in [10, 50, 750, 2000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))

Of the four values tried, the decision tree with 50 leaf nodes has the lowest validation MAE, about $27K.
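
Rather than reading the best setting off the printed lines, the scores can be collected and the minimum picked in code. This is only a sketch: it runs on synthetic data from `make_regression` as a stand-in for the housing data, and the names `demo_X`, `demo_y`, `tree_mae`, `scores` and `best_size` are introduced here for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing predictors and prices (an assumption)
demo_X, demo_y = make_regression(n_samples=400, n_features=7, noise=10.0, random_state=0)
d_train_X, d_val_X, d_train_y, d_val_y = train_test_split(demo_X, demo_y, random_state=0)

def tree_mae(max_leaf_nodes):
    # Same idea as get_mae above: fit on the training split, score on validation
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(d_train_X, d_train_y)
    return mean_absolute_error(d_val_y, model.predict(d_val_X))

candidates = [10, 50, 750, 2000]
scores = {n: tree_mae(n) for n in candidates}

# The candidate with the smallest validation MAE
best_size = min(scores, key=scores.get)
print("Best max_leaf_nodes:", best_size, "MAE:", scores[best_size])
```

The winner on synthetic data says nothing about the housing data; only the selection pattern carries over.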

In [16]:
# Random Forest model
# define the model to be used (no random_state is set, so the MAE can vary slightly between runs)
forest_model = RandomForestRegressor()
# fit the regressor using the training predictors and target
forest_model.fit(train_X, train_y)
# predict with the fitted regressor on the validation set
forest_preds = forest_model.predict(val_X)
# calculate the mean absolute error for the model on the validation set
print(mean_absolute_error(val_y, forest_preds))

The random forest model results in an MAE of about $23K, lower than the $27K of the best decision tree.
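
To compare the two model types on identical terms, both can be fitted and scored in one loop over the same split. The sketch below again uses synthetic data as a stand-in, fixes `random_state` on the forest so the result is repeatable, and introduces the names `models` and `results` for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing predictors and prices (an assumption)
demo_X, demo_y = make_regression(n_samples=400, n_features=7, noise=10.0, random_state=0)
d_train_X, d_val_X, d_train_y, d_val_y = train_test_split(demo_X, demo_y, random_state=0)

models = {
    "decision tree (50 leaves)": DecisionTreeRegressor(max_leaf_nodes=50, random_state=0),
    "random forest": RandomForestRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    # Fit on the training split, then score on the held-out validation split
    model.fit(d_train_X, d_train_y)
    results[name] = mean_absolute_error(d_val_y, model.predict(d_val_X))

for name, mae in results.items():
    print("%s: MAE %.1f" % (name, mae))
```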

In [17]:
# Read the test data
test = pd.read_csv('../input/test.csv')
# Treat the test data in the same way as training data. In this case, pull same columns.
test_X = test[model_predictors]
# Use the model to make predictions
predicted_prices = forest_model.predict(test_X)
# We will look at the predicted prices to ensure we have something sensible.
print(predicted_prices)

In [None]:
# Creating Submission:
my_submission = pd.DataFrame({'Id': test.Id, 'SalePrice': predicted_prices})
# Any filename works; 'submission.csv' is used here.
my_submission.to_csv('submission.csv', index=False)
