# Introduction
**This will be your workspace for Kaggle's Machine Learning education track.**

You will build and continually improve a model to predict housing prices as you work through each tutorial.  Fork this notebook and write your code in it.

The tutorials use the Melbourne housing data, which is not available in this workspace. You will need to adapt the concepts to the data in this notebook, the Iowa data.

Bring any questions or comments to the [Learn Discussion](https://www.kaggle.com/learn-forum) forum.

# Write Your Code Below



In [2]:
import pandas as pd

main_file_path = '../input/train.csv'
iowa_data = pd.read_csv(main_file_path)
# Print summary statistics for the numeric columns of the Iowa data
print(iowa_data.describe())

In [3]:
# List the names of all the columns
print(iowa_data.columns)

In [4]:
# Pull the single SalePrice column and store it as a separate Series
s_iowa_price = iowa_data.SalePrice
# Print the head() of the new series
print(s_iowa_price.head())

In [5]:
# Pull several columns at once by passing a list of names in brackets
columns_of_interest = ['SalePrice', 'YrSold']
two_columns_of_interest = iowa_data[columns_of_interest]
print(two_columns_of_interest)

In [6]:
two_columns_of_interest.describe()

In [7]:
# Set prediction target
y = iowa_data.SalePrice
# Add predictors to this list when ready
iowa_predictors = [
    'LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF',
    'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd',
]
# Set predictors as X
X = iowa_data[iowa_predictors]


In [8]:
from sklearn.tree import DecisionTreeRegressor

# Define model
iowa_model = DecisionTreeRegressor()

# Fit model
iowa_model.fit(X, y)

In [9]:
# Make predictions for the first 5 rows using iowa_model
print('Making predictions for 5 houses:')
print(X.head())
print('The predicted values are:')
print(iowa_model.predict(X.head()))

In [10]:
from sklearn.metrics import mean_absolute_error

# Score the model on the same data it was fitted to (in-sample error)
predicted_home_prices = iowa_model.predict(X)
mean_absolute_error(y, predicted_home_prices)
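
To make the metric above concrete, here is a minimal sketch of what a mean absolute error works out to: the average of the absolute differences between actual and predicted values. The helper name `mean_abs_error` and the toy prices are invented for illustration; they are not part of scikit-learn or the Iowa data.

```python
# Hand-rolled mean absolute error: the average of |actual - predicted|.
# `mean_abs_error` is a hypothetical helper, not a scikit-learn function.
def mean_abs_error(actual, predicted):
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy prices, made up for this example only (not taken from the Iowa data).
toy_actual = [200000, 150000, 300000]
toy_predicted = [210000, 140000, 340000]

# Absolute errors are 10000, 10000 and 40000, so the mean is 20000.
toy_mae = mean_abs_error(toy_actual, toy_predicted)
print(toy_mae)
```

Because the cell above scores the model on the rows it was trained on, the resulting error is likely to look better than it would on unseen houses.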

In [11]:
from sklearn.model_selection import train_test_split
# Split the data into training and validation sets, for both predictors and target.
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Define model
iowa_model = DecisionTreeRegressor()
# Fit Model
iowa_model.fit(train_X, train_y)

# get predicted prices
val_predictions = iowa_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
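
The comment about `random_state` can be illustrated with a small standard-library sketch. It is not how `train_test_split` is implemented; `toy_split` is a hypothetical stand-in that shuffles with a seeded generator and holds out a quarter of the rows, mirroring the 25% validation share that `train_test_split` uses by default.

```python
import random

# Hypothetical stand-in for a seeded train/validation split.
def toy_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows with a seeded RNG, then cut off a validation slice."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * test_fraction))
    # First n_val shuffled rows become validation, the rest training.
    return shuffled[n_val:], shuffled[:n_val]

rows = list(range(20))
train_a, val_a = toy_split(rows, seed=0)
train_b, val_b = toy_split(rows, seed=0)

print(len(train_a), len(val_a))  # 15 training rows, 5 validation rows
print(val_a == val_b)            # same seed, same split
```

Fixing the seed matters when comparing models: if the split changed on every run, differences in validation error could come from the split rather than from the model.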



In [12]:
def get_mae(max_leaf_nodes, predictors_train, predictors_val, targ_train, targ_val):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(predictors_train, targ_train)
    preds_val = model.predict(predictors_val)
    mae = mean_absolute_error(targ_val, preds_val)
    return mae


In [13]:
for max_leaf_nodes in [5, 10, 25, 40, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max Leaf nodes: %d \t\t Mean Absolute Error: %d" %(max_leaf_nodes, my_mae))
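
Rather than reading the best tree size off the printed table, the comparison can be automated. The sketch below is one possible way to do it; `best_candidate` and `toy_score` are hypothetical names, and `toy_score` is a made-up scoring function standing in for `get_mae` so the example runs on its own.

```python
# Hypothetical helper: score every candidate and keep the one with the lowest score.
def best_candidate(candidates, score_fn):
    """Return (candidate, score) for the candidate with the lowest score."""
    scores = {c: score_fn(c) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Made-up scoring function for illustration only; it is lowest at 50 by construction
# and says nothing about the real validation errors on the Iowa data.
def toy_score(n):
    return abs(n - 50) + 100

best_size, best_score = best_candidate([5, 10, 25, 40, 50, 500, 5000], toy_score)
print(best_size, best_score)
```

In this notebook the same helper could be called with `lambda n: get_mae(n, train_X, val_X, train_y, val_y)` as the scoring function.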

In [14]:
# Random Forest example
from sklearn.ensemble import RandomForestRegressor

forest_model = RandomForestRegressor()
forest_model.fit(train_X, train_y)
iowa_preds = forest_model.predict(val_X)
print(mean_absolute_error(val_y, iowa_preds))

In [21]:
# Create Submission for House Price Competition

# Create Train and Test Selections
train_y = iowa_data.SalePrice
columns_of_interest = ['YrSold', 'LotArea', 'OverallQual', 'YearBuilt', 'TotRmsAbvGrd']
train_X = iowa_data[columns_of_interest]

# create model
my_model = RandomForestRegressor()
my_model.fit(train_X, train_y)

In [22]:
# Load the test data and select the same predictor columns used for training
test = pd.read_csv('../input/test.csv')
test_X = test[columns_of_interest]
# Use my_model to make predictions
predicted_prices = my_model.predict(test_X)
print(predicted_prices)

In [23]:
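# Write the results to a CSV file.
# to_csv returns None when given a path, so keep the DataFrame in its own variable.
test_submission = pd.DataFrame({'Id': test.Id, 'SalePrice': predicted_prices})
test_submission.to_csv('submission.csv', index=False)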
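
Before uploading, it can help to sanity-check the file. The following is a rough sketch using only the standard library; `check_submission` is a hypothetical helper, and the only format it assumes is the `Id` and `SalePrice` columns written above. The sample text is invented for the example.

```python
import csv
import io

# Hypothetical helper: return a list of problems found in submission CSV text.
def check_submission(csv_text, expected_rows):
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    # The header must be exactly the two columns written by the cell above.
    if reader.fieldnames != ["Id", "SalePrice"]:
        problems.append("unexpected header: %r" % (reader.fieldnames,))
        return problems
    rows = list(reader)
    if len(rows) != expected_rows:
        problems.append("expected %d rows, found %d" % (expected_rows, len(rows)))
    for row in rows:
        try:
            float(row["SalePrice"])
        except (TypeError, ValueError):
            problems.append("non-numeric SalePrice for Id %s" % row["Id"])
    return problems

# Invented sample content, not real predictions.
good_text = "Id,SalePrice\n1,120000.0\n2,150000.5\n"
bad_text = "Id,Price\n1,120000.0\n"

print(check_submission(good_text, expected_rows=2))  # no problems found
print(check_submission(bad_text, expected_rows=1))   # header problem reported
```

In the notebook, the text could come from `open('submission.csv').read()` and the expected row count from `len(test)`.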