# Introduction
**This will be your workspace for Kaggle's Machine Learning education track.**

You will build and continually improve a model to predict housing prices as you work through each tutorial.  Fork this notebook and write your code in it.

The data from the tutorial, the Melbourne data, is not available in this workspace.  You will need to translate the concepts to work with the data in this notebook, the Iowa data.

Come to the [Learn Discussion](https://www.kaggle.com/learn-forum) forum for any questions or comments. 

# Write Your Code Below



## Importing Modules and Loading Data 

In [1]:
import pandas as pd

# save filepath to variable for easier access
file = '../input/train.csv'
# read the data and store data in DataFrame titled iowa_data
iowa_data = pd.read_csv(file) 
# print a summary of the Iowa data
print(iowa_data.describe())
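Because `train.csv` is only available inside the Kaggle workspace, here is a minimal sketch of what `describe()` returns, using a hypothetical three-row stand-in frame with made-up values. For each numeric column it produces eight summary rows.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: three houses, two numeric columns
# (toy values for illustration only).
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})

summary = toy.describe()
# One column per numeric column; rows are count, mean, std, min,
# 25%, 50%, 75% and max.
print(summary)
print(summary.loc["mean", "SalePrice"])  # 204500.0
```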

## Selecting and Filtering Columns in Pandas

### Viewing Columns

In [2]:
print(iowa_data.columns)

### Viewing Target Variable - SalePrice

In [3]:
# Select the SalePrice column and view its first rows
saleprice = iowa_data['SalePrice']
saleprice.head()

### Viewing Subset of Iowa Data

In [4]:
# Take a subset of two columns from the full dataset
iowa_subset = iowa_data[['SalePrice', 'Neighborhood']]
iowa_subset.head()
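The two selections above differ in their brackets, and that changes the return type. A small sketch on a hypothetical toy frame (made-up values, same column names as the Iowa data) shows that a single label gives a `Series`, while a list of labels gives a `DataFrame`:

```python
import pandas as pd

# Hypothetical toy frame using the Iowa column names (values are made up).
df = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr"],
    "LotArea": [8450, 9600, 11250],
})

# Single brackets with one label -> a pandas Series.
price = df["SalePrice"]
# Double brackets (a list of labels) -> a DataFrame with only those columns.
subset = df[["SalePrice", "Neighborhood"]]

print(type(price).__name__)   # Series
print(type(subset).__name__)  # DataFrame
print(subset.columns.tolist())  # ['SalePrice', 'Neighborhood']
```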

## Building Initial Scikit-Learn Model

### Choosing Prediction Target Variable

In [5]:
# Store the target column (SalePrice) in y
y = saleprice
y.head()

### Choosing Predictors

In [6]:
# Select the predictor columns and store them in X
iowa_predictors = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr',
                   'TotRmsAbvGrd']
X = iowa_data[iowa_predictors]
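All of the predictors chosen above are numeric. If you want a quick list of numeric candidates to pick from, one option is `select_dtypes`, dropping the target so it cannot leak into `X`. This is a sketch on a hypothetical toy frame with made-up values, not a step from the tutorial:

```python
import pandas as pd

# Hypothetical toy frame standing in for iowa_data (made-up values).
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [2003, 1976, 2001],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr"],
    "SalePrice": [208500, 181500, 223500],
})

# Keep only numeric columns, then drop the target column.
numeric_candidates = (
    toy.select_dtypes(include="number").columns.drop("SalePrice").tolist()
)
print(numeric_candidates)  # ['LotArea', 'YearBuilt']
X_toy = toy[numeric_candidates]
```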

### First Model

#### Fitting Model

In [7]:
from sklearn.tree import DecisionTreeRegressor

# Define model
iowa_model = DecisionTreeRegressor()

# Fit model
iowa_model.fit(X, y)

#### Making Predictions

In [8]:
print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(iowa_model.predict(X.head()))
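The predictions above are made on houses the model was trained on, so they say little about accuracy on new houses. A minimal sketch on synthetic data (an assumption: 20 rows with distinct feature values) illustrates why: with no depth or leaf limits, a `DecisionTreeRegressor` keeps splitting until each leaf holds one training row, so it reproduces the training targets.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption): 20 houses, 2 features, distinct rows.
rng = np.random.default_rng(0)
X_toy = rng.uniform(1000, 20000, size=(20, 2))
y_toy = rng.uniform(50000, 500000, size=20)

model = DecisionTreeRegressor(random_state=0)
model.fit(X_toy, y_toy)

# In-sample predictions match the training targets, so in-sample error is ~0.
train_preds = model.predict(X_toy)
print(np.abs(train_preds - y_toy).max())
```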

## Model Validation

### Train Test Split

In [10]:
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# split data into training and validation data, for both predictors and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Define model (random_state also makes the fitted tree reproducible)
iowa_model = DecisionTreeRegressor(random_state=0)
# Fit model
iowa_model.fit(train_X, train_y)

# get predicted prices on validation data
val_predictions = iowa_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
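Two details of this cell can be checked on toy data: by default `train_test_split` holds out 25% of the rows, and the mean absolute error is the average of |actual − predicted|. The arrays and the "off by 500" predictions below are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Toy data (assumption): 100 rows, 2 features.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.arange(100) * 1000.0

tr_X, va_X, tr_y, va_y = train_test_split(X_toy, y_toy, random_state=0)
print(len(tr_X), len(va_X))  # 75 25

# Hypothetical predictions that are each off by exactly 500.
preds = va_y + 500.0
manual_mae = np.mean(np.abs(va_y - preds))
print(manual_mae, mean_absolute_error(va_y, preds))  # 500.0 500.0
```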

## Changing the Number of Tree Leaves

### Getting Mean Absolute Error (MAE)

In [11]:
# Return the validation MAE of a tree limited to max_leaf_nodes leaves
def get_mae(max_leaf_nodes, predictors_train, predictors_val, targ_train, targ_val):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(predictors_train, targ_train)
    preds_val = model.predict(predictors_val)
    mae = mean_absolute_error(targ_val, preds_val)
    return mae

# compare MAE with differing values of max_leaf_nodes
for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" %(max_leaf_nodes, my_mae))
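Rather than reading the best value off the printed loop, you could store each MAE in a dictionary and take the key with the smallest score. The sketch below repeats `get_mae` so it runs on its own and uses synthetic data (an assumption); the winning `max_leaf_nodes` here says nothing about the Iowa data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def get_mae(max_leaf_nodes, predictors_train, predictors_val, targ_train, targ_val):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(predictors_train, targ_train)
    return mean_absolute_error(targ_val, model.predict(predictors_val))

# Synthetic regression data (assumption): 300 rows, 3 features, some noise.
rng = np.random.default_rng(1)
X_toy = rng.uniform(0, 10, size=(300, 3))
y_toy = 3 * X_toy[:, 0] + np.sin(X_toy[:, 1]) + rng.normal(0, 0.5, size=300)
tr_X, va_X, tr_y, va_y = train_test_split(X_toy, y_toy, random_state=0)

candidates = [5, 50, 500, 5000]
scores = {n: get_mae(n, tr_X, va_X, tr_y, va_y) for n in candidates}
best = min(scores, key=scores.get)  # key with the lowest validation MAE
print(scores)
print("Best max_leaf_nodes:", best)
```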

## Random Forest Model

In [12]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

forest_model = RandomForestRegressor()
forest_model.fit(train_X, train_y)
iowa_preds = forest_model.predict(val_X)
print(mean_absolute_error(val_y, iowa_preds))
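A random forest trains many decision trees, each on a random bootstrap sample of the rows, and (for regression) averages their predictions. The sketch below, on synthetic data, checks that averaging and shows that fixing `random_state`, which the cell above leaves unset, makes two fits give identical predictions. `n_estimators=50` is an arbitrary choice to keep it fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): 100 rows, 3 features.
rng = np.random.default_rng(2)
X_toy = rng.uniform(0, 10, size=(100, 3))
y_toy = X_toy.sum(axis=1) + rng.normal(0, 1, size=100)

forest_a = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_toy, y_toy)
forest_b = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_toy, y_toy)

# Same random_state -> same bootstrap samples -> same predictions.
print(np.array_equal(forest_a.predict(X_toy), forest_b.predict(X_toy)))  # True

# The forest prediction is the mean of its individual trees' predictions.
tree_mean = np.mean([t.predict(X_toy) for t in forest_a.estimators_], axis=0)
print(np.allclose(forest_a.predict(X_toy), tree_mean))  # True
```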

## Making Submissions

### Getting Predictions

In [14]:
# Read the test data
test = pd.read_csv('../input/test.csv')
# Treat the test data in the same way as training data. In this case, pull same columns.
test_X = test[iowa_predictors]
# Use the model to make predictions
predicted_prices = iowa_model.predict(test_X)
# We will look at the predicted prices to ensure we have something sensible.
print(predicted_prices)
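"Treat the test data in the same way" means at least selecting the same columns, and it is worth checking them for missing values, since depending on the scikit-learn version a NaN may raise an error at predict time. `select_predictors` below is a hypothetical helper, and the toy test frame uses made-up values.

```python
import numpy as np
import pandas as pd

def select_predictors(df, predictors):
    """Hypothetical helper: pull the training predictor columns from df
    and list any of them that contain missing values."""
    features = df[predictors]  # raises KeyError if a column is absent
    missing = [col for col in predictors if features[col].isnull().any()]
    return features, missing

# Toy test frame (made-up values) with one missing YearBuilt.
toy_test = pd.DataFrame({
    "Id": [1, 2],
    "LotArea": [9000, 12000],
    "YearBuilt": [1961, np.nan],
})
features, missing = select_predictors(toy_test, ["LotArea", "YearBuilt"])
print(features.shape)  # (2, 2)
print(missing)         # ['YearBuilt']
```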

### Making Submissions CSV

In [18]:
my_submission = pd.DataFrame({'Id': test.Id, 'SalePrice': predicted_prices})
# You could use any filename; we chose submission.csv here
my_submission.to_csv('submission.csv', index=False)
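To sanity-check the file format before submitting, you can write the frame and read it back. This sketch uses an in-memory buffer (`io.StringIO`) and hypothetical predictions instead of a real file; `index=False` keeps pandas from adding an unnamed index column, so the header is just `Id,SalePrice`.

```python
import io

import pandas as pd

# Hypothetical submission with made-up predictions.
submission = pd.DataFrame({"Id": [1, 2, 3],
                           "SalePrice": [150000.0, 175500.5, 210000.0]})

buffer = io.StringIO()
submission.to_csv(buffer, index=False)
text = buffer.getvalue()
print(text.splitlines()[0])  # Id,SalePrice

# Reading it back should give the same two columns and rows.
round_trip = pd.read_csv(io.StringIO(text))
print(round_trip.columns.tolist())  # ['Id', 'SalePrice']
```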