### Selecting Data for Modeling
Your dataset has too many variables to wrap your head around, or even to print out nicely. How can you pare down this overwhelming amount of data to something you can understand?

We'll start by picking a few variables using our intuition. Statistical techniques also exist for prioritizing variables automatically.

To choose variables/columns, we'll need to see a list of all columns in the dataset. That is done with the `columns` property of the DataFrame (the last line of code below).

In [11]:
import pandas as pd

iowa_file_path = '../../data/train.csv'
home_data = pd.read_csv(iowa_file_path)

# print the list of columns in the dataset to find the name of the prediction target
print(home_data.columns)

Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive

## Step 1: Selecting the Prediction Target
You can pull out a variable with dot-notation. This single column is stored in a Series, which is broadly like a DataFrame with only a single column of data.

We'll use dot-notation to select the column we want to predict, which is called the prediction target. By convention, the prediction target is called `y`. So the code we need to save the house prices in the Iowa data is:



In [None]:
# The prediction target should be the price
#y = _


#Ans: y = home_data.SalePrice
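
To make the dot-notation idea concrete, here is a minimal sketch on a tiny made-up DataFrame. The name `toy_data`, its values, and the target column name `SalePrice` are assumptions for illustration, not the real housing file. It also shows why bracket notation is sometimes required.

```python
import pandas as pd

# Toy stand-in for home_data; all values are made up for illustration.
toy_data = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "1stFlrSF": [850, 1250, 900],
    "SalePrice": [200000, 180000, 225000],
})

# Dot-notation and bracket notation return the same Series.
y_dot = toy_data.SalePrice
y_bracket = toy_data["SalePrice"]
print(type(y_dot).__name__)     # Series
print(y_dot.equals(y_bracket))  # True

# Dot-notation only works for names that are valid Python identifiers.
# '1stFlrSF' starts with a digit, so it must be selected with brackets.
first_floor = toy_data["1stFlrSF"]
print(first_floor.tolist())     # [850, 1250, 900]
```

Bracket notation works for every column name, so it is the safer default when names contain digits, spaces, or clash with DataFrame methods.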


## Step 2: Create X
Now you will create a DataFrame called `X` holding the predictive features.

Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in `X`.

You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):
    * LotArea
    * YearBuilt
    * 1stFlrSF
    * 2ndFlrSF
    * FullBath
    * BedroomAbvGr
    * TotRmsAbvGrd

After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.

In [None]:
# Create the list of features below
# feature_names = ___
#Ans: feature_names = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']

# select data corresponding to features in feature_names
#X = _
#Ans: X = home_data[feature_names]
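
As a rough illustration of selecting several columns at once, the sketch below indexes a small made-up DataFrame with a list of column names. `toy_data` and `toy_features` are hypothetical names, and the values are invented.

```python
import pandas as pd

# Toy stand-in for home_data; all values are made up for illustration.
toy_data = pd.DataFrame({
    "LotArea": [8000, 9500, 11000],
    "YearBuilt": [2003, 1976, 2001],
    "Street": ["Pave", "Pave", "Grvl"],
    "SalePrice": [200000, 180000, 225000],
})

# A list of column names inside the brackets returns a DataFrame
# containing only those columns, in the order listed.
toy_features = ["LotArea", "YearBuilt"]
X_toy = toy_data[toy_features]

print(type(X_toy).__name__)  # DataFrame
print(list(X_toy.columns))   # ['LotArea', 'YearBuilt']
print(X_toy.shape)           # (3, 2)
```

Indexing with a single string returns a Series, while indexing with a list returns a DataFrame, even when the list has only one name.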


## Review Data
Before building a model, take a quick look at **X** to verify it looks sensible. Note that the pandas **head()** method is useful for inspecting the top few lines of data.

In [None]:
# Review data
# print description or statistics from X
#print(_)

#Ans: print(X.describe())

# print the top few lines
#print(_)

#Ans: print(X.head())
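
The sketch below shows what these two review calls return, using a small made-up DataFrame (`X_toy` is a hypothetical name and the values are invented). It is meant only to show the shape of the output.

```python
import pandas as pd

# Toy feature table; all values are made up for illustration.
X_toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000, 14000, 10500],
    "BedroomAbvGr": [3, 3, 3, 2, 4, 1],
})

# describe() summarizes each numeric column: count, mean, std,
# min, the 25%/50%/75% quartiles, and max.
summary = X_toy.describe()
print(summary)

# The 'count' row reports the number of non-missing values per column,
# which makes it a quick way to spot missing data.
print(summary.loc["count"])

# head() returns the first 5 rows by default; pass n for a different number.
print(X_toy.head())
print(X_toy.head(2))
```

When reviewing real data, a `count` lower than the number of rows, or a `min` or `max` far outside the plausible range, is worth investigating before fitting a model.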

## Building Your Model
You will use the **scikit-learn** library to create your models. In code, this library is written as `sklearn`. Scikit-learn is one of the most popular libraries for modeling the types of data typically stored in DataFrames.

The steps to building and using a model are:

1) **Define:** What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.

2) **Fit:** Capture patterns from provided data. This is the heart of modeling. 

3) **Predict:** Just what it sounds like: use the fitted model to predict the target for a set of inputs.

4) **Evaluate:** Determine how accurate the model's predictions are.
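
The four steps above can be sketched end to end on a tiny made-up dataset. The names `X_toy`, `y_toy`, and `toy_model` are hypothetical, the values are invented, and mean absolute error is only one possible choice of metric.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data standing in for the housing features and prices.
X_toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000, 14000, 10500],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000, 1993],
})
y_toy = pd.Series([200000, 180000, 225000, 140000, 250000, 145000])

# 1) Define: choose the model type and its parameters.
toy_model = DecisionTreeRegressor(random_state=1)

# 2) Fit: learn patterns from the features and the target.
toy_model.fit(X_toy, y_toy)

# 3) Predict: produce one predicted price per row of features.
toy_predictions = toy_model.predict(X_toy)

# 4) Evaluate: mean absolute error, computed here on the same rows
#    that were used for fitting.
toy_mae = mean_absolute_error(y_toy, toy_predictions)
print(toy_predictions)
print(toy_mae)
```

Evaluating on the rows used for fitting tends to overstate accuracy, so a score computed this way should be read as an optimistic estimate.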


## Step 3: Specify and Fit Model
Create a `DecisionTreeRegressor` and save it to `iowa_model`. Ensure you've done the relevant import from sklearn to run this command.

Then fit the model you just created using the data in `X` and `y` that you saved above.

In [None]:
# import the correct library
from _ import _
#Ans: from sklearn.tree import DecisionTreeRegressor

# Specify the model.
# For model reproducibility, set a numeric value for random_state when specifying the model.
# NOTE: Specifying a number for random_state ensures you get the same results in each run.

iowa_model = _
#Ans: iowa_model = DecisionTreeRegressor(random_state=1)

# Fit the model
_
#Ans: iowa_model.fit(X, y)
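
To illustrate what fixing `random_state` buys you, here is a small sketch that fits two trees with the same seed on made-up data and checks that their predictions agree. `X_toy`, `y_toy`, `model_a`, and `model_b` are hypothetical names.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data; the values are for illustration only.
X_toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000, 14000, 10500],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000, 1993],
})
y_toy = pd.Series([200000, 180000, 225000, 140000, 250000, 145000])

# fit() trains the model in place and also returns the same estimator,
# so the call can be chained if desired.
model_a = DecisionTreeRegressor(random_state=1)
fitted = model_a.fit(X_toy, y_toy)

# A second model with the same random_state and the same data.
model_b = DecisionTreeRegressor(random_state=1).fit(X_toy, y_toy)

# With identical seeds and data, both models give identical predictions.
same = bool((model_a.predict(X_toy) == model_b.predict(X_toy)).all())
print(fitted is model_a)  # True
print(same)               # True
```

The particular number passed to `random_state` is arbitrary; what matters for reproducibility is that it stays the same from run to run.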

## Step 4: Make Predictions
Make predictions with the model's `predict` method using `X` as the data. Save the results to a variable called `predictions`.

In [None]:
predictions = _
#Ans: predictions = iowa_model.predict(X)
print(predictions)
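
As a hedged sketch of what `predict` returns, the example below fits a tree on made-up data and inspects the result. The names `toy_model` and `new_house` and all values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data; the values are for illustration only.
X_toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000],
    "YearBuilt": [2003, 1976, 2001, 1915],
})
y_toy = pd.Series([200000, 180000, 225000, 140000])

toy_model = DecisionTreeRegressor(random_state=1).fit(X_toy, y_toy)

# predict() returns a NumPy array with one value per input row.
toy_predictions = toy_model.predict(X_toy)
print(type(toy_predictions).__name__)  # ndarray
print(toy_predictions.shape)           # (4,)

# To predict for a house the model has not seen, pass a DataFrame
# with the same feature columns, in the same order, as the training data.
new_house = pd.DataFrame({"LotArea": [10000], "YearBuilt": [1990]})
new_prediction = toy_model.predict(new_house)
print(new_prediction)
```

Because the result is a plain array, it carries no row labels; keep track of which input rows it corresponds to if the order matters.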


### Quick Check
Let's look at the top few lines of predicted values and the top few lines of actual values.

In [None]:
# print the top few lines of predicted values

#Ans:
print("Making predictions for the following 5 houses:")
#print(X.head())
print("The predictions are")
#print(iowa_model.predict(X.head()))

# print the top few lines of actual values

#Ans:
print("The following 5 houses are worth:")
#print(y.head())
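
One way to make this comparison easier to read is to put actual and predicted values side by side. The sketch below does that on made-up data; `comparison` and the other names are hypothetical, and the values are invented.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data; the values are for illustration only.
X_toy = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 9000, 14000, 10500],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000, 1993],
})
y_toy = pd.Series([200000, 180000, 225000, 140000, 250000, 145000])

toy_model = DecisionTreeRegressor(random_state=1).fit(X_toy, y_toy)

# Line up the first five actual prices with the model's predictions
# for the same five rows, then compute the difference.
comparison = pd.DataFrame({
    "actual": y_toy.head(),
    "predicted": toy_model.predict(X_toy.head()),
})
comparison["error"] = comparison["predicted"] - comparison["actual"]
print(comparison)
```

In this toy setup the errors are zero, because an unrestricted decision tree can memorize training rows whose feature values are all distinct. A perfect match on rows the model was fitted on is therefore weak evidence about how it will do on new houses.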

#### Question: Qualitatively evaluate the quality of the results. Is the model making reasonable predictions? Remember the range of housing prices when making your educated guess.


#### Answer: 