# Competition Description
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
## Practice Skills
* Creative feature engineering 
* Advanced regression techniques like random forest and gradient boosting
## Evaluation
### Goal
It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable. 
### Metric
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
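The metric described above can be sketched in a few lines. This is an illustration rather than the competition's official scoring code: the function name `log_rmse` and the example prices are assumptions made here. It shows that a 10% overestimate costs the same on a cheap house as on an expensive one.

```python
import numpy as np

def log_rmse(y_true, y_pred):
    """RMSE between log(y_pred) and log(y_true); all prices must be positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Illustrative prices (made up): both predictions are 10% too high
cheap = log_rmse([100_000], [110_000])
expensive = log_rmse([1_000_000], [1_100_000])
print(cheap, expensive)  # the two errors match, up to floating-point rounding
```

Note that the validation loops in this notebook report mean absolute error in dollars, which is a different quantity from this log-scale metric.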


In [1]:
# Import helpful libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

In [2]:
# Load the data, and separate the target
iowa_file_path = r'C:\Users\Usuario\OneDrive\Documentos\Python Scripts\Housing_prices_competition\train.csv'
full_data = pd.read_csv(iowa_file_path)
y = full_data.SalePrice

# Building the models

## RandomForestRegressor

As a first approach, we fit a RandomForestRegressor and tune its depth and number of trees on a held-out validation set.

In [13]:
# Choose the feature columns for X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']

# Select columns corresponding to features, and preview the data
X = full_data[features]
X.head()

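|   | LotArea | YearBuilt | 1stFlrSF | 2ndFlrSF | FullBath | BedroomAbvGr | TotRmsAbvGrd |
|---|---------|-----------|----------|----------|----------|--------------|--------------|
| 0 | 8450    | 2003      | 856      | 854      | 2        | 3            | 8            |
| 1 | 9600    | 1976      | 1262     | 0        | 2        | 3            | 6            |
| 2 | 11250   | 2001      | 920      | 866      | 2        | 3            | 6            |
| 3 | 9550    | 1915      | 961      | 756      | 1        | 3            | 7            |
| 4 | 14260   | 2000      | 1145     | 1053     | 2        | 4            | 9            |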


In [14]:
# Split into validation and training data
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=54321)

In [15]:
# Search for the depth and number of trees with the lowest validation MAE
best_mae = float('inf')
best_depth = 0
best_est = 0
best_r2 = 0

for est in range(10, 101, 10):
    for depth in range(1, 10):

        # Define a random forest model
        model = RandomForestRegressor(max_depth=depth, n_estimators=est, random_state=54321)
        model.fit(X_train, y_train)
        val_preds = model.predict(X_valid)

        # Compute the validation metrics (score() returns R², not accuracy)
        mae = mean_absolute_error(y_valid, val_preds)
        r2 = model.score(X_valid, y_valid)

        # Keep the parameters with the lowest MAE
        if mae < best_mae:
            best_mae = mae
            best_depth = depth
            best_est = est
            best_r2 = r2

print(f'R² and MAE with a depth of {best_depth} and {best_est} trees in the forest: {best_r2*100:.0f}% / {best_mae:,.0f}')

R² and MAE with a depth of 9 and 30 trees in the forest: 76% / 22,836
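The search above picks its parameters and reports its score on the same single validation split, so the reported error may be somewhat optimistic. One alternative, not used in this notebook, is scikit-learn's `GridSearchCV`, which averages the score over several folds. The sketch below runs on synthetic stand-in data from `make_regression` and uses a deliberately small grid; both are assumptions made to keep the example self-contained, so the numbers it prints say nothing about the housing data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption); in the notebook this would be X_train, y_train
X_demo, y_demo = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=54321)

# A reduced grid for speed; the notebook's loops cover 10..100 trees and depths 1..9
param_grid = {"n_estimators": [10, 30], "max_depth": [3, 6, 9]}

search = GridSearchCV(
    RandomForestRegressor(random_state=54321),
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes scores, so MAE is negated
    cv=3,
)
search.fit(X_demo, y_demo)

best_mae = -search.best_score_  # flip the sign back to a positive MAE
print(search.best_params_, f"{best_mae:,.0f}")
```

After fitting, `search.best_estimator_` holds a model refit on all the data passed to `fit`, using the best parameters found.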


## DecisionTreeRegressor

As a second approach, we fit a single DecisionTreeRegressor on the same split and tune only its depth.

In [17]:
# Search for the tree depth with the lowest validation MAE
best_mae = float('inf')
best_depth = 0
best_r2 = 0

for depth in range(1, 10):

    # Define a decision tree model
    model = DecisionTreeRegressor(max_depth=depth, random_state=54321)
    model.fit(X_train, y_train)
    val_preds = model.predict(X_valid)

    # Compute the validation metrics (score() returns R², not accuracy)
    mae = mean_absolute_error(y_valid, val_preds)
    r2 = model.score(X_valid, y_valid)

    # Keep the parameters with the lowest MAE
    if mae < best_mae:
        best_mae = mae
        best_depth = depth
        best_r2 = r2

print(f'R² and MAE with a depth of {best_depth}: {best_r2*100:.0f}% / {best_mae:,.0f}')

R² and MAE with a depth of 6: 71% / 27,960


# Train a model for the competition

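The random forest reached the lower validation MAE (22,836 versus 27,960 for the decision tree), so the final model is a random forest with the best parameters found above, trained on all of X and y.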

In [18]:
# To improve accuracy, create a new Random Forest model which you will train on all training data
final_model = RandomForestRegressor(max_depth=9, n_estimators=30, random_state=54321)

In [19]:
# Fit final_model on all data from the training data
final_model.fit(X, y)

RandomForestRegressor(max_depth=9, n_estimators=30, random_state=54321)

Now, read the file of "test" data, and apply your model to make predictions.

In [20]:
# path to file you will use for predictions
test_data_file_path = r'C:\Users\Usuario\OneDrive\Documentos\Python Scripts\Housing_Prices_Competition\test.csv'

# read test data file using pandas
test_data = pd.read_csv(test_data_file_path)

# create test_X which comes from test_data but includes only the columns you used for prediction.
# The list of columns is stored in a variable called features
test_X = test_data[features]

# make predictions which we will submit.
test_preds = final_model.predict(test_X)

In [None]:
# Run the code to save predictions in the format used for competition scoring
output = pd.DataFrame({'Id':test_data.Id, 'SalePrice':test_preds})

output.to_csv(r'C:\Users\Usuario\OneDrive\Documentos\Python Scripts\Housing_Prices_Competition\submission.csv', index=False)
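Before uploading, it can help to sanity-check the frame that was written. The helper below is a hypothetical sketch: the name `check_submission` and the specific checks are assumptions rather than official competition rules, apart from the `Id` and `SalePrice` columns used in the cell above. The positive-price check is there because the metric takes the logarithm of each prediction.

```python
import pandas as pd

def check_submission(output: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if the frame is not a plausible submission."""
    if list(output.columns) != ["Id", "SalePrice"]:
        raise ValueError(f"unexpected columns: {list(output.columns)}")
    if output["Id"].duplicated().any():
        raise ValueError("duplicate Id values")
    if output["SalePrice"].isna().any():
        raise ValueError("missing SalePrice values")
    if (output["SalePrice"] <= 0).any():
        raise ValueError("non-positive SalePrice values (the log is undefined)")

# Illustrative frame (made-up values); in the notebook this would be `output`
demo = pd.DataFrame({"Id": [1461, 1462], "SalePrice": [125000.0, 158000.0]})
check_submission(demo)  # returns silently when every check passes
```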