# House Prices: Advanced Regression Techniques

*This competition's dataset shows that much more influences price negotiations than the number of bedrooms or a white picket fence.
With 79 variables describing "every" aspect of residential homes, this competition challenges you to* **predict the final price** **of each home.**

Predict sales prices and practice feature engineering, **BoostedTreesRegressor**
<img src='https://www.tensorflow.org/images/tf_logo_32px.png'  style='float:left;margin-top:-5px;padding-right:10px'/>

**Estimator:** *High-level tools for working with models.*


> ## DEPENDENCIES

In [None]:
from IPython.display import clear_output
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
import numpy as np
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
print(os.listdir("./"))

> ## SETUP / Data Correlation Analysis  
Checking a few values...

In [None]:
test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')
train = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
#check ids.
print('test  min\t',test.Id.min())
print('test  max\t',test.Id.max())
print('train min\t',train.Id.min())
print('train max\t',train.Id.max())
print('--')
print('min sale price\t',train.SalePrice.min())
print('max sale price\t',train.SalePrice.max())
print('count sale price',train.SalePrice.count())

plt.figure(figsize=(10,1))
sns.distplot(train.SalePrice)
plt.legend(['Sale Price'])
plt.axis('off')

The dataset's author recommends removing every house with a
**GrLivArea** above 4,000 square feet, because those points are outliers.

In [None]:
plt.figure(figsize=(10,3))
sns.scatterplot(  x="GrLivArea", y="SalePrice",data=train)
plt.legend(['GrLivArea x Sale Price'])
plt.axis('on')

> #### Drop outliers  (update or drop)

In [None]:
train = train[train.GrLivArea < 4000]
plt.figure(figsize=(10,3))
sns.scatterplot(  x="GrLivArea", y="SalePrice",data=train)
plt.legend(['GrLivArea x Sale Price'])
plt.axis('on')

In [None]:
train_id = train.Id
test_id = test.Id
data = train.copy()
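# reset the index so SalePrice lines up with the first len(train) rows of data in the plots below
train = train[['Id','SalePrice']].reset_index(drop=True)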
data.drop("SalePrice", axis = 1, inplace = True)
data =  pd.concat([data,test],axis=0,sort=False)
data = data.reset_index(drop=True)
data.describe()

> # Data Cleaning

**OverallQual** : Overall material and finish quality
*        10	Very Excellent
*        9	Excellent
*        8	Very Good
*        7	Good
*        6	Above Average
*        5	Average
*        4	Below Average
*        3	Fair
*        2	Poor
*        1	Very Poor

In [None]:
data.MSSubClass = data.MSSubClass.astype(str)
msSubClass= np.unique(data.MSSubClass.values)
for sub_class in msSubClass:
  if "SC" not in sub_class:
    data.loc[data["MSSubClass"] == sub_class,"MSSubClass"] = "SC"+sub_class
data.OverallQual = data.OverallQual.astype(str)
overallQual= np.unique(data.OverallQual.values)
for overall_qual in overallQual:
  if "OQ" not in overall_qual:
    data.loc[data["OverallQual"] == overall_qual,"OverallQual"] = "OQ"+overall_qual

#OverallCond: overall condition rating
data.OverallCond = data.OverallCond.astype(str)
overallCond= np.unique(data.OverallCond.values)
for overall_cond in overallCond:
  if "OC" not in overall_cond:
    data.loc[data["OverallCond"] == overall_cond,"OverallCond"] = "OC"+overall_cond

print('MSSubClass:',np.unique(data.MSSubClass.values))
print('--')
print('OverallQual:',np.unique(data.OverallQual.values))
print('--')
print('OverallCond:',np.unique(data.OverallCond.values))
data = data.reset_index(drop=True)
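
The three loops above turn numeric codes into prefixed strings so that they are treated as categories. A minimal sketch of the same idea in vectorized form, on toy rows rather than the competition files; `add_prefix` is a hypothetical helper, and it checks for the prefix at the start of the value instead of anywhere in it:

```python
import pandas as pd

def add_prefix(df, column, prefix):
    """Return a copy of df where `column` holds strings starting with `prefix`."""
    out = df.copy()
    values = out[column].astype(str)
    # Keep values that already carry the prefix, so the call is safe to repeat.
    out[column] = values.where(values.str.startswith(prefix), prefix + values)
    return out

# Toy rows, not taken from the competition files.
toy = pd.DataFrame({"MSSubClass": [20, 60, 20], "OverallQual": [7, 5, 10]})
toy = add_prefix(toy, "MSSubClass", "SC")
toy = add_prefix(toy, "OverallQual", "OQ")
toy = add_prefix(toy, "OverallQual", "OQ")  # a second call changes nothing
print(toy)
```

Returning a copy keeps the helper free of side effects, which makes it easier to rerun a cell without double-prefixing.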

Getting the columns with quantitative and qualitative values.

In [None]:
print("dtypes:",data.dtypes.unique())
quantitative_columns = [f for f in data.columns if data.dtypes[f] != 'object']
qualitative_columns = [f for f in data.columns if data.dtypes[f] == 'object']
quantitative_columns.pop(0)  # drop 'Id' (first column): an identifier, not a feature
print('qualitative columns:',qualitative_columns)
print('quantitative columns:',quantitative_columns)
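
As an alternative to comparing dtypes by hand, the split can be sketched with `select_dtypes`. The frame below is toy data, and removing `Id` by name (instead of by position) is a choice made for this sketch:

```python
import pandas as pd

# Toy frame with one identifier, two numeric columns and one text column.
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "Street": ["Pave", "Pave", "Grvl"],
    "LotFrontage": [65.0, None, 68.0],
})

# Numeric columns, in the frame's own column order.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
numeric_cols.remove("Id")  # the identifier is not a feature

# Everything that is neither numeric nor the identifier is treated as categorical.
categorical_cols = [c for c in toy.columns if c not in numeric_cols and c != "Id"]

print(numeric_cols)
print(categorical_cols)
```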

Check missing data

In [None]:
total=data.isnull().sum().sort_values(ascending=False)
percent=(data.isnull().sum()/data.isnull().count()).sort_values(ascending=False)
missing=pd.concat([total,percent], axis=1,keys=['Total','%'])
missing.head(30)

**Alley:** Type of alley access to the property
1. **Grvl**	Gravel
1. **Pave**	Paved
1. **NA**	No alley access


In [None]:
data.Alley.mode()
data.Alley.fillna('NA', inplace=True) #No alley access
plt.figure(figsize=(4,2))
sns.barplot(x=data.Alley, y=train.SalePrice)
plt.axis('on')

**LotFrontage:** Linear feet of street connected to the property.

In [None]:
data.LotFrontage.mode()
data.LotFrontage.fillna(data.LotFrontage.median(), inplace=True)
data.LotFrontage
plt.figure(figsize=(10,3))
sns.distplot(data.LotFrontage, hist_kws={'alpha':0.5}, label='LotFrontage')
plt.legend()
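
The cell above fills the gaps in `LotFrontage` with the column median. A small sketch of that step on a toy Series (the values are made up), using the assignment form of `fillna` rather than `inplace=True`:

```python
import pandas as pd

# Toy values; None marks a missing measurement.
toy = pd.Series([60.0, None, 80.0, 70.0, None], name="LotFrontage")

median = toy.median()        # missing values are skipped by default
filled = toy.fillna(median)  # returns a new Series; toy itself is unchanged

print(median)
print(filled.tolist())
```

The median is less sensitive than the mean to a few very wide lots, which is one reason it is a common default for skewed columns.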

**MasVnrType**: Masonry veneer type
1.        **BrkCmn**	Brick Common
1.        **BrkFace**	Brick Face
1.        **CBlock**	Cinder Block
1.        **None**		None
1.        **Stone**	    Stone

In [None]:
data.MasVnrType.mode()
data.MasVnrType.fillna('NA', inplace=True)
plt.figure(figsize=(8,2))
sns.barplot(x=data.MasVnrType, y=train.SalePrice)
plt.axis('on')

**MasVnrArea**: Masonry veneer area in square feet

In [None]:
data.MasVnrArea.mode()
data.MasVnrArea.fillna(0.0, inplace=True)
plt.figure(figsize=(10,3))
sns.distplot(data.MasVnrArea, hist_kws={'alpha':0.4}, label='MasVnrArea')
plt.legend()

**BsmtQual:** Height of the basement
1.        **Ex**	Excellent (100+ inches)	
1.        **Gd**	Good (90-99 inches)
1.        **TA**	Typical (80-89 inches)
1.        **Fa**	Fair (70-79 inches)
1.        **Po**	Poor (<70 inches)
1.        **NA**	No Basement

In [None]:
data.BsmtQual.mode()
data.BsmtQual.fillna('NA', inplace=True)
plt.figure(figsize=(12,2))
sns.barplot(x=data.BsmtQual, y=train.SalePrice)

**BsmtCond**: General condition of the basement
*        Ex	Excellent
*        Gd	Good
*        TA	Typical - slight dampness allowed
*        Fa	Fair - dampness or some cracking or settling
*        Po	Poor - Severe cracking, settling, or wetness
*        NA	No Basement

In [None]:
data.BsmtCond.mode()
data.BsmtCond.fillna('NA', inplace=True)
plt.figure(figsize=(12,2))
sns.barplot(x=data.BsmtCond, y=train.SalePrice)

**BsmtExposure:** Walkout or garden-level basement walls
*        Gd	Good Exposure
*        Av	Average Exposure (split levels or foyers typically score average or above)	
*        Mn	Minimum Exposure
*        No	No Exposure
*        NA	No Basement

In [None]:
data.BsmtExposure.mode()
data.BsmtExposure.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.BsmtExposure, y=train.SalePrice)

**BsmtFinType1:** Rating of the basement finished area
*        GLQ	Good Living Quarters
*        ALQ	Average Living Quarters
*        BLQ	Below Average Living Quarters	
*        Rec	Average Rec Room
*        LwQ	Low Quality
*        Unf	Unfinished
*        NA	No Basement

In [None]:
data.BsmtFinType1.mode()
data.BsmtFinType1.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.BsmtFinType1, y=train.SalePrice)

**BsmtFinSF1:** Type 1 finished area in square feet

In [None]:
data.BsmtFinSF1.mode()
data.BsmtFinSF1.fillna(0, inplace=True)
plt.figure(figsize=(10,3))
sns.distplot(data.BsmtFinSF1, hist_kws={'alpha':0.5}, label='BsmtFinSF1')
plt.legend()

**BsmtFinType2**: Rating of the second finished area (if present)
*        GLQ	Good Living Quarters
*        ALQ	Average Living Quarters
*        BLQ	Below Average Living Quarters	
*        Rec	Average Rec Room
*        LwQ	Low Quality
*        Unf	Unfinished
*        NA	No Basement

In [None]:
data.BsmtFinType2.mode()
data.BsmtFinType2.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.BsmtFinType2, y=train.SalePrice)

**Electrical**: Electrical system
*        SBrkr	Standard Circuit Breakers & Romex
*        FuseA	Fuse Box over 60 AMP and all Romex wiring (Average)	
*        FuseF	60 AMP Fuse Box and mostly Romex wiring (Fair)
*        FuseP	60 AMP Fuse Box and mostly knob & tube wiring (poor)
*        Mix	Mixed

In [None]:
data.Electrical.mode()
data.Electrical.fillna('SBrkr', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.Electrical, y=train.SalePrice)

**FireplaceQu**: Fireplace quality
*        Ex	Excellent - Exceptional Masonry Fireplace
*        Gd	Good - Masonry Fireplace in main level
*        TA	Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
*        Fa	Fair - Prefabricated Fireplace in basement
*        Po	Poor - Ben Franklin Stove
*        NA	No Fireplace

In [None]:
data.FireplaceQu.mode()
data.FireplaceQu.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.FireplaceQu, y=train.SalePrice)

**GarageType**: Garage type/location
*        2Types	More than one type of garage
*        Attchd	Attached to home
*        Basment	Basement Garage
*        BuiltIn	Built-In (Garage part of house - typically has room above garage)
*        CarPort	Car Port
*        Detchd	Detached from home
*        NA	No Garage

In [None]:
data.GarageType.mode()
data.GarageType.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.GarageType, y=train.SalePrice)

**GarageYrBlt**: Year the garage was built

In [None]:
# houses without a garage year fall back to the year the house was built
data.GarageYrBlt = data.GarageYrBlt.fillna(data.YearBuilt)
plt.figure(figsize=(10,2))
sns.scatterplot(x=data.GarageYrBlt, y=train.SalePrice)

**GarageFinish**: Interior finish of the garage
*        Fin	Finished
*        RFn	Rough Finished	
*        Unf	Unfinished
*        NA	No Garage

In [None]:
data.GarageFinish.mode()
data.GarageFinish.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.GarageFinish, y=train.SalePrice)

**GarageQual**: Garage quality
*        Ex	Excellent
*        Gd	Good
*        TA	Typical/Average
*        Fa	Fair
*        Po	Poor
*        NA	No Garage

In [None]:
data.GarageQual.mode()
data.GarageQual.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.GarageQual, y=train.SalePrice)

**GarageCond**: Garage condition
*        Ex	Excellent
*        Gd	Good
*        TA	Typical/Average
*        Fa	Fair
*        Po	Poor
*        NA	No Garage

In [None]:
data.GarageCond.mode()
data.GarageCond.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.GarageCond, y=train.SalePrice)

**PoolQC**: Pool quality
*        Ex	Excellent
*        Gd	Good
*        TA	Average/Typical
*        Fa	Fair
*        NA	No Pool

In [None]:
data.PoolQC.mode()
data.PoolQC.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.PoolQC, y=train.SalePrice)

**Fence**: Fence quality
*        GdPrv	Good Privacy
*        MnPrv	Minimum Privacy
*        GdWo	Good Wood
*        MnWw	Minimum Wood/Wire
*        NA	No Fence

In [None]:
data.Fence.mode()
data.Fence.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.Fence, y=train.SalePrice)

**MiscFeature**: Miscellaneous feature not covered in other categories
*        Elev	Elevator
*        Gar2	2nd Garage (if not described in garage section)
*        Othr	Other
*        Shed	Shed (over 100 SF)
*        TenC	Tennis Court
*        NA	None

In [None]:
data.MiscFeature.mode()
data.MiscFeature.fillna('NA', inplace=True)
plt.figure(figsize=(10,2))
sns.barplot(x=data.MiscFeature, y=train.SalePrice)

* **BsmtFullBath** : full bathrooms in the basement
* **GarageArea** : size of the garage in square feet
* **GarageCars** : size of the garage in car capacity
* **TotalBsmtSF** : total basement area in square feet

In [None]:
for col in  quantitative_columns:
  data[col].mode()
  data[col].fillna(0, inplace=True)

## Correlation Matrix (heatmap)

*Since our dataset has many columns, we need to know how they correlate with each other.*


In [None]:
data_corr = data[:train.shape[0]].copy() 
data_corr = data_corr[quantitative_columns]
data_corr['SalePrice']  = train.SalePrice.values
data_corr = data_corr.reset_index(drop=True)

In [None]:
kendall = data_corr.corr("kendall")
kendall.style.format("{:.2}").background_gradient()

In [None]:
best_kendall =  dict((k,v) for k,v in (kendall['SalePrice'].sort_values(ascending=False).to_dict()).items() if v >.1)
best_kendall
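
To make the Kendall coefficient less of a black box, here is a pure-Python sketch of its simplest form (tau-a), which counts pairs of rows ordered the same way in both variables against pairs ordered the opposite way. This is an illustration on made-up numbers: it ignores ties, whereas library implementations usually apply a tie correction, so results can differ on real columns with repeated values.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # they move in opposite directions
        # product == 0 is a tie and counts for neither side
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy living areas and prices, not taken from the competition files.
area = [1000, 1500, 2000, 2500]
price = [100, 180, 150, 300]
print(kendall_tau_a(area, price))
```

Because only the ordering of values matters, the coefficient is unaffected by monotonic rescaling such as taking the logarithm of the price.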

In [None]:
spearman = data_corr.corr("spearman")
spearman.style.format("{:.2}").background_gradient(cmap=plt.get_cmap('OrRd'))

In [None]:
best_spearman =  dict((k,v) for k,v in (spearman['SalePrice'].sort_values(ascending=False).to_dict()).items() if v >.1)
best_spearman #best_kendall

In [None]:
pearson= data_corr.corr("pearson")
pearson.style.format("{:.2}").background_gradient(cmap=plt.get_cmap('PuRd'))

In [None]:
best_pearson =  dict((k,v) for k,v in (pearson['SalePrice'].sort_values(ascending=False).to_dict()).items() if v >.1)
best_pearson #best_spearman #best_kendall

In [None]:
cols =  [k for k,v in best_kendall.items()]+[k for k,v in best_pearson.items()]+[k for k,v in best_spearman.items()]
cols = set(cols)
len(cols),len(best_kendall),len(best_pearson),len(best_spearman)

In [None]:
cols
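
The `v > .1` filters above keep only positively correlated columns, so a column with a strong negative correlation would be dropped. A hedged sketch of the selection step with an optional absolute-value variant; `select_by_correlation` is a hypothetical helper and the correlation values are invented for illustration:

```python
def select_by_correlation(corr_to_target, threshold=0.1, use_abs=False):
    """Return the names whose correlation with the target exceeds threshold."""
    selected = []
    for name, value in corr_to_target.items():
        score = abs(value) if use_abs else value
        if score > threshold:
            selected.append(name)
    return selected

# Invented correlations with SalePrice, for illustration only.
toy_corr = {"SalePrice": 1.0, "GrLivArea": 0.7, "KitchenAbvGr": -0.14, "MiscVal": -0.02}

positive_only = select_by_correlation(toy_corr)               # same rule as v > .1
either_sign = select_by_correlation(toy_corr, use_abs=True)   # also keeps strong negatives

print(positive_only)
print(either_sign)
```

Whether negative correlations should be kept is a modelling choice; tree-based models can use them just as well as positive ones.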

*The best columns to use in our prediction.*

In [None]:
best_columns=['GrLivArea',
 'GarageCars',
 'TotalBsmtSF',
 'GarageArea',
 '1stFlrSF',
 'FullBath',
 'TotRmsAbvGrd',
 'YearBuilt',
 'YearRemodAdd',
 'GarageYrBlt',
 'MasVnrArea',
 'Fireplaces',
 'BsmtFinSF1',
 'LotFrontage',
 'OpenPorchSF',
 'WoodDeckSF']
# add every column selected by any of the three correlation filters
best_columns.extend(cols)
best_columns 

In [None]:
total=data.isnull().sum().sort_values(ascending=False)
percent=(data.isnull().sum()/data.isnull().count()).sort_values(ascending=False)
missing=pd.concat([total,percent], axis=1,keys=['Total','%'])

missing[missing['%']>0].head(30)

Data/Columns  -  Qualitative

**SaleType**: Type of sale		
*        WD 	Warranty Deed - Conventional
*        CWD	Warranty Deed - Cash
*        VWD	Warranty Deed - VA Loan
*        New	Home just constructed and sold
*        COD	Court Officer Deed/Estate
*        Con	Contract 15% Down payment regular terms
*        ConLw	Contract Low Down payment and low interest
*        ConLI	Contract Low Interest
*        ConLD	Contract Low Down
*        Oth	Other

In [None]:
data.SaleType.mode()
data.SaleType.fillna('Oth', inplace=True)	
plt.figure(figsize=(12,2))
sns.barplot(x=data.SaleType, y=train.SalePrice)

**KitchenQual**: Kitchen quality
*        Ex	Excellent
*        Gd	Good
*        TA	Typical/Average
*        Fa	Fair
*        Po	Poor

In [None]:
data.KitchenQual.mode()
data.KitchenQual.fillna('TA', inplace=True)
plt.figure(figsize=(5,2))
sns.barplot(x=data.KitchenQual, y=train.SalePrice)

**Exterior1st**: Exterior covering on house
*        AsbShng	Asbestos Shingles
*        AsphShn	Asphalt Shingles
*        BrkComm	Brick Common
*        BrkFace	Brick Face
*        CBlock	Cinder Block
*        CemntBd	Cement Board
*        HdBoard	Hard Board
*        ImStucc	Imitation Stucco
*        MetalSd	Metal Siding
*        Other	Other
*        Plywood	Plywood
*        PreCast	PreCast	
*        Stone	Stone
*        Stucco	Stucco
*        VinylSd	Vinyl Siding
*        Wd Sdng	Wood Siding
*        WdShing	Wood Shingles

In [None]:
data.Exterior1st.mode()
data.Exterior1st.fillna('VinylSd', inplace=True)
plt.figure(figsize=(20,2))
sns.barplot(x=data.Exterior1st, y=train.SalePrice)

**Exterior2nd**: Exterior covering on house (if more than one material)
*        AsbShng	Asbestos Shingles
*        AsphShn	Asphalt Shingles
*        BrkComm	Brick Common
*        BrkFace	Brick Face
*        CBlock	Cinder Block
*        CemntBd	Cement Board
*        HdBoard	Hard Board
*        ImStucc	Imitation Stucco
*        MetalSd	Metal Siding
*        Other	Other
*        Plywood	Plywood
*        PreCast	PreCast
*        Stone	Stone
*        Stucco	Stucco
*        VinylSd	Vinyl Siding
*        Wd Sdng	Wood Siding
*        WdShing	Wood Shingles

In [None]:
data.Exterior2nd.mode()
data.Exterior2nd.fillna('VinylSd', inplace=True)	
plt.figure(figsize=(20,2))
sns.barplot(x=data.Exterior2nd, y=train.SalePrice)

**Utilities**: Type of utilities available
*        AllPub	All public Utilities (E,G,W,& S)	
*        NoSewr	Electricity, Gas, and Water (Septic Tank)
*        NoSeWa	Electricity and Gas Only
*        ELO	Electricity only

In [None]:
data.Utilities.mode()
data.Utilities.fillna('AllPub', inplace=True)	
plt.figure(figsize=(3,2))
sns.barplot(x=data.Utilities, y=train.SalePrice)

**Functional**: Home functionality (Assume typical unless deductions are warranted)
*        Typ	Typical Functionality
*        Min1	Minor Deductions 1
*        Min2	Minor Deductions 2
*        Mod	Moderate Deductions
*        Maj1	Major Deductions 1
*        Maj2	Major Deductions 2
*        Sev	Severely Damaged
*        Sal	Salvage only

In [None]:
data.Functional.mode()
data.Functional.fillna('Typ', inplace=True)	
plt.figure(figsize=(8,2))
sns.barplot(x=data.Functional, y=train.SalePrice)

**MSZoning**: Identifies the general zoning classification of the sale.
*        A	Agriculture
*        C	Commercial
*        FV	FLOATING VILLAGE RESIDENTIAL
*        I	Industrial
*        RH	Residential High Density
*        RL	Residential Low Density
*        RP	Residential Low Density Park 
*        RM	Residential Medium Density

In [None]:
data.MSZoning.mode()
data.MSZoning.fillna('RL', inplace=True)	
plt.figure(figsize=(10,2))
sns.barplot(x=data.MSZoning, y=train.SalePrice)

**Data clean!**

In [None]:
total=data.isnull().sum().sort_values(ascending=False)
percent=(data.isnull().sum()/data.isnull().count()).sort_values(ascending=False)
missing=pd.concat([total,percent], axis=1,keys=['Total','%'])
missing[missing['%']>0].head(30)

# Feature Engineering


> ###  There are 2915 samples (train and test combined)
Descriptive statistics to identify central tendencies, dispersion and the shape of the distribution.

In [None]:
data.describe().transpose()

In [None]:
size = train.shape[0]
orig_label = train.SalePrice.copy()
label = train.SalePrice.values

Splitting the data back into train and test...

In [None]:
train = data[:size].copy()  # copy so later in-place changes do not touch a slice of data
label = orig_label.values
test = data[size:].copy()
train = train.reset_index(drop=True)
test = test.reset_index(drop=True)
train.head(3)
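
The notebook cleans train and test together and then splits them again by position. A compact sketch of that round trip on toy frames (all values are made up), which shows why the number of training rows has to be remembered before the concatenation:

```python
import pandas as pd

# Toy frames, not taken from the competition files.
toy_train = pd.DataFrame({"Id": [1, 2, 3],
                          "LotArea": [8450, 9600, 11250],
                          "SalePrice": [208500, 181500, 223500]})
toy_test = pd.DataFrame({"Id": [4, 5], "LotArea": [11622, 14267]})

labels = toy_train["SalePrice"].to_numpy()      # keep the target aside
features = toy_train.drop(columns="SalePrice")

# Stack train on top of test so both receive identical cleaning.
combined = pd.concat([features, toy_test], axis=0, sort=False).reset_index(drop=True)

# ... shared cleaning would happen here ...

size = len(features)                            # first `size` rows are the training rows
train_part = combined.iloc[:size].copy()
test_part = combined.iloc[size:].reset_index(drop=True)

print(train_part.shape, test_part.shape, labels.shape)
```

The split is only valid as long as no step in between reorders or drops rows of the combined frame.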

In [None]:
test.shape,train.shape

In [None]:
best_columns = list(set(best_columns))
best_columns.remove("SalePrice")
best_columns

In [None]:
train[best_columns]

In [None]:
feature_columns = []
for column_name in  np.unique(best_columns):#quantitative_columns:
  feature_columns.append(tf.feature_column.numeric_column(column_name))

def one_hot_cat_column(feature_name, vocab):
  return tf.feature_column.indicator_column( tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocab))

for column_name in qualitative_columns:
  #vocabulary =np.unique(train[qualitative_columns[0]].values)
  vocabulary = train[column_name].unique()
  categorical_column =one_hot_cat_column(column_name,vocabulary)
  feature_columns.append(categorical_column)

print('feature_columns:\t',len(feature_columns))
print('feature_columns:\t',feature_columns)
train[best_columns].head(1)
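
The indicator columns built above turn each category into a one-hot vector over a vocabulary taken from the training rows. A plain-Python sketch of that encoding follows; `one_hot` is a hypothetical helper, and the all-zeros result for an unseen value reflects our understanding of the default TensorFlow behaviour, stated here as an assumption rather than a guarantee:

```python
def one_hot(value, vocabulary):
    """Encode value as a 0/1 list with one position per vocabulary entry."""
    return [1 if value == entry else 0 for entry in vocabulary]

# Vocabulary as it might be collected from the training rows.
vocabulary = ["Grvl", "Pave", "NA"]

print(one_hot("Pave", vocabulary))
# A category that appears only in the test rows matches no entry (assumed behaviour).
print(one_hot("Unknown", vocabulary))
```

Building the vocabulary from the combined train and test data is one way to avoid categories that the encoding cannot represent.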

In [None]:
train.interpolate(method='linear',inplace=True)
test.interpolate(method='linear',inplace=True)

# Hyperparameter tuning

* max_depth=10
* learning_rate=0.1
* n_batches_per_layer=1
* n_trees=3000

In [None]:
batch_size = 1

#boost_testimator  = tf.estimator.BoostedTreesRegressor(feature_columns=feature_columns,max_depth=10, learning_rate=0.1, l1_regularization=0.1, l2_regularization=0.1, n_batches_per_layer=1,n_trees=700)
boost_testimator  = tf.estimator.BoostedTreesRegressor(feature_columns=feature_columns,max_depth=10, learning_rate=0.1,n_batches_per_layer=1,n_trees=3000)

epochs = 1
def input_estimator(xdata,ydata,epochs=None,shuffle=True):
  def input_fn():
    dataset = tf.data.Dataset.from_tensor_slices((dict(xdata), ydata))
    if shuffle:
        dataset = dataset.shuffle(len(ydata))
    dataset = dataset.repeat(epochs)
    dataset = dataset.batch(len(ydata))
    return dataset
  return input_fn
boost_testimator.train(input_estimator(train,label),max_steps=30)
clear_output()
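
A rough piece of arithmetic on the settings above, under the assumption (our reading of layer-by-layer training, worth checking against the TensorFlow documentation) that every `n_batches_per_layer` training steps grow one tree layer. Both helper names are hypothetical:

```python
def layers_and_trees(max_steps, max_depth, n_batches_per_layer=1):
    """Layers grown and complete trees built after max_steps (assumed rule)."""
    layers = max_steps // n_batches_per_layer
    full_trees = layers // max_depth
    return layers, full_trees

def steps_for(n_trees, max_depth, n_batches_per_layer=1):
    """Steps needed to build n_trees complete trees under the same assumption."""
    return n_trees * max_depth * n_batches_per_layer

print(layers_and_trees(max_steps=30, max_depth=10))  # settings used in the cell above
print(steps_for(n_trees=3000, max_depth=10))
```

If the assumption holds, `max_steps=30` stops training long before `n_trees=3000` is reached, so the step budget rather than the tree count is the effective limit.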

*Pseudo-evaluation: the model is scored on the same data it was trained on.*

In [None]:
results = boost_testimator.evaluate(input_estimator(train,label,epochs=1,shuffle=False))
clear_output()
pd.Series(results).to_frame()
print(pd.Series(results))
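
According to the competition's evaluation page, submissions are scored by the RMSE between the logarithm of the predicted price and the logarithm of the observed price, which is not the loss reported above. A minimal pure-Python sketch of that metric, with a hypothetical function name and toy prices:

```python
import math

def rmse_of_logs(y_true, y_pred):
    """Root mean squared error between log(y_pred) and log(y_true)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(math.log(p) - math.log(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Toy prices: each prediction is 10% above the observed value.
observed = [100000.0, 200000.0, 400000.0]
predicted = [110000.0, 220000.0, 440000.0]
print(rmse_of_logs(observed, predicted))
```

Working on logarithms means a 10% miss on a cheap house costs as much as a 10% miss on an expensive one.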

Getting the predictions from the estimator...

In [None]:
predict_input_fn = lambda: tf.data.Dataset.from_tensors(dict(test))
preds = np.array([p['predictions'][0] for p in boost_testimator.predict(predict_input_fn)])

In [None]:
preds.shape,test.shape

In [None]:
submission = pd.DataFrame({"Id" : test_id, "SalePrice" : preds})  # Kaggle expects the header Id,SalePrice
submission.to_csv("prediction_values_corr_bruno.csv", index=False)
submission.head(1)
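
Before uploading, it can help to check the shape of the file. A small sketch on toy ids and predictions (all values invented), assuming the expected header is `Id,SalePrice` as in the competition's sample submission:

```python
import pandas as pd

# Toy ids and predictions, for illustration only.
toy_ids = pd.Series([1461, 1462, 1463])
toy_preds = [120000.0, 155000.5, 180250.0]

submission = pd.DataFrame({"Id": toy_ids, "SalePrice": toy_preds})

# to_csv without a path returns the CSV text, which is easy to inspect.
csv_text = submission.to_csv(index=False)
header = csv_text.splitlines()[0]

print(header)
print(len(csv_text.splitlines()) - 1, "data rows")
```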

# REFERENCES

https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=data_description.txt

https://www.tensorflow.org/api_docs/python/tf/estimator/BoostedTreesRegressor

http://blog.datadive.net/interpreting-random-forests/

https://www.tensorflow.org/tutorials/estimator/boosted_trees

https://www.tensorflow.org/tutorials/estimator/boosted_trees_model_understanding

https://medium.com/@dineshmadhup_75545/comparison-of-tensorflow-and-random-forest-model-with-python-92a475f84faa

https://www.kaggle.com/serigne/stacked-regressions-top-4-on-leaderboard

