# TP2 - Organización de Datos
#### Notebook principal

<hr>

### Notebooks utilizados:

- ***pre_processing:*** notebook para el manejo inicial de los dataframes.
- ***feature_generation:*** primer etapa del pipeline. En este notebook se generarán nuevos features para luego, realizar un proceso de selección de los mejores features para cada modelo.
- ***feature_selection*** segunda etapa, donde se buscara encontrar los features con mayor importancia, es decir aquellos que aporten mayor informacion.
- ***parameter_tuning:*** tercer etapa, notebook donde se tunean los parámetros para cada modelo.
- ***predict:*** finalmente, una vez obtenidos los mejores parametros y features para cada modelo, este notebook se encargará de generar el csv con las predicciones finales para el modelo que se le indique.

<hr>


In [1]:
import pandas as pd
import numpy as np
import math

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

seed = 7

In [2]:
import nbimporter

from pre_processing import load_featured_datasets
import feature_generation
import feature_selection
import parameter_tuning
import predict

Importing Jupyter notebook from pre_processing.ipynb
Importing Jupyter notebook from feature_generation.ipynb
Importing Jupyter notebook from feature_selection.ipynb
Importing Jupyter notebook from parameter_tuning.ipynb
Importing Jupyter notebook from predict.ipynb


In [3]:
def escribir_respuesta(ids,predicciones):
    with open("respuesta.csv",'w') as archivo:
        archivo.write("id,target\n")
        for i in range(len(ids)):
            linea = f"{int(ids[i])},{predicciones[i]}"
            archivo.write(f"{linea}\n")

<hr>

# Resultados obtenidos

# area de testing:

In [21]:
import lightgbm as lgb

In [22]:
train,test = load_featured_datasets()

In [6]:
#train['precio'] = train['precio'].map(lambda x: math.log(x))

In [23]:
features = feature_generation.get_features()

In [24]:
best_features = feature_selection.get_best_features_per_category()

In [25]:
best_features

[('metros', 1),
 ('tipodepropiedad', 0),
 ('provincia', 6),
 ('ciudad', 2),
 ('fecha', 4),
 ('descripcion', 0),
 ('metricas', 2),
 ('habitaciones', 0),
 ('antiguedad', 1),
 ('extras', 2),
 ('volcanes', 0),
 ('idzona', 0)]

In [26]:
train_selected = train[['antiguedad', 'habitaciones', 'garages', 'banos', 'metroscubiertos', 'metrostotales',
                        'idzona', 'lat', 'lng', 'gimnasio', 'usosmultiples', 'piscina', 'escuelascercanas',
                        'centroscomercialescercanos']\
                       +features["metros"][1]\
                       +features["tipodepropiedad"][0]\
                       +features["provincia"][6]\
                       +features["ciudad"][2]\
                       +features["fecha"][4]\
                       +features["descripcion"][0]\
                       +features["metricas"][2]\
                       +features["habitaciones"][0]\
                       +features["antiguedad"][1]\
                       +features["extras"][2]\
                       +features["volcanes"][0]\
                       +features["idzona"][0]\
                       +["precio"]]

In [37]:
X = train_selected.drop('precio', axis=1)#.values
Y = train_selected['precio']#.values

In [38]:
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=seed)

In [41]:
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'mae',
    'max_depth': 14, 
    'learning_rate': 0.1,
    'verbose': 0, 
    'early_stopping_round': 50}
n_estimators=10000

In [42]:
d_train = lgb.Dataset(X_train.values, label=Y_train.values)
d_valid = lgb.Dataset(X_val.values, label=Y_val.values)
watchlist = [d_valid]
reg = lgb.train(params, d_train, n_estimators, watchlist, verbose_eval=1)

[1]	valid_0's l1: 1.49102e+06
Training until validation scores don't improve for 50 rounds
[2]	valid_0's l1: 1.38696e+06
[3]	valid_0's l1: 1.29593e+06
[4]	valid_0's l1: 1.2151e+06
[5]	valid_0's l1: 1.14411e+06
[6]	valid_0's l1: 1.08282e+06
[7]	valid_0's l1: 1.02844e+06
[8]	valid_0's l1: 980269
[9]	valid_0's l1: 938316
[10]	valid_0's l1: 901368
[11]	valid_0's l1: 869068
[12]	valid_0's l1: 840567
[13]	valid_0's l1: 815497
[14]	valid_0's l1: 793820
[15]	valid_0's l1: 772996
[16]	valid_0's l1: 754417
[17]	valid_0's l1: 737827
[18]	valid_0's l1: 723534
[19]	valid_0's l1: 710354
[20]	valid_0's l1: 698835
[21]	valid_0's l1: 686919
[22]	valid_0's l1: 676382
[23]	valid_0's l1: 667970
[24]	valid_0's l1: 659379
[25]	valid_0's l1: 651604
[26]	valid_0's l1: 644988
[27]	valid_0's l1: 638921
[28]	valid_0's l1: 633505
[29]	valid_0's l1: 628293
[30]	valid_0's l1: 624064
[31]	valid_0's l1: 619630
[32]	valid_0's l1: 615855
[33]	valid_0's l1: 612190
[34]	valid_0's l1: 609503
[35]	valid_0's l1: 606245
[36]

[313]	valid_0's l1: 517426
[314]	valid_0's l1: 517311
[315]	valid_0's l1: 517166
[316]	valid_0's l1: 517089
[317]	valid_0's l1: 517045
[318]	valid_0's l1: 516973
[319]	valid_0's l1: 516910
[320]	valid_0's l1: 516788
[321]	valid_0's l1: 516688
[322]	valid_0's l1: 516483
[323]	valid_0's l1: 516382
[324]	valid_0's l1: 516378
[325]	valid_0's l1: 516329
[326]	valid_0's l1: 516247
[327]	valid_0's l1: 516229
[328]	valid_0's l1: 516189
[329]	valid_0's l1: 516153
[330]	valid_0's l1: 516114
[331]	valid_0's l1: 515996
[332]	valid_0's l1: 515947
[333]	valid_0's l1: 515969
[334]	valid_0's l1: 515894
[335]	valid_0's l1: 515856
[336]	valid_0's l1: 515847
[337]	valid_0's l1: 515781
[338]	valid_0's l1: 515638
[339]	valid_0's l1: 515583
[340]	valid_0's l1: 515571
[341]	valid_0's l1: 515439
[342]	valid_0's l1: 515438
[343]	valid_0's l1: 515332
[344]	valid_0's l1: 515316
[345]	valid_0's l1: 515289
[346]	valid_0's l1: 515240
[347]	valid_0's l1: 515222
[348]	valid_0's l1: 515108
[349]	valid_0's l1: 515097
[

[623]	valid_0's l1: 504691
[624]	valid_0's l1: 504660
[625]	valid_0's l1: 504619
[626]	valid_0's l1: 504586
[627]	valid_0's l1: 504556
[628]	valid_0's l1: 504560
[629]	valid_0's l1: 504539
[630]	valid_0's l1: 504502
[631]	valid_0's l1: 504489
[632]	valid_0's l1: 504430
[633]	valid_0's l1: 504381
[634]	valid_0's l1: 504376
[635]	valid_0's l1: 504373
[636]	valid_0's l1: 504353
[637]	valid_0's l1: 504392
[638]	valid_0's l1: 504412
[639]	valid_0's l1: 504400
[640]	valid_0's l1: 504399
[641]	valid_0's l1: 504376
[642]	valid_0's l1: 504306
[643]	valid_0's l1: 504224
[644]	valid_0's l1: 504162
[645]	valid_0's l1: 504141
[646]	valid_0's l1: 504118
[647]	valid_0's l1: 504122
[648]	valid_0's l1: 504091
[649]	valid_0's l1: 504015
[650]	valid_0's l1: 503982
[651]	valid_0's l1: 503960
[652]	valid_0's l1: 503980
[653]	valid_0's l1: 503971
[654]	valid_0's l1: 503962
[655]	valid_0's l1: 503952
[656]	valid_0's l1: 503910
[657]	valid_0's l1: 503860
[658]	valid_0's l1: 503834
[659]	valid_0's l1: 503810
[

[933]	valid_0's l1: 497794
[934]	valid_0's l1: 497787
[935]	valid_0's l1: 497791
[936]	valid_0's l1: 497702
[937]	valid_0's l1: 497663
[938]	valid_0's l1: 497614
[939]	valid_0's l1: 497625
[940]	valid_0's l1: 497583
[941]	valid_0's l1: 497576
[942]	valid_0's l1: 497538
[943]	valid_0's l1: 497566
[944]	valid_0's l1: 497533
[945]	valid_0's l1: 497499
[946]	valid_0's l1: 497476
[947]	valid_0's l1: 497460
[948]	valid_0's l1: 497446
[949]	valid_0's l1: 497413
[950]	valid_0's l1: 497389
[951]	valid_0's l1: 497385
[952]	valid_0's l1: 497365
[953]	valid_0's l1: 497365
[954]	valid_0's l1: 497345
[955]	valid_0's l1: 497336
[956]	valid_0's l1: 497347
[957]	valid_0's l1: 497305
[958]	valid_0's l1: 497308
[959]	valid_0's l1: 497296
[960]	valid_0's l1: 497245
[961]	valid_0's l1: 497246
[962]	valid_0's l1: 497243
[963]	valid_0's l1: 497234
[964]	valid_0's l1: 497216
[965]	valid_0's l1: 497215
[966]	valid_0's l1: 497204
[967]	valid_0's l1: 497156
[968]	valid_0's l1: 497135
[969]	valid_0's l1: 497136
[

[1230]	valid_0's l1: 493862
[1231]	valid_0's l1: 493848
[1232]	valid_0's l1: 493854
[1233]	valid_0's l1: 493866
[1234]	valid_0's l1: 493855
[1235]	valid_0's l1: 493832
[1236]	valid_0's l1: 493821
[1237]	valid_0's l1: 493793
[1238]	valid_0's l1: 493772
[1239]	valid_0's l1: 493760
[1240]	valid_0's l1: 493723
[1241]	valid_0's l1: 493721
[1242]	valid_0's l1: 493742
[1243]	valid_0's l1: 493742
[1244]	valid_0's l1: 493729
[1245]	valid_0's l1: 493727
[1246]	valid_0's l1: 493712
[1247]	valid_0's l1: 493690
[1248]	valid_0's l1: 493686
[1249]	valid_0's l1: 493657
[1250]	valid_0's l1: 493664
[1251]	valid_0's l1: 493643
[1252]	valid_0's l1: 493630
[1253]	valid_0's l1: 493622
[1254]	valid_0's l1: 493619
[1255]	valid_0's l1: 493631
[1256]	valid_0's l1: 493595
[1257]	valid_0's l1: 493569
[1258]	valid_0's l1: 493565
[1259]	valid_0's l1: 493545
[1260]	valid_0's l1: 493541
[1261]	valid_0's l1: 493526
[1262]	valid_0's l1: 493523
[1263]	valid_0's l1: 493524
[1264]	valid_0's l1: 493534
[1265]	valid_0's l1:

[1524]	valid_0's l1: 491010
[1525]	valid_0's l1: 491007
[1526]	valid_0's l1: 490991
[1527]	valid_0's l1: 490986
[1528]	valid_0's l1: 490969
[1529]	valid_0's l1: 490966
[1530]	valid_0's l1: 490957
[1531]	valid_0's l1: 490953
[1532]	valid_0's l1: 490970
[1533]	valid_0's l1: 490953
[1534]	valid_0's l1: 490921
[1535]	valid_0's l1: 490923
[1536]	valid_0's l1: 490929
[1537]	valid_0's l1: 490933
[1538]	valid_0's l1: 490924
[1539]	valid_0's l1: 490878
[1540]	valid_0's l1: 490857
[1541]	valid_0's l1: 490855
[1542]	valid_0's l1: 490849
[1543]	valid_0's l1: 490833
[1544]	valid_0's l1: 490814
[1545]	valid_0's l1: 490801
[1546]	valid_0's l1: 490802
[1547]	valid_0's l1: 490791
[1548]	valid_0's l1: 490803
[1549]	valid_0's l1: 490799
[1550]	valid_0's l1: 490786
[1551]	valid_0's l1: 490786
[1552]	valid_0's l1: 490774
[1553]	valid_0's l1: 490766
[1554]	valid_0's l1: 490742
[1555]	valid_0's l1: 490752
[1556]	valid_0's l1: 490736
[1557]	valid_0's l1: 490739
[1558]	valid_0's l1: 490736
[1559]	valid_0's l1:

[1820]	valid_0's l1: 489106
[1821]	valid_0's l1: 489096
[1822]	valid_0's l1: 489086
[1823]	valid_0's l1: 489079
[1824]	valid_0's l1: 489045
[1825]	valid_0's l1: 489050
[1826]	valid_0's l1: 489037
[1827]	valid_0's l1: 489006
[1828]	valid_0's l1: 488972
[1829]	valid_0's l1: 488987
[1830]	valid_0's l1: 488985
[1831]	valid_0's l1: 488984
[1832]	valid_0's l1: 488992
[1833]	valid_0's l1: 488972
[1834]	valid_0's l1: 488953
[1835]	valid_0's l1: 488906
[1836]	valid_0's l1: 488891
[1837]	valid_0's l1: 488878
[1838]	valid_0's l1: 488888
[1839]	valid_0's l1: 488867
[1840]	valid_0's l1: 488862
[1841]	valid_0's l1: 488873
[1842]	valid_0's l1: 488899
[1843]	valid_0's l1: 488884
[1844]	valid_0's l1: 488871
[1845]	valid_0's l1: 488875
[1846]	valid_0's l1: 488878
[1847]	valid_0's l1: 488869
[1848]	valid_0's l1: 488884
[1849]	valid_0's l1: 488860
[1850]	valid_0's l1: 488831
[1851]	valid_0's l1: 488799
[1852]	valid_0's l1: 488777
[1853]	valid_0's l1: 488772
[1854]	valid_0's l1: 488756
[1855]	valid_0's l1:

[2117]	valid_0's l1: 487031
[2118]	valid_0's l1: 487006
[2119]	valid_0's l1: 486998
[2120]	valid_0's l1: 486988
[2121]	valid_0's l1: 486962
[2122]	valid_0's l1: 486956
[2123]	valid_0's l1: 486918
[2124]	valid_0's l1: 486905
[2125]	valid_0's l1: 486892
[2126]	valid_0's l1: 486853
[2127]	valid_0's l1: 486879
[2128]	valid_0's l1: 486890
[2129]	valid_0's l1: 486886
[2130]	valid_0's l1: 486870
[2131]	valid_0's l1: 486870
[2132]	valid_0's l1: 486874
[2133]	valid_0's l1: 486860
[2134]	valid_0's l1: 486857
[2135]	valid_0's l1: 486853
[2136]	valid_0's l1: 486841
[2137]	valid_0's l1: 486833
[2138]	valid_0's l1: 486810
[2139]	valid_0's l1: 486811
[2140]	valid_0's l1: 486791
[2141]	valid_0's l1: 486773
[2142]	valid_0's l1: 486750
[2143]	valid_0's l1: 486745
[2144]	valid_0's l1: 486735
[2145]	valid_0's l1: 486722
[2146]	valid_0's l1: 486724
[2147]	valid_0's l1: 486719
[2148]	valid_0's l1: 486720
[2149]	valid_0's l1: 486709
[2150]	valid_0's l1: 486697
[2151]	valid_0's l1: 486692
[2152]	valid_0's l1:

[2415]	valid_0's l1: 485388
[2416]	valid_0's l1: 485391
[2417]	valid_0's l1: 485389
[2418]	valid_0's l1: 485401
[2419]	valid_0's l1: 485391
[2420]	valid_0's l1: 485386
[2421]	valid_0's l1: 485388
[2422]	valid_0's l1: 485406
[2423]	valid_0's l1: 485372
[2424]	valid_0's l1: 485387
[2425]	valid_0's l1: 485390
[2426]	valid_0's l1: 485379
[2427]	valid_0's l1: 485381
[2428]	valid_0's l1: 485367
[2429]	valid_0's l1: 485363
[2430]	valid_0's l1: 485348
[2431]	valid_0's l1: 485342
[2432]	valid_0's l1: 485343
[2433]	valid_0's l1: 485338
[2434]	valid_0's l1: 485337
[2435]	valid_0's l1: 485322
[2436]	valid_0's l1: 485328
[2437]	valid_0's l1: 485329
[2438]	valid_0's l1: 485324
[2439]	valid_0's l1: 485302
[2440]	valid_0's l1: 485305
[2441]	valid_0's l1: 485290
[2442]	valid_0's l1: 485286
[2443]	valid_0's l1: 485280
[2444]	valid_0's l1: 485265
[2445]	valid_0's l1: 485259
[2446]	valid_0's l1: 485251
[2447]	valid_0's l1: 485226
[2448]	valid_0's l1: 485215
[2449]	valid_0's l1: 485207
[2450]	valid_0's l1:

[2712]	valid_0's l1: 484127
[2713]	valid_0's l1: 484115
[2714]	valid_0's l1: 484126
[2715]	valid_0's l1: 484119
[2716]	valid_0's l1: 484119
[2717]	valid_0's l1: 484122
[2718]	valid_0's l1: 484126
[2719]	valid_0's l1: 484114
[2720]	valid_0's l1: 484106
[2721]	valid_0's l1: 484105
[2722]	valid_0's l1: 484093
[2723]	valid_0's l1: 484092
[2724]	valid_0's l1: 484097
[2725]	valid_0's l1: 484086
[2726]	valid_0's l1: 484083
[2727]	valid_0's l1: 484070
[2728]	valid_0's l1: 484038
[2729]	valid_0's l1: 484027
[2730]	valid_0's l1: 484027
[2731]	valid_0's l1: 484032
[2732]	valid_0's l1: 484031
[2733]	valid_0's l1: 484029
[2734]	valid_0's l1: 484033
[2735]	valid_0's l1: 484025
[2736]	valid_0's l1: 484012
[2737]	valid_0's l1: 484017
[2738]	valid_0's l1: 483996
[2739]	valid_0's l1: 484008
[2740]	valid_0's l1: 483975
[2741]	valid_0's l1: 483981
[2742]	valid_0's l1: 483961
[2743]	valid_0's l1: 483943
[2744]	valid_0's l1: 483957
[2745]	valid_0's l1: 483952
[2746]	valid_0's l1: 483944
[2747]	valid_0's l1:

[3017]	valid_0's l1: 483102
[3018]	valid_0's l1: 483120
[3019]	valid_0's l1: 483110
[3020]	valid_0's l1: 483104
[3021]	valid_0's l1: 483092
[3022]	valid_0's l1: 483096
[3023]	valid_0's l1: 483100
[3024]	valid_0's l1: 483097
[3025]	valid_0's l1: 483094
[3026]	valid_0's l1: 483088
[3027]	valid_0's l1: 483104
[3028]	valid_0's l1: 483116
[3029]	valid_0's l1: 483103
[3030]	valid_0's l1: 483106
[3031]	valid_0's l1: 483107
[3032]	valid_0's l1: 483089
[3033]	valid_0's l1: 483087
[3034]	valid_0's l1: 483078
[3035]	valid_0's l1: 483070
[3036]	valid_0's l1: 483066
[3037]	valid_0's l1: 483077
[3038]	valid_0's l1: 483082
[3039]	valid_0's l1: 483101
[3040]	valid_0's l1: 483089
[3041]	valid_0's l1: 483088
[3042]	valid_0's l1: 483079
[3043]	valid_0's l1: 483074
[3044]	valid_0's l1: 483072
[3045]	valid_0's l1: 483077
[3046]	valid_0's l1: 483076
[3047]	valid_0's l1: 483077
[3048]	valid_0's l1: 483082
[3049]	valid_0's l1: 483086
[3050]	valid_0's l1: 483083
[3051]	valid_0's l1: 483079
[3052]	valid_0's l1:

[3320]	valid_0's l1: 482161
[3321]	valid_0's l1: 482164
[3322]	valid_0's l1: 482164
[3323]	valid_0's l1: 482152
[3324]	valid_0's l1: 482163
[3325]	valid_0's l1: 482145
[3326]	valid_0's l1: 482145
[3327]	valid_0's l1: 482160
[3328]	valid_0's l1: 482157
[3329]	valid_0's l1: 482164
[3330]	valid_0's l1: 482160
[3331]	valid_0's l1: 482164
[3332]	valid_0's l1: 482164
[3333]	valid_0's l1: 482162
[3334]	valid_0's l1: 482160
[3335]	valid_0's l1: 482153
[3336]	valid_0's l1: 482170
[3337]	valid_0's l1: 482153
[3338]	valid_0's l1: 482159
[3339]	valid_0's l1: 482161
[3340]	valid_0's l1: 482151
[3341]	valid_0's l1: 482147
[3342]	valid_0's l1: 482133
[3343]	valid_0's l1: 482125
[3344]	valid_0's l1: 482116
[3345]	valid_0's l1: 482092
[3346]	valid_0's l1: 482079
[3347]	valid_0's l1: 482072
[3348]	valid_0's l1: 482067
[3349]	valid_0's l1: 482079
[3350]	valid_0's l1: 482084
[3351]	valid_0's l1: 482088
[3352]	valid_0's l1: 482087
[3353]	valid_0's l1: 482080
[3354]	valid_0's l1: 482077
[3355]	valid_0's l1:

[3628]	valid_0's l1: 481523
[3629]	valid_0's l1: 481511
[3630]	valid_0's l1: 481515
[3631]	valid_0's l1: 481518
[3632]	valid_0's l1: 481510
[3633]	valid_0's l1: 481509
[3634]	valid_0's l1: 481511
[3635]	valid_0's l1: 481510
[3636]	valid_0's l1: 481516
[3637]	valid_0's l1: 481506
[3638]	valid_0's l1: 481502
[3639]	valid_0's l1: 481502
[3640]	valid_0's l1: 481506
[3641]	valid_0's l1: 481507
[3642]	valid_0's l1: 481498
[3643]	valid_0's l1: 481496
[3644]	valid_0's l1: 481499
[3645]	valid_0's l1: 481500
[3646]	valid_0's l1: 481497
[3647]	valid_0's l1: 481500
[3648]	valid_0's l1: 481507
[3649]	valid_0's l1: 481501
[3650]	valid_0's l1: 481504
[3651]	valid_0's l1: 481512
[3652]	valid_0's l1: 481518
[3653]	valid_0's l1: 481512
[3654]	valid_0's l1: 481514
[3655]	valid_0's l1: 481500
[3656]	valid_0's l1: 481491
[3657]	valid_0's l1: 481493
[3658]	valid_0's l1: 481490
[3659]	valid_0's l1: 481479
[3660]	valid_0's l1: 481478
[3661]	valid_0's l1: 481476
[3662]	valid_0's l1: 481474
[3663]	valid_0's l1:

[3933]	valid_0's l1: 480760
[3934]	valid_0's l1: 480752
[3935]	valid_0's l1: 480746
[3936]	valid_0's l1: 480751
[3937]	valid_0's l1: 480745
[3938]	valid_0's l1: 480753
[3939]	valid_0's l1: 480752
[3940]	valid_0's l1: 480758
[3941]	valid_0's l1: 480753
[3942]	valid_0's l1: 480754
[3943]	valid_0's l1: 480758
[3944]	valid_0's l1: 480761
[3945]	valid_0's l1: 480762
[3946]	valid_0's l1: 480754
[3947]	valid_0's l1: 480753
[3948]	valid_0's l1: 480752
[3949]	valid_0's l1: 480753
[3950]	valid_0's l1: 480762
[3951]	valid_0's l1: 480760
[3952]	valid_0's l1: 480755
[3953]	valid_0's l1: 480763
[3954]	valid_0's l1: 480760
[3955]	valid_0's l1: 480756
[3956]	valid_0's l1: 480753
[3957]	valid_0's l1: 480748
[3958]	valid_0's l1: 480753
[3959]	valid_0's l1: 480760
[3960]	valid_0's l1: 480742
[3961]	valid_0's l1: 480739
[3962]	valid_0's l1: 480735
[3963]	valid_0's l1: 480732
[3964]	valid_0's l1: 480734
[3965]	valid_0's l1: 480739
[3966]	valid_0's l1: 480734
[3967]	valid_0's l1: 480723
[3968]	valid_0's l1:

[4227]	valid_0's l1: 480323
[4228]	valid_0's l1: 480331
[4229]	valid_0's l1: 480321
[4230]	valid_0's l1: 480320
[4231]	valid_0's l1: 480316
[4232]	valid_0's l1: 480318
[4233]	valid_0's l1: 480323
[4234]	valid_0's l1: 480318
[4235]	valid_0's l1: 480312
[4236]	valid_0's l1: 480311
[4237]	valid_0's l1: 480306
[4238]	valid_0's l1: 480307
[4239]	valid_0's l1: 480312
[4240]	valid_0's l1: 480292
[4241]	valid_0's l1: 480287
[4242]	valid_0's l1: 480283
[4243]	valid_0's l1: 480281
[4244]	valid_0's l1: 480274
[4245]	valid_0's l1: 480268
[4246]	valid_0's l1: 480263
[4247]	valid_0's l1: 480256
[4248]	valid_0's l1: 480252
[4249]	valid_0's l1: 480248
[4250]	valid_0's l1: 480241
[4251]	valid_0's l1: 480243
[4252]	valid_0's l1: 480251
[4253]	valid_0's l1: 480247
[4254]	valid_0's l1: 480250
[4255]	valid_0's l1: 480256
[4256]	valid_0's l1: 480258
[4257]	valid_0's l1: 480256
[4258]	valid_0's l1: 480253
[4259]	valid_0's l1: 480260
[4260]	valid_0's l1: 480249
[4261]	valid_0's l1: 480263
[4262]	valid_0's l1:

In [43]:
Y_pred = reg.predict(X_val)

#f = np.vectorize(math.exp)
#Y_pred = f(Y_pred)
#Y_val = f(Y_val)
mean_absolute_error(Y_val,Y_pred)

479877.68598618475

In [15]:
# en Y_pred estan los resultados predecidos y en Y_val los verdaderos. Analizaremos en donde nos
# estamos equivocando:

In [52]:
df = X_val.copy()

In [53]:
df['precio'] = Y_val

In [54]:
df['precio_predicted'] = Y_pred

In [56]:
df['error'] = df.apply(lambda x: abs(x['precio'] - x['precio_predicted']), axis=1)

In [61]:
df.loc[df['error'] < 100000]

Index(['antiguedad', 'habitaciones', 'garages', 'banos', 'metroscubiertos',
       'metrostotales', 'idzona', 'lat', 'lng', 'gimnasio',
       ...
       'antiguedad_binning_2_3_ohe2', 'antiguedad_binning_2_4_ohe2',
       'usd_precio_promedio_mensual', 'usd_subio', 'volcan_cerca',
       'volcanes_cerca', 'idzona_mean_price', 'precio', 'precio_predicted',
       'error'],
      dtype='object', length=116)

In [72]:
df.groupby('dic2016')['error'].mean()

dic2016
0    469154.720787
1    558605.942472
Name: error, dtype: float64

In [71]:
df['dic2016'].value_counts()

0    42246
1     5754
Name: dic2016, dtype: int64

In [64]:
df.groupby('mes')['error'].mean()

mes
1     457792.162680
2     463169.944441
3     456001.596693
4     492672.976551
5     450777.269360
6     470895.831920
7     465610.862803
8     461132.009234
9     484955.481680
10    485033.971211
11    472161.605521
12    525364.342841
Name: error, dtype: float64

### Modelo: Regresion lineal

In [None]:
# ...

### Modelo: Regresion logistica

In [5]:
# ...

### Modelo: SVM

In [6]:
# ...

### Modelo: Decision Tree

In [7]:
# ...

### Modelo: RandomForest

In [4]:
from sklearn.ensemble import RandomForestRegressor

In [5]:
train,test = load_featured_datasets()

In [6]:
train.shape

(240000, 137)

In [7]:
train.fillna(train.mean(), inplace = True)

In [8]:
train.shape

(240000, 137)

In [10]:
X = train.drop('precio', axis=1).values
Y = train['precio'].values
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2)

In [11]:
regressor = RandomForestRegressor(n_estimators = 100, random_state = seed, verbose=2, max_depth=10) 
regressor.fit(X_train, Y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


building tree 1 of 100


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    2.7s remaining:    0.0s


building tree 2 of 100
building tree 3 of 100
building tree 4 of 100
building tree 5 of 100
building tree 6 of 100
building tree 7 of 100
building tree 8 of 100
building tree 9 of 100
building tree 10 of 100
building tree 11 of 100
building tree 12 of 100
building tree 13 of 100
building tree 14 of 100
building tree 15 of 100
building tree 16 of 100
building tree 17 of 100
building tree 18 of 100
building tree 19 of 100
building tree 20 of 100
building tree 21 of 100
building tree 22 of 100
building tree 23 of 100
building tree 24 of 100
building tree 25 of 100
building tree 26 of 100
building tree 27 of 100
building tree 28 of 100
building tree 29 of 100
building tree 30 of 100
building tree 31 of 100
building tree 32 of 100
building tree 33 of 100
building tree 34 of 100
building tree 35 of 100
building tree 36 of 100
building tree 37 of 100
building tree 38 of 100
building tree 39 of 100
building tree 40 of 100
building tree 41 of 100
building tree 42 of 100
building tree 43 of 100


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:  5.1min finished


RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=10,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=100,
                      n_jobs=None, oob_score=False, random_state=7, verbose=2,
                      warm_start=False)

In [12]:
from sklearn import metrics

In [15]:
y_pred = regressor.predict(X_val)
print('MAE: ', int(metrics.mean_absolute_error(Y_val, y_pred)))

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s


MAE:  688548


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.6s finished


In [16]:
y_pred2 = regressor.predict(X_train)
print('MAE: ', int(metrics.mean_absolute_error(Y_train, y_pred2)))

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s


MAE:  663067


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    2.2s finished


In [20]:
names = train.columns.to_list()
print(sorted(zip(map(lambda x: round(x, 4), regressor.feature_importances_), names), reverse=True))

[(0.4881, 'metroscubiertos'), (0.2571, 'ciudad_le'), (0.0352, 'ciudad_muycara'), (0.0317, 'banos'), (0.0158, 'tipodepropiedad_1_pol'), (0.0141, 'dia'), (0.0129, 'precio_promedio_metrocubierto_mes'), (0.0125, 'antiguedad'), (0.0113, 'garages'), (0.0105, 'servicio'), (0.0096, 'es_Veracruz'), (0.0093, 'metroscubiertos_mean'), (0.009, 'precio'), (0.0085, 'intercept_pol'), (0.0069, 'tipodepropiedad_2_pol'), (0.0065, 'tipodepropiedad_0_pol'), (0.005, 'habitaciones'), (0.0042, 'aniomes'), (0.0033, 'tipodepropiedad_3_pol'), (0.0031, 'ciudad_barata'), (0.0025, 'es_apart'), (0.0024, 'tipodepropiedad_4_pol'), (0.002, 'tipodepropiedad_le'), (0.002, 'ciudad_cara'), (0.0019, 'tipodepropiedad_8_ohe'), (0.0017, 'lujo'), (0.0017, 'aniomes_scaled'), (0.0015, 'mes'), (0.0015, 'es_casa'), (0.0014, 'tipodepropiedad_7_pol'), (0.0014, 'hab_binning_1_ohe'), (0.0013, 'provincia_10_ohe'), (0.0013, 'gimnasio'), (0.0012, 'parrilla'), (0.0011, 'piscina'), (0.0011, 'es_Distrito Federal'), (0.001, 'hab_binning_7_ohe

### Modelo: XGBoost

_Generacion del dataset de train con sus features_

In [17]:
import xgboost
from sklearn.model_selection import GridSearchCV

In [18]:
train,test = load_featured_datasets()

In [19]:
train['precio'] = train['precio'].map(lambda x: math.log(x))

In [20]:
best_features = feature_selection.get_best_features_per_category()

In [21]:
features = feature_generation.get_features()

In [22]:
train_selected = train[['antiguedad', 'habitaciones', 'garages', 'banos', 'metroscubiertos', 'metrostotales',
                        'idzona', 'lat', 'lng', 'gimnasio', 'usosmultiples', 'piscina', 'escuelascercanas',
                        'centroscomercialescercanos']\
                       +features["metros"][1]\
                       +features["tipodepropiedad"][0]\
                       +features["provincia"][6]\
                       +features["ciudad"][2]\
                       +features["fecha"][4]\
                       +features["descripcion"][0]\
                       +features["metricas"][2]\
                       +features["habitaciones"][0]\
                       +features["antiguedad"][1]\
                       +features["extras"][2]\
                       +features["volcanes"][0]\
                       +features["idzona"][0]\
                       +["precio"]]

test_selected = test[['antiguedad', 'habitaciones', 'garages', 'banos', 'metroscubiertos', 'metrostotales',
                        'idzona', 'lat', 'lng', 'gimnasio', 'usosmultiples', 'piscina', 'escuelascercanas',
                        'centroscomercialescercanos']\
                       +features["metros"][1]\
                       +features["tipodepropiedad"][0]\
                       +features["provincia"][6]\
                       +features["ciudad"][2]\
                       +features["fecha"][4]\
                       +features["descripcion"][0]\
                       +features["metricas"][2]\
                       +features["habitaciones"][0]\
                       +features["antiguedad"][1]\
                       +features["extras"][2]\
                       +features["volcanes"][0]\
                       +features["idzona"][0]]

In [23]:
X = train_selected.drop('precio', axis=1).values
Y = train_selected['precio'].values

In [27]:
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2)

In [28]:
parametros = {
    'max_depth':[11,12,13,14,15],
    'n_estimators':[100,110,120,130,140],
    'learning_rate': [0.05,0.08,0.1,0.15,0.2,0.3],
    'subsample':[0.5,0.8,0.9,0.7],
    'min_child_weight':[5,10,15,20,30]
}

In [29]:
reg = xgboost.XGBRegressor(max_depth=17,n_estimators=240 ,learning_rate=0.06, verbosity=2,subsample=0.9, min_child_weight=10, n_jobs=2)
reg.fit(X_train,Y_train)

[20:09:07] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 56 extra nodes, 0 pruned nodes, max_depth=7
[20:09:09] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 66 extra nodes, 0 pruned nodes, max_depth=8
[20:09:10] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 64 extra nodes, 0 pruned nodes, max_depth=8
[20:09:11] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 66 extra nodes, 0 pruned nodes, max_depth=8
[20:09:12] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 68 extra nodes, 0 pruned nodes, max_depth=7
[20:09:13] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 70 extra nodes, 0 pruned nodes, max_depth=8
[20:09:14] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 78 extra nodes, 0 pruned nodes, max_depth=8
[20:09:15] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 80 extra nod

[20:10:52] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 7162 extra nodes, 0 pruned nodes, max_depth=17
[20:10:55] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6928 extra nodes, 0 pruned nodes, max_depth=17
[20:10:59] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 7462 extra nodes, 0 pruned nodes, max_depth=17
[20:11:01] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6518 extra nodes, 0 pruned nodes, max_depth=17
[20:11:05] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 7162 extra nodes, 0 pruned nodes, max_depth=17
[20:11:09] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 7204 extra nodes, 0 pruned nodes, max_depth=17
[20:11:11] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 8340 extra nodes, 0 pruned nodes, max_depth=17
[20:11:13] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 

[20:15:54] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 1830 extra nodes, 0 pruned nodes, max_depth=17
[20:15:59] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3068 extra nodes, 0 pruned nodes, max_depth=17
[20:16:03] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3698 extra nodes, 0 pruned nodes, max_depth=17
[20:16:08] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 2018 extra nodes, 0 pruned nodes, max_depth=17
[20:16:12] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 1414 extra nodes, 0 pruned nodes, max_depth=17
[20:16:17] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3068 extra nodes, 0 pruned nodes, max_depth=17
[20:16:21] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3336 extra nodes, 0 pruned nodes, max_depth=17
[20:16:27] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 

[20:20:23] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3606 extra nodes, 0 pruned nodes, max_depth=17
[20:20:25] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3286 extra nodes, 0 pruned nodes, max_depth=17
[20:20:27] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 1394 extra nodes, 0 pruned nodes, max_depth=17
[20:20:29] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3344 extra nodes, 0 pruned nodes, max_depth=17
[20:20:31] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 3322 extra nodes, 0 pruned nodes, max_depth=17
[20:20:34] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 4776 extra nodes, 0 pruned nodes, max_depth=17
[20:20:35] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 1282 extra nodes, 0 pruned nodes, max_depth=17
[20:20:38] INFO: /workspace/src/tree/updater_prune.cc:74: tree pruning end, 

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.06, max_delta_step=0,
             max_depth=17, min_child_weight=10, missing=None, n_estimators=240,
             n_jobs=2, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=0.9, verbosity=2)

_Comprobacion contra el conjunto de validacion_

In [26]:
Y_pred = reg.predict(X_val)

f = np.vectorize(math.exp)
Y_pred = f(Y_pred)
Y_val = f(Y_val)
mean_absolute_error(Y_val,Y_pred)

473494.35926695546

In [32]:
# preparamos el csv de respuesta para kaggle

In [30]:
ids = test_selected.index.values
X_test = test_selected.values

In [31]:
test_predict = reg.predict(X_test)

f = np.vectorize(math.exp)
test_predict = f(test_predict)

In [33]:
escribir_respuesta(ids, test_predict)

### Modelo: CatBoost

In [None]:
#...

### Modelo: LightGBM

In [73]:
import lightgbm as lgb

In [74]:
train,test = load_featured_datasets()

In [75]:
features = feature_generation.get_features()

In [76]:
best_features = feature_selection.get_best_features_per_category()

In [98]:
features['metros']

{0: ['metroscubiertos_alt', 'metrostotales_alt'],
 1: ['metroscubiertos_alt',
  'metrostotales_alt',
  'metrostotales_confiables_alt'],
 2: ['metroscubiertos_i1', 'metrostotales_i1'],
 3: ['metroscubiertos_i1', 'metrostotales_i1', 'metrostotales_confiables_alt'],
 4: ['metroscubiertos_alt', 'metrostotales_i2'],
 5: ['metroscubiertos_alt',
  'metrostotales_i2',
  'metrostotales_confiables_alt']}

In [77]:
best_features

[('metros', 1),
 ('tipodepropiedad', 0),
 ('provincia', 6),
 ('ciudad', 2),
 ('fecha', 4),
 ('descripcion', 0),
 ('metricas', 2),
 ('habitaciones', 0),
 ('antiguedad', 1),
 ('extras', 2),
 ('volcanes', 0),
 ('idzona', 0)]

In [114]:
train_selected = train[['antiguedad', 'habitaciones', 'garages', 'banos', 'metroscubiertos', 'metrostotales',
                        'idzona', 'lat', 'lng', 'gimnasio', 'usosmultiples', 'piscina', 'escuelascercanas',
                        'centroscomercialescercanos']\
                       +features["metros"][1]\
                       +features["tipodepropiedad"][0]\
                       +features["provincia"][6]\
                       +features["ciudad"][2]\
                       +features["fecha"][4]\
                       +features["descripcion"][0]\
                       +features["metricas"][2]\
                       +features["habitaciones"][0]\
                       +features["antiguedad"][1]\
                       +features["extras"][2]\
                       +features["volcanes"][0]\
                       +features["idzona"][0]\
                       +["precio"]]

In [115]:
X = train_selected.drop('precio', axis=1).values
Y = train_selected['precio'].values

In [116]:
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=seed)

In [117]:
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'mae',
    'max_depth': 14, 
    'learning_rate': 0.2,
    'verbose': 0, 
    'early_stopping_round': 50}
n_estimators=10000

In [118]:
d_train = lgb.Dataset(X_train, label=Y_train)
d_valid = lgb.Dataset(X_val, label=Y_val)
watchlist = [d_valid]
reg = lgb.train(params, d_train, n_estimators, watchlist, verbose_eval=1)

[1]	valid_0's l1: 1.37848e+06
Training until validation scores don't improve for 50 rounds
[2]	valid_0's l1: 1.20026e+06
[3]	valid_0's l1: 1.06646e+06
[4]	valid_0's l1: 964527
[5]	valid_0's l1: 886962
[6]	valid_0's l1: 825819
[7]	valid_0's l1: 780005
[8]	valid_0's l1: 742121
[9]	valid_0's l1: 713473
[10]	valid_0's l1: 687841
[11]	valid_0's l1: 669070
[12]	valid_0's l1: 654806
[13]	valid_0's l1: 642552
[14]	valid_0's l1: 632085
[15]	valid_0's l1: 623036
[16]	valid_0's l1: 616622
[17]	valid_0's l1: 611148
[18]	valid_0's l1: 606440
[19]	valid_0's l1: 600293
[20]	valid_0's l1: 596400
[21]	valid_0's l1: 592895
[22]	valid_0's l1: 589279
[23]	valid_0's l1: 586698
[24]	valid_0's l1: 583480
[25]	valid_0's l1: 580945
[26]	valid_0's l1: 579340
[27]	valid_0's l1: 576913
[28]	valid_0's l1: 575237
[29]	valid_0's l1: 572096
[30]	valid_0's l1: 570468
[31]	valid_0's l1: 568829
[32]	valid_0's l1: 567254
[33]	valid_0's l1: 566310
[34]	valid_0's l1: 565160
[35]	valid_0's l1: 563258
[36]	valid_0's l1: 5621

[313]	valid_0's l1: 513172
[314]	valid_0's l1: 513125
[315]	valid_0's l1: 513092
[316]	valid_0's l1: 513154
[317]	valid_0's l1: 513044
[318]	valid_0's l1: 513021
[319]	valid_0's l1: 512951
[320]	valid_0's l1: 512956
[321]	valid_0's l1: 512850
[322]	valid_0's l1: 512802
[323]	valid_0's l1: 512664
[324]	valid_0's l1: 512586
[325]	valid_0's l1: 512481
[326]	valid_0's l1: 512538
[327]	valid_0's l1: 512486
[328]	valid_0's l1: 512340
[329]	valid_0's l1: 512216
[330]	valid_0's l1: 512170
[331]	valid_0's l1: 512064
[332]	valid_0's l1: 511962
[333]	valid_0's l1: 511925
[334]	valid_0's l1: 511863
[335]	valid_0's l1: 511841
[336]	valid_0's l1: 511708
[337]	valid_0's l1: 511644
[338]	valid_0's l1: 511596
[339]	valid_0's l1: 511572
[340]	valid_0's l1: 511551
[341]	valid_0's l1: 511514
[342]	valid_0's l1: 511462
[343]	valid_0's l1: 511476
[344]	valid_0's l1: 511431
[345]	valid_0's l1: 511429
[346]	valid_0's l1: 511443
[347]	valid_0's l1: 511366
[348]	valid_0's l1: 511306
[349]	valid_0's l1: 511244
[

[623]	valid_0's l1: 502016
[624]	valid_0's l1: 501982
[625]	valid_0's l1: 501985
[626]	valid_0's l1: 501975
[627]	valid_0's l1: 502002
[628]	valid_0's l1: 501962
[629]	valid_0's l1: 501937
[630]	valid_0's l1: 501924
[631]	valid_0's l1: 501931
[632]	valid_0's l1: 501862
[633]	valid_0's l1: 501834
[634]	valid_0's l1: 501766
[635]	valid_0's l1: 501745
[636]	valid_0's l1: 501754
[637]	valid_0's l1: 501773
[638]	valid_0's l1: 501770
[639]	valid_0's l1: 501770
[640]	valid_0's l1: 501752
[641]	valid_0's l1: 501748
[642]	valid_0's l1: 501765
[643]	valid_0's l1: 501754
[644]	valid_0's l1: 501685
[645]	valid_0's l1: 501644
[646]	valid_0's l1: 501590
[647]	valid_0's l1: 501591
[648]	valid_0's l1: 501581
[649]	valid_0's l1: 501612
[650]	valid_0's l1: 501543
[651]	valid_0's l1: 501540
[652]	valid_0's l1: 501484
[653]	valid_0's l1: 501469
[654]	valid_0's l1: 501456
[655]	valid_0's l1: 501450
[656]	valid_0's l1: 501455
[657]	valid_0's l1: 501469
[658]	valid_0's l1: 501405
[659]	valid_0's l1: 501433
[

[933]	valid_0's l1: 497524
[934]	valid_0's l1: 497537
[935]	valid_0's l1: 497519
[936]	valid_0's l1: 497482
[937]	valid_0's l1: 497475
[938]	valid_0's l1: 497485
[939]	valid_0's l1: 497456
[940]	valid_0's l1: 497468
[941]	valid_0's l1: 497450
[942]	valid_0's l1: 497469
[943]	valid_0's l1: 497474
[944]	valid_0's l1: 497489
[945]	valid_0's l1: 497451
[946]	valid_0's l1: 497457
[947]	valid_0's l1: 497434
[948]	valid_0's l1: 497449
[949]	valid_0's l1: 497436
[950]	valid_0's l1: 497355
[951]	valid_0's l1: 497334
[952]	valid_0's l1: 497332
[953]	valid_0's l1: 497336
[954]	valid_0's l1: 497291
[955]	valid_0's l1: 497268
[956]	valid_0's l1: 497305
[957]	valid_0's l1: 497286
[958]	valid_0's l1: 497280
[959]	valid_0's l1: 497245
[960]	valid_0's l1: 497257
[961]	valid_0's l1: 497253
[962]	valid_0's l1: 497228
[963]	valid_0's l1: 497214
[964]	valid_0's l1: 497177
[965]	valid_0's l1: 497152
[966]	valid_0's l1: 497171
[967]	valid_0's l1: 497128
[968]	valid_0's l1: 497134
[969]	valid_0's l1: 497099
[

[1230]	valid_0's l1: 494993
[1231]	valid_0's l1: 494972
[1232]	valid_0's l1: 494965
[1233]	valid_0's l1: 494977
[1234]	valid_0's l1: 494987
[1235]	valid_0's l1: 495000
[1236]	valid_0's l1: 494990
[1237]	valid_0's l1: 495010
[1238]	valid_0's l1: 495000
[1239]	valid_0's l1: 495021
[1240]	valid_0's l1: 495021
[1241]	valid_0's l1: 494993
[1242]	valid_0's l1: 494989
[1243]	valid_0's l1: 494952
[1244]	valid_0's l1: 494950
[1245]	valid_0's l1: 494952
[1246]	valid_0's l1: 494930
[1247]	valid_0's l1: 494900
[1248]	valid_0's l1: 494853
[1249]	valid_0's l1: 494854
[1250]	valid_0's l1: 494837
[1251]	valid_0's l1: 494848
[1252]	valid_0's l1: 494857
[1253]	valid_0's l1: 494868
[1254]	valid_0's l1: 494890
[1255]	valid_0's l1: 494880
[1256]	valid_0's l1: 494826
[1257]	valid_0's l1: 494789
[1258]	valid_0's l1: 494821
[1259]	valid_0's l1: 494785
[1260]	valid_0's l1: 494806
[1261]	valid_0's l1: 494812
[1262]	valid_0's l1: 494834
[1263]	valid_0's l1: 494836
[1264]	valid_0's l1: 494844
[1265]	valid_0's l1:

[1537]	valid_0's l1: 493350
[1538]	valid_0's l1: 493348
[1539]	valid_0's l1: 493358
[1540]	valid_0's l1: 493358
[1541]	valid_0's l1: 493375
[1542]	valid_0's l1: 493373
[1543]	valid_0's l1: 493371
[1544]	valid_0's l1: 493375
[1545]	valid_0's l1: 493378
[1546]	valid_0's l1: 493341
[1547]	valid_0's l1: 493364
[1548]	valid_0's l1: 493344
[1549]	valid_0's l1: 493331
[1550]	valid_0's l1: 493295
[1551]	valid_0's l1: 493269
[1552]	valid_0's l1: 493284
[1553]	valid_0's l1: 493272
[1554]	valid_0's l1: 493272
[1555]	valid_0's l1: 493305
[1556]	valid_0's l1: 493310
[1557]	valid_0's l1: 493321
[1558]	valid_0's l1: 493312
[1559]	valid_0's l1: 493321
[1560]	valid_0's l1: 493323
[1561]	valid_0's l1: 493287
[1562]	valid_0's l1: 493266
[1563]	valid_0's l1: 493243
[1564]	valid_0's l1: 493238
[1565]	valid_0's l1: 493239
[1566]	valid_0's l1: 493222
[1567]	valid_0's l1: 493172
[1568]	valid_0's l1: 493163
[1569]	valid_0's l1: 493123
[1570]	valid_0's l1: 493062
[1571]	valid_0's l1: 493061
[1572]	valid_0's l1:

[1832]	valid_0's l1: 491687
[1833]	valid_0's l1: 491692
[1834]	valid_0's l1: 491694
[1835]	valid_0's l1: 491679
[1836]	valid_0's l1: 491698
[1837]	valid_0's l1: 491671
[1838]	valid_0's l1: 491648
[1839]	valid_0's l1: 491662
[1840]	valid_0's l1: 491663
[1841]	valid_0's l1: 491659
[1842]	valid_0's l1: 491655
[1843]	valid_0's l1: 491636
[1844]	valid_0's l1: 491618
[1845]	valid_0's l1: 491602
[1846]	valid_0's l1: 491612
[1847]	valid_0's l1: 491623
[1848]	valid_0's l1: 491601
[1849]	valid_0's l1: 491599
[1850]	valid_0's l1: 491595
[1851]	valid_0's l1: 491606
[1852]	valid_0's l1: 491590
[1853]	valid_0's l1: 491588
[1854]	valid_0's l1: 491579
[1855]	valid_0's l1: 491599
[1856]	valid_0's l1: 491572
[1857]	valid_0's l1: 491566
[1858]	valid_0's l1: 491548
[1859]	valid_0's l1: 491553
[1860]	valid_0's l1: 491549
[1861]	valid_0's l1: 491544
[1862]	valid_0's l1: 491554
[1863]	valid_0's l1: 491565
[1864]	valid_0's l1: 491558
[1865]	valid_0's l1: 491568
[1866]	valid_0's l1: 491575
[1867]	valid_0's l1:

In [22]:
Y_pred = reg.predict(X_val)
mean_absolute_error(Y_val,Y_pred)

487268.28239609225

In [331]:
# preparamos el csv de respuesta para kaggle

In [46]:
ids = test_selected.index.values
X_test = test_selected.values

In [47]:
test_predict = reg.predict(X_test)
escribir_respuesta(ids, test_predict)

In [None]:
# best params so far
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'mae',
    'max_depth': 14, 
    'learning_rate': 0.05,
    'verbose': 0, 
    'early_stopping_round': 200}
n_estimators=20000

### Modelo: KNN

### Modelo: Neural Networks

In [3]:
# ...