### Create a simple H2O car price model

The model is based on data scraped from gaspedaal.

In [16]:
import pandas as pd
import h2o
from h2o.automl import H2OAutoML

### Import the CSV file

In [25]:
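cars = pd.read_csv("cars.csv")
# ouderdom (age in years) counts the bouwjaar (build year) as year 1,
# with 2016 hard-coded as the reference year
cars = (
    cars
    .assign(ouderdom = 2016 - cars.bouwjaar + 1)
)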

### Start H2O and upload the data

In [3]:
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "13.0.1" 2019-10-15; OpenJDK Runtime Environment (build 13.0.1+9); OpenJDK 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)
  Starting server from /Users/lamlon/anaconda3/lib/python3.7/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/y6/jrqktfnx2dxdcryrpygdr2s9rt72yb/T/tmptt0666vi
  JVM stdout: /var/folders/y6/jrqktfnx2dxdcryrpygdr2s9rt72yb/T/tmptt0666vi/h2o_lamlon_started_from_python.out
  JVM stderr: /var/folders/y6/jrqktfnx2dxdcryrpygdr2s9rt72yb/T/tmptt0666vi/h2o_lamlon_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O_cluster_uptime:,02 secs
H2O_cluster_timezone:,Europe/Amsterdam
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.30.0.1
H2O_cluster_version_age:,23 days
H2O_cluster_name:,H2O_from_python_lamlon_yujkcv
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,4 Gb
H2O_cluster_total_cores:,8
H2O_cluster_allowed_cores:,8


In [27]:
carsh = h2o.H2OFrame(cars)

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [28]:
train, test = carsh.split_frame()
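
Called without arguments, `split_frame()` falls back on its defaults, which as far as I know means a roughly 75/25 split and no fixed seed, so the train and test rows can differ on every run. Passing both explicitly, for example `carsh.split_frame(ratios=[0.75], seed=42)` (42 is an arbitrary choice), should make the split repeatable. H2O assigns rows probabilistically instead of cutting at an exact row count. The pure-Python sketch below only imitates that idea and is not H2O's implementation; `probabilistic_split` is a made-up name for the illustration.

```python
import random

def probabilistic_split(rows, ratio=0.75, seed=None):
    """Send each row to train with probability `ratio`, otherwise to test."""
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    train, test = [], []
    for row in rows:
        (train if rng.random() < ratio else test).append(row)
    return train, test

rows = list(range(10_000))  # stand-in for the rows of the cars frame

train_a, test_a = probabilistic_split(rows, ratio=0.75, seed=42)
train_b, test_b = probabilistic_split(rows, ratio=0.75, seed=42)

# Same seed gives the same split; the train share lands near 0.75,
# usually without hitting it exactly.
print(train_a == train_b, len(train_a) / len(rows))
```

Because the split is only approximate, the row counts of `train` and `test` are worth checking after the call instead of assuming an exact 75/25 division.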

In [29]:
train.head(5)

KM,bouwjaar,Prijs,Transmissie,Merk,Model,Motor,Brandstof,ouderdom
12865,2013,7250,Handgeschakeld,Peugeot,107,998,Benzine,4
4620,2016,11339,Handgeschakeld,Peugeot,108,998,Benzine,1
1270,2016,14368,Automaat,Mitsubishi,Space,999,Benzine,1
198,2003,1275,Handgeschakeld,Ford,Focus,1388,Benzine,14
451,2016,17144,Handgeschakeld,Ford,Fiesta,1499,Diesel,1




### Being lazy: just turn on AutoML

In [17]:
aml = H2OAutoML(max_runtime_secs = 30)

In [30]:
aml.train(
    y = "Prijs",
    x = ["ouderdom", "KM"],
    training_frame = train,
    validation_frame = test
)

AutoML progress: |
21:17:48.836: User specified a validation frame with cross-validation still enabled. Please note that the models will still be validated using cross-validation only, the validation frame will be used to provide purely informative validation metrics on the trained models.

████████████████████████████████████████████████████████| 100%


In [31]:
aml.leaderboard

model_id,mean_residual_deviance,rmse,mse,mae,rmsle
StackedEnsemble_AllModels_AutoML_20200426_211748,429592000.0,20726.6,429592000.0,8187.04,0.70583
StackedEnsemble_BestOfFamily_AutoML_20200426_211748,432323000.0,20792.4,432323000.0,8172.25,0.706208
GBM_2_AutoML_20200426_211748,528505000.0,22989.2,528505000.0,11399.2,1.18106
GBM_5_AutoML_20200426_211748,532042000.0,23066.0,532042000.0,11499.9,1.18864
GBM_3_AutoML_20200426_211748,532106000.0,23067.4,532106000.0,11502.4,1.18881
GBM_1_AutoML_20200426_211748,532108000.0,23067.5,532108000.0,11500.1,1.18873
GBM_4_AutoML_20200426_211748,534682000.0,23123.2,534682000.0,11605.7,1.19686
GLM_1_AutoML_20200426_211748,554333000.0,23544.3,554333000.0,12126.9,1.23311
DRF_1_AutoML_20200426_211748,621854000.0,24937.0,621854000.0,8527.11,0.713852
XGBoost_1_AutoML_20200426_211748,624978000.0,24999.6,624978000.0,11493.8,1.14258
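
The leaderboard above is ordered by `mean_residual_deviance`, which equals the MSE column here. The ranking depends on the metric, though. As a small sketch, the snippet below re-sorts the rows copied from the output by MAE using only the standard library; `ranked_by` is a helper name invented for this example.

```python
import csv
import io

# Leaderboard rows copied verbatim from the output above.
leaderboard_csv = """model_id,mean_residual_deviance,rmse,mse,mae,rmsle
StackedEnsemble_AllModels_AutoML_20200426_211748,429592000.0,20726.6,429592000.0,8187.04,0.70583
StackedEnsemble_BestOfFamily_AutoML_20200426_211748,432323000.0,20792.4,432323000.0,8172.25,0.706208
GBM_2_AutoML_20200426_211748,528505000.0,22989.2,528505000.0,11399.2,1.18106
GBM_5_AutoML_20200426_211748,532042000.0,23066.0,532042000.0,11499.9,1.18864
GBM_3_AutoML_20200426_211748,532106000.0,23067.4,532106000.0,11502.4,1.18881
GBM_1_AutoML_20200426_211748,532108000.0,23067.5,532108000.0,11500.1,1.18873
GBM_4_AutoML_20200426_211748,534682000.0,23123.2,534682000.0,11605.7,1.19686
GLM_1_AutoML_20200426_211748,554333000.0,23544.3,554333000.0,12126.9,1.23311
DRF_1_AutoML_20200426_211748,621854000.0,24937.0,621854000.0,8527.11,0.713852
XGBoost_1_AutoML_20200426_211748,624978000.0,24999.6,624978000.0,11493.8,1.14258
"""

rows = list(csv.DictReader(io.StringIO(leaderboard_csv)))

def ranked_by(metric):
    """Return model ids ordered from best (lowest) to worst on `metric`."""
    return [r["model_id"] for r in sorted(rows, key=lambda r: float(r[metric]))]

by_deviance = ranked_by("mean_residual_deviance")
by_mae = ranked_by("mae")

print(by_deviance[0])  # leader under the default ordering
print(by_mae[:3])      # top three when ranking on MAE instead
```

On MAE the BestOfFamily ensemble edges ahead of the AllModels ensemble, and `DRF_1` moves from ninth place to third. If MAE matters more than squared error, `H2OAutoML` accepts a `sort_metric` argument that orders the leaderboard accordingly.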




In [32]:
perf = aml.leader.model_performance(test)

In [33]:
perf


ModelMetricsRegressionGLM: stackedensemble
** Reported on test data. **

MSE: 495401577.72121614
RMSE: 22257.618419795414
MAE: 8321.435026980478
RMSLE: 0.7082577129686142
R^2: 0.2092245177908143
Mean Residual Deviance: 495401577.72121614
Null degrees of freedom: 108969
Residual degrees of freedom: 108960
Null deviance: 68269864677839.06
Residual deviance: 53983909924280.92
AIC: 2490940.676862295
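
The numbers in this report are related to each other, which helps when reading them. A quick check with the values copied from above: RMSE is the square root of MSE, MSE is the residual deviance divided by the number of test rows, and R^2 is roughly one minus the ratio of residual to null deviance. The row count is an assumption here, taken as null degrees of freedom plus one.

```python
import math

# Values copied from the test-set report above.
mse = 495401577.72121614
rmse = 22257.618419795414
r2_reported = 0.2092245177908143
null_dof = 108969
null_deviance = 68269864677839.06
residual_deviance = 53983909924280.92

n_rows = null_dof + 1  # assumption: null degrees of freedom = rows - 1

rmse_check = math.sqrt(mse)                       # should match RMSE
mse_check = residual_deviance / n_rows            # should match MSE
r2_check = 1 - residual_deviance / null_deviance  # should be close to R^2

print(rmse_check, mse_check, r2_check)
```

The first two checks reproduce the reported values. The third comes out near 0.2093, close to but not identical with the reported R^2, so H2O presumably uses a slightly different baseline variance. Either way, `ouderdom` and `KM` alone explain only about a fifth of the variance in `Prijs`, and a typical squared-error-weighted miss is around 22,000 in the units of `Prijs`.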




In [36]:
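# Save the leader in H2O's binary model format.
# Note: save_model does not write a MOJO; use aml.leader.download_mojo() for that.
modelpath = h2o.save_model(aml.leader, path="", force=True)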

In [37]:
modelpath

'/Users/lamlon/Documents/Personal/Projects/cars_model/StackedEnsemble_AllModels_AutoML_20200426_211748'