# A Naive Bayes model in H2O
## Jose M Albornoz
### December 2018

This notebook illustrates how to train and evaluate a Naive Bayes classifier in H2O, using the Iris dataset.

In [1]:
import h2o

In [2]:
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
  Starting server from c:\users\albornoj\appdata\local\programs\python\python37\lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\AlbornoJ\AppData\Local\Temp\tmp1z9fc523
  JVM stdout: C:\Users\AlbornoJ\AppData\Local\Temp\tmp1z9fc523\h2o_AlbornoJ_started_from_python.out
  JVM stderr: C:\Users\AlbornoJ\AppData\Local\Temp\tmp1z9fc523\h2o_AlbornoJ_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.


H2O cluster uptime: 05 secs
H2O cluster timezone: Europe/London
H2O data parsing timezone: UTC
H2O cluster version: 3.22.0.2
H2O cluster version age: 20 days
H2O cluster name: H2O_from_python_AlbornoJ_p7jtrg
H2O cluster total nodes: 1
H2O cluster free memory: 3.531 Gb
H2O cluster total cores: 4
H2O cluster allowed cores: 4


# 1.- Import Iris dataset

In [3]:
url = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv"

In [4]:
iris = h2o.import_file(url)

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [5]:
iris.shape

(150, 5)

# 2.- Train-test split

In [6]:
train, test = iris.split_frame([0.8])

In [7]:
train.shape

(117, 5)

In [8]:
test.shape

(33, 5)
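
The split above yields 117 and 33 rows rather than an exact 120/30. That is what one would expect from a split that assigns each row to a partition at random instead of cutting at a fixed index. The pure-Python sketch below imitates that idea. It is not H2O's implementation, and `random_split` and its arguments are hypothetical names introduced for illustration.

```python
import random

def random_split(n_rows, train_ratio, seed=None):
    """Assign each row index to train or test with an independent random draw."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for i in range(n_rows):
        # Each row goes to train with probability train_ratio,
        # so the partition sizes are only approximately 80/20.
        if rng.random() < train_ratio:
            train_idx.append(i)
        else:
            test_idx.append(i)
    return train_idx, test_idx

train_idx, test_idx = random_split(150, 0.8, seed=42)
# The sizes always sum to 150 but need not be exactly 120 and 30.
print(len(train_idx), len(test_idx))
```

`split_frame` also accepts a `seed` argument, which should make the partition repeatable from run to run.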

In [9]:
train.summary()

| | sepal_len | sepal_wid | petal_len | petal_wid | class |
|---|---|---|---|---|---|
| type | real | real | real | real | enum |
| mins | 4.3 | 2.0 | 1.0 | 0.1 | |
| mean | 5.864957264957263 | 3.035042735042735 | 3.802564102564103 | 1.2025641025641027 | |
| maxs | 7.9 | 4.4 | 6.9 | 2.5 | |
| sigma | 0.8589605024153896 | 0.44514938200103515 | 1.7796968716349313 | 0.7700718213774449 | |
| zeros | 0 | 0 | 0 | 0 | |
| missing | 0 | 0 | 0 | 0 | 0 |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |


# 3.- Model build

In [10]:
from h2o.estimators.naive_bayes import H2ONaiveBayesEstimator

In [11]:
mNB1 = H2ONaiveBayesEstimator()

In [12]:
# Train on the four numeric predictors; "class" is the response column.
mNB1.train(x=["sepal_len", "sepal_wid", "petal_len", "petal_wid"], y="class", training_frame=train)

naivebayes Model Build progress: |████████████████████████████████████████| 100%
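
All four predictors are numeric, and for numeric predictors Naive Bayes is commonly implemented by fitting a per-class Gaussian to each feature and multiplying the resulting likelihoods. The sketch below shows that calculation in plain Python on a small toy data set. The data, the function names `fit_gaussian_nb` and `predict_proba`, and the variance floor are assumptions made for illustration, not H2O internals.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Return, per class, the prior and a (mean, std) pair for each feature."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        n = len(xs)
        stats = []
        for col in zip(*xs):
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / (n - 1)
            # Floor the variance so a constant feature cannot cause division by zero.
            stats.append((mean, math.sqrt(max(var, 1e-9))))
        model[y] = (n / len(rows), stats)
    return model

def predict_proba(model, x):
    """Posterior P(class | x) under the feature-independence assumption."""
    log_scores = {}
    for y, (prior, stats) in model.items():
        s = math.log(prior)
        for v, (mean, std) in zip(x, stats):
            s += -math.log(std * math.sqrt(2 * math.pi)) - (v - mean) ** 2 / (2 * std ** 2)
        log_scores[y] = s
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {y: e / total for y, e in exp_scores.items()}

# Toy (petal_len, petal_wid) values invented for this sketch.
rows = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.3)]
labels = ["setosa"] * 3 + ["versicolor"] * 3

model = fit_gaussian_nb(rows, labels)
probs = predict_proba(model, (1.4, 0.25))
print(max(probs, key=probs.get))
```

Working in log space avoids underflow when many small likelihoods are multiplied together.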


In [13]:
mNB1

Model Details
H2ONaiveBayesEstimator :  Naive Bayes
Model Key:  NaiveBayes_model_python_1544625223309_1


ModelMetricsMultinomial: naivebayes
** Reported on train data. **

MSE: 0.039843147898662105
RMSE: 0.1996074845757596
LogLoss: 0.13298926540199857
Mean Per-Class Error: 0.05070603337612323
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class



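| Actual \ Predicted | Iris-setosa | Iris-versicolor | Iris-virginica | Error | Rate |
|---|---|---|---|---|---|
| Iris-setosa | 38.0 | 0.0 | 0.0 | 0.0 | 0 / 38 |
| Iris-versicolor | 0.0 | 35.0 | 3.0 | 0.0789474 | 3 / 38 |
| Iris-virginica | 0.0 | 3.0 | 38.0 | 0.0731707 | 3 / 41 |
| Total | 38.0 | 38.0 | 41.0 | 0.0512821 | 6 / 117 |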


Top-3 Hit Ratios: 


| k | hit_ratio |
|---|---|
| 1 | 0.9487180 |
| 2 | 1.0 |
| 3 | 1.0 |
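
The summary figures in the training report can be re-derived from the confusion matrix counts, which is a handy sanity check when reading it. The sketch below recomputes the per-class errors, the mean per-class error and the top-1 hit ratio from the training counts shown above. The variable names are illustrative.

```python
# Training confusion matrix from the report: rows are actual classes,
# columns are predicted classes (setosa, versicolor, virginica).
confusion = [
    [38, 0, 0],
    [0, 35, 3],
    [0, 3, 38],
]

per_class_error = []
for i, row in enumerate(confusion):
    wrong = sum(row) - row[i]  # everything off the diagonal in this row
    per_class_error.append(wrong / sum(row))

mean_per_class_error = sum(per_class_error) / len(per_class_error)

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
top1_hit_ratio = correct / total  # equals 1 minus the overall error rate

print(round(mean_per_class_error, 6))  # 0.050706
print(round(top1_hit_ratio, 6))        # 0.948718
```

Both values agree with the reported Mean Per-Class Error and the `k = 1` hit ratio.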




# 4.- Predictions

In [14]:
p = mNB1.predict(test)

naivebayes prediction progress: |█████████████████████████████████████████| 100%


In [15]:
p

| predict | Iris-setosa | Iris-versicolor | Iris-virginica |
|---|---|---|---|
| Iris-setosa | 1 | 1.12797e-17 | 4.86798e-26 |
| Iris-setosa | 1 | 9.12559e-13 | 2.44269e-20 |
| Iris-setosa | 1 | 2.13808e-16 | 1.04231e-24 |
| Iris-setosa | 1 | 1.81218e-17 | 1.00327e-25 |
| Iris-setosa | 1 | 3.97711e-16 | 3.35484e-24 |
| Iris-setosa | 1 | 1.78039e-16 | 2.55914e-24 |
| Iris-setosa | 1 | 8.11457e-15 | 1.76681e-22 |
| Iris-setosa | 1 | 4.60101e-14 | 5.01421e-22 |
| Iris-setosa | 1 | 8.90242e-09 | 7.41887e-17 |
| Iris-setosa | 1 | 1.01998e-12 | 7.94434e-21 |
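
Each row of the prediction frame holds one posterior probability per class, and for a multiclass model the `predict` column is the class with the largest probability. The snippet below applies that rule to one row copied from the output above. `probs` and `predicted` are illustrative names.

```python
# Class probabilities for one test row, copied from the prediction output.
probs = {
    "Iris-setosa": 1.0,  # displayed as 1 after rounding
    "Iris-versicolor": 8.90242e-09,
    "Iris-virginica": 7.41887e-17,
}

# The predicted label is the class with the highest posterior probability.
predicted = max(probs, key=probs.get)
print(predicted)  # Iris-setosa
```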




In [16]:
mNB1.model_performance(test)


ModelMetricsMultinomial: naivebayes
** Reported on test data. **

MSE: 0.007082168489116261
RMSE: 0.08415562066265247
LogLoss: 0.030666761880687697
Mean Per-Class Error: 0.0
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class



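| Actual \ Predicted | Iris-setosa | Iris-versicolor | Iris-virginica | Error | Rate |
|---|---|---|---|---|---|
| Iris-setosa | 12.0 | 0.0 | 0.0 | 0.0 | 0 / 12 |
| Iris-versicolor | 0.0 | 12.0 | 0.0 | 0.0 | 0 / 12 |
| Iris-virginica | 0.0 | 0.0 | 9.0 | 0.0 | 0 / 9 |
| Total | 12.0 | 12.0 | 9.0 | 0.0 | 0 / 33 |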


Top-3 Hit Ratios: 


| k | hit_ratio |
|---|---|
| 1 | 1.0 |
| 2 | 1.0 |
| 3 | 1.0 |




# 5.- A provision to handle missing fields in production

In [17]:
mNB2 = H2ONaiveBayesEstimator(laplace=2)
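
The `laplace` argument sets the Laplace smoothing parameter. Smoothing of this kind adds a constant to every category count, so a categorical level that never appeared with a class during training still receives a small non-zero conditional probability instead of zeroing out the whole product at prediction time. The sketch below shows the usual additive-smoothing formula on made-up counts. The function name `smoothed_prob` and the data are hypothetical, and H2O's exact internal formula may differ. Because all four Iris predictors are numeric, the setting is not expected to change the results much in this notebook.

```python
from collections import Counter

def smoothed_prob(level, observed_levels, all_levels, laplace=0):
    """P(level | class) with additive (Laplace) smoothing."""
    counts = Counter(observed_levels)
    return (counts[level] + laplace) / (len(observed_levels) + laplace * len(all_levels))

# Hypothetical categorical feature values seen within one class during training.
observed = ["red", "red", "blue", "red"]
levels = ["red", "blue", "green"]  # "green" never occurs with this class

print(smoothed_prob("green", observed, levels, laplace=0))  # 0.0
print(smoothed_prob("green", observed, levels, laplace=2))  # 0.2
```

With `laplace=0` the unseen level gets probability zero, which would force the posterior for that class to zero regardless of the other features.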

In [18]:
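# Same predictors, response and training frame as mNB1; only the laplace setting differs.
mNB2.train(x=["sepal_len", "sepal_wid", "petal_len", "petal_wid"], y="class", training_frame=train)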

naivebayes Model Build progress: |████████████████████████████████████████| 100%


In [19]:
mNB2.model_performance(test)


ModelMetricsMultinomial: naivebayes
** Reported on test data. **

MSE: 0.007057524965820819
RMSE: 0.08400907668711054
LogLoss: 0.030613140062475255
Mean Per-Class Error: 0.0
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class



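| Actual \ Predicted | Iris-setosa | Iris-versicolor | Iris-virginica | Error | Rate |
|---|---|---|---|---|---|
| Iris-setosa | 12.0 | 0.0 | 0.0 | 0.0 | 0 / 12 |
| Iris-versicolor | 0.0 | 12.0 | 0.0 | 0.0 | 0 / 12 |
| Iris-virginica | 0.0 | 0.0 | 9.0 | 0.0 | 0 / 9 |
| Total | 12.0 | 12.0 | 9.0 | 0.0 | 0 / 33 |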


Top-3 Hit Ratios: 


| k | hit_ratio |
|---|---|
| 1 | 1.0 |
| 2 | 1.0 |
| 3 | 1.0 |


