# Training a neural net with H2O (reduced dataset)

**Import and initialise cluster**

In [24]:
library(h2o)
h2o.init()

 Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         5 minutes 56 seconds 
    H2O cluster timezone:       Europe/London 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.32.1.2 
    H2O cluster version age:    27 days  
    H2O cluster name:           H2O_started_from_R_lukeswaby-petts_flh799 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   4.00 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4 
    R Version:                  R version 4.0.3 (2020-10-10) 



**Import data**

In [25]:
train = h2o.importFile("../Data/Reduced/train_reduced_dset.csv")
test = h2o.importFile("../Data/Reduced/test_reduced_dset.csv")



**Set factors**

In [26]:
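# Rename the final column (the response) to 'Dives' in both frames
colnames(train)[length(colnames(train))] = 'Dives'
colnames(test)[length(colnames(test))] = 'Dives'

# Convert the response to a factor so it is treated as a class label
train$Dives = as.factor(train$Dives)
test$Dives = as.factor(test$Dives)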

**Build and train model**

In [38]:
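# Build and train the model (x is not given, so every column except "Dives" is a predictor):
dl <- h2o.deeplearning(y = "Dives",
                       distribution = "bernoulli",
                       hidden = c(200, 200),                 # two hidden layers of 200 units each
                       epochs = 200,
                       train_samples_per_iteration = -1,     # -1: use all available data per iteration
                       activation = "RectifierWithDropout",
                       input_dropout_ratio = 0.2,
                       hidden_dropout_ratios = c(0.2, 0.2),  # one ratio per hidden layer
                       single_node_mode = FALSE,
                       balance_classes = FALSE,
                       force_load_balance = FALSE,
                       seed = 23123,
                       score_training_samples = 0,           # 0: score on all training rows
                       score_validation_samples = 0,         # 0: score on all validation rows
                       training_frame = train,
                       stopping_rounds = 0)                  # 0: early stopping disabled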



**Evaluate**

In [None]:
# Evaluate performance on the held-out test set:
perf <- h2o.performance(dl, test)

# Generate predictions on the test set (if necessary):
pred <- h2o.predict(dl, newdata = test)
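
For a binomial model, the frame returned by `h2o.predict` holds a `predict` column with the label and `p0`/`p1` columns with the class probabilities. If labels at a different cut-off are wanted, a minimal base-R sketch might look like the following. `label_at_threshold` is a hypothetical helper, not part of the h2o package, the `>=` comparison is an assumption, and the probabilities are made up for illustration. The value 0.324509 is the F1-optimal threshold reported in the performance output further down.

```r
# Hypothetical helper (not an h2o function): map class-1 probabilities to 0/1 labels.
# Assumes a probability equal to the threshold counts as class 1.
label_at_threshold <- function(p1, threshold = 0.5) {
  stopifnot(is.numeric(p1), threshold >= 0, threshold <= 1)
  factor(as.integer(p1 >= threshold), levels = c(0, 1))
}

# Made-up probabilities for illustration only. With the real model they could
# be pulled into R with as.data.frame(pred)$p1.
p1_demo <- c(0.02, 0.31, 0.33, 0.97)

labels_default <- label_at_threshold(p1_demo)            # cut-off 0.5
labels_f1      <- label_at_threshold(p1_demo, 0.324509)  # F1-optimal cut-off from perf
```

Lowering the cut-off from 0.5 to 0.324509 moves the third made-up observation into class 1, which is the kind of trade-off between false positives and false negatives that the threshold controls.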

In [39]:
perf

H2OBinomialMetrics: deeplearning

MSE:  0.0134631
RMSE:  0.1160306
LogLoss:  0.1063337
Mean Per-Class Error:  0.01379159
AUC:  0.9969158
AUCPR:  0.9942913
Gini:  0.9938316

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error      Rate
0      2323   48 0.020245  =48/2371
1        15 2029 0.007339  =15/2044
Totals 2338 2077 0.014270  =63/4415

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold       value idx
1                       max f1  0.324509    0.984712 231
2                       max f2  0.014510    0.991061 270
3                 max f0point5  0.984470    0.981544 176
4                 max accuracy  0.324509    0.985730 231
5                max precision  0.999999    0.996084   2
6                   max recall  0.000000    1.000000 397
7              max specificity  1.000000    0.997469   0
8             max absolute_mcc  0.324509    0.971445 231
9   max min_per_class_ac
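
As a sanity check, the headline figures can be re-derived by hand from the confusion matrix. The sketch below uses only base R and the counts printed above. The variable names are illustrative, and class 1 is assumed to be the positive class.

```r
# Counts copied from the confusion matrix above (rows: actual, columns: predicted).
tn <- 2323; fp <- 48    # actual 0
fn <- 15;   tp <- 2029  # actual 1

accuracy  <- (tp + tn) / (tp + tn + fp + fn)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

# Mean per-class error: average of the two row-wise error rates.
mean_per_class_error <- (fp / (fp + tn) + fn / (fn + tp)) / 2

# Gini follows from the reported AUC as 2 * AUC - 1.
gini <- 2 * 0.9969158 - 1

round(c(accuracy = accuracy, f1 = f1,
        mean_per_class_error = mean_per_class_error, gini = gini), 6)
```

These agree with the reported max accuracy (0.985730), max F1 (0.984712), mean per-class error (0.01379159) and Gini (0.9938316), since the confusion matrix is taken at the F1-optimal threshold, which here is also the accuracy-optimal one.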

**(Shutdown)**

In [22]:
h2o.shutdown()

Are you sure you want to shutdown the H2O instance running at http://localhost:54321/ (Y/N)? Y
