Thank you for reading. In this new version I installed the "tfaddons" R package to add weight normalization to the dense layers of the residual network, which noticeably improved performance compared with the previous version.

Recently I have been studying the book "Deep Learning with R" by François Chollet and J. J. Allaire, and I thought this competition would be a nice opportunity to practice. For each step I cite the corresponding section of the book, in case anyone is interested.

The model in this notebook is inspired by the following notebook by Demetry Pascal from the "Mechanisms of Action" competition (his user name now appears to be demetrypascal2): https://www.kaggle.com/demetrypascal/fork-of-2heads-looper-super-puper-plate. I decided to try a similar model in this competition after reading Laurent Pourchot's notebook https://www.kaggle.com/pourchot/decision-forest-fed-by-neural-network. In that notebook he feeds a neural network into a decision forest, so I thought similar behaviour might be achieved through residual connections as in Demetry's model. The model in this notebook is much simpler, though, and there is probably still plenty of room for improvement.

In [1]:
# This R environment comes with many helpful analytics packages installed
# It is defined by the kaggle/rstats Docker image: https://github.com/kaggle/docker-rstats
# For example, here's a helpful package to load

library(tidyverse) # metapackage of all tidyverse packages

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

list.files(path = "../input")

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.3     ✔ purrr   0.3.4
✔ tibble  3.1.1     ✔ dplyr   1.0.5
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
# Tensorflow addons:
install.packages("tfaddons")
library(tfaddons)
tfaddons::install_tfaddons()

# Tensorflow / Keras:
library(tensorflow)
library(keras)

# Competition metric similar to the one in Demetry's notebook:
altloss <- function(y_true, y_pred){
  y_pred <- k_clip(y_pred, 0.0+10E-15, 1.0-10E-15)
  k_mean(metric_categorical_crossentropy(y_true,y_pred))
}

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)


Attaching package: ‘tfaddons’


The following object is masked from ‘package:readr’:

    parse_time




Using virtual environment '/usr/local/share/.virtualenvs/r-reticulate' ...
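To sanity-check `altloss` without TensorFlow, here is a base-R sketch of multi-class log loss. It assumes the usual Kaggle definition (clip probabilities to [1e-15, 1 - 1e-15], rescale each row to sum to 1, then average -log of the probability assigned to the true class); `mlogloss` is a hypothetical helper name, not part of any package. Note that `altloss` clips at `10E-15`, which R reads as 1e-14, and Keras's crossentropy may apply its own internal clipping, so the two numbers can differ slightly on extreme predictions.

```r
# Hypothetical helper: multi-class log loss, assuming Kaggle's usual definition.
# y_true: one-hot matrix (n x K); y_pred: matrix of predicted probabilities (n x K).
mlogloss <- function(y_true, y_pred, eps = 1e-15) {
  p <- pmin(pmax(y_pred, eps), 1 - eps)  # clip away from 0 and 1
  p <- p / rowSums(p)                    # rescale each row to sum to 1
  -mean(rowSums(y_true * log(p)))        # average -log p(true class)
}

# Toy check: 2 samples, 3 classes
toy_true <- rbind(c(1, 0, 0), c(0, 0, 1))
toy_pred <- rbind(c(0.7, 0.2, 0.1), c(0.1, 0.3, 0.6))
mlogloss(toy_true, toy_pred)   # (-log(0.7) - log(0.6)) / 2, about 0.434
```

For reference, a uniform guess over K classes scores log(K), so with 9 classes the baseline is log(9), about 2.197.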


In [3]:
# Input:
train <- read.csv('../input/tabular-playground-series-jun-2021/train.csv',row.names=1)
test <- read.csv('../input/tabular-playground-series-jun-2021/test.csv',row.names=1)
sample_submission <- read.csv('../input/tabular-playground-series-jun-2021/sample_submission.csv',row.names=1)

# Get training features matrix:
# (all columns except the last one, which is the target)
xall <- as.matrix(train[, -ncol(train)])

# Get training targets matrix as one-hot-encoding (see section 3.5.2 of the book by Chollet):
yint <- as.integer(gsub("Class_","",train[,ncol(train)])) - 1
yohe <- to_categorical(yint)

# Prepare out-of-fold and testing-set predictions:
yoof <- 0*yohe
ytest <- matrix(0,nrow(test),ncol(sample_submission),dimnames=list(rownames(test),colnames(sample_submission)))
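For readers without Keras at hand, `to_categorical()` turns 0-based integer labels into one-hot rows. The base-R sketch below builds the same kind of matrix from rows of an identity matrix; `one_hot` is a hypothetical name and the label strings are made-up toy values (named so they do not overwrite the notebook's `yint`). It also shows why the label parsing above subtracts 1: Keras expects classes numbered from 0.

```r
# Hypothetical helper mirroring keras::to_categorical() for 0-based integer labels.
one_hot <- function(y, num_classes = max(y) + 1) {
  # Row k + 1 of the identity matrix is the one-hot row for class k
  diag(num_classes)[y + 1, , drop = FALSE]
}

toy_labels <- c("Class_1", "Class_3", "Class_2", "Class_3")   # toy labels
toy_int <- as.integer(gsub("Class_", "", toy_labels)) - 1      # 0, 2, 1, 2
one_hot(toy_int)                                               # 4 x 3 matrix
```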

I got the idea of using an embedding layer on this dataset from Laurent's notebook. One thing I am still unsure about is why a single embedding layer shared by all the features works so well. I tried giving each feature its own embedding layer and concatenating the results, as shown below, but performance was worse. My guess is that the integer values carry the same meaning across many features (e.g. a value of 0 meaning 'lilies' in every column).

In [4]:
# Column-wise embedding layers:
# embedded <- list()
# for(i in 1:ncol(xall)){
#   layer <- input[,i] %>%
#   layer_embedding(max(xall[,i])+1,1)
#   embedded <- append(embedded,layer)
# }
# embedded <- layer_concatenate(embedded)
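What the shared `layer_embedding()` followed by `layer_flatten()` computes can be sketched in base R: one lookup table with a row per integer value, applied to every column, with the looked-up rows concatenated feature by feature. Because the same table row is used wherever a value appears, a given integer is forced to mean the same thing in every column, which is the hypothesis above. The table `E_toy` here is random and all names are illustrative; in the real model Keras learns the table during training.

```r
# Sketch: shared embedding lookup followed by flatten, for integer features >= 0.
embed_flatten <- function(x, E) {
  # x: n x p matrix of integer codes; E: (max code + 1) x d table shared by all columns
  n <- nrow(x); p <- ncol(x); d <- ncol(E)
  out <- matrix(0, n, p * d)
  for (j in seq_len(p)) {
    cols <- ((j - 1) * d + 1):(j * d)              # flatten keeps features in order, d values each
    out[, cols] <- E[x[, j] + 1, , drop = FALSE]   # +1 because R indexing starts at 1
  }
  out
}

set.seed(0)
toy_x <- matrix(c(0, 2, 1,
                  2, 0, 0), nrow = 2, byrow = TRUE)
E_toy <- matrix(rnorm((max(toy_x) + 1) * 2), ncol = 2)  # 3 possible values, 2 dims each
embed_flatten(toy_x, E_toy)                             # 2 x 6 matrix
```

A per-column version would instead keep a separate table for each column, so value 0 in column 1 and value 0 in column 2 could end up with unrelated vectors.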

In [5]:
# Model function using the Keras functional API in R (see section 7.1.1 of the book by Chollet):

get_model <- function(incols=ncol(xall), outcols=ncol(yohe)){
  input <- layer_input(shape=c(incols))
  
  # Embedding layer as in Laurent's notebook:
    
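  embedded <- input %>%
    # One embedding table shared by all features: input_dim = largest value + 1, output_dim = 2
    layer_embedding(input_dim = max(xall) + 1, output_dim = 2) %>%
    layer_flatten()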
  
  # Network with skip connections (implemented as concatenations) inspired by Demetry's notebook:
    
  hidden <- embedded %>%
    layer_dropout(0.2) %>%
    layer_weight_normalization(layer_dense(units=32,activation='selu',kernel_initializer="lecun_normal"))
  
  output <- layer_concatenate(list(embedded,hidden)) %>%
    layer_dropout(0.2) %>%
    layer_weight_normalization(layer_dense(units=32,activation='relu'))
  
  output <- layer_concatenate(list(embedded,hidden,output)) %>%
    layer_dropout(0.3) %>%
    layer_weight_normalization(layer_dense(units=32,activation='elu')) %>%
    layer_dense(units=outcols,activation='softmax')
  
  model <- keras_model(input,output)
  model %>% compile(
    optimizer=optimizer_adam(),
    loss=loss_categorical_crossentropy,
    metrics=custom_metric('altloss',altloss)
  )
    
  return(model)
}
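The data flow inside `get_model()` can be traced with a plain-R forward pass. This is a sketch under stated assumptions: random weights, toy sizes, dropout omitted (inference mode), and hypothetical helper names. Each hidden dense layer uses the weight-normalization reparameterization that `layer_weight_normalization()` wraps, where kernel column j is w_j = g_j · v_j / ||v_j||; TF Addons' default data-dependent initialization of `g` is not reproduced. The skips are concatenations, so each block sees the embedding plus every earlier block's output.

```r
# Activations used in get_model()
selu <- function(x) {
  scale <- 1.0507009873554805; alpha <- 1.6732632423543772
  scale * ifelse(x > 0, x, alpha * (exp(x) - 1))
}
relu <- function(x) pmax(x, 0)
elu <- function(x) ifelse(x > 0, x, exp(x) - 1)
softmax_rows <- function(z) {
  e <- exp(z - apply(z, 1, max))  # subtract row max for numerical stability
  e / rowSums(e)
}

# Weight normalization: kernel column j is g[j] * v[, j] / ||v[, j]||
wn_kernel <- function(v, g) sweep(v, 2, g / sqrt(colSums(v^2)), `*`)

# Hypothetical dense layer: act(x %*% w + b), with w weight-normalized when g is present
dense <- function(x, layer, act = identity) {
  w <- if (is.null(layer$g)) layer$v else wn_kernel(layer$v, layer$g)
  act(sweep(x %*% w, 2, layer$b, `+`))
}

# Random parameters for a layer with n_in inputs and n_out units
init_layer <- function(n_in, n_out, wn = TRUE) {
  list(v = matrix(rnorm(n_in * n_out, sd = 1 / sqrt(n_in)), n_in, n_out),
       g = if (wn) rep(1, n_out) else NULL,
       b = rep(0, n_out))
}

forward <- function(embedded, params) {
  hidden <- dense(embedded, params$l1, selu)
  out <- dense(cbind(embedded, hidden), params$l2, relu)
  out <- dense(cbind(embedded, hidden, out), params$l3, elu)
  dense(out, params$l4, softmax_rows)  # final softmax layer is not weight-normalized
}

set.seed(1)
n_emb <- 10; units <- 32; n_classes <- 9
params <- list(l1 = init_layer(n_emb, units),
               l2 = init_layer(n_emb + units, units),
               l3 = init_layer(n_emb + 2 * units, units),
               l4 = init_layer(units, n_classes, wn = FALSE))
toy_emb <- matrix(rnorm(4 * n_emb), 4, n_emb)  # stands in for the flattened embedding
probs <- forward(toy_emb, params)
dim(probs)      # 4 9
rowSums(probs)  # each row sums to 1
```

The input widths of the second and third layers (`n_emb + units` and `n_emb + 2 * units`) are where the concatenations show up: each later layer grows by the width of every block before it.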

In [6]:
# Stratified K-fold cross validation:

seeds <- 1
folds <- 10

for(seed in 1:seeds){
    
    # Section 4.2.1 of the book by Chollet shows how to do k-fold cross validation:
    # indices <- sample(1:nrow(xall))
    # folds <- cut(1:length(indices),breaks=10,labels=FALSE)
    
    # To stratify the folds by target class, I shuffle and split the rows of each class separately:
    set.seed(seed)
    indices <- data.frame(Index=1:nrow(xall),Class=yint)
    indices <- do.call(rbind,lapply(split(indices,indices$Class),function(x){
      x <- x[sample(1:nrow(x)),]
      x$Fold <- cut(1:nrow(x),breaks=folds,labels=FALSE)
      return(x)
    }))

    for(fold in 1:folds){
        
      # Get validation indices:
      validID <- indices[indices$Fold==fold,'Index']

      # Get feature matrices:
      xtrain <- xall[-validID,]
      xvalid <- xall[validID,]
      xtest <- as.matrix(test)

      # Training:
      model <- get_model(incols=ncol(xtrain), outcols=ncol(sample_submission))
      model %>% fit(
        xtrain, yohe[-validID,], validation_data=list(xvalid, yohe[validID,]),
        epochs=100, batch_size=256,
        callbacks=list(
          callback_reduce_lr_on_plateau(patience=2, factor=0.7),
          callback_early_stopping(patience=8, min_delta=1e-05),
          callback_model_checkpoint(paste0('mlp_',seed,'_',fold,'.h5'), 
                                    save_best_only=TRUE, save_weights_only=TRUE)
      ))
      load_model_weights_hdf5(model, paste0('mlp_',seed,'_',fold,'.h5'))
        
      # Inference:
      yoof[validID,] <- model %>% predict(xvalid)
      ytest <- ytest + model %>% predict(xtest) / folds / seeds
      print(paste('Seed =',seed,'; Fold =', fold,'; OOF log-loss =',as.numeric(altloss(yohe[validID,],yoof[validID,]))))
    }
    
    # Measure out-of-fold performance:
    print(paste('Full training set; OOF log-loss =',as.numeric(altloss(yohe,yoof))))
}

# Submit result:
submission <- data.frame("ID"=rownames(sample_submission),ytest)
write.csv(submission,file='submission.csv',row.names=FALSE,quote=FALSE)

[1] "Seed = 1 ; Fold = 1 ; OOF log-loss = 1.7471413138545"
[1] "Seed = 1 ; Fold = 2 ; OOF log-loss = 1.74715084023004"
[1] "Seed = 1 ; Fold = 3 ; OOF log-loss = 1.74113025788679"
[1] "Seed = 1 ; Fold = 4 ; OOF log-loss = 1.73564082499736"
[1] "Seed = 1 ; Fold = 5 ; OOF log-loss = 1.7379857807757"
[1] "Seed = 1 ; Fold = 6 ; OOF log-loss = 1.74579396582287"
[1] "Seed = 1 ; Fold = 7 ; OOF log-loss = 1.74211644906561"
[1] "Seed = 1 ; Fold = 8 ; OOF log-loss = 1.7306896518844"
[1] "Seed = 1 ; Fold = 9 ; OOF log-loss = 1.74928774768949"
[1] "Seed = 1 ; Fold = 10 ; OOF log-loss = 1.73910967239203"
[1] "Full training set; OOF log-loss = 1.74160472761929"
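The stratified fold assignment in the loop can also be packaged as a function that returns one fold number per row, in the original row order, which makes it easy to check the class balance. This is a sketch: `stratified_folds` is a hypothetical name, the toy labels are made up, and it follows the same per-class shuffle-and-`cut()` logic as the loop above.

```r
# Assign each row to one of k folds, shuffling and splitting within each class.
stratified_folds <- function(y, k, seed = 1) {
  set.seed(seed)
  fold <- integer(length(y))
  for (cls in sort(unique(y))) {
    idx <- which(y == cls)
    idx <- idx[sample.int(length(idx))]                       # shuffle rows of this class
    fold[idx] <- cut(seq_along(idx), breaks = k, labels = FALSE)
  }
  fold
}

toy_y <- rep(0:2, times = c(50, 30, 20))   # imbalanced toy labels
toy_fold <- stratified_folds(toy_y, k = 5)
table(toy_y, toy_fold)                     # 10, 6 and 4 rows of each class per fold
```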


Thank you for reading! Let me know if you have any questions or suggestions.