## Objective

To analyze the second-level effect: how the location of the customer's city affects delivery time.

Modeling: a multi-level GLMM that uses location features of the customer cities as second-level features.
This model is then compared with the base model (the GLMM proposed in analysis1).

In [1]:
library(R2WinBUGS)
library(readr)

#Directory of the WinBUGS installation
WinBUGS_path = "C:/Users/s1155063404/Downloads/WinBUGS14/"

#Working directory
working_dir = "C:/Users/s1155063404/Desktop/Projects/brazilian-ecommerce-dataset/StatisticalAnalysis"

setwd(working_dir)
dataset = read_csv("./dataset.csv")

#Standardize the observations for modeling (mean 0, standard deviation 1).
#A constant vector (zero standard deviation) is returned unchanged.
standardize <- function(x){
  mean_x = mean(x)
  sig_x = sd(x)
  if (sig_x > 0){
    return((x - mean_x) / sig_x)
  }
  else {
    return(x)
  }
}

dataset$size = standardize(dataset$size)
dataset$order_products_value = standardize(dataset$order_products_value)
dataset$order_freight_value = standardize(dataset$order_freight_value)
dataset$distance = standardize(dataset$distance)

Loading required package: coda
Loading required package: boot
Parsed with column specification:
cols(
  size = col_double(),
  order_products_value = col_double(),
  order_freight_value = col_double(),
  distance = col_double(),
  cluster = col_integer(),
  customer_state = col_character(),
  customer_city = col_character(),
  delivery_time = col_double()
)
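
As a quick sanity check of the standardization step, the sketch below applies the same logic to toy vectors. The input values are hypothetical and not taken from the dataset; the function is redefined here only so the snippet runs on its own.

```r
# Same logic as the notebook's standardize(), repeated so this runs standalone
standardize <- function(x) {
  sig_x <- sd(x)
  if (sig_x > 0) (x - mean(x)) / sig_x else x
}

x <- c(2, 4, 6, 8)          # hypothetical toy values
z <- standardize(x)

mean(z)                     # numerically zero
sd(z)                       # numerically one

# A constant column has zero standard deviation and is left untouched
standardize(c(5, 5, 5))
```

Standardizing the covariates puts the regression coefficients on a comparable scale, which also makes one shared vague prior for all of them easier to justify.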


The multi-level model is fitted by Bayesian inference, so for a fair comparison I first fit the base model by Bayesian inference as well.

In [2]:
#Base model: prepare the data and initial parameter values for WinBUGS
K = nrow(dataset)
n_clust = length(unique(dataset$cluster))
Y = dataset$delivery_time
X = cbind(dataset$size, dataset$order_products_value, dataset$order_freight_value)
Z = dataset$distance
i = dataset$cluster

mu_b = rep(0, 3)
tau_b = diag(rep(0.0001, 3))

data = list(K=K, n_clust=n_clust, Y=Y, X=X, Z=Z, mu_b=mu_b, tau_b=tau_b, i=i)
bugs.data(data, dir="../WinBUGS_code/", data.file = "data_base.txt")

init = list(beta=rep(0,3), m1=0, m2=0, tau_a=10, tau_u=10, b=0.5)
para = c("m1", "beta", "m2", "sig_u", "sig_a", "b")
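
BUGS parameterizes the normal distribution by its precision (the inverse of the variance), which is why the cell above sets `tau_b` and initializes `tau_a` and `tau_u` while monitoring `sig_a` and `sig_u`. The sketch below converts the values used above into standard deviations. It assumes that `base.txt` (not shown here) defines each `sig` as `1 / sqrt(tau)`; the variable names are illustrative only.

```r
# Prior precision placed on each fixed-effect coefficient (diagonal of tau_b)
tau_prior <- 0.0001
prior_sd  <- 1 / sqrt(tau_prior)   # standard deviation implied by the prior

# Initial precision given to tau_a and tau_u in the init list
tau_init <- 10
sig_init <- 1 / sqrt(tau_init)     # assumed starting value of sig_a and sig_u

prior_sd
sig_init
```

A prior standard deviation of 100 on coefficients of standardized covariates is very diffuse, so the posterior for `beta` should be driven mostly by the data.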

In [None]:
#Run WinBUGS for Bayesian Inference
sim = bugs(data="data_base.txt", inits=list(init),
           parameters.to.save=para,
           model.file="base.txt",
           n.chains=1, n.iter=1000,
           bugs.directory = WinBUGS_path,
           working.directory="../WinBUGS_code/")
save(sim, file="sim_base.RData")
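
The call above leaves `n.burnin` and `n.thin` at their defaults. The sketch below works out how many posterior draws that keeps, using the default expressions from the `bugs()` signature as I understand it from the R2WinBUGS documentation (`n.burnin = floor(n.iter/2)`, `n.sims = 1000`); treat these defaults as an assumption and check them against the installed version.

```r
n_chains <- 1
n_iter   <- 1000
n_sims   <- 1000                                   # assumed R2WinBUGS default

n_burnin <- floor(n_iter / 2)                      # iterations discarded as burn-in
n_thin   <- max(1, floor(n_chains * (n_iter - n_burnin) / n_sims))
n_kept   <- n_chains * (n_iter - n_burnin) / n_thin

c(burnin = n_burnin, thin = n_thin, kept = n_kept)
```

With a single chain, between-chain diagnostics such as Rhat are not available, so a longer run or several chains may be worth considering before comparing the two models.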

In [None]:
sim

#sim$mean and sim$sd are lists, so read the posterior means and SDs from the summary matrix
round(sim$summary[, c("mean", "sd")], digits=3)