## Objective

To analyze the second-level effect, that is, how the location of the customer's city affects the delivery time.

Modeling: a multi-level GLMM that uses location features of the customer cities as second-level features.
This model is then compared with the base model (the GLMM proposed in analysis1).

In [None]:
library(R2WinBUGS)
library(dplyr)
library(readr)

#Directory where WinBUGS is installed
WinBUGS_path = "C:/Users/Ka Ho/Desktop/WinBUGS14/"

#Working directory
working_dir = "C:/Users/Ka Ho/Desktop/Projects/Analysis-of-Brazilian-Ecommerce-Dataset/StatisticalAnalysis"

setwd(working_dir)
dataset = read_csv("./dataset.csv")

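#Standardize a numeric vector to mean 0 and standard deviation 1 for modeling.
#A constant vector (standard deviation 0) is returned unchanged to avoid dividing by zero.
standardize <- function(x){
  mean_x = mean(x)
  sig_x = sd(x)
  if (sig_x > 0){
    return((x - mean_x) / sig_x)
  }
  else {
    return(x)
  }
}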

dataset$size = standardize(dataset$size)
dataset$order_products_value = standardize(dataset$order_products_value)
dataset$order_freight_value = standardize(dataset$order_freight_value)
dataset$distance = standardize(dataset$distance)
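
As a quick sanity check of the helper above, the sketch below applies the same logic to toy vectors (the values are made up for illustration and do not come from `dataset.csv`). A standardized column should end up with mean 0 and standard deviation 1, and a constant column should come back unchanged.

```r
# Copy of the helper's logic, repeated so that this block runs on its own
standardize <- function(x){
  mean_x = mean(x)
  sig_x = sqrt(var(x))
  if (sig_x > 0){
    return((x - mean_x) / sig_x)
  }
  else {
    return(x)
  }
}

# Toy input (illustrative values only)
toy = c(2, 4, 6, 8)
z = standardize(toy)

# Mean and standard deviation of the standardized values
print(c(mean = mean(z), sd = sd(z)))

# A constant vector has zero variance, so it is returned as-is
const = standardize(rep(5, 4))
print(const)
```

Note that a column containing `NA` would make `sig_x` equal to `NA`, and the `if` condition would then raise an error, so missing values would need to be handled before standardizing.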

Because the multi-level model is fitted with Bayesian inference, I first fit the base model with Bayesian inference as well, so that the two models can be compared fairly.

In [2]:
#Base model: prepare the data and the initial parameter values for WinBUGS
K = nrow(dataset)
n_clust = length(unique(dataset$cluster))
Y = dataset$delivery_time
X = cbind(dataset$size, dataset$order_products_value, dataset$order_freight_value)
Z = dataset$distance
i = dataset$cluster

mu_b = rep(0, 3)
tau_b = diag(rep(0.0001, 3))

data = list(K=K, n_clust=n_clust, Y=Y, X=X, Z=Z, mu_b=mu_b, tau_b=tau_b, i=i)
bugs.data(data, dir="../WinBUGS_code/", data.file = "data_base.txt")

init = list(beta=rep(0,3), m1=0, m2=0, tau_a=10, tau_u=10, b=0.5)
para = c("m1", "beta", "m2", "sig_u", "sig_a", "b")
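
WinBUGS parameterizes normal distributions by precision (inverse variance) rather than by variance, so `tau_b` above is a precision matrix. The sketch below converts those values to the standard-deviation scale. It assumes that, in `base.txt` (not shown here), `mu_b` and `tau_b` define a multivariate normal prior on `beta`, and that `sig_a` and `sig_u` are defined as `1/sqrt(tau_a)` and `1/sqrt(tau_u)`. Both points are assumptions about the model file.

```r
# Prior precision matrix, as passed to WinBUGS above
tau_b = diag(rep(0.0001, 3))

# Covariance is the inverse of the precision matrix
prior_var = solve(tau_b)

# Implied prior standard deviation of each coefficient (assumed to be beta)
prior_sd = sqrt(diag(prior_var))
print(prior_sd)

# Initial precision of 10 expressed as a standard deviation
# (assumes sig = 1 / sqrt(tau) in the model file)
tau_init = 10
sig_init = 1 / sqrt(tau_init)
print(sig_init)
```

Under these assumptions, a precision of 0.0001 corresponds to a prior standard deviation of 100, which is very diffuse relative to predictors that were standardized to unit variance.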

In [4]:
sim = bugs(data="data_base.txt", inits=list(init),
           parameters.to.save=para,
           model.file="base.txt",
           n.chains=1, n.iter=1000,
           bugs.directory = WinBUGS_path,
           working.directory="../WinBUGS_code/")
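
The call above runs a single chain, so no between-chain convergence diagnostic is available. If several chains were wanted, `bugs()` expects `inits` to be a list with one element per chain. The sketch below builds such a list. `make_inits` is a hypothetical helper, and the ranges it draws from are illustrative assumptions, not values used in this analysis.

```r
# Hypothetical helper: one list of starting values per chain.
# The parameter names match the init list above; the sampling ranges are assumptions.
make_inits <- function(n_chains, seed = 1){
  set.seed(seed)
  lapply(seq_len(n_chains), function(chain){
    list(beta = rnorm(3, 0, 0.1),
         m1 = rnorm(1, 0, 0.1),
         m2 = rnorm(1, 0, 0.1),
         tau_a = runif(1, 1, 20),   # precisions must stay positive
         tau_u = runif(1, 1, 20),
         b = runif(1, 0.1, 0.9))    # spread around the original start of 0.5
  })
}

inits_3 = make_inits(3)
print(length(inits_3))
print(names(inits_3[[1]]))
```

Such a list could then be passed as `inits=inits_3` together with `n.chains=3` in place of `inits=list(init)` and `n.chains=1`.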

In [9]:
sim

cbind(sim$mean, sim$sd)

Inference for Bugs model at "base.txt", fit using WinBUGS,
 1 chains, each with 1000 iterations (first 500 discarded)
 n.sims = 500 iterations saved
             mean      sd     2.5%      25%      50%      75%    97.5%
m1            0.3     0.1      0.2      0.3      0.3      0.4      0.4
beta[1]       0.0     0.0      0.0      0.0      0.0      0.0      0.0
beta[2]       0.0     0.0      0.0      0.0      0.0      0.0      0.0
beta[3]       0.0     0.0      0.0      0.0      0.0      0.0      0.0
m2            0.1     0.0      0.0      0.0      0.1      0.1      0.1
sig_u         0.6     0.1      0.5      0.5      0.6      0.6      0.6
sig_a         0.4     0.0      0.4      0.4      0.4      0.5      0.5
b             0.1     0.0      0.1      0.1      0.1      0.2      0.2
deviance 743168.8 23975.9 711400.0 722000.0 739000.0 761850.0 790962.5

DIC info (using the rule, pD = Dbar-Dhat)
pD = 2972.4 and DIC = 746141.0
DIC is an estimate of expected predictive error (lower deviance is 

          mean                                     sd
m1        0.3358378                                0.05993527
beta      0.03887712, -0.01695028, 0.03059388      0.002505872, 0.005012632, 0.003737089
m2        0.06365648                               0.0182579
sig_u     0.562332                                 0.05422203
sig_a     0.4428546                                0.016783
b         0.147537                                 0.01080099
deviance  743168.8                                 23975.89
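
The printed summary rounds to one decimal, so the `beta` coefficients all appear as 0.0 even though the unrounded output of `cbind(sim$mean, sim$sd)` shows them to be several standard deviations away from zero. The sketch below makes this concrete by building rough 95% intervals from the posterior means and standard deviations reported above. It assumes a normal approximation to each posterior, which is only a rough guide; with the fitted object at hand, the sampled quantiles would be the better source. `sig_u` and `sig_a` are left out because they are positive by construction.

```r
# Posterior means and standard deviations copied from the output above
post = data.frame(
  parameter = c("m1", "beta[1]", "beta[2]", "beta[3]", "m2", "b"),
  mean = c(0.3358378, 0.03887712, -0.01695028, 0.03059388, 0.06365648, 0.147537),
  sd = c(0.05993527, 0.002505872, 0.005012632, 0.003737089, 0.0182579, 0.01080099)
)

# Approximate 95% interval: mean +/- 1.96 * sd (normal approximation, an assumption)
z975 = qnorm(0.975)
post$lower = post$mean - z975 * post$sd
post$upper = post$mean + z975 * post$sd

# Flag the parameters whose approximate interval does not contain zero
post$excludes_zero = post$lower > 0 | post$upper < 0

print(post, digits = 3)
```

Under this approximation, all six intervals exclude zero, including the three `beta` coefficients that the rounded summary displays as 0.0.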
