***
#### Just run this cell to load libraries and verify that Stan is installed

In [28]:
library(cmdstanr)
library(posterior)
set_cmdstan_path('/opt/conda/bin/cmdstan')
cmdstan_version()
rpath <- "/home/sagemaker-user/SharedGit/RCode" # Location of R files
source(file.path(rpath,"EstimationEcosystems.R")) # Source helper functions

CmdStan path set to: /opt/conda/bin/cmdstan



***
####  1) Create a subfolder in the Projects directory and import your csv data file there 
Then specify in the first three lines:  
><code style="color:yellow">my_folder</code> is the name of the folder you created under Projects   
><code style="color:yellow">my_csv_file</code> is the name of the csv file (in my_folder)   
><code style="color:yellow">out_prefix</code> is an alphanumeric descriptive prefix for all output files (e.g. a project code); no "/" or special symbols   
Advanced users can ignore my_folder and set the directories manually.
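
As a rough sketch, the three required entries can be checked before the data is read. The function `check_entries` and its `root` argument are hypothetical and not part of the sourced helper files; the rules it applies (alphanumeric prefix, csv extension, file present in the folder) are taken from the description above.

```r
# Hypothetical helper (not part of EstimationEcosystems.R): returns a character
# vector describing any problems found with the three required entries.
check_entries <- function(my_folder, my_csv_file, out_prefix,
                          root = "/home/sagemaker-user/Projects") {
  problems <- character(0)
  # out_prefix should be alphanumeric only (no "/" or special symbols)
  if (!grepl("^[A-Za-z0-9]+$", out_prefix)) {
    problems <- c(problems, "out_prefix must be alphanumeric")
  }
  # the data file is expected to be a csv file
  if (!grepl("\\.csv$", my_csv_file, ignore.case = TRUE)) {
    problems <- c(problems, "my_csv_file should end in .csv")
  }
  # the data file should sit inside my_folder under the Projects root
  if (!file.exists(file.path(root, my_folder, my_csv_file))) {
    problems <- c(problems, "data file not found in my_folder")
  }
  problems
}

# An empty result means no problems were detected
check_entries("Training", "Data_Train.csv", "SawComp")
```

Running the check first gives a readable message instead of a file error from the sourced script.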

In [30]:
################    REQUIRED ENTRIES       #####################
my_folder <- "Training" # NAME OF your folder in Projects
my_csv_file <- "Data_Train.csv" 
out_prefix <- "SawComp" # Prefix for output files (descriptive string no /)

################    OPTIONAL       #####################
my_stan_code <- "BaseHB_wMNL.stan" # Optional: Specify other stan model here in subfolder StanCode
dir_work <- file.path("/home/sagemaker-user/Projects", my_folder) # path for output files
dir_data <- file.path("/home/sagemaker-user/Projects", my_folder) # path for data
dir_stan <- "/home/sagemaker-user/SharedGit/StanCode" # path for stan code

################ LEAVE BELOW ALONE #####################
source(file.path(rpath,"getdata_and_stan.R"))

Working directory for output set to: /home/sagemaker-user/Projects/Training

Read data into R file: data_in



'data.frame':	28800 obs. of  10 variables:
 $ CaseID                           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Task                             : int  1 1 1 1 2 2 2 2 3 3 ...
 $ Concept                          : int  1 2 3 4 1 2 3 4 1 2 ...
 $ Att.1...Destination.             : int  2 3 4 1 6 3 6 5 2 3 ...
 $ Att.2...Cruise.Line.             : int  5 1 4 3 6 2 1 2 4 5 ...
 $ Att.3...Number.of.Days.          : int  2 1 3 5 3 2 1 4 5 3 ...
 $ Att.4...Stateroom.               : int  2 1 2 3 1 3 2 3 3 2 ...
 $ Att.5...Ship.Amenities.Age.      : int  1 1 2 2 1 2 2 1 1 2 ...
 $ Att.6...Price.per.Person.per.Day.: int  2 1 5 3 1 2 4 5 4 3 ...
 $ Response                         : int  0 1 0 0 1 0 0 0 1 0 ...


Compiling Stan code...

DONE



#### 2) Specify Coding 

In [29]:
idtaskdep <- data_in[,c(1,2,10)] # column numbers of id, task, dep

### SPECIFY CODING BELOW FOR EACH ATTRIBUTE ####
indcode_spec <- list()
indcode_spec[[1]] <- catcode(data_in, 4) 
indcode_spec[[2]] <- catcode(data_in, 5) 
indcode_spec[[3]] <- catcode(data_in, 6) 
indcode_spec[[4]] <- catcode(data_in, 7) 
indcode_spec[[5]] <- catcode(data_in, 8) 
indcode_spec[[6]] <- ordcode(data_in, 9, cut_pts = c(1,2,3,4,5), varout = "ppd") # Ordinal Code
#################################################

### Put coding together ###
message("Chosen id, task, dep:")
str(idtaskdep)
indcode_list <- make_codefiles(indcode_spec)
write.table(cbind(rownames(indcode_list$code_master), indcode_list$code_master), file = file.path(dir_work, paste0(out_prefix,"_code_master.csv")), sep = ",", na = ".", row.names = FALSE)


Chosen id, task, dep:



'data.frame':	28800 obs. of  3 variables:
 $ CaseID  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Task    : int  1 1 1 1 2 2 2 2 3 3 ...
 $ Response: int  0 1 0 0 1 0 0 0 1 0 ...


Code master file has the following coded parameters:



 [1] "Att.1...Destination._1"        "Att.1...Destination._2"       
 [3] "Att.1...Destination._3"        "Att.1...Destination._4"       
 [5] "Att.1...Destination._5"        "Att.2...Cruise.Line._1"       
 [7] "Att.2...Cruise.Line._2"        "Att.2...Cruise.Line._3"       
 [9] "Att.2...Cruise.Line._4"        "Att.2...Cruise.Line._5"       
[11] "Att.3...Number.of.Days._1"     "Att.3...Number.of.Days._2"    
[13] "Att.3...Number.of.Days._3"     "Att.3...Number.of.Days._4"    
[15] "Att.4...Stateroom._1"          "Att.4...Stateroom._2"         
[17] "Att.5...Ship.Amenities.Age._1" "ppd_1_2"                      
[19] "ppd_2_3"                       "ppd_3_4"                      
[21] "ppd_4_5"                      
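
The categorical attributes each produce one fewer coded column than they have levels (for example, 6 destinations give 5 parameters). The sketch below assumes this is effects coding with the last level as the reference, which is consistent with the first column of `ind` printed in step 4, but the internals of `catcode` are not shown here. It uses base R's `contr.sum` rather than the helper itself.

```r
# First ten Destination levels from the str() output in step 1
dest <- c(2, 3, 4, 1, 6, 3, 6, 5, 2, 3)

# Effects (sum-to-zero) coding for a 6-level attribute:
# levels 1-5 each get their own column, level 6 is coded -1 in every column
codes <- contr.sum(6)

# Look up the coded row for each observed level
coded <- codes[dest, , drop = FALSE]
colnames(coded) <- paste0("Att.1...Destination._", 1:5)

# First coded column: 1 where Destination is 1, -1 where it is 6, else 0
unname(coded[, 1])
```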


***
#### 3) Specify constraints

In [31]:
### Specify constraints (sign only) ###
con_sign <- rep(0,ncol(indcode_list$code_master))
con_sign[18:21] <- -1 # Negative utilities for price slopes
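
Hard-coded positions such as `18:21` shift whenever the coding in step 2 changes. A minimal sketch of selecting the price parameters by name instead is shown below; the vector `par_names` is rebuilt by hand from the printed parameter list, and in the notebook it is assumed (not confirmed) that the same names are available as `colnames(indcode_list$code_master)`.

```r
# Parameter names as printed by the code master output in step 2
par_names <- c(
  paste0("Att.1...Destination._", 1:5),
  paste0("Att.2...Cruise.Line._", 1:5),
  paste0("Att.3...Number.of.Days._", 1:4),
  paste0("Att.4...Stateroom._", 1:2),
  "Att.5...Ship.Amenities.Age._1",
  paste0("ppd_", 1:4, "_", 2:5)
)

# Start unconstrained, then force a negative sign on every price slope
con_sign <- rep(0, length(par_names))
con_sign[grepl("^ppd_", par_names)] <- -1

# Positions that received a constraint
which(con_sign == -1)
```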

***  
#### 4) Specify threads and modeling specs
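
The comment in the cell for this step notes that `threads_per_chain` depends on the number of cores. One way to pick a value is sketched here under assumptions: it uses `parallel::detectCores()` from base R, splits the cores evenly across chains, and caps the result at 16 (the upper end of the range mentioned in the comment). The variable `threads_suggested` is illustrative and is not read by the estimation scripts.

```r
library(parallel)

# Number of cores visible to R; detectCores() can return NA on some systems
cores <- detectCores()
if (is.na(cores)) cores <- 1L

parallel_chains <- 2L

# Share cores across chains, keeping at least 1 and at most 16 threads per chain
threads_per_chain <- min(16L, max(1L, cores %/% parallel_chains))

threads_suggested <- list(tot_chains = 2L,
                          parallel_chains = parallel_chains,
                          threads_per_chain = threads_per_chain)
str(threads_suggested)
```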

In [32]:
# Specify threads
threads <- list(tot_chains = 2, # Typically 2
                parallel_chains = 2, # Typically 2
                threads_per_chain = 1) # Depends on Cores, 8 to 16 is desirable

# Modeling parameters. Defaults are usually fine
data_model <- list(
  iter_warmup = 200, # warmup of 400 is plenty
  iter_sampling = 400, # sampling of 400 is plenty
  df = 2,
  prior_cov_scale = 1,
  con_sign = con_sign,
  # Stan specific below (no changes needed)
  refresh = 50,
  agg_model = NULL,
  tag = NULL
)

### Prepare data for Stan (Optional changes) ####
data_stan <- prep_file_stan(idtaskdep, indcode_list)
data_stan$splitsize <- round(.5 + data_stan$T/(4 * threads[[3]])) # For multi-threading; 1 = Stan automatic
message("Stan file prepared")
str(modifyList(data_stan, data_model))

Stan file prepared



List of 27
 $ N              : int 28800
 $ P              : int 21
 $ T              : int 7200
 $ I              : int 600
 $ dep            : Named num [1:28800] 0 1 0 0 1 0 0 0 1 0 ...
  ..- attr(*, "names")= chr [1:28800] "1" "1" "1" "1" ...
 $ ind            : num [1:28800, 1:21] 0 0 0 1 -1 0 -1 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:21] "Att.1...Destination._1" "Att.1...Destination._2" "Att.1...Destination._3" "Att.1...Destination._4" ...
 $ idtask         :'data.frame':	28800 obs. of  2 variables:
  ..$ CaseID: int [1:28800] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ Task  : int [1:28800] 1 1 1 1 2 2 2 2 3 3 ...
 $ idtask_r       : int [1:28800] 1 1 1 1 2 2 2 2 3 3 ...
 $ resp_id        : int [1:600] 1 3 5 6 7 9 10 12 14 16 ...
 $ match_id       : int [1:28800] 1 1 1 1 1 1 1 1 1 1 ...
 $ task_individual: int [1:7200] 1 1 1 1 1 1 1 1 1 1 ...
 $ start          : num [1:7200] 1 5 9 13 17 21 25 29 33 37 ...
 $ end            : int [1:7200] 4 8 12 16 20 2
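
The dimensions in this list can be sanity-checked against each other. The sketch below copies `N`, `T`, and `I` from the output above and assumes every task shows the same number of concepts, which matches the printed `start` and `end` values; `T_tasks` is used as a name to avoid masking R's `T` shorthand.

```r
# Values copied from the str() output above
N <- 28800L        # rows (concepts shown)
T_tasks <- 7200L   # choice tasks
I <- 600L          # respondents

concepts_per_task <- N / T_tasks   # rows per task
tasks_per_resp <- T_tasks / I      # tasks per respondent

# Row ranges of each task, assuming a fixed number of concepts per task
start <- seq(1, N, by = concepts_per_task)
end <- start + concepts_per_task - 1

c(concepts_per_task = concepts_per_task, tasks_per_resp = tasks_per_resp)
head(start)
head(end)
```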

***
#### Last cell below can just be run  
**-- Estimate Model, Save Results and Evaluate Convergence --**


In [None]:
source(file.path(rpath, "estimate_stan.R"))
source(file.path(rpath, "checkconverge_and_export.R"))
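
The contents of `checkconverge_and_export.R` are not shown here, so the following is only an illustration of one common convergence diagnostic, R-hat, computed with the `posterior` package loaded in the first cell. The draws are simulated stand-ins for a single parameter from 2 chains of 400 sampling iterations, not output from the model.

```r
library(posterior)

set.seed(1)
# Simulated draws for one parameter: 400 iterations (rows) x 2 chains (columns)
draws_one_param <- matrix(rnorm(400 * 2), nrow = 400, ncol = 2)

# R-hat compares between-chain and within-chain variation;
# values close to 1 suggest the chains agree with each other
r <- rhat(draws_one_param)
r
```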
