duplicated profiles within a question #27

shigono · 2023-08-09T09:32:19Z

Hello,
I think I found that cbc_design() can generate designs that contain errors. Please see the demo code below.

library(cbcTools)
profiles_3x3x2 <- cbc_profiles(
  A = c("A1", "A2", "A3"), 
  B = c("B1", "B2", "B3"), 
  C = c("C1", "C2")
)
set.seed(123)
demo_CEA <- cbc_design(
  profiles = profiles_3x3x2,  
  n_resp   = 10,      
  n_alts   = 3,       
  n_q      = 7,       
  no_choice = FALSE,  
  method   = 'CEA',   
  priors   = list(A = c(0,0), B = c(0, 0), C = 0) 
)
print(demo_CEA[demo_CEA$respID == 1 & demo_CEA$qID == 1,])

The output is:

  profileID respID qID altID obsID blockID  A  B  C
1        16      1   1     1     1       1 A1 B3 C2
2        14      1   1     2     1       1 A2 B2 C2
3        14      1   1     3     1       1 A2 B2 C2

The profiles for altID = 2 and 3 are duplicated.
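
A check like this can be automated. The sketch below is only an illustration: `find_duplicate_alts()` is a hypothetical helper, not part of cbcTools, and it assumes the design has the `respID`, `qID`, and `profileID` columns shown in the output above.

```r
# Rows copied from the output above
design <- data.frame(
  profileID = c(16, 14, 14),
  respID    = c(1, 1, 1),
  qID       = c(1, 1, 1),
  altID     = c(1, 2, 3),
  A = c("A1", "A2", "A2"),
  B = c("B3", "B2", "B2"),
  C = c("C2", "C2", "C2")
)

# Hypothetical helper: return every row whose profile appears more than
# once within the same respondent and question
find_duplicate_alts <- function(design) {
  key <- design[c("respID", "qID", "profileID")]
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  design[dup, ]
}

dups <- find_duplicate_alts(design)
print(dups)
```

An empty result would mean no question shows the same profile twice.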

I guess this behavior comes from join_profiles() in design.R. At line 332, it merges design with profiles, and the merge doesn't retain the row order of design.

Just to be sure, my sessionInfo() is:

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=Japanese_Japan.utf8  LC_CTYPE=Japanese_Japan.utf8    LC_MONETARY=Japanese_Japan.utf8
[4] LC_NUMERIC=C                    LC_TIME=Japanese_Japan.utf8    

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cbcTools_0.5.0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.3       cli_3.6.1         rlang_1.1.1       promises_1.2.0.1  generics_0.1.3   
 [6] shiny_1.7.4.1     xtable_1.8-4      glue_1.6.2        colorspace_2.1-0  htmltools_0.5.5  
[11] httpuv_1.6.11     scales_1.2.1      fansi_1.0.4       grid_4.3.1        munsell_0.5.0    
[16] tibble_3.2.1      ellipsis_0.3.2    MASS_7.3-60       fastmap_1.1.1     lifecycle_1.0.3  
[21] idefix_1.0.3      compiler_4.3.1    dplyr_1.1.2       Rcpp_1.0.11       pkgconfig_2.0.3  
[26] later_1.3.1       rstudioapi_0.15.0 digest_0.6.33     R6_2.5.1          tidyselect_1.2.0 
[31] utf8_1.2.3        Rdpack_2.4        pillar_1.9.0      rbibutils_2.2.13  magrittr_2.0.3   
[36] tools_4.3.1       gtable_0.3.3      mime_0.12         ggplot2_3.4.2

Thank you,
Shigeru ONO

jhelvy · 2023-08-09T13:10:03Z

Hi Shigeru, thanks for noticing this. I've had issues with this in the past and thought I had them all resolved, but apparently not. I'll look into it. The issue is indeed in the joining. It should be preserving the row order when joining.

jhelvy · 2023-08-09T14:06:17Z

Okay I've patched this. I actually previously had a more manual check to ensure the row ordering would be preserved during the join, but then removed it because I thought setting sort = FALSE in merge would prevent it from changing the row ordering. I guess I was wrong! Best to just keep the manual check in there.
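
For readers hitting the same problem: the documentation for base R's merge() says that with sort = FALSE the rows come back in an unspecified order, so the original order is not guaranteed. The sketch below shows two generic ways to keep the order of the left-hand table. It uses toy data and is not the actual patch in cbcTools; the `row_order` column and the object names are assumptions made for illustration.

```r
# Toy profiles table and a design that references profiles by ID
profiles <- data.frame(
  profileID = 1:4,
  A = c("A1", "A2", "A1", "A2"),
  B = c("B1", "B1", "B2", "B2")
)
design <- data.frame(
  profileID = c(3, 1, 4, 1, 2),
  altID     = c(1, 2, 3, 1, 2)
)

# Option 1: tag each row before merging, then restore the original order
design$row_order <- seq_len(nrow(design))
merged <- merge(design, profiles, by = "profileID", sort = FALSE)
merged <- merged[order(merged$row_order), ]
merged$row_order <- NULL
rownames(merged) <- NULL

# Option 2: look up the matching profile rows with match(),
# which returns positions in the order of design$profileID
idx <- match(design$profileID, profiles$profileID)
looked_up <- cbind(design[c("profileID", "altID")], profiles[idx, c("A", "B")])
rownames(looked_up) <- NULL
```

Both results list the rows in the same order as `design`, which is the property the join needs here.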

Try installing this version:

remotes::install_github("jhelvy/cbcTools")

Does that fix the issue?

shigono · 2023-08-10T00:27:26Z

Thank you for your quick response. I confirmed that the issue is fixed.

Please let me say thank you for making a nice package. I wrote an introduction to the package (sorry it is in Japanese ...) https://elsur.jpn.org/202308ConjointDoE/doe_for_dcm.html

jhelvy · 2023-08-10T10:34:09Z

Wow what a great post! This package is still relatively new and hasn't had too many users to test it, so thanks for taking the time to work through the different design methods. Using Google translate I could mostly understand your post, and it looks like you found some other bugs too.

As you can see from the code, the cbc_design() function relies on many other packages for generating designs. My goal was not to re-create everyone else's work but instead make it easier to use them. The rest of the package provides a workflow to further evaluate designs by simulating choice data and examining statistical power to obtain an estimated sample size requirement, so the cbc_design() part is only one piece of the puzzle.

I'm not 100% sure if the translation to English was correct, but here are some things that I think you pointed out - please let me know if I have them correct:

For the full and orthogonal methods, cbc_design() returns designs containing missing values (NA) when the number of possible profiles is less than the number of trials. I agree with you - that should generate an error, and the user should be informed. Would you mind posting that specific issue as a separate issue? That will make fixing it easier to track.
I really like the summary statistics you created for inspecting the designs. Those are very helpful and I may create some functions that automate them for users to more easily inspect their designs.
The suggestion at the end about randomizing the order of designs across users is also a good idea. For all the methods except random, I could very easily add a randomize = TRUE argument that simply shuffles the order of the questions across all respondents. You could always set it to FALSE, but it seems like a good idea to randomize the question order.
The overlap issue with the random method looks pretty bad...I think I could quite easily adjust the algorithm to swap out profiles to avoid full overlap of any one attribute within any one choice set. That's an easy fix and still maintains a randomized design. This could also be an argument to the function, e.g. avoid_full_overlap = TRUE.
The issue with the orthogonal method is important. I have had a difficult time with this method as it seems it's virtually impossible to obtain a truly "orthogonal" design. I don't know if anything else can be done to improve it though. In reality, no design will be truly orthogonal anyway as not every respondent is perfect. As soon as you have to drop poor respondents from your experiment, the resulting data will no longer be orthogonal.
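
Two of the ideas in the list above (shuffling question order per respondent, and detecting full overlap of an attribute within a choice set) can be sketched in base R. This is only an illustration on toy data: `shuffle_questions()` and `full_overlap()` are hypothetical helpers, and the `randomize` and `avoid_full_overlap` arguments mentioned above are proposals in this thread, not arguments this sketch relies on.

```r
# Toy design: 2 respondents x 3 questions x 2 alternatives
design <- data.frame(
  respID = rep(1:2, each = 6),
  qID    = rep(rep(1:3, each = 2), times = 2),
  altID  = rep(1:2, times = 6),
  A      = c("A1", "A2", "A3", "A3", "A2", "A1",
             "A1", "A3", "A2", "A2", "A3", "A1"),
  B      = c("B1", "B2", "B1", "B3", "B2", "B2",
             "B3", "B1", "B2", "B3", "B1", "B1")
)

# Hypothetical helper: shuffle the question order independently for each
# respondent while keeping the alternatives of a question together
shuffle_questions <- function(design) {
  pieces <- lapply(split(design, design$respID), function(d) {
    new_order <- sample(unique(d$qID))  # random permutation of the questions
    d$qID <- match(d$qID, new_order)    # relabel questions as 1..n in the new order
    d[order(d$qID, d$altID), ]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Hypothetical helper: for each respondent and question, flag attributes
# that take a single level across all alternatives (full overlap)
full_overlap <- function(design, attrs) {
  aggregate(
    design[attrs],
    by  = design[c("respID", "qID")],
    FUN = function(x) length(unique(x)) == 1
  )
}

set.seed(123)
shuffled <- shuffle_questions(design)
overlap  <- full_overlap(design, c("A", "B"))
```

In `overlap`, a TRUE in column `A` or `B` marks a choice set where every alternative shares the same level of that attribute; those are the sets a swap step would need to repair.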

shigono · 2023-08-11T06:10:49Z

I'm very honored that you read my poor writing. Taking advantage of your kindness, I'll post three new issues (NA in full/orthogonal design, randomizing, minimizing overlap).

Yes, I believe one of the great values of your package is that it supports our workflow from making a design to evaluating it. I'd like to add some demonstrations using cbc_choices() and cbc_power() to my post, after studying your JSS article on logitr.

Thank you for your comment on the orthogonal method, which helps me a lot. Some researchers recommend the traditional DoE approach based on orthogonal arrays, but I think, as you said, it is not necessarily useful for the current practice of CBC.

jhelvy · 2023-08-11T17:33:04Z

Awesome. I'll close this issue out for now, but I've added the other 3 to my todo list.

jhelvy closed this as completed in 449ed0a Aug 9, 2023

jhelvy reopened this Aug 10, 2023

This was referenced Aug 11, 2023

cbc_design(method = full/orthogonal) returns a design with NA #28

Open

request: randomize question/alternatives par respondents #29

Open

request: minimize overlap or avoid full overlap in cbc_design() #30

Open

jhelvy closed this as completed Aug 11, 2023
