
Bug in creating Bayesian D-efficient designs when not all combinations of levels are allowed #10

Closed
ksvanhorn opened this issue May 4, 2023 · 30 comments


@ksvanhorn

I ran the following code:

cbcTools::cbc_design(profiles, n_resp, n_alts, n_q, priors = priors)

where n_resp = 735, n_alts = 4, n_q = 50, there are 12 discrete attributes with numbers of levels 12, 6, 3, ..., 3, 4, 4, and profiles contains only 1932 rows, far fewer than 12 * 6 * 3 * ... * 3 * 4 * 4 = 7,558,272. I got the following error:

Error in `$<-.data.frame`(`*tmp*`, blockID, value = c(1L, 1L, 1L, 1L, :
  replacement has 200 rows, data has 3

Investigating with the debugger, I found the following erroneous code in cbcTools:::make_design_deff():

...
else {
  ...
  des <- merge(des, profiles, by = varnames)
  ...
}
...
des$blockID <- rep(seq(n_blocks), each = n_alts * n_q)
...

The issue is that an earlier call to idefix::CEA() produces a design des that ignores profiles; the call to merge() is meant to patch this up, but it drastically reduces the number of rows in des, from 200 to only 3, while the line

des$blockID <- rep(seq(n_blocks), each = n_alts * n_q)

assumes that des still has the original number of rows.

Even if the above assignment were fixed to account for the reduced number of rows, there are still problems: every block of n_alts = 4 rows in the original des is a question in the design, and this structure is destroyed by the call to merge(); moreover, des no longer contains enough profiles to even produce n_q = 50 questions.
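To illustrate with toy data (stand-ins, not the actual cbcTools objects): merge() called with only a by argument performs an inner join, so design rows without a matching profile silently disappear, and a subsequent full-length column assignment fails exactly like the reported error.

```r
# Toy stand-ins for the idefix design (des) and the restricted profile set.
# Rows (2, 2) and (3, 2) in des have no counterpart in profiles.
des <- data.frame(A = c(1, 1, 2, 3, 2, 3), B = c(1, 2, 1, 1, 2, 2))
profiles <- data.frame(profileID = 1:4, A = c(1, 1, 2, 3), B = c(1, 2, 1, 1))

merged <- merge(des, profiles, by = c("A", "B"))  # inner join by default
nrow(merged)  # 4 rows survive out of 6, and the original row order is lost

# A later assignment sized for the pre-merge row count then fails, e.g.:
# merged$blockID <- rep(1L, 6)  # Error: replacement has 6 rows, data has 4
```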

@jhelvy
Owner

jhelvy commented May 4, 2023

Great catches. I'll have to patch these issues up. Support for Bayesian D-efficient designs is something I added only last fall, and I put it together hastily. My goal was basically to add a wrapper around the idefix package since it already does this (no need to re-invent the wheel), but I found its interface a little cumbersome, so I've tried to map things onto the interface I came up with for cbcTools. There's quite a lot in here, so I'll start fixing the issues on this branch, which I started in order to address issue #7. Of course, if you find a solution that fixes these issues, feel free to open a PR.

@jhelvy
Owner

jhelvy commented May 4, 2023

Before I dig in, could you provide a reproducible example where you define a set of profiles using cbc_profiles(), and then a call to cbc_design() that generates the error?

@ksvanhorn
Author

This reproduces the error:

A <- as.factor(c(1, 1, 2, 3))
B <- as.factor(c(1, 2, 1, 1))
testprofiles <- data.frame(profileID = seq_len(4L), A = A, B = B)
priors <- list(A = c(0, 0), B = 0)
cbcTools::cbc_design(testprofiles, n_resp = 10, n_alts = 2, n_q = 5, priors = priors)

@jhelvy
Owner

jhelvy commented May 4, 2023

Thanks, this is helpful. Looks like I overlooked how merge() behaves. What I wanted was a left join, but by default merge() performs an inner join, dropping unmatched rows and introducing a lot of issues. Hopefully I can fix this relatively quickly.

@jhelvy
Owner

jhelvy commented May 4, 2023

Learning a bit more now. This is definitely the same issue as #9.

The example you presented (and the example in #9) has restrictions on the set of possible profiles. This is the root of the issue: when I call idefix::CEA() to make a design, it ignores those restrictions when coming up with a D-efficient design. So later, when I merge back on the profileIDs, there are missing matches because profiles contains restrictions.

I'm now trying to figure out if idefix::CEA() can even do this. If it's not possible to obtain a D-efficient design with restricted profiles, then all I'll be able to do for now is add a stop() to require an unrestricted set of profiles when using priors. Hopefully there's a way.

@eleafeit

eleafeit commented May 4, 2023

@jhelvy, @ksvanhorn's issues speak for themselves, but FWIW I think very highly of him.

@jhelvy
Owner

jhelvy commented May 4, 2023

Thanks @eleafeit! Really glad to see folks are stress testing this package.

I've more or less gotten to the bottom of this one. It looks like my code is actually fine, but only if you use a full factorial set of profiles to pull from when making a design. When using a restricted set of profiles, idefix ignores those restrictions, at least the way I'm using it now. But I think there's a way to incorporate them.

@jhelvy
Owner

jhelvy commented May 4, 2023

Okay I believe I have a fix now. Could you try installing the version on this branch with this code:

remotes::install_github("jhelvy/cbcTools@dupe-checks")

Then you can run your test code again with:

library(cbcTools)

A <- as.factor(c(1, 1, 2, 3))
B <- as.factor(c(1, 2, 1, 1))
testprofiles <- data.frame(profileID = seq_len(4L), A = A, B = B)
priors <- list(A = c(0, 0), B = 0)
design <- cbcTools::cbc_design(testprofiles, n_resp = 10, n_alts = 2, n_q = 5, priors = priors)

This should now work, but only with the "Modfed" algorithm. Based on what I read in the JSS article on the idefix package, the "CEA" algorithm appears to require an unrestricted profile set (p. 17 in the paper). For this reason, I also set the default method to "Modfed".

@ksvanhorn
Author

Will do. Come to think of it, it makes perfect sense that coordinate exchange would require an unrestricted profile set: it's essentially a hill-climbing algorithm, and hard constraints are a bear to deal with in that context.

@ksvanhorn
Author

The small test case works, but a larger test case fails in the call to utils::combn(profiles$profileID, n_alts). For the case I tried, length(profiles$profileID) = 1932 and n_alts = 4. The problem is that, when the body of utils::combn calculates count <- as.integer(round(choose(n, m))), the subexpression choose(n, m) requires about 40 bits to represent. Although this integer value is exactly representable as a double, converting it to an integer yields NA because R integers have a maximum value of 2^31 - 1.
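A minimal reproduction of the coercion failure, independent of cbcTools:

```r
# choose(1932, 4) = 578,718,933,135, which needs about 40 bits. That is
# exactly representable as a double (doubles hold integers up to 2^53),
# but it exceeds .Machine$integer.max = 2^31 - 1, so the coercion that
# utils::combn() performs internally yields NA.
n <- 1932
m <- 4
choose(n, m) > .Machine$integer.max                # TRUE
suppressWarnings(as.integer(round(choose(n, m))))  # NA
```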

@ksvanhorn
Author

Of course, you don't want to create a matrix with that many columns anyway...

@jhelvy
Owner

jhelvy commented May 4, 2023

Okay, I'll have to come back to this tomorrow, but looks like we're on the right track now.

@ksvanhorn
Author

Thanks for the lightning fast response.

@jhelvy
Owner

jhelvy commented May 5, 2023

Okay, hopefully this is now fixed. I realized that this input check is for cases where you have a small number of profiles, so the majority of the time we don't need to compute that matrix of combinations. Instead, I now do an initial test of whether (n_q > floor(nrow(profiles) / n_alts)), which most of the time will be FALSE. Only when it is TRUE are the combinations computed, which should be limited to cases where we don't run into this choose(n, m) issue.

@jhelvy
Owner

jhelvy commented May 5, 2023

Okay, actually, now I realize how dumb my original code was... I could have just used the closed-form expression to compute the number of combinations instead of using combn(), since I only need the count and not the actual combinations (duh!). Now I'm doing that.

@ksvanhorn
Author

Are you ready for me to try out the latest code?

@jhelvy
Owner

jhelvy commented May 5, 2023

Yes, same install as before:

remotes::install_github("jhelvy/cbcTools@dupe-checks")

I made several other changes this morning as I discovered that the profileIDs were still not being appropriately merged onto the design. This should (hopefully) all be taken care of now.

@ksvanhorn
Author

In check_inputs_design(), the line

ncomb <- factorial(n)/(factorial(k) * (factorial(n - k)))

produces NaN with n = 1932 and k = 4, since you get Inf divided by Inf.

choose(n, k) is more robust. If I overwrite this assignment with ncomb <- choose(n, k) in the debugger, everything continues on fine until the call to idefix::Modfed(). I'm still getting an error from that call, which on investigation seems to stem from one or more determinants evaluating to 0, so I suspect it has to do with the input I'm providing rather than any problem with the code. I'm going to look into this some more and let you know what I find.
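For reference, a minimal comparison of the two forms:

```r
# factorial(x) overflows to Inf for x > 170, so the explicit ratio becomes
# Inf / Inf = NaN, while choose() avoids forming the huge intermediate
# factorials and returns a finite double.
n <- 1932
k <- 4
suppressWarnings(factorial(n) / (factorial(k) * factorial(n - k)))  # NaN
choose(n, k)  # finite, roughly 5.79e11
```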

@ksvanhorn
Author

OK, I now know why idefix::Modfed() is having problems. I'm working with a study that has a lot of dependent attributes. Attribute A1 has 12 levels, and the remaining attributes are defined only if A1 = 3. Two of the attributes are defined only if A1 = 3 and A10 = 1. I'm handling "undefined" by requiring the dependent attributes to take level 1 as their value when they are not defined. The problem is that when idefix::Modfed() creates some initial designs from which to start optimizing, even with n_q = 100 some of the levels of A1 are missing from the design, because only 11 of the 1932 profiles have A1 != 3.

This is ultimately an issue with idefix, I think, and I don't know what you can do about it. I think idefix could be more robust to this kind of thing if it added epsilon * diag(n) to the information matrix for some small epsilon > 0. That would at least allow things to get started, and perhaps reach an information matrix with a nonzero determinant.

I might be able to work around this problem by including multiple copies of the profiles that have A1 != 3.
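A rough sketch of that workaround, using a hypothetical toy profile set (column names mirror the discussion): duplicating the rare A1 != 3 profiles should make the random starting designs more likely to cover every level of A1.

```r
# Toy profile set: most rows have A1 = 3; levels 1 and 2 are rare.
profiles <- data.frame(profileID = 1:6, A1 = c(3, 3, 3, 3, 1, 2))

# Duplicate each rare profile a few extra times before passing the set on.
rare <- profiles[profiles$A1 != 3, ]
boosted <- rbind(profiles, rare[rep(seq_len(nrow(rare)), times = 4), ])
table(boosted$A1)  # levels 1 and 2 now appear 5 times each
```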

@jhelvy
Owner

jhelvy commented May 8, 2023

Maybe I should put this check back then:

if (n_q > floor(nrow(profiles) / n_alts))

I previously had this in place because the ncomb check is actually only necessary if the number of profiles is small. If n = 1932, then the ncomb check should never need to run.
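In sketch form (a paraphrase of the check described above, not the actual cbcTools source):

```r
# Only worry about duplicate questions when the profile set is small
# relative to the design; with many profiles the branch is never taken.
n_profiles <- 1932
n_alts <- 4
n_q <- 50
if (n_q > floor(n_profiles / n_alts)) {
  # Closed-form count stays a double, so no integer overflow here.
  ncomb <- choose(n_profiles, n_alts)
}
# floor(1932 / 4) = 483 >= 50, so with these inputs the branch is skipped.
```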

@jhelvy
Owner

jhelvy commented May 8, 2023

Okay I put that check back in (see this commit)

That should hopefully take care of the situation you're running into when you have a large number of profiles.

As for the other issue, I think you're correct that it fundamentally has to do with idefix::Modfed(). I'm trying to avoid rewriting these search algorithms, though, as that's a whole other can of worms, and it'd be better to raise the issue there to try to resolve it.

Of course, there may be other ways to deal with your specific problem, as you mentioned.

At this point, I may push this branch to main and mark it as version 0.3.0. I believe it fixes most of the problems raised here, though I will keep this issue open should we find a way to fix it in idefix.

@ksvanhorn
Author

Why not use choose(n, k) instead of factorial(n) / (factorial(k) * factorial(n - k))? They compute the same quantity, except that the latter produces NaNs in cases that the former handles just fine.

@ksvanhorn
Author

One last note: it may be worth mentioning in the documentation that the Modfed algorithm can be very slow unless the set of profiles is small. I ran a test in which I thinned my set of 1932 profiles down to 1920 (only those for which A1 = 3) and then discarded A1, with the remaining parameters being n_resp = 1, n_alts = 4, n_q = 100, method = 'Modfed', and priors as previously mentioned. After 5 hours running with 5 threads in parallel on an 8-core MacBook Pro, I gave up.

@jhelvy
Owner

jhelvy commented May 9, 2023

That was my original motivation for making idefix::CEA() the default algorithm: it's a lot faster. But if you're using a restricted set of profiles it won't work, so Modfed it is.

The idefix package is the best I've seen so far for generating designs for choice experiments, so I've adopted it, but I'm happy to try a different solution if there's one that is more efficient and more robust. I haven't looked too much into the inner workings of idefix, but I'm sure there are ways to improve its compute efficiency. There's also AlgDesign, but I haven't figured out how to make it work in the context of choice experiments with priors. Again, I'm trying to avoid hand-coding these algorithms if there's already a working version out there.

Why such a high n_q though? Are you really going to ask each person 100 choice questions? I rarely ask more than 8-10 per respondent due to fatigue. I'm thinking that may be where it's slowing down. Having a large number of profiles to pull from doesn't seem like it would take much computing power, but a large n_q means a relatively big design matrix.

@jhelvy
Owner

jhelvy commented May 9, 2023

Oh, and I just changed it to choose(n, k). I thought choose() was also causing problems when computing a large number of combinations, but I've kept the outer if statement just to be safe. For the vast majority of cases, the choose(n, k) line will never run, as the number of profiles should be sufficiently large.

@jhelvy
Owner

jhelvy commented May 9, 2023

How timely - a summary of related packages was published on 4/5 this year: https://cran.r-project.org/web/views/ExperimentalDesign.html

Some possible alternatives to idefix I see after a quick look:

  • support.CEs: Looks pretty simple, but it doesn't appear to support Bayesian designs with priors. The author also has a book with a lot of other examples.
  • skpr: Looks promising, but it isn't clear whether it's made for choice experiments.
  • ExpertChoice: Clearly made for choice experiments; not clear whether priors can be included.

@jhelvy
Owner

jhelvy commented May 9, 2023

@ksvanhorn I'm closing this issue out, as I believe we have now addressed the core issue (things breaking when using a restricted set of profiles). I have opened another issue (#11) to explore alternatives to idefix, since idefix::Modfed() is rather slow and somewhat limited.

@ksvanhorn
Author

Thanks, I agree on closing this out.

Regarding your question about n_q: it's actually n_blocks * n_q that I need to be over 27 (the number of parameters). My understanding is that the information matrix computed is actually for the aggregate model (since the information matrix for a full hierarchical Bayesian model is much larger and more difficult to compute), so whether I choose n_blocks = 1 with n_q = 100 or n_blocks = 4 with n_q = 25, I should get the same set of tasks. A quick experiment appears to confirm this.

@jhelvy
Owner

jhelvy commented May 9, 2023

I see - yes, that should be the same either way then. Still, 25 choice questions is a lot for a respondent to get through. Hopefully one of these other packages has a way to find a design for such an experiment more efficiently.

@jhelvy
Owner

jhelvy commented May 10, 2023

FYI, these fixes are now on CRAN as v0.3.0.
