Bug in creating Bayesian D-efficient designs when not all combinations of levels are allowed #10
Great catches. I'll have to patch these issues up. The support for Bayesian D-efficient designs is something I only recently added last fall, and I hastily put it together. My goal was basically just to add a wrapper around the idefix package.
Before I dig in, could you provide a reproducible example where you define a restricted set of profiles?
This reproduces the error:
Thanks, this is helpful. Looks like I overlooked how restricted profile sets are handled.
Learning a bit more now. This is definitely the same issue as #9. The example you presented (and the example in #9) has restrictions on the set of possible profiles, and that is the root of the issue: when I call idefix to generate the design, those restrictions are ignored. I'm now trying to figure out whether this can be fixed on my end.
@jhelvy, @ksvanhorn's issues speak for themselves, but FWIW I think very highly of him.
Thanks @eleafeit! Really glad to see folks are stress testing this package. I've more or less gotten to the bottom of this one. Looks like my code is actually all fine, but only if you use a full factorial set of profiles to pull from when making a design. When using a restricted set of profiles, things break.
Okay, I believe I have a fix now. Could you try installing the version on this branch with this code:

```r
remotes::install_github("jhelvy/cbcTools@dupe-checks")
```

Then you can run your test code again with:

```r
library(cbcTools)
A <- as.factor(c(1, 1, 2, 3))
B <- as.factor(c(1, 2, 1, 1))
testprofiles <- data.frame(profileID = seq_len(4L), A = A, B = B)
priors <- list(A = c(0, 0), B = 0)
design <- cbcTools::cbc_design(testprofiles, n_resp = 10, n_alts = 2, n_q = 5, priors = priors)
```

This should now work, but only for the "Modfed" method.
Will do. Come to think of it, it makes perfect sense that coordinate-exchange would require an unrestricted profile set, as it's essentially a hill-climbing algorithm, and hard constraints are a bear to deal with in that context.
The small test case works, but a larger test case fails in the call that enumerates all combinations of profiles.
Of course, you don't want to create a matrix with that many columns anyway...
Okay, I'll have to come back to this tomorrow, but it looks like we're on the right track now.
Thanks for the lightning-fast response.
Okay, hopefully this is now fixed. I realized that this input check is only needed for cases where you have a small number of profiles, so the majority of the time we don't need to compute that matrix of combinations at all. Instead, I now do an initial test to see if the number of profiles is small enough to warrant it.
Okay, actually, now I realize how dumb my original code was... I could have just used the closed-form expression to compute the number of combinations instead of enumerating them.
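For what it's worth, the difference can be sketched in a couple of lines (a standalone illustration, not the package's actual code): counting the combinations directly avoids materializing them.

```r
n_profiles <- 6  # e.g., nrow(profiles) for a small profile set
n_alts <- 2

# Enumerating every combination just to count them builds an n_alts x C matrix:
n_enumerated <- ncol(combn(n_profiles, n_alts))

# The closed-form count gives the same number without allocating anything:
n_closed_form <- choose(n_profiles, n_alts)

n_enumerated == n_closed_form  # TRUE (both are 15)
```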
Are you ready for me to try out the latest code?
Yes, same install as before:

```r
remotes::install_github("jhelvy/cbcTools@dupe-checks")
```

I made several other changes this morning, as I discovered that the profileIDs were still not being appropriately merged onto the design. This should (hopefully) all be taken care of now.
Running my test code with the new version still produces an error.
OK, I now know why this fails. This is ultimately an issue with idefix rather than cbcTools. I might be able to work around this problem by including multiple copies of the profiles that have A1 != 3.
Maybe I should put this check back then. I previously had it in place because the ncomb check is actually only necessary if the number of profiles is small. If n = 1932, the ncomb check should never need to run.
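A sketch of what such a guard might look like (the function name and threshold here are hypothetical, not cbcTools code): skip the duplicate check entirely when the number of possible choice sets is large.

```r
# Hypothetical guard: duplicate choice sets are only plausible when the pool of
# possible combinations is small, so only run the check in that case.
needs_dupe_check <- function(n_profiles, n_alts, max_comb = 1e6) {
  choose(n_profiles, n_alts) < max_comb
}

needs_dupe_check(4, 2)     # TRUE: tiny profile pool, duplicates are likely
needs_dupe_check(1932, 4)  # FALSE: ~5.8e11 combinations, check never needed
```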
Okay, I put that check back in (see this commit). That should hopefully take care of the situation you're running into when you have a large number of profiles. As for the other issue, I think you're correct that it fundamentally has to do with idefix. Of course, there may be other ways to deal with your specific problem, as you mentioned. At this point, I may push this branch to main and mark this version 0.3.0. I believe it fixes most of the issues raised here, though I'll keep this issue open should we find a way to fix it in idefix.
Why not use choose(n, k) instead of factorial(n) / (factorial(k) * factorial(n - k))? They compute the same quantity, except that the latter produces NaNs in cases that the former handles just fine. |
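A quick illustration of the difference (the values of n and k here are just an example):

```r
n <- 200
k <- 2

# factorial() overflows to Inf for n > 170, so the ratio becomes Inf/Inf = NaN:
factorial(n) / (factorial(k) * factorial(n - k))  # NaN

# choose() computes the binomial coefficient without forming the factorials:
choose(n, k)  # 19900
```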
One last note: it may be worth mentioning in the documentation that the Modfed algorithm can be very slow unless the set of profiles is small. I ran a test in which I thinned my set of 1932 profiles down to 1920 (only those for which A1 = 3) and then discarded A1, with the remaining parameters being n_resp = 1, n_alts = 4, n_q = 100, method = 'Modfed', priors = (value previously mentioned). After 5 hours running with 5 threads in parallel on an 8-core MacBook Pro, I gave up.
That was my original motivation for making the CEA algorithm the default. The idefix package is the best I've seen so far for generating designs for choice experiments, so I've adopted it, but I'm happy to try out a different solution if there's one that is more efficient and more robust. I haven't looked too much into the inner workings of idefix, but I'm sure there are ways to improve compute efficiency. There's also AlgDesign, but I haven't figured out how to make it work in the context of choice experiments with priors. Again, I'm trying to avoid hand-coding these algorithms if there's already a working version out there. Why such a high n_q, by the way?
Oh, and I just changed it to choose(n, k).
How timely: there's a summary of related packages published 4/5 this year: https://cran.r-project.org/web/views/ExperimentalDesign.html. After a quick look, I see some possible alternatives to idefix there.
@ksvanhorn I'm closing this issue out, as I believe we have addressed the core issue now (things breaking when using a restricted set of profiles). I have now opened another issue (#11) to explore alternatives to idefix, given these limitations.
Thanks, I agree on closing this out. Regarding your question about n_q: it's actually n_blocks * n_q that I need to be over 27 (the number of parameters). My understanding is that the information matrix computed is actually for the aggregate model (the information matrix for a full hierarchical Bayesian model is much larger and more difficult to compute), so whether I choose n_blocks = 1 with n_q = 100 or n_blocks = 4 with n_q = 25, I should get the same set of tasks. A quick experiment appears to confirm this.
I see; yes, that should be the same either way then. Still, 25 choice questions is a lot for a respondent to get through. Hopefully one of these other packages has a way to find a design for such an experiment more efficiently.
FYI, these fixes are now on CRAN as v0.3.0.
I ran `cbc_design()` with n_resp = 735, n_alts = 4, n_q = 50, where there are 12 discrete attributes with numbers of levels 12, 6, 3, ..., 3, 4, 4, and `profiles` contains only 1932 rows, which is far less than 12 * 6 * 3 * ... * 3 * 4 * 4 = 7,558,272. I got an error.

Investigating with the debugger, I found erroneous code in `cbcTools:::make_design_deff()`. The issue is that an earlier call to `idefix::CEA()` produced a design `des` that ignored `profiles`, and this is patched up via the call to `merge()`; but the number of rows in `des` gets drastically reduced at this point, from 200 to only 3, and the line that follows assumes that `des` still has the original number of rows.

Even if that assignment were fixed to account for the reduced number of rows, there are still problems: every block of `n_alts = 4` rows in the original `des` is a question in the design, and this structure is destroyed by the call to `merge()`, not to mention that there are no longer sufficient profiles in `des` to even produce `n_q = 50` questions.
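The failure mode described above can be sketched in a few lines (the data here is made up for illustration, not the actual cbcTools objects): an inner `merge()` both drops design rows whose attribute combinations are missing from the restricted profile set and re-sorts the result, destroying the question blocks.

```r
# A toy "design" with two questions of two alternatives each (qID marks the block):
des <- data.frame(qID = c(1, 1, 2, 2), A = c(1, 2, 3, 1), B = c(1, 1, 1, 2))

# A restricted profile set that does not contain the combination A = 3, B = 1:
profiles <- data.frame(profileID = 1:3, A = c(1, 2, 1), B = c(1, 1, 2))

# merge() joins on the shared columns A and B; by default it keeps only
# matching rows and sorts the result by the join columns.
merged <- merge(des, profiles)

nrow(des)     # 4
nrow(merged)  # 3 -- the row with A = 3 silently vanished
merged$qID    # the question blocks are also re-sorted out of order
```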