Implement Conditional logistic regression models #560
Comments
I realized that it is already possible to fit conditional logistic models in brms, although the syntax is a little bit verbose. Here is an example:

library(brms)
# choice data in **wide** format
data("Fishing", package = "mlogit")
bform <- bf(
  mode ~ 1,
  nlf(mubeach ~ bprice * price.beach + bcatch * catch.beach),
  nlf(mupier ~ bpier + bprice * price.pier + bcatch * catch.pier),
  nlf(muboat ~ bboat + bprice * price.boat + bcatch * catch.boat),
  nlf(mucharter ~ bcharter + bprice * price.charter + bcatch * catch.charter),
  bpier + bboat + bcharter + bprice + bcatch ~ 1,
  family = categorical(refcat = NA)
)
nlpars <- c("bpier", "bboat", "bcharter", "bprice", "bcatch")
bprior <- set_prior("normal(0, 5)", nlpar = nlpars)
fit <- brm(formula = bform, data = Fishing,
           prior = bprior, chains = 2, cores = 2)
summary(fit)

The above model will only work in brms 2.7.2 or higher. |
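For reference, the model that the formula above appears to encode can be written out as follows. This is a reading of the `nlf()` lines, not a statement from the brms documentation: each alternative j gets its own linear predictor, the price and catch coefficients are shared across alternatives, and the beach alternative has no intercept, which is what identifies the model when `refcat = NA` leaves no reference category fixed.

```latex
\mu_{ij} = \alpha_j
         + \beta_{\text{price}} \,\text{price}_{ij}
         + \beta_{\text{catch}} \,\text{catch}_{ij},
\qquad \alpha_{\text{beach}} = 0,
\qquad j \in \{\text{beach}, \text{pier}, \text{boat}, \text{charter}\}

P(y_i = j) = \frac{\exp(\mu_{ij})}{\sum_{k} \exp(\mu_{ik})}
```

Here \alpha_pier, \alpha_boat and \alpha_charter correspond to `bpier`, `bboat` and `bcharter`, and the two \beta terms to `bprice` and `bcatch`.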
I updated my model in the above post, which now produces the expected results. Based on this syntax, it should be possible to specify all kinds of conditional logistic models. @jon-mellon I would love to hear your opinion on this. |
Thanks for doing this. I'll take a closer look soon, but my first question is: does this approach allow the choice sets to vary across cases? For example, in one case someone is choosing between charter, pier and boat, and in another case they are choosing between charter and beach. Or would this require that the respondent is choosing between all options in every case? The use case I have in mind is voting for political parties across many countries and years. The actual parties change between countries and over time, but facts about the parties (their party family, their economic positions, their previous vote share, etc.) can be measured comparably across all settings. In this case, it's important that I don't have voters in Sweden choosing between Republicans and Democrats and vice versa. |
The approach does not canonically support varying response categories, but one may try to trick it into supporting them by adding highly negative offsets to impossible categories. Suppose some persons did not see the "boat" option; then we could "deactivate" it as follows (this requires the latest version of brms as of January 31st, 2019):

library(brms)
# choice data in **wide** format
data("Fishing", package = "mlogit")
Fishing$noboat <- 0
# the first 50 persons did not have the "boat" option
Fishing$noboat[1:50] <- -100
bform2 <- bf(
  mode ~ 1,
  nlf(mubeach ~ bprice * price.beach + bcatch * catch.beach),
  nlf(mupier ~ bpier + bprice * price.pier + bcatch * catch.pier),
  nlf(muboat ~ bboat + bprice * price.boat + bcatch * catch.boat + noboat),
  nlf(mucharter ~ bcharter + bprice * price.charter + bcatch * catch.charter),
  bpier + bboat + bcharter + bprice + bcatch ~ 1,
  family = categorical(refcat = NA)
)
nlpars <- c("bpier", "bboat", "bcharter", "bprice", "bcatch")
bprior2 <- set_prior("normal(0, 5)", nlpar = nlpars)
fit2 <- brm(formula = bform2, data = Fishing,
            prior = bprior2, chains = 2, cores = 2)
summary(fit2)
# p(boat) is 0 for the first 50 observations
round(fitted(fit2, newdata = Fishing[1:100, ])[, 1, ], 5) |
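To see why the offset trick above behaves like a restricted choice set, here is a minimal base R sketch. It does not use brms, and the linear predictor values in `mu` are made up purely for illustration; `softmax()` is a small helper defined here, not a brms function.

```r
# Softmax over a named vector of linear predictors (one per alternative)
softmax <- function(mu) {
  z <- exp(mu - max(mu))  # subtract the max for numerical stability
  z / sum(z)
}

# Hypothetical linear predictors for one person (illustrative values only)
mu <- c(beach = 0.2, pier = 0.5, boat = 1.0, charter = 0.8)

# "Deactivate" boat with the same -100 offset as in the model above
mu_off <- mu
mu_off["boat"] <- mu_off["boat"] - 100
p_off <- softmax(mu_off)

# Reference: softmax restricted to the alternatives actually offered
offered <- c("beach", "pier", "charter")
p_restricted <- softmax(mu[offered])

round(p_off, 5)
all.equal(p_off[offered], p_restricted)
```

Because the deactivated alternative contributes only a vanishingly small term to the denominator, the remaining probabilities match the restricted softmax up to numerical tolerance, and the predictors of the deactivated alternative have essentially no influence on the likelihood for those rows.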
Although the syntax shown above allows fitting conditional logit models, I am not sure I would recommend it as an easy way to actually specify those models. I will have to think more about a less verbose solution (related to your initial proposal) that works well within the brms framework. Accordingly, I will leave this issue open for now. |
Thanks, I think that makes sense. I will also try to see how closely the current version can replicate conditional logit results. The main thing I'm not sure about is whether variation in the x variables for deactivated options could still provide information to the fitting process in a way that might skew the results. |
Do you use brms 2.9.0 from CRAN?
jon-mellon <notifications@github.com> wrote on Fri, June 28, 2019, 21:47:
… Just coming back to this again and was trying to work with the code above.
I updated to the latest version of BRMS on CRAN
I first get the error:
Error in categorical(refcat = NA) : unused argument (refcat = NA)
then when I delete refcat = NA.
I get the following error when I try to run the brm( line
Error: The parameter 'mubeach' is not a valid distributional or non-linear
parameter. Did you forget to set 'nl = TRUE'?
|
Just realized that I was still using 2.9.0. I've tried 2.9.2 from github and it seems to be working now. |
I've built a simulation of a multinomial choice process as a benchmark (so I can be sure what the correct answer is) and the mlogit package seems to be able to recover something close to the true result (note that in reality the size of the choice set would vary across instances). Ideally I would want to be able to have the intercepts drawn from a distribution and partially pooled rather than being fixed effects and have higher level random intercepts as well.
One bit of the code I'm currently not clear on for running the BRMS version of the model is this part:
Do I need to put all fixed parts of the model in there? |
I haven't looked at your above code in detail, but yes, your fixed effects are indeed specified as non-linear parameters on which there is a lot of documentation available. |
I tried turning the above model into Stan code and it seems that the priors don't get included for some reason. Is this a bug? Or is it something weird about categorical nonlinear parameters? Here is the code I ran:
And here is the resulting Stan code:
|
Thanks! This was indeed a bug which should now be fixed in the github version of brms. |
On this particular issue -- is it currently possible for the brms models (fit per the verbose syntax examples above) to predict responses when new options are included, assuming we have values for the new option's characteristics? Similar to the end of this vignette predicting market shares with a new technology in the heating dataset; in the fishing context it'd be a new mode like "helicopter", with a certain price and catch ratio. |
No chance, because we have no idea how the new option relates to the existing ones.
|
In the case of the heating vignette, the new option has a known installation cost (ic) and operating cost (oc); the coefficients on those predictors have been estimated through the model, so we do know something about the new option, right? That allows predictions of how market shares might change if the new option were added to the mix. |
Yeah, I get it now, if we have the coefficients estimated in some way. But it is currently out of scope of what brms has to offer in that regard.
|
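The kind of prediction discussed above can be sketched by hand in base R. The coefficient values and attribute values below are hypothetical, chosen only for illustration, and are not estimates from any fitted model. The sketch also assumes the new alternative has an alternative-specific intercept of zero, which is exactly the piece of information the model cannot supply.

```r
# Softmax over a named vector of linear predictors
softmax <- function(mu) {
  z <- exp(mu - max(mu))  # subtract the max for numerical stability
  z / sum(z)
}

# Hypothetical generic coefficients (illustration only, not fitted values)
b_ic <- -0.002  # installation cost
b_oc <- -0.007  # operating cost

# Hypothetical attributes of three existing alternatives
alts <- data.frame(
  alt = c("A", "B", "C"),
  ic  = c(800, 900, 1000),
  oc  = c(200, 150, 120)
)
mu <- setNames(b_ic * alts$ic + b_oc * alts$oc, alts$alt)
shares_before <- softmax(mu)

# New alternative with known attributes; its intercept is ASSUMED to be 0
mu_new <- c(mu, new = b_ic * 1100 + b_oc * 90)
shares_after <- softmax(mu_new)

round(shares_before, 3)
round(shares_after, 3)

# Under the logit form, ratios between existing alternatives are unchanged
ratio_before <- shares_before / shares_before[["A"]]
ratio_after  <- shares_after[names(mu)] / shares_after[["A"]]
all.equal(ratio_before, ratio_after)
```

With a fitted Bayesian model, one would presumably repeat this calculation for each posterior draw of the coefficients to get uncertainty intervals for the predicted shares.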
I'd like to kindly ask for your help. Following on from Jon Mellon's example, I have a similar one, but with panel data: each consumer could visit the store several times, and so faced {1,..,n} choice occasions. On each store visit (a choice occasion), she/he chose one alternative; the choice set did not vary in my case. Please find below an example of a dataset:
For instance, consumer 1 went to the store twice and she/he chose kellogs on the first visit and walmart own brand on the second visit. Consumer 2 went to the store once and she/he chose general mills. Would it be possible to estimate random intercepts for brands, but also random intercepts for individual consumers in this case with brms please? Is there any example or description how to structure dataset that feeds into this model please? (For instance, with ChoiceModelR, it needs to be very specifically formatted - data needs to be in long format, alternatives need to be integers, choice needs to be stored on the first row for each choice occasion, etc. Unfortunately, I couldn't find any similar description/dataset example for brms.) Thank you very very much for any advice. I have read the vignettes and tried to search for this issue extensively and I'm really sorry if I'm overlooking something. |
Hi, please ask brms related questions on Stan discourse (https://discourse.mc-stan.org/) using the Interfaces - brms tag. |
Hi, |
I suggest you ask brms related questions on https://discourse.mc-stan.org where there is a greater audience to both read and answer your question. |
I will close this issue to reduce the load on the brms issue tracker, as I am unlikely to work on this myself. If someone wants to work on this feature, please write here and I will be happy to reopen it. |
Following up on https://twitter.com/paulbuerkner/status/1067809413581414401
It would be great if brms could support conditional logit models. They are used in a lot of places, such as analyzing conjoint experiments, purchase decisions or voting behavior.
As an example let's suppose that consumers (i) enter a store and can choose one brand of cereal. Different stores have different inventory at different times so the choice set is not always the same. We'll say that consumers weigh up the nutrition, price and advertising that each brand has received when making their decision and we want to know what weights they put on each. The data might look something like:
They're definitely possible in Stan in some form:
https://www.rdocumentation.org/packages/rstanarm/versions/2.17.4/topics/stan_clogit
Here's the original McFadden paper on the models:
https://eml.berkeley.edu/reprints/mcfadden/zarembka.pdf
I think the key extra piece of information that needs to be given is an identifier that defines one choice situation.
brm(formula = choice|chid(consumer) ~ price + nutrition + advertising)
We might then imagine that older people are more susceptible to advertising, so we would interact age and advertising:
brm(formula = choice|chid(consumer) ~ price + nutrition + advertising:age + advertising)
or that the slope of advertising varies across states:
brm(formula = choice|chid(consumer) ~ price + nutrition + advertising + (1+advertising|state), data)
I wonder if we could also have brand (i.e. particular choice) specific random intercepts:
brm(formula = choice|chid(consumer) ~ price + nutrition + advertising + (1|brand), data)
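For reference, the standard conditional logit likelihood that such a choice-situation identifier would define can be written as below. The notation is introduced here for illustration: C_i is the set of alternatives available in choice situation i, x_{ij} collects the attributes of alternative j in that situation (price, nutrition, advertising), and \beta holds the corresponding weights.

```latex
P(y_i = j \mid C_i)
  = \frac{\exp\!\left(x_{ij}^{\top} \beta\right)}
         {\sum_{k \in C_i} \exp\!\left(x_{ik}^{\top} \beta\right)},
\qquad j \in C_i
```

Because the sum in the denominator runs only over C_i, the choice set can differ from one situation to the next, which is the feature requested in this issue. A brand-specific random intercept such as `(1|brand)` would add a term \alpha_j to the linear predictor of alternative j.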