Allow sampling of alternatives during prediction #142
It's kind of limited sampling, not something you could use during a location choice model. Alternatives may show up as available for more than one chooser and alternatives may show up more than once for a single chooser.
@fscottfoti I noticed that
That's definitely a bug. There shouldn't be any reason to sample the same alternative twice.
It's the same way in activitysim. I think we did it that way so that the same alternative can show up for different choosers, but it means the same alternative can also show up multiple times for the same chooser. To do this correctly I think we'd need to do a random draw with
That makes sense. Unfortunately, I think we have to do a random draw with replace=False for each chooser. Is it a lot slower?
This needs to be an option. In some applications, like shopping destination choice, it is fine to resample the same alternative destination. In others, like residential location, it may not be, depending on how the choice algorithm deals with competition, since a unit is generally constrained to only one household occupant.
@fscottfoti and I talked about this and we're explicitly not adding support for a situation like household location because it'd be a lot of work to have sampling while respecting supply constraints. So this sampling would only be used in situations where the alternatives are not supply constrained.
@waddell I don't think there's any case where you sample the same alternative twice for a single chooser?
Not unless you have an aggregated alternative (residential units of a specific building type in a submarket) and are not accounting for the varying size of the alternatives in the specification of the model. So I’d agree that sampling without replacement is the right general pattern, and that the other cases should be handled either in the utility function specification or on the choice algorithm. Paul
Makes sense.
So does this mean that for situations like household location choice, interaction variables will not be supported for choices in the simulation? If this is the case, a boolean argument called 'is_constrained' or something would be helpful in the model constructor: if provided, this would automatically set the choice_mode to aggregate and the probability_mode to 'full_product', and would disable any interaction dataset functionality. This way we don't specify and estimate models in a manner that won't be supported by the simulation.
This compensates for values removed from the alternatives by post-interaction filters, but it also means that post-interaction filters cannot be used in full_product mode.
At the moment certain probability and choice modes are incompatible so this checks for those pairings and raises exceptions. In particular the 'single_chooser' probability mode must be used with the 'aggregate' choice mode and the 'full_product' probability mode must be used with the 'individual' choice mode. This also disallows using post-interaction filters in 'full_product' mode because the filters can cause there to be different numbers of alternatives for each chooser, which breaks the MNL machinery.
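As a minimal sketch of the pairing rules described in this commit message (not the code from the PR itself), the check could look roughly like the following. The function name `check_mode_compat` and the parameter `interaction_predict_filters` are assumptions made for illustration; only the mode names come from the text above.

```python
def check_mode_compat(probability_mode, choice_mode,
                      interaction_predict_filters=None):
    """Raise ValueError for unsupported pairings (hypothetical helper)."""
    # Pairings stated in the commit message: each probability mode
    # has exactly one compatible choice mode.
    required_choice_mode = {
        'single_chooser': 'aggregate',
        'full_product': 'individual',
    }
    if probability_mode not in required_choice_mode:
        raise ValueError(
            'Unknown probability mode: {!r}'.format(probability_mode))
    if choice_mode != required_choice_mode[probability_mode]:
        raise ValueError(
            '{!r} probability mode must be used with {!r} choice mode'.format(
                probability_mode, required_choice_mode[probability_mode]))
    # Post-interaction filters can leave choosers with different numbers
    # of alternatives, which 'full_product' mode cannot handle.
    if probability_mode == 'full_product' and interaction_predict_filters:
        raise ValueError(
            'Post-interaction filters cannot be used in full_product mode')
```

Failing at construction time, rather than during prediction, means an unsupported configuration is caught before any model is estimated.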
When building an interaction dataset with sampling we need to pick alternatives individually for each chooser so that we don't draw the same alternative multiple times for the same chooser.
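A rough illustration of per-chooser sampling, assuming NumPy and a flat array of alternative IDs; `sample_alts_per_chooser` is a hypothetical name, not a function from UrbanSim. Each chooser gets its own draw with `replace=False`, so an alternative can appear for several choosers but never twice for the same one.

```python
import numpy as np


def sample_alts_per_chooser(alt_ids, n_choosers, sample_size, seed=None):
    """Return an (n_choosers, sample_size) array of sampled alternative IDs.

    Hypothetical helper: one independent draw without replacement per chooser.
    """
    alt_ids = np.asarray(alt_ids)
    if sample_size > len(alt_ids):
        raise ValueError('sample_size exceeds the number of alternatives')
    if n_choosers == 0:
        return np.empty((0, sample_size), dtype=alt_ids.dtype)
    rng = np.random.default_rng(seed)
    # One draw per chooser; replace=False guarantees no repeats within a row.
    return np.stack([
        rng.choice(alt_ids, size=sample_size, replace=False)
        for _ in range(n_choosers)
    ])
```

The Python-level loop over choosers is the likely cost compared with a single vectorized draw with replacement, which is the speed concern raised earlier in the thread.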
Things changed thanks to doing draws without replacement when sampling to make interaction datasets.
@bridwell Right now interaction variables are allowed in any mode because in either case the choosers and alternatives are being merged. However, it is not possible to use
Changes Unknown when pulling 9c49885 on dcm-pred-sample into master.
Think this is ready to go.
To be clear, UrbanSim can handle the possibility of different individuals having different numbers of alternatives. It can't handle interaction filters well because the filters are applied after sampling, and so might result in a poor sample. If that's OK with the modeler, it's something we could support pretty easily. I actually don't know how the old UrbanSim did this. I suspect it would continue getting new samples until it had enough alternatives that passed the filters, which can be an expensive operation if there are few alternatives that do pass the filter. Otherwise you have to merge every chooser with every alternative and then filter, and then sample, which gives you the right answer but uses too much memory during the merge.
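The "keep sampling until enough alternatives pass" idea could be sketched for a single chooser as below. This is one possible reading of that approach, not a description of what the old UrbanSim actually did. The names `sample_passing_alts`, `passes_filter`, and `max_rounds` are assumptions for illustration.

```python
import numpy as np


def sample_passing_alts(alt_ids, passes_filter, sample_size,
                        max_rounds=100, seed=None):
    """Draw alternatives for one chooser until enough pass the filter.

    Hypothetical helper. May return fewer than sample_size IDs if the
    pool or the round budget runs out first.
    """
    rng = np.random.default_rng(seed)
    remaining = np.asarray(alt_ids)
    kept = []
    for _ in range(max_rounds):
        if len(kept) >= sample_size or len(remaining) == 0:
            break
        need = sample_size - len(kept)
        # Draw only from alternatives not yet tried, without replacement.
        idx = rng.choice(len(remaining), size=min(need, len(remaining)),
                         replace=False)
        kept.extend(a for a in remaining[idx] if passes_filter(a))
        remaining = np.delete(remaining, idx)
    return np.array(kept)
```

When few alternatives pass the filter, each round keeps little and the loop runs many times, which is the expense mentioned above. The alternative, merging every chooser with every alternative before filtering, trades that time for memory.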
What I was seeing today was that if there was a mismatch between the number of alts passed to
That makes sense - in that case we should be able to fill in dummy alternatives though. In the other case it's not clear how to solve it.
Do you want to do that now, or merge this and carry on with the release?
Merge this and release. We can address this when we get the chance.