Allow sampling of alternatives during prediction #142

Merged: jiffyclub merged 5 commits into master from dcm-pred-sample on Apr 30, 2015

jiffyclub
Member

It's kind of limited sampling, not something you could use
during a location choice model. Alternatives may show up as
available for more than one chooser and alternatives may show up
more than once for a single chooser.

@jiffyclub
Member Author

@fscottfoti I noticed that mnl_interaction_dataset doesn't have replace=False when drawing samples so the same alternative can show up multiple times for the same chooser.

@fscottfoti
Contributor

That's definitely a bug. There shouldn't be any reason to sample the same alternative twice.

@jiffyclub
Member Author

It's the same way in activitysim. I think we did it that way so that the same alternative can show up for different choosers, but it means the same alternative can also show up multiple times for the same chooser. To do this correctly I think we'd need to do a random draw with replace=False individually for each chooser.
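
A minimal sketch of the per-chooser draw described above, assuming NumPy. `sample_alternatives_per_chooser` is a hypothetical helper for illustration, not the UrbanSim or activitysim function:

```python
import numpy as np


def sample_alternatives_per_chooser(alt_ids, num_choosers, sample_size, seed=None):
    """Draw `sample_size` distinct alternatives for each chooser.

    Hypothetical helper for illustration only. The same alternative may
    appear for different choosers, but never twice for one chooser.
    """
    alt_ids = np.asarray(alt_ids)
    if sample_size > len(alt_ids):
        raise ValueError("sample_size cannot exceed the number of alternatives")
    rng = np.random.default_rng(seed)
    # One independent draw without replacement per chooser.
    return np.stack([
        rng.choice(alt_ids, size=sample_size, replace=False)
        for _ in range(num_choosers)
    ])


# Rows are choosers, columns are that chooser's sampled alternatives.
sample = sample_alternatives_per_chooser(
    np.arange(100, 120), num_choosers=5, sample_size=4, seed=0)
```

The Python-level loop makes one draw per chooser, so the cost grows with the number of choosers.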

@fscottfoti
Contributor

That makes sense. Unfortunately, I think we have to do a random draw with replace=False for each chooser. Is it a lot slower?

@waddell
Member

waddell commented Apr 29, 2015

This needs to be an option. In some applications, like shopping destination choice, it is fine to resample the same alternative destination. In others, like residential location, it may not be, depending on how the choice algorithm deals with competition, since a unit is generally constrained to only one household occupant.


@jiffyclub
Member Author

@fscottfoti and I talked about this and we're explicitly not adding support for a situation like household location because it'd be a lot of work to have sampling while respecting supply constraints. So this sampling would only be used in situations where the alternatives are not supply constrained.

@fscottfoti
Contributor

@waddell I don't think there's any case where you sample the same alternative twice for a single chooser?

@waddell
Member

waddell commented Apr 29, 2015

Not unless you have an aggregated alternative (residential units of a specific building type in a submarket) and are not accounting for the varying size of the alternatives in the specification of the model.

So I’d agree that sampling without replacement is the right general pattern, and that the other cases should be handled either in the utility function specification or on the choice algorithm.

Paul


@fscottfoti
Contributor

Makes sense.

@bridwell
Contributor

So does this mean that for situations like household location choice, interaction variables will not be supported for choices in the simulation? If this is the case, a boolean argument called 'is_constrained' or something would be helpful in the model constructor: if provided, this would automatically set the choice_mode to aggregate and the probability_mode to 'full_product', and would disable any interaction dataset functionality. This way we don't specify and estimate models in a manner that won't be supported by the simulation.

This compensates for values removed from the alternatives
by post-interaction filters, but it does mean that
post-interaction filters cannot be used in full_product
mode.

At the moment certain probability and choice modes are incompatible,
so this checks for those pairings and raises exceptions.
In particular the 'single_chooser' probability mode must be used
with the 'aggregate' choice mode and the 'full_product' probability
mode must be used with the 'individual' choice mode.

This also disallows using post-interaction filters in 'full_product'
mode because the filters can cause there to be different numbers of
alternatives for each chooser, which breaks the MNL machinery.

When building an interaction dataset with sampling we
need to pick alternatives individually for each chooser
so that we don't draw the same alternative multiple times
for the same chooser.

Things changed thanks to doing draws with replacement when
sampling to make interaction datasets.

@jiffyclub
Member Author

@bridwell Right now probability_mode='single_chooser' must be used with choice_mode='aggregate'. This is checked by the system (as of 5087d76 in this PR).

Interaction variables are allowed in any mode because in either case the choosers and alternatives are being merged. However, it is not possible to use interaction_predict_filters if probability_mode='full_product' because UrbanSim can't handle the possibility of different choosers having different numbers of available alternatives.
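
A rough sketch of the kind of check this describes. The mode pairings and the filter restriction come from this PR's discussion and commit messages, but `check_modes` and `ALLOWED_MODE_PAIRS` are hypothetical names, not the code added in 5087d76:

```python
# Hypothetical sketch; the pairings come from this PR's discussion,
# but the function and constant names are assumptions.
ALLOWED_MODE_PAIRS = {
    ("single_chooser", "aggregate"),
    ("full_product", "individual"),
}


def check_modes(probability_mode, choice_mode, interaction_predict_filters=None):
    """Raise ValueError for combinations described as unsupported in this PR."""
    if (probability_mode, choice_mode) not in ALLOWED_MODE_PAIRS:
        raise ValueError(
            "probability_mode={!r} cannot be used with choice_mode={!r}".format(
                probability_mode, choice_mode))
    if probability_mode == "full_product" and interaction_predict_filters:
        raise ValueError(
            "interaction_predict_filters cannot be used with "
            "probability_mode='full_product'")


check_modes("single_chooser", "aggregate")  # a supported pairing, no exception
```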

@coveralls

Coverage Status

Changes Unknown when pulling 9c49885 on dcm-pred-sample into master.

@jiffyclub
Member Author

Think this is ready to go.

@fscottfoti
Contributor

To be clear, UrbanSim can handle the possibility of different individuals having different numbers of alternatives. It can't handle interaction filters well because the filters are applied after sampling, and so might result in a poor sample. If that's OK with the modeler, it's something we could support pretty easily.

I actually don't know how the old UrbanSim did this. I suspect it would continue getting new samples until it had enough alternatives that passed the filters, which can be an expensive operation if there are few alternatives that do pass the filter.

Otherwise you have to merge every chooser with every alternative and then filter, and then sample, which gives you the right answer but uses too much memory during the merge.
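
To make the merge, filter, then sample order concrete, here is a small pandas sketch. The tables, column names, and the `price <= income` filter are made up for illustration; this is not the UrbanSim implementation:

```python
import pandas as pd

# Toy data; every name and value here is an assumption for illustration.
choosers = pd.DataFrame(
    {"income": [30, 60, 90]},
    index=pd.Index([1, 2, 3], name="chooser_id"))
alternatives = pd.DataFrame(
    {"price": [10, 20, 40, 50, 80]},
    index=pd.Index([101, 102, 103, 104, 105], name="alt_id"))

# Full cross product: every chooser paired with every alternative.
merged = (
    choosers.reset_index().assign(_key=1)
    .merge(alternatives.reset_index().assign(_key=1), on="_key")
    .drop(columns="_key")
)

# Apply the interaction filter first, then sample up to 2 per chooser.
available = merged[merged["price"] <= merged["income"]]
parts = []
for chooser_id, group in available.groupby("chooser_id"):
    parts.append(group.sample(n=min(2, len(group)), random_state=0))
sampled = pd.concat(parts)
```

`merged` has `len(choosers) * len(alternatives)` rows, which is where the memory cost comes from.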

@jiffyclub
Member Author

What I was seeing today was that if there was a mismatch between the number of alts passed to mnl_simulate and the number available to one of the choosers UrbanSim would throw an exception because it couldn't reshape the data into a rectangular array.
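
The failure can be reproduced with plain NumPy. This only illustrates the reshape behavior, under the assumption that the data is reshaped into one row per chooser and one column per alternative; it is not the `mnl_simulate` code itself:

```python
import numpy as np

num_alts = 3

# Two choosers with 3 alternatives each: 6 values reshape cleanly.
utilities = np.arange(6, dtype=float)
rectangular = utilities.reshape(-1, num_alts)

# If a filter drops one row for one chooser, 5 values remain and
# they no longer fit a (choosers x num_alts) array.
ragged = utilities[:5]
try:
    ragged.reshape(-1, num_alts)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```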

@fscottfoti
Contributor

That makes sense - in that case we should be able to fill in dummy alternatives though. In the other case it's not clear how to solve it.
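
One possible reading of "dummy alternatives", sketched with NumPy: pad each chooser's utilities to a common width with `-inf` so the padded slots get zero probability. Both helper functions are hypothetical, and the sketch assumes every chooser has at least one real alternative:

```python
import numpy as np


def pad_utilities(utilities_per_chooser, num_alts):
    """Pad each chooser's utilities to `num_alts` with -inf placeholders."""
    padded = np.full((len(utilities_per_chooser), num_alts), -np.inf)
    for i, utils in enumerate(utilities_per_chooser):
        padded[i, :len(utils)] = utils
    return padded


def mnl_probabilities(utilities):
    """Row-wise softmax; -inf utilities get probability exactly zero."""
    # Subtract the row maximum for numerical stability.
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    exp_u = np.exp(shifted)
    return exp_u / exp_u.sum(axis=1, keepdims=True)


# The second chooser has only two real alternatives; the third slot is a dummy.
probs = mnl_probabilities(pad_utilities([[1.0, 2.0, 0.5], [0.3, 0.7]], num_alts=3))
```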

@jiffyclub
Member Author

Do you want to do that now, or merge this and carry on with the release?

@fscottfoti
Contributor

Merge this and release. We can address this when we get the chance.

jiffyclub added a commit that referenced this pull request Apr 30, 2015
Allow sampling of alternatives during prediction
@jiffyclub jiffyclub merged commit 15e90f9 into master Apr 30, 2015
@jiffyclub jiffyclub deleted the dcm-pred-sample branch April 30, 2015 15:52