Allow sampling of alternatives during prediction #142

jiffyclub · 2015-04-29T00:12:13Z

It's kind of limited sampling, not something you could use
during a location choice model. Alternatives may show up as
available for more than one chooser and alternatives may show up
more than once for a single chooser.

jiffyclub · 2015-04-29T00:12:52Z

@fscottfoti I noticed that mnl_interaction_dataset doesn't have replace=False when drawing samples so the same alternative can show up multiple times for the same chooser.

fscottfoti · 2015-04-29T00:56:21Z

That's definitely a bug. There shouldn't be any reason to sample the same alternative twice.

jiffyclub · 2015-04-29T16:41:56Z

It's the same way in activitysim. I think we did it that way so that the same alternative can show up for different choosers, but it means the same alternative can also show up multiple times for the same chooser. To do this correctly I think we'd need to do a random draw with replace=False individually for each chooser.
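
A minimal sketch of the per-chooser draw described above, assuming NumPy's `Generator.choice`. The function name `sample_alternatives` and its arguments are hypothetical and are not taken from `mnl_interaction_dataset`; the point is only that drawing with `replace=False` once per chooser keeps alternatives unique within a chooser while still letting them repeat across choosers.

```python
import numpy as np


def sample_alternatives(alt_ids, num_choosers, sample_size, rng):
    """Draw `sample_size` distinct alternatives for each chooser.

    Hypothetical helper: returns an array of shape
    (num_choosers, sample_size). An alternative may appear in several
    rows, but never twice in the same row.
    """
    alt_ids = np.asarray(alt_ids)
    if sample_size > len(alt_ids):
        raise ValueError("sample_size cannot exceed the number of alternatives")
    # One independent draw without replacement per chooser.
    return np.stack([
        rng.choice(alt_ids, size=sample_size, replace=False)
        for _ in range(num_choosers)
    ])


rng = np.random.default_rng(0)
sample = sample_alternatives(np.arange(100), num_choosers=5, sample_size=10, rng=rng)
print(sample.shape)
```

The loop makes one call per chooser, which is the likely source of any slowdown compared with a single bulk draw with replacement.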

fscottfoti · 2015-04-29T16:46:51Z

That makes sense. Unfortunately, I think we have to do a random draw with replace=False for each chooser. Is it a lot slower?

waddell · 2015-04-29T16:47:58Z

This needs to be an option. In some applications, like shopping destination choice, it is fine to resample the same alternative destination. In others, like residential location, it may not be, depending on how the choice algorithm deals with competition, since a unit is generally constrained to only one household occupant.

jiffyclub · 2015-04-29T16:53:13Z

@fscottfoti and I talked about this and we're explicitly not adding support for a situation like household location because it'd be a lot of work to have sampling while respecting supply constraints. So this sampling would only be used in situations where the alternatives are not supply constrained.

fscottfoti · 2015-04-29T17:01:15Z

@waddell I don't think there's any case where you sample the same alternative twice for a single chooser?

waddell · 2015-04-29T17:22:33Z

Not unless you have an aggregated alternative (residential units of a specific building type in a submarket) and are not accounting for the varying size of the alternatives in the specification of the model.

So I’d agree that sampling without replacement is the right general pattern, and that the other cases should be handled either in the utility function specification or in the choice algorithm.

Paul

fscottfoti · 2015-04-29T17:27:11Z

Makes sense.

bridwell · 2015-04-29T17:47:00Z

So does this mean that for situations like household location choice, interaction variables will not be supported for choices in the simulation? If this is the case, a boolean argument called 'is_constrained' or something would be helpful in the model constructor: if provided, this would automatically set the choice_mode to aggregate and the probability_mode to 'full_product', and would disable any interaction dataset functionality. This way we don't specify and estimate models in a manner that won't be supported by the simulation.

This compensates for values removed from the alternatives by post-interaction filters, but it also means that post-interaction filters cannot be used in full_product mode.

At the moment certain probability and choice modes are incompatible so this checks for those pairings and raises exceptions. In particular the 'single_chooser' probability mode must be used with the 'aggregate' choice mode and the 'full_product' probability mode must be used with the 'individual' choice mode. This also disallows using post-interaction filters in 'full_product' mode because the filters can cause there to be different numbers of alternatives for each chooser, which breaks the MNL machinery.

When building an interaction dataset with sampling we need to pick alternatives individually for each chooser so that we don't draw the same alternative multiple times for the same chooser.

Things changed thanks to doing draws without replacement when sampling to make interaction datasets.

jiffyclub · 2015-04-29T22:49:12Z

@bridwell Right now probability_mode='single_chooser' must be used with choice_mode='aggregate'. This is checked by the system (as of 5087d76 in this PR).

Interaction variables are allowed in any mode because in either case the choosers and alternatives are being merged. However, it is not possible to use interaction_predict_filters if probability_mode='full_product' because UrbanSim can't handle the possibility of different choosers having different numbers of available alternatives.
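
The compatibility rules stated in this thread can be sketched as a small validation function. This is an illustration only: `check_modes` is a hypothetical name and does not reproduce the check added in 5087d76. It encodes just the pairings described here and the restriction on `interaction_predict_filters`.

```python
def check_modes(probability_mode, choice_mode, interaction_predict_filters=None):
    """Raise ValueError for combinations described in the thread as unsupported.

    Hypothetical helper, not UrbanSim's own check.
    """
    # Each probability mode is paired with exactly one choice mode.
    required_choice_mode = {
        'single_chooser': 'aggregate',
        'full_product': 'individual',
    }
    if probability_mode not in required_choice_mode:
        raise ValueError(
            'unknown probability_mode: {!r}'.format(probability_mode))

    expected = required_choice_mode[probability_mode]
    if choice_mode != expected:
        raise ValueError(
            'probability_mode {!r} requires choice_mode {!r}, got {!r}'.format(
                probability_mode, expected, choice_mode))

    # Filters applied after the merge can leave choosers with different
    # numbers of alternatives, which full_product mode cannot handle.
    if probability_mode == 'full_product' and interaction_predict_filters:
        raise ValueError(
            'interaction_predict_filters cannot be used with '
            "probability_mode 'full_product'")


check_modes('single_chooser', 'aggregate', interaction_predict_filters=['x > 1'])
check_modes('full_product', 'individual')
```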

coveralls · 2015-04-29T23:06:31Z

Changes Unknown when pulling 9c49885 on dcm-pred-sample into * on master.

jiffyclub · 2015-04-29T23:06:42Z

Think this is ready to go.

fscottfoti · 2015-04-29T23:43:21Z

To be clear, UrbanSim can handle the possibility of different individuals having different numbers of alternatives. It can't handle interaction filters well because the filters are applied after sampling, and so might result in a poor sample. If that's OK with the modeler, it's something we could support pretty easily.

I actually don't know how the old UrbanSim did this. I suspect it would continue getting new samples until it had enough alternatives that passed the filters, which can be an expensive operation if there are few alternatives that do pass the filter.

Otherwise you have to merge every chooser with every alternative and then filter, and then sample, which gives you the right answer but uses too much memory during the merge.
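
For illustration, the resample-until-enough approach speculated about above might look roughly like the sketch below. It is not taken from any UrbanSim version; `sample_passing_filter`, `passes_filter`, and `max_rounds` are hypothetical names. It shows why the approach gets expensive: when few alternatives pass the filter, many batches are drawn and discarded.

```python
import numpy as np


def sample_passing_filter(alt_ids, passes_filter, sample_size, rng, max_rounds=100):
    """Collect `sample_size` distinct alternatives that pass `passes_filter`.

    Hypothetical helper: draws batches without replacement from the
    alternatives not yet tried and keeps those that pass the filter.
    Gives up after `max_rounds` batches or when nothing is left to try.
    """
    remaining = np.asarray(alt_ids)
    kept = []
    for _ in range(max_rounds):
        if len(kept) >= sample_size or len(remaining) == 0:
            break
        batch_size = min(sample_size, len(remaining))
        batch = rng.choice(remaining, size=batch_size, replace=False)
        kept.extend(a for a in batch if passes_filter(a))
        # Never retry an alternative that was already drawn.
        remaining = np.setdiff1d(remaining, batch)
    if len(kept) < sample_size:
        raise ValueError("not enough alternatives pass the filter")
    return np.array(kept[:sample_size])


rng = np.random.default_rng(0)
evens = sample_passing_filter(np.arange(20), lambda a: a % 2 == 0, 5, rng)
print(len(evens))
```

The alternative mentioned in the comment, merging every chooser with every alternative before filtering and sampling, avoids the retries but pays for them in memory.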

jiffyclub · 2015-04-30T00:10:48Z

What I was seeing today was that if there was a mismatch between the number of alts passed to mnl_simulate and the number available to one of the choosers, UrbanSim would throw an exception because it couldn't reshape the data into a rectangular array.
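
The kind of failure described here can be reproduced with plain NumPy. This is a toy illustration with made-up sizes, not the code path inside `mnl_simulate`: once one chooser loses an alternative, the flat array no longer has `num_choosers * numalts` entries and cannot be reshaped.

```python
import numpy as np

num_choosers, numalts = 3, 4

# One value per chooser-alternative pair: 3 * 4 = 12 values reshape cleanly.
full = np.arange(num_choosers * numalts, dtype=float)
print(full.reshape(num_choosers, numalts).shape)

# Suppose a filter removed one alternative for one chooser, leaving 11 values.
ragged = full[:-1]
reshape_failed = False
try:
    ragged.reshape(num_choosers, numalts)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```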

fscottfoti · 2015-04-30T00:52:40Z

That makes sense - in that case we should be able to fill in dummy alternatives though. In the other case it's not clear how to solve it.
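
One possible reading of "fill in dummy alternatives" is to pad each chooser's utilities out to a common length with a value that can never be chosen. The sketch below assumes that interpretation; `pad_utilities` and `mnl_probabilities` are hypothetical helpers, not UrbanSim functions. Padding with `-inf` gives the dummy slots a probability of exactly zero under the MNL (softmax) formula.

```python
import numpy as np


def pad_utilities(utilities_per_chooser, numalts):
    """Pad each chooser's utilities to length `numalts` with -inf."""
    padded = np.full((len(utilities_per_chooser), numalts), -np.inf)
    for i, utils in enumerate(utilities_per_chooser):
        if len(utils) > numalts:
            raise ValueError("chooser {} has more than numalts alternatives".format(i))
        padded[i, :len(utils)] = utils
    return padded


def mnl_probabilities(utilities):
    """Row-wise softmax; slots with -inf utility get probability 0.

    Assumes every chooser has at least one real alternative.
    """
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# The second chooser has only two alternatives left after filtering.
padded = pad_utilities([[1.0, 2.0, 3.0], [0.5, 0.5]], numalts=3)
probs = mnl_probabilities(padded)
print(probs[1])
```

A chooser left with no alternatives at all would still need separate handling, since a row of all `-inf` has no valid probabilities.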

jiffyclub · 2015-04-30T15:25:42Z

Do you want to do that now, or merge this and carry on with the release?

fscottfoti · 2015-04-30T15:50:25Z

Merge this and release. We can address this when we get the chance.

Allow sampling of alternatives during prediction

53b5840

It's kind of limited sampling, not something you could use during a location choice model. Alternatives may show up as available for more than one chooser and alternatives may show up more than once for a single chooser.

jiffyclub added 4 commits April 29, 2015 13:46

Take numalts from data size in single_chooser mode

0433ab5

This compensates for values removed from the alternatives by post-interaction filters, but it also means that post-interaction filters cannot be used in full_product mode.

Draw samples individually for choosers

d17a5ce

When building an interaction dataset with sampling we need to pick alternatives individually for each chooser so that we don't draw the same alternative multiple times for the same chooser.

Update MNL tests with new random answers

9c49885

Things changed thanks to doing draws without replacement when sampling to make interaction datasets.

jiffyclub mentioned this pull request Apr 29, 2015

add prediction_sample_size flags for lcms UDST/sanfran_urbansim#15

Merged

jiffyclub added a commit that referenced this pull request Apr 30, 2015

Merge pull request #142 from synthicity/dcm-pred-sample

15e90f9

Allow sampling of alternatives during prediction

jiffyclub merged commit 15e90f9 into master Apr 30, 2015

jiffyclub deleted the dcm-pred-sample branch April 30, 2015 15:52
