some info on destination choice

jpn-- · Mar 2, 2017 · c426375
1 parent d5ae2cd
commit c426375

Showing 2 changed files with 129 additions and 4 deletions.
diff --git a/doc/agg-choice-variance.png b/doc/agg-choice-variance.png
diff --git a/doc/math.rst b/doc/math.rst
@@ -3,12 +3,137 @@
 Mathematics of Logit Choice Modeling 
 ====================================
 
-This documentation will eventually provide some instruction on the underlying
-mathematics of logit models.  For example:
+This documentation will eventually provide instruction on some of the more interesting topics in the underlying
+mathematics of logit models.
+
+
+
+~~~~~~~~~~~~~~~~~~~~~~~
+Aggregate Choice Models
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Sometimes, a discrete choice is made from a very large pool of possible choices. In these
+circumstances, it may be useful to aggregate choices together, and represent a set of choices
+as a single meta-choice. This is particularly common in destination choice models, where the
+individual possible destinations are aggregated together as traffic analysis zones.
+
+The aggregate choice model in many ways resembles a nested logit model, with the aggregations corresponding to the nests.
+
+We can make some assumptions:
+
+	1. The individual elemental alternatives within each zone or aggregate are homogeneous.
+	   That is, each such alternative has the same systematic utility, :math:`V_{i} = \beta X_{i}`.
+	2. The particular locations of the zonal or aggregation boundaries are arbitrary, and have
+	   no systematic meaning themselves.
+
+Using these assumptions, we can derive an aggregate/zonal choice model.
+
+The usual form of the nested logit model calculates the probability of an alternative as :math:`P_{nest}P_{alt|nest}`.
+In the case of aggregate choices, we do not observe the elemental choice, only the nest that contains it, so we only care about :math:`P_{nest}`.
+The nested formula for that term is
+
+.. math::
+
+	P_{nest}=\frac{\exp[V_{nest}]}{\sum_{j\in nests}\exp[V_{j}]}
+
+with
+
+.. math::
+
+	V_{nest}=\mu_{nest}\log\left[\sum_{i\in nest}\exp\left[V_{i}/\mu_{nest}\right]\right]
+
+Using assumption 2, we know that :math:`\mu_{nest}` must be 1, as we want the aggregation nesting structure to
+collapse to a multinomial logit model. Further, our first assumption is that all the :math:`V_{i}` are equal,
+so the terms inside the summation can collapse together, leaving
+
+.. math::
+
+	V_{nest}=\log\left[N_{nest}\exp\left[V_{i}\right]\right]=V_{i}+\log\left[N_{nest}\right]
+
+with :math:`N_{nest}` as the number of discrete elemental alternatives inside the nest. This can be estimated
+by creating a variable for each aggregate alternative that has a value of :math:`\log\left[N_{nest}\right]`,
+and including it in a MNL model, with a beta coefficient constrained to be equal to 1.
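+
+As a small worked illustration (the sizes and utility are hypothetical numbers chosen only for this example), suppose
+two zones A and B share the same elemental utility :math:`V_{i}=-1`, with :math:`N_{A}=100` and :math:`N_{B}=300`. Then
+
+.. math::
+
+	P_{A}=\frac{\exp\left[-1+\log 100\right]}{\exp\left[-1+\log 100\right]+\exp\left[-1+\log 300\right]}=\frac{100}{100+300}=0.25
+
+so, under the two assumptions above, the zones split the probability in proportion to their sizes.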
+
+One thing to be careful of in these models: the reference model used for the log likelihood at "zeros" should hold the parameter
+on :math:`\log\left[N_{nest}\right]` fixed at 1, not 0. This parameter is not one we are
+estimating in the model; it is a direct consequence of the structure of aggregation, which we have imposed externally.
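+
+Written out as a sketch, assuming every aggregate is available to every decision maker: with all estimated :math:`\beta`
+set to zero and the size coefficient held at 1, each aggregate's probability reduces to its share of total size. The
+symbol :math:`c_{n}` is notation introduced only for this illustration, denoting the aggregate chosen in observation :math:`n`.
+
+.. math::
+
+	LL_{0}=\sum_{n}\log\left[\frac{N_{c_{n}}}{\sum_{j\in nests}N_{j}}\right]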
+
+Relax Arbitrary Boundaries Assumption
+-------------------------------------
+
+Relaxing the assumption of arbitrary boundaries puts :math:`\mu_{nest}` back into the equation for :math:`V_{nest}`:
+
+.. math::
+
+	V_{nest}=\mu_{nest}\log\left[\sum_{i\in nest}\exp\left[V_{i}/\mu_{nest}\right]\right]=V_{i}+\mu_{nest}\log\left[N_{nest}\right]
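+
+The second equality still relies on the homogeneity assumption (all :math:`V_{i}` in the nest are equal). The intermediate
+steps can be written out as
+
+.. math::
+
+	\mu_{nest}\log\left[N_{nest}\exp\left[V_{i}/\mu_{nest}\right]\right]=\mu_{nest}\left(\log\left[N_{nest}\right]+\frac{V_{i}}{\mu_{nest}}\right)=V_{i}+\mu_{nest}\log\left[N_{nest}\right]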
+
+The logsum parameter thus appears as a coefficient on :math:`\log\left[N_{nest}\right]`. This may or may not be a good
+idea for transportation models. In an intra-urban model, if the zones are TAZs, which are
+small sectors drawn only for modelling purposes, relaxing this assumption probably doesn't make sense. If the boundaries
+are aligned with political boundaries (counties, towns) that have differing taxation, administration, or other policies,
+it might be OK to relax this assumption. In a long distance travel model, if the boundaries are aligned with metropolitan
+areas, then it is certainly reasonable to relax the arbitrary bounds assumption.
+
+
+Relaxing Homogeneity
+--------------------
+
+The other assumption we made was that the individual alternatives within a zone are homogeneous... but it is highly likely
+they are not. Variance in the systematic utilities, and in particular heteroskedastic variance, can change the calculations.
+Consider the one dimensional destination choice depicted here:
+
+.. image:: agg-choice-variance.png
+
+The choice has been subdivided into three aggregation zones. The average utility of Zone A is lower than that of Zone B
+or Zone C, but the variance of utility in Zone A is much larger.
+
+Recall that utility maximization theory posits that a decision maker will choose the one discrete alternative with maximum
+utility. The aggregation of those discrete alternatives into zones or aggregate choices does not change the underlying
+choice: a decision maker does not choose a zone, but rather a single discrete alternative within a zone.
+
+While the average utility in Zone A is smaller, you can see that there are some points in Zone A with much higher utility,
+and which are more likely to be chosen. In general, all other things being equal, aggregate alternatives get a positive
+bump in their probability of selection with an increase in variance of the systematic utility.
+
+Dan McFadden showed that, when the utilities in an aggregate are distributed normally, if we define :math:`\omega_{nest}^{2}`
+as the variance of :math:`V_{i}` in a nest, and :math:`\bar{V}_{i}` as the average systematic utility of alternatives in
+the nest, then
 
 .. math::
 
-	P(i) = \frac{ \exp(V_i) }{ \sum_j \exp(V_j) }
+	V_{nest}=\bar{V}_{i}+\mu_{nest}\log\left[N_{nest}\right]+\frac{1}{2}\frac{\omega_{nest}^{2}}{\mu_{nest}}
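+
+A sketch of where the last term comes from, using the standard mean of a lognormal variable rather than reproducing
+McFadden's own derivation: assume the :math:`V_{i}` within a nest are normal with mean :math:`\bar{V}_{i}` and variance
+:math:`\omega_{nest}^{2}`, and that :math:`N_{nest}` is large enough for the sum to be approximated by :math:`N_{nest}`
+times its expected term. Then
+
+.. math::
+
+	\sum_{i\in nest}\exp\left[V_{i}/\mu_{nest}\right]\approx N_{nest}\exp\left[\frac{\bar{V}_{i}}{\mu_{nest}}+\frac{\omega_{nest}^{2}}{2\mu_{nest}^{2}}\right]
+
+Taking the log and multiplying by :math:`\mu_{nest}` gives the three terms of :math:`V_{nest}` shown above.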
+
+
+
+Estimating N
+------------
+Sometimes, it is not obvious what :math:`N` should be. Land area? Employment? Population? It might be different
+for different types of trips, even if the types of trips are not differentiated in the data.
+
+It is possible to build :math:`N` as a linear combination of several component parts, so that you might have
+
+.. math::
+
+	N_{nest}=\gamma_{remp}\mathrm{RetailEmployment}+\gamma_{nemp}\mathrm{NonretailEmployment}+\gamma_{pop}\mathrm{Population}
+
+The :math:`\gamma`'s then become new parameters to the model, in addition to the :math:`\beta` and :math:`\mu` parameters.
+
+The size value :math:`N_{nest}` still needs to be strictly positive, as it represents the number of discrete
+alternatives in the zone or aggregation. The simplest way to guarantee this is to require that the data values and
+the parameters inside :math:`N` are themselves positive. Enforcing positive data is easy, by only choosing variables that reflect size attributes
+(like employment, population, area). Enforcing positive coefficients requires constraints on the :math:`\gamma` parameters,
+or more simply a rewrite of the formulation of :math:`N`:
+
+.. math::
+
+	N_{nest}=\exp[\dot{\gamma}_{remp}]\mathrm{RetailEmployment}+\exp[\dot{\gamma}_{nemp}]\mathrm{NonretailEmployment}+\exp[\dot{\gamma}_{pop}]\mathrm{Population}
+
+
+Then :math:`\dot{\gamma}` can be unconstrained.
 
-with :math:`V_i = \beta X_i`.
+One of the issues with estimating :math:`N` in this fashion is that the scale of :math:`N`, like the scale of :math:`V`,
+is not defined. Doubling the :math:`N` size of all alternatives, by adding :math:`\log[2]` to all :math:`\dot{\gamma}`,
+will not affect the probabilities. Therefore, one :math:`\dot{\gamma}` needs to be arbitrarily fixed at zero.
+(In the non-estimated :math:`N` case, this normalization occurs implicitly; there is no parameter inside the log term
+on :math:`N`.)