Merge branch 'master' of https://github.com/jpn--/larch

jpn-- · Mar 3, 2017 · a5b767e
2 parents e6494f8 + 17c08a3

Showing 155 changed files with 535 additions and 160 deletions.
diff --git a/build_configuration.py b/build_configuration.py
@@ -1,6 +1,6 @@
 #!/usr/bin/python
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/doc/agg-choice-variance.png b/doc/agg-choice-variance.png
diff --git a/doc/conf.py b/doc/conf.py
@@ -113,7 +113,7 @@ def __getattr__(cls, name):
 
 # General information about the project.
 project = u'Larch'
-copyright = u'2010-2016, Jeffrey Newman'
+copyright = u'2010-2017, Jeffrey Newman'
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the
@@ -377,7 +377,7 @@ def setup(app):
 epub_title = u'larch'
 epub_author = u'Jeffrey Newman'
 epub_publisher = u'Jeffrey Newman'
-epub_copyright = u'2016, Jeffrey Newman'
+epub_copyright = u'2017, Jeffrey Newman'
 
 # The language of the text. It defaults to the language option
 # or en if the language is not set.

diff --git a/doc/math.rst b/doc/math.rst
@@ -3,12 +3,148 @@
 Mathematics of Logit Choice Modeling 
 ====================================
 
-This documentation will eventually provide some instruction on the underlying
-mathematics of logit models.  For example:
+This documentation will eventually provide instruction on some of the more interesting topics in the underlying
+mathematics of logit models.
+
+
+
+~~~~~~~~~~~~~~~~~~~~~~~
+Aggregate Choice Models
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Sometimes, a discrete choice is made from a very large pool of possible choices. In these
+circumstances, it may be useful to aggregate choices together, and represent a set of choices
+as a single meta-choice. This is particularly common in destination choice models, where the
+individual possible destinations are aggregated together as traffic analysis zones.
+
+The aggregate choice in many ways represents a nested logit model, with the aggregations corresponding to the nests.
+
+We can make some assumptions:
+
+	1. The individual elemental alternatives within each zone or aggregate are homogeneous.
+	   That is, each such alternative has the same systematic utility, :math:`V_{i} = \beta X_{i}`.
+	2. The particular locations of the zonal or aggregation boundaries are arbitrary, and have
+	   no systematic meaning themselves.
+
+Using these assumptions, we can derive an aggregate/zonal choice model.
+
+The usual form of the nested logit model calculates the probability of an alternative as :math:`P_{nest}P_{alt|nest}`.
+In the case of aggregate choices, we do not observe which elemental alternative was chosen, only the nest containing it, so we only care about :math:`P_{nest}`.
+The nested formula for that term is
+
+.. math::
+
+	P_{nest}=\frac{\exp(V_{nest})}{\sum_{j\in nests}\exp(V_{j})}
+
+with
+
+.. math::
+
+	V_{nest}=\mu_{nest}\log\left(\sum_{i\in nest}\exp\left(\frac{V_{i}}{\mu_{nest}}\right)\right)
+
+Using assumption 2, we know that :math:`\mu_{nest}` must be 1, as we want the aggregation nesting structure to
+collapse to a multinomial logit model. Further, our first assumption is that all the :math:`V_{i}` are equal,
+so the terms inside the summation can collapse together, leaving
+
+.. math::
+
+	V_{nest}=\log\left(N_{nest}\exp\left(V_{i}\right)\right)=V_{i}+\log\left(N_{nest}\right)
+
+with :math:`N_{nest}` as the number of discrete elemental alternatives inside the nest. This can be estimated
+by creating a variable for each aggregate alternative that has a value of :math:`\log\left(N_{nest}\right)`,
+and including it in a MNL model, with a beta coefficient constrained to be equal to 1.
+
+One thing to be careful of in these models: the log likelihood at “zeros” model should include the parameter
+on :math:`\log\left(N_{nest}\right)` equal to 1, not 0. This is because it is not a parameter we are
+estimating in the model; it is a direct function of the structure of aggregation, which we have imposed externally.
+
+Relax Arbitrary Boundaries Assumption
+-------------------------------------
+
+Relaxing the assumption of arbitrary boundaries puts :math:`\mu_{nest}` back into the equation for :math:`V_{nest}`:
 
 .. math::
 
-	P(i) = \frac{ \exp(V_i) }{ \sum_j \exp(V_j) }
+	V_{nest}=\mu_{nest}\log\left(\sum_{i\in nest}\exp\left(\frac{V_{i}}{\mu_{nest}}\right)\right)=V_{i}+\mu_{nest}\log\left(N_{nest}\right)
+
+The logsum parameter thus appears as a coefficient on :math:`\log\left(N_{nest}\right)`. This may or may not be a good
+idea for transportation models. In an intra-urban model, if the boundaries of zones are at the TAZ level, which are
+small sectors drawn only for modelling purposes, relaxing this assumption probably doesn't make sense. If the boundaries
+are aligned with political boundaries (counties, towns) that have differing taxing, administration, or other policies,
+it might be OK to relax this assumption. In a long distance travel model, if the boundaries are aligned with metropolitan
+areas, then it is certainly reasonable to relax the arbitrary bounds assumption.
+
+
+Relaxing Homogeneity
+--------------------
+
+The other assumption we made was that the individual alternatives within a zone are homogeneous... but it is highly likely
+they are not. Variance in the systematic utilities, and in particular heteroskedastic variance, can change the calculations.
+Consider the one dimensional destination choice depicted here:
+
+.. image:: agg-choice-variance.png
+
+The choice has been subdivided into three aggregation zones. The average utility of Zone A is lower than that of Zone B
+or Zone C, but the variance of utility in Zone A is much larger.
+
+Recall that utility maximization theory posits that a decision maker will choose the one discrete alternative with maximum
+utility. The aggregation of those discrete alternatives into zones or aggregate choices does not change the underlying
+choice; a decision maker does not choose a zone, but she chooses a single discrete alternative in a zone.
+
+While the average utility in Zone A is smaller, you can see that there are some points in Zone A with much higher utility,
+and which are more likely to be chosen. In general, all other things being equal, aggregate alternatives get a positive
+bump in their probability of selection with an increase in variance of the systematic utility.
+
+[McFadden1978]_ showed that, when the utilities in an aggregate are distributed normally, if we define :math:`\omega_{nest}^{2}`
+as the variance of :math:`V_{i}` in a nest, and :math:`\bar{V}_{i}` as the average systematic utility of alternatives in
+the nest, then
+
+.. math::
+
+	V_{nest}=\bar{V}_{i}+\mu_{nest}\log\left(N_{nest}\right)+\frac{1}{2}\frac{\omega_{nest}^{2}}{\mu_{nest}}
+
+
+Estimating N
+------------
+Sometimes, it is not obvious what :math:`N` should be. Land area? Employment? Population? It might be different
+for different types of trips, even if the types of trips are not differentiated in the data.
+
+It is possible to build :math:`N` as a linear combination of several component parts, so that you might have
+
+.. math::
+
+	N_{nest}=\gamma_{remp}RetailEmployment+\gamma_{nemp}NonretailEmployment+\gamma_{pop}Population
+
+The :math:`\gamma`'s then become new parameters to the model, in addition to the :math:`\beta` and :math:`\mu` parameters.
+
+The size value :math:`N_{nest}` still needs to be strictly positive, as it represents the number of discrete
+alternatives in the zone or aggregation. Therefore, all the data values and all the parameters inside :math:`N` also
+need to be positive (or, more precisely, they must all be non-negative, and at least one data value and its paired coefficient must both be strictly positive).
+Enforcing positive data is easy: simply choose only variables that reflect size attributes
+(like employment, population, area). Enforcing positive coefficients requires constraints on the :math:`\gamma` parameters,
+or, more simply, a rewrite of the formulation of :math:`N`:
+
+.. math::
+
+	N_{nest}=\exp(\dot{\gamma}_{remp})RetailEmployment+\exp(\dot{\gamma}_{nemp})NonretailEmployment+\exp(\dot{\gamma}_{pop})Population
+
+
+Then :math:`\dot{\gamma}` can be unconstrained.  (This form also has advantages in the calculation of derivatives, the
+details of which are not important for users to understand.)
+
+One of the issues with estimating :math:`N` in this fashion is that the scale of :math:`N`, like the scale of :math:`V`,
+is not defined. Doubling the size :math:`N` of all alternatives, by adding :math:`\log(2)` to all :math:`\dot{\gamma}`,
+will not affect the probabilities. Therefore, one :math:`\dot{\gamma}` needs to be arbitrarily fixed at zero.
+(In the non-estimated :math:`N` case, this normalization occurs implicitly; there is no parameter inside the log term
+on :math:`N`.)
+
+
+
+
+~~~~~~~~~~
+
 
-with :math:`V_i = \beta X_i`.
+.. [McFadden1978] McFadden, D. (1978) Modelling the choice of residential location.
+   Spatial Interaction Theory and Residential Location (Karlquist A. Ed., pp. 75-96).
+   North Holland, Amsterdam.
 
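The aggregate-choice formulas in the doc/math.rst diff above can be sanity-checked numerically. The sketch below is not Larch code and is not part of this commit. It uses only the Python standard library. Every function name (`logsum`, `homogeneous_nest_utility`, `normal_nest_utility`, `choice_probabilities`, `size_from_components`) is a hypothetical illustration, and so are all zone utilities, zone counts, size data and γ̇ values. It assumes a single common μ = 1 wherever probabilities are compared, and normally distributed within-zone utilities for the variance check.

```python
import math
import random

# Hypothetical helpers for illustration only; not part of the Larch API.


def logsum(utilities, mu=1.0):
    """Nest utility V_nest = mu * log(sum_i exp(V_i / mu))."""
    m = max(u / mu for u in utilities)  # shift by the max for numerical stability
    return mu * (m + math.log(sum(math.exp(u / mu - m) for u in utilities)))


def homogeneous_nest_utility(v, n, mu=1.0):
    """Closed form for n identical alternatives with utility v: v + mu*log(n)."""
    return v + mu * math.log(n)


def normal_nest_utility(v_bar, omega_sq, n, mu=1.0):
    """McFadden (1978) form when the V_i in a nest are Normal(v_bar, omega_sq)."""
    return v_bar + mu * math.log(n) + 0.5 * omega_sq / mu


def choice_probabilities(utilities):
    """Multinomial logit probabilities over a list of utilities."""
    m = max(utilities)
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def size_from_components(gamma_dot, components):
    """N = sum_k exp(gamma_dot_k) * x_k; positive for any real gamma_dot
    as long as every x_k >= 0 and at least one x_k > 0."""
    return sum(math.exp(g) * x for g, x in zip(gamma_dot, components))


# Hypothetical zones: one common elemental utility per zone and the number
# of elemental alternatives in each zone.
zone_v = [0.5, 0.1, 1.0]
zone_n = [3, 5, 2]

# Aggregate model: one meta-alternative per zone, log(N) coefficient fixed at 1.
agg_probs = choice_probabilities(
    [homogeneous_nest_utility(v, n) for v, n in zip(zone_v, zone_n)])

# Elemental MNL over all 10 individual alternatives, summed back up by zone.
elemental = choice_probabilities(
    [v for v, n in zip(zone_v, zone_n) for _ in range(n)])
bounds = [sum(zone_n[:k]) for k in range(len(zone_n) + 1)]
elemental_by_zone = [sum(elemental[bounds[k]:bounds[k + 1]])
                     for k in range(len(zone_n))]

# Heterogeneous zone: Monte Carlo check of the normal-variance correction.
rng = random.Random(12345)
n_big, v_bar, omega = 200_000, 0.2, 0.8
draws = [rng.gauss(v_bar, omega) for _ in range(n_big)]
simulated = logsum(draws)
closed_form = normal_nest_utility(v_bar, omega ** 2, n_big)

# Estimated size terms, hypothetical columns:
# (retail employment, non-retail employment, population).
sizes = [[120, 300, 900], [40, 800, 1500], [10, 50, 4000]]
gamma_dot = [0.0, -0.7, -2.3]  # first element fixed at zero for normalization
p_base = choice_probabilities(
    [homogeneous_nest_utility(v, size_from_components(gamma_dot, x))
     for v, x in zip(zone_v, sizes)])
shifted = [g + math.log(2) for g in gamma_dot]  # doubles every zone's N
p_shift = choice_probabilities(
    [homogeneous_nest_utility(v, size_from_components(shifted, x))
     for v, x in zip(zone_v, sizes)])

print([round(p, 4) for p in agg_probs])
print([round(p, 4) for p in elemental_by_zone])
print(round(simulated, 3), round(closed_form, 3))
```

The first comparison checks that an MNL over zones, with the log(N) coefficient fixed at 1, reproduces the elemental MNL summed by zone. The Monte Carlo comparison only approximates the closed form, so a small nonzero gap is expected. The last comparison illustrates why one γ̇ has to be fixed: shifting all of them by log(2) leaves the probabilities unchanged when μ is common across zones.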
diff --git a/py/__init__.py b/py/__init__.py
@@ -2,7 +2,7 @@
 #
 #  Larch is free, open source software to estimate discrete choice models.
 #  
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  Larch is free software: you can redistribute it and/or modify
 #  it under the terms of the GNU General Public License as published by
@@ -34,7 +34,7 @@
 
 
 info = """Larch is free, open source software to estimate discrete choice models.
-Copyright 2007-2016 Jeffrey Newman
+Copyright 2007-2017 Jeffrey Newman
 This program is licensed under GPLv3 and comes with ABSOLUTELY NO WARRANTY."""
 
 status = ""

diff --git a/py/examples/__init__.py b/py/examples/__init__.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/itin80.py b/py/examples/itin80.py
@@ -1,6 +1,6 @@
 ################################################################################
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/mtc01e.py b/py/examples/mtc01e.py
@@ -1,6 +1,6 @@
 ################################################################################
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/mtc17.py b/py/examples/mtc17.py
@@ -1,6 +1,6 @@
 ################################################################################
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/mtc22.py b/py/examples/mtc22.py
@@ -1,6 +1,6 @@
 ################################################################################
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/swissmetro00data.py b/py/examples/swissmetro00data.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/swissmetro01logit.py b/py/examples/swissmetro01logit.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/swissmetro02weighted.py b/py/examples/swissmetro02weighted.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/swissmetro04transforms.py b/py/examples/swissmetro04transforms.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/swissmetro09nested.py b/py/examples/swissmetro09nested.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/swissmetro11cnl.py b/py/examples/swissmetro11cnl.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/examples/swissmetro14selectionBias.py b/py/examples/swissmetro14selectionBias.py
@@ -1,6 +1,6 @@
 ######################################################### encoding: utf-8 ######
 #
-#  Copyright 2007-2016 Jeffrey Newman.
+#  Copyright 2007-2017 Jeffrey Newman.
 #
 #  This file is part of Larch.
 #

diff --git a/py/linalg.py b/py/linalg.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/py/logging.py b/py/logging.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/py/model_reporter/docx.py b/py/model_reporter/docx.py
@@ -2,6 +2,7 @@
 try:
 	import docx
 	from docx.enum.style import WD_STYLE_TYPE
+	from docx.enum.text import WD_ALIGN_PARAGRAPH
 except ImportError:
 
 	class DocxModelReporter():
@@ -30,6 +31,15 @@ def _append_to_document(self, other_doc):
 	def document_larchstyle():
 		document = docx.Document()
 
+#		normal = document.styles['Normal']
+#		normal.font.name = 'Arial'
+#		normal.font.size = docx.shared.Pt(11)
+#		normal.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
+#		normal.paragraph_format.line_spacing = 1.0
+#		normal.paragraph_format.widow_control = True
+#
+		body_text = document.styles['Body Text']
+
 		monospaced_small = document.styles.add_style('Monospaced Small',WD_STYLE_TYPE.TABLE)
 		monospaced_small.base_style = document.styles['Normal']
 		monospaced_small.font.name = 'Courier New'
@@ -38,6 +48,15 @@ def document_larchstyle():
 		monospaced_small.paragraph_format.space_after  = docx.shared.Pt(0)
 		monospaced_small.paragraph_format.line_spacing = 1.0
 
+		table_body_text = document.styles.add_style('Table Body Text',WD_STYLE_TYPE.TABLE)
+		table_body_text.base_style = document.styles['Body Text']
+		table_body_text.font.name = 'Arial Narrow'
+		table_body_text.font.size = docx.shared.Pt(9)
+		table_body_text.paragraph_format.space_before = docx.shared.Pt(1)
+		table_body_text.paragraph_format.space_after  = docx.shared.Pt(1)
+		table_body_text.paragraph_format.line_spacing = 1.0
+
+
 		return document
 
 
@@ -204,7 +223,7 @@ def docx_params(self, groups=None, display_inital=False, **format):
 			if groups is None and hasattr(self, 'parameter_groups'):
 				groups = self.parameter_groups
 
-			table = docx_table(rows=1, cols=number_of_columns, style='Monospaced Small',
+			table = docx_table(rows=1, cols=number_of_columns, style='Table Body Text',
 							   header_text="Model Parameter Estimates", header_level=2)
 
 			def append_simple_row(name, initial_value, value, std_err, tstat, nullvalue, holdfast):

diff --git a/py/test/__init__.py b/py/test/__init__.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/py/test/test_data.py b/py/test/test_data.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/py/test/test_examples.py b/py/test/test_examples.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/py/test/test_mixed.py b/py/test/test_mixed.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/py/test/test_mnl.py b/py/test/test_mnl.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #

diff --git a/py/test/test_nl.py b/py/test/test_nl.py
@@ -1,5 +1,5 @@
 #
-#  Copyright 2007-2016 Jeffrey Newman
+#  Copyright 2007-2017 Jeffrey Newman
 #
 #  This file is part of Larch.
 #