
Commit

Samplingbias examples
jpn-- committed Dec 21, 2016
1 parent 02fd180 commit c32f69b
Showing 2 changed files with 211 additions and 0 deletions.
122 changes: 122 additions & 0 deletions doc/example/114_nl_sampbias.rst
@@ -0,0 +1,122 @@
.. currentmodule:: larch

=========================================
114: Swissmetro Nested with Sampling Bias
=========================================

.. testsetup:: *

import larch



This example is a mode choice model built using the Swissmetro example dataset.
First we create the DB and Model objects:

.. testcode::

d = larch.DB.Example('SWISSMETRO')
m = larch.Model(d)

We can attach a title to the model. The title does not affect the calculations
at all; it is merely used in various output report styles.

.. testcode::

m.title = "swissmetro example 14 (nested logit with sampling bias)"


The Swissmetro dataset, like all Biogeme data, comes only in `co` format.

.. testcode::

from larch.roles import P,X
m.utility[1] = ( P.ASC_TRAIN
+ P.B_TIME * X.TRAIN_TT
+ P.B_COST * X("TRAIN_CO*(GA==0)") )
m.utility[2] = ( P.B_TIME * X.SM_TT
+ P.B_COST * X("SM_CO*(GA==0)") )
m.utility[3] = ( P.ASC_CAR
+ P.B_TIME * X.CAR_TT
+ P.B_COST * X("CAR_CO") )


To create a new nest, we can use the new_nest command, although we'll need to know the
codes of the alternatives in our dataset. To find out, we can do:

.. doctest::
:options: +ELLIPSIS, +NORMALIZE_WHITESPACE

>>> m.df.alternatives()
[(1, 'Train'), (2, 'SM'), (3, 'Car')]


For this example, we want to nest together the Train and Car modes into an "existing" modes nest.
It looks like those are modes 1 and 3, so we can use the new_nest command like this:

.. testcode::

m.new_nest("existing", parent=m.root_id, children=[1,3])


We're also going to insert a sampling bias correction in this model. To read more on this topic,
please see
Bierlaire, Bolduc, and McFadden (2008)
`The estimation of generalized extreme value models from choice-based samples <http://www.sciencedirect.com/science/article/pii/S0191261507000963>`_,
in `Transportation Research Part B: Methodological <http://www.sciencedirect.com/science/journal/01912615>`_.


.. testcode::

m.samplingbias[1] = P.SB_TRAIN
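
Roughly speaking, the sampling bias parameter acts like an extra alternative-specific
constant that sits outside the nesting structure. As a sketch (the notation below is
illustrative, following the cited paper rather than the larch source code), the
corrected choice probability takes the form

.. math::

    \Pr(i) = \frac{\exp\left(V_i + \ln G_i + \delta_i\right)}
                  {\sum_{j} \exp\left(V_j + \ln G_j + \delta_j\right)}

where :math:`G_i` is the derivative of the GEV generating function that encodes the
nesting structure, :math:`\delta_1` is the SB_TRAIN parameter attached above, and the
bias constants for the other alternatives are zero. If the sampling rates of the strata
were known in advance, these constants could in principle be fixed at the logs of the
sampling-rate ratios; here SB_TRAIN is simply estimated along with the other parameters.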


Larch will find all the parameters in the model, but we'd like to output them in
a particular order. We can use the reorder_parameters method to fix this:

.. testcode::

m.reorder_parameters("ASC", "B_", "existing",)

We can estimate the model and check that the results match those given by Biogeme:

.. doctest::
:options: +ELLIPSIS, +NORMALIZE_WHITESPACE

>>> result = m.maximize_loglike()
>>> print(result.message)
Optimization terminated successfully...

>>> print(m.report('txt', sigfigs=3))
=========================================================================================...
swissmetro example 14 (nested logit with sampling bias)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter InitValue FinalValue StdError t-Stat NullValue
ASC_TRAIN 0.0 -10.2 3.13 -3.27 0.0
ASC_CAR 0.0 -0.317 0.0443 -7.16 0.0
B_TIME 0.0 -0.0113 0.000576 -19.6 0.0
B_COST 0.0 -0.0111 0.000511 -21.6 0.0
SB_TRAIN 0.0 10.4 3.15 3.31 0.0
existing 1.0 0.874 0.035 -3.61 1.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence -5169.64
Log Likelihood at Null Parameters -6964.66
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters 0.258
=========================================================================================...


.. tip::

If you want access to the model in this example without worrying about assembling all the code blocks
together on your own, you can load a ready-to-estimate copy like this::

m = larch.Model.Example(114)


89 changes: 89 additions & 0 deletions doc/example/307_arc_nl2_sampbias.rst
@@ -0,0 +1,89 @@
.. currentmodule:: larch

===============================================
307: Itinerary Choice with Sampling Bias
===============================================

.. testsetup:: *

import larch


We begin this example by picking up where we left off in Example 303:

.. testcode::

m = larch.Model.Example(303)

That model includes a two-level nested logit structure. Suppose, however,
that our source data does not quite look like a simple random sample of travelers.
Instead, suppose we have a stratified sample based on distribution channel (how travelers
purchased their tickets). This stratification manifests itself in different levels of
sampling intensity across carriers, due to differences in marketing practices among those carriers.

Under an MNL model formulation, because we are including a complete set of carrier-specific
constants in the model, any bias in the parameter estimates arising from the stratified sampling would
be isolated in those carrier-specific constants, leaving the other parameter estimates
unbiased by the sampling. However, this does not hold for more complex GEV models unless the
carrier-specific constants are processed outside of any GEV or nesting structure.

To do this, we can add a second set of carrier-specific constants in a special samplingbias term:

.. testcode::

m.samplingbias.ca = sum(P("carrier_bias_{}".format(i))*X("carrier={}".format(i)) for i in [2,3,4,5])

Just like the "normal" carrier-specific constants, we hold out one carrier (#1) as the reference point, and
only add bias constants for carriers 2 through 5.
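
Written out term by term, the comprehension above is just shorthand for the following
(an equivalent form shown only for clarity, assuming `P` and `X` have been imported
from larch.roles as in the earlier examples)::

    m.samplingbias.ca = (
          P("carrier_bias_2") * X("carrier=2")
        + P("carrier_bias_3") * X("carrier=3")
        + P("carrier_bias_4") * X("carrier=4")
        + P("carrier_bias_5") * X("carrier=5")
    )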

We can now estimate the likelihood-maximizing parameters:

.. doctest::
:options: +ELLIPSIS, +NORMALIZE_WHITESPACE

>>> result = m.maximize_loglike('SLSQP')

>>> print(m.report('txt', sigfigs=3))
============================================================================================...
Model Parameter Estimates
--------------------------------------------------------------------------------------------...
Parameter InitValue FinalValue StdError t-Stat NullValue
timeperiod=2 0.0 0.0901 0.00775 11.6 0.0
timeperiod=3 0.0 0.104 0.00779 13.4 0.0
timeperiod=4 0.0 0.0887 0.00853 10.4 0.0
timeperiod=5 0.0 0.127 0.00869 14.6 0.0
timeperiod=6 0.0 0.229 0.00878 26.1 0.0
timeperiod=7 0.0 0.269 0.00986 27.3 0.0
timeperiod=8 0.0 0.293 0.0103 28.3 0.0
timeperiod=9 0.0 -0.0223 0.011 -2.04 0.0
carrier=2 0.0 -0.82 0.113 -7.26 0.0
carrier=3 0.0 -2.04 0.153 -13.3 0.0
carrier=4 0.0 -1.7 0.231 -7.35 0.0
carrier=5 0.0 -6.84 8.48 -0.807 0.0
equipment=2 0.0 0.387 0.0086 45.0 0.0
fare_hy 0.0 -0.000948 2.52e-05 -37.6 0.0
fare_ly 0.0 -0.00104 7.02e-05 -14.8 0.0
elapsed_time 0.0 -0.00479 0.00011 -43.4 0.0
nb_cnxs 0.0 -2.45 0.0458 -53.5 0.0
mu_tod 1.0 0.822 0.0128 -13.9 1.0
mu_los 1.0 0.806 0.00911 -21.3 1.0
carrier_bias_2 0.0 1.13 0.138 8.23 0.0
carrier_bias_3 0.0 3.16 0.169 18.6 0.0
carrier_bias_4 0.0 2.64 0.279 9.44 0.0
carrier_bias_5 0.0 7.86 10.5 0.747 0.0
============================================================================================...
Model Estimation Statistics
--------------------------------------------------------------------------------------------...
Log Likelihood at Convergence -777321.75
Log Likelihood at Null Parameters -953940.44
--------------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters 0.185
============================================================================================...

.. tip::

If you want access to the model in this example without worrying about assembling all the code blocks
together on your own, you can load a ready-to-estimate copy like this::

m = larch.Model.Example(307)
