# Choosing Nests

We are using this paper to figure out how to choose nests in a nested multinomial logit. First, define the relative choice probability of two items in a given nest:

$$
r_A(A',B')=\frac{p(A',A)}{p(B',A)}
$$

which is the likelihood that $A'$ is chosen divided by the likelihood that $B'$ is chosen when they are in the same nest. (These are empirical frequencies. The paper distinguishes between these and theoretical, mean frequencies, but we don't seem to need to do this.)

Note also what $p(A,B)$ is: the probability that $A$ (or some entity within this subset) is chosen, given that the choice has already been narrowed down to $B$.
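
As a quick illustration with made-up numbers (not taken from the data): suppose the choice set is $A=\{\text{car},\text{bus},\text{train}\}$ and the three items are chosen with frequencies 0.6, 0.3, and 0.1. Then

$$
r_A(\text{car},\text{bus})=\frac{p(\text{car},A)}{p(\text{bus},A)}=\frac{0.6}{0.3}=2
$$

so car is chosen twice as often as bus within $A$.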



The nest selection problem requires:

$$
\min_{\mathcal{Y} \in \mathcal{X}} D_1(\mathcal{Y}) + D_2(\mathcal{Y})
$$

where 

$$
D_1(\mathcal{Y}) = \frac{\sum_{Y\in\mathcal{Y}}\sum_{A,B \in \mathcal{A},\; a,b\in A \cap B \cap Y}\left(\log r_A(a,b) - \log r_B(a,b)\right)^2}{\sum_{Y\in\mathcal{Y}}\left|\{(A,B,a,b) \mid a,b \in A \cap B \cap Y\}\right|}
$$


and

$$
D_2(\mathcal{Y}) = \frac{\sum_{Y,Y'\in\mathcal{Y}}\sum_{A,B \in \mathcal{A}:\; A \cap Y = B \cap Y,\; A \cap Y' = B\cap Y'}\left(\log r_A(Y,Y') - \log r_B(Y,Y')\right)^2}{\sum_{Y,Y'\in\mathcal{Y}}\left|\{(A,B) \mid A\cap Y=B \cap Y,\; A \cap Y'=B\cap Y'\}\right|}
$$
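
Both terms are averages of squared log differences, so a toy calculation with hypothetical numbers (natural logs) may help fix ideas. For $D_1$, suppose a single nest $Y$ holds items $a$ and $b$, both available in choice sets $A$ and $B$, with $r_A(a,b)=2$ and $r_B(a,b)=3$. Counting that comparison once (whether the paper counts ordered or unordered tuples is not checked here):

$$
D_1 = \frac{(\log 2 - \log 3)^2}{1} \approx 0.164
$$

For $D_2$, suppose two nests $Y$ and $Y'$ whose contents are the same in $A$ and $B$, with $r_A(Y,Y')=1.5$ and $r_B(Y,Y')=1.2$:

$$
D_2 = \frac{(\log 1.5 - \log 1.2)^2}{1} \approx 0.050
$$

Each term is zero exactly when the relevant ratio does not change across choice sets, so a candidate nesting with a smaller $D_1 + D_2$ is closer to that invariance.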

Let's see if we can make the rubber hit the road with some data on transit choice. Be warned: the following import command takes a long time to run!

In [None]:
import delimited "C:\Users\mjbaker\OneDrive - CUNY\Documents\github\ShareFormNMNL\Data\nhgis0029_csv\nhgis0029_ts_nominal_county.csv", clear
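
Since the CSV import is slow, one option is to save a Stata-format copy once and load that in later sessions, which is usually much faster. This is only a sketch; the file name `nhgis_county_raw.dta` is a placeholder, not something that exists in the repository.

```stata
* Run once, right after the import above (file name is a placeholder)
save "nhgis_county_raw.dta", replace

* In later sessions, load the saved copy instead of re-importing the CSV
use "nhgis_county_raw.dta", clear
```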

Dropping some unneeded/incomplete variables:

In [4]:
drop b78aa125 b78aa125m b78aa195 b78aa195m b86aa125m b86aa195m b86ab125m b86ab195m 
drop b86ac125m b86ac195m b84aa125m b84aa195m b84ab125m b84ab195m b84ac125m b84ac195m
drop b84ad125m b84ad195m b84ae125m b84ae195m b84af125m b84af195m c53aa125m c53aa195m
drop c53ab125m c53ab195m
drop c53ac125m c53ac195m c53ad125m c53ad195m c53ae125m c53ae195m c53af125m c53af195m
drop c53ag125m c53ag195m c53ah125m c53ah195m c53ai125m c53ai195m c53aj125m c53aj195m
drop c53ak125m c53ak195m c53al125m c53al195m c53am125m c53am195m c53an125m c53an195m
drop c53ao125m c53ao195m c53ap125m c53ap195m c53aq125m c53aq195m c53ar125m c53ar195m 
drop c53as125m c53as195m c53at125m c53at195m cw0aa125m cw0aa195m
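
The long list above could probably be shortened with wildcards. The sketch below is an alternative to that cell, not something to run after it, and it assumes that every variable ending in `125m` or `195m` is one we want to drop; the `describe` line is there to check that assumption first.

```stata
* List the variables the wildcards would match before dropping anything
describe *125m *195m, simple

* Drop them in one go
drop *125m *195m

* The two columns without the "m" suffix still need to be named explicitly
drop b78aa125 b78aa195
```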

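Renaming the remaining variables so that each ends in a year (the suffix 125 becomes 2010 and 195 becomes 2020), which makes the data easier to reshape: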

In [5]:
drop b84aa1970 b84ab1970 b84ac1970 b84ad1970 b84ae1970 b84af1970

rename b86aa125 b86aa2010
rename b86ab125 b86ab2010

rename b86aa195 b86aa2020
rename b86ab195 b86ab2020

rename b86ac125 b86ac2010
rename c53ac125 c53ac2010 
rename c53ag125 c53ag2010 
rename c53ak125 c53ak2010 
rename c53ao125 c53ao2010 
rename c53as125 c53as2010 

rename b86ac195 b86ac2020
rename c53ac195 c53ac2020
rename c53ag195 c53ag2020
rename c53ak195 c53ak2020
rename c53ao195 c53ao2020
rename c53as195 c53as2020

rename b84aa125 b84aa2010
rename c53ad125 c53ad2010
rename c53ah125 c53ah2010
rename c53al125 c53al2010
rename c53ap125 c53ap2010 
rename c53at125 c53at2010 

rename b84aa195 b84aa2020
rename c53ad195 c53ad2020
rename c53ah195 c53ah2020
rename c53al195 c53al2020
rename c53ap195 c53ap2020 

rename b84ab125 b84ab2010
rename c53ae125 c53ae2010
rename c53ai125 c53ai2010
rename c53am125 c53am2010
rename c53aq125 c53aq2010 
rename cw0aa125 cw0aa2010

rename b84ab195 b84ab2020
rename c53ae195 c53ae2020
rename c53ai195 c53ai2020
rename c53am195 c53am2020
rename c53aq195 c53aq2020 
rename cw0aa195 cw0aa2020

rename b84ac125 b84ac2010
rename c53af125 c53af2010
rename c53aj125 c53aj2010
rename c53an125 c53an2010
rename c53ar125 c53ar2010 

rename b84ad125 b84ad2010
rename b84ae125 b84ae2010
rename b84af125 b84af2010

rename b84ad195 b84ad2020
rename b84ae195 b84ae2020
rename b84af195 b84af2020

rename c53aa125 c53aa2010
rename c53ab125 c53ab2010

rename c53aa195 c53aa2020
rename c53ab195 c53ab2020

rename b84ac195 b84ac2020
rename c53af195 c53af2020
rename c53aj195 c53aj2020
rename c53an195 c53an2020
rename c53ar195 c53ar2020 
rename c53at195 c53at2020
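
The individual renames above follow one pattern, so Stata's group `rename` syntax could likely replace them. This is a sketch under the assumption that the only variables still ending in `125` or `195` at this point are the ones renamed above; the `dryrun` option reports what would change without renaming anything. It is an alternative to the cell above, not a follow-up to it.

```stata
* Preview the changes without applying them
rename *125 *2010, dryrun
rename *195 *2020, dryrun

* Apply the renames once the preview looks right
rename *125 *2010
rename *195 *2020
```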

In [8]:
list county statecode geography year class share in 1/10


     +------------------------------------------------------------+
  1. |         county | statec~e |               geography | year |
     | Baldwin County |       AL | Baldwin County, Alabama | 2005 |
     |------------------------------------------------------------|
     |                                 class     |     share      |
     |                           Drive Alone     |      78.2      |
     +------------------------------------------------------------+

     +------------------------------------------------------------+
  2. |         county | statec~e |               geography | year |
     | Baldwin County |       AL | Baldwin County, Alabama | 2005 |
     |------------------------------------------------------------|
     |                                 class     |     share      |
     |               Carpool - One Passenger     |      12.6      |
     +------------------------------------------------------------+


One question: should the decision to participate in the labor force be part of this transit decision? If so, we would need to know how big the labor force is, if it is not already in our data...

In [9]:
bysort year state county: egen foo = total(share)

In [11]:
bysort year state county: gen last = _n == _N

In [None]:
hist foo if last

(bin=38, start=0, width=3.2289475)
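
The histogram bins start at 0 and span roughly 123 (38 bins of width about 3.23), so at least some county-year totals are far from 100. A rough way to put numbers on that, using the `foo` and `last` variables created above; the one-point tolerance is an arbitrary choice for illustration.

```stata
* Distribution of the share totals, one observation per county-year
summarize foo if last, detail

* Number of county-years whose shares miss 100 by more than one point
count if last & abs(foo - 100) > 1
```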
