# Multinomial Logit & Unordered Data

R Package requirements:
* `zoo`
* `mlogit`
* `tidyverse`
* `broom`
* `mfx`
* `effects`

Reference: https://rdrr.io/rforge/mlogit/

`mlogit` provides a model description interface (enhanced formula-data), a very versatile estimation function and a testing infrastructure to deal with random utility models.

In [2]:
library(tidyverse)
library(broom)
library(mlogit)
library(zoo)

In [3]:
data("Fishing", package = "mlogit")
head(Fishing, 10)

| | mode | price.beach | price.pier | price.boat | price.charter | catch.beach | catch.pier | catch.boat | catch.charter | income |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | charter | 157.93 | 157.93 | 157.93 | 182.93 | 0.0678 | 0.0503 | 0.2601 | 0.5391 | 7083.332 |
| 2 | charter | 15.114 | 15.114 | 10.534 | 34.534 | 0.1049 | 0.0451 | 0.1574 | 0.4671 | 1250.0 |
| 3 | boat | 161.874 | 161.874 | 24.334 | 59.334 | 0.5333 | 0.4522 | 0.2413 | 1.0266 | 3750.0 |
| 4 | pier | 15.134 | 15.134 | 55.93 | 84.93 | 0.0678 | 0.0789 | 0.1643 | 0.5391 | 2083.333 |
| 5 | boat | 106.93 | 106.93 | 41.514 | 71.014 | 0.0678 | 0.0503 | 0.1082 | 0.324 | 4583.332 |
| 6 | charter | 192.474 | 192.474 | 28.934 | 63.934 | 0.5333 | 0.4522 | 0.1665 | 0.3975 | 4583.332 |
| 7 | beach | 51.934 | 51.934 | 191.93 | 220.93 | 0.0678 | 0.0789 | 0.1643 | 0.5391 | 8750.001 |
| 8 | charter | 15.134 | 15.134 | 21.714 | 56.714 | 0.0678 | 0.0789 | 0.0102 | 0.0209 | 2083.333 |
| 9 | boat | 34.914 | 34.914 | 34.914 | 53.414 | 0.2537 | 0.1498 | 0.0233 | 0.0219 | 3750.0 |
| 10 | boat | 28.314 | 28.314 | 28.314 | 46.814 | 0.2537 | 0.1498 | 0.0233 | 0.0219 | 2916.667 |


A sample of 1182 individuals in the United States, each choosing among 4 alternative fishing modes.

The data frame contains:
* `mode`: recreation mode choice, one of beach, pier, boat, or charter
* `price.beach`: price for beach mode
* `price.pier`: price for pier mode
* `price.boat`: price for private boat mode
* `price.charter`: price for charter boat mode
* `catch.beach`: catch rate for beach mode
* `catch.pier`: catch rate for pier mode
* `catch.boat`: catch rate for private boat mode
* `catch.charter`: catch rate for charter boat mode
* `income`: monthly income

There are 4 fishing modes:
* beach
* pier
* boat
* charter

There are 2 alternative-specific variables:
* price
* catch

and one individual-specific variable:
* income

This *wide* format is well suited to storing individual-specific variables. It is cumbersome for alternative-specific variables, however, because each such variable needs as many columns as there are alternatives.

Datasets can have two different formats, or shapes:
* a wide shape: one row for each choice situation,
* a long shape: one row for each alternative and hence as many rows as there are alternatives for each choice situation.
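
To make the two shapes concrete, here is a minimal sketch in base R that reshapes a tiny wide table into long form. The data frame `wide`, its values, and the column names `alt` and `choice` are invented for illustration and are not part of the `Fishing` data.

```r
# Toy wide data: one row per choice situation (illustrative values only).
wide <- data.frame(
  id         = 1:2,
  mode       = c("boat", "pier"),   # chosen alternative
  price.boat = c(10, 20),
  price.pier = c(15, 12),
  income     = c(3000, 4000),
  stringsAsFactors = FALSE
)

# Long shape: one row per alternative within each choice situation.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("price.boat", "price.pier"),
  v.names   = "price",
  timevar   = "alt",
  times     = c("boat", "pier"),
  idvar     = "id"
)

# The choice becomes a logical flag, and income is repeated on every row.
long$choice <- long$mode == long$alt
long <- long[order(long$id, long$alt), ]
long
```

Each choice situation now occupies two rows, one per alternative, and the single `price` column replaces the two `price.*` columns.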

`mlogit` deals with both formats. The `mlogit.data` function takes a `data.frame` as its first argument and returns a `data.frame` in *long* format, together with some information about the structure of the data.
For example:

In [4]:
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
head(Fish,10)


~~~~~~~
 first 10 observations out of 4728 
~~~~~~~
    mode   income     alt   price  catch chid    idx
1  FALSE 7083.332   beach 157.930 0.0678    1 1:each
2  FALSE 7083.332    boat 157.930 0.2601    1 1:boat
3   TRUE 7083.332 charter 182.930 0.5391    1 1:rter
4  FALSE 7083.332    pier 157.930 0.0503    1 1:pier
5  FALSE 1250.000   beach  15.114 0.1049    2 2:each
6  FALSE 1250.000    boat  10.534 0.1574    2 2:boat
7   TRUE 1250.000 charter  34.534 0.4671    2 2:rter
8  FALSE 1250.000    pier  15.114 0.0451    2 2:pier
9  FALSE 3750.000   beach 161.874 0.5333    3 3:each
10  TRUE 3750.000    boat  24.334 0.2413    3 3:boat

~~~ indexes ~~~~
   chid     alt
1     1   beach
2     1    boat
3     1 charter
4     1    pier
5     2   beach
6     2    boat
7     2 charter
8     2    pier
9     3   beach
10    3    boat
indexes:  1, 2 


In [5]:
# Rescale monthly income to thousands of dollars
Fish$inc1 <- Fish$income / 1000

The *choice* variable (`mode`) is now a logical variable, and the individual-specific variable, *income*, is repeated 4 times, once per alternative. An index attribute is added to the data, containing the two relevant indexes: `chid`, the choice-situation index, and `alt`, the alternative index.

In [6]:
head(index(Fish))

| | chid | alt |
|---|---|---|
| | `<int>` | `<fct>` |
| 1 | 1 | beach |
| 2 | 1 | boat |
| 3 | 1 | charter |
| 4 | 1 | pier |
| 5 | 2 | beach |
| 6 | 2 | boat |


## Multinomial Logit

In [7]:
mlogit.mnl1 <- mlogit(mode ~ 1 | inc1, data=Fish, reflevel="charter")
tidy(mlogit.mnl1)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| (Intercept):beach | -1.34129144 | 0.19451671 | -6.8955076 | 5.367262e-12 |
| (Intercept):boat | -0.60237067 | 0.13609637 | -4.4260597 | 9.596998e-06 |
| (Intercept):pier | -0.52714117 | 0.17778419 | -2.9650622 | 0.003026218 |
| inc1:beach | 0.03163988 | 0.0418463 | 0.7560974 | 0.4495908 |
| inc1:boat | 0.12354624 | 0.02791059 | 4.4265 | 9.577437e-06 |
| inc1:pier | -0.11176304 | 0.04397946 | -2.5412551 | 0.01104553 |


#### Multinomial logit with different base

In [8]:
mlogit.mnl2 <- mlogit(mode ~ 1 | inc1, data = Fish, reflevel="beach")
summary(mlogit.mnl2)


Call:
mlogit(formula = mode ~ 1 | inc1, data = Fish, reflevel = "beach", 
    method = "nr")

Frequencies of alternatives:choice
  beach    boat charter    pier 
0.11337 0.35364 0.38240 0.15059 

nr method
4 iterations, 0h:0m:0s 
g'(-H)^-1g = 8.32E-07 
gradient close to zero 

Coefficients :
                     Estimate Std. Error z-value  Pr(>|z|)    
(Intercept):boat     0.738921   0.196731  3.7560 0.0001727 ***
(Intercept):charter  1.341291   0.194517  6.8955 5.367e-12 ***
(Intercept):pier     0.814150   0.228632  3.5610 0.0003695 ***
inc1:boat            0.091906   0.040664  2.2602 0.0238116 *  
inc1:charter        -0.031640   0.041846 -0.7561 0.4495908    
inc1:pier           -0.143403   0.053288 -2.6911 0.0071223 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -1477.2
McFadden R^2:  0.013736 
Likelihood ratio test : chisq = 41.145 (p.value = 6.0931e-09)

* **MNL**: Coefficient interpretation is more difficult than for the conditional logit (CL) model. MNL coefficients are interpreted relative to the base category, so a positive coefficient does not necessarily imply an increase in the probability of that alternative.
* Relative to beach fishing, higher income reduces the likelihood of fishing from a pier or a charter boat and raises the likelihood of using a private boat. The charter coefficient is not statistically significant (p ≈ 0.45).
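
Because the coefficients are only defined relative to the base category, it can help to turn them into probabilities. The sketch below plugs the `mlogit.mnl2` estimates reported above into the MNL formula $p_j = e^{v_j} / \sum_k e^{v_k}$ in base R. The helper `mnl_prob` and the two income levels are hypothetical choices made for illustration.

```r
# Estimates reported above for mlogit.mnl2 (base category: beach, fixed at 0).
intercept <- c(beach = 0, boat = 0.738921, charter = 1.341291, pier = 0.814150)
slope     <- c(beach = 0, boat = 0.091906, charter = -0.031640, pier = -0.143403)

# MNL choice probabilities at a given income (thousands of dollars per month).
mnl_prob <- function(inc1) {
  v <- intercept + slope * inc1   # linear index for each alternative
  exp(v) / sum(exp(v))            # logit probabilities, summing to one
}

p_low  <- mnl_prob(2)   # hypothetical low income
p_high <- mnl_prob(6)   # hypothetical high income

# Compare the sign of each coefficient with the direction of the probability change.
round(rbind(p_low, p_high, change = p_high - p_low), 3)
```

The probability of the base category also moves with income even though its coefficient is fixed at zero, which is why signs alone are not enough to read off the direction of a probability change.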

### Mean values for the marginal effects

#### Setting mean values for variables to use for marginal effects 

In [9]:
m <- mlogit(mode ~ price+catch |inc1, data = Fish, reflevel="beach")
z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean), 
                             catch = tapply(catch, index(m)$alt, mean), inc1 = mean(inc1)))

**OR**, building the same `z` by extracting the alternative index with `idx()`:

In [10]:
m <- mlogit(mode ~ price | inc1 | catch, data = Fish)
z <- with(Fish, data.frame(price = tapply(price, idx(m, 2), mean),
                           catch = tapply(catch, idx(m, 2), mean),
                           inc1 = mean(inc1)))

#### Multinomial logit model marginal effects

In [11]:
round(effects(mlogit.mnl1, covariate = "inc1", data = z),3)

A \$1,000 increase in monthly income implies changes of 0.000, -0.021, 0.033, and -0.012 in the probabilities of fishing from beach, pier, private boat, and charter boat, respectively. This indicates little change in beach fishing, movement **out of** pier and charter boat fishing, and movement **to** private boat fishing.
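
For the MNL model, the marginal effect of an individual-specific regressor has the closed form $\partial p_j / \partial x = p_j(\beta_j - \sum_k p_k \beta_k)$. The base R sketch below applies it to the reported `mlogit.mnl2` estimates; fitted probabilities do not depend on the choice of base category, so the `charter`-base fit gives the same effects. The evaluation point `inc1_eval` is an assumed value, not taken from the data, so the output should be close to, but need not match exactly, the figures quoted above.

```r
# Estimates reported above for mlogit.mnl2 (base category: beach, fixed at 0).
b0 <- c(beach = 0, boat = 0.738921, charter = 1.341291, pier = 0.814150)
b1 <- c(beach = 0, boat = 0.091906, charter = -0.031640, pier = -0.143403)

inc1_eval <- 4.1   # ASSUMED evaluation point, in thousands of dollars per month

# Choice probabilities at the evaluation point.
v <- b0 + b1 * inc1_eval
p <- exp(v) / sum(exp(v))

# Marginal effect of inc1 on each probability: p_j * (b_j - probability-weighted mean of b).
me <- p * (b1 - sum(p * b1))
round(me, 3)
```

The effects sum to zero by construction, since the probabilities must always sum to one.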

The MNL model has a much lower log-likelihood and pseudo-R² than the mixed model estimated below.

#### Odds ratio

In [12]:
coef.mnl <- coef(mlogit.mnl2)
tibble(exp(coef.mnl[4:6]))

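| exp(coef.mnl[4:6]) |
|---|
| `<dbl>` |
| 1.0962622 |
| 0.9688554 |
| 0.8664049 |

The three values are the odds ratios for `inc1:boat`, `inc1:charter`, and `inc1:pier`, in that order. For example, a \$1,000 increase in monthly income multiplies the odds of choosing private boat rather than beach by about 1.10.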


## Conditional Logit

In [13]:
clogit.cl1 <- mlogit(mode ~ price+catch | 0,reflevel="beach", data = Fish)
summary(clogit.cl1)


Call:
mlogit(formula = mode ~ price + catch | 0, data = Fish, reflevel = "beach", 
    method = "nr")

Frequencies of alternatives:choice
  beach    boat charter    pier 
0.11337 0.35364 0.38240 0.15059 

nr method
6 iterations, 0h:0m:0s 
g'(-H)^-1g = 0.000179 
successive function values within tolerance limits 

Coefficients :
        Estimate Std. Error z-value  Pr(>|z|)    
price -0.0204765  0.0012231 -16.742 < 2.2e-16 ***
catch  0.9530982  0.0894134  10.659 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -1312

In the CL model, the signs of the coefficients are directly interpretable. Hence $\beta_P<0$ means that an increase in the price of one alternative decreases the probability of choosing that alternative and increases the probability of choosing each of the other alternatives.
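
This sign interpretation can be checked numerically. The sketch below uses the reported `clogit.cl1` coefficients and the prices and catch rates of the first respondent in the data shown earlier; the helper `cl_prob` and the \$10 price increase are hypothetical choices for illustration.

```r
# Coefficients reported above for clogit.cl1.
beta <- c(price = -0.0204765, catch = 0.9530982)

# Prices and catch rates faced by the first respondent in the Fishing data.
price <- c(beach = 157.93, boat = 157.93, charter = 182.93, pier = 157.93)
catch <- c(beach = 0.0678, boat = 0.2601, charter = 0.5391, pier = 0.0503)

# Conditional logit probabilities: the same coefficients apply to every alternative.
cl_prob <- function(price, catch) {
  v <- beta[["price"]] * price + beta[["catch"]] * catch
  exp(v) / sum(exp(v))
}

p0 <- cl_prob(price, catch)

# Raise the charter price only (hypothetical $10 increase).
price_up <- price
price_up["charter"] <- price_up["charter"] + 10
p1 <- cl_prob(price_up, catch)

round(rbind(before = p0, after = p1, change = p1 - p0), 4)
```

The charter probability falls and the other three rise. Their ratios to one another stay the same, which reflects the independence of irrelevant alternatives property of the logit form.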

#### Conditional logit model marginal effects
Average marginal response of the probability of choosing each alternative when a regressor changes for one of the alternatives and is unchanged for the other alternatives.

* The marginal effect when *Price* changes by \$100

In [14]:
effects(clogit.cl1, covariate = "price", data = z)*100

| | beach | boat | charter | pier |
|---|---|---|---|---|
| beach | -0.26407348 | 0.1188335 | 0.10129703 | 0.04394299 |
| boat | 0.11883347 | -0.4831872 | 0.25411697 | 0.1102368 |
| charter | 0.10129703 | 0.254117 | -0.44938298 | 0.09396898 |
| pier | 0.04394299 | 0.1102368 | 0.09396898 | -0.24814876 |


* The marginal effect when *Catch rate* changes by one unit

In [15]:
effects(clogit.cl1, covariate = "catch", data = z)

| | beach | boat | charter | pier |
|---|---|---|---|---|
| beach | 0.1229158 | -0.05531229 | -0.04714977 | -0.02045373 |
| boat | -0.05531217 | 0.22490411 | -0.11828117 | -0.05131077 |
| charter | -0.04714969 | -0.11828123 | 0.20916971 | -0.04373878 |
| pier | -0.02045373 | -0.05131089 | -0.04373886 | 0.11550348 |


The results in the tables above are consistent with the view that the greatest substitution is between private boat and charter boat fishing, which have the largest cross-effects. Evaluated at these mean values, the cross-effects between beach and pier fishing are the smallest in both tables.
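
The pattern of own- and cross-effects follows from the CL formula $\partial p_j / \partial x_k = \beta\, p_j(\mathbf{1}\{j=k\} - p_k)$. The base R sketch below builds the full matrix for the reported price coefficient. As a stand-in for the fitted probabilities it uses the observed choice frequencies printed in the model summary, which is an assumption made for illustration, so the numbers will not match the `effects()` output exactly. The helper `cl_effects` is hypothetical.

```r
# Stand-in probabilities: observed choice frequencies from the summary output (ASSUMPTION).
p <- c(beach = 0.11337, boat = 0.35364, charter = 0.38240, pier = 0.15059)

# Price coefficient reported above for clogit.cl1.
beta_price <- -0.0204765

# Matrix of marginal effects: entry [j, k] is d p_j / d x_k.
cl_effects <- function(p, beta) {
  me <- beta * (diag(unname(p)) - outer(p, p))
  dimnames(me) <- list(names(p), names(p))
  me
}

# Effect of a $100 price change, as in the table above.
me_price <- cl_effects(p, beta_price) * 100
round(me_price, 3)
```

Own-price effects on the diagonal are negative, cross-price effects are positive, and each cross-effect is proportional to the product $p_j p_k$, so the two most frequently chosen modes show the largest substitution.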

## Mixed Logit

In [16]:
clogit.mx1 <- mlogit(mode ~ price+catch |inc1, data = Fish,reflevel="beach")
summary(clogit.mx1)


Call:
mlogit(formula = mode ~ price + catch | inc1, data = Fish, reflevel = "beach", 
    method = "nr")

Frequencies of alternatives:choice
  beach    boat charter    pier 
0.11337 0.35364 0.38240 0.15059 

nr method
7 iterations, 0h:0m:0s 
g'(-H)^-1g = 1.37E-05 
successive function values within tolerance limits 

Coefficients :
                      Estimate Std. Error  z-value  Pr(>|z|)    
(Intercept):boat     0.5272788  0.2227927   2.3667 0.0179485 *  
(Intercept):charter  1.6943657  0.2240506   7.5624 3.952e-14 ***
(Intercept):pier     0.7779594  0.2204939   3.5283 0.0004183 ***
price               -0.0251166  0.0017317 -14.5042 < 2.2e-16 ***
catch                0.3577820  0.1097733   3.2593 0.0011170 ** 
inc1:boat            0.0894398  0.0500671   1.7864 0.0740345 .  
inc1:charter        -0.0332917  0.0503409  -0.6613 0.5084031    
inc1:pier           -0.1275772  0.0506395  -2.5193 0.0117582 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihoo

* The mixed model is richer than CL, but with CL one can predict the probability of selecting any additional alternative added to the choice set, because the parameters of CL do not vary across alternatives.
* Mixed: compared to CL and MNL, the coefficients do not change much, except for the *catch rate* coefficient, which falls from 0.953 in CL to 0.358. This is due to the inclusion of the alternative-specific intercepts rather than the inclusion of *income*.
* The mixed model is strongly preferred to the other models on the basis of its much higher log-likelihood or of formal statistical tests.
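
One common formal test is the likelihood-ratio test, which applies here because both CL and MNL are nested in the mixed specification. The base R sketch below shows the calculation. The helper `lr_test` and the log-likelihood values in the demonstration call are hypothetical; with the fitted models, the inputs would come from `logLik()` as indicated in the comments.

```r
# Likelihood-ratio test of a restricted model against a more general one.
# df is the number of restrictions (extra parameters in the general model).
lr_test <- function(ll_restricted, ll_full, df) {
  stat <- 2 * (ll_full - ll_restricted)
  c(statistic = stat,
    df        = df,
    p.value   = pchisq(stat, df = df, lower.tail = FALSE))
}

# Demonstration with HYPOTHETICAL log-likelihoods (not results from this document).
demo <- lr_test(ll_restricted = -110, ll_full = -100, df = 2)
demo

# With the fitted models above, the calls would look like this
# (CL has 2 parameters, MNL has 6, and the mixed model has 8):
# lr_test(as.numeric(logLik(clogit.cl1)),  as.numeric(logLik(clogit.mx1)), df = 6)
# lr_test(as.numeric(logLik(mlogit.mnl2)), as.numeric(logLik(clogit.mx1)), df = 2)
```

A small p-value rejects the restricted model in favour of the mixed model.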