# Multinomial Logit with different R packages and iris data

Several `R` packages fit a plain multinomial logit regression. Below, we fit one with the `mlogit` and `nnet` packages on the `iris` data set.

First, we load the iris data set and check the input and response variables:

In [6]:
data(iris)
head(iris)

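| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |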


`Species` is the response variable. It is a factor with three levels:

In [7]:
levels(iris$Species)

## 1. `mlogit` package:

In [8]:
library('mlogit')
miris<-mlogit.data(iris,shape="wide",choice="Species")
mfit<-mlogit(Species ~ 1 | Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data=miris,probit=F)
summary(mfit)



Call:
mlogit(formula = Species ~ 1 | Sepal.Length + Sepal.Width + Petal.Length + 
    Petal.Width, data = miris, probit = F, method = "nr", print.level = 0)

Frequencies of alternatives:
    setosa versicolor  virginica 
   0.33333    0.33333    0.33333 

nr method
20 iterations, 0h:0m:0s 
g'(-H)^-1g = 7.97E-07 
gradient close to zero 

Coefficients :
                          Estimate Std. Error t-value Pr(>|t|)
versicolor:(intercept)      7.7935 42638.2447  0.0002   0.9999
virginica:(intercept)     -34.8443 42638.2524 -0.0008   0.9993
versicolor:Sepal.Length    -7.7171 12654.9755 -0.0006   0.9995
virginica:Sepal.Length    -10.1823 12654.9757 -0.0008   0.9994
versicolor:Sepal.Width     -5.9009  5851.2567 -0.0010   0.9992
virginica:Sepal.Width     -12.5818  5851.2584 -0.0022   0.9983
versicolor:Petal.Length    14.6912  8962.8318  0.0016   0.9987
virginica:Petal.Length     24.1206  8962.8331  0.0027   0.9979
versicolor:Petal.Width     16.7926 14074.0631  0.0012   0.9990
virginica:Petal

The `predict` method works on `mlogit` objects, so we run an in-sample prediction to get the fitted probabilities:

In [9]:
pMatrix<-predict(mfit,miris)
head(pMatrix)

| setosa | versicolor | virginica |
|---|---|---|
| 1 | 5.171276e-13 | 7.986497e-40 |
| 1 | 4.626541e-11 | 3.302764e-36 |
| 1 | 1.531004e-11 | 1.831962e-37 |
| 1 | 1.128381e-09 | 2.221427e-34 |
| 1 | 6.201154e-13 | 6.282711e-40 |
| 1 | 1.136945e-11 | 3.798153e-37 |
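
Each row of this matrix holds the three class probabilities for one flower. In a multinomial logit they come from a softmax over one linear predictor per alternative, with the reference alternative (here `setosa`, the only species without coefficients in the summary) fixed at zero. The following is a minimal base-R sketch of that step only. The helper name `softmax` and the values in `eta` are hypothetical and are not taken from the fitted model.

```r
# Softmax: turn linear predictors into probabilities that sum to 1.
softmax <- function(eta) {
  z <- exp(eta - max(eta))  # subtracting the max avoids overflow in exp()
  z / sum(z)
}

# Hypothetical linear predictors for a single observation.
# The reference alternative (setosa) is fixed at 0.
eta <- c(setosa = 0, versicolor = -2.5, virginica = -8)

p <- softmax(eta)
print(round(p, 4))

# The predicted class is the alternative with the largest probability.
print(names(p)[which.max(p)])
```

The `which.max` call in the last line is the same rule the confusion-matrix code applies to every row of `pMatrix`.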


The confusion matrix is then obtained with the `confusionMatrix` function from the `caret` package:

In [10]:
library('caret')
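# Map each row's most probable column to its species name; fixing the levels
# keeps all three classes even if one of them is never predicted
fittedInSample<-factor(colnames(pMatrix)[apply(pMatrix, 1, which.max)],
                       levels=levels(iris$Species))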
realVals<-iris$Species
confusionMatrix(table(fittedInSample,realVals))

Loading required package: lattice
Loading required package: ggplot2


Confusion Matrix and Statistics

              realVals
fittedInSample setosa versicolor virginica
    setosa         50          0         0
    versicolor      0         49         1
    virginica       0          1        49

Overall Statistics
                                          
               Accuracy : 0.9867          
                 95% CI : (0.9527, 0.9984)
    No Information Rate : 0.3333          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.98            
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9800           0.9800
Specificity                 1.0000            0.9900           0.9900
Pos Pred Value              1.0000            0.9800           0.9800
Neg Pred Value              1.0000            0.9900           0.9900
Prevalence                  0.3333 
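
The headline figures in this output can be reproduced by hand from the counts in the table. Below is a minimal base-R sketch with the in-sample counts typed in manually. The object names (`cm`, `accuracy`, `sensitivity`, `specificity`) are illustrative, and this is not how `caret` computes them internally.

```r
species <- c("setosa", "versicolor", "virginica")

# In-sample confusion matrix: rows are fitted classes, columns are real classes.
cm <- matrix(c(50,  0,  0,
                0, 49,  1,
                0,  1, 49),
             nrow = 3, byrow = TRUE,
             dimnames = list(fitted = species, real = species))

# Accuracy: share of observations on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions / real members of the class.
sensitivity <- diag(cm) / colSums(cm)

# Specificity per class: true negatives / real non-members of the class.
true_neg <- sum(cm) - rowSums(cm) - colSums(cm) + diag(cm)
specificity <- true_neg / (sum(cm) - colSums(cm))

print(round(accuracy, 4))
print(round(rbind(sensitivity, specificity), 4))
```

The values agree with the `Accuracy`, `Sensitivity`, and `Specificity` entries reported above.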

## Train and test subsets for prediction

In [123]:
k<- floor(0.75 * nrow(iris))
#set.seed(100)
index_train<-sample(1:nrow(iris),k,replace=F)
miris_train<-mlogit.data(iris[index_train,],shape="wide",choice="Species")
miris_test<-mlogit.data(iris[-index_train,],shape="wide",choice="Species")

mfit<-mlogit(Species ~ 1 | Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data=miris_train)
pMatrix_test<-predict(mfit,miris_test)
predictTest<-factor(colnames(pMatrix_test)[apply(pMatrix_test, 1, which.max)],
                    levels=levels(iris$Species))
realVals<-iris$Species[-index_train]
confusionMatrix(table(predictTest,realVals))

Confusion Matrix and Statistics

            realVals
predictTest  setosa versicolor virginica
  setosa         14          0         0
  versicolor      0          9         0
  virginica       0          2        13

Overall Statistics
                                          
               Accuracy : 0.9474          
                 95% CI : (0.8225, 0.9936)
    No Information Rate : 0.3684          
    P-Value [Acc > NIR] : 7.078e-14       
                                          
                  Kappa : 0.9203          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.8182           1.0000
Specificity                 1.0000            1.0000           0.9200
Pos Pred Value              1.0000            1.0000           0.8667
Neg Pred Value              1.0000            0.9310           1.0000
Prevalence                  0.3684           
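
Because the `set.seed` call in the split above is commented out, every run draws a different training sample, so the test-set confusion matrix changes from run to run. Here is a minimal sketch of a reproducible 75/25 split. The seed value is an arbitrary assumption, and the names `train` and `test` are illustrative.

```r
data(iris)

set.seed(100)  # assumed seed; any fixed integer makes the split repeatable

n <- nrow(iris)
k <- floor(0.75 * n)

# Draw k row indices without replacement for the training set.
index_train <- sample(seq_len(n), k)

train <- iris[index_train, ]
test  <- iris[-index_train, ]

# Sizes of the two subsets and the class counts in the test set.
print(c(train = nrow(train), test = nrow(test)))
print(table(test$Species))
```

Checking `table(test$Species)` is worthwhile with only 38 test rows, since a plain random split does not guarantee equal class counts.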