## DISCRIM (QDA) - alcools dataset

In [1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")

### alcools dataset

In [2]:
#alcools dataset
from discrimintools.datasets import load_alcools
D = load_alcools("train")
print(D.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   TYPE    52 non-null     object 
 1   MEOH    52 non-null     float64
 2   ACET    52 non-null     float64
 3   BU1     52 non-null     float64
 4   BU2     52 non-null     float64
 5   ISOP    52 non-null     int64  
 6   MEPR    52 non-null     float64
 7   PRO1    52 non-null     float64
 8   ACAL    52 non-null     float64
dtypes: float64(7), int64(1), object(1)
memory usage: 3.8+ KB
None


In [3]:
#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])

### instantiation and training

In [4]:
from discrimintools import DISCRIM
clf = DISCRIM(method="quad") #the warning below can be disabled using warn_message
clf.fit(X,y)


Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.


Parameter     Value
method        'quad'
priors        'prop'
classes
var_select    False
level
tol
warn_message  True
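
The message above says that the per-class (within) covariance matrices are used in the discriminant function, which is what makes the rule quadratic. As a rough illustration of that idea, and not a copy of the discrimintools implementation, the sketch below scores each observation with the usual quadratic discriminant function, using priors proportional to class frequencies as in `priors='prop'`. The helper names `fit_qda` and `qda_scores` and the synthetic data are assumptions made for this example only.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate per-class mean, covariance, and proportional prior (hypothetical helper)."""
    params = {}
    for label in np.unique(y):
        Xk = X[y == label]
        params[label] = (
            Xk.mean(axis=0),
            np.cov(Xk, rowvar=False),  # unbiased within-class covariance
            len(Xk) / len(X),          # prior proportional to class frequency
        )
    return params

def qda_scores(X, params):
    """Quadratic discriminant score of every row of X for every class (hypothetical helper)."""
    scores = []
    for mean, cov, prior in params.values():
        diff = X - mean
        # squared Mahalanobis distance to the class mean
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(prior))
    return np.column_stack(scores)

# synthetic two-class data, purely illustrative (not the alcools data)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 2, (20, 2))])
y_demo = np.array(["a"] * 30 + ["b"] * 20)

params = fit_qda(X_demo, y_demo)
labels = np.array(list(params))
scores = qda_scores(X_demo, params)
pred = labels[scores.argmax(axis=1)]  # assign each row to its highest-scoring class
print("training accuracy:", (pred == y_demo).mean())
```

Each class gets its own covariance term in the score, so the decision boundaries are quadratic in the features rather than linear.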


#### Evaluation on training data

In [5]:
#eval_predict function
eval_train = clf.eval_predict(X,y,verbose=True)

Observation Profile:
                        Read  Used
Number of Observations    52    52

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE                                   
KIRSCH          17      0      0     17
MIRAB            0     15      0     15
POIRE            0      0     20     20
Total           17     15     20     52

Percent Classified into TYPE:
prediction      KIRSCH       MIRAB       POIRE  Total
TYPE                                                 
KIRSCH      100.000000    0.000000    0.000000  100.0
MIRAB         0.000000  100.000000    0.000000  100.0
POIRE         0.000000    0.000000  100.000000  100.0
Total        32.692308   28.846154   38.461538  100.0
Priors        0.326923    0.288462    0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE  Total
Rate    0.000000  0.000000  0.000000    0.0
Priors  0.326923  0.288462  0.384615    NaN

Classification Report for TYPE:
              pr

In [6]:
#score function
print("Accuracy : {}%".format(100*round(clf.score(X,y),2)))

Accuracy : 100.0%


In [7]:
#error rate
print("Error rate : {}%".format(100-100*round(clf.score(X,y),2)))

Error rate : 0.0%


### summary

In [8]:
from discrimintools import summaryDISCRIM
summaryDISCRIM(clf,detailed=True)

                     Discriminant Analysis - Results                     

Summary Information:
               Infos  Value                  DF  DF value
0  Total Sample Size     52            DF Total        51
1          Variables      8   DF Within Classes        49
2            Classes      3  DF Between Classes         2

Class Level Information:
        Frequency  Proportion  Prior Probability
KIRSCH         17      0.3269             0.3269
MIRAB          15      0.2885             0.2885
POIRE          20      0.3846             0.3846

Within Covariance Matrix Information:
        Rank  Natural Log of the Determinant
Pooled     8                         58.3267
KIRSCH     8                         49.0021
MIRAB      8                         48.9038
POIRE      8                         54.6744

Test of Homogeneity of Within Covariance Matrices:
         Bartlett Value  Num DF  Den DF  F value  Pr>F  Chi Sq. Value  Pr>Chi2
Box's M        350.5115      72    6010    3.679   0.0 
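
The reported Bartlett value can be cross-checked by hand from the class frequencies and log-determinants printed above. The sketch below assumes the value is the uncorrected Box's M statistic, (N - k) ln|S_pooled| - sum((n_i - 1) ln|S_i|), with numerator degrees of freedom (k - 1) p (p + 1) / 2. This is an assumption about what the library reports, not something confirmed by its documentation.

```python
# values copied from the summary tables above
n = {"KIRSCH": 17, "MIRAB": 15, "POIRE": 20}           # class frequencies
logdet = {"KIRSCH": 49.0021, "MIRAB": 48.9038, "POIRE": 54.6744}
logdet_pooled = 58.3267
p = 8  # number of variables

k = len(n)            # number of classes
N = sum(n.values())   # total sample size

# uncorrected Box's M statistic (assumed formula)
box_m = (N - k) * logdet_pooled - sum((n[c] - 1) * logdet[c] for c in n)

# numerator degrees of freedom of the homogeneity test
num_df = (k - 1) * p * (p + 1) // 2

print(round(box_m, 2), num_df)
```

With the rounded log-determinants this gives roughly 350.51 on 72 degrees of freedom, in line with the Bartlett value of 350.5115 and Num DF of 72 shown in the table.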

### Evaluation of prediction on testing dataset

#### Testing data

In [9]:
#testing data
DTest = load_alcools("test")
DTest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   TYPE    50 non-null     object 
 1   MEOH    50 non-null     int64  
 2   ACET    50 non-null     int64  
 3   BU1     50 non-null     float64
 4   BU2     50 non-null     float64
 5   ISOP    50 non-null     int64  
 6   MEPR    50 non-null     int64  
 7   PRO1    50 non-null     int64  
 8   ACAL    50 non-null     float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB


In [10]:
#split into X and y, then evaluate on the testing data
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])
eval_test = clf.eval_predict(XTest,yTest,verbose=True)

Observation Profile:
                        Read  Used
Number of Observations    50    50

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE                                   
KIRSCH          14      0      0     14
MIRAB            0     12      5     17
POIRE            0      2     17     19
Total           14     14     22     50

Percent Classified into TYPE:
prediction      KIRSCH      MIRAB      POIRE  Total
TYPE                                               
KIRSCH      100.000000   0.000000   0.000000  100.0
MIRAB         0.000000  70.588235  29.411765  100.0
POIRE         0.000000  10.526316  89.473684  100.0
Total        28.000000  28.000000  44.000000  100.0
Priors        0.326923   0.288462   0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE     Total
Rate    0.000000  0.294118  0.105263  0.125327
Priors  0.326923  0.288462  0.384615       NaN

Classification Report for TYPE:
              precisi
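
The Total error rate of 0.125327 in the Error Count Estimates table is not the raw share of misclassified test observations. It appears to be the per-class error rates weighted by the priors, which were estimated on the training set. The short check below reproduces both figures from the confusion matrix above. The variable names are illustrative only.

```python
# per-class test error rates and training priors, taken from the tables above
rate = {"KIRSCH": 0 / 14, "MIRAB": 5 / 17, "POIRE": 2 / 19}
prior = {"KIRSCH": 17 / 52, "MIRAB": 15 / 52, "POIRE": 20 / 52}

# prior-weighted error: sum over classes of error rate times prior
weighted_error = sum(rate[c] * prior[c] for c in rate)

# raw error: misclassified observations divided by test sample size
raw_error = (0 + 5 + 2) / 50

print(round(weighted_error, 6), raw_error)
```

The weighted figure matches the reported 0.125327, while the raw misclassification rate on the 50 test observations is 0.14, so `1 - clf.score(XTest, yTest)` would not necessarily equal the Total shown in the table.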