## DISCRIM (QDA) - alcools dataset

In [1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")
from great_tables import GT, html
def print_dt(data,title=None,subtitle=None,rowname=None,digits=8):
    dt = (GT(data=data.round(digits).rename_axis(rowname).reset_index())
          .tab_header(title=title, subtitle=subtitle))
    return dt

### alcools dataset

In [2]:
#alcools dataset
from discrimintools.datasets import load_alcools
D = load_alcools("train")
(
    GT(D.head().rename_axis("").reset_index())
    .tab_header(title=html("<b>Alcools dataset - training data</b>"))
)

Alcools dataset - training data
,TYPE,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
0,KIRSCH,336.0,225.0,1.0,1.0,92,37.0,177.0,0.0
1,KIRSCH,442.0,338.0,1.9,10.0,91,30.0,552.0,31.0
2,KIRSCH,373.0,356.0,0.0,29.0,83,27.0,814.0,11.0
3,KIRSCH,418.0,62.0,0.8,0.0,89,24.0,342.0,7.0
4,KIRSCH,84.0,65.0,2.0,2.0,2,0.0,288.0,6.0


In [3]:
#split into X and y
y, X = D["TYPE"], D.drop(columns=["TYPE"])

### Instantiation & training

In [4]:
from discrimintools import DISCRIM
clf = DISCRIM(method="quad")

#### `fit` function

In [5]:
#fit function
clf.fit(X,y)


Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.


0,1,2
,method,'quad'
,priors,'prop'
,classes,
,var_select,False
,level,
,tol,
,warn_message,True


#### `decision_function` function

In [6]:
#decision_function function
print_dt(clf.decision_function(X).head(),rowname="Observation",title=html("<b>Decision function</b>"))

Decision function
Observation,KIRSCH,MIRAB,POIRE
0,-28.27006853,-53.34622518,-71.25568463
1,-29.70584998,-84.81645249,-131.43444884
2,-29.14099401,-165.34776238,-130.3679083
3,-28.94024612,-88.44257998,-64.20114679
4,-31.00049686,-114.67914831,-74.15977165


#### `eval_predict` function

In [7]:
#eval_predict function
eval_train = clf.eval_predict(X,y,verbose=True)

Observation Profile:
                        Read  Used
Number of Observations    52    52

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE                                   
KIRSCH          17      0      0     17
MIRAB            0     15      0     15
POIRE            0      0     20     20
Total           17     15     20     52

Percent Classified into TYPE:
prediction      KIRSCH       MIRAB       POIRE  Total
TYPE                                                 
KIRSCH      100.000000    0.000000    0.000000  100.0
MIRAB         0.000000  100.000000    0.000000  100.0
POIRE         0.000000    0.000000  100.000000  100.0
Total        32.692308   28.846154   38.461538  100.0
Priors        0.326923    0.288462    0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE  Total
Rate    0.000000  0.000000  0.000000    0.0
Priors  0.326923  0.288462  0.384615    NaN

Classification Report for TYPE:
              pr

#### `pred_table` function

In [8]:
#pred_table function
print_dt(clf.pred_table(X,y),rowname="Reference",title=html("<b>Confusion matrix</b>"))

Confusion matrix
Reference,KIRSCH,MIRAB,POIRE
KIRSCH,17,0,0
MIRAB,0,15,0
POIRE,0,0,20
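
The accuracy and per-class error rates reported by `eval_predict` and `score` can be recomputed from the confusion matrix alone. The following is a minimal sketch in plain Python, with the counts copied from the table above; the variable names are illustrative and not part of `discrimintools`.

```python
# Confusion matrix copied from the table above (rows = reference, columns = prediction).
labels = ["KIRSCH", "MIRAB", "POIRE"]
confusion = [
    [17, 0, 0],
    [0, 15, 0],
    [0, 0, 20],
]

n_total = sum(sum(row) for row in confusion)
n_correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = n_correct / n_total

# Per-class error rate: share of each reference class that was misclassified.
error_rate = {
    label: 1 - confusion[i][i] / sum(confusion[i])
    for i, label in enumerate(labels)
}

print(f"Accuracy: {accuracy:.2%}")
print(error_rate)
```

These are resubstitution estimates (the model is evaluated on the data it was fitted on), so they are likely optimistic.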


#### `predict` function

In [9]:
#predict function
print_dt(clf.predict(X).to_frame().head(),rowname="Observation",title=html("<b>Prediction</b>"))

Prediction
Observation,prediction
0,KIRSCH
1,KIRSCH
2,KIRSCH
3,KIRSCH
4,KIRSCH


#### `predict_proba` function

In [10]:
#predict_proba function
print_dt(clf.predict_proba(X).head(),rowname="Observation",title=html("<b>Predicted probabilities</b>"))

Predicted probabilities
Observation,KIRSCH,MIRAB,POIRE
0,1.0,0.0,0.0
1,1.0,0.0,0.0
2,1.0,0.0,0.0
3,1.0,0.0,0.0
4,1.0,0.0,0.0
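
The probabilities of exactly 1.0 are plausible given how far apart the decision scores are. The sketch below assumes the decision scores are log-scale discriminant scores, so that posteriors follow from a softmax over the classes. This is an assumption: the notebook does not state how `predict_proba` is derived from `decision_function`. The scores are those of the first observation in the decision function table.

```python
import math

# Decision scores of the first observation, copied from the decision_function output.
scores = {"KIRSCH": -28.27006853, "MIRAB": -53.34622518, "POIRE": -71.25568463}

# Numerically stable softmax: subtract the largest score before exponentiating.
# Assumption: scores are on a log-posterior scale (not confirmed by the source).
top = max(scores.values())
weights = {k: math.exp(v - top) for k, v in scores.items()}
total = sum(weights.values())
posterior = {k: w / total for k, w in weights.items()}

predicted = max(posterior, key=posterior.get)
print(predicted, {k: round(p, 8) for k, p in posterior.items()})
```

With a gap of about 25 between the two best scores, the runner-up class receives a weight on the order of `exp(-25)`, which vanishes at 8 decimals.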


#### `score` function

In [11]:
#score function
print("Accuracy : {}%".format(100*round(clf.score(X,y),2)))

Accuracy : 100.0%


In [12]:
#error rate
print("Error rate : {}%".format(100-100*round(clf.score(X,y),2)))

Error rate : 0.0%


### Discriminant information

#### Summary Information

In [13]:
#summary information
print_dt(clf.summary_.infos,rowname="",title=html("<b>Summary Information</b>"))

Summary Information
,Infos,Value,DF,DF value
0,Total Sample Size,52,DF Total,51
1,Variables,8,DF Within Classes,49
2,Classes,3,DF Between Classes,2


#### Class Level Information

In [14]:
#class level information
print_dt(clf.classes_.infos,rowname="",title=html("<b>Class Level Information</b>"))

Class Level Information
,Frequency,Proportion,Prior Probability
KIRSCH,17,0.32692308,0.32692308
MIRAB,15,0.28846154,0.28846154
POIRE,20,0.38461538,0.38461538
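
The degrees of freedom in the summary table and the prior probabilities above both follow from the class frequencies. A short sketch, assuming that `priors='prop'` means each prior equals the class proportion (which is what the table suggests):

```python
# Class frequencies copied from the Class Level Information table.
frequency = {"KIRSCH": 17, "MIRAB": 15, "POIRE": 20}

n_obs = sum(frequency.values())   # total sample size
n_classes = len(frequency)        # number of classes

df_total = n_obs - 1
df_within = n_obs - n_classes
df_between = n_classes - 1

# Assumption: with priors='prop', each prior equals the class proportion.
priors = {k: v / n_obs for k, v in frequency.items()}

print(df_total, df_within, df_between)
print({k: round(p, 8) for k, p in priors.items()})
```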


### Sums of Squares and Cross Products (SSCP)

#### Within-Class SSCP Matrices

In [15]:
#Within-Class SSCP Matrices
print("\nWithin-Class SSCP Matrices:")
for k in clf.sscp_.within.keys():
    print("\n{} = {}".format(clf.call_.target,k))
    print(clf.sscp_.within[k].round(3))


Within-Class SSCP Matrices:

TYPE = KIRSCH
             MEOH        ACET      BU1         BU2       ISOP       MEPR  \
MEOH   548154.471  180731.747  1016.90   32716.447  90831.235  28085.018   
ACET   180731.747  228263.485   283.60   12547.925  43663.424  13676.592   
BU1      1016.900     283.600     9.86      57.670     65.300     46.700   
BU2     32716.447   12547.925    57.67    7385.345   3062.024   1758.752   
ISOP    90831.235   43663.424    65.30    3062.024  24698.118   6784.459   
MEPR    28085.018   13676.592    46.70    1758.752   6784.459   2413.449   
PRO1  1077685.288  504124.699  2824.60  232310.779  85339.494  48316.307   
ACAL    19177.815    5561.486    49.98    1236.646   2278.182    642.581   

             PRO1       ACAL  
MEOH  1077685.288  19177.815  
ACET   504124.699   5561.486  
BU1      2824.600     49.980  
BU2    232310.779   1236.646  
ISOP    85339.494   2278.182  
MEPR    48316.307    642.581  
PRO1  9014430.915  42553.976  
ACAL    42553.976   150

#### Pooled Within-Class SSCP Matrix

In [16]:
#Pooled Within-Class SSCP Matrix
print_dt(clf.sscp_.pooled,rowname="",title=html("<b>Pooled Within-Class SSCP Matrix</b>"))

Pooled Within-Class SSCP Matrix
,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
MEOH,1970961.42058824,375035.79705882,14676.79,-123611.11294118,171239.08529412,65681.81764706,388261.68823529,40850.38970588
ACET,375035.79705882,739580.16803922,-5393.25,7287.75803922,14377.24019608,10940.19176471,688127.43215686,22671.35813725
BU1,14676.79,-5393.25,1774.842,1099.462,3441.57,1075.1,13354.16,164.313
BU2,-123611.11294118,7287.75803922,1099.462,138470.43003922,-21121.28980392,-4480.04823529,1181710.07215686,-1124.91886275
ISOP,171239.08529412,14377.24019608,3441.57,-21121.28980392,105664.00098039,30177.85882353,-50666.63921569,738.24068627
MEPR,65681.81764706,10940.19176471,1075.1,-4480.04823529,30177.85882353,12039.04941176,14955.70705882,117.86117647
PRO1,388261.68823529,688127.43215686,13354.16,1181710.07215686,-50666.63921569,14955.70705882,16706890.64862745,39141.60254902
ACAL,40850.38970588,22671.35813725,164.313,-1124.91886275,738.24068627,117.86117647,39141.60254902,3241.14448039


#### Between-Class SSCP Matrix

In [17]:
#Between-Class SSCP Matrix
print_dt(clf.sscp_.between,rowname="",title=html("<b>Between-Class SSCP Matrix</b>"))

Between-Class SSCP Matrix
,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
MEOH,5002712.70921946,-37539.01917421,147192.45230769,142632.12063348,219172.64547511,117589.18812217,-3742723.55073529,14368.79683258
ACET,-37539.01917421,21418.55869155,322.77538462,-14758.85342383,-10560.60173454,-8292.94830317,-73700.12965686,-850.32621418
BU1,147192.45230769,322.77538462,4427.15030769,3272.27569231,5846.56076923,2959.37076923,-116993.39,372.62853846
BU2,142632.12063348,-14758.85342383,3272.27569231,12931.52226848,12022.95903469,8151.79746606,-40791.33215686,890.52732428
ISOP,219172.64547511,-10560.60173454,5846.56076923,12022.95903469,13363.07594268,8277.61809955,-121036.66078431,942.71315988
MEPR,117589.18812217,-8292.94830317,2959.37076923,8151.79746606,8277.61809955,5362.09751131,-52287.37205882,598.06266968
PRO1,-3742723.55073529,-73700.12965686,-116993.39,-40791.33215686,-121036.66078431,-52287.37205882,3290219.66887255,-7174.31754902
ACAL,14368.79683258,-850.32621418,372.62853846,890.52732428,942.71315988,598.06266968,-7174.31754902,67.35321192


#### Total-Sample SSCP Matrix

In [18]:
#Total-Sample SSCP Matrix
print_dt(clf.sscp_.total,rowname="",title=html("<b>Total-Sample SSCP Matrix</b>"))

Total-Sample SSCP Matrix
,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
MEOH,6973674.12980769,337496.77788462,161869.24230769,19021.00769231,390411.73076923,183271.00576923,-3354461.8625,55219.18653846
ACET,337496.77788462,760998.72673077,-5070.47461538,-7471.09538462,3816.63846154,2647.24346154,614427.3025,21821.03192308
BU1,161869.24230769,-5070.47461538,6201.99230769,4371.73769231,9288.13076923,4034.47076923,-103639.23,536.94153846
BU2,19021.00769231,-7471.09538462,4371.73769231,151401.95230769,-9098.33076923,3671.74923077,1140918.74,-234.39153846
ISOP,390411.73076923,3816.63846154,9288.13076923,-9098.33076923,119027.07692308,38455.47692308,-171703.3,1680.95384615
MEPR,183271.00576923,2647.24346154,4034.47076923,3671.74923077,38455.47692308,17401.14692308,-37331.665,715.92384615
PRO1,-3354461.8625,614427.3025,-103639.23,1140918.74,-171703.3,-37331.665,19997110.3175,31967.285
ACAL,55219.18653846,21821.03192308,536.94153846,-234.39153846,1680.95384615,715.92384615,31967.285,3308.49769231
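
The three SSCP matrices are tied together by the usual decomposition: total = pooled within + between. The sketch below checks this on the 2x2 block for `MEOH` and `ACET`, with the values copied from the tables above; plain Python lists are used so that nothing beyond the standard library is needed.

```python
# 2x2 blocks (MEOH, ACET) copied from the three SSCP tables above.
within = [[1970961.42058824, 375035.79705882],
          [375035.79705882, 739580.16803922]]
between = [[5002712.70921946, -37539.01917421],
           [-37539.01917421, 21418.55869155]]
total = [[6973674.12980769, 337496.77788462],
         [337496.77788462, 760998.72673077]]

# Largest absolute gap between W + B and T over the block.
max_gap = max(
    abs(within[i][j] + between[i][j] - total[i][j])
    for i in range(2) for j in range(2)
)
print(f"max |W + B - T| = {max_gap:.2e}")
```

The remaining gap comes only from the 8-decimal rounding of the printed values.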


### Covariance matrices

#### Within-Class Covariance Matrices

In [19]:
#Within-Class Covariance Matrices
print("\nWithin-Class Covariance Matrices:")
for k in clf.cov_.within.keys():
    print("\n{} = {}, DF = {}".format(clf.call_.target,k,clf.classes_.infos.loc[k,"Frequency"]-1))
    print(clf.cov_.within[k].round(3))


Within-Class Covariance Matrices:

TYPE = KIRSCH, DF = 16
           MEOH       ACET      BU1        BU2      ISOP      MEPR  \
MEOH  34259.654  11295.734   63.556   2044.778  5676.952  1755.314   
ACET  11295.734  14266.468   17.725    784.245  2728.964   854.787   
BU1      63.556     17.725    0.616      3.604     4.081     2.919   
BU2    2044.778    784.245    3.604    461.584   191.376   109.922   
ISOP   5676.952   2728.964    4.081    191.376  1543.632   424.029   
MEPR   1755.314    854.787    2.919    109.922   424.029   150.841   
PRO1  67355.331  31507.794  176.538  14519.424  5333.718  3019.769   
ACAL   1198.613    347.593    3.124     77.290   142.386    40.161   

            PRO1      ACAL  
MEOH   67355.331  1198.613  
ACET   31507.794   347.593  
BU1      176.538     3.124  
BU2    14519.424    77.290  
ISOP    5333.718   142.386  
MEPR    3019.769    40.161  
PRO1  563401.932  2659.623  
ACAL    2659.623    94.126  

TYPE = MIRAB, DF = 14
           MEOH       ACET

#### Pooled Within-Class Covariance Matrix

In [20]:
#Pooled Within-Class Covariance Matrix
print_dt(clf.cov_.pooled,rowname="Variables",title=html("<b>Pooled Within-Class Covariance Matrix, DF = {}</b>".format(clf.summary_.infos.iloc[1,3])))

Pooled Within-Class Covariance Matrix, DF = 49
Variables,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
MEOH,40223.70246098,7653.79177671,299.52632653,-2522.67577431,3494.67521008,1340.4452581,7923.70792317,833.68142257
ACET,7653.79177671,15093.47281713,-110.06632653,148.7297559,293.41306523,223.26921969,14043.41698279,462.68077831
BU1,299.52632653,-110.06632653,36.22126531,22.438,70.23612245,21.94081633,272.53387755,3.35332653
BU2,-2522.67577431,148.7297559,22.438,2825.92714366,-431.04673069,-91.42955582,24116.53208483,-22.95752781
ISOP,3494.67521008,293.41306523,70.23612245,-431.04673069,2156.40818327,615.87466987,-1034.01304522,15.06613645
MEPR,1340.4452581,223.26921969,21.94081633,-91.42955582,615.87466987,245.69488595,305.2185114,2.40533013
PRO1,7923.70792317,14043.41698279,272.53387755,24116.53208483,-1034.01304522,305.2185114,340956.95201281,798.80821529
ACAL,833.68142257,462.68077831,3.35332653,-22.95752781,15.06613645,2.40533013,798.80821529,66.14580572


#### Between-Class Covariance Matrix

In [21]:
#Between-Class Covariance Matrix
print_dt(clf.cov_.between,rowname="Variables",title=html("<b>Between-Class Covariance Matrix, DF = {}</b>".format(clf.summary_.infos.iloc[2,3])))

Between-Class Covariance Matrix, DF = 2
Variables,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
MEOH,144309.02045825,-1082.85632233,4245.93612426,4114.3880952,6322.28785024,3391.99581122,-107963.17934813,414.48452402
ACET,-1082.85632233,617.84303918,9.3108284,-425.73615646,-304.63274234,-239.21966259,-2125.96527856,-24.52864079
BU1,4245.93612426,9.3108284,127.70625888,94.39256805,168.65079142,85.3664645,-3374.80932692,10.74890015
BU2,4114.3880952,-425.73615646,94.39256805,373.02468082,346.816126,235.14800383,-1176.67304299,25.6882882
ISOP,6322.28785024,-304.63274234,168.65079142,346.816126,385.4733445,238.77744518,-3491.44213801,27.19364884
MEPR,3391.99581122,-239.21966259,85.3664645,235.14800383,238.77744518,154.67588975,-1508.28957862,17.25180778
PRO1,-107963.17934813,-2125.96527856,-3374.80932692,-1176.67304299,-3491.44213801,-1508.28957862,94910.18275594,-206.95146776
ACAL,414.48452402,-24.52864079,10.74890015,25.6882882,27.19364884,17.25180778,-206.95146776,1.94288111


#### Total-Sample Covariance Matrix

In [22]:
#Total-Sample Covariance Matrix
print_dt(clf.cov_.total,rowname="Variables",title=html("<b>Total-Sample Covariance Matrix, DF = {}</b>".format(clf.summary_.infos.iloc[0,3])))

Total-Sample Covariance Matrix, DF = 51
Variables,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
MEOH,136738.7084276,6617.58388009,3173.90671192,372.96093514,7655.13197587,3593.54913273,-65773.7620098,1082.72914781
ACET,6617.58388009,14921.54366139,-99.42107089,-146.49206637,74.83604827,51.90673454,12047.59416667,427.86337104
BU1,3173.90671192,-99.42107089,121.60769231,85.72034691,182.12021116,79.10726998,-2032.14176471,10.52826546
BU2,372.96093514,-146.49206637,85.72034691,2968.66573152,-178.39864253,71.99508296,22370.95568627,-4.59591252
ISOP,7655.13197587,74.83604827,182.12021116,-178.39864253,2333.86425339,754.02895928,-3366.73137255,32.95987934
MEPR,3593.54913273,51.90673454,79.10726998,71.99508296,754.02895928,341.19895928,-731.99343137,14.03772247
PRO1,-65773.7620098,12047.59416667,-2032.14176471,22370.95568627,-3366.73137255,-731.99343137,392100.20230392,626.8095098
ACAL,1082.72914781,427.86337104,10.52826546,-4.59591252,32.95987934,14.03772247,626.8095098,64.87250377
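
Each covariance matrix is the corresponding SSCP matrix divided by a scalar. For the pooled and total matrices the divisor is the DF shown in the title. For the between-class matrix, the printed values are not SSCP / 2; they appear consistent with a divisor of n(K - 1)/K, as in SAS PROC DISCRIM. That divisor is inferred from the numbers in this notebook, not from the library source. The sketch uses the `MEOH` diagonal entries.

```python
# MEOH diagonal entries copied from the SSCP tables.
sscp = {
    "pooled": 1970961.42058824,
    "between": 5002712.70921946,
    "total": 6973674.12980769,
}
n_obs, n_classes = 52, 3

divisor = {
    "pooled": n_obs - n_classes,   # DF within classes = 49
    "total": n_obs - 1,            # DF total = 51
    # Assumption inferred from the printed values, not from the library source.
    "between": n_obs * (n_classes - 1) / n_classes,
}

variance = {name: sscp[name] / divisor[name] for name in sscp}
for name, value in variance.items():
    print(f"{name:8s} {value:.8f}")
```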


### Correlation coefficients

#### Within-Class Correlation Coefficients

In [23]:
#Within-Class Correlation Coefficients
print("\nWithin-Class Correlation Coefficients/Pr>|r|")
for k in clf.corr_.within.keys():
    print("\n{} = {}".format(clf.call_.target,k))
    print(clf.corr_.within[k].round(3))


Within-Class Correlation Coefficients/Pr>|r|

TYPE = KIRSCH
   Variable1 Variable2      R  t value  DF  Pr>|t|       Conclusion
0       MEOH      ACET  0.511    2.302  15   0.036      Significant
1       MEOH       BU1  0.437    1.884  15   0.079  Non-significant
2       MEOH       BU2  0.514    2.322  15   0.035      Significant
3       MEOH      ISOP  0.781    4.838  15   0.000      Significant
4       MEOH      MEPR  0.772    4.706  15   0.000      Significant
5       MEOH      PRO1  0.485    2.147  15   0.049      Significant
6       MEOH      ACAL  0.667    3.472  15   0.003      Significant
7       ACET       BU1  0.189    0.746  15   0.467  Non-significant
8       ACET       BU2  0.306    1.243  15   0.233  Non-significant
9       ACET      ISOP  0.582    2.768  15   0.014      Significant
10      ACET      MEPR  0.583    2.777  15   0.014      Significant
11      ACET      PRO1  0.351    1.454  15   0.167  Non-significant
12      ACET      ACAL  0.300    1.218  15   0.242  Non

#### Pooled Within-Class Correlation Coefficients/Pr>|r|

In [24]:
#Pooled Within-Class Correlation Coefficients/Pr>|r|
print_dt(clf.corr_.pooled,rowname="",title=html("<b>Pooled Within-Class Correlation Coefficients/Pr>|r|</b>"))

Pooled Within-Class Correlation Coefficients/Pr>|r|
,Variable1,Variable2,R,t value,DF,Pr>|t|,Conclusion
0,MEOH,ACET,0.31062831,2.26409766,48,0.02812436,Significant
1,MEOH,BU1,0.24814879,1.77473559,48,0.08228331,Non-significant
2,MEOH,BU2,-0.23661373,-1.68721858,48,0.09805046,Non-significant
3,MEOH,ISOP,0.37523238,2.80461744,48,0.0072514,Significant
4,MEOH,MEPR,0.42639294,3.26590679,48,0.00201739,Significant
5,MEOH,PRO1,0.06766088,0.46984506,48,0.64059335,Non-significant
6,MEOH,ACAL,0.51110243,4.11976719,48,0.00014892,Significant
7,ACET,BU1,-0.14886002,-1.04295277,48,0.30219467,Non-significant
8,ACET,BU2,0.02277313,0.15781783,48,0.8752624,Non-significant
9,ACET,ISOP,0.05143034,0.35679201,48,0.72281128,Non-significant
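
The `R` and `t value` columns can be reproduced from the covariance entries. The sketch below recomputes the pooled within-class correlation between `MEOH` and `ACET` and its t statistic, using the standard test of zero correlation, t = r·sqrt(DF)/sqrt(1 - r²), with the DF printed in the table. The function name `correlation_t` is illustrative.

```python
import math

def correlation_t(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, given a correlation r and its degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)

# Pooled within-class covariance entries for MEOH and ACET (from the covariance table).
cov_xy, var_x, var_y = 7653.79177671, 40223.70246098, 15093.47281713

r = cov_xy / math.sqrt(var_x * var_y)
t_value = correlation_t(r, df=48)

print(round(r, 8), round(t_value, 8))
```

The same function, called with the `R` and `DF` values of the between-class and total-sample tables, appears to reproduce their `t value` columns as well.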


#### Between-Class Correlation Coefficients/Pr>|r|

In [25]:
#Between-Class Correlation Coefficients/Pr>|r|
print_dt(clf.corr_.between,rowname="",title=html("<b>Between-Class Correlation Coefficients/Pr>|r|</b>"))

Between-Class Correlation Coefficients/Pr>|r|
,Variable1,Variable2,R,t value,DF,Pr>|t|,Conclusion
0,MEOH,ACET,-0.1146793,-0.11544091,1,0.92683191,Non-significant
1,MEOH,BU1,0.98905542,6.70343752,1,0.09427393,Non-significant
2,MEOH,BU2,0.5607764,0.67729295,1,0.62100546,Non-significant
3,MEOH,ISOP,0.8476769,1.59784429,1,0.35600124,Non-significant
4,MEOH,MEPR,0.71795505,1.03141204,1,0.49015663,Non-significant
5,MEOH,PRO1,-0.92251306,-2.39014146,1,0.25226352,Non-significant
6,MEOH,ACAL,0.78277774,1.25787562,1,0.42760432,Non-significant
7,ACET,BU1,0.03314692,0.03316514,1,0.97889415,Non-significant
8,ACET,BU2,-0.88681427,-1.91898799,1,0.30582645,Non-significant
9,ACET,ISOP,-0.6242238,-0.799011,1,0.57083067,Non-significant


#### Total-Sample Correlation Coefficients/Pr>|r|

In [26]:
#Total-Sample Correlation Coefficients/Pr>|r|
print_dt(clf.corr_.total,rowname="",title=html("<b>Total-Sample Correlation Coefficients/Pr>|r|</b>"))

Total-Sample Correlation Coefficients/Pr>|r|
,Variable1,Variable2,R,t value,DF,Pr>|t|,Conclusion
0,MEOH,ACET,0.14650311,1.04723284,50,0.30002834,Non-significant
1,MEOH,BU1,0.7783374,8.76596241,50,0.0,Significant
2,MEOH,BU2,0.0185113,0.1309171,50,0.89636622,Non-significant
3,MEOH,ISOP,0.42851808,3.35359052,50,0.0015276,Significant
4,MEOH,MEPR,0.5261069,4.37447898,50,6.183e-05,Significant
5,MEOH,PRO1,-0.28405891,-2.09489517,50,0.04126725,Significant
6,MEOH,ACAL,0.36353293,2.75935756,50,0.00807169,Significant
7,ACET,BU1,-0.07380589,-0.52331374,50,0.60306763,Non-significant
8,ACET,BU2,-0.02201033,-0.15567426,50,0.87691659,Non-significant
9,ACET,ISOP,0.01268137,0.08967804,50,0.92890134,Non-significant


### Simple Statistics

#### Total statistics

In [27]:
#total statistics
print_dt(clf.summary_.total,rowname="Variable",title=html("<b>Total statistic</b>"))

Total statistic
Variable,count,mean,std,min,25%,50%,75%,max
MEOH,52,808.04807692,369.78197418,23.0,468.0,896.5,1054.75,1548.0
ACET,52,205.42884615,122.15377056,13.0,127.75,176.0,271.0,755.0
BU1,52,14.42307692,11.02758778,0.0,1.9,16.0,21.0,44.0
BU2,52,29.77692308,54.48546349,0.0,2.575,10.5,28.25,330.0
ISOP,52,98.30769231,48.31008439,2.0,72.75,90.0,119.5,247.0
MEPR,52,37.15769231,18.47157165,0.0,26.75,34.5,46.5,113.0
PRO1,52,436.925,626.17904972,50.0,93.75,197.0,539.25,3020.0
ACAL,52,13.06538462,8.05434689,0.0,8.375,11.2,15.25,36.0


#### Within-class statistics

In [28]:
#within-class statistics
print("\nwithin-class statistics")
for k in clf.summary_.within.keys():
    print("\n{}".format(k))
    print(clf.summary_.within[k].round(3))


within-class statistics

KIRSCH
      count     mean      std    min    25%    50%    75%     max
MEOH     17  371.676  185.094   23.0  334.0  393.0  465.0   726.0
ACET     17  203.018  119.442   13.0   96.0  225.0  325.0   356.0
BU1      17    1.200    0.785    0.0    0.8    1.3    1.9     2.3
BU2      17   21.018   21.485    0.0    3.0   13.0   29.0    77.0
ISOP     17   81.588   39.289    2.0   83.0   89.0   98.0   125.0
MEPR     17   28.894   12.282    0.0   25.0   33.0   37.0    42.0
PRO1     17  790.771  750.601  177.0  317.0  552.0  814.0  3020.0
ACAL     17   12.012    9.702    0.0    6.0    8.6   17.0    35.0

MIRAB
      count     mean      std    min    25%    50%     75%     max
MEOH     15  934.200  151.503  595.0  865.0  961.0  1024.0  1197.0
ACET     15  235.067  178.580   58.0  122.0  176.0   297.5   755.0
BU1      15   20.200    7.133   15.0   16.5   18.0    20.5    44.0
BU2      15   13.567   34.017    0.0    0.0    2.2    10.0   134.0
ISOP     15   90.933   53.726  

### Class means

#### Total-sample class means

In [29]:
#total-sample class means
print_dt(clf.classes_.center.T,rowname="{}".format(clf.call_.target),title=html("<b>Total-sample class means</b>"))

Total-sample class means
TYPE,KIRSCH,MIRAB,POIRE
MEOH,371.67647059,934.2,1084.35
ACET,203.01764706,235.06666667,185.25
BU1,1.2,20.2,21.33
BU2,21.01764706,13.56666667,49.38
ISOP,81.58823529,90.93333333,118.05
MEPR,28.89411765,29.4,50.0
PRO1,790.77058824,195.26666667,317.4
ACAL,12.01176471,12.35333333,14.495


#### Total-Sample Standardized Class Means

In [30]:
#Total-Sample Standardized Class Means
print_dt(clf.classes_.total,rowname="{}".format(clf.call_.target),title=html("<b>Total-Sample Standardized Class Means</b>"))

Total-Sample Standardized Class Means
TYPE,KIRSCH,MIRAB,POIRE
MEOH,-1.18007809,0.34115217,0.74720225
ACET,-0.01973905,0.24262714,-0.16519217
BU1,-1.19909061,0.52386099,0.62633127
BU2,-0.16076354,-0.29751525,0.35978545
ISOP,-0.34608627,-0.15264637,0.40865811
MEPR,-0.44736717,-0.41998009,0.69524716
PRO1,0.56508692,-0.38592529,-0.19087991
ACAL,-0.13081382,-0.08840584,0.17749613


#### Pooled-Within Class Standardized Class Means

In [31]:

#Pooled-Within Class Standardized Class Means
print_dt(clf.classes_.pooled,rowname="{}".format(clf.call_.target),title=html("<b>Pooled-Within Class Standardized Class Means</b>"))

Pooled-Within Class Standardized Class Means
TYPE,KIRSCH,MIRAB,POIRE
MEOH,-2.17578242,0.6290032,1.37766265
ACET,-0.0196263,0.24124131,-0.16424862
BU1,-2.19710451,0.95987521,1.14763243
BU2,-0.16477364,-0.30493649,0.36875996
ISOP,-0.36004492,-0.15880303,0.42514045
MEPR,-0.52719332,-0.49491942,0.81930388
PRO1,0.60598798,-0.41385862,-0.20469582
ACAL,-0.12954863,-0.0875508,0.17577943
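
Both standardized tables centre each class mean on the grand mean and differ only in the scale: the total-sample standard deviation in one case, the pooled within-class standard deviation in the other. A sketch for `MEOH`, with inputs copied from the earlier tables:

```python
import math

# MEOH class means, from the "Total-sample class means" table.
class_mean = {"KIRSCH": 371.67647059, "MIRAB": 934.2, "POIRE": 1084.35}
grand_mean = 808.04807692              # "Total statistic" table, mean
total_std = 369.78197418               # "Total statistic" table, std
pooled_std = math.sqrt(40223.70246098) # sqrt of the pooled within-class variance

total_standardized = {k: (m - grand_mean) / total_std for k, m in class_mean.items()}
pooled_standardized = {k: (m - grand_mean) / pooled_std for k, m in class_mean.items()}

print({k: round(v, 8) for k, v in total_standardized.items()})
print({k: round(v, 8) for k, v in pooled_standardized.items()})
```

Because the pooled standard deviation excludes between-class spread, it is smaller than the total one for a discriminating variable such as `MEOH`, which makes the pooled-standardized means larger in magnitude.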


#### Squared Distances to TYPE

In [32]:
#squared distances to TYPE
print_dt(clf.classes_.mahal.round(3),rowname="{}".format(clf.call_.target),title=html("<b>Squared Distance to {}</b>".format(clf.call_.target)))

Squared Distance to TYPE
TYPE,KIRSCH,MIRAB,POIRE
KIRSCH,0.0,365.221,143.643
MIRAB,1050.796,0.0,16.263
POIRE,1192.291,29.827,0.0


#### Generalized Squared Distances to TYPE

In [33]:
#generalized squared distances to TYPE
print_dt(clf.classes_.gen.round(3),rowname="{}".format(clf.call_.target),title=html("<b>Generalized Squared Distance to {}</b>".format(clf.call_.target)))

Generalized Squared Distance to TYPE
TYPE,KIRSCH,MIRAB,POIRE
KIRSCH,51.238,416.611,200.228
MIRAB,1102.034,51.39,72.849
POIRE,1243.529,81.217,56.585
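
The generalized squared distances appear to be the squared distances plus two class-specific terms: the log-determinant of the target class covariance matrix and minus twice the log of its prior, D²(i|j) = d²(i|j) + ln|S_j| - 2 ln(q_j). This matches the SAS PROC DISCRIM definition for unequal covariance matrices and unequal priors. Whether `discrimintools` computes it exactly this way is an assumption, checked here only against the printed numbers. The log-determinants are those printed by the detailed `summaryDISCRIM` output.

```python
import math

labels = ["KIRSCH", "MIRAB", "POIRE"]

# Squared distances (rows = from class, columns = to class), from the table above.
mahal = [[0.0, 365.221, 143.643],
         [1050.796, 0.0, 16.263],
         [1192.291, 29.827, 0.0]]

# Natural log of each within-class covariance determinant (detailed summary output).
log_det = {"KIRSCH": 49.0021, "MIRAB": 48.9038, "POIRE": 54.6744}
prior = {"KIRSCH": 17 / 52, "MIRAB": 15 / 52, "POIRE": 20 / 52}

# Assumed formula: D2(i|j) = d2(i|j) + ln|S_j| - 2 ln(prior_j).
generalized = [
    [mahal[i][j] + log_det[c] - 2 * math.log(prior[c]) for j, c in enumerate(labels)]
    for i in range(len(labels))
]

for label, row in zip(labels, generalized):
    print(label, [round(v, 3) for v in row])
```

The squared-distance matrix is not symmetric because each column is measured with its own class covariance matrix, which is the defining feature of the quadratic method.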


### Statistics

#### Univariate Test Statistics

In [34]:
#Univariate Test Statistics
print_dt(clf.statistics_.anova,rowname="Variables",title=html("<b>Univariate Test Statistics</b>"))

Univariate Test Statistics
Variables,Total Std. Dev.,Pooled Std. Dev.,Between Std. Dev.,Within SS,Between SS,R-Square,R-Square/(1-RSq),F Value,Num DF,Den DF,Pr>F
MEOH,369.78197418,200.55847641,379.88027121,1970961.42058824,5002712.70921946,0.71737116,2.53820935,62.18612911,2,49,0.0
ACET,122.15377056,122.85549567,24.85644864,739580.16803922,21418.55869155,0.02814533,0.02896043,0.7095305,2,49,0.49685825
BU1,11.02758778,6.01841053,11.3007194,1774.842,4427.15030769,0.71382712,2.49439122,61.11258497,2,49,0.0
BU2,54.48546349,53.15945018,19.31384687,138470.43003922,12931.52226848,0.08541186,0.09338833,2.28801409,2,49,0.11220866
ISOP,48.31008439,46.43714228,19.6334751,105664.00098039,13363.07594268,0.11226921,0.12646763,3.09845697,2,49,0.05406192
MEPR,18.47157165,15.67465744,12.43687621,12039.04941176,5362.09751131,0.30814621,0.4453921,10.91210647,2,49,0.00012032
PRO1,626.17904972,583.91519248,308.07496288,16706890.64862745,3290219.66887255,0.16453476,0.19693788,4.82497812,2,49,0.01222491
ACAL,8.05434689,8.13300718,1.3938727,3241.14448039,67.35321192,0.02035764,0.02078069,0.50912685,2,49,0.60416435
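
Each row of this table is a one-way ANOVA of one variable on `TYPE`. The sketch below rebuilds `R-Square`, `R-Square/(1-RSq)` and `F Value` for `MEOH` from the two sums of squares in that row.

```python
# MEOH row of the Univariate Test Statistics table.
within_ss, between_ss = 1970961.42058824, 5002712.70921946
num_df, den_df = 2, 49   # K - 1 and n - K

r_square = between_ss / (within_ss + between_ss)
ratio = r_square / (1 - r_square)      # algebraically equal to between_ss / within_ss
f_value = ratio * den_df / num_df

print(round(r_square, 8), round(ratio, 8), round(f_value, 8))
```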


#### Average R-square

In [35]:
#Average R-square
print_dt(clf.statistics_.average_rsq,rowname="",title=html("<b>Average R-square</b>"))

Average R-square
,Unweighted,Weighted by Variance
Average R-Square,0.26875791,0.29792234
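
The two averages differ in how the variables are weighted. The unweighted figure is the plain mean of the eight R-squares; the weighted figure matches the ratio of summed between-class SS to summed total SS, so high-variance variables such as `PRO1` dominate it. A sketch using the `Within SS` and `Between SS` columns of the univariate table:

```python
# Columns copied from the Univariate Test Statistics table.
within_ss = {
    "MEOH": 1970961.42058824, "ACET": 739580.16803922, "BU1": 1774.842,
    "BU2": 138470.43003922, "ISOP": 105664.00098039, "MEPR": 12039.04941176,
    "PRO1": 16706890.64862745, "ACAL": 3241.14448039,
}
between_ss = {
    "MEOH": 5002712.70921946, "ACET": 21418.55869155, "BU1": 4427.15030769,
    "BU2": 12931.52226848, "ISOP": 13363.07594268, "MEPR": 5362.09751131,
    "PRO1": 3290219.66887255, "ACAL": 67.35321192,
}

r_square = {v: between_ss[v] / (between_ss[v] + within_ss[v]) for v in between_ss}

unweighted = sum(r_square.values()) / len(r_square)
total_between = sum(between_ss.values())
weighted = total_between / (total_between + sum(within_ss.values()))

print(round(unweighted, 8), round(weighted, 8))
```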


#### Multivariate Statistics and F Approximations

In [36]:
#Multivariate Statistics and F Approximations
print_dt(clf.statistics_.manova,rowname="Statistics",title=html("<b>Multivariate Statistics and F Approximations</b>"))

Multivariate Statistics and F Approximations
Statistics,Value,Num DF,Den DF,F Value,Pr > F
Wilks' lambda,0.0667132385974023,16.0,84.0,15.076064126698094,2.2865869244309182e-18
Pillai's trace,1.321252786839968,16.0,86.0,10.46300241322746,3.9340132390891333e-14
Hotelling-Lawley trace,8.174100784644882,16.0,65.23144104803492,21.081615594099663,8.73249250163194e-20
Roy's greatest root,7.38683115221621,8.0,43.0,39.70421744316213,2.086169313281085e-17
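
The F values and degrees of freedom for Wilks' lambda and Pillai's trace can be recovered from the statistics themselves with the standard approximations (Rao's F for Wilks, and the usual s, m, n parametrisation for Pillai). The sketch below assumes the library uses these textbook formulas; the printed numbers are consistent with that.

```python
import math

p, q, v = 8, 2, 49   # variables, between DF (K - 1), within DF (n - K)
wilks, pillai = 0.0667132385974023, 1.321252786839968

# Rao's F approximation for Wilks' lambda.
t = math.sqrt((p**2 * q**2 - 4) / (p**2 + q**2 - 5))
r = v - (p - q + 1) / 2
wilks_df1 = p * q
wilks_df2 = r * t - (p * q - 2) / 2
root = wilks ** (1 / t)
f_wilks = (1 - root) / root * wilks_df2 / wilks_df1

# F approximation for Pillai's trace.
s = min(p, q)
m = (abs(p - q) - 1) / 2
n = (v - p - 1) / 2
pillai_df1 = s * (2 * m + s + 1)
pillai_df2 = s * (2 * n + s + 1)
f_pillai = (2 * n + s + 1) / (2 * m + s + 1) * pillai / (s - pillai)

print(wilks_df1, wilks_df2, round(f_wilks, 6))
print(pillai_df1, pillai_df2, round(f_pillai, 6))
```

With only three classes there are at most two discriminant dimensions, which is why `s = 2` and Rao's approximation for Wilks' lambda is exact in this setting.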


### `summary`

In [37]:
from discrimintools import summaryDISCRIM

#### Simple summary

In [38]:
#simple summary
summaryDISCRIM(clf)

                     Discriminant Analysis - Results                     

Summary Information:
               Infos  Value                  DF  DF value
0  Total Sample Size     52            DF Total        51
1          Variables      8   DF Within Classes        49
2            Classes      3  DF Between Classes         2

Class Level Information:
        Frequency  Proportion  Prior Probability
KIRSCH         17      0.3269             0.3269
MIRAB          15      0.2885             0.2885
POIRE          20      0.3846             0.3846


#### Detailed summary

In [39]:
#detailed summary
summaryDISCRIM(clf,detailed=True)

                     Discriminant Analysis - Results                     

Summary Information:
               Infos  Value                  DF  DF value
0  Total Sample Size     52            DF Total        51
1          Variables      8   DF Within Classes        49
2            Classes      3  DF Between Classes         2

Class Level Information:
        Frequency  Proportion  Prior Probability
KIRSCH         17      0.3269             0.3269
MIRAB          15      0.2885             0.2885
POIRE          20      0.3846             0.3846

Within Covariance Matrix Information:
        Rank  Natural Log of the Determinant
Pooled     8                         58.3267
KIRSCH     8                         49.0021
MIRAB      8                         48.9038
POIRE      8                         54.6744

Test of Homogeneity of Within Covariance Matrices:
         Bartlett Value  Num DF  Den DF  F value  Pr>F  Chi Sq. Value  Pr>Chi2
Box's M        350.5115      72    6010    3.679   0.0 
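
The `Bartlett Value` on the Box's M row can be recovered, up to rounding, from the log-determinants listed above: M = (n - K)·ln|S_pooled| - Σ (n_k - 1)·ln|S_k|. The sketch also recomputes the numerator degrees of freedom, (K - 1)·p·(p + 1)/2. It stops at the unscaled statistic and does not reproduce the F or chi-square corrections.

```python
# Per-class degrees of freedom (n_k - 1) and log-determinants copied from the output above.
class_df = {"KIRSCH": 16, "MIRAB": 14, "POIRE": 19}
log_det = {"KIRSCH": 49.0021, "MIRAB": 48.9038, "POIRE": 54.6744}
log_det_pooled = 58.3267
n_vars = 8

pooled_df = sum(class_df.values())   # n - K = 49
box_m = pooled_df * log_det_pooled - sum(class_df[k] * log_det[k] for k in class_df)

# Numerator DF of the approximation: (K - 1) * p * (p + 1) / 2.
num_df = (len(class_df) - 1) * n_vars * (n_vars + 1) // 2

print(round(box_m, 3), num_df)
```

The small difference from the printed 350.5115 comes from using log-determinants rounded to four decimals. A value this large relative to 72 degrees of freedom is what triggers the message, printed by `fit`, that the within-class covariance matrices will be used.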

#### Markdown format

In [40]:
#markdown format
summaryDISCRIM(clf,detailed=True,to_markdown=True)

                     Discriminant Analysis - Results                     

Summary Information:
|    | Infos             |   Value | DF                 |   DF value |
|----|-------------------|---------|--------------------|------------|
|  0 | Total Sample Size |      52 | DF Total           |         51 |
|  1 | Variables         |       8 | DF Within Classes  |         49 |
|  2 | Classes           |       3 | DF Between Classes |          2 |

Class Level Information:
|        |   Frequency |   Proportion |   Prior Probability |
|--------|-------------|--------------|---------------------|
| KIRSCH |          17 |       0.3269 |              0.3269 |
| MIRAB  |          15 |       0.2885 |              0.2885 |
| POIRE  |          20 |       0.3846 |              0.3846 |

Within Covariance Matrix Information:
|        |   Rank |   Natural Log of the Determinant |
|--------|--------|----------------------------------|
| Pooled |      8 |                          58.3267 |
| KIRSCH 

### Evaluation of prediction on testing dataset

#### Testing data

In [41]:
#testing data
DTest = load_alcools("test")
DTest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   TYPE    50 non-null     object 
 1   MEOH    50 non-null     int64  
 2   ACET    50 non-null     int64  
 3   BU1     50 non-null     float64
 4   BU2     50 non-null     float64
 5   ISOP    50 non-null     int64  
 6   MEPR    50 non-null     int64  
 7   PRO1    50 non-null     int64  
 8   ACAL    50 non-null     float64
dtypes: float64(3), int64(5), object(1)
memory usage: 3.6+ KB


In [42]:
#display testing data
print_dt(DTest,rowname="",title=html("<b>Alcools dataset - testing data</b>"))

Alcools dataset - testing data
,TYPE,MEOH,ACET,BU1,BU2,ISOP,MEPR,PRO1,ACAL
0,KIRSCH,3,15,0.2,30.0,9,9,350,9.0
1,KIRSCH,475,172,1.9,7.0,113,33,546,14.0
2,KIRSCH,186,101,0.0,1.6,36,11,128,8.0
3,KIRSCH,371,414,1.2,0.0,97,39,502,9.0
4,KIRSCH,583,226,2.3,19.0,120,46,656,11.0
5,KIRSCH,0,25,0.1,8.0,0,6,253,7.0
6,KIRSCH,421,142,1.6,8.0,75,24,128,31.0
7,KIRSCH,557,447,0.0,34.0,107,39,162,94.0
8,KIRSCH,167,86,0.0,0.0,32,10,114,8.0
9,KIRSCH,523,367,2.6,30.0,116,45,787,25.0


In [43]:
#split into X and y
yTest, XTest = DTest["TYPE"], DTest.drop(columns=["TYPE"])

#### `decision_function` function

In [44]:
#decision_function function
print_dt(clf.decision_function(XTest).head(),rowname="",title=html("<b>Decision function on testing data</b>"))

Decision function on testing data
,KIRSCH,MIRAB,POIRE
0,-32.3186927,-153.553632,-69.46790223
1,-27.58776782,-114.02611502,-84.161579
2,-27.97315945,-113.88631846,-64.48781691
3,-30.9003882,-128.2515319,-143.22466043
4,-28.41059592,-154.74199418,-79.02384566


#### `eval_predict` function

In [45]:
#eval_predict function
eval_test = clf.eval_predict(XTest,yTest,verbose=True)

Observation Profile:
                        Read  Used
Number of Observations    50    50

Number of Observations Classified into TYPE:
prediction  KIRSCH  MIRAB  POIRE  Total
TYPE                                   
KIRSCH          14      0      0     14
MIRAB            0     12      5     17
POIRE            0      2     17     19
Total           14     14     22     50

Percent Classified into TYPE:
prediction      KIRSCH      MIRAB      POIRE  Total
TYPE                                               
KIRSCH      100.000000   0.000000   0.000000  100.0
MIRAB         0.000000  70.588235  29.411765  100.0
POIRE         0.000000  10.526316  89.473684  100.0
Total        28.000000  28.000000  44.000000  100.0
Priors        0.326923   0.288462   0.384615    NaN

Error Count Estimates for TYPE:
          KIRSCH     MIRAB     POIRE     Total
Rate    0.000000  0.294118  0.105263  0.125327
Priors  0.326923  0.288462  0.384615       NaN

Classification Report for TYPE:
              precisi
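
The percentage table and the error count estimates can be reproduced from the confusion counts alone. The sketch below assumes that the total error rate is the prior-weighted sum of the per-class error rates, with priors taken from the training proportions (17, 15 and 20 out of 52); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

classes = ["KIRSCH", "MIRAB", "POIRE"]

# Confusion counts copied from the output above (rows: true TYPE, columns: prediction)
counts = pd.DataFrame(
    [[14, 0, 0], [0, 12, 5], [0, 2, 17]], index=classes, columns=classes
)

# Priors assumed equal to the training class proportions
priors = pd.Series([17, 15, 20], index=classes) / 52

# Row percentages: share of each true class assigned to each predicted class
row_totals = counts.sum(axis=1)
percent = counts.div(row_totals, axis=0) * 100

# Per-class error rate: 1 - (correctly classified / class size)
correct = pd.Series(np.diag(counts.to_numpy()), index=classes)
error_rate = 1 - correct / row_totals

# Total error rate: per-class error rates weighted by the priors
total_error = float((error_rate * priors).sum())

# Unweighted misclassification rate, for comparison
raw_error = 1 - correct.sum() / row_totals.sum()

print(percent.round(6))
print(error_rate.round(6))
print(round(total_error, 6), round(raw_error, 2))
```

Note that the prior-weighted total differs from the unweighted misclassification rate (7 errors out of 50), because the class mix of the test set is not the same as the training proportions used as priors.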

#### `predict` function

In [46]:
#predict on testing data
print_dt(clf.predict(XTest).to_frame().head(),rowname="",title=html("<b>Prediction on testing data</b>"))

Prediction on testing data
,prediction
0,KIRSCH
1,KIRSCH
2,KIRSCH
3,KIRSCH
4,KIRSCH
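
These labels agree with picking, for each row, the class with the largest decision score. The sketch below checks this on the five rows displayed earlier; it assumes that `predict` behaves like an argmax over `decision_function`, which matches the displayed outputs but is not confirmed here from the library source.

```python
import pandas as pd

# Decision scores for the first five test rows, copied from the output above
scores = pd.DataFrame(
    {
        "KIRSCH": [-32.3186927, -27.58776782, -27.97315945, -30.9003882, -28.41059592],
        "MIRAB": [-153.553632, -114.02611502, -113.88631846, -128.2515319, -154.74199418],
        "POIRE": [-69.46790223, -84.161579, -64.48781691, -143.22466043, -79.02384566],
    }
)

# Predicted class: column holding the highest score in each row
predicted = scores.idxmax(axis=1)
print(predicted)
```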


#### `predict_log_proba` function

In [47]:
#predict_log_proba function
print_dt(clf.predict_log_proba(XTest).head(),rowname="",title=html("<b>Estimated log-probabilities (test data)</b>"))

Estimated log-probabilities (test data)
,KIRSCH,MIRAB,POIRE
0,0.0,-121.2349393,-37.14920953
1,0.0,-86.4383472,-56.57381118
2,-0.0,-85.91315901,-36.51465746
3,0.0,-97.35114369,-112.32427223
4,0.0,-126.33139826,-50.61324973
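
The displayed log-probabilities match a log-softmax of the decision scores, i.e. each score minus the log-sum-exp of its row. The sketch below verifies this on the five displayed rows using NumPy only; the helper `log_softmax` is an illustrative function, not part of `discrimintools`.

```python
import numpy as np

# Decision scores for the first five test rows (columns: KIRSCH, MIRAB, POIRE)
scores = np.array(
    [
        [-32.3186927, -153.553632, -69.46790223],
        [-27.58776782, -114.02611502, -84.161579],
        [-27.97315945, -113.88631846, -64.48781691],
        [-30.9003882, -128.2515319, -143.22466043],
        [-28.41059592, -154.74199418, -79.02384566],
    ]
)

def log_softmax(s):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = s - s.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

log_proba = log_softmax(scores)
print(log_proba.round(8))
```

Because the KIRSCH score dominates every row by more than 35 units, the log-sum-exp is numerically equal to the KIRSCH score, so the KIRSCH column is essentially zero and the other columns are simple score differences.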


#### `predict_proba` function

In [48]:
#predict_proba function
print_dt(clf.predict_proba(XTest).head(),rowname="",title=html("<b>Estimated probabilities (test data)</b>"))

Estimated probabilities (test data)
,KIRSCH,MIRAB,POIRE
0,1.0,0.0,0.0
1,1.0,0.0,0.0
2,1.0,0.0,0.0
3,1.0,0.0,0.0
4,1.0,0.0,0.0
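
The probabilities shown as exactly 1.0 and 0.0 are a display effect: `print_dt` rounds to 8 digits by default, and the non-KIRSCH probabilities are far smaller than that. The sketch below exponentiates the log-probabilities displayed earlier, assuming `predict_proba` is the exponential of `predict_log_proba`.

```python
import numpy as np

# Log-probabilities for the first five test rows (columns: KIRSCH, MIRAB, POIRE)
log_proba = np.array(
    [
        [0.0, -121.2349393, -37.14920953],
        [0.0, -86.4383472, -56.57381118],
        [-0.0, -85.91315901, -36.51465746],
        [0.0, -97.35114369, -112.32427223],
        [0.0, -126.33139826, -50.61324973],
    ]
)

# Back to the probability scale
proba = np.exp(log_proba)

# Rounding to 8 digits, as print_dt does by default, hides the tiny values
rounded = proba.round(8)

print(proba[:, 1:].max())  # largest non-KIRSCH probability, tiny but positive
print(rounded)
```

For ranking or thresholding very small posterior probabilities, working with the log-probabilities avoids this loss of information.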
