## CANDISC - heart dataset

In [1]:
#disable warnings
from warnings import simplefilter, filterwarnings
simplefilter(action='ignore', category=FutureWarning)
filterwarnings("ignore")

### heart dataset

In [2]:
#heart dataset
from discrimintools.datasets import load_heart
D = load_heart("train")
print(D.info())

<class 'pandas.core.frame.DataFrame'>
Index: 150 entries, 0 to 149
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   disease         150 non-null    object 
 1   age             150 non-null    int64  
 2   sex             150 non-null    object 
 3   chestpain       150 non-null    object 
 4   restbpress      150 non-null    int64  
 5   cholesteral     150 non-null    int64  
 6   sugar           150 non-null    object 
 7   electro         150 non-null    object 
 8   maxHeartRate    150 non-null    int64  
 9   ExerciseAngina  150 non-null    object 
 10  oldpeak         150 non-null    float64
 11  slope           150 non-null    object 
 12  vesselsColored  150 non-null    int64  
 13  thal            150 non-null    object 
dtypes: float64(1), int64(5), object(8)
memory usage: 17.6+ KB
None


In [3]:
#split into X and y
y, X = D["disease"], D.drop(columns=["disease"])

### instantiation & training

In [4]:
from discrimintools import CANDISC
clf = CANDISC(n_components=2)
clf.fit(X,y)


Categorical features have been encoded into binary variables.



CANDISC parameters:
  n_components  2
  classes
  warn_message  True
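
The message printed by `fit` says the categorical columns were turned into binary indicators, and names such as `sexmale` and `chestpainnonAnginal` in the summary output suggest that one reference level per feature is dropped and the level name is appended directly to the column name. A minimal sketch of that kind of encoding with `pandas.get_dummies`, on a made-up four-row frame (`toy` and `encoded` are hypothetical names, and this is an assumption about the general technique, not the code `CANDISC` actually runs):

```python
import pandas as pd

# Made-up rows that reuse two categorical columns of the heart data
toy = pd.DataFrame({
    "age": [63, 41, 57, 52],
    "sex": ["male", "female", "male", "male"],
    "chestpain": ["asymptomatic", "atypicalAngina", "nonAnginal", "typicalAngina"],
})

# drop_first=True removes the first level (in sorted order) of each categorical
# column; prefix_sep="" glues the column name and the level name together
encoded = pd.get_dummies(toy, prefix_sep="", drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

Numeric columns pass through unchanged, so `age` stays as it is while `sex` becomes a single `sexmale` indicator.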


#### Evaluation of prediction on training data

In [5]:
#eval_predict function
eval_train = clf.eval_predict(X,y,verbose=True)

Observation Profile:
                        Read  Used
Number of Observations   150   150

Number of Observations Classified into disease:
prediction  absence  presence  Total
disease                             
absence          75         7     82
presence         12        56     68
Total            87        63    150

Percent Classified into disease:
prediction    absence   presence  Total
disease                                
absence     91.463415   8.536585  100.0
presence    17.647059  82.352941  100.0
Total       58.000000  42.000000  100.0
Priors       0.546667   0.453333    NaN

Error Count Estimates for disease:
         absence  presence     Total
Rate    0.085366  0.176471  0.126667
Priors  0.546667  0.453333       NaN

Classification Report for disease:
              precision    recall  f1-score     support
absence        0.862069  0.914634  0.887574   82.000000
presence       0.888889  0.823529  0.854962   68.000000
accuracy       0.873333  0.873333  0.873333    0.8
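
The figures in the classification report follow directly from the confusion matrix printed above. As a quick check, here is a small sketch in plain Python that recomputes them from the four training counts (`class_metrics` is a hypothetical helper written for this illustration, not part of `discrimintools`):

```python
# Training confusion matrix from the output above
# (rows: true class, columns: predicted class)
labels = ["absence", "presence"]
confusion = [[75, 7],
             [12, 56]]

def class_metrics(cm, k):
    """Return (precision, recall, f1) for the class at index k."""
    true_positive = cm[k][k]
    predicted_as_k = sum(row[k] for row in cm)  # column total
    actually_k = sum(cm[k])                     # row total
    precision = true_positive / predicted_as_k
    recall = true_positive / actually_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

n_total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(labels))) / n_total

for k, name in enumerate(labels):
    precision, recall, f1 = class_metrics(confusion, k)
    print(f"{name:<10}{precision:.6f}  {recall:.6f}  {f1:.6f}")
print(f"accuracy  {accuracy:.6f}")
```

For example, the precision for `absence` is 75/87: of the 87 observations predicted as `absence`, 75 truly belong to that class.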

In [6]:
#score function
print("Accuracy : {}%".format(100*round(clf.score(X,y),2)))

Accuracy : 87.0%


In [7]:
#error rate
print("Error rate : {}%".format(100-100*round(clf.score(X,y),2)))

Error rate : 13.0%


### summary

In [8]:
from discrimintools import summaryCANDISC
summaryCANDISC(clf,detailed=True)

                     Canonical Discriminant Analysis - Results                     

Summary Information:
               infos  Value                  DF  DF value
0  Total Sample Size    150            DF Total       149
1          Variables     18   DF Within Classes       148
2            Classes      2  DF Between Classes         1

Class Level Information:
          Frequency  Proportion  Prior Probability
absence          82      0.5467             0.5467
presence         68      0.4533             0.4533

Total-Sample Class Means:
                            absence  presence
age                         53.0244   56.3824
sexmale                      0.5488    0.8529
chestpainatypicalAngina      0.2073    0.0441
chestpainnonAnginal          0.4390    0.1324
chestpaintypicalAngina       0.1098    0.0294
restbpress                 129.3902  135.6471
cholesteral                243.6951  249.1912
sugarlow                     0.8293    0.8529
electrosttAbnormality        0.0000    0.0
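
The degrees of freedom in the summary table depend only on the sample size and the number of classes. The sketch below recomputes them and also reports the largest number of canonical dimensions that can carry discriminating information, min(p, K - 1), where p is the number of variables and K the number of classes. The helper name is hypothetical, and how `CANDISC` itself treats `n_components=2` when only two classes are present is not visible in the output above.

```python
def candisc_degrees_of_freedom(n_samples, n_classes, n_variables):
    """Degrees of freedom for a one-way layout with n_classes groups."""
    return {
        "total": n_samples - 1,             # DF Total
        "within": n_samples - n_classes,    # DF Within Classes
        "between": n_classes - 1,           # DF Between Classes
        # The between-class scatter matrix has rank at most K - 1
        "max_canonical_dims": min(n_variables, n_classes - 1),
    }

# Values read from the summary: 150 observations, 2 classes, 18 variables
df = candisc_degrees_of_freedom(n_samples=150, n_classes=2, n_variables=18)
print(df)
```

With two classes the between-class scatter matrix has rank one, so a single canonical axis already captures all of the separation between `absence` and `presence`.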

### Evaluation of prediction on testing dataset

#### Testing data

In [9]:
#testing data
DTest = load_heart("test")
print(DTest.info())

<class 'pandas.core.frame.DataFrame'>
Index: 120 entries, 150 to 269
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   disease         120 non-null    object 
 1   age             120 non-null    int64  
 2   sex             120 non-null    object 
 3   chestpain       120 non-null    object 
 4   restbpress      120 non-null    int64  
 5   cholesteral     120 non-null    int64  
 6   sugar           120 non-null    object 
 7   electro         120 non-null    object 
 8   maxHeartRate    120 non-null    int64  
 9   ExerciseAngina  120 non-null    object 
 10  oldpeak         120 non-null    float64
 11  slope           120 non-null    object 
 12  vesselsColored  120 non-null    int64  
 13  thal            120 non-null    object 
dtypes: float64(1), int64(5), object(8)
memory usage: 14.1+ KB
None


In [10]:
#split into X and y, then evaluate predictions on the testing data
yTest, XTest = DTest["disease"], DTest.drop(columns=["disease"])
eval_test = clf.eval_predict(XTest,yTest,verbose=True)

Observation Profile:
                        Read  Used
Number of Observations   120   120

Number of Observations Classified into disease:
prediction  absence  presence  Total
disease                             
absence          59         9     68
presence         11        41     52
Total            70        50    120

Percent Classified into disease:
prediction    absence   presence  Total
disease                                
absence     86.764706  13.235294  100.0
presence    21.153846  78.846154  100.0
Total       58.333333  41.666667  100.0
Priors       0.546667   0.453333    NaN

Error Count Estimates for disease:
         absence  presence    Total
Rate    0.132353  0.211538  0.16825
Priors  0.546667  0.453333      NaN

Classification Report for disease:
              precision    recall  f1-score     support
absence        0.842857  0.867647  0.855072   68.000000
presence       0.820000  0.788462  0.803922   52.000000
accuracy       0.833333  0.833333  0.833333    0.8333
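
The total error rate reported on the testing data (0.16825) is not simply 1 minus the accuracy (1 - 0.833333 = 0.166667). The numbers are consistent with a total that weights each class error rate by the prior probabilities, which here equal the training class proportions. A small sketch in plain Python that reproduces both figures from the counts above (variable names are chosen for this illustration only):

```python
# Testing confusion matrix from the output above
# (rows: true class, columns: predicted class)
confusion_test = {"absence": {"absence": 59, "presence": 9},
                  "presence": {"absence": 11, "presence": 41}}

# Priors shown in the output: class proportions of the 150 training rows
priors = {"absence": 82 / 150, "presence": 68 / 150}

# Per-class error rate: share of each true class that is misclassified
class_error = {}
for true_label, row in confusion_test.items():
    n_class = sum(row.values())
    class_error[true_label] = (n_class - row[true_label]) / n_class

# Total weighted by the priors, as in the "Error Count Estimates" table
weighted_error = sum(priors[c] * class_error[c] for c in priors)

# Plain misclassification rate, i.e. 1 - accuracy
n_test = sum(sum(row.values()) for row in confusion_test.values())
n_wrong = sum(row[p] for t, row in confusion_test.items() for p in row if p != t)
plain_error = n_wrong / n_test

print(f"weighted by priors: {weighted_error:.5f}")
print(f"1 - accuracy:       {plain_error:.6f}")
```

On the training data the two quantities coincide (0.126667), because the priors are exactly the class proportions of the sample being evaluated; on the testing data the class mix differs slightly (68/120 versus 82/150 for `absence`), hence the small gap.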