# calfpy 1.20.1 tested on a random classification task

We test calfpy on a synthetic binary classification task and show that the AUC it reports cannot be reproduced from the weights it selects, and that Lasso achieves a higher AUC on the same data.
calfpy 1.20.1 is the current version as of today, August 19, 2023. calfpy is available at https://pypi.org/project/calfpy/

In [49]:
import pandas as pd
import calfpy
from calfpy.methods import calf
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from calf_helper import parse_selection
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score

## The calfpy version on August 19, 2023


We get the version using pip from the command line:


In [50]:
!pip show calfpy

Name: calfpy
Version: 1.20.1
Summary: Contains greedy algorithms for coarse approximation linear functions.
Home-page: https://github.com/jorufo/CALF_Python
Author: John Ford, Clark Jeffries, Diana Perkins
Author-email: JoRuFo@gmail.com
License: 
Location: /home/rolf/anaconda3/envs/classifier-testbed/lib/python3.8/site-packages
Requires: numpy, pandas, plotnine, scipy
Required-by: 


In [51]:
try:
    calfpy.__version__
except AttributeError as err:
    print('Calfpy does not report version information.  This should be fixed. ', err)

Calfpy does not report version information.  This should be fixed.  module 'calfpy' has no attribute '__version__'
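Even when a package does not define a __version__ attribute, the installed version can usually be read from its distribution metadata. The following is a minimal sketch using importlib.metadata from the standard library (Python 3.8 and later). The helper name installed_version is ours for illustration and is not part of calfpy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Reads the same distribution metadata that `pip show` reports.
print("calfpy version:", installed_version("calfpy"))
```

Returning None instead of raising keeps the check usable in environments where the package is not installed.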


In [52]:
# define the dataset
X, y = make_classification(
    n_samples=100,
    n_features=10, 
    n_informative=5, 
    n_redundant=5, 
    n_classes=2, 
    random_state=1
)

# summarize the dataset
print('Shapes of X, y ', X.shape, y.shape)
print('Class count ', Counter(y))

# standardize the data and make the input df
std_data = StandardScaler().fit_transform(X)
df_x = pd.DataFrame(std_data)
df_y = pd.DataFrame(y)

df = pd.concat([df_y, df_x], axis=1).reindex()
print('Shapes of input to calf: ', df.shape, df_y.shape, df_x.shape)

calf_out = calf(data=df,
                nMarkers=10,
                targetVector="binary",
                optimize="auc")


Shapes of X, y  (100, 10) (100,)
Class count  Counter({1: 50, 0: 50})
Shapes of input to calf:  (100, 11) (100, 1) (100, 10)
100
Marker  Weight
     1       1
     5       1
     3       1
     4       1
     6      -1

AUC: 0.9396



Here calf_out is parsed to obtain the selected weights and an AUC calculated from them. The weights can be verified by inspecting the Marker/Weight table in the calf output above.

In [53]:
auc, weights = parse_selection(X, y, calf_out, df)

The AUC of 0.9396 reported by calf is incorrect. The AUC calculated from the selected weights is 0.5712.

In [54]:
auc

0.5712

In [55]:
weights

[1, 0, 1, 1, 1, -1, 0, 0, 0, 0]
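The mapping from the Marker/Weight table to a full weight vector can be sketched as follows. This is not the author's parse_selection helper, whose source is not shown. The function markers_to_weights and its one_based flag are hypothetical, and the sketch assumes that marker labels are integer column positions.

```python
def markers_to_weights(markers, marker_weights, n_features, one_based=True):
    """Expand a (marker, weight) table into a dense weight vector.

    Features that were not selected keep weight 0.
    """
    offset = 1 if one_based else 0
    weights = [0] * n_features
    for marker, weight in zip(markers, marker_weights):
        index = marker - offset
        if not 0 <= index < n_features:
            raise ValueError(f"marker {marker} is out of range")
        weights[index] = weight
    return weights


# Marker/Weight table as printed by calf above.
markers = [1, 5, 3, 4, 6]
marker_weights = [1, 1, 1, 1, -1]

print(markers_to_weights(markers, marker_weights, n_features=10))
# -> [1, 0, 1, 1, 1, -1, 0, 0, 0, 0]
```

Passing one_based=False gives the alternative reading in which marker labels are zero-based column positions.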

Calculate the predicted score y_pred as the weighted sum of the columns of X.

In [56]:
y_pred = np.sum(X * weights, axis=1)
y_pred

array([-2.1003013 , 10.2263653 ,  2.40228585,  3.05113609,  0.64699566,
       -0.46806178,  1.72937414,  0.09525418, -0.8059723 ,  2.73203892,
        1.33223815, -3.00454239,  0.16462481, -4.19943127,  0.51170016,
        1.66676116, -2.45568517,  1.23559065, -1.63576697, -0.09907211,
        0.82007044,  3.30223472,  2.3448476 , -2.25207145,  2.65941298,
       -1.12400344,  9.08687879, 11.57506275, -1.31537707, -7.12395054,
        3.99209961,  2.49593342,  3.51367496, -1.1750421 ,  2.71291091,
        4.41131096,  1.5643634 ,  1.40149964,  3.97006589, -1.30160381,
        0.29817159,  1.48416683,  4.28157019, -0.5652639 , -2.93649406,
       -0.33209195, -7.23998338, -3.14170949,  2.8958504 , -3.59991171,
       -0.65531416,  1.06505912,  4.01537006, -1.37704991, -0.80883357,
        2.05094614, -1.14392216,  1.39757331,  0.25028415, -0.22746905,
        1.26767691, -6.52512207, -2.05394113,  3.58809506,  0.9979999 ,
       -1.02950627,  0.72908673,  5.20687715,  1.56170849,  1.40

Calculate the AUC from the scores y_pred and the ground truth y. The calculated AUC of 0.5712 is far lower than the AUC of 0.9396 reported by calfpy, so the reported value cannot be reproduced from the reported weights. This is a problem.

In [57]:
auc = np.round(roc_auc_score(y_true=y, y_score=y_pred), 4)
auc

0.5712
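Note that calf was given the standardized frame df, while y_pred above is computed from the raw X. Because an unweighted sum of features is sensitive to the scale of each feature, it may be worth scoring both matrices with the same weights. The sketch below rebuilds the dataset with the parameters used above; the helper weighted_auc is ours for illustration, and no particular outcome is assumed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

# Same dataset parameters as the notebook cell above.
X, y = make_classification(
    n_samples=100,
    n_features=10,
    n_informative=5,
    n_redundant=5,
    n_classes=2,
    random_state=1,
)
X_std = StandardScaler().fit_transform(X)

# Weight vector read from the Marker/Weight table, markers counted from 1.
weights = np.array([1, 0, 1, 1, 1, -1, 0, 0, 0, 0])


def weighted_auc(features, labels, w):
    """AUC of the linear score features @ w against binary labels."""
    scores = features @ w
    return roc_auc_score(y_true=labels, y_score=scores)


auc_raw = weighted_auc(X, y, weights)
auc_std = weighted_auc(X_std, y, weights)
print("AUC on raw X:         ", np.round(auc_raw, 4))
print("AUC on standardized X:", np.round(auc_std, 4))
```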

The output does not show whether marker numbering starts at 0 or 1. Suppose the markers are counted from 0, so that marker 1 refers to the second weight. The weights are then rearranged as follows.

In [58]:
weights = [0, 1, 0, 1, 1, 1, -1, 0, 0, 0]

In [59]:
y_pred = np.sum(X * weights, axis=1)
y_pred

array([ 1.11104809e+00, -6.41734851e+00,  3.01993588e+00, -7.52458385e+00,
       -4.12160518e+00, -1.20702934e-01, -3.30664198e+00,  2.76435618e-01,
       -4.91018559e+00, -2.19545345e+00,  2.45847011e+00, -2.26087127e+00,
        3.08246611e+00, -7.97746414e-01, -1.92081712e+00, -4.29819322e+00,
       -3.26858113e+00, -3.84508676e-01, -2.22373673e+00,  1.32543177e-02,
       -6.07385261e-01,  1.23356823e+00,  8.87602719e-01, -2.82480553e-01,
       -7.30990882e+00,  6.31204161e-03, -3.28689483e+00, -2.51198262e+00,
       -1.20268793e+00,  4.58545205e-01, -3.25639604e+00, -6.96435856e+00,
       -4.64343511e+00, -4.28676605e+00, -4.70234134e+00, -7.82503039e+00,
       -3.16407677e-01,  1.90353400e+00,  2.50480187e-01, -7.23421195e-01,
        2.12712315e+00, -2.89710228e+00,  3.20380016e+00,  6.09756307e-01,
       -1.18372918e+00, -2.15675949e+00, -4.53848637e+00, -5.24727150e-01,
        6.97925956e-01, -1.21822249e+00, -5.67646379e-01, -1.24679658e-01,
       -8.44402790e+00,  

The AUC is even lower if we count markers starting at 0.

In [60]:
auc = np.round(roc_auc_score(y_true=y, y_score=y_pred), 4)
auc

0.0784
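An AUC well below 0.5 means the score ranks the two classes in reverse order, and negating a score turns an AUC of a into 1 - a. For the 0.0784 above, the negated score would therefore give 0.9216. The sketch below illustrates this property on a toy example with a pairwise definition of AUC. The helper pairwise_auc is ours for illustration, and the toy labels and scores are not taken from the dataset above.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    positives = [s for s, t in zip(scores, labels) if t == 1]
    negatives = [s for s, t in zip(scores, labels) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = pairwise_auc(labels, scores)
auc_negated = pairwise_auc(labels, [-s for s in scores])
print(auc, auc_negated)       # -> 0.75 0.25

# The same identity applied to the AUC obtained above.
print(round(1 - 0.0784, 4))   # -> 0.9216
```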

Fit a Lasso regression to the 0/1 class labels and use its predictions as classification scores.

In [61]:
y_pred_lasso = Lasso(alpha=0.01).fit(X, y).predict(X)

Here are the Lasso predictions. They are continuous scores rather than class labels.

In [62]:
y_pred_lasso

array([ 0.27124999,  1.29858749, -0.08171615,  1.06831788,  0.40920958,
        0.32625843,  0.72176931,  0.16471321,  0.8768395 ,  0.64841287,
        0.05948886,  0.53747802, -0.23265128,  0.37276811,  0.74214534,
        0.68234933,  0.84898891,  0.66206097,  0.87012742,  0.13842079,
        0.51992359,  0.02813151,  0.2234962 ,  0.3060526 ,  1.24296352,
        0.2942058 ,  0.95314334,  0.85451766,  0.58062984,  0.18398665,
        1.02087065,  1.18513732,  0.9130917 ,  0.76376303,  0.69105084,
        1.30040754,  0.34151644,  0.08089061,  0.43189994,  0.3336025 ,
       -0.16869643,  0.82760838, -0.20938387,  0.18585467,  0.38510442,
        0.76144261,  0.96984714,  0.25677694,  0.08002089,  0.43818408,
        0.64328104,  0.30170098,  1.01558698,  0.17025033,  0.47067672,
        0.0030102 ,  0.01488631,  0.7140528 , -0.0626555 ,  0.43354645,
        0.09977865,  0.53053089,  0.7064755 ,  1.07673102,  0.06945639,
        0.41036791,  0.26849237,  0.85890185,  0.26071359,  0.64

Lasso achieves an AUC of 0.9508, exceeding both the incorrectly reported calf AUC of 0.9396 and the actual calf AUC of 0.5712.

In [63]:
np.round(roc_auc_score(y_true=y, y_score=y_pred_lasso), 4)

0.9508
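The Lasso AUC above is measured on the same rows the model was fitted on. An out-of-sample estimate can be sketched with scikit-learn's cross_val_predict, which scores each row with a model fitted on the other folds. The choice of 5 folds is an assumption made here for illustration. The dataset is rebuilt with the parameters used above, and no particular result is assumed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Same dataset parameters as the notebook cell above.
X, y = make_classification(
    n_samples=100,
    n_features=10,
    n_informative=5,
    n_redundant=5,
    n_classes=2,
    random_state=1,
)

# Each row is scored by a model that never saw that row's fold during fitting.
y_score_cv = cross_val_predict(Lasso(alpha=0.01), X, y, cv=5)

auc_cv = roc_auc_score(y_true=y, y_score=y_score_cv)
print("5-fold out-of-sample Lasso AUC:", np.round(auc_cv, 4))
```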

The data X and y are printed below for reference.

In [64]:
print(y)

[1 1 0 1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 1 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 0
 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 1 0 1 0 1 0 1
 0 0 0 0 1 1 1 0 1 1 0 1 1 1 0 0 0 1 0 1 0 1 0 0 1 0]


In [65]:
print(X)

[[-8.74350744e-01  1.81604669e+00 -7.62103682e-01  1.22679502e+00
  -2.43879339e+00 -7.48151503e-01 -1.25515129e+00  8.06747465e-01
  -9.35662041e-01  6.24055148e-01]
 [ 3.09671698e+00 -1.80062337e+00  1.02303220e+00 -7.56601960e-01
   2.80558895e+00 -4.05762913e+00  2.60808300e+00 -1.48910941e+00
   1.36293257e+00  1.56063525e+00]
 [ 5.87576256e-01  2.07297434e+00 -1.69892932e+00  2.11492601e+00
   4.20509655e-01 -9.78203247e-01  6.10270887e-01  3.08959501e-01
   5.09206436e-01 -5.21427534e-01]
 [-1.05658617e+00 -3.65510254e-01  1.32900456e+00 -2.64370668e+00
   3.86341951e+00 -1.55900486e+00  6.81978156e+00 -4.16439796e-01
   4.77774991e+00 -2.01200117e+00]
 [-2.66824289e+00  2.95843422e+00  7.26812297e-01 -1.55808123e+00
   2.74465539e+00 -1.40185210e+00  6.86476146e+00  2.34627921e+00
   5.70584115e+00 -1.19880873e+00]
 [-7.40259346e-01  1.55784211e+00 -1.20345458e+00  1.08525356e+00
  -6.08676014e-01 -9.99074595e-01  1.15604800e+00 -2.82631924e-01
   4.61898819e-01 -1.30270746e+00