Appendix, Michał Szumielewicz

In [40]:
import pandas as pd
df = pd.read_csv('churn.csv')

In [44]:
X = df.drop(['TARGET', df.columns[0]], axis=1)
y = df['TARGET']

In [69]:
X

,total_day_minutes,total_day_charge,total_eve_minutes,total_eve_charge,total_night_minutes,total_night_charge,total_intl_minutes,total_intl_charge
0,265.1,45.07,197.4,16.78,244.7,11.01,10.0,2.70
1,161.6,27.47,195.5,16.62,254.4,11.45,13.7,3.70
2,243.4,41.38,121.2,10.30,162.6,7.32,12.2,3.29
3,299.4,50.90,61.9,5.26,196.9,8.86,6.6,1.78
4,166.7,28.34,148.3,12.61,186.9,8.41,10.1,2.73
...,...,...,...,...,...,...,...,...
4995,235.7,40.07,223.0,18.96,297.5,13.39,9.9,2.67
4996,184.2,31.31,256.8,21.83,213.6,9.61,14.7,3.97
4997,140.6,23.90,172.8,14.69,212.4,9.56,13.6,3.67
4998,188.8,32.10,171.7,14.59,224.4,10.10,8.5,2.30


The "churn" dataset has 5000 entries with 8 features and a binary target class.

In [70]:
y.value_counts()

0    4293
1     707
Name: TARGET, dtype: int64

The target classes are strongly imbalanced: class 0 has 4293 entries and class 1 only 707, an imbalance ratio of about 6.
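The ratio can be checked directly from the printed counts. This is a minimal sketch; the counts are copied by hand from the `value_counts()` output rather than read from `churn.csv`, and the variable names are only illustrative.

```python
# Class counts copied from the y.value_counts() output above.
counts = {0: 4293, 1: 707}

majority = max(counts.values())
minority = min(counts.values())

# Imbalance ratio: majority-class size divided by minority-class size.
imbalance_ratio = majority / minority
# Share of the whole dataset that belongs to the minority class.
minority_share = minority / sum(counts.values())

print(f"imbalance ratio: {imbalance_ratio:.2f}")
print(f"minority share:  {minority_share:.1%}")
```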

In [47]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

I use scikit-learn (sklearn) to make an 80/20 train/test split and to provide most of the models; TabPFN comes from its own package.

In [48]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

In [63]:
!pip install tabpfn
from tabpfn import TabPFNClassifier

Collecting tabpfn
  Downloading tabpfn-0.1.9-py3-none-any.whl (156 kB)
Installing collected packages: tabpfn
Successfully installed tabpfn-0.1.9


In [67]:
from sklearn.metrics import accuracy_score, classification_report

models = [LogisticRegression(), RandomForestClassifier(), GradientBoostingClassifier()]

In [68]:
# Fit each scikit-learn model and report its metrics on the test set.
for model in models:
  model.fit(X_train, y_train)
  prediction = model.predict(X_test)
  print(type(model).__name__)
  print("acc: ", accuracy_score(y_test, prediction))
  print(classification_report(y_test, prediction))

# TabPFN is evaluated separately because its fit() call takes an extra argument.
model = TabPFNClassifier()
model.fit(X_train, y_train, overwrite_warning=True) # a warning appears otherwise
prediction = model.predict(X_test)
print(type(model).__name__)
print("acc: ", accuracy_score(y_test, prediction))
print(classification_report(y_test, prediction))

LogisticRegression
acc:  0.865
              precision    recall  f1-score   support

           0       0.86      1.00      0.93       861
           1       1.00      0.03      0.06       139

    accuracy                           0.86      1000
   macro avg       0.93      0.51      0.49      1000
weighted avg       0.88      0.86      0.81      1000

RandomForestClassifier
acc:  0.89
              precision    recall  f1-score   support

           0       0.90      0.98      0.94       861
           1       0.71      0.35      0.47       139

    accuracy                           0.89      1000
   macro avg       0.81      0.66      0.70      1000
weighted avg       0.88      0.89      0.87      1000

GradientBoostingClassifier
acc:  0.891
              precision    recall  f1-score   support

           0       0.90      0.99      0.94       861
           1       0.80      0.29      0.42       139

    accuracy                           0.89      1000
   macro avg       0.85 

The models reach seemingly similar accuracies (0.865 to 0.891 in the output above), but this is mostly an effect of the class imbalance: 861 of the 1000 test samples belong to class 0.
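To see how much of each accuracy the imbalance alone explains, the reported values can be compared with a classifier that always predicts the majority class. This is a small sketch using only numbers copied from the reports above; no model is refitted.

```python
# Test-set class supports copied from the classification reports above.
support = {0: 861, 1: 139}

# A classifier that always predicts the majority class gets every
# majority sample right and every minority sample wrong.
baseline_accuracy = max(support.values()) / sum(support.values())

# Accuracies printed in the output above.
reported = {
    "LogisticRegression": 0.865,
    "RandomForestClassifier": 0.89,
    "GradientBoostingClassifier": 0.891,
}

print(f"majority-class baseline: {baseline_accuracy:.3f}")
for name, acc in reported.items():
    print(f"{name}: {acc - baseline_accuracy:+.3f} over baseline")
```

The gaps over the baseline are a few percentage points at most, which is why accuracy alone separates the models so poorly here.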

The logistic regression model labels almost everything as the more common class: its recall is 1.00 on class 0 and only 0.03 on class 1.
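The logistic regression report is consistent with only a handful of class 1 samples being detected. The sketch below reconstructs the confusion counts from the printed numbers. It assumes that the printed precision of 1.00 on class 1 means there are no false positives; the counts are inferred from the rounded report, not read from the model.

```python
# Values copied from the LogisticRegression report above.
n_neg, n_pos = 861, 139   # support of class 0 and class 1
accuracy = 0.865          # printed accuracy on the 1000 test samples

# Assumption: precision 1.00 on class 1 means no false positives,
# so every class 0 sample is classified correctly.
false_pos = 0
true_neg = n_neg - false_pos

# accuracy * N = TP + TN, which determines the true positives.
true_pos = round(accuracy * (n_neg + n_pos)) - true_neg
false_neg = n_pos - true_pos

recall = true_pos / n_pos
precision = true_pos / (true_pos + false_pos)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={true_pos} FN={false_neg} FP={false_pos} TN={true_neg}")
print(f"recall={recall:.2f} f1={f1:.2f}")
```

Under that assumption the recall and F1 score round to the 0.03 and 0.06 shown in the report.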

It is also the least accurate of the models checked.

The other three models perform similarly.

They all do noticeably better on the second class, although its recall is still only about 30% (0.35 for the random forest and 0.29 for gradient boosting).
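A metric that weights both classes equally shows the difference between the models more clearly than accuracy does. The sketch below computes the macro-averaged recall from the per-class recalls printed above. Because the inputs are already rounded to two decimals, the results can differ slightly from the "macro avg" rows of the reports.

```python
# Per-class recall values copied (already rounded) from the reports above.
recall = {
    "LogisticRegression": {0: 1.00, 1: 0.03},
    "RandomForestClassifier": {0: 0.98, 1: 0.35},
    "GradientBoostingClassifier": {0: 0.99, 1: 0.29},
}

# Macro-averaged recall: the unweighted mean of the per-class recalls,
# so the minority class counts as much as the majority class.
macro_recall = {
    name: sum(per_class.values()) / len(per_class)
    for name, per_class in recall.items()
}

for name, value in macro_recall.items():
    print(f"{name}: macro recall ~ {value:.3f}")
```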

TabPFN slightly outperforms the other models, but it is much slower.
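The speed difference was not measured in the cells above. One way to quantify it would be a small timing wrapper like the sketch below. The helper name `timed` is hypothetical, and a stand-in workload is used so that the snippet runs without the dataset or the models.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in the notebook this would be something like
# timed(model.fit, X_train, y_train) or timed(model.predict, X_test).
total, seconds = timed(sum, range(1_000_000))
print(f"sum={total}, took {seconds:.4f} s")
```

Timing `fit` and `predict` separately would show where the extra time is spent for each model.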

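The gradient boosting model and TabPFN perform slightly better on the rare class. For gradient boosting, the gain over the random forest is in precision (0.80 vs. 0.71); its recall on that class is lower (0.29 vs. 0.35).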