# CAS DML Course Project 2: Classification


This project gives you hands-on experience with the classification techniques you learned in the lecture, using two real-world datasets. The simpler dataset is on credit card fraud detection, and the more challenging dataset is on predicting bankruptcy.

This notebook contains not only code but also explanations that help you deepen your understanding. Most important, however, are the exercises. They are designed to help you apply what you have learned and to reflect on the results. All exercises are marked by **EXERCISE**. Some exercises will ask you to reflect on results or experiments, while others will ask you to code something. Coding exercises usually come with a code cell containing placeholders (`...`) for you to fill in.


📑 **A note on python**
We are aware that you might not be very familiar with Python at this stage in your learning journey. You are not expected to understand all the details of the code. Rather, you should be able to understand what the code in a cell achieves and how you can influence its behavior. We have designed all exercises that involve programming such that you should be able to solve them by copying and slightly adapting code that you have seen before in one of the notebooks.

📑 **A note on the datasets**

The credit card fraud dataset is called `card_data.csv` and the bankruptcy dataset is called `bank_data.csv`. Both datasets are stored in the CSV format.

The original datasets can be found here:
* https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud/data
* https://www.kaggle.com/datasets/fedesoriano/company-bankruptcy-prediction/data

The datasets have been cleaned for you and contain only numerical features, such that you can focus on the actual classification task. 


## Dataset exploration and visualization

In this section we will explore the dataset. We will compute some statistics but also visually inspect the data. 


**Dataset info**

We use the *Taiwanese Bankruptcy Prediction* dataset for exploratory data analysis. The data was collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange. The values are taken from the companies' latest financial reports. The dataset has been preprocessed, eliminating any missing values or outliers.

### Data statistics

Let's first get to know the basics of the dataset. We start by importing the necessary modules and then load the dataset downloaded previously.

In [None]:
# Import modules
import pandas as pd  # For data manipulation
import seaborn as sns # For data visualization
import matplotlib.pyplot as plt  # For plotting
import numpy as np  # For numerical operations

bank_data = pd.read_csv("bank_data.csv") 

Usually we print a few samples and info of the dataset to get some intuition about what the dataset is about.

In [None]:
# Display the first 5 rows of the dataset
bank_data.head(n=5)

We can also print basic information about the dataset.

In [None]:
# Display basic information about the dataset. This will give the column names and the corresponding data type.
bank_data.info()

From the output of the previous two cells, we know that the dataset contains 96 columns and 6270 rows. The column "Bankrupt?" is what we want to predict, so each input sample is a 95-dimensional feature vector.

To know more about the dataset numerically, we need some numbers that quantitatively describe the dataset. These numbers are called statistics.

In [None]:
# Display summary statistics of the dataset
bank_data.describe()

We see that the above command gives us the *count*, the *mean*, the *standard deviation*, the *minimum*, the *25th percentile*, the *50th percentile*, the *75th percentile*, and the *maximum* of each column.

The count is the number of non-missing values in each column. The mean is the average over all samples. The standard deviation measures how much the values vary around their mean. The minimum and maximum are the smallest and largest values in the column. The 25th, 50th, and 75th percentiles are the values below which 25%, 50%, and 75% of the observations fall; the 50th percentile is also called the median.
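
To make these definitions concrete, the following minimal sketch reproduces a few entries of `describe()` with dedicated methods. The values in `toy` are made up for illustration and are not taken from the bankruptcy dataset.

```python
import pandas as pd

# Hypothetical toy values, not taken from bank_data
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

summary = toy.describe()

# Each entry of describe() can be reproduced with a dedicated method
print("count: ", toy.count(), "vs.", summary["count"])
print("mean:  ", toy.mean(), "vs.", summary["mean"])
print("std:   ", toy.std(), "vs.", summary["std"])          # sample standard deviation
print("median:", toy.quantile(0.5), "vs.", summary["50%"])  # 50th percentile
```

Note how the single large value (10.0) pulls the mean above the median, which is one reason to look at both.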

We could, of course, easily compute these statistics for each column separately:

In [None]:
# Calculate the mean and the standard deviation of "Tax rate (A)". Compare them with the results from the previous block.
mean = bank_data['Tax rate (A)'].mean()
std = bank_data['Tax rate (A)'].std()

# Print the mean and the standard deviation
print("Average tax rate:", mean)
print("Standard deviation of tax rate:", std)


**EXERCISE**

- Compute the mean and standard deviation of the column `Operating Profit Rate`.

In [None]:

mean = ...
std = ...

# Print the mean and the standard deviation
print("Average:", mean)
print("Standard Deviation:", std)

It might also be interesting to compute correlations between two columns. 

**EXERCISE**

- Calculate the correlation between "Operating Profit Rate" and "Operating profit per person" 
    - *Hint*: to calculate the correlation between columns "A" and "B", use `df["A"].corr(df["B"])`.
- How do you interpret this correlation value? Indicate whether the two features are positively, negatively or not correlated.

In [None]:

corr = ...

# Print the correlation
print("Correlation:", corr)

*Answer to the second question*

### Data visualization

We first visualize the target distribution, i.e., how many companies are bankrupt with respect to all the companies.

In [None]:
plt.pie(bank_data.value_counts("Bankrupt?"), labels = ["Not bankrupt", "Bankrupt"], autopct='%1.1f%%')
plt.show()

The pie plot shows that only a small fraction of companies are bankrupt. This is important information, as this means that the classifier has to learn to identify bankruptcy cases from a small number of samples. We will address this problem later.


Let's also visualize some features here, in order to get an impression of the distribution of values for each feature. We first plot a histogram for some features, then select the ones that seem interesting for more detailed plots.

In [None]:
# Plot histograms for the first 30 features

fig = plt.figure(figsize=(20, 20))
rows, cols = 10, 3
for idx in range(rows*cols):
    ax = fig.add_subplot(rows, cols, idx+1)
    ax.grid(alpha = 0.7, axis ="both")
    sns.histplot(x = bank_data.drop(columns=["Bankrupt?"]).columns[idx], fill = True, color ="#3386FF", data = bank_data, bins=30)
fig.tight_layout()
plt.show()

From the histograms we can see that most features are highly concentrated (e.g., Revenue Per Share), while some have multiple peaks (e.g., Total Asset Growth Rate).



**EXERCISE**

- Which features are more useful in a classification task? The highly concentrated ones with small variance, or the ones that spread out over a wide range of values or even have multiple peaks?

*Answer: Your answer*


We can also visualize the correlation between all pairs of features by plotting the correlation matrix. It can be a bit overwhelming, but don't worry. We just use it to get an overview of the data and to see whether there are any obvious anomalies.

In [None]:
f, ax = plt.subplots(figsize=(30, 25))
correlation = bank_data.corr()
mask = np.triu(np.ones_like(correlation, dtype=bool))
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(correlation, mask=mask, cmap=cmap, vmax=1, center=0,
      square=True, linewidths=.5, cbar_kws={"shrink": .5})
plt.show()

We see that "Net Income Flag" has invalid correlation values with every feature. It is probably because it has zero standard deviation, which means it is constant. Let's check:

**EXERCISE**

- Check if "Net Income Flag" has zero standard deviation.

In [None]:
std_net_income_flag = ... 

print(std_net_income_flag)

Let's remove this feature.

In [None]:
# Remove "Net Income Flag" feature
bank_data = bank_data.drop(columns="Net Income Flag")

**EXERCISE**

- Look at the heatmap again. What do the dark blue and dark red colors mean?
    - Look at some of the features that have dark blue and dark red colors. Can you understand why they have such colors?

*Answer: Your answer*

Finally we look at how the features are correlated with the target variable. We can do this by plotting the correlation of each feature with the target variable.

In [None]:

# Plot the correlation between features and target. We use absolute values here.
corr_to_bankrupt = bank_data.drop(columns="Bankrupt?").corrwith(bank_data["Bankrupt?"]).abs().sort_values()
plt.figure(figsize=(20,4))
sns.barplot(x=corr_to_bankrupt.index, y=corr_to_bankrupt.values)  # feature names on the x-axis, |correlation| on the y-axis
plt.xticks(rotation=90)
plt.show()

**EXERCISE**
- Are the results as you expected?

*Answer: Your answer*

## A second dataset

For illustration purposes, we will also use another, much simpler dataset first. This is also your chance to explore a dataset by yourself. The dataset used here is the [credit card fraud dataset](https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud/data).


**EXERCISE**

1. Load the dataset. The dataset is named `card_data.csv`.
2. Print some basic information about the dataset and statistics about the different features. 
3. Visualize the class distribution pie plot, the histogram for all features, and any other useful plots you want to visualize.
4. Comment on the label distribution and feature distribution. Is the dataset unbalanced? Do you see any anomalies in the features?

*Add new cells to do your experiments*

## Training a classifier

Next, we will use the dataset to train a classifier. You might remember from the lecture that we need to split the dataset into a training and a test set. We will use the training set to train the classifier and the test set to evaluate its performance. We do not create an explicit validation set here, but use cross-validation to tune the hyperparameters of the classifier.

Let's import the necessary functions. 

In [None]:
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV

Before splitting the dataset, we remove the target variable from the dataset and store it in a separate variable.

In [None]:
# Define the prediction target
target_name = "Bankrupt?"
X = np.array(bank_data.drop(columns = target_name))
y = np.array(bank_data[target_name])

# Randomly shuffle the dataset, and use 20% of the data as test set
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size = 0.2, random_state = 2024, shuffle = True)

We also set up the cross-validation strategy now. We will use the variable `kf` later whenever we need to indicate how the data should be split during cross-validation.

In [None]:
# Use K-fold cross validation
kf = StratifiedKFold(n_splits=5)

"Stratified" means that we divide the training data into K folds such that each fold contains the same proportion of bankrupt cases as the original dataset. Otherwise, there may be folds without positive (bankrupt) cases.

**EXERCISE**

- Do the same for the credit card fraud dataset.
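
The difference to a plain K-fold split can be seen on a small example. The sketch below uses made-up labels (20 samples, of which the last 4 are positive) and counts the positive cases that end up in each validation fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 16 negatives followed by 4 positives (20% positives)
y_toy = np.array([0] * 16 + [1] * 4)
X_toy = np.arange(20).reshape(-1, 1)  # dummy feature, only its length matters here

plain = KFold(n_splits=4)
stratified = StratifiedKFold(n_splits=4)

# Count the positive samples in the validation part of each fold
plain_pos = [int(y_toy[val].sum()) for _, val in plain.split(X_toy)]
strat_pos = [int(y_toy[val].sum()) for _, val in stratified.split(X_toy, y_toy)]

print("Positives per fold, KFold:          ", plain_pos)
print("Positives per fold, StratifiedKFold:", strat_pos)
```

Without stratification (and without shuffling), all positives land in the last fold, so three of the four folds cannot be used to measure recall.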

In [None]:

target_name = ...
X_card = ...
y_card = ...
X_train_card, X_test_card, y_train_card, y_test_card = ...
kf_card = ...


## Experiments with simple classifiers

In our first experiments, we will use the simpler credit card fraud dataset. We will later show how the different classifiers perform on the bankruptcy dataset.

In [None]:
# let's import the classifiers
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, balanced_accuracy_score

### KNN

We start with the simplest classifier, the k-nearest neighbors classifier. The k-nearest neighbors classifier assigns a sample to the majority class of its k nearest neighbors.
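
The majority vote can be inspected directly. The following sketch uses a made-up one-dimensional dataset; the names `X_toy`, `y_toy`, `knn_toy` and `query` are chosen for illustration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D toy data: class 0 lies near 0, class 1 lies near 10
X_toy = np.array([[0.0], [1.0], [2.0], [9.0], [10.0], [11.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

knn_toy = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)

# Look up the 3 nearest training samples of a new point
query = np.array([[3.0]])
distances, neighbor_idx = knn_toy.kneighbors(query)

print("Nearest neighbors:", X_toy[neighbor_idx[0]].ravel())
print("Their labels:     ", y_toy[neighbor_idx[0]])
print("Predicted class:  ", knn_toy.predict(query)[0])  # majority of the labels above
```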

We have already done dataset splitting. The next steps are:
1. Train the classifier on the training set
2. Calculate the score on the validation set
3. Change the classifier hyperparameters
4. Repeat 1-3 until you find the best hyperparameters
5. Retrain the classifier using the best hyperparameters on the union of training and validation set.


*Note: The following code is slightly complicated and may exceed what you have already learned about Python. Don't worry, you don't need to understand every detail. The important thing is to understand the general idea of the code, which you should grasp from the comments and by identifying the parts you are already familiar with.*

In [None]:

# We use KNN as an example to showcase the general machine learning pipeline
# Step 1-3: Train -> validation -> change hyperparameters

results = [] # A list, which stores the results

for n in range(1, 6): # loop 5 times to try n_neighbors from n=1 to 5

  knn = KNeighborsClassifier(n_neighbors = n) # create a classifier with n neighbors
  
  scores = []  # A list that stores the validation score of each fold
  
  # loop over the 5 folds
  for train_index, test_index in kf_card.split(X_train_card, y_train_card):
    
    # Get K-1 folds for training, 1 fold for validation
    X_train_card_fold, X_test_card_fold = X_train_card[train_index], X_train_card[test_index]
    y_train_card_fold, y_test_card_fold = y_train_card[train_index], y_train_card[test_index]
    
    # Train the model
    knn.fit(X_train_card_fold, y_train_card_fold)
    
    # Validate
    y_pred_knn_fold = knn.predict(X_test_card_fold)
    scores.append(recall_score(y_test_card_fold, y_pred_knn_fold))
  
  # Get the average validation score
  results.append(np.mean(scores))

print("Best validation score (recall):", max(results))
print("Best n_neighbors:", results.index(max(results)) + 1)

plt.figure()
plt.plot(range(1, 6), results)
plt.xlabel('n_neighbors')
plt.ylabel('Cross-validation score')
plt.title('Cross-validation score vs. n_neighbors')
plt.show()

**EXERCISE**

- Why did we choose the recall as a metric? What would be other possible choices, suitable for this application?
- Use the best parameter, and retrain a KNN classifier. 
- How is the performance of the KNN on the test set? Is it better than the performance on the validation set? Remember that we use recall as the metric.

In [None]:
knn = ...  # Fit a KNN classifier

y_pred_knn = ... # prediction on the test set

# Calculate accuracy, precision, recall, f1-score. Print them out.
print("Balanced accuracy", balanced_accuracy_score(y_test_card, y_pred_knn))
print("Accuracy", accuracy_score(y_test_card, y_pred_knn))
print("Precision", precision_score(y_test_card, y_pred_knn))
print("Recall", recall_score(y_test_card, y_pred_knn))
print("F1-score", f1_score(y_test_card, y_pred_knn))

*Answer: Space for your answer*

### Logistic Regression

Let's try another simple classifier, logistic regression. Logistic regression is a linear classifier that uses the logistic function to predict the probability of a sample belonging to a class.

The `GridSearchCV()` function abstracts the pipeline we have used in the KNN case, including looping through all hyperparameters, finding the best ones and retraining the model on the whole training set. In the following, we will use this function instead of writing the parameter searching process explicitly.

The parameter that we are optimizing is called `C` in scikit-learn and is the inverse of the regularization parameter $\lambda$ that was introduced for regression. A smaller `C` therefore means stronger regularization, i.e., a stronger penalty on large coefficients in the model.
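
The effect of `C` on the coefficients can be checked on synthetic data. The sketch below is not part of the course pipeline: it uses `make_classification` to generate an artificial dataset and compares the size of the learned coefficient vector for three arbitrarily chosen values of `C`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, unrelated to the course datasets
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=0)

coef_norms = {}
for C in [0.01, 1, 100]:
    model = LogisticRegression(C=C, max_iter=10000).fit(X_syn, y_syn)
    # Euclidean length of the coefficient vector
    coef_norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C}: coefficient norm = {coef_norms[C]:.3f}")
```

The coefficient norm grows with `C`, because a larger `C` means a weaker penalty.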

In [None]:
# Train logistic classifier
lr = LogisticRegression(max_iter=10000)
params = {
    "C": [0.01, 0.1, 1, 10],
}
grid_search_lr = GridSearchCV(lr, params, cv=kf_card, scoring="recall", refit=True)
grid_search_lr.fit(X_train_card, y_train_card)
print("Best parameters:", grid_search_lr.best_params_)

# Validation score
print("Validation score(recall):", grid_search_lr.best_score_)


We evaluate the model on the test set.

In [None]:
# Predict with logistic classifier. Note that we can directly use the best model from grid search
y_pred_lr = grid_search_lr.predict(X_test_card)

# Calculate accuracy, precision, recall, f1-score
print("Balanced accuracy", balanced_accuracy_score(y_test_card, y_pred_lr))
print("Accuracy", accuracy_score(y_test_card, y_pred_lr))
print("Precision", precision_score(y_test_card, y_pred_lr))
print("Recall", recall_score(y_test_card, y_pred_lr))
print("F1-score", f1_score(y_test_card, y_pred_lr))

As logistic regression is a probabilistic classifier, we can also plot the ROC curve and the precision recall curve. 

In [None]:
from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay

y_score_lr_card = grid_search_lr.predict_proba(X_test_card)[:, 1]

RocCurveDisplay.from_predictions(y_test_card, y_score_lr_card)


In [None]:
PrecisionRecallDisplay.from_predictions(y_test_card, y_score_lr_card)
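
Both curves are obtained by sweeping the decision threshold that turns predicted probabilities into class labels. The following minimal sketch shows this on made-up probabilities (`y_true_toy` and `y_prob_toy` are hypothetical and not model outputs): raising the threshold trades recall for precision.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and predicted probabilities for the positive class
y_true_toy = np.array([0, 0, 0, 0, 1, 0, 1, 1])
y_prob_toy = np.array([0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9])

threshold_metrics = {}
for threshold in [0.3, 0.5, 0.8]:
    # A sample is predicted positive if its probability reaches the threshold
    y_hat = (y_prob_toy >= threshold).astype(int)
    precision = precision_score(y_true_toy, y_hat)
    recall = recall_score(y_true_toy, y_hat)
    threshold_metrics[threshold] = (precision, recall)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Each threshold corresponds to one point on the precision-recall curve; the ROC curve is built in the same way from the true and false positive rates.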


**EXERCISE**

- How do you interpret the ROC curve and the precision recall curve? Are the results good?

*Answer: Space for your answer*

Let's investigate which parameters are the most important for the logistic regression model. We can do this by looking at the coefficients of the model. Note, however, that you need to scale the features before interpreting the coefficients.

**EXERCISE**

- Scale the features using the `StandardScaler` and retrain the logistic regression model.
  - *Hint:* Check out the lecture notebook `classification.ipynb` to see how the scaler is used.
- Print the coefficients of the model. Which features are the most important?

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = ...

X_train_scaled = ...
X_test_scaled = ...

*Answer: Space for your answer*

### Decision tree

Next, we train a decision tree. Its most important hyperparameter is the maximum depth. Usually, the deeper the tree, the more expressive it is, but deep trees can overfit severely. We will see that later.
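
As a first impression of this effect, the sketch below compares a shallow and an unrestricted tree on synthetic, noisy data. The data comes from `make_classification` and is unrelated to the course datasets, and all variable names are chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y assigns a fraction of the labels at random)
X_syn, y_syn = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

train_acc = {}
for depth in [2, None]:  # None means the tree grows until all leaves are pure
    tree_syn = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    train_acc[depth] = tree_syn.score(X_tr, y_tr)
    print(f"max_depth={depth}: train accuracy = {train_acc[depth]:.2f}, "
          f"test accuracy = {tree_syn.score(X_te, y_te):.2f}")
```

The unrestricted tree fits the training data perfectly, including its noisy labels. A large gap between training and test performance is the typical sign of overfitting.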

In [None]:
# [Read]
# Train decision tree
decision_tree = DecisionTreeClassifier()
params = {
    "max_depth": [None, 1, 2, 3, 4, 5],
}

grid_search_dt = GridSearchCV(decision_tree, params, cv=kf_card, scoring="recall", refit=True)
grid_search_dt.fit(X_train_card, y_train_card)
print("Best parameters:", grid_search_dt.best_params_)

# Validation score
print("Validation score(recall):", grid_search_dt.best_score_)

In [None]:
# Predict with decision tree
y_pred_dt = grid_search_dt.predict(X_test_card)

# Calculate accuracy, precision, recall, f1-score
print("Balanced accuracy", balanced_accuracy_score(y_test_card, y_pred_dt))
print("Accuracy", accuracy_score(y_test_card, y_pred_dt))
print("Precision", precision_score(y_test_card, y_pred_dt))
print("Recall", recall_score(y_test_card, y_pred_dt))
print("F1-score", f1_score(y_test_card, y_pred_dt))

The decision tree classifies almost every sample correctly. To get a more intuitive understanding of the decision tree, we can visualize a tree with a small depth.

In [None]:
# [Read]
# Visualize a decision tree that is shallower
# Click on the image to zoom in, or download it for a clearer view.
decision_tree = DecisionTreeClassifier(max_depth=2)
decision_tree.fit(X_train_card, y_train_card)
plt.figure(figsize=(50,50))
class_names = ["not fraud", "fraud"]
# All columns except the target are features (target_name was set to the card dataset's target column above)
feature_names = [col for col in card_data.columns if col != target_name]
plot_tree(decision_tree, class_names=class_names, feature_names=feature_names, filled=True, impurity=False)
plt.show()

**EXERCISE**

- Visualize a decision tree with depth 3. What can you learn from the tree?
- Can you compare it with the logistic regression model? Which one is more interpretable?

*Answer: Your answer*

The above examples show that logistic regression, KNN and decision trees already perform quite well. This is because the dataset is fairly easy. Let's see how they perform on the more difficult dataset.


### Classification on the bankruptcy dataset

Now that we have found a good model to classify credit card fraud, let's switch to the bankruptcy dataset, which is a bit more challenging.

**EXERCISE**

- Train a KNN, a logistic regression and a decision tree classifier on the bankruptcy dataset using the same pipeline as above. Report the accuracy, balanced accuracy, precision, recall and f1 score on the training and test set. 

In [None]:
# Train classifiers
lr_bankruptcy = ...
lr_params_bankruptcy = ...

grid_search_lr_bankruptcy = ...

knn_bankruptcy = ...
knn_params_bankruptcy = ...
grid_search_knn_bankruptcy = ...

decision_tree_bankruptcy = ...
dt_params_bankruptcy = ...

grid_search_dt_bankruptcy = ...


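# Evaluate the classifiers
# Store each model's test-set predictions in the dictionary y_preds under the keys
# "logistic_regression", "knn" and "decision_tree"; the summary table at the end of the notebook uses them.
y_preds = {}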

**EXERCISE**
- Which methods overfit? Why do you think so?
- Why do you think accuracy is so high, even though recall and precision are low?


*Answer: Space for your answers*

### Imbalanced dataset

Remember that the dataset is quite imbalanced, as bankruptcy cases are rare. To solve this problem, we can give different weights to the bankruptcy cases and non-bankruptcy cases so that the classifier can pay more attention to the rare cases. The following code shows how to do this in scikit-learn.

In [None]:
# [Read]
from sklearn.utils.class_weight import compute_sample_weight

# Compute the weights of the samples based on the balance in the dataset
weight = compute_sample_weight(class_weight="balanced", y=y_train)

# Train classifiers
dt = DecisionTreeClassifier()
dt_params = {
    "max_depth": [None, 1, 2, 3, 4, 5],
}
grid_search_dt = GridSearchCV(dt, dt_params, cv=kf, scoring="recall", refit=True)

# reweight during training
grid_search_dt.fit(X_train, y_train, sample_weight=weight) 

# predict on the test set and calculate the precision, recall, f1-score
y_pred_dt = grid_search_dt.predict(X_test)

print("Balanced accuracy", balanced_accuracy_score(y_test, y_pred_dt))
print("Accuracy", accuracy_score(y_test, y_pred_dt))
print("Precision", precision_score(y_test, y_pred_dt))
print("Recall", recall_score(y_test, y_pred_dt))
print("F1-score", f1_score(y_test, y_pred_dt))
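
To see what the `"balanced"` weights used above look like, consider the following sketch on made-up labels (8 negatives and 2 positives). Each sample receives the weight `n_samples / (n_classes * class_count)`, so that both classes contribute the same total weight.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical toy labels: 8 negatives and 2 positives
y_toy = np.array([0] * 8 + [1] * 2)

w = compute_sample_weight(class_weight="balanced", y=y_toy)

print("Weight of a negative sample:", w[0])    # 10 / (2 * 8)
print("Weight of a positive sample:", w[-1])   # 10 / (2 * 2)
print("Total weight per class:", w[y_toy == 0].sum(), w[y_toy == 1].sum())
```

A misclassified positive sample therefore costs four times as much as a misclassified negative one in this toy example.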


**EXERCISE**

- How do the metrics change? Can you explain this?
- Does the overfitting problem improve? 

*Answer: Space for your answers*

## Experiment with ensemble methods

In this section we will explore two ensemble methods: random forest and XGBoost. They combine many decision trees and are usually more powerful than the three simple classifiers.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

### Random forest

Random Forest uses an ensemble technique known as bagging **(Bootstrap Aggregating)** which helps to improve stability and accuracy. To produce a prediction that is more reliable and accurate, it constructs several decision trees and combines them. Every tree in the ensemble is constructed using a bootstrap sample, which is a sample taken from the training set with replacement. Furthermore, Random Forest only takes into account a random subset of features for splitting at each node while constructing separate trees, increasing tree diversity and producing a more resilient model that is less prone to overfitting.
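
As a small illustration of the bootstrap step described above, the following sketch draws one bootstrap sample from ten sample indices with NumPy. The seed and the sample size are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

n_samples = 10
indices = np.arange(n_samples)

# Bootstrap sample: same size as the original data, drawn with replacement
bootstrap = rng.choice(indices, size=n_samples, replace=True)

# Samples that were never drawn are not seen by the tree trained on this sample
not_drawn = np.setdiff1d(indices, bootstrap)

print("Bootstrap sample:", np.sort(bootstrap))
print("Not drawn:       ", not_drawn)
```

Because of the replacement, some indices typically appear several times while others are missing, so every tree in the forest sees a slightly different training set.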

The following code illustrates how to train a random forest classifier. 
The parameter `n_estimators` is the number of trees in the forest. The parameter `max_depth` is the maximum depth of the tree.

In [None]:

# Train random forest
weight = compute_sample_weight(class_weight="balanced", y=y_train)
rf = RandomForestClassifier(bootstrap=True) # Bootstrapping specified here
rf_params = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 1, 2, 3, 4, 5],
}
grid_search_rf = GridSearchCV(rf, rf_params, cv=kf, scoring="recall", refit=True)
grid_search_rf.fit(X_train, y_train, sample_weight=weight)
print("Best parameters:", grid_search_rf.best_params_)

# Validation score  
print("Validation score(recall):", grid_search_rf.best_score_)
y_pred_rf = grid_search_rf.predict(X_test)
y_pred_proba_rf = grid_search_rf.predict_proba(X_test)[:, 1]

# Calculate accuracy, precision, recall, f1-score
print("Balanced accuracy", balanced_accuracy_score(y_test, y_pred_rf))
print("Accuracy", accuracy_score(y_test, y_pred_rf))
print("Precision", precision_score(y_test, y_pred_rf))
print("Recall", recall_score(y_test, y_pred_rf))
print("F1-score", f1_score(y_test, y_pred_rf))

# add prediction to results dictionary
y_preds["random_forest"] = y_pred_rf

In [None]:
RocCurveDisplay.from_predictions(y_test, y_pred_proba_rf)

In [None]:
PrecisionRecallDisplay.from_predictions(y_test, y_pred_proba_rf)

### XGBoost

XGBoost builds one tree at a time, with each new tree correcting the errors made by the previously trained trees. This contrasts with Random Forest, where each tree is built independently and the results are aggregated at the end. Trees are added until the configured number of trees (`n_estimators`) is reached. When adding a new tree, the model uses a gradient descent procedure to minimize the loss. This sequential addition of weak learners (trees) ensures that the shortcomings of previous trees are corrected. XGBoost is an implementation of this additive model, which is known as gradient boosting.
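
The idea of fitting each new tree to the errors of the current ensemble can be sketched in a few lines. Note that this is plain gradient boosting with a squared loss on made-up regression data, not XGBoost's actual implementation, which adds regularization and further refinements. The learning rate, tree depth and number of trees are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up regression data: a noisy sine curve
rng = np.random.default_rng(0)
X_toy = np.linspace(0, 6, 100).reshape(-1, 1)
y_toy = np.sin(X_toy).ravel() + rng.normal(scale=0.1, size=100)

learning_rate = 0.3
prediction = np.full_like(y_toy, y_toy.mean())  # start from a constant model
errors = [np.mean((y_toy - prediction) ** 2)]

for _ in range(20):
    # Residuals = negative gradient of the squared loss (up to a constant factor)
    residuals = y_toy - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X_toy, residuals)
    # Add a damped version of the new tree to the ensemble
    prediction += learning_rate * tree.predict(X_toy)
    errors.append(np.mean((y_toy - prediction) ** 2))

print("Training MSE before boosting:", round(errors[0], 4))
print("Training MSE after 20 trees: ", round(errors[-1], 4))
```

Each shallow tree is a weak learner on its own, but the training error shrinks with every tree that is added.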

In [None]:
from xgboost import XGBClassifier

weight = compute_sample_weight(class_weight="balanced", y=y_train)
xgb = XGBClassifier()
xgb_params = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 1, 2, 3, 4, 5],
}
grid_search_xgb = GridSearchCV(xgb, xgb_params, cv=kf, scoring="recall", refit=True)
grid_search_xgb.fit(X_train, y_train, sample_weight=weight)

print("Validation score(recall): ", grid_search_xgb.best_score_)

# Predict with XGBoost
y_pred_xgb = grid_search_xgb.predict(X_test)
y_score_xgb = grid_search_xgb.predict_proba(X_test)[:, 1]

# Calculate accuracy, precision, recall, f1-score
print("Balanced accuracy", balanced_accuracy_score(y_test, y_pred_xgb))
print("Accuracy", accuracy_score(y_test, y_pred_xgb))
print("Precision", precision_score(y_test, y_pred_xgb))
print("Recall", recall_score(y_test, y_pred_xgb))
print("F1-score", f1_score(y_test, y_pred_xgb))

# add prediction to results dictionary
y_preds["xgboost"] = y_pred_xgb

In [None]:
RocCurveDisplay.from_predictions(y_test, y_score_xgb)

In [None]:
PrecisionRecallDisplay.from_predictions(y_test, y_score_xgb)

The following code gives an overview of all the results of the different classifiers. 

In [None]:
results = {
    "logistic_regression": [],
    "knn": [],
    "decision_tree": [],
    "random_forest": [],
    "xgboost": []
}

for classifier in results.keys():
  results[classifier].append(accuracy_score(y_test, y_preds[classifier]))
  results[classifier].append(balanced_accuracy_score(y_test, y_preds[classifier]))
  results[classifier].append(precision_score(y_test, y_preds[classifier]))
  results[classifier].append(recall_score(y_test, y_preds[classifier]))
  results[classifier].append(f1_score(y_test, y_preds[classifier]))


# Only the five metrics computed above are available, so the table has five columns
results = pd.DataFrame.from_dict(results, orient='index', columns=["accuracy", "balanced_accuracy", "precision", "recall", "f1-score"])
print("---------------Test results for all methods---------------")
results


**EXERCISE**

- Look at the final results. Which one is the best? Why do you think so?
- If the goal is to identify as many bankruptcy cases as possible, which classifier should you use?

*Answer: Space for your answers*
