## Support Vector Machines (SVMs)

Support Vector Machines (SVMs) are supervised machine learning algorithms used for classification and regression. For classification, an SVM finds the hyperplane that separates the classes with the largest possible margin, where the margin is the distance between the hyperplane and the closest data points from each class. Those closest points are called the support vectors, and they alone determine where the hyperplane lies. With a nonlinear kernel such as RBF, the hyperplane is found in a transformed feature space, which corresponds to a curved decision boundary in the original features.
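The margin idea can be made concrete on a tiny, linearly separable dataset. The sketch below is only an illustration: the four points, the names `X_toy`, `y_toy`, and `clf`, and the very large `C` (used to approximate a hard margin) are assumptions, not part of the bankruptcy analysis.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (hypothetical): class 0 sits on x = 0, class 1 sits on x = 2
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])

# A linear kernel with a very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X_toy, y_toy)

# For a linear SVM the hyperplane is w . x + b = 0, and the distance from
# the hyperplane to the closest points (the margin) is 1 / ||w||
w = clf.coef_[0]
b = clf.intercept_[0]
margin = 1.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)
print("distance from hyperplane to closest points:", margin)
```

In this toy setup the separating hyperplane should land near x = 1, halfway between the two classes, so the margin should come out close to 1.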

In [2]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.utils.vis_utils import plot_model



# Load the data from CSV file
df = pd.read_csv("bankruptcy.csv")
# df.head()

# select columns that are not numerical (covers every integer and float width)
non_numeric_cols = df.select_dtypes(exclude='number')

# print the non-numerical columns
if not non_numeric_cols.empty:
    print(f"The non-numerical columns are: {', '.join(non_numeric_cols.columns)}")
else:
    print("All columns are numerical.")

All columns are numerical.


In [5]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report


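# Preprocess the data
X = df.drop("Bankrupt?", axis=1)  # input features
y = df["Bankrupt?"]  # target variable
X = pd.get_dummies(X, drop_first=True)  # no-op here, since every column is already numerical
# Caution: the scaler is fitted on the full dataset before the split below,
# so test-set statistics influence the scaling used during training
X = StandardScaler().fit_transform(X)  # standardize features to zero mean and unit variance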

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train an SVM model with RBF kernel
model = SVC(kernel='rbf')
model.fit(X_train, y_train)

# Evaluate the model on the testing set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.97      1.00      0.98      1318
           1       1.00      0.02      0.04        46

    accuracy                           0.97      1364
   macro avg       0.98      0.51      0.51      1364
weighted avg       0.97      0.97      0.95      1364
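
The report shows that the model almost never predicts class 1. A common way to structure this kind of experiment is to put the scaler and the classifier in one scikit-learn `Pipeline`, so the scaler is fitted on the training split only, and to stratify the split so both sets keep the same class ratio. The sketch below is a hedged illustration, not the analysis above: it uses synthetic data from `make_classification` as a stand-in for `bankruptcy.csv`, and `class_weight="balanced"` is one option that may or may not improve minority-class recall on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the real data (an assumption for illustration)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# stratify keeps the class ratio the same in the training and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# The pipeline fits StandardScaler on the training data only, then reuses
# those statistics to transform the test data at prediction time
pipe = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", class_weight="balanced"),  # reweight classes by inverse frequency
)
pipe.fit(X_tr, y_tr)

y_demo_pred = pipe.predict(X_te)
print(classification_report(y_te, y_demo_pred, zero_division=0))
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the bankruptcy dataset.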



Precision: The precision is the ratio of true positive predictions to the total number of positive predictions, TP / (TP + FP). It measures how many of the predicted positives are correct. A high precision indicates that few of the model's positive predictions are false positives.

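Recall: The recall is the ratio of true positive predictions to the total number of actual positive instances, TP / (TP + FN). It measures how many of the actual positives the model finds. A high recall indicates that the model has a low false negative rate. In the report above, recall for class 1 is 0.02, so the model misses almost all of the 46 class 1 (`Bankrupt?` = 1) cases in the test set.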

F1-score: The F1-score is the harmonic mean of precision and recall. It combines both metrics into a single score that balances precision and recall.

Support: The support is the number of actual occurrences of each class in the test set.

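Accuracy: The accuracy is the ratio of correct predictions to the total number of predictions. In the report above it is 0.97, but 1318 of the 1364 test samples belong to class 0, so a model that always predicts class 0 would also reach about 0.97. On data this imbalanced, accuracy alone is misleading.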
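
As a worked example, the definitions above can be computed directly from confusion-matrix counts. The counts below are hypothetical: they are chosen to be consistent with the rounded report (class 1 support of 46, precision 1.00, recall 0.02), not read from the actual predictions, and `binary_metrics` is a helper name introduced here for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Hypothetical counts consistent with the rounded report (an assumption)
tp, fp, fn, tn = 1, 0, 45, 1318

precision, recall, f1, accuracy = binary_metrics(tp, fp, fn, tn)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2f}")

# Accuracy of a baseline that always predicts class 0
baseline_accuracy = (tn + fp) / (tp + fp + fn + tn)
print(f"always-predict-0 accuracy={baseline_accuracy:.2f}")
```

With these counts the baseline that never predicts class 1 reaches roughly the same accuracy as the model, which is why recall and F1 for class 1 are the more informative numbers here.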

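Macro/weighted averages: The macro average calculates the metric for each class and then takes the unweighted mean, so both classes count equally regardless of size. The weighted average also averages the per-class metrics, but weights each class by its support. The report above shows these two. A micro average, which calculates the metric globally by counting the total true positives, false negatives, and false positives, is not listed separately because for single-label classification it equals the accuracy.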
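
The difference between the averages in the report can be checked by hand. The short sketch below recomputes the macro and weighted recall from the rounded per-class values in the report; the variable names are illustrative, and because the inputs are rounded the results are approximate.

```python
# Per-class recall and support, copied from the rounded report above
recalls = [1.00, 0.02]   # class 0, class 1
supports = [1318, 46]    # class 0, class 1

# Macro average: unweighted mean over classes
macro_recall = sum(recalls) / len(recalls)

# Weighted average: mean over classes, weighted by class support
weighted_recall = sum(r * s for r, s in zip(recalls, supports)) / sum(supports)

print(f"macro recall={macro_recall:.2f} weighted recall={weighted_recall:.2f}")
```

The macro value is pulled down by the poor recall on class 1, while the weighted value stays close to the class 0 recall because class 0 makes up most of the test set.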