# Support Vector Machines
Here, support vector machines are built to predict loan defaults.

The first SVM is untuned. For the charged-off class (label 1), it has a precision of 0.99 and a recall of 0.58. Its accuracy on the training and testing sets is 0.945 and 0.935, respectively.

The second SVM is tuned, with a C value of 1000 and a gamma value of 0.001. For the charged-off class (label 1), the optimized SVM has a precision of 0.96 and a recall of 0.81. Its accuracy is 0.977 on the training set and 0.966 on the testing set.

Compared with the logistic regression and random forest models, the SVM performed just as well as the logistic regression model, but not as well as the random forest. The next step would be to build an ensemble model using these three optimized models.
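
One possible shape for that ensemble step is a hard-voting classifier over the three model types. This is only a sketch under stated assumptions: it runs on synthetic data from `make_classification` rather than the loan data, the logistic regression and random forest settings are placeholders (their tuned values are not shown in this notebook), and the names `X_demo`, `y_demo`, and `ensemble` are illustrative. Only the SVM values `C=1000` and `gamma=0.001` come from the search result reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the loan data (an assumption, not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.85, 0.15], random_state=32
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=32
)

# Hard voting: each model casts one vote and the majority class wins.
ensemble = VotingClassifier(
    estimators=[
        # Placeholder settings; the tuned values live in the other notebooks.
        ('logreg', make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=32)),
        # C and gamma taken from the randomized search in this notebook.
        ('svm', make_pipeline(StandardScaler(), SVC(C=1000, gamma=0.001))),
    ],
    voting='hard',
)
ensemble.fit(X_tr, y_tr)
ensemble_test_score = ensemble.score(X_te, y_te)
print('Ensemble test accuracy on synthetic data:', ensemble_test_score)
```

Hard voting is used here because `SVC` does not expose class probabilities unless it is built with `probability=True`; soft voting would need that flag.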

In [1]:
import pandas as pd 
import numpy as np

In [2]:
df = pd.read_csv('Loan_data_ML.csv', index_col='member_id')

In [3]:
df.head()

member_id,loan_amnt,funded_amnt,funded_amnt_inv,int_rate,installment,annual_inc,dti,delinq_2yrs,open_acc,pub_rec,...,pub_rec_bankruptcies_10+ years,pub_rec_bankruptcies_2 years,pub_rec_bankruptcies_3 years,pub_rec_bankruptcies_4 years,pub_rec_bankruptcies_5 years,pub_rec_bankruptcies_6 years,pub_rec_bankruptcies_7 years,pub_rec_bankruptcies_8 years,pub_rec_bankruptcies_9 years,pub_rec_bankruptcies_< 1 year
1,5000,5000,4975.0,0.1065,162.87,24000.0,27.65,0.0,3.0,0.0,...,1,0,0,0,0,0,0,0,0,0
2,2500,2500,2500.0,0.1527,59.83,30000.0,1.0,0.0,3.0,0.0,...,0,0,0,0,0,0,0,0,0,1
3,2400,2400,2400.0,0.1596,84.33,12252.0,8.72,0.0,2.0,0.0,...,1,0,0,0,0,0,0,0,0,0
4,10000,10000,10000.0,0.1349,339.31,49200.0,20.0,0.0,10.0,0.0,...,1,0,0,0,0,0,0,0,0,0
5,3000,3000,3000.0,0.1269,67.79,80000.0,17.94,0.0,15.0,0.0,...,0,0,0,0,0,0,0,0,0,0


In [4]:
import warnings
warnings.filterwarnings("ignore")

In [5]:
X = df.drop('loan_status_Charged Off', axis=1).values
y = df['loan_status_Charged Off'].values
from sklearn.model_selection import train_test_split

In [6]:
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report
from sklearn.preprocessing import scale
X_scaled = scale(X)
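
Because `scale(X)` is applied to the full matrix before the train/test split, the test rows contribute to the scaling mean and variance. A common alternative, sketched below on synthetic data, is to place a `StandardScaler` inside a pipeline so it is fit on training rows only. The data and the names `X_demo`, `y_demo`, and `svm_pipe` are assumptions for illustration; this is not the code that produced the scores in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the loan features (an assumption for illustration).
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=32)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=32
)

# The scaler is fit only on the rows passed to fit(); inside cross_val_score
# it is refit on the training portion of each fold.
svm_pipe = make_pipeline(StandardScaler(), SVC())
pipe_cv_scores = cross_val_score(svm_pipe, X_tr, y_tr, cv=4)

svm_pipe.fit(X_tr, y_tr)
pipe_test_score = svm_pipe.score(X_te, y_te)
print('Pipeline CV score:', pipe_cv_scores.mean())
print('Pipeline test score:', pipe_test_score)
```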

In [7]:
from sklearn.svm import SVC
X_scaled_train, X_scaled_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=32)
svm = SVC()
svm.fit(X_scaled_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

In [8]:
cv_score_svm2 = cross_val_score(svm, X_scaled_train, y_train, cv=4)
print('Untuned scaled CV score with training set', np.mean(cv_score_svm2))
print('Untuned scaled score with training set', svm.score(X_scaled_train, y_train))
print('Untuned scaled score with testing set', svm.score(X_scaled_test, y_test))

Untuned scaled CV score with training set 0.9341371054439678
Untuned scaled score with training set 0.9452878350238463
Untuned scaled score with testing set 0.935428257973513


In [9]:
# Untuned support vector machine
y_pred_svm = svm.predict(X_scaled_test)
print(classification_report(y_test, y_pred_svm))

              precision    recall  f1-score   support

           0       0.93      1.00      0.96     10818
           1       0.99      0.58      0.73      1943

   micro avg       0.94      0.94      0.94     12761
   macro avg       0.96      0.79      0.85     12761
weighted avg       0.94      0.94      0.93     12761
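
The gap between precision and recall for class 1 is easier to read with the definitions written out: precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below uses a small hand-made label set, not the loan data, and the helper name `precision_recall` is hypothetical. It reproduces the same pattern: every predicted default is correct, but some true defaults are missed.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy labels (illustrative only): 5 true defaults, 3 of them predicted.
y_true_toy = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred_toy = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

toy_precision, toy_recall = precision_recall(y_true_toy, y_pred_toy)
print(toy_precision, toy_recall)  # 1.0 0.6
```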



In [15]:
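# Candidate values for the RBF-kernel SVM: 3 x 4 = 12 combinations.
param_dist_svc = {'C':[100, 1000, 10000], 'gamma':[0.0001, 0.001, 0.01, 0.1]}
# With the default n_iter=10, the search samples 10 of the 12 combinations.
svm_cv = RandomizedSearchCV(svm, param_distributions=param_dist_svc, cv=3)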

In [16]:
svm_cv.fit(X_scaled_train, y_train)
best_svm = svm_cv.best_estimator_

In [17]:
print("best parameters:", svm_cv.best_params_)
print('score from best parameters:', svm_cv.best_score_)

best parameters: {'gamma': 0.001, 'C': 1000}
score from best parameters: 0.9646671592664741
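
Since the search space holds only 12 combinations and `RandomizedSearchCV` evaluates 10 sampled candidates by default, an exhaustive `GridSearchCV` costs little more and does not depend on which candidates happen to be sampled. The following is a sketch on synthetic data, with `X_demo`, `y_demo`, and `grid_demo` as illustrative names; it is not the search that produced the result above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

# Same candidate values as the randomized search in this notebook.
param_grid_demo = {'C': [100, 1000, 10000], 'gamma': [0.0001, 0.001, 0.01, 0.1]}
n_candidates = len(ParameterGrid(param_grid_demo))
print('Number of combinations:', n_candidates)  # 12

# Small synthetic stand-in for the scaled training data (an assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=32)

# GridSearchCV tries every combination with 3-fold cross-validation.
grid_demo = GridSearchCV(SVC(), param_grid=param_grid_demo, cv=3)
grid_demo.fit(X_demo, y_demo)
print('Best parameters on synthetic data:', grid_demo.best_params_)
```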


In [18]:
cv_score_svm = cross_val_score(best_svm, X_scaled_train, y_train, cv=4)
print('Scaled CV score with training set', np.mean(cv_score_svm))
print('Scaled score with training set', best_svm.score(X_scaled_train, y_train))
print('Scaled score with testing set', best_svm.score(X_scaled_test, y_test))

Scaled CV score with training set 0.9656411384592037
Scaled score with training set 0.9769933499025996
Scaled score with testing set 0.9655983073426847


In [19]:
# Tuned support vector machine
y_pred_best_svm = best_svm.predict(X_scaled_test)
print(classification_report(y_test, y_pred_best_svm))

              precision    recall  f1-score   support

           0       0.97      0.99      0.98     10818
           1       0.96      0.81      0.88      1943

   micro avg       0.97      0.97      0.97     12761
   macro avg       0.96      0.90      0.93     12761
weighted avg       0.97      0.97      0.96     12761

