Identifying Risky Loans Using SVM
===

Financial institutions want to improve their credit approval procedures in order to reduce the risk of loan defaults, which cause losses for the institution. The practical problem is deciding whether to approve a particular loan based on information that can easily be collected by phone or on the web. A sample of 1000 observations is available. Each record contains 20 attributes describing both the loan and the applicant's financial health. Build a recommendation system that uses support vector machines.

The data file is available at the following link:

https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/credit.csv



The attributes and their values are as follows:

     Attribute 1:  (qualitative)
     	      Status of existing checking account
     	      A11 :      ... <    0 DM
     	      A12 : 0 <= ... <  200 DM
     	      A13 :      ... >= 200 DM /
     	            salary assignments for at least 1 year
     	      A14 : no checking account

     Attribute 2:  (numerical)
     	      Duration in months

     Attribute 3:  (qualitative)
     	      Credit history
     	      A30 : no credits taken/
     	            all credits paid back duly
     	      A31 : all credits at this bank paid back duly
     	      A32 : existing credits paid back duly till now
     	      A33 : delay in paying off in the past
     	      A34 : critical account/
     	            other credits existing (not at this bank)

     Attribute 4:  (qualitative)
     	      Purpose
     	      A40 : car (new)
     	      A41 : car (used)
     	      A42 : furniture/equipment
     	      A43 : radio/television
     	      A44 : domestic appliances
     	      A45 : repairs
     	      A46 : education
     	      A47 : (vacation - does not exist?)
     	      A48 : retraining
     	      A49 : business
     	      A410 : others

     Attribute 5:  (numerical)
     	      Credit amount

     Attribute 6:  (qualitative)
     	      Savings account/bonds
     	      A61 :          ... <  100 DM
     	      A62 :   100 <= ... <  500 DM
     	      A63 :   500 <= ... < 1000 DM
     	      A64 :          ... >= 1000 DM
     	      A65 :   unknown/ no savings account

     Attribute 7:  (qualitative)
     	      Present employment since
     	      A71 : unemployed
     	      A72 :       ... < 1 year
     	      A73 : 1  <= ... < 4 years  
     	      A74 : 4  <= ... < 7 years
     	      A75 :       ... >= 7 years

     Attribute 8:  (numerical)
     	      Installment rate in percentage of disposable income

     Attribute 9:  (qualitative)
     	      Personal status and sex
     	      A91 : male   : divorced/separated
     	      A92 : female : divorced/separated/married
     	      A93 : male   : single
     	      A94 : male   : married/widowed
     	      A95 : female : single

     Attribute 10: (qualitative)
     	      Other debtors / guarantors
     	      A101 : none
     	      A102 : co-applicant
     	      A103 : guarantor

     Attribute 11: (numerical)
     	      Present residence since

     Attribute 12: (qualitative)
     	      Property
     	      A121 : real estate
     	      A122 : if not A121 : building society savings agreement/
     				   life insurance
     	      A123 : if not A121/A122 : car or other, not in attribute 6
     	      A124 : unknown / no property

     Attribute 13: (numerical)
     	      Age in years

     Attribute 14: (qualitative)
     	      Other installment plans 
     	      A141 : bank
     	      A142 : stores
     	      A143 : none

     Attribute 15: (qualitative)
     	      Housing
     	      A151 : rent
     	      A152 : own
     	      A153 : for free

     Attribute 16: (numerical)
              Number of existing credits at this bank

     Attribute 17: (qualitative)
     	      Job
     	      A171 : unemployed/ unskilled  - non-resident
     	      A172 : unskilled - resident
     	      A173 : skilled employee / official
     	      A174 : management/ self-employed/
     		         highly qualified employee/ officer

     Attribute 18: (numerical)
     	      Number of people being liable to provide maintenance for

     Attribute 19: (qualitative)
     	      Telephone
     	      A191 : none
     	      A192 : yes, registered under the customer's name

     Attribute 20: (qualitative)
     	      foreign worker
     	      A201 : yes
     	      A202 : no
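
The qualitative attributes are stored as short codes (A11, A12, ...), so it can help to map them back to their descriptions while exploring the data. The following is a minimal sketch for Attribute 1 only, run on a tiny hand-made frame. The column names `checking_status` and `duration` are assumptions made for this illustration; the actual header names in the CSV may differ.

```python
import pandas as pd

# Code-to-description mapping for Attribute 1, copied from the list above.
checking_labels = {
    "A11": "< 0 DM",
    "A12": "0 <= ... < 200 DM",
    "A13": ">= 200 DM / salary assignments for at least 1 year",
    "A14": "no checking account",
}

# Hypothetical three-row sample; the column names are illustrative only.
sample = pd.DataFrame(
    {
        "checking_status": ["A11", "A14", "A12"],
        "duration": [6, 48, 12],
    }
)

# Add a readable column next to the coded one.
sample["checking_status_label"] = sample["checking_status"].map(checking_labels)

# Qualitative columns are the non-numeric ones.
qualitative = sample.select_dtypes(exclude="number").columns.tolist()

print(sample)
print(qualitative)
```

The same pattern extends to the other qualitative attributes by adding one dictionary per attribute.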


In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

df = pd.read_csv(
    "https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/german.csv"
)
y = df.pop("default")
X = df.copy()


In [3]:
#
# Use the LabelEncoder transformer to preprocess the
# alphanumeric columns of the dataframe.
#
# Use the first 900 rows to train the model and the
# remaining 100 for validation.
#
# Build the SVM using the default parameter values.
#
# Compute the confusion matrix for the validation
# sample.
#
# Answer:
# True
# True
# True
# True
#

import matplotlib.pyplot as plt
import numpy as np
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


# Reload the data: the previous cell removed the "default" column
# from df with df.pop().
df = pd.read_csv(
    "https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/german.csv"
)


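# Encode every alphanumeric column as integer codes. `apply` calls
# `fit_transform` once per column, so the encoder is refit each time.
label = LabelEncoder()

columns = df.select_dtypes(include=['object', 'bool']).columns

df[columns] = df[columns].apply(label.fit_transform)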


X = df.drop('default', axis=1)
y = df['default']

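# Standardization step for the numeric columns. It is defined here but
# never applied (the transform below is commented out), so the SVM is
# trained on unscaled features.
column_trans = ColumnTransformer(
    [
        ("scale", StandardScaler(), make_column_selector(dtype_include=np.number)),
    ],
    remainder="drop",
)

#X_trans=column_trans.fit_transform(X)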


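# Positional split: the first 900 rows train the model and the
# remaining 100 rows are held out for validation.
X_train = X[0:900]
X_test = X[900:]
y_train = y[0:900]
y_test = y[900:]


# SVM with scikit-learn's default parameters.
clf = SVC()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)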

cm = confusion_matrix(
    y_true=y_test,
    y_pred=y_pred,
    labels=None,
    normalize=None,
)

confusionMatrixDisplay = ConfusionMatrixDisplay(
    confusion_matrix=cm,
)

#confusionMatrixDisplay.plot(cmap="Blues")
#plt.show()


# >>> Insert your code here >>>

# ---->>> Evaluation ---->>>
# cm is the confusion matrix
print(cm[0][0] == 67)
print(cm[0][1] == 1)
print(cm[1][0] == 30)
print(cm[1][1] == 2)

True
True
True
True
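
The four checks above pin the validation confusion matrix to [[67, 1], [30, 2]]. As a rough reading of that result, the sketch below derives accuracy plus the recall and precision of the second class directly from those counts. It assumes scikit-learn's convention (rows are true labels, columns are predictions, labels in sorted order). The helper name `summarize` is made up for this illustration.

```python
import numpy as np


def summarize(cm):
    """Return (accuracy, recall, precision), the last two for the second class."""
    cm = np.asarray(cm, dtype=float)
    accuracy = np.trace(cm) / cm.sum()
    recall_second = cm[1, 1] / cm[1].sum()        # row 1: true second-class cases
    precision_second = cm[1, 1] / cm[:, 1].sum()  # column 1: predicted second class
    return accuracy, recall_second, precision_second


# Counts taken from the evaluation checks of the default SVC.
cm_default = [[67, 1], [30, 2]]
acc, rec, prec = summarize(cm_default)

print(f"accuracy={acc:.2f} recall={rec:.4f} precision={prec:.3f}")
```

With these counts the accuracy is 0.69, but only 2 of the 32 second-class cases are recovered, so the default model assigns almost every application to the first class.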


In [4]:
#
# Find the best combination of kernel and regularization
# parameter among the supplied values during training,
# and compute the confusion matrix for the test sample.
#
# Answer:
# True
# True
# True
# True
#

from sklearn.model_selection import GridSearchCV

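# Single-step pipeline so the grid can address the classifier's
# parameters with the "clf__" prefix.
estimators = [
    ("clf", SVC(random_state=50)),
]

pipeline = Pipeline(
    steps=estimators,
    verbose=False,
)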


param_grid = [
    # -------------------------------------------------------------------------
    # First parameter grid
    {
        "clf__kernel": ['rbf', 'poly', 'sigmoid'],
        "clf__C": [1, 2, 3, 4, 5],
    },
]

gridSearchCV = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=2,
    n_jobs=-1,
)

gridSearchCV.fit(X_train, y_train)

y_pred_2 = gridSearchCV.predict(X_test)

# >>> Insert your code here >>>

# ---->>> Evaluation ---->>>
# cm is the confusion matrix
cm = confusion_matrix(
    y_true=y_test,
    y_pred=y_pred_2,
    labels=None,
    normalize=None,
)

confusionMatrixDisplay = ConfusionMatrixDisplay(
    confusion_matrix=cm,
)

#confusionMatrixDisplay.plot(cmap="Blues")
#plt.show()

print(cm[0][0] == 68)
print(cm[0][1] == 0)
print(cm[1][0] == 30)
print(cm[1][1] == 2)

True
True
True
True
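
The cells above train the SVM on unscaled features, although SVMs are generally sensitive to feature scale. As a possible variation, not the exercise's graded solution, the sketch below puts a `StandardScaler` in front of the classifier inside the pipeline and reads back the combination selected by the grid search. It runs on synthetic data from `make_classification` rather than the credit file, so the names `X_demo`, `y_demo`, `pipe`, and `grid` are illustrative, and the results say nothing about the credit dataset itself.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: used only to keep the sketch self-contained).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Scaling happens inside the pipeline, so each CV fold is scaled
# using statistics from its own training part only.
pipe = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("clf", SVC(random_state=50)),
    ]
)

# Same kernel and C candidates as the grid in the cell above.
grid = GridSearchCV(
    estimator=pipe,
    param_grid={
        "clf__kernel": ["rbf", "poly", "sigmoid"],
        "clf__C": [1, 2, 3, 4, 5],
    },
    cv=2,
)
grid.fit(X_demo, y_demo)

print(grid.best_params_)  # selected kernel and C
print(grid.best_score_)   # mean cross-validated accuracy of that combination
```

Applying the same pipeline to the credit data would likely change the confusion matrix, so the evaluation checks above would no longer be expected to print True.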
