# Australian Credit Approval in R

Applying classification algorithms in R to the Statlog Australian Credit Approval dataset from the UCI repository.

In [162]:
library(rpart)

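# The file is space-delimited. With header=TRUE the first record is used as
# column names, which is why the columns below are called X1, X22.08, and so on.
myData <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/australian/australian.dat", header=TRUE, sep=" ")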

In [163]:
head(myData)

X1,X22.08,X11.46,X2,X4,X4.1,X1.585,X0,X0.1,X0.2,X1.1,X2.1,X100,X1213,X0.3
0,22.67,7.0,2,8,4,0.165,0,0,0,0,2,160,1,0
0,29.58,1.75,1,4,4,1.25,0,0,0,1,2,280,1,0
0,21.67,11.5,1,5,3,0.0,1,1,11,1,2,0,1,1
1,20.17,8.17,2,6,4,1.96,1,1,14,0,2,60,159,1
0,15.83,0.585,2,8,8,1.5,1,1,2,0,2,100,1,1
1,17.42,6.5,2,3,4,0.125,0,0,0,0,2,60,101,0
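
The column names above (`X1`, `X22.08`, ...) look like data values, which suggests the file has no header row and that `header=TRUE` turned its first record into names. Here is a minimal sketch of that effect, using three space-separated records taken from the output above as inline text instead of the URL. The names `A1` to `A15` are hypothetical placeholders, and the rest of this notebook still relies on the `X...` names.

```r
# Three space-separated records: the first is reconstructed from the column
# names shown above, the other two are the first rows of head(myData).
raw <- "1 22.08 11.46 2 4 4 1.585 0 0 0 1 2 100 1213 0
0 22.67 7.0 2 8 4 0.165 0 0 0 0 2 160 1 0
0 29.58 1.75 1 4 4 1.25 0 0 0 1 2 280 1 0"

# header = TRUE: the first record is consumed as column names.
withHeader <- read.table(text = raw, header = TRUE, sep = " ")

# header = FALSE: every record is kept and names are assigned explicitly.
# "A1".."A15" are hypothetical placeholder names.
noHeader <- read.table(text = raw, header = FALSE, sep = " ",
                       col.names = paste0("A", 1:15))

nrow(withHeader)         # 2 rows: one record was lost to the header
nrow(noHeader)           # 3 rows: all records kept
names(withHeader)[1:3]   # "X1" "X22.08" "X11.46"
```

Switching the real import to `header = FALSE` would rename every column, so the later cells would have to be updated to match.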


## Data Adjustments

Convert the categorical columns of the data frame to factors.

In [164]:
myData$X1 <- factor(myData$X1, levels = c(0,1), labels = c(0,1))
myData$X2 <- factor(myData$X2, levels = c(1,2,3), labels = c(1,2,3))
myData$X4 <- factor(myData$X4, levels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14), labels = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14))
myData$X4.1 <- factor(myData$X4.1, levels = c(1,2,3,4,5,6,7,8,9), labels = c(1,2,3,4,5,6,7,8,9))
myData$X0 <- factor(myData$X0, levels = c(0,1), labels = c(0,1))
myData$X0.1 <- factor(myData$X0.1, levels = c(0,1), labels = c(0,1))
myData$X1.1 <- factor(myData$X1.1, levels = c(0,1), labels = c(0,1))
myData$X2.1 <- factor(myData$X2.1, levels =  c(1,2,3), labels =  c(1,2,3))
myData$X0.3 <- factor(myData$X0.3, levels = c(0,1), labels = c(0,1))

In [165]:
head(myData)

X1,X22.08,X11.46,X2,X4,X4.1,X1.585,X0,X0.1,X0.2,X1.1,X2.1,X100,X1213,X0.3
0,22.67,7.0,2,8,4,0.165,0,0,0,0,2,160,1,0
0,29.58,1.75,1,4,4,1.25,0,0,0,1,2,280,1,0
0,21.67,11.5,1,5,3,0.0,1,1,11,1,2,0,1,1
1,20.17,8.17,2,6,4,1.96,1,1,14,0,2,60,159,1
0,15.83,0.585,2,8,8,1.5,1,1,2,0,2,100,1,1
1,17.42,6.5,2,3,4,0.125,0,0,0,0,2,60,101,0
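
The column-by-column conversion above could probably be written more compactly. Here is a sketch on a toy data frame whose values are copied from the first four rows of `head(myData)`. `toy` and `catCols` are illustrative names, and unlike the cell above, `factor()` without `levels` only creates levels for values that actually occur in the data.

```r
# Toy stand-in for myData: two categorical columns and one numeric column.
toy <- data.frame(X1     = c(0, 0, 0, 1),
                  X2     = c(2, 1, 1, 2),
                  X22.08 = c(22.67, 29.58, 21.67, 20.17))

# Convert several columns to factors in one step.
catCols <- c("X1", "X2")
toy[catCols] <- lapply(toy[catCols], factor)

str(toy)
```

If a level must exist even when it is absent from the data (for example `3` in `X2`), the explicit `levels =` form used above is the safer choice.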


## Logistic Regression 

Logistic regression on the numerical columns. Let's begin by collecting the numerical columns and adding 0/1 indicator columns for approved and rejected applications, so that we can fit a logistic regression model.

In [166]:
numericalMyData <- cbind(myData$X22.08, myData$X11.46, myData$X1.585, myData$X100, myData$X1213)

approved <- ifelse(myData$X0.3 == 1,1,0)
rejected <- ifelse(myData$X0.3 == 0,1,0)

logRegData <- cbind(numericalMyData, approved, rejected)

colnames(logRegData)[1] <- "X22.08"
colnames(logRegData)[2] <- "X11.46"
colnames(logRegData)[3] <- "X1.585"
colnames(logRegData)[4] <- "X100"
colnames(logRegData)[5] <- "X1213"

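# Draw 100 training rows from the first 150 rows only.
# No seed is set, so the split changes on every run.
train <- sample(1:150, 100)
# Negative indices: every row that is not in train is used for testing.
test <- -train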

In [167]:
head(logRegData)

X22.08,X11.46,X1.585,X100,X1213,approved,rejected
22.67,7.0,0.165,160,1,0,1
29.58,1.75,1.25,280,1,0,1
21.67,11.5,0.0,0,1,1,0
20.17,8.17,1.96,60,159,1,0
15.83,0.585,1.5,100,1,1,0
17.42,6.5,0.125,60,101,0,1
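
The split above samples training rows from the first 150 rows only and does not fix a random seed. A hedged alternative is sketched below on synthetic data: it draws from all rows and sets a seed so the split can be repeated. The seed value, the 70/30 proportion, and the names `toy`, `trainIdx`, `trainSet` and `testSet` are assumptions for illustration, not part of the original analysis.

```r
set.seed(1)  # arbitrary seed, only so the split is repeatable

# Synthetic stand-in for logRegData.
toy <- data.frame(x = rnorm(20), y = rbinom(20, 1, 0.5))

n <- nrow(toy)
trainIdx <- sample(seq_len(n), size = round(0.7 * n))  # sample from all rows

trainSet <- toy[trainIdx, ]
testSet  <- toy[-trainIdx, ]  # negative indices drop the training rows

nrow(trainSet)
nrow(testSet)
```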


Importing libraries: `caret` provides `confusionMatrix`.

In [168]:
library(caret)
library(e1071)

In this section, we convert our data to a data frame so that it can be used to fit a generalized linear model. After that, we classify the predictions with a cutoff (set to 0.7 in this example) and create a confusion matrix to test the model's accuracy. Note that the cutoff is applied to `exp(predict(...))`, which is the predicted odds rather than a probability; odds of 0.7 correspond to a probability of about 0.41.
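
To make the relation between odds and probability concrete, here is a small sketch on synthetic data (not the credit data). `toyModel`, `logOdds`, `prob` and `cutoffProb` are illustrative names. It relies on `predict()` for a `glm` returning values on the link (log-odds) scale by default and probabilities with `type = "response"`.

```r
set.seed(1)                             # arbitrary seed for the synthetic data
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))    # synthetic 0/1 labels
toyModel <- glm(y ~ x, family = binomial)

logOdds <- predict(toyModel)                     # default type = "link"
prob    <- predict(toyModel, type = "response")  # fitted probabilities

# exp(log-odds) equals the odds p / (1 - p), not the probability p.
all.equal(exp(logOdds), prob / (1 - prob))

# An odds cutoff of 0.7 is equivalent to this probability cutoff.
cutoffProb <- 0.7 / (1 + 0.7)
cutoffProb   # about 0.41
```

Thresholding `prob` directly would let the cutoff be read as a probability.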

#### Testing for approved credit applications.

In [169]:
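logRegData <- as.data.frame(logRegData)
# Fit on the training rows; the response is the 0/1 approved indicator.
model <- glm(approved ~ X22.08 + X11.46 + X1.585 + X100 + X1213, data = logRegData[train, ], family = binomial)
# predict() returns log-odds by default, so exp() gives odds (not an odds ratio).
oddsratio <- exp(predict(model, logRegData[test, ]))
predicted <- ifelse(oddsratio > 0.7, 1, 0)
confusionMatrix(table(predicted, real = logRegData[test, "approved"]))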

Confusion Matrix and Statistics

         real
predicted   0   1
        0 239  76
        1  88 186
                                          
               Accuracy : 0.7216          
                 95% CI : (0.6835, 0.7574)
    No Information Rate : 0.5552          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.4388          
 Mcnemar's Test P-Value : 0.3904          
                                          
            Sensitivity : 0.7309          
            Specificity : 0.7099          
         Pos Pred Value : 0.7587          
         Neg Pred Value : 0.6788          
             Prevalence : 0.5552          
         Detection Rate : 0.4058          
   Detection Prevalence : 0.5348          
      Balanced Accuracy : 0.7204          
                                          
       'Positive' Class : 0               
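
The headline statistics can be checked by hand from the four counts in the table. The sketch below copies the counts from the output above; `cm` and the metric names are illustrative. It treats class `0` as the positive class, as the output reports.

```r
# Counts copied from the confusion matrix above (rows: predicted, columns: real).
cm <- matrix(c(239, 88, 76, 186), nrow = 2,
             dimnames = list(predicted = c("0", "1"), real = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)            # correct / total
sensitivity <- cm["0", "0"] / sum(cm[, "0"])      # real 0 predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])      # real 1 predicted as 1

round(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity), 4)
# accuracy 0.7216, sensitivity 0.7309, specificity 0.7099
```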
                                          

#### Testing for rejected applications.
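The result below needs discussion because of its low accuracy. The model in this cell is still fitted to `approved`, but its predictions are compared against `rejected`, which is the complement of `approved`. Every correct prediction is therefore counted as an error, and the accuracy of 0.2784 is one minus the 0.7216 obtained above.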

In [170]:
logRegData <- as.data.frame(logRegData)
model <- glm(approved ~ X22.08 + X11.46 + X1.585 + X100 + X1213, data=logRegData[train,], binomial)
oddsratio <- exp(predict(model,logRegData[test,]))
predicted <- ifelse(oddsratio > 0.7,1,0)
confusionMatrix(table(predicted,real=logRegData[test,"rejected"]))

Confusion Matrix and Statistics

         real
predicted   0   1
        0  76 239
        1 186  88
                                          
               Accuracy : 0.2784          
                 95% CI : (0.2426, 0.3165)
    No Information Rate : 0.5552          
    P-Value [Acc > NIR] : 1.00000         
                                          
                  Kappa : -0.4321         
 Mcnemar's Test P-Value : 0.01166         
                                          
            Sensitivity : 0.2901          
            Specificity : 0.2691          
         Pos Pred Value : 0.2413          
         Neg Pred Value : 0.3212          
             Prevalence : 0.4448          
         Detection Rate : 0.1290          
   Detection Prevalence : 0.5348          
      Balanced Accuracy : 0.2796          
                                          
       'Positive' Class : 0               
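
The second table appears to be the first one with its columns swapped, because `rejected` is `1 - approved`. The sketch below reproduces it from the counts of the first confusion matrix; `cmApproved`, `cmRejected`, `accApproved` and `accRejected` are illustrative names.

```r
# Counts copied from the first confusion matrix
# (rows: predicted approved, columns: real approved).
cmApproved <- matrix(c(239, 88, 76, 186), nrow = 2,
                     dimnames = list(predicted = c("0", "1"), real = c("0", "1")))

# rejected = 1 - approved, so real approved "1" becomes real rejected "0".
cmRejected <- cmApproved[, c("1", "0")]
colnames(cmRejected) <- c("0", "1")
cmRejected

accApproved <- sum(diag(cmApproved)) / sum(cmApproved)
accRejected <- sum(diag(cmRejected)) / sum(cmRejected)

round(c(accApproved, accRejected), 4)   # 0.7216 and 0.2784
```

To evaluate rejections directly, one option would be to fit the model with `rejected` as the response, or to compare `1 - predicted` against `rejected`. Either change would produce different output from the one shown above.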
                                          