# Random Forest Loan Approver

In this example, we will build a Random Forest classifier that can be used to predict the loan status (approve or deny) given a set of input features.

## Instructions

1. Read the data into a Pandas DataFrame.

2. Separate the features `X` from the target `y`. In this case, the loan status is the target.

3. Split the data into training and testing subsets.

4. Scale the data using `StandardScaler`.

5. Import and instantiate a Random Forest classifier using sklearn.

6. Fit the model to the training data.

7. Calculate the accuracy score using both the training and the testing data.

8. Make predictions using the testing data.

9. Generate the confusion matrix for the test data predictions.

10. Generate the classification report for the test data.
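
Steps 3 through 6 can also be chained with scikit-learn's `Pipeline`. The scaler is then fitted only on the data passed to `fit` and is reapplied automatically at prediction time. This is a minimal sketch, not the approach used in the cells below. Because `loans.csv` isn't reproduced here, it assumes a synthetic stand-in from `make_classification` (100 rows, 5 features) with the labels renamed to `approve`/`deny`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for loans.csv (assumption): 100 rows, 5 numeric features
X, y_int = make_classification(n_samples=100, n_features=5, random_state=1)
y = np.where(y_int == 1, "approve", "deny")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

# Scaling and classification as one estimator; fit() only sees the training split
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=500, random_state=1)),
])
pipe.fit(X_train, y_train)

train_acc = pipe.score(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
print(f"TRAINING DATA SCORE: {train_acc}")
print(f"TESTING DATA SCORE: {test_acc}")
```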

## Load Data
### 1. Read the data into a Pandas DataFrame.

```python
# Import modules
from pathlib import Path
import pandas as pd
```

```python
# Read in the data
loan_path = Path('../Resources/loans.csv')
df = pd.read_csv(loan_path)
df.head()
```

| | assets | liabilities | income | credit_score | mortgage | status |
|---|---|---|---|---|---|---|
| 0 | 0.210859 | 0.452865 | 0.281367 | 0.628039 | 0.302682 | deny |
| 1 | 0.395018 | 0.661153 | 0.330622 | 0.638439 | 0.502831 | approve |
| 2 | 0.291186 | 0.593432 | 0.438436 | 0.434863 | 0.315574 | approve |
| 3 | 0.45864 | 0.576156 | 0.744167 | 0.291324 | 0.394891 | approve |
| 4 | 0.46347 | 0.292414 | 0.489887 | 0.811384 | 0.566605 | approve |
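
If `../Resources/loans.csv` isn't available, the same parsing can be tried on an in-memory copy of the five rows displayed above via `io.StringIO`. This sketch assumes the file has a header row with exactly these six columns and no saved index column, which is consistent with the 5-feature shape reported in step 3.

```python
import io
import pandas as pd

# The five rows displayed by df.head(), assumed to mirror the CSV layout
csv_text = """assets,liabilities,income,credit_score,mortgage,status
0.210859,0.452865,0.281367,0.628039,0.302682,deny
0.395018,0.661153,0.330622,0.638439,0.502831,approve
0.291186,0.593432,0.438436,0.434863,0.315574,approve
0.45864,0.576156,0.744167,0.291324,0.394891,approve
0.46347,0.292414,0.489887,0.811384,0.566605,approve
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.dtypes)                    # five float columns plus an object column
print(sample["status"].value_counts())  # label counts in this small sample
```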


### 2. Separate the Features `X` from the Target `y`

```python
# Segment the features from the target
X = df.drop(columns='status')
y = df['status']
```
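
`drop` returns a new DataFrame rather than modifying `df` in place, so `df` still holds the `status` column afterward. Here is a small check that uses the first three rows shown above:

```python
import pandas as pd

# First three rows of the loan data, as displayed by df.head()
df = pd.DataFrame({
    "assets": [0.210859, 0.395018, 0.291186],
    "liabilities": [0.452865, 0.661153, 0.593432],
    "income": [0.281367, 0.330622, 0.438436],
    "credit_score": [0.628039, 0.638439, 0.434863],
    "mortgage": [0.302682, 0.502831, 0.315574],
    "status": ["deny", "approve", "approve"],
})

# drop() returns a new DataFrame; df itself keeps the status column
X = df.drop(columns="status")
y = df["status"]

print(list(X.columns))
print(y.tolist())
```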

### 3. Split the data into training and testing sets

```python
from sklearn.model_selection import train_test_split

# Use the train_test_split function to create training and testing subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)
X_train.shape
```

```
(75, 5)
```
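
With `test_size` left unset, `train_test_split` holds out 25% of the rows, which is why 75 of the 100 rows remain for training. `stratify=y` keeps the approve/deny ratio the same in both subsets. The sketch below demonstrates both effects on hypothetical data with a 60/40 label split. That split is an assumption and does not reflect the real class balance.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 5 features, 60 "approve" and 40 "deny" labels
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = np.array(["approve"] * 60 + ["deny"] * 40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)

# Default test_size=0.25 -> 75 training rows and 25 testing rows
print(X_train.shape, X_test.shape)
# stratify=y keeps the 60/40 class ratio in both subsets
print(np.unique(y_test, return_counts=True))
```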

### 4. Scale the data using `StandardScaler`

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then scale both subsets
scaler = StandardScaler()
X_scaler = scaler.fit(X_train)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
```
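
The scaler is fitted on `X_train` alone, so the test set is transformed using statistics it never contributed to. The sketch below uses hypothetical random data in place of the loan features. It checks that `transform` computes `(x - mean_) / scale_` column by column, and that the scaled training columns end up with mean 0 and standard deviation 1. Random forests split on per-feature thresholds, so standardization generally has little or no effect on them. The step is kept here because the instructions call for it.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on a 0-1 scale, like the loan data
rng = np.random.default_rng(1)
X_train = rng.random((75, 5))
X_test = rng.random((25, 5))

# Learn per-column mean and standard deviation from the training split only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# The transform is (x - mean_) / scale_, computed column by column
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(manual, X_test_scaled))
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```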

## Model

### 5. Import and instantiate the `RandomForestClassifier` using sklearn. 

```python
from sklearn.ensemble import RandomForestClassifier

# Create a random forest classifier
rf_model = RandomForestClassifier(n_estimators=500, random_state=1)
```
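
Only `n_estimators` and `random_state` are set explicitly. Every other hyperparameter keeps scikit-learn's default. `get_params()` lists the full configuration, which makes it easy to confirm, for example, that the trees have no depth limit and are trained on bootstrap samples:

```python
from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(n_estimators=500, random_state=1)

# Unset arguments keep scikit-learn's defaults; get_params() lists them all
params = rf_model.get_params()
for name in ("n_estimators", "max_depth", "bootstrap", "random_state"):
    print(name, params[name])
```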

## Fit

### 6. Train the model using the training data

```python
# Fit the model to the scaled training data
rf_model.fit(X_train_scaled, y_train)
```

```
RandomForestClassifier(n_estimators=500, random_state=1)
```
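
After `fit`, the forest exposes its individual trees in `estimators_` and its impurity-based `feature_importances_`. The short sketch below uses synthetic data from `make_classification` as an assumed stand-in for the scaled loan features, so its importances say nothing about the real columns:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the scaled loan features (assumption)
X, y = make_classification(n_samples=100, n_features=5, random_state=1)

rf_model = RandomForestClassifier(n_estimators=500, random_state=1)
rf_model.fit(X, y)

# One fitted decision tree per estimator
print(len(rf_model.estimators_))
# Impurity-based importances, one per feature, normalized to sum to 1
print(rf_model.feature_importances_.round(3))
```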

### 7. Score the model using the training and testing data

```python
# Score the accuracy on the training and testing data
print(f'TRAINING DATA SCORE: {rf_model.score(X_train_scaled, y_train)}')
print(f'TESTING DATA SCORE: {rf_model.score(X_test_scaled, y_test)}')
```

```
TRAINING DATA SCORE: 1.0
TESTING DATA SCORE: 0.76
```
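
A training score of 1.0 against a testing score of 0.76 points to overfitting. With the default `max_depth=None`, each tree keeps splitting until its leaves are pure, so the forest can effectively memorize the training rows. The testing score is therefore the more honest estimate. For classifiers, `score` is plain accuracy. The sketch below uses synthetic stand-in data (an assumption) to confirm that it matches `accuracy_score` applied to the output of `predict`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), split the same way as the notebook
X, y = make_classification(n_samples=100, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, stratify=y)

rf_model = RandomForestClassifier(n_estimators=500, random_state=1).fit(X_train, y_train)

# For classifiers, score() is the mean accuracy of predict() against the labels
test_score = rf_model.score(X_test, y_test)
manual = accuracy_score(y_test, rf_model.predict(X_test))
print(test_score, manual)
```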


## Predict

### 8. Make predictions

```python
# Make predictions using the test data
y_pred = rf_model.predict(X_test_scaled)
```
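
`predict` returns one label per test row. Internally, a random forest averages the class probabilities of its trees (`predict_proba`) and picks the class with the highest mean probability. The sketch below uses synthetic data with string labels (an assumption). It shows that the columns of `predict_proba` follow `classes_`, which scikit-learn sorts, and that taking the argmax reproduces `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with string labels like the loan data (assumption)
X, y_int = make_classification(n_samples=100, n_features=5, random_state=1)
y = np.where(y_int == 1, "approve", "deny")

rf_model = RandomForestClassifier(n_estimators=500, random_state=1).fit(X[:75], y[:75])

# Averaged per-tree class probabilities; columns follow rf_model.classes_
proba = rf_model.predict_proba(X[75:])
y_pred = rf_model.predict(X[75:])

# predict() picks the class with the highest averaged probability
from_proba = rf_model.classes_[proba.argmax(axis=1)]
print(rf_model.classes_)
print(np.array_equal(from_proba, y_pred))
```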

## Evaluate

### 9. Generate Confusion Matrix

```python
from sklearn.metrics import confusion_matrix

# Create a confusion matrix
confusion_matrix(y_test, y_pred)
```

```
array([[ 9,  3],
       [ 3, 10]], dtype=int64)
```
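
In scikit-learn's `confusion_matrix`, row *i* counts samples whose true label is the *i*-th label, and column *j* counts samples predicted as the *j*-th label. Without a `labels` argument the labels are sorted, so the order here is `approve`, `deny`. The worked arithmetic below uses the matrix printed above to recover the test accuracy and the per-class recall and precision.

```python
import numpy as np

# Confusion matrix from the output above: rows are true labels, columns are
# predicted labels, both in sorted label order ["approve", "deny"]
cm = np.array([[9, 3],
               [3, 10]])

accuracy = np.trace(cm) / cm.sum()        # (9 + 10) / 25
recall = np.diag(cm) / cm.sum(axis=1)      # correct / true count, per row
precision = np.diag(cm) / cm.sum(axis=0)   # correct / predicted count, per column

print(round(accuracy, 2))
print(recall.round(2), precision.round(2))
```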

### 10. Generate Classification Report

```python
from sklearn.metrics import classification_report

# Print the classification report
print(classification_report(y_test, y_pred))
```

```
              precision    recall  f1-score   support

     approve       0.75      0.75      0.75        12
        deny       0.77      0.77      0.77        13

    accuracy                           0.76        25
   macro avg       0.76      0.76      0.76        25
weighted avg       0.76      0.76      0.76        25
```
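
Every number in the report follows from the confusion matrix. For `approve`, precision and recall are both 9/12 = 0.75. For `deny`, both are 10/13 ≈ 0.77. F1 is the harmonic mean of precision and recall. To work with these values programmatically, `classification_report` accepts `output_dict=True`. In this sketch the label arrays are hypothetical and are constructed only so that they reproduce the confusion matrix above.

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical label arrays that reproduce the confusion matrix above:
# 9 approve->approve, 3 approve->deny, 3 deny->approve, 10 deny->deny
y_test = np.array(["approve"] * 12 + ["deny"] * 13)
y_pred = np.array(["approve"] * 9 + ["deny"] * 3 + ["approve"] * 3 + ["deny"] * 10)

# output_dict=True returns the same numbers as a nested dict instead of text
report = classification_report(y_test, y_pred, output_dict=True)
print(report["approve"])
print(report["macro avg"]["f1-score"])
```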

