# Students Do: Predicting Fraudulent Loan Applications

According to the American Bankers Association, [every dollar of fraud costs banks and credit unions roughly $2.92](https://www.aba.com/member-tools/industry-solutions/insights/state-card-fraud-2018). This cost is one reason why predicting fraud with machine learning techniques is a [broad area of research](https://scholar.google.com.mx/scholar?q=fraud+detection+machine+learning&btnG=&oq=fraud+detection+) and a promising [business opportunity for FinTech startups](https://www.eu-startups.com/2019/06/paris-based-fintech-bleckwen-raises-e8-8-million-for-its-fraud-detection-software-to-prevent-financial-crime/).

In this activity, you will explore how tree-based algorithms can be used to identify fraudulent loan applications. You will start by using a decision tree model trained with the `sba_loans_encoded.csv` file that you created earlier.

In [None]:
# Initial imports
import pandas as pd
from pathlib import Path
from sklearn import tree
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Needed for decision tree visualization
import pydotplus
from IPython.display import Image


## Loading and Preprocessing Loans Encoded Data

Load the `sba_loans_encoded.csv` file into a pandas DataFrame called `df_loans`.

In [None]:
# Loading data



Define the features set by copying the `df_loans` DataFrame and dropping the `Default` column.

In [None]:
# Define features set



Create the target vector by assigning the values of the `Default` column from the `df_loans` DataFrame.

In [None]:
# Define target vector
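The features set and target vector described above could be built roughly as follows. This is a minimal sketch on a hypothetical four-row stand-in for `df_loans`; the `Term` and `NoEmp` columns are assumptions, not the real schema.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded earlier
df_loans = pd.DataFrame(
    {"Term": [84, 60, 120, 36], "NoEmp": [4, 2, 10, 1], "Default": [0, 1, 0, 1]}
)

# Features set: a copy of the data without the target column
X = df_loans.copy()
X.drop("Default", axis=1, inplace=True)

# Target vector: the values of the Default column
y = df_loans["Default"].values

print(X.head())
print(y[:5])
```

Copying before dropping leaves `df_loans` untouched, so the `Default` column is still available when the target vector is created.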



Split the data into training and testing sets.

In [None]:
# Splitting into Train and Test sets
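A split along these lines might look like the sketch below. The random arrays stand in for the real `X` and `y`, and `random_state=78` is an arbitrary value chosen only to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features set and target vector
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# By default, 75% of the rows go to training and 25% to testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)

print(X_train.shape, X_test.shape)
```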



Use `StandardScaler` to scale the features data. Remember that only the `X_train` and `X_test` DataFrames should be scaled.

In [None]:
# Create the StandardScaler instance



In [None]:
# Fit the Standard Scaler with the training data



In [None]:
# Scale the training and testing data
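The three scaling steps could be combined as in this sketch, which again uses synthetic stand-in data. The scaler is fitted on the training data only and then applied to both sets, so no information from the testing set leaks into the scaling parameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with a non-zero mean and non-unit variance
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))
y = rng.integers(0, 2, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)

# Create the StandardScaler instance
scaler = StandardScaler()

# Fit the scaler with the training data only
X_scaler = scaler.fit(X_train)

# Scale both the training and the testing data with the fitted scaler
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
```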



## Fitting the Decision Tree Model

Once the data is scaled, create a decision tree instance and train it with the training data (`X_train_scaled` and `y_train`).

In [None]:
# Create the decision tree classifier instance



In [None]:
# Fit the model
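As a rough illustration of creating and fitting the classifier, the sketch below trains on data generated by `make_classification`, which stands in for the scaled loan features. The sample size, feature count, and seeds are arbitrary assumptions.

```python
from sklearn import tree
from sklearn.datasets import make_classification

# Synthetic stand-in for X_train_scaled and y_train
X_train_scaled, y_train = make_classification(
    n_samples=200, n_features=5, random_state=1
)

# Create the decision tree classifier instance
model = tree.DecisionTreeClassifier(random_state=78)

# Fit the model
model = model.fit(X_train_scaled, y_train)

print(f"Tree depth: {model.get_depth()}")
```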



## Making Predictions Using the Tree Model

Validate the trained model by predicting fraudulent loan applications using the testing data (`X_test_scaled`).

In [None]:
# Making predictions using the testing data
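The prediction step is a single call to `predict` on the scaled testing data. The sketch below rebuilds a small pipeline on synthetic stand-in data so that it runs on its own; in the notebook, `model` and `X_test_scaled` would already exist from the previous cells.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, split and scaled as in the earlier steps
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = tree.DecisionTreeClassifier(random_state=78).fit(X_train_scaled, y_train)

# Making predictions using the testing data
predictions = model.predict(X_test_scaled)

print(predictions[:10])
```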



## Model Evaluation

Evaluate the model's results by using `sklearn` to calculate the confusion matrix and the accuracy score, and to generate the classification report.

In [None]:
# Calculating the confusion matrix

# Calculating the accuracy score



In [None]:
# Displaying results
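The evaluation could be put together roughly as follows. The two label lists are hand-written, hypothetical stand-ins for `y_test` and `predictions`, kept small enough to check the numbers by hand; the row and column labels of `cm_df` are also just one possible naming.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical stand-ins for the true labels and the model's predictions
y_test = [0, 0, 0, 0, 1, 1, 1, 0]
predictions = [0, 0, 1, 0, 1, 0, 1, 0]

# Calculating the confusion matrix (rows are actual classes, columns are predicted)
cm = confusion_matrix(y_test, predictions)
cm_df = pd.DataFrame(
    cm, index=["Actual 0", "Actual 1"], columns=["Predicted 0", "Predicted 1"]
)

# Calculating the accuracy score
acc_score = accuracy_score(y_test, predictions)

# Displaying results
print("Confusion Matrix")
print(cm_df)
print(f"Accuracy Score : {acc_score}")
print("Classification Report")
print(classification_report(y_test, predictions))
```

With these stand-in labels, six of the eight predictions are correct, so the accuracy is 0.75. Accuracy alone can be misleading when one class is rare, so the per-class precision and recall in the classification report are worth reading as well.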



## Visualizing the Decision Tree

In this section, you should create a visual representation of the decision tree using `pydotplus`. Show the graph in the notebook, and also save it in `PDF` and `PNG` formats.

In [None]:
# Create DOT data

# Draw graph

# Show graph



In [None]:
# Saving the tree as PDF


# Saving the tree as PNG
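One way to tie the visualization steps together is sketched below. The DOT text comes from `tree.export_graphviz`; rendering and saving are wrapped in a hypothetical helper, `render_tree`, because they need both `pydotplus` and the Graphviz binaries installed. The synthetic data, the `feature_*` names, the `max_depth=3` limit, and the output file names are all assumptions made for illustration.

```python
from sklearn import tree
from sklearn.datasets import make_classification

# Synthetic stand-in for the scaled training data
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=1)
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]

# A shallow tree keeps the picture readable
model = tree.DecisionTreeClassifier(max_depth=3, random_state=78).fit(X_demo, y_demo)

# Create DOT data (out_file=None returns the DOT source as a string)
dot_data = tree.export_graphviz(
    model,
    out_file=None,
    feature_names=feature_names,
    class_names=["0", "1"],
    filled=True,
)


def render_tree(dot_data, pdf_path="loans_tree.pdf", png_path="loans_tree.png"):
    """Draw the graph, save it as PDF and PNG, and return an image to display."""
    # Imported here so the DOT step above runs even without Graphviz;
    # in the notebook these imports already sit in the first cell.
    import pydotplus
    from IPython.display import Image

    # Draw graph
    graph = pydotplus.graph_from_dot_data(dot_data)

    # Saving the tree as PDF and PNG
    graph.write_pdf(pdf_path)
    graph.write_png(png_path)

    # Show graph (the returned Image renders when it is the cell's last value)
    return Image(graph.create_png())
```

Calling `render_tree(dot_data)` as the last line of a notebook cell would display the tree and write both files to the working directory.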



## Analysis Question

Finally, analyze the model's evaluation results and answer the following question.

* Would you trust this model to deploy a loan application approval solution in a bank?

 * **Your answer here**