# Analyzing the rule coverage of automated loan approval decisions
## Analyzing your IBM Digital Business Automation decisions in Python with pandas and Brunel

This Python 3.5 notebook shows how to load a rule coverage report produced by IBM DBA for a decision set. You learn how to apply analytics with the Brunel library to get insights into your automated decisions.
The decision set has been automated by running business rules on randomly generated loan applications. The rule execution statistics have been captured through IBM DBA and transformed into a CSV format. 

This notebook has been developed with a pandas dataframe and runs on Spark 2.1 or later.

The intent of applying data science to decisions is to check that decision automation works as expected. In other words, we want to check that the executed rules fit well with the segmentation of the data. From there we can potentially find optimizations to better automate your decision making. You will be able to extend the notebook to create new views on your decisions by using pandas dataframes and Brunel visualization capabilities.
    
To get the most out of this notebook, you should have some familiarity with the Python programming language.

## Contents 
This notebook contains the following main sections:

1. [Load the Loan Validation decision set.](#overview)
2. [View the rule coverage distribution.](#loanamountoncreditscoredistribution)
3. List the rules that are never executed.
4. List the rules that are always executed.
5. [Summary and next steps.](#next)

<a id="overview"></a>
## 1. Load the Loan Validation decision set.
The loan validation dataset has been generated with Operational Decision Manager as a CSV file.
The following code accesses this dataset file to construct a dataframe of rule execution counts for the processed loan applications.

In [1]:
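import pandas as pd
import brunel  # needed for the %brunel magic used later in this notebook

# Load the rule coverage CSV: one row per rule with its execution count.
df = pd.read_csv("https://raw.githubusercontent.com/ODMDev/decisions-on-spark/master/data/loanvalidation/1K/loanvalidation-rulecoverage-1K.csv")
# Number of decisions in the decision set, used later to compute percentages.
decisionCount = 1000
df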



Unnamed: 0,Rule,Count
0,computation.rate.7,0
1,computation.salary2score.5,43
2,insurance.insurance.8,3
3,computation.rate.10,0
4,insurance.insurance.11,57
5,computation.rate.21,0
6,insurance.insurance.6,45
7,computation.rate.18,0
8,computation.salary2score.3,6
9,insurance.insurance.2,29
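
The `Unnamed: 0` column in the output above is what pandas shows when the first CSV column has no header, which usually means the file stores a saved dataframe index. The following is a minimal sketch, assuming that layout. It uses a few rows copied from the table above as inline data instead of the remote file, and `sample_csv` and `sample_df` are names introduced here for illustration.

```python
from io import StringIO

import pandas as pd

# Rows copied from the table above. The empty first header is an assumption
# about how the CSV file stores its index column.
sample_csv = StringIO(
    ",Rule,Count\n"
    "0,computation.rate.7,0\n"
    "1,computation.salary2score.5,43\n"
    "2,insurance.insurance.8,3\n"
)

# index_col=0 makes pandas use the first column as the row index instead of
# exposing it as a regular column named "Unnamed: 0".
sample_df = pd.read_csv(sample_csv, index_col=0)
print(list(sample_df.columns))
```

Whether to load the real file this way is a matter of taste: the extra column is harmless for the analysis that follows.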


The dataframe captures the rule coverage of 1000 loan application decisions automated with business rules. The business rules determine eligibility based on credit score, loan amount, and income-to-debt ratio. The decision outcomes, represented by the approval and yearlyRepayment values, are not part of this coverage dataframe, which holds only the rule names and their execution counts.

The table above represents a rule coverage. Each row shows a business rule with its execution count. The rule naming conventions are:
   * packageName.ruleName
   * packageName.decisionTable.line
     
For example, computation.rate.10 means the 10th line of the decision table named "rate" in the "computation" package.
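
The naming conventions above can be sketched as a small parser. `parse_rule_name` is a hypothetical helper, not part of the notebook. It assumes that a trailing all-digit segment is a decision table line number and that everything before the asset name is the package path.

```python
def parse_rule_name(rule_name):
    """Split a rule name into (package, asset, line).

    Hypothetical helper. Assumptions: a trailing all-digit segment is a
    decision table line number, and everything before the asset name is
    the package path.
    """
    parts = rule_name.split(".")
    line = None
    # packageName.decisionTable.line: peel off the numeric line segment.
    if len(parts) > 2 and parts[-1].isdigit():
        line = int(parts.pop())
    asset = parts[-1]
    package = ".".join(parts[:-1])
    return package, asset, line


print(parse_rule_name("computation.rate.10"))
print(parse_rule_name("eligibility.checkCreditScore"))
```

Under these assumptions a plain rule yields `None` as its line. A name with a nested package would keep the full package path, which differs from simply taking the first two segments of the name.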

In [2]:
df['Percentage']= (df['Count']*100)/decisionCount
df

Unnamed: 0,Rule,Count,Percentage
0,computation.rate.7,0,0.0
1,computation.salary2score.5,43,4.3
2,insurance.insurance.8,3,0.3
3,computation.rate.10,0,0.0
4,insurance.insurance.11,57,5.7
5,computation.rate.21,0,0.0
6,insurance.insurance.6,45,4.5
7,computation.rate.18,0,0.0
8,computation.salary2score.3,6,0.6
9,insurance.insurance.2,29,2.9


In [3]:
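def getAssetPath(ruleName):
    # Split a qualified rule name such as "computation.rate.10" into its segments.
    return ruleName.split('.')

# The first segment is the package, the second is the rule or decision table name.
df['Package'] = df['Rule'].apply(lambda r: getAssetPath(r)[0])
df['Asset'] = df['Rule'].apply(lambda r: getAssetPath(r)[1])
df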

Unnamed: 0,Rule,Count,Percentage,Package,Asset
0,computation.rate.7,0,0.0,computation,rate
1,computation.salary2score.5,43,4.3,computation,salary2score
2,insurance.insurance.8,3,0.3,insurance,insurance
3,computation.rate.10,0,0.0,computation,rate
4,insurance.insurance.11,57,5.7,insurance,insurance
5,computation.rate.21,0,0.0,computation,rate
6,insurance.insurance.6,45,4.5,insurance,insurance
7,computation.rate.18,0,0.0,computation,rate
8,computation.salary2score.3,6,0.6,computation,salary2score
9,insurance.insurance.2,29,2.9,insurance,insurance
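
Once the package is available as a column, the coverage can be summarized per package. The following is a sketch on four rows copied from the table above. The names `sample`, `grouped`, and `by_package` are introduced here for illustration only.

```python
import pandas as pd

# Rows copied from the table above.
sample = pd.DataFrame({
    "Rule": ["computation.rate.7", "computation.salary2score.5",
             "insurance.insurance.8", "insurance.insurance.11"],
    "Count": [0, 43, 3, 57],
})
# Same derivation as in the notebook: the package is the first segment.
sample["Package"] = sample["Rule"].str.split(".").str[0]

# Per package: number of rules, total executions, and rules never executed.
grouped = sample.groupby("Package")["Count"]
by_package = pd.DataFrame({
    "rules": grouped.size(),
    "executions": grouped.sum(),
    "never_executed": grouped.apply(lambda c: int((c == 0).sum())),
})
print(by_package)
```

Applied to the full dataframe, the same grouping would show which packages concentrate the rules that never fire.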


<a id="loanamountoncreditscoredistribution"></a>
## 2. View the rule coverage distribution
This section visualizes the rule execution statistics metered when running a decision service. From these statistics you get insights on:
- the rules that are never executed. A rule is never executed against a data set when its conditions are never met, typically because it targets specific cases that do not appear in the data set. Otherwise, take a deeper look at your rules and data set.
- the rules that are always executed. A common case is a rule that checks the compliance of input parameters. But if a rule that should only be triggered on a subset of the requests is reported at 100%, check its conditions and the request set.
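
The two cases above, plus everything in between, can be sketched as a status label per rule. The sample rows are copied from tables in this notebook. The `Status` column and its three labels are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

decision_count = 1000  # same value as decisionCount in the notebook

# Rows copied from the tables in this notebook.
sample = pd.DataFrame({
    "Rule": ["computation.rate.7", "insurance.insurance.11",
             "validation.borrower.checkAge"],
    "Count": [0, 57, 1000],
})

# np.select returns the label of the first condition that holds for each
# row, and the default label when none holds.
sample["Status"] = np.select(
    [sample["Count"] == 0, sample["Count"] == decision_count],
    ["never", "always"],
    default="sometimes",
)
print(sample[["Rule", "Status"]])
```

A label column like this makes it easy to filter or color the charts by category.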

In [4]:
from matplotlib import rcParams
import matplotlib.pyplot as plt
import numpy as np

# Reset matplotlib to its default settings, then set the font size.
plt.rcdefaults()
#rcParams.update({'figure.autolayout': True})
rcParams.update({'font.size': 10})

plt.figure(figsize=(12, 16))
#plt.yscale('log', nonposy='clip')

rules = df['Rule']
percentages = df['Percentage']
y_pos = np.arange(len(rules))

# One horizontal bar per rule, sized by the share of decisions that executed it.
plt.barh(y_pos, percentages, align='center')
plt.yticks(y_pos, rules)
plt.xlabel('percentage')
plt.title('Rule Coverage')
plt.ylabel('rule')
plt.savefig('rulecoverage.jpg')
plt.show()

[Figure: horizontal bar chart titled "Rule Coverage", one bar per rule on the y-axis, execution percentage on the x-axis]

In [5]:
%brunel data('df') bar x(Rule) y(Count) color(Count:yellow-green) :: width=700, height=400


<a id="overview"></a>
## 3. Which rules are never executed?
Let's see the list of rules that were never executed in your decision set.

In [6]:
total_rows = df.shape[0]
print("The number of business rules is " + str(total_rows))

# Keep only the rules whose execution count is zero.
neverExecutedRules = df.loc[df['Count'] == 0]
total_never_executed = neverExecutedRules.shape[0]
print("Number of never executed business rules: " + str(total_never_executed))
print("Never executed business rules list: ")
neverExecutedRules

The number of business rules is 75
Number of never executed business rules: 24
Never executed business rules list: 


Unnamed: 0,Rule,Count,Percentage,Package,Asset
0,computation.rate.7,0,0.0,computation,rate
3,computation.rate.10,0,0.0,computation,rate
5,computation.rate.21,0,0.0,computation,rate
7,computation.rate.18,0,0.0,computation,rate
10,computation.bankruptcyScore.4,0,0.0,computation,bankruptcyScore
11,computation.rate.14,0,0.0,computation,rate
12,eligibility.checkCreditScore,0,0.0,eligibility,checkCreditScore
15,computation.rate.23,0,0.0,computation,rate
16,computation.rate.9,0,0.0,computation,rate
17,computation.rate.3,0,0.0,computation,rate


<a id="overview"></a>
## 4. Which rules are always executed?
Let's see the list of rules that were always executed in your decision set.

In [7]:
alwaysExecutedRules = df.loc[df['Count'] == decisionCount]
total_executed = alwaysExecutedRules.shape[0]
print("Number of always executed business rules: " + str(total_executed))
print("Always executed business rules list: ")
alwaysExecutedRules

Number of always executed business rules: 7
Always executed business rules list: 


Unnamed: 0,Rule,Count,Percentage,Package,Asset
19,validation.borrower.checkAge,1000,100.0,validation,borrower
22,validation.borrower.checkName,1000,100.0,validation,borrower
41,validation.loan.checkAmount,1000,100.0,validation,loan
44,validation.borrower.checkSSNdigits,1000,100.0,validation,borrower
48,validation.borrower.checkZipcode,1000,100.0,validation,borrower
57,validation.borrower.checkSSNareanumber,1000,100.0,validation,borrower
67,validation.test.checkTest,1000,100.0,validation,test


All these rules check the input parameters and are expected to be triggered for every request. A rule that is executed in 100% of the cases but is supposed to apply to a minority of decisions calls for a deeper look. Investigate the conditions of the rule, which may be too open for the pattern that you intend to match. Additionally, look at your request set to make sure that it covers the use cases that you expect to cover.
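
The check described above can be sketched as a filter that flags rules firing in nearly every decision while sitting outside the validation package. The 95% threshold and the `example.almostAlways` row are hypothetical and added only to illustrate the filter. The other two rows are copied from tables in this notebook.

```python
import pandas as pd

decision_count = 1000
threshold = 95.0  # hypothetical cut-off, in percent

sample = pd.DataFrame({
    "Rule": ["validation.loan.checkAmount", "insurance.insurance.11",
             "example.almostAlways"],  # last row is hypothetical
    "Count": [1000, 57, 980],
})
sample["Percentage"] = sample["Count"] * 100 / decision_count

# Assumption: rules in the "validation" package are expected to fire for
# every request, so only other rules above the threshold are flagged.
is_frequent = sample["Percentage"] >= threshold
is_validation = sample["Rule"].str.startswith("validation.")
suspects = sample[is_frequent & ~is_validation]
print(suspects)
```

The right threshold and the list of packages that are legitimately executed for every request depend on your decision service.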

<a id="next"></a>
## 5. Summary and next steps
You have manipulated a rule coverage dataframe coming from a decision set powered by IBM Digital Business Automation. The CSV file that counts the executed rules was previously obtained by capturing each decision with the IBM Business Automation Insights feature and reducing the rule execution details for a set of 1000 decisions. You have seen how to exploit this coverage to get basic insights into your decision logic. You can expand this notebook by adapting the views and adding new ones to get more perspectives on your decisions, and to make better decisions in the future.

<a id="authors"></a>
## Authors

Pierre Feillet is an engineer at the IBM Decision Lab. Pierre is an architect in decision automation, and is passionate about data science and machine learning.

Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.