# Weight of Evidence (WOE) and Information Value (IV)

The weight of evidence measures the predictive power of an independent variable in relation to the dependent variable.

WOE is easiest to understand in terms of events and non-events: an event is an observation where the target equals 1, and a non-event is one where it equals 0.

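$$ WOE = \ln \Bigg(\frac{\text{percent of non-events}}{\text{percent of events}}\Bigg)$$

Here "percent of events" is the share of all events that fall in the group, and likewise for non-events. Some implementations use the reciprocal ratio, ln(% of events / % of non-events), which flips the sign of every WOE value but leaves the Information Value unchanged; the `WoE` column in the output at the end of this notebook follows that reversed convention.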

## Steps of Calculating WOE


    1. For a continuous variable, split the data into 10 parts (or fewer, depending on the distribution).
    2. Calculate the number of events and non-events in each group (bin).
    3. Calculate the % of events and % of non-events in each group, i.e. the group's share of all events and of all non-events.
    4. Calculate WOE as the natural log of the ratio of % of non-events to % of events.

    For a categorical variable, you do not need to split the data: skip step 1 and follow the remaining steps.
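The four steps can be sketched in plain Python. This is a minimal illustration, not the notebook's `iv_woe` implementation: the helper names `equal_frequency_bins` and `woe_by_bin` and the toy `income` / `default` data are made up for the example. It uses only the standard library (in pandas, `pd.qcut` would typically handle step 1) and follows the formula above, ln(% of non-events / % of events).

```python
import math
from collections import defaultdict

def equal_frequency_bins(values, n_bins):
    """Step 1: assign each value a bin index 0..n_bins-1 by rank (hypothetical helper).

    Ties are split by position, which a production binning routine would avoid.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def woe_by_bin(bins, targets):
    """Steps 2-4: count events and non-events per bin, convert to shares, take the log ratio."""
    events = defaultdict(int)
    non_events = defaultdict(int)
    for b, y in zip(bins, targets):
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())
    woe = {}
    for b in sorted(set(bins)):
        pct_events = events[b] / total_events
        pct_non_events = non_events[b] / total_non_events
        # Raises an error if a bin has zero events or zero non-events
        woe[b] = math.log(pct_non_events / pct_events)
    return woe

# Toy data, invented for the example: 12 observations, 5 events
income = [12, 15, 18, 22, 25, 31, 36, 40, 47, 55, 61, 70]
default = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0]

bins = equal_frequency_bins(income, n_bins=3)
woe = woe_by_bin(bins, default)
for b, value in woe.items():
    print(b, round(value, 4))
```

With this convention, a negative WOE marks a bin where events are over-represented relative to non-events, as in the lowest income bin of the toy data.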

## Usage of WOE

Weight of Evidence (WOE) helps to transform a continuous independent variable into a set of groups (bins) whose members have a similar distribution of the dependent variable, i.e. a similar mix of events and non-events.

### For continuous independent variables

First, create bins (categories or groups) for the continuous independent variable. Then combine bins with similar WOE values and replace each bin with its WOE value. Use the WOE values rather than the raw input values in your model.

### For categorical independent variables

Combine categories with similar WOE, then replace each category with its WOE value. In other words, use WOE values rather than raw categories in your model. The transformed variable is continuous and can be treated like any other continuous variable.

Categories with similar WOE have almost the same proportion of events and non-events. In other words, they behave alike with respect to the target, which is why they can be merged.
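Replacing categories with their WOE values amounts to a lookup. The sketch below uses the per-category values printed for `rank` in the notebook output further down; the `raw_rank` sample is hypothetical.

```python
# WoE per category of `rank`, copied from the notebook output further down
rank_woe = {1: 0.929588, 2: 0.179558, 3: -0.435110, 4: -0.757142}

# Hypothetical raw values for six observations
raw_rank = [3, 1, 4, 2, 2, 3]

# Replace each raw category with its WoE value
encoded = [rank_woe[r] for r in raw_rank]
print(encoded)
```

On a pandas column the same lookup can be written as `mydata['rank'].map(rank_woe)`.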

## Rules related to WOE

    1. Each category (bin) should have at least 5% of the observations.
    2. Each category (bin) should contain at least one event and at least one non-event.
    3. The WOE should be distinct for each category. Similar groups should be aggregated.
    4. The WOE should be monotonic, i.e. either increasing or decreasing across the groupings.
    5. Missing values are binned separately.
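Rules 1, 2 and 4 are easy to check mechanically (a strictly monotonic WOE also implies rule 3). The function below is a hypothetical helper, not part of the notebook's library. It is applied to the event counts, non-event counts and WoE values that the notebook output further down reports for `gre` and `rank`.

```python
def check_bins(bins, min_share=0.05):
    """bins: list of (n_events, n_non_events, woe) in bin order. Returns a list of problems."""
    total = sum(e + ne for e, ne, _ in bins)
    problems = []
    for i, (e, ne, _) in enumerate(bins):
        if (e + ne) / total < min_share:
            problems.append(f"bin {i}: fewer than {min_share:.0%} of observations")
        if e == 0 or ne == 0:
            problems.append(f"bin {i}: zero events or zero non-events")
    woes = [w for _, _, w in bins]
    increasing = all(a < b for a, b in zip(woes, woes[1:]))
    decreasing = all(a > b for a, b in zip(woes, woes[1:]))
    if not (increasing or decreasing):
        problems.append("WOE is not monotonic across bins")
    return problems

# (events, non-events, WoE) per bin, copied from the notebook output
gre_bins = [(6, 42, -1.180625), (12, 39, -0.41337), (10, 14, 0.428812),
            (15, 36, -0.110184), (6, 23, -0.57845), (21, 32, 0.344071),
            (17, 28, 0.266294), (9, 11, 0.564614), (12, 32, -0.215545),
            (19, 16, 0.937135)]
rank_bins = [(33, 28, 0.929588), (54, 97, 0.179558),
             (28, 93, -0.43511), (12, 55, -0.757142)]

gre_problems = check_bins(gre_bins)
rank_problems = check_bins(rank_bins)
print(gre_problems)
print(rank_problems)
```

On these numbers the ten `gre` bins are flagged as non-monotonic, which suggests merging adjacent bins before using the WOE values, while the four `rank` categories pass all three checks.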


## Number of Bins (Groups)

In general, 10 or 20 bins are used. Ideally, each bin should contain at least 5% of the cases. The number of bins determines the amount of smoothing: the fewer the bins, the more smoothing.

## Handle Zero Event/ Non-Event

If a particular bin contains no events or no non-events, its WOE is undefined because the ratio involves a zero. The adjusted formula below avoids this by adding 0.5 to the number of events and the number of non-events in the group.

$$ \text{Adjusted WOE} = \ln \Bigg(\frac{(\text{non-events in group} + 0.5) / \text{total non-events}}{(\text{events in group} + 0.5) / \text{total events}}\Bigg)$$
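As a small sketch, the adjustment can be written as a function. The name `adjusted_woe` and the example counts are illustrative and do not come from the notebook's library.

```python
import math

def adjusted_woe(events_in_bin, non_events_in_bin, total_events, total_non_events):
    """WOE with 0.5 added to both bin counts so that empty cells stay finite."""
    pct_non_events = (non_events_in_bin + 0.5) / total_non_events
    pct_events = (events_in_bin + 0.5) / total_events
    return math.log(pct_non_events / pct_events)

# Hypothetical bin with 0 events and 20 non-events,
# in a sample of 127 events and 273 non-events
value = adjusted_woe(0, 20, 127, 273)
print(math.isfinite(value))
```

Without the 0.5 correction the same bin would require dividing by a zero share of events.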

## How to check correct binning with WOE

    1. The WOE should be monotonic, i.e. either increasing or decreasing across the bins. You can plot the WOE values against the bins and check the trend on the graph.
    
    2. Perform the WOE transformation after binning, then fit a logistic regression with the WOE-transformed variable as the only independent variable. If the slope is not 1 or the intercept is not ln(% of non-events / % of events), the binning algorithm is not good.
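A short derivation shows where the slope of 1 comes from. The symbols are introduced for this sketch only: $E_i$ and $N_i$ are the events and non-events in bin $i$, and $E$ and $N$ are the totals. It assumes the WOE values are computed on the same data the regression is fitted on, without regularization.

$$ \ln\frac{N_i}{E_i} = \ln\Bigg(\frac{N_i / N}{E_i / E}\Bigg) + \ln\frac{N}{E} = WOE_i + \ln\frac{N}{E} $$

The left-hand side is the observed log odds of a non-event in bin $i$, so a model for the non-event with WOE as its only input reproduces every bin exactly with slope 1 and intercept $\ln(N/E)$. If the model predicts the event instead, both signs flip: the slope is -1 and the intercept is $\ln(E/N)$. A fitted slope far from these values suggests that the WOE values do not match the data being modeled.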

## Benefits of WOE


    1. It can treat outliers. Suppose you have a continuous variable such as annual salary with extreme values of more than 500 million dollars. These values would be grouped into a single class (say, above 250 million dollars). Later, instead of using the raw values, we would use the WOE score of each class.
    
    2. It can handle missing values, as missing values can be binned separately.
    
    3. Since the WOE transformation handles categorical variables, there is no need for dummy variables.
    
    4. The WOE transformation helps you build a strictly linear relationship with the log odds. This is hard to achieve with other transformations such as log or square root. In short, without the WOE transformation you may have to try several transformation methods to get there.


## Information Value (IV)

Information value is one of the most useful techniques for selecting important variables in a predictive model. It helps to rank variables on the basis of their importance.

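$$ IV = \sum_{i} \Big(\text{percent of non-events}_i - \text{percent of events}_i\Big) \times WOE_i $$

where the sum runs over the bins $i$ of the variable.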
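As a worked example, the sketch below recomputes the IV of `rank` from the event and non-event counts printed in the notebook output further down. The function name `information_value` is illustrative. It uses the ln(% of non-events / % of events) convention from the top of this page, so its WOE values have the opposite sign to the `WoE` column in that output, while the IV is the same.

```python
import math

def information_value(bins):
    """bins: list of (n_events, n_non_events). Returns (per-bin WOE list, IV)."""
    total_events = sum(e for e, _ in bins)
    total_non_events = sum(ne for _, ne in bins)
    woes, iv = [], 0.0
    for e, ne in bins:
        pct_events = e / total_events
        pct_non_events = ne / total_non_events
        woe = math.log(pct_non_events / pct_events)
        woes.append(woe)
        iv += (pct_non_events - pct_events) * woe
    return woes, iv

# (events, non-events) for rank = 1, 2, 3, 4, copied from the notebook output
rank_counts = [(33, 28), (54, 97), (28, 93), (12, 55)]
woes, iv = information_value(rank_counts)
print(round(iv, 4))  # should be close to the 0.292044 reported in the notebook
```

Every term of the sum is non-negative because the difference and the log ratio always share the same sign, so IV cannot be negative under either sign convention.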

    Information Value    Variable Predictiveness

    Less than 0.02       Not useful for prediction
    0.02 to 0.1          Weak predictive Power
    0.1 to 0.3           Medium predictive Power
    0.3 to 0.5           Strong predictive Power
    > 0.5                Suspicious Predictive Power

In credit-scoring terms, where the Goods are the non-events and the Bads are the events, if the IV statistic is:

    Less than 0.02, the predictor is not useful for modeling (separating the Goods from the Bads)
    0.02 to 0.1, the predictor has only a weak relationship to the Goods/Bads odds ratio
    0.1 to 0.3, the predictor has a medium-strength relationship to the Goods/Bads odds ratio
    0.3 to 0.5, the predictor has a strong relationship to the Goods/Bads odds ratio
    Greater than 0.5, the relationship is suspiciously strong and the predictor should be double-checked
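The thresholds translate directly into a lookup function. This is a sketch: the name `iv_strength` is made up, and the treatment of values that fall exactly on a boundary (0.02, 0.1, 0.3, 0.5) is an assumption, since the table does not say which side they belong to.

```python
def iv_strength(iv):
    """Map an IV value to the label from the table above (boundary handling is assumed)."""
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive Power"
    if iv < 0.3:
        return "Medium predictive Power"
    if iv <= 0.5:
        return "Strong predictive Power"
    return "Suspicious Predictive Power"

# IV values reported for gre, gpa and rank in the notebook output
for name, value in [("gre", 0.312882), ("gpa", 0.27002), ("rank", 0.292044)]:
    print(name, iv_strength(value))
```

For the three notebook variables this yields the same labels as the `explain` column of the `iv` table shown further down.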


In [12]:
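%load_ext autoreload
%autoreload 2
import sys
# Raw string so the backslashes in the Windows path are not treated as escape sequences
sys.path.append(r'E:\gitlab\custom-script\script')
# `ursar` is a custom module, expected to live under the path added above
from ursar import fe, feature_importance, describe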



In [9]:
import pandas as pd
import numpy as np

In [10]:
# Read the data
mydata = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")

In [22]:
iv, woe = feature_importance.iv_woe(mydata, target='admit', bins=10, show_woe=True)

Information value of gre is 0.312882


Unnamed: 0,Variable,Cutoff,N,Events,% of Events,Non-Events,% of Non-Events,WoE,IV
0,gre,"(219.999, 440.0]",48,6,0.047244,42,0.153846,-1.180625,0.125857
1,gre,"(440.0, 500.0]",51,12,0.094488,39,0.142857,-0.41337,0.019994
2,gre,"(500.0, 520.0]",24,10,0.07874,14,0.051282,0.428812,0.011774
3,gre,"(520.0, 560.0]",51,15,0.11811,36,0.131868,-0.110184,0.001516
4,gre,"(560.0, 580.0]",29,6,0.047244,23,0.084249,-0.57845,0.021406
5,gre,"(580.0, 620.0]",53,21,0.165354,32,0.117216,0.344071,0.016563
6,gre,"(620.0, 660.0]",45,17,0.133858,28,0.102564,0.266294,0.008333
7,gre,"(660.0, 680.0]",20,9,0.070866,11,0.040293,0.564614,0.017262
8,gre,"(680.0, 740.0]",44,12,0.094488,32,0.117216,-0.215545,0.004899
9,gre,"(740.0, 800.0]",35,19,0.149606,16,0.058608,0.937135,0.085278


Information value of gpa is 0.27002


Unnamed: 0,Variable,Cutoff,N,Events,% of Events,Non-Events,% of Non-Events,WoE,IV
0,gpa,"(2.259, 2.9]",43,8,0.062992,35,0.128205,-0.710622,0.046342
1,gpa,"(2.9, 3.048]",37,11,0.086614,26,0.095238,-0.094917,0.000819
2,gpa,"(3.048, 3.17]",42,8,0.062992,34,0.124542,-0.681634,0.041955
3,gpa,"(3.17, 3.31]",42,10,0.07874,32,0.117216,-0.397866,0.015308
4,gpa,"(3.31, 3.395]",36,8,0.062992,28,0.102564,-0.487478,0.01929
5,gpa,"(3.395, 3.494]",40,14,0.110236,26,0.095238,0.146246,0.002193
6,gpa,"(3.494, 3.61]",41,16,0.125984,25,0.091575,0.318998,0.010976
7,gpa,"(3.61, 3.752]",39,20,0.15748,19,0.069597,0.816578,0.071764
8,gpa,"(3.752, 3.94]",42,13,0.102362,29,0.106227,-0.037062,0.000143
9,gpa,"(3.94, 4.0]",38,19,0.149606,19,0.069597,0.765285,0.06123


Information value of rank is 0.292044


Unnamed: 0,Variable,Cutoff,N,Events,% of Events,Non-Events,% of Non-Events,WoE,IV
0,rank,1,61,33,0.259843,28,0.102564,0.929588,0.146204
1,rank,2,151,54,0.425197,97,0.355311,0.179558,0.012548
2,rank,3,121,28,0.220472,93,0.340659,-0.43511,0.052295
3,rank,4,67,12,0.094488,55,0.201465,-0.757142,0.080997


In [23]:
iv

Unnamed: 0,Variable,IV,rank,explain
0,gre,0.312882,1.0,Strong predictive Power
1,gpa,0.27002,3.0,Medium predictive Power
2,rank,0.292044,2.0,Medium predictive Power


In [16]:
woe

Unnamed: 0,Variable,Cutoff,N,Events,% of Events,Non-Events,% of Non-Events,WoE,IV
0,gre,"(219.999, 440.0]",48,6,0.047244,42,0.153846,-1.180625,0.125857
1,gre,"(440.0, 500.0]",51,12,0.094488,39,0.142857,-0.41337,0.019994
2,gre,"(500.0, 520.0]",24,10,0.07874,14,0.051282,0.428812,0.011774
3,gre,"(520.0, 560.0]",51,15,0.11811,36,0.131868,-0.110184,0.001516
4,gre,"(560.0, 580.0]",29,6,0.047244,23,0.084249,-0.57845,0.021406
5,gre,"(580.0, 620.0]",53,21,0.165354,32,0.117216,0.344071,0.016563
6,gre,"(620.0, 660.0]",45,17,0.133858,28,0.102564,0.266294,0.008333
7,gre,"(660.0, 680.0]",20,9,0.070866,11,0.040293,0.564614,0.017262
8,gre,"(680.0, 740.0]",44,12,0.094488,32,0.117216,-0.215545,0.004899
9,gre,"(740.0, 800.0]",35,19,0.149606,16,0.058608,0.937135,0.085278
