# Modelling : Credit Card Routing for Online Purchase via Predictive Modelling

## Calculating entropy for dataset
The dataset records whether each transaction succeeded or failed, so the **probability** of success is the natural starting point for the model. Further, a mathematical function called **entropy**, which measures the degree of randomness in a dataset, can be applied here to select features & split the data.

**log2** = function imported from the `math` library for calculating logarithms with base 2

**entropy** = $-\sum_{i=1}^{n} P_i \log_2(P_i)$

**n** = total number of transaction types (2 in our case, i.e. success & failure), **i** = index of an individual transaction type, $P_i$ = probability of transaction type *i*
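As a quick check of the formula, the sketch below evaluates the two-class entropy for a few success probabilities. The function name `binary_entropy` and the sample probabilities are illustrative assumptions, not values taken from the dataset.

```python
from math import log2

def binary_entropy(p_success):
    """Entropy (in bits) of a two-class outcome with success probability p_success."""
    entropy = 0.0
    for p in (p_success, 1.0 - p_success):
        # By convention a class with zero probability contributes 0 (this also avoids log2(0))
        if p > 0:
            entropy -= p * log2(p)
    return entropy

# Illustrative probabilities only (not taken from the dataset)
for p in (0.5, 0.2, 1.0):
    print(p, format(binary_entropy(p), "0.3f"))
```

An even split between success and failure gives the maximum of 1 bit, while a dataset containing only one outcome gives 0.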

In [None]:
#Importing library for calculating log with base 2
from math import log2
#Defining entropy function for k successes out of n records
def class_entropy(k,n):
    #Probability of success
    P_Success = k/n
    #Probability of failure
    P_Fail = (n-k)/n
    #Applying entropy formula; a zero probability contributes 0, which also avoids log2(0)
    entropy=sum(-p*log2(p) for p in (P_Success,P_Fail) if p>0)
    return entropy
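#Calculating total number of records in the dataset
#(pd_merge_t_fee_sorted is assumed to be the merged dataframe returned by merge_transac_fee)
N=len(pd_merge_t_fee_sorted)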
#Calculating total number of transaction success records in the dataset
K=len(pd_merge_t_fee_sorted[pd_merge_t_fee_sorted["success"]==1])
#Calling entropy function
Total_entropy_value=class_entropy(K, N)
print("The total class entropy is",format(Total_entropy_value,"0.3f"))

## Calculating Information gain based on entropy
It is assumed that **PSP** & **Country** are the two deciding features used for splitting the data; the priority of these features can be determined through **information gain**.
The entropy for each unique value of a feature is calculated separately by calling the entropy function on the matching records. The weighted sum of these entropies gives the total entropy of the feature, and the information gain is the total class entropy minus this weighted sum. The weighted sum, called **split info** here, is stored in **Total_feature_entropy_value** for each feature.  

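**Total_feature_entropy_value** = $\sum_{j} P_j \cdot entropy_j$, where *j* runs over the unique values of the feature, $P_j$ is the probability weight of value *j* relative to the parent dataset and $entropy_j$ is the entropy of the records with that value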
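To make the calculation concrete, here is a minimal sketch of information gain in plain Python, assuming each subset is weighted by its share of the parent records. The helper names and the toy records (PSPs "A"/"B", countries "X"/"Y") are hypothetical and only serve as an illustration.

```python
from collections import defaultdict
from math import log2

def entropy_from_counts(k, n):
    """Two-class entropy for k successes out of n records (0 when n is 0)."""
    if n == 0:
        return 0.0
    result = 0.0
    for count in (k, n - k):
        p = count / n
        if p > 0:
            result -= p * log2(p)
    return result

def information_gain(records, feature):
    """Parent entropy minus the weighted entropy of the subsets split by feature."""
    total = len(records)
    successes = sum(r["success"] for r in records)
    # Collect [record count, success count] for each unique feature value
    groups = defaultdict(lambda: [0, 0])
    for r in records:
        groups[r[feature]][0] += 1
        groups[r[feature]][1] += r["success"]
    # Weight each subset entropy by the subset's share of the parent records
    weighted = sum((n / total) * entropy_from_counts(k, n) for n, k in groups.values())
    return entropy_from_counts(successes, total) - weighted

# Hypothetical toy records, made up for illustration only
toy_records = [
    {"PSP": "A", "country": "X", "success": 1},
    {"PSP": "A", "country": "Y", "success": 1},
    {"PSP": "B", "country": "X", "success": 0},
    {"PSP": "B", "country": "Y", "success": 0},
]
for name in ("PSP", "country"):
    print("Information Gain for", name, "is", format(information_gain(toy_records, name), "0.3f"))
```

In this toy data the PSP separates successes from failures perfectly, so its gain equals the parent entropy, whereas the country leaves each subset evenly mixed and yields a gain of 0.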


In [None]:
#Calculating information gain for the features passed in feature list.
def info_gain(features_list):
    #Iterating through all the features in the list
    for i in range(len(features_list)):
        #Collecting unique values for each feature in a separate list
        feature_list=list(set(pd_merge_t_fee_sorted[features_list[i]]))
        #Initializing individual lists for probability within parent dataset & entropy, and a variable for the split info value
        feature_probability=[]
        feature_entropy_list=[]
        Total_feature_entropy_value=0
        #Iterating through all the unique values of the current feature
        for j in range(len(feature_list)):
            #Collecting sample size of total & successful data for each feature
            n=len(pd_merge_t_fee_sorted[pd_merge_t_fee_sorted[features_list[i]]==feature_list[j]])
            k=len(pd_merge_t_fee_sorted[(pd_merge_t_fee_sorted[features_list[i]]==feature_list[j]) & (pd_merge_t_fee_sorted["success"]==1)])
            #Calculating the weight of this value as its share of the total sample size (n/N) and appending to the list
            feature_probability.append(n/N)
            #Calling class entropy based on child dataset specific to sample size of that feature and appending to the list
            feature_entropy_list.append(class_entropy(k,n))
        #Iterating through all the entropy values calculated above
        for j in range(len(feature_entropy_list)):
            #Calculating split info based on parent sample size for each feature by multiplying with the total probability
            Total_feature_entropy_value+=feature_entropy_list[j]*feature_probability[j]
        #Calculating information gain for each feature
        feature_info_gain=Total_entropy_value-Total_feature_entropy_value
        print("Information Gain for",features_list[i],"is",format(feature_info_gain,"0.3f"))
#Selecting below features for performing the analysis
features_list=["PSP","country"]
#Calling the function
info_gain(features_list)

## Determination of transaction fee using Predictive Model

The predictive model is built in the following steps:

**Data transformation**: Here, the given data is converted into a dataframe in which the number of attempts for each repeated transaction is recorded.

**Objective function**: The objective function is determined by the conditions under which the business can generate better revenue by lowering the transaction fees for each PSP.

**Model Creation**: This step performs the predictive analysis, in which the predicted PSP & predicted transaction amount are calculated from multiple dataframes through a series of operations.
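The first and last steps can be sketched in plain Python. The sketch assumes, as the notebook code further down does, that repeated attempts of one purchase appear as consecutive rows with the same amount, and that the PSPs are tried in the fixed rotation Simplecard, UK_Card, Moneycard, Goldcard. The function names and the sample amounts are hypothetical.

```python
# Rotation order taken from the notebook's predicted_PSP function
PSP_ROTATION = ["Simplecard", "UK_Card", "Moneycard", "Goldcard"]

def count_attempts(amounts):
    """Return the attempt number per row, restarting at 1 whenever the amount changes."""
    attempts = []
    for i, amount in enumerate(amounts):
        if i > 0 and amount == amounts[i - 1]:
            # Same amount as the previous row: treat it as a retry
            attempts.append(attempts[-1] + 1)
        else:
            attempts.append(1)
    return attempts

def psp_for_attempt(attempt):
    """Map attempt 1, 2, 3, 4, 5, ... onto the rotation (attempt 5 wraps back to the first PSP)."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return PSP_ROTATION[(attempt - 1) % len(PSP_ROTATION)]

# Hypothetical amounts, made up for illustration only
sample_amounts = [89, 89, 238, 124, 124, 124]
for amount, attempt in zip(sample_amounts, count_attempts(sample_amounts)):
    print(amount, attempt, psp_for_attempt(attempt))
```

The modulo lookup has no upper bound on the attempt number, unlike the notebook's ranges, which stop below 1000. Note also that matching on the amount alone would treat two unrelated purchases as retries if they share an amount and sit on adjacent rows.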

**dataframe.insert(a,b,c)** = inserts column *b* at index *a* with *c* as its initial value

**np.arange(a,b,c)** = generates an array of numbers from *a* up to, but not including, *b* with a common difference of *c* between the values

**dataframe.rename(columns={a:b},inplace=True)**=renames column header from *a* to *b*.
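The three helpers can be tried on a small dataframe before they are used in the model. The dataframe `demo` and its values below are made up for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column dataframe, for illustration only
demo = pd.DataFrame({"amount": [89, 89, 238], "PSP": ["UK_Card", "UK_Card", "Moneycard"]})

# Insert a column named "try_count" at position 1, filled with the initial value 1
demo.insert(1, "try_count", 1)

# Rename the "PSP" column header in place
demo.rename(columns={"PSP": "predicted_PSP"}, inplace=True)
print(list(demo.columns))

# np.arange returns an array that starts at 1, steps by 4 and excludes the stop value 14
steps = np.arange(1, 14, 4)
print(steps.tolist())
```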

In [None]:
import pandas as pd
from math import log2
import numpy as np

#Function which returns dataframe of transaction fee merged with each transaction
def merge_transac_fee(df_original,df_transac_fee,PSP,transaction_fees_col):
    #Filtering failed records from all the given original transaction dataset
    df_original_fail=df_original[df_original["success"]==0]
    #dropping column of success transaction fees from the provided reference
    df_transac_fee_fail=df_transac_fee.drop(columns=("Success"))
    #Merging the filtered failed transaction dataset with the reference of transaction fees
    pd_merge_fail=pd.merge(df_original_fail, df_transac_fee_fail, on=PSP,how="inner")
    #Renaming the column to the value passed in the function
    pd_merge_fail.rename(columns={"Fail":transaction_fees_col},inplace=True)
    #Repeating above steps to merge the transaction fees in a dataframe for successful records
    df_original_success=df_original[df_original["success"]==1]
    df_transac_fee_success=df_transac_fee.drop(columns=("Fail"))
    pd_merge_success=pd.merge(df_original_success, df_transac_fee_success, on=PSP,how="inner")
    pd_merge_success.rename(columns={"Success":transaction_fees_col},inplace=True)
    #Concatenating child datasets for successful & failed transactions to generate parent dataframe
    pd_merge_t_fee=pd.concat([pd_merge_fail,pd_merge_success])
    #Sorting the parent dataframe based on timestamp & returning
    pd_merge_t_fee_sorted=pd_merge_t_fee.sort_values(by=["time_diff_cum"])
    return(pd_merge_t_fee_sorted)

#Function to obtain the number of attempts made to complete the same transaction
def try_count(merge_transac_fee_dframe):
    #Dropping unnecessary columns
    merge_transac_fee_dframe=merge_transac_fee_dframe.drop(columns=(["Unnamed: 0","tmsp","3D_secured","card"]))
    #Inserting new column called as try count
    merge_transac_fee_dframe.insert(6, "try_count",1)
    #Iterating through entire dataframe
    for i in range(1,len(merge_transac_fee_dframe)):
           #Checking for consecutive transactions with same amount (column at index 1)
           if(merge_transac_fee_dframe.iat[i, 1]==merge_transac_fee_dframe.iat[i-1, 1]):
               #Incrementing count of attempts under try count column
               merge_transac_fee_dframe.iat[i,6]=merge_transac_fee_dframe.iat[i-1,6]+1
    #Returning final dataframe with number of tries
    return(merge_transac_fee_dframe)

#Function for predicting PSP with minimum transaction fee
def predicted_PSP(data_try_count):
    #Inserting new column for determining predicted PSP
    data_try_count.insert(7,"predicted_PSP","predicted_PSP")
    #Attempt numbers 1, 5, 9, ... (below 1000) are routed to Simplecard
    Simplecard=np.arange(1,1000,4)
    #Attempt numbers 2, 6, 10, ... are routed to UK_Card
    UK_card=np.arange(2,1000,4)
    #Attempt numbers 3, 7, 11, ... are routed to Moneycard
    Moneycard=np.arange(3,1000,4)
    #Attempt numbers 4, 8, 12, ... are routed to Goldcard
    Goldcard=np.arange(4,1000,4)
    #Iterating through the dataframe having number of try counts
    for i in range(len(data_try_count)):
        #Identifying & updating predicted_PSP value based on matched condition
        if(data_try_count.iat[i,6] in(Simplecard)):
            data_try_count.iat[i,7]="Simplecard"
        if(data_try_count.iat[i,6] in(UK_card)):
            data_try_count.iat[i,7]="UK_Card"
        if(data_try_count.iat[i,6] in(Moneycard)):
            data_try_count.iat[i,7]="Moneycard"
        if(data_try_count.iat[i,6] in(Goldcard)):
            data_try_count.iat[i,7]="Goldcard"
    #Returning updated dataframe with predicted PSP
    predicted_card_count=data_try_count
    return(predicted_card_count)

#Function for calculating predicted transaction fees
def predicted_transaction_amount(predicted_PSP,transac_fee):
    #Calling merge function to obtain transaction fees from predicted PSP
    return(merge_transac_fee(predicted_PSP,transac_fee,"predicted_PSP","predicted_transaction_fees"))
#Loading the original dataset & the PSP transaction fees provided by IU from the local system into dataframes
df_original=pd.DataFrame(pd.read_csv("C:\\MISC\\IU_downloads\\PSP_Jan_Feb_2019_original.csv"),index=None)
df_transac_fee=pd.DataFrame(pd.read_csv("C:\\MISC\\IU_downloads\\PSP_Jan_Feb_2019_transac_fees.csv"),index=None)
#Calling predicted PSP function by passing above dataframes as arguments
predicted_PSP_data=predicted_PSP(try_count(merge_transac_fee(df_original,df_transac_fee,"PSP","original_transaction_fees")))
#Renaming PSP column to predicted PSP & passing to a function for predicting new transaction amount
df_transac_fee.rename(columns={"PSP":"predicted_PSP"},inplace=True)
pred_t_fee_dframe=predicted_transaction_amount(predicted_PSP_data,df_transac_fee)