## Identify Fraud Points 

Credit card skippers/defaulters:

--> Assign 1 point to customer for short payment, where a short payment means when customer fails to clear atleast 70% of its monthly spends.

--> Assign 1 point to customer where he has spent 100% of his max_limit but did not clear the full amount.

--> If for any month customer is meeting both the above conditions,assign 1 additional point.

--> Sum up all the points for a customer and output in file.




Loan file key points:

--> For Personal loan category, Bank does not accept short or late payments.
   If a person has not paid monthly installment then that month's entry won't be present in the file.

--> For Medical loan, Bank does accepts late payments but it should be the full amount. It is assumed that there is every month's data/record for Medical Loan.



Loan defaulters:

--> Medical Loan defaulters : If customer has made a total of 3 or more late payments.

--> Personal Loan defaulters : If customer has missed a total of 4 or more installments OR missed 2 consecutive installments.

In [None]:
import apache_beam as beam
from datetime import datetime

In [32]:
def calculate_points(element):
    customer_id, first_name, last_name, relationship_id, card_type, max_limit, spent, cash_withdrawn, payment_cleared, payment_date = element.split(',')
    spent = int(spent)
    payment_cleared = int(payment_cleared)
    max_limit = int(max_limit)
    
    key_name = customer_id + ',' + first_name + ',' + last_name
    defaulter_points = 0
    
    if (payment_cleared > (spent*0.7)):
        defaulter_points += 1
    if (spent == max_limit) and (payment_cleared < spent):
        defaulter_points += 1
    if (spent == max_limit) and (payment_cleared < (spent*0.7)):
        defaulter_points += 1
    return key_name, defaulter_points

def calculate_month(input_list): #input --> [CT88330,Humberto,Banks,Serviceman,LN_1559,Medical Loan,26-01-2018, 2000, 30-01-2018]
    payment_date = datetime.strptime(input_list[8].rstrip().lstrip(), '%d-%m-%Y') #get array of index 8 (payment_date) convert to month
    input_list.append(str(payment_date.month)) #append month to last array [CT88330,Humberto,Banks,Serviceman,LN_1559,Medical Loan,26-01-2018, 2000, 30-01-2018, 1]
    
    return input_list
    
def calculate_late_payment(elements):
    due_date = datetime.strptime(elements[6].rstrip().lstrip(), '%d-%m-%Y')
    payment_date = datetime.strptime(elements[8].rstrip().lstrip(), '%d-%m-%Y')

    if payment_date <= due_date:
        elements.append('0')
    else:
        elements.append('1')
    return elements

def calculate_personal_loan_defaulter(input): #input -->  CT6855, Ronald Chiki value --> [01,05,06,07,08,09,10,11,12]
    max_allowed_missed_months = 4
    max_allowed_consecutive_missing = 2
    
    name, months_list = input                 #input [CT6855, Ronald, Chiki, Serviceman, LN_8460, Personal Loan, 25-01-2018, 50000, 25-01-2018]
    months_list.sort()
    sorted_months = months_list
    total_payments = len(sorted_months)
    
    missed_payments = 12 - total_payments
    
    if missed_payments > max_allowed_missed_months:
        return name, missed_payments
    
    consecutive_missed_months = 0
    
    temp = sorted_months[0] - 1
    if temp > consecutive_missed_months:
        consecutive_missed_months = temp
    
    temp = 12 - sorted_months[total_payments-1]
    if temp > consecutive_missed_months:
        consecutive_missed_months = temp
    
    for i in range(1, len(sorted_months)):
        temp = sorted_months[i] - sorted_months[i-1] -1
        if temp > consecutive_missed_months:
            consecutive_missed_months = temp
    
    if consecutive_missed_months > max_allowed_consecutive_missing:
        return name, consecutive_missed_months
    return name, 0
    
def format_output(sum_pair):
    key_name, miss_months = sum_pair
    return str(key_name) + ',' + str(miss_months) + ' missed'

def format_result(sum_pair):
    key_name, points = sum_pair
    return str(key_name) + ',' + str(points) + ' fraud_points'


def return_tuple(element):
    temp_tuple=element.split(',')
    return(temp_tuple[0], temp_tuple[1:])
    
with beam.Pipeline() as p:
    card_defaulter = (
        p
        | "read CC data" >> beam.io.ReadFromText('./testdata/beam_data/bank/bank/cards.txt', skip_header_lines=1)
        | "calc default point" >> beam.Map(calculate_points)
        | "Sum default total" >> beam.CombinePerKey(sum)
        | "filter card defaulter" >> beam.Filter(lambda element: element[1] > 0)
        | "output" >> beam.Map(format_result)
        | "output on tuple" >> beam.Map(return_tuple)
        #| "output file" >> beam.io.WriteToText('./testdata/beam_data/bank/bank/output/default')
    )
    medical_loan_defaulter = (
        p
        | beam.io.ReadFromText('./testdata/beam_data/bank/bank/loan.txt')
        | "split row" >> beam.Map(lambda row: row.split(','))
        | "filter medical" >> beam.Filter(lambda element: (element[5]).rstrip().lstrip() == 'Medical Loan')
        | "calculate late payment" >> beam.Map(calculate_late_payment)
        | "make key value pairs" >> beam.Map(lambda element: (element[0] + ', ' + element[1]+ ' ' + element[2],int(element[9]))) #id, first last name, number of missed
        | "group medical loan based on month" >> beam.CombinePerKey(sum)
        | "format medical loan output" >> beam.Map(format_output)
        #| "output file_med" >> beam.io.WriteToText('./testdata/beam_data/bank/bank/output/loanmed')
    )
    
    personal_loan_defaulter = (
        p
        | "read" >> beam.io.ReadFromText('./testdata/beam_data/bank/bank/loan.txt')
        | "split" >> beam.Map(lambda row: row.split(','))
        | "filter personal" >> beam.Filter(lambda element: (element[5]).rstrip().lstrip() == 'Personal Loan')
        | "split and append new months" >> beam.Map(calculate_month)
        | "make key value pairs loan" >> beam.Map(lambda elements: (elements[0] + ', ' +elements[1] + ' ' +elements[2], int(elements[9])))
        | "group personal loan based on month" >> beam.GroupByKey()
        | "check for personal loan defaulter" >> beam.Map(calculate_personal_loan_defaulter)
        | "filter only defaulter" >> beam.Filter(lambda element:element[1]>0)
        | "format personal loan output" >> beam.Map(format_output)
        #| "output file_personal" >> beam.io.WriteToText('./testdata/beam_data/bank/bank/output/loanpersonal')
    )
    final_loan_defaulter = (
        (personal_loan_defaulter, medical_loan_defaulter)
        | "union both defaulter" >> beam.Flatten()
        #| "test" >> beam.Map(print)
        | "output tuple" >> beam.Map(return_tuple)
    )
    # join for card defaulter and flattened result of both loan defaulter
    both_defaulters = (
        {'card_defaulter':card_defaulter, 'loan_defaulter':final_loan_defaulter}
        | "Join" >> beam.CoGroupByKey()
        #| "output file_personal" >> beam.io.WriteToText('./testdata/beam_data/bank/bank/output/bothdefaulter')
    )
    

In [23]:
!powershell -Command "Get-Content -TotalCount 5 './testdata/beam_data/bank/bank/output/default-00000-of-00001'"

CT28383,Miyako,Burns,9 fraud_points
CT74474,Nanaho,Brennan,9 fraud_points
CT66322,Chris,Bruce,11 fraud_points
CT65528,Bonnie,Barlow,10 fraud_points
CT84463,Isaac,Bowman,8 fraud_points


In [24]:
!powershell -Command "Get-Content -TotalCount 5 './testdata/beam_data/bank/bank/output/loanmed-00000-of-00001'"

CT88330, Humberto Banks,7 missed
CT71222, Josephine Barr,9 missed
CT14299, Miyuki Brooks,6 missed
CT63122, Etsuko Branch,5 missed
CT12439, Shary Cash,7 missed


In [25]:
!powershell -Command "Get-Content -TotalCount 5 './testdata/beam_data/bank/bank/output/loanpersonal-00000-of-00001'"

CT68554, Ronald Chiki,3 missed
CT56276, Fay Carr,10 missed
CT30950, Arlene Calderon,10 missed
CT27126, Nicole Acevedo,6 missed
CT29233, Wilma Abbott,5 missed


In [31]:
!powershell -Command "Get-Content -TotalCount 5 './testdata/beam_data/bank/bank/output/bothdefaulter-00000-of-00001'"

('CT28383', {'card_defaulter': [['Miyako', 'Burns', '9 fraud_points']], 'loan_defaulter': []})
('CT74474', {'card_defaulter': [['Nanaho', 'Brennan', '9 fraud_points']], 'loan_defaulter': [[' Nanaho Brennan', '5 missed']]})
('CT66322', {'card_defaulter': [['Chris', 'Bruce', '11 fraud_points']], 'loan_defaulter': [[' Chris Bruce', '8 missed']]})
('CT65528', {'card_defaulter': [['Bonnie', 'Barlow', '10 fraud_points']], 'loan_defaulter': []})
('CT84463', {'card_defaulter': [['Isaac', 'Bowman', '8 fraud_points']], 'loan_defaulter': [[' Isaac Bowman', '3 missed']]})
