# Bank Assignment

## Credit card skippers/defaulters:

- Assign 1 point to a customer for a short payment, i.e. a month in which the customer clears less than 70% of that month's spend.

- Assign 1 point to a customer who has spent 100% of the max_limit but did not clear the full amount.

- If a customer meets both of the above conditions in the same month, assign 1 additional point.

- Sum all the points per customer and write the top 10 card skippers to a file.
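
The scoring rules above can be sketched as a small pure function for a single customer and month. This is only an illustration under stated assumptions: the name `monthly_points` and its parameters are hypothetical (not part of the notebook), and all amounts are assumed to be non-negative numbers.

```python
def monthly_points(spent, payment_cleared, max_limit):
    """Return the skipper points earned in one month (0 to 3)."""
    # Rule 1: short payment, less than 70% of the month's spend was cleared.
    short_payment = payment_cleared < 0.7 * spent
    # Rule 2: the full limit was spent and some amount is still pending.
    limit_unpaid = spent == max_limit and payment_cleared < spent

    points = int(short_payment) + int(limit_unpaid)
    # Rule 3: both conditions in the same month earn one additional point.
    if short_payment and limit_unpaid:
        points += 1
    return points


print(monthly_points(490, 101, 500))  # short payment only
print(monthly_points(500, 400, 500))  # limit reached, not fully cleared
print(monthly_points(500, 101, 500))  # both conditions plus the extra point
print(monthly_points(300, 300, 500))  # fully cleared
```

A customer's final score is the sum of these monthly values over all of that customer's records.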

## Loan Defaulters:

- For the Personal Loan category, the bank does not accept short or late payments. If a customer has not paid a monthly installment, that month's entry is absent from the file.

- For the Medical Loan category, the bank accepts late payments, but only for the full amount. It is assumed that a record exists for every month of a Medical Loan.

- Medical Loan defaulters: customers who have made a total of 3 or more late payments.

- Personal Loan defaulters: customers who have missed a total of 4 or more installments OR missed 2 consecutive installments.
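
The two defaulter rules above can be sketched in plain Python before wiring them into a pipeline. This is a minimal illustration under stated assumptions: the function names are hypothetical, the data is assumed to cover a single 12-month period, and months are assumed to be numbered 1 to 12.

```python
def medical_defaulter(late_flags):
    """late_flags: one boolean per monthly payment, True if it was paid late."""
    return sum(1 for late in late_flags if late) >= 3


def longest_missed_run(paid_months, months_in_period=12):
    """Length of the longest streak of consecutive months with no payment."""
    paid = set(paid_months)
    longest = current = 0
    for month in range(1, months_in_period + 1):
        if month in paid:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def personal_defaulter(paid_months, months_in_period=12):
    """True if 4 or more installments were missed, or 2 in a row."""
    missed_total = months_in_period - len(set(paid_months))
    return missed_total >= 4 or longest_missed_run(paid_months, months_in_period) >= 2


# Months 4 and 5 are missing: only 2 missed in total, but they are consecutive.
print(personal_defaulter([1, 2, 3, 6, 7, 8, 9, 10, 11, 12]))
# Months 2, 4 and 6 are missing: 3 missed in total, never 2 in a row.
print(personal_defaulter([1, 3, 5, 7, 8, 9, 10, 11, 12]))
# Three late payments out of four.
print(medical_defaulter([True, False, True, True]))
```

Walking over every month of the period makes the gaps before the first payment and after the last payment count as missed installments too.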

In [1]:
!pip install --quiet apache-beam

In [4]:
import apache_beam as beam
from datetime import datetime


def calculate_points(element):
  # Example line: CT28383,Miyako,Burns,R_7488,Issuers,500,490,38,101,30-01-2018
  (customer_id, first_name, last_name, relationship_id, card_type,
   max_limit, spent, cash_withdrawn, payment_cleared, payment_date) = element.split(',')

  spent = int(spent)                      # e.g. 490
  payment_cleared = int(payment_cleared)  # e.g. 101
  max_limit = int(max_limit)              # e.g. 500

  key_name = customer_id + ', ' + first_name + ' ' + last_name  # e.g. "CT28383, Miyako Burns"
  defaulter_points = 0

  # Short payment: less than 70% of the month's spend was cleared.
  short_payment = payment_cleared < (spent * 0.7)
  # Limit reached: 100% of max_limit was spent and some amount is still pending.
  limit_unpaid = (spent == max_limit) and (payment_cleared < spent)

  if short_payment:
    defaulter_points += 1
  if limit_unpaid:
    defaulter_points += 1
  # Both conditions in the same month earn one additional point.
  if short_payment and limit_unpaid:
    defaulter_points += 1

  return key_name, defaulter_points


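def is_medical_defaulter(record):
  # record[6] holds the due date and the last field the payment date (dd-mm-yyyy).
  due_date = datetime.strptime(record[6].strip(), "%d-%m-%Y")
  payment_date = datetime.strptime(record[-1].strip(), "%d-%m-%Y")
  # Flag a late payment with 1, otherwise 0. Return a new list instead of
  # appending in place, because Beam transforms should not mutate their input.
  late_flag = 1 if payment_date > due_date else 0
  return record + [late_flag]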


def format_defaulters(record):
  return record[0] + " has " + str(record[1]) + " defaults"


def format_result(sum_pair):
  key_name, points = sum_pair
  return str(key_name) + ', ' + str(points) + ' fraud_points' 


def extract_month(record):
  payment_date = datetime.strptime(record[-1].strip(), "%d-%m-%Y")
  return (record[0] + ", " + record[1] + " " + record[2], payment_date.month)


def calc_num_defaults(record):
  key, months = record
  # Build a sorted list instead of sorting in place: the grouped values
  # may not be a mutable list on every runner.
  months = sorted(months)
  max_total_defaults = 4
  total_defaults = 12 - len(months)  # months with no payment record

  if total_defaults >= max_total_defaults:
    return (key, total_defaults)

  # Longest run of missed months: the gap before the first payment,
  # the gap after the last payment, and the gaps between payments.
  consecutive_defaults = max(months[0] - 1, 12 - months[-1])
  for i in range(1, len(months)):
    consecutive_defaults = max(consecutive_defaults, months[i] - months[i - 1] - 1)

  return (key, consecutive_defaults)



def sorting_output(record):
  key, sort_data = record
  # Sort a copy by count, highest first, and keep the top 10 entries.
  return sorted(sort_data, key=lambda x: x[1], reverse=True)[:10]


class MyTransform(beam.PTransform):
  def expand(self, input_collection):
    return (
        input_collection 
        | "Mapping each record with 1" >> beam.Map(lambda record: (1, record))
        | "Grouping all records to a single record" >> beam.GroupByKey()
        | "Sorting Records and getting to new line" >> beam.FlatMap(sorting_output)
        | "Formatting output" >> beam.Map(format_defaulters)
        
    )


with beam.Pipeline() as p:
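  card_defaulter = (
      p
      | 'Read credit card data' >> beam.io.ReadFromText('cards.txt', skip_header_lines=1)
      | 'Calculate defaulter points' >> beam.Map(calculate_points)
      | 'Combine points for defaulters' >> beam.CombinePerKey(sum)
      | 'Filter card defaulters' >> beam.Filter(lambda element: element[1] > 0)
      # Keep only the 10 customers with the most points, as the task requires.
      | 'Key all card records together' >> beam.Map(lambda element: (1, element))
      | 'Group card records' >> beam.GroupByKey()
      | 'Keep top 10 card skippers' >> beam.FlatMap(sorting_output)
      | 'Format output' >> beam.Map(format_result)
      | 'Write credit card data' >> beam.io.WriteToText('outputs/card_skippers')
  )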


  medical_loan_defaulters = (
      p | "First Data Read" >> beam.io.ReadFromText("loan.txt", skip_header_lines=1)
        | beam.Map(lambda record: record.split(","))
        | beam.Filter(lambda record: record[5].strip() == "Medical Loan")
        | beam.Map(is_medical_defaulter)
        | beam.Map(lambda record: (record[0] + ", " + record[1] + " " + record[2], record[-1]))
        | beam.CombinePerKey(sum)
        # Medical Loan defaulters are customers with 3 or more late payments.
        | "Keep medical loan defaulters" >> beam.Filter(lambda record: record[1] >= 3)
        | "First use of custom transform" >> MyTransform()
  )

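  personal_loan_defaulters = (
      p | "Second Data Read" >> beam.io.ReadFromText("loan.txt", skip_header_lines=1)
        | beam.Map(lambda record: record.split(","))
        | beam.Filter(lambda record: record[5].strip() == "Personal Loan")
        | beam.Map(extract_month)
        | beam.GroupByKey()
        | beam.Map(calc_num_defaults)
        # calc_num_defaults returns the total when 4 or more installments were missed,
        # otherwise the longest consecutive run, so >= 2 covers both defaulter rules.
        | "Keep personal loan defaulters" >> beam.Filter(lambda record: record[1] >= 2)
        | "Second use of custom transform" >> MyTransform()
  )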

  all_defaulters = (
      (medical_loan_defaulters, personal_loan_defaulters) | beam.Flatten()
                                                          | beam.io.WriteToText("output")
  )

