# Credit Card Fraud

Questions to answer: 

1) Your boss wants to identify the users in your dataset who never went above their monthly credit card limit, so that their limit can be increased automatically. Can you send her the list of IDs?

2) She also wants you to implement an algorithm that triggers an alert as soon as a user goes above their monthly limit, so that the user can be notified. Build a function that, for each day, returns a list of users who went above their monthly credit card limit on that day.

3) Finally, your boss is very concerned about fraud because it is a huge cost for credit card companies. She wants you to implement an unsupervised algorithm that returns all transactions that seem unusual and are worth investigating further.

In [44]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import datetime
from dateutil.relativedelta import relativedelta


In [20]:
cc_info = pd.read_csv('../Credit_Card_Data_Challenge/cc_info.csv')
print(len(cc_info), cc_info.dtypes)
cc_info.head()

984 credit_card           int64
city                 object
state                object
zipcode               int64
credit_card_limit     int64
dtype: object


Unnamed: 0,credit_card,city,state,zipcode,credit_card_limit
0,1280981422329509,Dallas,PA,18612,6000
1,9737219864179988,Houston,PA,15342,16000
2,4749889059323202,Auburn,MA,1501,14000
3,9591503562024072,Orlando,WV,26412,18000
4,2095640259001271,New York,NY,10001,20000


In [19]:
transaction = pd.read_csv('../Credit_Card_Data_Challenge/transactions.csv')
print(len(transaction), transaction.dtypes)
transaction.head()

294588 credit_card                    int64
date                          object
transaction_dollar_amount    float64
Long                         float64
Lat                          float64
dtype: object


Unnamed: 0,credit_card,date,transaction_dollar_amount,Long,Lat
0,1003715054175576,2015-09-11 00:32:40,43.78,-80.174132,40.26737
1,1003715054175576,2015-10-24 22:23:08,103.15,-80.19424,40.180114
2,1003715054175576,2015-10-26 18:19:36,48.55,-80.211033,40.313004
3,1003715054175576,2015-10-22 19:41:10,136.18,-80.174138,40.290895
4,1003715054175576,2015-10-26 20:08:22,71.82,-80.23872,40.166719


In [21]:
transaction['date'] = pd.to_datetime(transaction['date'])
transaction.dtypes

credit_card                           int64
date                         datetime64[ns]
transaction_dollar_amount           float64
Long                                float64
Lat                                 float64
dtype: object

For each credit card, slice its transactions, total them by calendar month, and check whether any monthly total reaches or exceeds the card's limit.

In [31]:
good_user = []
for i in range(len(cc_info)):
    card = cc_info['credit_card'][i]
    limit = cc_info['credit_card_limit'][i]
    # All transactions for this card
    trans_month = transaction.loc[transaction['credit_card'] == card]
    # Total spend per calendar month (only the amount column is summed)
    monthly_spend = trans_month.set_index('date')['transaction_dollar_amount'].resample('M').sum()
    # Keep the card only if every monthly total stays below its limit
    if (monthly_spend < limit).all():
        good_user.append(card)
print(good_user[:5])

[4749889059323202, 9591503562024072, 2095640259001271, 1997929794676601, 5449610971108305]


In [40]:
print('There are %d customers who never exceed their monthly credit limit, which is %.2f percent \
of the total credit card accounts.' % (len(good_user), float(len(good_user))*100/len(cc_info)))

There are 860 customers who never exceed their monthly credit limit, which is 87.40 percent of the total credit card accounts.
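
The per-card loop above can also be written as a single grouped aggregation. The following is a minimal sketch rather than the notebook's original method; the function name `never_exceeded` is hypothetical, and it assumes `credit_card` is unique in `cc_info`. Unlike the loop, it treats a monthly total exactly equal to the limit as not exceeding it.

```python
import pandas as pd

def never_exceeded(cc_info, transaction):
    """Return card numbers whose spend stayed at or below the limit in every month."""
    tx = transaction.copy()
    tx['date'] = pd.to_datetime(tx['date'])
    # Label each transaction with its calendar month
    tx['month'] = tx['date'].dt.to_period('M')
    # Total spend per card per month
    monthly = (tx.groupby(['credit_card', 'month'])['transaction_dollar_amount']
                 .sum()
                 .reset_index(name='monthly_spend'))
    # Attach each card's limit to its monthly totals
    monthly = monthly.merge(cc_info[['credit_card', 'credit_card_limit']], on='credit_card')
    # Cards with at least one month above the limit
    over = monthly.loc[monthly['monthly_spend'] > monthly['credit_card_limit'],
                       'credit_card'].unique()
    # Everything else, including cards with no transactions, qualifies
    return cc_info.loc[~cc_info['credit_card'].isin(over), 'credit_card'].tolist()
```

Calling `never_exceeded(cc_info, transaction)` should return the same kind of list as `good_user`, without rescanning the transaction table once per card.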


Build a function that for each day, returns a list of users who went above their credit card monthly limit on that day.

In [59]:
def find_limit(day, cc_info, transaction):
    # `day` parses to midnight, so the window runs from the first of the month
    # up to and including 00:00:00 on `day`
    day = pd.to_datetime(day)
    first_day = day.replace(day=1)
    bad_user = []
    for i in range(len(cc_info)):
        card = cc_info['credit_card'][i]
        limit = cc_info['credit_card_limit'][i]
        # This card's transactions inside the month-to-date window
        trans_month = transaction.loc[transaction['credit_card'] == card]
        trans_month = trans_month.loc[trans_month['date'] >= first_day]
        trans_month = trans_month.loc[trans_month['date'] <= day]
        # Flag the card if its month-to-date spend has reached the limit
        if trans_month['transaction_dollar_amount'].sum() >= limit:
            bad_user.append(card)
    return bad_user

In [62]:
print(find_limit('2015-09-25', cc_info, transaction))

[2505223645294729, 7499289351166761, 3281814060807145, 1106824181265726, 6984795534098127, 4118286032166087, 3369600965634913, 6174559182308122, 2245942585429940, 9632319271199136, 7324887971716592, 7850942767136368, 6766253113444560, 7943675133681182, 4564117045739728, 5795626689544539, 1460880989446247, 7299183791723634, 5723635641134781, 7214837915436490, 7059627552446649, 8972201384562696, 6292410823269309, 4052848131106690, 8896425420278012, 7107467078128879]
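
`find_limit` reports every card whose month-to-date spend has reached the limit as of the given date. If the alert should fire only on the day the limit is actually crossed, one option is to compare the running total before the day starts with the total at the end of the day. The sketch below is an assumption about how that could look, not the notebook's method; the name `crossed_limit_on` is hypothetical, `credit_card` is assumed unique in `cc_info`, and "above the limit" is taken to mean strictly greater.

```python
import pandas as pd

def crossed_limit_on(day, cc_info, transaction):
    """Cards whose month-to-date spend first goes above the limit during `day`."""
    day = pd.to_datetime(day).normalize()      # midnight at the start of `day`
    month_start = day.replace(day=1)
    next_day = day + pd.Timedelta(days=1)

    tx = transaction.copy()
    tx['date'] = pd.to_datetime(tx['date'])
    # Month-to-date window that includes all of `day`
    tx = tx[(tx['date'] >= month_start) & (tx['date'] < next_day)]

    # Spend before `day` started, and spend through the end of `day`
    before = tx[tx['date'] < day].groupby('credit_card')['transaction_dollar_amount'].sum()
    through = tx.groupby('credit_card')['transaction_dollar_amount'].sum()

    limits = cc_info.set_index('credit_card')['credit_card_limit']
    before = before.reindex(limits.index, fill_value=0.0)
    through = through.reindex(limits.index, fill_value=0.0)

    # Within the limit before the day, above it by the end of the day
    crossed = (before <= limits) & (through > limits)
    return crossed[crossed].index.tolist()
```

A card that went over the limit earlier in the month is not reported again on later days, so each crossing produces a single alert.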


Use unsupervised learning to find patterns that indicate possible fraud. The first step is to segment the regions by longitude and latitude.
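
One simple way to use the coordinates, shown here only as a hedged sketch, is to take each card's median `Long`/`Lat` as its usual location and flag transactions that fall far from it. This is an illustrative heuristic, not necessarily the segmentation the notebook goes on to build; the function name `flag_far_transactions`, the added column names, and the 100 km threshold are all assumptions.

```python
import numpy as np
import pandas as pd

def flag_far_transactions(transaction, max_km=100.0):
    """Add each transaction's distance from its card's median location and a flag."""
    tx = transaction.copy()
    # Per-card median coordinates, aligned row by row with the transactions
    home = tx.groupby('credit_card')[['Long', 'Lat']].transform('median')

    # Haversine great-circle distance in kilometres (Earth radius ~6371 km)
    lon1, lat1 = np.radians(home['Long']), np.radians(home['Lat'])
    lon2, lat2 = np.radians(tx['Long']), np.radians(tx['Lat'])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    tx['km_from_home'] = 2 * 6371.0 * np.arcsin(np.sqrt(a))

    # max_km is a hypothetical cutoff and would need tuning on the real data
    tx['far_from_home'] = tx['km_from_home'] > max_km
    return tx
```

Rows where `far_from_home` is true are candidates for review rather than confirmed fraud; the median is used instead of the mean so that a few distant transactions do not shift the reference point.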