The datasets are provided on Kaggle. There are two main datasets: historical_transactions, which records all historical transaction information (transaction date, purchased products, purchase amount, etc.), and new_merchant_transactions, which has the same features as historical_transactions.

The greatest challenge of this project is that the dataset is large, with 2M observations. Finding an algorithm that works well on a dataset of this size is also a challenge.

### Data Cleaning

In [2]:
import pandas as pd
# merge the two transaction datasets into one
df = pd.read_csv('historical_transactions.csv')
df_new_merchants = pd.read_csv('new_merchant_transactions.csv')
df_combined = pd.concat([df, df_new_merchants])
df_combined.drop_duplicates(inplace=True)

In [None]:
import matplotlib.pyplot as plt
df_train = pd.read_csv('train.csv')
# The datasets are simulated and not real, so the data needs cleaning:
# drop rows whose target is <= -10 or >= 10
outliers = (df_train['target'] <= -10) | (df_train['target'] >= 10)
df_train = df_train[~outliers]

In [None]:
import numpy as np

# bin the continuous target variable into three categories
# (rows with target <= -10 or >= 10 were already dropped in the previous cell)
bins = [-10, -1, 1, 10]
df_train['target_binned'] = pd.cut(df_train['target'], bins)
df_train['target_binned'] = df_train['target_binned'].astype(str)

In [None]:
def bin_target (row):
    if row['target_binned'] == '(-1, 1]':
        return "0"
    if row['target_binned'] == '(-10, -1]':
        return "-1"
    if row['target_binned'] == '(1, 10]':
        return "1"
    return np.nan
  
df_train['target_binned'] = df_train.apply(bin_target, axis=1)

In [None]:
df_train.dropna(subset=['target_binned'], inplace=True)

In [None]:
df_train[df_train['target_binned'] == "1"].count()['target_binned']
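
The interval-string lookup used above can also be expressed by passing labels directly to pd.cut. The following is a minimal sketch on made-up target values (the toy frame and its numbers are assumptions, not the competition data); it produces the same three classes and drops values outside (-10, 10].

```python
import pandas as pd

# Toy targets: two values fall outside the bin range and become NaN
toy = pd.DataFrame({"target": [-33.2, -4.0, -1.0, 0.3, 1.0, 2.5, 12.0]})

bins = [-10, -1, 1, 10]
# Intervals are right-closed: (-10, -1], (-1, 1], (1, 10]
toy["target_binned"] = pd.cut(toy["target"], bins=bins, labels=["-1", "0", "1"])

# Drop out-of-range rows first, then convert the categorical labels to plain strings
toy = toy.dropna(subset=["target_binned"])
toy["target_binned"] = toy["target_binned"].astype(str)

print(toy["target_binned"].value_counts())
```

Dropping missing values before the string conversion matters here: converting first would turn NaN into a literal string and hide the out-of-range rows.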

In [None]:
# keep original features
df_train['org_feature_1'] = df_train['feature_1']
df_train['org_feature_2'] = df_train['feature_2']
df_train['org_feature_3'] = df_train['feature_3']
        
def convert_type(df):
    column_list = ['feature_1','feature_2','feature_3']
    for i in column_list:
        df[i] = df[i].astype(str)
        
# df_train.apply (lambda row: convert_type(row), axis=1)

convert_type(df_train)

In [None]:
df_target = pd.merge(df_combined,
                  df_train,
                  on='card_id',
                  how = 'left')

In [None]:
df_target.dropna(how='any', inplace=True)

In [None]:
# keep original features
df_target['org_authorized_flag'] = df_target['authorized_flag']
df_target['org_category_1'] = df_target['category_1']
df_target['org_category_3'] = df_target['category_3']

In [None]:
df_target['purchase_date'] = pd.to_datetime(df_target['purchase_date'])
df_target_2018 = df_target[:]
df_target_2018 = df_target_2018[df_target_2018['purchase_date'].dt.year >= 2018]

In [None]:
# Split the raw data into training/testing sets by card_id, not by row index,
# because the latter would separate multiple transaction records from one card holder into different training/testing samples.
import numpy as np

def get_user_split_data(df, test_size=.2, seed=42):

    rs = np.random.RandomState(seed)
    
    total_users = df['card_id'].unique() 
    test_users = rs.choice(total_users, 
                           size=int(total_users.shape[0] * test_size), 
                           replace=False)

    df_tr = df[~df['card_id'].isin(test_users)]
    df_te = df[df['card_id'].isin(test_users)] 

    # model inputs: raw transaction fields plus one-hot encoded card and category features
    feature_cols = ['installments','purchase_amount','category_2','feature_1_2','feature_1_3','feature_1_4','feature_1_5','feature_2_2', 'feature_2_3', 'feature_3_1','auth_flag_Y','category_3_B','category_3_C','category_1_N']

    y_tr, y_te = df_tr['target_binned'], df_te['target_binned']
    X_tr, X_te = df_tr[feature_cols], df_te[feature_cols]

    return X_tr, X_te, y_tr, y_te

X_tr, X_te, y_tr, y_te = get_user_split_data(df_target_2018)
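
A quick way to confirm that a split by card_id does what it is meant to do is to check that no card appears on both sides. The sketch below repeats the same sampling idea on a made-up frame; the helper name split_by_card and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def split_by_card(df, test_size=0.2, seed=42):
    """Assign whole cards (not individual rows) to the test set."""
    rs = np.random.RandomState(seed)
    users = df["card_id"].unique()
    test_users = rs.choice(users, size=int(len(users) * test_size), replace=False)
    is_test = df["card_id"].isin(test_users)
    return df[~is_test], df[is_test]

# Toy data: 10 cards with 5 transactions each
toy = pd.DataFrame({
    "card_id": [f"C_{i % 10}" for i in range(50)],
    "purchase_amount": np.arange(50.0),
})

train_part, test_part = split_by_card(toy)

# No card should contribute rows to both splits
overlap = set(train_part["card_id"]) & set(test_part["card_id"])
print(len(overlap), test_part["card_id"].nunique())
```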

### Class imbalance analysis

**What is imbalance?**
Imbalance refers to rare-events data: binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”).

**What are the impacts if the target variable is imbalanced?** 
Suppose you are trying to predict fraud or cancer, and the occurrence of fraud/cancer is very rare (2% of the population). If the model predicts nothing and simply assigns every case to non-fraud/healthy, its accuracy is still very high (98%!). But would you consider that performance acceptable? The recall is actually 0%, because no true case is ever detected.
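
The 98% / 0% figures can be reproduced with a few lines of NumPy. This is a minimal sketch on synthetic labels (1,000 cases with a 2% positive rate, chosen to match the example above), not on the project data.

```python
import numpy as np

# 20 positives (fraud/cancer) and 980 negatives: a 2% event rate
y_true = np.array([1] * 20 + [0] * 980)

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()

# Recall = detected positives / all actual positives
true_positives = int(((y_true == 1) & (y_pred == 1)).sum())
recall = true_positives / int((y_true == 1).sum())

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```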

Statistically, popular procedures such as logistic regression can sharply underestimate the probability of rare events. Why? Linear regression models are invariant to the (unconditional) mean of the dependent variable. However, the same is not true for binary dependent variable models. The mean of a binary variable is the relative frequency of events in the data, which, in addition to the number of observations, constitutes the information content of the data set.
This can be seen by studying the variance matrix:
$$V(\hat{\beta}) = \left[\sum_{i=1}^{n}\pi_{i}(1-\pi_{i})X'_{i}X_{i}\right]^{-1}$$
The part of this matrix affected by rare events is the factor $\pi_{i}(1-\pi_{i})$. With rare events most estimated $\pi_{i}$ are small, and if the model has some explanatory power the $\pi_{i}$ of the ones are larger (closer to 0.5) than those of the zeros, so additional ones shrink the variance more than additional zeros do.

**What's the typical approach to handle imbalance?**

_Oversampling_

RandomOverSampler to handle imbalanced target variable: does what its name implies. Takes the minority class and over-samples it until it is balanced with the majority class.
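
The idea behind random oversampling can be sketched with NumPy alone. This illustrates the technique and is not the RandomOverSampler implementation from imbalanced-learn; the function name random_oversample and the toy arrays are assumptions.

```python
import numpy as np
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = []
    for cls, cnt in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # draw the missing rows with replacement from the same class
        extra = rng.choice(cls_idx, size=n_max - cnt, replace=True)
        keep.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Toy data: 8 rows of class "0" and 2 rows of class "1"
X_toy = np.arange(20, dtype=float).reshape(10, 2)
y_toy = np.array(["0"] * 8 + ["1"] * 2)

X_over, y_over = random_oversample(X_toy, y_toy)
class_counts = Counter(y_over.tolist())
print(class_counts)
```

Because the extra rows are exact copies, this adds no new information; it only changes how much weight the minority class carries during training.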

_Generate Synthetic data_

There are two main approaches: SMOTE and ADASYN.

1) SMOTE - Synthetic Minority Oversampling Technique: start with a point in the minority class, choose one of its k nearest neighbors, and add a new point between them. A known limitation is that SMOTE does not differentiate between points near the decision boundary and points far away from it, so it can generate new points in areas that do not matter for the classifier.

2) ADASYN - ADAptive SYNthetic oversampling. Instead of generating synthetic observations between any minority points, it puts more emphasis on the regions where the class imbalance is greatest, in other words the regions where the classifier is most likely to predict the majority class.
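
The interpolation step shared by both methods can be sketched as follows. This is a simplified illustration of how one SMOTE-style point is generated, not the imbalanced-learn implementation; the function name smote_like_point, the value of k, and the toy points are assumptions.

```python
import numpy as np

def smote_like_point(minority, i, k=2, seed=0):
    """Create one synthetic point between minority[i] and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(minority - minority[i], axis=1)
    dists[i] = np.inf                      # never pick the point itself
    neighbors = np.argsort(dists)[:k]      # indices of the k nearest neighbors
    j = rng.choice(neighbors)
    lam = rng.random()                     # position along the segment, in [0, 1)
    return minority[i] + lam * (minority[j] - minority[i])

# Toy minority class: three nearby points and one far away
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
new_point = smote_like_point(minority, i=0)
print(new_point)
```

ADASYN uses the same interpolation but decides how many points to generate around each minority sample from the share of majority-class points among its neighbors.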

Which oversampling method worked best for **this** dataset? Why? 

**Ans:** This depends. Understand the business reason and the costs associated with false positives/negatives to determine what is an acceptable trade-off.

_Undersampling_

Undersampling is the opposite of Oversampling. It takes the majority class and under-samples it until it is balanced with the minority class.

**In what scenarios would this method be useful?**

**Ans:** If your model is computationally expensive and doubling the size of the data would hurt performance, undersampling would be a better approach.

Fortunately, the methods we covered for binary classification still work in a multiclass setting.

---------------------------------------------------------------------------------------------------------------------

**Are there any different approaches?** Yes.

Two corrections are available:

1) Prior correction: this involves computing the usual logistic regression MLE and correcting the estimates based on prior information about the fraction of ones in the population, $τ$, and the observed fraction of ones in the sample (or sampling probability), $\bar{y}$. For the logit model, in any of the sampling designs considered in the paper cited below, the MLE $\hat{\beta}_{1}$ is a statistically consistent estimate of $\beta_{1}$, and the following corrected estimate is consistent for $\beta_{0}$:
$$\hat{\beta}_{0}-\ln\left[\left(\frac{1-τ}{τ}\right)\left(\frac{\bar{y}}{1-\bar{y}}\right)\right]$$

2) Weighting: an alternative procedure is to weight the data to compensate for differences in the sample ($\bar{y}$) and population ($τ$) fractions of ones induced by choice-based sampling. The weighted log-likelihood is:
$$\ln L_{w}(\beta|y) = w_{1}\sum_{Y_{i}=1}\ln(\pi_{i})+w_{0}\sum_{Y_{i}=0}\ln(1-\pi_{i})$$
where $w_{1} = τ/\bar{y}$ and $w_{0} = (1 - τ)/(1 - \bar{y})$.
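
To make the two corrections concrete, the sketch below evaluates both formulas for one set of numbers. The values (a population event rate of 2%, a sample balanced to 50%, and a fitted intercept of 0) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def prior_corrected_intercept(beta0_hat, tau, y_bar):
    """Prior correction: subtract ln[((1 - tau) / tau) * (y_bar / (1 - y_bar))] from the fitted intercept."""
    return beta0_hat - math.log(((1 - tau) / tau) * (y_bar / (1 - y_bar)))

def choice_based_weights(tau, y_bar):
    """Weights for the weighted log-likelihood: w1 for ones, w0 for zeros."""
    w1 = tau / y_bar
    w0 = (1 - tau) / (1 - y_bar)
    return w1, w0

tau = 0.02    # assumed fraction of ones in the population
y_bar = 0.5   # assumed fraction of ones in the (resampled) sample

corrected = prior_corrected_intercept(beta0_hat=0.0, tau=tau, y_bar=y_bar)
w1, w0 = choice_based_weights(tau, y_bar)
print(corrected, w1, w0)
```

When the sample fraction equals the population fraction, the log term is zero and both weights equal one, so neither correction changes anything.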

This material comes from the following paper, which covers it in more detail: [Logistic Regression in Rare Events Data](https://gking.harvard.edu/files/0s.pdf)


**How to determine if the target variable is imbalanced or not?**
If there are two classes, balanced data would mean 50% of the points in each class. For most machine learning techniques, a small imbalance is not a problem: with 60% of the points in one class and 40% in the other, there should be no significant performance degradation. Only when the class imbalance is high, e.g. 90% of the points in one class and 10% in the other, may standard optimization criteria or performance measures become less effective and need modification.
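
In practice the check is a one-liner on the label column. The following is a minimal sketch on made-up labels (the 90/6/4 split is an assumption, not the distribution of target_binned in this project).

```python
import pandas as pd

# Toy labels for the three classes used in this notebook
labels = pd.Series(["0"] * 90 + ["1"] * 6 + ["-1"] * 4)

# Share of each class, largest first
shares = labels.value_counts(normalize=True)

# Ratio between the most and least frequent class
imbalance_ratio = shares.max() / shares.min()

print(shares)
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```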



### Recap ###

Things to note:

- Which method we use depends on the problem. For example, sometimes ADASYN will work great; in other cases, not so much.
- If the dataset is small to begin with, undersampling may reduce your data too much and many classifiers will have difficulty generalizing.
- Oversampling methods may prove to be computationally intensive depending on which algorithm is being used. Find out what limitations you have and adjust accordingly.
- Always think of the business case! Is misclassifying the majority class just as bad as misclassifying the minority class? What is the right metric for your model?

In [None]:
# Sampling for imbalanced data
from imblearn.under_sampling import RandomUnderSampler
import seaborn as sns

In [None]:
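# undersample the majority classes on the training split only
rus = RandomUnderSampler(random_state=42)
# fit_resample is the current imbalanced-learn method name (older releases called it fit_sample)
X_resampled, y_resampled = rus.fit_resample(X_tr, y_tr)

In [None]:
df = pd.DataFrame(X_resampled)
df.head()
# df['target_binned'] = y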

In [None]:
from collections import Counter
Counter(y_resampled)

In [None]:
from sklearn.metrics import confusion_matrix  
  
# training a DecisionTreeClassifier on the undersampled training data
from sklearn.tree import DecisionTreeClassifier 
dtree_model = DecisionTreeClassifier(max_depth = 20).fit(X_resampled, y_resampled) 
dtree_predictions = dtree_model.predict(X_te) 
# accuracy = dtree_model.score(X_te, y_te) 

# creating a confusion matrix 
cm = confusion_matrix(y_te, dtree_predictions) 

In [None]:
def print_confusion_matrix(confusion_matrix, class_names, figsize = (10,7), fontsize=18):
    df_cm = pd.DataFrame(confusion_matrix, index=class_names, columns=class_names, )
    fig = plt.figure(figsize=figsize)
    try:
        heatmap = sns.heatmap(df_cm, annot=True, fmt="d")
    except ValueError:
        raise ValueError("Confusion matrix values must be integers.")
    heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0, ha='right', fontsize=fontsize)
    heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45, ha='right', fontsize=fontsize)
    plt.ylabel('True label', fontsize=20)
    plt.xlabel('Predicted label', fontsize=20)
    return fig

In [None]:
# the function returns a matplotlib Figure, so keep it under its own name
fig = print_confusion_matrix(confusion_matrix(y_te, dtree_predictions), ['Class Bad', 'Class Neutral', 'Class Good'])
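
Since accuracy is misleading on imbalanced classes, it helps to read per-class recall and precision off the confusion matrix. The sketch below does this with NumPy on a made-up 3x3 matrix (the counts are assumptions, not results from this notebook), using the scikit-learn convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Rows: true Bad / Neutral / Good; columns: predicted Bad / Neutral / Good
cm_toy = np.array([
    [50,  30,  20],
    [100, 700, 200],
    [10,  20,  70],
])

# Recall: correct predictions divided by the number of true members of each class
recall = cm_toy.diagonal() / cm_toy.sum(axis=1)

# Precision: correct predictions divided by the number of predictions for each class
precision = cm_toy.diagonal() / cm_toy.sum(axis=0)

# Macro recall treats every class equally, regardless of its size
macro_recall = recall.mean()

print(recall, precision, macro_recall)
```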

In [None]:
df_target['purchase_date'] = pd.to_datetime(df_target['purchase_date'])

### Feature Engineering

#### Aggregate the transaction data and make it rolling sum purchase amount

In [None]:
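# cumulative ("rolling") sum of purchase amounts per card, taken in chronological order;
# the result is aligned back to df_target by index
df_target['rolling_sum'] = (df_target.sort_values('purchase_date')
                            .groupby(['card_id'])['purchase_amount'].cumsum())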
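
A per-card cumulative sum only reads as a running total if the rows are taken in date order. The following minimal sketch shows this on a made-up frame (the toy transactions are assumptions); sorting happens inside the expression, and pandas aligns the result back to the original rows by index.

```python
import pandas as pd

# Toy transactions, deliberately not in chronological order
tx = pd.DataFrame({
    "card_id": ["A", "B", "A", "A", "B"],
    "purchase_date": pd.to_datetime(
        ["2018-01-03", "2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"]
    ),
    "purchase_amount": [3.0, 10.0, 1.0, 2.0, 20.0],
})

# Sort by date, accumulate within each card, then assign back by index
tx["rolling_sum"] = (
    tx.sort_values("purchase_date")
      .groupby("card_id")["purchase_amount"]
      .cumsum()
)

print(tx)
```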

In [None]:
df_target['purchase_date'].head()

In [None]:
# NOTE: despite its name, date_month holds the day of the month (1-31)
df_target['date_month'] = df_target['purchase_date'].dt.day
df_target['abs_purchase_amount'] = df_target['purchase_amount'].abs()

In [None]:
date_month_1 = df_target.groupby(['card_id','date_month'],as_index=False).agg({'purchase_amount':['sum','mean','count']})
date_month_1.columns = ['_'.join(col).strip() for col in date_month_1.columns.values]
date_month_1.columns = ['card_id', 'date_month', 'purchase_amount_month_sum', 'purchase_amount_month_mean', 'purchase_amount_month_count']
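
The two column-renaming steps above can be avoided with pandas named aggregation, which sets the output column names inside the agg call. This is a minimal sketch on a made-up frame (the toy rows are assumptions); the output names match the ones used in this notebook.

```python
import pandas as pd

tx = pd.DataFrame({
    "card_id": ["A", "A", "A", "B"],
    "date_month": [1, 1, 2, 1],
    "purchase_amount": [1.0, 3.0, 5.0, 7.0],
})

# Each keyword is an output column: name=(input column, aggregation)
agg = tx.groupby(["card_id", "date_month"], as_index=False).agg(
    purchase_amount_month_sum=("purchase_amount", "sum"),
    purchase_amount_month_mean=("purchase_amount", "mean"),
    purchase_amount_month_count=("purchase_amount", "count"),
)

print(agg)
```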

In [None]:
len(date_month_1)

In [None]:
df_target = pd.merge(df_target, date_month_1, on=['card_id','date_month'], how='left')

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

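# date_month_1 has no abs_purchase_amount column, so aggregate it per day of month first
date_month_all = df_target.groupby(['date_month'],as_index=False).agg({'abs_purchase_amount':'sum'})
ax = sns.lineplot(x="date_month", y="abs_purchase_amount", data=date_month_all)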

#### Let's see if the pattern still holds when the data is split by year and the same analysis is repeated.

In [None]:
df_target_2017 = df_target[df_target['purchase_date'].dt.year == 2017]

In [None]:
df_target_2017.head()

In [None]:
date_month_2017 = df_target_2017.groupby(['date_month'],as_index=False).agg({'abs_purchase_amount':'sum'})
date_month_2017

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

ax = sns.lineplot(x="date_month", y="abs_purchase_amount", data=date_month_2017)

In [None]:
df_target_2018 = df_target[df_target['purchase_date'].dt.year == 2018]
# use abs_purchase_amount, as in the 2017 cell, so the two years are comparable
date_month_2018 = df_target_2018.groupby(['date_month'],as_index=False).agg({'abs_purchase_amount':'sum'})
date_month_2018

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

ax = sns.lineplot(x="date_month", y="abs_purchase_amount", data=date_month_2018)

In [None]:
df_target['first_active_month'].head()

In [None]:
# Remove purchases from Dec 23 to Dec 29 (in both 2017 and 2018) to reduce noise
import datetime

df_wo_noise = df_target[(df_target['purchase_date'].dt.date < datetime.date(2018, 12, 23)) | (df_target['purchase_date'].dt.date > datetime.date(2018, 12, 29))]
df_wo_noise = df_wo_noise[(df_wo_noise['purchase_date'].dt.date < datetime.date(2017, 12, 23)) | (df_wo_noise['purchase_date'].dt.date > datetime.date(2017, 12, 29))]
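
The same window can be removed for every year at once by filtering on month and day instead of listing each year. This is a minimal sketch on made-up dates; the helper name drop_december_window and its parameters are assumptions for illustration.

```python
import pandas as pd

def drop_december_window(df, date_col="purchase_date", first_day=23, last_day=29):
    """Drop rows dated Dec first_day..last_day (inclusive) in any year."""
    dates = df[date_col]
    in_window = (dates.dt.month == 12) & dates.dt.day.between(first_day, last_day)
    return df[~in_window]

tx = pd.DataFrame({
    "purchase_date": pd.to_datetime(
        ["2017-12-22", "2017-12-23", "2017-12-29", "2017-12-30", "2018-01-25"]
    ),
    "purchase_amount": [1.0, 2.0, 3.0, 4.0, 5.0],
})

kept = drop_december_window(tx)
print(kept)
```

The month check matters: filtering on the day alone would also drop the 23rd to 29th of every other month.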


In [None]:
df_target_2017 = df_wo_noise[df_wo_noise['purchase_date'].dt.year == 2017 ]
date_month_2017 = df_target_2017.groupby(['date_month'],as_index=False).agg({'abs_purchase_amount':'sum'})
ax = sns.lineplot(x="date_month", y="abs_purchase_amount", data=date_month_2017)

In [None]:
df_target_2018 = df_wo_noise[df_wo_noise['purchase_date'].dt.year == 2018]
date_month_2018 = df_target_2018.groupby(['date_month'],as_index=False).agg({'abs_purchase_amount':'sum'})
ax = sns.lineplot(x="date_month", y="abs_purchase_amount", data=date_month_2018)

In [None]:
date_month = df_wo_noise.groupby(['date_month'],as_index=False).agg({'abs_purchase_amount':'sum'})
ax = sns.lineplot(x="date_month", y="abs_purchase_amount", data=date_month)

In [None]:
# copy the slice so the assignment below does not write into a view of df_wo_noise
issue_investigation = df_wo_noise[df_wo_noise['date_month'] == 23].copy()
issue_investigation['purchase_date'] = issue_investigation['purchase_date'].dt.date

In [None]:
issue_investigation_1 = issue_investigation.groupby(['purchase_date'],as_index=False).agg({'abs_purchase_amount':'sum'})

In [None]:
issue_investigation_1 

In [None]:
# df_wo_noise = df_wo_noise[(df_wo_noise['purchase_date'].dt.date != datetime.date(2017, 4, 23))]
import seaborn as sns
import matplotlib.pyplot as plt
df_wo_noise_1 = df_target[(df_target['card_id'] != 'C_ID_3b6ac8e52d') | (df_target['authorized_flag'] != 'N')]
            

In [None]:
date_month = df_wo_noise_1.groupby(['date_month'],as_index=False).agg({'abs_purchase_amount':'mean'})
ax = sns.lineplot(x="date_month", y="abs_purchase_amount", data=date_month)

In [None]:
df_test = df_target[(df_target['purchase_date'].dt.date == datetime.date(2017, 4, 23))]

In [None]:
df_test.groupby(['card_id'],as_index=False).agg({'abs_purchase_amount':'sum'}).sort_values(['abs_purchase_amount'], ascending = False)

In [None]:
df_target[df_target['card_id'] == 'C_ID_3b6ac8e52d']

##### Analyze day of week pattern

In [None]:
df_target['day_of_week'] = df_target['purchase_date'].dt.dayofweek

In [None]:
df_target.head()
len(df_target)

##### Test 

In [None]:
dayofweek = df_target.groupby(['card_id','day_of_week'],as_index=False).agg({'purchase_amount':['sum','mean','count']})
dayofweek.columns = ['_'.join(col).strip() for col in dayofweek.columns.values]
dayofweek.columns = ['card_id', 'day_of_week', 'purchase_amount_dayofweek_sum', 'purchase_amount_dayofweek_mean', 'purchase_amount_dayofweek_count']

In [None]:
df_target = pd.merge(df_target, dayofweek, on=['card_id','day_of_week'], how='left')

##### Analyze pattern by Hour 

In [None]:
# zero-padded hour of day as a string ('00' to '23')
df_target['hour'] = df_target['purchase_date'].dt.strftime('%H')

In [None]:
date_hour = df_target.groupby(['card_id','hour'],as_index=False).agg({'purchase_amount':['sum','mean','count']})
date_hour.columns = ['_'.join(col).strip() for col in date_hour.columns.values]
date_hour.columns = ['card_id', 'hour', 'purchase_amount_hour_sum', 'purchase_amount_hour_mean', 'purchase_amount_hour_count']

In [None]:
df_target = pd.merge(df_target, date_hour, on=['card_id','hour'], how='left')

In [None]:
df_target

In [None]:
hourofday = df_target.groupby(['hour'],as_index=False).agg({'abs_purchase_amount':'count'})
ax = sns.lineplot(x="hour", y="abs_purchase_amount", data=hourofday)

##### Examine purchase behavior across states

In [None]:
bystate = df_target.groupby(['state_id'],as_index=False).agg({'abs_purchase_amount':'mean'})
ax = sns.lineplot(x='state_id', y="abs_purchase_amount", data=bystate)

In [None]:
bycity = df_target.groupby(['city_id'],as_index=False).agg({'abs_purchase_amount':'count'})
ax = sns.lineplot(x='city_id', y="abs_purchase_amount", data=bycity)

In [None]:
df_target.groupby(['merchant_category_id'],as_index=False).agg({'abs_purchase_amount':'count'})

##### Analyze Usage Frequency 

In [None]:
import datetime
df_target['first_active_month'] = pd.to_datetime(df_target['first_active_month'])
        
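# NOTE: month_diff depends on the date the notebook is run, so it is not reproducible across runs
df_target['month_diff'] = ((datetime.datetime.today() - df_target['purchase_date']).dt.days)//30
df_target['month_diff'] += df_target['month_lag']
# days between first activation and the reference date 2018-02-01
df_target['elapsed_time'] = (pd.Timestamp(2018, 2, 1) - df_target['first_active_month']).dt.days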

In [None]:
from collections import OrderedDict
purchase_month = df_target.groupby(['card_id','elapsed_time'],as_index=False).agg(OrderedDict([('purchase_amount','count')]))
purchase_month.columns = ['card_id', 'elapsed_time','# of purchase']

purchase_month['order_freq'] = purchase_month['# of purchase'] / purchase_month['elapsed_time']


In [None]:
purchase_month = purchase_month[['card_id', 'order_freq']]
purchase_month

In [None]:
df_target = pd.merge(df_target, purchase_month, on=['card_id'], how='left')
len(df_target)

In [None]:
customer_amount = df_target.groupby(['card_id'],as_index=False).agg(OrderedDict([('purchase_amount','sum'),('purchase_date','count')]))                  
customer_amount.columns = ['card_id', 'purchase_amount', 'total_usage']
customer_amount['amount_per_use'] = customer_amount['purchase_amount'] / customer_amount['total_usage']
customer_amount = customer_amount[['card_id','total_usage','amount_per_use']]

In [None]:
df_target = pd.merge(df_target, customer_amount, on=['card_id'], how='left')
len(df_target)

In [None]:
import pickle
import pandas as pd

with open('my_dataframe_newfeatures.pickle','rb') as read_file:
    df_target = pickle.load(read_file)

In [None]:
import pandas as pd
df_target.to_csv('my_dataframe_newfeatures.csv', sep=',')

In [None]:
y = df_target['target_binned']
X = df_target[['installments','merchant_category_id','purchase_amount','month_lag','category_2','rolling_sum','date_month','abs_purchase_amount', 'day_of_week','hour','purchase_amount_month_sum',
              'purchase_amount_month_mean','purchase_amount_month_count', 'purchase_amount_dayofweek_sum', 'purchase_amount_dayofweek_mean','purchase_amount_dayofweek_count',
              'purchase_amount_hour_sum','purchase_amount_hour_mean','purchase_amount_hour_count', 'auth_flag_Y','category_1_N', 'category_3_B','category_3_C','feature_1_2','feature_1_3','feature_1_4','feature_2_3','feature_3_1','feature_1_5','feature_2_2',
              'feature_1_2','feature_1_3','feature_1_4','feature_1_5','feature_2_2', 'feature_2_3', 'feature_3_1','auth_flag_Y','category_3_B','category_3_C','category_1_N', 'month_diff', 'elapsed_time', 'order_freq', 'total_usage', 'amount_per_use']] 


In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

### Feature Importance Analysis

In [None]:
df_target = pd.get_dummies(df_target, prefix=['auth_flag'], columns=['authorized_flag'], drop_first=True)
df_target = pd.get_dummies(df_target, prefix=['category_1'], columns=['category_1'])
df_target = pd.get_dummies(df_target, prefix=['category_3'], columns=['category_3'],  drop_first=True)
df_target = pd.get_dummies(df_target, prefix=['feature_1'], columns=['feature_1'], drop_first=True)
df_target = pd.get_dummies(df_target, prefix=['feature_2'], columns=['feature_2'], drop_first=True)
df_target = pd.get_dummies(df_target, prefix=['feature_3'], columns=['feature_3'], drop_first=True)

In [None]:
df_target['auth_flag_Y'] = df_target['auth_flag_Y'].astype(int)
df_target['category_1_N'] = df_target['category_1_N'].astype(int)
df_target['category_3_B'] = df_target['category_3_B'].astype(int)
df_target['category_3_C'] = df_target['category_3_C'].astype(int)
df_target['feature_1_5'] = df_target['feature_1_5'].astype(int)
df_target['feature_2_2'] = df_target['feature_2_2'].astype(int)

df_target.dtypes

In [None]:
df_target['feature_1_2'] = df_target['feature_1_2'].astype(int)
df_target['feature_1_3'] = df_target['feature_1_3'].astype(int)
df_target['feature_1_4'] = df_target['feature_1_4'].astype(int)
df_target['feature_2_3'] = df_target['feature_2_3'].astype(int)
df_target['feature_3_1'] = df_target['feature_3_1'].astype(int)

In [None]:
# Build a random forest model and compute the feature importances
from sklearn.ensemble import RandomForestClassifier

randomforest = RandomForestClassifier().fit(X, y) 

# forest = ExtraTreesClassifier(n_estimators=250,
#                               random_state=0)

# forest.fit(np.array(X), y)
importances = randomforest.feature_importances_
std = np.std([tree.feature_importances_ for tree in randomforest.estimators_],
             axis=0)
indices = np.argsort(importances)[::-1]

# Print the feature ranking
print("Feature ranking:")

# print the column name rather than its positional index
for f in range(X.shape[1]):
    print("%d. %s (%f)" % (f + 1, X.columns[indices[f]], importances[indices[f]]))

In [None]:
# one row per model input column; taking the names from X.columns keeps them aligned with feature_importances_
feature_importance_df = pd.DataFrame({
    'feature': list(X.columns),
    'importance': randomforest.feature_importances_,
})
# feature_importance_df

In [None]:
import pandas as pd
feature_importance_df.to_csv('result_analysis_1.csv', sep=',')

In [None]:
import seaborn as sns
cols = (feature_importance_df[["feature", "importance"]]
        .groupby("feature")
        .mean()
        .sort_values(by="importance", ascending=False)[:1000].index)

best_features = feature_importance_df.loc[feature_importance_df.feature.isin(cols)]

plt.figure(figsize=(14,25))
sns.barplot(x="importance",
            y="feature",
            data=best_features.sort_values(by="importance",
                                           ascending=False))
plt.title('Feature importances (random forest)')
plt.tight_layout()
# plt.savefig('importances.png')

In [None]:
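# NOTE: this confusion matrix is computed on the same data the forest was fit on, so it overstates performance
fig = print_confusion_matrix(confusion_matrix(y, randomforest.predict(X)), ['Class Bad', 'Class Neutral', 'Class Good'])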
