# Survival Analysis Applied to Credit Card Fraud
> A "gist" for survival analysis combining techniques from multiple sources

- toc: true
- branch: master
- badges: true
- comments: true
- author: Nazir Kamaldin
- categories: [python, survival analysis, oversampling]

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn_pandas import DataFrameMapper

from pycox.models import CoxTime

# Seed shared by all random splits, for reproducibility
random_state = 7
```

The dataset I'll be working with is taken from [Kaggle](https://www.kaggle.com/mlg-ulb/creditcardfraud). Since most of its features (`V1` to `V28`) are the result of a PCA transformation, there isn't much exploratory data analysis (EDA) we can do.

```python
# Read the dataset
df = pd.read_csv("datasets/creditcard.csv")
df.head()
```

Given that this is a credit card fraud dataset, the labels are very likely imbalanced, so let's check.

```python
df["Class"].value_counts()
```
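`value_counts()` returns raw counts; passing `normalize=True` returns class proportions instead, which makes the imbalance easier to read at a glance. A minimal sketch on a small, made-up label series (the numbers are illustrative only, not taken from the Kaggle data):

```python
import pandas as pd

# Hypothetical labels: 0 = legitimate, 1 = fraud (illustrative values only)
labels = pd.Series([0] * 995 + [1] * 5, name="Class")

counts = labels.value_counts()                    # absolute counts per class
proportions = labels.value_counts(normalize=True) # fraction of rows per class
imbalance_ratio = counts[0] / counts[1]           # negatives per positive

print(counts.to_dict())
print(proportions.to_dict())
print(f"{imbalance_ratio:.0f} negatives per positive")
```

On the real data the equivalent call is simply `df["Class"].value_counts(normalize=True)`.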

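The positive class, which represents fraud, makes up only a tiny fraction of the examples. Rather than learning from so few positives, we will use novelty detection: train only on the plentiful negative (non-fraud) examples, and treat transactions that look unlike them as potential fraud.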
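To make the idea of novelty detection concrete, here is a minimal sketch. It is *not* the survival-analysis model this post builds towards: it swaps in a simple Gaussian (Mahalanobis-distance) detector, fits it on negative examples only, and sets the alarm threshold at the 99th percentile of the training scores. The data is synthetic, and `fit_gaussian` / `mahalanobis_scores` are hypothetical helper names used only for illustration.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate the mean and inverse covariance of the 'normal' data."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mean, np.linalg.inv(cov)

def mahalanobis_scores(X, mean, cov_inv):
    """Mahalanobis distance of each row from the fitted Gaussian (higher = more novel)."""
    diff = X - mean
    # Row-wise quadratic form diff_i^T @ cov_inv @ diff_i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

rng = np.random.default_rng(7)
X_normal = rng.normal(size=(5000, 2))        # stand-in for negative examples
X_novel = rng.normal(loc=6.0, size=(20, 2))  # stand-in for fraud, far from the bulk

# Fit on negatives only, then derive a threshold from the training scores
mean, cov_inv = fit_gaussian(X_normal)
train_scores = mahalanobis_scores(X_normal, mean, cov_inv)
threshold = np.quantile(train_scores, 0.99)

flagged = mahalanobis_scores(X_novel, mean, cov_inv) > threshold
print(f"threshold={threshold:.2f}, flagged {flagged.sum()} of {len(X_novel)} novel points")
```

The 99th-percentile threshold is an arbitrary choice here: it fixes the false-positive rate on the normal training data at roughly 1%, and moving it trades missed novelties against false alarms.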

## Dataset split

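```python
V_columns = [col for col in df.columns if col.startswith("V")]
feature_columns = V_columns + ["Amount"]
label_column = "Class"

X = df[feature_columns]
y = df[label_column]

# Index labels of non-fraud (0) and fraud (1) transactions
negative_indices = y[y == 0].index
positive_indices = y[y == 1].index

# Novelty detection: the train, validation and test sets contain only negatives.
# `.loc` selects by index label, which is what `y[...].index` returns.
X_neg = X.loc[negative_indices]

# 80/20 train/test split, then a 70/30 split of the training part into train/validation
X_train, X_test = train_test_split(X_neg, test_size=0.2, random_state=random_state)
X_train, X_val = train_test_split(X_train, test_size=0.3, random_state=random_state)

print(f"Dataset sizes, X_train: {len(X_train)}, X_val: {len(X_val)}, X_test: {len(X_test)}")
```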
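The two `train_test_split` calls leave 0.8 × 0.7 = 56% of the negatives for training, 0.8 × 0.3 = 24% for validation and 20% for testing. The small sketch below checks those proportions on a synthetic frame (hypothetical data with a deliberately non-default index) and shows why selecting rows by index *label* with `.loc` is the safe choice: on this frame, `.iloc[negative_indices]` would raise an `IndexError`, because labels such as 5010 are not valid positions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

random_state = 7

# Hypothetical data: 1,010 rows whose index labels (5000..6009) differ from positions
n = 1010
df_demo = pd.DataFrame(
    {"V1": range(n), "Class": [1 if i < 10 else 0 for i in range(n)]},
    index=range(5000, 5000 + n),
)

y_demo = df_demo["Class"]
negative_indices = y_demo[y_demo == 0].index   # index labels, not positions
X_neg = df_demo.loc[negative_indices, ["V1"]]  # .loc selects by label

X_train, X_test = train_test_split(X_neg, test_size=0.2, random_state=random_state)
X_train, X_val = train_test_split(X_train, test_size=0.3, random_state=random_state)

print(len(X_train), len(X_val), len(X_test))
```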

## Feature transformation

```python
# Standardize "Amount"; pass the PCA components (V columns) through unchanged
standardize = [(["Amount"], StandardScaler())]
passthrough = [(col, None) for col in V_columns]

X_mapper = DataFrameMapper(passthrough + standardize)
```

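```python
# Fit the mapper on the training set only, then reuse its statistics for validation and test.
# Cast to float32 for PyTorch, which pycox is built on.
X_train = X_mapper.fit_transform(X_train).astype('float32')
X_val = X_mapper.transform(X_val).astype('float32')
X_test = X_mapper.transform(X_test).astype('float32')
```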
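`DataFrameMapper` comes from the separate `sklearn-pandas` package. If you would rather avoid that dependency, scikit-learn's own `ColumnTransformer` can express the same mapping. This is a swapped-in alternative, not what the post uses: the `V` columns pass through untouched, `Amount` is standardized, and the scaler is fitted on the training set only so no statistics leak from the held-out data. A sketch on synthetic data (column values are made up, and only two `V` columns stand in for the real 28):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
V_columns = ["V1", "V2"]  # hypothetical stand-ins for V1..V28

def make_frame(n):
    """Hypothetical data: PCA-like V columns plus a skewed Amount column."""
    return pd.DataFrame({
        "V1": rng.normal(size=n),
        "V2": rng.normal(size=n),
        "Amount": rng.exponential(scale=80.0, size=n),
    })

train_df, val_df = make_frame(800), make_frame(200)

# Same mapping as the DataFrameMapper above: V columns untouched, Amount standardized.
# Output columns follow the transformer order: V1, V2, then Amount.
mapper = ColumnTransformer([
    ("v_cols", "passthrough", V_columns),
    ("amount", StandardScaler(), ["Amount"]),
])

X_train = mapper.fit_transform(train_df).astype("float32")  # fit on training data only
X_val = mapper.transform(val_df).astype("float32")          # reuse training mean/std

print(X_train.shape, X_train.dtype)
```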