# <h1 style="font-family: Trebuchet MS; padding: 10px; font-size: 25px; color: white; text-align: center; background-color: #FD8D15;"><b>Credit Card Fraud Detection</b><br></h1>

<center>
    <img src="https://img.freepik.com/free-vector/set-secure-credit-cards-with-chips_98292-4517.jpg?size=626&ext=jpg&ga=GA1.1.720551948.1705646383&semt=ais_hybrid" alt="Credit Card Fraud Detection" height="100%">
</center>



**Credit card fraud** has become a pervasive issue in the digital age, with global losses exceeding \$24 billion annually. As the use of credit cards continues to rise, so too does the risk of fraudulent transactions. This problem is particularly challenging due to the **imbalance** in the data, where genuine transactions vastly outnumber fraudulent ones. 

## <span style="color:green">Objective</span>

* **Binary Classification:** Develop a machine learning model to accurately classify credit card transactions as either genuine or fraudulent.
* **Handle Imbalance:** Implement techniques to address the class imbalance in the dataset, ensuring that the model can effectively detect fraudulent transactions even when they are rare.

## <span style="color:green"> Significance </span>

* **Financial Loss:** Accurate fraud detection can significantly reduce financial losses for both individuals and businesses.
* **Customer Protection:** By preventing fraudulent transactions, we can protect consumers from unauthorized charges and identity theft.
* **Industry Innovation:** The development of effective fraud detection models can drive innovation in the financial technology sector and contribute to the overall security of online transactions.


## <span style="color:green">Dataset Attributes</span>
    
* **V1 - V28** : Numerical features that are a result of PCA transformation.
* **Time** : Seconds elapsed between each transaction and the first transaction in the dataset.
* **Amount** : Transaction amount.
* **Class** : Target label, 1 for fraud and 0 for a legitimate transaction.


# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Importing Libraries</div></center>

In [None]:
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
import matplotlib.pyplot as plt
import plotly.graph_objects as go 
from imblearn.over_sampling import SMOTE
from plotly.subplots import make_subplots
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve, precision_score, recall_score, accuracy_score, classification_report, auc


warnings.filterwarnings('ignore')

# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Data Loading & Exploration</div></center>

In [None]:
# loading dataset
df = pd.read_csv('../input/creditcardfraud/creditcard.csv')
df.head()

In [None]:
# shape
df.shape

In [None]:
# features
df.columns

In [None]:
# information
df.info()

In [None]:
# description
df.describe()

In [None]:
# null data
df.isnull().sum()

`Note`: **No null values** present in the dataset!

# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Exploratory Data Analysis</div></center>

In [None]:
# Fraud & Legit Transaction Analysis
fraud_and_legit = df.groupby(by='Class').mean()
fraud_and_legit

#### Key Findings:

##### 1. **Earlier Timing:**
* **Timing:** The mean `Time` is lower for fraudulent transactions, so fraud tends to occur earlier within the recording window.
* **Explanation:** One possible reading is that fraudsters act quickly to exploit stolen card information before it can be detected or blocked. However, `Time` counts seconds from the first transaction in the dataset, not from the moment a card was compromised, so the data cannot confirm this directly.

##### 2. **Abnormal Patterns:**
* **Unusual Behavior:** Fraudulent transactions differ markedly from legitimate ones in the PCA components: the class means of V1 to V28 for fraud show large deviations from zero, both positive and negative, while the means for legitimate transactions stay close to zero.
* **Explanation:** Because V1 to V28 are anonymized PCA outputs, the underlying causes cannot be read off directly. They may reflect factors such as unusual spending habits, transactions in unfamiliar locations, or large amounts spent in a short period.

##### 3. **Larger Amounts:**
* **Higher Value:** Fraudulent transactions have a higher mean `Amount` than legitimate ones.
* **Explanation:** Fraudsters may aim to maximize their gains by conducting larger transactions before the fraud is detected.
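A higher average `Amount` for fraud does not by itself mean that a typical fraudulent transaction is larger, because a mean can be pulled up by a few very large values. A minimal sketch comparing mean and median per class; the toy DataFrame is invented for illustration (hypothetical values, not taken from the dataset). On the real data the equivalent call would be `df.groupby('Class')['Amount'].agg(['mean', 'median'])`.

```python
import pandas as pd

# Hypothetical toy data: several small fraudulent charges plus one very large one
toy = pd.DataFrame({
    "Class":  [0, 0, 0, 0, 1, 1, 1],
    "Amount": [20.0, 25.0, 30.0, 35.0, 1.0, 2.0, 600.0],
})

# Compare the mean (sensitive to outliers) with the median (robust) per class
summary = toy.groupby("Class")["Amount"].agg(["mean", "median"])
print(summary)
```

In this toy example fraud has the higher mean (201.0 vs 27.5) but the lower median (2.0 vs 27.5), so checking both statistics is a cheap way to confirm whether fraud really involves larger amounts.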



In [None]:
# plotting dependent features
sns.countplot(data=df, x='Class', palette='viridis')
plt.show()

`Note`:
- **Class 0 (Legit Transactions):** There are **284,315** legitimate transactions.
- **Class 1 (Fraudulent Transactions):** There are **492** fraudulent transactions.

This makes the dataset highly imbalanced: fraud accounts for only about 0.17% of the 284,807 transactions. Such imbalance is common in fraud detection problems and can mislead both training and evaluation, since a model can reach high accuracy while missing most of the fraud.
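The effect of the imbalance on accuracy can be made concrete with the class counts above: a trivial classifier that labels every transaction as legitimate is almost always right, yet it catches no fraud at all.

```python
# Class counts reported above for the full dataset
n_legit, n_fraud = 284_315, 492
n_total = n_legit + n_fraud

fraud_rate = n_fraud / n_total
# An "always predict Class 0" baseline is correct on every legitimate transaction
baseline_accuracy = n_legit / n_total

print(f"Total transactions: {n_total}")
print(f"Fraud rate: {fraud_rate:.2%}")
print(f"Always-legit accuracy: {baseline_accuracy:.2%}")
```

This prints a total of 284807 transactions, a fraud rate of 0.17% and a baseline accuracy of 99.83%, which is why recall, precision and ROC AUC are more informative than accuracy for this task.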

In [None]:
# Correlation Matrix
plt.figure(figsize=(20,10))
sns.heatmap(df.corr(),cmap='viridis',cbar = True, annot=True, fmt='.2f',vmax=1, vmin=-1)
plt.show()

With 31 columns, the full correlation heatmap is too dense to read. Instead, we plot only the correlations of `Time`, `Amount` and `Class` with the PCA features V1 to V28.

In [None]:
# Correlation Matrix with desired features
corr = df.corr().loc[['Time', 'Amount', 'Class'], df.columns[1:-2]]
plt.figure(figsize=(18, 3))
sns.heatmap(corr, annot=True, cmap='viridis', fmt='.2f', cbar=False)
plt.title("Correlation Matrix", size=20)
plt.show()

In [None]:
corr.T[(corr.T['Class']<-0.12) | (corr.T['Class']>0.12)]

**Key Points:**

* **Feature Removal:** Features whose correlation with `Class` is weak (between -0.12 and 0.12) will be removed.
* **Important Features Based On Correlation Matrix Analysis:** V3, V4, V7, V10, V11, V12, V14, V16 and V17 show stronger correlations (|r| > 0.12) with the target variable.

# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Feature Selection</div></center>

In [None]:
# Selecting K Best Features
features = df.drop(columns=['Class'])
target = df['Class']

selector = SelectKBest(score_func=f_classif, k=9)
selector.fit(features, target)

selected_features = features.columns[selector.get_support()]

print(f"Selected features: {selected_features}")

`Note`: By analyzing the correlation matrix and applying feature selection techniques, we identified the following features as the most informative for predicting fraudulent transactions:
['V3', 'V4', 'V7', 'V10', 'V11', 'V12', 'V14', 'V16', 'V17']
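It is not a coincidence that `SelectKBest(score_func=f_classif)` picks the same nine features as the correlation threshold. For a binary target, the one-way ANOVA F-statistic that `f_classif` computes for a feature is a monotone function of its squared Pearson correlation $r$ with the target, $F = r^2 (n-2)/(1-r^2)$, so ranking features by $F$ is the same as ranking them by $|r|$. A small check on random synthetic data (not the credit card data):

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
n = 500

# Synthetic binary target and three features with different dependence on it
y_syn = rng.integers(0, 2, size=n)
X_syn = np.column_stack([
    y_syn + rng.normal(0, 1.0, n),        # strongly related to the target
    0.3 * y_syn + rng.normal(0, 1.0, n),  # weakly related
    rng.normal(0, 1.0, n),                # unrelated
])

# ANOVA F-statistic per feature, as used by SelectKBest above
F, _ = f_classif(X_syn, y_syn)

# Pearson correlation of each feature with the binary target
r = np.array([np.corrcoef(X_syn[:, j], y_syn)[0, 1] for j in range(X_syn.shape[1])])
F_from_r = r**2 * (n - 2) / (1 - r**2)

print(np.round(F, 3))
print(np.round(F_from_r, 3))
```

Both printed rows match, so on this kind of data the correlation filter and `SelectKBest` with `f_classif` are two views of the same criterion.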

In [None]:
# updating dataframe for selected features
df = df[['V3','V4','V7','V10','V11','V12','V14','V16','V17','Class']]
df.head()

# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Handling Imbalanced Data</div></center>



##### The dataset is highly imbalanced, with far more legitimate transactions (284,315) than fraudulent ones (492). This imbalance is common in fraud detection and can impact model performance.

**Techniques that can be used for Handling Imbalanced Data:**

* **Resampling:**
    * Oversampling: Duplicate minority class examples.
    * Undersampling: Remove majority class examples.

* **Ensemble Methods:** Combine multiple models for improved performance.

* **Synthetic Data Generation:** Create new, synthetic examples to augment the minority class.

* **Class Weighting:** Adjust the class weights during training to balance the impact of different classes.

* **SMOTE (Synthetic Minority Over-sampling Technique):** A synthetic-data method, used in this notebook, that generates new minority examples by interpolating between existing ones.
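A simplified NumPy sketch of how SMOTE builds each synthetic example: pick a random minority sample, pick one of its k nearest minority-class neighbours, and place a new point at a random position on the segment between them. This is an illustration of the idea, not `imblearn`'s implementation (which uses a nearest-neighbours estimator with `k_neighbors=5` by default); the helper `smote_sketch` and the toy points are hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: generate n_new synthetic rows from minority rows X_min."""
    rng = np.random.default_rng(seed)
    # Brute-force pairwise Euclidean distances between minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours per sample

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))            # random minority sample
        b = rng.choice(neighbours[a])           # one of its k nearest neighbours
        gap = rng.random()                      # position along the segment, in [0, 1)
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

# Toy minority class: 6 points inside the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
new_points = smote_sketch(X_min, n_new=10, k=3)
print(new_points.shape)
```

Because every new point lies on a segment between two real minority points, all ten synthetic points stay inside the unit square here. In the notebook's `SMOTE(sampling_strategy=0.5)`, a float strategy means the minority class is oversampled until it is about half the size of the majority class.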

In [None]:
# Split dependent & independent features
X = df.drop('Class', axis=1) 
y = df['Class']

# Hold out a stratified 25% test set BEFORE resampling, so the test set contains
# only real transactions with the original class balance (no synthetic rows leak into it)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

# Apply SMOTE (Synthetic Minority Over-sampling Technique) to the training set only;
# sampling_strategy=0.5 oversamples fraud to half the size of the legitimate class
smote = SMOTE(sampling_strategy=0.5, random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# Class counts in the resampled training set
pd.Series(y_train_resampled, name='Class').value_counts()

# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Model Training</div></center>

In [None]:
# Train on the SMOTE-resampled 75% training split; X_test / y_test (25%) stay untouched
X_train = pd.DataFrame(X_train_resampled, columns=X.columns)
y_train = pd.Series(y_train_resampled, name='Class')

In [None]:
# Define and compile the model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',  
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, 
                    y_train, 
                    epochs=10, 
                    batch_size=32,
                    validation_data=(X_test, y_test), 
                    verbose=1)

# Predict fraud probabilities on the test set and convert them to class labels with a 0.5 threshold
predictions = model.predict(X_test)
y_pred_classes = (predictions > 0.5).astype(int).flatten()  
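As an alternative or complement to SMOTE, the class-weighting option listed earlier avoids creating synthetic rows: Keras' `Model.fit` accepts a `class_weight` dictionary mapping each class index to a loss multiplier. A minimal sketch that computes "balanced" weights with scikit-learn's `compute_class_weight` from the full-dataset class counts; wiring them into this notebook's `model.fit` on non-resampled data is an assumption shown only as a comment, not what the notebook does.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Labels with the same class counts as the full dataset (284,315 legit, 492 fraud)
y_labels = np.concatenate([np.zeros(284_315, dtype=int), np.ones(492, dtype=int)])

# "balanced": weight for class c = n_samples / (n_classes * n_samples_in_class_c)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_labels)
class_weight = {0: float(weights[0]), 1: float(weights[1])}
print(class_weight)

# Hypothetical usage with the Keras model above, trained on NON-resampled data:
# model.fit(X_train, y_train, epochs=10, batch_size=32, class_weight=class_weight)
```

Each fraudulent example then weighs roughly 578 times as much as a legitimate one in the loss, which is the ratio of the two class counts.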

# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Model Evaluation</div></center>

In [None]:
from sklearn.metrics import f1_score

# Compute ROC AUC score using continuous predictions
roc_auc = roc_auc_score(y_test, predictions)

# Compute accuracy, precision, recall and F1 using binary class predictions
accuracy = accuracy_score(y_test, y_pred_classes)
precision = precision_score(y_test, y_pred_classes)
recall = recall_score(y_test, y_pred_classes)
f1 = f1_score(y_test, y_pred_classes)
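The 0.5 cut-off used for `y_pred_classes` is a default rather than a requirement; moving it trades precision against recall. A minimal sketch of picking the threshold that maximises F1 from scikit-learn's `precision_recall_curve`, shown on hypothetical labels and scores. With this notebook's variables one would pass `y_test` and `predictions.ravel()`, ideally on a separate validation split, since tuning the threshold on the test set itself would leak information.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted fraud probabilities
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.45, 0.70, 0.80, 0.90])

prec, rec, thresholds = precision_recall_curve(y_true, scores)

# prec/rec have one more entry than thresholds (the final point has no threshold)
f1_per_threshold = 2 * prec[:-1] * rec[:-1] / (prec[:-1] + rec[:-1] + 1e-12)
best = np.argmax(f1_per_threshold)
best_threshold = thresholds[best]
print(f"threshold={best_threshold:.2f}, precision={prec[best]:.2f}, recall={rec[best]:.2f}")
```

On these toy scores the best threshold is 0.45 (precision 0.80, recall 1.00), lower than the default 0.5.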

In [None]:
# Create the first figure with performance indicators: one gauge per metric
gauge_metrics = {"Accuracy": accuracy, "Roc-Auc": roc_auc, "Precision": precision, "Recall": recall}
fig = make_subplots(rows=1, cols=4, specs=[[{"type": "domain"} for _ in range(4)]])

for col, (name, value) in enumerate(gauge_metrics.items(), start=1):
    fig.add_trace(go.Indicator(mode="gauge+number",
                               value=round(value * 100, 2),
                               title={'text': name, 'font': {'size': 14}},
                               gauge={'axis': {'range': [None, 100]}}),
                  row=1, col=col)

# 🎨 Updating the Layout
fig.update_layout(title_text="🎯 Model Performance Metrics", title_x=0.5, height=250, width=950, showlegend=False)
fig.show()



# ROC Curve
fpr, tpr, _ = roc_curve(y_test, predictions)
roc_auc = auc(fpr, tpr)

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred_classes)
names = ['True Negative', 'False Positive', 'False Negative', 'True Positive']
counts = [value for value in cm.flatten()]
percentages = ['{0:.2%}'.format(value) for value in cm.flatten() / np.sum(cm)]
labels = [f'{v1}\n{v2}\n{v3}' for v1, v2, v3 in zip(names, counts, percentages)]
labels = np.asarray(labels).reshape(2, 2)

# Create subplots
fig, axs = plt.subplots(1, 2, figsize=(18, 8))

# Plot ROC Curve
axs[0].plot(fpr, tpr, color='darkorange', lw=3, label=f'ROC curve (area = {roc_auc:.2f})')
axs[0].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
axs[0].set_xlim([0.0, 1.0])
axs[0].set_ylim([0.0, 1.05])
axs[0].set_xlabel('False Positive Rate', fontsize=14)
axs[0].set_ylabel('True Positive Rate', fontsize=14)
axs[0].set_title('Receiver Operating Characteristic', fontsize=16)
axs[0].legend(loc='lower right', fontsize=14)
axs[0].grid(True, linestyle='--', alpha=0.7)
axs[0].tick_params(axis='both', which='major', labelsize=12)

# Plot Confusion Matrix
sns.heatmap(cm, annot=labels, fmt='', cmap='viridis', cbar=False, ax=axs[1], annot_kws={"size": 14})

# Styling for Confusion Matrix
axs[1].set_title('Confusion Matrix', fontsize=16)
axs[1].set_xlabel('Predicted Label', fontsize=14)
axs[1].set_ylabel('True Label', fontsize=14)
axs[1].set_xticklabels(['Negative', 'Positive'], fontsize=12)
axs[1].set_yticklabels(['Negative', 'Positive'], fontsize=12)

# Adjust layout
plt.tight_layout()
plt.show()
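The four cells of the confusion matrix are enough to recompute the headline metrics by hand, which is a useful sanity check on the gauges. A minimal sketch with hypothetical counts (not this notebook's results), using the `[[TN, FP], [FN, TP]]` layout that `confusion_matrix` returns for labels 0 and 1:

```python
import numpy as np

# Hypothetical confusion matrix: rows = true label, columns = predicted label
cm_example = np.array([[950, 10],
                       [  5, 35]])
tn, fp, fn, tp = cm_example.ravel()

example_accuracy = (tp + tn) / cm_example.sum()
example_precision = tp / (tp + fp)   # share of flagged transactions that are fraud
example_recall = tp / (tp + fn)      # share of fraud that gets flagged
example_f1 = 2 * example_precision * example_recall / (example_precision + example_recall)

print(f"accuracy={example_accuracy:.3f}, precision={example_precision:.3f}, "
      f"recall={example_recall:.3f}, f1={example_f1:.3f}")
```

Here accuracy is 0.985 while precision is only about 0.778, showing how a few false positives barely move accuracy but noticeably lower precision when fraud is rare.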

# <center><div style="font-family: Trebuchet MS; background-color: #FD8D15; color: white; padding: 12px; line-height: 1;">Conclusion</div></center>

The model performs strongly on this credit card fraud detection task: on the held-out test set it catches most fraudulent transactions while rarely flagging legitimate ones. This balance makes it a promising candidate for real-world fraud detection, although precision in particular depends on how rare fraud is in the data the model is scored on.

- **Precision:** Of the transactions the model flags as fraud, a high share are genuinely fraudulent, so false positives are few.

- **Recall:** The model catches most fraudulent transactions, reducing the chance of missed fraud.

- **F1-Score:** The harmonic mean of precision and recall, which is high only when both are high.

- **Accuracy:** High accuracy reflects that most transactions are classified correctly, although on imbalanced data accuracy alone says little about fraud detection.

- **ROC AUC Score:** Nearly perfect, showing a strong ability to distinguish fraudulent from legitimate transactions.
