# Build a Machine Learning Model to Identify Fraudulent Credit Card Transactions

Preprocess and normalize the transaction data, handle class imbalance, and split the dataset into training and testing sets.
Train a classification algorithm, such as logistic regression or a random forest, to classify transactions as fraudulent or genuine.
Evaluate the model's performance using metrics such as precision, recall, and F1-score, and consider techniques such as oversampling or undersampling to improve the results.

# Step 1: Import Libraries and Load Data



```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler  # alternative; not used below

# Load the dataset ('creditcard.csv' in this example);
# replace 'creditcard.csv' with the path to your own file
dataset = pd.read_csv('creditcard.csv')
```


# Step 2: Define Features and Target



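```python
# Define features (X) and target (y).
# Columns 1-29 are used as features (column 0 is skipped) and column 30 is the
# class label (1 = fraudulent, 0 = genuine). Adjust the positions if your
# file's column layout differs.
X = dataset.iloc[:, 1:30].values
y = dataset.iloc[:, 30].values
```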
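Because `iloc` selects by position, it is worth confirming which columns the slice keeps and how imbalanced the label is before going further. The sketch below uses a tiny made-up DataFrame, not the real file; the column names `Time`, `V1`, `V2`, `Amount`, and `Class` are illustrative assumptions that only mimic the "skip the first column, label in the last column" layout implied by the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for creditcard.csv: first column skipped, label last
rng = np.random.default_rng(0)
n = 1000
demo = pd.DataFrame({
    "Time": np.arange(n),
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(50.0, size=n),
    "Class": np.r_[np.zeros(n - 5, dtype=int), np.ones(5, dtype=int)],
})

# Positional selection, the small-table analogue of iloc[:, 1:30] / iloc[:, 30]
X_pos = demo.iloc[:, 1:-1].values
y_pos = demo.iloc[:, -1].values

# Name-based selection keeps working if the column order ever changes
X_named = demo.drop(columns=["Time", "Class"]).values
y_named = demo["Class"].values

# Class balance: row counts per class and the share of fraudulent rows
counts = np.bincount(y_named)
print(counts.tolist())                      # [995, 5]
print(f"fraud rate: {y_named.mean():.3%}")  # fraud rate: 0.500%
```

On a real file, a fraud rate well under 1% is the signal that accuracy alone will be misleading and that the imbalance steps below matter.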

# Step 3: Split the Dataset



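```python
# Split the dataset into training and testing sets (80% train, 20% test).
# With rare fraud cases, passing stratify=y would keep the fraud rate equal
# in both sets; the results below were produced without it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```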
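With only about 0.17% fraud in the test set reported below (98 of 56,962 rows), a purely random split can leave the two sets with noticeably different fraud rates. A minimal sketch of a stratified split on synthetic labels (1,000 rows, 10 positives, chosen for illustration) using scikit-learn's `stratify` parameter:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced labels: 10 positives ("fraud") in 1,000 rows
X = np.arange(1000, dtype=float).reshape(-1, 1)
y = np.zeros(1000, dtype=int)
y[:10] = 1

# Plain random split: the number of positives that lands in the test set can drift
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=42)

# Stratified split: train and test both keep the 1% positive rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("positives in test (plain):", y_test_plain.sum())
print("positives in test (stratified):", y_test.sum())    # 2
print("positives in train (stratified):", y_train.sum())  # 8
```

Switching the notebook to a stratified split would change the exact numbers reported in Step 8, so results should be regenerated after such a change.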


# Step 4: Standardize Features



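```python
# Standardize the features (mean=0, std=1).
# The scaler is fit on the training set only and then reused on the test set,
# so no test-set statistics leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```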
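What `StandardScaler` computes can be written out by hand: subtract the training-set mean and divide by the training-set standard deviation (population form, `ddof=0`), then apply those same two vectors to the test set. A short sketch on random data, checked against scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=100.0, scale=25.0, size=(500, 3))
X_test = rng.normal(loc=100.0, scale=25.0, size=(100, 3))

# Statistics come from the training split only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_manual = (X_train - mu) / sigma
X_test_manual = (X_test - mu) / sigma  # test rows reuse the training mu and sigma

scaler = StandardScaler().fit(X_train)
print(np.allclose(X_train_manual, scaler.transform(X_train)))  # True
print(np.allclose(X_test_manual, scaler.transform(X_test)))    # True
```

Only the scaled training data is guaranteed to have mean 0 and standard deviation 1; the scaled test data will be close but not exact, which is expected.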


# Step 5: Handle Class Imbalance (Oversampling)



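```python
# Handle class imbalance by oversampling the training set only.
# sampling_strategy=0.5 resamples the fraud class until it has half as many
# rows as the genuine class; adjust the ratio as needed.
ros = RandomOverSampler(sampling_strategy=0.5, random_state=42)
X_train_oversampled, y_train_oversampled = ros.fit_resample(X_train, y_train)
```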
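To make the float `sampling_strategy` concrete, here is a hand-rolled NumPy sketch of random oversampling (duplicate minority rows, drawn with replacement) and random undersampling (keep a random subset of majority rows) to the same 0.5 ratio. The helper names `random_oversample` and `random_undersample` are hypothetical, this is not imblearn's implementation, and its rounding may differ slightly; it assumes label 1 is the minority class.

```python
import numpy as np

def random_oversample(X, y, ratio, seed=0):
    """Duplicate random minority rows (label 1) until n_minority is about ratio * n_majority."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    target = int(ratio * len(majority))
    # Draw extra minority rows with replacement; the original rows are all kept
    extra = rng.choice(minority, size=max(target - len(minority), 0), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

def random_undersample(X, y, ratio, seed=0):
    """Keep a random subset of majority rows so that n_minority is about ratio * n_majority."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    target = int(len(minority) / ratio)
    # Sample majority rows without replacement; every minority row is kept
    kept = rng.choice(majority, size=min(target, len(majority)), replace=False)
    idx = np.concatenate([kept, minority])
    return X[idx], y[idx]

# Toy data: 980 genuine rows (0) and 20 fraud rows (1)
X = np.arange(2000, dtype=float).reshape(1000, 2)
y = np.array([0] * 980 + [1] * 20)

X_over, y_over = random_oversample(X, y, ratio=0.5)
X_under, y_under = random_undersample(X, y, ratio=0.5)
print(np.bincount(y_over).tolist())   # [980, 490]
print(np.bincount(y_under).tolist())  # [40, 20]
```

Both reach the same class ratio, but oversampling keeps every genuine row (and repeats fraud rows), while undersampling discards most genuine rows. That is the trade-off between the `RandomOverSampler` used here and the imported but unused `RandomUnderSampler`.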


# Step 6: Train a Logistic Regression Classifier



```python
# Train a logistic regression classifier on the oversampled training data
classifier = LogisticRegression(random_state=42)
classifier.fit(X_train_oversampled, y_train_oversampled)
```

```
LogisticRegression(random_state=42)
```
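An alternative to resampling, swapped in here for comparison rather than taken from this notebook, is to reweight the loss with scikit-learn's `class_weight="balanced"`. Each class c gets weight n_samples / (n_classes × count(c)). A minimal sketch on synthetic data that checks this formula against `compute_class_weight`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = np.array([0] * 980 + [1] * 20)
X[y == 1] += 2.0  # shift the minority class so there is something to learn

# "balanced" weight for class c: n_samples / (n_classes * count of class c)
manual = len(y) / (2 * np.bincount(y))
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(np.allclose(manual, weights))  # True

# Same idea applied inside the loss: each fraud row counts 49x a genuine row,
# and the training data is not resampled
clf = LogisticRegression(class_weight="balanced", random_state=42).fit(X, y)
```

Reweighting avoids duplicating rows, which keeps training data size unchanged; whether it beats oversampling on a given dataset has to be checked empirically.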

# Step 7: Make Predictions



```python
# Make predictions on the test set
y_pred = classifier.predict(X_test)
```
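For binary logistic regression, `predict` labels a row as positive when the predicted fraud probability is above 0.5. Working with `predict_proba` instead lets you move that cut-off: a higher threshold flags fewer transactions, usually trading recall for precision. A sketch on synthetic data (not the credit card file); the thresholds are arbitrary, and in practice a threshold should be chosen on validation data rather than the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 2% positives)
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.98], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

clf = LogisticRegression(class_weight="balanced", random_state=0).fit(X_tr, y_tr)

# Column 1 of predict_proba is the positive class because classes_ == [0, 1]
p_fraud = clf.predict_proba(X_te)[:, 1]

flagged = []
for threshold in (0.5, 0.7, 0.9):
    pred = (p_fraud >= threshold).astype(int)
    flagged.append(int(pred.sum()))
    print(f"threshold={threshold}: flagged={pred.sum()}, "
          f"precision={precision_score(y_te, pred, zero_division=0):.2f}, "
          f"recall={recall_score(y_te, pred):.2f}")
```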


# Step 8: Evaluate the Model



```python
# Evaluate the model on the test set
accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", confusion)
print("Classification Report:\n", classification_rep)
```

```
Accuracy: 0.988729328324146
Confusion Matrix:
 [[56231   633]
 [    9    89]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      0.99      0.99     56864
           1       0.12      0.91      0.22        98

    accuracy                           0.99     56962
   macro avg       0.56      0.95      0.61     56962
weighted avg       1.00      0.99      0.99     56962
```
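The fraud-class figures in the report can be recomputed by hand from the confusion matrix printed above, which shows why accuracy looks excellent while precision is poor. The counts are copied from that output; scikit-learn's `confusion_matrix` puts true labels on rows and predicted labels on columns.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted
tn, fp = 56231, 633
fn, tp = 9, 89

precision = tp / (tp + fp)  # 89 / 722: share of flagged rows that are fraud
recall = tp / (tp + fn)     # 89 / 98: share of fraud rows that were flagged
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
# precision=0.1233 recall=0.9082 f1=0.2171 accuracy=0.9887
```

The 633 false positives barely move accuracy because they are only about 1% of the 56,864 genuine rows, but they outnumber the 89 true positives roughly seven to one, which is what drives fraud precision down to 0.12.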



# Credit Card Fraud Detection Conclusion:

In this code, we developed a credit card fraud detection model using logistic regression. The goal of this project is to identify fraudulent credit card transactions from a dataset that contains both genuine and fraudulent transactions. Here are the main takeaways:

1. Data Loading and Preparation:

We imported the necessary libraries and loaded the dataset ('creditcard.csv' in this example).
Features (X) and the target variable (y) were defined.
The dataset was split into training and testing sets (80% training, 20% testing) to evaluate the model.

2. Data Standardization:

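We standardized the features to have a mean of 0 and a standard deviation of 1, fitting the scaler on the training set only. Putting the features on a common scale helps the logistic regression solver converge and keeps the default L2 regularization from penalizing features unevenly just because of their units.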

3. Handling Class Imbalance:

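To address the class imbalance problem (fraudulent transactions are rare), we oversampled the training set with RandomOverSampler from the imblearn library. RandomOverSampler duplicates randomly chosen minority-class rows (sampling with replacement) until, with sampling_strategy=0.5, the fraud class is half the size of the genuine class. It does not create synthetic examples, as methods such as SMOTE do.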

4. Model Training:

A logistic regression classifier was chosen for its simplicity and interpretability.
The classifier was trained using the oversampled training data.

5. Model Evaluation:

We made predictions on the test set using the trained model.
Model performance was evaluated using accuracy, the confusion matrix, and a classification report.
The classification report gives precision, recall, and F1-score for both classes (fraudulent and genuine transactions). On this test set, the model caught 89 of the 98 fraudulent transactions (recall 0.91) but also flagged 633 genuine ones, so fraud precision is only 0.12 and the fraud F1-score is 0.22. The 0.99 accuracy is dominated by the 56,864 genuine transactions and says little about how well fraud is detected.

Overall, this code serves as a foundational framework for credit card fraud detection. Keep in mind that this is a basic implementation; for real-world applications, further steps such as hyperparameter tuning, feature engineering, and more advanced modeling techniques may be necessary to raise precision and F1 on the fraud class without giving up too much recall.

The code can be further expanded and customized to suit the specific requirements and nuances of your dataset and use case.
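As one example of the hyperparameter tuning mentioned above, here is a minimal sketch that tunes the regularization strength `C` by cross-validated F1 on the positive class rather than by accuracy. It runs on synthetic data, and it uses `class_weight="balanced"` instead of `RandomOverSampler` so that it needs only scikit-learn; the grid values are arbitrary. Putting the scaler inside a `Pipeline` refits it on each training fold. If you keep oversampling, imblearn provides its own `imblearn.pipeline.Pipeline` that applies samplers only when fitting, so resampled rows never leak into validation folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data (roughly 3% positives); not the credit card file
X, y = make_classification(n_samples=3000, n_features=10, weights=[0.97], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),  # refit on each training fold, so no leakage
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="f1",  # F1 of the positive (fraud) class, not accuracy
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```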