# Model Training for Real-Time Fraud Detection

This notebook trains a fraud detection model on preprocessed features. It covers data loading, preprocessing, model training, evaluation, and saving the trained model.

## 1. Import Libraries

First, we need to import the necessary libraries.


In [None]:
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Set display options for better readability
pd.set_option('display.max_columns', None)


## 2. Load the Data

Load the dataset that contains the features and labels for training.


In [None]:
# Load training data
data_path = 'path_to_your_training_data_file.csv'  # Update with your training data path
data = pd.read_csv(data_path)

# Display the first few rows of the dataset
data.head()


## 3. Data Preprocessing

Preprocess the data as needed (e.g., handling missing values, encoding categorical variables).


In [None]:
# Handle missing values
# Example: forward fill. fillna(method='ffill') is deprecated in pandas 2.x, so call ffill() directly.
data = data.ffill()

# Convert categorical variables to numerical
data = pd.get_dummies(data, drop_first=True)

# Separate features and labels
X = data.drop(columns=['is_fraud'])  # Adjust with your label column
y = data['is_fraud']
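
To make the effect of these steps concrete, here is a minimal sketch on a toy frame. The columns `amount` and `channel` are hypothetical and stand in for whatever your dataset contains; only `is_fraud` comes from the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "amount" and "channel" are illustrative column names.
toy = pd.DataFrame({
    "amount": [12.5, np.nan, 300.0, 45.0],
    "channel": ["web", "pos", "web", "app"],
    "is_fraud": [0, 0, 1, 0],
})

# Forward fill copies the previous row's value into each gap.
toy["amount"] = toy["amount"].ffill()

# One-hot encode the categorical column. drop_first removes the first
# category in sorted order ("app") so the remaining columns are not redundant.
encoded = pd.get_dummies(toy, columns=["channel"], drop_first=True)

# Separate features from the label.
X_toy = encoded.drop(columns=["is_fraud"])
y_toy = encoded["is_fraud"]

print(X_toy.columns.tolist())
```

Forward fill assumes that row order is meaningful (for example, transactions sorted by time). If it is not, a median or constant fill may be a safer choice.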


## 4. Split the Data

Split the data into training and validation sets.


In [None]:
# Split the data, stratifying on the label so both sets keep the same fraud rate
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("Data split into training and validation sets.")
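
Fraud labels are usually heavily imbalanced, so it is worth checking that both sets end up with a similar fraud rate. The sketch below uses synthetic data with an assumed 2% fraud rate and the `stratify` argument of `train_test_split`; the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 2% labelled as fraud (an assumption).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.zeros(1000, dtype=int)
y_demo[:20] = 1

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"Train fraud rate:      {y_tr.mean():.3f}")
print(f"Validation fraud rate: {y_va.mean():.3f}")
```

Without stratification, a small validation set can contain very few fraud cases by chance, which makes the evaluation metrics unstable.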


## 5. Model Training

Select and train the model. Here, we use a Random Forest Classifier as an example.


In [None]:
# Initialize the model
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
model.fit(X_train, y_train)

print("Model training completed.")
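
One option worth considering for imbalanced fraud data is the `class_weight="balanced"` setting of `RandomForestClassifier`, which weights classes inversely to their frequency. The following is a sketch on synthetic data from `make_classification`, not a tuned configuration; the 3% fraud rate is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced data: roughly 3% positives (an assumption).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, weights=[0.97, 0.03], random_state=42
)

# class_weight="balanced" reweights samples inversely to class frequency;
# n_jobs=-1 trains the trees on all available CPU cores.
weighted_model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", n_jobs=-1, random_state=42
)
weighted_model.fit(X_demo, y_demo)

# Impurity-based importances sum to 1 across features.
importances = weighted_model.feature_importances_
for idx in importances.argsort()[::-1][:3]:
    print(f"feature {idx}: {importances[idx]:.3f}")
```

Whether reweighting helps depends on the dataset, so compare it against the unweighted model on the validation set before adopting it.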


## 6. Evaluate the Model

Evaluate the model's performance on the validation set.


In [None]:
# Make predictions on the validation set
y_pred = model.predict(X_val)

# Calculate accuracy (on imbalanced fraud data this can look high even when most fraud is missed)
accuracy = accuracy_score(y_val, y_pred)
print(f"Validation Accuracy: {accuracy:.4f}")

# Display confusion matrix
conf_matrix = confusion_matrix(y_val, y_pred)

plt.figure(figsize=(8, 6))
plt.title('Confusion Matrix')
sns.heatmap(
    conf_matrix,
    annot=True,
    fmt='d',
    cmap='Blues',
    xticklabels=['Not Fraud', 'Fraud'],
    yticklabels=['Not Fraud', 'Fraud'],
)
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Display classification report
class_report = classification_report(y_val, y_pred)
print("Classification Report:\n", class_report)
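
Because accuracy says little when fraud is rare, it can help to score the predicted probabilities as well. The sketch below, on synthetic data, computes average precision and shows how precision and recall shift with the decision threshold; the thresholds 0.5 and 0.3 are arbitrary illustrations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% positives (an assumption).
X_demo, y_demo = make_classification(
    n_samples=3000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Probability of the positive (fraud) class for each validation row.
fraud_scores = clf.predict_proba(X_va)[:, 1]

# Average precision summarizes the precision-recall curve in one number.
ap = average_precision_score(y_va, fraud_scores)
print(f"Average precision: {ap:.3f}")

# Lowering the threshold flags more transactions: recall cannot decrease,
# while precision usually drops.
recalls = {}
for threshold in (0.5, 0.3):
    flagged = (fraud_scores >= threshold).astype(int)
    precision = precision_score(y_va, flagged, zero_division=0)
    recalls[threshold] = recall_score(y_va, flagged)
    print(f"threshold={threshold}: precision={precision:.3f}, recall={recalls[threshold]:.3f}")
```

The right threshold depends on the relative cost of a missed fraud versus a false alarm, which is a business decision rather than a modeling one.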


## 7. Save the Trained Model

Save the trained model for later use.


In [None]:
# Save the model
model_path = 'path_to_save_your_trained_model.joblib'  # Update with your desired model path
joblib.dump(model, model_path)

print(f"Model saved to {model_path}.")
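
For later real-time use, the scoring code needs the same feature columns, in the same order, that the model saw during training. One possible approach, sketched here on toy data, is to store the column list next to the model and realign incoming records with `DataFrame.reindex`. The bundle layout, file name, and feature names are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training frame after one-hot encoding.
train = pd.DataFrame({
    "amount": [10.0, 250.0, 15.0, 900.0],
    "channel_web": [1, 0, 1, 0],
})
labels = [0, 1, 0, 1]
clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(train, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = os.path.join(tmp_dir, "fraud_model.joblib")
    # Save the model together with the training column order.
    joblib.dump({"model": clf, "columns": list(train.columns)}, bundle_path)
    # Reload it, as a scoring service would at startup.
    bundle = joblib.load(bundle_path)

# An incoming transaction that lacks one of the encoded columns.
incoming = pd.DataFrame([{"amount": 700.0}])

# Add missing columns as 0 and enforce the training column order.
aligned = incoming.reindex(columns=bundle["columns"], fill_value=0)

fraud_probability = bundle["model"].predict_proba(aligned)[0, 1]
print(f"Fraud probability: {fraud_probability:.2f}")
```

Only load joblib files from sources you trust, since loading a pickle-based file can execute arbitrary code.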


## 8. Conclusion

Summarize the training process and model performance.


In [None]:
# Summary of findings: report the measured result rather than a fixed claim
print(f"Random Forest training finished. Validation accuracy: {accuracy:.4f}. Review the classification report before judging performance.")
