
# Incremental SVM vs Traditional SVM  
**Dataset:** Kaggle Fraud Detection Dataset  
⚠️ *Note: This notebook environment has no internet access, so you must manually upload the dataset (CSV file) to `/mnt/data/` before running the notebook.*

---

## 📌 Objective
A traditional (batch) SVM must be retrained on the full dataset whenever new data arrives, which is too slow for streaming systems.  
An incremental SVM (ISVM) instead updates the existing model from each new batch, so previously seen data does not have to be reprocessed.

This notebook compares:
- Training time of Traditional SVM vs Incremental SVM  
- Accuracy comparison  
- Update time when new data streams in  
- Visualizations using Matplotlib  
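
Each timing comparison listed above comes down to measuring wall-clock time around a single call. A minimal sketch of a reusable helper follows. The name `timed` is hypothetical and not part of this notebook. It uses `time.perf_counter`, which is intended for measuring short durations.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Example with a stand-in workload instead of a model fit
total, seconds = timed(sum, range(1_000_000))
print(total, f"{seconds:.4f}s")
```

With such a helper, a call like `timed(svm.fit, X_train, y_train)` would return the fitted estimator together with the elapsed time.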


In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import time

# Load dataset (user must upload the CSV file to /mnt/data)
df = pd.read_csv('/mnt/data/fraudDataset.csv')  # CHANGE FILENAME IF NEEDED

df.head()


In [None]:

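# Basic preprocessing
# Use a smaller, reproducible subset so the timing comparison finishes quickly
df = df.sample(20000, random_state=42)
X = df.drop('Class', axis=1)
y = df['Class']

# Split first (stratified, so both splits contain fraud cases), then fit the
# scaler on the training split only so no test-set statistics leak into training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)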


In [None]:

# Traditional SVM (full retraining on the whole training set)
svm = SVC(kernel='rbf')

start = time.perf_counter()  # monotonic clock suited to measuring durations
svm.fit(X_train, y_train)
svm_train_time = time.perf_counter() - start

y_pred = svm.predict(X_test)
svm_acc = accuracy_score(y_test, y_pred)

svm_train_time, svm_acc


## Incremental SVM (ISVM) using `SGDClassifier` with hinge loss
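
To make concrete what an incremental update involves, here is a minimal pure-Python sketch of stochastic gradient descent on the L2-regularized hinge loss for a linear model. It is an illustration under simplifying assumptions (fixed learning rate, labels in {-1, +1}, hypothetical function name `hinge_sgd_update`), not scikit-learn's actual implementation, which uses its own learning-rate schedule and defaults.

```python
def hinge_sgd_update(w, b, X_batch, y_batch, lr=0.1, alpha=0.0001):
    """Return updated (w, b) after one pass over a mini-batch.

    Labels must be -1 or +1.
    """
    w = list(w)
    for x, y in zip(X_batch, y_batch):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # The L2 penalty shrinks the weights on every step
        w = [wi * (1 - lr * alpha) for wi in w]
        if margin < 1:
            # The sample violates the margin: shift the boundary in its favor
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return w, b

# Two tiny batches arriving one after the other
batch_1 = ([[2.0, 1.0], [-1.5, -2.0]], [1, -1])
batch_2 = ([[1.0, 3.0], [-2.0, -0.5]], [1, -1])

w, b = [0.0, 0.0], 0.0
for X_batch, y_batch in (batch_1, batch_2):
    w, b = hinge_sgd_update(w, b, X_batch, y_batch)

def predict(x):
    """Classify x with the current linear model."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

print([predict(x) for x in batch_1[0] + batch_2[0]])
```

Only `w` and `b` are carried from one batch to the next, which is why earlier batches do not need to be kept in memory or revisited.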

In [None]:

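# SGDClassifier with hinge loss fits a *linear* SVM by SGD, so this is compared
# against the RBF-kernel SVC above rather than an identical model
isvm = SGDClassifier(loss='hinge', random_state=42)

batch_size = 2000
classes = np.unique(y)  # every label must be declared on the first partial_fit call
start = time.perf_counter()

for i in range(0, len(X_train), batch_size):
    X_batch = X_train[i:i+batch_size]
    y_batch = y_train.iloc[i:i+batch_size]
    isvm.partial_fit(X_batch, y_batch, classes=classes)

isvm_train_time = time.perf_counter() - start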

y_pred_isvm = isvm.predict(X_test)
isvm_acc = accuracy_score(y_test, y_pred_isvm)

isvm_train_time, isvm_acc


## Streaming New Data Simulation
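
A stream can be simulated by slicing in-memory data into consecutive chunks. The sketch below uses a hypothetical generator, `iter_batches`, that is not part of this notebook. It works on any sequences that support `len` and slicing, and uses made-up stand-in rows.

```python
def iter_batches(X, y, batch_size):
    """Yield consecutive (X_batch, y_batch) slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Stand-in data: 7 rows arriving in chunks of 3
X_stream = [[i, i * 2] for i in range(7)]
y_stream = [0, 0, 1, 0, 0, 0, 1]

sizes = [len(X_batch) for X_batch, _ in iter_batches(X_stream, y_stream, 3)]
print(sizes)
```

Each yielded pair could be passed to `partial_fit` in turn. For a pandas `Series` of labels, passing `y.to_numpy()` keeps the slicing unambiguously positional.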

In [None]:

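# Simulate new streaming data arrival
# NOTE: these rows are reused from the training set purely to time the update;
# a real stream would supply previously unseen samples
new_data = X_train[:1500]
new_labels = y_train.iloc[:1500]  # positional slice, matching new_data

start = time.perf_counter()
isvm.partial_fit(new_data, new_labels)
incremental_update_time = time.perf_counter() - start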

incremental_update_time


## 📊 Visualization Results
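
When reading the accuracy chart in this section, keep in mind that fraud datasets are typically heavily imbalanced, so accuracy can look high even for a model that never flags fraud. As a hedged illustration with made-up labels (not results from this notebook), precision and recall for the fraud class can be computed from the confusion counts. The helper name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: 2 fraud cases among 100 transactions
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100  # a model that never predicts fraud

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(accuracy, precision, recall)
```

In the notebook itself, `precision_score`, `recall_score`, or `classification_report` from `sklearn.metrics` would serve the same purpose alongside `accuracy_score`.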

In [None]:

# Compare training times
methods = ['Traditional SVM', 'Incremental SVM']
times = [svm_train_time, isvm_train_time]

plt.figure(figsize=(7,5))
plt.bar(methods, times)
plt.ylabel('Training Time (seconds)')
plt.title('Training Time Comparison')
plt.show()

# Accuracy comparison
acc = [svm_acc, isvm_acc]

plt.figure(figsize=(7,5))
plt.bar(methods, acc)
plt.ylabel('Accuracy')
plt.title('Accuracy Comparison')
plt.show()
