# **Important**
- Make sure you Run All before submitting, to confirm that every cell executes without errors.
- Remove the hash symbol (#) if you implement the additional criteria.
- Leave the hash symbol (#) in place if you do not implement the additional criteria.

# **1. Import Libraries**
At this stage, you import the Python libraries needed for data analysis and for building the machine learning models.

In [39]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import joblib

# **2. Loading the Dataset from the Clustering Results**
Load the clustering output from a CSV file into a DataFrame variable.

In [40]:
df = pd.read_csv('data_clustering_inverse.csv')

In [41]:
df.head()

| | TransactionAmount | CustomerAge | TransactionDuration | LoginAttempts | AccountBalance | TransactionType | Location | Channel | CustomerOccupation | Target |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1919.11 | 70.0 | 81.0 | 1.0 | 14977.99 | 1 | 36 | 0 | 0 | 2 |
| 1 | 1919.11 | 68.0 | 141.0 | 1.0 | 101.25 | 1 | 15 | 0 | 0 | 1 |
| 2 | 1919.11 | 19.0 | 56.0 | 1.0 | 44731.47 | 1 | 23 | 2 | 3 | 1 |
| 3 | 1919.11 | 26.0 | 25.0 | 1.0 | 29854.73 | 1 | 33 | 2 | 3 | 2 |
| 4 | 1919.11 | 18.0 | 172.0 | 1.0 | 44731.47 | 1 | 28 | 0 | 3 | 1 |


# **3. Data Splitting**
The data splitting stage separates the dataset into two parts: a training set and a test set.

In [42]:
X = df.drop(columns='Target')
y = df['Target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set shape: X_train={X_train.shape}, y_train={y_train.shape}")
print(f"Test set shape: X_test={X_test.shape}, y_test={y_test.shape}")

Training set shape: X_train=(1917, 9), y_train=(1917,)
Test set shape: X_test=(480, 9), y_test=(480,)
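
The split above samples rows at random, so the class proportions of `Target` in the training and test sets can drift slightly from those of the full dataset. As a minimal sketch, assuming you want every class represented in the same proportion on both sides, `train_test_split` accepts a `stratify` argument. The toy DataFrame below is hypothetical and only stands in for `df`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for df: 100 rows, three imbalanced classes.
toy = pd.DataFrame({
    "feature": range(100),
    "Target": [0] * 60 + [1] * 30 + [2] * 10,
})
X_toy = toy.drop(columns="Target")
y_toy = toy["Target"]

# stratify=y_toy keeps the 60/30/10 class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

train_counts = y_tr.value_counts().sort_index().to_dict()
test_counts = y_te.value_counts().sort_index().to_dict()
print(train_counts)
print(test_counts)
```

With these toy numbers the test set receives 12, 6 and 2 rows of classes 0, 1 and 2, which mirrors the 60/30/10 ratio of the full data.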


# **4. Building the Classification Model**
After choosing a suitable classification algorithm, the next step is to train the model on the training data.

The recommended steps are as follows.
1. Use the Decision Tree classification algorithm.
2. Train the model on the data that has already been split.

In [43]:
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Predict on the test data
y_pred = dt_model.predict(X_test)

In [44]:
joblib.dump(dt_model, 'decision_tree_model.h5')

['decision_tree_model.h5']
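
Note that `joblib.dump` writes a pickle-based file whatever the extension, so the `.h5` suffix here is only a file name and not the HDF5 format. The following is a minimal sketch of reloading a saved model with `joblib.load` and checking that it reproduces the original predictions. The toy data, the variable names, and the temporary path are assumptions made for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: the label is 1 when the first feature exceeds 0.5.
rng = np.random.default_rng(42)
X_demo = rng.random((50, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(int)

model = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Save to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.h5")  # illustrative file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original.
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

Files written this way should only be loaded from sources you trust, since unpickling can execute arbitrary code.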

# **5. Meeting the Skilled and Advanced Criteria for Building the Classification Model**

**Leave blank if you are not implementing the skilled or advanced criteria**

In [45]:
model_knn = KNeighborsClassifier().fit(X_train, y_train)
model_rf = RandomForestClassifier().fit(X_train, y_train)
model_svm = SVC().fit(X_train, y_train)
model_nb = GaussianNB().fit(X_train, y_train)

print("Model training complete.")

Model training complete.


In [46]:
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    results = {
        'Confusion Matrix': cm,
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': precision_score(y_test, y_pred, average='weighted', zero_division=0),
        'Recall': recall_score(y_test, y_pred, average='weighted', zero_division=0),
        'F1-Score': f1_score(y_test, y_pred, average='weighted', zero_division=0)
    }
    return results

# Evaluate each model
results = {
    'K-Nearest Neighbors (KNN)': evaluate_model(model_knn, X_test, y_test),
    'Decision Tree (DT)': evaluate_model(dt_model, X_test, y_test),
    'Random Forest (RF)': evaluate_model(model_rf, X_test, y_test),
    'Support Vector Machine (SVM)': evaluate_model(model_svm, X_test, y_test),
    'Naive Bayes (NB)': evaluate_model(model_nb, X_test, y_test)
}

summary_df = pd.DataFrame([
    {
        'Model': model_name,
        'Accuracy': metrics['Accuracy'],
        'Precision': metrics['Precision'],
        'Recall': metrics['Recall'],
        'F1-Score': metrics['F1-Score']
    }
    for model_name, metrics in results.items()
])

# Display the results
print(summary_df)

                          Model  Accuracy  Precision    Recall  F1-Score
0     K-Nearest Neighbors (KNN)  0.827083   0.831159  0.827083  0.827542
1            Decision Tree (DT)  1.000000   1.000000  1.000000  1.000000
2            Random Forest (RF)  1.000000   1.000000  1.000000  1.000000
3  Support Vector Machine (SVM)  0.362500   0.131406  0.362500  0.192890
4              Naive Bayes (NB)  0.987500   0.987917  0.987500  0.987532
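
The SVM row stands out against the other models. One plausible explanation, which the results above do not isolate, is that `SVC` with its default RBF kernel is sensitive to feature scale, and columns such as `AccountBalance` and `LoginAttempts` differ by several orders of magnitude. The sketch below illustrates the effect on synthetic data (an assumption, not the notebook's dataset) built from one informative small-scale feature and one large-scale noise feature.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)

# Informative feature: class means at -2 and +2 with unit noise.
informative = (labels * 4 - 2) + rng.normal(0, 1, size=n)
# Noise feature on a much larger scale, unrelated to the label.
noise = rng.normal(0, 10_000, size=n)
X_syn = np.column_stack([informative, noise])

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, labels, test_size=0.25, random_state=0
)

# Same default SVC, trained once on raw and once on standardized features.
raw_acc = SVC().fit(X_tr, y_tr).score(X_te, y_te)

scaler = StandardScaler().fit(X_tr)  # fit on the training split only
scaled_acc = SVC().fit(scaler.transform(X_tr), y_tr).score(
    scaler.transform(X_te), y_te
)

print(f"unscaled: {raw_acc:.3f}, scaled: {scaled_acc:.3f}")
```

On raw features the large-scale column dominates the kernel distances, so the informative column contributes almost nothing until both are standardized.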


In [47]:
# Save all models except the Decision Tree
joblib.dump(model_knn, 'explore_KNN_classification.h5')
joblib.dump(model_rf, 'explore_RF_classification.h5')
joblib.dump(model_svm, 'explore_SVM_classification.h5')
joblib.dump(model_nb, 'explore_NB_classification.h5')

['explore_NB_classification.h5']

Model Hyperparameter Tuning

Choose one of the algorithms to tune.

In [48]:
# StandardScaler and SVC are already imported in the first cell
from sklearn.model_selection import GridSearchCV

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

param_grid_svm = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}
grid_svm = GridSearchCV(SVC(), param_grid_svm, cv=5)
grid_svm.fit(X_train_scaled, y_train)
svm_model_best = grid_svm.best_estimator_

svm_best_results = evaluate_model(svm_model_best, X_test_scaled, y_test)

for metric, value in svm_best_results.items():
    print(f"{metric}: {value}")


Confusion Matrix: [[169   0   0]
 [  0 174   0]
 [  0   0 137]]
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0


In [49]:
joblib.dump(grid_svm.best_estimator_, 'tuning_classification.h5')

['tuning_classification.h5']
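
The tuned SVM was trained on standardized features, so the saved estimator expects inputs transformed by the same fitted `scaler`, which the cell above does not save. One way to keep the two together, sketched here under the assumption that a single artifact suits your use case, is to wrap the scaler and the classifier in a scikit-learn `Pipeline` and tune that instead. The synthetic data and the step names (`scaler`, `svc`) are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the notebook's features.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("scaler", StandardScaler()),  # refitted inside each CV fold
    ("svc", SVC()),
])

# Parameters of a pipeline step are addressed as <step>__<parameter>.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_tr, y_tr)

best_pipe = grid.best_estimator_
# Raw (unscaled) test data can be passed directly; the pipeline scales it.
test_acc = best_pipe.score(X_te, y_te)
print(grid.best_params_, round(test_acc, 3))
```

Passing `best_pipe` to `joblib.dump` would then persist the scaler and the classifier in one file. A side benefit of this design is that the scaler is fitted only on the training portion of each cross-validation fold.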