
# **MODEL CLASSIFICATION**

In [6]:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler

# Load the dataset
url = "https://drive.google.com/uc?export=download&id=1SEpVMpj5HFlqmlzqQE8Yt7Q6tzCMBiNt"
data = pd.read_csv(url)

# Preprocess the data
# Drop columns that are not needed for modeling
data.drop(columns=['RowNumber', 'CustomerId', 'Surname'], inplace=True)

# One-hot encode the categorical variables
data = pd.get_dummies(data, columns=['Geography', 'Gender'])

# Separate the features and the label
X = data.drop(columns=['Exited'])
y = data['Exited']

# Split the dataset into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
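
The reports in the sections that follow show 584 of 3,000 test rows in class 1, so the target is imbalanced. As a minimal sketch that is not part of the original notebook, the class shares can be inspected with `value_counts`, and `train_test_split` accepts a `stratify` argument that keeps those shares the same in both splits. The toy DataFrame and its values are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the churn data (hypothetical values): 80 rows of class 0, 20 of class 1
toy = pd.DataFrame({
    "Age": list(range(100)),
    "Exited": [0] * 80 + [1] * 20,
})
X_toy = toy.drop(columns=["Exited"])
y_toy = toy["Exited"]

# Share of each class in the full data
print(y_toy.value_counts(normalize=True))

# stratify=y_toy keeps the class shares the same in the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)
print("Class 1 share in train:", y_tr.mean())
print("Class 1 share in test:", y_te.mean())
```

Whether stratifying would change the results reported in this notebook has not been checked here.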

## **SUPPORT VECTOR MACHINE**

In [5]:
# SVM (Support Vector Machine) model
# Standardize the features: fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize the Support Vector Machine (SVM) classifier with a linear kernel
svm_classifier = SVC(kernel='linear', random_state=42)

# Train the SVM model on the standardized training data
svm_classifier.fit(X_train_scaled, y_train)

# Make predictions on the test data
y_pred = svm_classifier.predict(X_test_scaled)

# Evaluate the model's accuracy and build the classification report
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n{report}')

Accuracy: 0.8053333333333333
Classification Report:
              precision    recall  f1-score   support

           0       0.81      1.00      0.89      2416
           1       0.00      0.00      0.00       584

    accuracy                           0.81      3000
   macro avg       0.40      0.50      0.45      3000
weighted avg       0.65      0.81      0.72      3000





**CONCLUSION**

Support Vector Machine (SVM)
*   Accuracy: 0.805
*   Precision: 0.81 (class 0), 0.00 (class 1)
*   Recall: 1.00 (class 0), 0.0 (class 1)
*   F1-score: 0.89 (class 0), 0.00 (class 1)
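
The SVM numbers above are what a model produces when it predicts class 0 for every row. The following sketch rebuilds that situation from the support column of the report (2416 and 584 rows) and checks that the accuracy matches. The all-zero prediction vector is an assumption inferred from the reported recall values, not taken from the model itself.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Rebuild the true labels from the support column: 2416 rows of class 0, 584 of class 1
y_true = [0] * 2416 + [1] * 584
# Assumption: the model predicted class 0 for every row
y_all_zero = [0] * len(y_true)

baseline_accuracy = accuracy_score(y_true, y_all_zero)
# zero_division=0 returns 0.0 without a warning when no row is predicted as class 1
class1_precision = precision_score(y_true, y_all_zero, pos_label=1, zero_division=0)
class1_recall = recall_score(y_true, y_all_zero, pos_label=1)

print("Accuracy of always predicting class 0:", baseline_accuracy)
print("Class 1 precision:", class1_precision)
print("Class 1 recall:", class1_recall)
```

The accuracy of 0.805 is therefore the majority-class baseline and not evidence that the model has learned to detect churn. One commonly used option that the original notebook does not try is the `class_weight='balanced'` parameter of `SVC`.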

## **RANDOM FOREST**

In [7]:
# Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_pred = rf_model.predict(X_test)

# Evaluate the Random Forest model
print("Random Forest Classifier:")
print("Accuracy:", accuracy_score(y_test, rf_pred))
print("Classification Report:\n", classification_report(y_test, rf_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, rf_pred))

Random Forest Classifier:
Accuracy: 0.867
Classification Report:
               precision    recall  f1-score   support

           0       0.88      0.96      0.92      2416
           1       0.76      0.47      0.58       584

    accuracy                           0.87      3000
   macro avg       0.82      0.72      0.75      3000
weighted avg       0.86      0.87      0.85      3000

Confusion Matrix:
 [[2328   88]
 [ 311  273]]


**CONCLUSION**

RANDOM FOREST
*   Accuracy: 0.867
*   Precision: 0.88 (class 0), 0.76 (class 1)
*   Recall: 0.96 (class 0), 0.47 (class 1)
*   F1-score: 0.92 (class 0), 0.58 (class 1)
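
The class 1 figures above follow directly from the confusion matrix printed for Random Forest. A short worked example, using scikit-learn's layout in which rows are true classes and columns are predicted classes (variable names are illustrative):

```python
import numpy as np

# Confusion matrix from the Random Forest output: rows = true class, columns = predicted class
cm = np.array([[2328, 88],
               [311, 273]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()      # correct predictions over all rows
precision_1 = tp / (tp + fp)         # of the rows predicted as class 1, how many are class 1
recall_1 = tp / (tp + fn)            # of the true class 1 rows, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.3f} precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f}")
```

The recall of 0.47 means that 311 of the 584 class 1 rows are still classified as class 0.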

## **NAIVE BAYES**

In [8]:
# Initialize the Naive Bayes classifier
nb_classifier = GaussianNB()

# Train the Naive Bayes model on the training data
nb_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = nb_classifier.predict(X_test)

# Evaluate the model's accuracy and build the classification report
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print(f'Classification Report:\n{report}')

Accuracy: 0.794
Classification Report:
              precision    recall  f1-score   support

           0       0.81      0.97      0.88      2416
           1       0.37      0.08      0.13       584

    accuracy                           0.79      3000
   macro avg       0.59      0.52      0.51      3000
weighted avg       0.73      0.79      0.74      3000



**CONCLUSION**

NAIVE BAYES
*   Accuracy: 0.794
*   Precision: 0.81 (class 0), 0.37 (class 1)
*   Recall: 0.97 (class 0), 0.08 (class 1)
*   F1-score: 0.88 (class 0), 0.13 (class 1)

## **GRADIENT BOOSTING**

In [9]:
# Gradient Boosting model
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_model.fit(X_train, y_train)
gb_pred = gb_model.predict(X_test)

# Evaluate the Gradient Boosting model
print("\nGradient Boosting Classifier:")
print("Accuracy:", accuracy_score(y_test, gb_pred))
print("Classification Report:\n", classification_report(y_test, gb_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, gb_pred))


Gradient Boosting Classifier:
Accuracy: 0.872
Classification Report:
               precision    recall  f1-score   support

           0       0.89      0.97      0.92      2416
           1       0.77      0.48      0.60       584

    accuracy                           0.87      3000
   macro avg       0.83      0.73      0.76      3000
weighted avg       0.86      0.87      0.86      3000

Confusion Matrix:
 [[2333   83]
 [ 301  283]]


**CONCLUSION**

GRADIENT BOOSTING
*   Accuracy: 0.872
*   Precision: 0.89 (class 0), 0.77 (class 1)
*   Recall: 0.97 (class 0), 0.48 (class 1)
*   F1-score: 0.92 (class 0), 0.60 (class 1)
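
A class 1 recall of 0.48 is obtained with the default decision rule, which labels a row as class 1 only when its predicted probability is at least about 0.5. As a hedged sketch that the original notebook does not include, the cutoff can be lowered using `predict_proba` to trade some precision for recall. The data here is synthetic (`make_classification`), and the 0.3 cutoff is an arbitrary illustrative value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 80% class 0), standing in for the churn dataset
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn
)

model = GradientBoostingClassifier(n_estimators=50, random_state=42)
model.fit(X_tr, y_tr)

# Predicted probability of class 1 for each test row
proba_1 = model.predict_proba(X_te)[:, 1]

# Lowering the cutoff labels more rows as class 1, so recall can only stay equal or rise
recall_by_threshold = {}
for threshold in (0.5, 0.3):
    pred = (proba_1 >= threshold).astype(int)
    recall_by_threshold[threshold] = recall_score(y_te, pred)
    print(f"threshold={threshold}: class 1 recall={recall_by_threshold[threshold]:.2f}")
```

A lower cutoff usually also lowers precision for class 1, so the choice depends on which kind of error is more costly for the task.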

## **CONCLUSION**


1.   Random Forest and Gradient Boosting achieve higher accuracy (0.867 and 0.872) than SVM (0.805) and Naive Bayes (0.794).
2.   SVM has a precision of 0.81 for class 0 but a precision and recall of 0.00 for class 1. It predicts class 0 for every test row, so its accuracy equals the share of class 0 in the test set (2416 of 3000).
3.   Naive Bayes has a very low recall for class 1 (0.08), which shows that it misses most class 1 rows.
4.   Gradient Boosting performs reasonably well overall but still struggles with class 1 (recall 0.48), although it does far better than Naive Bayes.
5.   In this case, Gradient Boosting and Random Forest are the strongest candidates. Gradient Boosting scores slightly higher on accuracy (0.872 vs. 0.867) and on the class 1 F1-score (0.60 vs. 0.58), but the gap is small, and both identify the minority class (class 1) far better than SVM and Naive Bayes.
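
To compare the four models side by side, the reported numbers can be collected into one table. This is a minimal sketch in which the values are copied by hand from the classification reports above; the column names are illustrative.

```python
import pandas as pd

# Metrics copied from the classification reports above (class 1 is the minority class)
results = pd.DataFrame(
    {
        "accuracy": [0.805, 0.867, 0.794, 0.872],
        "precision_1": [0.00, 0.76, 0.37, 0.77],
        "recall_1": [0.00, 0.47, 0.08, 0.48],
        "f1_1": [0.00, 0.58, 0.13, 0.60],
    },
    index=["SVM", "Random Forest", "Naive Bayes", "Gradient Boosting"],
)

# Rank by the class 1 F1-score, since accuracy alone hides minority-class failures
ranked = results.sort_values("f1_1", ascending=False)
print(ranked)
```

Ranking by the class 1 F1-score instead of accuracy places Naive Bayes above SVM, even though SVM has the higher accuracy.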


