# **Important**
- Run All before submitting to confirm that every cell executes without errors.
- Remove the hash symbol (#) if you implement the additional criteria.
- Leave the hash symbol (#) in place if you do not implement the additional criteria.

# **1. Import Libraries**
In this step, import the Python libraries needed for data analysis and for building the machine learning models.

In [42]:
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# **2. Load the Dataset from the Clustering Results**
Load the clustering output from a CSV file into a DataFrame.

In [43]:
# Use the clustering output dataset, which contains the Target feature
# Use data_clustering if you did not implement Interpretasi Hasil Clustering [Advanced]
# Use data_clustering_inverse if you implemented Interpretasi Hasil Clustering [Advanced]
url='https://raw.githubusercontent.com/mauliidna/proyek-machine-learning-pemula/refs/heads/main/data_clustering_inverse%20(6).csv'
df = pd.read_csv(url)

In [44]:
# Show the first 5 rows with head().
df.head()

,TransactionAmount,CustomerAge,TransactionDuration,LoginAttempts,AccountBalance,TransactionType,Location,Channel,CustomerOccupation,TransactionAmount_Binned,CustomerAge_Binned,Target
0,14.09,70.0,81.0,1.0,5112.21,Debit,San Diego,ATM,Doctor,Low,Young,3.0
1,376.24,68.0,141.0,1.0,13758.91,Debit,Houston,ATM,Doctor,Low,Young,0.0
2,126.29,19.0,56.0,1.0,1122.35,Debit,Mesa,Online,Student,Low,Young,1.0
3,184.5,26.0,25.0,1.0,8569.06,Debit,Raleigh,Online,Student,Low,Young,0.0
4,13.45,44.678444,198.0,1.0,7429.4,Debit,Oklahoma City,ATM,Student,Low,Young,1.0


In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2537 entries, 0 to 2536
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   TransactionAmount         2537 non-null   float64
 1   CustomerAge               2537 non-null   float64
 2   TransactionDuration       2537 non-null   float64
 3   LoginAttempts             2537 non-null   float64
 4   AccountBalance            2537 non-null   float64
 5   TransactionType           2537 non-null   object 
 6   Location                  2537 non-null   object 
 7   Channel                   2537 non-null   object 
 8   CustomerOccupation        2537 non-null   object 
 9   TransactionAmount_Binned  2537 non-null   object 
 10  CustomerAge_Binned        2537 non-null   object 
 11  Target                    2537 non-null   float64
dtypes: float64(6), object(6)
memory usage: 238.0+ KB


# **3. Data Splitting**
This step separates the dataset into two parts: a training set used to fit the models and a test set used to evaluate them. Here 70% of the rows go to training and 30% to testing.

In [46]:
# Separate the features from the label before splitting with train_test_split().
X = df.drop('Target', axis=1)
y = df['Target']

# One-hot encode the categorical columns
categorical_cols = X.select_dtypes(include=['object', 'category']).columns
X = pd.get_dummies(X, columns=categorical_cols, drop_first=True)

# Split the data: 70% train, 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Cast the target to integer (it is stored as float but holds class labels)
y_train = y_train.astype(int)
y_test = y_test.astype(int)
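
The split above is purely random. If the target classes are imbalanced, a stratified split is one way to keep each class's share roughly equal in the training and test sets. The following is a minimal sketch on synthetic data; `X_demo`, `y_demo`, and the class probabilities are assumptions made for illustration, not values from this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 4-class target (hypothetical data, not the notebook's dataset)
rng = np.random.default_rng(42)
y_demo = pd.Series(rng.choice([0, 1, 2, 3], size=1000, p=[0.2, 0.45, 0.1, 0.25]))
X_demo = pd.DataFrame({"feature": rng.normal(size=1000)})

# stratify=y_demo asks train_test_split to preserve the class proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Compare the per-class share in each subset
train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(pd.DataFrame({"train": train_share, "test": test_share}).round(3))
```

Applying this to the notebook would mean passing `stratify=y` to the existing call; doing so changes which rows land in each subset, so the metrics reported below would differ.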


# **4. Build the Classification Model**
After choosing a suitable classification algorithm, the next step is to train the model on the training data.

The recommended steps are:
1. Use Decision Tree as the classification algorithm.
2. Train the model on the split data.

In [47]:
# Build a classification model using Decision Tree
model_dt = DecisionTreeClassifier(random_state=42)
model_dt.fit(X_train, y_train)

In [48]:
# Save the model
joblib.dump(model_dt, 'decision_tree_model.h5')

['decision_tree_model.h5']
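
A quick way to check a saved model is to load it back and confirm that it predicts the same labels as the in-memory object. The sketch below does this round trip on synthetic data; the data, the temporary directory, and the file name `demo_tree.joblib` are assumptions for illustration only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical; not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=4, random_state=42
)
model_demo = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Save to a temporary location, then load the model back into memory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_tree.joblib")  # hypothetical file name
    joblib.dump(model_demo, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(model_demo.predict(X_demo), restored.predict(X_demo))
print("Predictions identical after reload:", same_predictions)
```

Note that joblib writes a pickle-based file regardless of the extension, so a `.h5` name does not make the file HDF5.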

# **5. Meeting the Skilled and Advanced Criteria for Building the Classification Model**

**Leave this section empty if you are not implementing the skilled or advanced criteria.**

In [49]:
# Train a model with a classification algorithm other than Decision Tree.
# Logistic Regression is sensitive to feature scale, so standardize first.
# The scaler is fitted on the training data only, then applied to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_lr = LogisticRegression(max_iter=1000)
model_lr.fit(X_train_scaled, y_train)
y_pred_lr = model_lr.predict(X_test_scaled)

print("Logistic Regression accuracy (scaled):", accuracy_score(y_test, y_pred_lr))


Logistic Regression accuracy (scaled): 0.6955380577427821


In [50]:
# Report accuracy, precision, recall, and F1-score for every algorithm trained so far.
print("=== Decision Tree ===")
y_pred_dt = model_dt.predict(X_test)
print(classification_report(y_test, y_pred_dt))

# Evaluate Logistic Regression
print("=== Logistic Regression ===")
y_pred_lr = model_lr.predict(X_test_scaled)  # use the standardized test data
print(classification_report(y_test, y_pred_lr))

=== Decision Tree ===
              precision    recall  f1-score   support

           0       0.48      0.51      0.49       141
           1       0.81      0.78      0.79       334
           2       0.14      0.14      0.14        86
           3       0.63      0.64      0.64       201

    accuracy                           0.62       762
   macro avg       0.52      0.52      0.52       762
weighted avg       0.62      0.62      0.62       762

=== Logistic Regression ===
              precision    recall  f1-score   support

           0       0.51      0.68      0.58       141
           1       0.85      0.88      0.86       334
           2       0.17      0.06      0.09        86
           3       0.68      0.68      0.68       201

    accuracy                           0.70       762
   macro avg       0.55      0.57      0.55       762
weighted avg       0.67      0.70      0.67       762
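
Both reports show very low recall for class 2 (0.14 and 0.06). A confusion matrix is one way to see which classes absorb those errors. The sketch below uses small hand-made label arrays, which are assumptions for illustration; in the notebook the same call could be applied to `y_test` together with `y_pred_dt` or `y_pred_lr`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hand-made labels for illustration only (not the notebook's predictions)
y_true_demo = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
y_pred_demo = np.array([0, 1, 1, 1, 1, 0, 3, 3, 3, 2])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1, 2, 3])
print(cm)

# Per-class recall is the diagonal divided by the row totals
recall_per_class = cm.diagonal() / cm.sum(axis=1)
print("Recall per class:", recall_per_class.round(2))
```

In this toy example the row for class 2 has a zero on the diagonal: one sample was predicted as class 0 and the other as class 3, so its recall is 0.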



In [51]:
# Save the models other than Decision Tree
# There can be more than one such model
# Save the fitted estimator (model_lr), not its predictions (y_pred_lr).
# model_lr expects inputs that were transformed with the fitted `scaler`.
joblib.dump(model_lr, 'explore_LogisticRegression_classification.h5')

['explore_LogisticRegression_classification.h5']
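
The logistic regression above was trained on standardized features, so anyone loading a saved model also needs the fitted scaler. One option, shown here as a sketch rather than as this notebook's method, is to bundle both steps in a scikit-learn `Pipeline` and save that single object. The synthetic data and the file name `demo_lr_pipeline.joblib` are assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=4, random_state=42
)

# The pipeline standardizes the inputs, then fits the classifier on the scaled values
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_demo, y_demo)

# Saving the pipeline keeps the scaler and the model together in one file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_lr_pipeline.joblib")  # hypothetical file name
    joblib.dump(pipe, path)
    restored_pipe = joblib.load(path)

# The restored pipeline accepts raw (unscaled) features directly
preds_match = np.array_equal(pipe.predict(X_demo), restored_pipe.predict(X_demo))
print("Steps:", list(restored_pipe.named_steps))
print("Predictions identical after reload:", preds_match)
```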

Hyperparameter Tuning Model

Choose one of the algorithms to tune.

In [52]:
# Run hyperparameter tuning and retrain.
param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10]
}

# 5-fold cross-validated grid search, selecting by weighted F1
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring='f1_weighted')
grid_search.fit(X_train, y_train)

# Best model found by the search (refit on the full training set)
best_dt_model = grid_search.best_estimator_
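
After a grid search it is often useful to look at which combination won and by how much. The sketch below reads `best_params_`, `best_score_`, and `cv_results_` from a fitted `GridSearchCV`; it runs on synthetic data with a smaller grid, both of which are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical; not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=4, random_state=42
)

# A reduced grid (3 x 2 = 6 combinations) to keep the example fast
param_grid_demo = {"max_depth": [3, 5, None], "min_samples_split": [2, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid_demo,
    cv=5,
    scoring="f1_weighted",
)
search.fit(X_demo, y_demo)

# Winning combination and its mean cross-validated score
print("Best parameters:", search.best_params_)
print("Best CV f1_weighted:", round(search.best_score_, 3))

# Full ranking of every combination that was tried
results = pd.DataFrame(search.cv_results_)[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(results)
```

The same attributes are available on the notebook's `grid_search` object once it has been fitted.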

In [53]:
# Report accuracy, precision, recall, and F1-score for the tuned algorithm.
def evaluate_model(name, model, X_test, y_test):
    """Print weighted-average classification metrics for a fitted model."""
    y_pred = model.predict(X_test)
    print(f"\n=== Model Evaluation: {name} ===")
    print("Accuracy:", accuracy_score(y_test, y_pred))
    # zero_division=1 scores an undefined metric (e.g. a class that is never
    # predicted) as 1, which can inflate the averages; zero_division=0 is stricter.
    print("Precision:", precision_score(y_test, y_pred, average='weighted', zero_division=1))
    print("Recall:", recall_score(y_test, y_pred, average='weighted', zero_division=1))
    print("F1 Score:", f1_score(y_test, y_pred, average='weighted', zero_division=1))

# Evaluate the tuned Decision Tree
evaluate_model("Tuned Decision Tree", best_dt_model, X_test, y_test)


=== Model Evaluation: Tuned Decision Tree ===
Accuracy: 0.6942257217847769
Precision: 0.6458726015669721
Recall: 0.6942257217847769
F1 Score: 0.6629625677509147
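
The metrics above are weighted averages, so larger classes count for more. To make the difference from a macro average concrete, the sketch below recomputes both from the per-class F1 and support values printed in the baseline Decision Tree report earlier. Those inputs are rounded to two decimals, so the results are approximate.

```python
import numpy as np

# Per-class F1 and support copied from the baseline Decision Tree report (classes 0 to 3)
f1_per_class = np.array([0.49, 0.79, 0.14, 0.64])
support = np.array([141, 334, 86, 201])

# Macro average: every class counts equally
macro_f1 = f1_per_class.mean()

# Weighted average: each class counts in proportion to its number of test rows
weighted_f1 = np.average(f1_per_class, weights=support)

print(f"macro F1:    {macro_f1:.3f}")
print(f"weighted F1: {weighted_f1:.3f}")
```

Both values agree with the report's 0.52 (macro) and 0.62 (weighted) after rounding. The weighted figure is higher because class 1, the largest class, also has the best F1, while the weak class 2 has the fewest rows.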


In [54]:
# Save the tuned model
joblib.dump(best_dt_model, 'tuning_classification.h5')


['tuning_classification.h5']