<a href="https://colab.research.google.com/github/nissrinayy/deeplearning/blob/main/Week2/MLPForestCovertype_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Week 2 Deep Learning Assignment 🚀🚀**

##**Building an MLP Model on the Forest Cover Type Dataset:🍿**

In [1]:
from google.colab import files
uploaded = files.upload()

Saving compressed_data.csv to compressed_data.csv


#**Install and Import Libraries🔖**

In [2]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import tensorflow as tf
from torch.utils.data import DataLoader, TensorDataset
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, roc_curve
import matplotlib.pyplot as plt

#**Load & Preprocess the Dataset🔎**

In [3]:
df = pd.read_csv("compressed_data.csv")


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 581012 entries, 0 to 581011
Data columns (total 55 columns):
 #   Column                              Non-Null Count   Dtype
---  ------                              --------------   -----
 0   Elevation                           581012 non-null  int64
 1   Aspect                              581012 non-null  int64
 2   Slope                               581012 non-null  int64
 3   Horizontal_Distance_To_Hydrology    581012 non-null  int64
 4   Vertical_Distance_To_Hydrology      581012 non-null  int64
 5   Horizontal_Distance_To_Roadways     581012 non-null  int64
 6   Hillshade_9am                       581012 non-null  int64
 7   Hillshade_Noon                      581012 non-null  int64
 8   Hillshade_3pm                       581012 non-null  int64
 9   Horizontal_Distance_To_Fire_Points  581012 non-null  int64
 10  Wilderness_Area1                    581012 non-null  int64
 11  Soil_Type1                          581012 non-null 

In [5]:
pd.set_option('display.max_columns', None)
print(df.head())

   Elevation  Aspect  Slope  Horizontal_Distance_To_Hydrology  \
0       2596      51      3                               258   
1       2590      56      2                               212   
2       2804     139      9                               268   
3       2785     155     18                               242   
4       2595      45      2                               153   

   Vertical_Distance_To_Hydrology  Horizontal_Distance_To_Roadways  \
0                               0                              510   
1                              -6                              390   
2                              65                             3180   
3                             118                             3090   
4                              -1                              391   

   Hillshade_9am  Hillshade_Noon  Hillshade_3pm  \
0            221             232            148   
1            220             235            151   
2            234             238   

In [6]:
X = df.drop(columns=['Cover_Type'])
y = df['Cover_Type']
# Make sure labels start at 0 (CrossEntropyLoss expects class indices in [0, C-1])
y = y - y.min()

In [7]:
# Standardize the features (zero mean, unit variance)
scaler = StandardScaler()
X = scaler.fit_transform(X)

In [8]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
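
The cells above fit `StandardScaler` on the full dataset before splitting, so the test rows contribute to the mean and standard deviation. A common alternative is to compute the statistics on the training split only and reuse them for the test split. The sketch below illustrates that idea with NumPy on a small synthetic array; `standardize_with_train_stats` and the toy data are hypothetical and not part of the notebook. With scikit-learn, the same effect would come from calling `fit` on `X_train` and `transform` on both splits.

```python
import numpy as np

def standardize_with_train_stats(X_train, X_test):
    """Scale both splits using the mean and std of the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Constant columns have std 0; use 1 instead to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (X_train - mean) / std, (X_test - mean) / std

# Toy data standing in for the real features (assumption: 2 features).
rng = np.random.default_rng(42)
X_toy = rng.normal(loc=100.0, scale=20.0, size=(1000, 2))
X_tr, X_te = X_toy[:800], X_toy[800:]

X_tr_scaled, X_te_scaled = standardize_with_train_stats(X_tr, X_te)
print(X_tr_scaled.mean(axis=0).round(6))  # approximately 0 for each column
print(X_tr_scaled.std(axis=0).round(6))   # approximately 1 for each column
```

The test split is scaled with the training statistics, so its mean and standard deviation will be close to, but not exactly, 0 and 1.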


##**MLP Model with PyTorch**

In [9]:
# Convert the data to tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.long)


In [10]:
# Dataset and DataLoader
batch_size = 32
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)


In [11]:
# Model definition
class MLP_PyTorch(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(MLP_PyTorch, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, output_dim)
        )

    def forward(self, x):
        return self.model(x)
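
As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand: each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. The sketch below assumes 54 input features (the 55 columns reported by `df.info()` minus `Cover_Type`) and 7 output classes. The class count is an assumption based on the standard Forest Cover Type dataset, since the notebook derives it at runtime.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

n_features = 54  # assumption: 55 columns minus the Cover_Type label
n_classes = 7    # assumption: seven cover types

layer_sizes = [(n_features, 128), (128, 64), (64, n_classes)]
per_layer = [linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)

print(per_layer)  # [7040, 8256, 455]
print(total)      # 15751
```

In PyTorch the same total can be read off a constructed model with `sum(p.numel() for p in model.parameters())`.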


In [12]:
# Model, loss, and optimizer
input_dim = X_train.shape[1]
output_dim = len(y.unique())

model = MLP_PyTorch(input_dim, output_dim)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


In [13]:
# Training Loop
epochs = 50
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    if (epoch+1) % 10 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Loss: {total_loss/len(train_loader):.4f}")


Epoch 10/50, Loss: 0.2796
Epoch 20/50, Loss: 0.2437
Epoch 30/50, Loss: 0.2256
Epoch 40/50, Loss: 0.2168
Epoch 50/50, Loss: 0.2088
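
The reported loss is the mean over mini-batches, so it helps to know how many batches make up one epoch. The sketch below reproduces that count from the numbers in this notebook: 581012 rows, `test_size=0.2`, and `batch_size=32`. It assumes the test size is rounded up and the remainder goes to the training set, which is consistent with the 14526 steps per epoch and 3632 prediction steps shown in the Keras logs later in the notebook.

```python
import math

n_rows = 581012
test_size = 0.2
batch_size = 32

n_test = math.ceil(n_rows * test_size)  # assumption: test size is rounded up
n_train = n_rows - n_test

train_batches = math.ceil(n_train / batch_size)  # the last batch may be smaller
test_batches = math.ceil(n_test / batch_size)

print(n_train, n_test)              # 464809 116203
print(train_batches, test_batches)  # 14526 3632
```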


In [14]:
# Evaluate the PyTorch model
model.eval()
y_pred = []
y_true = []
with torch.no_grad():
    for X_batch, y_batch in test_loader:
        outputs = model(X_batch)
        _, predicted = torch.max(outputs, 1)
        y_pred.extend(predicted.numpy())
        y_true.extend(y_batch.numpy())

In [15]:
# Compute evaluation metrics
print("PyTorch Model Evaluation:")
print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred, average='weighted'):.4f}")
print(f"Recall: {recall_score(y_true, y_pred, average='weighted'):.4f}")
print(f"F1-Score: {f1_score(y_true, y_pred, average='weighted'):.4f}")

PyTorch Model Evaluation:
Accuracy: 0.9151
Precision: 0.9150
Recall: 0.9151
F1-Score: 0.9148
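
With `average='weighted'`, scikit-learn computes each metric per class and then averages the per-class values, weighted by the number of true samples in each class. One consequence is that weighted recall always equals accuracy, which is why the two numbers above are identical. The following pure-Python sketch illustrates this on a small made-up label set; `weighted_precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
from collections import Counter

def weighted_precision_recall(y_true, y_pred):
    """Support-weighted precision and recall for multiclass labels."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        cls_precision = tp / predicted if predicted else 0.0
        cls_recall = tp / count
        weight = count / n  # share of true samples belonging to this class
        precision += weight * cls_precision
        recall += weight * cls_recall
    return precision, recall

# Made-up labels for three classes (not taken from the dataset).
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred_toy = [0, 0, 0, 1, 1, 1, 1, 2, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true_toy, y_pred_toy)) / len(y_true_toy)
precision, recall = weighted_precision_recall(y_true_toy, y_pred_toy)
print(accuracy, round(precision, 4), round(recall, 4))  # 0.7 0.765 0.7
```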


In [16]:
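# This cell only handles the binary case, so it is skipped when there are more than 2 classes.
# Multiclass ROC-AUC needs class probabilities: roc_auc_score(y_true, probs, multi_class='ovr').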
if output_dim == 2:
    print(f"AUC-ROC: {roc_auc_score(y_true, y_pred):.4f}")


##**MLP Model with TensorFlow**

In [18]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import Input

model_tf = Sequential([
    Input(shape=(X_train.shape[1],)),  # ✅ Newer style (recommended): explicit Input layer
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(output_dim, activation='softmax')
])


In [20]:
from tensorflow.keras.optimizers import Adam

# Compile Model
model_tf.compile(optimizer=Adam(learning_rate=0.001),
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
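
The two implementations handle the output layer differently: the PyTorch model returns raw logits and `nn.CrossEntropyLoss` applies log-softmax internally, while the Keras model applies `softmax` in its last layer and `sparse_categorical_crossentropy` takes the resulting probabilities together with integer labels. Both compute the same quantity, the negative log-probability of the true class. The sketch below checks this for one hand-written logit vector in plain Python; the numbers are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, label):
    """Loss computed directly from logits (log-softmax form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def cross_entropy_from_probs(probs, label):
    """Loss computed from probabilities and an integer label."""
    return -math.log(probs[label])

logits = [2.0, 0.5, -1.0]  # illustrative logits for 3 classes
label = 0

loss_a = cross_entropy_from_logits(logits, label)
loss_b = cross_entropy_from_probs(softmax(logits), label)
print(round(loss_a, 6), round(loss_b, 6))  # the two values match
```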


In [21]:
# Training
model_tf.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), verbose=1)


Epoch 1/50
14526/14526 ━━━━━━━━━━━━━━━━━━━━ 49s 3ms/step - accuracy: 0.7491 - loss: 0.5918 - val_accuracy: 0.8131 - val_loss: 0.4407
Epoch 2/50
14526/14526 ━━━━━━━━━━━━━━━━━━━━ 83s 3ms/step - accuracy: 0.8243 - loss: 0.4169 - val_accuracy: 0.8456 - val_loss: 0.3709
Epoch 3/50
14526/14526 ━━━━━━━━━━━━━━━━━━━━ 44s 3ms/step - accuracy: 0.8493 - loss: 0.3642 - val_accuracy: 0.8556 - val_loss: 0.3467
Epoch 4/50
14526/14526 ━━━━━━━━━━━━━━━━━━━━ 43s 3ms/step - accuracy: 0.8612 - loss: 0.3365 - val_accuracy: 0.8621 - val_loss: 0.3292
Epoch 5/50
14526/14526 ━━━━━━━━━━━━━━━━━━━━ 85s 3ms/step - accuracy: 0.8689 - loss: 0.3200 - val_accuracy: 0.8694 - val_loss: 0.3174
Epoch 6/50
14526/14526 ━━━━━━━━━━━━━━━━━━━━ 49s 3ms/step - accuracy: 0.8744 - loss: 0.3061 - val_accuracy: 0.8777 - val_loss: 0.300

<keras.src.callbacks.history.History at 0x7f792a9769d0>

In [22]:
# Evaluate the TensorFlow model
y_pred_probs = model_tf.predict(X_test)
y_pred = y_pred_probs.argmax(axis=1)

print("TensorFlow Model Evaluation:")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.4f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred, average='weighted'):.4f}")

3632/3632 ━━━━━━━━━━━━━━━━━━━━ 6s 1ms/step
TensorFlow Model Evaluation:
Accuracy: 0.9066
Precision: 0.9099
Recall: 0.9066
F1-Score: 0.9059


In [23]:
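# This cell only handles the binary case, so it is skipped when there are more than 2 classes.
# Multiclass ROC-AUC can use the full probability matrix: roc_auc_score(y_test, y_pred_probs, multi_class='ovr').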
if output_dim == 2:
    print(f"AUC-ROC: {roc_auc_score(y_test, y_pred_probs[:,1]):.4f}")


##**Brief Explanation**

1️⃣ Accuracy

Accuracy measures the fraction of predictions that are correct out of all predictions.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

📌 Notes:

TP (True Positive) → A positive sample correctly predicted as positive.

TN (True Negative) → A negative sample correctly predicted as negative.

FP (False Positive) → A negative sample incorrectly predicted as positive.

FN (False Negative) → A positive sample incorrectly predicted as negative.

2️⃣ Precision

Precision measures how many of the samples predicted as positive are actually positive.

$$\text{Precision} = \frac{TP}{TP + FP}$$

📌 Notes:

High precision means the model is rarely wrong when it predicts the positive class.

Low precision means the model often predicts negatives as positive (many False Positives).

3️⃣ Recall (Sensitivity)

Recall measures how well the model captures all of the actual positive cases.

$$\text{Recall} = \frac{TP}{TP + FN}$$

📌 Notes:

High recall means the model finds almost all of the positive samples.

Low recall means the model often misclassifies positives as negatives (many False Negatives).

4️⃣ F1-Score (Harmonic Mean)

The F1-score combines precision and recall into a single metric using their harmonic mean.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

📌 Notes:

A high F1-score means the model balances precision and recall.

It is useful for imbalanced datasets because it accounts for both False Positives and False Negatives.
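
To make the four formulas above concrete, the sketch below evaluates them on hypothetical confusion-matrix counts for an imbalanced binary problem. The counts are invented for illustration; they show how accuracy can stay high while recall and F1 reveal that most positives are missed.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Hypothetical counts: 1000 samples, only 50 of them positive.
acc, prec, rec, f1 = binary_metrics(tp=10, tn=940, fp=10, fn=40)
print(f"Accuracy:  {acc:.4f}")   # 0.9500
print(f"Precision: {prec:.4f}")  # 0.5000
print(f"Recall:    {rec:.4f}")   # 0.2000
print(f"F1-Score:  {f1:.4f}")    # 0.2857
```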

5️⃣ Area Under the Curve (AUC-ROC)

AUC (Area Under the Curve) measures how well the model separates the positive and negative classes.

AUC is derived from the ROC curve, which plots the True Positive Rate (TPR) against the False Positive Rate (FPR).

$$\text{TPR} = \frac{TP}{TP + FN}$$

(Same as Recall)

$$\text{FPR} = \frac{FP}{FP + TN}$$

📌 Notes:

AUC = 1.0 → Perfect model (separates the classes perfectly).

AUC = 0.5 → Random model (no better than guessing).

AUC < 0.5 → Worse than random (the ranking is inverted).
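
AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python for a handful of made-up scores; `pairwise_auc` is a hypothetical helper, not a library function.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

y_toy = [0, 0, 1, 1]                # made-up labels
scores_toy = [0.1, 0.4, 0.35, 0.8]  # made-up positive-class scores

print(pairwise_auc(y_toy, scores_toy))                   # 0.75
print(pairwise_auc(y_toy, [1 - s for s in scores_toy]))  # 0.25 (inverted ranking)
```

Flipping the scores turns an AUC of 0.75 into 0.25, which matches the reading of AUC < 0.5 as an inverted ranking.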

6️⃣ Receiver Operating Characteristic (ROC) Curve

The ROC curve plots TPR against FPR across different classification thresholds.

X-axis → FPR (False Positive Rate)

Y-axis → TPR (True Positive Rate / Recall)

📌 How to read it:

The closer the curve is to the top-left corner, the better the model.

A ROC curve close to the diagonal (AUC ≈ 0.5) means the model is no better than random guessing.
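
The curve itself is built by sweeping a decision threshold over the predicted scores and recording one (FPR, TPR) point per threshold. The sketch below does this in plain Python for a few made-up scores and integrates the points with the trapezoidal rule; `roc_points` and `trapezoid_area` are hypothetical helpers, and the input is assumed to contain at least one positive and one negative sample.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points for thresholds from strictest to loosest."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_toy = [0, 0, 1, 1]                # made-up labels
scores_toy = [0.1, 0.4, 0.35, 0.8]  # made-up positive-class scores

points = roc_points(y_toy, scores_toy)
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_area(points))  # 0.75
```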

# **Conclusion**

The MLP performed slightly better in PyTorch than in TensorFlow: test accuracy was 0.9151 versus 0.9066, and weighted F1-score was 0.9148 versus 0.9059.

**Best Evaluation Metric**

For this MLP, accuracy and F1-score are the primary metrics, because precision and recall differ only slightly.