# **Assignment**

### **Task 1**

A mushroom dataset is provided. Using this dataset, compare the performance of the Decision Tree and Random Forest algorithms. Use hyperparameter tuning to find the best parameters and the best accuracy.

#### **Import Libraries**

In [1]:
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

#### **Load Data**

In [2]:
# Load Data

dbt = pd.read_csv('data/mushrooms.csv')

dbt.head(15)

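| | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | ... | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | ... | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | ... | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | ... | s | w | w | p | w | o | e | n | a | g |
| 5 | e | x | y | y | t | a | f | c | b | n | ... | s | w | w | p | w | o | p | k | n | g |
| 6 | e | b | s | w | t | a | f | c | b | g | ... | s | w | w | p | w | o | p | k | n | m |
| 7 | e | b | y | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | s | m |
| 8 | p | x | y | w | t | p | f | c | n | p | ... | s | w | w | p | w | o | p | k | v | g |
| 9 | e | b | s | y | t | a | f | c | b | g | ... | s | w | w | p | w | o | p | k | s | m |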


#### **Inspect the Data**

In [3]:
# Check the DataFrame summary (column types and non-null counts)
dbt.info()

# Check the number of unique values in each column
dbt.nunique()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8124 entries, 0 to 8123
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   class                     8124 non-null   object
 1   cap-shape                 8124 non-null   object
 2   cap-surface               8124 non-null   object
 3   cap-color                 8124 non-null   object
 4   bruises                   8124 non-null   object
 5   odor                      8124 non-null   object
 6   gill-attachment           8124 non-null   object
 7   gill-spacing              8124 non-null   object
 8   gill-size                 8124 non-null   object
 9   gill-color                8124 non-null   object
 10  stalk-shape               8124 non-null   object
 11  stalk-root                8124 non-null   object
 12  stalk-surface-above-ring  8124 non-null   object
 13  stalk-surface-below-ring  8124 non-null   object
 14  stalk-color-above-ring  

class                        2
cap-shape                    6
cap-surface                  4
cap-color                   10
bruises                      2
odor                         9
gill-attachment              2
gill-spacing                 2
gill-size                    2
gill-color                  12
stalk-shape                  2
stalk-root                   5
stalk-surface-above-ring     4
stalk-surface-below-ring     4
stalk-color-above-ring       9
stalk-color-below-ring       9
veil-type                    1
veil-color                   4
ring-number                  3
ring-type                    5
spore-print-color            9
population                   6
habitat                      7
dtype: int64
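
The output above reports a single unique value for `veil-type`, so that column cannot help separate the two classes. The notebook keeps it, which is harmless for tree models. As an optional illustration only, the sketch below shows one way to detect and drop constant columns. The `toy` frame and the names `constant_cols` and `reduced` are made up for this example and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical four-row sample that mimics the structure of the mushroom data
toy = pd.DataFrame({
    "class": ["p", "e", "e", "p"],
    "veil-type": ["p", "p", "p", "p"],  # constant column, like veil-type above
    "odor": ["p", "a", "l", "p"],
})

# Columns with exactly one unique value carry no information for a classifier
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
reduced = toy.drop(columns=constant_cols)

print("Constant columns:", constant_cols)
print("Remaining columns:", list(reduced.columns))
```

Applied to `dbt`, the same two lines would remove `veil-type` before encoding. Whether that is worth doing is a judgment call, since a constant feature is never chosen for a split anyway.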

#### **Encode Categorical Data**

In [4]:
# Convert the categorical features to numeric columns with get_dummies (one-hot encoding)
dbt_encoded = pd.get_dummies(dbt)

# Separate the features from the target
X = dbt_encoded.drop(columns=['class_e', 'class_p']) # 'class' is the target; get_dummies split it into 'class_e' and 'class_p'
y = dbt_encoded['class_p'] # Use 'class_p' as the target, so True means a poisonous mushroom
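
To make the effect of `pd.get_dummies` concrete, here is a minimal sketch on a made-up three-row frame. The frame `toy` and the names `encoded`, `X_toy`, and `y_toy` are illustrative assumptions. Only `pd.get_dummies` and the `class_e`/`class_p` column names come from the cell above.

```python
import pandas as pd

# Hypothetical sample with the target column and one categorical feature
toy = pd.DataFrame({
    "class": ["p", "e", "e"],
    "odor": ["p", "a", "l"],
})

# Each category becomes its own indicator column named <column>_<value>
encoded = pd.get_dummies(toy)
print(list(encoded.columns))

# Drop both target indicator columns from the features, keep one as the label
X_toy = encoded.drop(columns=["class_e", "class_p"])
y_toy = encoded["class_p"]
print(y_toy.astype(int).tolist())
```

Both `class_e` and `class_p` have to be removed from the features. Leaving either one in would leak the label into the model.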

#### **Split Training and Testing Data**

In [5]:
# Split the dataset into training and testing sets
# (test_size is left at its default of 0.25; stratify=y keeps the class ratio in both sets)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
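
Because `test_size` is not passed, `train_test_split` holds out 25% of the rows. For 8124 rows that is 2031 test samples, which matches the support totals in the classification reports further down. The sketch below shows the default split size and the effect of `stratify` on synthetic labels. The arrays `X_demo` and `y_demo` are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 48% positive labels (made-up data)
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(1000, 5))
y_demo = np.array([1] * 480 + [0] * 520)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=42
)

# Default test_size is 0.25, so 250 of the 1000 rows go to the test set
print(len(X_tr), len(X_te))

# Stratification keeps the positive-class share the same in both subsets
print(f"train positives: {y_tr.mean():.2f}, test positives: {y_te.mean():.2f}")
```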

#### **Model Training and Evaluation**

#### **Decision Tree Classifier**

In [6]:
# Tune the Decision Tree with GridSearchCV
dt_param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

dt = DecisionTreeClassifier(random_state=42)
grid_search_dt = GridSearchCV(estimator=dt, param_grid=dt_param_grid, cv=5, n_jobs=-1)
grid_search_dt.fit(X_train, y_train)

# Best parameters and test accuracy for the Decision Tree
best_dt = grid_search_dt.best_estimator_
y_pred_dt = best_dt.predict(X_test)
dt_acc = accuracy_score(y_test, y_pred_dt)
print(f"Best Decision Tree Accuracy: {dt_acc:.2f}")
print(f"Best Decision Tree Params: {grid_search_dt.best_params_}")
print("Decision Tree Classification Report:")
print(classification_report(y_test, y_pred_dt))

Best Decision Tree Accuracy: 1.00
Best Decision Tree Params: {'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 2}
Decision Tree Classification Report:
              precision    recall  f1-score   support

       False       1.00      1.00      1.00      1052
        True       1.00      1.00      1.00       979

    accuracy                           1.00      2031
   macro avg       1.00      1.00      1.00      2031
weighted avg       1.00      1.00      1.00      2031
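
The accuracy printed above is measured on the held-out test set. `GridSearchCV` also stores the cross-validated score that it used to pick the winning parameters, in `best_score_` and `cv_results_`. The following is a minimal sketch on synthetic data from `make_classification`, with a deliberately small grid. The data, the grid, and the names `search`, `results`, and `top` are assumptions for illustration, so the numbers it prints say nothing about the mushroom dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (not the mushroom dataset)
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 4, None], "criterion": ["gini", "entropy"]},
    cv=5,
)
search.fit(X_syn, y_syn)

# Mean cross-validated accuracy of the selected parameter combination
print(f"Best mean CV accuracy: {search.best_score_:.4f}")

# One row per candidate; rank_test_score == 1 marks the winner
results = pd.DataFrame(search.cv_results_)
top = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score"]
].head(3)
print(top)
```

In the notebook the same attributes are available as `grid_search_dt.best_score_` and `grid_search_dt.cv_results_`.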



#### **Random Forest Classifier**

In [7]:
# Tune the Random Forest with GridSearchCV
rf_param_grid = {
    'n_estimators': [50, 100, 200],    # Number of trees in the forest
    'max_depth': [10, 20, 30, None],    # Maximum tree depth
    'min_samples_split': [2, 5, 10],    # Minimum samples required to split a node
    'min_samples_leaf': [1, 2, 4],      # Minimum samples required at each leaf
    'criterion': ['gini', 'entropy']    # Split quality criterion
}

rf = RandomForestClassifier(random_state=42)
grid_search_rf = GridSearchCV(estimator=rf, param_grid=rf_param_grid, cv=5, n_jobs=-1)
grid_search_rf.fit(X_train, y_train)

# Best parameters and test accuracy for the Random Forest
best_rf = grid_search_rf.best_estimator_
y_pred_rf = best_rf.predict(X_test)
rf_acc = accuracy_score(y_test, y_pred_rf)
print(f"Best Random Forest Accuracy: {rf_acc:.2f}")
print(f"Best Random Forest Params: {grid_search_rf.best_params_}")
print("Random Forest Classification Report:")
print(classification_report(y_test, y_pred_rf))

Best Random Forest Accuracy: 1.00
Best Random Forest Params: {'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 50}
Random Forest Classification Report:
              precision    recall  f1-score   support

       False       1.00      1.00      1.00      1052
        True       1.00      1.00      1.00       979

    accuracy                           1.00      2031
   macro avg       1.00      1.00      1.00      2031
weighted avg       1.00      1.00      1.00      2031
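
Both searches reach the same test scores, but they do not cost the same to run. The sketch below counts the candidates in each grid with `ParameterGrid` and multiplies by the 5 folds used in `cv=5`. The grids are copied from the cells above. The names `cv_folds` and `counts` are introduced only for this example.

```python
from sklearn.model_selection import ParameterGrid

dt_param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
rf_param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "criterion": ["gini", "entropy"],
}

cv_folds = 5  # matches cv=5 in both GridSearchCV calls
counts = {
    "Decision Tree": len(ParameterGrid(dt_param_grid)),
    "Random Forest": len(ParameterGrid(rf_param_grid)),
}

for name, n_candidates in counts.items():
    # Each candidate is fitted once per fold (the final refit is not counted)
    print(f"{name}: {n_candidates} candidates, {n_candidates * cv_folds} CV fits")
```

The Random Forest grid has three times as many candidates, and each fit trains between 50 and 200 trees, so its search is considerably more expensive for the same result here.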



#### **Performance Comparison**

In [8]:
print(f"Difference in Accuracy (Random Forest - Decision Tree): {rf_acc - dt_acc:.2f}")

Difference in Accuracy (Random Forest - Decision Tree): 0.00
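
A difference of 0.00 at two decimal places does not by itself prove that both models made identical predictions, because a handful of errors in roughly 2000 samples still rounds to 1.00. The sketch below demonstrates this with made-up labels. The arrays `y_true`, `pred_a`, and `pred_b` are hypothetical and do not come from the notebook's models.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up ground truth: 500 negatives followed by 500 positives
y_true = np.array([0] * 500 + [1] * 500)

pred_a = y_true.copy()       # perfect predictions
pred_b = y_true.copy()
pred_b[[0, 1, 2]] = 1        # three negatives wrongly predicted as positive

acc_a = accuracy_score(y_true, pred_a)
acc_b = accuracy_score(y_true, pred_b)

# Two decimals hide the gap; four decimals and the confusion matrix reveal it
print(f"2 decimals: {acc_a:.2f} vs {acc_b:.2f}")
print(f"4 decimals: {acc_a:.4f} vs {acc_b:.4f}")
cm_b = confusion_matrix(y_true, pred_b)
print(cm_b)
```

In the notebook, printing `dt_acc` and `rf_acc` with more decimals, or calling `confusion_matrix(y_test, y_pred_dt)` and `confusion_matrix(y_test, y_pred_rf)`, would show whether the two models really agree on every test sample.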
