## Task 1

Build SVM models using the voice.csv data, subject to the following requirements:

a. Split the data with 70:30 and 80:20 ratios for every model you build.

    i. Use a model with a linear kernel.
    ii. Use a model with a polynomial kernel.
    iii. Use a model with an RBF kernel.

b. Tabulate the performance of each split and kernel using the accuracy metric.


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

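# 1. Load the data
# voice.csv is read from Google Drive, so Drive must be mounted in Colab first
try:
    df = pd.read_csv('/content/drive/MyDrive/ML_Dataset/voice.csv')
except FileNotFoundError as err:
    # Stop here: every later step needs df, so continuing would only raise a NameError
    raise FileNotFoundError(
        "voice.csv not found. Mount Google Drive and check the path first."
    ) from err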

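# 2. Preprocessing
# Separate the features (X) from the label (y); the label is the last column
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Encode the label (male/female) as integers
le = LabelEncoder()
y = le.fit_transform(y)

# Standardize the features; SVMs are sensitive to feature scale.
# Note: the scaler is fit on the full dataset before splitting, so the test
# rows contribute to the mean/std used for scaling.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)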

# 3. Build and evaluate the models
# Define the split and kernel configurations
split_ratios = {
    '70:30': 0.30,
    '80:20': 0.20
}
kernels = ['linear', 'poly', 'rbf']

results = []

print(f"{'Split Ratio':<15} | {'Kernel':<10} | {'Accuracy':<10}")
print("-" * 45)

for ratio_name, test_size in split_ratios.items():
    # a. Split the data (the same split is reused for all three kernels)
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=test_size, random_state=42
    )

    for kernel in kernels:
        # i, ii, iii. Build an SVM with the given kernel (other hyperparameters left at their defaults)
        svm_model = SVC(kernel=kernel)
        svm_model.fit(X_train, y_train)

        # Predict on the test set
        y_pred = svm_model.predict(X_test)

        # Compute the accuracy
        acc = accuracy_score(y_test, y_pred)

        # Store the result
        results.append({
            'Split Ratio': ratio_name,
            'Kernel': kernel,
            'Accuracy': acc
        })

        print(f"{ratio_name:<15} | {kernel:<10} | {acc:.6f}")

# Optional: show the results as a tidy DataFrame
print("\nFinal Results Table:")
results_df = pd.DataFrame(results)
print(results_df)

Split Ratio     | Kernel     | Accuracy  
---------------------------------------------
70:30           | linear     | 0.970557
70:30           | poly       | 0.956887
70:30           | rbf        | 0.981073
80:20           | linear     | 0.976341
80:20           | poly       | 0.968454
80:20           | rbf        | 0.982650

Final Results Table:
  Split Ratio  Kernel  Accuracy
0       70:30  linear  0.970557
1       70:30    poly  0.956887
2       70:30     rbf  0.981073
3       80:20  linear  0.976341
4       80:20    poly  0.968454
5       80:20     rbf  0.982650
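
The same experiment can also be organized so that scaling happens after the split and the results land in a split-by-kernel table. The block below is a minimal sketch of that idea, not the notebook's actual run: it uses synthetic data from `make_classification` as a stand-in for voice.csv (the 20-feature shape is an assumption), wraps `StandardScaler` and `SVC` in a scikit-learn pipeline, and adds `stratify`, which the cell above does not use. The names `X_demo`, `y_demo`, and `accuracy_table` are illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for voice.csv (assumption: 20 numeric features, binary label)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

rows = []
for ratio_name, test_size in {"70:30": 0.30, "80:20": 0.20}.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=test_size, random_state=42, stratify=y_demo
    )
    for kernel in ["linear", "poly", "rbf"]:
        # The pipeline fits StandardScaler on the training rows only and
        # reuses those statistics when scoring the test rows.
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        model.fit(X_tr, y_tr)
        rows.append({
            "Split Ratio": ratio_name,
            "Kernel": kernel,
            "Accuracy": model.score(X_te, y_te),
        })

# One row per split ratio, one column per kernel
accuracy_table = pd.DataFrame(rows).pivot(
    index="Split Ratio", columns="Kernel", values="Accuracy"
)
print(accuracy_table.round(4))
```

Because the numbers here come from synthetic data, they are not comparable to the accuracies recorded above; only the structure carries over.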



## Task 2

Use the data from practicum 5 to build a day/night classification model using an SVM with an RBF kernel and histogram features. Use an 80:20 split. You may experiment with hyperparameter tuning for the RBF kernel. Record the resulting accuracy!
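
For the RBF kernel, the hyperparameter that shapes the kernel itself is `gamma`: the kernel value between two points is exp(-gamma * ||x - z||^2). The sketch below, a small NumPy illustration rather than part of the assignment code, evaluates that formula for one pair of points over a few gamma values. The helper name `rbf_kernel_value` and the example points are made up for illustration.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel between two vectors: exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])  # squared distance from x is 2

for gamma in [1, 0.1, 0.01, 0.001]:
    print(f"gamma={gamma:<6} K(x, z)={rbf_kernel_value(x, z, gamma):.4f}")
```

With a large gamma the kernel value drops quickly with distance, so each training sample only influences its close neighbourhood; with a small gamma distant points still look similar and the decision boundary tends to be smoother.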


In [2]:
import cv2
import numpy as np
import os
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt

# ==========================================
# 1. FEATURE EXTRACTION FUNCTION (HISTOGRAM)
# ==========================================
def extract_color_histogram(image, bins=(8, 8, 8)):
    # Compute a 3D color histogram in HSV space (cv2.imread returns BGR images).
    # bins=(8, 8, 8) splits each channel into 8 bins, giving 8*8*8 = 512 features.
    # HSV is optional, but it often works better than RGB for day/night images.
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins,
                        [0, 180, 0, 256, 0, 256])

    # Normalize the histogram in place so features are comparable across image sizes
    cv2.normalize(hist, hist)

    # Flatten the histogram into a 1D array
    return hist.flatten()

# ==========================================
# 2. LOAD DATASET
# ==========================================
# CHANGE this path to match the location of your dataset folder in Colab/Drive
# Example if uploaded directly: './dataset'
# Example if stored on Drive: '/content/drive/MyDrive/dataset_praktikum5'
root_path = 'dataset'

data = []
labels = []
classes = ['siang', 'malam']  # folder names for day and night; must match the folders on disk

print("Processing images...")

# Check that the dataset folder exists
if not os.path.exists(root_path):
    print(f"ERROR: Folder '{root_path}' not found. Please adjust the 'root_path' variable.")
else:
    for label, class_name in enumerate(classes):
        class_path = os.path.join(root_path, class_name)

        # Skip this class if its folder is missing
        if not os.path.exists(class_path):
            print(f"Warning: Folder {class_path} not found.")
            continue

        for image_name in os.listdir(class_path):
            image_path = os.path.join(class_path, image_name)

            # Read the image
            image = cv2.imread(image_path)

            if image is not None:
                # Extract histogram features
                hist_features = extract_color_histogram(image)

                data.append(hist_features)
                labels.append(label)  # 0 = siang (day), 1 = malam (night)
            else:
                print(f"Failed to read: {image_path}")

    # Convert to NumPy arrays
    X = np.array(data)
    y = np.array(labels)

    print(f"Total samples: {len(X)}")
    print(f"Feature shape: {X.shape}")

    # ==========================================
    # 3. SPLIT DATA (80:20)
    # ==========================================
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    print(f"Data Train: {len(X_train)}, Data Test: {len(X_test)}")

    # ==========================================
    # 4. HYPERPARAMETER TUNING (SVM RBF)
    # ==========================================
    # Parameters to search over
    # C: penalty on margin violations (larger C fits the training data more strictly, with more risk of overfitting)
    # gamma: how far the influence of a single training sample reaches (larger gamma means a narrower kernel)
    param_grid = {
        'C': [0.1, 1, 10, 100],
        'gamma': [1, 0.1, 0.01, 0.001],
        'kernel': ['rbf']
    }

    print("\nStarting Grid Search (Hyperparameter Tuning)...")
    grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2, cv=3)
    grid.fit(X_train, y_train)

    # Show the best parameters
    print(f"\nBest Parameters: {grid.best_params_}")
    print(f"Best Cross-Validation Score: {grid.best_score_:.4f}")

    # ==========================================
    # 5. EVALUATION AND RESULTS
    # ==========================================
    # Predict with the best model (refit on the full training set by GridSearchCV)
    best_model = grid.best_estimator_
    y_pred = best_model.predict(X_test)

    # Accuracy metric
    acc = accuracy_score(y_test, y_pred)
    print("\n" + "="*30)
    print(f"TEST SET ACCURACY (80:20 split): {acc:.2%}")
    print("="*30)

    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=classes))

Processing images...
ERROR: Folder 'dataset' not found. Please adjust the 'root_path' variable.
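
The run above stopped at the folder check, so no features were produced. To show what a color-histogram feature looks like without the dataset, the sketch below builds two synthetic images and computes a flattened, L2-normalized 3D histogram with NumPy. It is only an illustration under stated assumptions: `np.histogramdd` stands in for `cv2.calcHist`, the histogram is taken directly on the three 8-bit channels with no HSV conversion, and `color_histogram`, `day_like`, and `night_like` are hypothetical names.

```python
import numpy as np

def color_histogram(image, bins=(8, 8, 8)):
    """Flattened, L2-normalized 3D histogram of an HxWx3 uint8 image."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    flat = hist.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat

rng = np.random.default_rng(0)
# Hypothetical stand-ins: "day" pixels are bright, "night" pixels are dark
day_like = rng.integers(150, 256, size=(32, 32, 3), dtype=np.uint8)
night_like = rng.integers(0, 80, size=(32, 32, 3), dtype=np.uint8)

day_feat = color_histogram(day_like)
night_feat = color_histogram(night_like)

print(day_feat.shape)                   # (512,)
print(float(day_feat @ night_feat))     # 0.0: the two images occupy disjoint bins
```

Real photographs overlap far more than these synthetic extremes, which is why a classifier such as an RBF SVM is still needed on top of the histogram features.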


In [3]:
import os
import cv2
import numpy as np
import glob
from google.colab import drive
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report

# 1. Mount Google Drive
drive.mount('/content/drive')

# Dataset paths on Google Drive
base_train_dir = "/content/drive/MyDrive/ML_Dataset/images/training/"
base_test_dir = "/content/drive/MyDrive/ML_Dataset/images/test/"

# Directories to iterate over (training and test images are merged so they can be re-split 80:20)
directories = [base_train_dir, base_test_dir]
categories = ['day', 'night']

data = []
labels = []

# 2. Histogram feature extraction function
def extract_color_histogram(image, bins=(8, 8, 8)):
    # Convert from BGR to HSV, which often handles lighting differences
    # better than staying in RGB. HSV is used here.
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

    # Compute the 3D histogram (8 bins per channel, 512 values in total)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])

    # Normalize the histogram in place (important for SVM)
    cv2.normalize(hist, hist)

    # Flatten into a 1D array
    return hist.flatten()

print("Starting image processing...")

# 3. Load and process the images
# Images are loaded from both the existing training AND test folders, then merged
for directory in directories:
    for category in categories:
        path = os.path.join(directory, category)
        label = 0 if category == 'night' else 1  # 0 for night, 1 for day

        # Check that the folder exists
        if not os.path.exists(path):
            print(f"Warning: Path {path} not found.")
            continue

        # Collect all images in the folder (jpg, jpeg, png)
        image_paths = []
        for ext in ['*.jpg', '*.jpeg', '*.png']:
            image_paths.extend(glob.glob(os.path.join(path, ext)))

        for img_path in image_paths:
            image = cv2.imread(img_path)
            if image is not None:
                # Resize to speed up histogram computation (optional, but recommended).
                # The normalized histogram is largely insensitive to image size.
                image = cv2.resize(image, (128, 128))

                # Extract features
                features = extract_color_histogram(image)

                data.append(features)
                labels.append(label)

# Convert to NumPy arrays
X = np.array(data)
y = np.array(labels)

print(f"Total images collected: {len(X)}")

# Check whether any data was loaded
if len(X) == 0:
    print("Error: No image data found. Check your Drive paths.")
else:
    # 4. Split the data (80:20 ratio)
    # random_state is fixed so the results are reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    print(f"Training samples (80%): {len(X_train)}")
    print(f"Test samples (20%): {len(X_test)}")

    # Scale the data (strongly recommended for SVM).
    # The histograms are already normalized, but scaling helps the SVM converge.
    # The scaler is fit on the training set only and then applied to the test set.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # 5. Hyperparameter tuning with GridSearchCV (experiment)
    # Search for the best C and gamma for the RBF kernel
    param_grid = {
        'C': [0.1, 1, 10, 100],
        'gamma': [1, 0.1, 0.01, 0.001],
        'kernel': ['rbf']
    }

    print("\nRunning hyperparameter tuning (GridSearch)...")
    grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2, cv=3)
    grid.fit(X_train, y_train)

    print(f"\nBest Parameters: {grid.best_params_}")

    # 6. Prediction and evaluation
    best_model = grid.best_estimator_
    y_pred = best_model.predict(X_test)

    acc = accuracy_score(y_test, y_pred)

    print("\n" + "="*30)
    print(f"Model Accuracy (RBF Kernel): {acc:.4f}")
    print("="*30)
    print("\nClassification Report:")
    # Label order follows the encoding above: 0 = night, 1 = day
    print(classification_report(y_test, y_pred, target_names=['Night', 'Day']))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Starting image processing...
Total images collected: 400
Training samples (80%): 320
Test samples (20%): 80

Running hyperparameter tuning (GridSearch)...
Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .........................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END .......................C=0.1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV] END ......................C=0.1, gamma=0.01, kernel=rbf; total time=   0.0s
[CV] END ......................C=0.1, gamma=0.01, kernel=rbf; total tim