# PRACTICUM ASSIGNMENT

**1. Build SVM models on the voice.csv data with the following requirements:**

**Split the data with ratios of 70:30 and 80:20 for every model that is built.**

**Use a model with a linear kernel.**

**Use a model with a polynomial kernel.**

**Use a model with an RBF kernel.**

**Tabulate the performance of every split and kernel using the accuracy metric.**

**Import Libraries**

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from tabulate import tabulate

**Load the Dataset**


In [None]:
df = pd.read_csv('/content/drive/MyDrive/Machine Learning   15/Data/voice.csv')

**Data Preprocessing**

In [None]:
# Encode the target column ('label')
# Map 'male' to 1 and 'female' to 0
df['label'] = df['label'].map({'male': 1, 'female': 0})

# Separate the features (X) from the target (y)
X = df.drop('label', axis=1)
y = df['label']

# Scale the features
# Note: the scaler is fitted on all rows before the train/test split,
# so the test rows also contribute to the scaling statistics.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("\nShape data (X, y):", X_scaled.shape, y.shape)


Shape data (X, y): (3168, 20) (3168,)
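
Because the scaler above is fitted before the split, the test rows influence the mean and variance used for scaling. The sketch below shows one common way to avoid that: wrap the scaler and the SVM in a scikit-learn pipeline so the scaler only sees the training rows. It runs on synthetic data from `make_classification`, which is a stand-in and not the voice dataset; the names `X_demo`, `y_demo`, and `leak_free_model` are hypothetical, and the accuracies reported in this notebook were not produced this way.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the voice features (assumption: 20 numeric columns).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=42
)

# Split first, so the test rows stay unseen during scaling.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# The pipeline fits StandardScaler on the training rows only,
# then applies the same transformation when predicting on the test rows.
leak_free_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
leak_free_model.fit(X_tr, y_tr)

demo_accuracy = leak_free_model.score(X_te, y_te)
print(f"Demo accuracy (synthetic data): {demo_accuracy:.4f}")
```

The same pipeline object could be passed to `train_test_split` and `fit` on the unscaled `X` instead of `X_scaled`, if a leakage-free comparison is wanted.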


**Create a function that takes the data, split ratio, and kernel type, then returns the accuracy**

In [None]:
def train_and_evaluate_svm(X, y, test_size_ratio, kernel_type):
    """
    Train an SVM model with the given kernel and split ratio,
    then return the accuracy on the test data.
    """
    # Convert the test percentage to a fraction (e.g. 30 -> 0.3 for a 70:30 split)
    test_size = test_size_ratio / 100.0

    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42, stratify=y
    )

    # Initialize and train the SVM model
    if kernel_type == 'linear':
        model = SVC(kernel='linear', random_state=42)
    elif kernel_type == 'poly':
        # degree=3 is scikit-learn's default for the polynomial kernel
        model = SVC(kernel='poly', degree=3, random_state=42)
    elif kernel_type == 'rbf':
        # RBF (Radial Basis Function) is SVC's default kernel
        model = SVC(kernel='rbf', random_state=42)
    else:
        raise ValueError("Invalid kernel.")

    model.fit(X_train, y_train)

    # Predict and evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    return accuracy
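
To make the difference between the three kernels concrete, here is a small pure-Python sketch of the kernel functions in the form documented by scikit-learn: linear `<u, v>`, polynomial `(gamma * <u, v> + coef0) ** degree`, and RBF `exp(-gamma * ||u - v||^2)`. The vectors `u`, `v` and the value `gamma = 0.5` are illustrative assumptions; in the function above, `SVC` picks `gamma` through its default `gamma='scale'` setting rather than a fixed number.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def poly_kernel(u, v, gamma, degree=3, coef0=0.0):
    # coef0=0.0 and degree=3 mirror SVC's defaults.
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

# Illustrative inputs (assumptions, not taken from voice.csv).
u = [1.0, 2.0]
v = [3.0, 0.5]
gamma = 0.5

k_lin = linear_kernel(u, v)        # 1*3 + 2*0.5 = 4.0
k_poly = poly_kernel(u, v, gamma)  # (0.5 * 4.0) ** 3 = 8.0
k_rbf = rbf_kernel(u, v, gamma)    # exp(-0.5 * 6.25)

print(k_lin, k_poly, round(k_rbf, 4))
```

The linear and polynomial kernels grow with the inner product, while the RBF kernel depends only on the distance between the two points and always stays between 0 and 1.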

**Run the experiments and collect the results**

In [None]:
# List of configurations to test
configurations = [
    {'split_ratio': '70:30', 'test_size': 30, 'kernel': 'linear'},
    {'split_ratio': '70:30', 'test_size': 30, 'kernel': 'poly'},
    {'split_ratio': '70:30', 'test_size': 30, 'kernel': 'rbf'},
    {'split_ratio': '80:20', 'test_size': 20, 'kernel': 'linear'},
    {'split_ratio': '80:20', 'test_size': 20, 'kernel': 'poly'},
    {'split_ratio': '80:20', 'test_size': 20, 'kernel': 'rbf'},
]

results = []

print("\nStarting model training and evaluation...")
for config in configurations:
    ratio = config['split_ratio']
    size = config['test_size']
    kernel = config['kernel']

    # Train and evaluate
    accuracy = train_and_evaluate_svm(X_scaled, y, size, kernel)

    # Store the result
    results.append([ratio, kernel, f"{accuracy:.4f}"])
    print(f"-> Done: Split {ratio}, Kernel {kernel}, Accuracy: {accuracy:.4f}")


Starting model training and evaluation...
-> Done: Split 70:30, Kernel linear, Accuracy: 0.9790
-> Done: Split 70:30, Kernel poly, Accuracy: 0.9590
-> Done: Split 70:30, Kernel rbf, Accuracy: 0.9832
-> Done: Split 80:20, Kernel linear, Accuracy: 0.9748
-> Done: Split 80:20, Kernel poly, Accuracy: 0.9574
-> Done: Split 80:20, Kernel rbf, Accuracy: 0.9826
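
The six configurations are the Cartesian product of two splits and three kernels, so the list can also be generated instead of typed out. A minimal sketch using `itertools.product` follows; the names `split_percentages`, `kernels`, and `configurations_alt` are hypothetical and only illustrate the idea.

```python
from itertools import product

split_percentages = [30, 20]             # test-set share in percent
kernels = ["linear", "poly", "rbf"]

# One dictionary per (split, kernel) pair, in the same order as the manual list.
configurations_alt = [
    {"split_ratio": f"{100 - test}:{test}", "test_size": test, "kernel": kernel}
    for test, kernel in product(split_percentages, kernels)
]

for config in configurations_alt:
    print(config)
```

Adding another split or kernel then only requires extending one of the two short lists.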


**Performance Tabulation**

In [None]:
# Table headers
headers = ["Split Data", "SVM Kernel", "Accuracy (Test Set)"]

# Print the table with tabulate
print("SVM MODEL PERFORMANCE COMPARISON")
print(tabulate(results, headers=headers, tablefmt="fancy_grid"))

# Example interpretation
best_model = max(results, key=lambda item: float(item[2]))
print(f"\nBest model by accuracy: Split {best_model[0]} with Kernel {best_model[1]} (Accuracy: {best_model[2]})")

SVM MODEL PERFORMANCE COMPARISON
╒══════════════╤══════════════╤═══════════════════════╕
│ Split Data   │ SVM Kernel   │   Accuracy (Test Set) │
╞══════════════╪══════════════╪═══════════════════════╡
│ 70:30        │ linear       │                0.979  │
├──────────────┼──────────────┼───────────────────────┤
│ 70:30        │ poly         │                0.959  │
├──────────────┼──────────────┼───────────────────────┤
│ 70:30        │ rbf          │                0.9832 │
├──────────────┼──────────────┼───────────────────────┤
│ 80:20        │ linear       │                0.9748 │
├──────────────┼──────────────┼───────────────────────┤
│ 80:20        │ poly         │                0.9574 │
├──────────────┼──────────────┼───────────────────────┤
│ 80:20        │ rbf          │                0.9826 │
╘══════════════╧══════════════╧═══════════════════════╛

Best model by accuracy: Split 70:30 with Kernel rbf (Accuracy: 0.9832)
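
To judge how large these accuracy gaps really are, it can help to convert them back into counts of correctly classified test samples. The sketch below does that arithmetic from the reported accuracies and the 3168 rows printed earlier. It assumes that `train_test_split` rounds the test-set size up (`ceil`), which is an assumption about the library's rounding rule and not something shown in this notebook; the names `reported` and `correct_counts` are hypothetical.

```python
import math

n_samples = 3168  # rows in voice.csv, from the shape printed earlier

# Accuracies copied from the table above.
reported = {
    ("70:30", "linear"): 0.9790,
    ("70:30", "poly"): 0.9590,
    ("70:30", "rbf"): 0.9832,
    ("80:20", "linear"): 0.9748,
    ("80:20", "poly"): 0.9574,
    ("80:20", "rbf"): 0.9826,
}
test_percent = {"70:30": 30, "80:20": 20}

correct_counts = {}
for (split, kernel), acc in reported.items():
    # Assumed rounding rule: the test-set size is rounded up.
    n_test = math.ceil(test_percent[split] * n_samples / 100)
    correct_counts[(split, kernel)] = (round(acc * n_test), n_test)

for key, (n_correct, n_test) in correct_counts.items():
    print(key, f"{n_correct}/{n_test}")
```

Under that assumption, the RBF and linear kernels at 70:30 differ by only a handful of test samples, so the ranking between them should be read with some caution.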


**2. Use the data from Practicum 5 to build a day/night classification model using an SVM with the RBF kernel and histogram features. Use an 80:20 ratio. You may experiment with hyperparameter tuning for the RBF kernel. Record the accuracy!**

In [None]:
# Image directories
train_dir = "images/training/"
test_dir = "images/test/"

In [None]:
# load_dataset and preprocess are assumed to be the helper functions from the Practicum 5 notebook
train_img = load_dataset(train_dir)
train_std_img_list = preprocess(train_img)

test_img = load_dataset(test_dir)
test_std_img_list = preprocess(test_img)

print(f"Number of preprocessed training images: {len(train_std_img_list)}")
print(f"Number of preprocessed test images: {len(test_std_img_list)}")

Number of preprocessed training images: 240
Number of preprocessed test images: 160


**Extract Histogram Features**


In [None]:
import cv2

def extract_histogram_features(image):
    # Convert image to HSV color space
    hsv_image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)

    # Define histogram parameters
    # For 8-bit images in OpenCV: H in [0, 180), S and V in [0, 256)
    hist_bins = 32
    hist_range_h = [0, 180]
    hist_range_sv = [0, 256]

    # Compute histogram for Hue channel
    hist_h = cv2.calcHist([hsv_image], [0], None, [hist_bins], hist_range_h)
    # Compute histogram for Saturation channel
    hist_s = cv2.calcHist([hsv_image], [1], None, [hist_bins], hist_range_sv)
    # Compute histogram for Value channel
    hist_v = cv2.calcHist([hsv_image], [2], None, [hist_bins], hist_range_sv)

    # Normalize histograms
    cv2.normalize(hist_h, hist_h, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)
    cv2.normalize(hist_s, hist_s, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)
    cv2.normalize(hist_v, hist_v, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)

    # Concatenate normalized histograms into a single feature vector
    feature_vector = np.concatenate((hist_h.flatten(), hist_s.flatten(), hist_v.flatten()))

    return feature_vector

# Apply the function to training data
train_features = []
train_labels = []
for img, label in train_std_img_list:
    features = extract_histogram_features(img)
    train_features.append(features)
    train_labels.append(label)

train_features = np.array(train_features)
train_labels = np.array(train_labels)

# Apply the function to test data
test_features = []
test_labels = []
for img, label in test_std_img_list:
    features = extract_histogram_features(img)
    test_features.append(features)
    test_labels.append(label)

test_features = np.array(test_features)
test_labels = np.array(test_labels)

print(f"Shape of training features: {train_features.shape}")
print(f"Shape of training labels: {train_labels.shape}")
print(f"Shape of test features: {test_features.shape}")
print(f"Shape of test labels: {test_labels.shape}")


Shape of training features: (240, 96)
Shape of training labels: (240,)
Shape of test features: (160, 96)
Shape of test labels: (160,)
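
The 96 columns come from three 32-bin histograms (H, S, V), each min-max scaled to the range 0 to 1 and then concatenated. The NumPy-only sketch below reproduces that layout on a random fake HSV image, so it can be run without OpenCV or the image folders. The names `minmax_normalize`, `histogram_features_np`, and `fake_hsv` are hypothetical, and the sketch is meant to mirror the idea of the function above, not to match OpenCV's output bit for bit.

```python
import numpy as np

def minmax_normalize(hist):
    """Scale a histogram so its smallest bin is 0 and its largest bin is 1."""
    hist = hist.astype(np.float32)
    span = hist.max() - hist.min()
    if span == 0:
        # Flat histogram: nothing to scale (behaviour chosen for this sketch).
        return np.zeros_like(hist)
    return (hist - hist.min()) / span

def histogram_features_np(hsv_image, bins=32):
    # Value ranges per channel: hue, saturation, value.
    ranges = [(0, 180), (0, 256), (0, 256)]
    parts = []
    for channel, value_range in enumerate(ranges):
        counts, _ = np.histogram(hsv_image[:, :, channel], bins=bins, range=value_range)
        parts.append(minmax_normalize(counts))
    return np.concatenate(parts)

# Random stand-in for one preprocessed HSV image (assumption, not real data).
rng = np.random.default_rng(0)
fake_hsv = np.stack(
    [
        rng.integers(0, 180, size=(60, 110)),
        rng.integers(0, 256, size=(60, 110)),
        rng.integers(0, 256, size=(60, 110)),
    ],
    axis=-1,
).astype(np.uint8)

demo_features = histogram_features_np(fake_hsv)
print(demo_features.shape)  # (96,)
```

Because each channel is scaled separately, the feature vector describes the shape of each histogram and not the absolute pixel counts.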


**Train an SVM model with the RBF kernel using an 80:20 ratio**



In [None]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# The image folders already provide separate training (240 images) and test (160 images) sets.
# To apply the 80:20 ratio, the training features are split again: 80% (192 samples) is used
# to fit GridSearchCV and 20% (48 samples) is held out as X_val_svm / y_val_svm.
# GridSearchCV runs its own 5-fold cross-validation on the 80% portion, so the held-out
# validation split is not used below; the final accuracy is measured on test_features.

X_train_svm, X_val_svm, y_train_svm, y_val_svm = train_test_split(train_features, train_labels, test_size=0.2, random_state=42, stratify=train_labels)

# Define the parameter grid for RBF kernel
param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [0.0001, 0.001, 0.01, 0.1, 1]
}

# Initialize GridSearchCV with SVC (RBF kernel is default)
grid_search = GridSearchCV(SVC(kernel='rbf', random_state=42), param_grid, cv=5, verbose=2, n_jobs=-1)

# Fit GridSearchCV on the 80% portion of the training features
print("Fitting GridSearchCV to find best hyperparameters...")
grid_search.fit(X_train_svm, y_train_svm)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")

# Get the best model
best_svm_model = grid_search.best_estimator_

# Evaluate the best model on the original test set
y_pred_test = best_svm_model.predict(test_features)
accuracy_test = accuracy_score(test_labels, y_pred_test)

print(f"Accuracy on the test set with best RBF SVM model: {accuracy_test:.4f}")


Fitting GridSearchCV to find best hyperparameters...
Fitting 5 folds for each of 25 candidates, totalling 125 fits
Best parameters: {'C': 1, 'gamma': 0.1}
Best cross-validation score: 0.9947
Accuracy on the test set with best RBF SVM model: 1.0000
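
The grid above has 5 values of `C` and 5 values of `gamma`, which gives the 25 candidates and 125 fits reported in the log. To get a feel for what `gamma` controls, the sketch below evaluates the RBF kernel `exp(-gamma * d^2)` for every `gamma` in the grid at one fixed squared distance. The value `squared_distance = 4.0` is an illustrative assumption, not a distance measured on the image features.

```python
import math

C_grid = [0.1, 1, 10, 100, 1000]
gamma_grid = [0.0001, 0.001, 0.01, 0.1, 1]
cv_folds = 5

# Size of the search, matching the GridSearchCV log.
n_candidates = len(C_grid) * len(gamma_grid)
n_fits = n_candidates * cv_folds
print(f"{n_candidates} candidates, {n_fits} fits")

# Illustrative squared Euclidean distance between two feature vectors (assumption).
squared_distance = 4.0

# RBF similarity of the same pair of points under each gamma.
similarities = {g: math.exp(-g * squared_distance) for g in gamma_grid}
for g, k in similarities.items():
    print(f"gamma={g:<7} K={k:.4f}")
```

Small `gamma` values keep distant points similar and give smoother decision boundaries, while large values make the kernel fall off quickly so that only close neighbours influence a prediction.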
