### Explanation

#### The dataset "MaterialStrength.csv" loaded successfully. Here is a brief overview:

**Observations:**
1. The dataset has 15 features (`x1` through `x15`) and one target, `target_feature`.
2. The `target_feature` column holds continuous numeric values, so the task is a regression problem.

#### Planned Steps:
1. **Preprocessing**:
   - Encode the categorical features (`x14` and `x15`) and resolve the inconsistencies in their string values.
2. **Feature Scaling**:
   - Standardize the features, because SVM models are sensitive to feature scale.
3. **Modeling**:
   - Build a regression model with `SVR` from the Scikit-learn library.
4. **Evaluation**:
   - Report \(R^2\), MSE, RMSE, and MAE to assess model performance.
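
The four metrics named in the evaluation step can be written out by hand to make their definitions concrete. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the sample values are made up for illustration, and the notebook itself relies on the Scikit-learn metric functions instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, RMSE and MAE for two equal-length sequences."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # MSE: mean of squared residuals; RMSE is its square root.
    mse = sum(r * r for r in residuals) / n
    # MAE: mean of absolute residuals.
    mae = sum(abs(r) for r in residuals) / n

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "R^2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
    }


# Made-up values, not taken from MaterialStrength.csv.
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.0, 5.0, 8.0, 9.0]
metrics_demo = regression_metrics(y_true_demo, y_pred_demo)
print(metrics_demo)
```

MSE and RMSE penalize large errors more heavily than MAE does, which is why RMSE is always at least as large as MAE on the same predictions.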

---

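### Preprocessing Summary:

1. **Categorical Variables**:
   - Columns `x14` and `x15` are converted to integer codes with `LabelEncoder`.
   - Before encoding, the strings are lowercased and trimmed, which removes differences in letter case and surrounding whitespace. Spelling variants such as `nOO` (visible in the `data.head()` output for `x15`) are not merged by this step and remain separate categories.

2. **Numeric Variables**:
   - All numeric features are standardized with `StandardScaler`, fit on the training set only. The encoded `x14` and `x15` columns are scaled along with them.

3. **Data Split**:
   - The dataset is split into a training set (80%) and a test set (20%) with `random_state=42`.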
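
Lowercasing and trimming alone leave spelling variants untouched. A minimal sketch of one way to fold such variants into a canonical label is shown below; the `VARIANTS` table and the helper `normalize_label` are hypothetical, and treating `noo` as a misspelling of `no` is an assumption that should be checked against the dataset's documentation.

```python
# Hypothetical variant table: which spellings fold into which label is an assumption.
VARIANTS = {"noo": "no"}


def normalize_label(value):
    """Lowercase, trim, then map known spelling variants to a canonical label."""
    cleaned = value.strip().lower()
    return VARIANTS.get(cleaned, cleaned)


# Sample strings in the style of the x14/x15 preview; " no " is made up.
raw = ["YEs", "yES", "NO", "nOO", " no "]

case_only = sorted({v.strip().lower() for v in raw})
canonical = sorted({normalize_label(v) for v in raw})

print(case_only)   # ['no', 'noo', 'yes']
print(canonical)   # ['no', 'yes']
```

With pandas, the same helper could be applied before encoding through `data[col].map(normalize_label)`.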

---

Next, an SVM regression model (`SVR`) is built with Scikit-learn and evaluated with \(R^2\), MSE, RMSE, and MAE. The run below reports a test \(R^2\) of about 0.88 (train: about 0.95) and a test RMSE of about 5.58.

If you have further questions, I'm happy to help! 😊

In [11]:
import pandas as pd

# Load the uploaded dataset
file_path = 'MaterialStrength.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset to understand its structure
data.head()


Unnamed: 0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,target_feature
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,0.3,540.0,1.538462,1,0.350044,YEs,NO,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,0.3,540.0,1.560651,1,0.452416,yES,nOO,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,0.685714,475.0,1.569024,0,6.704743,yEs,NO,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,0.685714,475.0,1.569024,0,8.891596,yes,NOO,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,0.966767,331.0,1.185221,0,8.126411,YeS,no,44.3


In [12]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Identify categorical columns and numerical columns
categorical_cols = ['x14', 'x15']
numerical_cols = [col for col in data.columns if col not in categorical_cols + ['target_feature']]

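# Encode categorical features.
# Lowercasing and stripping only fixes case and whitespace; spelling variants
# such as 'noo' vs 'no' (see the preview above) stay as separate classes.
label_encoders = {}
for col in categorical_cols:
    le = LabelEncoder()
    data[col] = le.fit_transform(data[col].str.strip().str.lower())
    label_encoders[col] = le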

# Extract features and target
X = data[numerical_cols + categorical_cols]
y = data['target_feature']

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

X_train_scaled[:5], y_train.head()  # Display the scaled features and the first few target values for verification


(array([[-1.16087734,  0.85738747,  0.9824489 , -0.314285  ,  1.48165383,
         -0.25746155, -0.64752011, -0.27567315,  1.10651742,  0.13587678,
          0.24425167,  0.76460145, -0.12261551,  0.35016539, -1.03709534],
        [ 1.30862304, -0.60249189,  1.23259821, -0.17819203,  0.39171469,
         -1.92694961, -0.2731482 , -0.27567315, -1.02357946,  1.80108681,
         -0.94973472,  0.76460145, -0.50508326,  0.35016539, -1.03709534],
        [-0.0768653 , -0.85558366,  1.06687429, -1.04637133,  0.62647081,
          1.01785948,  0.06662828, -0.68931339, -0.52554326, -0.15513079,
          0.40374009,  0.76460145, -0.72706322,  0.35016539,  0.9642315 ],
        [-0.29180674,  0.26032097,  0.35707562,  0.57266573,  0.30787321,
         -1.76165376,  0.6040929 , -0.27567315,  0.06611882,  0.15204387,
         -1.36211778,  0.76460145, -0.24086701, -2.85579333,  0.9642315 ],
        [-1.06503765, -0.37010763,  1.11221385, -1.10268566,  0.77738546,
          1.36370925,  0.29767629,
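
The scaling step in cell 12 fits the scaler on the training split and reuses those statistics for the test split. The sketch below reproduces that idea for a single feature with the standard library only; the helper names and the numbers are made up for illustration. It uses the population standard deviation, which matches how `StandardScaler` computes its scale.

```python
import statistics


def fit_standardizer(train_values):
    """Learn the mean and population standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    return mean, std


def standardize(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in values]


# Made-up single-feature data, not taken from MaterialStrength.csv.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)  # no statistics are learned from test data

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and unit variance by construction; the scaled test values generally do not, which is expected because the test split contributes nothing to the fitted statistics.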

In [13]:
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

# Initialize and train the SVM regression model
svr_model = SVR(kernel='rbf', C=100, gamma='scale', epsilon=0.1)
svr_model.fit(X_train_scaled, y_train)

# Make predictions
y_pred_train = svr_model.predict(X_train_scaled)
y_pred_test = svr_model.predict(X_test_scaled)

# Evaluate the model
metrics = {
    "Train R^2": r2_score(y_train, y_pred_train),
    "Test R^2": r2_score(y_test, y_pred_test),
    "Train MSE": mean_squared_error(y_train, y_pred_train),
    "Test MSE": mean_squared_error(y_test, y_pred_test),
    "Train RMSE": np.sqrt(mean_squared_error(y_train, y_pred_train)),
    "Test RMSE": np.sqrt(mean_squared_error(y_test, y_pred_test)),
    "Train MAE": mean_absolute_error(y_train, y_pred_train),
    "Test MAE": mean_absolute_error(y_test, y_pred_test),
}

metrics


{'Train R^2': 0.9508568013660597,
 'Test R^2': 0.8791248241497263,
 'Train MSE': np.float64(13.960913981282292),
 'Test MSE': np.float64(31.14673991871346),
 'Train RMSE': np.float64(3.7364306471928916),
 'Test RMSE': np.float64(5.580926439106097),
 'Train MAE': np.float64(1.9413884629633558),
 'Test MAE': np.float64(3.8186448447184187)}
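
The train and test scores above come from a single 80/20 split. One way to check how stable the test \(R^2\) is would be k-fold cross-validation with the scaler placed inside a pipeline, so that it is re-fit on each training fold. The sketch below uses synthetic data from `make_regression` as a stand-in, because the CSV is not reproduced here; only the `SVR` hyperparameters are copied from cell 13, and the resulting scores say nothing about the real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the 15 features and the target (assumption).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=15, noise=10.0, random_state=42
)

# Scaling lives inside the pipeline, so each fold fits its own scaler
# on that fold's training portion only.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=100, gamma="scale", epsilon=0.1),
)

# Five-fold cross-validated R^2; one score per fold.
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="r2")
print(scores.mean(), scores.std())
```

On the real data, `X_demo` and `y_demo` would be replaced by the unscaled `X` and `y` from cell 12; a large spread across folds would suggest that the single-split test score is sensitive to how the data happened to be divided.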