
# UAS (Final Semester Exam)

Analyze one of the datasets at the following link: https://archive.ics.uci.edu/datasets?Task=Classification&skip=0&take=10&sort=desc&orderBy=NumHits&search=&Types=Multivariate&NumInstances=336&NumInstances=1041
Use the following data-analysis stages:
1.	Data understanding
2.	Preprocessing
3.	Modelling
4.	Evaluation

After the evaluation, deploy the best-performing model.
Publish the analysis results on your own static website and upload it.
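A minimal sketch of one way to prepare the best model for deployment, assuming the fitted estimator is serialized with joblib (installed alongside scikit-learn) and reloaded later by a separate script. The file name `best_model.joblib` and the synthetic `make_classification` data are illustrative assumptions, not the author's setup.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 11 features and 3 classes, like the wine task.
X, y = make_classification(n_samples=200, n_features=11, n_informative=5,
                           n_classes=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Persist the fitted model; "best_model.joblib" is a hypothetical file name.
path = os.path.join(tempfile.mkdtemp(), "best_model.joblib")
joblib.dump(model, path)

# Later, e.g. in a serving script: reload the model and predict with it.
loaded = joblib.load(path)
same_predictions = np.array_equal(model.predict(X), loaded.predict(X))
print(same_predictions)  # True
```

The reloaded model should produce exactly the same predictions as the original, which is a quick sanity check before wiring it into any deployment.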

**🧪 1. Data Understanding**

a. Objective:

Predict the quality of red wine (UCI Wine Quality dataset) from its physicochemical properties (e.g., pH, alcohol, sulfur compounds).

b. Target Variable:

quality (integer score on a 0–10 scale; most values fall between 3 and 8)

c. Features:



*   fixed acidity
*   volatile acidity
*   citric acid
*   residual sugar
*   chlorides
*   free sulfur dioxide
*   total sulfur dioxide
*   density
*   pH
*   sulphates
*   alcohol
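The data-understanding checks above (expected columns, missing values, target distribution) can be bundled into a small helper. This is a sketch under assumptions: `describe_dataset` is a hypothetical name, and it is demonstrated on the three fully printed rows from the `df.head()` output in the Preprocessing section rather than on the full dataset.

```python
import pandas as pd

# The 11 physicochemical features listed above, plus the target column.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def describe_dataset(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic data-understanding checks."""
    expected = FEATURES + [TARGET]
    present = [c for c in expected if c in df.columns]
    return {
        "n_rows": len(df),
        "absent_columns": [c for c in expected if c not in df.columns],
        "missing_values": int(df[present].isna().sum().sum()),
        "quality_counts": df[TARGET].value_counts().sort_index().to_dict(),
    }


# The first three rows printed by df.head() in the Preprocessing section.
sample = pd.DataFrame(
    [[7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5],
     [7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5],
     [7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8, 5]],
    columns=FEATURES + [TARGET],
)
summary = describe_dataset(sample)
print(summary)
```

On the full dataset, calling `describe_dataset(df)` after loading should confirm the "no missing values" claim and show how unevenly the quality scores are distributed.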



**🧹 2. Preprocessing**

Steps:

1.   Load the dataset
2.   Check for missing values: there are none.
3.   Normalize the features with MinMaxScaler
4.   Relabel the target into classes (optional, for classification):

    *   3–5 = low
    *   6 = medium
    *   7–8 = high
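Steps 3 and 4 can be checked on a few values. This sketch copies the thresholds of the `relabel` function from the code cell below and compares `MinMaxScaler` with the formula x' = (x − min) / (max − min); the alcohol and pH values come from the first three printed rows, and using only these two columns is an illustrative choice.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def relabel(q: int) -> str:
    # Same thresholds as the notebook's relabel(): <=5 low, 6 medium, >=7 high.
    if q <= 5:
        return "low"
    elif q == 6:
        return "medium"
    return "high"


# Class assigned to every quality score that occurs in the red wine data (3-8).
labels = {q: relabel(q) for q in range(3, 9)}
print(labels)

# Alcohol and pH for the first three rows of the red wine data.
values = np.array([[9.4, 3.51],
                   [9.8, 3.20],
                   [9.8, 3.26]])

scaled = MinMaxScaler().fit_transform(values)
# Manual min-max formula, applied column by column.
manual = (values - values.min(axis=0)) / (values.max(axis=0) - values.min(axis=0))
print(np.allclose(scaled, manual))  # True
```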





In [None]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler  # ⬅️ Important: this import is required

# Official UCI URL for the red wine dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"

# Load the dataset directly from the URL (the file is semicolon-separated)
df = pd.read_csv(url, sep=';')

# Inspect the first rows
print(df.head())

# Relabel the numeric quality score into three classes for classification
def relabel(q):
    if q <= 5:
        return 'low'
    elif q == 6:
        return 'medium'
    else:
        return 'high'

df['quality_label'] = df['quality'].apply(relabel)

# Normalize the 11 features to the [0, 1] range.
# Note: the scaler is fit on the full dataset, before the train/test split.
scaler = MinMaxScaler()
X = scaler.fit_transform(df.drop(['quality', 'quality_label'], axis=1))
y = df['quality_label']


   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076   
1            7.8              0.88         0.00             2.6      0.098   
2            7.8              0.76         0.04             2.3      0.092   
3           11.2              0.28         0.56             1.9      0.075   
4            7.4              0.70         0.00             1.9      0.076   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56   
1                 25.0                  67.0   0.9968  3.20       0.68   
2                 15.0                  54.0   0.9970  3.26       0.65   
3                 17.0                  60.0   0.9980  3.16       0.58   
4                 11.0                  34.0   0.9978  3.51       0.56   

   alcohol  quality  
0      9.4        5  
1      9.8        5  
2      9.8        5 

**🤖 3. Modelling**

a. Split Dataset

In [None]:
from sklearn.model_selection import train_test_split

# 80/20 train/test split, stratified so each class keeps its proportion in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
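In the preprocessing cell, MinMaxScaler is fit on all rows before this split, so the test rows influence the scaling range. A common refinement, sketched here as an alternative rather than what this notebook ran, is to split the unscaled data first and wrap the scaler and model in a scikit-learn Pipeline so the scaler is fit on training rows only. Synthetic `make_classification` data stands in for the wine features (an assumption).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 11 wine features and 3 quality classes.
X_raw, y = make_classification(n_samples=500, n_features=11, n_informative=6,
                               n_classes=3, random_state=42)

# Split the *unscaled* data first, stratifying on the labels.
X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y, test_size=0.2, stratify=y, random_state=42)

# The pipeline fits MinMaxScaler on X_train only, then reuses it on X_test.
pipe = make_pipeline(MinMaxScaler(), RandomForestClassifier(random_state=42))
pipe.fit(X_train, y_train)
test_acc = pipe.score(X_test, y_test)
print(f"Test accuracy: {test_acc:.2f}")

# The fitted scaler's per-feature minimum comes from the training rows alone.
scaler = pipe.named_steps["minmaxscaler"]
print(np.allclose(scaler.data_min_, X_train.min(axis=0)))  # True
```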


b. Algorithms evaluated:

- Logistic Regression

- Random Forest

- Support Vector Machine (SVM)

- K-Nearest Neighbors (KNN)

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

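# Candidate classifiers with default hyperparameters (max_iter raised so LogisticRegression has room to converge)
models = {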
    'Random Forest': RandomForestClassifier(),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier(),
    'LogReg': LogisticRegression(max_iter=1000)
}
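The comparison in this notebook relies on a single 80/20 split, and `RandomForestClassifier()` is created without a `random_state`, so its score can shift between runs. A sketch of a more stable comparison, assuming stratified 5-fold cross-validation is acceptable for this task, runs `cross_val_score` over the same four model types; synthetic data again stands in for the wine features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 11 features and 3 classes.
X, y = make_classification(n_samples=400, n_features=11, n_informative=6,
                           n_classes=3, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),  # fixed seed
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=1000),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_means = {}
for name, model in models.items():
    # Scaling inside the pipeline keeps each fold's held-out part unseen.
    pipe = make_pipeline(MinMaxScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.2f} ± {scores.std():.2f}")
```

Reporting the mean and standard deviation across folds makes it easier to tell whether a gap between two models is larger than the run-to-run noise.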


**📊 4. Evaluation**



In [None]:
from sklearn.metrics import classification_report, accuracy_score

# Fit each model on the training split, then report test accuracy and per-class metrics
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(f"{name} Accuracy: {acc:.2f}")
    print(classification_report(y_test, preds))


Random Forest Accuracy: 0.76
              precision    recall  f1-score   support

        high       0.69      0.56      0.62        43
         low       0.81      0.85      0.83       149
      medium       0.73      0.73      0.73       128

    accuracy                           0.76       320
   macro avg       0.74      0.71      0.72       320
weighted avg       0.76      0.76      0.76       320

SVM Accuracy: 0.64
              precision    recall  f1-score   support

        high       0.54      0.35      0.42        43
         low       0.70      0.78      0.74       149
      medium       0.58      0.57      0.57       128

    accuracy                           0.64       320
   macro avg       0.60      0.57      0.58       320
weighted avg       0.63      0.64      0.63       320

KNN Accuracy: 0.60
              precision    recall  f1-score   support

        high       0.47      0.53      0.50        43
         low       0.65      0.74      0.69       149
      me
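Accuracy alone can hide weak performance on the minority class: in the Random Forest report above, the high class (43 test samples) has recall 0.56 versus 0.85 for low. A sketch of two complementary views, a confusion matrix and macro-averaged F1, is shown on a small set of hypothetical labels (not this notebook's actual predictions).

```python
from sklearn.metrics import confusion_matrix, f1_score

# Hypothetical true labels and predictions, for illustration only.
y_true = ["low", "low", "low", "medium", "medium", "high", "high", "high"]
y_pred = ["low", "low", "medium", "medium", "medium", "high", "low", "medium"]

labels = ["low", "medium", "high"]
# Rows are true classes, columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Macro F1 weights every class equally, so "high" counts as much as "low".
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"Macro F1: {macro_f1:.2f}")
```

Here the high row of the matrix shows one correct prediction out of three, which drags macro F1 down even though the overall accuracy (5 of 8) looks moderate.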

Evaluation results summary:

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Candidate models
models = {
    'Random Forest': RandomForestClassifier(),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier(),
    'Logistic Regression': LogisticRegression(max_iter=1000)
}

# Collect the evaluation results
results = []

# Train and evaluate each model
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    results.append({'Model': name, 'Accuracy': round(acc, 2)})

# Build a summary DataFrame
results_df = pd.DataFrame(results)
print(results_df)


                 Model  Accuracy
0        Random Forest      0.76
1                  SVM      0.64
2                  KNN      0.60
3  Logistic Regression      0.61
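Before deployment, the best model can be picked from this table programmatically rather than by eye. This sketch rebuilds `results_df` from the accuracies printed above and selects the top row with `idxmax`; breaking ties by taking the first maximum is a simplifying assumption.

```python
import pandas as pd

# Accuracies as printed in the summary table above.
results_df = pd.DataFrame({
    "Model": ["Random Forest", "SVM", "KNN", "Logistic Regression"],
    "Accuracy": [0.76, 0.64, 0.60, 0.61],
})

# idxmax returns the index label of the first row with the highest accuracy.
best_row = results_df.loc[results_df["Accuracy"].idxmax()]
best_name = best_row["Model"]
print(f"Best model: {best_name} ({best_row['Accuracy']:.2f})")
```

In the notebook itself, `models[best_name]` would then give the already fitted estimator to persist for deployment.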
