# Part 2

The file `wine_data.csv` has 14 columns. The first column (Class label) is the wine type (the "y"), and the remaining 13 columns are features (the "X"). Build a model with each of the following methods:

- Perceptron
- Logistic Regression
- Support vector machine
- Decision tree
- Random Forest
- KNN
- Naive Bayes

Each model should reach a test accuracy greater than 0.95.

### Tips

- The `wine_data.csv` dataset has 178 rows.
- Split the data using `test_size=0.3`.
- List of feature column names (13 columns):

```python
["Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium", "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins", "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline"]
```
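
The row count and `test_size` in the tips determine how strict the 0.95 target is. The following is a rough sketch, assuming `train_test_split` rounds a fractional `test_size` up to a whole number of test rows; the helper `max_misses` is a hypothetical name introduced only for this illustration. Under that assumption the test set has 54 rows, and a model may misclassify at most 2 of them.

```python
import math

N_ROWS = 178        # rows in wine_data.csv, from the tips
TEST_SIZE = 0.3     # fraction held out for testing
TARGET = 0.95       # required test accuracy (strictly greater than)

# Assumption: a fractional test_size is rounded up to whole rows.
n_test = math.ceil(N_ROWS * TEST_SIZE)
n_train = N_ROWS - n_test


def max_misses(n_test, target):
    """Largest number of misclassified test rows that still beats target."""
    return max(m for m in range(n_test + 1) if (n_test - m) / n_test > target)


print(n_train, n_test)              # 124 54
print(max_misses(n_test, TARGET))   # 2
```

With 3 misclassified rows the accuracy would be 51/54, roughly 0.944, which falls below the target.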

### Scoring

- 17 points total
- 2 points for successful classification of each model
- 3 points for writing an automated process (use a loop instead of repeated code).


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Perceptron
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

In [None]:
# Function to obtain ML models.
def get_model(modelName):

    if modelName == "Perceptron":
        model = Perceptron(eta0=0.615, random_state=1,
                           alpha=0.0001, n_iter_no_change=10)
    elif modelName == "LogisticRegression":
        model = LogisticRegression(
            random_state=1,
            solver="lbfgs",
            C=13.8949549437,
        )
    elif modelName == "SVC":
        model = SVC(kernel="rbf", gamma=0.00294705, C=18420.7)
    elif modelName == "DecisionTree":
        model = DecisionTreeClassifier(
            criterion="gini", max_depth=36, random_state=1)
    elif modelName == "RandomForrest":
        model = RandomForestClassifier(
            criterion="gini",
            n_estimators=50,
            max_samples=None,
            # "auto" was removed in scikit-learn 1.3; "sqrt" is what it meant for classifiers.
            max_features="sqrt",
            max_depth=20,
            random_state=1,
        )
    elif modelName == "KNN":
        model = KNeighborsClassifier(
            metric="minkowski",
            algorithm="auto",
            n_neighbors=11,
            weights="distance",
            p=2,
        )

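    elif modelName == "NaiveBayes":
        model = GaussianNB()
    else:
        # Fail loudly instead of returning an unbound `model`.
        raise ValueError(f"Unknown model name: {modelName!r}")

    return model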
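
The if/elif chain above can also be written as a lookup table, which keeps the "loop instead of repeated code" idea in one place. This is only a sketch of that alternative, not the notebook's method: `MODEL_FACTORIES` and `make_model` are hypothetical names, and only four of the seven models are listed to keep it short. The hyperparameters are copied from the function above.

```python
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical registry: name -> zero-argument factory, so every lookup
# builds a fresh, unfitted estimator.
MODEL_FACTORIES = {
    "Perceptron": lambda: Perceptron(eta0=0.615, random_state=1),
    "LogisticRegression": lambda: LogisticRegression(C=13.8949549437, random_state=1),
    "KNN": lambda: KNeighborsClassifier(n_neighbors=11, weights="distance"),
    "NaiveBayes": lambda: GaussianNB(),
}


def make_model(model_name):
    """Return a new estimator for model_name, or raise on an unknown name."""
    try:
        return MODEL_FACTORIES[model_name]()
    except KeyError:
        raise ValueError(f"Unknown model name: {model_name!r}") from None


# Iterating over the registry replaces a separate list of names.
for name in MODEL_FACTORIES:
    print(name, type(make_model(name)).__name__)
```

Adding a model then means adding one dictionary entry, and a misspelled name raises an error instead of silently falling through.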

In [2]:
model_names = ['Perceptron', 'LogisticRegression',
               'SVC', 'DecisionTree', 'RandomForrest', 'KNN', 'NaiveBayes']

# Read data
df = pd.read_csv("wine_data.csv", header=None)
df.columns = ["Class label", "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium", "Total phenols", "Flavanoids",
              "Nonflavanoid phenols", "Proanthocyanins", "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline"]

X = df.iloc[:, 1:]
y = df.iloc[:, 0]

# Split data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Standardization
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Use a separate loop variable so the `model_names` list is not overwritten.
for model_name in model_names:

    # Classifier
    model = get_model(model_name)

    # Training
    model.fit(X_train_std, y_train)

    # Prediction
    y_pred = model.predict(X_test_std)

    # Misclassification from the test samples
    sumMiss = (y_test != y_pred).sum()

    # Accuracy score from the test samples
    accuracyScore = accuracy_score(y_test, y_pred)

    print(model_name)
    print(f"Misclassified examples: {sumMiss}")
    print(f"Accuracy score: {accuracyScore}")
    print('-'*20)

Perceptron
Misclassified examples: 2
Accuracy score: 0.9629629629629629
--------------------
LogisticRegression
Misclassified examples: 1
Accuracy score: 0.9814814814814815
--------------------
SVC
Misclassified examples: 2
Accuracy score: 0.9629629629629629
--------------------
DecisionTree
Misclassified examples: 2
Accuracy score: 0.9629629629629629
--------------------
RandomForrest
Misclassified examples: 0
Accuracy score: 1.0
--------------------
KNN
Misclassified examples: 1
Accuracy score: 0.9814814814814815
--------------------
NaiveBayes
Misclassified examples: 1
Accuracy score: 0.9814814814814815
--------------------
