## Wine Classifier
---

### Data Setup
---

In [2]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

wine = load_wine()

wine_data = wine.data
wine_label = wine.target

# Prepare the training set (practice problems) and the test set (exam)
x_train, x_test, y_train, y_test = train_test_split(wine_data,
                                                    wine_label,
                                                    test_size=0.2,
                                                    random_state=24)

print('x_train:', len(x_train), 'x_test:', len(x_test))


x_train: 142 x_test: 36
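
The split above is not stratified, so the class proportions in the 36-sample test set can drift from those of the full dataset. A minimal sketch of a stratified variant follows. The `stratify` argument is a real `train_test_split` parameter, but using it here is an assumption rather than part of the original notebook, and the `_s` variable names are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()

# Same sizes and seed as the split above, plus stratify= so that each class
# keeps roughly the same share in the training and test sets.
x_train_s, x_test_s, y_train_s, y_test_s = train_test_split(
    wine.data,
    wine.target,
    test_size=0.2,
    random_state=24,
    stratify=wine.target,
)

print('x_train:', len(x_train_s), 'x_test:', len(x_test_s))
print('train class counts:', np.bincount(y_train_s))
print('test class counts:', np.bincount(y_test_s))
```

The set sizes stay at 142 and 36. Only the way samples are distributed across the classes changes.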


### Data Analysis
---

In [6]:
# Information stored in the dataset
print('keys:', wine.keys())

# Shapes of the data and the labels
print('data_shape:', wine_data.shape)
print('label_shape:', wine_label.shape)

# Feature names
print('feature_names:', wine.feature_names)

# Label (class) names
print('target_names:', wine.target_names)

# Description of the dataset
print(wine.DESCR)

keys: dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names'])
data_shape: (178, 13)
label_shape: (178,)
feature_names: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']
target_names: ['class_0' 'class_1' 'class_2']
.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statis
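
The description printed above says "50 in each of three classes", while the shapes show 178 samples in total. Counting the labels directly is a quick way to see the actual class sizes. This is a small sketch using `np.bincount`, and the variable name `class_counts` is only illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Count how many samples carry each label (0, 1, 2).
class_counts = np.bincount(wine.target)

for name, count in zip(wine.target_names, class_counts):
    print(name, count)
```

The classes turn out to be unequal in size, which is worth keeping in mind when reading the per-class supports in the evaluation reports.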

### Designing Various Learning Models
---

In [4]:
# Decision Tree
from sklearn.tree import DecisionTreeClassifier

decision_tree = DecisionTreeClassifier(random_state=32)

decision_tree.fit(x_train, y_train)

# RandomForest
from sklearn.ensemble import RandomForestClassifier

random_forest = RandomForestClassifier(random_state=32)

random_forest.fit(x_train, y_train)

# SVM (default RBF kernel, trained on the unscaled features)
from sklearn import svm

svm_model = svm.SVC()

svm_model.fit(x_train, y_train)

# SGD (no random_state is set, so results can vary between runs)
from sklearn.linear_model import SGDClassifier

sgd_model = SGDClassifier()

sgd_model.fit(x_train, y_train)

# Logistic Regression
from sklearn.linear_model import LogisticRegression

logistic_model = LogisticRegression(max_iter=10000)

logistic_model.fit(x_train, y_train)

LogisticRegression(max_iter=10000)
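
All five models above are trained on the raw features, which are on very different scales (for example, `proline` takes values in the hundreds while `hue` stays near 1). SVM and SGD are generally sensitive to feature scale, so one option is to standardize the features first. The sketch below is an assumption, not part of the original notebook. It wraps `SVC` in a pipeline with `StandardScaler`, and `scaled_svm` is an illustrative name.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=24
)

# The scaler is fit on the training data only. The pipeline then applies the
# same transformation to the test data at prediction time.
scaled_svm = make_pipeline(StandardScaler(), SVC())
scaled_svm.fit(x_train, y_train)

scaled_svm_pred = scaled_svm.predict(x_test)
scaled_svm_accuracy = accuracy_score(y_test, scaled_svm_pred)
print("Scaled SVM:", scaled_svm_accuracy)
```

Putting the scaler inside the pipeline keeps test-set statistics out of the fitting step, so the comparison with the unscaled models stays fair.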

### Evaluating Various Learning Models
---

In [8]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Decision Tree
dt_y_pred = decision_tree.predict(x_test)
dt_accuracy = accuracy_score(y_test, dt_y_pred)
dt_report = classification_report(y_test, dt_y_pred)
print("Decision Tree:", dt_accuracy)
print(dt_report)


# RandomForest
rf_y_pred = random_forest.predict(x_test)
rf_accuracy = accuracy_score(y_test, rf_y_pred)
rf_report = classification_report(y_test, rf_y_pred)
print("Random Forest:", rf_accuracy)
print(rf_report)

# SVM
svm_y_pred = svm_model.predict(x_test)
svm_accuracy = accuracy_score(y_test, svm_y_pred)
svm_report = classification_report(y_test, svm_y_pred)
print("SVM:", svm_accuracy)
print(svm_report)

# SGD
sgd_y_pred = sgd_model.predict(x_test)
sgd_accuracy = accuracy_score(y_test, sgd_y_pred)
sgd_report = classification_report(y_test, sgd_y_pred)
print("SGD:", sgd_accuracy)
print(sgd_report)

# Logistic Regression
logistic_y_pred = logistic_model.predict(x_test)
logistic_accuracy = accuracy_score(y_test, logistic_y_pred)
logistic_report = classification_report(y_test, logistic_y_pred)
print("Logistic:", logistic_accuracy)
print(logistic_report)

Decision Tree: 0.8888888888888888
              precision    recall  f1-score   support

           0       1.00      0.93      0.96        14
           1       0.81      0.93      0.87        14
           2       0.86      0.75      0.80         8

    accuracy                           0.89        36
   macro avg       0.89      0.87      0.88        36
weighted avg       0.90      0.89      0.89        36

Random Forest: 0.9722222222222222
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      0.93      0.96        14
           2       0.89      1.00      0.94         8

    accuracy                           0.97        36
   macro avg       0.96      0.98      0.97        36
weighted avg       0.98      0.97      0.97        36

SVM: 0.7777777777777778
              precision    recall  f1-score   support

           0       0.93      0.93      0.93        14
           1       0.82      0.64    

  _warn_prf(average, modifier, msg_start, len(result))
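
The `_warn_prf` line above is the tail of scikit-learn's undefined-metric warning, which typically appears when a model never predicts one of the classes, so precision for that class has a zero denominator. The output is truncated, so it is not clear which model triggered it. The sketch below reproduces the situation with small hypothetical label lists (not the notebook's predictions) and uses the `zero_division` parameter of `classification_report` to set the undefined scores to 0 explicitly.

```python
from sklearn.metrics import classification_report

# Hypothetical labels: class 2 exists in y_true but is never predicted.
y_true_demo = [0, 0, 1, 1, 2, 2]
y_pred_demo = [0, 0, 1, 1, 1, 0]

# zero_division=0 reports undefined precision/F-score as 0.0 without a warning.
demo_report = classification_report(
    y_true_demo, y_pred_demo, zero_division=0, output_dict=True
)

print(demo_report["2"])
```

A precision of 0 reported this way means "no predictions for this class", which is a sign that the model may need scaling or tuning rather than a metric bug.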


For wine classification, precision and recall matter equally, so the model with the highest f1-score, which is the harmonic mean of the two, can be regarded as the best-performing model.
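
As a worked example of how the f1-score combines the two metrics, the snippet below recomputes a few numbers from the Decision Tree report above. It uses the rounded values printed in the report, so it is an arithmetic check rather than an exact reproduction, and the helper `f1` is an illustrative name.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Class 1 of the Decision Tree report: precision 0.81, recall 0.93.
print(round(f1(0.81, 0.93), 2))  # 0.87

# Per-class f1-scores and supports from the Decision Tree report.
f1_scores = [0.96, 0.87, 0.80]
supports = [14, 14, 8]

# Macro average: plain mean over classes.
macro_avg = sum(f1_scores) / len(f1_scores)
# Weighted average: mean weighted by the number of test samples per class.
weighted_avg = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

print(round(macro_avg, 2))     # 0.88
print(round(weighted_avg, 2))  # 0.89
```

The results match the `macro avg` and `weighted avg` rows of the report, which are the "average f1-score" figures used to compare models.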

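In the results above, Random Forest has the highest average f1-score (0.97 for both the macro and weighted averages on the 36-sample test set), so it is the best-performing model here.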
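
The comparison can also be done programmatically instead of reading the reports by eye. The following is a minimal sketch that refits the same five models and ranks them by macro-averaged f1-score. The `random_state` on `SGDClassifier` and the `zero_division=0` argument are additions made here for repeatability and quiet output, not settings from the original cells, and the exact scores may differ across scikit-learn versions.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=24
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=32),
    "Random Forest": RandomForestClassifier(random_state=32),
    "SVM": SVC(),
    # random_state added here (an assumption) so the SGD score is repeatable.
    "SGD": SGDClassifier(random_state=32),
    "Logistic": LogisticRegression(max_iter=10000),
}

macro_f1 = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    # zero_division=0 scores a never-predicted class as 0 instead of warning.
    macro_f1[name] = f1_score(
        y_test, model.predict(x_test), average="macro", zero_division=0
    )

# Print the models from best to worst macro f1-score.
for name, score in sorted(macro_f1.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")

best_model = max(macro_f1, key=macro_f1.get)
print("Best:", best_model)
```

With only 36 test samples, a single misclassification shifts the scores noticeably, so a ranking from one split is best read as indicative rather than conclusive.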