## Wine Quality Prediction Dataset
- Regression (the target is a numeric quality grade): model performance is evaluated with RMSE and MAE rather than accuracy.
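
As a quick illustration of the two metrics, here is a minimal sketch in plain Python. The grades and predictions are made-up values, not taken from the dataset. RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging.

```python
import math

# Hypothetical true grades and model predictions (illustrative values only).
y_true = [6, 5, 7, 6]
y_pred = [5.5, 5.0, 6.0, 6.5]

errors = [p - t for t, p in zip(y_true, y_pred)]

# MAE: mean of the absolute errors.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the mean of the squared errors.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE:  {mae:.4f}")   # MAE:  0.5000
print(f"RMSE: {rmse:.4f}")  # RMSE: 0.6124
```

The single error of 1.0 pulls RMSE above MAE, which is the usual pattern when a few predictions miss by more than the rest.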


In [1]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

df = pd.read_csv('winequality.csv') 
df


Unnamed: 0,type,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,white,7.0,0.270,0.36,20.7,0.045,45.0,170.0,1.00100,3.00,0.45,8.8,6
1,white,6.3,0.300,0.34,1.6,0.049,14.0,132.0,0.99400,3.30,0.49,9.5,6
2,white,8.1,0.280,0.40,6.9,0.050,30.0,97.0,0.99510,3.26,0.44,10.1,6
3,white,7.2,0.230,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6
4,white,7.2,0.230,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6492,red,6.2,0.600,0.08,2.0,0.090,32.0,44.0,0.99490,3.45,0.58,10.5,5
6493,red,5.9,0.550,0.10,2.2,0.062,39.0,51.0,0.99512,3.52,,11.2,6
6494,red,6.3,0.510,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6
6495,red,5.9,0.645,0.12,2.0,0.075,32.0,44.0,0.99547,3.57,0.71,10.2,5


In [2]:
df.info()
df.isna().sum()
df.describe() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6497 entries, 0 to 6496
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   type                  6497 non-null   object 
 1   fixed acidity         6487 non-null   float64
 2   volatile acidity      6489 non-null   float64
 3   citric acid           6494 non-null   float64
 4   residual sugar        6495 non-null   float64
 5   chlorides             6495 non-null   float64
 6   free sulfur dioxide   6497 non-null   float64
 7   total sulfur dioxide  6497 non-null   float64
 8   density               6497 non-null   float64
 9   pH                    6488 non-null   float64
 10  sulphates             6493 non-null   float64
 11  alcohol               6497 non-null   float64
 12  quality               6497 non-null   int64  
dtypes: float64(11), int64(1), object(1)
memory usage: 660.0+ KB


Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
count,6487.0,6489.0,6494.0,6495.0,6495.0,6497.0,6497.0,6497.0,6488.0,6493.0,6497.0,6497.0
mean,7.216579,0.339691,0.318722,5.444326,0.056042,30.525319,115.744574,0.994697,3.218395,0.531215,10.491801,5.818378
std,1.29675,0.164649,0.145265,4.758125,0.035036,17.7494,56.521855,0.002999,0.160748,0.148814,1.192712,0.873255
min,3.8,0.08,0.0,0.6,0.009,1.0,6.0,0.98711,2.72,0.22,8.0,3.0
25%,6.4,0.23,0.25,1.8,0.038,17.0,77.0,0.99234,3.11,0.43,9.5,5.0
50%,7.0,0.29,0.31,3.0,0.047,29.0,118.0,0.99489,3.21,0.51,10.3,6.0
75%,7.7,0.4,0.39,8.1,0.065,41.0,156.0,0.99699,3.32,0.6,11.3,6.0
max,15.9,1.58,1.66,65.8,0.611,289.0,440.0,1.03898,4.01,2.0,14.9,9.0


In [3]:
# How many wines does the shop have at each quality grade?
df['quality'].value_counts()
df.groupby('quality').size()

# How many wines of each quality grade are there for white and red wines?
df.groupby('type')['quality'].value_counts()
df.groupby(['type','quality']).size()

# How much does the average residual sugar differ between white and red wines?
df.groupby('type')['residual sugar'].mean()


# Which three components are most strongly associated with wine quality?
df.corr(numeric_only=True)['quality'].sort_values(ascending=False)
# SQL, Python => doesn't GPT already handle all of this well?

quality                 1.000000
alcohol                 0.444319
citric acid             0.085706
free sulfur dioxide     0.055463
sulphates               0.038729
pH                      0.019366
residual sugar         -0.036825
total sulfur dioxide   -0.041385
fixed acidity          -0.077031
chlorides              -0.200886
volatile acidity       -0.265953
density                -0.305858
Name: quality, dtype: float64
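
Sorting by the signed coefficient pushes strong negative correlations to the bottom, so the three strongest features are easier to read after sorting by absolute value. The sketch below copies the coefficients printed above into a plain dictionary. In the notebook itself, `df.corr(numeric_only=True)['quality'].drop('quality').abs().sort_values(ascending=False)` should give the same ranking. Correlation only measures linear association, so this is a rough indication rather than proof of influence.

```python
# Correlations with `quality`, copied from the output above.
corr_with_quality = {
    "alcohol": 0.444319,
    "citric acid": 0.085706,
    "free sulfur dioxide": 0.055463,
    "sulphates": 0.038729,
    "pH": 0.019366,
    "residual sugar": -0.036825,
    "total sulfur dioxide": -0.041385,
    "fixed acidity": -0.077031,
    "chlorides": -0.200886,
    "volatile acidity": -0.265953,
    "density": -0.305858,
}

# Rank by strength of association, ignoring the sign.
ranked = sorted(corr_with_quality.items(), key=lambda kv: abs(kv[1]), reverse=True)
top3 = ranked[:3]

for name, value in top3:
    print(f"{name:<20} {value:+.6f}")
```

By this measure the top three are alcohol (positive), density (negative), and volatile acidity (negative).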

In [4]:
# Data cleaning: drop every row that contains a missing value

df=df.dropna(how='any')
df.isna().sum()


type                    0
fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
dtype: int64

### Modeling

In [5]:
# (1) Split the data

from sklearn.model_selection import train_test_split


train, test = train_test_split(df,test_size=0.3)

train.to_csv('wine_train.csv')
test.to_csv('wine_test.csv')

In [6]:
# (2) Load the data (train/test): feature data and label data

X_train = train.drop(columns=['quality', 'type'])
X_test = test.drop(columns=['quality', 'type'])

y_train = train['quality']
y_test = test['quality']


X_train.shape, y_train.shape
X_test.shape, y_test.shape

((1939, 11), (1939,))

In [7]:
# (3) Model training (loading MLflow)

import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet # suited to multi-feature data such as the wine dataset
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score # updated
from itertools import product
mlflow.autolog()

mlflow.set_tracking_uri('http://127.0.0.1:5000')
mlflow.set_experiment(experiment_name='winequality_experiment')


2024/11/18 19:22:30 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2024/11/18 19:22:30 INFO mlflow.tracking.fluent: Experiment with name 'winequality_experiment' does not exist. Creating a new experiment.


<Experiment: artifact_location='mlflow-artifacts:/472700017867975302', creation_time=1731925350367, experiment_id='472700017867975302', last_update_time=1731925350367, lifecycle_stage='active', name='winequality_experiment', tags={}>
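
For context on the two hyperparameters tuned in the next cell: scikit-learn's `ElasticNet` adds the penalty `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2` to the squared-error loss. The sketch below evaluates only that penalty term for a made-up coefficient vector (the values in `w` are hypothetical), to show how `alpha` scales the penalty and `l1_ratio` shifts it between the L1 and L2 parts.

```python
def elastic_net_penalty(w, alpha, l1_ratio):
    """Penalty term of the ElasticNet objective for coefficients w."""
    l1 = sum(abs(c) for c in w)       # ||w||_1
    l2_sq = sum(c * c for c in w)     # ||w||_2^2
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_sq


# Hypothetical coefficients, for illustration only.
w = [0.5, -0.2, 0.0, 1.0]

for alpha in (0.2, 1.0):
    for l1_ratio in (0.2, 1.0):
        penalty = elastic_net_penalty(w, alpha, l1_ratio)
        print(f"alpha={alpha}, l1_ratio={l1_ratio}: penalty={penalty:.4f}")
```

With `l1_ratio=1.0` the L2 part vanishes and the penalty is pure L1 (Lasso), which tends to drive coefficients to exactly zero as `alpha` grows.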

In [8]:
# Experiment design
from sklearn.linear_model import ElasticNet
from itertools import product
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

alpha = [0.2, 0.5, 0.7, 1.0]
l1_ratio = [0.2, 0.3, 0.7, 1.0]

mlflow.autolog()

for a, l in product(alpha, l1_ratio):
    with mlflow.start_run(nested=True):
        lr = ElasticNet(alpha=a, l1_ratio=l, random_state=123)
        lr.fit(X_train, y_train) # practice-exam questions and answers: the model studies on these

        pred = lr.predict(X_test) # the real exam questions

        # Evaluate model performance
        rmse = np.sqrt(mean_squared_error(y_test, pred)) # RMSE
        mae = mean_absolute_error(y_test, pred) # MAE
        r2 = r2_score(y_test, pred)

        # Log parameters and metrics manually
        mlflow.log_params({'alpha': a, 'l1_ratio': l})
        mlflow.log_metrics({'rmse': rmse, 'mae': mae, 'r2': r2})
        
        mlflow.sklearn.log_model(lr, 'winequality_model')

        print(f"Alpha: {a}, L1 ratio: {l}, RMSE: {rmse:.4f}")

2024/11/18 19:24:05 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2024/11/18 19:24:08 INFO mlflow.tracking._tracking_service.client: 🏃 View run mysterious-chimp-822 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/06ea4701b2ba451daf4011f919e2d537.
2024/11/18 19:24:08 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.2, L1 ratio: 0.2, RMSE: 0.7792


2024/11/18 19:24:10 INFO mlflow.tracking._tracking_service.client: 🏃 View run unleashed-panda-34 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/30568fd8f5b842b5966e9cc37c0d6bf4.
2024/11/18 19:24:10 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.2, L1 ratio: 0.3, RMSE: 0.7806


2024/11/18 19:24:12 INFO mlflow.tracking._tracking_service.client: 🏃 View run powerful-roo-424 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/ceae13837d924625a1a137253c975a3b.
2024/11/18 19:24:12 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.2, L1 ratio: 0.7, RMSE: 0.7895


2024/11/18 19:24:13 INFO mlflow.tracking._tracking_service.client: 🏃 View run grandiose-snail-924 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/9940c4b1a877435aa9600ace7d648f93.
2024/11/18 19:24:13 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.2, L1 ratio: 1.0, RMSE: 0.7984


2024/11/18 19:24:15 INFO mlflow.tracking._tracking_service.client: 🏃 View run welcoming-conch-651 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/3d1002b571304f10a03d0b65a291d6c0.
2024/11/18 19:24:15 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.5, L1 ratio: 0.2, RMSE: 0.7934


2024/11/18 19:24:17 INFO mlflow.tracking._tracking_service.client: 🏃 View run polite-tern-130 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/8b8c592f83f8456b9c8436705eac27ba.
2024/11/18 19:24:17 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.5, L1 ratio: 0.3, RMSE: 0.8006


2024/11/18 19:24:19 INFO mlflow.tracking._tracking_service.client: 🏃 View run funny-sponge-116 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/d277d84808824a15a9141042fc604731.
2024/11/18 19:24:19 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.5, L1 ratio: 0.7, RMSE: 0.8386


2024/11/18 19:24:20 INFO mlflow.tracking._tracking_service.client: 🏃 View run placid-snipe-199 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/d90ceceed6204ab19b662b77abeecf77.
2024/11/18 19:24:20 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.5, L1 ratio: 1.0, RMSE: 0.8731


2024/11/18 19:24:22 INFO mlflow.tracking._tracking_service.client: 🏃 View run abundant-shrimp-455 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/8972d76006c9497e898159227e913d1d.
2024/11/18 19:24:22 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.7, L1 ratio: 0.2, RMSE: 0.8045


2024/11/18 19:24:24 INFO mlflow.tracking._tracking_service.client: 🏃 View run rare-fly-413 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/8ff66629902e42cba075394cf31a94ec.
2024/11/18 19:24:24 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.7, L1 ratio: 0.3, RMSE: 0.8144


2024/11/18 19:24:26 INFO mlflow.tracking._tracking_service.client: 🏃 View run trusting-grouse-307 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/fe29d2b3c4044d0a9927c0a193f4010e.
2024/11/18 19:24:26 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.7, L1 ratio: 0.7, RMSE: 0.8730


2024/11/18 19:24:27 INFO mlflow.tracking._tracking_service.client: 🏃 View run carefree-goose-772 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/5cc81aaf824d44c1846bc7a3f10f8cdf.
2024/11/18 19:24:27 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 0.7, L1 ratio: 1.0, RMSE: 0.8753


2024/11/18 19:24:29 INFO mlflow.tracking._tracking_service.client: 🏃 View run abrasive-bear-290 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/f8b117e5fbcc4ed5acc415e502704a0f.
2024/11/18 19:24:29 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 1.0, L1 ratio: 0.2, RMSE: 0.8194


2024/11/18 19:24:31 INFO mlflow.tracking._tracking_service.client: 🏃 View run languid-hawk-907 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/a7f6ae0750894a1497514369c9558e32.
2024/11/18 19:24:31 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 1.0, L1 ratio: 0.3, RMSE: 0.8362


2024/11/18 19:24:33 INFO mlflow.tracking._tracking_service.client: 🏃 View run secretive-fish-613 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/c42bd308517a41f39ceae014e16f1ddb.
2024/11/18 19:24:33 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 1.0, L1 ratio: 0.7, RMSE: 0.8753


2024/11/18 19:24:35 INFO mlflow.tracking._tracking_service.client: 🏃 View run likeable-toad-253 at: http://127.0.0.1:5000/#/experiments/472700017867975302/runs/76f45e9a09304314bc70890e5178ac54.
2024/11/18 19:24:35 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/472700017867975302.


Alpha: 1.0, L1 ratio: 1.0, RMSE: 0.8792
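
The sweep prints one RMSE per combination, so the best setting can be picked programmatically. The sketch below copies the RMSE values from the log above into a dictionary and takes the minimum. Against a live tracking server, `mlflow.search_runs(order_by=['metrics.rmse ASC'])` is one way to get the same ordering. Because the train/test split is not seeded, these numbers will vary from run to run.

```python
# (alpha, l1_ratio) -> RMSE, copied from the printed output above.
rmse_by_params = {
    (0.2, 0.2): 0.7792, (0.2, 0.3): 0.7806, (0.2, 0.7): 0.7895, (0.2, 1.0): 0.7984,
    (0.5, 0.2): 0.7934, (0.5, 0.3): 0.8006, (0.5, 0.7): 0.8386, (0.5, 1.0): 0.8731,
    (0.7, 0.2): 0.8045, (0.7, 0.3): 0.8144, (0.7, 0.7): 0.8730, (0.7, 1.0): 0.8753,
    (1.0, 0.2): 0.8194, (1.0, 0.3): 0.8362, (1.0, 0.7): 0.8753, (1.0, 1.0): 0.8792,
}

# Lower RMSE is better.
best_params, best_rmse = min(rmse_by_params.items(), key=lambda kv: kv[1])
worst_params, worst_rmse = max(rmse_by_params.items(), key=lambda kv: kv[1])

print(f"best:  alpha={best_params[0]}, l1_ratio={best_params[1]}, RMSE={best_rmse}")
print(f"worst: alpha={worst_params[0]}, l1_ratio={worst_params[1]}, RMSE={worst_rmse}")
```

In this sweep the weakest regularization (`alpha=0.2`, `l1_ratio=0.2`) gives the lowest RMSE and the strongest (`alpha=1.0`, `l1_ratio=1.0`) the highest. The last run in the log, `76f45e9a...`, appears to be that strongest-regularization run, so the first run, `06ea4701...`, may be the better candidate for serving.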


In [88]:
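# (4) Model serving

# Run to serve (MLflow UI):
# http://127.0.0.1:5000/#/experiments/472700017867975302/runs/76f45e9a09304314bc70890e5178ac54

# Run this in a terminal, not in a notebook cell; executed as Python it raises a SyntaxError:
# mlflow models serve -m ./mlartifacts/472700017867975302/76f45e9a09304314bc70890e5178ac54/artifacts/model -p 5002 --no-conda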

SyntaxError: invalid syntax (2048475131.py, line 5)
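
The request cell that follows posts a JSON body in MLflow's `dataframe_split` format, which is what `DataFrame.to_dict(orient='split')` produces: a dictionary with `index`, `columns`, and `data` keys. As a sketch, the payload for the first row of the data preview, built by hand with the 11 feature columns used in training, would look roughly like this (the index value `0` is illustrative):

```python
import json

# The 11 feature columns the model was trained on (no `type`, no `quality`).
feature_columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

# First row of the data preview, features only.
first_row = [7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8]

payload = {
    "dataframe_split": {
        "index": [0],
        "columns": feature_columns,
        "data": [first_row],
    }
}

body = json.dumps(payload)
print(body)
```

Each inner list in `data` is one row, with values in the same order as `columns`.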

In [25]:
import requests
import json

test_data = pd.read_csv('wine_test.csv', index_col=0)
input_data = test_data.drop(columns=['quality', 'type'])[:3]  # same 11 features used in training

data = {
    'dataframe_split': input_data.to_dict(orient='split')
}

url = 'http://127.0.0.1:5002/invocations'

headers = {'Content-Type': 'application/json'}
res = requests.post(url, headers=headers, data=json.dumps(data))

result = res.json()
print("result: ", result)


result:  {'predictions': [5.80301347737781, 5.828560670861267, 5.846929314109703]}
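
Since `quality` is an integer grade, one simple way to read the regression output is to round each prediction to the nearest grade. The sketch below uses the three predictions returned above and the mean `quality` from the `describe()` output. Rounding is an illustrative post-processing choice, not something the notebook itself does.

```python
# Predictions returned by the served model (copied from the output above).
predictions = [5.80301347737781, 5.828560670861267, 5.846929314109703]

# Mean of `quality` from the df.describe() output.
mean_quality = 5.818378

rounded = [round(p) for p in predictions]
deviations = [p - mean_quality for p in predictions]

print("rounded grades:", rounded)
print("deviation from mean:", [f"{d:+.3f}" for d in deviations])
```

All three predictions round to grade 6 and sit within about 0.03 of the dataset mean. That would be consistent with a heavily regularized model that predicts close to the average, although three rows are too few to draw a firm conclusion.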
