## Building the Model

Build a simple logistic regression model with the scikit-learn package.

After fitting the model, evaluate it against several metrics.

In [1]:
import pandas as pd
feats = pd.read_csv('../data/OSI_feats_e3.csv')
target = pd.read_csv('../data/OSI_target_e2.csv')

In [2]:
from sklearn.model_selection import train_test_split
test_size = 0.2
random_state = 42
X_train, X_test, y_train, y_test = train_test_split(feats, target, test_size=test_size, random_state=random_state)

In [3]:
print(f'Shape of X_train: {X_train.shape}')
print(f'Shape of y_train: {y_train.shape}')
print(f'Shape of X_test: {X_test.shape}')
print(f'Shape of y_test: {y_test.shape}')

Shape of X_train: (9864, 68)
Shape of y_train: (9864, 1)
Shape of X_test: (2466, 68)
Shape of y_test: (2466, 1)


In [4]:
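from sklearn.linear_model import LogisticRegression
# Raise max_iter well above the default of 100 so the solver has room to converge
model = LogisticRegression(random_state=42, max_iter=10000)
# y_train is a one-column DataFrame; select 'Revenue' to pass a 1-D target
model.fit(X_train, y_train['Revenue'])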

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=42, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [5]:
y_pred = model.predict(X_test)

In [6]:
from sklearn import metrics
accuracy = metrics.accuracy_score(y_pred=y_pred, y_true=y_test)
print(f'Accuracy of the model is {accuracy*100:.4f}%')

Accuracy of the model is 87.0641%


### Other Evaluation Metrics

precision

recall

f1-score

In [7]:
precision, recall, fscore, _ = metrics.precision_recall_fscore_support(y_pred=y_pred, y_true=y_test, average='binary')
print(f'Precision: {precision:.4f}\nRecall: {recall:.4f}\nfscore: {fscore:.4f}')

Precision: 0.7323
Recall: 0.3528
fscore: 0.4762


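Accuracy is high, but recall is much lower. A recall of 0.3528 means the model identifies only about 35% of the actual positive (`Revenue`) cases; the rest are false negatives. Accuracy alone can hide this when the negative class makes up most of the data.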
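
To make the gap between accuracy and recall concrete, the sketch below recomputes the metrics from confusion-matrix counts. The counts (TP=145, FP=53, FN=266, TN=2002) are not printed anywhere in this notebook; they are back-solved from the reported accuracy, precision, and recall on the 2,466 test rows, so treat them as an assumption. The helper name `summarize` is hypothetical.

```python
def summarize(tp, fp, fn, tn):
    """Compute basic binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Accuracy of a trivial model that always predicts the negative class.
        "always_negative_accuracy": (tn + fp) / total,
    }


# Assumed counts, back-solved from the metrics reported above (not notebook output).
scores = summarize(tp=145, fp=53, fn=266, tn=2002)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, 266 of the 411 actual positives are missed, and always predicting the negative class would already reach roughly 83% accuracy. That is why accuracy alone looks flattering here.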

### Checking Feature Importance
   

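Feature importance can be gauged from the model coefficients. The larger a coefficient's absolute value, the more that feature influences the prediction, and the sign gives the direction of the effect. This comparison assumes the features are on comparable scales.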
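
As a rough sketch of how to read the coefficients, the snippet below ranks features by absolute coefficient and converts each coefficient to an odds ratio with `exp(coef)`. The helper `rank_by_magnitude` is hypothetical, and the positive coefficient in the demo (`ExampleFeature`) is made up for illustration; the other three values are copied from the coefficient listing in this section. In the notebook it could presumably be called with `X_train.columns` and `model.coef_[0]`.

```python
import math


def rank_by_magnitude(names, coefs):
    """Return (name, coef, odds_ratio) tuples, largest |coef| first."""
    rows = [(name, coef, math.exp(coef)) for name, coef in zip(names, coefs)]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Three coefficients from this notebook's listing plus one made-up positive value.
names = ["Region_3", "TrafficType_13", "ExampleFeature", "ExitRates"]
coefs = [-0.10619817301840695, -0.7917570288138118, 0.6, -0.4780451929780065]

ranked = rank_by_magnitude(names, coefs)
for name, coef, odds_ratio in ranked:
    print(f"{name}: coef={coef:+.4f}, odds ratio={odds_ratio:.3f}")
```

An odds ratio below 1 means a one-unit increase in the feature lowers the odds of the positive class, holding the other features fixed; a value above 1 raises them.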

In [8]:
# Pair each coefficient with its feature name and sort ascending by coefficient value
coef_list = [f'{feature}: {coef}' for coef, feature in sorted(zip(model.coef_[0], X_train.columns.values.tolist()))]
for item in coef_list:
    print(item)

TrafficType_13: -0.7917570288138118
VisitorType_Returning_Visitor: -0.7077067020108712
Month_Dec: -0.6632994871900317
OperatingSystems_3: -0.5594109873909672
TrafficType_3: -0.5426993596121811
Month_Mar: -0.5419129709525607
Region_9: -0.5100949128094056
ExitRates: -0.4780451929780065
Month_May: -0.4053209053980702
SpecialDay: -0.40045822770896283
BounceRates: -0.3525018284225123
Month_June: -0.34779526836568864
OperatingSystems_8: -0.28780037202647457
VisitorType_New_Visitor: -0.2800880364852062
Browser_6: -0.26835493662055737
TrafficType_1: -0.2338219575750508
Region_4: -0.2337791476492104
Browser_1: -0.22725918368211923
TrafficType_6: -0.21637111631465478
Browser_4: -0.21576994005846414
Region_7: -0.21201474985854196
Browser_3: -0.19287685638823338
Browser_13: -0.18168441394446655
OperatingSystems_2: -0.17132086375465683
OperatingSystems_4: -0.16792807742115978
Browser_2: -0.14192214937877964
OperatingSystems_1: -0.12906124698554883
Region_3: -0.10619817301840695
TrafficType_15: -0.0