# F1 Score
`#f1_score`
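- A metric that checks whether precision and recall are both high and evenly balanced.
- F1 is the harmonic mean of precision and recall, so the more evenly matched the two are, the higher the score.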
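
To see why balance matters, here is a minimal sketch that computes F1 as the harmonic mean of precision and recall. The helper name `f1_from_pr` and the sample values are hypothetical, chosen only for illustration. Both pairs have the same arithmetic mean (0.5), but the lopsided pair scores much lower.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only, not taken from the Titanic model
balanced = f1_from_pr(0.5, 0.5)   # precision and recall evenly matched
lopsided = f1_from_pr(0.9, 0.1)   # same arithmetic mean, but unbalanced

print("balanced: {0:.4f}".format(balanced))
print("lopsided: {0:.4f}".format(lopsided))
```

The harmonic mean is pulled toward the smaller of the two values, which is why a model cannot earn a high F1 by maximizing only precision or only recall.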

<img alt="f1-score" src="./images/f1-score.png" width="400">


In [1]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
from sklearn.linear_model import LogisticRegression
import pandas as pd

In [2]:
from modules.classifier import MyDummyClassifier
from modules.classifier import MyFakeClassifier
from modules.preprocessing import transform_features

### Preparing the dataset & making predictions

In [3]:
titanic_df = pd.read_csv('../../datasets/titanic/train.csv')
y_titanic_df = titanic_df['Survived']
X_titanic_df = titanic_df.drop('Survived', axis=1)
X_titanic_df = transform_features(X_titanic_df)
X_train, X_test, y_train, y_test = train_test_split(X_titanic_df, y_titanic_df,
                                                   test_size=0.2, random_state=0)

lr_clf = LogisticRegression(max_iter=500)
lr_clf.fit(X_train, y_train)
pred = lr_clf.predict(X_test)

### Computing the F1 score

In [4]:
from sklearn.metrics import f1_score

In [5]:
f1 = f1_score(y_test, pred)
print("F1 score: {0:.4f}".format(f1))

F1 score: 0.7571
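
For binary labels with default settings, `f1_score` reports the F1 of the positive class (label 1). As a rough sketch of what that number means, the same value can be derived from the counts of true positives, false positives, and false negatives. The function `f1_from_labels` and the toy labels below are hypothetical and are not part of the Titanic data.

```python
def f1_from_labels(y_true, y_pred):
    """F1 for the positive class (label 1), computed from TP, FP and FN."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy labels for illustration only: TP = 3, FP = 1, FN = 1
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 0, 1, 0, 1, 0, 1, 0]

toy_f1 = f1_from_labels(toy_true, toy_pred)
print("F1 score: {0:.4f}".format(toy_f1))
```

The expression `2 * TP / (2 * TP + FP + FN)` is algebraically the same as the harmonic mean of precision and recall. True negatives do not appear in it at all.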


### Collecting the evaluation metrics

In [6]:
def get_clf_eval(y_test, pred):
    # Compute the confusion matrix and the four metrics for binary predictions
    confusion = confusion_matrix(y_test, pred)
    accuracy = accuracy_score(y_test, pred)
    precision = precision_score(y_test, pred)
    recall = recall_score(y_test, pred)
    f1 = f1_score(y_test, pred)
    print('Confusion matrix')
    print(confusion)
    print('Accuracy: {0:.4f}, Precision: {1:.4f}, Recall: {2:.4f}, F1: {3:.4f}'
          .format(accuracy, precision, recall, f1))

In [7]:
from sklearn.preprocessing import Binarizer

def get_eval_by_threshold(y_test, pred_proba_c1, thresholds):
    # Re-evaluate the predictions at each decision threshold
    for t in thresholds:
        # Probabilities strictly greater than t become 1, the rest become 0
        binarizer = Binarizer(threshold=t).fit(pred_proba_c1)
        predict = binarizer.transform(pred_proba_c1)
        print("Threshold:", t)
        get_clf_eval(y_test, predict)

In [8]:
thresholds = [0.4, 0.45, 0.50, 0.55, 0.60]
pred_proba = lr_clf.predict_proba(X_test)
# Column 1 holds the predicted probability of the positive class (Survived = 1)
get_eval_by_threshold(y_test, pred_proba[:, 1].reshape(-1, 1), thresholds)

Threshold: 0.4
Confusion matrix
[[86 24]
 [13 56]]
Accuracy: 0.7933, Precision: 0.7000, Recall: 0.8116, F1: 0.7517
Threshold: 0.45
Confusion matrix
[[91 19]
 [14 55]]
Accuracy: 0.8156, Precision: 0.7432, Recall: 0.7971, F1: 0.7692
Threshold: 0.5
Confusion matrix
[[92 18]
 [16 53]]
Accuracy: 0.8101, Precision: 0.7465, Recall: 0.7681, F1: 0.7571
Threshold: 0.55
Confusion matrix
[[98 12]
 [18 51]]
Accuracy: 0.8324, Precision: 0.8095, Recall: 0.7391, F1: 0.7727
Threshold: 0.6
Confusion matrix
[[99 11]
 [25 44]]
Accuracy: 0.7989, Precision: 0.8000, Recall: 0.6377, F1: 0.7097
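
As a sanity check on the output above, the F1 values can be recomputed directly from the printed confusion matrices. This is a sketch only: the dictionary `matrices` and the helper `f1_from_confusion` are hypothetical names, and the numbers are copied by hand from the output rather than produced by the model.

```python
# Confusion matrices copied from the output above, keyed by threshold.
# Layout follows scikit-learn's convention: [[TN, FP], [FN, TP]].
matrices = {
    0.40: [[86, 24], [13, 56]],
    0.45: [[91, 19], [14, 55]],
    0.50: [[92, 18], [16, 53]],
    0.55: [[98, 12], [18, 51]],
    0.60: [[99, 11], [25, 44]],
}

def f1_from_confusion(matrix):
    """F1 of the positive class from a 2x2 confusion matrix (hypothetical helper)."""
    (tn, fp), (fn, tp) = matrix
    return 2 * tp / (2 * tp + fp + fn)

f1_by_threshold = {t: f1_from_confusion(m) for t, m in matrices.items()}
for t, score in f1_by_threshold.items():
    print("threshold {0:.2f}: F1 = {1:.4f}".format(t, score))

# Threshold with the highest F1 among the five that were tried
best = max(f1_by_threshold, key=f1_by_threshold.get)
print("best threshold by F1:", best)
```

Among the five thresholds tried, 0.55 gives the highest F1 (0.7727) on this test split. Raising the threshold to 0.60 barely changes precision (0.8095 to 0.8000) but cuts recall from 0.7391 to 0.6377, and F1 falls to 0.7097. This ranking comes from a single train/test split, so it should be read as an observation about this run rather than a general recommendation.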
