# F1 Score
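## Precision and recall matter to different degrees depending on the application
+ Stock prediction (rise = 1, fall = 0): precision matters more, because a false positive (FP: true label 0, predicted 1) is very costly
+ Cancer prediction (cancer = 1, no cancer = 0): recall matters more, because a false negative (FN: true label 1, predicted 0) is very costly
![Comparing precision and recall, using cancer prediction as an example](../02-精准率和召回率/images/癌症预测为例对比精准率和召回率.png)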
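As a minimal sketch of the two quantities discussed above, the snippet below computes precision and recall from raw counts. The helper names and the counts are hypothetical, chosen only for illustration, and are not taken from any dataset in these notes.

```python
def precision_from_counts(tp, fp):
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall_from_counts(tp, fn):
    """Fraction of true positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 12 false negatives
tp, fp, fn = 8, 2, 12
p = precision_from_counts(tp, fp)  # penalised by FP (the costly error in stock prediction)
r = recall_from_counts(tp, fn)     # penalised by FN (the costly error in cancer prediction)
print(p, r)
```

A model like this one would look acceptable for the stock scenario (few false positives) but poor for the cancer scenario (many missed positives).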

## F1 Score is introduced to balance precision and recall in a single metric
> F1 Score is the harmonic mean of precision and recall

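![Formula for F1 Score](images/F1_Score的计算公式.png)

An equivalent form is

![Formula for F1 Score, second form](images/F1_Score的计算公式2.png)

The conversion between the two forms is

![Formula for F1 Score, conversion between the two forms](images/F1_Score的计算公式3.png)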
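For readers who cannot view the images, here is the standard harmonic-mean definition written out in LaTeX. It assumes the figures show the usual two forms; the order in which the figures present them is not verified here.

```latex
% Harmonic-mean form
\frac{1}{F_1} = \frac{1}{2}\left(\frac{1}{\text{precision}} + \frac{1}{\text{recall}}\right)

% Put the right-hand side over a common denominator
\frac{1}{F_1} = \frac{\text{precision} + \text{recall}}{2 \cdot \text{precision} \cdot \text{recall}}

% Take the reciprocal of both sides
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```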

## Implementing F1 Score

In [1]:
import numpy as np

In [2]:
def f1_score(precision, recall):
    # Return 0.0 when both inputs are zero instead of dividing by zero
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

In [3]:
precision = 0.5
recall = 0.5

In [4]:
f1_score(precision, recall) # when precision and recall are close, F1 is close to their mean

0.5

In [5]:
precision = 0.1 
recall = 0.9

In [6]:
f1_score(precision, recall)

0.18000000000000002

In [7]:
precision = 0.0
recall = 1.0

In [8]:
f1_score(precision, recall) # if either value is 0, the result is 0

0.0
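The three cases above suggest why a harmonic mean is used rather than a plain average. The sketch below compares the two on the same inputs; `arithmetic_mean` and `harmonic_mean` are illustrative helper names, not part of the original notebook.

```python
def arithmetic_mean(p, r):
    """Plain average of precision and recall."""
    return (p + r) / 2


def harmonic_mean(p, r):
    """Harmonic mean of precision and recall (the F1 formula); 0.0 if both are 0."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Same (precision, recall) pairs as in the cells above
for p, r in [(0.5, 0.5), (0.1, 0.9), (0.0, 1.0)]:
    print(p, r, arithmetic_mean(p, r), harmonic_mean(p, r))
```

The arithmetic mean is 0.5 for all three pairs, while the harmonic mean drops sharply as soon as one of the two values is small, which is the behavior wanted from a metric that should reward balance.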

## Evaluating the classifier the traditional way (accuracy)

In [9]:
from sklearn import datasets
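digits = datasets.load_digits() # load the handwritten digits dataset
X = digits.data
y = digits.target.copy()
# Turn the multi-class problem into a binary one: "is 9" vs "is not 9".
# The class ratio is roughly 1:9, so predicting "not 9" for every sample would already give about 90% accuracy.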
y[digits.target==9] = 1
y[digits.target!=9] = 0
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

In [10]:
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
log_reg.score(X_test, y_test)

0.9755555555555555
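To see why a high accuracy says little on skewed data, here is a minimal sketch with synthetic labels in a 1:9 ratio. The labels are made up for illustration and are not the digits data; the "model" simply predicts the negative class every time.

```python
# Synthetic labels: 10 positives and 90 negatives (hypothetical data, 1:9 ratio)
y_true = [1] * 10 + [0] * 90
# A trivial "classifier" that always predicts the negative class
y_pred = [0] * len(y_true)

# Accuracy: share of samples whose prediction matches the label
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of true positives that were actually found
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy, recall)
```

Accuracy comes out at 90% even though not a single positive sample is detected, so accuracy alone cannot separate a useful model from a useless one here.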

## Evaluating the classifier with F1 Score

In [11]:
y_predict = log_reg.predict(X_test)

In [12]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_predict) # confusion matrix

array([[403,   2],
       [  9,  36]], dtype=int64)

In [15]:
from sklearn.metrics import precision_score
pre_score = precision_score(y_test, y_predict) # precision

In [16]:
from sklearn.metrics import recall_score
rec_score = recall_score(y_test, y_predict) # recall

In [17]:
f1_score(pre_score, rec_score) # F1 score from our own function defined earlier

0.8674698795180723
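As a cross-check, the same value can be recomputed by hand from the confusion matrix printed earlier. This sketch assumes sklearn's layout (rows are true labels, columns are predictions, so the matrix reads `[[TN, FP], [FN, TP]]`) and copies the four counts from that output.

```python
# Counts copied from the confusion matrix output: [[403, 2], [9, 36]]
tn, fp = 403, 2
fn, tp = 9, 36

precision = tp / (tp + fp)  # 36 / 38
recall = tp / (tp + fn)     # 36 / 45

# F1 from precision and recall (harmonic mean)
f1_from_rates = 2 * precision * recall / (precision + recall)
# Equivalent form written directly in counts: 2TP / (2TP + FP + FN)
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(precision, recall, f1_from_rates, f1_from_counts)
```

Both expressions reduce to 72/83, which matches the result above up to floating-point rounding.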

## Using the F1 Score built into sklearn

In [18]:
from sklearn.metrics import f1_score
f1_score(y_test, y_predict) # same value as our own computation above; note this import replaces our f1_score

0.8674698795180723