Performance Metrics in Python Data Analysis #80

hsipeng opened this issue Aug 23, 2019 · 0 comments
Performance Metrics

To understand a model's ability to generalize, we need some metric to measure it. That is the purpose of performance metrics.

  • Confusion matrix
    For a binary classification model, crossing every prediction with the actual label yields four possible cases:

TP: predicted 1, prediction correct (actual label is 1)
FP: predicted 1, prediction wrong (actual label is 0)
FN: predicted 0, prediction wrong (actual label is 1)
TN: predicted 0, prediction correct (actual label is 0)
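
As a minimal sketch of reading off the four counts (assuming scikit-learn is available; the labels below are made-up illustration, not data from the post):

# Confusion matrix counts with scikit-learn
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical actual labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # hypothetical predictions

# For 0/1 labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=3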

  • Accuracy
    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    Class imbalance makes accuracy misleading: if 99% of samples are negative, always predicting 0 already scores 99%. (All four metrics in this list are computed in the sketch after the list.)

  • Precision
    Precision = TP / (TP + FP)

  • Recall
    Recall = TP / (TP + FN)

The higher the recall, the larger the share of actually bad users that gets caught: "better to wrongly flag a thousand than to let one slip through."

  • F1 score
    F1 score = 2 * precision * recall / (precision + recall)
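
A minimal sketch of the four metrics using scikit-learn's built-in scorers (same made-up labels as in the confusion-matrix example above):

# Accuracy / precision / recall / F1 with scikit-learn
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical actual labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # hypothetical predictions

print(accuracy_score(y_true, y_pred))   # (TP + TN) / total = 6/8 = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # 2PR / (P + R) = 0.75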

ROC/AUC concepts

  • Sensitivity
    = TP / (TP + FN)
  • Specificity
    = TN / (FP + TN)
  • True positive rate (TPR)
    = sensitivity = TP / (TP + FN)
  • False positive rate (FPR)
    = 1 - specificity = FP / (FP + TN)
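
Since all four ratios come straight from the confusion-matrix counts, a quick sketch (reusing the illustrative labels from earlier, not data from the original post):

# Sensitivity / specificity as TPR / FPR
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # sensitivity = true positive rate = 0.75
fpr = fp / (fp + tn)  # 1 - specificity = false positive rate = 0.25
print(tpr, fpr)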

ROC curve

The ROC curve plots the true positive rate against the false positive rate.

The higher the TPR and the lower the FPR, i.e. the steeper the ROC curve, the better the model performs.

Because both axes are rates within each true class, the ROC curve is insensitive to class imbalance.

AUC (Area Under Curve): the area under the ROC curve.

Rules of thumb for judging AUC:

  • 0.5 - 0.7: poor
  • 0.7 - 0.85: fair
  • 0.85 - 0.95: good
  • 0.95 - 1: excellent

# ROC / AUC implementation in Python (scikit-learn)
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy labels and classifier scores; the positive class is labeled 2
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve returns one (FPR, TPR) pair per score threshold
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)

print(auc(fpr, tpr))  # area under the ROC curve: 0.75
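
Continuing from the snippet above, the fpr/tpr arrays can be drawn directly (a sketch, assuming matplotlib is installed):

# Plotting the ROC curve (continues from the snippet above)
import matplotlib.pyplot as plt

plt.plot(fpr, tpr, marker="o", label=f"AUC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")  # diagonal baseline
plt.xlabel("False positive rate (FPR)")
plt.ylabel("True positive rate (TPR)")
plt.legend()
plt.show()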