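# Logistic Regression

## Experiment
1. Use logistic regression to classify whether a patient has heart disease.
2. Compute accuracy, precision, recall, and F1 under 10-fold cross-validation.

## Evaluation Metrics
1. Accuracy
2. Precision
3. Recall
4. F1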
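
As a hedged illustration of how the four metrics relate to each other, the sketch below derives them from confusion-matrix counts and compares the results with scikit-learn's own functions. The arrays `y_true` and `y_pred` are made-up values for illustration only, not the heart disease data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels and predictions, for illustration only.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 1])

# Confusion-matrix counts, treating label 1 as the positive class.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

# The hand-computed values should agree with scikit-learn.
print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```

Precision and recall pull in different directions, which is why F1, their harmonic mean, is reported alongside them.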

# 1. Load the data

In [1]:
import numpy as np

In [2]:
# Load the CSV as a float array; np.loadtxt expects purely numeric rows (no header line).
diseaseResults = np.loadtxt('data/heart.csv', delimiter=',')

# 2. Import the model

In [3]:
# Only the names used below are imported. The old path
# sklearn.linear_model.logistic was removed from scikit-learn, so it is dropped.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import cross_val_predict



# 3. Extract features and labels

In [4]:
# Every column except the last is a feature; the last column is the label.
diseasex = diseaseResults[:, 0:-1]
diseasey = diseaseResults[:, -1]

# 4. Train and predict

In [5]:
model = LogisticRegression()
# Each sample is predicted by a model trained on the other nine folds.
prediction = cross_val_predict(model, diseasex, diseasey, cv=10)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
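
The warning above names scaling the data as one remedy. A minimal sketch of that option follows, assuming a `StandardScaler` placed in a pipeline so that it is refit on the training part of each fold. The synthetic `X_demo` and `y_demo` are stand-ins (an assumption) for `diseasex` and `diseasey`, and this variant was not run in the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); use diseasex, diseasey in the notebook.
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)
# Stretch the columns so the features live on very different scales.
X_demo = X_demo * np.logspace(0, 3, 13)

# The scaler is part of the estimator, so each fold learns its own mean and
# variance from its training split only, which avoids leaking test-fold statistics.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_prediction = cross_val_predict(scaled_model, X_demo, y_demo, cv=10)

print(scaled_prediction[:10])
```

Standardized features usually let the default solver converge in far fewer iterations, so this is worth trying before raising `max_iter`.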

# 5. Compute the evaluation metrics

In [6]:
acc1 = round(accuracy_score(diseasey, prediction), 2)
precision1 = round(precision_score(diseasey, prediction), 2)
recall1 = round(recall_score(diseasey, prediction), 2)
f1 = round(f1_score(diseasey, prediction), 2)

print("Logistic regression on the heart disease dataset (10-fold cross-validation):")
print("Accuracy:", acc1)
print("Precision:", precision1)
print("Recall:", recall1)
print("F1:", f1)

Logistic regression on the heart disease dataset (10-fold cross-validation):
Accuracy: 0.83
Precision: 0.81
Recall: 0.89
F1: 0.85


###### Logistic regression classification results

Dataset | Accuracy | Precision | Recall | F1
-|-|-|-|-
diseaseResults | 0.83 | 0.81 | 0.89 | 0.85
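
The table above scores the pooled out-of-fold predictions. A possible complement, sketched here, is to score each fold separately with `cross_validate`, which also shows how much the metrics vary from fold to fold. The data is a synthetic stand-in (an assumption), and the mean of per-fold scores can differ slightly from the pooled values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption); use diseasex, diseasey in the notebook.
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)

scoring = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=10, scoring=scoring
)

# cross_validate returns one score per fold under the key "test_<metric>".
for name in scoring:
    fold_scores = scores[f"test_{name}"]
    print(f"{name}: mean={fold_scores.mean():.2f}, std={fold_scores.std():.2f}")
```

The per-fold standard deviation gives a rough sense of whether a 0.01 difference between two runs is meaningful.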

# 6. Tuning hyperparameters

In [7]:
# max_iter=100 is scikit-learn's default, so this run matches the one in section 4.
model = LogisticRegression(max_iter=100)
prediction = cross_val_predict(model, diseasex, diseasey, cv=10)
acc1 = round(accuracy_score(diseasey, prediction), 2)
precision1 = round(precision_score(diseasey, prediction), 2)
recall1 = round(recall_score(diseasey, prediction), 2)
f1 = round(f1_score(diseasey, prediction), 2)

print("Logistic regression on the heart disease dataset (10-fold cross-validation):")
print("Accuracy:", acc1)
print("Precision:", precision1)
print("Recall:", recall1)
print("F1:", f1)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Logistic regression on the heart disease dataset (10-fold cross-validation):
Accuracy: 0.83
Precision: 0.81
Recall: 0.89
F1: 0.85


###### Logistic regression classification results (max_iter=100)

Dataset | Accuracy | Precision | Recall | F1
-|-|-|-|-
diseaseResults | 0.83 | 0.81 | 0.89 | 0.85

In [8]:
# A much larger iteration budget lets the solver run until it converges.
model = LogisticRegression(max_iter=10000)
prediction = cross_val_predict(model, diseasex, diseasey, cv=10)
acc1 = round(accuracy_score(diseasey, prediction), 2)
precision1 = round(precision_score(diseasey, prediction), 2)
recall1 = round(recall_score(diseasey, prediction), 2)
f1 = round(f1_score(diseasey, prediction), 2)

print("Logistic regression on the heart disease dataset (10-fold cross-validation):")
print("Accuracy:", acc1)
print("Precision:", precision1)
print("Recall:", recall1)
print("F1:", f1)

Logistic regression on the heart disease dataset (10-fold cross-validation):
Accuracy: 0.82
Precision: 0.81
Recall: 0.88
F1: 0.84


###### Logistic regression classification results (max_iter=10000)

Dataset | Accuracy | Precision | Recall | F1
-|-|-|-|-
diseaseResults | 0.82 | 0.81 | 0.88 | 0.84

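**Raising `max_iter` from 100 to 10000 made the convergence warning disappear from the output but barely changed the results: accuracy, recall, and F1 each dropped by 0.01, and precision stayed at 0.81.**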
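
One way to check whether a given `max_iter` was actually enough is to look at the fitted model's `n_iter_` attribute and to capture any `ConvergenceWarning`. The sketch below does this on synthetic, deliberately unscaled stand-in data (an assumption); `fit_and_report` is a hypothetical helper introduced only for this illustration.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption), stretched so features have mixed scales.
X_demo, y_demo = make_classification(n_samples=300, n_features=13, random_state=0)
X_demo = X_demo * np.logspace(0, 3, 13)


def fit_and_report(max_iter):
    """Fit once; return the iterations used and whether a ConvergenceWarning was raised."""
    model = LogisticRegression(max_iter=max_iter)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not suppressed
        model.fit(X_demo, y_demo)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return int(model.n_iter_.max()), warned


for max_iter in (100, 10000):
    n_iter, warned = fit_and_report(max_iter)
    print(f"max_iter={max_iter}: iterations used={n_iter}, convergence warning={warned}")
```

If the iteration count sits at the limit and a warning is raised, the reported metrics come from a model that stopped early, which is worth knowing before comparing runs.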