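# Overview
Use the pdpbox package for exploratory data analysis: examine the prior (pre-model) relationship between each feature and heart disease, then examine how pairs of features jointly relate to heart disease.
## Import packages and load the dataset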

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

df = pd.read_csv("./data/process_heart.csv")

## Import the pdpbox package

In [None]:
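# The function-style calls used below (info_plots.target_plot, info_plots.target_plot_interact)
# appear to match the pdpbox 0.2.x API; later releases changed this interface.
from pdpbox import pdp, get_dataset, info_plots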

### Feature: sex
Distribution of this feature, and for each value the split between patients with and without heart disease

In [None]:
fig, axes, summary_df = info_plots.target_plot(df=df, feature="sex_male", feature_name="gender", target=["target"])
_ = axes["bar_ax"].set_xticklabels(["Female", "Male"])

In [None]:
summary_df
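
To make the contents of `summary_df` easier to read, here is a minimal sketch of a comparable per-group summary computed by hand with pandas: the number of rows in each group and the mean of `target` (the share of patients with heart disease). The toy data and the column names `count` and `target_rate` are assumptions for illustration, not pdpbox's own output format.

```python
import pandas as pd

# Hypothetical toy data; the notebook itself reads ./data/process_heart.csv.
toy = pd.DataFrame({
    "sex_male": [0, 0, 0, 1, 1, 1, 1, 1],
    "target":   [1, 1, 0, 1, 0, 0, 0, 1],
})

# For each value of the feature: how many rows, and what fraction has target == 1.
manual_summary = (
    toy.groupby("sex_male")["target"]
       .agg(count="count", target_rate="mean")
       .reset_index()
)
print(manual_summary)
```

Reading the counts next to the rates matters: a high rate in a group with few rows is weaker evidence than the same rate in a large group.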

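### Feature: number of major vessels around the heart
Distribution of this feature, and for each value the split between patients with and without heart disease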

In [None]:
fig, axes, summary_df = info_plots.target_plot(df=df, feature="num_major_vessels", feature_name="num_vessels", target=["target"])

In [None]:
summary_df

### Feature: thalassemia - reversible defect
Distribution of this feature, and for each value the split between patients with and without heart disease

In [None]:
fig, axes, summary_df = info_plots.target_plot(df=df, feature="thalassemia_reversable defect", feature_name="thalassemia_reversable defect", target=["target"])
_ = axes["bar_ax"].set_xticklabels(["Not Reversable Defect", "Reversable Defect"])

In [None]:
summary_df

### Feature: age
Distribution of this feature, and for each age range the split between patients with and without heart disease

In [None]:
fig, axes, summary_df = info_plots.target_plot(df=df, feature="age", feature_name="age", target=["target"])

In [None]:
summary_df
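
For a numeric feature such as age, the plot groups values into bins before counting. The sketch below approximates that idea with quantile bins from `pd.qcut`. It assumes percentile-style binning and uses randomly generated toy data, so the bin edges and rates are illustrative only and will not match the notebook's output.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"age": rng.integers(29, 78, size=200)})
toy["target"] = (rng.random(200) < 0.5).astype(int)

# Split age into (up to) 5 quantile bins; duplicate edges are dropped if they occur.
toy["age_bin"] = pd.qcut(toy["age"], q=5, duplicates="drop")

# Row count and share of positive targets within each age bin.
binned_summary = (
    toy.groupby("age_bin", observed=True)["target"]
       .agg(count="count", target_rate="mean")
       .reset_index()
)
print(binned_summary)
```

Quantile bins keep the row counts roughly balanced, which makes the per-bin rates easier to compare than with equal-width bins.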

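### Feature: maximum heart rate
Distribution of this feature, and for each range the split between patients with and without heart disease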

In [None]:
fig, axes, summary_df = info_plots.target_plot(df=df, feature="max_heart_rate_achieved", feature_name="max_heart_rate_achieved", target=["target"])

In [None]:
summary_df

## Pairwise feature interaction analysis
### Number of major vessels and maximum heart rate

In [None]:
feat_name1 = "num_major_vessels"
nick_name1 = "num_vessels"
feat_name2 = "max_heart_rate_achieved"
nick_name2 = "max_heart_rate"

fig, axes, summary_df = info_plots.target_plot_interact(df=df, features=[feat_name1, feat_name2], feature_names=[nick_name1, nick_name2], target=["target"])
_ = axes["value_ax"].set_xticklabels(["0", "1", "2"])

### Age and maximum heart rate

In [None]:
feat_name1 = "age"
nick_name1 = "age"
feat_name2 = "max_heart_rate_achieved"
nick_name2 = "max_heart_rate"

fig, axes, summary_df = info_plots.target_plot_interact(df=df, features=[feat_name1, feat_name2], feature_names=[nick_name1, nick_name2], target=["target"])
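
The interaction plots summarize the target over combinations of two features. As a rough hand-rolled equivalent, the sketch below groups by both features at once and pivots the mean target into a grid. The toy data is random and the names `hr_bin`, `cell_stats`, and `rate_grid` are hypothetical; this is an approximation of the idea, not pdpbox's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as the notebook.
rng = np.random.default_rng(42)
n = 300
toy = pd.DataFrame({
    "num_major_vessels": rng.integers(0, 3, size=n),
    "max_heart_rate_achieved": rng.integers(90, 200, size=n),
    "target": rng.integers(0, 2, size=n),
})

# Bin the numeric feature; the discrete feature is used as-is.
toy["hr_bin"] = pd.qcut(toy["max_heart_rate_achieved"], q=3, duplicates="drop")

# Row count and share of positive targets for every (vessels, heart-rate bin) cell.
cell_stats = (
    toy.groupby(["num_major_vessels", "hr_bin"], observed=True)["target"]
       .agg(count="count", target_rate="mean")
       .reset_index()
)

# Arrange the rates as a grid: rows are vessel counts, columns are heart-rate bins.
rate_grid = cell_stats.pivot(index="num_major_vessels", columns="hr_bin", values="target_rate")
print(rate_grid.round(2))
```

Cells with very few rows give unstable rates, so it is worth checking `cell_stats["count"]` before reading much into any single cell.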