## Plotting a Scatter Plot in Python to Examine the Relationship Between BMI and Insurance Charges

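#### Scatter plots
* A scatter plot pairs values from two data series into (x, y) points and shows how those points are distributed. This helps you judge whether the two variables are related and what pattern, if any, the points follow.
* Its main value lies in revealing relationships between variables, which can then inform predictive analysis and well-founded decisions.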
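To put a number on the "relationship" a scatter plot reveals, one common choice is the Pearson correlation coefficient r, which measures the strength of a *linear* association on a scale from -1 to 1. The following is a minimal, dependency-free sketch. The function name `pearson_r` is hypothetical, and the toy inputs are illustrative only, not taken from the insurance data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing: -1.0
```

An r close to 0 does not rule out a non-linear pattern, which is one reason to look at the plotted points rather than rely on a single number.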

Example: using a personal medical cost dataset, examine the relationship between body mass index (BMI) and individual medical charges.

Original dataset: https://www.kaggle.com/mirichoi0218/insurance/home

### 1. Load the insurance dataset

```python
import pandas as pd

# Load a local copy of the Kaggle insurance.csv file (relative path)
df = pd.read_csv("./datas/insurance/insurance.csv")
df.head(10)
```

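|   | age | sex | bmi | children | smoker | region | charges |
|---|-----|-----|-----|----------|--------|--------|---------|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |
| 5 | 31 | female | 25.74 | 0 | no | southeast | 3756.6216 |
| 6 | 46 | female | 33.44 | 1 | no | southeast | 8240.5896 |
| 7 | 37 | female | 27.74 | 3 | no | northwest | 7281.5056 |
| 8 | 37 | male | 29.83 | 2 | no | northeast | 6406.4107 |
| 9 | 60 | female | 25.84 | 0 | no | northwest | 28923.13692 |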
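Only the `bmi` and `charges` columns are plotted later, so one option is to load just those two with the `usecols` parameter of `pd.read_csv`. The sketch below reads three of the rows shown above from an in-memory string (via `io.StringIO`) so that it runs without the Kaggle file. It assumes the CSV has a header row and no index column, which is consistent with the seven columns reported by `df.info()`; with the real dataset you would pass the file path instead.

```python
import io

import pandas as pd

# Three rows copied from the df.head(10) output above, in the assumed
# layout of insurance.csv (header row, no index column)
csv_text = """age,sex,bmi,children,smoker,region,charges
19,female,27.9,0,yes,southwest,16884.924
18,male,33.77,1,no,southeast,1725.5523
28,male,33.0,3,no,southeast,4449.462
"""

# usecols keeps only the two columns needed for the scatter plot
small = pd.read_csv(io.StringIO(csv_text), usecols=["bmi", "charges"])
print(small)
print(small.dtypes)
```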


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB
```


### 2. Draw the scatter plot with pyecharts

```python
# Sort the rows by BMI in ascending order
df.sort_values(by="bmi", inplace=True)
df.head()
```

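|      | age | sex | bmi | children | smoker | region | charges |
|------|-----|-----|-----|----------|--------|--------|---------|
| 172  | 18 | male | 15.96 | 0 | no | northeast | 1694.7964 |
| 428  | 21 | female | 16.815 | 1 | no | northeast | 3167.45585 |
| 1226 | 38 | male | 16.815 | 2 | no | northeast | 6640.54485 |
| 412  | 26 | female | 17.195 | 2 | yes | northeast | 14455.64405 |
| 1286 | 28 | female | 17.29 | 0 | no | northeast | 3732.6251 |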


```python
# Convert the two columns to plain Python lists for pyecharts
bmi = df["bmi"].to_list()
charges = df["charges"].to_list()
```
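A plausible reason for the `to_list()` conversion (an assumption; the original does not say) is that pyecharts ultimately serializes the chart data into a JSON option for ECharts, and `to_list()` returns built-in Python scalars rather than NumPy objects. The sketch below uses a few BMI values from the rows above and the standard `json` module to contrast the two.

```python
import json

import pandas as pd

s = pd.Series([15.96, 16.815, 17.195], name="bmi")

as_list = s.to_list()       # list of built-in Python floats
print(type(as_list[0]))     # <class 'float'>
print(json.dumps(as_list))  # [15.96, 16.815, 17.195]

try:
    json.dumps(s.to_numpy())  # a NumPy array is not JSON-serializable
except TypeError as exc:
    print("TypeError:", exc)
```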

```python
import pyecharts.options as opts
from pyecharts.charts import Scatter
```

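```python
scatter = (
    Scatter()
    # BMI values on the x axis
    .add_xaxis(xaxis_data=bmi)
    # Charges on the y axis: small markers, no per-point labels
    .add_yaxis(
        series_name="",
        y_axis=charges,
        symbol_size=4,
        label_opts=opts.LabelOpts(is_show=False),
    )
    .set_global_opts(
        # Numeric (value) axes, so points are placed by their actual values
        xaxis_opts=opts.AxisOpts(type_="value"),
        yaxis_opts=opts.AxisOpts(type_="value"),
        title_opts=opts.TitleOpts(title="BMI vs. Insurance Charges", pos_left="center"),
    )
)
```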
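pyecharts turns the chart above into an ECharts `option` object. With both axes of type `"value"`, each scatter point is an `[x, y]` pair, so a point's position comes from its numbers rather than from its position in the list. The dictionary below is a hand-written approximation of the relevant part of such an option, built from the first three BMI-sorted rows shown earlier. The helper `scatter_option` is hypothetical, and the option pyecharts actually generates contains many more keys.

```python
import json


def scatter_option(xs, ys, title):
    """Build a minimal ECharts scatter option (hypothetical helper)."""
    return {
        "title": {"text": title, "left": "center"},
        "xAxis": {"type": "value"},
        "yAxis": {"type": "value"},
        "series": [
            {
                "type": "scatter",
                "symbolSize": 4,
                # One [x, y] pair per point: BMI on x, charges on y
                "data": [[x, y] for x, y in zip(xs, ys)],
                "label": {"show": False},
            }
        ],
    }


# First three rows of the BMI-sorted data shown earlier
option = scatter_option(
    [15.96, 16.815, 16.815],
    [1694.7964, 3167.45585, 6640.54485],
    "BMI vs. Insurance Charges",
)
print(json.dumps(option["series"][0]["data"]))
```

Because the x axis is numeric, the earlier `sort_values` step should not change where the points land; it mainly makes the data easier to inspect.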

```python
# Display the chart inline in a Jupyter Notebook
scatter.render_notebook()
```