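### Coupon Usage Prediction

- Data description

| Column                     | Definition                                              |
| -------------------------- | ------------------------------------------------------- |
| ID                         | Record ID                                               |
| age                        | Age                                                     |
| job                        | Occupation                                              |
| marital                    | Marital status                                          |
| default                    | Whether the customer has defaulted on Huabei            |
| returned                   | Whether the customer has ever returned goods            |
| loan                       | Whether the customer pays with Huabei at checkout       |
| coupon_used_in_last6_month | Number of coupons used in the past six months           |
| coupon_used_in_last_month  | Number of coupons used in the past month                |
| coupon_ind                 | Whether a coupon was used in this campaign              |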

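### Hints

- Data preprocessing

  - Convert categorical variables to numeric ones; columns may be renamed
  - `coupon = pd.get_dummies(coupon)  # dummy-variable matrix`
  - `coupon.drop([], axis=1, inplace=True)  # drop unused columns`
  - `coupon = coupon.rename(columns={'column_name': 'new_name'})  # rename columns`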

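- Feature selection

  - Check class balance (here `flag` is presumably the target column `coupon_ind` after renaming)
    - `coupon.flag.value_counts()`
  - Observe how the independent variables differ when the target is 0 versus 1
  - Use correlation coefficients
    - `coupon.corr()[['flag']].sort_values('flag', ascending=False)`
  - Visualization
    - `sns.countplot(y='', hue='', data=coupon)` draws a count plot; change `y` to see how each feature is distributed across the classes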

- Model building

  - Split the data into training and test sets

    ```python
    from sklearn.model_selection import train_test_split
    x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=100)
    ```

  - Create a logistic regression model with sklearn

    ```python
    from sklearn import linear_model
    lr=linear_model.LogisticRegression()
    lr.fit(x_train,y_train)
    ```

  - Model evaluation

    - Accuracy

    ```python
    y_pred_test=lr.predict(x_test)
    import sklearn.metrics as metrics
    metrics.accuracy_score(y_test,y_pred_test)
    ```

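    - AUC

    ```python
    from sklearn.metrics import roc_curve, auc
    # roc_curve needs scores, so use the predicted probability of class 1
    y_pred_train = lr.predict_proba(x_train)[:, 1]
    fpr, tpr, threshold = roc_curve(y_train, y_pred_train)
    roc_auc = auc(fpr, tpr)
    ```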

    

- Business interpretation

  - Use the model coefficients to find the most important factors

    ```python
    lr.intercept_
    lr.coef_
    ```
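
The modelling, evaluation, and interpretation hints above can be tied together in one short sketch. It runs on synthetic data, not on `coupon.csv`: the feature values, the simulated target, and the names `coef_table` and `test_auc` are assumptions made for illustration. It labels each coefficient with its feature name and scores AUC on the held-out test set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded coupon table (hypothetical values).
rng = np.random.default_rng(0)
n = 2000
x = pd.DataFrame({
    "age": rng.integers(18, 95, size=n),
    "coupon_used_in_last_month": rng.integers(0, 5, size=n),
    "returned_yes": rng.integers(0, 2, size=n),
})
# Simulate a target that depends mostly on recent coupon use.
logit = -2.5 + 0.9 * x["coupon_used_in_last_month"] - 0.5 * x["returned_yes"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# stratify=y keeps the share of positives equal in both splits.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=100, stratify=y
)

lr = LogisticRegression(max_iter=1000)
lr.fit(x_train, y_train)

# AUC is computed from scores, so use the predicted probability of class 1.
test_auc = roc_auc_score(y_test, lr.predict_proba(x_test)[:, 1])

# lr.coef_ has shape (1, n_features) for a binary target; label each entry.
coef_table = pd.DataFrame({"feature": x.columns, "coef": lr.coef_[0]})
coef_table["odds_ratio"] = np.exp(coef_table["coef"])
order = coef_table["coef"].abs().sort_values(ascending=False).index
coef_table = coef_table.reindex(order).reset_index(drop=True)

print(coef_table)
print("test AUC:", round(test_auc, 3))
```

Coefficient sizes are only comparable across features measured on similar scales. A per-year coefficient for `age` and a coefficient for a 0/1 dummy are not directly comparable unless the inputs are standardized first.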

    

In [1]:
import pandas as pd
coupon = pd.read_csv("./coupon.csv")

In [2]:
coupon.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25317 entries, 0 to 25316
Data columns (total 10 columns):
ID                            25317 non-null int64
age                           25317 non-null int64
job                           25317 non-null object
marital                       25317 non-null object
default                       25317 non-null object
returned                      25317 non-null object
loan                          25317 non-null object
coupon_used_in_last6_month    25317 non-null int64
coupon_used_in_last_month     25317 non-null int64
coupon_ind                    25317 non-null int64
dtypes: int64(5), object(5)
memory usage: 1.9+ MB


In [3]:
coupon.describe()

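|       | ID          | age       | coupon_used_in_last6_month | coupon_used_in_last_month | coupon_ind |
| ----- | ----------- | --------- | -------------------------- | ------------------------- | ---------- |
| count | 25317.0     | 25317.0   | 25317.0                    | 25317.0                   | 25317.0    |
| mean  | 12659.0     | 40.935379 | 2.77205                    | 0.292847                  | 0.116957   |
| std   | 7308.532719 | 10.634289 | 3.136097                   | 0.765498                  | 0.321375   |
| min   | 1.0         | 18.0      | 1.0                        | 0.0                       | 0.0        |
| 25%   | 6330.0      | 33.0      | 1.0                        | 0.0                       | 0.0        |
| 50%   | 12659.0     | 39.0      | 2.0                        | 0.0                       | 0.0        |
| 75%   | 18988.0     | 48.0      | 3.0                        | 0.0                       | 0.0        |
| max   | 25317.0     | 95.0      | 55.0                       | 15.0                      | 1.0        |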
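
The mean of `coupon_ind` in this summary is about 0.117, so only about 11.7% of records are positive. A model that always predicts 0 would therefore score roughly 88% accuracy while finding no coupon users at all. The sketch below illustrates this with synthetic labels drawn at that rate; the data and the names `baseline_accuracy` and `baseline_recall` are assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels with roughly the positive rate seen in describe() (~11.7%).
rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.117).astype(int)
x = np.zeros((len(y), 1))  # features are irrelevant to this baseline

# Always predict the majority class (0 = no coupon used).
baseline = DummyClassifier(strategy="most_frequent").fit(x, y)
y_pred = baseline.predict(x)

baseline_accuracy = accuracy_score(y, y_pred)
baseline_recall = recall_score(y, y_pred)  # share of actual coupon users found

print("accuracy:", baseline_accuracy)
print("recall:", baseline_recall)
```

Any accuracy reported for the logistic regression is best read against this baseline, alongside a metric such as recall or AUC.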


In [5]:
coupon.shape

(25317, 10)

In [6]:
coupon.head()

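|   | ID | age | job        | marital  | default | returned | loan | coupon_used_in_last6_month | coupon_used_in_last_month | coupon_ind |
| - | -- | --- | ---------- | -------- | ------- | -------- | ---- | -------------------------- | ------------------------- | ---------- |
| 0 | 1  | 43  | management | married  | no      | yes      | no   | 2                          | 0                         | 0          |
| 1 | 2  | 42  | technician | divorced | no      | yes      | no   | 1                          | 1                         | 0          |
| 2 | 3  | 47  | admin.     | married  | no      | yes      | yes  | 2                          | 0                         | 0          |
| 3 | 4  | 28  | management | single   | no      | yes      | yes  | 2                          | 0                         | 0          |
| 4 | 5  | 42  | technician | divorced | no      | yes      | no   | 5                          | 0                         | 0          |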
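
While the categorical columns are still in their raw form, a row-normalized crosstab is one way to see how the target differs across categories. It is a numeric counterpart to the count plot suggested in the hints. The rows below are made up; only the column names come from the dataset.

```python
import pandas as pd

# Tiny hand-made sample using column names from coupon.csv (values are made up).
sample = pd.DataFrame({
    "returned": ["yes", "yes", "no", "no", "no", "yes", "no", "yes"],
    "coupon_ind": [0, 0, 1, 0, 1, 0, 1, 1],
})

# normalize="index" makes each row sum to 1, so column 1 holds the share of
# coupon users within each category of `returned`.
rates = pd.crosstab(sample["returned"], sample["coupon_ind"], normalize="index")
print(rates)
```

Swapping `returned` for `job`, `marital`, `default`, or `loan` gives the same view for the other categorical columns.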


In [7]:
coupon = pd.get_dummies(coupon)  # dummy-variable matrix

In [8]:
coupon.drop([], axis=1, inplace=True)  # drop unused columns (the list is empty here, so nothing is removed)

In [9]:
coupon.head()

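|   | ID | age | coupon_used_in_last6_month | coupon_used_in_last_month | coupon_ind | job_admin. | job_blue-collar | job_entrepreneur | job_housemaid | job_management | ... | job_unknown | marital_divorced | marital_married | marital_single | default_no | default_yes | returned_no | returned_yes | loan_no | loan_yes |
| - | -- | --- | -------------------------- | ------------------------- | ---------- | ---------- | --------------- | ---------------- | ------------- | -------------- | --- | ----------- | ---------------- | --------------- | -------------- | ---------- | ----------- | ----------- | ------------ | ------- | -------- |
| 0 | 1  | 43  | 2                          | 0                         | 0          | 0          | 0               | 0                | 0             | 1              | ... | 0           | 0                | 1               | 0              | 1          | 0           | 0           | 1            | 1       | 0        |
| 1 | 2  | 42  | 1                          | 1                         | 0          | 0          | 0               | 0                | 0             | 0              | ... | 0           | 1                | 0               | 0              | 1          | 0           | 0           | 1            | 1       | 0        |
| 2 | 3  | 47  | 2                          | 0                         | 0          | 1          | 0               | 0                | 0             | 0              | ... | 0           | 0                | 1               | 0              | 1          | 0           | 0           | 1            | 0       | 1        |
| 3 | 4  | 28  | 2                          | 0                         | 0          | 0          | 0               | 0                | 0             | 1              | ... | 0           | 0                | 0               | 1              | 1          | 0           | 0           | 1            | 0       | 1        |
| 4 | 5  | 42  | 5                          | 0                         | 0          | 0          | 0               | 0                | 0             | 0              | ... | 0           | 1                | 0               | 0              | 1          | 0           | 0           | 1            | 1       | 0        |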
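
The encoded table keeps both halves of every yes/no pair (`default_no` and `default_yes`, for example), and each pair carries the same information twice. One possible way to avoid this, shown on made-up rows, is to pass `drop_first=True` to `pd.get_dummies`. This is a sketch of an alternative, not what the notebook itself ran.

```python
import pandas as pd

# Made-up rows using column names from the table above.
raw = pd.DataFrame({
    "age": [43, 42, 47],
    "marital": ["married", "divorced", "single"],
    "default": ["no", "no", "yes"],
})

# drop_first=True drops one level per categorical column, which removes the
# redundant half of each yes/no pair; dtype=int keeps 0/1 instead of booleans.
encoded = pd.get_dummies(raw, drop_first=True, dtype=int)
print(encoded.columns.tolist())
```

The dropped level (here `marital_divorced` and `default_no`) becomes the reference category against which the remaining coefficients are read.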


In [None]:
coupon = coupon.rename(columns={})  # rename columns; fill in {'old_name': 'new_name'} pairs

In [14]:
coupon.corr()[['age']].sort_values('coupon_used_in_last_month',ascending =False)

KeyError: 'coupon_used_in_last_month'
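
The `KeyError` arises because `coupon.corr()[['age']]` keeps only the `age` column of the correlation matrix. In that one-column frame `coupon_used_in_last_month` is a row label, not a column, so `sort_values` cannot sort by it. A minimal sketch of the likely intended call follows. It assumes the target is still named `coupon_ind` and uses made-up numbers.

```python
import pandas as pd

# Made-up numeric frame with column names from the dataset.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "coupon_used_in_last_month": [0, 1, 0, 3, 2, 0],
    "coupon_ind": [0, 0, 0, 1, 1, 0],
})

# Select one column of the correlation matrix, then sort by that same column.
ranking = df.corr()[["coupon_ind"]].sort_values("coupon_ind", ascending=False)
print(ranking)
```

The first row is always the target itself, with correlation 1; the rows below it rank the features by their linear association with the target.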