#### The growth of data on customer behavior and demographics has opened up many opportunities for digital marketing strategies that use predictive analytics
#### The file Marketing-Customer-Value-Analysis.csv contains customer information related to auto insurance sales. The task is to predict whether a customer will respond to a sales call, based on their demographics and past behavior.
### Requirements: load the data, standardize it (if needed), and apply the SVM algorithm to predict the customer's response (1 or 0) from the information provided
1. Load the data. Preprocess it if needed. Visualize the data.
2. Create X_train, X_test, y_train, y_test from the loaded data, with a test proportion of 0.2
3. Apply the SVM algorithm
4. Obtain the results. Check the accuracy. Evaluate the model.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [2]:
data = pd.read_csv("../../Data/Marketing-Customer-Value-Analysis.csv")

In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9134 entries, 0 to 9133
Data columns (total 24 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Customer                       9134 non-null   object 
 1   State                          9134 non-null   object 
 2   Customer Lifetime Value        9134 non-null   float64
 3   Response                       9134 non-null   object 
 4   Coverage                       9134 non-null   object 
 5   Education                      9134 non-null   object 
 6   Effective To Date              9134 non-null   object 
 7   EmploymentStatus               9134 non-null   object 
 8   Gender                         9134 non-null   object 
 9   Income                         9134 non-null   int64  
 10  Location Code                  9134 non-null   object 
 11  Marital Status                 9134 non-null   object 
 12  Monthly Premium Auto           9134 non-null   i

In [4]:
data.head()

| | Customer | State | Customer Lifetime Value | Response | Coverage | Education | Effective To Date | EmploymentStatus | Gender | Income | ... | Months Since Policy Inception | Number of Open Complaints | Number of Policies | Policy Type | Policy | Renew Offer Type | Sales Channel | Total Claim Amount | Vehicle Class | Vehicle Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BU79786 | Washington | 2763.519279 | No | Basic | Bachelor | 2/24/11 | Employed | F | 56274 | ... | 5 | 0 | 1 | Corporate Auto | Corporate L3 | Offer1 | Agent | 384.811147 | Two-Door Car | Medsize |
| 1 | QZ44356 | Arizona | 6979.535903 | No | Extended | Bachelor | 1/31/11 | Unemployed | F | 0 | ... | 42 | 0 | 8 | Personal Auto | Personal L3 | Offer3 | Agent | 1131.464935 | Four-Door Car | Medsize |
| 2 | AI49188 | Nevada | 12887.43165 | No | Premium | Bachelor | 2/19/11 | Employed | F | 48767 | ... | 38 | 0 | 2 | Personal Auto | Personal L3 | Offer1 | Agent | 566.472247 | Two-Door Car | Medsize |
| 3 | WW63253 | California | 7645.861827 | No | Basic | Bachelor | 1/20/11 | Unemployed | M | 0 | ... | 65 | 0 | 7 | Corporate Auto | Corporate L2 | Offer1 | Call Center | 529.881344 | SUV | Medsize |
| 4 | HB64268 | Washington | 2813.692575 | No | Basic | Bachelor | 2/3/11 | Employed | M | 43836 | ... | 44 | 0 | 1 | Personal Auto | Personal L1 | Offer1 | Agent | 138.130879 | Four-Door Car | Medsize |


In [5]:
data["Response"].value_counts()

No     7826
Yes    1308
Name: Response, dtype: int64
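The target is imbalanced: only about 14% of customers answered "Yes". A useful reference point is therefore the accuracy of a model that always predicts the majority class. The sketch below computes it with scikit-learn's `DummyClassifier`, using a stand-in label vector rebuilt from the counts printed above (the real features are not needed, since the `most_frequent` strategy ignores them).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Stand-in labels rebuilt from data["Response"].value_counts(): 0 = No, 1 = Yes
n_no, n_yes = 7826, 1308
y_counts = np.array([0] * n_no + [1] * n_yes)
X_placeholder = np.zeros((len(y_counts), 1))  # ignored by the "most_frequent" strategy

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_placeholder, y_counts)
baseline_acc = accuracy_score(y_counts, baseline.predict(X_placeholder))

print(f"Share of 'Yes': {n_yes / (n_no + n_yes):.3f}")
print(f"Majority-class accuracy: {baseline_acc:.3f}")
```

Any classifier trained on this data should be judged against this figure (roughly 85.7%), not against 0%.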

In [6]:
data["Response"] = data["Response"].apply(lambda x : 0 if x == 'No' else 1)
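The lambda above maps every value other than `'No'` to 1, so a typo or an unexpected label would silently become a positive. Here it is safe because `value_counts()` shows only "No" and "Yes", but a slightly stricter alternative uses `Series.map` with an explicit dictionary: unknown values become NaN and can be caught before training. Shown on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical toy frame standing in for the real data ("yes" is a deliberate typo)
toy = pd.DataFrame({"Response": ["No", "Yes", "No", "yes"]})

mapped = toy["Response"].map({"No": 0, "Yes": 1})
unknown = toy.loc[mapped.isna(), "Response"].unique()
print("Unmapped labels:", list(unknown))

# Only overwrite the column once every label has been recognised
if len(unknown) == 0:
    toy["Response"] = mapped.astype(int)
```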

In [7]:
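# NOTE: 'Response' is the target and should also be dropped here. It is kept in X
# in this run (see X.head() below), which leaks the label into the features.
X = data.drop(['Customer','Effective To Date'], axis = 1)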
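The cell above drops only `Customer` and `Effective To Date`, so `Response`, the label, remains in `X`, as the next `X.head()` output shows. The label has to be removed from the feature matrix; otherwise the model can read the answer directly from its inputs. A minimal sketch on a hypothetical toy frame that reuses a few of the dataset's column names:

```python
import pandas as pd

# Hypothetical rows using a subset of the real column names
toy = pd.DataFrame({
    "Customer": ["A1", "B2"],
    "Effective To Date": ["2/24/11", "1/31/11"],
    "Income": [56274, 0],
    "Response": [0, 1],
})

# Drop the identifier, the date column, and the target itself
X_toy = toy.drop(columns=["Customer", "Effective To Date", "Response"])
y_toy = toy["Response"]
print(list(X_toy.columns))
```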

In [8]:
y = data["Response"]

In [9]:
X.head()

| | State | Customer Lifetime Value | Response | Coverage | Education | EmploymentStatus | Gender | Income | Location Code | Marital Status | ... | Months Since Policy Inception | Number of Open Complaints | Number of Policies | Policy Type | Policy | Renew Offer Type | Sales Channel | Total Claim Amount | Vehicle Class | Vehicle Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Washington | 2763.519279 | 0 | Basic | Bachelor | Employed | F | 56274 | Suburban | Married | ... | 5 | 0 | 1 | Corporate Auto | Corporate L3 | Offer1 | Agent | 384.811147 | Two-Door Car | Medsize |
| 1 | Arizona | 6979.535903 | 0 | Extended | Bachelor | Unemployed | F | 0 | Suburban | Single | ... | 42 | 0 | 8 | Personal Auto | Personal L3 | Offer3 | Agent | 1131.464935 | Four-Door Car | Medsize |
| 2 | Nevada | 12887.43165 | 0 | Premium | Bachelor | Employed | F | 48767 | Suburban | Married | ... | 38 | 0 | 2 | Personal Auto | Personal L3 | Offer1 | Agent | 566.472247 | Two-Door Car | Medsize |
| 3 | California | 7645.861827 | 0 | Basic | Bachelor | Unemployed | M | 0 | Suburban | Married | ... | 65 | 0 | 7 | Corporate Auto | Corporate L2 | Offer1 | Call Center | 529.881344 | SUV | Medsize |
| 4 | Washington | 2813.692575 | 0 | Basic | Bachelor | Employed | M | 43836 | Rural | Single | ... | 44 | 0 | 1 | Personal Auto | Personal L1 | Offer1 | Agent | 138.130879 | Four-Door Car | Medsize |


In [10]:
X = pd.get_dummies(X, drop_first=True)
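`pd.get_dummies(..., drop_first=True)` turns each categorical column with k levels into k-1 indicator columns; the first level in sorted order becomes the implicit reference category. For example, with the three `Coverage` levels (the values below are taken from rows 0, 1, 2 and 4 of `data.head()`):

```python
import pandas as pd

toy = pd.DataFrame({
    "Coverage": ["Basic", "Extended", "Premium", "Basic"],
    "Income": [56274, 0, 48767, 43836],
})

encoded = pd.get_dummies(toy, drop_first=True)
# "Basic" gets no column of its own: it is the case where both indicators are 0
print(list(encoded.columns))
```

Dropping one level per category avoids perfectly collinear indicator columns; non-categorical columns such as `Income` pass through unchanged.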

In [11]:
X.head()

| | Customer Lifetime Value | Response | Income | Monthly Premium Auto | Months Since Last Claim | Months Since Policy Inception | Number of Open Complaints | Number of Policies | Total Claim Amount | State_California | ... | Sales Channel_Branch | Sales Channel_Call Center | Sales Channel_Web | Vehicle Class_Luxury Car | Vehicle Class_Luxury SUV | Vehicle Class_SUV | Vehicle Class_Sports Car | Vehicle Class_Two-Door Car | Vehicle Size_Medsize | Vehicle Size_Small |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2763.519279 | 0 | 56274 | 69 | 32 | 5 | 0 | 1 | 384.811147 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 1 | 6979.535903 | 0 | 0 | 94 | 13 | 42 | 0 | 8 | 1131.464935 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 12887.43165 | 0 | 48767 | 108 | 18 | 38 | 0 | 2 | 566.472247 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 3 | 7645.861827 | 0 | 0 | 106 | 18 | 65 | 0 | 7 | 529.881344 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 4 | 2813.692575 | 0 | 43836 | 73 | 12 | 44 | 0 | 1 | 138.130879 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |


In [12]:
y.head()

0    0
1    0
2    0
3    0
4    0
Name: Response, dtype: int64

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 42)
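With an imbalanced target, passing `stratify=y` to `train_test_split` keeps the Yes/No ratio the same in the training and test splits. This is an optional variant of the call above, shown on stand-in labels rebuilt from the value counts (the feature column is a placeholder):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in labels with the same class counts as data["Response"]
y_all = np.array([0] * 7826 + [1] * 1308)
X_all = np.arange(len(y_all)).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.20, random_state=42, stratify=y_all
)
print(f"Yes share: all={y_all.mean():.3f}, train={y_tr.mean():.3f}, test={y_te.mean():.3f}")
```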

In [14]:
clf = svm.SVC()
clf.fit(X_train, y_train)

SVC()
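The default `SVC()` uses an RBF kernel, which compares samples by Euclidean distance, so unscaled columns such as `Income` (tens of thousands) can swamp the 0/1 dummy columns. A common remedy is to standardize inside a `Pipeline` and, given the class imbalance, to set `class_weight="balanced"`. The sketch below runs on synthetic data from `make_classification` (an assumption standing in for the real features, with one column deliberately inflated to mimic a large-scale feature); it is not the notebook's original model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in data (about 86% class 0)
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=10, weights=[0.86], random_state=0
)
X_syn[:, 0] *= 10_000  # one feature on a much larger scale, like Income

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.20, random_state=42, stratify=y_syn
)

# The scaler is fitted on the training split only, as part of the pipeline
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
model.fit(X_tr, y_tr)
y_hat = model.predict(X_te)
print("Predicted positives:", int((y_hat == 1).sum()), "of", len(y_hat))
```

On the real data, the same pipeline would be fitted on `X_train, y_train` after dropping `Response` from `X`; how well it scores there is not known from this notebook.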

In [15]:
y_pred = clf.predict(X_test)

In [16]:
y_pred

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
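The printed array is truncated with `...`, which hides how often each class is predicted. Counting the labels makes it explicit. Here `y_pred_example` is a hypothetical stand-in that matches the column totals of the confusion matrix reported below (1827 predictions, all 0); in the notebook you would pass `y_pred` directly.

```python
import numpy as np

# Hypothetical stand-in: 1827 test predictions, all 0
y_pred_example = np.zeros(1827, dtype=int)

labels, counts = np.unique(y_pred_example, return_counts=True)
pred_counts = dict(zip(labels.tolist(), counts.tolist()))
print(pred_counts)
```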

In [17]:
print("Accuracy is ", accuracy_score(y_test,y_pred)*100,"%")

Accuracy is  85.44061302681992 %


In [18]:
# Check accuracy on the training and test sets
print("The Train Score is: ", clf.score(X_train, y_train) * 100, "%")
print("The Test Score is: ", clf.score(X_test, y_test) * 100, "%")

The Train Score is:  85.73970165594635 %
The Test Score is:  85.44061302681992 %
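For scikit-learn classifiers, `score(X, y)` returns the mean accuracy, which is why the test score here is identical to the `accuracy_score` printed in the previous cell; it is a classification accuracy, not R^2. A quick check on synthetic data (an assumption, not the marketing dataset):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

X_syn, y_syn = make_classification(n_samples=300, random_state=0)
clf_syn = SVC().fit(X_syn, y_syn)

score_value = clf_syn.score(X_syn, y_syn)
acc_value = accuracy_score(y_syn, clf_syn.predict(X_syn))
print(score_value, acc_value)
```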


In [19]:
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[1561    0]
 [ 266    0]]
              precision    recall  f1-score   support

           0       0.85      1.00      0.92      1561
           1       0.00      0.00      0.00       266

    accuracy                           0.85      1827
   macro avg       0.43      0.50      0.46      1827
weighted avg       0.73      0.85      0.79      1827
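Accuracy alone hides the problem visible in the confusion matrix. The sketch below rebuilds label vectors from the four reported cells and computes recall and precision for class 1, plus balanced accuracy (the mean of per-class recall); `zero_division=0` sets precision to 0 instead of emitting a warning when no positives are predicted.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, precision_score, recall_score

# Rebuilt from the confusion matrix [[1561, 0], [266, 0]] (rows = true, cols = predicted)
tn, fp, fn, tp = 1561, 0, 266, 0
y_true_cm = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred_cm = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

recall_yes = recall_score(y_true_cm, y_pred_cm, zero_division=0)
precision_yes = precision_score(y_true_cm, y_pred_cm, zero_division=0)
bal_acc = balanced_accuracy_score(y_true_cm, y_pred_cm)
print(f"recall(Yes)={recall_yes:.2f}, precision(Yes)={precision_yes:.2f}, "
      f"balanced accuracy={bal_acc:.2f}")
```

A balanced accuracy of 0.50 is exactly what a constant predictor achieves on a two-class problem.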





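##### Results:
* Train accuracy (85.74%) and test accuracy (85.44%) are high and nearly identical, so there is no sign of overfitting. These are classification accuracies, not R^2.
* However, the test accuracy equals the share of class 0 in the test set (1561/1827 ≈ 85.44%): the confusion matrix shows that the model predicts 0 for every customer.
* Precision and recall for class 1 (customers who respond) are both 0, so the model never identifies a responder.
* => The model is not suitable as it stands: it does no better than always predicting "No". Likely causes are that the features were not standardized before fitting the RBF-kernel SVC and that the classes are imbalanced (about 14% "Yes"). In addition, `Response` was never dropped from `X`, so the label leaks into the features and must be removed before any evaluation can be trusted.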
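As a possible next step (not part of the original notebook), the scaled, class-weighted SVC can be tuned with cross-validation on a metric that rewards finding responders, such as the F1 score of class 1. A small sketch with `GridSearchCV`, again on synthetic imbalanced data; the grid values are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded features (about 86% class 0)
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=10, weights=[0.86], random_state=1
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC(kernel="rbf", class_weight="balanced")),
])
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="f1",  # F1 of the positive class (label 1)
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_syn, y_syn)
print(search.best_params_, round(search.best_score_, 3))
```

Because scaling lives inside the pipeline, each cross-validation fold fits its own scaler on that fold's training part, so no test-fold statistics leak into the model.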