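### Repurchase prediction: model pipeline
Goal: predict the probability that a customer makes a repeat purchase after receiving a coupon.
Process: aggregate the data to one row per customer, derive key features from purchase behavior before the coupon was received, use whether the customer purchased after receiving the coupon as the target, split the customers into training and test sets, and then build the model.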
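
The process above could be sketched as follows on toy data. The frames `orders` and `coupons`, their values, the feature names `n_orders` and `total_spent`, and the 80/20 split are illustrative assumptions, not the notebook's actual data or settings; only the column names `user_id`, `create_time`, `subtotal`, `owner`, and `send_date` come from the source files.

```python
import pandas as pd

# Toy data; all values are made up for illustration.
orders = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", "c", "d", "d", "e"],
    "create_time": pd.to_datetime([
        "2015-01-05", "2015-03-10", "2015-01-20", "2015-02-01",
        "2015-02-15", "2015-01-02", "2015-03-20", "2015-03-05",
    ]),
    "subtotal": [500.0, 300.0, 250.0, 400.0, 900.0, 120.0, 80.0, 60.0],
})
# One coupon per customer, keyed by the owner's ID (an assumption).
coupons = pd.DataFrame({
    "owner": ["a", "b", "c", "d", "e"],
    "send_date": pd.to_datetime(["2015-03-01"] * 5),
})

# Attach each customer's coupon send date to their orders.
merged = orders.merge(coupons, left_on="user_id", right_on="owner", how="inner")
before = merged[merged["create_time"] < merged["send_date"]]
after = merged[merged["create_time"] >= merged["send_date"]]

# Features: behavior before the coupon, one row per customer.
features = before.groupby("user_id").agg(
    n_orders=("subtotal", "size"),
    total_spent=("subtotal", "sum"),
)
# Every coupon owner gets a row, even without earlier orders.
features = features.reindex(coupons["owner"].tolist()).fillna(0)
features.index.name = "user_id"

# Target: 1 if the customer ordered on or after the send date, else 0.
features["repurchased"] = features.index.isin(after["user_id"]).astype(int)

# Simple random 80/20 split by customer.
train = features.sample(frac=0.8, random_state=0)
test = features.drop(train.index)
```

Splitting by customer rather than by order keeps all of one customer's history on the same side of the split.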

Customer models: RFM (Recency, Frequency, Monetary), NES model
Retention rate, repurchase rate, purchase cycle, spending distribution
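
As a rough illustration of the RFM quantities and the repurchase rate, here is a minimal sketch on toy data. The `snapshot` reference date and all values are assumptions; the notebook does not state which reference date it uses for recency.

```python
import pandas as pd

# Toy orders; values are made up for illustration.
orders = pd.DataFrame({
    "user_id": ["a", "a", "b", "c", "c", "c"],
    "create_time": pd.to_datetime([
        "2015-01-01", "2015-01-31", "2015-02-10",
        "2015-01-05", "2015-01-15", "2015-01-25",
    ]),
    "subtotal": [100.0, 200.0, 50.0, 30.0, 30.0, 40.0],
})

# Reference date for recency: assumed to be the day after the last order.
snapshot = orders["create_time"].max() + pd.Timedelta(days=1)

grouped = orders.groupby("user_id")
rfm = pd.DataFrame({
    # Recency: days since the customer's most recent order.
    "recency_days": (snapshot - grouped["create_time"].max()).dt.days,
    # Frequency: number of orders.
    "frequency": grouped.size(),
    # Monetary: total amount spent.
    "monetary": grouped["subtotal"].sum(),
})

# Repurchase rate: share of customers with more than one order.
repurchase_rate = (rfm["frequency"] > 1).mean()
```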

###### order.csv
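* columns: GUID (order ID), seller_GUID (seller ID), subtotal (order amount), is_returned (whether the order was returned), create_time (order date), business_hour_guid (deal ID), order_from_type (source device), user_id (buyer ID), installment (credit card installments)
* Split by number of purchases into order_new (new customers) and order_returned (returning customers)
* Relate returning customers to coupon send times (whether the repurchase was influenced by a coupon)
* Per customer ID, compute total purchase quantity, total purchase amount, and total number of purchases (average purchase quantity, average purchase amount), average purchase cycle (for NES classification), and purchase dates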
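
The per-customer aggregates listed above might be computed along these lines. This is a sketch on toy data: the `snapshot` date, the helper `nes_label`, and in particular the NES thresholds (multiples of each customer's own average cycle) are assumptions, since the source does not define them. Purchase quantity is omitted because the listed columns do not include one.

```python
import pandas as pd

# Toy orders; values are made up for illustration.
orders = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "c", "c"],
    "create_time": pd.to_datetime([
        "2015-01-01", "2015-01-11", "2015-01-21",
        "2015-03-01",
        "2015-01-01", "2015-01-31",
    ]),
    "subtotal": [100.0, 200.0, 300.0, 50.0, 80.0, 120.0],
})
snapshot = pd.Timestamp("2015-03-02")  # hypothetical analysis date

ordered = orders.sort_values(["user_id", "create_time"])
# Days between consecutive orders of the same customer (NaN for the first).
ordered["gap_days"] = ordered.groupby("user_id")["create_time"].diff().dt.days

per_user = ordered.groupby("user_id").agg(
    total_amount=("subtotal", "sum"),
    purchase_times=("subtotal", "size"),
    avg_amount=("subtotal", "mean"),
    avg_cycle_days=("gap_days", "mean"),
    last_order=("create_time", "max"),
)
per_user["days_since_last"] = (snapshot - per_user["last_order"]).dt.days


def nes_label(row):
    """Hypothetical NES rule: compare inactivity with the customer's own cycle."""
    if row["purchase_times"] == 1:
        return "N"   # new customer, no cycle yet
    ratio = row["days_since_last"] / row["avg_cycle_days"]
    if ratio <= 1:
        return "E0"  # active
    if ratio <= 2:
        return "S1"
    if ratio <= 3:
        return "S2"
    return "S3"      # inactive for more than three cycles


per_user["nes"] = per_user.apply(nes_label, axis=1)
```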

###### discount_campaign.csv
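* columns: id (campaign ID), name (campaign name), amount (coupon face value), qty (total quantity issued), total (total discount amount), start_time (campaign start time), end_time (campaign end time), cancel_time (campaign cancellation time), minimum_amount (minimum order amount), min_gross_margin (minimum gross margin), is_discount_price_for_deal (whether the discounted price is shown on the storefront), category_id (category ID)
* Merge with discount_code
* Determine whether each coupon applies to a specific item or to all items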
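
A minimal sketch of the merge and the specific-versus-all-items check, on toy data. Treating a missing `category_id` as "valid for all items" is an assumption made here for illustration; the source does not say how all-item coupons are encoded. The names `campaign_name` and `is_category_specific` are likewise hypothetical.

```python
import pandas as pd

# Toy tables; values are made up for illustration.
discount_campaign = pd.DataFrame({
    "id": [1, 2],
    "name": ["spring sale", "category promo"],
    "amount": [50, 100],
    "category_id": [None, 7.0],
})
discount_code = pd.DataFrame({
    "id": [10, 11, 12],
    "campaign_id": [1, 2, 2],
    "owner": ["a", "a", "b"],
})

# Both tables have an `id` column, so rename the campaign's before merging.
campaign = discount_campaign.rename(
    columns={"id": "campaign_id", "name": "campaign_name"}
)
# Each coupon belongs to exactly one campaign; validate enforces that.
codes = discount_code.merge(
    campaign, on="campaign_id", how="left", validate="many_to_one"
)

# ASSUMPTION: a missing category_id means the coupon applies to all items.
codes["is_category_specific"] = codes["category_id"].notna()
```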

###### discount_code.csv
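* columns: id (coupon ID), campaign_id (campaign ID), use_amount (amount used), order_guid (order ID), order_amount (order amount), cancel_time (cancellation time), order_cost (order amount after the discount), send_date (send date), use_id (ID of the user who used the coupon), owner (owner ID)
* Collect the coupon information received by each user ID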
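
Collecting coupon information per user could look like the following sketch on toy data. Counting a coupon as used when `order_guid` is present is an assumption, as are the output column names.

```python
import pandas as pd

# Toy coupons; values are made up for illustration.
discount_code = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "owner": ["a", "a", "b", "b"],
    "send_date": pd.to_datetime(
        ["2015-01-01", "2015-02-01", "2015-01-15", "2015-03-01"]
    ),
    "order_guid": ["G-1", None, None, None],
})

# ASSUMPTION: a coupon counts as used when it is linked to an order.
discount_code["used"] = discount_code["order_guid"].notna()

# One row per coupon owner.
coupons_per_user = discount_code.groupby("owner").agg(
    coupons_received=("id", "size"),
    coupons_used=("used", "sum"),
    first_send=("send_date", "min"),
    last_send=("send_date", "max"),
)
```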


In [1]:
import numpy as np 
import pandas as pd

In [3]:
order=pd.read_csv("order.csv")

In [4]:
order.rename(columns={'Unnamed: 4':'is_returned'}, inplace=True)

In [5]:
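# Keep only the first 4,080,856 rows; copy so later column assignments do not act on a slice
order = order.iloc[:4080856].copy()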

In [8]:
# Normalize user_id: cast to integer first, then to string
order['user_id'] = order['user_id'].astype('int').astype('str')

In [9]:
# New customers: users with exactly one order
order_new = order[order.groupby('user_id').user_id.transform('size') == 1]

In [10]:
# Returning customers: users with more than one order
order_returned = order[order.groupby('user_id').user_id.transform('size') > 1]

In [11]:
f"first-time purchase: {len(order_new.user_id.unique())} , repeated-purchase: {len(order_returned.user_id.unique())}"

'first-time purchase: 499105 , repeated-purchase: 467634'

In [14]:
order_returned=order_returned.sort_values(['user_id','create_time'])
order_returned.head(3)

Unnamed: 0,GUID,seller_GUID,seller_name,subtotal,is_returned,create_time,business_hour_guid,order_from_type,user_id,installment
191336,678A8B76-EAFF-4396-A7F7-1474A1941475,1750DF0D-4CC9-49FE-B2F1-ABB35BDBF4BC,17P商品-諾貝兒益智玩具租借專賣店,1797.0,0.0,2012-03-21 16:45:35.700,21BF6040-97F4-4AAF-8E9E-16E04054C557,1.0,1111111122,0.0
202222,4A049484-14BE-48E8-960B-EFDF16E4AB82,28AC9DA8-CC57-46A5-B21D-3BB3421D5131,17P商品-名隼企業社,249.0,0.0,2012-03-30 14:34:05.143,88B8BD5E-A663-4CD0-8A60-4199E8E2068E,1.0,1111111122,0.0
1660665,B8A06496-00E7-46BB-89BE-C2F8F43A0C68,29CF2911-9521-4590-9B70-603346D21FDF,【P玩美】摩娜卡諾日韓服飾行,349.0,0.0,2014-06-22 22:26:39.077,F95FFD2D-F88F-44BA-A1EF-1896C118500E,1.0,1111111127,0.0


In [13]:
order_new=order_new.sort_values(['user_id','create_time'])
order_new.head(3)

Unnamed: 0,GUID,seller_GUID,seller_name,subtotal,is_returned,create_time,business_hour_guid,order_from_type,user_id,installment
15241,F138A1CB-6029-45E5-845F-EAADEA6B3B83,4BDA1E37-3987-4119-990B-6DFAA949BC95,17P福利-膳魔師天鵝壺1000cc,299.0,1.0,2011-03-31 20:15:17.773,B9E0A83A-7366-4CFA-8BFE-D04360E7BEA6,1.0,1111111111,0.0
2601490,29A88CCB-14D1-4483-BE22-19765F5DA632,6DBCD583-716B-4AFC-8CE5-A97A093482FF,吉芙特有限公司,550.0,0.0,2015-12-14 17:24:24.087,0DE160A0-8180-48A8-8AC3-E0C2C45823B6,6.0,1111111114,0.0
5244,8EB48635-1B6A-4D69-94CC-07D7E7CA1A2C,6ABC7986-3907-4877-8C21-2ECBA33DD434,17P福利-【enegreen】充電式電暖蛋,4990.0,0.0,2010-11-26 16:04:41.610,C7067EA3-23D8-4987-9E34-39AC21388051,1.0,1111111123,0.0


In [15]:
order_returned.to_csv("order_returned.csv",index=False)
order_new.to_csv("order_new.csv",index=False)

In [24]:
# Order amounts of new customers
order_new.subtotal.describe()

count    499105.000000
mean       1126.289148
std        1630.086113
min           1.000000
25%         399.000000
50%         744.000000
75%        1199.000000
max      117300.000000
Name: subtotal, dtype: float64

In [25]:
# Order amounts of returning customers
order_returned.subtotal.describe()

count    3.581751e+06
mean     8.556358e+02
std      1.144889e+03
min      1.000000e+00
25%      3.880000e+02
50%      6.270000e+02
75%      9.900000e+02
max      2.845000e+05
Name: subtotal, dtype: float64

In [27]:
# return_times: number of orders per returning customer
return_times = (
    order_returned.user_id.value_counts()
    .rename_axis('user_id')
    .reset_index(name='purchase_times')
)

In [40]:
discount_code=pd.read_csv("discount_code_0513v3.csv")

In [41]:
discount_code.shape

(12897581, 10)

In [42]:
# Convert use_id to strings without a decimal part; rows with no use_id stay missing (<NA>)
discount_code['use_id'] = discount_code['use_id'].astype('Int64').astype('string')
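
The following toy example illustrates why an ID column that contains missing values needs care when it is converted to strings. The series `ids` and its values are made up; going through the nullable `Int64` dtype is one possible approach, not necessarily what the original notebook intended.

```python
import numpy as np
import pandas as pd

# A numeric ID column with a missing value is read as float64.
ids = pd.Series([1111111122.0, np.nan, 1111111127.0])

# A plain str conversion keeps the float formatting ("....0") and,
# depending on the pandas version, may turn NaN into the text "nan".
naive = ids.astype("str")

# Nullable integer first, then the string dtype:
# no decimal part, and the missing value stays missing.
clean = ids.astype("Int64").astype("string")
```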

In [44]:
# Write back without the row index so no extra 'Unnamed: 0' column appears on reload
discount_code.to_csv("discount_code_0513v3.csv", index=False)