### Background
As smartphone penetration reaches the hundreds of millions, O2O (Online to Offline) commerce requires businesses to maintain a strong presence both offline and online. Apps with O2O capabilities accumulate daily consumer behaviour and location data, which call for big data techniques and commercial operations management. This competition focuses on coupon redemption rates. Sending coupons is a common O2O marketing tool used to reactivate existing customers and attract new ones. Customers are happy to receive coupons they want, but are frustrated by coupons they do not need. For merchants, sending unwanted coupons may erode brand equity and make marketing expenses harder to forecast. Targeted marketing is an important technique for increasing the coupon redemption rate: it gives customers relevant discounts and gives businesses an effective marketing tool. The competition provides participants with abundant O2O data and asks them to predict whether a customer will use a coupon within a specified time frame.
### Data
This competition provides real online and offline user consumption data from January 1, 2016 to June 30, 2016. The contestants are expected to predict the probability of customers redeeming a coupon within 15 days of receiving it.
Note: To protect the privacy of users and merchants, the data has been desensitized and sampled with bias.
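The 15-day redemption target described above can be sketched as a small labelling function. This is a minimal illustration, not the notebook's own code: the name `redeemed_within` is hypothetical, and treating day 15 as inside the window is an assumption.

```python
from datetime import date

def redeemed_within(date_received, date_used, window_days=15):
    """Return 1 if a coupon was used within `window_days` of receipt, else 0.

    Assumption: the window is inclusive, so a coupon used exactly
    `window_days` after it was received still counts as redeemed.
    """
    if date_received is None or date_used is None:
        return 0  # no coupon received, or the coupon was never used
    delta = (date_used - date_received).days
    return int(0 <= delta <= window_days)

print(redeemed_within(date(2016, 5, 1), date(2016, 5, 10)))  # 9 days later -> 1
print(redeemed_within(date(2016, 5, 1), date(2016, 5, 17)))  # 16 days later -> 0
```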
### Evaluation
The results are evaluated by the average AUC. That is, the AUC is calculated separately for every coupon_id, and the mean of these per-coupon AUC values is the evaluation score. See Wikipedia for more information on how AUC is calculated.
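A rough sketch of this metric in plain Python follows. The function names are hypothetical, and skipping coupons that have only one class (where AUC is undefined) is an assumption; the competition's exact handling of such coupons is not stated here.

```python
from collections import defaultdict

def auc(labels, scores):
    """Pairwise AUC: the chance a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_coupon_auc(coupon_ids, labels, scores):
    """Average the per-coupon AUC, skipping coupons with a single class (assumption)."""
    groups = defaultdict(lambda: ([], []))
    for c, y, s in zip(coupon_ids, labels, scores):
        groups[c][0].append(y)
        groups[c][1].append(s)
    aucs = [a for a in (auc(ys, ss) for ys, ss in groups.values()) if a is not None]
    return sum(aucs) / len(aucs) if aucs else float('nan')

coupons = ['a', 'a', 'a', 'b', 'b', 'c']
labels  = [1, 0, 0, 1, 0, 1]
scores  = [0.9, 0.2, 0.95, 0.3, 0.6, 0.5]
print(mean_coupon_auc(coupons, labels, scores))  # coupon a: 0.5, coupon b: 0.0 -> 0.25
```

The pairwise loop is quadratic in group size, so it is only meant for illustrating the definition on small samples.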


In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math

### Online and Offline Training data

In [2]:
df_on = pd.read_csv('ccf_online_stage1_train.csv')
df_off = pd.read_csv('ccf_offline_stage1_train.csv')


In [None]:
print("Online Training Data Sample\nShape:"+str(df_on.shape))
df_on.head()

In [None]:
print("Offline Training Data Sample\nShape:"+str(df_off.shape))
df_off.head()

### Test Data (Offline)

In [None]:
df_test = pd.read_csv('ccf_offline_stage1_test_revised.csv')
print("Testing Data(Offline) Sample\nShape:"+str(df_test.shape))
df_test.head()

#### Converting Date to DateTime format

In [None]:
#Online Training Data
df_on['Date'] = pd.to_datetime(df_on["Date"],format='%Y%m%d')
df_on['Date_received'] = pd.to_datetime(df_on["Date_received"],format='%Y%m%d')

#Offline Training Data
df_off['Date'] = pd.to_datetime(df_off["Date"],format='%Y%m%d')
df_off['Date_received'] = pd.to_datetime(df_off["Date_received"],format='%Y%m%d')

#Testing Data
df_test['Date_received'] = pd.to_datetime(df_test["Date_received"],format='%Y%m%d')

### Removing Duplicates from Online and Offline Training Data

In [None]:
#Removing duplicates and giving frequency counts(Count) to each row

#Online
x = 'g8h.|$hTdo+jC9^@'   #garbage value for nan values, so groupby keeps rows with missing keys
df_on_unique = (df_on.fillna(x).groupby(['User_id', 'Merchant_id', 'Action', 'Coupon_id', 'Discount_rate',
       'Date_received', 'Date']).size().reset_index()
               .rename(columns={0 : 'Count'}).replace(x,np.nan))
df_on_unique["Date_received"]=pd.to_datetime(df_on_unique["Date_received"])
df_on_unique["Date"]=pd.to_datetime(df_on_unique["Date"])

print("Online Training Data Shape:"+str(df_on_unique.shape))

In [None]:
#Offline
x = 'g8h.|$hTdo+jC9^@'   #garbage value for nan values 
df_off_unique = (df_off.fillna(x).groupby(['User_id', 'Merchant_id', 'Coupon_id', 'Discount_rate', 'Distance',
       'Date_received', 'Date']).size().reset_index()
               .rename(columns={0 : 'Count'}).replace(x,np.nan))
df_off_unique["Date_received"]=pd.to_datetime(df_off_unique["Date_received"])
df_off_unique["Date"]=pd.to_datetime(df_off_unique["Date"])

print("Offline Training Data Shape:"+str(df_off_unique.shape))
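As an alternative to the placeholder string used above, recent pandas versions (1.1 or later, which is an assumption about the environment) let `groupby` keep missing keys directly via `dropna=False`. The sketch below uses a made-up toy frame rather than the competition files.

```python
import numpy as np
import pandas as pd

# Toy frame with two pairs of duplicate rows; one pair has missing keys.
toy = pd.DataFrame({
    'User_id': [1, 1, 2, 2],
    'Coupon_id': ['A', 'A', np.nan, np.nan],
    'Distance': [0.0, 0.0, np.nan, np.nan],
})

# dropna=False keeps groups whose keys contain NaN, so no sentinel value is needed.
toy_unique = (toy.groupby(list(toy.columns), dropna=False)
                 .size()
                 .reset_index(name='Count'))
print(toy_unique)
```

Each distinct row appears once, and `Count` records how many times it occurred in the original frame.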

### Converting Discount Ratio to Rate

In [None]:
#Function to convert discount ratio to discount rate
def convert_discount(discount):
    discounts = {}
    for d in discount:
        if not isinstance(d, str):
            continue  #skip NaN and other non-string values
        if ':' in d:
            #ratio 'a:b' -> rate = (a - b) / a
            a, b = (int(v) for v in d.split(':'))
            discounts[d] = [a, b, round((a - b)/a, 3)]
        elif '.' in d:
            #value is already a rate, e.g. '0.95'
            rate = float(d)
            discounts[d] = [100, int(round(100 - 100*rate)), rate]
    #each key is stored with its own value, so unparsed entries cannot shift the mapping
    return discounts


# convert_discount(list(df_off['Discount_rate']))
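The arithmetic behind the conversion can be checked on single values. The helper below is a hypothetical, stripped-down version that returns only the rate; it assumes every input is either an `a:b` ratio or a decimal string.

```python
def parse_discount(text):
    """Convert one discount string to a rate in [0, 1].

    'a:b' is read as a ratio and becomes (a - b) / a;
    a decimal string such as '0.95' is already a rate.
    """
    if ':' in text:
        full, off = (int(v) for v in text.split(':'))
        return round((full - off) / full, 3)
    return float(text)

print(parse_discount('150:20'))  # (150 - 20) / 150 -> 0.867
print(parse_discount('200:20'))  # (200 - 20) / 200 -> 0.9
print(parse_discount('0.95'))    # already a rate   -> 0.95
```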

In [None]:
#ONLINE DATA
df_on_coupon = df_on_unique[(df_on_unique['Coupon_id'].notna()) & (df_on_unique['Coupon_id']!='fixed')].copy()
discounts_online = list(df_on_coupon['Discount_rate'].unique())
df_on_coupon.loc[:,('Discount')] = df_on_coupon.loc[:,('Discount_rate')] 
df_on_coupon.loc[:,('Discount_rate')] = df_on_coupon.loc[:,('Discount')].map(convert_discount(discounts_online))
df_on_coupon[['Original_price','Discounted_price','Rate']] = pd.DataFrame(df_on_coupon.Discount_rate.values.tolist(), index= df_on_coupon.index)
df_on_coupon.head()

In [None]:
#OFFLINE DATA
df_off_coupon = df_off_unique[(df_off_unique['Coupon_id'].isna()==False)].copy()
discounts_offline = list(df_off_coupon['Discount_rate'].unique())
df_off_coupon.loc[:,('Discount')] = df_off_coupon.loc[:,('Discount_rate')] 
df_off_coupon['Discount_rate'] = df_off_coupon['Discount'].map(convert_discount(discounts_offline))
df_off_coupon[['Original_price','Discounted_price','Rate']] = pd.DataFrame(df_off_coupon.Discount_rate.values.tolist(), index= df_off_coupon.index)
df_off_coupon.head()

In [None]:
#Test Data
df_test_coupon = df_test[df_test['Coupon_id'].notna()].copy()
discounts_test = list(df_test_coupon['Discount_rate'].unique())
df_test_coupon.loc[:,('Discount')] = df_test_coupon.loc[:,('Discount_rate')] 
df_test_coupon['Discount_rate'] = df_test_coupon['Discount'].map(convert_discount(discounts_test))
df_test_coupon[['Original_price','Discounted_price','Rate']] = pd.DataFrame(df_test_coupon.Discount_rate.values.tolist(), index= df_test_coupon.index)
df_test_coupon.head()

#### Filling Nan for Distance (OFFLINE)

In [None]:
df_off_unique['Distance'] = df_off_unique['Distance'].fillna(df_off_unique['Distance'].mean())
df_off_unique['Distance'] = df_off_unique.Distance.astype(int)

### Training Data (Online + Offline)

In [None]:
#DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df_train = pd.concat([df_on_unique, df_off_unique], sort=False)
df_train = df_train.sort_values(by = ['User_id'] )
df_train = df_train.reset_index(drop=True)
print("Training Data(Offline+Online) \nShape:"+str(df_train.shape))
df_train.head()

## User Analysis

### Distributing users into three categories:
1. users getting coupon
2. users making purchases without coupon
3. users making purchases with coupon

In [None]:
#Online
df_on_get_coupon = df_on_unique[df_on_unique['Action']==2]
df_on_no_coupon = df_on_unique[df_on_unique['Coupon_id'].isna()]
df_on_redeem_coupon = df_on_unique[(df_on_unique['Date'].isna()==False) & (df_on_unique['Coupon_id'].isna()==False)]
print('ONLINE: Shape of Get Coupon'+ str(df_on_get_coupon.shape))
print('ONLINE: Shape of No Coupon'+ str(df_on_no_coupon.shape))
print('ONLINE: Shape of Redeem Coupon'+ str(df_on_redeem_coupon.shape))

#Offline
df_off_get_coupon = df_off_unique[(df_off_unique['Date'].isna()) & (df_off_unique['Coupon_id'].isna()==False)]
df_off_no_coupon = df_off_unique[df_off_unique['Coupon_id'].isna()]
df_off_redeem_coupon = df_off_unique[(df_off_unique['Date'].isna()==False) & (df_off_unique['Coupon_id'].isna()==False)]
print('\nOFFLINE: Shape of Get Coupon'+ str(df_off_get_coupon.shape))
print('OFFLINE: Shape of No Coupon'+ str(df_off_no_coupon.shape))
print('OFFLINE: Shape of Redeem Coupon'+ str(df_off_redeem_coupon.shape))


#Complete Training Data
df_train_get_coupon = df_train[(df_train['Date'].isna()) & (df_train['Coupon_id'].isna()==False)]
df_train_no_coupon = df_train[df_train['Coupon_id'].isna()]
df_train_redeem_coupon = df_train[(df_train['Date'].isna()==False) & (df_train['Coupon_id'].isna()==False)]
print('\nONLINE+OFFLINE: Shape of Get Coupon'+ str(df_train_get_coupon.shape))
print('ONLINE+OFFLINE: Shape of No Coupon'+ str(df_train_no_coupon.shape))
print('ONLINE+OFFLINE: Shape of Redeem Coupon'+ str(df_train_redeem_coupon.shape))
df_train_coupon = pd.concat([df_on_coupon, df_off_coupon], sort=False)

### User : Online, Offline or Common(Online+Offline) Tag
        0: Common User
        1: Only Offline
        2: Only Online
      
        

In [None]:
users_on = set(df_on["User_id"].unique())             #users in online data
users_off = set(df_off["User_id"].unique())           #users in offline data
users_test = set(df_test["User_id"].unique())         #users in test data
common_users = users_off & users_on                   #users with both online and offline presence
online_users = list(users_on - common_users)
offline_users = list(users_off - common_users)
common_users = list(common_users)
print('Count of only Online Users:  '+ str(len(online_users)))
print('Count of only Offline Users: '+ str(len(offline_users)))
print('Count of Common Users:       '+ str(len(common_users)))

In [None]:
common_tags = [0 for _ in range(len(common_users))]
offline_tags = [1 for _ in range(len(offline_users))]
online_tags = [2 for _ in range(len(online_users))]

#Common Users DataFrame
tag_0 = pd.DataFrame(
    {'Users': common_users,
     'Tag': common_tags
    })

#Offline Users DataFrame
tag_1 = pd.DataFrame(
    {'Users': offline_users,
     'Tag': offline_tags
    })

#Online Users DataFrame
tag_2 = pd.DataFrame(
    {'Users': online_users,
     'Tag': online_tags
    })

user_tag = pd.concat([tag_0, tag_1, tag_2], sort=False)
user_tag.sample(5)
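The tagging scheme above (0 for common, 1 for offline only, 2 for online only) reduces to a few set operations. The following sketch uses a hypothetical helper `tag_users` and made-up IDs to show the logic in isolation.

```python
def tag_users(online_ids, offline_ids):
    """Map each user ID to 0 (common), 1 (offline only) or 2 (online only)."""
    online, offline = set(online_ids), set(offline_ids)
    tags = {}
    for user in online | offline:
        if user in online and user in offline:
            tags[user] = 0
        elif user in offline:
            tags[user] = 1
        else:
            tags[user] = 2
    return tags

print(tag_users([1, 2, 3], [3, 4]))
```

Here users 1 and 2 are online only, user 3 appears in both lists, and user 4 is offline only.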

### User Redemption Score 

In [None]:
#Users in training Dataset
user_redemption_train = df_train_coupon.groupby(['User_id'])[['Coupon_id','Date']].count()
user_redemption_train.columns = ['User_Released', 'User_Redeemed']
user_redemption_train['User_Ratio'] = round(user_redemption_train['User_Redeemed']/user_redemption_train['User_Released'],2)
user_redemption_train[user_redemption_train['User_Ratio']!=0].sample(5)

In [None]:
plt.figure(figsize=(8,5))
sns.distplot(user_redemption_train[user_redemption_train['User_Ratio']!=0]['User_Ratio'],kde=False,bins=26)
plt.xlabel('User Redemption Ratio')
plt.ylabel('Count of Users')
plt.title('User Redemption Score Distribution')
plt.show()

### Users and their Merchant Preferences

In [None]:
#OFFLINE
visits_offline = pd.DataFrame(df_off_unique.groupby(['User_id','Merchant_id']).size()).reset_index()
visits_offline.columns = ['User_id','Merchant_id','Visits']
visits_offline.head()

In [None]:
plt.figure(figsize=(15,10))
ax = sns.countplot(x=visits_offline['Visits'])
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
i = 4
for p in ax.patches:
        ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
        i-=1
        if i <0:
            break
plt.xlabel('Number of User-Merchant Visits')
plt.ylabel('Count')
plt.title('Plot for frequency of user-merchant visits (OFFLINE)')
plt.show()

In [None]:
#ONLINE
visits_online = pd.DataFrame(df_on_unique.groupby(['User_id','Merchant_id']).size()).reset_index()
visits_online.columns = ['User_id','Merchant_id','Visits']
visits_online.head()

In [None]:
plt.figure(figsize=(15,10))
ax = sns.countplot(x=visits_online['Visits'])
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
i = 3
for p in ax.patches:
        ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
        i-=1
        if i <0:
            break
plt.xlabel('Number of User-Merchant Visits')
plt.ylabel('Count')
plt.title('Plot for frequency of user-merchant visits (ONLINE)')
plt.show()

#### For offline:
About 69% of user-merchant pairs occur exactly once.<br>
#### For online:
About 79.4% of user-merchant pairs occur exactly once.<br>
So most users do not return to a preferred set of merchants.
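A share like the ones quoted above can be computed by counting how often each user-merchant pair occurs. This is a small illustration with made-up pairs and a hypothetical helper name, not the computation that produced the notebook's figures.

```python
from collections import Counter

def single_visit_share(pairs):
    """Fraction of distinct (user, merchant) pairs that occur exactly once."""
    visits = Counter(pairs)
    once = sum(1 for count in visits.values() if count == 1)
    return once / len(visits)

pairs = [(1, 'm1'), (1, 'm1'), (1, 'm2'), (2, 'm1')]
print(single_visit_share(pairs))  # 2 of the 3 distinct pairs occur once
```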

### Users as Purchasers and Non-Purchasers
Purchasers (Number of buys >= 5)<br>
Non Purchasers (Number of buys < 5)

In [None]:
user_purchasers = pd.DataFrame(df_train.groupby(['User_id'])['Date'].count())
user_purchasers.columns = ['User_Buys']
user_purchasers['User_Buys'].describe()

In [None]:
user_purchasers['Purchaser'] = [1 if x>=5 else 0 for x in user_purchasers['User_Buys']] 
user_purchasers.sample(5)

### For any user-merchant pair, the distance should remain constant (Offline)

In [None]:
user_merchant_distance = pd.DataFrame(df_off_unique.groupby(['User_id','Merchant_id'])['Distance'].nunique()).reset_index()
user_merchant_distance['Distance'].unique()

The number of distinct distance values for a user-merchant pair is either 0 (when the distance is NaN) or 1.<br>
This shows that the distance stays constant for any given user-merchant pair.

### Common Users: Online and Offline visits

In [None]:
df_train = df_train.merge(user_tag.rename(columns={'Users': 'User_id'}), how='outer', on='User_id')

In [None]:
common_users = df_train[df_train['Tag']==0].copy()
common_users_activity = common_users.groupby(['User_id'])[['Action','Distance']].count()
common_users_activity.columns = ['Online_Activity','Offline_Activity']
common_users_activity.sample(2)

### User tracking (Online click to Offline buy)

In [None]:
common_users['Action'] = common_users['Action'].fillna(3)   #3 marks offline rows, which have no Action
common_users.head()

In [None]:
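common_users.loc[:,('DateTrack')] = common_users.loc[:,('Date')]
common_users['DateTrack'] = common_users['DateTrack'].fillna(common_users['Date_received'])
#cast to int first so the codes read '0'..'3' rather than '0.0'..'3.0'
common_users['Action'] = common_users['Action'].fillna(3).astype(int).astype(str)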
common_users.head()

In [None]:
common_users = common_users.sort_values(by=['User_id','DateTrack'])
common_users.head()

In [None]:
common_user_activity = common_users.groupby(['User_id'])['Action'].apply(list).reset_index(name='ActivityList')
common_user_activity.head()


In [None]:
import re

common_user_activity.loc[:,('ActivityList')] = common_user_activity.loc[:,('ActivityList')].apply(lambda x: ''.join(x))
#1 if an online action 0 is followed, at any later point, by an offline record (3)
common_user_activity.loc[:,('OnlineToOffline')] = [1 if re.search(r'0\d*3',a) else 0 for a in common_user_activity['ActivityList']]
common_user_activity.head()

In [None]:
common_user_activity[common_user_activity['OnlineToOffline']==1].shape[0]/common_user_activity.shape[0]
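The pattern match above can be tried on short action sequences. In this sketch the helper name is hypothetical; code 3 stands for an offline record as in the cells above, and reading code 0 as an online click is inferred from the section title rather than stated in the data description here.

```python
import re

def went_online_to_offline(actions):
    """True if a 0 (online click, assumed) is later followed by a 3 (offline record).

    `actions` must already be sorted by date. Casting to int first avoids
    strings like '0.0', whose extra digits and dots would confuse the pattern.
    """
    encoded = ''.join(str(int(a)) for a in actions)
    return bool(re.search(r'0\d*3', encoded))

print(went_online_to_offline([0, 1, 3]))   # click, then offline record -> True
print(went_online_to_offline([3, 0]))      # offline record came first  -> False
print(went_online_to_offline([1.0, 3.0]))  # no click at all            -> False
```

The last call shows why the integer cast matters: joining `'1.0'` and `'3.0'` directly would give `'1.03.0'`, which contains `03` and would match by accident.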

In [None]:
onlineToOffline([0,1,2,3,4,5,6])  

## Merchant Analysis

### Merchant Redemption Score

In [None]:
#Merchants in offline Dataset
merchant_redemption_offline = df_off_coupon.groupby(['Merchant_id'])[['Coupon_id','Date']].count()
merchant_redemption_offline.columns = ['Merchant_Released', 'Merchant_Redeemed']
merchant_redemption_offline['Merchant_Ratio'] = round(merchant_redemption_offline['Merchant_Redeemed']/merchant_redemption_offline['Merchant_Released'],2)
merchant_redemption_offline.sample(5)

In [None]:
plt.figure(figsize=(8,5))
sns.distplot(merchant_redemption_offline['Merchant_Ratio'],kde=False,bins=26)
plt.xlabel('Merchant Redemption Ratio')
plt.ylabel('Count of Merchants')
plt.title('Merchants Redemption Score Distribution (OFFLINE)')
plt.show()

In [None]:
#Merchants in online Dataset
merchant_redemption_online = df_on_coupon.groupby(['Merchant_id'])[['Coupon_id','Date']].count()
merchant_redemption_online.columns = ['Merchant_Released', 'Merchant_Redeemed']
merchant_redemption_online['Merchant_Ratio'] = round(merchant_redemption_online['Merchant_Redeemed']/merchant_redemption_online['Merchant_Released'],2)
merchant_redemption_online.sample(5)

In [None]:
plt.figure(figsize=(8,5))
sns.distplot(merchant_redemption_online['Merchant_Ratio'],kde=False,bins=26)
plt.xlabel('Merchant Redemption Ratio')
plt.ylabel('Count of Merchants')
plt.title('Merchants Redemption Score Distribution (ONLINE)')

plt.show()

### Merchant and Average Distance of its customers (OFFLINE)

In [None]:
merchant_distance = df_off_unique.groupby(['Merchant_id'])['Distance'].agg(['mean','count'])
merchant_distance.columns = ['AvgDistance','Count']
merchant_distance.head() 

In [None]:
plt.figure(figsize=(8,5))
ax = sns.distplot(merchant_distance['AvgDistance'],kde=False,bins=26)
plt.xlabel('AvgDistance of Users')
plt.ylabel('Count of Merchants')
plt.title('Distribution of Average Distance of customers (OFFLINE)')
for p in ax.patches:
    ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()

### Merchant and its Popularity  (OFFLINE)
(based on number of visits of its customers)<br>
If visits > 40:
    then Merchant is Popular

In [None]:
df_off_purchase = df_off_unique[df_off_unique['Date'].isna()==False]
merchant_visits_off = pd.DataFrame(df_off_purchase.groupby(['Merchant_id','User_id','Date'])['Distance'].count()).reset_index()

merchant_visits_offline = pd.DataFrame(merchant_visits_off.groupby(['Merchant_id'])['Date'].count()).reset_index()

merchant_visits_offline = merchant_visits_offline.rename(columns={"Date": "Visits"})
merchant_visits_offline.head()

In [None]:
merchant_visits_offline['Visits'].describe()

In [None]:
merchant_visits_offline['Merchant_Popular'] = [1 if x>40 else 0 for x in merchant_visits_offline['Visits']]
merchant_visits_offline.sample(5)

### Merchants and their Active Duration

In [None]:
merchant_duration = df_off_unique.copy()
merchant_duration['DateTrack'] = merchant_duration['Date']
merchant_duration['DateTrack'] = merchant_duration['DateTrack'].fillna(merchant_duration['Date_received'])
merchant_duration.head()

In [None]:
merchant_duration_days = merchant_duration.groupby(['Merchant_id'])['DateTrack'].agg(['min','max'])
merchant_duration_days['Duration'] = merchant_duration_days['max'] - merchant_duration_days['min']
merchant_duration_days.head()

In [None]:
#Timedelta -> whole days as a number
merchant_duration_days['Duration'] = merchant_duration_days['Duration'].dt.days
merchant_duration_days.head()

In [None]:
plt.figure(figsize=(8,5))
ax = sns.distplot(merchant_duration_days['Duration'],kde=False,bins=26)
plt.xlabel('Active Duration of Merchants(days)')
plt.ylabel('Count of Merchants')
plt.title('Merchant and their Duration time(days)')
for p in ax.patches:
    ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()

### Avg discount each merchant offers

In [None]:
#avg discounts each merchant offers
merchant_discounts_avg = pd.DataFrame(df_off_coupon.groupby(['Merchant_id'])['Rate'].mean())
merchant_discounts_avg.columns = ['AvgRate']
merchant_discounts_avg = merchant_discounts_avg.reset_index()
merchant_discounts_avg.head()

In [None]:
plt.figure(figsize=(8,5))
sns.distplot(merchant_discounts_avg['AvgRate'],kde=False,bins=20)
plt.xlabel('Mean Discount Rate')
plt.ylabel('Count of Merchants')
plt.title('Merchant and Average Discount it offers')
plt.show()

## Discount Analysis

In [None]:
# Coupons Released and redeemed and Discount Rate(OFFLINE)
fig,(ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize=(20,7))
plt.subplot(121)
ax1 = sns.countplot(x=df_off_coupon['Rate'])
ax1.set_xticklabels(ax1.get_xticklabels(),rotation=90)
for p in ax1.patches:
        ax1.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))

plt.xlabel('Discount Rate')
plt.ylabel('Count of Coupons released')
plt.title('Number of coupons released for each discount rate(OFFLINE)')

plt.subplot(122)
df_off_redeem_coupon= df_off_coupon[df_off_coupon['Date'].notna()].copy()

ax2 = sns.countplot(x=df_off_redeem_coupon['Rate'])
ax2.set_xticklabels(ax2.get_xticklabels(),rotation=90)
for p in ax2.patches:
        ax2.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
        
plt.xlabel('Discount Rate')
plt.ylabel('Count of Coupons redeemed')
plt.title('Number of coupons redeemed for each discount rate(OFFLINE)')

plt.show()

In [None]:
# Coupons Released and redeemed and Discount Rate(ONLINE)
fig,(ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize=(20,7))
plt.subplot(121)
ax1 = sns.countplot(x=df_on_coupon['Rate'])
ax1.set_xticklabels(ax1.get_xticklabels(),rotation=90)
for p in ax1.patches:
        ax1.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))

plt.xlabel('Discount Rate')
plt.ylabel('Count of Coupons released')
plt.title('Number of coupons released for each discount rate(ONLINE)')

plt.subplot(122)
df_on_redeem_coupon= df_on_coupon[df_on_coupon['Date'].notna()].copy()

ax2 = sns.countplot(x=df_on_redeem_coupon['Rate'])
ax2.set_xticklabels(ax2.get_xticklabels(),rotation=90)
for p in ax2.patches:
        ax2.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
        
plt.xlabel('Discount Rate')
plt.ylabel('Count of Coupons redeemed')
plt.title('Number of coupons redeemed for each discount rate(ONLINE)')

plt.show()

In [None]:
discount_redemption =  df_train_coupon.groupby(['Rate'])[['Coupon_id','Date']].count()
discount_redemption.columns = ['Rate_Releases','Rate_Redeemed']
discount_redemption['Rate_Ratio'] = discount_redemption['Rate_Redeemed']/discount_redemption['Rate_Releases']
discount_redemption.head()

## Coupon Analysis

### Merchant is constant for a particular Coupon ID

In [None]:
coupon_merchant = pd.DataFrame(df_train_coupon.groupby(['Coupon_id'])['Merchant_id'].nunique())
coupon_merchant.columns = ['NumberOfMerchants']
coupon_merchant['NumberOfMerchants'].nunique()

### Discount Rate is constant for a particular Coupon ID

In [None]:
coupon_discount = pd.DataFrame(df_train_coupon.groupby(['Coupon_id'])['Rate'].nunique())
coupon_discount.columns = ['NumberOfDiscounts']
coupon_discount['NumberOfDiscounts'].nunique()

### Coupon Redemption Score

In [None]:
#Coupon in training Dataset
coupon_redemption_offline = df_off_coupon.groupby(['Coupon_id'])[['Rate','Date']].count()
coupon_redemption_offline.columns = ['Coupon_Released', 'Coupon_Redeemed']
coupon_redemption_offline['Coupon_Ratio'] = round(coupon_redemption_offline['Coupon_Redeemed']/coupon_redemption_offline['Coupon_Released'],2)
coupon_redemption_offline.sample(5)

In [None]:
plt.figure(figsize=(8,5))
sns.distplot(coupon_redemption_offline['Coupon_Ratio'],kde=False,bins=26)
plt.xlabel('Coupon Redemption Ratio')
plt.ylabel('Count of Coupon')
plt.title('Coupon Redemption Score Distribution (OFFLINE)')
plt.show()

### Distance Distribution with respect to coupon redemption

In [None]:
plt.figure(figsize=(9,7))
ax = sns.countplot(x=df_off_redeem_coupon['Distance'])
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
for p in ax.patches:
        ax.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.xlabel('Distance between users and merchants')
plt.ylabel('Count of Coupons Redeemed at that distance')
plt.title('Coupon Redemption and the Distance of User')
plt.show()

## Date Analysis

#### Count of coupons released each day (OFFLINE)

In [None]:
plt.figure(figsize=(35,20))
ax = sns.countplot(x=df_off_coupon['Date_received'])
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.xlabel('Dates : Jan 1, 2016 - Jun 30,2016 (182 days)')
plt.ylabel('Count of Coupon Released')
plt.title('Count of coupons Released each day')
plt.show()

#### Count of coupons redeemed each day (OFFLINE)

In [None]:
plt.figure(figsize=(35,20))
ax = sns.countplot(x=df_off_redeem_coupon['Date'])
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.xlabel('Dates : Jan 1, 2016 - Jun 30,2016 (182 days)')
plt.ylabel('Count of Coupon Redeemed')
plt.title('Count of coupons redeemed each day')
plt.show()

In [None]:
df_off_redeem_coupon

#### Count of coupons released each day (ONLINE)

In [None]:
plt.figure(figsize=(35,20))
ax = sns.countplot(x=df_on_coupon['Date_received'])
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.xlabel('Dates : Jan 1, 2016 - Jun 30,2016 (182 days)')
plt.ylabel('Count of Coupon Released')
plt.title('Count of coupons Released each day')
plt.show()

#### Count of coupons redeemed each day (ONLINE)

In [None]:
plt.figure(figsize=(35,20))
ax = sns.countplot(x=df_on_redeem_coupon['Date'])
ax.set_xticklabels(ax.get_xticklabels(),rotation=90)
plt.xlabel('Dates : Jan 1, 2016 - Jun 30,2016 (182 days)')
plt.ylabel('Count of Coupon Redeemed')
plt.title('Count of coupons redeemed each day')
plt.show()

### Weekdays or Weekends for Date Received (Offline)

In [None]:
#Receive Date
df_off_coupon.loc[:,('Weekend')]  = np.where((df_off_coupon.loc[:,('Date_received')] .dt.dayofweek) < 5,0,1)
df_off_coupon.loc[:,('DayOfWeek')] = df_off_coupon.loc[:,('Date_received')].dt.dayofweek
df_off_coupon.loc[:,('Month')]  = (df_off_coupon.loc[:,('Date_received')]).dt.month

#Purchase Date
df_off_redeem_coupon.loc[:,('Weekend_p')]  = np.where((df_off_redeem_coupon.loc[:,('Date')] .dt.dayofweek) < 5,0,1)
df_off_redeem_coupon.loc[:,('DayOfWeek_p')] = df_off_redeem_coupon.loc[:,('Date')].dt.dayofweek
df_off_redeem_coupon.loc[:,('Month_p')]  = (df_off_redeem_coupon.loc[:,('Date')]).dt.month

df_off_redeem_coupon.head()
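The three calendar features derived above follow the convention that Monday is 0 and Sunday is 6, so a weekend is any day-of-week value of 5 or more. The standard library's `date.weekday()` uses the same numbering as pandas' `dt.dayofweek`, which makes a quick sanity check possible; `calendar_features` is a hypothetical helper name.

```python
from datetime import date

def calendar_features(d):
    """Return weekend flag, day of week (Monday=0 ... Sunday=6) and month for one date."""
    day_of_week = d.weekday()
    return {'Weekend': int(day_of_week >= 5),
            'DayOfWeek': day_of_week,
            'Month': d.month}

print(calendar_features(date(2016, 1, 1)))  # a Friday:   Weekend 0, DayOfWeek 4
print(calendar_features(date(2016, 1, 2)))  # a Saturday: Weekend 1, DayOfWeek 5
```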


In [None]:
fig,(ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize=(20,10))
plt.subplot(121)
# plt.figure(figsize=(7,4))
ax1 = sns.countplot(x=df_off_coupon['Weekend'])
ax1.set_xticklabels(['Weekdays','Weekend'],rotation=90)
for p in ax1.patches:
        ax1.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))

plt.title('Number of Releases and Weekends')

plt.subplot(122)
# plt.figure(figsize=(7,4))
ax2 = sns.countplot(x=df_off_redeem_coupon['Weekend_p'])
ax2.set_xticklabels(['Weekdays','Weekend'],rotation=90)
for p in ax2.patches:
        ax2.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.title('Count of Redemption and Weekends')

plt.show()

In [None]:
fig,(ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize=(20,10))
plt.subplot(121)
# plt.figure(figsize=(7,4))
ax1 = sns.countplot(x=df_off_coupon['DayOfWeek'])
ax1.set_xticklabels(['MON','TUE','WED','THUR','FRI','SAT','SUN'],rotation=90)
for p in ax1.patches:
        ax1.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))

plt.title('Number of Releases and Days')

plt.subplot(122)
# plt.figure(figsize=(7,4))
ax2 = sns.countplot(x=df_off_redeem_coupon['DayOfWeek_p'])
ax2.set_xticklabels(['MON','TUE','WED','THUR','FRI','SAT','SUN'],rotation=90)
for p in ax2.patches:
        ax2.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.title('Count of Redemption and Days')

plt.show()

In [None]:
fig,(ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize=(20,10))
plt.subplot(121)
# plt.figure(figsize=(7,4))
ax1 = sns.countplot(x=df_off_coupon['Month'])
ax1.set_xticklabels(['JAN','FEB','MAR','APR','MAY','JUN'],rotation=90)
for p in ax1.patches:
        ax1.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))

plt.title('Number of Releases and Months')

plt.subplot(122)
# plt.figure(figsize=(7,4))
ax2 = sns.countplot(x=df_off_redeem_coupon['Month_p'])
ax2.set_xticklabels(['JAN','FEB','MAR','APR','MAY','JUN'],rotation=90)
for p in ax2.patches:
        ax2.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.title('Count of Redemption and Months')

plt.show()

## Feature Engineering