<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Initialize" data-toc-modified-id="Initialize-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Initialize</a></span><ul class="toc-item"><li><span><a href="#Import-Stuff" data-toc-modified-id="Import-Stuff-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Import Stuff</a></span></li><li><span><a href="#Environmental-Stuff" data-toc-modified-id="Environmental-Stuff-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Environmental Stuff</a></span></li><li><span><a href="#Functions" data-toc-modified-id="Functions-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Functions</a></span><ul class="toc-item"><li><span><a href="#Named" data-toc-modified-id="Named-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Named</a></span></li></ul></li><li><span><a href="#Load-Pickles" data-toc-modified-id="Load-Pickles-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Load Pickles</a></span></li></ul></li><li><span><a href="#Scoring-Scheme" data-toc-modified-id="Scoring-Scheme-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Scoring Scheme</a></span></li></ul></div>

# Initialize

## Import Stuff

In [1]:
import pickle as pkl
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings

## Environmental Stuff

In [2]:
warnings.filterwarnings('ignore')
sns.set(font_scale=1.5)
%matplotlib inline

## Functions

### Named

In [3]:
def make_figure(figsize=(12,9)):
    fig = plt.figure(figsize=figsize)
    ax = plt.gca()
    return fig, ax

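def score_category(in_df=None, category=None, total_points=5):
    """Add a `<category>_score` column (1..total_points) to in_df."""
    # Rank users by the category value, ascending
    df = in_df[['user_id', category]].copy().sort_values(category)
    df[category + '_score'] = 0
    unique_sorted = sorted(df[category].unique())
    no_unique = len(unique_sorted)
    # Segment width comes from the number of unique values,
    # while the slices below index rows of df
    segments = no_unique / total_points

    for i in np.arange(1, total_points + 1):
        start = int(segments * (i - 1))
        end = int(segments * i)
        if i < total_points:
            df.iloc[start:end, -1] = i
        else:
            # The last segment takes every remaining row
            df.iloc[start:, -1] = total_points
    # Attach the new score column to the input frame
    out_df = pd.merge(in_df, df.drop(category, axis=1), on='user_id')

    return out_df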

## Load Pickles

In [4]:
with open('../pickles/customers.pkl','rb') as fin:
    customers_df = pkl.load(fin)
with open('../pickles/order_products.pkl','rb') as fin:
    order_products_df = pkl.load(fin)

# Scoring Scheme

**Goal:** Segment customers into value levels 0-2 (2 = most valuable).

***Assumption:*** The most valuable customers are generally very consistent in their orders:
- High ratio of reordered items
- High purchase rate
- Long lifetime

**Scoring Scheme:** For each category, users are ranked in ascending order. Each ordered list is split into 5 equal parts, and users in the k-th part receive a score of k (1 = lowest, 5 = highest).
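
As a rough illustration of the scheme, the sketch below scores a toy column. The `toy_df` frame and its values are made up for the example, and `pd.qcut` on ranks is only one possible way to split an ordered list into five equal parts; it is not necessarily how `score_category` behaves.

```python
import pandas as pd

# Toy data (hypothetical values, for illustration only)
toy_df = pd.DataFrame({
    'user_id': range(1, 11),
    'lifetime': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Rank users in ascending order; method='first' breaks ties by position
# so that every user gets a distinct rank.
ranks = toy_df['lifetime'].rank(method='first')

# Split the ranked list into 5 equal-sized parts and label them 1..5.
toy_df['lifetime_score'] = pd.qcut(ranks, q=5, labels=[1, 2, 3, 4, 5]).astype(int)

print(toy_df)
```

With ten users, each of the five parts holds two users, so the two shortest lifetimes score 1 and the two longest score 5.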

In [5]:
customer_cat_df = (customers_df
                   .copy()
                   .drop(['avg_dow', 'avg_hour', 'avg_days_since', 'var_dow', 'var_hour', 
                          'var_days_since', 'total_orders', 'avg_cart_size', 'var_cart_size', 
                          'avg_consistency_rate', 'var_consistency_rate', 'var_purchase_rate',]
                         ,axis=1)
                  )

categories = 'lifetime ratio_reordered avg_purchase_rate'.split()
for category in categories:
    customer_cat_df = score_category(in_df=customer_cat_df,category=category,total_points=5)
customer_cat_df.drop(categories,axis=1,inplace=True)
# Overall score is the sum of the three category scores (range 3-15)
customer_cat_df['overall_score'] = (customer_cat_df.lifetime_score 
                                    + customer_cat_df.ratio_reordered_score 
                                    + customer_cat_df.avg_purchase_rate_score)
customer_cat_df.head()

| | user_id | lifetime_score | ratio_reordered_score | avg_purchase_rate_score | overall_score |
|---|---|---|---|---|---|
| 0 | 187545 | 5 | 4 | 5 | 14 |
| 1 | 25444 | 5 | 4 | 5 | 14 |
| 2 | 34867 | 1 | 2 | 5 | 8 |
| 3 | 172424 | 5 | 1 | 5 | 11 |
| 4 | 162611 | 5 | 5 | 5 | 15 |


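**NOTE:** Too many customers receive high scores. The segment boundaries in `score_category` are computed from the number of unique values but applied to row positions, so every row past the last boundary falls into the top segment. Scores need to be determined from the list of unique values instead.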
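
One way to score from the unique values is sketched below. The helper name `score_by_unique` and the toy series are hypothetical, and the boundary arithmetic is an assumption about what "use unique list" should mean, not a tested replacement for `score_category`.

```python
import numpy as np
import pandas as pd

def score_by_unique(values, total_points=5):
    """Map each value to a score 1..total_points based on its position
    among the sorted unique values, so tied values always share a score."""
    unique_sorted = np.sort(values.unique())
    # Position of each unique value (0..n-1), scaled into 1..total_points
    positions = np.arange(len(unique_sorted))
    unique_scores = positions * total_points // len(unique_sorted) + 1
    score_map = dict(zip(unique_sorted, unique_scores))
    return values.map(score_map).astype(int)

# Toy series with heavy ties at both ends (hypothetical values)
toy = pd.Series([1, 1, 1, 1, 2, 3, 4, 5, 5, 5])
print(score_by_unique(toy).tolist())
```

Because the score is looked up per value rather than per row, a value shared by many customers no longer pushes the rows after it into a higher segment.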

In [15]:
len(customers_df.query('lifetime > 300'))

4093