## Marketing STP
Segmentation, Targeting and Positioning

This notebook covers the **`Positioning`** part of the sequence. The analysis is divided into three main study areas:
1. Purchase incident
2. Brand choice
3. Purchase quantity


## Overview

- The dataset is first segmented using the model defined in `Segmentaion.ipynb`.
- Logistic regression and linear regression are then applied to analyze and forecast purchases.
- More details about the data can be found in `data/purchase data legend.xlsx`.

---
## Libraries

In [2]:
import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

import os
import pickle

## Segmentation

### Import the segmentation method

In [10]:
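def import_pickle_data(filename):
    """Load a pickled object from the ./obj_data directory."""
    path = os.path.join(os.getcwd(), 'obj_data', filename)
    with open(path, 'rb') as f:  # 'rb' = read in binary mode; the file is closed on exit
        return pickle.load(f)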

In [11]:
scaler = import_pickle_data('scaler.pickle')
pca = import_pickle_data('pca.pickle')
kmeans_pca = import_pickle_data('kmeans_pca.pickle')
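
The three objects loaded above are only useful because pickling preserves their fitted parameters. The segmentation notebook is not shown here, so the following is a minimal sketch of such a save-and-load round trip, assuming synthetic data and a hypothetical temporary file (`scaler_demo.pickle`) rather than the real `obj_data` files:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7 customer features (assumption, not the real data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 7))

fitted_scaler = StandardScaler().fit(X_demo)

# Hypothetical location; the notebook itself reads from ./obj_data.
demo_path = os.path.join(tempfile.mkdtemp(), 'scaler_demo.pickle')

with open(demo_path, 'wb') as f:  # 'wb' = write in binary mode
    pickle.dump(fitted_scaler, f)

with open(demo_path, 'rb') as f:  # 'rb' = read in binary mode
    restored_scaler = pickle.load(f)

# The restored object carries the same fitted statistics as the original.
same_params = np.allclose(fitted_scaler.mean_, restored_scaler.mean_)
```

Note that pickled scikit-learn objects are generally only safe to load with the same library version that wrote them, and only from a trusted source.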

### Loading and assessing the data
The data has already been cleaned.

In [12]:
def import_csv(filePath) :
    return pd.read_csv(filePath)

In [14]:
filePath_purchase = os.path.join(os.getcwd(), 'data', 'purchase data.csv')
df_purchase = import_csv(filePath_purchase)

In [17]:
df_purchase.head()

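| | ID | Day | Incidence | Brand | Quantity | Last_Inc_Brand | Last_Inc_Quantity | Price_1 | Price_2 | Price_3 | ... | Promotion_3 | Promotion_4 | Promotion_5 | Sex | Marital status | Age | Education | Income | Occupation | Settlement size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200000001 | 1 | 0 | 0 | 0 | 0 | 0 | 1.59 | 1.87 | 2.01 | ... | 0 | 0 | 0 | 0 | 0 | 47 | 1 | 110866 | 1 | 0 |
| 1 | 200000001 | 11 | 0 | 0 | 0 | 0 | 0 | 1.51 | 1.89 | 1.99 | ... | 0 | 0 | 0 | 0 | 0 | 47 | 1 | 110866 | 1 | 0 |
| 2 | 200000001 | 12 | 0 | 0 | 0 | 0 | 0 | 1.51 | 1.89 | 1.99 | ... | 0 | 0 | 0 | 0 | 0 | 47 | 1 | 110866 | 1 | 0 |
| 3 | 200000001 | 16 | 0 | 0 | 0 | 0 | 0 | 1.52 | 1.89 | 1.98 | ... | 0 | 0 | 0 | 0 | 0 | 47 | 1 | 110866 | 1 | 0 |
| 4 | 200000001 | 18 | 0 | 0 | 0 | 0 | 0 | 1.52 | 1.89 | 1.99 | ... | 0 | 0 | 0 | 0 | 0 | 47 | 1 | 110866 | 1 | 0 |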


### `1` Standardization
The loaded `StandardScaler` expects 7 features as input: the customer attributes in the last 7 columns of the dataset.

In [23]:
df_purchase.columns.values[-7:]  # the 7 features to be standardized

array(['Sex', 'Marital status', 'Age', 'Education', 'Income',
       'Occupation', 'Settlement size'], dtype=object)

In [20]:
features_to_include = df_purchase.columns.values[-7:]
df_purchase_std = scaler.transform(df_purchase[features_to_include])

In [22]:
df_purchase_std.shape 

(58693, 7)
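
Because the scaler was fitted elsewhere, `transform` reuses the stored mean and standard deviation instead of recomputing them on the purchase data. A minimal sketch of that behaviour, assuming synthetic data and a freshly fitted `demo_scaler` in place of the pickled one:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins (assumption): "training" rows and "new" rows with 7 features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 7))
X_new = rng.normal(loc=5.0, scale=2.0, size=(50, 7))

demo_scaler = StandardScaler().fit(X_train)

# transform() applies the statistics learned during fit().
X_new_std = demo_scaler.transform(X_new)

# Equivalent manual z-score using the stored per-feature mean and scale.
manual = (X_new - demo_scaler.mean_) / demo_scaler.scale_
matches = np.allclose(X_new_std, manual)
```

One consequence is that the standardized purchase data need not have exactly zero mean and unit variance, since the statistics come from the data the scaler was originally fitted on.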

### `2` Dimensionality reduction (PCA)

In [24]:
df_purchase_pca = pca.transform(df_purchase_std)

In [25]:
df_purchase_pca.shape  # the 7 features are reduced to 3 components

(58693, 3)
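
The drop from 7 columns to 3 comes from projecting each row onto the principal components stored in the fitted PCA object. A minimal sketch of that projection, assuming synthetic standardized data and a `demo_pca` fitted here with `n_components=3` (the settings of the pickled model are not shown in this notebook):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for standardized features (assumption, not the real data).
rng = np.random.default_rng(0)
X_std = rng.normal(size=(200, 7))

demo_pca = PCA(n_components=3).fit(X_std)
scores = demo_pca.transform(X_std)  # shape (200, 3)

# Without whitening, transform() centers the data and projects it
# onto the component vectors.
manual_scores = (X_std - demo_pca.mean_) @ demo_pca.components_.T
projection_matches = np.allclose(scores, manual_scores)

# Share of total variance retained by the 3 components.
explained = demo_pca.explained_variance_ratio_.sum()
```

On the real model, `pca.explained_variance_ratio_` would show how much information the 3 retained components keep.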

### `3` K-means clustering on the PCA components

In [26]:
purchase_cluster_kmeans_pca = kmeans_pca.predict(df_purchase_pca)
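
Calling `predict` on a fitted K-means model does not re-cluster the data; it assigns each row to the nearest existing centroid. A minimal sketch of that assignment, assuming synthetic component scores and a `demo_kmeans` fitted here with 4 clusters (the hyperparameters of the pickled model are an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for PCA scores (assumption, not the real data).
rng = np.random.default_rng(0)
train_scores = rng.normal(size=(300, 3))
new_scores = rng.normal(size=(20, 3))

demo_kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(train_scores)

# predict() labels new rows using the centroids learned during fit().
labels = demo_kmeans.predict(new_scores)

# Manual equivalent: Euclidean distance to every centroid, then pick the closest.
distances = np.linalg.norm(
    new_scores[:, None, :] - demo_kmeans.cluster_centers_[None, :, :], axis=2
)
nearest = distances.argmin(axis=1)
agrees = np.array_equal(labels, nearest)
```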

In [30]:
segment = purchase_cluster_kmeans_pca # clustered
df_purchase_predict = df_purchase.copy()
df_purchase_predict['Segment'] = segment

In [31]:
df_purchase_predict['Segment'].value_counts()

0    21526
1    13677
2    12123
3    11367
Name: Segment, dtype: int64
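
These counts are per row of the purchase data, and the same customer `ID` appears on many rows, so they describe store visits rather than unique customers. A minimal sketch of the difference, using a small hypothetical frame (`toy`) instead of the real data:

```python
import pandas as pd

# Hypothetical toy data: customer 1 has 3 rows, customer 2 has 2, customer 3 has 1.
toy = pd.DataFrame({
    'ID':      [1, 1, 1, 2, 2, 3],
    'Segment': [0, 0, 0, 1, 1, 0],
})

# Rows per segment (what value_counts() reports).
rows_per_segment = toy['Segment'].value_counts().sort_index()

# Unique customers per segment.
customers_per_segment = toy.groupby('Segment')['ID'].nunique()

# Share of rows per segment.
row_share = toy['Segment'].value_counts(normalize=True).sort_index()
```

On the notebook's data, the analogous call would be `df_purchase_predict.groupby('Segment')['ID'].nunique()`.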