# Assumptions & Considerations

## Columns

- The Quantity, Rate and Total Price columns are populated only for purchase events.
- Rate is the price per unit of product.
- Total Price is the quantity multiplied by the rate.
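
The two column assumptions above can be checked with a short sanity test. The sketch below runs on invented toy rows rather than the real file, and it assumes that purchases are labelled `"purchase"` in the Action column; adjust the label if the data uses a different one.

```python
import numpy as np
import pandas as pd

# Toy rows that mimic the columns described above; all values are invented.
toy = pd.DataFrame({
    "Action": ["search", "add_to_cart", "purchase", "purchase"],
    "Quantity": [np.nan, np.nan, 2.0, 1.0],
    "Rate": [np.nan, np.nan, 150.0, 999.0],
    "Total Price": [np.nan, np.nan, 300.0, 999.0],
})

price_cols = ["Quantity", "Rate", "Total Price"]
# Assumed label for purchase events (not confirmed by the source).
is_purchase = toy["Action"].eq("purchase")

# Check 1: the price columns are filled on purchase rows and empty elsewhere.
filled = toy[price_cols].notna().all(axis=1)
empty = toy[price_cols].isna().all(axis=1)
only_on_purchase = bool((filled == is_purchase).all() and (empty == ~is_purchase).all())

# Check 2: Total Price equals Quantity * Rate on purchase rows.
expected_total = toy["Quantity"] * toy["Rate"]
totals_match = bool(
    np.isclose(toy.loc[is_purchase, "Total Price"], expected_total[is_purchase]).all()
)

print(only_on_purchase, totals_match)
```

Running the same checks on the real data would show quickly whether either assumption is violated by some rows.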

## Propensity Models

- Propensity models are predictive models that estimate a user's future behavior from their past behavior.
- They help us create customized campaigns for users.

1. Data time range: 1 year (2019).
2. Trigger-based modelling approach: what is the propensity to buy after a user performs the intended action (here, adding a product to the cart)?
3. Only users who added products to the cart are considered in this analysis (users who bought directly are ignored because there is no trigger).
4. RFM (recency, frequency, monetary) features are used to improve model performance for existing users.
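
One way to picture the trigger-based setup is to build a target that marks, for every add-to-cart event, whether a purchase followed. The sketch below is only an illustration on invented events. It assumes the action labels `"add_to_cart"` and `"purchase"`, and it assumes that a purchase counts as a conversion when it comes from the same user, in the same sub-category, at or after the trigger time. None of these choices is confirmed by the source.

```python
import pandas as pd

# Invented events; "add_to_cart" and "purchase" are assumed action labels.
events = pd.DataFrame({
    "User_id": [1, 1, 1, 2, 2, 3],
    "DateTime": pd.to_datetime([
        "2019-01-10 10:00", "2019-01-10 10:05", "2019-01-10 10:09",
        "2019-02-01 09:00", "2019-02-01 09:03",
        "2019-03-05 12:00",
    ]),
    "SubCategory": ["Jeans", "Jeans", "Jeans", "Speakers", "Speakers", "Jeans"],
    "Action": ["search", "add_to_cart", "purchase",
               "product_view", "add_to_cart",
               "purchase"],
})

# Triggers are the add-to-cart events; user 3 bought directly and has none.
carts = events[events["Action"] == "add_to_cart"]
purchases = events[events["Action"] == "purchase"]

# Pair each trigger with the same user's purchases in the same sub-category
# (the matching key is an assumption made for this sketch).
pairs = carts.merge(
    purchases,
    on=["User_id", "SubCategory"],
    how="left",
    suffixes=("_cart", "_buy"),
)

# A trigger converts if a matching purchase happens at or after it.
# Missing purchases (NaT) compare as False.
pairs["converted"] = pairs["DateTime_buy"] >= pairs["DateTime_cart"]

target = (
    pairs.groupby(["User_id", "SubCategory", "DateTime_cart"])["converted"]
    .any()
    .astype(int)
    .reset_index(name="purchased")
)
print(target)
```

Because the frame is built from the add-to-cart rows, users who only made a direct purchase never enter the target, which mirrors item 3 above.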

# Libraries

In [6]:
import inflection 
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

import warnings 
warnings.filterwarnings('ignore')

# Loading Data

In [2]:
data = pd.read_excel('data/final_customer_data.xlsx')
data.head()

| | User_id | Session_id | DateTime | Category | SubCategory | Action | Quantity | Rate | Total Price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 52243841613 | d76fde-8bb3-4e00-8c23 | 2019-01-10 10:20:00 | Electronic Appliances | Speakers | first_app_open | NaN | NaN | NaN |
| 1 | 52243841613 | 33dfbd-b87a-4708-9857 | 2019-01-10 10:22:00 | Electronic Appliances | Speakers | search | NaN | NaN | NaN |
| 2 | 57314161118 | 6511c2-e2e3-422b-b695 | 2019-01-10 14:00:00 | Men's Fashion | Jeans | search | NaN | NaN | NaN |
| 3 | 57314161118 | 90fc70-0e80-4590-96f3 | 2019-01-10 14:07:00 | Men's Fashion | Jeans | product_view | NaN | NaN | NaN |
| 4 | 57314161118 | bd7419-2748-4c56-95b4 | 2019-01-10 14:12:00 | Men's Fashion | Jeans | read_reviews | NaN | NaN | NaN |


# 1. Data Description

In [3]:
df1 = data.copy()

## Data Fields Meaning

- User_id: unique identifier for each user.

- Session_id: unique identifier generated each time a user enters the app; it expires when the user exits the app.

- DateTime: timestamp when a particular action is performed. 

- Category: product category.

- SubCategory: product subcategory.

- Action: the event, i.e. the action the user performs in the app, such as product view, read reviews, add to cart or purchase.

- Quantity: number of products ordered.

- Rate: price per unit of product.

- Total Price: the quantity multiplied by the rate.

## 1.1. Rename Columns

We are going to rename the columns, which mix capitalized, camel-case and space-separated names, to snake case.

In [4]:
df1.columns

Index(['User_id', 'Session_id', 'DateTime', 'Category', 'SubCategory',
       'Action', 'Quantity', 'Rate', 'Total Price'],
      dtype='object')

In [9]:
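# 'Total Price' is written here as 'TotalPrice' (no space) because
# inflection.underscore() does not turn spaces into underscores.
cols_old = ['User_id', 'Session_id', 'DateTime', 'Category', 'SubCategory',
            'Action', 'Quantity', 'Rate', 'TotalPrice']

# convert each name to snake case
cols_new = [inflection.underscore(col) for col in cols_old]

# rename
df1.columns = cols_new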

In [10]:
df1.columns

Index(['user_id', 'session_id', 'date_time', 'category', 'sub_category',
       'action', 'quantity', 'rate', 'total_price'],
      dtype='object')
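
If `inflection` is not available, or if retyping the column names by hand is undesirable, a small regex helper gives a similar result. The function `to_snake_case` below is a hypothetical helper written for this sketch, not part of `inflection` or pandas. It also turns whitespace into underscores, so `'Total Price'` can be passed as-is.

```python
import re

def to_snake_case(name: str) -> str:
    """Hypothetical helper: turn 'DateTime' or 'Total Price' into snake_case."""
    # Replace runs of whitespace or hyphens with a single underscore.
    name = re.sub(r"[\s\-]+", "_", name.strip())
    # Split an acronym from a following capitalized word: 'HTTPServer' -> 'HTTP_Server'.
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital: 'DateTime' -> 'Date_Time'.
    name = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", name)
    return name.lower()

# Column names exactly as they appear in the loaded data.
original_cols = ['User_id', 'Session_id', 'DateTime', 'Category', 'SubCategory',
                 'Action', 'Quantity', 'Rate', 'Total Price']

new_cols = [to_snake_case(col) for col in original_cols]
print(new_cols)
```

With a DataFrame at hand, the same helper could be applied through `df1.columns = [to_snake_case(c) for c in df1.columns]`, which avoids maintaining a hand-written copy of the column list.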

## 1.2. Data Dimensions

In [12]:
print("Number of rows: {}".format(df1.shape[0]))
print("Number of cols: {}".format(df1.shape[1]))

Number of rows: 2090
Number of cols: 9


## 1.3. Data Types

In [13]:
df1.dtypes

user_id           int64
session_id       object
date_time        object
category         object
sub_category     object
action           object
quantity        float64
rate            float64
total_price     float64
dtype: object