### P&G analysis document

##### Deadline
Nov 9, 2016

##### Scope
Categories: [From Brian] 
- 1st priority: Hair Treatments, Shampoo, Conditioner 
- 2nd priority: Toothpaste, Face/Neck Care.  See below for more details

Items: products with a P&G brand. The exact item scope is still an outstanding question.

##### Definitions 
- $k$ = category
- $i$ = index of a product
- $M_k$ = number of products in category $k$
- $c30_1$ = aggregated c30 number from mv_simtxn
- $c30_2$ = allcomments_today - allcomments_30DaysAgo
- $c$ = a more reliable c30 estimate, produced by combining $c30_1$ and $c30_2$
- $X^{(k)}_{etype}$    = (features for stackers)
- $\widetilde{X}^{(k)}_{etype}$    = $(c30_1, c30_2)$
- $y^{(k)}_i$    = $q30_i$
- $\widetilde{y}^{(k)}$    = $\displaystyle\sum_{i=1}^{M_k} q30_i$
- $( \alpha^{(k)}_1 , \beta^{(k)}_1 )$ = parameters for Beta distribution fit to Tmall's q30 data on a particular month, category $k$
- $( \alpha^{(k)}_2 , \beta^{(k)}_2 )$ = parameters for Beta distribution fit to Tmall's products (from 1 to n=30) sorted by c30 descending
- $( \alpha^{(k)}_3 , \beta^{(k)}_3 )$ = parameters for Beta distribution fit to Tmall's products (from n=30 to $M_k$) sorted by c30 descending

##### For each category $k$:

- Verify that the inputs $X^{(k)}_{YHD} , \widetilde{X}^{(k)}_{YHD}$ look similar to $X^{(k)}_{Tmall} , \widetilde{X}^{(k)}_{Tmall}$

- Run $X^{(k)}_{YHD}$ through the stacker models and check that the output looks reasonable

- Create a list of $( \alpha_k , \beta_k )$, one pair per month.  NOTE: across months, the parameters should look like highly correlated normal distributions sitting on a $y=mx$ line

- For each category in YHD, create a table of source categories from Tmall

- Run Monte Carlo using Methods 1 and 2 specified below
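
The monthly parameter list and the $y=mx$ check from the steps above can be sketched as follows. This is only an illustration under stated assumptions: it fits the Beta parameters by the method of moments (not necessarily the fitting procedure intended here), it reads the line as $\beta = m\alpha$, and the month labels and synthetic `r` samples are made up to stand in for Tmall data.

```python
import random
from statistics import fmean, variance


def fit_beta_moments(r_values):
    """Method-of-moments estimate of (alpha, beta) for samples in (0, 1)."""
    mean = fmean(r_values)
    var = variance(r_values)  # sample variance
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("sample is not compatible with a Beta distribution")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def slope_through_origin(pairs):
    """Least-squares slope m for beta = m * alpha (no intercept)."""
    return sum(a * b for a, b in pairs) / sum(a * a for a, _ in pairs)


# Synthetic stand-in for Tmall: r = c30 / q30 per product, one list per month.
# The month labels and "true" parameters are hypothetical.
rng = random.Random(0)
true_params = {"2016-07": (2.0, 8.0), "2016-08": (2.5, 10.0), "2016-09": (3.0, 12.0)}
r_by_month = {
    month: [rng.betavariate(a, b) for _ in range(5000)]
    for month, (a, b) in true_params.items()
}

# One (alpha, beta) pair per month, then the slope of the line they sit on.
monthly_params = {month: fit_beta_moments(r) for month, r in r_by_month.items()}
m = slope_through_origin(list(monthly_params.values()))

for month, (alpha, beta) in monthly_params.items():
    print(f"{month}: alpha={alpha:.2f}, beta={beta:.2f}")
print(f"slope m = {m:.2f}")
```

If the fitted pairs scatter far from a single line through the origin, that would be a sign that the month-to-month variation is not as simple as the note above expects.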

##### Monte Carlo methods:

The assumption is that category sizes can be simulated using the following model: $\sum q30 = \sum \frac{c30}{r}$, where $r = \frac{c30}{q30}$ is the ratio of comments to sales.

We know that in general,

$$r \sim Beta( \alpha, \beta)$$

So we'll run a Monte Carlo simulation over this distribution.  We also don't know for sure that the model fit on Tmall carries over to YHD, so rather than fixing $\alpha$ and $\beta$ we will vary them according to how they fluctuate in Tmall.  The output will therefore be a distribution of possible $\sum q30$ values, under the assumptions we've made in this model.
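
A minimal sketch of this simulation, using only the standard library. Several choices here are assumptions rather than part of the plan: one ratio $r$ is drawn per product, parameter uncertainty is modelled by picking one of the observed monthly $(\alpha, \beta)$ fits at random on each draw, a small floor `min_r` guards against division by a near-zero ratio, and all input numbers are invented.

```python
import random


def simulate_category_totals(c30_values, monthly_params, n_draws=1000, seed=0, min_r=1e-6):
    """Monte Carlo draws of sum(q30) = sum(c30 / r) with r ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        # Parameter uncertainty: reuse one of the observed monthly fits at random.
        alpha, beta = rng.choice(monthly_params)
        total = 0.0
        for c30 in c30_values:
            # One ratio per product; the floor avoids dividing by a draw of ~0.
            r = max(rng.betavariate(alpha, beta), min_r)
            total += c30 / r
        totals.append(total)
    return totals


def percentile(sorted_values, p):
    """Simple index-based percentile of an already sorted list, p in [0, 100]."""
    idx = min(len(sorted_values) - 1, int(p / 100.0 * len(sorted_values)))
    return sorted_values[idx]


# Hypothetical inputs: c30 per YHD product and (alpha, beta) fits from Tmall months.
c30_values = [120, 80, 45, 30, 12, 7, 3, 1]
monthly_params = [(2.0, 8.0), (2.5, 10.0), (3.0, 12.0)]

totals = sorted(simulate_category_totals(c30_values, monthly_params, n_draws=2000))
print("5th / 50th / 95th percentile of sum(q30):",
      [round(percentile(totals, p)) for p in (5, 50, 95)])
```

Because $r$ lies in $(0, 1)$, every simulated total is at least $\sum c30$, and small draws of $r$ give the distribution a long right tail, so percentiles are likely a more stable summary than the mean.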

Additionally, since the data quality of c30 tends to worsen as the number of sales or comments increases, we'll account for this uncertainty by using two methods:

- Method 1: fit a single $( \alpha^{(k)}_1 , \beta^{(k)}_1 )$ on all of r
- Method 2: fit $( \alpha^{(k)}_2 , \beta^{(k)}_2 )$ on r's head (the top n=30 products by c30) and $( \alpha^{(k)}_3 , \beta^{(k)}_3 )$ on r's tail (the remaining products)
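
Method 2 can be sketched as a two-segment version of the same simulation. This is a hedged illustration: it assumes the product ranked exactly 30th belongs to the head, it uses fixed head and tail parameters instead of varying them, and the c30 profile and parameter values are hypothetical.

```python
import random


def split_head_tail(c30_values, n_head=30):
    """Sort c30 descending and split into the top n_head products and the rest."""
    ranked = sorted(c30_values, reverse=True)
    return ranked[:n_head], ranked[n_head:]


def simulate_two_segment_totals(c30_values, head_params, tail_params,
                                n_head=30, n_draws=1000, seed=0, min_r=1e-6):
    """Monte Carlo draws of sum(c30 / r) with separate Beta laws for head and tail."""
    rng = random.Random(seed)
    head, tail = split_head_tail(c30_values, n_head)
    segments = [(head, head_params), (tail, tail_params)]
    totals = []
    for _ in range(n_draws):
        total = 0.0
        for products, (alpha, beta) in segments:
            for c30 in products:
                # Floor the ratio so a draw of ~0 cannot cause a division error.
                r = max(rng.betavariate(alpha, beta), min_r)
                total += c30 / r
        totals.append(total)
    return totals


# Hypothetical category: 200 products with a long-tailed c30 profile.
c30_values = [1000 // rank for rank in range(1, 201)]
totals = simulate_two_segment_totals(
    c30_values,
    head_params=(4.0, 6.0),   # stands in for (alpha_2, beta_2)
    tail_params=(2.0, 8.0),   # stands in for (alpha_3, beta_3)
    n_draws=500,
)
print(f"mean simulated sum(q30): {sum(totals) / len(totals):.0f}")
```

Passing the same parameters for head and tail reduces this to Method 1, which gives a quick way to compare the two methods on the same category.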

##### Working directory:

Update the IPython notebooks in this directory, and use this notebook as a template for your work.  Copy it, and rename the copy to `[category_name].ipynb` (don't use spaces in the name).

##### Scope
priority 1: see below table

```python
import pandas as pd

df = pd.read_csv('data.csv', encoding='gbk')  # GBK: the table contains Chinese category names
priority_1 = df.compass_cat_en.isin(["Conditioner", "Shampoo", "Hair Treatments"]) & df.platform_name_en.isin(["YHD", "Tmall"])
df.loc[priority_1, ["compass_cat_en", "platform_name_en", "platform_categ_l3_id", "platform_categ_l3_cn", "count_items"]]
```

| row index | compass_cat_en | platform_name_en | platform_categ_l3_id | platform_categ_l3_cn | count_items |
|---|---|---|---|---|---|
| 415 | Shampoo | Tmall | 10030101 | 洗发水 | 4618 |
| 416 | Shampoo | YHD | 20010401 | 洗发水 | 6251 |
| 426 | Conditioner | Tmall | 10030201 | 护发素 | 1660 |
| 428 | Conditioner | Tmall | 10030301 | 发膜/倒模 | 1014 |
| 430 | Conditioner | Tmall | 10030401 | 免洗护发素 | 361 |
| 432 | Conditioner | Tmall | 10030501 | 洗发护发套装 | 1257 |
| 433 | Conditioner | YHD | 20010402 | 护发/润发 | 5303 |
| 459 | Hair Treatments | Tmall | 10030201 | 护发素 | 1660 |
| 461 | Hair Treatments | Tmall | 10030301 | 发膜/倒模 | 1014 |
| 463 | Hair Treatments | Tmall | 10030401 | 免洗护发素 | 361 |


priority 2: see below table

```python
priority_2 = df.compass_cat_en.isin(["Face/Neck Care", "Toothpaste"]) & df.platform_name_en.isin(["YHD", "Tmall"])
df.loc[priority_2, ["compass_cat_en", "platform_name_en", "platform_categ_l3_id", "platform_categ_l3_cn", "count_items"]]
```

| row index | compass_cat_en | platform_name_en | platform_categ_l3_id | platform_categ_l3_cn | count_items |
|---|---|---|---|---|---|
| 91 | Face/Neck Care | Tmall | 10010101 | 洁面 | 7439 |
| 93 | Face/Neck Care | Tmall | 10010201 | 面部磨砂/去角质 | 1100 |
| 95 | Face/Neck Care | Tmall | 10010301 | 面部按摩霜 | 422 |
| 97 | Face/Neck Care | Tmall | 10010401 | 化妆水/爽肤水 | 6622 |
| 99 | Face/Neck Care | Tmall | 10010501 | 面部精华 | 1988 |
| 101 | Face/Neck Care | Tmall | 10010601 | 乳液/面霜 | 11139 |
| 103 | Face/Neck Care | Tmall | 10010701 | 面膜 | 6201 |
| 105 | Face/Neck Care | Tmall | 10010801 | 面部护理套装 | 12580 |
| 107 | Face/Neck Care | Tmall | 10010901 | 唇部护理 | 1828 |
| 109 | Face/Neck Care | Tmall | 10011001 | T区护理 | 920 |
