### Notebook - 1e (store_open feature)

This notebook adds an `open_store` column so that stores still in operation can be separated from closed stores in the churn analysis.

In [1]:
# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns',95)
pd.set_option('display.max_rows',None)
import pickle
import re
import datetime

In [2]:
# Opening the pickled file (the with-block closes the file handle automatically)
with open('df3.pkl', 'rb') as f:
    df3 = pickle.load(f)

In [3]:
# Checking file
df3.shape

(20095649, 10)

In [4]:
# Checking file
df3.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20095649 entries, 0 to 20095648
Data columns (total 10 columns):
 #   Column               Dtype         
---  ------               -----         
 0   invoice/item_number  category      
 1   date                 datetime64[ns]
 2   store_number         int64         
 3   item_number          category      
 4   item_description     category      
 5   bottles_sold         object        
 6   sale_dollars         object        
 7   volume_sold_liters   float16       
 8   store_name           category      
 9   cat_name             object        
dtypes: category(4), datetime64[ns](1), float16(1), int64(1), object(3)
memory usage: 1.3+ GB


In [5]:
df3.head()

| | invoice/item_number | date | store_number | item_number | item_description | bottles_sold | sale_dollars | volume_sold_liters | store_name | cat_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | S04763500007 | 2012-03-27 | 2534 | 11788 | Black Velvet | 6 | 94.02 | 10.5 | Hy-Vee Drugtown / Urbandale | whiskey |
| 1 | S27474100012 | 2015-08-25 | 4924 | 89194 | Jose Cuervo Especial Reposado Flask | 4 | 33.0 | 1.5 | Abby Lea's | tequila |
| 2 | S10731000040 | 2013-02-21 | 4652 | 34449 | Ketel One Citroen | 2 | 40.48 | 1.5 | Brady Mart Food & Liquor | vodka |
| 3 | S17037900080 | 2014-01-27 | 4794 | 32236 | Seagrams Extra Dry Gin | 1 | 8.99 | 0.75 | Smokin' Joe's #17 Tobacco and Liquor Outlet | gin |
| 4 | S14396900023 | 2013-09-09 | 2647 | 13038 | Canadian Reserve Whisky | 6 | 80.94 | 10.5 | Hy-Vee #7 / Edgewood Cedar Rapids | whiskey |


In [None]:
# Checking for category egg nog 
df3[df3['cat_name'] == 'egg nog']

### Logic for tagging a store as "closed"

To analyze stores correctly, we need to know which stores are no longer in operation. A store is considered "closed" if its last invoice date falls more than 6 months before Dec 1, 2020 (the date this data set was pulled). In the code below that means a last invoice on or before June 1, 2020.
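The six-month cutoff can be derived from the pull date instead of being typed by hand. This is a minimal sketch, assuming the Dec 1, 2020 pull date stated above. The names `extract_date` and `cutoff` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Date the data set was pulled (taken from the text above)
extract_date = pd.Timestamp('2020-12-01')

# Six calendar months earlier; a last invoice after this date counts as "open"
cutoff = extract_date - pd.DateOffset(months=6)
print(cutoff.date())  # 2020-06-01
```

Deriving the cutoff this way keeps the rule in step with the pull date if the data set is refreshed later.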

In [6]:
closed = df3.groupby(['store_number'])['date'].max().reset_index()
closed

| | store_number | date |
|---|---|---|
| 0 | 2106 | 2020-11-27 |
| 1 | 2113 | 2020-02-03 |
| 2 | 2130 | 2020-11-27 |
| 3 | 2132 | 2012-04-23 |
| 4 | 2152 | 2016-03-17 |
| 5 | 2161 | 2012-07-09 |
| 6 | 2178 | 2020-11-27 |
| 7 | 2190 | 2020-11-30 |
| 8 | 2191 | 2020-11-25 |
| 9 | 2200 | 2020-11-30 |


In [7]:
# Tagging each store as open or closed (closed = no invoice after June 1, 2020)
closed['open_store'] = np.where(closed['date'] > np.datetime64('2020-06-01'), 'open', 'closed')

In [8]:
# Checking closed dataframe
closed

| | store_number | date | open_store |
|---|---|---|---|
| 0 | 2106 | 2020-11-27 | open |
| 1 | 2113 | 2020-02-03 | closed |
| 2 | 2130 | 2020-11-27 | open |
| 3 | 2132 | 2012-04-23 | closed |
| 4 | 2152 | 2016-03-17 | closed |
| 5 | 2161 | 2012-07-09 | closed |
| 6 | 2178 | 2020-11-27 | open |
| 7 | 2190 | 2020-11-30 | open |
| 8 | 2191 | 2020-11-25 | open |
| 9 | 2200 | 2020-11-30 | open |
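A quick count of each label is a cheap sanity check on the tagging rule. The sketch below rebuilds only the ten stores shown in the output above and tallies them. The names `closed_demo` and `counts` are illustrative and are not part of the notebook.

```python
import numpy as np
import pandas as pd

# Last invoice dates for the ten stores shown in the output above
closed_demo = pd.DataFrame({
    'store_number': [2106, 2113, 2130, 2132, 2152, 2161, 2178, 2190, 2191, 2200],
    'date': pd.to_datetime([
        '2020-11-27', '2020-02-03', '2020-11-27', '2012-04-23', '2016-03-17',
        '2012-07-09', '2020-11-27', '2020-11-30', '2020-11-25', '2020-11-30',
    ]),
})

# Same rule as the notebook: a last invoice after June 1, 2020 means "open"
closed_demo['open_store'] = np.where(
    closed_demo['date'] > pd.Timestamp('2020-06-01'), 'open', 'closed'
)

# Tally the labels; for these ten stores that is 6 open and 4 closed
counts = closed_demo['open_store'].value_counts()
print(counts)
```

Running the same `value_counts()` call on the full `closed` frame would give the overall split between open and closed stores.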


In [9]:
# Merging the open_store column into our main df3
# (both frames have a 'date' column, so pandas suffixes them as date_x and date_y)
df3 = pd.merge(df3, closed, how = 'left', on = 'store_number')

In [10]:
# Dropping date_y (which is the max date)
df3.drop('date_y', axis = 1, inplace = True)

In [11]:
# Renaming date 
df3.rename(columns = {'date_x' : 'date'}, inplace=True)
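The drop and rename steps can be avoided by merging only the columns that are needed. The following is a sketch on a toy frame, not the notebook's actual data. The names `sales`, `status` and `merged` are hypothetical, and `validate='many_to_one'` is an optional safeguard that raises an error if a store number appears more than once in the lookup frame.

```python
import pandas as pd

# Toy invoice-level frame (stand-in for df3)
sales = pd.DataFrame({
    'store_number': [2106, 2106, 2113],
    'date': pd.to_datetime(['2020-11-27', '2019-05-02', '2020-02-03']),
})

# Toy store-level frame (stand-in for closed), one row per store
status = pd.DataFrame({
    'store_number': [2106, 2113],
    'date': pd.to_datetime(['2020-11-27', '2020-02-03']),
    'open_store': ['open', 'closed'],
})

# Merge only the key and the new label, so no date_x / date_y columns are created
merged = sales.merge(
    status[['store_number', 'open_store']],
    how='left',
    on='store_number',
    validate='many_to_one',
)
print(merged)
```

Because the lookup frame carries no `date` column into the merge, the original `date` column keeps its name and row count is unchanged.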

In [12]:
df3.head()

| | invoice/item_number | date | store_number | item_number | item_description | bottles_sold | sale_dollars | volume_sold_liters | store_name | cat_name | open_store |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | S04763500007 | 2012-03-27 | 2534 | 11788 | Black Velvet | 6 | 94.02 | 10.5 | Hy-Vee Drugtown / Urbandale | whiskey | closed |
| 1 | S27474100012 | 2015-08-25 | 4924 | 89194 | Jose Cuervo Especial Reposado Flask | 4 | 33.0 | 1.5 | Abby Lea's | tequila | closed |
| 2 | S10731000040 | 2013-02-21 | 4652 | 34449 | Ketel One Citroen | 2 | 40.48 | 1.5 | Brady Mart Food & Liquor | vodka | open |
| 3 | S17037900080 | 2014-01-27 | 4794 | 32236 | Seagrams Extra Dry Gin | 1 | 8.99 | 0.75 | Smokin' Joe's #17 Tobacco and Liquor Outlet | gin | open |
| 4 | S14396900023 | 2013-09-09 | 2647 | 13038 | Canadian Reserve Whisky | 6 | 80.94 | 10.5 | Hy-Vee #7 / Edgewood Cedar Rapids | whiskey | open |


In [15]:
# Saving into a new df4
df4 = df3.copy()

In [16]:
# Writing/saving as a pickled file (the with-block closes the file handle automatically)
with open('df4.pkl', 'wb') as f:
    pickle.dump(df4, f)
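Before moving on to the next notebook, it can be worth confirming that a pickled frame loads back unchanged. This is a small sketch on a toy frame written to a temporary directory. The file name `df4_demo.pkl` and the variable names are illustrative only.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy frame standing in for df4
demo = pd.DataFrame({
    'store_number': [2106, 2113],
    'open_store': ['open', 'closed'],
})

# Write to a temporary location so no real file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'df4_demo.pkl')
with open(path, 'wb') as f:
    pickle.dump(demo, f)

# Read it back and compare with the in-memory frame
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored.equals(demo))
```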