### Data

This dataset comes from Kaggle's dataset collections. The loaded file contains 21,293 observations (one row per item sold, as `df.shape` reports below) covering more than 6,000 transactions from a bakery. The data set contains the following columns:

Date. Categorical variable that gives the date of the transaction (YYYY-MM-DD format). The column covers dates from 2016-10-30 to 2017-04-09.

Time. Categorical variable that tells us the time of the transactions (HH:MM:SS format).

Transaction. Quantitative variable that identifies each transaction. Rows that share the same value in this field belong to the same transaction, which is why the data set has fewer transactions than observations.

Item. Categorical variable with the products.

In [1]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Read the dataset
df = pd.read_csv('BreadBasket_DMS.csv')
df.head(1)

Unnamed: 0,Date,Time,Transaction,Item
0,2016-10-30,09:58:11,1,Bread


In [3]:
df.dtypes

Date           object
Time           object
Transaction     int64
Item           object
dtype: object
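
Since `Date` and `Time` are loaded as plain strings (`object`), time-based questions (busiest hour, weekday patterns) need them parsed first. The following is a minimal sketch, assuming the formats described above; the `sample` frame and the `Timestamp`, `Weekday`, and `Hour` column names are illustrative and not part of the original notebook. Only the first row is taken from the data; the other two are invented.

```python
import pandas as pd

# Toy rows in the same layout as the bakery file.
# Row 1 matches df.head(1); rows 2 and 3 are invented for illustration.
sample = pd.DataFrame({
    "Date": ["2016-10-30", "2016-10-30", "2017-04-09"],
    "Time": ["09:58:11", "10:05:34", "14:32:58"],
    "Transaction": [1, 2, 3],
    "Item": ["Bread", "Coffee", "Coffee"],
})

# Combine the two string columns into a single datetime64 column
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"],
    format="%Y-%m-%d %H:%M:%S",
)

# Derived fields that are awkward to compute from plain strings
sample["Weekday"] = sample["Timestamp"].dt.day_name()
sample["Hour"] = sample["Timestamp"].dt.hour

print(sample[["Timestamp", "Weekday", "Hour"]])
```

The same two steps could be applied to `df` directly if a later analysis needs to group sales by day or hour.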

In [4]:
df.shape

(21293, 4)

In [5]:
df.isnull().sum()

Date           0
Time           0
Transaction    0
Item           0
dtype: int64

In [6]:
# Remove rows whose Item is recorded as the string 'NONE'
df.drop(df[df.Item=='NONE'].index,inplace=True)

### Find out the top two most popular items sold in the bakery

In [7]:
df['Item'].value_counts().head(2)

Coffee    5471
Bread     3325
Name: Item, dtype: int64

In [8]:
df['Coffee']=0
df['Bread']=0

In [9]:
df.loc[(df.Item=='Coffee'),'Coffee']=1
df.loc[(df.Item=='Bread'),'Bread']=1
df1=df.loc[(df.Item=='Coffee') | (df.Item=='Bread')]

In [10]:
df1.head(1)

Unnamed: 0,Date,Time,Transaction,Item,Coffee,Bread
0,2016-10-30,09:58:11,1,Bread,0,1


In [11]:
df2=df1[['Transaction','Coffee']]
df2=df2.drop(df2[df2.Coffee==0].index,inplace=False)
df2=df2.drop_duplicates(subset=['Transaction'],keep='first',inplace=False)
df2.head(1)

Unnamed: 0,Transaction,Coffee
7,5,1


In [12]:
df3=df1[['Transaction','Bread']]
df3=df3.drop(df3[df3.Bread==0].index,inplace=False)
df3=df3.drop_duplicates(subset=['Transaction'],keep='first',inplace=False)
df3.head(1)

Unnamed: 0,Transaction,Bread
0,1,1


In [13]:
df4=pd.merge(df2,df3,on='Transaction',how='outer')
df4.shape

(6773, 3)

In [14]:
df4['Both']=0

In [15]:
df4.loc[(df4.Coffee==1)&(df4.Bread==1),'Both']=1
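
After an outer merge, a transaction that appears on only one side has `NaN` in the other side's column, which is why the `Both` flag above only fires when both columns equal 1. As an alternative sketch, `pd.merge` accepts `indicator=True`, which adds a `_merge` column labelling each row as `left_only`, `right_only`, or `both`. The `coffee` and `bread` frames below are toy stand-ins for `df2` and `df3`, with invented transaction numbers.

```python
import pandas as pd

# Toy stand-ins for df2 (coffee transactions) and df3 (bread transactions)
coffee = pd.DataFrame({"Transaction": [1, 2, 3], "Coffee": [1, 1, 1]})
bread = pd.DataFrame({"Transaction": [2, 3, 4, 5], "Bread": [1, 1, 1, 1]})

# indicator=True records which side(s) each transaction came from
merged = pd.merge(coffee, bread, on="Transaction", how="outer", indicator=True)

# The outer merge leaves NaN where a side is missing; make those explicit zeros
merged[["Coffee", "Bread"]] = merged[["Coffee", "Bread"]].fillna(0).astype(int)

# Tally transactions by category: left_only = coffee only, right_only = bread only
counts = merged["_merge"].value_counts()
print(counts)
```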

### The number of customers who bought coffee and bread together

In [16]:
n=len(df4[df4.Both==1])
n

852

### The number of customers who bought coffee, but not bread

In [17]:
len(df2)-n

3676

### The number of customers who bought bread, but not coffee

In [18]:
len(df3)-n

2245
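
The three counts above can also be cross-checked with plain set operations on transaction numbers. This is only a sketch on invented data (the `items` frame is hypothetical), not the method used in the notebook, but it makes the relationship explicit: the "only" groups and the "both" group together cover every transaction containing at least one of the two items.

```python
import pandas as pd

# Toy item-level rows: one row per item sold, as in the bakery file (invented values)
items = pd.DataFrame({
    "Transaction": [1, 2, 2, 3, 3, 3, 4],
    "Item": ["Bread", "Coffee", "Bread", "Coffee", "Coffee", "Pastry", "Tea"],
})

# Sets remove duplicates, so a transaction with two coffees counts once
coffee_ids = set(items.loc[items["Item"] == "Coffee", "Transaction"])
bread_ids = set(items.loc[items["Item"] == "Bread", "Transaction"])

both = coffee_ids & bread_ids          # bought coffee and bread together
coffee_only = coffee_ids - bread_ids   # coffee, but not bread
bread_only = bread_ids - coffee_ids    # bread, but not coffee

# The three groups partition the transactions with at least one of the two items
assert len(both) + len(coffee_only) + len(bread_only) == len(coffee_ids | bread_ids)
print(len(both), len(coffee_only), len(bread_only))
```

The same identity holds for the notebook's figures: 852 + 3,676 + 2,245 = 6,773, which matches the number of rows in `df4`.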

### The Problem

From the data exploration performed above, we can tell that the two most popular items in this bakery are "Coffee" and "Bread". "Coffee" is the best-selling product and "Bread" comes second. Counting each transaction as one customer, 3,676 customers bought coffee but not bread, 2,245 bought bread but not coffee, and only 852 bought both. Together these account for the 6,773 transactions that include at least one of the two items. Let's assume the bakery earns more profit on each bread sold than on each cup of coffee. The bakery would like to see whether there is a way to increase bread sales.

### The potential solution

We could propose a new promotion for bread: customers who buy coffee and bread together get a 90% discount on their total purchase.

### The method of testing the solution

We can set up two groups of customers who buy coffee and bread together: a control group without the discount and a test group with it. The discount could be offered every other day. The promotion may take a while to show an effect, so the first check is set for two weeks after the experiment begins. We will compare bread sales at the regular price with bread sales at the discounted price. If bread sales are higher in the discount group, we can conclude that the promotion increases bread sales. If bread sales are no higher, or even lower, in the discount group, we will wait another two weeks and compare the two groups again. If the test group still does not outperform the control group, we will recommend that the bakery end the promotion.
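
The plan above compares raw bread sales between the two groups. One possible refinement, which is an assumption added here rather than part of the original proposal, is to compare the share of transactions that include bread and to check whether the difference is larger than chance would explain. The sketch below uses a one-sided two-proportion z-test built from the standard library only; the function name `two_proportion_z_test` and all counts are hypothetical and invented for illustration.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """One-sided z-test of H1: the rate in group B exceeds the rate in group A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical two-week totals (NOT from the bakery data):
# transactions containing bread, out of all transactions on those days
control_bread, control_total = 300, 1000     # regular-price days
discount_bread, discount_total = 360, 1000   # discount days

z, p_value = two_proportion_z_test(
    control_bread, control_total, discount_bread, discount_total
)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

A small p-value would suggest that the higher bread rate on discount days is unlikely to be noise alone. Because discount and regular days alternate, day-of-week effects could still confound the comparison, so a result like this should be read as supporting evidence rather than proof.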