<h1 style="text-align:center; font-size:250%; font-family:Arial;"><b>The Bread Basket</b></h1> 

<h2 style="text-align:left; font-family:Arial;"><b>1. Importing Necessary Dependencies</b></h2> 

In [42]:
import numpy as np
import pandas as pd

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot

import networkx as nx

<h2 style="text-align:left; font-family:Arial;"><b>2. Loading and Reading Dataset</b></h2> 

In [3]:
bakeryDF=pd.read_csv("Baker.csv")
bakeryDF.head(10)

Unnamed: 0,TransactionNo,Items,DateTime,Daypart,DayType
0,1,Bread,2016-10-30 09:58:11,Morning,Weekend
1,2,Scandinavian,2016-10-30 10:05:34,Morning,Weekend
2,2,Scandinavian,2016-10-30 10:05:34,Morning,Weekend
3,3,Hot chocolate,2016-10-30 10:07:57,Morning,Weekend
4,3,Jam,2016-10-30 10:07:57,Morning,Weekend
5,3,Cookies,2016-10-30 10:07:57,Morning,Weekend
6,4,Muffin,2016-10-30 10:08:41,Morning,Weekend
7,5,Coffee,2016-10-30 10:13:03,Morning,Weekend
8,5,Pastry,2016-10-30 10:13:03,Morning,Weekend
9,5,Bread,2016-10-30 10:13:03,Morning,Weekend


In [4]:
print("Database dimension :", bakeryDF.shape)
print("Database size      :", bakeryDF.size)

Database dimension : (20507, 5)
Database size      : 102535


In [5]:
bakeryDF.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20507 entries, 0 to 20506
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   TransactionNo  20507 non-null  int64 
 1   Items          20507 non-null  object
 2   DateTime       20507 non-null  object
 3   Daypart        20507 non-null  object
 4   DayType        20507 non-null  object
dtypes: int64(1), object(4)
memory usage: 801.2+ KB


In [6]:
bakeryDF['TransactionNo'].nunique()

9465

In [7]:
bakeryDF.describe(include=object)

Unnamed: 0,Items,DateTime,Daypart,DayType
count,20507,20507,20507,20507
unique,94,9465,4,2
top,Coffee,2017-02-17 14:18:20,Afternoon,Weekday
freq,5471,11,11569,12807


## MetaData

- Number of variables: 5
- Numeric variables: 1
- Categorical variables: 4
- Number of observations: 20507
- Total number of transactions: 9465
- Missing cells: 0
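
The figures above can be derived directly from a DataFrame rather than typed by hand. The following is a minimal sketch on a four-row stand-in frame with the same column layout as the bakery data; the `summarize` helper and the `sample` frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Small stand-in frame with the same column layout as the bakery data.
sample = pd.DataFrame({
    "TransactionNo": [1, 2, 2, 3],
    "Items": ["Bread", "Scandinavian", "Scandinavian", "Jam"],
    "DateTime": ["2016-10-30 09:58:11", "2016-10-30 10:05:34",
                 "2016-10-30 10:05:34", "2016-10-30 10:07:57"],
    "Daypart": ["Morning"] * 4,
    "DayType": ["Weekend"] * 4,
})

def summarize(df):
    """Return the headline figures listed in the MetaData section (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").shape[1]
    return {
        "variables": df.shape[1],
        "numeric": numeric,
        "categorical": df.shape[1] - numeric,   # everything that is not numeric
        "observations": len(df),
        "transactions": int(df["TransactionNo"].nunique()),
        "missing_cells": int(df.isna().sum().sum()),
    }

meta = summarize(sample)
print(meta)
```

Calling `summarize(bakeryDF)` on the full dataset should reproduce the list above, assuming the file loads with the same five columns.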

<h2 style="text-align:left; font-family:Arial;"><b>Data Summary:</b></h2>

Variables
TransactionNo : 9465 distinct values

Items has a high cardinality: 94 distinct values

DateTime has a high cardinality: 9465 distinct values

Daypart has 4 distinct values

DayType has 2 distinct values

<h2 style="text-align:left; font-family:Arial;"><b>3. Exploratory Data Analysis</b></h2>

### Unique items and their counts

In [16]:
# rename_axis/reset_index(name=...) yields the same column names on pandas 1.x and 2.x
itemFrequency = bakeryDF['Items'].value_counts().rename_axis('item').reset_index(name='frequency')
itemFrequency

Unnamed: 0,item,frequency
0,Coffee,5471
1,Bread,3325
2,Tea,1435
3,Cake,1025
4,Pastry,856
...,...,...
89,Bacon,1
90,Gift voucher,1
91,Olum & polenta,1
92,Raw bars,1


In [18]:
fig = px.bar(itemFrequency.head(20), x='item', y='frequency', title='20 Most Frequent Items',
             color_discrete_sequence=['#1f77b4']*20, # set color to blue
             text='frequency', hover_name='item', template='simple_white')

# Update the plot layout and appearance
fig.update_layout(margin=dict(t=50, b=0, l=0, r=0), title_font_size=20,
                  xaxis_tickangle=-45, plot_bgcolor='white', coloraxis_showscale=False)
fig.update_xaxes(title='')
fig.update_yaxes(showticklabels=False, title='')

<p style="color:#4a4a4a; font-size:110%;">Coffee is the best-selling product by far, followed by bread and tea.</p>

### Sales peak hours

In [19]:
peakHours = bakeryDF.groupby('Daypart')['Items'].count().sort_values(ascending=False)
peakHours

Daypart
Afternoon    11569
Morning       8404
Evening        520
Night           14
Name: Items, dtype: int64

In [21]:
# Define the color scheme for the pie chart
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']

# Create the donut chart using Plotly; labels come from the index so they always match the values
fig = go.Figure(data=[go.Pie(labels=peakHours.index, values=peakHours.values,
                             title=dict(text="Peak Selling Hours", font=dict(size=20)),
                             textinfo='label+percent',
                             marker=dict(colors=colors), hole=.5)])
fig.show()

The bakery sells the most items in the afternoon, followed by the morning and the evening; night sales are the lowest.
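
To put numbers on that ranking, each daypart's count can be converted to a share of all items sold. This is a small sketch using the counts copied from the `peakHours` output; the variable names are illustrative assumptions.

```python
# Counts copied from the peakHours output.
daypart_counts = {"Afternoon": 11569, "Morning": 8404, "Evening": 520, "Night": 14}

total_items = sum(daypart_counts.values())

# Percentage share of each daypart, rounded to one decimal place.
daypart_share = {
    part: round(100 * count / total_items, 1)
    for part, count in daypart_counts.items()
}

for part, pct in daypart_share.items():
    print(f"{part:<10} {pct:>5.1f}%")
```

On the full DataFrame, `bakeryDF['Daypart'].value_counts(normalize=True)` gives the same shares as fractions.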

## Monthly and Weekly Sales

<p style="color:#4a4a4a; font-size:110%;">We need to extract the day name, month, and year from the DateTime column for further analysis.</p>

In [22]:
dateTime=pd.to_datetime(bakeryDF['DateTime'])
bakeryDF['Day']=dateTime.dt.day_name()
bakeryDF['Month']=dateTime.dt.month_name()
bakeryDF['Year']=dateTime.dt.year
bakeryDF.head(5)

Unnamed: 0,TransactionNo,Items,DateTime,Daypart,DayType,Day,Month,Year
0,1,Bread,2016-10-30 09:58:11,Morning,Weekend,Sunday,October,2016
1,2,Scandinavian,2016-10-30 10:05:34,Morning,Weekend,Sunday,October,2016
2,2,Scandinavian,2016-10-30 10:05:34,Morning,Weekend,Sunday,October,2016
3,3,Hot chocolate,2016-10-30 10:07:57,Morning,Weekend,Sunday,October,2016
4,3,Jam,2016-10-30 10:07:57,Morning,Weekend,Sunday,October,2016


In [23]:
mpd = bakeryDF.groupby('Day')['Items'].count().sort_values(ascending=False)
mpd

Day
Saturday     3554
Friday       3266
Sunday       3118
Monday       3035
Tuesday      2645
Thursday     2601
Wednesday    2288
Name: Items, dtype: int64

In [24]:
fig = px.bar(mpd, title='Most Productive Day',
             color=mpd, color_continuous_scale='viridis')

# Update the plot layout and appearance
fig.update_layout(margin=dict(t=50, b=0, l=0, r=0), title_font_size=20,
                  xaxis_tickangle=0, plot_bgcolor='white', coloraxis_showscale=False)
fig.update_yaxes(showticklabels=False, title=' ')
fig.update_xaxes(title=' ')
# The bars count item rows, not distinct transactions, so label them accordingly
fig.update_traces(texttemplate='%{y}', textposition='outside',
                  hovertemplate='<b>%{x}</b><br>No. of Items Sold: %{y}')


### Sales peak on Saturday, followed by Friday and Sunday
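
To check how far weekends stand out, the day counts from the `mpd` output can be put in calendar order and averaged by day type. A minimal sketch, with the counts copied by hand and all variable names assumed for illustration:

```python
# Counts copied from the mpd output.
day_counts = {
    "Saturday": 3554, "Friday": 3266, "Sunday": 3118, "Monday": 3035,
    "Tuesday": 2645, "Thursday": 2601, "Wednesday": 2288,
}

calendar_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]
weekend_days = {"Saturday", "Sunday"}

# Reorder from "sorted by count" to "sorted by calendar position".
ordered_counts = {day: day_counts[day] for day in calendar_order}

weekend_avg = sum(n for d, n in day_counts.items() if d in weekend_days) / len(weekend_days)
weekday_avg = sum(n for d, n in day_counts.items() if d not in weekend_days) / (
    len(day_counts) - len(weekend_days)
)

print(ordered_counts)
print(f"Average items per weekend day: {weekend_avg:.0f}")
print(f"Average items per weekday:     {weekday_avg:.0f}")
```

The averages show that a typical weekend day outsells a typical weekday, even though Friday alone sells more than Sunday.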

In [26]:
mpm = bakeryDF.groupby('Month')['Items'].count().sort_values(ascending=False)
mpm

Month
March        3220
November     3076
January      3027
February     2748
December     2647
April        1048
October      1041
May           924
July          741
June          739
August        700
September     596
Name: Items, dtype: int64

In [27]:
fig = px.bar(mpm, title='Most Productive Month',
             color=mpm, color_continuous_scale='viridis')  # numeric color values use a continuous scale

# Update the plot layout and appearance
fig.update_layout(margin=dict(t=50, b=0, l=0, r=0), title_font_size=20,
                  xaxis_tickangle=0, plot_bgcolor='white', coloraxis_showscale=False)
fig.update_yaxes(showticklabels=False, title=' ')
fig.update_xaxes(title=' ')
fig.update_traces(texttemplate='%{y}', textposition='outside',
                  hovertemplate='<b>%{x}</b><br>No. of Items Sold: %{y}')
fig.show()

<p style="color:#4a4a4a; font-size:110%;">The bakery does most of its business from November to March: each of those months records more than 2,600 items sold, while every other month records fewer than 1,100.</p>

### EDA summary

Coffee is the best-selling product by far, followed by bread and tea. The bakery makes most of its sales in the afternoon, which accounts for over 56% of items sold, with about 41% sold in the morning; sales fall sharply in the evening and at night. Saturday is the busiest day, followed by Friday and Sunday, while Tuesday to Thursday are the quietest days. The bakery does most of its business from November to March.

<h2 style="text-align:left; font-family:Arial;"><b>4. Association Rules Generation</b></h2> 

<h3 style="text-align:left; font-family:Arial;"><b>4.1 Data Preparation for Association Rule Mining</b></h3>
<p style="color:#4a4a4a; font-size:110%;">The Apriori implementation in mlxtend requires a DataFrame in which every transaction is one-hot encoded across all items: one row per transaction, one boolean column per item.</p>
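
To make the target layout concrete, here is a minimal pure-Python sketch of one-hot encoding on four toy baskets. It only illustrates the shape of the data; the notebook itself relies on mlxtend's `TransactionEncoder`, and the names `toy_transactions`, `item_columns`, and `one_hot_rows` are assumptions made for this example.

```python
# Four toy baskets, each a list of item names.
toy_transactions = [
    ["Bread"],
    ["Scandinavian"],
    ["Jam", "Hot chocolate", "Cookies"],
    ["Pastry", "Bread", "Coffee"],
]

# One column per distinct item, in alphabetical order.
item_columns = sorted({item for basket in toy_transactions for item in basket})

# One row per basket: True where the basket contains the item.
one_hot_rows = [
    [item in basket for item in item_columns]
    for basket in toy_transactions
]

print(item_columns)
for row in one_hot_rows:
    print(row)
```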

- <h4 style="text-align:left; font-family:Arial;"><b>list of all the transactions</b></h4>

In [28]:
# Build one list of distinct items per transaction
transactions=[]
for txn_no in bakeryDF['TransactionNo'].unique():
    lst=list(set(bakeryDF[bakeryDF['TransactionNo']==txn_no]['Items']))
    transactions.append(lst)

transactions[0:20]

[['Bread'],
 ['Scandinavian'],
 ['Jam', 'Hot chocolate', 'Cookies'],
 ['Muffin'],
 ['Pastry', 'Bread', 'Coffee'],
 ['Pastry', 'Muffin', 'Medialuna'],
 ['Pastry', 'Medialuna', 'Tea', 'Coffee'],
 ['Pastry', 'Bread'],
 ['Muffin', 'Bread'],
 ['Medialuna', 'Scandinavian'],
 ['Bread', 'Medialuna'],
 ['Tea', 'Jam', 'Pastry', 'Tartine', 'Coffee'],
 ['Basket', 'Bread', 'Coffee'],
 ['Pastry', 'Bread', 'Medialuna'],
 ['Mineral water', 'Scandinavian'],
 ['Bread', 'Medialuna', 'Coffee'],
 ['Hot chocolate'],
 ['Farm House'],
 ['Farm House', 'Bread'],
 ['Bread', 'Medialuna']]

- <h4 style="text-align:left; font-family:Arial;"><b>one hot encoding</b></h4>

In [32]:
pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.22.0-py2.py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.22.0
Note: you may need to restart the kernel to use updated packages.


In [33]:
import mlxtend

In [35]:
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
encodedData = te.fit(transactions).transform(transactions)
data = pd.DataFrame(encodedData, columns=te.columns_)
data.head()

Unnamed: 0,Adjustment,Afternoon with the baker,Alfajores,Argentina Night,Art Tray,Bacon,Baguette,Bakewell,Bare Popcorn,Basket,...,The BART,The Nomad,Tiffin,Toast,Truffles,Tshirt,Valentine's card,Vegan Feast,Vegan mincepie,Victorian Sponge
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [36]:
te.columns_

['Adjustment',
 'Afternoon with the baker',
 'Alfajores',
 'Argentina Night',
 'Art Tray',
 'Bacon',
 'Baguette',
 'Bakewell',
 'Bare Popcorn',
 'Basket',
 'Bowl Nic Pitt',
 'Bread',
 'Bread Pudding',
 'Brioche and salami',
 'Brownie',
 'Cake',
 'Caramel bites',
 'Cherry me Dried fruit',
 'Chicken Stew',
 'Chicken sand',
 'Chimichurri Oil',
 'Chocolates',
 'Christmas common',
 'Coffee',
 'Coffee granules ',
 'Coke',
 'Cookies',
 'Crepes',
 'Crisps',
 'Drinking chocolate spoons ',
 'Duck egg',
 'Dulce de Leche',
 'Eggs',
 "Ella's Kitchen Pouches",
 'Empanadas',
 'Extra Salami or Feta',
 'Fairy Doors',
 'Farm House',
 'Focaccia',
 'Frittata',
 'Fudge',
 'Gift voucher',
 'Gingerbread syrup',
 'Granola',
 'Hack the stack',
 'Half slice Monster ',
 'Hearty & Seasonal',
 'Honey',
 'Hot chocolate',
 'Jam',
 'Jammie Dodgers',
 'Juice',
 'Keeping It Local',
 'Kids biscuit',
 'Lemon and coconut',
 'Medialuna',
 'Mighty Protein',
 'Mineral water',
 'Mortimer',
 'Muesli',
 'Muffin',
 'My-5 Fruit S

<h3 style="text-align:left; font-family:Arial;"><b>4.2 Association Rules Generation</b></h3>

- <h4 style="text-align:left; font-family:Arial;"><b>frequent items</b></h4>

In [38]:
from mlxtend.frequent_patterns import apriori
frequentItems= apriori(data, use_colnames=True, min_support=0.02)
frequentItems.head()

Unnamed: 0,support,itemsets
0,0.036344,(Alfajores)
1,0.327205,(Bread)
2,0.040042,(Brownie)
3,0.103856,(Cake)
4,0.478394,(Coffee)
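
The `support` of an itemset is the fraction of transactions that contain every item in it, and `min_support=0.02` keeps only itemsets found in at least 2% of transactions. The sketch below computes support by hand on five toy baskets; the `support` helper and the toy data are illustrative assumptions and do not reproduce how mlxtend searches for itemsets.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)

# Five toy baskets (not the real dataset).
toy_transactions = [
    ["Bread"],
    ["Pastry", "Bread", "Coffee"],
    ["Bread", "Medialuna", "Coffee"],
    ["Hot chocolate"],
    ["Coffee", "Cake"],
]

coffee_support = support({"Coffee"}, toy_transactions)
pair_support = support({"Coffee", "Bread"}, toy_transactions)

print(f"support(Coffee)        = {coffee_support:.2f}")
print(f"support(Coffee, Bread) = {pair_support:.2f}")
```

A pair can never have higher support than either of its items, which is the property Apriori uses to prune candidates.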


- <h4 style="text-align:left; font-family:Arial;"><b>association rules</b></h4>

In [40]:
from mlxtend.frequent_patterns import association_rules

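rules = association_rules(frequentItems, metric="lift", min_threshold=1)
# next(iter(x)) keeps only the first item of each frozenset; this is lossless
# only when every rule has a single antecedent and a single consequent
rules['antecedents'] = rules['antecedents'].apply(lambda x: next(iter(x)))
rules['consequents'] = rules['consequents'].apply(lambda x: next(iter(x)))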
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,Pastry,Bread,0.086107,0.327205,0.02916,0.33865,1.034977,0.000985,1.017305,0.03698
1,Bread,Pastry,0.327205,0.086107,0.02916,0.089119,1.034977,0.000985,1.003306,0.050231
2,Cake,Coffee,0.103856,0.478394,0.054728,0.526958,1.101515,0.005044,1.102664,0.10284
3,Coffee,Cake,0.478394,0.103856,0.054728,0.114399,1.101515,0.005044,1.011905,0.176684
4,Tea,Cake,0.142631,0.103856,0.023772,0.166667,1.604781,0.008959,1.075372,0.439556


<h2 style="text-align:left; font-family:Arial;"><b>5. Refining Rules</b></h2> 

<p style="color:#4a4a4a; font-size:110%;">Confidence tends to be high for a very frequent consequent even when the association is weak, because for unrelated items the confidence simply approaches the consequent's own support. Coffee appears in almost 48% of transactions, so rules that recommend coffee show high confidence without telling us much; coffee could be recommended alongside any other item anyway. We therefore drop the rules whose consequent is coffee to focus on the less obvious associations in the data.</p>
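
The effect can be checked with the standard definitions, confidence = support(A and C) / support(A) and lift = confidence / support(C). The sketch below recomputes both for two rules using the rounded supports from the rules table; the helper names are assumptions for illustration.

```python
def confidence(support_both, support_antecedent):
    """Share of antecedent baskets that also contain the consequent."""
    return support_both / support_antecedent

def lift(support_both, support_antecedent, support_consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(support_both, support_antecedent) / support_consequent

# Rounded supports copied from the rules table.
# Cake -> Coffee: frequent consequent (coffee is in about 48% of baskets).
cake_coffee_conf = confidence(0.054728, 0.103856)
cake_coffee_lift = lift(0.054728, 0.103856, 0.478394)

# Tea -> Cake: much rarer consequent (cake is in about 10% of baskets).
tea_cake_conf = confidence(0.023772, 0.142631)
tea_cake_lift = lift(0.023772, 0.142631, 0.103856)

print(f"Cake -> Coffee: confidence={cake_coffee_conf:.3f}, lift={cake_coffee_lift:.3f}")
print(f"Tea  -> Cake:   confidence={tea_cake_conf:.3f}, lift={tea_cake_lift:.3f}")
```

Cake -> Coffee has roughly three times the confidence of Tea -> Cake, yet its lift is much closer to 1, which is why lift is the more informative ranking metric here.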

In [41]:
index_names = rules[rules['consequents'] == 'Coffee'].index
refinedRules = rules.drop(index_names).sort_values('lift', ascending=False)
refinedRules.drop(['leverage','conviction'], axis=1, inplace=True)
refinedRules = refinedRules.reset_index()
refinedRules

Unnamed: 0,index,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,zhangs_metric
0,4,Tea,Cake,0.142631,0.103856,0.023772,0.166667,1.604781,0.439556
1,5,Cake,Tea,0.103856,0.142631,0.023772,0.228891,1.604781,0.420538
2,18,Coffee,Toast,0.478394,0.033597,0.023666,0.04947,1.472431,0.615122
3,13,Coffee,Medialuna,0.478394,0.061807,0.035182,0.073542,1.189878,0.305936
4,15,Coffee,Pastry,0.478394,0.086107,0.047544,0.099382,1.154168,0.256084
5,10,Coffee,Juice,0.478394,0.038563,0.020602,0.043065,1.11675,0.200428
6,17,Coffee,Sandwich,0.478394,0.071844,0.038246,0.079947,1.112792,0.194321
7,3,Coffee,Cake,0.478394,0.103856,0.054728,0.114399,1.101515,0.176684
8,6,Coffee,Cookies,0.478394,0.054411,0.028209,0.058966,1.083723,0.14811
9,9,Coffee,Hot chocolate,0.478394,0.05832,0.029583,0.061837,1.060311,0.109048


<h2 style="text-align:left; font-family:Arial;"><b>Summary:</b></h2> 

### Insights
Based on the association rules analysis, the following insights were found:

- Coffee is the best-selling item at the bakery, and rules with lift above 1 link it to 8 other items. Over 11% of coffee buyers also purchase cake, while almost 10% of them buy pastry along with it.
- Over 16% of tea buyers also buy cake, and over 22% of cake buyers also buy tea.
- Over 33% of pastry buyers also buy bread, while nearly 9% of bread buyers also buy pastry.

### Business Strategy
Based on the associations we have seen between coffee and its 8 partners, there are a couple of strategies that the bakery can adopt to increase its sales.

Promotional discounts: Offering discounts on the items associated with coffee can encourage coffee buyers to add them to their order, and can encourage buyers of those items to add a coffee. This can increase the bakery's sales while giving customers better value.

Placement of items: Placing the associated items close to the coffee ordering counter can be a good strategy to tempt customers into buying them. This can increase the visibility of these items and make it more likely that customers will purchase them.
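
One way to shortlist items for the two strategies above is to rank coffee's partners by lift and by confidence. This is a rough sketch using the figures copied from the refined rules table; the tuple layout and variable names are illustrative assumptions, and which metric to prioritise is a business decision rather than something the data settles.

```python
# (consequent, confidence, lift) for rules with Coffee as the antecedent,
# copied from the refined rules table.
coffee_rules = [
    ("Toast", 0.049470, 1.472431),
    ("Medialuna", 0.073542, 1.189878),
    ("Pastry", 0.099382, 1.154168),
    ("Juice", 0.043065, 1.116750),
    ("Sandwich", 0.079947, 1.112792),
    ("Cake", 0.114399, 1.101515),
    ("Cookies", 0.058966, 1.083723),
    ("Hot chocolate", 0.061837, 1.060311),
]

# Strongest associations relative to chance.
top_by_lift = [name for name, _, _ in
               sorted(coffee_rules, key=lambda r: r[2], reverse=True)[:3]]

# Items most often bought together with coffee in absolute terms.
top_by_confidence = [name for name, _, _ in
                     sorted(coffee_rules, key=lambda r: r[1], reverse=True)[:3]]

print("Top 3 by lift:      ", top_by_lift)
print("Top 3 by confidence:", top_by_confidence)
```

The two rankings differ: lift favours toast and medialuna, whereas confidence favours cake and sandwiches, so a promotion could reasonably target either group.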