<h1 style="text-align:center;font-size:250%;"><b>The Bread Basket</b></h1> 

## **1. Importing Necessary Dependencies**

In [None]:
import numpy as np
import pandas as pd

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
import networkx as nx

## **2. Loading and Reading Dataset**

In [None]:
bakeryDF=pd.read_csv("../input/bakery/Bakery.csv")
bakeryDF.head()

In [None]:
print("Database dimension :", bakeryDF.shape)
print("Database size      :", bakeryDF.size)

In [None]:
bakeryDF.info()

In [None]:
bakeryDF['TransactionNo'].nunique()

In [None]:
bakeryDF.describe(include=object)

> ## **Data Summary:**
The dataset contains the details of every item purchased online from the bakery between 2016 and 2017. It has 20507 entries across 9465 transactions, and 5 columns.

### **Overview**

Number of variables	         :    5<br>
Numeric variables            :	  1<br>
Categorical variables        :	  4<br>
Number of observations       :	  20507<br>
Total number of transactions :	  9465<br>
Missing cells	             :    0<br>

### **Variables**
- `TransactionNo` has a high cardinality: 9465 distinct values<br>
- `Items` has a high cardinality: 94 distinct values<br>
- `DateTime` has a high cardinality: 9182 distinct values<br>
- `Daypart` has 4 distinct values<br>
- `DayType` has 2 distinct values
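
The overview figures can be checked directly with pandas. The following is a minimal sketch on a small made-up frame (`toyDF` and its rows are invented for illustration, not taken from `Bakery.csv`); the same two calls on `bakeryDF` should give the missing-cell count and the distinct values per column.

```python
import pandas as pd

# Toy stand-in for bakeryDF; every row here is invented for illustration only.
toyDF = pd.DataFrame({
    "TransactionNo": [1, 2, 2, 3],
    "Items": ["Bread", "Coffee", "Coffee", "Tea"],
    "DateTime": ["2017-01-01 09:00:00", "2017-01-01 09:05:00",
                 "2017-01-01 09:05:00", "2017-01-01 09:10:00"],
    "Daypart": ["Morning"] * 4,
    "DayType": ["Weekend"] * 4,
})

missing_cells = int(toyDF.isna().sum().sum())  # total number of missing cells
cardinality = toyDF.nunique()                  # distinct values per column

print("Missing cells:", missing_cells)
print(cardinality)
```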



## **3. Data Exploration and Visualization**

### **3.1 Let's look into the frequent items and the best sellers**

In [None]:
itemFrequency = bakeryDF['Items'].value_counts().sort_values(ascending=False)
itemFrequency.head(10)

In [None]:
fig = px.bar(itemFrequency.head(20), title='20 Most Frequent Items', color=itemFrequency.head(20), color_continuous_scale=px.colors.sequential.Mint)
fig.update_layout(xaxis_tickangle=-45, plot_bgcolor='white', coloraxis_showscale=False)
fig.update_yaxes(title=' ')
fig.update_xaxes(title=' ')
fig.update_traces(hovertemplate = '<b>%{x}</b><br>No. of Transactions: %{y}')
fig.show()

Coffee is the best-selling product by far, followed by bread and tea.

### **3.2 Let's look into the peak hours of sales**

In [None]:
peakHours = bakeryDF.groupby('Daypart')['Items'].count().sort_values(ascending=False)
peakHours

In [None]:
# Take the labels from peakHours itself so they always line up with the values
fig = go.Figure(data=[go.Pie(labels=peakHours.index,
                values=peakHours, title=dict(text="Peak Selling Hours", font=dict(size=18)), textinfo='label+percent', marker=dict(colors=px.colors.qualitative.Pastel2), hole=.5)])
fig.update_layout(margin=dict(t=40, b=40, l=0, r=0), font=dict(size=12), showlegend=False)
fig.show()

The bakery makes most of its sales in the afternoon, which accounts for over 56% of the sales. Sales fall sharply after that. However, the bakery makes a decent amount of sales in the morning as well.
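
The percentage shares shown in the pie chart can also be computed as plain numbers. This is a minimal sketch on an invented series of day-part labels (one label per sold item); applied to `bakeryDF['Daypart']`, the same call should give the share of each day part.

```python
import pandas as pd

# Invented day-part labels, one per sold item (illustration only).
daypart = pd.Series(["Afternoon", "Afternoon", "Afternoon", "Afternoon",
                     "Morning", "Morning", "Evening", "Night"])

# normalize=True returns fractions; multiply by 100 to get percentages.
share = daypart.value_counts(normalize=True).mul(100).round(1)
print(share)
```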

### **3.3 Further let's look into the monthly and weekly sales**

We need to extract the day, month and year from `DateTime` for further analysis.

In [None]:
dateTime=pd.to_datetime(bakeryDF['DateTime'])
bakeryDF['Day']=dateTime.dt.day_name()
bakeryDF['Month']=dateTime.dt.month_name()
bakeryDF['Year']=dateTime.dt.year
bakeryDF.head(5)

In [None]:
mpd = bakeryDF.groupby('Day')['Items'].count().sort_values(ascending=False)
mpd

In [None]:
fig = px.bar(mpd, title='Most Productive Day', color=mpd, color_continuous_scale=px.colors.sequential.Mint)
fig.update_layout(xaxis_tickangle=0, plot_bgcolor='white', coloraxis_showscale=False)
fig.update_yaxes(title=' ')
fig.update_xaxes(title=' ')
fig.update_traces(hovertemplate = '<b>%{x}</b><br>No. of Transactions: %{y}')
fig.show()

As expected, sales are high during the weekends. On the rest of the days, sales seem to be quite uniform.

In [None]:
mpm = bakeryDF.groupby('Month')['Items'].count().sort_values(ascending=False)
mpm

In [None]:
fig = px.bar(mpm, title='Most Productive Month', color=mpm, color_continuous_scale=px.colors.sequential.Mint)
fig.update_layout(xaxis_tickangle=0, plot_bgcolor='white', coloraxis_showscale=False)
fig.update_yaxes(title=' ')
fig.update_xaxes(title=' ')
fig.update_traces(hovertemplate = '<b>%{x}</b><br>No. of Transactions: %{y}')
fig.show()

The bakery seems to be busiest from November to March, when it does most of its business.
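
Sorting by count makes the top day or month easy to spot, but a calendar ordering is often easier to read for seasonal patterns. A minimal sketch of that reordering, using invented weekday counts (the numbers are not from the dataset); the same idea applies to month names with a list of months.

```python
import pandas as pd

# Invented counts per weekday (illustration only).
counts = pd.Series({"Saturday": 9, "Monday": 4, "Friday": 6, "Sunday": 7})

calendar_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"]

# Keep only the days present in the data, arranged in calendar order.
ordered = counts.reindex([d for d in calendar_order if d in counts.index])
print(ordered)
```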

> ## **EDA Summary:**
Coffee is the best-selling product by far, followed by bread and tea. The bakery makes most of its sales in the afternoon, which accounts for over 56% of the sales. Sales fall sharply after that, although the morning brings in a decent amount of sales as well. As expected, sales are high during the weekends and quite uniform on the rest of the days. The bakery seems to be busiest from November to March, when it does most of its business.

## **4. Association Rules Generation**

### **4.1 Data Preparation for Association Rule Mining**

The Apriori implementation used here requires a dataframe in which every transaction is one-hot encoded across all the items.

- <h4>list of all the transactions</h4>

In [None]:
# Collect the distinct items of each transaction, keeping transactions in order of first appearance
transactions = bakeryDF.groupby('TransactionNo', sort=False)['Items'].apply(lambda items: list(set(items))).tolist()

transactions[0:10]

- <h4>one hot encoding</h4>

In [None]:
te = TransactionEncoder()
encodedData = te.fit(transactions).transform(transactions)
data = pd.DataFrame(encodedData, columns=te.columns_)
data.head()
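
To make the target shape concrete, here is a pandas-only sketch of the same kind of one-hot table, built with `pd.crosstab` on an invented frame (`toyDF` is hypothetical). It is an alternative illustration, not the `TransactionEncoder` call used above.

```python
import pandas as pd

# Invented transactions (illustration only); Cake appears twice in transaction 3.
toyDF = pd.DataFrame({
    "TransactionNo": [1, 1, 2, 3, 3, 3],
    "Items": ["Coffee", "Bread", "Coffee", "Tea", "Cake", "Cake"],
})

# crosstab counts item occurrences per transaction; "> 0" turns counts into booleans.
onehot = pd.crosstab(toyDF["TransactionNo"], toyDF["Items"]) > 0
print(onehot)
```

Each row is one transaction and each column one item, so repeated purchases of the same item within a transaction collapse to a single `True`.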

### **4.2 Association Rules Generation**

- <h4> frequent items </h4>

In [None]:
frequentItems= apriori(data, use_colnames=True, min_support=0.02)
frequentItems.head()
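
It helps to translate `min_support` into an absolute count. The sketch below assumes the threshold is inclusive (an itemset is kept when its support is at least `min_support`) and uses the 9465 transactions reported in the overview.

```python
import math

n_transactions = 9465  # distinct TransactionNo values reported in the overview
min_support = 0.02     # threshold passed to apriori above

# An itemset is kept when count / n_transactions >= min_support (assumed inclusive).
min_count = math.ceil(min_support * n_transactions)
print(min_count)
```

In other words, an itemset has to occur in roughly 190 transactions to be considered frequent.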

- <h4> association rules </h4>

In [None]:
rules = association_rules(frequentItems, metric="lift", min_threshold=1)
rules.antecedents = rules.antecedents.apply(lambda x: next(iter(x)))
rules.consequents = rules.consequents.apply(lambda x: next(iter(x)))
rules.head()
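
Note that `next(iter(x))` keeps only one item of each itemset, which is fine as long as every antecedent and consequent holds a single item. If multi-item rules appear (for example with a lower `min_support`), joining the items is a safer way to flatten them. A minimal sketch with invented itemsets standing in for the frozensets in `rules`:

```python
# Invented itemsets standing in for the frozensets returned by association_rules.
antecedents = [frozenset({"Coffee"}), frozenset({"Tea", "Cake"})]

# next(iter(x)) keeps a single, arbitrary item of each set.
first_only = [next(iter(x)) for x in antecedents]

# Joining the sorted items keeps all of them in a stable order.
joined = [", ".join(sorted(x)) for x in antecedents]
print(joined)
```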

### **4.3 Rules Visualization**

In [None]:
network_A = list(rules["antecedents"].unique())
network_B = list(rules["consequents"].unique())
node_list = list(set(network_A + network_B))
G = nx.Graph()
for i in node_list:
    G.add_node(i)
for i,j in rules.iterrows():
    G.add_edges_from([(j["antecedents"], j["consequents"])])
pos = nx.spring_layout(G, k=0.5, dim=2, iterations=400)
for n, p in pos.items():
    G.nodes[n]['pos'] = p

edge_trace = go.Scatter(x=[], y=[], line=dict(width=0.5, color='#888'), hoverinfo='none', mode='lines')

for edge in G.edges():
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    edge_trace['x'] += tuple([x0, x1, None])
    edge_trace['y'] += tuple([y0, y1, None])

node_trace = go.Scatter(x=[], y=[], text=[], mode='markers', hoverinfo='text',
    marker=dict(showscale=True, colorscale='Burg', reversescale=True, color=[], size=15,
    colorbar=dict(thickness=10, title=dict(text='Node Connections', side='right'), xanchor='left')))

for node in G.nodes():
    x, y = G.nodes[node]['pos']
    node_trace['x'] += tuple([x])
    node_trace['y'] += tuple([y])

for node, adjacencies in enumerate(G.adjacency()):
    node_trace['marker']['color']+=tuple([len(adjacencies[1])])
    node_info = str(adjacencies[0]) +'<br>No of Connections: {}'.format(str(len(adjacencies[1])))
    node_trace['text']+=tuple([node_info])

fig = go.Figure(data=[edge_trace, node_trace],
    layout=go.Layout(title=dict(text='Item Connections Network', font=dict(size=18)),
    plot_bgcolor='white', showlegend=False, margin=dict(b=0,l=80,r=80,t=35),
    xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
    yaxis=dict(showgrid=False, zeroline=False, showticklabels=False)))

fig.show()

## **5. Refining Rules**

The confidence of a rule with a very frequent consequent tends to be high even when the association is very weak, so confidence alone does not give a clear picture. Here, coffee is by far the most frequent item and the best seller, so it could be recommended alongside every other item anyway. We can therefore drop the rules recommending coffee to get a clearer picture of the less obvious rules generated from the data.
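
The effect is easy to see on a toy example. Confidence of a rule A → C is support(A and C) / support(A), and lift divides that confidence by support(C). The sketch below uses invented baskets (not the bakery data) in which coffee appears in 8 of 10 transactions: the rule Toast → Coffee gets a confidence of 0.8 although its lift is 1.0, meaning toast buyers are no more likely to buy coffee than anyone else.

```python
import pandas as pd

# Invented one-hot baskets: 10 transactions, Coffee in 8, Toast in 5, both in 4.
baskets = pd.DataFrame({
    "Coffee": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "Toast":  [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
}).astype(bool)

support_toast = baskets["Toast"].mean()                       # P(Toast)
support_coffee = baskets["Coffee"].mean()                     # P(Coffee)
support_both = (baskets["Toast"] & baskets["Coffee"]).mean()  # P(Toast and Coffee)

confidence = support_both / support_toast  # P(Coffee | Toast)
lift = confidence / support_coffee         # 1.0 means no association

print(round(confidence, 2), round(lift, 2))
```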

In [None]:
index_names = rules[rules['consequents'] == 'Coffee'].index
refinedRules = rules.drop(index_names).sort_values('lift', ascending=False)
refinedRules.drop(['leverage','conviction'], axis=1, inplace=True)
refinedRules = refinedRules.reset_index(drop=True)  # drop=True avoids keeping the old index as an extra column
refinedRules

## **Summary**

### **Insights:**
- Coffee is the bestseller of this bakery and it shows association with 8 other items.
- Over 11% of coffee lovers also buy cake, while almost 10% of them buy pastry along with their coffee.
- Over 16% of tea consumers also buy cake, and over 22% of cake lovers also buy tea.
- Over 33% of pastry lovers also buy bread, while nearly 9% of those who buy bread also buy pastry.

### **Business Strategy:**

There are a couple of strategies that the bakery can adopt, if it has not done so already, to increase its sales, considering the associations we have seen between coffee and its 8 partners.

- Promotional discounts on these items can entice customers to buy coffee, or the other way round.
- Placing these items close to the coffee ordering counter can be a good strategy to tempt customers into buying them.
<!-- - How about some recipes like a coffee cake or coffee pastry? Will that entice coffe and cake/pastry lovers?? -->