# Bakery Sales

<img src="https://www.theinternationalkitchen.com/wp-content/uploads/2020/01/TIK_FrenchCookingClasses_croissants-1280x720.jpg" width="800">



I worked part time at a small bakery.

Our bakery started a delivery service in July 2019.

I collected the delivery order data and analyzed it to share with the bakery team.

My goals are as follows.

First, I will analyze total sales and sales by item, look for months with low sales, and check whether there was a problem with the product or service in those months. If a product sells poorly, we can consider replacing it with another product.

Second, I will look at order volume by day of the week, time of day (lunch or afternoon), and customer location, so that we can plan production accordingly.

Third, I will mine association rules to find out which items and item combinations have high support and lift.

# 1. Loading and Cleaning Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium import Choropleth, Circle, Marker

In [None]:
df = pd.read_csv('../input/bakery-sales/Bakery Sales.csv')
df = df.iloc[:2405]  # keep only the first 2,405 rows
df.head()

In [None]:
df = df.fillna(0)  # treat missing values as 0
df.tail()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df['date'] = pd.to_datetime(df['datetime'])
df.drop('datetime', axis = 1, inplace = True)

df['dates'] = df['date'].dt.strftime("%Y-%m")
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['hour'] = df['date'].dt.hour

df.sample()

In [None]:
df_month_sale = df.groupby('dates', as_index = False)['total'].sum()
df_month_sale

# 2. EDA

## Total Sales

In [None]:
sns.set(style = 'darkgrid')
plt.figure(figsize = (10,4))

sns.lineplot(data = df_month_sale, x = 'dates', y = 'total', color = 'skyblue', linewidth = 3)
plt.axvline(x = '2020-01', color = 'r', linestyle = '--', label = 'Covid-19 outbreak in Korea: Jan, 20')

plt.legend()
plt.title('Monthly Total Sales', size = 16)
plt.xlabel('Date')
plt.ylabel('Total Sales (Won)')
plt.show()

Because the delivery service started on July 11th, July covers only about 20 days of sales.

August posted sales of 6 million won, but sales declined significantly in September and October.

After Korea's first confirmed coronavirus case on January 20th, 2020, delivery sales increased dramatically.
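
Because July is a partial month, comparing monthly totals directly can be misleading. One way to put months on a more equal footing is to divide each month's total by the number of days that actually had orders. The sketch below illustrates the idea on a tiny made-up table: the `orders` frame and all of its values are hypothetical, and only the column names `date`, `total`, and `dates` mirror the notebook.

```python
import pandas as pd

# Hypothetical toy orders (made-up values, NOT the bakery's data).
orders = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-07-11 11:20", "2019-07-11 13:05", "2019-07-20 12:10",
        "2019-08-01 11:45", "2019-08-15 12:30", "2019-08-15 15:00",
        "2019-08-30 14:10",
    ]),
    "total": [20000, 15000, 25000, 18000, 22000, 16000, 24000],
})
orders["dates"] = orders["date"].dt.strftime("%Y-%m")

# Per month: total sales and the number of distinct calendar days with orders.
monthly = orders.groupby("dates").agg(
    total=("total", "sum"),
    sales_days=("date", lambda s: s.dt.normalize().nunique()),
)

# Average sales per active day, which is less sensitive to partial months.
monthly["total_per_day"] = monthly["total"] / monthly["sales_days"]
print(monthly)
```

Applied to the real `df`, the same grouping would show how many days each month actually contributed, which helps when reading the July figure.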

### Trend of the Coronavirus

In [None]:
corona = pd.read_csv('../input/corona/corona.csv')

corona['date'] = pd.to_datetime(corona['date'])
corona['date'] = corona['date'].dt.strftime("%Y-%m-%d")
corona.set_index('date', inplace=True)

In [None]:
sns.set(style = 'darkgrid')

fig, ax = plt.subplots(figsize = (10,4))
x= corona.index
y = corona.confirmed

ax.plot(x,y, label = 'corona confirmed', color = 'r')
plt.axvline(x = '2020-02-19', color = 'y', linestyle = '--', label = 'No.31 confirmed')
ax.set_xticks(ax.get_xticks()[::10])
plt.xticks(rotation = 45)
plt.title("Trend of Corona in South Korea", size = 16)
plt.ylabel('Count')
plt.legend()
plt.show()

The monthly sales curve appears to follow the curve of confirmed coronavirus cases closely.

On February 18th, a large-scale spread began with confirmed patient No. 31.

During this period, customers apparently used the delivery service rather than visiting stores, and sales increased sharply.

Delivery sales declined as the spread eased in April.

We can speculate that customers started visiting the store in person again.
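
The relationship described above is read off the two charts by eye. A rough way to check it numerically is to convert the cumulative confirmed counts into new cases per month and compute a Pearson correlation with monthly sales. The sketch below uses made-up placeholder numbers for both series (they are not the notebook's data), and it assumes the `confirmed` column is cumulative, which the source does not state.

```python
import pandas as pd

# Made-up placeholder monthly sales (in millions of won), for illustration only.
sales = pd.Series(
    [3.0, 4.5, 7.0, 5.0],
    index=["2020-01", "2020-02", "2020-03", "2020-04"],
    name="total",
)

# Made-up cumulative confirmed counts on a few dates (assumed cumulative).
cumulative = pd.Series(
    [1, 10, 30, 3000, 10000, 11000],
    index=pd.to_datetime([
        "2020-01-20", "2020-01-31", "2020-02-18",
        "2020-02-29", "2020-03-31", "2020-04-30",
    ]),
    name="confirmed",
)

# Last cumulative value in each month, keyed like the notebook's 'dates' column.
month_end = cumulative.groupby(cumulative.index.strftime("%Y-%m")).last()

# New cases per month: difference of month-end values (first month keeps its own value).
new_cases = month_end.diff().fillna(month_end)

# Pearson correlation on the months the two series share.
r = sales.corr(new_cases)
print(new_cases)
print(f"Pearson r = {r:.2f}")
```

With only a handful of monthly points, such a coefficient is weak evidence at best, so it should be read as a sanity check rather than proof of a causal link.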

In [None]:
# Bread and pastry columns to total per month
pain_cols = ['angbutter', 'plain bread', 'jam', 'croissant', 'tiramisu croissant',
             'cacao deep', 'pain au chocolat', 'almond croissant', 'croque monsieur',
             'mad garlic', 'gateau chocolat', 'pandoro', 'cheese cake', 'orange pound',
             'wiener', 'tiramisu', 'merinque cookies']
df_month_pain = df.groupby('dates', as_index = False)[pain_cols].sum()
df_month_pain

## Monthly Sales of Bread (Pain)

In [None]:
sns.set(style = 'darkgrid')
figure, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(ncols = 3, nrows = 2)

figure.set_size_inches(30,10)
sns.lineplot(data = df_month_pain, x = 'dates', y = 'angbutter', label = 'Angbutter', ax = ax1)
sns.lineplot(data = df_month_pain, x = 'dates', y = 'croissant', label = 'Croissant', ax = ax2, color = 'green')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'plain bread', label = 'Plain Bread', ax = ax3, color = 'Tan')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'tiramisu croissant', label = 'Tiramisu Croissant', ax = ax4, color = 'Tomato')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'pain au chocolat', label = 'Pain Au Chocolat', ax = ax5, color = 'Olive')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'wiener', label = 'Wiener', ax = ax6, color = 'Maroon')

plt.show()

Angbutter, the best-selling item in our store, accounts for a large share of total sales, so its trend is similar to the total sales graph.

Croissant and pain au chocolat had very low sales volume in February. We should re-check the process or the ingredients for that period.

Tiramisu croissant and plain bread show an increasing trend, which looks like a good sign.

In [None]:
sns.set(style = 'darkgrid')
figure, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(ncols = 3, nrows = 2)

figure.set_size_inches(30,10)
sns.lineplot(data = df_month_pain, x = 'dates', y = 'pandoro', label = 'Pandoro', ax = ax1, color = 'gold')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'orange pound', label = 'Orange Pound', ax = ax2, color = 'orange')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'cacao deep', label = 'Cacao Deep', ax = ax3, color = 'black')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'almond croissant', label = 'Almond Croissant', ax = ax4, color = 'Peru')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'gateau chocolat', label = 'Gateau Chocolat', ax = ax5, color = 'black')
sns.lineplot(data = df_month_pain, x = 'dates', y = 'cheese cake', label = 'Cheese cake', ax = ax6, color = 'yellow')

plt.show()

Pandoro had its lowest sales volume in December, but it has shown decent sales since January.

Orange pound and cacao deep are also on the rise. In particular, cacao deep sold steadily even in months when overall sales were low.

Almond croissants are selling poorly. Making an almond croissant requires preparing almond cream separately. Considering the sales volume, we should probably reduce production or replace it with another product.

## Monthly Sales of Beverages

In [None]:
df_month_beverage = df.groupby('dates', as_index = False)[['americano', 'caffe latte', 'milk tea', 'vanila latte', 'berry ade', 'lemon ade']].sum()
df_month_beverage

In [None]:
sns.set(style = 'darkgrid')
figure, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(ncols = 3, nrows = 2)

figure.set_size_inches(30,10)
sns.lineplot(data = df_month_beverage, x = 'dates', y = 'americano', label = 'Americano', ax = ax1, color = 'black')
sns.lineplot(data = df_month_beverage, x = 'dates', y = 'caffe latte', label = 'Caffe Latte', ax = ax2, color = 'black')
sns.lineplot(data = df_month_beverage, x = 'dates', y = 'vanila latte', label = 'Vanila Latte', ax = ax3, color = 'gray')
sns.lineplot(data = df_month_beverage, x = 'dates', y = 'milk tea', label = 'Ice Milk Tea', ax = ax4, color = 'Plum')
sns.lineplot(data = df_month_beverage, x = 'dates', y = 'berry ade', label = 'Berry Ade', ax = ax5, color = 'red')
sns.lineplot(data = df_month_beverage, x = 'dates', y = 'lemon ade', label = 'Lemon Ade', ax = ax6, color = 'yellow')

plt.legend()
plt.show()

Beverage sales are low compared to bread sales.

In particular, most beverages sold very poorly in April.

This is probably because the spread of the coronavirus had eased and customers drank their beverages at the bakery rather than ordering delivery.

## By Day of Week

### Orders by day of week and time

In [None]:
df['time'] = ['lunch' if hour < 14 else 'afternoon' for hour in df['hour']]

In [None]:
p = df.pivot_table(index = 'time', columns = 'day of week', values = 'day', aggfunc = 'count')
p = p.reindex(['Mon', 'Tues', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun'], axis = 1)

plt.figure(figsize = (10,5))
g = sns.heatmap(p, annot = True, cmap = 'Blues', fmt=".0f")
plt.title("Day Of Week", size = 15)
plt.show()

### Sales of Angbutter (our signature item)

In [None]:
angbutter_pivot = df.pivot_table(index = 'time', columns = 'day of week', values = 'angbutter', aggfunc = 'sum')
angbutter_pivot = angbutter_pivot.reindex(['Mon', 'Tues', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun'], axis = 1)

plt.figure(figsize = (10,5))
g = sns.heatmap(angbutter_pivot, annot = True, cmap = 'Greens', fmt= ".0f")
plt.title('Sales of Angbutter by day and time', size = 15)
plt.show()

In general, order volume was high at lunch time (the 11, 12, and 1 o'clock hours).

Sales were especially high on weekends.

We were very surprised that the number of orders was low on Fridays!
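
The heatmaps above show raw counts. To compare time slots and weekdays as shares of all orders, one option is `pd.crosstab` with `normalize='all'`. The sketch below uses a small made-up `orders` table (hypothetical values); the `day of week` labels and the 14:00 lunch cutoff follow the notebook.

```python
import pandas as pd

# Hypothetical toy orders (made-up values, NOT the bakery's data).
orders = pd.DataFrame({
    "day of week": ["Sat", "Sat", "Sun", "Fri", "Mon", "Sun", "Sat", "Mon"],
    "hour":        [11,    12,    13,    15,    12,    16,    14,    11],
})

# Same rule as the notebook: before 14:00 is lunch, otherwise afternoon.
orders["time"] = ["lunch" if hour < 14 else "afternoon" for hour in orders["hour"]]

# Share of all orders in each (time, weekday) cell; the cells sum to 1.
share = pd.crosstab(orders["time"], orders["day of week"], normalize="all")

# Put weekdays in calendar order; days with no orders become 0.
weekdays = ["Mon", "Tues", "Wed", "Thur", "Fri", "Sat", "Sun"]
share = share.reindex(weekdays, axis=1, fill_value=0)
print(share.round(3))
```

Shares make it easier to compare periods of different length, for example before and after the outbreak, because they do not depend on the total number of orders.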

## By Customer's Place

In [None]:
# Copy the filtered rows so later edits do not act on a view of df
df_place = df[df['place'] != 0].copy()

In [None]:
# Translate the Korean district names; assign the result back instead of
# calling replace(..., inplace=True) on a column of a filtered frame
df_place['place'] = df_place['place'].replace({'소양동' : 'Soyang-dong', '효자 3동' : 'Hyoja3-dong', '후평 1동' : 'Hoopyeong1-dong', '후평 2동' : 'Hoopyeong2-dong',
                           '석사동' : 'Seoksa-dong', '퇴계동' : 'Toegye-dong', '동면' : 'Dongmyeon', '후평 3동': 'Hoopyeong3-dong', "신사우동" : 'Sinsawoo-dong',
                          "강남동" : 'Gangnam-dong', "효자 1동": 'Hyoja1-dong', '조운동' : 'Jowoon-dong', '교동' : 'Gyo-dong', '효자 2동' : 'Hyoja2-dong', '약사명동' : 'Yaksamyeong-dong',
                          '근화동': 'Geunhwa-dong', '동내면' : 'Dongnae-myeon', '신동면' : 'Sindong-myeon', '교동 ': 'Gyo-dong'})

In [None]:
g = df_place.groupby('place').count()
g.reset_index(inplace = True)
g.sort_values(by = 'time', inplace = True, ascending = False)

plt.figure(figsize = (10,6))

sns.barplot(data = g, x = 'time', y = 'place', palette = 'rocket')
plt.title('Volume of Order by Place', size = 15)
plt.xlabel('Volume of Orders')
plt.ylabel('Place')
plt.show()

Since our bakery is located in Dongmyeon, Dongmyeon showed the highest order volume. Nearby Hoopyeong-dong also showed a lot of orders.

On Baemin, the delivery platform we use, delivery charges vary with distance, so sales are poor in relatively distant neighborhoods.

For this reason, we are planning to open a new branch to deliver items to distant areas.

Since the coronavirus outbreak, many franchise bakeries have also started delivering.

In [None]:
restaurant = pd.read_csv('../input/gangwon-restaurant/gangwon_restaurant.csv')
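# Bakeries and cafes in Chuncheon. The two category tests are combined with isin
# because & binds more tightly than |, so an unparenthesized "A & B & C | D"
# would also keep cafes outside Chuncheon.
chuncheon_cafe = restaurant[(restaurant['시군구명'] == '춘천시') &
                            (restaurant['상권업종대분류코드'] == 'Q') &
                            (restaurant['상권업종중분류명'].isin(['제과제빵떡케익', '커피점/카페']))]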

In [None]:
# Keep the three franchises and our bakery; copy so the edits below act on an independent frame
bakery = chuncheon_cafe[chuncheon_cafe['상호명'].isin(['파리바게뜨', '뚜레쥬르', '자유빵집', '스타벅스'])].copy()

# Translate the shop names; assign back rather than replacing in place on a slice
bakery['상호명'] = bakery['상호명'].replace({'파리바게뜨' : 'Paris Baguette', '뚜레쥬르' : 'Tous les Jours', '자유빵집' : 'Our Bakery', '스타벅스' : 'Starbucks'})

bakery = bakery[['상호명', '도로명주소', '위도', '경도']]
geo_df = bakery

In [None]:
# Named m rather than map to avoid shadowing the built-in map()
m = folium.Map(location = [geo_df['위도'].mean(), geo_df['경도'].mean()], zoom_start = 10)

for n in geo_df.index:
    popup_name = geo_df.loc[n, '상호명']
    # Marker color per brand; anything else (Paris Baguette) is blue
    if popup_name == 'Starbucks':
        icon_color = 'green'
    elif popup_name == 'Our Bakery':
        icon_color = 'red'
    elif popup_name == 'Tous les Jours':
        icon_color = 'orange'
    else:
        icon_color = 'blue'

    Circle(
        location = [geo_df.loc[n, '위도'], geo_df.loc[n,'경도']],
        radius = 30,
        popup = popup_name,
        color = icon_color,
        fill = True,
        fill_color = icon_color).add_to(m)
m

This map shows Paris Baguette (blue), Tous Les Jours (orange), and Starbucks (green), which are representative franchise bakeries and cafes in Korea, together with our bakery (red).

Paris Baguette stores tend to be located closer to the station than Tous Les Jours stores.

We decided to target the Seoksa-dong, Toegye-dong, and Onui-dong areas, where sales were relatively low because of the distance.

In the end we selected Onui-dong, which is near the station and where we expect less competition.

# 3. Association Analysis

In [None]:
from mlxtend.frequent_patterns import apriori, association_rules

In [None]:
# Keep only the item columns, then convert quantities to booleans (ordered or not)
df_item = df.drop(columns = ['date', 'place', 'day of week', 'total', 'dates', 'month', 'day', 'hour', 'time'])
df_item = df_item >= 1
df_item.head()

In [None]:
df_apriori = apriori(df_item, min_support = 0.01, use_colnames = True)
df_apriori.sort_values(by = 'support', ascending = False).head()

About 80% of orders include angbutter, and about 35% include plain bread.

Tiramisu croissant and croissant each appear in about 30% of orders.

In [None]:
df_apriori['length'] = df_apriori['itemsets'].apply(lambda x : len(x))

In [None]:
# Parenthesize both conditions: & binds more tightly than >=
df_apriori2 = df_apriori[(df_apriori['length'] == 2) & (df_apriori['support'] >= 0.05)]
df_apriori2.sort_values(by = 'support', ascending = False).head()

In [None]:
rules = association_rules(df_apriori, metric = 'lift', min_threshold = 1)
rules.sort_values(by = 'lift', ascending = False, inplace = True)
rules.head()
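
To make the `support` and `lift` columns easier to read, the sketch below recomputes them by hand on a tiny made-up basket table. The `baskets` frame and the helpers `support`, `confidence`, and `lift` are hypothetical names introduced for illustration; they follow the standard definitions rather than anything specific to this notebook.

```python
import pandas as pd

# Hypothetical one-hot basket table: one row per order (made-up values).
baskets = pd.DataFrame({
    "angbutter": [True, True, True, True, False],
    "croissant": [True, False, True, False, False],
    "americano": [False, True, True, False, True],
})

def support(*items):
    """Fraction of orders that contain every item in `items`."""
    return baskets[list(items)].all(axis=1).mean()

def confidence(antecedent, consequent):
    """Of the orders containing `antecedent`, the share that also contain `consequent`."""
    return support(antecedent, consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Joint support relative to what independence would predict."""
    return support(antecedent, consequent) / (support(antecedent) * support(consequent))

print("support(angbutter)             =", support("angbutter"))
print("support(angbutter, croissant)  =", support("angbutter", "croissant"))
print("confidence(angbutter -> croissant) =", confidence("angbutter", "croissant"))
print("lift(angbutter -> croissant)   =", round(lift("angbutter", "croissant"), 2))
```

A lift above 1 means the two items appear together more often than they would if they were ordered independently. In this toy table the lift for angbutter and croissant is about 1.25. A lift near 1 suggests no particular association.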

# 4. Conclusion


While analyzing the data, I found many results I had not expected.

Although collecting the data myself was difficult, it was very rewarding to visualize it and work on business questions such as monthly sales and sales by day of the week, time, and place.

If you enjoyed this, please upvote and comment.

Thank you for reading.