<a href="https://www.kaggle.com/code/ronaldopangarego/supermarket-sales-analysis?scriptVersionId=142180687" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

## **Context**
Supermarkets are expanding in the most populous cities, and market competition is high. This dataset contains the historical sales of a supermarket company, recorded across 3 branches over 3 months.

## **Objectives**

   * Which branch has the most sales?
   * How much did we sell in each month? (What was the best month for sales? How much was earned that month?)
   * What product and category sold the most?
   * How much did we sell in each city?
   * Average rating (customer shopping experience)

## **Import Libraries and Dataset**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")
sns.set_theme(style="whitegrid")

sales = pd.read_csv('../input/supermarket-sales/supermarket_sales - Sheet1.csv')

## **Data Exploration**



In [None]:
# The first 5 rows of the DataFrame
sales.head()

In [None]:
# The last 5 rows of the DataFrame
sales.tail()

In [None]:
# Sales DataFrame's shape
print("Shape: %d rows and %d columns" % sales.shape)

In [None]:
# Sales DataFrame's column names
print("Columns: ")
for column in sales.columns:
    print("-",column)

In [None]:
# Sales DataFrame's data types
print(sales.dtypes)

In [None]:
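# Convert the Date and Time columns from object (string) to datetime dtype.
# Note: Time holds only a time of day, so its parsed values carry a placeholder date.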
sales['Date'] = pd.to_datetime(sales['Date'])
sales['Time'] = pd.to_datetime(sales['Time'])
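
The conversion above relies on pandas inferring the formats. A minimal sketch of making them explicit, assuming the dates look like `1/5/2019` (month/day/year) and the times like `13:08` (24-hour clock). Both formats and the toy rows are assumptions and should be checked against `sales.head()` before use:

```python
import pandas as pd

# Toy rows standing in for the real columns (values are illustrative only)
toy = pd.DataFrame({"Date": ["1/5/2019", "3/8/2019"], "Time": ["13:08", "10:29"]})

# Assumed format: month/day/year. An explicit format avoids day/month mix-ups.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y")

# Time-only strings are parsed onto a placeholder date (1900-01-01),
# so keep just the hour of day for later grouping.
toy["Hour"] = pd.to_datetime(toy["Time"], format="%H:%M").dt.hour

print(toy[["Date", "Hour"]])
```

Keeping an `Hour` column instead of a full datetime avoids carrying a meaningless date around and makes hourly grouping straightforward.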

In [None]:
# Check for missing values
print(sales.isnull().sum())

In [None]:
# Dataset Statistical Summary
sales.describe()

In [None]:
# Features Correlation
sales.corr(numeric_only=True)

## **Analysis & Visualizations**


#### **Which branch has the most sales?**

In [None]:
branch_sum = sales.groupby('Branch')['Total'].sum().reset_index()
print(branch_sum)

sns.barplot(x='Branch', y='Total', data=branch_sum)
plt.ylim(0, 150000);
plt.title("Total Sales per Branch")

# Add labels to each bar
for index, row in branch_sum.iterrows():
    plt.text(index, row['Total'], f'${row["Total"]:.2f}', ha='center', va='bottom')

#### **How much did we sell in each month?**

In [None]:
sales['Month'] = sales['Date'].dt.month
monthly_sales = sales.groupby('Month')['Total'].sum().reset_index()
print(monthly_sales)

sns.barplot(x='Month', y='Total', data=monthly_sales, estimator="sum")
plt.ylim(0, 150000);
plt.title("Total Sales per Month")

# Add labels to data points
for index, row in monthly_sales.iterrows():
    plt.text(index, row['Total']*1.05, f'${row["Total"]:.2f}', ha='center', va='bottom')
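
The objective for this section also asks which month was best and how much was earned in it. A minimal sketch of reading that off the aggregated table, using the monthly totals reported in this notebook's conclusion as stand-in data (the names `best`, `best_month`, and `best_total` are illustrative, not part of the original code):

```python
import pandas as pd

# Monthly totals as reported in the conclusion of this notebook
monthly_sales = pd.DataFrame({
    "Month": [1, 2, 3],
    "Total": [116291.868, 97219.374, 109455.507],
})

# Select the row whose Total is the largest
best = monthly_sales.loc[monthly_sales["Total"].idxmax()]
best_month = int(best["Month"])
best_total = float(best["Total"])

print(f"Best month: {best_month} with ${best_total:,.2f}")
# prints "Best month: 1 with $116,291.87"
```

In the notebook itself, the same two lines can be applied directly to the `monthly_sales` DataFrame computed above.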

#### **What product sold the most by quantity?**

In [None]:
product = sales.groupby('Product line')['Quantity'].sum().reset_index()
product_sorted = product.sort_values(by='Quantity', ascending=False)
ax = sns.barplot(x='Quantity', y='Product line', data=product_sorted)
plt.xlim(0, product_sorted['Quantity'].max() * 1.2)
plt.title("The most sold product by quantity")
 
# Add labels to data points
for p in ax.patches:
    ax.annotate(f'{p.get_width():.0f}', (p.get_x() + p.get_width(), p.get_y() + p.get_height() / 2), 
                ha='left', va='center')
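
The chart above ranks product lines by units sold, which may not match the ranking by revenue. A hedged sketch of comparing both in one table, using made-up toy rows with the notebook's column names (`Product line`, `Quantity`, `Total`); the values are illustrative and are not taken from the dataset:

```python
import pandas as pd

# Toy rows (made-up values) using the notebook's column names
toy = pd.DataFrame({
    "Product line": ["Electronic accessories", "Food and beverages",
                     "Electronic accessories", "Food and beverages"],
    "Quantity": [5, 2, 4, 3],
    "Total": [50.0, 80.0, 40.0, 90.0],
})

# Aggregate units and revenue per product line, then sort by revenue
summary = (
    toy.groupby("Product line")
       .agg(quantity=("Quantity", "sum"), revenue=("Total", "sum"))
       .sort_values("revenue", ascending=False)
)
print(summary)

# The top line can differ depending on the measure
top_by_quantity = summary["quantity"].idxmax()
top_by_revenue = summary["revenue"].idxmax()
print("Top by quantity:", top_by_quantity)
print("Top by revenue:", top_by_revenue)
```

Replacing `toy` with `sales` would produce the same comparison for the real data.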


#### **How much did we sell in each city?**

In [None]:
sales_each_city = sales.groupby('City')['Total'].sum().reset_index()
sales_each_city_sorted = sales_each_city.sort_values(by='Total', ascending=False)
# print(sales_each_city_sorted)

ax = sns.barplot(x='City', y='Total', data=sales_each_city_sorted, estimator="sum")
plt.ylim(0, sales_each_city_sorted['Total'].max()*1.2);
plt.title("Total Sales per City")

# Add labels to the bars
for p in ax.patches:
    ax.annotate(f'${p.get_height():.0f}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='bottom')


#### **Customer shopping experience**

In [None]:
# Overall mean rating across all transactions
mean_rating = sales['Rating'].mean()
print(f"Average rating for customer shopping experience: {mean_rating:.2f}")

# Distribution of individual ratings
sns.displot(sales['Rating'], kde=False)
plt.title('Customer Shopping Experience');


In [None]:
mean_rating_branch = sales.groupby('Branch')['Rating'].mean().reset_index()
print(mean_rating_branch)

ax = sns.barplot(x="Branch", y="Rating", data=mean_rating_branch)
plt.title('Mean Customer Shopping Experience per Branch')
plt.ylim(0,8)

# Add labels to the bars
for p in ax.patches:
    ax.annotate(f'{p.get_height():.2f}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='bottom')
    

## **Conclusion**

   * Which branch has the most sales?
       > **Branch C** has the most sales, with a total of `$110,568`

   * How much did we sell in each month?
       > January `$116291.868` \
       > February `$97219.374` \
       > March     `$109455.507`
       
   * What product sold the most?
       > *Electronic accessories* is the best-selling product line by quantity, closely followed by *Food and beverages*
       
   * How much did we sell in each city?
       > Naypyitaw  `$110568.706` \
       > Yangon  `$106200.370` \
       > Mandalay  `$106197.672`
       
   * Average rating (customer shopping experience) for each branch
        > Branch A  - 7.03/10 \
        > Branch B  - 6.82/10 \
        > Branch C  - 7.08/10 \
        > Total average customer shopping experience `6.97/10`

#### Read more on Medium: [Supermarket Sales Analysis](https://medium.com/@rpangarego/supermarket-sales-analysis-with-python-539faedbde7b)