# Brewed Insights: Coffee Sales Analysis

### 1. Install and import the relevant libraries

In [None]:
import sqlite3
import pandas as pd

### 2. Load the dataset

In [None]:
# Load the CSV. Note: this is a hardcoded absolute path; adjust it for your environment.
df = pd.read_csv('/Users/graziellamorais/Desktop/Data projects/coffeeshop_sales/data/coffe_sales.csv')

### 3. Create a temporary database using SQLite and insert the table

In [None]:
# Create (or open) a SQLite database file named coffee.db in the working directory
conn = sqlite3.connect('coffee.db')

# Write the DataFrame into a SQL table
df.to_sql('coffee_sales', conn, index=False, if_exists='replace')
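
The heading calls the database temporary, but `sqlite3.connect('coffee.db')` writes a file to disk that persists between runs. If a truly throwaway database is preferred, one option is SQLite's in-memory mode. The following is a minimal sketch under that assumption; `demo_df` and `demo_conn` are hypothetical names, and the three rows are toy data, not values from the real CSV.

```python
import sqlite3
import pandas as pd

# Hypothetical toy data standing in for the real CSV
demo_df = pd.DataFrame({
    "coffee_name": ["Latte", "Americano", "Latte"],
    "money": [35.0, 25.0, 35.0],
})

# ":memory:" keeps the database in RAM; it disappears when the connection closes
demo_conn = sqlite3.connect(":memory:")
demo_df.to_sql("coffee_sales", demo_conn, index=False, if_exists="replace")

# Sanity check: the row count seen by SQL should match the DataFrame
row_count = pd.read_sql_query(
    "SELECT COUNT(*) AS n FROM coffee_sales", demo_conn
)["n"].iloc[0]
print(row_count)

demo_conn.close()
```

The same row-count check can be run against the file-based connection to confirm that `to_sql` wrote every row.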

### 4. Exploratory Data Analysis (EDA)

#### 4.1 Top-Selling Coffee Products

The top 5 best-selling drinks (ordered by total revenue) are:

1. **Latte** – 757 units sold, $26,875.30 revenue  
2. **Americano with Milk** – 809 units sold, $24,751.12 revenue  
3. **Cappuccino** – 486 units sold, $17,439.14 revenue  
4. **Americano** – 564 units sold, $14,650.26 revenue  
5. **Cortado** – 287 units sold, $7,384.86 revenue

**Strategy Recommendations:**
- **Menu focus**: Promote top-performing drinks like Latte and Americano with Milk during peak hours to maximize revenue.
- **Upselling opportunities**: Encourage add-ons or combos for mid-level sellers like Cappuccino and Cortado to boost the average ticket.
- **Inventory planning**: Ensure adequate stock of high-demand drinks, especially during morning and afternoon peaks.

In [None]:
query = """
SELECT coffee_name, 
       COUNT(*) AS total_sales, 
       ROUND(SUM(money), 2) AS total_revenue
FROM coffee_sales
GROUP BY coffee_name
ORDER BY total_revenue DESC
LIMIT 5;
"""
top_sellers = pd.read_sql_query(query, conn)
top_sellers
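
Ranking these five drinks by units sold and ranking them by revenue give different orders, so it helps to be explicit about which one is meant. The sketch below re-enters the figures listed in the summary above into a small DataFrame (the name `top` and the `revenue_per_unit` column are illustrative additions) and compares both rankings.

```python
import pandas as pd

# Figures copied from the top-5 summary above
top = pd.DataFrame({
    "coffee_name": ["Latte", "Americano with Milk", "Cappuccino", "Americano", "Cortado"],
    "total_sales": [757, 809, 486, 564, 287],
    "total_revenue": [26875.30, 24751.12, 17439.14, 14650.26, 7384.86],
})

# Rank the same rows two ways
by_units = top.sort_values("total_sales", ascending=False)["coffee_name"].tolist()
by_revenue = top.sort_values("total_revenue", ascending=False)["coffee_name"].tolist()

# Revenue per unit hints at which drinks carry a higher average price
top["revenue_per_unit"] = (top["total_revenue"] / top["total_sales"]).round(2)

print("By units:  ", by_units)
print("By revenue:", by_revenue)
print(top[["coffee_name", "revenue_per_unit"]])
```

Americano with Milk leads on volume while Latte leads on revenue, which is why the two orderings diverge.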

#### 4.2 Peak Hours

- **10 AM is the peak sales hour**, generating $10,198 in revenue. This suggests mornings are the busiest period, likely due to office commuters and the customary "morning coffee" routine.

- **1–2 PM sees a slight dip** in revenue (~$7,100), which could be an opportunity for **promotions or lunch combos** to boost revenue during this quieter period.

- **7–9 PM** still generates decent revenue (~$6,398–$7,752), though less than the afternoon. People still buy coffee in the evening, so offering **seasonal warm drinks** like Hot Chocolate could lift evening sales.

- **6–7 AM** are the slowest hours: $149.40 (6 AM) and $2,846.02 (7 AM). Opening very early may not justify heavy staffing unless there is a loyal early-morning customer base. **Consider reducing staff or offering pre-order options**.

**Strategy Recommendations:**
- **Staffing**: Allocate more baristas from 9 AM–12 PM and 4 PM–5 PM to handle peak demand.

- **Product promotions**: Target slow hours (1–3 PM) with discounts or combos to increase average revenue.

- **Menu focus**: Highlight high-margin drinks during peak hours to maximize profits.

- **Operational planning**: Monitor inventory for popular drinks during peak hours to avoid shortages.

In [None]:
pd.read_sql_query("""
SELECT hour_of_day, SUM(money) AS total
FROM coffee_sales
GROUP BY hour_of_day
ORDER BY hour_of_day;
""", conn)
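
Rather than reading the peak and slowest hours off the table by eye, they can be pulled out programmatically. This is a minimal pandas sketch on hypothetical transactions (the `sales` frame below is toy data that only mirrors the `hour_of_day` and `money` columns); it is not the notebook's own method, which uses SQL.

```python
import pandas as pd

# Hypothetical transactions with the same two columns as the real table
sales = pd.DataFrame({
    "hour_of_day": [7, 10, 10, 10, 13, 19, 19],
    "money": [25.0, 35.0, 30.0, 35.0, 28.0, 35.0, 33.0],
})

# Total revenue per hour, equivalent to SUM(money) ... GROUP BY hour_of_day
hourly = sales.groupby("hour_of_day")["money"].sum()

peak_hour = hourly.idxmax()       # hour with the highest total
slowest_hour = hourly.idxmin()    # hour with the lowest total

# Each hour's revenue as a fraction of the peak hour, useful for spotting dips
share_of_peak = (hourly / hourly.max()).round(2)

print(f"Peak hour: {peak_hour}, slowest hour: {slowest_hour}")
print(share_of_peak)
```

On the real data, the same three lines could be applied to the result of the SQL query above after setting `hour_of_day` as the index.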

#### 4.3 Revenue by Day of the Week

- **Tuesday generates the highest revenue** with $18,168.38, suggesting demand is strongest early in the week.

- **Monday** follows closely at $17,363.10, showing strong early-week sales, likely as people start their workweek.

- **Friday** brings in $16,802.66, still significant but slightly lower than earlier in the week, which could reflect early weekend patterns.

- **Thursday ($16,091.40) and Wednesday ($15,750.46)** show steady mid-week revenue.

- **Weekends see lower revenue**: Saturday at $14,733.52 and Sunday at $13,336.06, indicating less foot traffic compared to weekdays.

**Strategy Recommendations:**

- **Staffing**: Schedule more staff on weekdays, particularly Tuesday and Monday, to handle higher demand.

- **Promotions**: Offer weekend promotions to boost sales during slower days.

- **Inventory planning**: Ensure top-selling drinks are well-stocked during high-revenue weekdays.

In [None]:
pd.read_sql_query("""
SELECT Weekday, SUM(money) AS total_revenue
FROM coffee_sales
GROUP BY Weekday
ORDER BY Weekdaysort;
""", conn)
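
To put a number on the weekday versus weekend gap, the totals quoted in the bullets above can be averaged per day name. This is a small worked calculation in plain Python; it averages the per-day-name totals (not per calendar day), and the variable names are illustrative.

```python
# Revenue totals per day name, copied from the bullets above
revenue = {
    "Monday": 17363.10, "Tuesday": 18168.38, "Wednesday": 15750.46,
    "Thursday": 16091.40, "Friday": 16802.66,
    "Saturday": 14733.52, "Sunday": 13336.06,
}
weekend_days = {"Saturday", "Sunday"}

weekday_values = [v for day, v in revenue.items() if day not in weekend_days]
weekend_values = [v for day, v in revenue.items() if day in weekend_days]

# Average total per day name within each group
weekday_avg = sum(weekday_values) / len(weekday_values)
weekend_avg = sum(weekend_values) / len(weekend_values)

# Relative shortfall of the weekend average versus the weekday average
gap_pct = (weekday_avg - weekend_avg) / weekday_avg * 100

print(f"Average weekday total: ${weekday_avg:,.2f}")
print(f"Average weekend total: ${weekend_avg:,.2f}")
print(f"Weekend shortfall: {gap_pct:.1f}%")
```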

#### 4.4 Average Sale per Hour

- **Morning transactions (6–9 AM)** tend to be smaller, averaging around $29–$32 per sale, reflecting lighter orders early in the day.  

- **Late morning to early afternoon (10 AM–2 PM)** sees moderate average sales ($31–$32), coinciding with peak traffic hours.  

- **Afternoon and evening (3–9 PM)** have the highest average transaction values, peaking at $33.85 around 7 PM, suggesting customers are purchasing larger or "premium" drinks later in the day.  

- **Night hours (10–11 PM)** maintain relatively high averages despite fewer customers, indicating that the few late orders tend to be slightly larger in value.

**Strategy Recommendations:**

- **Upselling:** Promote premium drinks or add-ons in the afternoon and evening to maximize revenue per transaction.  

- **Early morning offers:** Introduce combos or incentives to increase average sales when traffic is lighter. 

In [None]:
avg_sale_hour = pd.read_sql_query("""
SELECT hour_of_day,
       ROUND(AVG(money), 2) AS avg_sale_per_hour
FROM coffee_sales
GROUP BY hour_of_day
ORDER BY hour_of_day;
""", conn)
avg_sale_hour
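
The bullets above group hours into four dayparts. One way to compute averages for those groups directly, instead of scanning the hourly table, is `pd.cut`. The sketch below uses hypothetical transactions, and the bin edges and labels are assumptions chosen to match the ranges named in the text (6–9 AM, 10 AM–2 PM, 3–9 PM, 10–11 PM).

```python
import pandas as pd

# Hypothetical transactions; values are illustrative only
sales = pd.DataFrame({
    "hour_of_day": [6, 8, 10, 13, 16, 19, 22],
    "money": [28.0, 30.0, 31.0, 32.0, 34.0, 36.0, 35.0],
})

# Right-inclusive bins: (5, 9], (9, 14], (14, 21], (21, 23]
bins = [5, 9, 14, 21, 23]
labels = ["morning", "midday", "afternoon_evening", "night"]
sales["daypart"] = pd.cut(sales["hour_of_day"], bins=bins, labels=labels)

# Average transaction value per daypart
avg_by_daypart = (
    sales.groupby("daypart", observed=True)["money"].mean().round(2)
)
print(avg_by_daypart)
```

Averaging over transactions within a daypart weights busy hours more heavily than averaging the hourly averages would, which is usually the intended reading of "average sale".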

#### 4.5 Monthly Sales Performance: Growth Rate by Month

- **Initial growth**: February revenue jumps 107%, from January’s $6,399 to $13,215. March (+20%) continues the growth, but at a slower pace.

- **Fluctuations**: Apr drops sharply (-64%), then May recovers (+43%), showing volatility in spring months.

- **Moderate stability**: Jun–Aug see smaller changes (-7% to +10%), indicating steady sales.

- **High season**: Sep–Oct (+31% and +39%) mark strong late-year performance.

- **Slowdown**: Nov drops (-38%), possibly due to seasonal factors like colder weather or lower foot traffic.

In [None]:
monthly_sales = df.groupby(['Month_name', 'Monthsort'])['money'].sum().reset_index()
monthly_sales = monthly_sales.sort_values('Monthsort') # Sort by Monthsort to ensure correct order
monthly_sales['sales_growth_rate'] = (
    monthly_sales['money'].pct_change().fillna(0) * 100 # Calculate percentage change and convert to percentage
).map(lambda x: f"{x:.2f}%") # Format as percentage string
monthly_sales = monthly_sales.drop(columns='Monthsort')  # Remove Monthsort for cleaner output
print(monthly_sales)
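
The growth rate is the month-over-month percentage change, (current − previous) / previous × 100. As a quick check of the February figure, the sketch below applies that formula to the rounded January and February totals quoted above and compares it with `pct_change`; the `monthly` Series is built by hand for illustration.

```python
import pandas as pd

# Rounded January and February totals quoted in the text above
monthly = pd.Series([6399.0, 13215.0], index=["Jan", "Feb"])

# Manual month-over-month growth: (current - previous) / previous * 100
manual = (monthly["Feb"] - monthly["Jan"]) / monthly["Jan"] * 100

# pandas computes the same ratio; the first month has no predecessor, so it is NaN
via_pandas = monthly.pct_change() * 100

print(f"Manual: {manual:.2f}%")
print(via_pandas.round(2))
```

Because the first month has no previous value, `pct_change` returns NaN there; the notebook cell above replaces it with 0, so January is displayed as `0.00%` rather than as a measured change.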

#### 4.6 Outliers: Extreme Sales

- **Number of extreme transactions:** 128, representing unusually high-value sales.  

- **Total revenue from these transactions:** $4,953.60, accounting for **4.41% of total revenue**.  

- **Average transaction value:** $38.70, higher than the typical sale.

- Extreme transactions make up a small portion of overall revenue but reflect **customers opting for premium or multiple-item orders**.

**Strategy Recommendations:**  
- **Upselling and promotions:** Encourage similar high-value purchases during peak hours.  

- **Inventory management:** Ensure sufficient stock of popular premium items to support these larger transactions.  

- **Customer engagement:** Consider loyalty rewards or targeted marketing for customers who frequently make high-value purchases.

In [None]:
# 95th-percentile threshold; the strict ">" below excludes transactions exactly at the threshold
threshold = df['money'].quantile(0.95)
outliers = df[df['money'] > threshold]

# Insights
num_outliers = len(outliers)
total_outliers = outliers['money'].sum()
avg_outliers = outliers['money'].mean()
pct_of_total = total_outliers / df['money'].sum() * 100

print(f"Number of extreme transactions: {num_outliers}")
print(f"Total value of extreme transactions: ${total_outliers:.2f}")
print(f"Average value of extreme transactions: ${avg_outliers:.2f}")
print(f"Percentage of total revenue from extreme transactions: {pct_of_total:.2f}%")
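
One subtlety of a percentile cutoff on price data: many transactions share the same price, so a strict `>` comparison can select fewer than 5% of rows when values tie at the threshold. The toy example below illustrates this behavior; the `toy_prices` Series is hypothetical and deliberately built with ties at the top.

```python
import pandas as pd

# Hypothetical prices with heavy ties: ten at 10, eight at 20, two at 30
toy_prices = pd.Series([10.0] * 10 + [20.0] * 8 + [30.0] * 2)

# 95th percentile (pandas uses linear interpolation by default)
toy_threshold = toy_prices.quantile(0.95)

# Strict comparison drops rows equal to the threshold; inclusive keeps them
strict = toy_prices[toy_prices > toy_threshold]
inclusive = toy_prices[toy_prices >= toy_threshold]

print(f"Threshold: {toy_threshold}")
print(f"Strict (>): {len(strict)} rows, inclusive (>=): {len(inclusive)} rows")
```

Whether `>` or `>=` is the better choice depends on how the "extreme" group should be defined, so it is worth checking how many transactions sit exactly at the threshold in the real data.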