# Exploring Coffee Shop Trends: A Comprehensive Analysis

# Introduction
In this analysis, we dive into a coffee shop's transaction data, uncovering trends, spending patterns, and customer preferences. From coffee types to payment methods, let’s explore the story behind the numbers.

# Key Findings 

### Coffee Preferences 



### Top Choice:
 "Americano with Milk" leads with the highest count of transactions, indicating strong customer preference.



### Most Expensive Coffee:
 Latte emerges as the priciest, with a median spend around $37.50, compared to **Americano** at $27.50. The price for Lattes shows more variability, suggesting a wider range of options or promotions.
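The median and spread per coffee type can be checked numerically rather than read off the box plot. The following is a minimal sketch, assuming the notebook's `coffee_name` and `money` columns; the `sample` rows and the `price_summary` name are hypothetical stand-ins for the real `df`.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    "coffee_name": ["Latte", "Latte", "Latte",
                    "Americano", "Americano", "Americano"],
    "money": [38.70, 37.50, 32.82, 28.90, 27.50, 27.92],
})

# Median and quartiles of spend for each coffee type.
by_coffee = sample.groupby("coffee_name")["money"]
price_summary = pd.DataFrame({
    "median": by_coffee.median(),
    "q1": by_coffee.quantile(0.25),
    "q3": by_coffee.quantile(0.75),
})

# The interquartile range is a simple measure of price variability.
price_summary["iqr"] = price_summary["q3"] - price_summary["q1"]
price_summary = price_summary.sort_values("median", ascending=False)
print(price_summary)
```

Running the same aggregation on the full `df` would show whether the Latte really has both the highest median and the widest spread.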

### Cash vs. Card Spending

### Card Payments:
 Cards not only dominate but also exhibit higher median spending. The wider interquartile range (IQR) for card transactions highlights greater variability in spending compared to cash.
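Both parts of this claim, the share of card transactions and the difference in median and IQR, can be computed directly. This is a rough sketch on hypothetical rows, assuming the notebook's `cash_type` and `money` columns; `share` and `stats` are illustrative names.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    "cash_type": ["card", "card", "card", "card", "cash", "cash"],
    "money": [38.70, 32.82, 28.90, 27.92, 30.00, 25.00],
})

# Fraction of transactions per payment method.
share = sample["cash_type"].value_counts(normalize=True)

# describe() returns count, mean, std, min, 25%, 50%, 75% and max per group.
stats = sample.groupby("cash_type")["money"].describe()
stats["iqr"] = stats["75%"] - stats["25%"]

print(share)
print(stats[["50%", "iqr"]])
```

The `50%` column is the median, so comparing it across the two rows gives the card versus cash comparison described above.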

### Spending Trends Over Time 

### Monthly Trends:
Spending was steady from February to May 2024, hovering around $2,000 monthly. A significant uptick in June and July saw spending soar to $8,000, hinting at seasonal spikes or successful promotions.

### Weekly Trends:
 Weekly spending remained stable at $1,000 from March to May, with a notable rise to $2,000 per week in June and July, aligning with the monthly trend.
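Monthly and weekly totals like these can also be produced with `resample` on a datetime index, as an alternative to the string period labels used in the plotting cells. A minimal sketch, assuming the notebook's `datetime` and `money` columns; the `sample` rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2024-05-30 09:00", "2024-05-31 10:30",
        "2024-06-03 08:15", "2024-06-04 12:00", "2024-06-10 17:45",
    ]),
    "money": [28.90, 38.70, 32.82, 27.92, 32.82],
})

series = sample.set_index("datetime")["money"]

# "MS" groups by calendar month, labelled with the first day of the month.
monthly_total = series.resample("MS").sum()

# "W" groups by week ending on Sunday, labelled with that Sunday.
weekly_total = series.resample("W").sum()

print(monthly_total)
print(weekly_total)
```

Because the result keeps a real datetime index, periods with no sales appear as zero instead of being silently skipped, which makes a flat stretch easier to tell apart from missing data.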

## Transaction Insights

### Monthly Transactions:
 The number of transactions peaked in July with 250 cups, while February had the fewest at 50. This fluctuation suggests growing customer engagement or seasonal influences.

### Weekly Transactions:
 Weekly transaction counts grew gradually from 20 in February to 80 by July, reflecting an upward trend.

# Distribution of Spending

### Maximum Transaction:
 The highest recorded spend was $37.25, with a total of 273 such transactions, demonstrating significant customer expenditure on premium options.
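A figure like this is easy to verify against the data. The sketch below, on hypothetical values, finds the maximum of `money` and counts how many transactions sit exactly at it; `max_spend` and `n_at_max` are illustrative names.

```python
import pandas as pd

# Hypothetical sample values; only the column name comes from the notebook.
sample = pd.DataFrame({"money": [38.70, 38.70, 32.82, 28.90, 27.92]})

# Largest single transaction amount.
max_spend = sample["money"].max()

# Number of transactions recorded at exactly that amount.
n_at_max = int((sample["money"] == max_spend).sum())

print(max_spend, n_at_max)
```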

# Conclusion 
The data paints a vibrant picture of customer behavior, revealing a strong preference for certain coffee types and payment methods. The increase in spending during the summer months and the higher spend with cards provide actionable insights for marketing and inventory management.

# Import Library

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px 
import datetime as dt

In [28]:
df=pd.read_csv("C:\\Users\\3520\\Downloads\\index (2).csv")

In [29]:
df

Unnamed: 0,date,datetime,cash_type,card,money,coffee_name
0,2024-03-01,2024-03-01 10:15:50.520,card,ANON-0000-0000-0001,38.70,Latte
1,2024-03-01,2024-03-01 12:19:22.539,card,ANON-0000-0000-0002,38.70,Hot Chocolate
2,2024-03-01,2024-03-01 12:20:18.089,card,ANON-0000-0000-0002,38.70,Hot Chocolate
3,2024-03-01,2024-03-01 13:46:33.006,card,ANON-0000-0000-0003,28.90,Americano
4,2024-03-01,2024-03-01 13:48:14.626,card,ANON-0000-0000-0004,38.70,Latte
...,...,...,...,...,...,...
971,2024-07-14,2024-07-14 22:31:29.976,card,ANON-0000-0000-0376,32.82,Latte
972,2024-07-15,2024-07-15 07:33:05.557,card,ANON-0000-0000-0377,32.82,Cappuccino
973,2024-07-16,2024-07-16 12:23:37.467,card,ANON-0000-0000-0378,27.92,Americano with Milk
974,2024-07-16,2024-07-16 19:29:25.485,card,ANON-0000-0000-0367,32.82,Hot Chocolate


In [30]:
df.columns

Index(['date', 'datetime', 'cash_type', 'card', 'money', 'coffee_name'], dtype='object')

In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 976 entries, 0 to 975
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         976 non-null    object 
 1   datetime     976 non-null    object 
 2   cash_type    976 non-null    object 
 3   card         887 non-null    object 
 4   money        976 non-null    float64
 5   coffee_name  976 non-null    object 
dtypes: float64(1), object(5)
memory usage: 45.9+ KB


### Check null values

In [32]:
df.isnull().sum()

date            0
datetime        0
cash_type       0
card           89
money           0
coffee_name     0
dtype: int64

### Fill null values

In [33]:
df['card'] = df['card'].fillna('Unknown')
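Before filling, it may be worth checking where the missing card IDs come from. One plausible explanation, which the notebook does not confirm, is that they belong to cash payments. The sketch below tests that assumption on hypothetical rows; `missing_by_type` and `missing_only_cash` are illustrative names.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    "cash_type": ["card", "cash", "card", "cash"],
    "card": ["ANON-0000-0000-0001", None, "ANON-0000-0000-0002", None],
})

# Number of missing card IDs for each payment method.
missing_by_type = sample["card"].isna().groupby(sample["cash_type"]).sum()

# True only if every row with a missing card ID is a cash payment (an assumption to test).
missing_only_cash = bool(
    (sample.loc[sample["card"].isna(), "cash_type"] == "cash").all()
)

# Same fill as in the notebook cell.
sample["card"] = sample["card"].fillna("Unknown")

print(missing_by_type)
print(missing_only_cash)
```

If the check holds on the real data, a label such as "Unknown" is simply marking cash customers, which matters when counting distinct card holders later.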

In [34]:
df

Unnamed: 0,date,datetime,cash_type,card,money,coffee_name
0,2024-03-01,2024-03-01 10:15:50.520,card,ANON-0000-0000-0001,38.70,Latte
1,2024-03-01,2024-03-01 12:19:22.539,card,ANON-0000-0000-0002,38.70,Hot Chocolate
2,2024-03-01,2024-03-01 12:20:18.089,card,ANON-0000-0000-0002,38.70,Hot Chocolate
3,2024-03-01,2024-03-01 13:46:33.006,card,ANON-0000-0000-0003,28.90,Americano
4,2024-03-01,2024-03-01 13:48:14.626,card,ANON-0000-0000-0004,38.70,Latte
...,...,...,...,...,...,...
971,2024-07-14,2024-07-14 22:31:29.976,card,ANON-0000-0000-0376,32.82,Latte
972,2024-07-15,2024-07-15 07:33:05.557,card,ANON-0000-0000-0377,32.82,Cappuccino
973,2024-07-16,2024-07-16 12:23:37.467,card,ANON-0000-0000-0378,27.92,Americano with Milk
974,2024-07-16,2024-07-16 19:29:25.485,card,ANON-0000-0000-0367,32.82,Hot Chocolate


In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 976 entries, 0 to 975
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         976 non-null    object 
 1   datetime     976 non-null    object 
 2   cash_type    976 non-null    object 
 3   card         976 non-null    object 
 4   money        976 non-null    float64
 5   coffee_name  976 non-null    object 
dtypes: float64(1), object(5)
memory usage: 45.9+ KB


In [36]:
df.dtypes

date            object
datetime        object
cash_type       object
card            object
money          float64
coffee_name     object
dtype: object

# Change data types

In [38]:
# Parse the date strings in one vectorized call instead of a row-wise apply
df['date'] = pd.to_datetime(df['date'], format="%Y-%m-%d")

In [41]:
# Parse the full timestamps, including fractional seconds
df['datetime'] = pd.to_datetime(df['datetime'], format="%Y-%m-%d %H:%M:%S.%f")

In [42]:
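# Both columns were parsed in the cells above, so these two calls only act as a safeguard
df['date'] = pd.to_datetime(df['date'])
df['datetime'] = pd.to_datetime(df['datetime'])
# Fall back to the date where a timestamp is missing (df.info() shows none are missing here)
df['datetime'] = df['datetime'].combine_first(df['date'])
# Truncate timestamps to the minute
df['datetime'] = df['datetime'].dt.floor('min')
# 'date' is now redundant with 'datetime'
df.drop(columns=['date'],inplace=True)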


# Plot graphs

In [43]:
fig = px.histogram(df, x='coffee_name', title='Count of Different Coffee Types')
fig.show()

In [44]:
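# Count transactions per minute-level timestamp; sort by time so the line is drawn chronologically
df_datetime_count = df['datetime'].value_counts().sort_index().reset_index()
df_datetime_count.columns = ['datetime', 'count']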
fig = px.line(df_datetime_count, x='datetime', y='count', title='Transaction Count Over Time')
fig.show()

In [45]:
fig = px.histogram(df, x='cash_type', title='Distribution of Cash Types')
fig.show()

In [46]:
fig = px.histogram(df, x='money', nbins=50, title='Distribution of Money Spent')
fig.show()


# Bivariate analysis visualizations

In [47]:
fig = px.scatter(df, x='datetime', y='money', title='Money Spent Over Time')
fig.show()

In [48]:
fig = px.box(df, x='cash_type', y='money', title='Money Spent by Cash Type')
fig.show()

In [49]:
fig = px.box(df, x='coffee_name', y='money', title='Money Spent by Coffee Type')
fig.show()


In [50]:
fig = px.histogram(df, x='cash_type', color='coffee_name', barmode='group', title='Coffee Types by Cash Type')
fig.show()

In [51]:
df['year_month'] = df['datetime'].dt.to_period('M').astype(str)
df_by_month = df.groupby('year_month').agg({'money': ['sum', 'count']}).reset_index()
df_by_month.columns = ['month', 'total_money', 'total_cups']
fig = px.bar(df_by_month, x='month', y='total_money', title='Total Money Spent Each Month', labels={'total_money': 'Total Money', 'month': 'Month'})
fig.update_xaxes(tickangle=-45)  # Rotate x-axis labels for better readability
fig.show()

In [52]:
fig_cups = px.bar(df_by_month, x='month', y='total_cups', title='Number of Transactions (Cups) Each Month', labels={'total_cups': 'Total Cups', 'month': 'Month'})
fig_cups.update_xaxes(tickangle=-45)  # Rotate x-axis labels for better readability
fig_cups.show()

In [53]:
df_coffee_distribution = df.groupby('coffee_name').agg({'money': 'sum'}).reset_index()
df_coffee_distribution.columns = ['coffee_name', 'total_money']
fig = px.bar(df_coffee_distribution, x='coffee_name', y='total_money', 
             title='Total Money Spent by Coffee Type', 
             labels={'total_money': 'Total Money', 'coffee_name': 'Coffee Type'})
fig.update_xaxes(tickangle=-45)  # Rotate x-axis labels for better readability
fig.show()

In [54]:
df['year_week'] = df['datetime'].dt.to_period('W').astype(str)
df_by_week = df.groupby('year_week').agg({'money': ['sum', 'count']}).reset_index()
df_by_week.columns = ['week', 'total_money', 'total_cups']
fig = px.bar(df_by_week, x='week', y='total_money', title='Total Money Spent Each Week', labels={'total_money': 'Total Money', 'week': 'Week'})
fig.update_xaxes(tickangle=-45)  # Rotate x-axis labels for better readability
fig.show()

In [55]:
fig_cups = px.bar(df_by_week, x='week', y='total_cups', title='Number of Transactions (Cups) Each Week', labels={'total_cups': 'Total Cups', 'week': 'Week'})
fig_cups.update_xaxes(tickangle=-45)  # Rotate x-axis labels for better readability
fig_cups.show()

In [56]:
fig_pie = px.pie(df_coffee_distribution, names='coffee_name', values='total_money', 
                 title='Distribution of Money Spent by Coffee Type')
fig_pie.show()

In [57]:
df['day'] = df['datetime'].dt.date
daily_spending = df.groupby('day').agg({'money': 'sum'}).reset_index()
fig = px.line(daily_spending, x='day', y='money', title='Daily Spending Trend', labels={'money': 'Total Money', 'day': 'Date'})
fig.show()

In [58]:
df['week'] = df['datetime'].dt.to_period('W').astype(str)
weekly_spending = df.groupby('week').agg({'money': 'sum'}).reset_index()
fig = px.line(weekly_spending, x='week', y='money', title='Weekly Spending Trend', labels={'money': 'Total Money', 'week': 'Week'})
fig.show()