# Coffee Transaction Analysis During Exam Periods

This analysis investigates whether coffee transaction frequency increases during midterm and final exam periods. We'll analyze personal spending data to determine if exam periods have a significant impact on coffee consumption patterns.

## Hypothesis
- **H0 (Null)**: No significant difference in coffee transaction frequency between exam and normal periods
- **H1 (Alternative)**: Exam periods have a significant impact on coffee transaction frequency

In [1]:
# Coffee Transaction Analysis During Exam Periods
import pandas as pd
import numpy as np
from datetime import datetime
from scipy.stats import chi2_contingency
import matplotlib.pyplot as plt

ModuleNotFoundError: No module named 'scipy'

## Data Preparation

We'll process the transaction data following these steps:
1. Read personal bank statement data
2. Filter for coffee-related transactions
3. Track transactions by vendor
4. Identify transactions during exam periods

In [2]:
# Read and process transaction data
df = pd.read_excel("yeni.xlsx", sheet_name="Sheet1", engine="openpyxl")

# Define coffee vendors
coffee_keywords = ["starbucks", "espresso", "coffy", "yukselen","SABANCI UNIVERSITESI TEMASSIZ".lower()]

brand_count = {
    "Starbucks":0,
    "EspressoLab":0,
    "Coffy":0,
    "Fasshane":0
}

# Count transactions by vendor (case-insensitive; str() guards against missing descriptions)
for a in df['Açıklama']:
    desc = str(a).lower()
    if 'starbucks' in desc:
        brand_count['Starbucks'] += 1
    elif 'espresso' in desc:
        brand_count['EspressoLab'] += 1
    elif 'coffy' in desc or 'sabanci universitesi temassiz' in desc:
        brand_count['Coffy'] += 1
    elif 'yukselen' in desc:
        brand_count['Fasshane'] += 1

# Filter coffee transactions
df["IsCoffee"] = df["Açıklama"].apply(
    lambda x: any(keyword.lower() in str(x).lower() for keyword in coffee_keywords)
)

df_coffee = df[df["IsCoffee"]]
print(f"Total coffee transactions: {len(df_coffee)}")

ImportError: Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.
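
The cell above keeps the vendor keywords in two places: the `coffee_keywords` list and the `if`/`elif` chain. As a minimal sketch, assuming one keyword-to-brand mapping is acceptable, both the brand counts and the coffee flag can be derived from a single lookup. `KEYWORD_TO_BRAND`, `classify_vendor`, and the sample descriptions are hypothetical names and made-up data, not part of the original notebook.

```python
from collections import Counter

# Hypothetical mapping from lowercase keyword to brand label
# (keywords and labels are taken from the cell above).
KEYWORD_TO_BRAND = {
    "starbucks": "Starbucks",
    "espresso": "EspressoLab",
    "coffy": "Coffy",
    "sabanci universitesi temassiz": "Coffy",
    "yukselen": "Fasshane",
}


def classify_vendor(description):
    """Return the brand for a transaction description, or None if it is not coffee."""
    text = str(description).lower()  # str() tolerates missing descriptions
    for keyword, brand in KEYWORD_TO_BRAND.items():
        if keyword in text:
            return brand
    return None


# Made-up descriptions, for illustration only.
sample = ["STARBUCKS KADIKOY", "MARKET ALISVERIS", "Coffy Tuzla", None, "YUKSELEN GIDA"]

brands = [classify_vendor(d) for d in sample]
brand_count = Counter(b for b in brands if b is not None)
is_coffee = [b is not None for b in brands]

print(brand_count)
print(is_coffee)
```

With the real data, the same function could be applied as `df["Açıklama"].apply(classify_vendor)`, and a non-null result would mark a coffee transaction.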

## Exam Period Definition

We'll calculate the total number of exam days from the university's academic calendar. The first three exam periods are:
- First Final: January 8-20, 2023
- First Midterm: April 11-24, 2023
- Second Final: May 30 - June 11, 2023

Subsequent exam periods run through January 2025.

In [3]:
def calculate_num_of_exam_days():
    """Return the total number of calendar days covered by all exam periods (endpoints included)."""
    exam_periods = [
        ("2023-01-08", "2023-01-20"),
        ("2023-04-11", "2023-04-24"),
        ("2023-05-30", "2023-06-11"),
        ("2023-11-06", "2023-11-18"),
        ("2024-01-05", "2024-01-19"),
        ("2024-03-22", "2024-04-03"),
        ("2024-05-29", "2024-06-09"),
        ("2024-11-01", "2024-11-16"),
        ("2024-11-29", "2024-12-14"),
        ("2025-01-01", "2025-01-12")
    ]
    total_exam_days = 0
    for start_date_str, end_date_str in exam_periods:
        start_date = pd.to_datetime(start_date_str)
        end_date = pd.to_datetime(end_date_str)
        num_days = (end_date - start_date).days + 1
        total_exam_days += num_days
    return total_exam_days
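
As a quick cross-check of the day count, the same total can be computed with the standard library alone. The sketch below also shows one way to derive the number of normal days instead of hard-coding it. The observation window used here (first exam day to last exam day) is an assumption for illustration; the real statement may cover a different range. `inclusive_days` is a hypothetical helper name.

```python
from datetime import date

EXAM_PERIODS = [
    ("2023-01-08", "2023-01-20"),
    ("2023-04-11", "2023-04-24"),
    ("2023-05-30", "2023-06-11"),
    ("2023-11-06", "2023-11-18"),
    ("2024-01-05", "2024-01-19"),
    ("2024-03-22", "2024-04-03"),
    ("2024-05-29", "2024-06-09"),
    ("2024-11-01", "2024-11-16"),
    ("2024-11-29", "2024-12-14"),
    ("2025-01-01", "2025-01-12"),
]


def inclusive_days(start_iso, end_iso):
    """Number of calendar days from start to end, counting both endpoints."""
    return (date.fromisoformat(end_iso) - date.fromisoformat(start_iso)).days + 1


exam_days = sum(inclusive_days(start, end) for start, end in EXAM_PERIODS)

# Assumption: the observation window runs from the first exam day to the last one.
# Replace these two dates with the actual first and last dates of the statement.
window_start, window_end = EXAM_PERIODS[0][0], EXAM_PERIODS[-1][1]
normal_days = inclusive_days(window_start, window_end) - exam_days

print("Exam days:", exam_days)
print("Normal days in the assumed window:", normal_days)
```

The listed periods do not overlap, so they sum to 137 exam days. The normal-day figure depends entirely on the window chosen.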

## Statistical Analysis

We'll perform a chi-squared test of independence to evaluate whether coffee transaction frequency differs significantly between exam and normal periods. The test uses a 2×2 contingency table whose rows are the two period types, built from the transaction count and the total number of days in each period type.

In [4]:
exam_periods = [
    ("2023-01-08", "2023-01-20"),
    ("2023-04-11", "2023-04-24"),
    ("2023-05-30", "2023-06-11"),
    ("2023-11-06", "2023-11-18"),
    ("2024-01-05", "2024-01-19"),
    ("2024-03-22", "2024-04-03"),
    ("2024-05-29", "2024-06-09"),
    ("2024-11-01", "2024-11-16"),
    ("2024-11-29", "2024-12-14"),
    ("2025-01-01", "2025-01-12")
]

def in_exam_period(date, periods):
    date = pd.to_datetime(date)
    for start, end in periods:
        start_date = pd.to_datetime(start)
        end_date = pd.to_datetime(end)
        if start_date <= date <= end_date:
            return "Exam"
    return "Normal"

df["Period_Type"] = df["Tarih"].apply(lambda d: in_exam_period(d, exam_periods))

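# Count coffee transactions only; counting every row would measure overall spending
exam_count = df[(df["Period_Type"] == "Exam") & df["IsCoffee"]].shape[0]
normal_count = df[(df["Period_Type"] == "Normal") & df["IsCoffee"]].shape[0]

exam_total = calculate_num_of_exam_days()
# Hard-coded total for normal (non-exam) periods; not derived from the data
normal_total = 732

# 2x2 table: rows are period types, columns are coffee transactions vs. the remainder.
# The remainder must be total - count; count - total is negative, and
# chi2_contingency raises a ValueError on negative entries.
A = exam_count
B = exam_total - A
C = normal_count
D = normal_total - C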

contingency_table = np.array([[A, B], [C, D]])
chi2, p, dof, expected = chi2_contingency(contingency_table)

print("Chi-Square:", chi2)
print("p-value:", p)

NameError: name 'df' is not defined
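
Because the cell above could not run, the following is a minimal sketch of the same 2×2 test using only the standard library. It assumes the counts reported in the Findings section (31 and 146) are coffee transactions, and it reuses the 137 exam days implied by the period list and the notebook's hard-coded 732. `chi2_2x2` is a hypothetical helper, not a SciPy function.

```python
import math


def chi2_2x2(table, yates=True):
    """Pearson chi-squared test for a 2x2 table; returns (statistic, p_value).

    With yates=True the continuity correction is applied, which is also the
    default behaviour of scipy.stats.chi2_contingency for 2x2 tables.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected

    # For 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Values taken from the notebook; see the assumptions stated above.
exam_coffee, normal_coffee = 31, 146
exam_days, normal_days = 137, 732

table = [
    [exam_coffee, exam_days - exam_coffee],
    [normal_coffee, normal_days - normal_coffee],
]
stat, p = chi2_2x2(table)
print(f"Chi-Square: {stat:.3f}")
print(f"p-value: {p:.3f}")
```

Under these assumptions the p-value is well above 0.05. Note that the table mixes transactions and days as if they were the same unit. A stricter variant would count the distinct days with at least one coffee purchase, so that no cell can exceed the number of days.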

## Visualization Analysis

We'll create several visualizations to analyze the data:
1. Bar plot comparing exam vs normal period transactions
2. Transaction distribution by coffee vendor
3. Monthly transaction trends
4. Yearly spending analysis

In [None]:
# Transaction Frequency Comparison
plt.figure(figsize=(8, 6))
coffee_transactions = df[df['IsCoffee']]
df_exam = coffee_transactions[coffee_transactions["Period_Type"] == "Exam"]
df_normal = coffee_transactions[coffee_transactions["Period_Type"] == "Normal"]

print(f"Coffee transactions during exam period: {len(df_exam)}")
print(f"Coffee transactions during normal period: {len(df_normal)}")

coffee_counts = coffee_transactions.groupby("Period_Type")["IsCoffee"].count()
coffee_counts.plot(kind="bar", color=["orange", "blue"])
plt.title("Coffee Transaction Count: Exam vs. Normal Periods")
plt.xlabel("Period Type")
plt.ylabel("Number of Transactions")
plt.show()
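
Raw counts are hard to compare because normal periods cover far more days than exam periods. A minimal sketch of a per-day rate, using the counts reported in the Findings section and the day totals from the statistical cell (137 exam days from the period list, and the hard-coded 732), might look like this. Treating 31 and 146 as coffee transaction counts is an assumption.

```python
# Counts from the Findings section and day totals from the statistical cell.
coffee_counts = {"Exam": 31, "Normal": 146}
period_days = {"Exam": 137, "Normal": 732}

# Transactions per day puts both period types on the same scale.
rates = {period: coffee_counts[period] / period_days[period] for period in coffee_counts}

for period, rate in rates.items():
    print(f"{period}: {rate:.3f} coffee transactions per day")
```

Plotting `rates` instead of the raw counts would give a bar chart in which the two bars are directly comparable.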

In [None]:
# Brand Distribution Analysis
plt.figure(figsize=(10, 6))
brands = list(brand_count.keys())
counts = list(brand_count.values())
plt.bar(brands, counts, color=['green', 'red', 'blue', 'purple'])
plt.title('Brand Purchases')
plt.xlabel('Brands')
plt.ylabel('Purchase Count')
plt.show()
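
The visualization list also mentions monthly transaction trends, for which the notebook has no cell. As a rough sketch of the grouping step only, the example below counts transactions per calendar month with the standard library. The dates are made up for illustration and do not come from the bank statement.

```python
from collections import Counter
from datetime import date

# Made-up coffee transaction dates, for illustration only.
coffee_dates = [
    date(2023, 1, 9),
    date(2023, 1, 15),
    date(2023, 2, 3),
    date(2023, 4, 12),
    date(2023, 4, 20),
    date(2023, 4, 21),
]

# Group by "YYYY-MM" so that months from different years stay separate.
monthly_counts = Counter(d.strftime("%Y-%m") for d in coffee_dates)

for month in sorted(monthly_counts):
    print(month, monthly_counts[month])
```

With the real DataFrame, and assuming `Tarih` has been parsed as a datetime column, a comparable grouping could be written as `df_coffee.groupby(df_coffee["Tarih"].dt.to_period("M")).size()`.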

## Findings and Conclusion

Based on our analysis:
1. The chi-squared test shows no statistically significant difference in coffee transaction frequency between exam and normal periods (p > 0.05), so we fail to reject H0.
2. Total transactions:
   - Exam periods: 31
   - Normal periods: 146
3. Brand preferences vary across the tracked vendors (Starbucks, EspressoLab, Coffy, Fasshane).

### Limitations
- Analysis based on personal data only
- May not be generalizable to broader population
- Limited to specific coffee vendors