## Hypothesis Testing on Spending Data

This notebook tests whether transaction amounts differ between weekdays and weekends. The data is split into weekday and weekend spending, and a two-sample t-test is implemented manually with `numpy` to check for a significant difference in mean spending.

The key steps include:
- Preparing the data by converting columns to the appropriate formats.
- Categorizing spending into weekdays and weekends.
- Performing a manual t-test using `numpy`.
- Generating a report with the results.

The final output includes both console results and a text report.



In [4]:
# Importing necessary libraries
import pandas as pd
import numpy as np


## Data Loading and Preparation

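In this section, the dataset is loaded and its columns are prepared:
- The date column (`IslemZamanı`, the transaction time) is converted to datetime format; unparsable values become `NaT`.
- The balance (`KartBakiyesi`) and spending (`IslemTutarı`) columns, which use a decimal comma, are converted to floats.
- The number of transactions on each day is counted.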


In [12]:
# Load and prepare the data
file_path = 'Emre_YONTUCU_ticket.csv'
data = pd.read_csv(file_path)

# Convert necessary columns to appropriate formats
data['IslemZamanı'] = pd.to_datetime(data['IslemZamanı'], errors='coerce')
data['KartBakiyesi'] = data['KartBakiyesi'].str.replace(',', '.').astype(float)
data['IslemTutarı'] = data['IslemTutarı'].str.replace(',', '.').astype(float)

# Calculate daily spending frequency
spending_frequency = data['IslemZamanı'].dt.date.value_counts().sort_index()

# Display daily spending frequency
print("Daily Spending Frequency:\n")
print(spending_frequency)


Daily Spending Frequency:

IslemZamanı
2023-10-31    1
2023-11-13    1
2023-11-14    1
2023-11-15    1
2023-11-20    1
             ..
2024-11-15    1
2024-11-18    1
2024-11-19    1
2024-11-25    1
2024-11-26    1
Name: count, Length: 89, dtype: int64
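
The comma-to-dot replacement above assumes the amount columns are read as strings with a decimal comma and no thousands separator. As a hedged alternative, the sketch below parses a small made-up sample (the values are hypothetical, not taken from the dataset) and uses `pd.to_numeric` with `errors='coerce'` so that malformed entries become `NaN` instead of raising.

```python
import pandas as pd

# Hypothetical sample values; the real file's contents may differ.
raw_amounts = pd.Series(["12,50", "7,00", "n/a", "3,25"])

# Replace the decimal comma, then coerce anything unparsable to NaN.
amounts = pd.to_numeric(
    raw_amounts.str.replace(",", ".", regex=False), errors="coerce"
)

print(amounts.tolist())
print("unparsable rows:", int(amounts.isna().sum()))
```

If the whole file uses decimal commas, `pd.read_csv` also accepts a `decimal=','` argument, which may remove the need for the manual replacement.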


## Weekday and Weekend Categorization

The dataset is further categorized into weekday and weekend spending:
- Weekdays: Monday to Friday.
- Weekends: Saturday and Sunday.

This allows for separate analysis of spending patterns.


In [6]:
# Classify data into weekdays and weekends
data['day_of_week'] = data['IslemZamanı'].dt.day_name()
data['is_weekend'] = data['day_of_week'].isin(['Saturday', 'Sunday'])

# Separate weekday and weekend spending
weekday_spending = data.loc[~data['is_weekend'], 'IslemTutarı']
weekend_spending = data.loc[data['is_weekend'], 'IslemTutarı']
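
As a cross-check on the split above, the following sketch classifies dates by the integer `dt.dayofweek` (Monday is 0, Sunday is 6) rather than by day names, and prints the group sizes. The dates and amounts are made up for illustration. Group sizes matter here because a very small weekend group makes any test result fragile.

```python
import pandas as pd

# Hypothetical transactions: 2024-11-15 is a Friday, 2024-11-16 a Saturday,
# 2024-11-17 a Sunday, and 2024-11-18 a Monday.
sample = pd.DataFrame({
    "IslemZamanı": pd.to_datetime(
        ["2024-11-15", "2024-11-16", "2024-11-17", "2024-11-18"]
    ),
    "IslemTutarı": [10.0, 20.0, 30.0, 40.0],
})

# dayofweek is 5 for Saturday and 6 for Sunday.
sample["is_weekend"] = sample["IslemZamanı"].dt.dayofweek >= 5

weekday_sample = sample.loc[~sample["is_weekend"], "IslemTutarı"]
weekend_sample = sample.loc[sample["is_weekend"], "IslemTutarı"]

print("weekday rows:", len(weekday_sample))
print("weekend rows:", len(weekend_sample))
```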


## Manual T-Test with `numpy`

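A manual implementation of the t-test is performed:
- The mean and sample variance (`ddof=1`) of both groups (weekday and weekend spending) are calculated.
- The standard error of the difference in means is computed from the two variances without pooling them (Welch's unequal-variance form).
- The t-statistic is derived, and a two-sided p-value is approximated from the standard normal distribution, to test the null hypothesis:
  - Null Hypothesis (H₀): Mean spending per transaction is the same on weekdays and weekends.
  - Alternative Hypothesis (H₁): Mean spending per transaction differs between weekdays and weekends.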

The significance level is set at 0.05 (5%).
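
For reference, the statistic computed by `manual_ttest` has the unequal-variance (Welch) form shown below, where $\bar{x}_i$, $s_i^2$, and $n_i$ denote the sample mean, sample variance, and size of group $i$. This notation is introduced here for illustration and does not appear in the code.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$

Here $\nu$ is the Welch-Satterthwaite approximation to the degrees of freedom, which corresponds to the `df` value in the function.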


In [7]:
# Define a manual t-test function
from math import erf, sqrt

def manual_ttest(data1, data2):
    """Welch's t-statistic for two independent samples, with a normal-approximation p-value."""
    mean1, mean2 = np.mean(data1), np.mean(data2)
    var1, var2 = np.var(data1, ddof=1), np.var(data2, ddof=1)  # sample variances
    n1, n2 = len(data1), len(data2)

    # Standard error of the difference in means (the variances are not pooled)
    se = np.sqrt((var1 / n1) + (var2 / n2))
    t_stat = (mean1 - mean2) / se

    # Welch-Satterthwaite degrees of freedom (computed for reference only;
    # the normal approximation below does not use it)
    df = ((var1 / n1 + var2 / n2) ** 2) / (((var1 / n1) ** 2) / (n1 - 1) + ((var2 / n2) ** 2) / (n2 - 1))

    # Two-sided p-value from the standard normal distribution, used here as an
    # approximation to Student's t-distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(t_stat) / sqrt(2))))

    return t_stat, p_value

# Perform the manual t-test
t_stat, p_value = manual_ttest(weekday_spending.dropna(), weekend_spending.dropna())

# Display test results
print("Hypothesis Testing Results:\n")
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")
if p_value < 0.05:
    print("Conclusion: There is a significant difference between weekday and weekend spending.")
else:
    print("Conclusion: There is no significant difference between weekday and weekend spending.")


Hypothesis Testing Results:

T-statistic: -0.28838438381078685
P-value: 0.7730525201247296
Conclusion: There is no significant difference between weekday and weekend spending.
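
The p-value above comes from the standard normal distribution, which is a close approximation only when the degrees of freedom are large. With a small group, Student's t-distribution has heavier tails and gives a somewhat larger p-value. The following sketch, which stays within the standard library, integrates the t density numerically. `t_two_sided_p` and its `steps` parameter are hypothetical names introduced here, and the `df` argument is assumed to be the Welch-Satterthwaite value that `manual_ttest` computes but does not return.

```python
from math import exp, lgamma, pi, sqrt


def t_two_sided_p(t_stat, df, steps=20000):
    """Two-sided p-value under Student's t-distribution (hypothetical helper).

    Integrates the t density from 0 to |t_stat| with the trapezoid rule and
    then uses the symmetry of the distribution around zero.
    """
    # Normalizing constant of the t density with df degrees of freedom.
    norm = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_stat)
    if upper == 0:
        return 1.0

    # Trapezoid rule on [0, upper].
    h = upper / steps
    area = 0.5 * (pdf(0.0) + pdf(upper))
    for i in range(1, steps):
        area += pdf(i * h)
    area *= h

    # P(|T| > upper) = 1 - 2 * P(0 < T < upper)
    return max(0.0, 1.0 - 2.0 * area)


# Sanity check: with df = 1 the t-distribution is the Cauchy distribution,
# for which P(|T| > 1) is exactly 0.5.
print(t_two_sided_p(1.0, 1))
```

For large `df` the result approaches the normal approximation used in the notebook, so the two methods differ mainly when one of the groups is small.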


## Report Generation

The results of the analysis, including daily spending frequency and hypothesis testing outcomes, are saved into a text file for further review.


In [8]:
# Generate a report
report_lines = [
    "Spending Frequency and Hypothesis Testing Report\n",
    "Date\t\tNumber of Transactions\n"
]
report_lines.extend([f"{date}\t{count}\n" for date, count in spending_frequency.items()])

# Add hypothesis test results to the report
report_lines.append("\nHypothesis Testing:\n")
report_lines.append(f"T-statistic: {t_stat}\n")
report_lines.append(f"P-value: {p_value}\n")
if p_value < 0.05:
    report_lines.append("Conclusion: There is a significant difference between weekday and weekend spending.\n")
else:
    report_lines.append("Conclusion: There is no significant difference between weekday and weekend spending.\n")

# Save the report to a text file
output_path = 'Spending_Frequency_Hypothesis_Testing_Report_Numpy_Only.txt'
with open(output_path, 'w') as report_file:
    report_file.writelines(report_lines)

print(f"Report saved to {output_path}")


Report saved to Spending_Frequency_Hypothesis_Testing_Report_Numpy_Only.txt
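
The conclusion sentence is built twice in this notebook, once for the console and once for the report. A small helper can keep the two in sync. The sketch below is one way to do it; `conclusion_line` and its `alpha` parameter are hypothetical names that do not appear in the original cells.

```python
def conclusion_line(p_value, alpha=0.05):
    """Return the conclusion sentence for a given p-value (hypothetical helper)."""
    verdict = "a significant" if p_value < alpha else "no significant"
    return (
        f"Conclusion: There is {verdict} difference "
        "between weekday and weekend spending."
    )


# Example with the p-value reported above, rounded for display.
print(conclusion_line(0.773))
```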


## Conclusion

This notebook analyzed spending patterns with a manually implemented t-test using `numpy`.
With a t-statistic of about -0.29 and a p-value of about 0.77, the test found no significant difference in spending between weekdays and weekends at the 0.05 significance level.

The final report is saved as `Spending_Frequency_Hypothesis_Testing_Report_Numpy_Only.txt`.
