# 📊 Comparing Average Daily Sales: Store A (Urban) vs. Store B (Suburban)

A retail company operates two stores: Store A in an urban area and Store B in a suburban area. To guide future investment and marketing strategies, the company wants to determine whether average daily sales differ significantly between the two locations. It collected a random sample of daily sales from each store over the last three months.

---

| **Metadata** | **Details** |
| :--- | :--- |
| 🗓️ Generated | December 10, 2025 at 00:56 |
| 📁 Dataset | `Assignment data_06.xlsx` |
| 📏 Dimensions | 4 rows × 2 columns |
| 🛠️ Tool | Auto-Analysis App |

---


## 📑 Table of Contents

1. [Setup & Data Loading](#setup--data-loading)
2. [🔧 Data Imputation](#data-imputation)
3. [⚙️ Normality Test](#normality-test)
4. [⚙️ Hypothesis Testing](#hypothesis-testing)
5. [📊 Result Interpretation](#result-interpretation)

---


## 🚀 Setup & Data Loading

Import required libraries and load the preprocessed dataset.


In [None]:
# ============================================================
# Required Libraries
# ============================================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats  # Shapiro-Wilk normality test and two-sample t-tests

# Set visual styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)


In [None]:
# ============================================================
# Load Dataset
# ============================================================

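# NOTE: the metadata table lists `Assignment data_06.xlsx` as the dataset; if the
# data is still in that Excel file, load it with pd.read_excel('Assignment data_06.xlsx')
df = pd.read_csv('data.csv')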

print('=' * 50)
print('📊 DATASET OVERVIEW')
print('=' * 50)
print(f'Rows:    {len(df):,}')
print(f'Columns: {len(df.columns)}')
print(f'Memory:  {df.memory_usage(deep=True).sum() / 1024:.1f} KB')
print('=' * 50)

df.head(10)
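Because the next step fills in missing values, it helps to see how many cells are missing in each column first. The sketch below uses a hypothetical helper, `missing_report`, on a hypothetical 4 × 2 frame shaped like the dataset in the metadata table; the column names `StoreA` and `StoreB` are taken from the t-test cell, and the numbers are invented for illustration only.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = frame.isna().sum()
    return pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(1),
        "dtype": frame.dtypes.astype(str),
    })


# Hypothetical toy data (NOT the assignment dataset): one missing day per store
toy = pd.DataFrame({
    "StoreA": [120.0, 135.0, np.nan, 150.0],
    "StoreB": [100.0, np.nan, 110.0, 95.0],
})
print(missing_report(toy))
```

On the real notebook, calling `missing_report(df)` right after loading shows whether imputation is needed at all.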


## 🔧 Data Imputation

**Category:** `Preprocessing`

> Replace missing values in each numeric column with that column's mean (mean imputation)



In [None]:
# ============================================================
# Data Imputation
# ============================================================

import pandas as pd

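try:
    # Compute the mean of the observed (non-missing) values in each numeric column;
    # numeric_only=True avoids a TypeError on non-numeric columns in pandas >= 2.0
    mean_values = df.mean(numeric_only=True)

    # Fill each column's missing values with that column's mean
    df_imputed = df.fillna(mean_values)

    # Report how many values were filled, then keep the imputed dataframe
    n_filled = int(df.isna().sum().sum() - df_imputed.isna().sum().sum())
    print(f"Imputed {n_filled} missing value(s)")
    df = df_imputed

except Exception as e:
    print(f"Error in data imputation: {e}")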
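
Mean imputation leaves each column's mean unchanged but shrinks its sample standard deviation, because every filled value sits exactly at the mean and adds no spread. A smaller standard deviation means a smaller standard error in the later t-test, so p-values can come out more optimistic than the observed data justify, especially in a sample this small. The sketch below illustrates the effect on hypothetical toy numbers (not the assignment data).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT the assignment dataset): one missing day per store
toy = pd.DataFrame({
    "StoreA": [120.0, 135.0, np.nan, 150.0],
    "StoreB": [100.0, np.nan, 110.0, 95.0],
})

# Mean imputation: fill each column's NaNs with that column's observed mean
imputed = toy.fillna(toy.mean(numeric_only=True))

# Column means are preserved by construction...
print("means before:", toy.mean().round(2).to_dict())
print("means after: ", imputed.mean().round(2).to_dict())

# ...but the sample standard deviation (ddof=1) shrinks
print("std before:  ", toy.std().round(2).to_dict())
print("std after:   ", imputed.std().round(2).to_dict())
```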

---


## ⚙️ Normality Test

**Category:** `Algorithm`

> Apply the Shapiro-Wilk normality test to each store's daily sales with α = 0.05



In [None]:
# ============================================================
# Normality Test
# ============================================================

try:
    from scipy import stats
    import pandas as pd

    alpha = 0.05

    # Apply the Shapiro-Wilk test to the imputed daily sales of each store
    normality_test_results = []
    for column in df.columns:
        if df[column].dtype.kind in 'bifc':  # numeric columns only
            result = stats.shapiro(df[column].dropna())
            normality_test_results.append((column, result.statistic, result.pvalue))

    # Interpret the results: H0 is that the sample comes from a normal distribution.
    # A large p-value means "no evidence against normality", not proof of normality.
    for column, statistic, pvalue in normality_test_results:
        if pvalue < alpha:
            print(f"{column}: W = {statistic:.3f}, p = {pvalue:.3f} -> reject normality")
        else:
            print(f"{column}: W = {statistic:.3f}, p = {pvalue:.3f} -> no evidence against normality")

except Exception as e:
    print(f"Error in normality test: {e}")
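
The loop above can also be packaged as a function that returns a table instead of printing, which makes the results easier to reuse. This is a minimal sketch with a hypothetical helper, `shapiro_by_column`, run on hypothetical toy data. It guards against columns with fewer than three observations, the minimum `scipy.stats.shapiro` accepts. With only a handful of rows (the metadata lists 4), the test has very little power, so a non-significant result should not be read as confirmation of normality.

```python
import numpy as np
import pandas as pd
from scipy import stats


def shapiro_by_column(frame: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: Shapiro-Wilk on every numeric column, tabulated."""
    rows = []
    for column in frame.select_dtypes(include="number").columns:
        values = frame[column].dropna()
        if len(values) < 3:
            # scipy.stats.shapiro requires at least 3 observations
            rows.append({"column": column, "W": np.nan, "p_value": np.nan,
                         "reject_normality": None})
            continue
        result = stats.shapiro(values)
        rows.append({
            "column": column,
            "W": result.statistic,
            "p_value": result.pvalue,
            # Reject H0 (normality) only when p < alpha
            "reject_normality": bool(result.pvalue < alpha),
        })
    return pd.DataFrame(rows)


# Hypothetical toy data (NOT the assignment dataset)
toy = pd.DataFrame({
    "StoreA": [120.0, 135.0, 128.0, 150.0, 142.0],
    "StoreB": [100.0, 104.0, 110.0, 95.0, 99.0],
})
print(shapiro_by_column(toy))
```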

---


## ⚙️ Hypothesis Testing

**Category:** `Algorithm`

> Perform an independent two-sample t-test with α = 0.05
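
With its default `equal_var=True`, `scipy.stats.ttest_ind` runs Student's two-sample t-test of $H_0: \mu_A = \mu_B$ against the two-sided $H_1: \mu_A \neq \mu_B$. Writing $\bar{x}_A, \bar{x}_B$ for the sample means, $s_A^2, s_B^2$ for the sample variances, and $n_A, n_B$ for the sample sizes, the statistic uses a pooled variance $s_p^2$:

$$
t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}},
\qquad
s_p^2 = \frac{(n_A - 1)\, s_A^2 + (n_B - 1)\, s_B^2}{n_A + n_B - 2}
$$

Under $H_0$, $t$ follows a t distribution with $n_A + n_B - 2$ degrees of freedom, and the two-sided p-value is $2\,P(T > |t|)$.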



In [None]:
# ============================================================
# Hypothesis Testing
# ============================================================

import pandas as pd
from scipy import stats

def perform_t_test(df, alpha=0.05):
    try:
        # Formulate the null and alternative hypotheses
        null_hypothesis = "The average daily sales are equal between the two stores"
        alternative_hypothesis = "The average daily sales differ between the two stores"

        # Student's two-sample t-test (scipy default equal_var=True assumes equal variances)
        t_test_result = stats.ttest_ind(df['StoreA'].dropna(), df['StoreB'].dropna())
        print(f"t = {t_test_result.statistic:.3f}, p = {t_test_result.pvalue:.4f}")

        # Interpret the test results at the chosen significance level
        if t_test_result.pvalue < alpha:
            print(f"Reject the null hypothesis: {null_hypothesis}")
            print(f"The data support the alternative hypothesis: {alternative_hypothesis}")
        else:
            print(f"Fail to reject the null hypothesis: {null_hypothesis}")
        return t_test_result

    except Exception as e:
        print(f"Error in hypothesis testing: {e}")

# Run the test on the imputed dataset
# (assumes the sales columns are named 'StoreA' and 'StoreB')
t_test_result = perform_t_test(df)
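
Urban and suburban stores may well have different day-to-day variability, which the default Student's test assumes away. The sketch below, on hypothetical toy samples, does three things: it recomputes the pooled t statistic by hand (via a hypothetical helper, `pooled_t`) to confirm it matches `stats.ttest_ind`; it runs Welch's t-test (`equal_var=False`), which does not assume equal variances; and it runs Levene's test (`stats.levene`) for equality of variances. Swapping in Welch's test is a design choice offered for comparison, not the notebook's method.

```python
import numpy as np
from scipy import stats


def pooled_t(a, b):
    """Hypothetical helper: Student's two-sample t statistic with pooled variance."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pooled variance from the unbiased (ddof=1) sample variances
    sp2 = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / dof
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n_a + 1 / n_b))
    # Two-sided p-value from the t distribution's survival function
    p = 2 * stats.t.sf(abs(t), dof)
    return t, p


# Hypothetical toy samples (NOT the assignment data)
store_a = [120.0, 135.0, 128.0, 150.0, 142.0]
store_b = [100.0, 104.0, 110.0, 95.0, 99.0, 108.0]

t_manual, p_manual = pooled_t(store_a, store_b)
student = stats.ttest_ind(store_a, store_b)                 # equal_var=True (default)
welch = stats.ttest_ind(store_a, store_b, equal_var=False)  # Welch's t-test
levene = stats.levene(store_a, store_b)                     # H0: equal variances

print(f"manual pooled t = {t_manual:.4f}, p = {p_manual:.4g}")
print(f"scipy Student   t = {student.statistic:.4f}, p = {student.pvalue:.4g}")
print(f"scipy Welch     t = {welch.statistic:.4f}, p = {welch.pvalue:.4g}")
print(f"Levene p = {levene.pvalue:.4g}")
```

When the two p-values disagree about significance, Welch's result is usually the safer one to report, since it remains valid whether or not the variances are equal.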

---


## 📊 Result Interpretation

**Category:** `Output`

> Translate statistical findings into actionable insights



In [None]:
# ============================================================
# Result Interpretation
# ============================================================

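from scipy import stats

def translate_statistical_findings(df, alpha=0.05):
    try:
        # Recompute the two-sample t-test so this cell reports the actual numbers
        store_a, store_b = df['StoreA'].dropna(), df['StoreB'].dropna()
        result = stats.ttest_ind(store_a, store_b)

        # Summarize the key findings from the statistical analysis
        print("Key Findings:")
        print("- Missing values were filled with each column's mean (mean imputation)")
        print("- Shapiro-Wilk normality tests were applied to each store's daily sales")
        print(f"- Mean daily sales: Store A = {store_a.mean():.2f}, Store B = {store_b.mean():.2f}")
        print(f"- Two-sample t-test: t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

        # Discuss the implications for future investment and marketing strategies
        print("Implications:")
        if result.pvalue < alpha:
            higher = "Store A" if store_a.mean() > store_b.mean() else "Store B"
            print(f"- Average daily sales differ significantly; {higher} has the higher average,")
            print("  which can inform where to prioritize investment and marketing spend")
        else:
            print("- No significant difference in average daily sales was detected with this sample,")
            print("  so the data do not support treating the two locations differently on sales alone")

    except Exception as e:
        print(f"Error in result interpretation: {e}")

translate_statistical_findings(df)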
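
A p-value alone says little about how large the gap is, which is what an investment or marketing decision actually needs. One option, sketched below with a hypothetical helper (`summarize_difference`) on hypothetical toy samples, is to report the estimated difference in mean daily sales (A minus B) with a confidence interval and a standardized effect size (Cohen's d using the pooled standard deviation). The interval uses the same pooled standard error and degrees of freedom as Student's t-test, so a 95% interval excludes zero exactly when that test gives p < 0.05.

```python
import numpy as np
from scipy import stats


def summarize_difference(a, b, confidence=0.95):
    """Hypothetical helper: mean difference (A - B), pooled-variance CI, Cohen's d."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    diff = a.mean() - b.mean()
    # Pooled standard deviation, as in Student's t-test
    sp = np.sqrt(((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / dof)
    se = sp * np.sqrt(1 / n_a + 1 / n_b)
    # Two-sided critical value, e.g. the 0.975 quantile for 95% confidence
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)
    return {
        "mean_diff": diff,
        "ci_low": diff - t_crit * se,
        "ci_high": diff + t_crit * se,
        "cohens_d": diff / sp,
    }


# Hypothetical toy samples (NOT the assignment data)
store_a = [120.0, 135.0, 128.0, 150.0, 142.0]
store_b = [100.0, 104.0, 110.0, 95.0, 99.0, 108.0]
summary = summarize_difference(store_a, store_b)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```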


---

## 📝 Notes

This notebook was automatically generated by **Auto-Analysis App**.

- All code cells can be modified and re-executed
- Results depend on the preprocessing applied (here, mean imputation of missing values)

---
*Generated on 2025-12-10 00:56:28*
