# Chi-Squared Test: Test for Independence

The Chi-Squared Test of Independence is a statistical method used to determine whether there is a significant association between two categorical variables. In business settings it helps establish whether factors such as customer segment and purchase channel are related before decisions are built on that relationship.

#### Formula for Chi-Squared Test

The Chi-Squared Test statistic is calculated using the formula:

$$
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
$$

where:

- $\chi^2$ is the Chi-Squared statistic.
- $O_i$ is the observed frequency count in the $i^{th}$ cell of the contingency table.
- $E_i$ is the expected frequency count in the $i^{th}$ cell under the null hypothesis that there is no association between the variables, computed as (row total $\times$ column total) / grand total.
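
To make the formula concrete, the sketch below computes the expected counts and the statistic by hand with NumPy and cross-checks them against SciPy. The 2x3 table `observed` is a hypothetical example, not data from this document.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 table of observed counts (two groups, three categories).
observed = np.array([[30, 20, 10],
                     [20, 30, 40]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
grand_total = observed.sum()

# Expected count for each cell under independence:
# (row total * column total) / grand total.
expected = row_totals * col_totals / grand_total

# Chi-Squared statistic: sum of (O - E)^2 / E over all cells.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Cross-check against SciPy, with the continuity correction switched off
# so the result matches the plain formula.
chi2_scipy, _, _, expected_scipy = chi2_contingency(observed, correction=False)

print(f"manual: {chi2_manual:.4f}, scipy: {chi2_scipy:.4f}")
```

Broadcasting the column of row totals against the row of column totals yields the full table of expected counts in one step, which keeps the code close to the formula.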

The degrees of freedom (df) for the test are calculated as:

$$
df = (r - 1) \times (c - 1)
$$

where:

- $r$ is the number of rows in the contingency table.
- $c$ is the number of columns in the contingency table.

The p-value is the probability, under the Chi-Squared distribution with the corresponding degrees of freedom, of observing a statistic at least as large as the calculated $\chi^2$.
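
As a minimal sketch of this step, the code below derives the degrees of freedom and the p-value from a given statistic using `scipy.stats.chi2`. The statistic 5.3875 and the 3x3 table shape are taken from the channel-by-segment example later in this document; the variable names and the choice of `alpha = 0.05` are illustrative.

```python
from scipy.stats import chi2

# Statistic and table shape from the 3x3 channel-by-segment example.
chi2_stat = 5.3875
n_rows, n_cols = 3, 3

# Degrees of freedom: (r - 1) * (c - 1).
dof = (n_rows - 1) * (n_cols - 1)

# Survival function: P(X >= chi2_stat) for X ~ Chi-Squared(dof), i.e. the p-value.
p_value = chi2.sf(chi2_stat, dof)

# Equivalent decision rule: compare the statistic to the critical value at alpha.
alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof)

print(f"dof={dof}, p-value={p_value:.4f}, critical value={critical_value:.4f}")
```

Rejecting the null hypothesis when the p-value is below `alpha` is the same decision as rejecting when the statistic exceeds `critical_value`.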


#### Business Scenario: Customer Purchase Behavior

In this scenario, we explore the purchase behavior of customers across different segments and channels to enhance marketing strategies, optimize product placement, and improve overall customer engagement.

##### Objective

The objective is to analyze the relationship between the choice of purchase channel (Online, In-Store, Mobile App) and customer segment (New Customer, Returning Customer, VIP Customer) to tailor marketing efforts, adjust product placement, and strategize customer engagement approaches more effectively.

#### Data Description

The dataset simulates 2,000 transactions, including several categorical variables:

- **Product Category**: Electronics, Apparel, or Home & Kitchen.
- **Customer Segment**: New Customer, Returning Customer, or VIP Customer.
- **Purchase Channel**: How the purchase was made: Online, In-Store, or Mobile App.
- **Time of Purchase**: When the purchase was made: Morning, Afternoon, or Evening.
- **Region**: Geographic region of the purchase: North, South, East, or West.

This dataset covers transactions made within one fiscal year across various locations and platforms.

#### Business Problem

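The aim is to determine whether the choice of purchase channel is independent of the customer segment. The null hypothesis ($H_0$) is that the two variables are independent; the alternative hypothesis ($H_1$) is that they are associated. This analysis will help optimize resource allocation, marketing campaigns, and customer service across different channels and segments.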

#### Assumptions for Chi-Squared Test

1. Observations are independent of each other.
2. Each cell of the contingency table has an expected frequency count of at least 5.
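
The second assumption can be checked before running the test. The sketch below uses `scipy.stats.contingency.expected_freq` to find the smallest expected count in a table; the helper `min_expected_count` and both example tables are hypothetical and not taken from the dataset in this document. The first assumption depends on how the data were collected and cannot be verified from the table itself.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

def min_expected_count(observed):
    """Return the smallest expected cell count under independence."""
    return expected_freq(np.asarray(observed)).min()

# Hypothetical tables for illustration only.
large_table = [[30, 20, 10],
               [20, 30, 40]]
sparse_table = [[3, 1, 2],
                [2, 4, 1]]

for name, table in [("large", large_table), ("sparse", sparse_table)]:
    smallest = min_expected_count(table)
    status = "ok" if smallest >= 5 else "violated"
    print(f"{name}: smallest expected count = {smallest:.2f} -> assumption {status}")
```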

#### Python Code to Simulate Data and Perform Chi-Squared Test


```python
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

# Simulate the dataset
np.random.seed(0)  # For reproducibility
data = {
    'Product Category': np.random.choice(['Electronics', 'Apparel', 'Home & Kitchen'], size=2000, p=[0.3, 0.4, 0.3]),
    'Customer Segment': np.random.choice(['New Customer', 'Returning Customer', 'VIP Customer'], size=2000, p=[0.5, 0.3, 0.2]),
    'Purchase Channel': np.random.choice(['Online', 'In-Store', 'Mobile App'], size=2000, p=[0.4, 0.4, 0.2]),
    'Time of Purchase': np.random.choice(['Morning', 'Afternoon', 'Evening'], size=2000, p=[0.3, 0.4, 0.3]),
    'Region': np.random.choice(['North', 'South', 'East', 'West'], size=2000, p=[0.25, 0.25, 0.25, 0.25])
}

df = pd.DataFrame(data)

# Prepare a contingency table
contingency_table = pd.crosstab(df['Purchase Channel'], df['Customer Segment'])

# Perform the Chi-Squared Test
chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(f'Chi-Squared Statistic: {chi2:.4f}, P-Value: {p_value:.4f}')
```

```
Chi-Squared Statistic: 5.3875, P-Value: 0.2498
```


#### Interpretation of Chi-Squared Test Results

The results of the Chi-Squared Test are as follows:

- **Chi-Squared Statistic:** 5.3875
- **P-Value:** 0.2498

The p-value of 0.2498 is greater than the common alpha level of 0.05, so we do not have enough evidence to reject the null hypothesis of independence. For this 3x3 table the test has $(3 - 1) \times (3 - 1) = 4$ degrees of freedom. In the context of our business scenario, the data show no statistically significant association between the choice of purchase channel (Online, In-Store, Mobile App) and the customer segment (New Customer, Returning Customer, VIP Customer).

In practical terms, this analysis gives marketing teams and strategists no evidence that the preference for a particular purchase channel differs among customer segments, so it does not support segment-specific channel strategies. This outcome is expected here: the simulation draws each variable independently of the others, so purchase channel and customer segment are unrelated by construction.

However, it's important to remember that the lack of a statistically significant association does not imply that there is no relationship at all between these variables, just that any relationship that does exist could not be detected as statistically significant with this test and this dataset.
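
One reason a real relationship can go undetected is sample size. The sketch below, which uses a hypothetical 3x3 table rather than the dataset above, runs the same test on a table and on a copy with ten times as many observations in identical proportions. With the proportions held fixed, the statistic grows in proportion to the number of observations, so the p-value shrinks.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3x3 table with mildly different column proportions per row.
base_table = np.array([[42, 40, 18],
                       [38, 40, 22],
                       [40, 36, 24]])

results = {}
for scale in (1, 10):
    # Same cell proportions, `scale` times as many observations.
    table = base_table * scale
    stat, p, dof, _ = chi2_contingency(table)
    results[scale] = (stat, p)
    print(f"n={table.sum():5d}  chi2={stat:.3f}  p={p:.4f}")
```

The same mild difference in proportions therefore yields a much smaller p-value in the larger sample, which is why a non-significant result should be read as "not detected with this dataset" rather than "no relationship".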

As always, these results should be considered in conjunction with other analyses and understood within the broader context of marketing strategies, customer behavior research, and business goals.

# Chi-Squared Test: Test for Goodness-of-Fit

# Chi-Squared Test: Test for Homogeneity