## CHI-SQUARE TEST

In [None]:
A chi-square (χ²) test is a statistical method for categorical data. It compares the observed frequency in each category with the frequency expected under a null hypothesis.
It is particularly useful for evaluating whether there is a significant association between two categorical variables.

Types of Chi-Square Tests:
    1. Chi-Square Goodness of Fit Test: checks whether the observed distribution of a single categorical variable matches a hypothesized distribution.
    2. Chi-Square Test of Independence: checks whether two categorical variables, cross-tabulated in a contingency table, are associated. This is the test used below.
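The worked example below is a test of independence, so here is a minimal sketch of the other type, a goodness of fit test, using SciPy's `scipy.stats.chisquare`. It reuses the Smart Light counts from the data below, and the hypothesized distribution (all five satisfaction levels equally likely) is an assumption chosen purely for illustration.

```python
from scipy.stats import chisquare

# Smart Light counts, in the order:
# Very Satisfied, Satisfied, Neutral, Unsatisfied, Very Unsatisfied
observed = [70, 100, 90, 50, 50]

# Illustrative assumption: H0 says all five levels are equally likely.
# When f_exp is omitted, chisquare uses equal expected counts (360 / 5 = 72 each).
result = chisquare(observed)

# Statistic = sum over categories of (O - E)^2 / E; degrees of freedom = 5 - 1 = 4
print(f"Chi-Square Statistic: {result.statistic:.4f}")
print(f"p-value: {result.pvalue:.6f}")
```

A non-uniform hypothesis would be tested the same way by passing its expected counts as `f_exp` (they must sum to the same total as `observed`).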
    

In [2]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, chi2

In [3]:
# 1. State the Hypotheses:
# H0: The type of device purchased and the customer's satisfaction level are independent (no association).
# H1: The type of device purchased and the customer's satisfaction level are associated.

# Data Provided
data = {
    "Satisfaction": ["Very Satisfied", "Satisfied", "Neutral", "Unsatisfied", "Very Unsatisfied"],
    "Smart Thermostat": [50, 80, 60, 30, 20],
    "Smart Light": [70, 100, 90, 50, 50]
}

# Creating the contingency table
df = pd.DataFrame(data)
df.set_index("Satisfaction", inplace=True)
print("Contingency Table:")
print(df)

Contingency Table:
                  Smart Thermostat  Smart Light
Satisfaction                                   
Very Satisfied                  50           70
Satisfied                       80          100
Neutral                         60           90
Unsatisfied                     30           50
Very Unsatisfied                20           50


In [4]:
# 2. Compute the Chi-Square Statistic:
chi2_stat, p, dof, expected = chi2_contingency(df)

print("\nChi-Square Test Results:")
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"p-value: {p}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies:")
print(pd.DataFrame(expected, index=df.index, columns=df.columns))


Chi-Square Test Results:
Chi-Square Statistic: 5.638227513227513
p-value: 0.22784371130697179
Degrees of Freedom: 4
Expected Frequencies:
                  Smart Thermostat  Smart Light
Satisfaction                                   
Very Satisfied                48.0         72.0
Satisfied                     72.0        108.0
Neutral                       60.0         90.0
Unsatisfied                   32.0         48.0
Very Unsatisfied              28.0         42.0
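The expected frequencies above follow from the independence assumption: each expected count is (row total × column total) / grand total, and the statistic is the sum of (O − E)² / E over all cells. A sketch that recomputes both by hand with NumPy and cross-checks them against `chi2_contingency`:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are satisfaction levels,
# columns are (Smart Thermostat, Smart Light), same order as the table above
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50],
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (5, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
grand_total = observed.sum()

# Expected count under independence: E_ij = row_i total * col_j total / grand total
expected_manual = row_totals * col_totals / grand_total

# Pearson's chi-square statistic: sum over all cells of (O - E)^2 / E
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Cross-check against SciPy (no continuity correction applies here because dof > 1)
stat, p, dof, expected = chi2_contingency(observed)
print(np.allclose(expected_manual, expected), np.isclose(chi2_manual, stat), dof_manual == dof)
```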


In [5]:
# 3. Determine the Critical Value:
alpha = 0.05
critical_value = chi2.ppf(1 - alpha, dof)
print(f"\nCritical Value (alpha = {alpha}): {critical_value}")


Critical Value (alpha = 0.05): 9.487729036781154


In [6]:
# 4. Make a Decision:
if chi2_stat > critical_value:
    decision = "Reject the null hypothesis. There is a significant association between the type of device purchased and customer satisfaction."
else:
    decision = "Fail to reject the null hypothesis. There is no significant association between the type of device purchased and customer satisfaction."

print("\nDecision:")
print(decision)



Decision:
Fail to reject the null hypothesis. There is no significant association between the type of device purchased and customer satisfaction.
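Comparing the statistic with the critical value is equivalent to comparing the p-value with alpha: the statistic exceeds the (1 − alpha) quantile exactly when the upper-tail probability falls below alpha. A short sketch using the statistic printed above (hard-coded here so the snippet runs on its own):

```python
from scipy.stats import chi2

alpha = 0.05
dof = 4
chi2_stat = 5.638227513227513  # statistic printed above

# Critical-value rule: reject H0 if the statistic exceeds the (1 - alpha) quantile
critical_value = chi2.ppf(1 - alpha, dof)
reject_by_critical = chi2_stat > critical_value

# p-value rule: reject H0 if P(chi-square >= observed statistic) < alpha.
# chi2.sf is the survival function, 1 - CDF.
p_value = chi2.sf(chi2_stat, dof)
reject_by_p = p_value < alpha

print(f"p-value: {p_value:.4f}")
print(f"Reject by critical value: {reject_by_critical}, reject by p-value: {reject_by_p}")
```

Both rules reach the same decision (fail to reject), with p ≈ 0.2278 well above 0.05.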


## HYPOTHESIS TESTING

In [None]:
Hypothesis testing is a statistical method for drawing conclusions about a population from sample data.
It starts from an initial assumption (the null hypothesis, H0) and then determines whether the sample data provide enough evidence to reject that assumption in favor of an alternative hypothesis (H1).
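One way to see what "enough evidence" means: the significance level alpha is the probability of rejecting H0 when H0 is actually true. The simulation below is a sketch under assumed settings (normal data, known sigma, a right-tailed z-test; the numbers mirror the cost example that follows but are only illustrative). It draws many samples with H0 true and counts how often the test rejects.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Assumed setup: H0 says mu = 4000 with known sigma = 125,
# and every simulated sample really has mean 4000 (H0 is true).
mu0, sigma, n, alpha = 4000, 125, 25, 0.05
n_experiments = 20_000

samples = rng.normal(loc=mu0, scale=sigma, size=(n_experiments, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))

# Right-tailed test: reject H0 when z exceeds the (1 - alpha) normal quantile
critical_value = norm.ppf(1 - alpha)
rejection_rate = np.mean(z > critical_value)

# With H0 true, this false-rejection rate should be close to alpha
print(f"Fraction of true-H0 samples rejected: {rejection_rate:.3f}")
```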

In [7]:
import numpy as np
from scipy.stats import norm
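The next cell models weekly cost as a linear function of units produced, C = 1000 + 5X (this is what `mu = 1000 + 5 * X_mean` and `sigma = 5 * X_std` encode). Under that model the mean shifts with the constant but the standard deviation only scales by 5, and the test statistic follows from the standard error σ/√n:

```latex
\begin{aligned}
\mu_C &= 1000 + 5\,\mu_X = 1000 + 5(600) = 4000,\\
\sigma_C &= 5\,\sigma_X = 5(25) = 125,\\
\sigma_{\bar{x}} &= \frac{\sigma_C}{\sqrt{n}} = \frac{125}{\sqrt{25}} = 25,\\
z &= \frac{\bar{x} - \mu_C}{\sigma_{\bar{x}}} = \frac{3050 - 4000}{25} = -38.
\end{aligned}
```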

In [8]:
# Data Provided
sample_mean = 3050  # Sample mean weekly operating cost (x̄)
X_mean = 600        # Mean number of units produced
X_std = 25          # Standard deviation of units produced
n = 25              # Sample size

# Cost model: weekly cost = 1000 + 5 * X, where X is the number of units produced
mu = 1000 + 5 * X_mean  # Theoretical mean weekly cost
sigma = 5 * X_std       # Standard deviation of weekly cost (the constant 1000 does not affect spread)

# 1. State the Hypotheses:
# H0: μ = mu (the actual mean weekly cost equals the model's prediction)
# H1: μ > mu (the actual mean weekly cost is higher than the model suggests)

# 2. Calculate the Test Statistic:
# sigma is known from the model, so this is a z-test:
# z = (x̄ - μ) / (σ / sqrt(n))
z_statistic = (sample_mean - mu) / (sigma / np.sqrt(n))
print(f"Test Statistic: {z_statistic:.4f}")

# 3. Determine the Critical Value:
# Right-tailed (one-tailed) test with alpha = 0.05
alpha = 0.05
critical_value = norm.ppf(1 - alpha)
print(f"Critical Value (alpha = {alpha}): {critical_value:.4f}")

# 4. Make a Decision:
if z_statistic > critical_value:
    decision = "Reject the null hypothesis. There is significant evidence to support the claim that the weekly operating costs are higher than the model suggests."
else:
    decision = "Fail to reject the null hypothesis. There is no significant evidence to support the claim that the weekly operating costs are higher than the model suggests."

print("\nDecision:")
print(decision)

# The detailed report as a summary
def hypothesis_test_summary():
    summary = f"""
Hypothesis Testing on Weekly Operating Costs

Hypotheses:
- Null Hypothesis (H0): The actual mean weekly operating cost is equal to the theoretical mean weekly cost (μ = {mu}).
- Alternative Hypothesis (H1): The actual mean weekly operating cost is greater than the theoretical mean weekly cost (μ > {mu}).

Provided Data:
- Sample Mean Weekly Cost (x̄): {sample_mean}
- Theoretical Mean Weekly Cost (μ): {mu}
- Standard Deviation (σ): {sigma}
- Sample Size (n): {n}

Calculations:
- Test Statistic (z): {z_statistic:.4f}
- Critical Value (alpha = {alpha}): {critical_value:.4f}

Conclusion:
{decision}
"""
    return summary

print("\nSummary Report:")
print(hypothesis_test_summary())


Test Statistic: -38.0000
Critical Value (alpha = 0.05): 1.6449

Decision:
Fail to reject the null hypothesis. There is no significant evidence to support the claim that the weekly operating costs are higher than the model suggests.

Summary Report:

Hypothesis Testing on Weekly Operating Costs

Hypotheses:
- Null Hypothesis (H0): The actual mean weekly operating cost is equal to the theoretical mean weekly cost (μ = 4000).
- Alternative Hypothesis (H1): The actual mean weekly operating cost is greater than the theoretical mean weekly cost (μ > 4000).

Provided Data:
- Sample Mean Weekly Cost (x̄): 3050
- Theoretical Mean Weekly Cost (μ): 4000
- Standard Deviation (σ): 125
- Sample Size (n): 25

Calculations:
- Test Statistic (z): -38.0000
- Critical Value (alpha = 0.05): 1.6449

Conclusion:
Fail to reject the null hypothesis. There is no significant evidence to support the claim that the weekly operating costs are higher than the model suggests.
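The same decision can be reached through the p-value. For the right-tailed alternative μ > 4000, the p-value is P(Z ≥ z) under H0, computed below with `norm.sf`. Because the sample mean (3050) lies far below 4000, this upper-tail probability is essentially 1, so the data give no support at all for higher-than-modelled costs.

```python
from scipy.stats import norm

z_statistic = -38.0  # value printed above
alpha = 0.05

# Right-tailed p-value for H1: mu > 4000, i.e. P(Z >= z) under H0
p_value = norm.sf(z_statistic)

print(f"p-value: {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```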

