## Hypothesis Testing

In [7]:
# Import required libraries
import numpy as np
from scipy.stats import ttest_ind

In [2]:
# Generate random data for two samples
np.random.seed(42)  # For reproducibility
sample1 = np.random.normal(loc=5, scale=2, size=100)  # Sample 1 with mean 5 and standard deviation 2
sample2 = np.random.normal(loc=6, scale=2, size=100)  # Sample 2 with mean 6 and standard deviation 2


In [3]:
# Calculate sample statistics
mean1 = np.mean(sample1)
mean2 = np.mean(sample2)
std_dev1 = np.std(sample1, ddof=1)  # Use Bessel's correction by setting ddof=1 for sample standard deviation
std_dev2 = np.std(sample2, ddof=1)
n1 = len(sample1)
n2 = len(sample2)

In [8]:
# Perform two-sample t-test
# Null Hypothesis: The means of the two samples are equal
# Alternative Hypothesis: The means of the two samples are not equal
t_stat, p_value = ttest_ind(sample1, sample2)

In [9]:
# Print results
print("T-statistic:", t_stat)
print("P-value:", p_value)

T-statistic: -4.75469594350529
P-value: 3.8191352626793134e-06


In [10]:
# Interpretation of results
alpha = 0.05
if p_value < alpha:
    print("\nReject the null hypothesis. There is sufficient evidence to conclude that the means of the two samples are not equal.")
else:
    print("\nFail to reject the null hypothesis. There is not enough evidence to conclude that the means of the two samples are different.")


Reject the null hypothesis. There is sufficient evidence to conclude that the means of the two samples are not equal.
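
The sample statistics computed in cell In [3] are never passed to `ttest_ind`, but they are enough to reproduce its result by hand. The following is a minimal sketch of the pooled-variance (equal-variance) t-test, which is what `ttest_ind` computes by default. The names `pooled_var`, `std_err`, `t_manual`, and `p_manual` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

# Recreate the two samples from Example 1
np.random.seed(42)
sample1 = np.random.normal(loc=5, scale=2, size=100)
sample2 = np.random.normal(loc=6, scale=2, size=100)

# Sample statistics (ddof=1 applies Bessel's correction)
mean1, mean2 = np.mean(sample1), np.mean(sample2)
var1, var2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)
n1, n2 = len(sample1), len(sample2)

# Pooled variance and standard error of the difference in means
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t-statistic and two-sided p-value from the t distribution
t_manual = (mean1 - mean2) / std_err
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Compare with SciPy's result
t_scipy, p_scipy = stats.ttest_ind(sample1, sample2)
print("Manual:", t_manual, p_manual)
print("SciPy: ", t_scipy, p_scipy)
```

The two printed lines should agree up to floating-point rounding, which shows that the test only depends on the means, standard deviations, and sizes of the two samples.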


**Example 2:** Perform a two-sample t-test on sepal length for two classes of the Iris dataset <br>
Null Hypothesis: The means of the two samples are equal <br>
Alternative Hypothesis: The means of the two samples are not equal <br>

In [12]:
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import load_iris

# Load the Iris dataset
iris_data = load_iris()

# Select two distinct random classes of iris flowers
np.random.seed(42)  # For reproducibility
class1 = np.random.randint(0, 3)
class2 = np.random.randint(0, 3)
while class2 == class1:
    class2 = np.random.randint(0, 3)

# Get samples of a specific feature for the selected classes
feature_idx = 0  # Selecting the first feature (sepal length) for demonstration
samples_class1 = iris_data.data[iris_data.target == class1, feature_idx]
samples_class2 = iris_data.data[iris_data.target == class2, feature_idx]

# Perform two-sample t-test
# Null Hypothesis: The means of the two samples are equal
# Alternative Hypothesis: The means of the two samples are not equal
t_stat, p_value = ttest_ind(samples_class1, samples_class2)

# Print results
print("T-statistic:", t_stat)
print("P-value:", p_value)

# Interpretation of results
alpha = 0.05
if p_value < alpha:
    print("\nReject the null hypothesis. There is sufficient evidence to conclude that the means of the two samples are not equal.")
else:
    print("\nFail to reject the null hypothesis. There is not enough evidence to conclude that the means of the two samples are different.")


T-statistic: 15.386195820079404
P-value: 6.892546060674059e-28

Reject the null hypothesis. There is sufficient evidence to conclude that the means of the two samples are not equal.
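
By default, `ttest_ind` assumes that both populations have the same variance. When that assumption is doubtful, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below compares the two variants on sepal length. As an assumption made for illustration, it fixes the classes to 0 (setosa) and 2 (virginica) instead of drawing them at random.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import load_iris

iris_data = load_iris()
sepal_length = iris_data.data[:, 0]

# Classes fixed for illustration: 0 = setosa, 2 = virginica
setosa = sepal_length[iris_data.target == 0]
virginica = sepal_length[iris_data.target == 2]

# Inspect the sample variances before choosing a test
print("Variance (setosa):   ", np.var(setosa, ddof=1))
print("Variance (virginica):", np.var(virginica, ddof=1))

# Pooled-variance t-test (default) versus Welch's t-test
t_pooled, p_pooled = ttest_ind(virginica, setosa)
t_welch, p_welch = ttest_ind(virginica, setosa, equal_var=False)

print("Pooled: t =", t_pooled, "p =", p_pooled)
print("Welch:  t =", t_welch, "p =", p_welch)
```

Because both classes contain 50 observations, the two t-statistics coincide. Only the degrees of freedom, and therefore the p-values, differ.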


# Task

In this task, you will perform hypothesis testing using the Boston housing dataset. You will investigate whether median house prices (MEDV) differ significantly between houses located near the Charles River (CHAS = 1) and those not located near the river (CHAS = 0).<br>
Perform the following steps:<br>
1. Load the Boston housing dataset from scikit-learn: **boston_data = load_boston()**. Note that `load_boston` was removed in scikit-learn 1.2, so this call requires an older version.
2. Create a pandas DataFrame from the dataset.
3. Extract samples of median house prices for houses near and not near the Charles River.
4. Perform a two-sample t-test to compare the mean of the median house prices between the two groups.
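
Steps 3 and 4 can be sketched as follows. This is a hedged outline and not a full solution: the helper `compare_groups` is hypothetical, and the `toy` DataFrame contains made-up values rather than the Boston data. It only assumes that the real DataFrame has a `CHAS` column and a `MEDV` column.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def compare_groups(df, group_col="CHAS", value_col="MEDV", alpha=0.05):
    """Two-sample t-test of value_col between group_col == 1 and group_col == 0."""
    near = df.loc[df[group_col] == 1, value_col]
    not_near = df.loc[df[group_col] == 0, value_col]
    t_stat, p_value = ttest_ind(near, not_near)
    return t_stat, p_value, bool(p_value < alpha)


# Made-up stand-in data (NOT the Boston dataset), for illustration only
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "CHAS": np.repeat([0, 1], [40, 10]),
    "MEDV": np.concatenate([rng.normal(22, 5, 40), rng.normal(28, 5, 10)]),
})

t_stat, p_value, reject = compare_groups(toy)
print("T-statistic:", t_stat)
print("P-value:", p_value)
print("Reject the null hypothesis:", reject)
```

With the real data, `toy` would be replaced by the DataFrame built in step 2. Since far fewer houses have CHAS = 1 than CHAS = 0, it may also be worth checking the result with `equal_var=False`.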

