
# Task: Formulate the null and alternative hypotheses for each example and test them with an appropriate technique

# Example-1

A pharmaceutical company conducts a clinical trial for a new drug and finds that out of 500 patients, 180 show signs of improvement in their condition. Test
whether there is a significant difference in the proportion of patients showing improvement compared to the expected response rate of 30%.

Solution:<br>
>Null hypothesis, H0: the proportion of patients showing improvement is 30%, i.e. H0: p = 0.3 <br>
>Alternative hypothesis, H1: the proportion of patients showing improvement differs from 30%, i.e. H1: p ≠ 0.3

> The level of significance ($\alpha$) is the threshold for rejecting the null hypothesis <br>
> Let us assume $\alpha$ = 0.05

> Perform a one-sample z-test for a proportion to determine whether there is enough evidence to reject the null hypothesis: <br>
    1. Calculate the sample proportion<br>
    2. Calculate the standard error<br>
    3. Compute the z-statistic and its two-sided p-value<br>
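
The three steps listed above can also be worked through by hand. The sketch below is only an illustration using the standard library; the helper `two_sided_p` and the variable names are made up for this example. It computes the standard error two ways: from the hypothesised rate p0 (the usual textbook form) and from the sample proportion, which is what `proportions_ztest` uses when its `prop_var` argument is left at the default.

```python
import math

total_patients = 500      # total number of patients
improved_patients = 180   # patients showing improvement
expected_rate = 0.30      # hypothesised proportion p0

# Step 1: sample proportion
sample_proportion = improved_patients / total_patients

# Step 2: standard error, computed two ways
se_null = math.sqrt(expected_rate * (1 - expected_rate) / total_patients)
se_sample = math.sqrt(sample_proportion * (1 - sample_proportion) / total_patients)


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Step 3: z-statistic and two-sided p-value for each standard error
z_null = (sample_proportion - expected_rate) / se_null
z_sample = (sample_proportion - expected_rate) / se_sample

print(f"sample proportion: {sample_proportion:.2f}")
print(f"z with SE from p0:     {z_null:.4f}, p-value: {two_sided_p(z_null):.4f}")
print(f"z with SE from sample: {z_sample:.4f}, p-value: {two_sided_p(z_sample):.4f}")
```

Both variants lead to the same decision at $\alpha$ = 0.05 for this data; they differ only in which proportion is plugged into the standard error.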


In [None]:
from scipy import stats  # used by the t-tests and ANOVA in the later cells
from statsmodels.stats.proportion import proportions_ztest

total_patients = 500 # total number of patients
improved_patients = 180 # number of patients showing improvement
expected_rate = 0.30 # Expected response rate


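# Calculate the sample proportion (180 / 500 = 0.36)
sample_proportion = improved_patients / total_patients

# Perform a two-sided z-test for one proportion.
# By default, proportions_ztest estimates the standard error from the sample proportion.
z_stat, p_value = proportions_ztest(improved_patients, total_patients, value=expected_rate)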

print("Results of proportions z-test:")
print(f"z-statistic: {z_stat}")
print(f"p-value: {p_value}")

# Compare p-value to significance level
alpha = 0.05

if p_value < alpha:
    print("Reject null hypothesis.")
    print("There is significant evidence to suggest the proportion of patients showing improvement is different from the expected rate.")
else:
    print("Fail to reject null hypothesis.")
    print("There is not enough evidence to suggest a significant difference in the proportion of patients showing improvement.")


Results of proportions z-test:
z-statistic: 2.7950849718747373
p-value: 0.005188607552315565
Reject null hypothesis.
There is significant evidence to suggest the proportion of patients showing improvement is different from the expected rate.


# Example-2

A pharmaceutical company is developing a new drug claimed to reduce blood pressure by 10 mmHg. The company selects a sample of patients and administers the
drug. After treatment, the mean reduction in blood pressure is found to be 9.2 mmHg with a standard deviation of 2.5 mmHg. Conduct a hypothesis test to
determine if there is significant evidence to support the claim that the drug reduces blood pressure by 10 mmHg.

    Given Data: 9.2, 9.5, 8.8, 10.1, 9.6, 8.9, 9.3, 9.7, 9.1, 9.4

>H0: the drug reduces blood pressure by 10 mmHg on average, i.e. H0: $\mu$ = 10 <br>
>H1: the mean reduction in blood pressure differs from 10 mmHg, i.e. H1: $\mu$ ≠ 10

In [None]:
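data = [9.2, 9.5, 8.8, 10.1, 9.6, 8.9, 9.3, 9.7, 9.1, 9.4]
expected_mean = 10  # claimed mean reduction (mmHg)

# Summary statistics quoted in the problem statement. They are not used below:
# ttest_1samp computes the mean and standard deviation from `data` itself
# (the listed values have mean 9.36 and sample standard deviation of about 0.39).
sample_mean = 9.2
sample_std_dev = 2.5
sample_size = len(data)

# Two-sided one-sample t-test against the claimed mean (`stats` is imported in the first cell)
t_statistic, p_value = stats.ttest_1samp(data, expected_mean)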

print("Results of One-Sample t-test:")
print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")

# Compare p-value to significance level (e.g., 0.05)

alpha = 0.05


if p_value < alpha:
    print("Reject null hypothesis.")
    print("There is significant evidence to suggest the drug's reduction in blood pressure is different from 10 mmHg.")
else:
    print("Fail to reject null hypothesis.")
    print("There is not enough evidence to suggest a significant difference in the reduction of blood pressure.")

Results of One-Sample t-test:
t-statistic: -5.1986914663092785
p-value: 0.0005650435402758814
Reject null hypothesis.
There is significant evidence to suggest the drug's reduction in blood pressure is different from 10 mmHg.
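
The t-statistic reported above can be checked by hand from the listed data. The sketch below is only an illustration using the standard library. It also computes the statistic from the summary figures quoted in the problem statement (mean 9.2, standard deviation 2.5), under the assumption that the same sample size of 10 applies; the problem statement does not give a sample size, so that part is hypothetical. The critical value 2.262 is the two-sided 5% point of the t distribution with 9 degrees of freedom from a standard t-table.

```python
import math
import statistics

data = [9.2, 9.5, 8.8, 10.1, 9.6, 8.9, 9.3, 9.7, 9.1, 9.4]
claimed_mean = 10

# t-statistic from the raw data (what ttest_1samp evaluates)
n = len(data)
mean_from_data = statistics.mean(data)
sd_from_data = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
t_from_data = (mean_from_data - claimed_mean) / (sd_from_data / math.sqrt(n))

# t-statistic from the summary statistics in the problem statement.
# Assumption: the sample size is also n = 10.
stated_mean = 9.2
stated_sd = 2.5
t_from_summary = (stated_mean - claimed_mean) / (stated_sd / math.sqrt(n))

# Two-sided critical value for alpha = 0.05 with n - 1 = 9 degrees of freedom
t_critical = 2.262

print(f"From data:    mean={mean_from_data:.2f}, sd={sd_from_data:.3f}, t={t_from_data:.4f}")
print(f"From summary: mean={stated_mean}, sd={stated_sd}, t={t_from_summary:.4f}")
print(f"Reject H0 (data)?    {abs(t_from_data) > t_critical}")
print(f"Reject H0 (summary)? {abs(t_from_summary) > t_critical}")
```

The two inputs do not agree: the listed values are far less spread out than a standard deviation of 2.5 would suggest, so the conclusion depends on which description of the sample is taken as authoritative.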


# Example - 3

A biotechnology company is investigating the variability of gene expression levels under two different environmental conditions. They measure the expression levels of a particular gene in 20 samples for each condition and compute the sample variances. Test whether there is a significant difference in the variance of gene expression levels between the two conditions.

Given data:<br>
Condition 1: 4.5, 5.1, 3.9, 4.8, 4.4, 5.2, 4.7, 4.6, 4.9, 5.0<br>
Condition 2: 3.8, 3.7, 4.0, 3.9, 4.1, 3.6, 4.0, 3.8, 4.2, 3.9

>H0: the variances of the two conditions are equal, i.e. H0: $\sigma_1^2 = \sigma_2^2$<br>
>H1: the variances of the two conditions differ, i.e. H1: $\sigma_1^2 \neq \sigma_2^2$

In [None]:
condition_1 = [4.5, 5.1, 3.9, 4.8, 4.4, 5.2, 4.7, 4.6, 4.9, 5.0]
condition_2 = [3.8, 3.7, 4.0, 3.9, 4.1, 3.6, 4.0, 3.8, 4.2, 3.9]

# One-way ANOVA. Note: f_oneway compares the group means, not the variances,
# so its result does not answer the variance hypotheses stated above.
# A variance-ratio F-test or Levene's test (stats.levene) is the appropriate tool for those.
F_statistic, p_value = stats.f_oneway(condition_1, condition_2)

print(f"ANOVA F-Statistic: {round(F_statistic,2)}")
print(f"p-value: {p_value}")

# level of significance
alpha = 0.05

if p_value < alpha:
    print("Reject null hypothesis of equal means: the group means differ")
else:
    print("Fail to reject null hypothesis of equal means")

ANOVA F-Statistic: 36.25
p-value: 1.0802859220147445e-05
Reject null hypothesis of equal means: the group means differ
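
Because `f_oneway` is a one-way ANOVA on group means, the variance hypotheses for this example can instead be checked with a variance-ratio F-test. The following is a minimal sketch rather than a definitive implementation: it assumes both samples are roughly normally distributed, and the variable names are chosen for this illustration. The test statistic is the ratio of the two sample variances, referred to an F distribution with (n1 - 1, n2 - 1) degrees of freedom.

```python
import statistics
from scipy import stats

condition_1 = [4.5, 5.1, 3.9, 4.8, 4.4, 5.2, 4.7, 4.6, 4.9, 5.0]
condition_2 = [3.8, 3.7, 4.0, 3.9, 4.1, 3.6, 4.0, 3.8, 4.2, 3.9]

# Sample variances (n - 1 denominator)
var_1 = statistics.variance(condition_1)
var_2 = statistics.variance(condition_2)

# F statistic: ratio of the two sample variances
f_ratio = var_1 / var_2
df_1 = len(condition_1) - 1
df_2 = len(condition_2) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities
p_upper = stats.f.sf(f_ratio, df_1, df_2)
p_two_sided = 2 * min(p_upper, 1 - p_upper)

alpha = 0.05
print(f"Variance 1: {var_1:.4f}, variance 2: {var_2:.4f}")
print(f"F-ratio: {f_ratio:.2f} with ({df_1}, {df_2}) degrees of freedom")
print(f"Two-sided p-value: {p_two_sided:.4f}")
print("Reject H0: variances differ" if p_two_sided < alpha
      else "Fail to reject H0: no evidence the variances differ")
```

The variance-ratio test is sensitive to departures from normality; `stats.levene(condition_1, condition_2)` is a commonly used alternative when that assumption is doubtful.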


# Example-4

A life sciences company is comparing two different protocols for cell culture growth rates. One protocol involves a newly formulated growth medium, while the
other uses a standard medium. The company takes samples from two groups of cultures and measures the growth rates. Test whether there is a significant
difference in the mean growth rates between these two protocols.

Given data:<br>
Group 1: 12, 13, 11, 15, 14, 10, 12, 14, 13, 12, 16, 13, 14, 15, 12<br>
Group 2: 10, 11, 10, 12, 11, 9, 12, 11, 10, 10, 13, 11, 12, 13, 11

>Null Hypothesis (H₀): There is no difference between the mean growth rates of protocol-1 and protocol-2<br>
>Alternative Hypothesis (H₁): The mean growth rates of protocol-1 and protocol-2 differ

In [None]:
group_1 = [12, 13, 11, 15, 14, 10, 12, 14, 13, 12, 16, 13, 14, 15, 12] # Growth rates group 1
group_2 = [10, 11, 10, 12, 11, 9, 12, 11, 10, 10, 13, 11, 12, 13, 11] # Growth rates group 2

# Two-sided independent two-sample t-test (pooled variance, since equal_var defaults to True)
t_statistic, p_value = stats.ttest_ind(group_1, group_2)

# Display the results
print("Results of Two-Sample t-Test:")
print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis.")
    print("There is a significant difference between the mean growth rates of protocol-1 (new medium) and protocol-2 (standard medium).")
else:
    print("Fail to reject the null hypothesis.")
    print("There is not enough evidence to conclude a significant difference between protocol-1 (new medium) and protocol-2 (standard medium).")

Results of Two-Sample t-Test:
t-statistic: 3.877602290420335
p-value: 0.0005828738127856237
Reject the null hypothesis.
There is a significant difference between the mean growth rates of protocol-1 (new medium) and protocol-2 (standard medium).
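
To make the two-sample calculation explicit, the sketch below reproduces the pooled-variance t-statistic by hand with the standard library and compares it with the Welch (unequal-variance) form. It is only an illustration, and the variable names are chosen for this example. In SciPy the Welch variant corresponds to calling `stats.ttest_ind` with `equal_var=False`.

```python
import math
import statistics

group_1 = [12, 13, 11, 15, 14, 10, 12, 14, 13, 12, 16, 13, 14, 15, 12]
group_2 = [10, 11, 10, 12, 11, 9, 12, 11, 10, 10, 13, 11, 12, 13, 11]

n_1, n_2 = len(group_1), len(group_2)
mean_diff = statistics.mean(group_1) - statistics.mean(group_2)
var_1 = statistics.variance(group_1)  # sample variances (n - 1 denominator)
var_2 = statistics.variance(group_2)

# Pooled-variance (Student) t-statistic with n_1 + n_2 - 2 degrees of freedom
pooled_var = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))
t_pooled = mean_diff / se_pooled
df_pooled = n_1 + n_2 - 2

# Welch t-statistic with Welch-Satterthwaite degrees of freedom
se_welch = math.sqrt(var_1 / n_1 + var_2 / n_2)
t_welch = mean_diff / se_welch
df_welch = (var_1 / n_1 + var_2 / n_2) ** 2 / (
    (var_1 / n_1) ** 2 / (n_1 - 1) + (var_2 / n_2) ** 2 / (n_2 - 1)
)

print(f"Pooled: t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.2f}")
```

With equal group sizes the two t-statistics coincide; only the degrees of freedom, and therefore the p-value, differ slightly.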
