#Q1: What is hypothesis testing in statistics?

#Answer:

Hypothesis testing is a statistical method for deciding whether sample data provide enough evidence to support a claim about a population. You formulate a null hypothesis (a statement of no effect or no difference) and an alternative hypothesis (a statement that contradicts the null hypothesis). A statistical test then measures how likely it would be to observe data at least as extreme as the sample if the null hypothesis were true. Based on that result, you either reject the null hypothesis in favor of the alternative hypothesis or fail to reject the null hypothesis.
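
As a rough illustration of these steps, the sketch below runs a two-sided one-sample Z-test using only the Python standard library. The sample values, the hypothesized mean `mu_0`, and the assumed known standard deviation `sigma` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: fill weights (grams) from a machine set to 500 g
sample = [502.1, 499.8, 503.4, 501.2, 504.0, 500.9, 502.7, 503.1, 501.5, 502.3]

mu_0 = 500.0   # H0: the population mean is 500; H1: it is not
sigma = 2.0    # assumed known population standard deviation (hypothetical)
alpha = 0.05   # significance level chosen before looking at the data

n = len(sample)
sample_mean = sum(sample) / n

# Test statistic: distance of the sample mean from mu_0 in standard errors
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided p-value: probability of a |Z| at least this large if H0 is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.4f}, p-value = {p_value:.4f}, decision: {decision}")
```

The same four steps (state the hypotheses, compute a test statistic, convert it to a p-value, compare with the significance level) apply to other tests; only the statistic and its reference distribution change.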


      
             


#Q2: What is the null hypothesis, and how does it differ from the alternative hypothesis?

#Answer:

The null hypothesis ($H_0$) is a statement of no effect, no difference, or no relationship. It represents the status quo or a commonly accepted belief. The alternative hypothesis ($H_1$ or $H_a$) is a statement that contradicts the null hypothesis. It proposes that there is a significant effect, difference, or relationship. The goal of hypothesis testing is to determine if there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.



#Q3: Explain the significance level in hypothesis testing and its role in deciding the outcome of a test.

#Answer:

The significance level, denoted by alpha ($\alpha$), is the probability of rejecting the null hypothesis when it is actually true. It is the maximum risk of a Type I error (false positive) that you are willing to accept, and it is chosen before the data are examined. Commonly used significance levels are 0.05 (5%), 0.01 (1%), and 0.10 (10%). The significance level sets the decision threshold for the test: if the p-value (the probability of observing the sample data, or more extreme data, if the null hypothesis were true) is less than or equal to $\alpha$, you reject the null hypothesis. Otherwise, you fail to reject the null hypothesis.
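
To make the decision rule concrete, here is a minimal sketch showing how the same test result can lead to different conclusions at different significance levels. The helper names `two_sided_p_value` and `decide` and the Z-statistic of 2.2 are hypothetical, introduced only for this example.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a Z-statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha):
    """Apply the rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

z = 2.2  # hypothetical test statistic
p = two_sided_p_value(z)

# Same data, same p-value, three different significance levels
decisions = {alpha: decide(p, alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: p-value = {p:.4f} -> {decision}")
```

A stricter (smaller) significance level demands stronger evidence, so a result that is significant at 0.05 need not be significant at 0.01.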

  



#Q4: What are Type I and Type II errors? Give examples of each.

#Answer:

Type I and Type II errors are the two mistakes a hypothesis test can make:

Type I Error (False Positive): This occurs when you reject the null hypothesis when it is actually true. The probability of making a Type I error is equal to the significance level ($\alpha$).

Example: A medical test indicates that a person has a disease when they actually do not (taking "the person does not have the disease" as the null hypothesis).

Type II Error (False Negative): This occurs when you fail to reject the null hypothesis when it is actually false. The probability of making a Type II error is denoted by beta ($\beta$), and $1 - \beta$ is the power of the test.

Example: A medical test indicates that a person does not have a disease when they actually do.
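
Both error rates can be estimated by simulation. The sketch below repeatedly draws samples and applies a two-sided Z-test with a known standard deviation, once with the null hypothesis true and once with it false. The function names and all parameter values (sample size 25, true mean 0.5 under the alternative, 5000 trials) are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejects_h0(sample, mu_0, sigma, alpha):
    """Two-sided Z-test with known sigma; True if H0 is rejected."""
    n = len(sample)
    z = (sum(sample) / n - mu_0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def rejection_rate(true_mean, mu_0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=5000, seed=0):
    """Fraction of simulated samples for which H0: mean = mu_0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu_0, sigma, alpha)
    return rejections / trials

# H0 is true (true mean equals mu_0): every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): every non-rejection is a Type II error
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Estimated Type I error rate:  {type_1_rate:.3f}")
print(f"Estimated Type II error rate: {type_2_rate:.3f}")
```

The estimated Type I rate should land near the chosen significance level, while the Type II rate depends on the sample size and on how far the true mean is from the hypothesized one.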

  

#Q5: What is the difference between a Z-test and a T-test? Explain when to use each.

#Answer:

A Z-test and a T-test are both statistical tests used to compare means. The key difference lies in whether the population standard deviation is known or must be estimated from the sample.

Z-test: Used when the population standard deviation is known. It is also commonly used as an approximation when the sample size is large (typically n > 30), in which case the sample standard deviation serves as an estimate of the population standard deviation.

T-test: Used when the population standard deviation is unknown and is estimated from the sample, which matters most when the sample size is small (typically n <= 30). The test uses the t-distribution with n - 1 degrees of freedom, whose heavier tails account for the extra uncertainty in the estimated standard deviation. As n grows, the t-distribution approaches the normal distribution, so the two tests give nearly identical results for large samples.
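
For the one-sample case, the two test statistics can be written side by side. The notation here is an assumption added for illustration: $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized population mean, $\sigma$ the known population standard deviation, $s$ the sample standard deviation, and $n$ the sample size.

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \quad \text{with } n - 1 \text{ degrees of freedom}
$$

The formulas differ only in the denominator: $z$ is compared against the standard normal distribution, while $t$ is compared against the t-distribution because $s$ is itself an estimate.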

  

#Q6: Write a Python program to generate a binomial distribution with n=10 and p=0.5, then plot its histogram.
(Include your Python code and output in the code box below.)
Hint: Generate random numbers using a random function.

 #Answer:

   

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Parameters for the binomial distribution
n = 10  # number of trials
p = 0.5 # probability of success

# Generate 1000 random draws from a binomial distribution
np.random.seed(42)  # for reproducibility
binomial_data = np.random.binomial(n, p, 1000)

# Plot the histogram with one bin per possible outcome (0 to n)
plt.hist(binomial_data, bins=np.arange(n + 2) - 0.5, density=True, edgecolor='black')
plt.xticks(range(n + 1))
plt.xlabel('Number of Successes')
plt.ylabel('Relative Frequency')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.show()

#Q7: Implement hypothesis testing using Z-statistics for a sample dataset in Python. Show the Python code and interpret the results.
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
50.3, 50.4, 50.0, 49.7, 50.5, 49.9]
(Include your Python code and output in the code box below.)

 #Answer:

   


In [None]:
import numpy as np
from statsmodels.stats.weightstats import ztest

# Sample data
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

# H0: the population mean equals 50; H1: it does not (two-sided test).
# The population standard deviation is unknown, so the sample standard
# deviation is used as an estimate, which is reasonable here because n = 36 > 30.
hypothesized_mean = 50
sample_std = np.std(sample_data, ddof=1)  # ddof=1 gives the sample standard deviation

# Perform the one-sample Z-test (ztest estimates the standard deviation with ddof=1)
z_statistic, p_value = ztest(sample_data, value=hypothesized_mean, ddof=1)

print(f"Sample Mean: {np.mean(sample_data):.4f}")
print(f"Sample Standard Deviation: {sample_std:.4f}")
print(f"Hypothesized Population Mean: {hypothesized_mean}")
print(f"Z-statistic: {z_statistic:.4f}")
print(f"P-value: {p_value:.4f}")

# Interpret the results (using a significance level of 0.05)
alpha = 0.05
if p_value <= alpha:
    print(f"Since the p-value ({p_value:.4f}) is less than or equal to the significance level ({alpha}), we reject the null hypothesis.")
    print("There is enough evidence to suggest that the population mean differs from the hypothesized mean.")
else:
    print(f"Since the p-value ({p_value:.4f}) is greater than the significance level ({alpha}), we fail to reject the null hypothesis.")
    print("There is not enough evidence to suggest that the population mean differs from the hypothesized mean.")

#Q8: Write a Python script to simulate data from a normal distribution and calculate the 95% confidence interval for its mean. Plot the data using Matplotlib.
(Include your Python code and output in the code box below.)

 #Answer:
  
  


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Parameters for the normal distribution
mean = 50
std_dev = 5
sample_size = 100

# Simulate data from a normal distribution
np.random.seed(42) # for reproducibility
data = np.random.normal(mean, std_dev, sample_size)

# Calculate the 95% confidence interval for the mean
confidence_level = 0.95
degrees_freedom = sample_size - 1
sample_mean = np.mean(data)
sample_standard_error = stats.sem(data) # Standard error of the mean

confidence_interval = stats.t.interval(confidence_level, degrees_freedom,
                                        loc=sample_mean, scale=sample_standard_error)

print(f"Sample Mean: {sample_mean:.4f}")
print(f"95% Confidence Interval for the Mean: ({confidence_interval[0]:.4f}, {confidence_interval[1]:.4f})")

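# Plot the data as a histogram and mark the sample mean and confidence interval
plt.hist(data, bins=15, density=True, alpha=0.6, color='g')
plt.axvline(sample_mean, color='k', linestyle='--', label='Sample mean')
plt.axvspan(confidence_interval[0], confidence_interval[1], color='orange', alpha=0.3, label='95% CI for the mean')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Simulated Data from a Normal Distribution')
plt.legend()
plt.show()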

#Q9: Write a Python function to calculate the Z-scores from a dataset and visualize the standardized data using a histogram. Explain what the Z-scores represent in terms of standard deviations from the mean.
(Include your Python code and output in the code box below.)

 #Answer:

  


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore

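def calculate_z_scores(data):
    """
    Calculates the Z-scores for a given dataset.

    Each Z-score is (value - mean) / standard deviation. scipy.stats.zscore
    uses the population standard deviation (ddof=0) by default.

    Args:
        data: A list or NumPy array of numerical data.

    Returns:
        A NumPy array of Z-scores.
    """
    return zscore(data)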

# Example usage with the sample data from Q7
sample_data = [49.1, 50.2, 51.0, 48.7, 50.5, 49.8, 50.3, 50.7, 50.2, 49.6,
               50.1, 49.9, 50.8, 50.4, 48.9, 50.6, 50.0, 49.7, 50.2, 49.5,
               50.1, 50.3, 50.4, 50.5, 50.0, 50.7, 49.3, 49.8, 50.2, 50.9,
               50.3, 50.4, 50.0, 49.7, 50.5, 49.9]

z_scores = calculate_z_scores(sample_data)

print("Original Data Mean:", np.mean(sample_data))
print("Original Data Standard Deviation:", np.std(sample_data))
print("\nZ-scores (first 10):", z_scores[:10])
print("Z-scores Mean:", np.mean(z_scores))
print("Z-scores Standard Deviation:", np.std(z_scores))


# Visualize the standardized data (Z-scores) using a histogram
plt.hist(z_scores, bins=15, density=True, alpha=0.6, color='skyblue')
plt.xlabel('Z-score')
plt.ylabel('Density')
plt.title('Histogram of Z-scores')
plt.show()
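
The question also asks what the Z-scores represent. As a short sketch of the idea, each Z-score measures how many standard deviations a value lies from the mean of the dataset. The notation is an assumption added here: $x_i$ is a data value, $\bar{x}$ the mean of the data, and $s$ its standard deviation (computed with ddof=0, the default in `scipy.stats.zscore`).

$$
z_i = \frac{x_i - \bar{x}}{s}
$$

A positive Z-score means the value is above the mean, a negative one means it is below, and a Z-score of 0 means the value equals the mean. For instance, with a hypothetical mean of 50 and standard deviation of 0.5, a value of 51 has a Z-score of (51 - 50) / 0.5 = 2, so it lies two standard deviations above the mean. After standardization the data have mean 0 and standard deviation 1, which is why the printed mean and standard deviation of the Z-scores come out at approximately 0 and 1.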