In [None]:
Q1. Calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation
of 5 using Python. Interpret the results.
ans-To calculate the 95% confidence interval for a sample of data with a mean of 50 and a standard deviation of 5, we can use the stats module from the scipy library in Python. The formula to calculate the confidence interval is:

CI = [x̄ - z_{α/2} * σ/√n, x̄ + z_{α/2} * σ/√n]
where x̄ is the sample mean, σ is the population standard deviation (approximated by the sample standard deviation when the population value is unknown), n is the sample size, and z_{α/2} is the critical value of the standard normal distribution for a 95% confidence level (about 1.96).

In Python, we can calculate the confidence interval as follows:

from scipy import stats

# Define the sample mean and standard deviation
sample_mean = 50
sample_std = 5

# Define the sample size
n = 100

# Define the confidence level
conf_level = 0.95

# Calculate the critical value
z_crit = stats.norm.ppf((1 + conf_level) / 2)

# Calculate the margin of error
margin_of_error = z_crit * sample_std / (n ** 0.5)

# Calculate the confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

# Print the confidence interval
print("The 95% confidence interval is ({:.2f}, {:.2f})".format(lower_bound, upper_bound))
The question does not give a sample size, so this example assumes n = 100 along with the 95% confidence level; a different n would change the width of the interval. The output of the program will be:

The 95% confidence interval is (49.02, 50.98)
This means that we are 95% confident that the true population mean falls within the interval (49.02, 50.98), given the sample mean of 50, the standard deviation of 5, and the assumed sample size of 100. More precisely, if we repeated the sampling many times and built an interval each time, about 95% of those intervals would contain the true population mean. The interval is narrow, which suggests that the sample mean is a precise estimate of the population mean.
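Because the sample size is an assumption here, it may help to see how much the interval depends on it. The following is a minimal sketch, not part of the original answer: the helper name mean_ci_z and the sample sizes 25, 100, and 400 are hypothetical, chosen only for illustration.

```python
from scipy import stats


def mean_ci_z(mean, std, n, conf_level=0.95):
    """Normal-approximation confidence interval for a mean (illustrative helper)."""
    z_crit = stats.norm.ppf((1 + conf_level) / 2)
    margin = z_crit * std / n ** 0.5
    return mean - margin, mean + margin


# Sample sizes below are hypothetical; the question does not give one.
for n in (25, 100, 400):
    low, high = mean_ci_z(50, 5, n)
    print(f"n = {n:3d}: ({low:.2f}, {high:.2f})")
```

For n = 100 this gives (49.02, 50.98). Quadrupling the sample size halves the margin of error, since the margin scales with 1/√n.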








In [None]:
Q2. Conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag
matches the expected distribution of 20% blue, 20% orange, 20% green, 10% yellow, 10% red, and 20%
brown. Use Python to perform the test with a significance level of 0.05.
ans-To conduct a chi-square goodness of fit test to determine if the distribution of colors of M&Ms in a bag matches the expected distribution, we can use the following hypotheses:

Null hypothesis: The observed distribution of colors matches the expected distribution.
Alternative hypothesis: The observed distribution of colors does not match the expected distribution.
We can use a chi-square goodness of fit test to test these hypotheses. The test statistic is calculated as:

χ² = ∑ ((O - E)² / E)
where O is the observed frequency, E is the expected frequency, and the sum is taken over all categories.

Under the null hypothesis, the test statistic follows a chi-square distribution with (k - 1) degrees of freedom, where k is the number of categories.

To conduct the chi-square goodness of fit test with a significance level of 0.05, we compare the calculated chi-square value to the critical chi-square value, obtained from a chi-square distribution table, statistical software, or a calculator. If the calculated chi-square value is greater than the critical chi-square value, we reject the null hypothesis and conclude that the observed distribution of colors does not match the expected distribution.

Here are the steps to apply this test using Python:

import numpy as np
from scipy.stats import chi2

# observed frequencies (hypothetical counts; the question does not supply data)
observed = np.array([18, 22, 19, 9, 11, 21])  # blue, orange, green, yellow, red, brown

# expected frequencies
expected = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.2]) * np.sum(observed)

# degrees of freedom
df = len(observed) - 1

# calculate the test statistic
chi2_value = np.sum((observed - expected)**2 / expected)

# calculate the critical value
critical_value = chi2.ppf(0.95, df)

# print the results
print(f"Test statistic: {chi2_value:.4f}")
print(f"Critical value: {critical_value:.4f}")

if chi2_value > critical_value:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")
The output will be:

Test statistic: 0.7000
Critical value: 11.0705
Fail to reject null hypothesis
Since the calculated chi-square value of 0.70 is less than the critical chi-square value of 11.07, we fail to reject the null hypothesis: at the 0.05 significance level, these counts give no evidence that the distribution of colors differs from the expected distribution. This does not prove that the distributions match; it only means the data are consistent with the expected proportions.
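The same test can also be run through scipy.stats.chisquare, which returns a p-value in addition to the statistic. This is a sketch for cross-checking the manual calculation, using the same hypothetical counts as above; comparing the p-value to 0.05 is equivalent to comparing the statistic to the critical value.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical observed counts: blue, orange, green, yellow, red, brown
observed = np.array([18, 22, 19, 9, 11, 21])
expected = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.2]) * observed.sum()

# chisquare returns the test statistic and the p-value (df = k - 1 by default)
statistic, p_value = chisquare(f_obs=observed, f_exp=expected)

print(f"Test statistic: {statistic:.4f}")
print(f"p-value: {p_value:.4f}")
print("Reject null hypothesis" if p_value < 0.05 else "Fail to reject null hypothesis")
```

A p-value above 0.05 leads to the same decision as a statistic below the critical value.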







In [None]:
Q4. A study of the prevalence of smoking in a population of 500 individuals found that 60 individuals
smoked. Use Python to calculate the 95% confidence interval for the true proportion of individuals in the
population who smoke.
ans-To calculate the 95% confidence interval for the true proportion of individuals in the population who smoke, we can use the stats module from the scipy library in Python. The formula to calculate the confidence interval is:

CI = [p - z_{α/2} * √(p*(1-p)/n), p + z_{α/2} * √(p*(1-p)/n)]
where p is the sample proportion, n is the sample size, and z_{α/2} is the critical value of the standard normal distribution for a 95% confidence level (about 1.96).

In Python, we can calculate the confidence interval as follows:

from scipy import stats
import numpy as np

# Define the sample size and sample proportion
n = 500
p = 60/n

# Define the confidence level
conf_level = 0.95

# Calculate the critical value
z_crit = stats.norm.ppf((1 + conf_level) / 2)

# Calculate the margin of error
margin_of_error = z_crit * np.sqrt(p*(1-p)/n)

# Calculate the confidence interval
lower_bound = p - margin_of_error
upper_bound = p + margin_of_error

# Print the confidence interval
print("The 95% confidence interval is ({:.4f}, {:.4f})".format(lower_bound, upper_bound))
In this case, we have a sample size of 500 and 60 individuals in the sample who smoke, so the sample proportion is 0.12. The output of the program will be:

The 95% confidence interval is (0.0915, 0.1485)
This means that we are 95% confident that the true proportion of individuals in the population who smoke falls within the interval (0.0915, 0.1485), that is, between roughly 9.2% and 14.8%, based on the sample proportion of 0.12. More precisely, if the study were repeated many times, about 95% of the intervals built this way would contain the true population proportion.
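As a cross-check on the manual formula, the same interval can be obtained from scipy.stats.norm.interval by centring a normal distribution at the sample proportion and using the standard error as its scale. This is a sketch under the same normal-approximation assumption; the variable names p_hat and std_error are illustrative.

```python
import numpy as np
from scipy import stats

n = 500
p_hat = 60 / n
std_error = np.sqrt(p_hat * (1 - p_hat) / n)

# norm.interval returns the central interval containing 95% of a normal
# distribution centred at p_hat with scale equal to the standard error
lower, upper = stats.norm.interval(0.95, loc=p_hat, scale=std_error)
print(f"({lower:.4f}, {upper:.4f})")
```

The result rounds to (0.0915, 0.1485).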







In [None]:
Q5. Calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation
of 12 using Python. Interpret the results.
ans-To calculate the 90% confidence interval for a sample of data with a mean of 75 and a standard deviation of 12 using Python, we can use the stats module from the scipy library. The formula for the confidence interval is:

CI = (x̄ - z*(σ/√n), x̄ + z*(σ/√n))
where x̄ is the sample mean, σ is the population standard deviation, n is the sample size, and z is the critical value obtained from the standard normal distribution table for the desired confidence level.

Here's the Python code to calculate the confidence interval:

from scipy import stats

# Sample statistics
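mean = 75
std_dev = 12
n = 50  # assumed sample size; the question does not specify one

# Calculate the critical value
z_critical = stats.norm.ppf(0.95)  # 0.95 is 1 - (1 - 0.90)/2, the upper cutoff for a 90% interval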

# Calculate the margin of error
margin_of_error = z_critical * (std_dev / (n ** 0.5))

# Calculate the confidence interval
lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

# Print the results
print(f"90% confidence interval: ({lower_bound:.2f}, {upper_bound:.2f})")
The output will be:

90% confidence interval: (72.21, 77.79)
Interpretation: With the assumed sample size of 50, we can be 90% confident that the true population mean lies within the range (72.21, 77.79). This means that if we repeated the sampling process many times, about 90% of the resulting confidence intervals would contain the true population mean.
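The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch: it assumes a normally distributed population whose true mean is 75 and whose standard deviation is 12 (values borrowed from the question purely for illustration), and the seed and number of trials are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical population used only for the simulation
true_mean, sigma, n = 75, 12, 50
z_critical = stats.norm.ppf(0.95)
margin = z_critical * sigma / np.sqrt(n)

# Draw many samples of size n and compute each sample mean
n_trials = 10_000
samples = rng.normal(true_mean, sigma, size=(n_trials, n))
sample_means = samples.mean(axis=1)

# An interval covers the true mean when the sample mean is within one margin of it
covered = np.abs(sample_means - true_mean) <= margin
coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should come out close to 0.90, which is what the 90% confidence level refers to: the long-run success rate of the procedure, not the probability for any single interval.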







In [None]:
Q6. Use Python to plot the chi-square distribution with 10 degrees of freedom. Label the axes and shade the
area corresponding to a chi-square statistic of 15.
ans-To plot the chi-square distribution with 10 degrees of freedom and shade the area corresponding to a chi-square statistic of 15, we can combine scipy and matplotlib in Python. Specifically, we can use the chi2 distribution from the scipy.stats module to evaluate the chi-square density, and matplotlib's area-filling support to shade the region under the curve.

Here's the Python code to generate the plot:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Define the degrees of freedom
df = 10

# Define the x-axis values (chi-square values)
x = np.linspace(0, 30, 1000)

# Create the chi-square distribution with df degrees of freedom
y = chi2.pdf(x, df)

# Create the plot
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y)

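# Shade the area under the curve to the right of a chi-square statistic of 15
x_shade = np.linspace(15, 30, 100)
y_shade = chi2.pdf(x_shade, df)
# fill_between shades vertically between the curve and y = 0;
# fill_betweenx would instead fill horizontally toward x = 0
ax.fill_between(x_shade, y_shade, color='orange')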

# Label the axes and add a title
ax.set_xlabel('Chi-Square Value')
ax.set_ylabel('Probability Density')
ax.set_title('Chi-Square Distribution with 10 Degrees of Freedom')

# Display the plot
plt.show()
The resulting plot will show the chi-square distribution with 10 degrees of freedom, labeled axes, and a shaded area corresponding to a chi-square statistic of 15.

[Figure: Chi-Square Distribution with 10 Degrees of Freedom]
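The size of the shaded region can also be computed directly. The sketch below assumes the shaded area is the right tail, that is, the probability of a value of 15 or more under a chi-square distribution with 10 degrees of freedom; the variable names are illustrative.

```python
from scipy.stats import chi2

df = 10
statistic = 15

# Survival function = 1 - CDF = area under the density to the right of the statistic
right_tail_area = chi2.sf(statistic, df)
print(f"P(X >= {statistic}) = {right_tail_area:.4f}")
```

This area is about 0.13, so a statistic of 15 with 10 degrees of freedom would not be significant at the 0.05 level.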







In [None]:
Q7. A random sample of 1000 people was asked if they preferred Coke or Pepsi. Of the sample, 520
preferred Coke. Calculate a 99% confidence interval for the true proportion of people in the population who
prefer Coke.
ans-To calculate a 99% confidence interval for the true proportion of people in the population who prefer Coke, we can use the following formula:

CI = (p̂ - z*√(p̂(1-p̂)/n), p̂ + z*√(p̂(1-p̂)/n))
where p̂ is the sample proportion, n is the sample size, and z is the critical value obtained from the standard normal distribution table for the desired confidence level.

In this case, the sample size is 1000 and the sample proportion is 520/1000 = 0.52. We can calculate the confidence interval using Python as follows:

from scipy import stats
import math

# Sample statistics
n = 1000
p = 520 / n
confidence_level = 0.99

# Calculate the critical value
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_critical * math.sqrt(p * (1 - p) / n)

# Calculate the confidence interval
lower_bound = p - margin_of_error
upper_bound = p + margin_of_error

# Print the results
print(f"{confidence_level*100}% confidence interval: ({lower_bound:.4f}, {upper_bound:.4f})")
The output will be:

99.0% confidence interval: (0.4793, 0.5607)
Interpretation: We can be 99% confident that the true proportion of people in the population who prefer Coke lies within the range of 0.4793 to 0.5607. This means that if we repeated the sampling process many times, about 99% of the resulting confidence intervals would contain the true population proportion. Because the interval includes 0.5, the sample does not show a clear majority preference for Coke at this confidence level.







In [None]:
Q8. A researcher hypothesizes that a coin is biased towards tails. They flip the coin 100 times and observe
45 tails. Conduct a chi-square goodness of fit test to determine if the observed frequencies match the
expected frequencies of a fair coin. Use a significance level of 0.05.
ans-
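One way this test could be set up, as a sketch: with 100 flips and 45 tails there are 55 heads, and a fair coin gives expected counts of 50 tails and 50 heads. The null hypothesis is that the coin is fair; the alternative is that the observed frequencies differ from 50/50. The snippet below uses scipy.stats.chisquare for the statistic and p-value and chi2.ppf for the critical value with 1 degree of freedom.

```python
from scipy.stats import chisquare, chi2

# Observed counts from the question: 45 tails, so 55 heads in 100 flips
observed = [45, 55]  # tails, heads
expected = [50, 50]  # fair coin

statistic, p_value = chisquare(f_obs=observed, f_exp=expected)

# Critical value at the 0.05 significance level with k - 1 = 1 degree of freedom
critical_value = chi2.ppf(0.95, len(observed) - 1)

print(f"Test statistic: {statistic:.4f}")
print(f"Critical value: {critical_value:.4f}")
print(f"p-value: {p_value:.4f}")
print("Reject null hypothesis" if statistic > critical_value else "Fail to reject null hypothesis")
```

The statistic is (45 - 50)²/50 + (55 - 50)²/50 = 1.0, which is below the critical value of about 3.84, so the test fails to reject the null hypothesis at the 0.05 level. The observed count of tails is also below 50, so the data give no support to the researcher's hypothesis of a bias towards tails.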