1.What is hypothesis testing in statistics?
-Hypothesis testing in statistics is a way to make informed decisions based on data. In Python, it's a breeze thanks to powerful libraries like scipy, statsmodels, and pingouin. Here's how it works:

🔍 What is Hypothesis Testing?
At its core, hypothesis testing is about testing a claim or idea about a population using sample data. You set up:

Null Hypothesis (H₀): A default assumption (e.g., "There is no difference between groups.")

Alternative Hypothesis (H₁): The claim you are looking for evidence to support (e.g., "There is a difference.")

Then, using statistical methods, you check whether the evidence from your data is strong enough to reject the null hypothesis.
🧪 Python Example Using scipy
Let’s say you're comparing test scores from two classes:
from scipy import stats

class_a = [85, 87, 83, 90, 88]
class_b = [78, 82, 80, 79, 81]

# Perform a t-test
t_stat, p_value = stats.ttest_ind(class_a, class_b)

print("t-statistic:", t_stat)
print("p-value:", p_value)
If the p-value is lower than your threshold (commonly 0.05), you reject the null hypothesis, suggesting a significant difference between the two classes.
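To make the t-test less of a black box, here is a minimal sketch that recomputes the statistic by hand for the same two classes. It assumes the pooled-variance (equal variances) form, which is what ttest_ind uses by default; the names pooled_var, t_manual, and p_manual are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

class_a = np.array([85, 87, 83, 90, 88])
class_b = np.array([78, 82, 80, 79, 81])

n_a, n_b = len(class_a), len(class_b)

# Pooled variance: weighted average of the two sample variances (ddof=1)
pooled_var = ((n_a - 1) * class_a.var(ddof=1) + (n_b - 1) * class_b.var(ddof=1)) / (n_a + n_b - 2)

# t = difference in means divided by the standard error of that difference
t_manual = (class_a.mean() - class_b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided p-value from the t-distribution with n_a + n_b - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n_a + n_b - 2)

t_scipy, p_scipy = stats.ttest_ind(class_a, class_b)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

Both lines should print the same pair of numbers, which shows that the library call is just this arithmetic plus a lookup in the t-distribution.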

2.What is the null hypothesis, and how does it differ from the alternative hypothesis?
-🧩 Null Hypothesis (H₀)
The null hypothesis is a default assumption that there is no effect, no difference, or nothing unusual happening in your data. It’s what we aim to test against.

Example: If you're comparing the average height of two groups, the null might be: “The average height of Group A equals the average height of Group B.”

💡 Alternative Hypothesis (H₁ or Ha)
The alternative hypothesis is what you suspect might be true instead. It proposes that there is an effect, a difference, or something noteworthy.
Continuing the example: “The average height of Group A is different from the average height of Group B.”

In Python Practice
Here’s how you'd test these hypotheses with a t-test:
from scipy import stats

group_a = [160, 165, 170, 175, 180]
group_b = [155, 160, 165, 170, 175]

# Null: means are equal
# Alternative: means are different

t_stat, p_val = stats.ttest_ind(group_a, group_b)

print("t-statistic:", t_stat)
print("p-value:", p_val)
If the p-value is less than a chosen significance level (say, 0.05), you'd reject the null hypothesis in favor of the alternative—concluding there may indeed be a difference.

3.What is the significance level in hypothesis testing, and why is it important?
-The significance level, usually denoted as α (alpha), is a critical threshold in hypothesis testing. It represents the maximum probability of making a Type I error—that is, rejecting the null hypothesis when it is actually true.

🧠 Why is it important?
Think of it as your “risk tolerance”:

If you set α = 0.05, you're saying you're willing to accept a 5% chance of falsely rejecting the null hypothesis.

A smaller α (like 0.01) means you want stronger evidence before rejecting H₀, reducing false positives but possibly increasing false negatives (Type II error).

📌 In Python Testing Workflow
When performing a statistical test in Python using scipy.stats, you compare the p-value from the test against your chosen significance level:
from scipy import stats

# Sample data
group1 = [10, 12, 13, 11, 10]
group2 = [14, 15, 13, 16, 14]

# Perform a t-test
t_stat, p_val = stats.ttest_ind(group1, group2)

# Set significance level
alpha = 0.05

if p_val < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

So α guides whether your test result is considered statistically significant. It’s like a gatekeeper deciding if your evidence is strong enough to warrant a change in belief.
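One way to see what α = 0.05 means in practice is to simulate many experiments in which the null hypothesis is true and count how often the test rejects it anyway. The sketch below is only an illustration: the population (mean 50, standard deviation 10), the group size of 20, the number of experiments, and the random seed are all arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 5000

# Both groups come from the same distribution, so H0 is true in every experiment
group1 = rng.normal(loc=50, scale=10, size=(n_experiments, 20))
group2 = rng.normal(loc=50, scale=10, size=(n_experiments, 20))

# axis=1 runs one t-test per row (one per simulated experiment)
_, p_vals = stats.ttest_ind(group1, group2, axis=1)

false_positive_rate = np.mean(p_vals < alpha)
print(f"Share of experiments that rejected a true H0: {false_positive_rate:.3f}")
```

The printed share should land close to 0.05, which is exactly the Type I error rate that α is meant to cap.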

4. What does a P-value represent in hypothesis testing?
-In hypothesis testing, the p-value represents the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true. It helps determine the statistical significance of your results.

Key Points:
Low p-value (< significance level, e.g., 0.05): Suggests strong evidence against the null hypothesis, so you may reject it.
High p-value (≥ significance level): Suggests weak evidence against the null hypothesis, so you fail to reject it.
Significance Level (α): A threshold (commonly 0.05) chosen to decide whether the p-value is small enough to reject the null hypothesis.
Example in Python:

Here’s how you might calculate and interpret a p-value using Python:

from scipy.stats import ttest_1samp

# Example: One-sample t-test
data = [2.3, 2.5, 2.8, 3.0, 3.2]  # Sample data
population_mean = 2.5  # Null hypothesis: mean = 2.5

# Perform t-test
t_stat, p_value = ttest_1samp(data, population_mean)

# Output results
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:  # Assuming α = 0.05
    print("Reject the null hypothesis: Significant difference.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")


This example uses a one-sample t-test to compare the sample mean to a population mean. The p_value helps decide whether the observed difference is statistically significant.
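The definition of the p-value ("at least as extreme, assuming H₀ is true") can be checked directly. The following sketch recomputes the p-value for the same data from the tail area of the t-distribution; t_manual and p_manual are names introduced here for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([2.3, 2.5, 2.8, 3.0, 3.2])
population_mean = 2.5
n = len(data)

# Test statistic: distance of the sample mean from the hypothesized mean,
# measured in standard errors
t_manual = (data.mean() - population_mean) / (data.std(ddof=1) / np.sqrt(n))

# Two-sided p-value: probability of a |t| at least this large when H0 is true
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

_, p_scipy = stats.ttest_1samp(data, population_mean)
print("manual p-value:", p_manual)
print("scipy p-value: ", p_scipy)
```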

5. How do you interpret the P-value in hypothesis testing?
-Interpreting the p-value in hypothesis testing is crucial for determining whether to reject or fail to reject the null hypothesis. Here's a concise explanation:

What is a p-value?

The p-value represents the probability of observing the test statistic (or something more extreme) under the assumption that the null hypothesis is true. It helps you decide whether the observed data provides enough evidence to reject the null hypothesis.

How to interpret the p-value?
Small p-value (e.g., < 0.05): Strong evidence against the null hypothesis, so you reject it.
Large p-value (e.g., > 0.05): Weak evidence against the null hypothesis, so you fail to reject it.
Threshold (α): The significance level (commonly 0.05) is the cutoff for deciding whether the p-value is "small" or "large."
Example in Python

Here’s how you might interpret the p-value in Python using a t-test:

from scipy.stats import ttest_ind

# Example data
group1 = [12, 14, 15, 16, 18]
group2 = [22, 24, 25, 26, 28]

# Perform t-test
stat, p_value = ttest_ind(group1, group2)

# Interpretation
alpha = 0.05  # Significance level
if p_value < alpha:
    print(f"Reject the null hypothesis (p-value = {p_value:.4f})")
else:
    print(f"Fail to reject the null hypothesis (p-value = {p_value:.4f})")

Key Notes
The null hypothesis typically assumes no effect or no difference (e.g., means of two groups are equal).
The choice of significance level (α) depends on the context of the study.
Always consider the practical significance of results, not just statistical significance.

This approach ensures clarity and precision in hypothesis testing!

6. What are Type 1 and Type 2 errors in hypothesis testing?
-Type I and Type II errors are like two sides of a statistical coin flip—they happen when our decisions about hypotheses go wrong. Let’s unpack both:

❌ Type I Error (False Positive)
What it is: You reject the null hypothesis (H₀) when it's actually true.

Analogy: Convicting an innocent person.

In Python: This happens when the p-value is below your significance level (α) and you decide there's a real effect, but it's just a fluke in the data.
alpha = 0.05
if p_val < alpha:
    # Risk of Type I error if H₀ is actually true
    print("Rejecting H₀ — possible false positive")
❗️ Type II Error (False Negative)
What it is: You fail to reject the null hypothesis when it's actually false.

Analogy: Letting a guilty person go free.

In Python: If your p-value is above α, you don't detect a difference—even though one actually exists.
if p_val >= alpha:
    # Risk of Type II error if H₀ is actually false
    print("Failing to reject H₀ — possible false negative")
💡 Balancing Act
Lowering α reduces Type I errors, but increases the chance of Type II errors.

Increasing sample size reduces Type II errors (it raises the power of the test) while the Type I error rate stays fixed at α.

These errors are at the heart of understanding risk in hypothesis testing.
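The effect of sample size on Type II errors can be sketched with a small simulation. Everything numeric here is an assumption chosen for illustration: a true difference of 5 between group means, a standard deviation of 10, 2000 simulated experiments per sample size, and a fixed seed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_experiments = 2000
true_difference = 5  # H0 is false: the group means really differ by 5

type2_rates = {}
for n in (10, 40, 160):
    a = rng.normal(loc=50, scale=10, size=(n_experiments, n))
    b = rng.normal(loc=50 + true_difference, scale=10, size=(n_experiments, n))
    _, p_vals = stats.ttest_ind(a, b, axis=1)
    # A Type II error occurs whenever the test fails to reject the false H0
    type2_rates[n] = np.mean(p_vals >= alpha)
    print(f"n = {n:3d} per group: Type II error rate = {type2_rates[n]:.3f}")
```

The miss rate should drop sharply as n grows, even though α never changes.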


7. What is the difference between a one-tailed and a two-tailed test in hypothesis testing?
-One-Tailed vs. Two-Tailed Tests
In statistical significance testing, one-tailed and two-tailed tests are methods used to determine the relationship between variables based on the direction of the hypothesis.

One-Tailed Test

A one-tailed test is used when the hypothesis specifies a direction, either left or right. It tests whether a particular population parameter is larger or smaller than a predefined value. The critical region, where the null hypothesis is rejected, lies entirely on one side of the sampling distribution. For example, if we want to test whether a machine produces more than 1% defective products, we would use a one-tailed test.

Example:

Null Hypothesis (H0): There is no significant effect of students participating in a coding competition on their fear level.

Alternative Hypothesis (H1): Participation in a coding competition decreases the fear level of students.

Two-Tailed Test

A two-tailed test is used when the hypothesis does not specify a direction. It tests whether the population parameter differs from the hypothesized value in either direction, greater or smaller. The critical region is divided between the two tails of the distribution, and if the test statistic falls in either tail, the null hypothesis is rejected.

Example:

Null Hypothesis (H0): There is no significant effect of a new bill passed on the loans of farmers.

Alternative Hypothesis (H1): The new bill affects the loans of farmers, either increasing or decreasing them.

Key Differences

Direction: One-tailed tests specify a direction (greater or smaller), while two-tailed tests do not.

Critical Region: One-tailed tests have the critical region on one side, whereas two-tailed tests have it on both sides.

Significance Level: In one-tailed tests, the entire significance level (α) is in one tail. In two-tailed tests, it is split between both tails.

Applications

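One-Tailed Test: Used when only one direction of departure from H0 matters. Tests built on asymmetric distributions such as the chi-squared distribution are usually right-tailed, because only large values of the statistic count as evidence against H0.

Two-Tailed Test: Used when a departure in either direction matters, typically with symmetric distributions such as the normal distribution.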

Understanding the appropriate use of one-tailed and two-tailed tests is crucial for accurate statistical analysis and hypothesis testing.
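In SciPy the choice between the two kinds of test is usually a single argument. The sketch below assumes a SciPy version that supports the alternative parameter of ttest_ind (added in SciPy 1.6); the height data is illustrative.

```python
from scipy import stats

group_a = [160, 165, 170, 175, 180]
group_b = [155, 160, 165, 170, 175]

# Two-tailed: H1 says the means differ in either direction
t_stat, p_two = stats.ttest_ind(group_a, group_b, alternative="two-sided")

# One-tailed: H1 says the mean of group_a is greater than the mean of group_b
_, p_greater = stats.ttest_ind(group_a, group_b, alternative="greater")

print("t-statistic:", t_stat)
print("two-tailed p-value:", p_two)
print("one-tailed p-value:", p_greater)
```

Because the observed difference points in the hypothesized direction, the one-tailed p-value is half the two-tailed one. The direction has to be chosen before looking at the data; picking it afterwards inflates the Type I error rate.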

8. What is the Z-test, and when is it used in hypothesis testing?
-The Z-test is a statistical method used to determine whether there's a significant difference between sample and population means—or between the means of two samples—when the population variance is known and/or the sample size is large.

🧪 When to Use a Z-test
You'd reach for a Z-test when:

Sample size is large (typically n ≥ 30).

Population standard deviation is known (which is rare in real-world data but ideal for theoretical testing).

Your data is approximately normally distributed.
Typical scenarios:

Testing if a sample mean differs from a known population mean.

Comparing the means of two independent groups.

⚙️ Python Example Using statsmodels
from statsmodels.stats.weightstats import ztest

# Sample data (kept short for readability; in practice use n ≥ 30)
data = [101, 103, 98, 100, 102, 104, 99]
pop_mean = 100  # Hypothesized population mean

# Perform one-sample z-test (the standard deviation is estimated from the sample)
z_stat, p_val = ztest(data, value=pop_mean)

print("Z-statistic:", z_stat)
print("p-value:", p_val)
If the p-value is below your chosen significance level (say, 0.05), you reject the null hypothesis—suggesting your sample mean is significantly different from the population mean.

9. How do you calculate the Z-score, and what does it represent in hypothesis testing?
-The Z-score (or standard score) tells you how far a data point is from the population mean, measured in terms of standard deviations. In hypothesis testing, it helps determine whether your sample result is statistically unusual under the null hypothesis.

🧮 How to Calculate the Z-Score
For a single data point:

$$Z = \frac{X - \mu}{\sigma}$$

$$X$$: sample value

$$\mu$$: population mean

$$\sigma$$: population standard deviation

For a sample mean:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

$$\bar{X}$$: sample mean

$$n$$: sample size

🐍 Python Example: Z-score for Sample Mean
import numpy as np

# Sample data
sample = [102, 98, 101, 100, 99]
population_mean = 100
population_std = 2
n = len(sample)

# Calculate Z-score
sample_mean = np.mean(sample)
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(n))

print("Z-score:", z_score)
🎯 What It Represents in Hypothesis Testing
A Z-score tells you how extreme your test statistic is under the null hypothesis.

You use it to compute a p-value, which helps decide whether to reject H₀.

For example:

If |Z| > 1.96 in a two-tailed test (with α = 0.05), you'd likely reject the null hypothesis.
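The 1.96 cutoff and the link from a Z-score to a p-value can both be reproduced with the standard library alone. This is a minimal sketch; two_tailed_p is a helper defined here for illustration and the Z-scores in the loop are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)
alpha = 0.05

# Two-tailed critical value: alpha / 2 of the probability sits in each tail
z_critical = standard_normal.inv_cdf(1 - alpha / 2)


def two_tailed_p(z):
    """Probability of a Z-score at least as extreme as z when H0 is true."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


print(f"Critical value: {z_critical:.2f}")
for z in (0.5, 1.96, 2.5):  # hypothetical Z-scores
    print(f"Z = {z}: p = {two_tailed_p(z):.4f}")
```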


10. What is the T-distribution, and when should it be used instead of the normal distribution?
-The t-distribution (or Student’s t-distribution) is a probability distribution that's essential when you’re working with small sample sizes or when the population standard deviation is unknown—which, let’s be honest, is most of the time in real-world data.

📉 What is the T-Distribution?
It looks a lot like the normal distribution—symmetric and bell-shaped—but with fatter tails. Those fatter tails give it wiggle room to account for extra variability in smaller samples. As your sample size increases, the t-distribution converges toward the normal distribution.
🧪 When Should You Use It?
You use the t-distribution when:

Sample size is small (typically n < 30)

The population standard deviation (σ) is unknown

You assume your data is approximately normally distributed

Common situations:

Estimating the mean of a population from a small sample

Comparing two sample means (like in a t-test)
🐍 Python Example Using scipy.stats.ttest_1samp
from scipy import stats

# Sample data
sample = [22, 25, 20, 24, 23]
population_mean = 20

# One-sample t-test
t_stat, p_val = stats.ttest_1samp(sample, population_mean)

print("T-statistic:", t_stat)
print("p-value:", p_val)

This test checks if the sample mean significantly differs from the population mean using the t-distribution under the hood.

📌 Rule of Thumb
Use the t-distribution when sample size is small and population variance is unknown. Use the normal distribution when sample size is large and you know the population standard deviation.
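The convergence of the t-distribution toward the normal distribution is easy to see in the critical values. The sketch below compares two-tailed 5% cutoffs for a few degrees of freedom chosen for illustration.

```python
from scipy import stats

z_critical = stats.norm.ppf(0.975)  # two-tailed critical value at alpha = 0.05
print(f"normal:            {z_critical:.3f}")

t_criticals = {}
for df in (4, 9, 29, 99, 999):
    t_criticals[df] = stats.t.ppf(0.975, df)
    print(f"t with df = {df:3d}:   {t_criticals[df]:.3f}")
```

With few degrees of freedom the cutoff is noticeably larger than the normal one, which is how the fatter tails demand stronger evidence from small samples.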

11. What is the difference between a Z-test and a T-test?
-The z-test and t-test are both statistical tests used to compare means, but they differ in their assumptions and use cases. Here's a concise explanation of the differences, particularly in the context of Python:

1. Key Differences

Z-Test:

Used when the population standard deviation is known.
Assumes the sample size is large (typically $$n > 30$$) or the data is normally distributed.
More appropriate for large datasets.

T-Test:

Used when the population standard deviation is unknown.
Works well with smaller sample sizes ($$n \leq 30$$).
Relies on the sample standard deviation as an estimate of the population standard deviation.
2. Python Implementation

Python provides libraries like scipy and statsmodels to perform these tests. Here's how you can implement them:

Z-Test
from statsmodels.stats.weightstats import ztest

# Example: Perform a one-sample z-test
data = [12, 15, 14, 10, 13, 14, 15, 16]
z_stat, p_value = ztest(data, value=13)  # Compare sample mean to 13
print(f"Z-Statistic: {z_stat}, P-Value: {p_value}")

T-Test
from scipy.stats import ttest_1samp

# Example: Perform a one-sample t-test
data = [12, 15, 14, 10, 13, 14, 15, 16]
t_stat, p_value = ttest_1samp(data, popmean=13)  # Compare sample mean to 13
print(f"T-Statistic: {t_stat}, P-Value: {p_value}")

3. When to Use Which
Use a z-test if you have a large sample size or know the population standard deviation.
Use a t-test for smaller sample sizes or when the population standard deviation is unknown.

Both tests are widely used in hypothesis testing, and Python makes it easy to implement them with just a few lines of code!

12. What is the T-test, and how is it used in hypothesis testing?
-The t-test is a statistical test used to compare the means of groups to see if they’re significantly different from one another. It’s especially handy when you don’t know the population standard deviation and/or have a small sample size.

🎯 Types of T-Tests
One-sample t-test: Compares the sample mean to a known population mean.

Two-sample (independent) t-test: Compares the means of two independent groups.

Paired (dependent) t-test: Compares means from the same group at different times (like before/after a treatment).
🐍 Python Examples with scipy.stats
1. One-Sample T-Test
from scipy import stats

sample = [52, 55, 50, 53, 54]
pop_mean = 50

t_stat, p_val = stats.ttest_1samp(sample, pop_mean)
print("T-statistic:", t_stat)
print("P-value:", p_val)
2. Two-Sample T-Test
group1 = [102, 100, 98, 105]
group2 = [95, 96, 94, 97]

t_stat, p_val = stats.ttest_ind(group1, group2)
print("T-statistic:", t_stat)
print("P-value:", p_val)
3. Paired T-Test
before = [85, 90, 88, 92]
after = [87, 91, 89, 95]

t_stat, p_val = stats.ttest_rel(before, after)
print("T-statistic:", t_stat)
print("P-value:", p_val)
🧠 Interpreting Results
If p-value < α (usually 0.05): You reject the null hypothesis — the difference is statistically significant.

If p-value ≥ α: You fail to reject the null — not enough evidence to prove a difference.
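The three t-tests are closely related. As a small illustration, the sketch below shows that a paired t-test gives the same result as a one-sample t-test on the per-subject differences; differences, t_diff, and p_diff are names introduced here.

```python
import numpy as np
from scipy import stats

before = np.array([85, 90, 88, 92])
after = np.array([87, 91, 89, 95])

# Paired test on the two columns of measurements
t_paired, p_paired = stats.ttest_rel(before, after)

# One-sample test asking whether the mean per-subject difference is zero
differences = before - after
t_diff, p_diff = stats.ttest_1samp(differences, 0)

print("paired:     ", t_paired, p_paired)
print("differences:", t_diff, p_diff)
```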

13. What is the relationship between Z-test and T-test in hypothesis testing?
-The z-test and t-test are both statistical methods used in hypothesis testing to determine whether there is a significant difference between sample data and a population parameter or between two sample groups. Here's how they are related and differ, particularly in the context of Python:

Relationship Between Z-Test and T-Test

Purpose:

Both tests assess hypotheses about means.
They are used to compare sample data to a population mean or compare two sample means.

Key Difference:

Z-Test: Used when the population standard deviation ($$\sigma$$) is known or the sample size is large (typically $$n > 30$$).
T-Test: Used when the population standard deviation is unknown and the sample size is small ($$n \leq 30$$). It accounts for additional uncertainty by using the t-distribution.

Underlying Distribution:

Z-Test compares the test statistic against the standard normal distribution.
T-Test uses the t-distribution, which is similar to the normal distribution but has heavier tails (to account for small sample sizes).

Python Implementation:

Both tests can be implemented using libraries like scipy.stats and statsmodels.
Python Examples
1. Z-Test (Using statsmodels.stats.weightstats.ztest):
from statsmodels.stats.weightstats import ztest

# Example: One-sample Z-test
data = [12, 14, 15, 13, 16, 14, 15]
z_stat, p_value = ztest(data, value=14)  # Test if mean is 14
print(f"Z-Statistic: {z_stat}, P-Value: {p_value}")

2. T-Test (Using scipy.stats.ttest_1samp or ttest_ind):
from scipy.stats import ttest_1samp, ttest_ind

# Example: One-sample T-test
data = [12, 14, 15, 13, 16, 14, 15]
t_stat, p_value = ttest_1samp(data, popmean=14)  # Test if mean is 14
print(f"T-Statistic: {t_stat}, P-Value: {p_value}")

# Example: Two-sample T-test
data1 = [12, 14, 15, 13, 16, 14, 15]
data2 = [10, 11, 12, 13, 14, 15, 16]
t_stat, p_value = ttest_ind(data1, data2)  # Compare two sample means
print(f"T-Statistic: {t_stat}, P-Value: {p_value}")

Summary
Use z-test for large samples or when population standard deviation is known.
Use t-test for small samples or when population standard deviation is unknown.
Both tests are easily implemented in Python using libraries like scipy and statsmodels.

14. What is a confidence interval, and how is it used to interpret statistical results?
-A confidence interval (CI) is like a smart estimate with a built-in margin of error. It gives you a range of values that's likely to contain the true population parameter (like a mean or proportion), based on your sample data.

🎯 What Does a Confidence Interval Tell You?
If you calculate a 95% confidence interval for a population mean and get, say, [48.5, 53.2], it means: "We’re 95% confident that the true population mean lies between 48.5 and 53.2."

It doesn't guarantee that the true value is within the interval, but it means that if we repeated the study 100 times, about 95 of those intervals would capture the true mean.
🐍 How to Calculate a Confidence Interval in Python
Here’s a quick way to do it using scipy.stats:
import numpy as np
from scipy import stats

# Sample data
data = [48, 52, 50, 53, 51]
n = len(data)
mean = np.mean(data)
std_err = stats.sem(data)  # Standard error of the mean

# 95% confidence interval
confidence = 0.95
h = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)
ci_lower = mean - h
ci_upper = mean + h

print(f"Mean: {mean:.2f}")
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")
🧠 Why Confidence Intervals Matter
They help you:

Understand precision: Narrow CIs mean more precise estimates.

Assess significance: If a CI for a mean difference doesn’t include 0, the result is likely statistically significant.

Communicate uncertainty clearly in scientific reports.
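The "repeat the study many times" reading of a confidence interval can be checked by simulation. In this sketch the population (mean 50, standard deviation 5), the sample size, the number of simulated studies, and the seed are all assumptions picked for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, true_std = 50, 5   # hypothetical population
n, n_studies, confidence = 25, 4000, 0.95

# Each row is one simulated study of n observations
samples = rng.normal(true_mean, true_std, size=(n_studies, n))
means = samples.mean(axis=1)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Interval half-width: t critical value times the standard error
h = std_errs * stats.t.ppf((1 + confidence) / 2, n - 1)
covered = (means - h <= true_mean) & (true_mean <= means + h)

coverage = covered.mean()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should come out near 0.95: the confidence level describes how often the procedure succeeds, not the probability attached to any single interval.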

15. What is the margin of error, and how does it affect the confidence interval?
-The margin of error is the "plus or minus" buffer that tells you how much uncertainty is in your estimate from sample data. It defines the range around a sample statistic (like a mean) that likely includes the true population parameter.

🎯 What Does Margin of Error Represent?
Let’s say your survey gives a sample mean of 50, and the margin of error is ±2. That means you're fairly confident (usually 95%) that the true population mean lies between 48 and 52.

The larger the margin of error, the less precise the estimate.
🧮 How Is It Calculated?
For a confidence interval of a mean:

$$\text{Margin of Error} = t^* \times \text{Standard Error}$$

$$t^*$$: The critical value from the t-distribution (depends on confidence level and sample size)

Standard Error: Standard deviation of the sample divided by the square root of the sample size
Python Example:
import numpy as np
from scipy import stats

data = [48, 52, 50, 53, 51]
n = len(data)
mean = np.mean(data)
std_err = stats.sem(data)

# 95% confidence level
confidence = 0.95
t_critical = stats.t.ppf((1 + confidence) / 2, df=n-1)

margin_of_error = t_critical * std_err
ci_lower = mean - margin_of_error
ci_upper = mean + margin_of_error

print(f"Mean: {mean:.2f}")
print(f"Margin of Error: ±{margin_of_error:.2f}")
print(f"95% Confidence Interval: ({ci_lower:.2f}, {ci_upper:.2f})")
📌 Why It Matters
Smaller margin of error = more precise estimate

Influenced by sample size and variability: larger samples = smaller error

Helps you interpret how much you can "trust" your results
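To illustrate how sample size drives the margin of error, the sketch below holds the sample standard deviation fixed at a hypothetical value of 2.0 and recomputes the margin for several sample sizes chosen for illustration.

```python
import numpy as np
from scipy import stats

sample_std = 2.0     # hypothetical sample standard deviation, held fixed
confidence = 0.95

margins = []
for n in (5, 20, 80, 320):
    t_critical = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_critical * sample_std / np.sqrt(n)
    margins.append(margin)
    print(f"n = {n:3d}: margin of error = +/-{margin:.2f}")
```

Because the standard error shrinks with the square root of n, quadrupling the sample size roughly halves the margin of error.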

16. How is Bayes' Theorem used in statistics, and what is its significance?
-Bayes’ Theorem is the ultimate statistical plot twist—it lets you update the probability of a hypothesis as more evidence or data becomes available. In everyday stats, it's essential for reasoning under uncertainty, and in Python, it becomes a powerful tool for everything from spam filters to medical diagnostics to machine learning models.

🎯 The Formula
Bayes’ Theorem is mathematically expressed as:
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
Where:

$$P(A \mid B)$$: Posterior probability (probability of A given B has occurred)

$$P(B \mid A)$$: Likelihood (probability of B given A is true)

$$P(A)$$: Prior probability of A

$$P(B)$$: Total probability of B
🧠 Why It Matters
It flips your perspective: you start with a belief (prior), then update it with new evidence (likelihood), resulting in a new belief (posterior). That’s incredibly useful in:

Medical testing: “Given a positive test result, what’s the probability the patient actually has the disease?”

Spam detection: “Given these keywords, how likely is this email to be spam?”

Machine learning: Naive Bayes classifiers rely on Bayes’ rule.
🐍 Python Example: Diagnosing a Condition
Let’s say:

1% of people have a rare disease (prior)

A test is 99% accurate (both sensitivity and specificity)
# Probabilities
P_disease = 0.01
P_no_disease = 1 - P_disease
P_positive_given_disease = 0.99
P_positive_given_no_disease = 0.01

# Bayes' Theorem
P_positive = (P_positive_given_disease * P_disease) + \
             (P_positive_given_no_disease * P_no_disease)
P_disease_given_positive = (P_positive_given_disease * P_disease) / P_positive

print(f"Probability of having the disease given a positive result: {P_disease_given_positive:.4f}")
Even though the test is highly accurate, the probability of having the disease given a positive result is only 50%, because the disease is rare: true positives (0.99 × 0.01) are exactly as common as false positives (0.01 × 0.99). That’s the Bayesian perspective in action.
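Updating a belief as more evidence arrives can be sketched by applying the same formula twice. The update helper below is hypothetical, and the example assumes that a second test is independent of the first given the patient's true status, which may not hold for real diagnostic tests.

```python
def update(prior, p_pos_given_disease=0.99, p_pos_given_no_disease=0.01):
    """Return P(disease | positive test) for a given prior P(disease)."""
    p_positive = p_pos_given_disease * prior + p_pos_given_no_disease * (1 - prior)
    return p_pos_given_disease * prior / p_positive


after_first = update(0.01)          # the prior is the 1% prevalence
after_second = update(after_first)  # the first posterior becomes the new prior

print(f"After one positive test:  {after_first:.4f}")
print(f"After two positive tests: {after_second:.4f}")
```

Under these assumptions a second positive result moves the probability from 0.50 to 0.99, which is why confirmatory testing is so informative for rare conditions.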

17. What is the Chi-square distribution, and when is it used?
-The Chi-square distribution (χ² distribution) is a fundamental building block in statistics. It's used primarily for categorical data analysis, especially when you're interested in frequencies—how often something happens.

🎯 What Is the Chi-square Distribution?
It’s a right-skewed distribution that becomes more symmetric as the degrees of freedom increase. It arises when you sum the squares of independent standard normal variables. This makes it perfect for testing how well observed data fit expectations.
🧪 When Is It Used?
You use the Chi-square distribution in situations like:

Goodness-of-Fit Test: Checks if a sample matches a population distribution. Example: Does a die produce each number equally often?

Test of Independence: Evaluates whether two categorical variables are independent. Example: Is gender independent of voting preference?

Test for Homogeneity: Compares distributions across different populations. Example: Are color preferences the same across age groups?
🐍 Python Example: Chi-Square Test of Independence
Using scipy.stats:
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table: rows = gender, columns = vote preference
data = np.array([[30, 10],  # e.g., Male: Yes/No
                 [20, 40]]) # e.g., Female: Yes/No

chi2, p, dof, expected = chi2_contingency(data)

print("Chi-square statistic:", chi2)
print("Degrees of freedom:", dof)
print("P-value:", p)
print("Expected frequencies:\n", expected)
If the p-value is below your significance level (say 0.05), you reject the null hypothesis—suggesting a relationship exists between the variables.
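The expected frequencies and the statistic can also be worked out by hand, which makes clear what "independence" means numerically. This sketch reuses the same table; note that chi2_contingency applies the Yates continuity correction to 2x2 tables by default, so correction=False is passed here to match the plain Pearson formula.

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[30, 10],
                 [20, 40]])

# Expected count for each cell under independence: row total * column total / N
row_totals = data.sum(axis=1, keepdims=True)
col_totals = data.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / data.sum()

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2_manual = ((data - expected_manual) ** 2 / expected_manual).sum()

# correction=False switches off the Yates continuity correction
chi2_scipy, p, dof, expected_scipy = chi2_contingency(data, correction=False)

print(expected_manual)
print(chi2_manual, chi2_scipy)
```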

18. What is the Chi-square goodness of fit test, and how is it applied?
-Chi-Square Goodness of Fit Test in Python
The chisquare function from the scipy.stats module performs a Chi-Square Goodness of Fit Test in Python. This test evaluates whether observed categorical data matches an expected distribution.

Example:

from scipy.stats import chisquare

# Observed and expected frequencies
observed = [50, 60, 40, 47, 53]
expected = [50, 50, 50, 50, 50]

# Perform the Chi-Square Goodness of Fit Test
statistic, p_value = chisquare(f_obs=observed, f_exp=expected)

print(f"Chi-Square Statistic: {statistic}")
print(f"P-Value: {p_value}")
Output:

Chi-Square Statistic: 4.36
P-Value: 0.35947
Explanation:

Chi-Square Statistic quantifies the difference between observed and expected frequencies.

P-Value determines the significance of the result. If p_value < 0.05, the null hypothesis (data follows the expected distribution) is rejected.

Important Considerations:

Degrees of Freedom: The test uses n-1 degrees of freedom, where n is the number of categories.

Null Hypothesis (H₀): The observed data follows the expected distribution.

Alternative Hypothesis (H₁): The observed data does not follow the expected distribution.

Limitations:

Ensure that all expected frequencies are greater than 5 for reliable results.

For large datasets or complex distributions, additional statistical methods may be required.
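As a cross-check on the output above, the statistic and p-value can be reproduced from the definition. This is a minimal sketch using the same observed and expected counts; dof follows the n-1 rule stated earlier.

```python
from scipy.stats import chi2

observed = [50, 60, 40, 47, 53]
expected = [50, 50, 50, 50, 50]

# Sum of (observed - expected)^2 / expected over all categories
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Right-tail probability with (number of categories - 1) degrees of freedom
dof = len(observed) - 1
p_value = chi2.sf(statistic, dof)

print(f"Chi-Square Statistic: {statistic:.2f}")
print(f"P-Value: {p_value:.5f}")
```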

19. What is the F-distribution, and when is it used in hypothesis testing?
-The F-distribution is a continuous probability distribution that arises frequently in statistics, particularly in the context of comparing variances. It is asymmetric and skewed to the right, with values ranging from 0 to infinity. The shape of the F-distribution depends on two parameters: the degrees of freedom for the numerator ($$df_1$$) and the denominator ($$df_2$$).

When is the F-distribution used in hypothesis testing?

The F-distribution is primarily used in:

Analysis of Variance (ANOVA): To test whether the means of multiple groups are significantly different.
Regression Analysis: To assess the overall significance of a regression model.
Equality of Variances (F-test): To compare the variances of two populations.

In hypothesis testing, the F-statistic is calculated as the ratio of two sample variances: $$ F = \frac{\text{Variance of group 1}}{\text{Variance of group 2}} $$ The null hypothesis typically assumes that the variances (or other quantities being compared) are equal.

Using the F-distribution in Python

Python provides tools for working with the F-distribution through libraries like scipy and statsmodels. Here's how you can use it:

1. Performing an F-test for Equality of Variances
from scipy.stats import f

# Example: Variances of two groups
var1 = 4.5  # Variance of group 1
var2 = 2.3  # Variance of group 2
df1 = 10    # Degrees of freedom for group 1
df2 = 12    # Degrees of freedom for group 2

# Calculate the F-statistic
f_statistic = var1 / var2

# Calculate the one-sided (right-tail) p-value; sf is the survival function, 1 - cdf
p_value = f.sf(f_statistic, df1, df2)

print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

2. One-Way ANOVA
from scipy.stats import f_oneway

# Example: Data from three groups
group1 = [12, 14, 15, 16, 19]
group2 = [22, 24, 25, 27, 30]
group3 = [32, 34, 35, 37, 40]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(group1, group2, group3)

print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

3. F-distribution Visualization
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

# Parameters for F-distribution
df1, df2 = 5, 10
x = np.linspace(0, 5, 500)
y = f.pdf(x, df1, df2)

# Plot the F-distribution
plt.plot(x, y, label=f"F-distribution (df1={df1}, df2={df2})")
plt.title("F-distribution")
plt.xlabel("F-value")
plt.ylabel("Probability Density")
plt.legend()
plt.show()

Summary

The F-distribution is a cornerstone of statistical hypothesis testing when comparing variances or testing the significance of models. Python's scipy library makes it straightforward to calculate F-statistics, p-values, and visualize the distribution.

20. What is an ANOVA test, and what are its assumptions?
-ANOVA—short for Analysis of Variance—is a statistical test used when you want to compare the means of three or more groups to see if at least one of them is significantly different. Instead of running multiple t-tests (which increase the risk of Type I error), ANOVA does it all in one go.

🎯 What Does ANOVA Do?
It analyzes how much of the total variance in your data can be attributed to between-group differences versus within-group variation. The output is an F-statistic and a p-value:

A large F and small p-value (typically < 0.05) suggest at least one group mean is significantly different.
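The split into between-group and within-group variation can be written out directly. The sketch below computes the F-statistic by hand for three small groups of illustrative measurements and compares it with f_oneway; ss_between, ss_within, f_manual, and p_manual are names introduced here.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Illustrative measurements for three groups
groups = [np.array([21, 22, 19, 24]),
          np.array([30, 29, 33, 28]),
          np.array([25, 27, 26, 30])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), len(all_values)

# Between-group variation: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = f_oneway(*groups)
print("manual:", f_manual, p_manual)
print("scipy: ", f_scipy, p_scipy)
```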
🧪 Key Assumptions of ANOVA
Independence: The observations in each group are independent.

Normality: Each group’s data is approximately normally distributed.

Homogeneity of variance (homoscedasticity): The variances among the groups are approximately equal.
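Two of these assumptions can be screened with standard tests before running ANOVA. The sketch below uses the Shapiro-Wilk test for normality and Levene's test for equal variances on illustrative data; independence depends on how the data was collected and cannot be checked this way.

```python
from scipy import stats

# Illustrative measurements for three groups
group1 = [21, 22, 19, 24, 23, 20]
group2 = [30, 29, 33, 28, 31, 27]
group3 = [25, 27, 26, 30, 24, 28]

# Shapiro-Wilk: H0 is that the sample comes from a normal distribution
for name, group in [("group1", group1), ("group2", group2), ("group3", group3)]:
    w_stat, p_normal = stats.shapiro(group)
    print(f"{name}: Shapiro-Wilk p = {p_normal:.3f}")

# Levene: H0 is that all groups have equal variances
levene_stat, p_equal_var = stats.levene(group1, group2, group3)
print(f"Levene p = {p_equal_var:.3f}")
```

A small p-value from either test is a warning that the corresponding assumption may be violated. With samples this small the tests have little power, so a large p-value is weak reassurance at best.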

21. What are the different types of ANOVA tests?
-ANOVA, short for Analysis of Variance, is a powerful statistical method used to compare the means of three or more groups. Depending on your experimental design and the number of variables you're comparing, there are several types of ANOVA tests—each with its own flavor and purpose.
🧪 Common Types of ANOVA in Python
1. One-Way ANOVA
Purpose: Tests whether the means of three or more independent groups differ.

Assumption: One categorical independent variable, one continuous dependent variable.

Python Example (using scipy.stats.f_oneway):
from scipy.stats import f_oneway

group1 = [21, 22, 19, 24]
group2 = [30, 29, 33, 28]
group3 = [25, 27, 26, 30]

f_stat, p_val = f_oneway(group1, group2, group3)
print("F-statistic:", f_stat)
print("P-value:", p_val)
2. Two-Way ANOVA
Purpose: Tests the effect of two independent variables (factors) on a dependent variable, and checks for interaction effects.

Requires: More structured data (often in a DataFrame).

Python Tool: statsmodels
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

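# Example DataFrame: two observations per treatment/location cell,
# so the interaction model still has residual degrees of freedom
df = pd.DataFrame({
    'score': [91, 87, 89, 95, 85, 88, 90, 86, 91, 93, 84, 89],
    'treatment': ['A', 'A', 'B', 'B', 'C', 'C'] * 2,
    'location': ['X', 'Y', 'X', 'Y', 'X', 'Y'] * 2
})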

model = ols('score ~ C(treatment) + C(location) + C(treatment):C(location)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
3. Repeated Measures ANOVA
Purpose: Tests differences when the same subjects are measured multiple times (e.g., before, during, and after treatment).

Python Tool: statsmodels or pingouin
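import pandas as pd
import pingouin as pg
# Long-format example data: four subjects, each measured at three time points
df = pd.DataFrame({'Subject': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
                   'Time': ['before', 'during', 'after'] * 4,
                   'Scores': [70, 75, 80, 65, 72, 78, 80, 82, 88, 60, 68, 71]})
aov = pg.rm_anova(dv='Scores', within='Time', subject='Subject', data=df)
print(aov)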
4. MANOVA (Multivariate ANOVA)
Purpose: Compares groups across multiple dependent variables simultaneously.

Python Tool: statsmodels
from statsmodels.multivariate.manova import MANOVA

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'math': [90, 85, 88, 92],
    'science': [88, 84, 89, 91]
})

maov = MANOVA.from_formula('math + science ~ group', data=df)
print(maov.mv_test())
Each type of ANOVA peels back a different layer of data structure and relationships.

22. What is the F-test, and how does it relate to hypothesis testing?
-The F-test is a statistical test used to compare two variances or to assess whether a group of variables significantly explains the variability of a response. It’s an integral part of ANOVA (Analysis of Variance) and regression analysis in hypothesis testing.

🎯 What Does the F-Test Do?
At its core, the F-test evaluates:

Are group means different? (in ANOVA)

Is the variance explained by the model significant? (in linear regression)

Do two populations have equal variances? (in a comparison of variances)

The test statistic follows an F-distribution, which is right-skewed and depends on two different degrees of freedom: one for the numerator and one for the denominator.
🧪 Python Example: F-Test in One-Way ANOVA
Using f_oneway from scipy.stats:
from scipy.stats import f_oneway

# Three groups
group1 = [20, 22, 19, 24, 20]
group2 = [30, 29, 33, 31, 32]
group3 = [25, 28, 27, 26, 29]

f_stat, p_val = f_oneway(group1, group2, group3)
print("F-statistic:", f_stat)
print("P-value:", p_val)
If the p-value is less than your chosen α (e.g., 0.05), you reject the null hypothesis that all group means are equal.

The F-statistic quantifies the ratio of variation between groups to variation within groups.
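
The regression use of the F-test mentioned above can be sketched in the same way. This example fits a simple linear model with NumPy's least squares and computes the overall F-statistic by hand; the simulated data and the coefficients 2.0 and 1.5 are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import f

# Simulated data with a hypothetical linear relationship
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)

# Design matrix with an intercept column, fitted by ordinary least squares
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# Variation explained by the model vs. variation left in the residuals
ss_model = ((y_hat - y.mean()) ** 2).sum()
ss_resid = ((y - y_hat) ** 2).sum()
df_model = X.shape[1] - 1      # number of predictors
df_resid = n - X.shape[1]      # observations minus fitted parameters

f_stat = (ss_model / df_model) / (ss_resid / df_resid)
p_value = f.sf(f_stat, df_model, df_resid)

print(f"F-statistic: {f_stat:.3f}")
print(f"P-value: {p_value:.3g}")
```

A small p-value here suggests that the predictor explains a significant share of the variability in y.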


PRACTICAL----------

In [None]:
1.Write a Python program to perform a Z-test for comparing a sample mean to a known population mean and
interpret the results?
-Here’s a Python program to perform a z-test for comparing a sample mean to a known population mean. The program calculates the z-score and p-value, and interprets the results based on a significance level (commonly 0.05).

import scipy.stats as stats
import math

# Function to perform a z-test
def z_test(sample_mean, population_mean, population_std, sample_size, alpha=0.05):
    # Calculate the standard error
    standard_error = population_std / math.sqrt(sample_size)

    # Calculate the z-score
    z_score = (sample_mean - population_mean) / standard_error

    # Calculate the p-value (two-tailed test)
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

    # Interpret the results
    print(f"Z-Score: {z_score:.4f}")
    print(f"P-Value: {p_value:.4f}")

    if p_value < alpha:
        print("Result: Reject the null hypothesis. The sample mean is significantly different from the population mean.")
    else:
        print("Result: Fail to reject the null hypothesis. The sample mean is not significantly different from the population mean.")

# Example usage
sample_mean = 105  # Example sample mean
population_mean = 100  # Known population mean
population_std = 15  # Known population standard deviation
sample_size = 30  # Size of the sample
alpha = 0.05  # Significance level

z_test(sample_mean, population_mean, population_std, sample_size, alpha)

Explanation:

Inputs:

sample_mean: The mean of the sample.
population_mean: The known mean of the population.
population_std: The known standard deviation of the population.
sample_size: The size of the sample.
alpha: The significance level (default is 0.05).

Steps:

Calculate the standard error: $$SE = \frac{\sigma}{\sqrt{n}}$$
Compute the z-score: $$Z = \frac{\bar{x} - \mu}{SE}$$
Determine the p-value for a two-tailed test.
Compare the p-value to the significance level to decide whether to reject the null hypothesis.

Output:

The z-score and p-value are displayed.
The program interprets the results, indicating whether the sample mean is significantly different from the population mean.

This program is adaptable for various datasets—just update the input values accordingly!
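
As a worked check of the example values used above (sample mean 105, population mean 100, σ = 15, n = 30), the arithmetic is approximately:

$$SE = \frac{15}{\sqrt{30}} \approx 2.739, \qquad Z = \frac{105 - 100}{2.739} \approx 1.826, \qquad p = 2\,(1 - \Phi(1.826)) \approx 0.068$$

Here $$\Phi$$ is the standard normal CDF. Because 0.068 is greater than 0.05, the program reports that it fails to reject the null hypothesis for these inputs.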

In [None]:
2. Simulate random data to perform hypothesis testing and calculate the corresponding P-value using Python?
-Here’s a concise Python example to simulate random data, perform hypothesis testing, and calculate the corresponding p-value using the scipy.stats module:

import numpy as np
from scipy.stats import ttest_ind

# Simulate random data for two groups
np.random.seed(42)  # For reproducibility
group1 = np.random.normal(loc=50, scale=10, size=100)  # Mean=50, Std=10, Size=100
group2 = np.random.normal(loc=55, scale=10, size=100)  # Mean=55, Std=10, Size=100

# Perform two-sample t-test
t_stat, p_value = ttest_ind(group1, group2)

# Display results
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

Explanation:
Data Simulation: Two random datasets (group1 and group2) are generated using a normal distribution with specified means, standard deviations, and sizes.
Hypothesis Testing: A two-sample t-test is performed to compare the means of the two groups.
Output: The t-statistic and p-value are printed, which help determine whether to reject the null hypothesis.

This example is adaptable for other hypothesis tests (e.g., ANOVA, chi-square) by modifying the statistical test and data generation.
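
Simulation is also a handy way to see what the significance level means. The sketch below repeats the experiment many times with both groups drawn from the same distribution, so the null hypothesis is true by construction; the number of experiments and group size are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
n_experiments = 2000
alpha = 0.05

# Both groups come from the same distribution, so H0 is true in every experiment
a = rng.normal(loc=50, scale=10, size=(n_experiments, 30))
b = rng.normal(loc=50, scale=10, size=(n_experiments, 30))

# axis=1 runs one t-test per row, i.e. one per simulated experiment
_, p_values = ttest_ind(a, b, axis=1)

# Fraction of experiments that (wrongly) reject H0
false_positive_rate = np.mean(p_values < alpha)
print(f"Share of p-values below {alpha}: {false_positive_rate:.3f}")
```

The printed share should land close to alpha, which is the Type I error rate the test is designed to have.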

In [None]:
3. Implement a one-sample Z-test using Python to compare the sample mean with the population mean?
-Here is an implementation of a one-sample Z-test in Python to compare a sample mean with a population mean:

import numpy as np
from scipy.stats import norm

def one_sample_z_test(sample, population_mean, population_std):
    # Calculate sample mean
    sample_mean = np.mean(sample)

    # Calculate sample size
    n = len(sample)

    # Calculate the Z-score
    z_score = (sample_mean - population_mean) / (population_std / np.sqrt(n))

    # Calculate the p-value (two-tailed test)
    p_value = 2 * (1 - norm.cdf(abs(z_score)))

    return z_score, p_value

# Example usage
sample = [50, 52, 48, 49, 51, 50, 53]  # Sample data
population_mean = 50  # Population mean
population_std = 2  # Population standard deviation (known)

z_score, p_value = one_sample_z_test(sample, population_mean, population_std)

print(f"Z-Score: {z_score}")
print(f"P-Value: {p_value}")

# Interpretation
if p_value < 0.05:  # Assuming a significance level of 0.05
    print("Reject the null hypothesis: The sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: The sample mean is not significantly different from the population mean.")

Explanation:

Inputs:

sample: The sample data.
population_mean: The known population mean.
population_std: The known population standard deviation.

Steps:

Compute the sample mean.
Calculate the Z-score using the formula: $$ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$ where:
$$\bar{x}$$ is the sample mean,
$$\mu$$ is the population mean,
$$\sigma$$ is the population standard deviation,
$$n$$ is the sample size.
Compute the p-value for a two-tailed test using the cumulative distribution function (CDF) of the standard normal distribution.

Output:

z_score: The Z-statistic.
p_value: The probability, assuming the null hypothesis is true, of observing a Z-statistic at least as extreme as the one calculated.

This code is adaptable for different datasets and hypotheses.
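
One possible adaptation is a one-tailed version. The sketch below adds a hypothetical alternative parameter (the function name and parameter are introduced here for illustration) that switches between two-sided, greater, and less.

```python
import numpy as np
from scipy.stats import norm

def z_test_with_alternative(sample, population_mean, population_std,
                            alternative="two-sided"):
    """One-sample Z-test with a selectable alternative hypothesis."""
    sample = np.asarray(sample, dtype=float)
    z = (sample.mean() - population_mean) / (population_std / np.sqrt(len(sample)))
    if alternative == "two-sided":
        p = 2 * norm.sf(abs(z))   # both tails
    elif alternative == "greater":
        p = norm.sf(z)            # upper tail: H1 says mean > population_mean
    elif alternative == "less":
        p = norm.cdf(z)           # lower tail: H1 says mean < population_mean
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p

sample = [50, 52, 48, 49, 51, 50, 53]
z_two, p_two = z_test_with_alternative(sample, 50, 2, "two-sided")
z_gt, p_greater = z_test_with_alternative(sample, 50, 2, "greater")
z_lt, p_less = z_test_with_alternative(sample, 50, 2, "less")

print(f"z = {z_two:.3f}")
print(f"two-sided p = {p_two:.4f}, greater p = {p_greater:.4f}, less p = {p_less:.4f}")
```

The Z-statistic is the same in all three cases; only the tail area used for the p-value changes.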

In [None]:
4. Perform a two-tailed Z-test using Python and visualize the decision region on a plot?
-Here’s how you can perform a two-tailed Z-test in Python and visualize the decision region on a plot. This example assumes you have a sample mean, population mean, population standard deviation, and sample size.

Code Example
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Parameters
population_mean = 50  # Hypothesized population mean (H0)
population_std = 10   # Population standard deviation
sample_mean = 53      # Sample mean
sample_size = 30      # Sample size
alpha = 0.05          # Significance level

# Z-test calculation
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
critical_value = stats.norm.ppf(1 - alpha / 2)  # Two-tailed critical value

# Decision
if abs(z_score) > critical_value:
    decision = "Reject the null hypothesis (H0)"
else:
    decision = "Fail to reject the null hypothesis (H0)"

print(f"Z-Score: {z_score:.2f}")
print(f"Critical Value: ±{critical_value:.2f}")
print(f"Decision: {decision}")

# Visualization of decision region
x = np.linspace(-4, 4, 1000)
y = stats.norm.pdf(x, 0, 1)

plt.figure(figsize=(10, 6))
plt.plot(x, y, label="Standard Normal Distribution", color="blue")

# Shade rejection regions
plt.fill_between(x, y, where=(x < -critical_value), color="red", alpha=0.5, label="Rejection Region")
plt.fill_between(x, y, where=(x > critical_value), color="red", alpha=0.5)

# Mark z-score
plt.axvline(z_score, color="green", linestyle="--", label=f"Z-Score = {z_score:.2f}")

# Add labels and legend
plt.title("Two-Tailed Z-Test Decision Region")
plt.xlabel("Z-Score")
plt.ylabel("Probability Density")
plt.axvline(-critical_value, color="black", linestyle="--", label=f"Critical Value = ±{critical_value:.2f}")
plt.axvline(critical_value, color="black", linestyle="--")
plt.legend()
plt.grid()
plt.show()

Explanation:

Z-Score Calculation:

$$ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$
Where:
$$ \bar{x} $$ = Sample mean
$$ \mu $$ = Population mean
$$ \sigma $$ = Population standard deviation
$$ n $$ = Sample size

Critical Value:

For a two-tailed test, the critical value is determined using the standard normal distribution and the significance level $$ \alpha $$.

Decision:

If the absolute value of the Z-score exceeds the critical value, reject the null hypothesis.

Visualization:

The plot shows the standard normal distribution with shaded rejection regions and the calculated Z-score.

This code is adaptable for different parameters and provides a clear visual representation of the decision-making process.

In [None]:
5. Create a Python function that calculates and visualizes Type 1 and Type 2 errors during hypothesis testing?
-Here's a Python function that calculates and visualizes Type I and Type II errors during hypothesis testing. The function uses a normal distribution to simulate the null and alternative hypotheses and plots the results for better understanding.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def visualize_hypothesis_testing(mu_null=0, mu_alt=1, sigma=1, alpha=0.05, sample_size=30):
    """
    Visualizes Type I and Type II errors in hypothesis testing.

    Parameters:
    - mu_null: Mean of the null hypothesis (H0).
    - mu_alt: Mean of the alternative hypothesis (H1).
    - sigma: Standard deviation of the population.
    - alpha: Significance level (Type I error rate).
    - sample_size: Number of samples used in the test.
    """
    # Calculate the critical value for a right-tailed test at the given
    # significance level (this layout assumes mu_alt > mu_null)
    z_critical = norm.ppf(1 - alpha)
    critical_value = mu_null + z_critical * (sigma / np.sqrt(sample_size))

    # Generate x values for the distributions
    x = np.linspace(mu_null - 4 * sigma, mu_alt + 4 * sigma, 1000)

    # Null hypothesis distribution (H0)
    null_dist = norm.pdf(x, mu_null, sigma / np.sqrt(sample_size))

    # Alternative hypothesis distribution (H1)
    alt_dist = norm.pdf(x, mu_alt, sigma / np.sqrt(sample_size))

    # Plot the distributions
    plt.figure(figsize=(10, 6))
    plt.plot(x, null_dist, label="Null Hypothesis (H0)", color="blue")
    plt.plot(x, alt_dist, label="Alternative Hypothesis (H1)", color="red")

    # Shade the Type I error region
    plt.fill_between(x, 0, null_dist, where=(x > critical_value), color="blue", alpha=0.3, label="Type I Error (α)")

    # Shade the Type II error region
    plt.fill_between(x, 0, alt_dist, where=(x <= critical_value), color="red", alpha=0.3, label="Type II Error (β)")

    # Add labels and legend
    plt.axvline(critical_value, color="black", linestyle="--", label=f"Critical Value = {critical_value:.2f}")
    plt.title("Type I and Type II Errors in Hypothesis Testing")
    plt.xlabel("Test Statistic")
    plt.ylabel("Probability Density")
    plt.legend()
    plt.grid()
    plt.show()

# Example usage
visualize_hypothesis_testing(mu_null=0, mu_alt=1, sigma=1, alpha=0.05, sample_size=30)

Explanation:

Inputs:

mu_null: Mean of the null hypothesis distribution.
mu_alt: Mean of the alternative hypothesis distribution.
sigma: Standard deviation of the population.
alpha: Significance level (probability of Type I error).
sample_size: Number of samples used in the test.

Critical Value:

The critical value is calculated based on the significance level and the null hypothesis distribution.

Visualization:

The null hypothesis (H0) and alternative hypothesis (H1) distributions are plotted.
The Type I error region (false rejection of H0) is shaded in blue.
The Type II error region (failing to reject a false H0) is shaded in red.

This function provides an intuitive way to understand the trade-offs between Type I and Type II errors in hypothesis testing.
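
To put numbers on the shaded areas, the Type II error rate β and the power (1 − β) can be computed directly. This is a minimal sketch for a right-tailed z-test under the same assumptions as the plot (mu_alt greater than mu_null, known sigma); the function name type_ii_error is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def type_ii_error(mu_null=0, mu_alt=1, sigma=1, alpha=0.05, sample_size=30):
    """Return (beta, power) for a right-tailed z-test; assumes mu_alt > mu_null."""
    se = sigma / np.sqrt(sample_size)
    # Sample means above this value lead to rejecting H0
    critical_value = mu_null + norm.ppf(1 - alpha) * se
    # Beta: probability the sample mean falls below the cutoff when H1 is true
    beta = norm.cdf(critical_value, loc=mu_alt, scale=se)
    return beta, 1 - beta

beta, power = type_ii_error()
beta_small_n, power_small_n = type_ii_error(sample_size=5)

print(f"n=30: beta = {beta:.5f}, power = {power:.5f}")
print(f"n=5:  beta = {beta_small_n:.3f}, power = {power_small_n:.3f}")
```

Comparing the two calls shows the trade-off: with alpha held fixed, a smaller sample leaves more overlap between the two distributions and therefore a larger β.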

In [None]:
6. Write a Python program to perform an independent T-test and interpret the results?
-An independent t-test compares the means of two independent groups to determine whether there is a statistically significant difference between them. Here's a Python example with a walkthrough:
from scipy import stats

# Sample data: scores from two different groups
group_a = [85, 88, 90, 94, 87]
group_b = [78, 80, 82, 76, 79]

# Perform the independent t-test
t_stat, p_val = stats.ttest_ind(group_a, group_b)

# Display the results
print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_val:.3f}")

# Interpretation
alpha = 0.05
if p_val < alpha:
    print("Result: Statistically significant difference between groups (Reject H₀)")
else:
    print("Result: No statistically significant difference (Fail to reject H₀)")
🧠 How to Interpret the Output
Running the code above prints:
T-statistic: 5.362
P-value: 0.001
Result: Statistically significant difference between groups (Reject H₀)
This means:

The means are likely different.

With p-value = 0.001 < α = 0.05, you reject the null hypothesis that the group means are equal.

Group A likely performed better than Group B (you can verify this by comparing their means).
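
Two common follow-ups are Welch's t-test, which drops the equal-variance assumption, and an effect size such as Cohen's d. The sketch below reuses the same sample data; the pooled-standard-deviation form of Cohen's d is one of several conventions and is an assumption of this example.

```python
import numpy as np
from scipy import stats

group_a = np.array([85, 88, 90, 94, 87])
group_b = np.array([78, 80, 82, 76, 79])

# Welch's t-test: does not assume the two populations share a variance
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Cohen's d using the pooled sample standard deviation
n_a, n_b = len(group_a), len(group_b)
pooled_sd = np.sqrt(
    ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1))
    / (n_a + n_b - 2)
)
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd

print(f"Welch t-statistic: {t_welch:.3f}, p-value: {p_welch:.4f}")
print(f"Mean A: {group_a.mean():.1f}, Mean B: {group_b.mean():.1f}")
print(f"Cohen's d: {cohens_d:.2f}")
```

The p-value says whether a difference is detectable; Cohen's d describes how large it is relative to the spread of the data.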

In [None]:
7. Perform a paired sample T-test using Python and visualize the comparison results?
-Here’s how you can perform a paired sample t-test in Python and visualize the comparison results. I'll provide a concise example:

Code Example
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Example data: Before and After measurements
before = np.array([85, 90, 88, 75, 95, 80, 78, 92, 88, 84])
after = np.array([88, 92, 85, 78, 97, 82, 80, 94, 90, 86])

# Perform paired sample t-test
t_stat, p_value = stats.ttest_rel(before, after)

# Print results
print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3f}")

# Visualization
plt.figure(figsize=(8, 5))
x = np.arange(len(before)) + 1  # Index for each pair

plt.plot(x, before, 'o-', label='Before', color='blue')
plt.plot(x, after, 'o-', label='After', color='green')
plt.fill_between(x, before, after, color='gray', alpha=0.2, label='Difference')
plt.axhline(y=np.mean(before), color='blue', linestyle='--', label='Mean Before')
plt.axhline(y=np.mean(after), color='green', linestyle='--', label='Mean After')

plt.title('Paired Sample Comparison')
plt.xlabel('Sample Index')
plt.ylabel('Values')
plt.legend()
plt.grid(alpha=0.3)
plt.show()

Explanation
Data: before and after arrays represent paired measurements.
Paired t-test: stats.ttest_rel() is used for paired sample t-tests.
Visualization:
Line plots show the "before" and "after" values for each pair.
Shaded areas highlight the differences.
Dashed lines represent the mean of each group.

This approach provides both statistical results and a clear visual comparison of the paired data.
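
It can help to see that a paired t-test is the same as a one-sample t-test on the per-pair differences. A minimal sketch using the same before and after values:

```python
import numpy as np
from scipy import stats

before = np.array([85, 90, 88, 75, 95, 80, 78, 92, 88, 84])
after = np.array([88, 92, 85, 78, 97, 82, 80, 94, 90, 86])

# Per-pair differences
diff = before - after

# Paired t-test on the two arrays
t_rel, p_rel = stats.ttest_rel(before, after)
# One-sample t-test asking whether the mean difference is zero
t_one, p_one = stats.ttest_1samp(diff, popmean=0)

print(f"Mean difference (before - after): {diff.mean():.2f}")
print(f"ttest_rel:   t = {t_rel:.3f}, p = {p_rel:.3f}")
print(f"ttest_1samp: t = {t_one:.3f}, p = {p_one:.3f}")
```

Both calls return the same statistic and p-value, which is why the pairing (and not just the two group means) drives the result.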

In [None]:
8. Simulate data and perform both Z-test and T-test, then compare the results using Python.
-Let’s walk through how to simulate data, perform both a Z-test and a T-test, and compare the results using Python. This is a useful way to see how these tests behave under different assumptions.

🧪 Step 1: Simulate Data
import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Simulate two independent samples
group1 = np.random.normal(loc=100, scale=10, size=50)  # mean=100, std=10
group2 = np.random.normal(loc=105, scale=10, size=50)  # mean=105, std=10
📊 Step 2: Perform a T-Test (population std unknown)
from scipy import stats

t_stat, t_pval = stats.ttest_ind(group1, group2)
print(f"T-test → t-statistic: {t_stat:.3f}, p-value: {t_pval:.3f}")
📈 Step 3: Perform a Z-Test (normal approximation)
from statsmodels.stats.weightstats import ztest

# Note: ztest estimates the standard deviation from the samples and
# compares the statistic against the standard normal distribution
z_stat, z_pval = ztest(group1, group2, value=0, alternative='two-sided')
print(f"Z-test → z-statistic: {z_stat:.3f}, p-value: {z_pval:.3f}")
🔍 Step 4: Compare the Results
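The T-test uses the t-distribution, whose heavier tails give slightly larger p-values than the Z-test, especially when the sample size is small.

A classical Z-test assumes a known population std. The statsmodels ztest function instead estimates it from the data and uses the normal distribution, which is reasonable for large samples.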

If both tests give similar p-values, it suggests your sample size is large enough for the t-distribution to approximate the normal distribution.
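
For comparison, here is a minimal sketch of a two-sample Z-test that really does plug in a known population standard deviation. It regenerates the same simulated groups and treats sigma = 10 as known for both groups, which is an assumption that happens to match how the data were simulated.

```python
import numpy as np
from scipy.stats import norm

# Same simulated data as in Step 1
np.random.seed(42)
group1 = np.random.normal(loc=100, scale=10, size=50)
group2 = np.random.normal(loc=105, scale=10, size=50)

sigma = 10.0  # assumed known population standard deviation for both groups

# Standard error of the difference in means when sigma is known
se = sigma * np.sqrt(1 / len(group1) + 1 / len(group2))
z_known = (group1.mean() - group2.mean()) / se
p_known = 2 * norm.sf(abs(z_known))

print(f"Known-sigma Z-test → z-statistic: {z_known:.3f}, p-value: {p_known:.4f}")
```

Because the standard error here does not depend on the sample standard deviations, the statistic will differ slightly from both the t-test and the estimated-sigma Z-test.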


In [None]:
9. Write a Python function to calculate the confidence interval for a sample mean and explain its significance?
-🧮 Python Function: Confidence Interval for Sample Mean
import numpy as np
from scipy import stats

def confidence_interval(data, confidence=0.95):
    """
    Calculate the confidence interval for a sample mean.

    Parameters:
        data (list or array-like): Sample data
        confidence (float): Confidence level (default is 0.95)

    Returns:
        tuple: (mean, lower bound, upper bound)
    """
    data = np.array(data)
    n = len(data)
    mean = np.mean(data)
    std_err = stats.sem(data)  # Standard error of the mean
    margin = std_err * stats.t.ppf((1 + confidence) / 2, df=n-1)
    return mean, mean - margin, mean + margin
✅ Example Usage
sample = [72, 75, 78, 71, 76, 77, 74]
mean, lower, upper = confidence_interval(sample)

print(f"Sample Mean: {mean:.2f}")
print(f"95% Confidence Interval: ({lower:.2f}, {upper:.2f})")
📌 What It Means
A 95% confidence interval means: > If you repeated your sampling many times, about 95% of those intervals would contain the true population mean.

It gives a range of plausible values for the true average, accounting for sample size and variability. Wider intervals imply more uncertainty, while narrower ones suggest more precision.
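
The "repeated sampling" interpretation can be checked with a small simulation. The sketch below draws many samples from a population whose mean is known, builds a 95% t-interval from each, and counts how often the interval contains the true mean. The population parameters and number of repeats are arbitrary values chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, true_sd = 75.0, 3.0   # hypothetical population parameters
n, n_repeats = 7, 5000

# Each row is one simulated sample of size n
samples = rng.normal(true_mean, true_sd, size=(n_repeats, n))
means = samples.mean(axis=1)
std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)

# 95% t-based margin of error for every sample
margin = stats.t.ppf(0.975, df=n - 1) * std_errs

# Does each interval contain the true mean?
covered = (means - margin <= true_mean) & (true_mean <= means + margin)
coverage = covered.mean()

print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should come out close to 0.95, matching the confidence level.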

In [None]:
10.Write a Python program to calculate the margin of error for a given confidence level using sample data?
-Here’s a Python program to calculate the margin of error for a given confidence level using sample data. The program uses the z-score for the confidence level and assumes a normal distribution.

import math
from scipy.stats import norm

def calculate_margin_of_error(sample_mean, sample_std_dev, sample_size, confidence_level):
    # Calculate the z-score for the given confidence level
    z_score = norm.ppf(1 - (1 - confidence_level) / 2)

    # Calculate the standard error
    standard_error = sample_std_dev / math.sqrt(sample_size)

    # Calculate the margin of error
    margin_of_error = z_score * standard_error

    return margin_of_error

# Example usage
sample_mean = 50  # Example sample mean
sample_std_dev = 10  # Example sample standard deviation
sample_size = 100  # Example sample size
confidence_level = 0.95  # 95% confidence level

margin_of_error = calculate_margin_of_error(sample_mean, sample_std_dev, sample_size, confidence_level)
print(f"Margin of Error: {margin_of_error:.2f}")

Explanation:

Inputs:

sample_mean: The mean of the sample data (not directly used in margin of error calculation but often part of the context).
sample_std_dev: The standard deviation of the sample data.
sample_size: The size of the sample.
confidence_level: The desired confidence level (e.g., 0.95 for 95%).

Steps:

The z-score is calculated using the scipy.stats.norm.ppf function.
The standard error is computed as $$\text{standard error} = \frac{\text{sample standard deviation}}{\sqrt{\text{sample size}}}$$.
The margin of error is then $$\text{margin of error} = \text{z-score} \times \text{standard error}$$.

Output:

The program prints the margin of error rounded to two decimal places.

You can modify the inputs (sample_mean, sample_std_dev, sample_size, and confidence_level) to suit your specific data.
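
The same formula can be turned around to ask how large a sample is needed for a target margin of error. This is a minimal sketch assuming the standard deviation is known or well estimated in advance; the function name required_sample_size is hypothetical.

```python
import math
from scipy.stats import norm

def required_sample_size(std_dev, target_margin, confidence_level=0.95):
    """Smallest n for which z * std_dev / sqrt(n) <= target_margin."""
    z_score = norm.ppf(1 - (1 - confidence_level) / 2)
    # Solve margin = z * sd / sqrt(n) for n, then round up to a whole observation
    return math.ceil((z_score * std_dev / target_margin) ** 2)

n_needed = required_sample_size(std_dev=10, target_margin=1.0)
print(f"Sample size needed for a margin of 1.0: {n_needed}")
```

Because the margin shrinks with the square root of n, halving the margin of error requires roughly four times as many observations.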

In [None]:
11. Implement a Bayesian inference method using Bayes' Theorem in Python and explain the process.
-Here’s an implementation of Bayesian inference using Bayes' Theorem in Python, along with an explanation of the process:

Bayes' Theorem

Bayes' Theorem is expressed as:

$$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$

Where:

P(A|B): Posterior probability (probability of A given B).
P(B|A): Likelihood (probability of B given A).
P(A): Prior probability (initial belief about A).
P(B): Evidence (normalizing constant).
Python Implementation
# Define the probabilities
def bayesian_inference(prior_A, likelihood_B_given_A, likelihood_B_given_not_A):
    """
    Perform Bayesian inference using Bayes' Theorem.

    Parameters:
    - prior_A: P(A) - Prior probability of A
    - likelihood_B_given_A: P(B|A) - Likelihood of B given A
    - likelihood_B_given_not_A: P(B|¬A) - Likelihood of B given not A

    Returns:
    - posterior_A_given_B: P(A|B) - Posterior probability of A given B
    """
    # Calculate P(¬A) (complement of A)
    prior_not_A = 1 - prior_A

    # Calculate P(B) (evidence)
    evidence_B = (likelihood_B_given_A * prior_A) + (likelihood_B_given_not_A * prior_not_A)

    # Calculate P(A|B) (posterior probability)
    posterior_A_given_B = (likelihood_B_given_A * prior_A) / evidence_B

    return posterior_A_given_B


# Example usage
prior_A = 0.01  # Prior probability of having a disease
likelihood_B_given_A = 0.9  # Probability of testing positive if diseased
likelihood_B_given_not_A = 0.05  # Probability of testing positive if not diseased

posterior = bayesian_inference(prior_A, likelihood_B_given_A, likelihood_B_given_not_A)
print(f"Posterior Probability (P(A|B)): {posterior:.4f}")

Explanation of the Process

Define Inputs:

prior_A: The initial belief about the probability of event A (e.g., having a disease).
likelihood_B_given_A: The probability of observing evidence B (e.g., a positive test result) if A is true.
likelihood_B_given_not_A: The probability of observing evidence B if A is false.

Calculate Complement:

Compute the probability of the complement of A, i.e., $$P(\neg A) = 1 - P(A)$$.

Compute Evidence:

The evidence $$P(B)$$ is the total probability of observing B, considering both scenarios (A and ¬A): $$ P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) $$

Apply Bayes' Theorem:

Use the formula to compute the posterior probability: $$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$

Output the Result:

The result is the updated belief (posterior probability) about A after observing B.
Example Scenario
Prior: 1% of the population has a disease.
Likelihood: A test detects the disease 90% of the time if present, but has a 5% false positive rate.
Posterior: The probability of having the disease given a positive test result is calculated.

This approach can be adapted to various Bayesian inference problems by modifying the inputs.
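
One such adaptation is sequential updating, where the posterior from one observation becomes the prior for the next. The sketch below applies the same disease-test numbers to two positive results in a row, assuming the two tests are conditionally independent given the true disease status; the helper name bayes_update is hypothetical.

```python
def bayes_update(prior, p_pos_given_a, p_pos_given_not_a):
    """Return P(A | positive) from P(A) and the two likelihoods."""
    evidence = p_pos_given_a * prior + p_pos_given_not_a * (1 - prior)
    return p_pos_given_a * prior / evidence

belief = 0.01   # prior: 1% of the population has the disease
history = []
for _ in range(2):
    # Yesterday's posterior is today's prior
    belief = bayes_update(belief, 0.9, 0.05)
    history.append(belief)

print(f"After first positive test:  {history[0]:.4f}")
print(f"After second positive test: {history[1]:.4f}")
```

A single positive result leaves the probability fairly low because the disease is rare, while a second positive result raises it substantially.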

In [None]:
12. Perform a Chi-square test for independence between two categorical variables in Python?
-To perform a Chi-Square test for independence between two categorical variables in Python, you can use the chi2_contingency function from the scipy.stats module. Here's an example:

Example Code:
import numpy as np
from scipy.stats import chi2_contingency

# Create a contingency table (example data)
# Rows and columns represent the categories of the two variables
data = np.array([[50, 30, 20],  # Category 1 of Variable A
                 [30, 50, 20],  # Category 2 of Variable A
                 [20, 20, 60]]) # Category 3 of Variable A

# Perform the Chi-Square test
chi2, p, dof, expected = chi2_contingency(data)

# Print the results
print("Chi-Square Statistic:", chi2)
print("P-value:", p)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)

# Interpretation
if p < 0.05:
    print("The variables are likely dependent (reject the null hypothesis).")
else:
    print("No significant evidence of an association (fail to reject the null hypothesis).")

Explanation:
Input Data: The data array is a contingency table where rows and columns represent the categories of the two variables.
Chi-Square Test: The chi2_contingency function computes:
chi2: The test statistic.
p: The p-value.
dof: Degrees of freedom.
expected: The expected frequencies under the null hypothesis.
Interpretation: Compare the p-value to your significance level (commonly 0.05) to determine if the variables are independent.

This method is flexible and works for any contingency table. Just replace the data array with your own dataset!
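
The p-value indicates whether an association is detectable, not how strong it is. A common companion measure is Cramér's V, sketched here for the same example table (the variable names are chosen for this illustration).

```python
import numpy as np
from scipy.stats import chi2_contingency

data = np.array([[50, 30, 20],
                 [30, 50, 20],
                 [20, 20, 60]])

chi2, p, dof, expected = chi2_contingency(data)

# Cramér's V: chi-square scaled by sample size and table dimensions,
# giving a value between 0 (no association) and 1 (perfect association)
n = data.sum()
min_dim = min(data.shape) - 1
cramers_v = np.sqrt(chi2 / (n * min_dim))

print(f"Chi-Square Statistic: {chi2:.2f}")
print(f"Cramér's V: {cramers_v:.3f}")
```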

In [None]:
13.Write a Python program to calculate the expected frequencies for a Chi-square test based on observed
data?
-Here is a Python program to calculate the expected frequencies for a Chi-Square test based on observed data. The program assumes you have a contingency table (observed data) and calculates the expected frequencies using the formula:

$$E_{ij} = \frac{(R_i \cdot C_j)}{N}$$

Where:

$$E_{ij}$$ is the expected frequency for cell (i, j),
$$R_i$$ is the sum of the i-th row,
$$C_j$$ is the sum of the j-th column,
$$N$$ is the total sum of all observations.
import numpy as np

def calculate_expected_frequencies(observed):
    # Convert observed data to a NumPy array for easier calculations
    observed = np.array(observed)

    # Calculate row sums, column sums, and total sum
    row_sums = observed.sum(axis=1)
    col_sums = observed.sum(axis=0)
    total = observed.sum()

    # Calculate expected frequencies
    expected = np.outer(row_sums, col_sums) / total

    return expected

# Example observed data (contingency table)
observed_data = [
    [50, 30, 20],
    [30, 40, 30],
    [20, 30, 50]
]

# Calculate expected frequencies
expected_frequencies = calculate_expected_frequencies(observed_data)

# Print the result
print("Observed Data:")
print(np.array(observed_data))
print("\nExpected Frequencies:")
print(expected_frequencies)

Explanation:
Input: The observed_data is a 2D list representing the contingency table.
Row and Column Sums: The program calculates the sums of rows and columns using NumPy's sum function.
Expected Frequencies: The expected frequencies are calculated using the formula and stored in a 2D array.
Output: Both the observed data and expected frequencies are printed for comparison.

You can replace the observed_data with your own contingency table to calculate expected frequencies for your specific dataset.
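
Once the expected frequencies are available, the Chi-square statistic follows directly from them. The sketch below finishes the calculation by hand and cross-checks it against scipy's chi2_contingency; correction=False is passed so that no continuity correction is applied in either calculation.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[50, 30, 20],
                     [30, 40, 30],
                     [20, 30, 50]])

# Expected frequencies: (row total * column total) / grand total
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(chi2_stat, dof)

# Cross-check with scipy
chi2_ref, p_ref, dof_ref, expected_ref = chi2_contingency(observed, correction=False)

print(f"Manual: chi2 = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.3g}")
print(f"scipy:  chi2 = {chi2_ref:.3f}, dof = {dof_ref}, p = {p_ref:.3g}")
```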

In [None]:
14. Perform a goodness-of-fit test using Python to compare the observed data to an expected distribution?
-To perform a goodness-of-fit test in Python, you can use the Chi-Square Goodness-of-Fit Test from the scipy.stats module. Here's an example:

Example Code
import numpy as np
from scipy.stats import chisquare

# Observed data
observed = np.array([50, 30, 20])

# Expected data (must sum to the same total as observed)
expected = np.array([40, 40, 20])

# Perform the Chi-Square Goodness-of-Fit Test
chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# Output the results
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-Value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("The observed data does not fit the expected distribution (reject null hypothesis).")
else:
    print("The observed data is consistent with the expected distribution (fail to reject null hypothesis).")

Key Notes:

Inputs:

f_obs: Observed frequencies (your data).
f_exp: Expected frequencies (theoretical distribution).

Output:

chi2_stat: The test statistic.
p_value: The probability, assuming the null hypothesis is true, of a test statistic at least as large as the one observed.

Assumptions:

Observed and expected frequencies should be non-negative.
Expected frequencies should not be too small (preferably > 5 for each category).

This test helps determine whether the observed data significantly deviates from the expected distribution.
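
In practice the expected distribution is often given as proportions rather than counts. The sketch below converts proportions into expected counts for a hypothetical set of 100 die rolls and shows that leaving out f_exp gives the same result, since chisquare assumes equal expected frequencies by default.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical counts for faces 1-6 from 100 die rolls
observed = np.array([18, 22, 16, 14, 12, 18])

# Expected proportions for a fair die, scaled to the observed total
expected_props = np.full(6, 1 / 6)
expected = expected_props * observed.sum()

chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)

# With no f_exp, chisquare assumes all categories are equally likely
chi2_default, p_default = chisquare(observed)

print(f"Chi-Square Statistic: {chi2_stat:.3f}, P-Value: {p_value:.3f}")
print(f"Default (uniform) result: {chi2_default:.3f}, {p_default:.3f}")
```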

In [None]:
15. Create a Python script to simulate and visualize the Chi-square distribution and discuss its characteristics?
-The script below plots the Chi-square probability density for several degrees of freedom:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Parameters
df_values = [1, 2, 5, 10]  # degrees of freedom
x = np.linspace(0, 30, 500)

# Plot Chi-square PDFs for different degrees of freedom
plt.figure(figsize=(10, 6))
for df in df_values:
    plt.plot(x, chi2.pdf(x, df), label=f'df={df}')

plt.title('Chi-square Distribution for Various Degrees of Freedom')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
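
To cover the simulation and characteristics part of the question, here is a minimal sketch that builds Chi-square values as sums of squared standard normal draws and compares them with theory. The choice of df = 5 and 100,000 draws is arbitrary. The distribution takes only non-negative values, is right-skewed, has mean equal to df and variance equal to 2·df, and becomes more symmetric as df grows.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
df = 5
n_draws = 100_000

# A Chi-square variable with df degrees of freedom is the sum of
# df squared independent standard normal variables
z = rng.standard_normal(size=(n_draws, df))
simulated = (z ** 2).sum(axis=1)

sample_mean, sample_var = simulated.mean(), simulated.var()
theory_mean, theory_var = chi2.mean(df), chi2.var(df)

print(f"Simulated mean: {sample_mean:.3f} (theory: {theory_mean:.1f})")
print(f"Simulated variance: {sample_var:.3f} (theory: {theory_var:.1f})")
print(f"Smallest simulated value: {simulated.min():.4f}")
```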


In [None]:
16. Implement an F-test using Python to compare the variances of two random samples?
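-The F-statistic is the ratio of the two sample variances, and the p-value comes from the F-distribution with (n1 - 1, n2 - 1) degrees of freedom:
import numpy as np
from scipy.stats import f

# Simulate two random samples with different spreads
np.random.seed(42)
sample1 = np.random.normal(loc=50, scale=10, size=30)
sample2 = np.random.normal(loc=50, scale=15, size=30)

# Sample variances (ddof=1 gives the unbiased estimate)
var1 = np.var(sample1, ddof=1)
var2 = np.var(sample2, ddof=1)

# F-statistic and its degrees of freedom
f_stat = var1 / var2
df1, df2 = len(sample1) - 1, len(sample2) - 1

# Two-tailed p-value: double the smaller tail probability
p_value = 2 * min(f.cdf(f_stat, df1, df2), f.sf(f_stat, df1, df2))

print(f"F-statistic: {f_stat:.3f}")
print(f"P-value: {p_value:.4f}")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Result: The variances differ significantly (Reject H₀)")
else:
    print("Result: No significant difference in variances (Fail to reject H₀)")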


In [None]:
17.Write a Python program to perform an ANOVA test to compare means between multiple groups and
interpret the results?
-The program below runs a one-way ANOVA on three groups and interprets the p-value:
from scipy.stats import f_oneway

# Sample data: test scores from three different teaching methods
method_A = [85, 88, 90, 87, 86]
method_B = [78, 80, 82, 76, 79]
method_C = [92, 95, 91, 94, 93]

# Perform one-way ANOVA
f_stat, p_val = f_oneway(method_A, method_B, method_C)

print(f"F-statistic: {f_stat:.3f}")
print(f"P-value: {p_val:.4f}")

# Interpretation
alpha = 0.05
if p_val < alpha:
    print("Result: Statistically significant difference between group means (Reject H₀)")
else:
    print("Result: No statistically significant difference (Fail to reject H₀)")


In [None]:
18.Perform a one-way ANOVA test using Python to compare the means of different groups and plot the results?
-🧪 Step 1: Perform One-Way ANOVA
from scipy.stats import f_oneway

# Sample data: test scores from three different teaching methods
group_A = [85, 88, 90, 87, 86]
group_B = [78, 80, 82, 76, 79]
group_C = [92, 95, 91, 94, 93]

# Run ANOVA
f_stat, p_val = f_oneway(group_A, group_B, group_C)

print(f"F-statistic: {f_stat:.3f}")
print(f"P-value: {p_val:.4f}")
📊 Step 2: Visualize the Group Distributions
import matplotlib.pyplot as plt

# Combine data for plotting
data = [group_A, group_B, group_C]
labels = ['Method A', 'Method B', 'Method C']

plt.boxplot(data)
plt.xticks(range(1, len(labels) + 1), labels)  # set tick labels in a way that works across matplotlib versions
plt.title('Comparison of Teaching Methods')
plt.ylabel('Test Scores')
plt.grid(True)
plt.tight_layout()
plt.show()


In [None]:
19. Write a Python function to check the assumptions (normality, independence, and equal variance) for ANOVA?
-🧪 Python Function to Check ANOVA Assumptions
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import shapiro, levene
from statsmodels.stats.diagnostic import acorr_ljungbox
import statsmodels.api as sm
from statsmodels.formula.api import ols

def check_anova_assumptions(data, dv, iv):
    """
    Check ANOVA assumptions: normality, independence, and equal variances.

    Parameters:
        data (DataFrame): Your dataset
        dv (str): Name of the dependent variable column
        iv (str): Name of the independent variable column
    """
    print("Fitting ANOVA model...")
    model = ols(f'{dv} ~ C({iv})', data=data).fit()
    residuals = model.resid

    # 1. Normality of residuals
    print("\n1. Normality (Shapiro-Wilk Test):")
    stat, p = shapiro(residuals)
    print(f"Shapiro-Wilk p-value: {p:.4f}")
    if p > 0.05:
        print("✅ Residuals appear normally distributed.")
    else:
        print("⚠️ Residuals may not be normally distributed.")

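    # 2. Independence (Ljung-Box Test)
    # Note: this checks autocorrelation in row order, so it is only meaningful
    # when the rows of `data` are in collection (e.g., time) order.
    print("\n2. Independence (Ljung-Box Test):")
    lb_result = acorr_ljungbox(residuals, lags=[1])  # returns a DataFrame in recent statsmodels versions
    lb_p = lb_result["lb_pvalue"].iloc[0]
    print(f"Ljung-Box p-value: {lb_p:.4f}")
    if lb_p > 0.05: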
        print("✅ Residuals appear independent.")
    else:
        print("⚠️ Residuals may not be independent.")

    # 3. Homogeneity of variances (Levene’s Test)
    print("\n3. Equal Variance (Levene’s Test):")
    groups = [group[dv].values for name, group in data.groupby(iv)]
    stat, p = levene(*groups)
    print(f"Levene’s p-value: {p:.4f}")
    if p > 0.05:
        print("✅ Variances appear equal across groups.")
    else:
        print("⚠️ Variances may not be equal.")

    # Optional: Residual plot
    sns.histplot(residuals, kde=True)
    plt.title("Histogram of Residuals")
    plt.xlabel("Residuals")
    plt.ylabel("Frequency")
    plt.grid(True)
    plt.tight_layout()
    plt.show()


In [None]:
20.Perform a two-way ANOVA test using Python to study the interaction between two factors and visualize the
results?
-🧪 Step 1: Create or Load Your Data
import pandas as pd

# Example dataset: Exam scores by teaching method and gender
data = pd.DataFrame({
    'score': [88, 92, 85, 90, 78, 82, 75, 80, 91, 89, 87, 93],
    'method': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'F']
})
📊 Step 2: Perform Two-Way ANOVA
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Fit the model with interaction
model = ols('score ~ C(method) + C(gender) + C(method):C(gender)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
This will show you:

The main effects of method and gender

The interaction effect between them
📈 Step 3: Visualize the Interaction
import seaborn as sns
import matplotlib.pyplot as plt

sns.pointplot(data=data, x='method', y='score', hue='gender', dodge=True, markers=['o', 's'], capsize=.1)
plt.title('Interaction Plot: Method vs Gender on Score')
plt.grid(True)
plt.tight_layout()
plt.show()


In [None]:
21.Write a Python program to visualize the F-distribution and discuss its use in hypothesis testing?
-🧪 Python Program: Visualize the F-Distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

# Define degrees of freedom
df1_values = [1, 2, 5, 10]
df2 = 20
x = np.linspace(0, 5, 500)

# Plot F-distributions for different df1
plt.figure(figsize=(10, 6))
for df1 in df1_values:
    y = f.pdf(x, df1, df2)
    plt.plot(x, y, label=f'df1={df1}, df2={df2}')

plt.title('F-Distribution for Various Degrees of Freedom')
plt.xlabel('F value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
📌 What the F-Distribution Tells Us
Right-skewed: Especially with small degrees of freedom.

Used in ANOVA: To test if group means differ significantly.

Used in regression: To test if the model explains a significant portion of variance.

F-statistic: Ratio of variance between groups to variance within groups.
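To make the last point concrete, here is a small sketch that computes the F-statistic by hand as the ratio of the between-group mean square to the within-group mean square, then compares it with scipy's f_oneway. It reuses the three-group test scores from the one-way ANOVA examples in this notebook.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    np.array([85, 88, 90, 87, 86]),
    np.array([78, 80, 82, 76, 79]),
    np.array([92, 95, 91, 94, 93]),
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = np.concatenate(groups).mean()

# Sum of squares between groups and within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)

f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, n_total - k)  # upper-tail probability

f_scipy, p_scipy = f_oneway(*groups)
print(f"Manual F: {f_manual:.3f}, scipy F: {f_scipy:.3f}")
print(f"Manual p: {p_manual:.6f}, scipy p: {p_scipy:.6f}")
```

The p-value is an upper-tail area because only large F values count as evidence against equal means.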

In [None]:
22.Perform a one-way ANOVA test in Python and visualize the results with boxplots to compare group means?
-🧪 Step 1: Perform One-Way ANOVA
from scipy.stats import f_oneway

# Sample data: test scores from three different groups
group_A = [85, 88, 90, 87, 86]
group_B = [78, 80, 82, 76, 79]
group_C = [92, 95, 91, 94, 93]

# Run ANOVA
f_stat, p_val = f_oneway(group_A, group_B, group_C)

print(f"F-statistic: {f_stat:.3f}")
print(f"P-value: {p_val:.4f}")
If the p-value < 0.05, you reject the null hypothesis and conclude that at least one group mean is significantly different.
📊 Step 2: Visualize with Boxplots
import matplotlib.pyplot as plt

# Combine data for plotting
data = [group_A, group_B, group_C]
labels = ['Group A', 'Group B', 'Group C']

plt.boxplot(data)
plt.xticks(range(1, len(labels) + 1), labels)  # set tick labels in a way that works across matplotlib versions
plt.title('Group Comparison via One-Way ANOVA')
plt.ylabel('Scores')
plt.grid(True)
plt.tight_layout()
plt.show()
Boxplots give a clear visual of the spread, median, and outliers for each group—perfect for spotting differences in central tendency and variability.

In [None]:
23.Simulate random data from a normal distribution, then perform hypothesis testing to evaluate the means?
-🧪 Step-by-Step Python Code
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Step 1: Simulate random data from a normal distribution
np.random.seed(42)
sample = np.random.normal(loc=100, scale=10, size=50)  # mean=100, std=10

# Step 2: Define population mean to test against
pop_mean = 105

# Step 3: Perform one-sample t-test
t_stat, p_val = stats.ttest_1samp(sample, pop_mean)

# Step 4: Print results
print(f"Sample Mean: {np.mean(sample):.2f}")
print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_val:.4f}")

# Step 5: Visualize the distribution
plt.hist(sample, bins=10, edgecolor='black', alpha=0.7)
plt.axvline(np.mean(sample), color='blue', linestyle='dashed', linewidth=2, label='Sample Mean')
plt.axvline(pop_mean, color='red', linestyle='dotted', linewidth=2, label='Hypothesized Mean')
plt.title('Simulated Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
🧠 Interpretation
If the p-value < 0.05, you reject the null hypothesis and conclude that the population mean differs significantly from the hypothesized value (105 here).

The histogram helps visualize how the sample is distributed around the sample mean and the hypothesized mean.
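To show where the t-statistic comes from, the sketch below recomputes it by hand as the gap between the sample mean and the hypothesized mean, divided by the standard error, and compares the result with ttest_1samp. It repeats the simulation settings used above so that it runs on its own.

```python
import numpy as np
from scipy import stats

# Same simulated sample and hypothesized mean as in the code above
np.random.seed(42)
sample = np.random.normal(loc=100, scale=10, size=50)
pop_mean = 105

n = len(sample)
std_err = sample.std(ddof=1) / np.sqrt(n)        # standard error of the mean
t_manual = (sample.mean() - pop_mean) / std_err  # t = (x_bar - mu_0) / SE
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-tailed p-value

t_scipy, p_scipy = stats.ttest_1samp(sample, pop_mean)
print(f"Manual t: {t_manual:.3f}, scipy t: {t_scipy:.3f}")
print(f"Manual p: {p_manual:.6f}, scipy p: {p_scipy:.6f}")
```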

In [None]:
24. Perform a hypothesis test for population variance using a Chi-square distribution and interpret the results?
-🧪 Step-by-Step: Chi-Square Test for Population Variance
import numpy as np
from scipy.stats import chi2

# Sample data
data = [12.5, 13.2, 11.8, 12.9, 13.5, 12.1, 13.0, 12.7]
n = len(data)
sample_var = np.var(data, ddof=1)  # Sample variance
hypothesized_var = 1.0             # Population variance under H₀

# Chi-square test statistic
chi2_stat = (n - 1) * sample_var / hypothesized_var

# Degrees of freedom
df = n - 1

# Two-tailed p-value
p_val = 2 * min(chi2.cdf(chi2_stat, df), 1 - chi2.cdf(chi2_stat, df))

print(f"Sample Variance: {sample_var:.4f}")
print(f"Chi-square Statistic: {chi2_stat:.4f}")
print(f"P-value: {p_val:.4f}")
🧠 Interpretation
Null Hypothesis (H₀): The population variance is equal to the hypothesized value.

Alternative Hypothesis (H₁): The population variance is different.

If the p-value < 0.05, reject H₀ → the sample provides evidence that the population variance differs from the hypothesized value. Note that this test assumes the data come from a normal distribution.

In [None]:
25.Write a Python script to perform a Z-test for comparing proportions between two datasets or groups?
-🧪 Python Script: Two-Proportion Z-Test
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Example: Group A and Group B success counts and sample sizes
successes = np.array([45, 30])     # e.g., 45 successes in group A, 30 in group B
samples = np.array([100, 90])      # e.g., 100 trials in group A, 90 in group B

# Perform two-proportion Z-test
z_stat, p_val = proportions_ztest(successes, samples)

print(f"Z-statistic: {z_stat:.3f}")
print(f"P-value: {p_val:.4f}")

# Interpretation
alpha = 0.05
if p_val < alpha:
    print("Result: Statistically significant difference in proportions (Reject H₀)")
else:
    print("Result: No significant difference in proportions (Fail to reject H₀)")
🧠 What This Means
Null Hypothesis (H₀): The proportions in both groups are equal.

Alternative Hypothesis (H₁): The proportions are different.

If the p-value < 0.05, you conclude there's a significant difference between the two proportions.
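For reference, here is a hand computation of the same statistic using only numpy and scipy. It assumes the pooled-proportion form of the standard error, which to my understanding is what proportions_ztest uses by default, so treat it as a sketch rather than a guaranteed match.

```python
import numpy as np
from scipy.stats import norm

# Same counts as in the script above
successes = np.array([45, 30])
samples = np.array([100, 90])

p_hat = successes / samples               # sample proportion in each group
p_pool = successes.sum() / samples.sum()  # pooled proportion under H0

# Standard error of the difference, using the pooled proportion
std_err = np.sqrt(p_pool * (1 - p_pool) * (1 / samples).sum())

z_manual = (p_hat[0] - p_hat[1]) / std_err
p_manual = 2 * norm.sf(abs(z_manual))     # two-tailed p-value

print(f"Manual Z-statistic: {z_manual:.3f}")
print(f"Manual P-value: {p_manual:.4f}")
```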

In [None]:
26.Implement an F-test for comparing the variances of two datasets, then interpret and visualize the results?
-To compare the variances of two datasets using an F-test in Python, you can follow this step-by-step guide. We’ll also visualize the distributions to better understand the variance differences.
🧪 Step 1: Simulate Two Datasets
import numpy as np

np.random.seed(42)
group1 = np.random.normal(loc=50, scale=5, size=30)
group2 = np.random.normal(loc=50, scale=10, size=30)
📊 Step 2: Perform the F-Test
from scipy.stats import f

# Calculate sample variances
var1 = np.var(group1, ddof=1)
var2 = np.var(group2, ddof=1)

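# F-statistic (larger variance / smaller variance), with the degrees of
# freedom matched to whichever group ends up in the numerator
if var1 >= var2:
    f_stat = var1 / var2
    df1, df2 = len(group1) - 1, len(group2) - 1
else:
    f_stat = var2 / var1
    df1, df2 = len(group2) - 1, len(group1) - 1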

# Two-tailed p-value
p_val = 2 * min(f.cdf(f_stat, df1, df2), 1 - f.cdf(f_stat, df1, df2))

print(f"F-statistic: {f_stat:.3f}")
print(f"P-value: {p_val:.4f}")
📈 Step 3: Visualize the Distributions
import matplotlib.pyplot as plt

plt.hist(group1, bins=10, alpha=0.6, label='Group 1 (σ≈5)')
plt.hist(group2, bins=10, alpha=0.6, label='Group 2 (σ≈10)')
plt.axvline(np.mean(group1), color='blue', linestyle='dashed', label='Mean Group 1')
plt.axvline(np.mean(group2), color='orange', linestyle='dashed', label='Mean Group 2')
plt.title('Distribution Comparison')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
🧠 Interpretation
Null Hypothesis (H₀): The two groups have equal variances.

If p-value < 0.05, reject H₀ → the variances are significantly different.

The histogram helps you visually confirm the spread difference between the two groups.
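The variance-ratio F-test is known to be sensitive to departures from normality. As a possible cross-check, the sketch below runs Levene's test from scipy on the same simulated groups. Using center='median' (the Brown-Forsythe variant) is a choice made for this example, not something the steps above require.

```python
import numpy as np
from scipy.stats import levene

# Same simulated groups as in Step 1
np.random.seed(42)
group1 = np.random.normal(loc=50, scale=5, size=30)
group2 = np.random.normal(loc=50, scale=10, size=30)

# Levene's test: H0 is that the groups have equal variances
stat, p_val = levene(group1, group2, center='median')

print(f"Levene statistic: {stat:.3f}")
print(f"P-value: {p_val:.4f}")
```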

In [None]:
27. Perform a Chi-square test for goodness of fit with simulated data and analyze the results.
-A Chi-square goodness-of-fit test checks whether observed categorical data matches an expected distribution. Let’s set up some example data, run the test, and interpret the results.
🧪 Step-by-Step Python Code
import numpy as np
from scipy.stats import chisquare

# Step 1: Observed frequencies (fixed counts standing in for simulated dice rolls)
observed = np.array([18, 22, 20, 16, 24, 20])  # 6-sided die rolled 120 times

# Step 2: Define expected frequencies (uniform distribution)
expected = np.full_like(observed, fill_value=20)  # Expect 20 per face

# Step 3: Perform Chi-square goodness-of-fit test
chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)

print(f"Chi-square Statistic: {chi2_stat:.3f}")
print(f"P-value: {p_val:.4f}")
🧠 Interpretation
Null Hypothesis (H₀): The observed distribution matches the expected distribution.

If p-value < 0.05, reject H₀ → the observed frequencies differ significantly from expected.

If p-value ≥ 0.05, fail to reject H₀ → no significant difference.
📊 Optional: Visualize Observed vs Expected
import matplotlib.pyplot as plt

labels = ['1', '2', '3', '4', '5', '6']
x = np.arange(len(labels))

plt.bar(x - 0.2, observed, width=0.4, label='Observed', color='skyblue')
plt.bar(x + 0.2, expected, width=0.4, label='Expected', color='orange')
plt.xticks(x, labels)
plt.xlabel('Die Face')
plt.ylabel('Frequency')
plt.title('Observed vs Expected Frequencies')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
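The observed counts above are fixed numbers. As a sketch of how the data could be simulated instead, the example below rolls a fair six-sided die 120 times with numpy's random generator, tallies the faces, and runs the same test. The seed and the number of rolls are arbitrary choices.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
n_rolls = 120

# Simulate fair die rolls; `high` is exclusive, so faces are 1..6
rolls = rng.integers(low=1, high=7, size=n_rolls)

# Count how often each face 1..6 appeared (index 0 is unused and dropped)
observed = np.bincount(rolls, minlength=7)[1:]

# A fair die gives the same expected count for every face
expected = np.full(6, n_rolls / 6)

chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)

print("Observed counts:", observed)
print(f"Chi-square Statistic: {chi2_stat:.3f}")
print(f"P-value: {p_val:.4f}")
```

Because the rolls come from a fair die, H₀ is true here, so a p-value below 0.05 should turn up in only about 5% of runs with different seeds.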
