Question1: Define the z-statistic and explain its relationship to the standard normal distribution. How is the
z-statistic used in hypothesis testing?

Sol:
The z-statistic, also known as the z-score, is a measure that indicates how many standard deviations an individual data point is from the mean of a dataset. It is calculated using the formula: z= (X−μ)/σ
Where:
X is the value of the data point,
μ is the mean of the population,
σ is the standard deviation of the population.

##### Relationship to the Standard Normal Distribution
The z-statistic transforms data from any normal distribution into the standard normal distribution, which has:
- A mean of 0, and
- A standard deviation of 1.
This transformation allows comparisons across different distributions by standardizing them. A z-score of:
- 0 means the data point is exactly at the mean,
- Positive values mean the data point is above the mean,
- Negative values mean the data point is below the mean.
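
As a minimal sketch of the formula above, the snippet below standardizes a few values. The scores, means, and standard deviations are hypothetical numbers chosen for illustration, and `z_score` is a helper name introduced here, not part of any library.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical scores from two tests measured on different scales.
z_math = z_score(85, mu=70, sigma=10)       # above the mean, so positive
z_verbal = z_score(620, mu=500, sigma=100)  # above the mean, so positive
z_at_mean = z_score(70, mu=70, sigma=10)    # exactly at the mean, so zero

print(z_math, z_verbal, z_at_mean)
```

Because both scores are expressed in standard deviations, they can be compared directly even though the original scales differ.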

##### Z-Statistic in Hypothesis Testing
The z-statistic is widely used in hypothesis testing, especially for testing population means and proportions when the population standard deviation is known or the sample size is large (usually n>30).

###### Steps in Hypothesis Testing:
1. State the hypotheses:

- Null hypothesis (${H_0}$): Assumes no effect or no difference (e.g., ${\mu = \mu_0}$).
- Alternative hypothesis (${H_A}$): Assumes there is an effect or difference (e.g., ${\mu \neq \mu_0}$).

2. Compute the z-statistic: The z-statistic for a sample mean is calculated as: $$ z = \frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}} $$
Where:
$\bar{X}$ is the sample mean,

${\mu_0}$ is the hypothesized population mean,

σ is the population standard deviation,

n is the sample size.

3. Compare to critical values: Based on the chosen significance level (α), the z-statistic is compared against critical values from the standard normal distribution:

- For a two-tailed test, reject ${H_0}$ if |z| exceeds the critical value (e.g., 1.96 for α=0.05).
- For a one-tailed test, reject ${H_0}$ if z lies beyond the critical value in the direction of the alternative hypothesis: z > 1.645 for a right-tailed test, or z < −1.645 for a left-tailed test, at α=0.05.

4. Make a decision:
- If the z-statistic falls in the rejection region, the null hypothesis is rejected in favor of the alternative hypothesis.
- Otherwise, the null hypothesis is not rejected.
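
The four steps can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: the hypothesized mean, σ, sample size, and sample mean are made-up numbers, and `one_sample_z_test` is a hypothetical helper. The critical value comes from `statistics.NormalDist` in the standard library.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test for a mean when the population sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Critical value from the standard normal distribution (about 1.96 for alpha = 0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_crit
    return z, z_crit, reject

# Hypothetical data: H0: mu = 100, sigma = 15, n = 36, observed sample mean = 106.
z, z_crit, reject = one_sample_z_test(106, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Here z = (106 − 100) / (15 / √36) = 2.4, which lies beyond 1.96, so the null hypothesis is rejected at α = 0.05.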

Question2 : What is a p-value, and how is it used in hypothesis testing? What does it mean if the p-value is
very small (e.g., 0.01)?

A *p-value* is a probability that helps determine the significance of results in hypothesis testing. It is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. It is not the probability that the null hypothesis itself is true.

Role of p-value in hypothesis testing:

1. Hypothesis testing begins with formulating two hypotheses:
- Null hypothesis (${H_0}$): There is no effect or difference.
- Alternative hypothesis (${H_1}$): There is an effect or difference.

2. After collecting data and conducting a statistical test, a p-value is calculated to assess whether the observed data provide enough evidence to reject the null hypothesis.

3. The p-value is compared to a predefined significance level (α), commonly set at 0.05 (or 5%):
- If the p-value is less than α: The null hypothesis is rejected; the result is statistically significant at level α and supports the alternative hypothesis.
- If the p-value is greater than or equal to α: There is insufficient evidence to reject the null hypothesis.

##### Interpreting a very small p-value (e.g., 0.01):
- A p-value of 0.01 indicates that there is only a 1% probability of observing the test statistic, or something more extreme, if the null hypothesis is true.
- Conclusion: The smaller the p-value, the stronger the evidence against the null hypothesis. In this case, if α=0.05, a p-value of 0.01 would lead to the rejection of the null hypothesis. It suggests that the observed results are highly unlikely under the assumption of no effect or difference.
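
To make the comparison with α concrete, here is a small sketch that converts a z-statistic into a two-tailed p-value using the standard library. The z-values are arbitrary illustrations and `two_sided_p_value` is a helper name assumed for this example.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-tailed p-value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
for z in (1.0, 1.96, 2.58):
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z:.2f}: p = {p:.4f} -> {decision}")
```

A z-statistic near 2.58 gives a p-value close to 0.01, which is the "very small" case discussed above.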

Question3: Compare and contrast the binomial and Bernoulli distributions.

Sol:
The Binomial and Bernoulli distributions are both important probability distributions in statistics, especially for modeling binary or "success-failure" type experiments. Although related, they have distinct characteristics. Here's a comparison:

1. Definition:
- Bernoulli Distribution: Represents the outcome of a single trial that can have two possible outcomes: "success" (with probability 𝑝) and "failure" (with probability 1−𝑝). It’s the simplest discrete distribution.
- Binomial Distribution: Represents the number of successes in a fixed number (𝑛) of independent Bernoulli trials, each with the same probability of success (𝑝).
2. Parameters:
- Bernoulli Distribution:
  - 𝑝: The probability of success (0 ≤ 𝑝 ≤ 1).
- Binomial Distribution:
  - n: The number of trials.
  - p: The probability of success in each trial (0 ≤ p ≤ 1).
3. Support (Possible Values):
- Bernoulli Distribution: The outcome can only be either 0 (failure) or 1 (success). It’s used for a single trial.
  - 𝑋∈{0,1}
- Binomial Distribution: The outcome is the number of successes in n trials, ranging from 0 (no successes) to n (all trials successful).
  - X∈{0,1,2,…,n}
4. Probability Mass Function (PMF):
- Bernoulli PMF: P(X = x) = p^x * (1 - p)^(1 - x), where x ∈ {0, 1}
  Where p is the probability of success.
- Binomial PMF: P(X = k) = (n choose k) * p^k * (1 - p)^(n - k), where k ∈ {0, 1, 2, ..., n}
  Where n is the number of trials, k is the number of successes, and p is the probability of success.
5. Mean and Variance:
- Bernoulli:
  - Mean: E(X)=p
  - Variance: Var(X)=p(1−p)
- Binomial:
  - Mean: E(X)=np
  - Variance: Var(X)=np(1−p)
6. Use Case:
- Bernoulli: Used to model the outcome of a single event, such as flipping a coin once or determining if a product is defective (yes/no).
- Binomial: Used to model the number of successes in multiple independent trials of a Bernoulli process, such as flipping a coin 10 times or counting how many of 100 products are defective.
7. Relationship Between Binomial and Bernoulli:
- Bernoulli is a special case of the Binomial distribution where n=1. In other words, if you perform a single trial of a binomial experiment, you get a Bernoulli distribution.
8. Examples:
- Bernoulli Example: Suppose we flip a fair coin once, where heads is considered a success (p=0.5). The probability of getting a head (1) or a tail (0) follows a Bernoulli distribution.
- Binomial Example: Suppose we flip a fair coin 5 times. The number of heads (successes) follows a Binomial distribution with parameters n=5 and p=0.5.
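
The two PMFs and the moment formulas above can be checked numerically. The sketch below uses the coin-flip example (n=5, p=0.5); the function names are illustrative, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

def bernoulli_pmf(x, p):
    """P(X = x) for a single trial; x must be 0 or 1."""
    return p**x * (1 - p)**(1 - x)

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(pmf)             # probabilities of 0..5 heads in 5 fair flips
print(mean, variance)  # compare with n*p = 2.5 and n*p*(1-p) = 1.25

# With n = 1 the binomial PMF reduces to the Bernoulli PMF.
print(binomial_pmf(1, 1, p) == bernoulli_pmf(1, p))
```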

Question 4: Under what conditions is the binomial distribution used, and how does it relate to the Bernoulli
distribution?

The binomial distribution is used under specific conditions and is closely related to the Bernoulli distribution. Here’s a breakdown of when to use the binomial distribution and its connection to the Bernoulli distribution:

##### Conditions for Using the Binomial Distribution
1. Fixed Number of Trials (n): The experiment consists of a predetermined number of trials or observations. This number must remain constant.

2. Two Possible Outcomes: Each trial results in one of two outcomes, often referred to as "success" (e.g., a win, a heads in a coin toss) or "failure" (e.g., a loss, tails).

3. Constant Probability (p): The probability of success (denoted as p) is the same for each trial. Similarly, the probability of failure is 1−p.

4. Independent Trials: The trials are independent, meaning the outcome of one trial does not affect the outcomes of other trials.

##### Relation to the Bernoulli Distribution
- The Bernoulli distribution is a special case of the binomial distribution where the number of trials n=1. It describes a single trial with two possible outcomes, typically encoded as 1 (success) and 0 (failure).

- The binomial distribution can be seen as the sum of multiple independent Bernoulli trials. Specifically, if you conduct n independent Bernoulli trials (each with probability p of success), the total number of successes follows a binomial distribution with parameters n and p.

##### Mathematical Representation
- Bernoulli Distribution: P(X = k) = p^k * (1 - p)^(1 - k), where k ∈ {0, 1}

- Binomial Distribution: P(X = k) = (n choose k) * p^k * (1 - p)^(n - k), where k ∈ {0, 1, 2, ..., n} and (n choose k) is the binomial coefficient, representing the number of ways to choose k successes from n trials.
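
The claim that a binomial variable is a sum of independent Bernoulli trials can be verified exactly, without simulation, by repeatedly convolving the Bernoulli PMF with itself. This is a sketch with assumed values n=4 and p=0.3; `convolve` and `sum_of_bernoullis` are hypothetical helper names.

```python
from math import comb

def convolve(pmf_a, pmf_b):
    """PMF of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out

def sum_of_bernoullis(n, p):
    """PMF of the number of successes in n independent Bernoulli(p) trials."""
    bernoulli = [1 - p, p]  # P(X = 0), P(X = 1)
    total = [1.0]           # zero trials: 0 successes with certainty
    for _ in range(n):
        total = convolve(total, bernoulli)
    return total

n, p = 4, 0.3
from_sum = sum_of_bernoullis(n, p)
from_formula = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
for k, (a, b) in enumerate(zip(from_sum, from_formula)):
    print(f"k={k}: sum of Bernoullis {a:.4f}, binomial formula {b:.4f}")
```

The two columns agree up to floating-point rounding, and calling `sum_of_bernoullis(1, p)` returns the Bernoulli PMF itself.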

Question5: What are the key properties of the Poisson distribution, and when is it appropriate to use this 
distribution?

The Poisson distribution is a discrete probability distribution that models the number of events occurring within a fixed interval of time or space. Here are its key properties and the conditions under which it is appropriate to use this distribution:

##### Key Properties of the Poisson Distribution
1. Discrete Nature: The Poisson distribution is used for count data, where the variable of interest is the number of occurrences of an event within a specified interval.

2. Parameter (λ): The distribution is characterized by a single parameter λ (lambda), which represents the average number of events in the given interval. Both the mean and variance of the Poisson distribution equal λ.

3. Probability Mass Function (PMF): The probability of observing k events in an interval is given by the formula
   P(X = k) = (e^(-λ) * λ^k) / k!, for k = 0, 1, 2, ...
   where e is the base of the natural logarithm and k! is the factorial of k.

4. Independent Increments: In a Poisson process, the numbers of events occurring in non-overlapping intervals are independent of each other. The related memoryless property belongs to the waiting times between events, which are exponentially distributed.

5. Additivity: If you have two independent Poisson processes with rates λ1 and λ2, the sum of these processes is also a Poisson process with a rate of λ1 + λ2.
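
The PMF, the mean-equals-variance property, and additivity can all be checked numerically. The sketch below assumes two illustrative rates (2.0 and 3.5) and truncates the infinite support at k=60, where the remaining tail is negligible for rates this small.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def truncated_pmf(lam, k_max=60):
    """PMF values for k = 0..k_max; the omitted tail is negligible for small lam."""
    return [poisson_pmf(k, lam) for k in range(k_max + 1)]

lam1, lam2 = 2.0, 3.5  # hypothetical rates
pmf1, pmf2 = truncated_pmf(lam1), truncated_pmf(lam2)

# Mean and variance computed from the PMF; both should be close to lam1.
mean1 = sum(k * pk for k, pk in enumerate(pmf1))
var1 = sum((k - mean1) ** 2 * pk for k, pk in enumerate(pmf1))
print(mean1, var1)

# Additivity: P(X1 + X2 = 4) by convolution vs. the Poisson(lam1 + lam2) PMF.
k = 4
conv = sum(pmf1[i] * pmf2[k - i] for i in range(k + 1))
print(conv, poisson_pmf(k, lam1 + lam2))
```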

##### When to Use the Poisson Distribution

The Poisson distribution is appropriate under the following conditions:
1. Rare Events: It is commonly used to model the number of rare events occurring in a fixed interval (time, area, volume, etc.). For example, it can be used to model the number of phone calls received at a call center in an hour.

2. Fixed Interval: The events are counted in a defined interval of time or space, such as the number of accidents at a traffic intersection per month.

3. Independence of Events: The events should be independent; the occurrence of one event should not influence the occurrence of another.

4. Constant Rate: The average rate λ of occurrence should be constant over the interval. For instance, if you are counting the number of emails received per hour, you assume a steady rate rather than a fluctuating one.

5. No Simultaneous Events: The probability of two or more events occurring at exactly the same instant is negligible.

##### Examples of Poisson Distribution Applications
- The number of decay events per unit time from a radioactive source.
- The number of arrivals at a service point, such as a bank or a hospital.
- The number of customer purchases during a promotional period in a store.

Question6: Define the terms "probability distribution" and "probability density function" (PDF). How does a 
PDF differ from a probability mass function (PMF)?

##### Probability Distribution
A probability distribution is a mathematical function that describes the likelihood of the different outcomes of a random variable. It maps the possible values of the random variable (or ranges of values, in the continuous case) to their associated probabilities. Probability distributions can be classified into two main types:

1. Discrete Probability Distribution: Used for discrete random variables, where the variable can take on a countable number of values (e.g., the number of heads in coin flips).
2. Continuous Probability Distribution: Used for continuous random variables, where the variable can take on any value within a given range (e.g., the height of individuals).
##### Probability Density Function (PDF)
A probability density function (PDF) is a specific type of probability distribution used for continuous random variables. The PDF describes the relative likelihood of the random variable taking on a particular value. The key properties of a PDF are:

1. Non-Negative: The value of the PDF is always non-negative.
2. Area Under the Curve: The total area under the PDF curve over its entire range is equal to 1, representing the total probability.
3. Probability Calculation: The probability that a continuous random variable falls within a certain interval is given by the area under the PDF curve over that interval.
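
The area interpretation in the properties above can be illustrated with a simple numerical integration. This is a sketch under assumptions: heights are modeled as normal with a mean of 170 cm and a standard deviation of 10 cm (made-up parameters), and `integrate` is a basic midpoint-rule helper written for this example.

```python
from statistics import NormalDist

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

heights = NormalDist(mu=170, sigma=10)  # hypothetical height distribution in cm

area = integrate(heights.pdf, 160, 180)      # area under the PDF on [160, 180]
exact = heights.cdf(180) - heights.cdf(160)  # the same probability via the CDF
total = integrate(heights.pdf, 70, 270)      # +/- 10 sigma, essentially the whole curve

print(area, exact, total)
```

The first two numbers agree (about 0.68, the familiar one-sigma probability), and the total area is 1 up to numerical error.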

##### Differences Between PDF and PMF
The main differences between a probability density function (PDF) and a probability mass function (PMF) are:
| Feature                     | Probability Mass Function (PMF)                      | Probability Density Function (PDF)               |
|-----------------------------|------------------------------------------------------|-------------------------------------------------|
| **Type of Random Variable**  | Used for discrete random variables                   | Used for continuous random variables             |
| **Definition**              | Gives the probability that a discrete random variable equals a specific value | Gives the density of probabilities over a range of values |
| **Probability Values**      | Directly provides probabilities for specific outcomes | Does not give probabilities directly; must integrate to find probabilities over intervals |
| **Sum vs. Area**           | The sum of all probabilities in a PMF equals 1     | The total area under the PDF curve equals 1    |
| **Notation**                | \( P(X = k) \) for \( k \) in discrete values       | \( f(x) \) for \( x \) in continuous values   |
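
The "Probability Values" row is the difference that most often causes confusion, so here is a short sketch contrasting the two. The binomial parameters and the narrow normal distribution (σ=0.1) are arbitrary choices for illustration.

```python
from math import comb
from statistics import NormalDist

# PMF: each value is itself a probability, and the values sum to 1.
n, p = 10, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
print(pmf[5], sum(pmf.values()))

# PDF: the value at a point is a density, not a probability, and can exceed 1.
narrow = NormalDist(mu=0, sigma=0.1)
density_at_zero = narrow.pdf(0)
# A genuine probability is obtained only over an interval.
prob_near_zero = narrow.cdf(0.05) - narrow.cdf(-0.05)
print(density_at_zero, prob_near_zero)
```

The density at zero is roughly 4, which would be impossible for a probability, while the probability of landing in the interval [−0.05, 0.05] stays between 0 and 1.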


Question7: Explain the Central Limit Theorem (CLT) with example.

The Central Limit Theorem (CLT) is a fundamental statistical principle that states that the distribution of the sample mean (or sum) of a sufficiently large number of independent and identically distributed random variables will approximate a normal distribution, regardless of the original distribution of the variables. This theorem is significant because it allows statisticians to make inferences about population parameters based on sample statistics.

##### Key Points of the Central Limit Theorem
1. Independence: The sampled observations must be independent of each other.
2. Identical Distribution: The random variables should be identically distributed, meaning they have the same probability distribution.
3. Sample Size: As the sample size n increases (typically n≥30 is considered sufficient), the sampling distribution of the sample mean approaches a normal distribution, even if the original population distribution is not normal.
4. Mean and Variance: If the original population has a mean μ and a standard deviation σ, the sampling distribution of the sample mean will have:
- Mean: μ_X̄ = μ
- Standard deviation (Standard Error): σ_X̄ = σ / √n

##### Example of the Central Limit Theorem
Let's illustrate the CLT with a simple example:

Scenario
Suppose we have a population of exam scores for a class of students that is uniformly distributed between 0 and 100. The uniform distribution means that every score within this range is equally likely. We want to understand the distribution of the average scores from different samples taken from this population.

##### Step-by-Step Illustration
1. Population Distribution: The population of scores is uniformly distributed, so its mean is μ=50 and its standard deviation is σ≈28.87.
2. Taking Samples: We take multiple random samples of size n=30 from this population.
3. Calculating Sample Means: For each sample, calculate the mean score. After repeating this process (say, 1000 times), we obtain a distribution of sample means.
4. Resulting Distribution: According to the CLT, even though the original population distribution (exam scores) is uniform and not normal, the distribution of the sample means will approximate a normal distribution, and the approximation improves as the sample size n increases.
5. Visualizing the Result: If you plot the histogram of the sample means, you will see that it approaches a normal distribution centered around μ=50, with reduced variability (a standard error of about 28.87/√30 ≈ 5.27) compared to the original uniform distribution.
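
The steps above can be reproduced with a short simulation. This is a sketch, not a proof: it uses Python's `random` module with a fixed seed (an arbitrary choice, so that the run is repeatable) and summarizes the sample means numerically instead of plotting them.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 1000  # number of samples drawn

# Draw 1000 samples of 30 uniform scores and record each sample's mean.
sample_means = [
    mean([random.uniform(0, 100) for _ in range(n)])
    for _ in range(num_samples)
]

theoretical_se = (100 / 12 ** 0.5) / n ** 0.5  # sigma / sqrt(n), about 5.27
print(mean(sample_means), stdev(sample_means), theoretical_se)

# Share of sample means within one standard error of 50; roughly 68% if normal.
within_one_se = sum(abs(m - 50) <= theoretical_se for m in sample_means) / num_samples
print(within_one_se)
```

The mean of the sample means should land near 50 and their standard deviation near 5.27, with about two thirds of them within one standard error of 50, as a normal distribution would predict.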

##### Importance of the Central Limit Theorem
- Statistical Inference: The CLT allows us to use normal probability techniques to make inferences about population parameters, even when the original data does not follow a normal distribution.

- Hypothesis Testing and Confidence Intervals: The theorem forms the basis for many statistical methods, including hypothesis testing and the construction of confidence intervals.

Question8: Compare z-scores and t-scores. When should you use a z-score, and when should a t-score be applied instead?

Sol:
Z-scores and t-scores are both standardized scores used in statistics to indicate how many standard deviations an element is from the mean. However, they are used in different contexts and have distinct characteristics. Here’s a comparison of the two:

##### Z-Scores

- Definition: A z-score (or standard score) measures the distance of a data point from the mean in terms of standard deviations. It is calculated using the formula: 𝑧 = (X-𝜇)/σ , where X is the value, μ is the population mean, and σ is the population standard deviation.

- Population Data: Z-scores are used when you have a sample that is large (typically n≥30) or when the population standard deviation (σ) is known.

- Normal Distribution: Z-scores are applied when the underlying data follows a normal distribution, or when the sample size is large enough for the Central Limit Theorem to apply.

- Applications: Commonly used in hypothesis testing, confidence intervals, and when comparing data from different normal distributions.

##### T-Scores

- Definition: A t-score is similar to a z-score but is used when the population standard deviation is unknown and must be estimated from the sample, which matters most when the sample size is small (typically n<30). The formula for calculating a t-score is: t = (X̄ - μ) / (s / √n), where $\bar{X}$ is the sample mean, μ is the hypothesized population mean, s is the sample standard deviation, and n is the sample size.

- Sample Data: T-scores are specifically used when analyzing sample data, particularly when dealing with smaller sample sizes where the variability is greater.

- Student's t-Distribution: T-scores follow the Student's t-distribution with n−1 degrees of freedom, which has thicker tails compared to the normal distribution. This accounts for the added uncertainty in estimating the population standard deviation from a small sample.

- Applications: Used in hypothesis testing, confidence intervals, and situations where the sample size is small, particularly in estimating population means.
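
As an illustration of the t-score for a small sample, the sketch below computes a one-sample t-statistic with the standard library. The ten measurements and the hypothesized mean of 50 are made up, `t_statistic` is a hypothetical helper, and the critical value 2.262 (two-tailed, α=0.05, 9 degrees of freedom) is taken from a standard t-table because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """One-sample t-statistic and its degrees of freedom."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical sample of 10 measurements; H0: mu = 50.
sample = [52, 48, 55, 51, 49, 53, 54, 50, 56, 52]
t, df = t_statistic(sample, mu0=50)

t_crit = 2.262  # two-tailed critical value for df = 9, alpha = 0.05 (t-table)
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```

Note that the t critical value (2.262) is larger than the corresponding z critical value (1.96), reflecting the thicker tails of the t-distribution.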

##### When to Use Z-Scores vs. T-Scores
| Criteria                     | Z-Score                             | T-Score                                 |
|------------------------------|-------------------------------------|-----------------------------------------|
| **Sample Size**              | Large sample (n ≥ 30)              | Small sample (n < 30)                   |
| **Population Standard Deviation** | Known                           | Unknown                                  |
| **Distribution**             | Normal distribution or large sample | Student's t-distribution                 |
| **Formula**                  | \( z = \frac{X - \mu}{\sigma} \)  | \( t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \) |
| **Use Case**                 | Hypothesis testing, confidence intervals with known population parameters | Hypothesis testing, confidence intervals with unknown population parameters |