#### Q1. What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.

The **Probability Mass Function** and the **Probability Density Function** both describe the probability distribution of a random variable.

1. **Probability Mass Function (PMF)** 
* *PMF* is used to describe the probability distribution of a discrete random variable. It gives the probability that a discrete random variable takes on a specific value. The PMF maps each possible value of the random variable to its associated probability.
* Mathematically, for a discrete random variable X, the PMF is defined as:
    
    `PMF(X = x) = P(X = x)`
* *Example* - Suppose we have a six-sided fair die (with sides numbered 1 to 6). The PMF of the random variable X representing the outcome of a single roll of the die would be:
    ```
    PMF(X = 1) = 1/6
    PMF(X = 2) = 1/6
    PMF(X = 3) = 1/6
    PMF(X = 4) = 1/6
    PMF(X = 5) = 1/6
    PMF(X = 6) = 1/6
    ```
2. **Probability Density Function (PDF)**
* *PDF* is used to describe the probability distribution of a continuous random variable. Unlike a PMF, a PDF does not give the probability of a specific value (for a continuous variable that probability is zero). Instead, integrating the PDF over a range gives the probability of the random variable falling within that range.
* Mathematically, for a continuous random variable X, the PDF is denoted as f(x), and it satisfies:

    1. f(x) >= 0 for all x (non-negativity property).
    2. The area under the curve of the PDF over the entire range of X is equal to 1.
* The probability that the random variable X falls within an interval [a, b] is given by the integral of the PDF over that interval:

    `P(a<=X<=b) = ∫[a to b] f(x) dx`
* *Example* - Consider the uniform distribution on the interval [0, 1]. The PDF of this continuous random variable X is:

    ```
    f(x) = 1 for 0 <= x <= 1
    f(x) = 0 for x < 0 or x > 1
    ```
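
The two examples above can be checked numerically. The following is a minimal sketch using only the Python standard library; the names `die_pmf`, `uniform_pdf`, and `interval_probability` are illustrative, and the midpoint Riemann sum is just one simple way to approximate the integral of a PDF.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face maps to probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF sums to exactly 1 over all possible values.
pmf_total = sum(die_pmf.values())

# The probability of an event is the sum of the PMF over its outcomes,
# e.g. P(X is even) = P(X = 2) + P(X = 4) + P(X = 6).
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)


def uniform_pdf(x: float) -> float:
    """PDF of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def interval_probability(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a <= X <= b) with a midpoint Riemann sum of the PDF."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


p_interval = interval_probability(uniform_pdf, 0.2, 0.5)

print(pmf_total)             # total probability of the die PMF
print(p_even)                # P(X is even)
print(round(p_interval, 6))  # P(0.2 <= X <= 0.5), approximately 0.3
```

Note the contrast: the PMF is summed over individual values, while the PDF has to be integrated over an interval to yield a probability.
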
___

#### Q2. What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?

* **The Cumulative Distribution Function (CDF)** describes the probability that a random variable takes on a value less than or equal to a given value. The CDF provides a complete description of the probability distribution of a random variable.

* Mathematically, for a random variable X, the CDF is defined as:

    `CDF(x) = P(X <= x)`

*   Where:

    1. `CDF(x)` is the cumulative probability that the random variable X is less than or equal to x.
    2. `P(X <= x)` is the probability of the event "*X is less than or equal to x.*"

* Let's look at a practical example to understand why it is used.

***Example***:
* Suppose you run a coffee shop, and you are interested in analyzing the wait times of customers before they receive their coffee. You collect data on the wait times for a sample of customers and want to understand the probability distribution of these wait times.

* Let's say the wait times (in minutes) for 10 customers in your sample are as follows: [2, 3, 4, 5, 3, 4, 6, 2, 3, 5].

* Step 1: **Empirical CDF** - The empirical CDF (ECDF) is a way to estimate the true CDF based on the observed data. For our example, the ECDF is constructed as follows:

* For each data point, count the number of data points less than or equal to it and divide by the total number of data points.

* Here's the ECDF table for our data:

|Wait Time (x)|Number of Data Points ≤ x|ECDF (P(X ≤ x))|
|-:|-:|-:|
|2|2|0.2|
|3|5|0.5|
|4|7|0.7|
|5|9|0.9|
|6|10|1.0|

* Step 2: Understanding the Results - Using the ECDF, we can answer various questions related to customer wait times:

    1. **Probability of Wait Time**: The CDF (or ECDF) allows us to find the probability of a customer waiting less than or equal to a specific time. For example, the probability that a customer waits less than or equal to 4 minutes is 0.7 (70%).
    2. **Probability of Waiting within a Range**: We can determine the probability of a customer waiting between two specific times. For instance, because wait times are recorded in whole minutes, the probability that a customer waits between 3 and 5 minutes is P(3 ≤ X ≤ 5) = P(X ≤ 5) - P(X ≤ 2) = 0.9 - 0.2 = 0.7 (70%).
    3. **Percentiles**: The CDF helps us find percentiles, which indicate the wait time at or below which a given percentage of customers fall. For example, the 75th percentile is the smallest wait time whose cumulative probability reaches 75%. In our example, the ECDF is 0.7 at 4 minutes and 0.9 at 5 minutes, so the 75th percentile is 5 minutes.
    4. **Comparing Service Efficiency**: By analyzing the CDF, you can compare the service efficiency of different days or times. A day with a higher CDF value at a particular wait time may indicate better service during that time.
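
The ECDF table and the quantities derived from it can be reproduced with a few lines of Python. This is a minimal sketch using the ten wait times from the example; the helper name `ecdf` is illustrative, and the percentile is taken as the smallest observed value whose ECDF reaches the target, which is one of several common percentile conventions.

```python
wait_times = [2, 3, 4, 5, 3, 4, 6, 2, 3, 5]  # minutes, from the example


def ecdf(data, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for value in data if value <= x) / len(data)


# Reproduce the ECDF table: one entry per distinct wait time.
table = {x: ecdf(wait_times, x) for x in sorted(set(wait_times))}

# P(3 <= X <= 5) = F(5) - F(2), since wait times are whole minutes.
p_3_to_5 = ecdf(wait_times, 5) - ecdf(wait_times, 2)

# 75th percentile: smallest observed x whose ECDF reaches 0.75.
p75 = min(x for x, f in table.items() if f >= 0.75)

print(table)               # {2: 0.2, 3: 0.5, 4: 0.7, 5: 0.9, 6: 1.0}
print(round(p_3_to_5, 2))  # 0.7
print(p75)                 # 5
```
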
___


#### Q3. What are some examples of situations where the normal distribution might be used as a model? Explain how the parameters of the normal distribution relate to the shape of the distribution.

The **normal distribution**, also known as the **Gaussian distribution** or bell curve, is one of the most widely used probability distributions in statistics. It is applicable in various real-world situations where certain conditions are met. Here are some examples of situations where the normal distribution might be used as a model:
1. **Heights of People**: The heights of a large population of adults often follow a normal distribution. Though there may be some variations, the overall distribution tends to resemble a bell curve.
2. **Exam Scores**: In a well-designed exam, the scores of a large group of students can be modeled using a normal distribution. This is especially true when the exam is designed to have a reasonable level of difficulty and represents a broad range of student abilities.
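3. **Distribution of the Sample Mean**: When samples are sufficiently large, the means of repeated samples drawn from a population are approximately normally distributed around the population mean, provided the population has finite variance (this is the Central Limit Theorem).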

The shape of the distribution is determined by two parameters: the `mean (μ)` and the `standard deviation (σ)`. These parameters play a crucial role in defining the characteristics of the bell-shaped curve.

1. **Mean (μ)** affects the position of the peak of the distribution:
    * If μ is increased, the peak of the distribution shifts to the right.
    * If μ is decreased, the peak of the distribution shifts to the left.
2. **Standard deviation (σ)** influences the spread or width of the distribution:
    * A small σ results in a narrow and tall bell curve, indicating that data points are concentrated around the mean.
    * A large σ results in a wider and flatter bell curve, indicating that data points are more spread out from the mean.

* When μ = 0 and σ = 1, it is referred to as the standard normal distribution, and it serves as a baseline for comparisons and standardizations.
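
The effect of each parameter can be seen by evaluating a few normal densities. The sketch below uses `statistics.NormalDist` from the Python standard library; the specific values of μ and σ are arbitrary choices for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)  # standard normal distribution
shifted = NormalDist(mu=2, sigma=1)   # larger mu: same shape, peak moved right
wide = NormalDist(mu=0, sigma=2)      # larger sigma: same centre, flatter curve

# The density peaks at the mean, so changing mu only moves the peak.
same_peak_height = abs(standard.pdf(0) - shifted.pdf(2)) < 1e-12

# Doubling sigma halves the height of the peak.
peak_ratio = wide.pdf(0) / standard.pdf(0)


def within_one_sd(dist: NormalDist) -> float:
    """P(mu - sigma <= X <= mu + sigma) for the given normal distribution."""
    return dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)


print(same_peak_height)
print(round(peak_ratio, 6))
for dist in (standard, shifted, wide):
    print(round(within_one_sd(dist), 4))  # about 0.6827 for every normal
```

The last loop shows why standardization works: measured in units of σ around μ, every normal distribution assigns the same probability to the same region.
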
___


#### Q4. Explain the importance of Normal Distribution. Give a few real-life examples of Normal Distribution.

* The **Normal Distribution** is of significant importance due to its widespread occurrence in real-world phenomena and its mathematical properties. Some of the key reasons why the normal distribution is essential are:
1. **Central Limit Theorem (CLT)**: The CLT states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the original population distribution, provided the population has finite variance. The mean of the sampling distribution of the sample mean will be equal to the population mean, and the standard deviation of the sampling distribution (also known as the standard error) will be equal to the population standard deviation divided by the square root of the sample size.
2. **Data Modeling**: Many natural and social phenomena exhibit a tendency to cluster around a central value with a symmetric distribution. The normal distribution provides an excellent approximation for such cases, allowing researchers and statisticians to model data efficiently.
3. **Inference and Hypothesis Testing**: The normal distribution simplifies statistical inference and hypothesis testing, as many parametric tests and confidence intervals assume normality. When the sample size is large enough, the use of normality assumptions often leads to accurate results.
4. **Standardization**: The standard normal distribution (with mean 0 and standard deviation 1) facilitates comparison and standardization across different distributions. It allows transforming data into standardized z-scores, making it easier to compare data points from different datasets.
* Real-life examples of the Normal Distribution:
1. Heights of Adults
2. IQ Scores
3. Exam Scores
4. Errors in Measurements
___


#### Q5. What is the Bernoulli Distribution? Give an Example. What is the difference between the Bernoulli Distribution and the Binomial Distribution?
* **The Bernoulli distribution** is a discrete probability distribution that models a random experiment with two possible outcomes: success (usually denoted by 1) and failure (usually denoted by 0).
* The distribution is characterized by a single parameter, denoted by p, which represents the probability of success in a single trial.
* The probability mass function (PMF) of the Bernoulli distribution is given by:

    `P(X = x) = p^x * (1-p)^(1-x) for x ∈ {0, 1}`

    Where:
    * X is a random variable that takes values 0 or 1 (failure or success, respectively).
    * p is the probability of success in a single trial (0 <= p <= 1).
* An example of a Bernoulli distribution is **flipping a fair coin** once, with heads counted as success, so that p = 0.5.
* Difference between Bernoulli Distribution and Binomial Distribution:

|Bernoulli Distribution|Binomial Distribution|
|----------------------|---------------------|
|Models a single trial or experiment with two possible outcomes.|Models the number of successes in a fixed number of independent Bernoulli trials.|
|Has a single parameter p, which is the probability of success in a single trial.|Has two parameters, n and p, where n is the number of trials and p is the probability of success in a single trial.|
|The random variable X takes only two possible values, 0 or 1.|The random variable Y represents the number of successes in n trials, and it can take values from 0 to n.|
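
The relationship between the two distributions can be sketched in code. The Bernoulli PMF below is the formula given above; the binomial PMF, `C(n, k) * p^k * (1 - p)^(n - k)`, is the standard formula and is not stated in the text above. The function names are illustrative.

```python
from math import comb


def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) = C(n, k) * p^k * (1 - p)^(n - k) for k in 0..n."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 0.5  # fair coin, heads counted as success

single_flip = [bernoulli_pmf(x, p) for x in (0, 1)]
print(single_flip)  # [0.5, 0.5]

# With n = 1 the binomial distribution reduces to the Bernoulli distribution.
one_trial = [binomial_pmf(k, 1, p) for k in (0, 1)]
print(one_trial == single_flip)  # True

# Number of heads in n = 3 independent flips.
three_flips = [binomial_pmf(k, 3, p) for k in range(4)]
print(three_flips)  # [0.125, 0.375, 0.375, 0.125]
```
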
___

#### Q6. Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset is normally distributed, what is the probability that a randomly selected observation will be greater than 60? Use the appropriate formula and show your calculations.

* Since the dataset is normally distributed, let's find the z-score for x = 60.
    i.e. `Z = (x - μ)/σ = (60-50)/10 = 1`
* From the z-table, the cumulative probability for z = 1 is P(Z ≤ 1) = 0.84134.
* Area under curve where x > 60:
    `P(X>60) = 1 - 0.84134`
    `P(X>60) = 0.15866`
* **Probability that a randomly selected observation will be greater than 60 ≈ 15.87%**
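
The same calculation can be checked without a z-table. This sketch uses `statistics.NormalDist` from the Python standard library in place of the table lookup; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, x = 50, 10, 60

z = (x - mu) / sigma           # z-score of the observation
p_below = NormalDist().cdf(z)  # P(Z <= z) for the standard normal
p_above = 1 - p_below          # P(X > 60)

print(z)                  # 1.0
print(round(p_below, 5))  # 0.84134
print(round(p_above, 5))  # 0.15866
```
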
___

#### Q7. Explain uniform Distribution with an example.
* **Uniform distribution** is a type of probability distribution where all possible outcomes of a random variable are equally likely. In the discrete case, each value has the same probability; in the continuous case, the probability density is constant over the interval. It is often represented graphically as a rectangular shape, where the height of the rectangle is constant over a specified interval.
* An example of a (discrete) uniform distribution is rolling a fair six-sided die. When you roll the die, the outcomes (1, 2, 3, 4, 5, and 6) are all equally likely to occur. Each number has a probability of 1/6, or approximately 0.1667.
___

#### Q8. What is the z score? State the importance of the z score.
* The *Z-score*, also known as the *standard score*, is a statistical measure that indicates how many standard deviations a data point is away from the mean of the dataset. 
* It is a dimensionless value and is calculated by subtracting the mean from the data point and then dividing the result by the standard deviation.
* The formula for calculating the Z-score of a data point X in a dataset with mean μ and standard deviation σ is:

    `Z = (X - μ) / σ`
* The Z-score is essential for various reasons, including:
    1. **Standardization**: The Z-score standardizes data, making it easier to compare and analyze values from different distributions. It transforms the original data into a common scale, with a mean of 0 and a standard deviation of 1. This standardization is useful when working with datasets with different units or scales.
    2. **Outlier Detection**: Z-scores help identify outliers in a dataset. Observations whose Z-scores are far from 0 (a common rule of thumb is |Z| > 3) are considered unusual and may indicate potential outliers.
    3. **Probability Calculation**: Z-scores are used in normal distribution calculations to find probabilities associated with specific data values. For example, in a normally distributed dataset, you can use Z-scores to find the probability of a random observation falling above or below a certain value.
    4. **Data Standardization in Machine Learning**: Z-scores are commonly used for data preprocessing in machine learning algorithms. Standardizing input features helps improve the model's performance and convergence during training.
    5. **Data Analysis**: Z-scores are used in hypothesis testing and statistical analysis. They help in comparing individual data points to the overall distribution of the data and draw conclusions based on their positions relative to the mean.
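
A short sketch of the formula and of standardization in practice follows. The dataset and the two exam scores are hypothetical values chosen for illustration, and the population standard deviation (`statistics.pstdev`) is used as σ.

```python
from statistics import mean, pstdev


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical dataset with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = mean(data), pstdev(data)

standardized = [z_score(x, mu, sigma) for x in data]
print(standardized)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# After standardization the data has mean 0 and standard deviation 1.
print(mean(standardized), pstdev(standardized))  # 0.0 1.0

# Comparing scores from two hypothetical exams with different scales.
z_exam_a = z_score(80, mu=70, sigma=10)  # 1.0
z_exam_b = z_score(65, mu=50, sigma=5)   # 3.0
print(z_exam_b > z_exam_a)  # True
```

Although 80 is the larger raw score, 65 on the second exam is further above its own mean in standard-deviation units, which is the kind of comparison standardization makes possible.
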
___

#### Q9. What is Central Limit Theorem? State the significance of the Central Limit Theorem.
* **Central Limit Theorem** states that the sampling distribution of the sample mean (or sum) approaches a normal distribution as the sample size becomes large enough, even if the original population is not normally distributed.
* Significance of the Central Limit Theorem:
    1. *Confidence in Inference*: The Central Limit Theorem allows statisticians to use normal distribution-based methods, such as Z-tests and t-tests, even when the population distribution is unknown or non-normal. This is because, for large sample sizes, the distribution of the sample mean is approximately normal, regardless of the original population.
    2. *Hypothesis Testing*: The Central Limit Theorem forms the basis of hypothesis testing, where we compare sample statistics to hypothesized population parameters. It enables the use of parametric tests, which are more powerful and well-established.
    3. *Estimation*: It simplifies the process of estimating population parameters. For example, when calculating the confidence interval for the population mean, the Central Limit Theorem can be applied to obtain a normally distributed sampling distribution of the sample mean, which is used to make inferences about the population mean.
    4. *Generalizability*: The Central Limit Theorem is widely applicable in various fields, including social sciences, engineering, finance, and many more. It makes statistical analysis more accessible and allows us to draw meaningful conclusions from samples in practical situations.
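
The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be Exponential with rate 1 (right-skewed, mean 1, standard deviation 1), and the sample size, number of samples, and seed are arbitrary choices.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

n = 50               # observations per sample
num_samples = 5_000  # number of repeated samples

# Population: Exponential(rate=1), right-skewed with mean 1 and sd 1.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
standard_error = 1 / n**0.5  # population sd / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# close to 0.95 if the sampling distribution is approximately normal.
inside = sum(1 for m in sample_means if abs(m - 1) <= 1.96 * standard_error)
coverage = inside / num_samples

print(round(mean_of_means, 3))   # close to the population mean, 1
print(round(sd_of_means, 3))     # close to the standard error
print(round(standard_error, 3))
print(round(coverage, 3))
```

Even though individual observations are strongly skewed, the sample means cluster symmetrically around 1 with a spread close to σ/√n.
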
___

#### Q10. State the assumptions of the Central Limit Theorem.

The main assumptions of the Central Limit Theorem are as follows:

1. **Random Sampling**: The samples should be selected randomly from the population of interest. Random sampling ensures that each member of the population has an equal chance of being included in the sample, which helps in avoiding bias in the estimates.
2. **Independent Observations**: The individual observations within each sample should be independent of each other. In other words, the value of one observation should not influence or be influenced by the value of another observation in the same sample.
3. **Finite Variance**: The population from which the samples are drawn should have a finite variance. If the variance is infinite, the Central Limit Theorem may not hold.
4. **Sample Size**: The sample size should be sufficiently large. While there is no strict rule for the minimum sample size, as a general guideline, a sample size of 30 or more is often considered sufficient for the Central Limit Theorem to apply reasonably well. However, larger sample sizes usually yield more accurate results.
5. **No Extreme Skewness**: Although the Central Limit Theorem can work with skewed populations, extremely skewed distributions may require larger sample sizes for the sampling distribution of the sample mean to approximate a normal distribution.
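
The finite-variance assumption can be illustrated by comparing a population that satisfies it with one that does not. This is a sketch only: the exponential and standard Cauchy populations, the sample sizes, and the seed are choices made for illustration. The Cauchy distribution has no finite variance, and the mean of n standard Cauchy draws is itself standard Cauchy, so its spread does not shrink as n grows.

```python
import math
import random
from statistics import quantiles

random.seed(1)  # fixed seed so repeated runs give the same numbers


def draw_exponential() -> float:
    """Exponential(rate=1): skewed, but with finite variance (equal to 1)."""
    return random.expovariate(1.0)


def draw_cauchy() -> float:
    """Standard Cauchy via the inverse-CDF method: its variance is not finite."""
    return math.tan(math.pi * (random.random() - 0.5))


def iqr_of_sample_means(draw, n: int, num_samples: int = 2_000) -> float:
    """Interquartile range of the sample mean over many samples of size n."""
    means = [sum(draw() for _ in range(n)) / n for _ in range(num_samples)]
    q1, _median, q3 = quantiles(means, n=4)
    return q3 - q1


results = {
    name: {n: iqr_of_sample_means(draw, n) for n in (10, 100)}
    for name, draw in (("exponential", draw_exponential), ("cauchy", draw_cauchy))
}
for name, iqrs in results.items():
    print(name, {n: round(iqr, 3) for n, iqr in iqrs.items()})
```

For the exponential population the spread of the sample mean shrinks markedly when n goes from 10 to 100, while for the Cauchy population it stays roughly the same, so larger samples do not bring the sample mean closer to a normal distribution.
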
___