
# Q1. What is the Probability density function?

The Probability Density Function (PDF) is a function that describes the relative likelihood of a continuous random variable taking values near a given point. Unlike a probability mass function (PMF) for discrete variables, the PDF does not give probabilities directly: probabilities are obtained by integrating it over a range of values, since the probability of a continuous variable taking any single exact value is zero.

Key properties of a PDF:

1. Non-negative: The PDF is always non-negative, i.e., $f(x) \ge 0$ for all $x$.

2. Integral equals 1: The total area under the curve of the PDF over the entire space of possible values must be 1. This ensures that the total probability is 1:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

3. Probability over an interval: To find the probability that a random variable $X$ lies within an interval $[a, b]$, you calculate the integral of the PDF over that interval:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
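
The second and third properties can be checked numerically. The sketch below is only an illustration: it assumes the standard normal density as the example PDF and uses a simple trapezoidal rule, with the finite range [-10, 10] standing in for the whole real line. The helper names `standard_normal_pdf` and `integrate` are made up for this example.

```python
import math

def standard_normal_pdf(x):
    """Density of the standard normal distribution (mean 0, std dev 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Property 2: the total area is 1 ([-10, 10] approximates the whole real line).
total_area = integrate(standard_normal_pdf, -10.0, 10.0)

# Property 3: P(-1 <= X <= 1) is the area under the curve between -1 and 1.
prob_within_one = integrate(standard_normal_pdf, -1.0, 1.0)

print(f"Total area: {total_area:.6f}")
print(f"P(-1 <= X <= 1): {prob_within_one:.4f}")
```

The same `integrate` helper works for any other density, as long as the chosen range covers essentially all of its probability mass.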

# Q2. What are the types of Probability distribution?

Probability distributions can be categorized into two main types based on whether the random variable is discrete or continuous:

# 1. Discrete Probability Distributions
These distributions are used when the random variable takes on a countable number of distinct values (e.g., integers, categories). Some common discrete distributions are:

* Binomial Distribution: Describes the number of successes in a fixed number of independent Bernoulli trials (yes/no or success/failure trials). Example: Number of heads in 10 coin tosses.

* Poisson Distribution: Describes the probability of a given number of events happening in a fixed interval of time or space, when the events are independent and occur with a constant mean rate. Example: Number of customer arrivals at a store in an hour.

* Geometric Distribution: Describes the number of trials required for the first success in a sequence of Bernoulli trials. Example: Number of coin tosses until the first heads.

* Negative Binomial Distribution: Generalization of the geometric distribution, describing the number of trials required to achieve a fixed number of successes. Example: Number of coin tosses until the third heads.

* Hypergeometric Distribution: Describes the probability of $k$ successes in $n$ draws without replacement from a population of $N$ objects containing $K$ successes. Example: Drawing a certain number of red balls from a basket of red and blue balls without replacement.
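
As a concrete illustration of a discrete distribution, the hypergeometric probabilities for the red-ball example can be computed with `math.comb`. This is a minimal sketch: the basket size (20 balls, 7 of them red) and the number of draws (5) are assumed values chosen only for the example, and `hypergeom_pmf` is a hypothetical helper name.

```python
import math

def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes in n draws without replacement from N items, K of them successes)."""
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# Assumed basket: 20 balls, 7 red; draw 5 without replacement.
N, K, n = 20, 7, 5
for k in range(n + 1):
    print(f"P({k} red) = {hypergeom_pmf(k, N, K, n):.4f}")

# A valid PMF sums to 1 over all possible outcomes.
total = sum(hypergeom_pmf(k, N, K, n) for k in range(n + 1))
print(f"Sum of probabilities: {total:.6f}")
```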

# 2. Continuous Probability Distributions
These are used when the random variable can take any value within a continuous range (uncountably many values). Some common continuous distributions are:

* Normal (Gaussian) Distribution: The most common continuous distribution, which is symmetric and bell-shaped, defined by its mean $\mu$ and standard deviation $\sigma$. Many real-world phenomena are approximately normally distributed, such as heights, weights, and test scores.

* Uniform Distribution: All values in a certain range are equally likely. Example: A random number drawn from the interval [0, 6], where every sub-interval of the same length has the same probability (a fair die is the discrete analogue).

* Exponential Distribution: Describes the time between independent events that happen at a constant average rate. Example: Time between arrivals of customers at a store.

* Beta Distribution: A continuous distribution defined on the interval [0, 1], often used in Bayesian analysis and probability modeling for proportions and percentages.

* Gamma Distribution: Generalizes the exponential distribution and is often used to model waiting times for multiple events to occur. Example: Time to failure for systems with multiple components.

* Chi-Square Distribution: Used in hypothesis testing and confidence interval estimation for variance in normal distributions. It is a special case of the gamma distribution.

* Cauchy Distribution: Known for its heavy tails and undefined mean and variance, making it distinct from normal distributions. Example: The ratio of two independent standard normal variables follows a Cauchy distribution.

# 3. Mixed Distributions
In some cases, a probability distribution can be a combination of both discrete and continuous components. For instance, a distribution may assign a positive probability to a specific value (discrete) while also having a continuous distribution elsewhere.
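
A mixed distribution can be simulated in a few lines. The sketch below assumes a made-up example: a quantity that is exactly 0 with probability 0.3 (the discrete part) and otherwise follows an exponential distribution (the continuous part). The function name `sample_mixed` and all parameter values are illustrative assumptions.

```python
import random

def sample_mixed(rng, p_zero=0.3, rate=0.5):
    """Return exactly 0.0 with probability p_zero, otherwise an exponential draw."""
    if rng.random() < p_zero:
        return 0.0                    # discrete component: a point mass at 0
    return rng.expovariate(rate)      # continuous component on (0, infinity)

rng = random.Random(0)                # fixed seed so the run is repeatable
draws = [sample_mixed(rng) for _ in range(10_000)]

# The share of exact zeros estimates the point mass; the continuous part
# almost never produces the same value twice.
share_exact_zero = sum(1 for d in draws if d == 0.0) / len(draws)
print(f"Share of draws equal to 0: {share_exact_zero:.3f}")
```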

# Q3. Write a Python function to calculate the probability density function of a normal distribution with given mean and standard deviation at a given point.

Here is a Python function to calculate the probability density function (PDF) of a normal distribution with a given mean and standard deviation at a specific point using the formula for the normal distribution:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

```python
import math

def normal_pdf(x, mean, std_dev):
    """
    Calculate the probability density function (PDF) of a normal distribution.

    Parameters:
    x (float): The point at which to evaluate the PDF.
    mean (float): The mean (μ) of the normal distribution.
    std_dev (float): The standard deviation (σ) of the normal distribution.

    Returns:
    float: The value of the PDF at point x.

    Raises:
    ValueError: If std_dev is not positive.
    """
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")

    # Exponent term: -(x - μ)² / (2σ²)
    exponent = -((x - mean) ** 2) / (2 * (std_dev ** 2))
    # Normalizing coefficient: 1 / (σ √(2π))
    coefficient = 1 / (std_dev * math.sqrt(2 * math.pi))
    pdf_value = coefficient * math.exp(exponent)

    return pdf_value

# Example usage
x = 2.0
mean = 0.0
std_dev = 1.0

pdf_value = normal_pdf(x, mean, std_dev)
print(f"The PDF value at x = {x} is: {pdf_value}")
```
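
One way to sanity-check a hand-written density is to compare it with the standard library's `statistics.NormalDist`, available from Python 3.8. The sketch below repeats a compact version of the formula so that it runs on its own. The test points are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, std_dev):
    """Normal density evaluated directly from the formula."""
    coefficient = 1 / (std_dev * math.sqrt(2 * math.pi))
    return coefficient * math.exp(-((x - mean) ** 2) / (2 * std_dev ** 2))

# Arbitrary (x, mean, std_dev) combinations used only for the comparison.
for x, mean, std_dev in [(2.0, 0.0, 1.0), (0.0, 0.0, 1.0), (5.0, 3.0, 2.0)]:
    manual = normal_pdf(x, mean, std_dev)
    library = NormalDist(mu=mean, sigma=std_dev).pdf(x)
    print(f"x={x}, mean={mean}, std_dev={std_dev}: manual={manual:.6f}, library={library:.6f}")
```

The two columns should agree up to floating-point rounding.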

# Q4. What are the properties of Binomial distribution? Give two examples of events where binomial distribution can be applied.

# Properties of the Binomial Distribution
The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent trials, where each trial has two possible outcomes (success or failure) and the probability of success is the same for each trial. Here are its key properties:

1. Fixed Number of Trials (n): The experiment consists of a fixed number $n$ of independent trials.

2. Binary Outcomes: Each trial results in one of two outcomes, commonly called "success" or "failure." These outcomes are mutually exclusive.

3. Constant Probability of Success (p): The probability of success (denoted as $p$) is constant for each trial. The probability of failure is $1-p$.

4. Independence of Trials: The outcome of one trial does not affect the outcome of another. Each trial is independent.

5. Random Variable (X): The random variable $X$ represents the number of successes in $n$ trials. $X$ follows the binomial distribution and can take values $0, 1, 2, \ldots, n$.

6. Binomial Probability Formula: The probability of getting exactly $k$ successes in $n$ trials is given by the binomial probability formula:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

where $\binom{n}{k}$ is the binomial coefficient, calculated as:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

* $n$ = number of trials
* $k$ = number of successes
* $p$ = probability of success
* $1-p$ = probability of failure

7. Mean and Variance:

* The mean (expected value) of the binomial distribution is given by $\mu = n \cdot p$.
* The variance is given by $\sigma^2 = n \cdot p \cdot (1-p)$.
* The standard deviation is $\sigma = \sqrt{n \cdot p \cdot (1-p)}$.

# Two Examples Where Binomial Distribution Can Be Applied
Tossing a Coin: If you toss a fair coin 10 times, and you're interested in the number of heads (successes) you get, this follows a binomial distribution. Each trial (coin toss) has two possible outcomes: heads (success) or tails (failure), and the probability of getting heads is $p = 0.5$. The random variable $X$ represents the number of heads obtained in 10 tosses.

Defective Items in a Batch: Suppose a factory produces light bulbs, and 5% of them are defective. If you randomly select 20 light bulbs from the production line, and you're interested in the number of defective bulbs in the sample, this scenario follows a binomial distribution. Here, each trial represents selecting one bulb, with two outcomes: defective (success) or non-defective (failure). The probability of selecting a defective bulb is $p = 0.05$, and the random variable $X$ represents the number of defective bulbs in 20 trials.
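
Both examples can be evaluated directly with the binomial probability formula. The following sketch uses `math.comb` for the binomial coefficient. The specific questions asked (exactly 5 heads, no defective bulbs) are chosen here only for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Coin example: probability of exactly 5 heads in 10 fair tosses.
p_five_heads = binomial_pmf(5, 10, 0.5)

# Light-bulb example: probability of no defective bulbs among 20 when 5% are defective.
p_no_defects = binomial_pmf(0, 20, 0.05)

# Mean and standard deviation of the number of defective bulbs.
bulb_mean = 20 * 0.05
bulb_std = math.sqrt(20 * 0.05 * (1 - 0.05))

print(f"P(5 heads in 10 tosses) = {p_five_heads:.4f}")
print(f"P(0 defective in 20 bulbs) = {p_no_defects:.4f}")
print(f"Defective bulbs: mean = {bulb_mean:.2f}, std dev = {bulb_std:.3f}")
```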

# Q5. Generate a random sample of size 1000 from a binomial distribution with probability of success 0.4 and plot a histogram of the results using matplotlib.



The histogram is built from a random sample of size 1000 drawn from a binomial distribution with 10 trials and a probability of success of 0.4. The question does not fix the number of trials, so 10 is assumed here. The x-axis shows the number of successes, and the y-axis represents the frequency of each outcome.
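
A minimal sketch of how such a sample and histogram could be produced is shown below. It is not necessarily the code used for the original figure: the seed, the helper names `binomial_sample` and `plot_histogram`, and the choice to sample with the standard library's `random` module are all assumptions made for this example. NumPy's `Generator.binomial` would be a common alternative for the sampling step.

```python
import random
from collections import Counter

def binomial_sample(n_trials, p, size, seed=None):
    """Draw `size` binomial variates by counting successes in n_trials Bernoulli trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n_trials)) for _ in range(size)]

def plot_histogram(sample, n_trials, p):
    """Plot the sample as a histogram with one bar per possible outcome."""
    import matplotlib.pyplot as plt  # imported here so sampling works without matplotlib

    bins = [b - 0.5 for b in range(n_trials + 2)]  # bin edges centred on the integers
    plt.hist(sample, bins=bins, edgecolor="black")
    plt.xlabel("Number of successes")
    plt.ylabel("Frequency")
    plt.title(f"Binomial sample (n={n_trials}, p={p}, size={len(sample)})")
    plt.xticks(range(n_trials + 1))
    plt.show()

sample = binomial_sample(n_trials=10, p=0.4, size=1000, seed=42)
print(sorted(Counter(sample).items()))  # (outcome, frequency) pairs
# plot_histogram(sample, n_trials=10, p=0.4)  # uncomment to draw the figure
```

The bars should peak around 4 successes, which is the mean $n \cdot p = 10 \cdot 0.4$.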

# Q6. Write a Python function to calculate the cumulative distribution function of a Poisson distribution with given mean at a given point.

To calculate the cumulative distribution function (CDF) of a Poisson distribution, we sum the probabilities from 0 to a given point $k$. The probability mass function (PMF) for the Poisson distribution is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

where $\lambda$ is the mean (or expected value) of the distribution, and $k$ is the number of occurrences.

The CDF is the cumulative probability that a Poisson random variable $X$ is less than or equal to a certain value $k$:

$$F(k; \lambda) = P(X \le k) = \sum_{i=0}^{k} \frac{\lambda^i e^{-\lambda}}{i!}$$

# Python Function for Poisson CDF

```python
import math

def poisson_cdf(k, lamb):
    """
    Calculate the cumulative distribution function (CDF) of a Poisson distribution.

    Parameters:
    k (int): The point at which to evaluate the CDF.
    lamb (float): The mean (λ) of the Poisson distribution.

    Returns:
    float: The value of the CDF at point k.
    """
    if k < 0:
        return 0.0  # no probability mass below zero

    # Build each PMF term from the previous one: P(i) = P(i - 1) * λ / i.
    # This avoids overflow from computing large powers and factorials separately.
    pmf = math.exp(-lamb)  # P(X = 0)
    cdf = pmf
    for i in range(1, k + 1):
        pmf *= lamb / i
        cdf += pmf

    return cdf

# Example usage
k = 5
lamb = 3.0
cdf_value = poisson_cdf(k, lamb)
print(f"The CDF value at k = {k} for λ = {lamb} is: {cdf_value}")
```

# Q7. How is the Binomial distribution different from the Poisson distribution?

The Binomial and Poisson distributions are both discrete probability distributions, but they differ in several key aspects, including their assumptions, use cases, and the nature of the experiments they model.

# Binomial Distribution
1. Models the number of successes in a fixed number of trials.
2. Fixed number of trials $n$.
3. Each trial results in a binary outcome: success or failure.
4. Constant probability of success $p$ for each trial.
5. Mean: $\mu = n \cdot p$ (depends on $n$ and $p$).
6. Variance: $\sigma^2 = n \cdot p \cdot (1-p)$.
7. Used when there is a fixed number of independent trials with two outcomes.
8. For large $n$ and small $p$, the binomial distribution can be approximated by a Poisson distribution with $\lambda = n \cdot p$.
9. The number of successes can be $0, 1, 2, \ldots, n$.
10. PMF: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

# Poisson Distribution

1. Models the number of events occurring in a fixed interval of time or space.
2. No fixed number of trials; instead it models occurrences over time or space.
3. Events occur randomly and independently, with no "success/failure" concept.
4. The average number of events (mean rate) $\lambda$ is fixed, not the probability of individual occurrences.
5. Mean: $\mu = \lambda$ (the expected number of events in an interval).
6. Variance is equal to the mean: $\sigma^2 = \lambda$.
7. Used when modeling rare events happening over time or space.
8. Arises as the limit of the binomial distribution when $n$ is large, $p$ is small, and $n \cdot p = \lambda$ stays fixed, so it approximates the binomial in that regime.
9. The number of events can be any non-negative integer $0, 1, 2, \ldots$.
10. PMF: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
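
The link between the two distributions for large $n$ and small $p$ can be checked numerically. The sketch below compares the two PMFs for an assumed case with $n = 1000$ and $p = 0.005$, so $\lambda = n \cdot p = 5$. These values are chosen only for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

n, p = 1000, 0.005   # assumed: many trials, small success probability
lam = n * p          # matching Poisson mean

print(" k   binomial   Poisson")
for k in range(11):
    print(f"{k:2d}   {binomial_pmf(k, n, p):.5f}    {poisson_pmf(k, lam):.5f}")

# Largest pointwise difference between the two PMFs over k = 0..20.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(f"Largest difference: {max_gap:.6f}")
```

Repeating the comparison with a larger $p$ (for example $n = 10$, $p = 0.5$) shows a much larger gap, which is why the approximation is reserved for the large-$n$, small-$p$ regime.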

​




# Q8. Generate a random sample of size 1000 from a Poisson distribution with mean 5 and calculate the sample mean and variance.

The random sample of size 1000 from a Poisson distribution with a mean of 5 has the following results:

* Sample Mean: 4.946
* Sample Variance: 5.017

These values are close to the expected mean and variance of 5, which is characteristic of the Poisson distribution where the mean and variance are equal.
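
A sketch of how such a sample could be generated and summarized with only the standard library is given below. It is not the code that produced the numbers above, so its output will differ slightly. The sampler uses Knuth's multiplication method, and the seed and the helper name `poisson_variate` are assumptions made for this example.

```python
import math
import random
import statistics

def poisson_variate(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    # Keep multiplying uniform draws until the product falls to e^(-lam) or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(123)  # fixed seed so the run is repeatable
sample = [poisson_variate(rng, 5.0) for _ in range(1000)]

sample_mean = statistics.mean(sample)
sample_variance = statistics.variance(sample)  # unbiased (n - 1) estimator

print(f"Sample Mean: {sample_mean:.3f}")
print(f"Sample Variance: {sample_variance:.3f}")
```

Both statistics should land near 5, and they move closer to each other as the sample size grows.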

# Q9. How are mean and variance related in the Binomial distribution and the Poisson distribution?

The mean and variance are related differently in the Binomial and Poisson distributions.

# 1. Binomial Distribution:
For a binomial distribution with parameters:

* $n$ = number of trials
* $p$ = probability of success in each trial

The mean and variance are given by:

* Mean: $\mu = n \cdot p$
* Variance: $\sigma^2 = n \cdot p \cdot (1-p)$

# Relationship:
In a binomial distribution, the variance depends on both the probability of success $p$ and the probability of failure $1-p$, along with the number of trials $n$. Since $\sigma^2 = \mu \cdot (1-p)$, the variance is smaller than the mean whenever $0 < p < 1$. For a fixed $n$, the variance is maximized at $p = 0.5$, where $p = 1-p$.

# 2. Poisson Distribution:
For a Poisson distribution with parameter:

* $\lambda$ = the average rate (mean number of occurrences in a given interval)

The mean and variance are:

* Mean: $\mu = \lambda$
* Variance: $\sigma^2 = \lambda$

# Relationship:
* In a Poisson distribution, the mean and variance are equal. This is a defining characteristic of the Poisson distribution, making it suitable for modeling rare events where the number of occurrences is random but relatively low.
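
Both relationships can be verified by computing the mean and variance directly from the probability mass functions. The sketch below assumes example parameters ($n = 20$, $p = 0.3$ for the binomial and $\lambda = 6$ for the Poisson) and truncates the Poisson support at 100, beyond which the remaining probability is negligible for this $\lambda$. The helper name `moments` is made up for the example.

```python
import math

def moments(pmf):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(k * pr for k, pr in pmf.items())
    variance = sum((k - mean) ** 2 * pr for k, pr in pmf.items())
    return mean, variance

# Binomial with assumed parameters n = 20, p = 0.3.
n, p = 20, 0.3
binomial = {k: math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

# Poisson with assumed mean 6, truncated at k = 100.
lam = 6.0
poisson = {k: lam ** k * math.exp(-lam) / math.factorial(k) for k in range(101)}

binom_mean, binom_var = moments(binomial)
pois_mean, pois_var = moments(poisson)

print(f"Binomial: mean = {binom_mean:.4f}, variance = {binom_var:.4f}")
print(f"Poisson:  mean = {pois_mean:.4f}, variance = {pois_var:.4f}")
```

The binomial variance comes out as the mean multiplied by $1 - p$, while the Poisson mean and variance coincide.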

# Q10. In a normal distribution, where does the least frequent data appear relative to the mean?

In a normal distribution, which is symmetric and bell-shaped, the least frequent data appears in the tails of the distribution, farthest away from the mean.

# Explanation:
* The mean (denoted $\mu$) is the center of the normal distribution, and the highest frequency of data occurs at or near the mean.
* As you move farther away from the mean in either direction (towards the extreme positive or negative values), the frequency of data points decreases rapidly.
* The tails of the distribution, which are the regions far from the mean (more than 3 standard deviations $\sigma$ away), contain the least frequent data. In these regions, the probability of observing data points is extremely low.

# Key Points:
* Most frequent data: Found near the mean $\mu$.
* Least frequent data: Found in the extreme tails, far from the mean, typically beyond 2 or 3 standard deviations (i.e., above $\mu + 3\sigma$ or below $\mu - 3\sigma$).

In a standard normal distribution (mean $\mu = 0$, standard deviation $\sigma = 1$), approximately:

* 68% of the data lies within 1 standard deviation of the mean.
* 95% of the data lies within 2 standard deviations of the mean.
* 99.7% of the data lies within 3 standard deviations of the mean.

Thus, the data points farther than 3 standard deviations from the mean are the least frequent.
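
The percentages above can be reproduced from the error function, since for a normal distribution $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k / \sqrt{2})$. The short sketch below prints the share of data inside and outside 1, 2, and 3 standard deviations. The helper name `prob_within` is an illustrative choice.

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    inside = prob_within(k)
    print(f"Within {k} standard deviation(s): {inside:.4%}   outside (both tails): {1 - inside:.4%}")
```

The share outside 3 standard deviations is well under 1%, which is why those tail values are the least frequent.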