In [None]:
# Q1. What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.

'''
Probability Mass Function (PMF) and Probability Density Function (PDF) are two fundamental concepts in probability and statistics, but they apply to different kinds of variables:
PMF:
--> Used for discrete random variables, which can only take a finite or countably infinite number of distinct values.
--> It assigns a probability to each possible value the variable can take.
--> Imagine rolling a die. Each number (1, 2, 3, 4, 5, 6) has a specific probability (1/6 in a fair die). The PMF tells you exactly how likely each outcome is.

Properties:
--> Every probability is non-negative and at most 1.
--> The probabilities of all possible values sum to 1 (some outcome is certain to occur).

PDF:
--> Used for continuous random variables, which can take any value within an interval.
--> It describes probability density, not the probability of an exact value. The probability of any single exact value is 0; probabilities come from the area under the PDF over an interval.
--> Imagine measuring the height of people. Any height between, say, 1.5m and 2m is possible. The PDF tells you how densely heights are concentrated across that range (e.g., more people are likely to be around 1.7m than 1.9m).

Properties:
--> Always non-negative.
--> The integral of the PDF over the entire range equals 1 (represents the total probability).

Example:
Discrete: Flipping a coin (Heads/Tails). PMF: P(Heads) = 1/2, P(Tails) = 1/2.
Continuous: Measuring the length of a fish. PDF: It does not assign a probability to any exact length; instead, it describes how likely different
            ranges of lengths are (e.g., more fish might be between 10cm and 15cm than between 5cm and 10cm).'''
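
The PMF and PDF ideas above can be sketched with the Python standard library. This is a minimal illustration, not a definitive model: the height parameters (mean 1.70 m, standard deviation 0.10 m) and the choice of a normal curve for heights are assumptions made up for the example.

```python
from fractions import Fraction
from statistics import NormalDist

# PMF of a fair six-sided die: each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
pmf_total = sum(die_pmf.values())  # the probabilities must sum to 1

# Hypothetical height model (assumed values): mean 1.70 m, standard deviation 0.10 m.
heights = NormalDist(mu=1.70, sigma=0.10)

# Density values are not probabilities; they show where heights are concentrated.
density_at_170 = heights.pdf(1.70)
density_at_190 = heights.pdf(1.90)

# A probability is the area under the PDF over an interval,
# which equals the difference of the CDF at the two endpoints.
prob_165_to_175 = heights.cdf(1.75) - heights.cdf(1.65)

print("Sum of die PMF:", pmf_total)
print("Density at 1.70 m vs 1.90 m:", density_at_170, density_at_190)
print("P(1.65 m < height <= 1.75 m):", prob_165_to_175)
```

Note that the density at 1.70 m comes out larger than 1. That is allowed for a PDF, because only areas under the curve are probabilities.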

In [None]:
# Q2. What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?
'''    
Cumulative Distribution Function (CDF) is a function that provides the probability
that a random variable (X) will take on a value less than or equal to a specific value (x). 

Definition :
For a random variable X, the CDF is denoted as F(x) and defined as:
F(x) = P(X ≤ x)

Interpretation:
F(x) gives you the cumulative probability up to a certain point (x) in the distribution.

Example:
Discrete variable:
Imagine rolling a die. The CDF would look like this:
| x (value) | F(x) (probability X ≤ x) |
|---|:---|
| 1 | 1/6 |
| 2 | 2/6 |
| 3 | 3/6 |
| 4 | 4/6 |
| 5 | 5/6 |
| 6 | 1 |

Continuous variable:
    Think of waiting times for a bus. The CDF might show the probability that you'll wait less than or equal to 5 minutes, 
    10 minutes, 15 minutes, etc.
    
Why Use CDF:
(1) Calculating probabilities:
    Easily find probabilities like P(a < X ≤ b) by subtracting F(a) from F(b).
(2) Visualizing distributions:
    CDF plots offer a clear picture of how probabilities accumulate across the range of values.
(3) Applying statistical tests:
    Many statistical tests rely on the CDF, such as goodness-of-fit tests and hypothesis testing.
    
> The CDF is non-decreasing (it can stay the same or increase, but never decrease).
> The CDF approaches 0 as x goes to -∞ and approaches 1 as x goes to +∞.
> It can be derived from the PMF (by summing) or from the PDF (by integrating), depending on the type of variable.
'''
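
The die table and the subtraction rule P(a < X ≤ b) = F(b) - F(a) can be checked with a short sketch. The helper name `die_cdf` is hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# PMF of a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def die_cdf(x):
    """Return F(x) = P(X <= x) by summing the PMF over all faces <= x."""
    return sum((p for face, p in die_pmf.items() if face <= x), Fraction(0))


# Rebuild the table: F(1) = 1/6, F(2) = 2/6, ..., F(6) = 1.
cdf_table = {x: die_cdf(x) for x in range(1, 7)}

# P(2 < X <= 5) = F(5) - F(2), i.e. the chance of rolling a 3, 4, or 5.
prob_3_to_5 = die_cdf(5) - die_cdf(2)

for x, value in cdf_table.items():
    print(x, value)
print("P(2 < X <= 5) =", prob_3_to_5)
```

Exact fractions are used so the values match the table without rounding error.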

In [None]:
'''Q3: What are some examples of situations where the normal distribution might be used as a model?
Explain how the parameters of the normal distribution relate to the shape of the distribution.'''

'''
The normal distribution, also known as the bell curve, is a versatile tool used in various situations due to its bell-shaped symmetry and well-defined properties. Here are some examples:

1. Natural Phenomena:

> Heights of people: The distribution of heights in a large population often closely follows a normal distribution, with the mean representing the average height and the standard deviation reflecting the variability.
> Test scores: Standardized tests like SATs or IQ tests tend to have scores distributed normally, with most scores clustered around the mean and fewer scores falling further away.
> Errors in measurements: When repeatedly measuring the same quantity, small random errors occur. These errors often follow a normal distribution due to the central limit theorem.

2. Engineering and Technology:

> Manufacturing quality control: Dimensions of manufactured parts like screws or bearings can exhibit normal distributions, allowing for control and monitoring of production processes.
> Signal processing: Noise in electrical signals is often modeled as normally distributed (Gaussian), and this model is used to design filters that reduce the noise and recover the desired signal.

3. Finance and Economics:

> Stock prices: Daily or weekly returns of stocks can be modeled using a normal distribution, although real-world data might deviate due to market events.
> Economic indicators: Changes in indicators like inflation or unemployment rates are sometimes approximated by a normal distribution, which supports prediction and analysis of economic trends.

--> The parameters of the normal distribution, mean (μ) and standard deviation (σ), directly determine its shape:

Mean (μ): This defines the center of the distribution, where the curve peaks. Changing the mean shifts the entire curve left or right without changing its spread.
Standard Deviation (σ): This controls the spread of the distribution. A larger standard deviation indicates a wider spread, with more values farther from the mean, resulting in a flatter curve.
Conversely, a smaller standard deviation leads to a narrower, more peaked curve with values concentrated closer to the mean.
'''
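
The effect of μ and σ on the curve can be seen numerically with `statistics.NormalDist`. The three parameter sets below are hypothetical values picked only to compare shapes.

```python
from statistics import NormalDist

# Hypothetical parameter choices, used only to compare shapes.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)
shifted = NormalDist(mu=3, sigma=1)

# The curve peaks at the mean; a larger sigma lowers (flattens) the peak.
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)
peak_shifted = shifted.pdf(3)  # same height as peak_narrow, just moved to x = 3

# The probability of landing within one unit of the mean shrinks as sigma grows.
within_one_narrow = narrow.cdf(1) - narrow.cdf(-1)
within_one_wide = wide.cdf(1) - wide.cdf(-1)

print("Peak heights:", peak_narrow, peak_wide, peak_shifted)
print("P(|X - mean| <= 1):", within_one_narrow, within_one_wide)
```

Doubling σ halves the peak height, while changing μ leaves the height untouched and only moves the curve.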

In [None]:
# Q4: Explain the importance of Normal Distribution. Give a few real-life examples of Normal Distribution. 
'''
Importance of Normal Distribution:

 > The normal distribution, also known as the Gaussian distribution, holds immense importance in 
   statistics and probability for several reasons:
   
1. Central Limit Theorem: This crucial theorem states that, under certain conditions (independent observations and finite variance), when you repeatedly draw sufficiently large samples from any population (even a non-normal one),
   the distribution of the means of those samples tends toward a normal distribution. This allows us to apply powerful statistical methods based on the normal distribution to data from diverse sources,
   even if their original distributions are unknown.

2. Mathematical Properties: The normal distribution has well-defined mathematical properties that make it easy to use and analyze. Its probability density function (PDF) can be easily calculated,
   and various statistical tests and confidence intervals are based on its properties.
   
3. Benchmark and Comparison: Even when a real-world dataset deviates from the perfect normal distribution, it can still be compared to the normal distribution as a baseline or reference point.
   This allows for assessing the normality of data, identifying outliers, and making informed decisions based on the deviation patterns.
   
Real-life Examples:
1. Predicting Exam Scores: Standardized tests like SATs or IQ tests often rely on the normal distribution to interpret scores. The mean score represents the average performance, and the standard deviation indicates the spread of scores around the mean.
   This helps in understanding individual performance relative to the overall population.

2. Quality Control in Manufacturing: Manufacturers use the normal distribution to monitor the production process of parts like screws, bearings, or electronic components. By measuring key dimensions and comparing them to a pre-defined normal distribution,
   they can identify faulty batches and ensure product quality within acceptable tolerances.

3. Evaluating Investment Strategies: When analyzing the historical returns of stocks or mutual funds, the normal distribution can be used to estimate the expected return and associated risk, although real-world returns can deviate from it.
   Investors can then compare different investment options based on their risk-return profiles and make informed decisions.
'''

In [None]:
''' Q5: What is the Bernoulli Distribution? Give an example. What is the difference between the Bernoulli
        Distribution and the Binomial Distribution?'''
'''

The Bernoulli Distribution:
   The Bernoulli distribution describes the probabilities associated with a single trial with exactly two possible outcomes, often labeled "success" and "failure".  
   It's named after Jacob Bernoulli (also known as James Bernoulli), a Swiss mathematician who studied probability theory.
   
Key Characteristics:

Two outcomes: It only deals with events that can have one of two mutually exclusive outcomes. For example, flipping a coin (heads/tails), 
              rolling a die and checking for even/odd, or testing positive/negative in a medical test.
Single trial: It describes the probability of success or failure in one independent event, not repeated trials.
Parameters: Defined by a single parameter, p, representing the probability of success. The probability of failure is then 1-p.

Example:

Imagine flipping a fair coin. Here, "success" could be getting heads and "failure" could be getting tails. 
Since it's a fair coin, p (success) = p (heads) = 1/2, and p (failure) = p (tails) = 1/2.

The Binomial Distribution:
   The binomial distribution describes the probabilities associated with the number of successes in a series of independent Bernoulli trials.
   It extends the concept of Bernoulli distribution to multiple trials with the same two possible outcomes.

Key Characteristics:

Multiple trials: Applies to sequences of independent trials, each following the same Bernoulli distribution (same p for success). For example, flipping a coin 5 times, running the same medical test on 10 different patients, or drawing 3 cards from a deck with replacement.
Number of successes: It calculates the probability of getting a specific number of successes (k) within the total number of trials (n).

Parameters: Requires two parameters:
 n: The total number of independent trials.
 p: The probability of success in each individual trial (same as in the Bernoulli distribution).

Example:

You flip a fair coin 3 times. What is the probability of getting exactly 2 heads? Here,
n = 3 (trials), k = 2 (successes), and p = 1/2 (success for each flip). The binomial formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
gives P(X = 2) = C(3, 2) * (1/2)^2 * (1/2)^1 = 3/8 = 0.375.

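Key Differences:
1> Number of trials:
Bernoulli Distribution: Single trial.
Binomial Distribution: Multiple (n) independent trials.

2> Outcomes:
Bernoulli Distribution: Two mutually exclusive outcomes (success or failure).
Binomial Distribution: The number of successes across the n trials (any whole number from 0 to n).

A Bernoulli distribution is the special case of the binomial distribution with n = 1.
'''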
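
The coin-flip example can be worked out in code. This is a small sketch; the function names `bernoulli_pmf` and `binomial_pmf` are hypothetical helpers written for this illustration, built on the standard `math.comb`.

```python
from math import comb


def bernoulli_pmf(k, p):
    """P(X = k) for a single trial: k is 1 (success) or 0 (failure)."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 2 heads in 3 flips of a fair coin: C(3, 2) * 0.5^2 * 0.5^1.
prob_two_heads = binomial_pmf(2, 3, 0.5)

# With n = 1 the binomial distribution reduces to the Bernoulli distribution.
single_trial_binomial = binomial_pmf(1, 1, 0.5)
single_trial_bernoulli = bernoulli_pmf(1, 0.5)

print("P(2 heads in 3 flips) =", prob_two_heads)
print("Binomial with n = 1:", single_trial_binomial, "Bernoulli:", single_trial_bernoulli)
```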

In [None]:
'''
Q6. Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset
is normally distributed, what is the probability that a randomly selected observation will be greater
than 60? Use the appropriate formula and show your calculations.'''

'''
We can use the standard normal cumulative distribution function (CDF) and the following steps:
1. Standardize the value:

Calculate the z-score of the value 60 using the formula:
z = (x - μ) / σ
z = (60 - 50) / 10
z = 1

2. Use the standard normal CDF:

Look up the probability corresponding to the calculated z-score (1) in a standard normal distribution table or use a calculator/software function.
The table gives the area to the left of z = 1, P(Z ≤ 1) ≈ 0.8413. The probability P(X > 60) is the area to the right of 1 under the standard normal curve:
P(X > 60) = P(Z > 1) = 1 - 0.8413 = 0.1587

Therefore, the probability of randomly selecting an observation greater than 60 from this dataset is approximately 15.87%.
'''
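
The same calculation can be reproduced in software instead of a table. A minimal sketch using `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, x = 50, 10, 60

# Step 1: standardize the value.
z = (x - mu) / sigma

# Step 2: area to the right of z under the standard normal curve.
p_greater = 1 - NormalDist().cdf(z)

# Cross-check: the same probability computed directly on the original scale.
p_direct = 1 - NormalDist(mu, sigma).cdf(x)

print("z =", z)
print("P(X > 60) =", round(p_greater, 4))
```

Both routes agree because standardizing only rescales the axis; it does not change the area in the tail.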

In [None]:
# Q7: Explain uniform Distribution with an example.
'''
Uniform distribution is a probability distribution where every possible outcome within a specific range has an equal chance of occurring.
It's often visualized as a rectangular shape, representing the constant probability across the range.

Key characteristics:

Equal probability: The probability density is constant across the range, so any two intervals of equal length within the range are equally likely.
Bounded range: The distribution is defined by a minimum and maximum value, forming a continuous interval.
Flat shape: When graphed, the probability density function (PDF) forms a rectangle, unlike the bell-shaped curve of a normal distribution.

Example:
Imagine a bus that arrives perfectly on schedule every 10 minutes between 8:00 AM and 8:30 AM. If you arrive at the bus stop at a random moment within that time window,
your waiting time is equally likely to fall anywhere between 0 and 10 minutes. This scenario follows a uniform distribution.

Parameters:
a: The minimum value of the range. (In the bus example, a = 0 minutes.)
b: The maximum value of the range. (In the bus example, b = 10 minutes.)

Probability density function (PDF):
The PDF of a uniform distribution is a straight horizontal line between a and b, with a height of 1/(b-a), representing the constant probability density:

f(x) = 1 / (b - a)   for a ≤ x ≤ b
f(x) = 0             otherwise
'''
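
The bus example can be turned into a short sketch. The helpers `uniform_pdf` and `uniform_cdf` are hypothetical names for this illustration; the CDF formula (x - a) / (b - a) follows from integrating the constant density from a to x.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("b must be greater than a")
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("b must be greater than a")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 0, 10  # waiting time in minutes, as in the bus example

density = uniform_pdf(4, a, b)                                # constant height 1 / (b - a)
prob_wait_at_most_3 = uniform_cdf(3, a, b)                    # P(wait <= 3 minutes)
prob_wait_2_to_5 = uniform_cdf(5, a, b) - uniform_cdf(2, a, b)  # P(2 < wait <= 5)

print(density, prob_wait_at_most_3, prob_wait_2_to_5)
```

The two probabilities are equal because both intervals are 3 minutes long, which is exactly what "uniform" means.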

In [None]:
# Q8: What is the z score? State the importance of the z score.
'''
The z-score is a standardized score that represents how many standard deviations a specific point is away from the mean of a distribution. 
It is calculated by subtracting the mean of the distribution from a specific point's value and then dividing by the standard deviation of the distribution: z = (x - μ) / σ.

Z-scores are important for several reasons:

Comparing data from different distributions: Z-scores allow you to compare data points from different distributions, 
even if the distributions have different means and standard deviations. This is because z-scores represent the number of standard deviations a point is away from the mean, regardless of the actual values of the mean and standard deviation.

Identifying outliers: Z-scores can be used to identify outliers in a dataset. An outlier is a data point that is significantly different from the other data points in the dataset.
By convention, data points with z-scores greater than 2 or less than -2 are often treated as unusual, and many analyses use a stricter cutoff of 3 or -3 before calling a point an outlier.

Standardizing data for machine learning: Many machine learning algorithms require that the data be standardized before they can be used. 
Standardization typically involves scaling the data to have a mean of 0 and a standard deviation of 1. Z-scores can be used to achieve this standardization.

As an illustration of how to read z-scores: a data point with a z-score of -0.91 is 0.91 standard deviations below the mean, and a data point with a z-score of 1.78 is 1.78 standard deviations above the mean.
'''
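
A minimal sketch of computing z-scores for a dataset follows. The data values are hypothetical, invented purely for this illustration, and the population standard deviation (`pstdev`) is an assumed choice; a sample standard deviation would give slightly different scores.

```python
from statistics import mean, pstdev

# Hypothetical dataset, invented purely for illustration.
data = [42, 47, 50, 53, 58, 65]

mu = mean(data)
sigma = pstdev(data)  # population standard deviation (an assumed choice)

# z = (x - mu) / sigma for every data point.
z_scores = [(x - mu) / sigma for x in data]

# Flag points more than 2 standard deviations away from the mean.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 2]

for x, z in zip(data, z_scores):
    print(x, round(z, 2))
print("Outliers:", outliers)
```

After standardization the scores have mean 0 and standard deviation 1, which is the property machine learning preprocessing relies on.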

In [None]:
# Q9: What is Central Limit Theorem? State the significance of the Central Limit Theorem.
'''
The Central Limit Theorem (CLT) is a fundamental concept in probability and statistics that holds immense significance in various fields.
It describes the behavior of the sampling distribution of means under certain conditions.

Key Idea:

Imagine you have a population with any distribution (not necessarily normal). If you repeatedly draw large enough samples from this population and calculate the mean of each sample,
the distribution of those sample means will tend to approach a normal distribution, regardless of the original population's shape.

Conditions:
Independent samples: The observations must be drawn independently from the same population, meaning previous draws have no influence on subsequent ones.
Large enough sample size: The exact threshold depends on the population distribution. A common rule of thumb is a sample size greater than 30, although heavily skewed populations may need more.

Significance:
    1. Justifies the use of normal distribution: Even if real-world data doesn't perfectly follow a normal distribution, if the sample size is large enough,
       the CLT assures that the distribution of sample means will be approximately normal.
    2. Confidence intervals and hypothesis testing: Many statistical methods, like confidence intervals and hypothesis testing, rely on a normality assumption.
       The CLT is what makes that assumption reasonable for sample means.
    3. Simplifies complex problems: Analyzing complex distributions can be challenging. The CLT allows us to approximate the behavior of sample means using the well-understood normal distribution,
       simplifying tasks like estimating population means, testing hypotheses, and making predictions.
'''
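
The key idea can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is exponential with rate 1 (a strongly skewed distribution with mean 1 and standard deviation 1), and the sample size, number of samples, and seed are arbitrary choices for the demonstration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

sample_size = 40    # observations per sample (assumed value)
num_samples = 5000  # number of sample means to collect (assumed value)

# Exponential population with rate 1: skewed, mean 1, standard deviation 1.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(num_samples)
]

centre = mean(sample_means)
spread = pstdev(sample_means)
predicted_spread = 1 / sample_size**0.5  # sigma / sqrt(n)

# For a normal distribution, roughly 68% of values lie within one standard deviation.
share_within_one_sd = (
    sum(abs(m - centre) <= spread for m in sample_means) / num_samples
)

print("Mean of sample means:", centre)
print("Spread of sample means:", spread, "predicted:", predicted_spread)
print("Share within one standard deviation:", share_within_one_sd)
```

Even though single observations are far from normal, the sample means cluster around the population mean with a spread close to σ/√n, and the share within one standard deviation lands near the normal benchmark.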
       

In [None]:
# Q10: State the assumptions of the Central Limit Theorem.
'''
1. Independent Sampling:
The observations in each sample must be drawn independently from the same population. This means that the selection of one observation does not influence the selection of any other.
Past draws should have no bearing on future ones.

2. Large Enough Sample Size:
The exact threshold depends on the population distribution. A common rule of thumb is a sample size greater than 30, although heavily skewed populations may need more.
As the sample size increases, the distribution of sample means becomes closer to a normal distribution, regardless of the original population's shape.

3. Finite Population Variance:
The population from which you are sampling must have a finite variance. This means that the spread of values in the population is not infinite.
With infinite variance (as in some heavy-tailed distributions), the sample means need not approach a normal distribution.
'''