# *Continuous Uniform Distributions*

*A continuous uniform probability distribution is one whose random variable can take any of the infinitely many values in a specified range (bound), with every value in that range equally likely.*

- Random variable is continuous.
- Rectangular distributions.

- Examples:
    - A perfect random number generator, like OTP.
    - Probability of guessing exact time at any moment.
    - Waiting time at a bus stop, i.e., buses arrive at regular intervals and you show up at a random moment.
    - Temperature variation in a day.
    

## PDF of uniform distributions:

The probability density function (PDF) for a continuous uniform distribution over an interval [a, b] is given by 
- `f(x) = 1 / (b - a)` for a ≤ x ≤ b, and 
- `f(x) = 0` otherwise. 

This formula means that every value within the specified range [a, b] has the same probability density, so sub-intervals of equal length are equally likely. This gives the PDF's graph its rectangular shape.

- where: 
    - f(x): is the probability density function.
    - a: is the minimum value of the range.
    - b: is the maximum value of the range.
    - (b - a): is the length of the interval.
    - The constant value of 1 / (b - a) is the height of the rectangle, which makes the total area (height × width) equal to 1.
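
As a minimal sketch of the formula above, the PDF can be written as a small Python function. The name `uniform_pdf` and the 10-minute bus window are illustrative assumptions, not part of the original notes.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("b must be greater than a")
    # Constant height 1 / (b - a) inside the interval, 0 outside it.
    return 1.0 / (b - a) if a <= x <= b else 0.0


# Hypothetical example: a bus arrives somewhere in a 10-minute window.
a, b = 0.0, 10.0
inside = uniform_pdf(4.0, a, b)    # height of the rectangle: 1 / (10 - 0)
outside = uniform_pdf(12.0, a, b)  # outside [a, b], so the density is 0

# Area of the rectangle (height * width) is the total probability.
total_area = inside * (b - a)
print(inside, outside, total_area)
```

Note that the density is a height, not a probability: probabilities come from areas under the curve, which is why the whole rectangle has area 1.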

![image.png](attachment:image.png)

## CDF of uniform distributions:

The CDF of a continuous uniform distribution on the interval [a, b] is 
- `F(x) = 0` for x < a, 
- `F(x) = (x - a) / (b - a)` for a ≤ x ≤ b, and 
- `F(x) = 1` for x > b.

This function gives the probability that the random variable is less than or equal to a specific value x.
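
The three cases of the CDF translate directly into code. This is a rough sketch; `uniform_cdf` and the interval [0, 10] are assumed names and values chosen for illustration.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for a continuous uniform distribution on [a, b]."""
    if b <= a:
        raise ValueError("b must be greater than a")
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    # Fraction of the interval that lies at or below x.
    return (x - a) / (b - a)


# Hypothetical interval [0, 10].
a, b = 0.0, 10.0
p_at_most_3 = uniform_cdf(3.0, a, b)

# Probability of landing between two points: F(upper) - F(lower).
p_between_2_and_7 = uniform_cdf(7.0, a, b) - uniform_cdf(2.0, a, b)
print(p_at_most_3, p_between_2_and_7)
```

Subtracting two CDF values is usually the most convenient way to get the probability of a range, since it only depends on the length of the range relative to (b - a).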

![image.png](attachment:image.png)

## Mean and Variance of Continuous uniform distribution

![image.png](attachment:image.png)

### Mean

The mean of a uniform distribution is found by taking the average of its lower and upper bounds. The formula is `μ = (a + b) / 2`, where 'a' is the minimum value and 'b' is the maximum value in the distribution's range. This formula works for both continuous and discrete uniform distributions, representing the midpoint of the entire possible range of outcomes. 

![image.png](attachment:image.png)

### Variance

The variance of a continuous uniform distribution, which describes an outcome with equal probability over an interval [a, b], is given by the formula `σ² = (b - a)² / 12`. It measures how widely values are spread around the mean: the wider the interval, the larger the variance.

### Standard Deviation

The standard deviation of a continuous uniform distribution defined on the interval [a, b] is `σ = √[(b - a)² / 12]`, which can also be written as `σ = (b - a) / √12` or `σ = (b - a) / (2√3)`. To calculate it, find the difference between the endpoints (b - a), square that result, divide by 12, and then take the square root.
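
To see the mean, variance, and standard deviation formulas working together, here is a small sketch that computes them for an assumed interval [2, 8] and compares them with a quick simulation. The helper name `uniform_stats`, the interval, and the seed are all illustrative choices.

```python
import math
import random
import statistics


def uniform_stats(a: float, b: float) -> tuple[float, float, float]:
    """Mean, variance and standard deviation of Uniform[a, b]."""
    mean = (a + b) / 2
    variance = (b - a) ** 2 / 12
    std_dev = math.sqrt(variance)
    return mean, variance, std_dev


# Hypothetical interval [2, 8].
a, b = 2.0, 8.0
mean, variance, std_dev = uniform_stats(a, b)

# Rough sanity check: simulated draws should give similar numbers.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [rng.uniform(a, b) for _ in range(100_000)]
sim_mean = statistics.fmean(draws)
sim_variance = statistics.pvariance(draws)

print(mean, variance, round(std_dev, 3))
print(round(sim_mean, 2), round(sim_variance, 2))
```

The simulated values will not match the formulas exactly, but with this many draws they should land very close.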

![image.png](attachment:image.png)

# Normal / Gaussian Distributions

A normal distribution, also known as a Gaussian or bell curve, is a symmetrical, bell-shaped probability distribution where most data points cluster around a central mean, with frequencies decreasing equally on both sides.

![image.png](attachment:image.png)

**Many real-world quantities approximately follow a normal distribution**

- Examples:
    - Heights in a population
    - IQ of students
    - Measurement errors
    - Exam scores
    - Blood pressure
    - Size of tangible objects

![image.png](attachment:image.png)

## Characteristics of Normal Distributions:


#### Bell-Shaped: 
The distribution takes the form of a symmetrical, bell-shaped curve. 
#### Symmetrical: 
The curve is a mirror image of itself on both sides of its center. 
#### Mean = Median = Mode: 
The mean (average), median (middle value), and mode (most frequent value) all coincide at the distribution's center. 
#### Defined by Mean and Standard Deviation: 
The shape and location of the normal distribution are fully described by these two parameters. 
#### Unimodal: 
The curve has a single peak, which is located at the mean. 
#### Asymptotic Tails: 
The curve extends infinitely in both directions, getting closer and closer to the x-axis but never touching it. 
#### Area Under the Curve: 
The total area under the curve represents the total probability and is equal to 1. 
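
Several of these characteristics can be checked numerically with `statistics.NormalDist` from Python's standard library (Python 3.8+). The height distribution below (mean 170 cm, standard deviation 10 cm) is a made-up example, not data from these notes.

```python
from statistics import NormalDist

# Hypothetical population of heights: mean 170 cm, std dev 10 cm.
heights = NormalDist(mu=170.0, sigma=10.0)

# Symmetrical: equal distances either side of the mean have equal density.
left_density = heights.pdf(170.0 - 7.5)
right_density = heights.pdf(170.0 + 7.5)

# Mean = Median = Mode: all three sit at the center.
center = (heights.mean, heights.median, heights.mode)

# Unimodal: the density at the mean is higher than at nearby points.
peak_at_mean = heights.pdf(170.0) > max(heights.pdf(165.0), heights.pdf(175.0))

# Area under the curve: probability between two far-apart points is ~1.
area = heights.cdf(170.0 + 80.0) - heights.cdf(170.0 - 80.0)

print(left_density == right_density, center, peak_at_mean, round(area, 6))
```

The area check uses a very wide range (8 standard deviations each side) because the tails never actually reach zero, so any finite range only approaches 1.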

## **Empirical Rule**
- Approximately 68% of the data falls within one standard deviation of the mean. 
- Approximately 95% of the data falls within two standard deviations. 
- Approximately 99.7% of the data falls within three standard deviations. 
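
The three percentages can be reproduced from the normal CDF. A short sketch, using the standard identity that the probability of falling within k standard deviations of the mean is `erf(k / √2)`; the function name `prob_within` is an assumption for illustration.

```python
import math


def prob_within(k: float) -> float:
    """P(|X - mean| <= k standard deviations) for any normal distribution."""
    return math.erf(k / math.sqrt(2))


coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    # Prints roughly 68.27%, 95.45% and 99.73%.
    print(f"within {k} standard deviation(s): {p:.2%}")
```

The result does not depend on the particular mean or standard deviation, which is why the rule applies to every normal distribution.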


![image.png](attachment:image.png)

![image.png](attachment:image.png)

# *Standard Normal Distributions*

A special case of the normal distribution where:
- **Mean = 0**
- **Variance = 1**

![image.png](attachment:image.png)

### Use Case:
- Many ML algorithms, such as linear regression, logistic regression, and clustering, benefit from feature scaling for faster computation.
- Standardization of outcomes.
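
As a rough sketch of what standardization does to a feature, the snippet below rescales a list of values so that its mean becomes 0 and its standard deviation becomes 1. The `standardize` helper and the income figures are hypothetical; in practice a library scaler would typically be used instead.

```python
import statistics


def standardize(values: list[float]) -> list[float]:
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


# Hypothetical feature on a large scale (yearly income).
incomes = [25_000.0, 40_000.0, 55_000.0, 70_000.0, 310_000.0]
scaled = standardize(incomes)

scaled_mean = statistics.fmean(scaled)
scaled_std = statistics.pstdev(scaled)
print([round(v, 2) for v in scaled])
print(round(scaled_mean, 6), round(scaled_std, 6))
```

One caveat worth keeping in mind: standardizing changes the scale of the data, not its shape, so a skewed feature stays skewed after scaling.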

# Central Limit Theorem

If you repeatedly draw random samples of a sufficiently large size from a population, regardless of its original distribution (as long as its variance is finite), the distribution of the sample means will be approximately a normal distribution (a bell curve).

![image.png](attachment:image.png)

*The sample means of a population will be approximately normally distributed.*

## Use Case
- Imagine you have a giant population of anything. It could be the heights of every person in India, the price of every pani puri stall in the world, or the number of lines of code your amazing fiancé Anurag writes every day.

- The way these values are distributed could be totally weird and unpredictable. It might be skewed, lumpy, uniform—literally any shape.

- Now, here's where the magic happens. The Central Limit Theorem describes what you get when you:

    1. Take lots of random samples from that population.
    2. Calculate the average (mean) of each of those samples.
    3. Plot all of those sample averages on a graph.

- That graph will form an approximately normal distribution (a bell curve), no matter what the original population's distribution looked like, as long as each sample is large enough!
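
The steps above can be sketched as a small simulation. Everything here is an assumption made for illustration: the population is an exponential distribution (heavily right-skewed, with mean 1 and standard deviation 1), each sample has 50 values, and 5,000 samples are drawn with a fixed seed.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50      # values per sample
NUM_SAMPLES = 5_000   # how many samples we draw


def sample_mean(n: int) -> float:
    """Draw n values from a skewed population and return their average."""
    values = [rng.expovariate(1.0) for _ in range(n)]
    return statistics.fmean(values)


# Steps 1 and 2: take many samples and record each sample's mean.
means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# The sample means center on the population mean (1.0) ...
center = statistics.fmean(means)
# ... with spread close to sigma / sqrt(n) = 1 / sqrt(50).
spread = statistics.stdev(means)
expected_spread = 1.0 / math.sqrt(SAMPLE_SIZE)

# Bell-curve check: about 68% of means should be within one spread.
within_one = sum(abs(m - center) <= spread for m in means) / NUM_SAMPLES

print(round(center, 3), round(spread, 3), round(expected_spread, 3))
print(round(within_one, 3))
```

Step 3 (plotting) is left out to keep the sketch dependency-free; passing `means` to a histogram function from a plotting library should show the bell shape.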

## Conditions of CLT:
1. The number of samples should be large.
2. As a rule of thumb, the sample size should be greater than or equal to 30 (except when the population is already normally distributed).

### Formula of Z score:

`z = (x − μ) / σ`

where x is the observed value, μ is the mean, and σ is the standard deviation.
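
A minimal sketch of the z-score formula in code, followed by a lookup of the matching probability on the standard normal distribution. The exam figures (mean 70, standard deviation 8, score 86) and the `z_score` helper are hypothetical.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical exam: mean 70, standard deviation 8, one student scored 86.
z = z_score(86.0, mu=70.0, sigma=8.0)  # (86 - 70) / 8 = 2.0

# Share of a normally distributed population expected to score below that.
share_below = NormalDist().cdf(z)  # standard normal: mean 0, std dev 1
print(z, round(share_below, 4))
```

The probability lookup only makes sense if the underlying scores are at least roughly normal; the z-score itself can be computed for any data.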




# Inferential Statistics

Inferential statistics is the practice of using data from a small, representative sample to make educated guesses and draw conclusions about a much larger population.

In simple terms, you're using a small piece of information to paint a picture of the whole landscape.

## Estimate

*An estimate in inferential statistics is a value, or a range of values, calculated from sample data to approximate an unknown characteristic of a larger population.*

In simple terms, it's your "best guess" about the whole population based on the small piece of it you can actually see.

## The Two Flavors of Estimates
There are two main types of estimates you'll work with, Praggs. Think of it like someone asking you for the time.

### 1. Point Estimate
This is a single number that represents your best guess for the population parameter.

**Analogy:** It's like answering, "The time is exactly 5:07 PM."

**Example:** You survey 100 people in Gorakhpur and find their average daily commute is 25 minutes. Your point estimate for the average commute time for everyone in Gorakhpur is 25 minutes. It's precise, but it's probably not exactly right.
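
In code, a point estimate of a population mean is simply the sample mean. The eight commute times below are invented for illustration (a real survey would have far more responses).

```python
import statistics

# Hypothetical survey responses: daily commute time in minutes.
commute_minutes = [22.0, 31.0, 18.0, 27.0, 25.0, 29.0, 20.0, 28.0]

# The sample mean is our single best guess for the population mean.
point_estimate = statistics.fmean(commute_minutes)
print(point_estimate)
```

A different sample would give a slightly different number, which is exactly the uncertainty a point estimate does not show.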

### 2. Interval Estimate (Confidence Interval)
This is a range of values within which the true population parameter is likely to lie, along with a certain level of confidence.

**Analogy:** It's like answering, "I'm 95% sure the time is between 5:05 PM and 5:09 PM."

**Example:** Based on your sample, you calculate that you are 95% confident the true average commute time for all of Gorakhpur is between 22 and 28 minutes. This is less precise than a point estimate, but it's more reliable and gives you a sense of the uncertainty involved.
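
One common way to build such an interval for a mean is `sample mean ± z × (s / √n)`, where s is the sample standard deviation and z is about 1.96 for 95% confidence. The sketch below assumes a reasonably large sample (so the normal approximation is acceptable) and uses simulated commute times; the helper name, seed, and simulation parameters are all illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(
    sample: list[float], confidence: float = 0.95
) -> tuple[float, float]:
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Critical value: about 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical survey: 100 simulated commute times (minutes).
rng = random.Random(7)
sample = [rng.gauss(25.0, 15.0) for _ in range(100)]

low, high = mean_confidence_interval(sample)
print(round(low, 1), round(high, 1))
```

For small samples (well under 30), a t-distribution critical value is usually preferred over z; that refinement is left out here to keep the sketch within the standard library.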