Suppose Walmart's Quality Control department wants to know what proportion of the company's products in its warehouses are defective. Instead of inspecting every product in the warehouses (which would be impractical), the team can select a `small sample of 1000 products`. It can then find the defect rate (i.e., the proportion of defective products) for the sample and use it to infer the defect rate for all the products in the warehouses.

This process of deriving insights or drawing inferences from sample data is called `inferential statistics`. Situations like the one above arise all the time in big companies like Amazon and Flipkart, among others.
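
The idea can be sketched with a small simulation. Everything here is an assumption made for illustration: the warehouse size (200,000 products), the true defect rate (3%) and the fixed random seed are hypothetical, not Walmart figures.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical warehouse: 200,000 products, each defective with probability 3%.
TRUE_DEFECT_RATE = 0.03
warehouse = [random.random() < TRUE_DEFECT_RATE for _ in range(200_000)]

# Inspect a simple random sample of 1000 products instead of all of them.
sample = random.sample(warehouse, 1000)

sample_defect_rate = sum(sample) / len(sample)
population_defect_rate = sum(warehouse) / len(warehouse)

print(f"Sample defect rate:     {sample_defect_rate:.3f}")
print(f"Population defect rate: {population_defect_rate:.3f}")
```

The sample rate will usually land close to the population rate, which is what makes the inference useful.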

`Inferential Statistics is used in the industry in multiple ways like:`

**1. Finance: Risk Assessment**

Financial institutions use inferential statistics to assess and manage risks. This includes predicting market trends, estimating the likelihood of default on loans, and analyzing investment portfolios.

**2. Marketing: Consumer Surveys**

In marketing, inferential statistics are employed to make inferences about the preferences and behaviors of a target market based on survey data, helping businesses make informed decisions about product development and advertising strategies.

**3. Manufacturing: Quality Control**

Inferential statistics are used in quality control processes to make inferences about the quality of products based on a sample of items, helping manufacturers maintain consistent product quality.

**4. Retail: Demand Forecasting**

In retail, inferential statistics are applied to analyze past sales data and make predictions about future demand for products, optimizing inventory management and supply chain logistics.

**5. Telecommunications: Network Performance**

Telecom companies use inferential statistics to assess network performance by analyzing data from a sample of users, helping them make inferences about the quality and reliability of their services for the entire user base.

## <font color = 'Maroon'>Probability</font>

Probability is a measure of how certain (or uncertain) it is that a given event or outcome will occur in a random process. It is represented numerically as a number between zero and one. A probability of zero means the event is certain not to occur, and a probability of one means it is certain to occur.

**Probability measures the likelihood that an event will occur. Probability values have two properties:**

`They always lie in the range of 0 to 1.` The value is 0 when an event is impossible (for example, the probability of you being in India and America at the same time) and 1 when an event is sure to occur (for example, the probability of the sun rising in the East tomorrow).

`The sum of the probabilities of all outcomes of an experiment is always 1.` For instance, in a coin toss, there can be two outcomes: heads or tails. Each outcome has a probability of 0.5. Hence, the sum of the probabilities is 0.5 + 0.5 = 1.

### Probability Terminologies

**Experiment** is a process or action that results in one of several possible outcomes. Examples include tossing a coin, rolling a die, or drawing a card from a deck.

**Random Experiment** is an experiment or process for which the outcome cannot be predicted with certainty. Each performance of the experiment is called a trial.

**Outcome** is a possible result of an experiment. For example, when rolling a six-sided die, the possible outcomes are 1, 2, 3, 4, 5, and 6.

**Sample Space** is the set of all possible outcomes of an experiment. For example, tossing two coins gives the sample space {HH, HT, TH, TT}.

**Event** is a subset of the sample space. It consists of one or more outcomes. For example, getting an even number when rolling a die is an event that includes the outcomes {2,4,6}.

**Favourable Outcome** is an outcome that satisfies the event of interest. For example, if the event is rolling a number greater than 4 on a die, the favorable outcomes are 5 and 6.
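
These terms can be tied together with a short sketch. It assumes a fair die, so every outcome is equally likely and the probability of an event is the number of favourable outcomes divided by the size of the sample space. The helper name `probability` is made up for this illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event, sample_space):
    """P(event) = favourable outcomes / total outcomes (equally likely outcomes only)."""
    favourable = event & sample_space
    return Fraction(len(favourable), len(sample_space))


# Events are subsets of the sample space.
even = {2, 4, 6}
greater_than_4 = {outcome for outcome in sample_space if outcome > 4}

p_even = probability(even, sample_space)
p_gt4 = probability(greater_than_4, sample_space)

print(p_even, p_gt4)  # 1/2 1/3
```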

### Random Variables

A random variable assigns a numerical value to each outcome of a random experiment. For example, consider a coin toss, where the possible outcomes are just two: Heads and Tails. If we map Heads to the number 1 and Tails to 0, then the values of the random variable recorded over eight coin flips could look something like (1,1,0,1,0,0,1,0). The values of a random variable can change the next time it is recorded, but they can only come from a specific set of values.

A random variable is denoted with a capital letter (typically, X, Y, Z, etc.), and specific values are denoted with lowercase letters (e.g., X = x or X ≤ x).

Ex: Tossing two coins together, with X defined as the number of heads

- X=0 if neither toss results in heads. `P(X=0) = 1/4`
- X=1 if exactly one of the tosses results in heads. `P(X=1) = 2/4 = 1/2`
- X=2 if both tosses result in heads. `P(X=2) = 1/4`
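
The probabilities above can be reproduced by enumerating the sample space. This is a minimal sketch assuming two fair coins, so all four outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All outcomes of tossing two coins: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

# X maps each outcome to its number of heads.
heads_counts = Counter(outcome.count("H") for outcome in sample_space)

# Probability of each value of X, assuming equally likely outcomes.
distribution = {
    x: Fraction(count, len(sample_space))
    for x, count in sorted(heads_counts.items())
}

for x, prob in distribution.items():
    print(f"P(X={x}) = {prob}")
```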

**Random Variables are of two types:**

1. **Discrete RV:** They take values from a finite or countable set of possible outcomes. Each outcome has an associated probability. `Ex: Number of heads in two tosses, The creditworthiness of a loan applicant, Marital Status, Gender etc.`

2. **Continuous RV:** They can take any value within a range. `Ex: Age of a person, Income of a person, Watch time on any platform like Netflix, Disney, Hotstar etc.`

![Screenshot%202024-06-14%20112814.png](attachment:Screenshot%202024-06-14%20112814.png)

## <font color = 'Maroon'>Probability Distribution</font>

A probability distribution lists all the possible outcomes in the sample space along with their probabilities.

**<font color = 'Blue'>Types of Probability Distributions:</font>**

**<font color = 'Green'> 1. Discrete Probability Distributions</font>**

**1.1. Binomial Distribution**: Describes the number of successes in a fixed number of independent Bernoulli trials (e.g., number of heads in 10 coin tosses).

**1.2. Poisson Distribution**: Gives the probability of a given number of events happening in a fixed interval of time or space (e.g., number of emails received in an hour).

**<font color = 'Green'>2. Continuous Probability Distributions</font>**

**2.1. Normal Distribution**: Also known as the bell curve, it describes data that clusters around a mean. It is characterized by its mean (μ) and standard deviation (σ).

**2.2. Exponential Distribution**: Describes the time between events in a Poisson process, where events occur continuously and independently at a constant average rate.

**2.3. Continuous Uniform Distribution**: All outcomes are equally likely within a certain range.

### <font color = 'Blue'>Some Key Concepts</font>: 

**Probability Mass Function (PMF)**: Used for discrete random variables, it gives the probability that a discrete random variable is exactly equal to some value.

**Probability Density Function (PDF)**: Used for continuous random variables, it describes the relative likelihood (density) of the random variable near a particular value. The probability of the variable falling within a particular range is given by the area under the curve of the PDF over that range.

**Cumulative Distribution Function (CDF)**: This function gives the probability that a random variable is less than or equal to a certain value. It is applicable to both discrete and continuous random variables.

## <font color = 'Maroon'>Discrete Probability Distributions</font>

A discrete probability distribution is a type of probability distribution that shows all possible values of a discrete random variable along with the associated probabilities. A discrete random variable is one that has a finite or countable number of possible outcomes.

Consider the roll of a fair six-sided die. The possible outcomes are `{1, 2, 3, 4, 5, 6}`, and each outcome has a probability of `1/6`.

### <font color = 'Blue'>1. Bernoulli Distribution</font>

The Bernoulli distribution is a discrete probability distribution for a random variable that can take on only two possible outcomes: success (1) and failure (0). The probability of success is denoted by 𝑝 and the probability of failure is 1−𝑝.

**Example:**
A coin flip can be modeled by a Bernoulli distribution where the probability of getting heads (success) is p=0.5.
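
A minimal sketch of the Bernoulli PMF, written with the standard library only. The function name `bernoulli_pmf`, the seed and the number of simulated flips are assumptions made for illustration.

```python
import random


def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli(p) variable: p for success (1), 1 - p for failure (0)."""
    if x not in (0, 1):
        return 0.0
    return p if x == 1 else 1 - p


print(bernoulli_pmf(1, 0.5))  # probability of heads on a fair coin
print(bernoulli_pmf(0, 0.5))  # probability of tails on a fair coin

# Simulate 10,000 fair coin flips; the share of heads should be close to p.
random.seed(7)
flips = [1 if random.random() < 0.5 else 0 for _ in range(10_000)]
observed_share = sum(flips) / len(flips)
print(f"Observed share of heads: {observed_share:.3f}")
```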

### <font color = 'Blue'>2. Binomial Distribution</font>

The binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent Bernoulli trials. Each trial has the same probability of success 𝑝.

**Example:** If a fair coin is flipped 10 times, the probability of getting exactly 5 heads can be calculated using the binomial distribution with n=10 and 𝑝=0.5.
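
The binomial probability is P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). The sketch below evaluates this formula directly for the example above; `binom_pmf` is a helper name chosen here, not a library function.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed from the closed-form formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 5 heads in 10 flips of a fair coin.
p_five_heads = binom_pmf(5, 10, 0.5)
print(p_five_heads)  # 0.24609375
```

There are 252 ways to choose which 5 of the 10 flips are heads, and each specific sequence has probability 1/1024, giving 252/1024.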

### <font color = 'Green'>Probability Mass Functions</font>

A probability mass function (PMF) is the probability function that defines the probability of observing a particular value of a discrete random variable. For example, a PMF can be used to calculate the probability of rolling a three on a fair six-sided die.

As another example, suppose that we flip a fair coin some number of times and count the number of heads. The probability mass function that describes the likelihood of each possible outcome (e.g., 0 heads, 1 head, 2 heads, etc.) is called the binomial distribution. The parameters for the binomial distribution are:

- `n` for the number of trials (e.g., n=10 if we flip a coin 10 times)
- `p` for the probability of success in each trial, i.e., the probability of observing the outcome of interest (in this example, p=0.5 because the probability of observing heads on a fair coin flip is 0.5)

If we flip a fair coin 10 times, we say that the number of observed heads follows a Binomial(n=10, p=0.5) distribution. The graph below shows the probability mass function for this experiment. The heights of the bars represent the probability of observing each possible outcome as calculated by the PMF.

![Binomial Graph](https://static-assets.codecademy.com/skillpaths/master-stats-ii/probability-distributions/binom_pmf_10_5.svg)

The binom.pmf() method from the scipy.stats library can be used to calculate the PMF of the binomial distribution at any value. This method takes 3 values:

- `x:` the value of interest
- `n:` the number of trials
- `p:` the probability of success

For example, suppose we flip a fair coin 10 times and count the number of heads. We can use the binom.pmf() function to calculate the probability of observing 6 heads as follows:

In [7]:
import scipy.stats as stats

#stats.binom.pmf(x, n, p)
print(stats.binom.pmf(6, 10, 0.5))

0.2050781249999999


**Using the Probability Mass Function Over a Range**

We have seen that we can calculate the probability of observing a specific value using a probability mass function. What if we want to find the probability of observing a range of values for a discrete random variable? One way we could do this is by adding up the probability of each value.

For example, let’s say we flip a fair coin 5 times, and want to know the probability of getting between 1 and 3 heads. We can visualize this scenario with the probability mass function:

<img src="https://static-assets.codecademy.com/skillpaths/master-stats-ii/probability-distributions/Binomial-Distribution-PMF-Probability-over-a-Range.gif" width="600" height="400">


- P(1 to 3 heads) = P(1 <= X <= 3)
- P(1 to 3 heads) = P(X=1) + P(X=2) + P(X=3)
- P(1 to 3 heads) = 0.1562 + 0.3125 + 0.3125
- P(1 to 3 heads) = 0.7812
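
The same sum can be checked with a short sketch that evaluates the binomial formula for each value in the range. It assumes 5 flips of a fair coin, as in the example; `binom_pmf` and `binom_range` are illustrative helper names.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_range(low, high, n, p):
    """P(low <= X <= high), obtained by adding the PMF over the range."""
    return sum(binom_pmf(k, n, p) for k in range(low, high + 1))


p_1_to_3 = binom_range(1, 3, n=5, p=0.5)
print(p_1_to_3)  # 0.78125
```

The exact value is 25/32 = 0.78125; the 0.7812 in the list comes from rounding each term to four decimals before adding.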

To experiment further with the above examples, visit [this interactive page](https://static-assets.codecademy.com/skillpaths/master-stats-ii/probability-distributions/binomial-range_v2/index.html).

In [5]:
import scipy.stats as stats

# calculating P(2-4 heads) = P(2 heads) + P(3 heads) + P(4 heads) for flipping a coin 10 times
print(stats.binom.pmf(2, n=10, p=.5) + 
      stats.binom.pmf(3, n=10, p=.5) + stats.binom.pmf(4, n=10, p=.5))

0.36621093750000033


In [6]:
import scipy.stats as stats

#probability of observing 8 or fewer heads from 10 coin flips
print(stats.binom.pmf(0, n = 10, p = 0.5) + 
stats.binom.pmf(1, n = 10, p = 0.5) + 
stats.binom.pmf(2, n = 10, p = 0.5) + 
stats.binom.pmf(3, n = 10, p = 0.5) + 
stats.binom.pmf(4, n = 10, p = 0.5) + 
stats.binom.pmf(5, n = 10, p = 0.5) + 
stats.binom.pmf(6, n = 10, p = 0.5) + 
stats.binom.pmf(7, n = 10, p = 0.5) + 
stats.binom.pmf(8, n = 10, p = 0.5))

0.9892578125000009
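
When a range covers most of the possible values, it can be shorter to subtract the excluded values from 1. The sketch below applies this complement idea to the "8 or fewer heads" example, using a hand-written `binom_pmf` helper (an illustrative name) instead of scipy.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# P(8 or fewer heads) = 1 - P(9 heads) - P(10 heads) for 10 fair coin flips.
p_8_or_fewer = 1 - binom_pmf(9, 10, 0.5) - binom_pmf(10, 10, 0.5)
print(p_8_or_fewer)
```

This needs two PMF evaluations instead of nine and agrees with the result above up to floating-point rounding.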


### <font color = 'Green'>Cumulative Distribution Function</font>

The cumulative distribution function for a discrete random variable can be derived from the probability mass function. However, instead of the probability of observing a specific value, the cumulative distribution function gives the probability of observing a specific value OR LESS.

As previously discussed, the probabilities for all possible values in a given probability distribution add up to 1. The value of a cumulative distribution function at a given value is equal to the sum of the probabilities of all values less than or equal to it, so it reaches 1 at the largest possible value.

Cumulative distribution functions never decrease: the value at a larger number is always at least the value at a smaller number, and it is strictly greater whenever the larger number is a value the random variable can take on with non-zero probability. Mathematically, this is represented as:
- `If x1 < x2 : CDF(x1) <= CDF(x2)`

The probability mass function can be used to calculate the probability of observing fewer than 3 heads out of 10 coin flips by adding up the probabilities of observing 0, 1, and 2 heads. The cumulative distribution function produces the same answer by evaluating the function at 2, i.e., CDF(2) = P(X <= 2). In this case, `using the CDF is simpler than the PMF` because it requires one calculation rather than three.

Ex : `P(3 <= X <= 6) = P(X <= 6) - P(X < 3)`
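
For a discrete variable that takes integer values, P(X < 3) is the same as P(X <= 2), so the example becomes CDF(6) − CDF(2). A minimal sketch for 10 fair coin flips, with the CDF built by summing a hand-written PMF (`binom_pmf` and `binom_cdf` are illustrative names):

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k): the PMF summed over 0, 1, ..., k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# P(3 <= X <= 6) = P(X <= 6) - P(X <= 2)
p_3_to_6 = binom_cdf(6, 10, 0.5) - binom_cdf(2, 10, 0.5)
print(p_3_to_6)
```

Subtracting CDF(3) instead of CDF(2) would wrongly drop the case X = 3, which is a common off-by-one slip.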

In [7]:
import scipy.stats as stats

# P(6 or fewer heads) = P(0 to 6 heads)
print(stats.binom.cdf(6, 10, 0.5))

0.828125


In [8]:
import scipy.stats as stats

#P(4 to 8 heads) = P(0 to 8 heads) - P(0 to 3 heads)
print(stats.binom.cdf(8, 10, 0.5) - stats.binom.cdf(3, 10, 0.5))

0.8173828125


In [9]:
print(stats.binom.cdf(3, 10, 0.5))

print('vs')

print(stats.binom.pmf(0, n=10, p=.5) + 
      stats.binom.pmf(1, n=10, p=.5) + 
      stats.binom.pmf(2, n=10, p=.5) + stats.binom.pmf(3, n=10, p=.5))

0.17187499999999994
vs
0.17187500000000014


### <font color = 'Blue'>3. Poisson Distribution</font>

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, provided these events happen with a known constant mean rate and independently of the time since the last event. 

The Poisson distribution is characterized by the parameter λ (lambda), which is the average number of occurrences in the given interval.



`Poisson Distribution` is a discrete probability distribution, so it can be described by a PMF and a CDF.

Suppose a call center receives an average of 10 calls per hour. We want to find the probability that the call center will receive exactly 5 calls in the next hour.

Given: λ=10 (average number of calls per hour) and k=5 (number of calls we are interested in)
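
The Poisson PMF has the closed form P(X = k) = e^(−λ) · λ^k / k!. As a rough sketch, the call-center probability can be worked out from this formula with the standard library; `poisson_pmf` is a helper name chosen for this illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), from the closed-form formula."""
    return exp(-lam) * lam**k / factorial(k)


lam = 10  # average number of calls per hour
k = 5     # number of calls we are interested in

p_five_calls = poisson_pmf(k, lam)
print(f"P(X = {k}) = {p_five_calls:.4f}")  # P(X = 5) = 0.0378
```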

In [10]:
import scipy.stats as stats

lambda_ = 10  # average number of calls per hour
k = 5  # number of calls we are interested in

# Calculate the probability
probability = stats.poisson.pmf(k, lambda_)

print(f"The probability of receiving exactly {k} calls in the next hour is: {probability:.4f} or {round((probability*100), 2)}%")

The probability of receiving exactly 5 calls in the next hour is: 0.0378 or 3.78%


In [11]:
# expected value = 10 calls between 1-2PM, probability of observing 12-14 calls
probability2 = stats.poisson.pmf(12, 10) + stats.poisson.pmf(13, 10) + stats.poisson.pmf(14, 10)

print(f"The probability of receiving 12 to 14 calls in the next hour is: {probability2:.4f} or {round((probability2*100), 2)}%")

The probability of receiving 12 to 14 calls in the next hour is: 0.2198 or 21.98%


In [5]:
# expected value = 10, probability of observing 6 or less
stats.poisson.cdf(6, 10)

0.130141420882483

In [6]:
# expected value = 10, probability of observing 12 or more
1 - stats.poisson.cdf(11, 10)

0.30322385369689386

In [7]:
# expected value = 10, probability of observing between 12 and 18
stats.poisson.cdf(18, 10) - stats.poisson.cdf(11, 10)

0.29603734909303947
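
The three CDF calculations above follow one pattern: "at most k" is CDF(k), "at least k" is 1 − CDF(k − 1), and "between a and b inclusive" is CDF(b) − CDF(a − 1). The sketch below rebuilds them from the Poisson formula; `poisson_pmf` and `poisson_cdf` are illustrative helpers, not scipy functions.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k): the PMF summed over 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 10
p_at_most_6 = poisson_cdf(6, lam)                          # P(X <= 6)
p_at_least_12 = 1 - poisson_cdf(11, lam)                   # P(X >= 12)
p_12_to_18 = poisson_cdf(18, lam) - poisson_cdf(11, lam)   # P(12 <= X <= 18)

print(f"{p_at_most_6:.4f} {p_at_least_12:.4f} {p_12_to_18:.4f}")
```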

#### <font color = 'Blue'>Use Cases of Poisson Distribution</font>

**Number of Defects in Manufacturing**:
A factory produces light bulbs, and on average, 2% of the bulbs are defective. If a sample of 100 bulbs is taken, the number of defective bulbs is strictly binomial (n=100, p=0.02), but because n is large and p is small, the probability of finding exactly 3 defective bulbs can be approximated using the Poisson distribution with λ=2 (since λ=np=100×0.02).
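
To get a feel for how well the Poisson distribution stands in for the binomial in the light-bulb example, the sketch below computes both probabilities from their formulas. The helper names are illustrative, and the comparison is only a sanity check, not a statement about when the approximation is acceptable in general.

```python
from math import comb, exp, factorial


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


n, p, k = 100, 0.02, 3
lam = n * p  # average number of defective bulbs per sample of 100

exact = binom_pmf(k, n, p)
approx = poisson_pmf(k, lam)

print(f"Binomial: {exact:.4f}")
print(f"Poisson:  {approx:.4f}")
```

Both values come out at roughly 0.18, differing only in the third decimal place.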

**Arrival of Customers at a Restaurant**:
A restaurant receives an average of 20 customers per hour. To find the probability that exactly 15 customers will arrive in the next hour, use λ=20 and k=15.

**Traffic Accidents**:
If a particular intersection averages 3 traffic accidents per month, the probability of having exactly 2 accidents in a month can be calculated with λ=3 and k=2.

## <font color = 'Maroon'>Continuous Probability Distribution</font>

A continuous probability distribution describes the probabilities of the possible values of a continuous random variable. Unlike discrete random variables, which have a finite or countable number of possible values, continuous random variables can take on any value within a given range. Any such range contains uncountably many values, meaning that the variable can assume an infinite number of values within any interval.

### <font color = 'Blue'>1. Normal Distribution</font>

The normal distribution is a continuous probability distribution characterized by its symmetric, bell-shaped curve. This distribution is crucial for various statistical analyses and modeling techniques used in data science.

![Standard-normal-distribution.jpg](attachment:Standard-normal-distribution.jpg)

### <font color = 'Green'>Probability Density Functions</font>

Similar to how discrete random variables relate to probability mass functions, continuous random variables relate to probability density functions. They define the probability distributions of continuous random variables and span across all possible values that the given random variable can take on.

When graphed, a probability density function is a curve across all possible values the random variable can take on, and the total area under this curve adds up to 1.

The following image shows a probability density function. The highlighted area represents the probability of observing a value within the highlighted range.

<img src="https://static-assets.codecademy.com/skillpaths/master-stats-ii/probability-distributions/Adding-Area.gif" width="600" height="400">

In a probability density function, we cannot calculate the probability at a single point. This is because the area of the curve underneath a single point is always zero. The gif below showcases this.

<img src="https://static-assets.codecademy.com/skillpaths/master-stats-ii/probability-distributions/Normal-Distribution-Area-to-Zero.gif" width="600" height="400">

As we can see from the visual above, as the interval becomes smaller, the width of the area under the curve becomes smaller as well. When trying to evaluate the area under the curve at a specific point, the width of that area becomes 0, and therefore the probability equals 0.
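
The shrinking-interval argument can also be seen numerically. This sketch assumes a standard normal distribution (mean 0, standard deviation 1) and uses `statistics.NormalDist` from the Python standard library; the interval widths are arbitrary choices for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of landing in an interval centred on 0, for shrinking widths.
widths = [1.0, 0.1, 0.01, 0.001, 0.0]
probabilities = []
for width in widths:
    low, high = -width / 2, width / 2
    # P(low < X < high) is the area under the PDF between low and high.
    prob = standard_normal.cdf(high) - standard_normal.cdf(low)
    probabilities.append(prob)
    print(f"width={width:<6} probability={prob:.6f}")
```

The probability falls towards 0 as the width shrinks and is exactly 0 for a single point.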

Let’s say we want to know the probability that a randomly chosen woman is less than 158 cm tall, assuming women's heights follow a normal distribution with a mean of 167.64 cm and a standard deviation of 8 cm. We can use the cumulative distribution function to calculate the area under the probability density function curve to the left of 158 to find that probability.

![pdf](https://static-assets.codecademy.com/skillpaths/master-stats-ii/probability-distributions/norm_pdf_167_8_filled.svg)

In [11]:
import scipy.stats as stats
#x : value of interest
#loc: mean of the distribution
#scale : std dev of the distribution

# stats.norm.cdf(x, loc, scale)
print(stats.norm.cdf(158, 167.64, 8))

0.11410165094812996
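
The same probability can be reached through a z-score, z = (x − mean) / standard deviation, which expresses the value in standard deviations from the mean and lets us use the standard normal distribution. A minimal sketch with `statistics.NormalDist` from the standard library, using the same mean and standard deviation as the cell above:

```python
from statistics import NormalDist

mean, std_dev = 167.64, 8
x = 158

# z-score: how many standard deviations x lies from the mean.
z = (x - mean) / std_dev

# Two equivalent routes to P(X < 158).
p_direct = NormalDist(mean, std_dev).cdf(x)
p_via_z = NormalDist().cdf(z)  # standard normal: mean 0, std dev 1

print(f"z = {z:.3f}")
print(f"P(X < {x}) = {p_direct:.4f} (direct), {p_via_z:.4f} (via z-score)")
```

A z-score of about −1.2 says that 158 cm lies roughly 1.2 standard deviations below the mean.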


**Demo: Some examples on the Normal Distribution and Z-Score calculations**

## <font color = 'Maroon'>Population vs Sample</font>

**`Population`**
The population in statistics refers to the entire group that is the subject of the study. It includes all the individuals or items that meet a particular set of criteria. The population is the complete set of observations or elements that share a common characteristic and is of interest to the researcher.

**Example:**
If you were studying the average income of households in a city, the population would be all the households in that city. Every single household, regardless of size or income level, is part of the population.

**`Sample:`**
A sample is a subset of the population that is selected for the actual study. It is not always feasible or practical to collect data from an entire population, so researchers choose a representative sample to draw conclusions about the population. The goal is to ensure that the sample is representative enough that findings from the sample can be generalized to the entire population.

**Example:**
In the household income study mentioned earlier, it might be impractical to survey every single household in the city. Instead, a researcher might select a random sample of, say, 500 households to study. The 500 households form the sample, and the researcher uses the data collected from this sample to make inferences about the income of all households in the city.

### <font color = 'Blue'>Sampling Techniques</font>

Samples must be a correct representation of the population in order to arrive at correct conclusions about the population. So let's take a look at the various sampling techniques that are available:

1. Simple random sampling
2. Stratified sampling 
3. Systematic sampling
4. Cluster sampling 
5. Judgment sampling
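
The first three techniques can be sketched in a few lines. The population here is hypothetical (1000 households tagged with a made-up region), and the stratified version uses proportional allocation, which is only one of several possible allocation rules. Cluster and judgment sampling are not shown.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1000 households, each tagged with a region (the stratum).
population = (
    [("North", i) for i in range(600)]
    + [("South", i) for i in range(300)]
    + [("East", i) for i in range(100)]
)
sample_size = 100

# Simple random sampling: every household is equally likely to be picked.
simple_random = random.sample(population, sample_size)

# Systematic sampling: random starting point, then every k-th household.
step = len(population) // sample_size
start = random.randrange(step)
systematic = population[start::step]

# Stratified sampling: sample each region in proportion to its share of the population.
strata = {}
for household in population:
    strata.setdefault(household[0], []).append(household)

stratified = []
for region, members in strata.items():
    share = round(sample_size * len(members) / len(population))
    stratified.extend(random.sample(members, share))

print(len(simple_random), len(systematic), len(stratified))
```

With these numbers the stratified sample always contains 60 North, 30 South and 10 East households, while the simple random sample only matches those shares on average.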

### Central Limit Theorem

The Central Limit Theorem (CLT) is a fundamental concept in statistics and plays a crucial role in data science. It states that, regardless of the shape of the original population distribution (as long as its variance is finite), the sampling distribution of the sample mean will be approximately normally distributed for sufficiently `large sample sizes (n > 30 is a common rule of thumb)`. This theorem is particularly important in inferential statistics, where we make inferences about a population based on a sample.

**CLT in a Nutshell:**
If you have a population with any shape of distribution and you repeatedly draw random samples of a sufficiently large size from that population, the distribution of the sample means will be approximately normal, regardless of the shape of the original population distribution.
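
A quick simulation makes this concrete. The sketch assumes a strongly skewed population (an exponential distribution with mean 1), a sample size of 40 and 2000 repeated samples; all three are arbitrary choices for illustration. If the CLT holds, the sample means should centre on the population mean, have a spread close to σ/√n, and put roughly 68% of their values within one standard deviation of the centre, as a normal distribution does.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 40    # n, above the usual rule-of-thumb threshold of 30
NUM_SAMPLES = 2000  # how many samples we draw

# Population: exponential with rate 1, so mean = 1 and std dev = 1 (right-skewed).
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)
expected_std = 1 / sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

# Share of sample means within one standard deviation of their centre.
within_one_std = sum(
    abs(m - mean_of_means) <= std_of_means for m in sample_means
) / NUM_SAMPLES

print(f"Mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"Std of sample means:  {std_of_means:.3f} (sigma/sqrt(n) is {expected_std:.3f})")
print(f"Share within 1 std:   {within_one_std:.3f}")
```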

**Real World Industry Use Cases in Data Science:**

**1. Quality Control in Manufacturing:**

`Scenario:` A manufacturing plant produces a large number of products each day, and the quality control team is interested in the average weight of the products.

`Use of CLT:` By collecting random samples of product weights and calculating the sample means, the quality control team can apply the CLT to assume that the distribution of sample means is approximately normal. This allows them to make statistical inferences about the average weight of all products.

**2. Financial Modeling and Risk Assessment:**

`Scenario:` A financial analyst wants to assess the average return on investment (ROI) of a portfolio of stocks.

`Use of CLT:` By taking multiple random samples of historical ROI data and calculating sample means, the analyst can apply the CLT. This enables them to make more reliable predictions about the average ROI of the entire portfolio.

**3. Marketing and A/B Testing:**

`Scenario:` A marketing team is running an A/B test to compare the effectiveness of two different versions of an advertisement.

`Use of CLT:` By collecting random samples of user responses to each version and calculating sample means, the marketing team can use the CLT to make statistical inferences about the average effectiveness of each advertisement version for the entire target audience.

**4. Healthcare and Clinical Trials:**

`Scenario:` In a clinical trial for a new drug, researchers want to estimate the average reduction in symptoms.

`Use of CLT:` By repeatedly collecting random samples of patient data and calculating sample means, researchers can apply the CLT. This allows them to make inferences about the average impact of the drug on the entire population of interest.

**5. E-commerce and Customer Behavior:**

`Scenario:` An e-commerce platform wants to understand the average time spent by customers on their website.

`Use of CLT:` By taking random samples of user engagement data and calculating sample means, the data science team can leverage the CLT to make statistically valid predictions about the average time spent on the website for all users.