# Chapter 1

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of numerical or quantitative data. There are two types of statistics:

1) Descriptive:
    - Summarizes information about a sample
    - Describes the dataset / sample
    - Examples: central tendency, dispersion (variation), skewness
2) Inferential:
    - Makes generalizations (inferences) about a population from a sample
    - Helps to infer unknown population parameters (population mean, population variance, etc.)
    - Examples: hypothesis testing, confidence intervals, regression analysis

### Measure of centre

- Mean:
    - Average of the values
    - Sensitive to outliers
    - Outliers pull the mean toward them, so in skewed data the mean sits closer to the long tail than the median does
    - Best suited to roughly symmetrical distributions
- Median:
    - Middle value of the sorted data
    - Not sensitive to outliers
    - Good for non-symmetrical (skewed) data
- Mode:
    - Most frequent value in a dataset
    - Points out the peak of a distribution
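
As a small illustration of how the three measures react to an outlier, the sketch below uses a made-up list of salaries (the numbers are hypothetical, chosen only for the example) and Python's built-in `statistics` module.

```python
import statistics

# Hypothetical salaries (in thousands); the values are invented for illustration.
salaries = [30, 32, 32, 35, 38]
salaries_with_outlier = salaries + [400]  # add one extreme value

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(salaries_with_outlier)

median_before = statistics.median(salaries)
median_after = statistics.median(salaries_with_outlier)

mode_after = statistics.mode(salaries_with_outlier)

# The mean jumps toward the outlier, while the median and mode barely move.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_after)
```

This is why the median is usually the safer summary when the data are skewed or contain extreme values.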

### Measure of spread

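- Variance:
    - Average of each datapoint's squared distance from the mean
    - The higher the variance, the more spread out the data is
    - Longer distances are penalized more (since the distances are squared)
- Standard deviation:
    - Square root of the variance, so it is in the same units as the data
- Mean absolute deviation:
    - Average of each datapoint's absolute distance from the mean
    - Penalizes each distance in proportion to its size (no squaring)
- Quantile or percentile:
    - Quantiles split the sorted data into equal-sized parts; percentiles split it into 100 equal parts
    - Quartiles split it into 4 parts, and the interquartile range (IQR) is the distance between the 25th and 75th percentiles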

- mode : `statistics.mode(df['col'])`
- median : `statistics.median(df['col'])`
- mean : `statistics.mean(df['col'])`
- variance : `np.var(df['col'], ddof=1)`
- standard deviation : `np.std(df['col'], ddof=1)`
- quantile : `np.quantile(df['col'], [0, 0.25, 0.5, 0.75, 1])`
- iqr : `scipy.stats.iqr(df['col'])`
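
To make the definitions concrete, here is a minimal sketch that computes the spread measures by hand and compares them with the NumPy functions listed above. The data array is a hypothetical sample, and `ddof=1` gives the sample (rather than population) variance.

```python
import numpy as np

# Hypothetical sample, chosen so the arithmetic is easy to follow.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = data.mean()
deviations = data - mean

# Population variance: plain average of the squared deviations.
population_variance = np.mean(deviations ** 2)

# Sample variance: divide by n - 1 instead of n (this is what ddof=1 does).
sample_variance = np.sum(deviations ** 2) / (len(data) - 1)
sample_std = np.sqrt(sample_variance)

# Mean absolute deviation: average of the absolute deviations.
mean_abs_dev = np.mean(np.abs(deviations))

# Interquartile range: distance between the 75th and 25th percentiles.
q1, q3 = np.quantile(data, [0.25, 0.75])
iqr = q3 - q1

print(population_variance, np.var(data))
print(sample_variance, np.var(data, ddof=1))
print(sample_std, np.std(data, ddof=1))
print(mean_abs_dev, iqr)
```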


# Chapter 2

### Chances

- Probability:
    - Chance that an event will take place
    - Formula: (number of ways the event can happen) / (total number of possible outcomes), when all outcomes are equally likely
    - In a probability distribution graph, the probability of a range of outcomes is the area under the curve over that range; the total area is 1
- Expected value:
    - Mean of a probability distribution
    - Formula: sum of (value * probability) over all possible outcomes
    - eg: Expected value of a fair die: 1*(1/6) + 2*(1/6) + 3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6) = 3.5
    - Law of large numbers: as the size of your sample increases, the sample mean approaches the expected value.
- Dependent event:
    - Probability of the second event is affected by the outcome of the first event
    - Sampling without replacement
    - eg: representative selection. We need 2 representatives from 10 people. Once one representative is selected, the other must come from the remaining 9 people. The first selection changes the chances for the second: the person already chosen cannot be picked again, and each remaining person's chance rises from 1/10 to 1/9.
    - `df["col"].sample(5, replace = False)`
- Independent event:
    - Probability of the second event is not affected by the outcome of the first event
    - Sampling with replacement
    - eg: a lottery in which a winner stays in the draw. When a person wins the first prize, it does not follow that they cannot also win the second prize. The chance remains the same for every winning place.
    - `df["col"].sample(5, replace = True)`
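
The expected value of a fair die, the law of large numbers, and the two kinds of sampling can be sketched with NumPy as follows. The seed and the sample sizes are arbitrary choices for the illustration, and the `people` array stands in for the 10 candidates from the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility

# Expected value of a fair die: sum of (value * probability).
faces = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)
expected_value = np.sum(faces * probabilities)

# Law of large numbers: the mean of many rolls gets close to the expected value.
small_sample_mean = rng.choice(faces, size=10).mean()
large_sample_mean = rng.choice(faces, size=100_000).mean()

# Sampling without replacement: nobody can be picked twice (dependent draws).
people = np.arange(10)
without_replacement = rng.choice(people, size=5, replace=False)

# Sampling with replacement: the same person may appear again (independent draws).
with_replacement = rng.choice(people, size=5, replace=True)

print(expected_value, small_sample_mean, large_sample_mean)
print(without_replacement, with_replacement)
```

With only 10 rolls the sample mean can land well away from 3.5; with 100,000 rolls it should sit very close to it.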

### Distribution

- Uniform distribution:
    - All outcomes are equally likely
    - Flat probability density function across the entire range
- Binomial distribution:
    - Discrete probability distribution of the number of successes in a fixed number of trials
    - Binary outcomes (success or failure)
    - Independent trials with the same probability of success
- Normal distribution:
    - Symmetrical
    - Probability density never hits 0
    - Described by its mean and standard deviation (std)
    - Standard normal distribution has mean 0 and std 1
    - About 68% of the area lies within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD
    - Central Limit Theorem: the sampling distribution of a statistic such as the sample mean gets closer to a normal distribution as the sample size increases, provided the samples are drawn randomly and independently.
- Poisson distribution:
    - Events occur at a constant average rate over a fixed interval of time
    - Expected value (lambda) is the average number of events per unit time interval
    - The timing of individual events is completely random
    - Discrete (since it counts the number of events)
    - eg: on average 5 adoptions each week from a pet shelter. However, the time at which each adoption happens is random.
- Exponential distribution:
    - Probability distribution of the time between Poisson events
    - Uses the same rate lambda as the Poisson distribution
    - Continuous (since it represents time)
    - scale = 1 / lambda, which measures the average time per event
    - example: one person requests a ticket every 2 minutes on average.
        - So 1 minute serves 0.5 requests on average, and the Poisson rate is lambda = 0.5
        - And the exponential scale is 1 / lambda = 1 / 0.5 = 2 (minutes per request)
- t-distribution:
    - Tails are thicker than those of the normal distribution
    - Observations are more likely to fall further from the mean
    - Has a degrees-of-freedom parameter that controls the thickness of the tails
        - lower degrees of freedom = thicker tails + higher std
        - higher degrees of freedom = thinner tails + lower std = more like the normal distribution
- Log-normal distribution:
    - The logarithm of the variable is normally distributed
    - Described by the mean and standard deviation of that logarithm
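
Two of the claims above are easy to check numerically: the 68-95-99.7 rule and the Central Limit Theorem. The sketch below is one possible way to do it; the seed, the number of samples (5000), and the sample size (30 die rolls) are arbitrary assumptions for the illustration.

```python
import numpy as np
from scipy.stats import norm

# Area of the standard normal distribution within 1, 2 and 3 SD of the mean.
within_1sd = norm.cdf(1) - norm.cdf(-1)
within_2sd = norm.cdf(2) - norm.cdf(-2)
within_3sd = norm.cdf(3) - norm.cdf(-3)
print(within_1sd, within_2sd, within_3sd)

# Central Limit Theorem: die rolls are uniform, not normal,
# but the means of many samples of rolls are roughly normal.
rng = np.random.default_rng(seed=0)  # arbitrary seed
rolls = rng.integers(1, 7, size=(5000, 30))  # 5000 samples of 30 rolls each
sample_means = rolls.mean(axis=1)

# The sample means cluster around the expected value of a die (3.5).
centre = sample_means.mean()
spread = sample_means.std()

# Share of sample means within 1 SD of their centre (roughly 68% if near-normal).
share_within_1sd = np.mean(np.abs(sample_means - centre) <= spread)
print(centre, spread, share_within_1sd)
```

Plotting a histogram of `sample_means` would show the familiar bell shape, even though a single die roll has a flat distribution.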

```
# probability of a value of 7 or less in a continuous uniform distribution over the range 0 to 12
from scipy.stats import uniform
uniform.cdf(7, 0, 12)

# probability of 7 or fewer successes in 10 trials with a 50% success rate (discrete distribution)
from scipy.stats import binom
binom.cdf(7, 10, 0.5)
# probability of exactly 7 successes in 10 trials with a 50% success rate
binom.pmf(7, 10, 0.5)

# probability of people shorter than 154 with mean height 161 and std of 7
from scipy.stats import norm
norm.cdf(154, 161, 7)
# Percentile (quantile) at which 90% of the distribution is below
norm.ppf(0.9, 161, 7)
#Percentile (quantile) at which 90% of the distribution is above
norm.ppf((1-0.9), 161, 7)

# probability of exactly 5 events when the average rate lambda is 8
from scipy.stats import poisson
poisson.pmf(5, 8)
# probability of 5 or fewer events when the average rate lambda is 8
poisson.cdf(5, 8)

# probability of waiting less than 1 minute when the rate lambda is 0.5 (0.5 events per unit time)
from scipy.stats import expon
scale = 1 / 0.5  # scale = 1 / lambda = 2
expon.cdf(1, scale=scale)

# Probability of having a t-value less than 2.0 with 5 degrees of freedom
from scipy.stats import t
t.cdf(2.0, df=5)
# Percentile (quantile) at which 90% of the distribution is below
t.ppf(0.9, df=5)


# Probability of a log-normal variable being less than 2.0, where the logarithm of the
# variable has standard deviation s=0.8 and the median is scale=1.5 (scale = exp(mean of the logarithm))
from scipy.stats import lognorm
lognorm.cdf(2.0, s=0.8, scale=1.5)
# Percentile (quantile) at which 75% of the distribution is below
lognorm.ppf(0.75, s=0.8, scale=1.5)
```
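
The link between the Poisson and exponential distributions, and the effect of degrees of freedom on the t-distribution's tails, can be checked with a short sketch. It reuses the ticket example (one request every 2 minutes on average); the cutoff of 2.0 and the degrees of freedom (5 and 100) are arbitrary choices for illustration.

```python
from scipy.stats import expon, norm, poisson, t

rate = 0.5        # lambda: 0.5 requests per minute
scale = 1 / rate  # average of 2 minutes per request

# Waiting less than 1 minute means at least one request arrives in that minute,
# so the two probabilities below add up to 1.
p_wait_under_1 = expon.cdf(1, scale=scale)
p_no_request_in_1 = poisson.pmf(0, rate)
print(p_wait_under_1, p_no_request_in_1, p_wait_under_1 + p_no_request_in_1)

# The mean of the exponential distribution equals the scale.
mean_wait = expon.mean(scale=scale)
print(mean_wait)

# Tail area above 2.0: largest for low degrees of freedom,
# approaching the normal distribution as degrees of freedom grow.
tail_t5 = t.sf(2.0, df=5)
tail_t100 = t.sf(2.0, df=100)
tail_normal = norm.sf(2.0)
print(tail_t5, tail_t100, tail_normal)
```

Here `sf` is the survival function, which is `1 - cdf`.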

### Random Values

```
# Generate 10 uniform random values between 0 and 5
from scipy.stats import uniform
uniform.rvs(0, 5, size=10)

# Generate binomial random values
from scipy.stats import binom
binom.rvs(1, 0.5, size=8) # 1 coin, flip 8 times, probability of success 50%
binom.rvs(8, 0.5, size=1) # 8 coins, flip 1 time, probability of success 50%
binom.rvs(3, 0.5, size=10) # 3 coins, flip 10 times, probability of success 50%

# Generate 10 random normal values with mean 161 and std of 7
from scipy.stats import norm
norm.rvs(161, 7, size=10)

# Generate 10 random poisson values with lambda of 8
from scipy.stats import poisson
poisson.rvs(8, size=10)

# Generate 10 random values from a t-distribution with 5 degrees of freedom
from scipy.stats import t
t.rvs(df=5, size=10)

# Generate 10 random values from a log-normal distribution whose logarithm has standard deviation s=0.8
# and whose median is scale=1.5 (scale = exp(mean of the logarithm))
from scipy.stats import lognorm
lognorm.rvs(s=0.8, scale=1.5, size=10)
```
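
The calls above return different values on every run. If reproducible output is needed, `rvs` accepts a `random_state` argument. The sketch below shows this, and also checks how `lognorm`'s `s` and `scale` relate to the logarithm of the generated values. The seeds and sample sizes are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import lognorm, norm

# The same seed gives the same values on every call.
first = norm.rvs(161, 7, size=10, random_state=42)
second = norm.rvs(161, 7, size=10, random_state=42)
print(np.array_equal(first, second))

# With a large sample, the sample mean and std land close to the parameters.
heights = norm.rvs(161, 7, size=100_000, random_state=1)
print(heights.mean(), heights.std(ddof=1))

# For lognorm, the log of the samples is normal with
# mean log(scale) and standard deviation s.
values = lognorm.rvs(s=0.8, scale=1.5, size=100_000, random_state=1)
log_values = np.log(values)
print(log_values.mean(), np.log(1.5))
print(log_values.std(ddof=1))
```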