In [1]:
import scipy
import numpy as np
from scipy.stats import binom
from scipy.stats import poisson
from scipy.stats import uniform
from scipy.stats import norm
from scipy.stats import hypergeom
from scipy.stats import expon

### Binomial Distribution

![title](http://www.stat.yale.edu/Courses/1997-98/101/binpdf.gif)

The binomial distribution describes the behavior of a count variable X if the following conditions apply:
<br>1: The number of observations n is fixed.</br>
<br>2: Each observation is independent.</br>
<br>3: Each observation represents one of two outcomes ("success" or "failure").</br>
<br>4: The probability of "success" p is the same for each observation.</br>

A survey found that 65% of all financial consumers were very satisfied with their primary financial institution. Suppose 25 financial consumers are sampled. If the survey result still holds true today, what is the probability that exactly 19 are very satisfied with their primary financial institution?

In [2]:
help(binom.pmf)

Help on method pmf in module scipy.stats._distn_infrastructure:

pmf(k, *args, **kwds) method of scipy.stats._discrete_distns.binom_gen instance
    Probability mass function at k of the given RV.
    
    Parameters
    ----------
    k : array_like
        Quantiles.
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information)
    loc : array_like, optional
        Location parameter (default=0).
    
    Returns
    -------
    pmf : array_like
        Probability mass function evaluated at k



In [3]:
binom.pmf(k=19, n=25, p=0.65)

0.09077799859322791

In [4]:
type(binom.pmf(k=19, n=25, p=0.65))

numpy.float64
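
As a rough cross-check, the same probability can be computed directly from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below uses only the standard library; the helper name `binom_pmf_manual` is made up for illustration and is not part of SciPy.

```python
from math import comb

def binom_pmf_manual(k, n, p):
    """Binomial pmf from the closed-form formula (illustrative helper)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Same inputs as the survey question: 19 satisfied out of 25, p = 0.65
manual = binom_pmf_manual(k=19, n=25, p=0.65)
print(manual)  # should agree with binom.pmf(k=19, n=25, p=0.65)
```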

According to the U.S. Census Bureau, approximately 6% of all workers in Jackson, Mississippi, are unemployed. In conducting a random telephone survey in Jackson, what is the probability of getting two or fewer unemployed workers in a sample of 20?

n = 20
p = 0.06
Here we need P(X = 0) + P(X = 1) + P(X = 2). Because the question asks for two or fewer, we need the cumulative distribution function.

In [5]:
help(binom.cdf)

Help on method cdf in module scipy.stats._distn_infrastructure:

cdf(k, *args, **kwds) method of scipy.stats._discrete_distns.binom_gen instance
    Cumulative distribution function of the given RV.
    
    Parameters
    ----------
    k : array_like, int
        Quantiles.
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information).
    loc : array_like, optional
        Location parameter (default=0).
    
    Returns
    -------
    cdf : ndarray
        Cumulative distribution function evaluated at `k`.



In [6]:
binom.cdf(k=2, n=20, p=0.06)

0.8850275957378545
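
The cumulative value above should equal the sum of the three individual pmf terms. A minimal sketch of that check, using the same n and p as the question:

```python
from scipy.stats import binom

n, p = 20, 0.06

# P(X <= 2) written out as P(X = 0) + P(X = 1) + P(X = 2)
summed = sum(binom.pmf(k, n, p) for k in range(3))

# The same quantity in a single call
direct = binom.cdf(2, n, p)

print(summed, direct)  # the two values should match up to rounding
```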

Solve the binomial probability for n = 20, p = .40, and x = 10

In [7]:
binom.pmf(k=10, n=20, p=0.4)

0.11714155053639011

### Poisson distribution

In [8]:
help(poisson.pmf)

Help on method pmf in module scipy.stats._distn_infrastructure:

pmf(k, *args, **kwds) method of scipy.stats._discrete_distns.poisson_gen instance
    Probability mass function at k of the given RV.
    
    Parameters
    ----------
    k : array_like
        Quantiles.
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information)
    loc : array_like, optional
        Location parameter (default=0).
    
    Returns
    -------
    pmf : array_like
        Probability mass function evaluated at k



In [9]:
poisson.pmf(k=3,mu=3)

0.22404180765538775

Suppose bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of exactly 5 customers arriving in a 4-minute interval on a weekday afternoon?

mu = 3.2 customers every 4 minutes 
k = 5 customers every 4 minutes

In [10]:
poisson.pmf(k=5,mu=3.2)

0.11397938346351824

P(k;mu) =  e^(-mu) * (mu)^k / k!
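
The formula above can be evaluated by hand and compared with `poisson.pmf`. This is only a sketch; `poisson_pmf_manual` is a hypothetical helper name, not a SciPy function.

```python
from math import exp, factorial
from scipy.stats import poisson

def poisson_pmf_manual(k, mu):
    """P(k; mu) = e^(-mu) * mu^k / k!  (illustrative helper)."""
    return exp(-mu) * mu**k / factorial(k)

# Exactly 5 arrivals when the mean is 3.2 per 4-minute interval
manual = poisson_pmf_manual(5, 3.2)
library = poisson.pmf(5, 3.2)
print(manual, library)  # both should be close to each other
```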

Bank customers arrive randomly on weekday afternoons at an average of 3.2 customers every 4 minutes. What is the probability of having more than 7 customers in a 4-minute interval on a weekday afternoon?

mu = 3.2 customers every 4 minutes

k = 7 customers every 4 minutes

##### Note that the question asks for more than 7, so we compute the cumulative probability up to 7 with the cdf and subtract it from 1.

In [11]:
1 - poisson.cdf(k= 7, mu = 3.2)

0.01682984174895752
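
SciPy also exposes the upper tail directly as the survival function `sf`, which is defined as 1 - cdf. A short sketch showing that both routes give the same answer for this question (the name `mu_4_min` is chosen here for illustration):

```python
from scipy.stats import poisson

mu_4_min = 3.2  # mean arrivals per 4-minute interval

# P(X > 7) two ways: complement of the cdf, and the survival function
via_cdf = 1 - poisson.cdf(7, mu_4_min)
via_sf = poisson.sf(7, mu_4_min)

print(via_cdf, via_sf)
```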

A bank has an average random arrival rate of 3.2 customers every 4 minutes. What is the probability of getting exactly 10 customers during an 8-minute interval?

#### Note that the interval for mu (4 minutes) is not the same as the interval for x (8 minutes), so we first need to put them on the same interval.

mu for 8 minutes = 2 * 3.2 = 6.4 customers every 8 minutes

In [12]:
poisson.pmf(k= 10, mu = 6.4)

0.052790043854115495
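
The rescaling step can be made explicit in code. The sketch below assumes the arrival rate is constant over time, so the Poisson mean scales linearly with the interval length; the variable names are illustrative.

```python
from scipy.stats import poisson

rate_per_4_min = 3.2   # given: mean arrivals per 4 minutes
interval_minutes = 8   # interval asked about in the question

# Scale the mean to the 8-minute interval before evaluating the pmf
mu_8_min = rate_per_4_min * interval_minutes / 4

prob = poisson.pmf(10, mu_8_min)
print(mu_8_min, prob)
```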

### Uniform distribution

Suppose the amount of time it takes to assemble a plastic module ranges from 27 to 39 seconds and the assembly times are uniformly distributed. Describe the distribution. What is the probability that a given assembly will take between 30 and 35 seconds?

P(30 < X < 35) = (35 - 30) / (39 - 27) = 5/12 ≈ 0.4167

In [13]:
help(uniform.pdf)

Help on method pdf in module scipy.stats._distn_infrastructure:

pdf(x, *args, **kwds) method of scipy.stats._continuous_distns.uniform_gen instance
    Probability density function at x of the given RV.
    
    Parameters
    ----------
    x : array_like
        quantiles
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information)
    loc : array_like, optional
        location parameter (default=0)
    scale : array_like, optional
        scale parameter (default=1)
    
    Returns
    -------
    pdf : ndarray
        Probability density function evaluated at x



In [14]:
39-27

12

In [15]:
uniform.mean(loc = 27, scale = 12)

33.0

In [16]:
uniform.cdf(np.arange(30,36,1), loc = 27, scale = 12)

array([0.25      , 0.33333333, 0.41666667, 0.5       , 0.58333333,
       0.66666667])

In [17]:
prob =  0.66666667 - 0.25
prob

0.41666667
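
Instead of copying rounded values from the printed array, the interval probability can be taken as a difference of two cdf calls. A sketch using a frozen distribution object; in SciPy's parameterization `loc` is the lower bound and `scale` is the width of the interval.

```python
from scipy.stats import uniform

low, high = 27, 39

# Frozen uniform distribution on [27, 39]
assembly_time = uniform(loc=low, scale=high - low)

# P(30 < X < 35) as a cdf difference, and from the closed-form ratio
prob_cdf = assembly_time.cdf(35) - assembly_time.cdf(30)
prob_formula = (35 - 30) / (high - low)

print(prob_cdf, prob_formula)
```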

According to the National Association of Insurance Commissioners, the average annual cost of automobile insurance in the United States in a recent year was 691 dollars. Suppose automobile insurance costs are uniformly distributed in the United States, ranging from 200 to 1182 dollars. What is the standard deviation of this uniform distribution?

In [18]:
uniform.mean(loc = 200, scale = 982)

691.0

In [19]:
uniform.std(loc = 200, scale = 982)

283.4789821721062
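
For a uniform distribution on [a, b], the mean is (a + b) / 2 and the standard deviation is (b - a) / sqrt(12). A small sketch comparing those formulas with the SciPy results above:

```python
from math import sqrt
from scipy.stats import uniform

a, b = 200, 1182

# Closed-form mean and standard deviation of Uniform(a, b)
mean_formula = (a + b) / 2
std_formula = (b - a) / sqrt(12)

# SciPy equivalents: loc = a, scale = b - a
std_scipy = uniform.std(loc=a, scale=b - a)

print(mean_formula, std_formula, std_scipy)
```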

### Normal distribution

![title](https://www.simplypsychology.org/Empirical-Rule.jpg)

In [20]:
x = 68
mu = 65.5
st_dev = 2.5

In [21]:
help(norm.cdf)

Help on method cdf in module scipy.stats._distn_infrastructure:

cdf(x, *args, **kwds) method of scipy.stats._continuous_distns.norm_gen instance
    Cumulative distribution function of the given RV.
    
    Parameters
    ----------
    x : array_like
        quantiles
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information)
    loc : array_like, optional
        location parameter (default=0)
    scale : array_like, optional
        scale parameter (default=1)
    
    Returns
    -------
    cdf : ndarray
        Cumulative distribution function evaluated at `x`



In [22]:
norm.cdf(x = 68, loc= mu, scale= st_dev)

0.8413447460685429

##### If you want the probability that x is greater than a value, subtract the cdf from 1.
##### The cdf gives the area up to 68. Because the total area under the normal curve is 1, subtracting that value from 1 gives the area to the right of 68.

In [23]:
1 - norm.cdf(x = 68, loc= mu, scale= st_dev)

0.15865525393145707
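
The right-tail area is also available directly through the survival function `norm.sf`, which SciPy defines as 1 - cdf. A minimal sketch with the same mean and standard deviation as above:

```python
from scipy.stats import norm

mu, st_dev = 65.5, 2.5

# Area to the right of 68, computed two ways
right_tail_cdf = 1 - norm.cdf(68, loc=mu, scale=st_dev)
right_tail_sf = norm.sf(68, loc=mu, scale=st_dev)

print(right_tail_cdf, right_tail_sf)
```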

##### If you want the probability that x lies between val1 = 63 and val2 = 68, subtract the cdf at 63 from the cdf at 68.

In [24]:
norm.cdf(x = 68, loc= mu, scale= st_dev) - norm.cdf(x = 63, loc= mu, scale= st_dev)

0.6826894921370859
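
The values 63 and 68 sit exactly one standard deviation below and above the mean, which is why the result matches the "68%" band of the empirical rule. A sketch of the standardization z = (x - mu) / sigma, evaluated on the standard normal:

```python
from scipy.stats import norm

mu, st_dev = 65.5, 2.5

# Convert the raw bounds to z values
z_low = (63 - mu) / st_dev
z_high = (68 - mu) / st_dev

# Same probability, now on the standard normal (loc=0, scale=1 defaults)
prob = norm.cdf(z_high) - norm.cdf(z_low)

print(z_low, z_high, prob)
```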

What is the probability of obtaining a score greater than 700 on a GMAT test that has a mean of 494 and a standard deviation of 100? Assume GMAT scores are normally distributed.
##### Note that here we need greater than 700

In [25]:
mu = 494
st_dev = 100
x = 700

In [26]:
1 - norm.cdf(x = 700, loc = mu, scale = st_dev)

0.019699270409376912

For the same GMAT examination, what is the probability of randomly drawing a score of 550 or less?

In [27]:
norm.cdf(x = 550, loc = mu, scale = st_dev)

0.712260281150973

What is the probability of randomly obtaining a score between 300 and 600 on the GMAT examination?

In [28]:
norm.cdf(x = 600, loc = mu, scale = st_dev) - norm.cdf(x = 300, loc = mu, scale = st_dev)

0.8292378553956377
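
When several questions share the same mean and standard deviation, one option is to freeze the distribution once and reuse it. The sketch below repeats the three GMAT calculations that way; the variable names are illustrative.

```python
from scipy.stats import norm

# Frozen normal distribution for GMAT scores (mean 494, sd 100)
gmat = norm(loc=494, scale=100)

p_above_700 = gmat.sf(700)                     # P(X > 700)
p_at_most_550 = gmat.cdf(550)                  # P(X <= 550)
p_300_to_600 = gmat.cdf(600) - gmat.cdf(300)   # P(300 < X < 600)

print(p_above_700, p_at_most_550, p_300_to_600)
```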

Now suppose the area (probability) is given and we want the corresponding x value. With no `loc` or `scale` arguments, `norm.ppf` uses the standard normal distribution (mean 0, standard deviation 1), so the value it returns is a z value.

In [29]:
norm.ppf(0.95)

1.6448536269514722

This is the z value below which 95% of the standard normal distribution lies.

In [30]:
norm.ppf(1 - 0.6772)

-0.45988328292440145
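
`norm.ppf` is the inverse of `norm.cdf`, so feeding its result back into the cdf should recover the original area. The sketch below also shows the non-standard case; using the GMAT mean and standard deviation here is an illustrative choice, not a question from the notebook.

```python
from scipy.stats import norm

# Round trip on the standard normal: area -> z -> area
z = norm.ppf(0.95)
area = norm.cdf(z)

# With loc and scale, ppf returns a value on the original scale,
# which is the same as mean + z * standard deviation
score_95 = norm.ppf(0.95, loc=494, scale=100)

print(z, area, score_95)
```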

### Hypergeometric distribution

Suppose 18 major computer companies operate in the United States and 12 are located in California's Silicon Valley. If 3 computer companies are selected randomly from the entire list, what is the probability that one or more of the selected companies are located in Silicon Valley?

###### One or more
N = 18; n = 3; A = 12; x = 1, so the survival function is evaluated at x - 1 = 0:
P(X >= 1) = sf(0) = 1 - cdf(0)

In [31]:
help(hypergeom.sf)

Help on method sf in module scipy.stats._distn_infrastructure:

sf(k, *args, **kwds) method of scipy.stats._discrete_distns.hypergeom_gen instance
    Survival function (1 - `cdf`) at k of the given RV.
    
    Parameters
    ----------
    k : array_like
        Quantiles.
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information).
    loc : array_like, optional
        Location parameter (default=0).
    
    Returns
    -------
    sf : array_like
        Survival function evaluated at k.



In [32]:
hypergeom.sf(1-1, 18, 3, 12)

0.9754901960784306
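
"One or more" is the complement of "none", so the answer can be checked by counting: all 3 selected companies would have to come from the 6 located outside Silicon Valley. A standard-library sketch:

```python
from math import comb

# P(no Silicon Valley company) = C(6, 3) / C(18, 3)
p_none = comb(6, 3) / comb(18, 3)

# P(at least one Silicon Valley company)
p_at_least_one = 1 - p_none

print(p_at_least_one)  # should agree with the hypergeom.sf result
```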

A western city has 18 police officers eligible for promotion, 11 of whom are Hispanic. Suppose only 5 of the police officers are chosen for promotion. If the officers chosen for promotion had been selected by chance alone, what is the probability that one or fewer of the 5 promoted officers would have been Hispanic?

###### One or fewer means cumulative probability
N = 18 ; n = 5; A = 11; x = 1 ;

In [33]:
hypergeom.cdf(1, 18, 5, 11)

0.04738562091503275
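
SciPy documents the shape parameters of `hypergeom` in the order `(M, n, N)`: population size, number of success items in the population, and number of draws. The calls above pass the number of draws before the number of successes; the distribution is symmetric in those two arguments, so the results agree. A sketch comparing both orders (variable names are illustrative):

```python
from scipy.stats import hypergeom

population, successes, draws = 18, 11, 5

# Documented order: (k, M, n, N) = (k, population, successes, draws)
documented = hypergeom.cdf(1, population, successes, draws)

# Order used in the cell above: draws and successes exchanged
swapped = hypergeom.cdf(1, population, draws, successes)

# P(X <= 1) written out as P(X = 0) + P(X = 1)
summed = sum(hypergeom.pmf(k, population, successes, draws) for k in range(2))

print(documented, swapped, summed)
```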

### Exponential distribution

A manufacturing firm has been involved in statistical quality control for several years. As part of the production process, parts are randomly selected and tested. From the records of these tests, it has been established that defective parts occur in a pattern that is Poisson distributed, with an average of 1.38 defects every 20 minutes during production runs. Use this information to determine the probability that less than 15 minutes will elapse between any 2 defects.

##### Note: Here the mean of the Poisson distribution is given, and the question asks about the time between 2 defects. When a question concerns the time between two Poisson events, use the exponential distribution.

mean of exponential = 1 / mean of Poisson (measured in 20-minute intervals)

In [34]:
mu_expo = 1/1.38
mu_expo

0.7246376811594204

In [35]:
help(expon.cdf)

Help on method cdf in module scipy.stats._distn_infrastructure:

cdf(x, *args, **kwds) method of scipy.stats._continuous_distns.expon_gen instance
    Cumulative distribution function of the given RV.
    
    Parameters
    ----------
    x : array_like
        quantiles
    arg1, arg2, arg3,... : array_like
        The shape parameter(s) for the distribution (see docstring of the
        instance object for more information)
    loc : array_like, optional
        location parameter (default=0)
    scale : array_like, optional
        scale parameter (default=1)
    
    Returns
    -------
    cdf : ndarray
        Cumulative distribution function evaluated at `x`



15 minutes / 20 minutes = 0.75 intervals; loc = 0 because the distribution is evaluated at y = (x - loc) / scale

In [36]:
expon.cdf(0.75, 0, mu_expo)

0.6447736190750485
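
The same answer can be reached while working in minutes throughout, which avoids the separate 15/20 conversion. The sketch below assumes a constant defect rate and checks the result against the closed form 1 - e^(-rate * t); the variable names are illustrative.

```python
from math import exp
from scipy.stats import expon

rate_per_minute = 1.38 / 20              # defects per minute
mean_gap_minutes = 1 / rate_per_minute   # mean time between defects

# P(time between defects < 15 minutes); scale is the mean gap
via_scipy = expon.cdf(15, scale=mean_gap_minutes)

# Closed-form exponential cdf
via_formula = 1 - exp(-rate_per_minute * 15)

print(via_scipy, via_formula)
```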