# Probability and Probability Distributions:

## SciPy
* Package for math, science, and engineering
* Built on top of NumPy
* Provides functions for, e.g., statistics (many probability distributions), linear algebra, interpolation, and much more

In [14]:
import numpy as np
import pandas as pd

### The `stats` submodule in scipy:

In [4]:
# list of numbers
mylist = [3, 3, 4, 5, 6, 7, 8]

In [3]:
import scipy.stats
help(scipy.stats)

Help on package scipy.stats in scipy:

NAME
    scipy.stats - .. _statsrefmanual:

DESCRIPTION
    Statistical functions (:mod:`scipy.stats`)
    
    .. currentmodule:: scipy.stats
    
    This module contains a large number of probability distributions as
    well as a growing library of statistical functions.
    
    Each univariate distribution is an instance of a subclass of `rv_continuous`
    (`rv_discrete` for discrete distributions):
    
    .. autosummary::
       :toctree: generated/
    
       rv_continuous
       rv_discrete
       rv_histogram
    
    Continuous distributions
    
    .. autosummary::
       :toctree: generated/
    
       alpha             -- Alpha
       anglit            -- Anglit
       arcsine           -- Arcsine
       argus             -- Argus
       beta              -- Beta
       betaprime         -- Beta Prime
       bradford          -- Bradford
       burr              -- Burr (Type III)
       burr12            -- Burr (Type XII)
   

### `mode` function from scipy

In [7]:
from scipy.stats import mode
mode(mylist)

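# the result tells you that the mode is 3 and that it appears two times
# if you want JUST the number, you'd have to do something like:
# mode(mylist)[0][0]   (older SciPy versions, where the result holds arrays)
# mode(mylist).mode    (recent SciPy versions, where the result holds scalars for 1-D input)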

3

In [9]:
scipy.stats.describe(mylist)

DescribeResult(nobs=7, minmax=(3, 8), mean=5.142857142857143, variance=3.80952380952381, skewness=0.2223476479805886, kurtosis=-1.3526562500000006)

### Contingency tables:

* Also called a Cross Tabulation.
* Displays the multivariate frequency distribution of variables

In [18]:
hair = ['brown', 'black', 'black', 'brown', 'blond', 'blond']
eyes = ['blue', 'brown', 'brown', 'green', 'blue', 'green']
# pair the two lists row by row and build a two-column DataFrame
df = pd.DataFrame(list(zip(hair, eyes)), columns=['hair', 'eyes'])
df

,hair,eyes
0,brown,blue
1,black,brown
2,black,brown
3,brown,green
4,blond,blue
5,blond,green


In [None]:
# OR: 

# df = pd.DataFrame()
# df['hair'] = hair
# df['eyes'] = eyes
# df

In [19]:
pd.crosstab(df['hair'], df['eyes'])

# The results show the number of observations in each hair/eye-color combination
# e.g. everyone with black hair has brown eyes (not green or blue)
# e.g. people with blond hair have either blue or green eyes

eyes,blue,brown,green
hair,,,
black,0,2,0
blond,1,0,1
brown,1,0,1
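
Counts can also be turned into relative frequencies, which connects the contingency table back to probability. The following is a minimal sketch using the `normalize` and `margins` arguments of `pd.crosstab`; it rebuilds the same hair/eyes data so that it runs on its own.

```python
import pandas as pd

hair = ['brown', 'black', 'black', 'brown', 'blond', 'blond']
eyes = ['blue', 'brown', 'brown', 'green', 'blue', 'green']
df = pd.DataFrame({'hair': hair, 'eyes': eyes})

# Raw counts plus row/column totals (pandas labels the totals "All").
counts = pd.crosstab(df['hair'], df['eyes'], margins=True)

# Joint relative frequencies: each cell divided by the total number of rows.
joint = pd.crosstab(df['hair'], df['eyes'], normalize='all')

# Conditional frequencies of eye color given hair color: each row sums to 1.
eyes_given_hair = pd.crosstab(df['hair'], df['eyes'], normalize='index')

print(counts)
print(joint)
print(eyes_given_hair)
```

With only six observations these frequencies are very rough estimates of the underlying probabilities.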


### The `linalg` Submodule

In [20]:
import scipy.linalg

In [21]:
help(scipy.linalg)

Help on package scipy.linalg in scipy:

NAME
    scipy.linalg

DESCRIPTION
    Linear algebra (:mod:`scipy.linalg`)
    
    .. currentmodule:: scipy.linalg
    
    Linear algebra functions.
    
    .. eventually, we should replace the numpy.linalg HTML link with just `numpy.linalg`
    
    .. seealso::
    
       `numpy.linalg <https://www.numpy.org/devdocs/reference/routines.linalg.html>`__
       for more linear algebra functions. Note that
       although `scipy.linalg` imports most of them, identically named
       functions from `scipy.linalg` may offer more or slightly differing
       functionality.
    
    
    Basics
    
    .. autosummary::
       :toctree: generated/
    
       inv - Find the inverse of a square matrix
       solve - Solve a linear system of equations
       solve_banded - Solve a banded linear system
       solveh_banded - Solve a Hermitian or symmetric banded system
       solve_circulant - Solve a circulant system
       solve_triangular - Solve a

In [22]:
# calling help() on a package lets you see which specific functions or submodules you might want to import (e.g. from scipy.stats import mode)
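
As a small illustration of two functions listed in the help output, `solve` and `inv`, the sketch below solves a 2x2 linear system. The matrix `A` and vector `b` are made-up values chosen only for this example.

```python
import numpy as np
import scipy.linalg

# Hypothetical system, chosen for illustration:
#   3x +  y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Solve A @ x = b directly.
x = scipy.linalg.solve(A, b)

# Same answer via the explicit inverse (solve is usually the better choice).
A_inv = scipy.linalg.inv(A)
x_via_inv = A_inv @ b

print(x)
print(x_via_inv)
```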

# Probability Distributions

## Discrete Probability Distributions:

In [26]:
from scipy.stats import randint
# help(scipy.stats)

In [34]:
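# discrete uniform distribution over the integers 1..6 (the upper bound 7 is exclusive), i.e. a fair die
uni = randint(1, 7)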

In [35]:
# help(randint.pmf)
uni.pmf(7)
# PMF: gives the probability of a single value occurring; 7 lies outside the die's values 1..6, so the result is 0

0.0

In [36]:
uni.pmf(1)

0.16666666666666666

In [38]:
uni.cdf(8)
# CDF: the cumulative probability P(X <= x); every outcome is <= 8, so the result is 1

1.0

In [39]:
uni.cdf(4)

0.6666666666666666

In [40]:
uni.cdf(1000)
# note: the CDF approaches 1 as x increases (for this distribution it is exactly 1 for any x >= 6)

1.0

In [41]:
uni.cdf(-5)
# note: the CDF approaches 0 as x goes toward negative infinity (for this distribution it is exactly 0 for any x < 1)

0.0
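
The relationship between the two functions can be checked numerically: for a discrete distribution, the CDF at x is the running total of the PMF up to x. A minimal sketch, reusing the same `randint(1, 7)` die (the names `faces` and `running_total` are introduced here for illustration):

```python
import numpy as np
from scipy.stats import randint

uni = randint(1, 7)            # fair six-sided die: values 1..6
faces = np.arange(1, 7)

pmf = uni.pmf(faces)           # probability of each individual face
running_total = np.cumsum(pmf) # cumulative sum of the PMF
cdf = uni.cdf(faces)           # CDF evaluated at each face

print(pmf)
print(running_total)
print(cdf)

# P(2 <= X <= 4) as a difference of two CDF values.
p_2_to_4 = uni.cdf(4) - uni.cdf(1)
print(p_2_to_4)
```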

#### What happens if I roll a die x amount of times?

Use the `rvs` method (random variates) to draw samples from the distribution.

In [45]:
uni.rvs(size = 10)
# as if you had rolled the die 10 times

array([1, 4, 3, 2, 5, 2, 6, 1, 5, 3])

In [44]:
uni.rvs(size = 1000)
# as if you had rolled the die 1000 times

array([3, 1, 4, 3, 2, 1, 3, 4, 5, 6, 6, 1, 3, 5, 4, 6, 1, 2, 4, 2, 2, 4,
       1, 5, 6, 1, 4, 6, 5, 6, 2, 4, 2, 3, 3, 6, 5, 2, 4, 4, 2, 6, 4, 1,
       3, 6, 1, 6, 5, 3, 4, 3, 5, 1, 4, 6, 4, 6, 5, 4, 4, 6, 3, 4, 2, 2,
       3, 2, 5, 3, 1, 2, 2, 3, 1, 5, 6, 1, 2, 4, 3, 6, 5, 3, 6, 2, 1, 6,
       1, 3, 5, 2, 4, 6, 6, 5, 4, 4, 6, 2, 3, 1, 2, 1, 6, 5, 3, 1, 4, 5,
       6, 3, 4, 4, 5, 2, 1, 6, 2, 4, 3, 6, 3, 5, 4, 4, 5, 6, 5, 6, 1, 2,
       2, 5, 5, 4, 5, 6, 4, 6, 1, 6, 3, 5, 4, 1, 4, 5, 5, 1, 2, 2, 5, 1,
       3, 1, 4, 2, 2, 3, 1, 4, 3, 6, 4, 2, 6, 4, 3, 3, 1, 1, 1, 5, 5, 2,
       6, 2, 1, 3, 1, 2, 6, 3, 6, 5, 2, 4, 5, 1, 6, 2, 5, 6, 5, 5, 1, 2,
       3, 2, 2, 3, 4, 3, 1, 2, 2, 4, 4, 4, 2, 5, 3, 5, 1, 5, 3, 1, 4, 1,
       1, 3, 6, 6, 4, 1, 6, 5, 3, 3, 2, 2, 6, 4, 2, 1, 6, 6, 4, 1, 2, 4,
       1, 1, 5, 5, 5, 2, 6, 3, 5, 1, 4, 3, 6, 1, 6, 6, 3, 3, 2, 4, 6, 4,
       3, 2, 4, 2, 1, 3, 1, 6, 6, 3, 3, 4, 2, 1, 6, 6, 2, 1, 4, 2, 3, 4,
       1, 6, 2, 6, 6, 2, 4, 5, 6, 1, 3, 4, 1, 6, 3,
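
Rather than reading a long array of rolls, it can help to compare how often each face came up with the theoretical probability of 1/6. The sketch below is one way to do that; the sample size and the `random_state` seed are arbitrary choices made here for reproducibility.

```python
import numpy as np
from scipy.stats import randint

uni = randint(1, 7)

# Simulate 10,000 rolls; the seed is an arbitrary choice for repeatable output.
rolls = uni.rvs(size=10_000, random_state=0)

# Count how many times each face appears and convert to relative frequencies.
faces, counts = np.unique(rolls, return_counts=True)
empirical = counts / rolls.size

for face, freq in zip(faces, empirical):
    print(face, round(freq, 3), round(uni.pmf(face), 3))
```

With more rolls, the empirical frequencies should settle closer to the PMF values.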

#### Bernoulli Distribution
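
A Bernoulli variable models a single trial with two outcomes, 1 (success) with probability p and 0 (failure) with probability 1 - p. A minimal sketch with `scipy.stats.bernoulli`, assuming p = 0.3 purely for illustration:

```python
from scipy.stats import bernoulli

p = 0.3                        # assumed success probability
trial = bernoulli(p)

print(trial.pmf(1))            # P(X = 1), which equals p
print(trial.pmf(0))            # P(X = 0), which equals 1 - p
print(trial.mean(), trial.var())   # p and p * (1 - p)

# Ten simulated trials; the seed is an arbitrary choice.
print(trial.rvs(size=10, random_state=0))
```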

#### Binomial Distribution
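
A binomial variable counts the successes in n independent trials that each succeed with probability p. A minimal sketch with `scipy.stats.binom`, assuming 10 fair coin flips as the example:

```python
from scipy.stats import binom

n, p = 10, 0.5                 # assumed: 10 flips of a fair coin
heads = binom(n, p)

print(heads.pmf(5))            # P(exactly 5 heads)
print(heads.cdf(3))            # P(at most 3 heads)
print(heads.sf(7))             # P(more than 7 heads), i.e. 1 - cdf(7)
print(heads.mean(), heads.var())   # n * p and n * p * (1 - p)
```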

#### Poisson Distribution
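
A Poisson variable counts events that occur independently at a constant average rate over a fixed interval. A minimal sketch with `scipy.stats.poisson`, assuming an average of 3 events per interval for illustration:

```python
import numpy as np
from scipy.stats import poisson

mu = 3                         # assumed average number of events per interval
events = poisson(mu)

print(events.pmf(0), np.exp(-mu))   # P(no events) equals exp(-mu)
print(events.cdf(2))                # P(at most 2 events)
print(events.sf(5))                 # P(more than 5 events)
print(events.mean(), events.var())  # both equal mu
```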

## Continuous Distributions:

#### Continuous Uniform Distributions:
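
For a continuous distribution the PMF is replaced by a density (`pdf`), while `cdf` and `rvs` work as before. A minimal sketch with `scipy.stats.uniform`, which describes the interval as `[loc, loc + scale]`; the interval [2, 5] is an assumption chosen for illustration:

```python
from scipy.stats import uniform

# loc is the left end of the interval, scale is its width.
u = uniform(loc=2, scale=3)    # assumed interval [2, 5]

print(u.pdf(3))                # density inside the interval: 1 / scale
print(u.pdf(6))                # density outside the interval: 0
print(u.cdf(3.5))              # P(X <= 3.5) = (3.5 - 2) / 3

# Five random draws; the seed is an arbitrary choice.
samples = u.rvs(size=5, random_state=0)
print(samples)
```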