# Tutorial from the User Guide for scipy.stats

In [None]:
# General imports
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Import scipy
from scipy import stats

In [None]:
# Basic help information
print(stats.norm.__doc__)

In [None]:
lower, upper = stats.norm.support()
print(f'bounds of distribution: lower={lower}, upper={upper}')

In [None]:
# List all the methods and properties of the distribution
dir(stats.norm)

In [None]:
# List the "real" main methods by listing the methods of a frozen distribution
rv = stats.norm()
dir(rv)

In [None]:
# We can obtain the list of available distributions through introspection.
dist_continu = [d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_continuous)]
dist_continu

In [None]:
dist_discrete = [d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_discrete)]
dist_discrete

## Common methods

The main public methods for continuous random variables (RVs) are:

- rvs: Random Variates
- pdf: Probability Density Function
- cdf: Cumulative Distribution Function
- sf: Survival Function (1 - CDF)
- ppf: Percent Point Function (Inverse of CDF)
- isf: Inverse Survival Function (Inverse of SF)
- stats: Mean, variance, (Fisher's) skew, or (Fisher's) kurtosis
- moment: Non-central moments of the distribution
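
As a quick illustration of how the methods listed above relate to one another, the following sketch checks that `sf` is the complement of `cdf`, and that `ppf` and `isf` invert `cdf` and `sf`. The evaluation points in `x` are arbitrary values chosen for this example.

```python
import numpy as np
from scipy import stats

# Arbitrary evaluation points (an assumption for this illustration)
x = np.array([-1.0, 0.0, 1.0])

# sf is the complement of cdf: cdf(x) + sf(x) == 1
cdf_vals = stats.norm.cdf(x)
sf_vals = stats.norm.sf(x)
complement_ok = np.allclose(cdf_vals + sf_vals, 1.0)

# ppf inverts cdf, and isf inverts sf, so both round trips recover x
ppf_roundtrip = stats.norm.ppf(cdf_vals)
isf_roundtrip = stats.norm.isf(sf_vals)

print(complement_ok)
print(ppf_roundtrip)
print(isf_roundtrip)
```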

In [None]:
# Let's take a normal random variable as an example
stats.norm.cdf(0)

In [None]:
# To compute the cdf at a number of points, one can pass a list or a `numpy` array
stats.norm.cdf([-1., 0, 1])

In [None]:
stats.norm.cdf(np.array([-1, 0, 1]))

Thus, the basic methods, such as `pdf`, `cdf`, and so on, are vectorized.

In [None]:
# Other generally useful methods are also supported:
stats.norm.mean(), stats.norm.std(), stats.norm.var()

In [None]:
stats.norm.stats(moments='mv')  # [m]ean and [v]ariance

In [None]:
# To find the median of a distribution, we can use the percent point
# function, `ppf`, which is the inverse of the `cdf`
median = stats.norm.ppf(0.5)
median

In [None]:
# To generate a sequence of random variates, use the `size` keyword argument
# to `rvs`
stats.norm.rvs(size=3)

In [None]:
# WARNING: Don't think that `stats.norm.rvs(5)` generates 5 variates:
stats.norm.rvs(5)

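In this call, `5` is passed positionally, so it is interpreted as the first available argument, `loc`, which is the first of a pair of keyword arguments (`loc` and `scale`) taken by **all** continuous distributions. The result is a single variate drawn from a normal distribution with mean 5.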
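
The difference between the positional and keyword forms can be seen by comparing the shapes of the results. This is a minimal sketch; the seed `12345` and the sample size of 1000 are arbitrary choices made here for reproducibility and are not part of the original tutorial.

```python
import numpy as np
from scipy import stats

# Seeded generator so the example is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(12345)

# Positional 5 is taken as loc: a single variate centered at 5
single = stats.norm.rvs(5, random_state=rng)

# size=5 requests five variates from the standard normal
sample = stats.norm.rvs(size=5, random_state=rng)

# Passing loc and size by keyword makes the intent unambiguous
shifted = stats.norm.rvs(loc=5, size=1000, random_state=rng)

print(np.ndim(single), sample.shape)
print(shifted.mean())  # should be close to 5
```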

## Random number generation (skipped for now)

## Shifting and scaling

All continuous distributions take `loc` and `scale` as keyword parameters to adjust the location and scale of the distribution. For example, for the normal distribution, the location is the mean and the scale is the standard deviation.

In [None]:
stats.norm.stats(loc=3, scale=4, moments='mv')

In many cases, the standardized distribution for a random variable, $X$, is obtained through the transformation $(X - \mathrm{loc}) / \mathrm{scale}$. The default values for these keyword arguments are `loc=0` and `scale=1`.
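
The transformation above can be checked numerically for the normal distribution. In this sketch, the values of `loc`, `scale`, and the grid `x` are arbitrary. Note that the PDF of the shifted and scaled variable also picks up a factor of `1 / scale`.

```python
import numpy as np
from scipy import stats

# Arbitrary location, scale, and evaluation grid for this illustration
loc, scale = 3.0, 4.0
x = np.linspace(-5.0, 10.0, 7)

# Standardize the evaluation points
z = (x - loc) / scale

# The CDF with loc/scale equals the standard CDF at the standardized points
cdf_direct = stats.norm.cdf(x, loc=loc, scale=scale)
cdf_standard = stats.norm.cdf(z)

# The PDF additionally needs the 1/scale factor from the change of variables
pdf_direct = stats.norm.pdf(x, loc=loc, scale=scale)
pdf_standard = stats.norm.pdf(z) / scale

print(np.allclose(cdf_direct, cdf_standard))
print(np.allclose(pdf_direct, pdf_standard))
```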

Smart use of `loc` and `scale` can help modify the standard distributions in many ways. To illustrate scaling further, the CDF of an exponentially distributed random variable with mean $\frac{1}{\lambda}$ is given by

$$
    F(x) = 1 - \exp(-\lambda x)
$$

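Applying the scaling rule above with `loc=0` gives $F(x) = 1 - \exp(-x / \mathrm{scale})$, so setting `scale` to $1/\lambda$ yields an exponential distribution with rate $\lambda$. The mean of the distribution is then equal to `scale`.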

In [None]:
stats.expon.mean(scale=3.)
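
To connect the `scale` parameter to the rate $\lambda$, the following sketch compares `stats.expon.cdf` against the closed-form CDF. The rate `lam = 0.5` and the points in `x` are arbitrary values picked for this example (`lam` is used as the name because `lambda` is a reserved word in Python).

```python
import numpy as np
from scipy import stats

lam = 0.5                            # arbitrary rate for this illustration
x = np.array([0.0, 1.0, 2.0, 5.0])   # arbitrary evaluation points

# SciPy parameterizes the exponential by scale = 1 / lambda
cdf_scipy = stats.expon.cdf(x, scale=1.0 / lam)

# Closed-form CDF: F(x) = 1 - exp(-lambda * x)
cdf_formula = 1.0 - np.exp(-lam * x)

# The mean equals the scale, i.e. 1 / lambda
mean_scipy = stats.expon.mean(scale=1.0 / lam)

print(np.allclose(cdf_scipy, cdf_formula))
print(mean_scipy)
```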

The uniform distribution is also interesting:

In [None]:
# The lower bound is `loc` and the length is `scale`; that is,
# the upper bound is `loc + scale`.
stats.uniform.cdf([0, 1, 2, 3, 4, 5], loc=1, scale=4)
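
As a small check of the `loc` and `scale` convention for the uniform distribution, this sketch queries the support directly and compares the CDF with the linear ramp expected on the interval from `loc` to `loc + scale`. The variable names are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

loc, scale = 1, 4
x = np.array([0, 1, 2, 3, 4, 5])

# The support should run from loc to loc + scale
lower, upper = stats.uniform.support(loc=loc, scale=scale)

# On the support, the CDF rises linearly from 0 to 1
cdf_vals = stats.uniform.cdf(x, loc=loc, scale=scale)
expected = np.clip((x - loc) / scale, 0.0, 1.0)

print(lower, upper)
print(cdf_vals)
```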

We recommend that you set the `loc` and `scale` parameters **explicitly**, by passing the values as keywords rather than positionally. When calling more than one method of a distribution, repetition can be minimized by freezing the distribution, as explained in "Freezing a distribution" below.

## Shape parameters (skipped for now)

## Freezing a distribution

Passing the `loc` and `scale` parameters time and again can become quite bothersome. The concept of _freezing_ a random variable is used to solve such problems.

In [None]:
rv = stats.gamma(a=1, scale=2)

Now, when we use `rv`, we no longer have to include the scale or the shape parameters.

Consequently, distributions can be used in one of two ways:

- Passing all the distribution parameters to each method call
- Freezing the parameters for the instance of the distribution

In [None]:
# The mean and standard deviation of the frozen distribution
rv.mean(), rv.std()
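
The two ways of using a distribution should give identical results. The sketch below compares the frozen gamma distribution with the equivalent explicit calls; the evaluation points in `x` are arbitrary and chosen only for this example.

```python
import numpy as np
from scipy import stats

# Frozen distribution: shape and scale are fixed once
rv = stats.gamma(a=1, scale=2)

# Arbitrary evaluation points for this illustration
x = np.array([0.5, 1.0, 2.0])

# Frozen call versus passing all parameters to each method call
frozen_cdf = rv.cdf(x)
explicit_cdf = stats.gamma.cdf(x, a=1, scale=2)

frozen_mean = rv.mean()
explicit_mean = stats.gamma.mean(a=1, scale=2)

print(np.allclose(frozen_cdf, explicit_cdf))
print(frozen_mean, explicit_mean)
```

Freezing is mostly a convenience: it keeps the parameters in one place, which reduces the chance of passing inconsistent values to different method calls.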