# Statistics (scipy.stats)

https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html

## Random Variables

Over 80 [continuous distributions](https://docs.scipy.org/doc/scipy/reference/tutorial/stats/continuous.html#continuous-random-variables) and 10 [discrete distributions](https://docs.scipy.org/doc/scipy/reference/tutorial/stats/discrete.html#discrete-random-variables) are available; the exact counts depend on the SciPy version.


In [1]:
# import the package
from scipy import stats

import numpy as np

# import an individual distribution object
from scipy.stats import norm

### Getting Help

In [2]:
print(stats.norm.__doc__)

A normal continuous random variable.

    The location (loc) keyword specifies the mean.
    The scale (scale) keyword specifies the standard deviation.

    As an instance of the `rv_continuous` class, `norm` object inherits from it
    a collection of generic methods (see below for the full list),
    and completes them with details specific for this particular distribution.
    
    Methods
    -------
    ``rvs(loc=0, scale=1, size=1, random_state=None)``
        Random variates.
    ``pdf(x, loc=0, scale=1)``
        Probability density function.
    ``logpdf(x, loc=0, scale=1)``
        Log of the probability density function.
    ``cdf(x, loc=0, scale=1)``
        Cumulative distribution function.
    ``logcdf(x, loc=0, scale=1)``
        Log of the cumulative distribution function.
    ``sf(x, loc=0, scale=1)``
        Survival function  (also defined as ``1 - cdf``, but `sf` is sometimes more accurate).
    ``logsf(x, loc=0, scale=1)``
        Log of the survival function.
   

In [3]:
# a and b are the lower and upper bounds of the support
print(norm.a)

-inf


In [4]:
# list all methods and properties of the distribution 
dir(norm)

['__call__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__getstate__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_argcheck',
 '_argcheck_rvs',
 '_cdf',
 '_cdf_single',
 '_cdfvec',
 '_construct_argparser',
 '_construct_default_doc',
 '_construct_doc',
 '_ctor_param',
 '_entropy',
 '_fit_loc_scale_support',
 '_fitstart',
 '_isf',
 '_logcdf',
 '_logpdf',
 '_logsf',
 '_mom0_sc',
 '_mom1_sc',
 '_mom_integ0',
 '_mom_integ1',
 '_munp',
 '_nnlf',
 '_nnlf_and_penalty',
 '_open_support_mask',
 '_parse_args',
 '_parse_args_rvs',
 '_parse_args_stats',
 '_pdf',
 '_penalized_nnlf',
 '_ppf',
 '_ppf_single',
 '_ppf_to_solve',
 '_ppfvec',
 '_random_state',
 '_reduce_func',
 '_rvs',
 '_sf',
 '_stats',
 '_stats_has_moments',
 '_support_mask',
 '_unpack_loc_scale',
 '_updated_ctor_param',
 'a',
 'b',
 ...]

In [5]:
# a frozen instance, norm(), exposes only the main public methods
rv = norm()
dir(rv)

['__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'a',
 'args',
 'b',
 'cdf',
 'dist',
 'entropy',
 'expect',
 'interval',
 'isf',
 'kwds',
 'logcdf',
 'logpdf',
 'logpmf',
 'logsf',
 'mean',
 'median',
 'moment',
 'pdf',
 'pmf',
 'ppf',
 'random_state',
 'rvs',
 'sf',
 'stats',
 'std',
 'var']

In [6]:
import warnings
warnings.simplefilter('ignore', DeprecationWarning)

# count the distribution objects defined in scipy.stats
dist_continu = [d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_continuous)]
dist_discrete = [d for d in dir(stats) if isinstance(getattr(stats, d), stats.rv_discrete)]

print('number of continuous distributions: ', len(dist_continu))
print('number of discrete distributions: ', len(dist_discrete))

number of continuous distributions:  95
number of discrete distributions:  13


### Common Methods

- rvs: Random Variates
- pdf: Probability Density Function
- cdf: Cumulative Distribution Function
- sf: Survival Function (1-CDF)
- ppf: Percent Point Function (Inverse of CDF)
- isf: Inverse Survival Function (Inverse of SF)
- stats: Mean, variance, (Fisher’s) skew, and/or (Fisher’s) kurtosis
- moment: Non-central moments of the distribution
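
The relationships in this list can be checked numerically. A minimal sketch for the standard normal; the evaluation points are arbitrary, and `moments='mvsk'` selects mean, variance, skew and kurtosis:

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.0, 0.0, 1.0])
q = np.array([0.1, 0.5, 0.9])

# sf is the complement of cdf: sf(x) = 1 - cdf(x)
print(np.allclose(norm.sf(x), 1 - norm.cdf(x)))      # True

# ppf inverts cdf and isf inverts sf (up to floating-point error)
print(np.allclose(norm.cdf(norm.ppf(q)), q))         # True
print(np.allclose(norm.sf(norm.isf(q)), q))          # True

# stats() returns the moments selected by the `moments` string
mean, var, skew, kurt = norm.stats(moments='mvsk')
print(float(mean), float(var), float(skew), float(kurt))  # 0.0 1.0 0.0 0.0

# moment(n) is the n-th non-central moment E[X**n]; for N(0, 1), E[X**2] = 1
print(norm.moment(2))
```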

In [7]:
# cdf: Cumulative Distribution Function
norm.cdf(0)

0.5

In [8]:
norm.cdf([-1., 0, 1])

array([ 0.15865525,  0.5       ,  0.84134475])

In [9]:
norm.mean(), norm.std(), norm.var()

(0.0, 1.0, 1.0)

In [10]:
# ppf: Percent Point Function (Inverse of CDF) 
norm.ppf(0.5)

0.0

In [11]:
# generate three random numbers
norm.rvs(size=3)

array([-0.13028443, -0.02539027, -0.0518032 ])

In [12]:
# reset NumPy's global random seed (used when random_state is not given)
np.random.seed(1234)
norm.rvs(size=3)

array([ 0.47143516, -1.19097569,  1.43270697])

In [13]:
# pass the seed directly through random_state
norm.rvs(size=3, random_state=1234)

array([ 0.47143516, -1.19097569,  1.43270697])

### Shifting and Scaling
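
The docstring printed in In [2] says that `loc` is the mean and `scale` the standard deviation. More generally, `loc` and `scale` shift and stretch the standard distribution: the cdf is evaluated at the standardized value z = (x - loc) / scale, and the pdf picks up a 1/scale factor. A minimal sketch for `norm`; the values 3 and 2 are arbitrary:

```python
import numpy as np
from scipy.stats import norm

loc, scale = 3.0, 2.0          # arbitrary example values
x = np.linspace(-2.0, 8.0, 5)
z = (x - loc) / scale          # standardized values

# cdf of the shifted/scaled variable equals the standard cdf at z
print(np.allclose(norm.cdf(x, loc=loc, scale=scale), norm.cdf(z)))          # True

# pdf picks up a 1/scale factor from the change of variables
print(np.allclose(norm.pdf(x, loc=loc, scale=scale), norm.pdf(z) / scale))  # True

# For norm, loc is the mean and scale is the standard deviation
print(norm.mean(loc=loc, scale=scale), norm.std(loc=loc, scale=scale))     # 3.0 2.0
```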

### Shape Parameters
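
Some distributions need shape parameters in addition to `loc` and `scale`. A minimal sketch using the gamma distribution, which has a single shape parameter `a`; the parameter values are arbitrary:

```python
from scipy.stats import gamma

# gamma has one shape parameter, named 'a'
print(gamma.numargs, gamma.shapes)   # 1 a

# Shape parameters are passed positionally, before loc and scale.
# For gamma, mean = a * scale and variance = a * scale**2.
mean, var = gamma.stats(1.0, scale=2.0, moments='mv')
print(float(mean), float(var))       # 2.0 4.0
```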

### Freezing a Distribution
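
In [5] already froze a distribution with `rv = norm()`. Freezing fixes the shape, `loc` and `scale` parameters once, so later method calls do not repeat them. A minimal sketch with arbitrary parameters:

```python
import numpy as np
from scipy.stats import norm

rv = norm(loc=3.0, scale=2.0)   # frozen distribution object

print(rv.mean(), rv.std())      # 3.0 2.0
print(rv.cdf(3.0))              # 0.5 (the mean of a normal is also its median)

# The frozen object gives the same results as passing the parameters each time
print(np.isclose(rv.pdf(4.0), norm.pdf(4.0, loc=3.0, scale=2.0)))  # True
```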

### Broadcasting
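
Distribution methods follow NumPy broadcasting rules across their arguments and shape parameters. A minimal sketch computing upper-tail critical values of Student's t for two degrees-of-freedom values (rows) and three tail probabilities (columns); the particular numbers are arbitrary:

```python
import numpy as np
from scipy.stats import t

df = np.array([[10], [11]])        # shape (2, 1): one row per df
p = np.array([0.1, 0.05, 0.01])    # shape (3,): one column per tail probability

crit = t.isf(p, df)                # broadcasts to shape (2, 3)
print(crit.shape)                  # (2, 3)
print(crit)
```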

### Specific Points for Discrete Distributions
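
For discrete distributions the cdf is a step function that only changes at integers, and `ppf` returns the smallest integer k with cdf(k) >= q. A minimal sketch with a Poisson distribution; the mean of 2 is arbitrary:

```python
import numpy as np
from scipy.stats import poisson

mu = 2.0
k = np.arange(5)

# The cdf is flat between integers: cdf(k + 0.5) equals cdf(k)
print(np.allclose(poisson.cdf(k + 0.5, mu), poisson.cdf(k, mu)))  # True

# Applying ppf to cdf values taken at integers recovers those integers
q = poisson.cdf(k, mu)
print(poisson.ppf(q, mu))
```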

### Fitting Distributions
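
`fit` estimates distribution parameters from data by maximum likelihood. A minimal sketch on simulated normal data; the true parameters 5 and 2 and the sample size are arbitrary. In recent SciPy releases `norm.fit` uses the closed-form estimates, so they should match the sample mean and the ddof=0 standard deviation:

```python
import numpy as np
from scipy.stats import norm

# Simulated data with known parameters (values chosen arbitrarily)
data = norm.rvs(loc=5.0, scale=2.0, size=1000, random_state=1234)

# fit() returns maximum-likelihood estimates of (loc, scale) for norm
loc_hat, scale_hat = norm.fit(data)
print(loc_hat, scale_hat)   # should be close to 5 and 2

# For norm these estimates coincide with the sample mean and the ddof=0 std
print(np.isclose(loc_hat, data.mean()), np.isclose(scale_hat, data.std()))
```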

### Performance Issues and Cautionary Remarks

### Remaining Issues

## Building Specific Distributions

### Making a Continuous Distribution, i.e., Subclassing `rv_continuous`
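
A subclass only needs to define `_pdf` (or `_cdf`); the other methods are then derived numerically. A minimal sketch with a hypothetical distribution, `linear_gen`, whose density is f(x) = 2x on [0, 1]; the class and the name `'linear_example'` are illustrative only:

```python
import numpy as np
from scipy.stats import rv_continuous

class linear_gen(rv_continuous):
    """Hypothetical example distribution with density f(x) = 2x on [0, 1]."""
    def _pdf(self, x):
        return 2.0 * x

# a and b set the support of the distribution
linear = linear_gen(a=0.0, b=1.0, name='linear_example')

# cdf, mean and ppf are all derived numerically from _pdf
print(linear.cdf(0.5))   # about 0.25, since F(x) = x**2
print(linear.mean())     # about 0.6667, since E[X] = 2/3
print(linear.ppf(0.25))  # about 0.5, the inverse of the cdf
```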

### Subclassing `rv_discrete`
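
The discrete case is analogous: define `_pmf` and set the integer support with `a` and `b`. A minimal sketch with a hypothetical fair six-sided die, `die_gen`; the class and its name are illustrative only:

```python
import numpy as np
from scipy.stats import rv_discrete

class die_gen(rv_discrete):
    """Hypothetical example: a fair six-sided die with support {1, ..., 6}."""
    def _pmf(self, k):
        # every face in the support has probability 1/6
        return np.full(np.shape(k), 1.0 / 6.0)

die = die_gen(a=1, b=6, name='die_example')

print(die.pmf(3))      # about 0.1667
print(die.cdf(3))      # about 0.5
print(die.mean())      # about 3.5
```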

## Analysing One Sample

### Descriptive Statistics
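
`stats.describe` summarizes a sample in one call. A minimal sketch on a sample drawn from Student's t with 10 degrees of freedom; the distribution, sample size and seed are arbitrary choices:

```python
import numpy as np
from scipy import stats

# Sample drawn from Student's t with 10 degrees of freedom (arbitrary choice)
x = stats.t.rvs(10, size=1000, random_state=1234)

desc = stats.describe(x)
print(desc.nobs, desc.minmax)
print(desc.mean, desc.variance)      # variance uses ddof=1
print(desc.skewness, desc.kurtosis)  # kurtosis is Fisher's (excess) by default
```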

### T-test and KS-test
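
A one-sample t-test checks a hypothesized mean, and the Kolmogorov-Smirnov test checks the whole distribution. A minimal sketch on the same kind of t-distributed sample; the seed and sample size are arbitrary:

```python
import numpy as np
from scipy import stats

x = stats.t.rvs(10, size=1000, random_state=1234)

# One-sample t-test. H0: the population mean is 0 (the mean of t with 10 df)
t_stat, t_pvalue = stats.ttest_1samp(x, 0.0)
print(t_stat, t_pvalue)

# Kolmogorov-Smirnov test. H0: the sample comes from t with 10 df
ks_stat, ks_pvalue = stats.kstest(x, 't', args=(10,))
print(ks_stat, ks_pvalue)
```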

### Tails of the distribution

### Special tests for normal distributions
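
SciPy provides tests aimed specifically at normality. A minimal sketch on a standard normal sample; the seed and sample size are arbitrary:

```python
import numpy as np
from scipy import stats

x = stats.norm.rvs(size=1000, random_state=1234)

# Each test returns (statistic, pvalue); small p-values are evidence against normality
print(stats.skewtest(x))      # H0: skewness matches a normal distribution
print(stats.kurtosistest(x))  # H0: kurtosis matches a normal distribution
print(stats.normaltest(x))    # combines both (D'Agostino and Pearson)
```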

## Comparing two samples

### Comparing means
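
`stats.ttest_ind` compares the means of two independent samples. A minimal sketch with two simulated normal samples; the 0.5 shift in `b`, the seeds and the sample sizes are arbitrary:

```python
import numpy as np
from scipy import stats

# Two independent samples; the 0.5 mean shift in b is arbitrary
a = stats.norm.rvs(loc=0.0, scale=1.0, size=500, random_state=1)
b = stats.norm.rvs(loc=0.5, scale=1.0, size=500, random_state=2)

# Independent two-sample t-test. H0: both populations have the same mean
t_stat, pvalue = stats.ttest_ind(a, b)
print(t_stat, pvalue)   # with a shift of 0.5, pvalue is typically very small
```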

### Kolmogorov-Smirnov test for two samples ks_2samp
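
`ks_2samp` tests whether two samples come from the same continuous distribution without assuming which one. A minimal sketch with the same kind of simulated samples; the shift, seeds and sizes are arbitrary:

```python
from scipy import stats

a = stats.norm.rvs(size=500, random_state=1)
b = stats.norm.rvs(loc=0.5, size=500, random_state=2)

# Two-sample KS test. H0: both samples come from the same continuous distribution
ks_stat, pvalue = stats.ks_2samp(a, b)
print(ks_stat, pvalue)
```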

## Kernel Density Estimation 

### Univariate estimation
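
`stats.gaussian_kde` builds a smooth density estimate from a sample. A minimal sketch on a simulated normal sample; the grid, seed and sample size are arbitrary, and the bandwidth is left at its default (Scott's rule):

```python
import numpy as np
from scipy import stats

data = stats.norm.rvs(size=200, random_state=1234)

# Bandwidth defaults to Scott's rule
kde = stats.gaussian_kde(data)

grid = np.linspace(-4.0, 4.0, 9)
density = kde(grid)          # estimated density at each grid point
print(density)

# The estimate is itself a density, so its total mass is (numerically) 1
print(kde.integrate_box_1d(-np.inf, np.inf))
```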

### Multivariate estimation
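
The same class handles multivariate data, which must be laid out with shape (number of dimensions, number of points). A minimal sketch on simulated two-dimensional standard normal data; the seed, sample size and evaluation points are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(1234)
# gaussian_kde expects data with shape (n_dims, n_points)
values = rng.normal(size=(2, 500))

kde = stats.gaussian_kde(values)
print(kde.d, kde.n)    # 2 500

# Evaluate at two points, passed column-wise with shape (2, 2)
points = np.array([[0.0, 1.0],
                   [0.0, 1.0]])
print(kde(points))     # density at (0, 0) and (1, 1)
```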