# SkPro distribution interface



SkPro provides an object-oriented implementation of probability distributions. Every distribution class inherits cdf and pmf or pdf functions (depending on whether the distribution is discrete or continuous).

In [1]:
#@title IMPORTS
import skpro
from skpro.distributions.distribution_normal import NormalDistribution
from skpro.distributions.distribution_multivariate_normal import MultiVariateNormal

## 1. Distribution basics

In [2]:
# univariate normal instantiation
n = NormalDistribution(loc = 0, scale = 1)
print(n)

NormalDistribution(loc=0, scale=1)


All distribution classes have a scikit-learn-like parameter interface, obtained by inheriting from the scikit-learn "BaseEstimator" class. They thus provide the "get_params" and "set_params" methods.
- '__get_params__' : returns a dict of the '__init__' parameters of the estimator, together with their values.
- '__set_params__' : takes keyword arguments of the form parameter=value (or an unpacked dict) and sets the corresponding parameters of the estimator. It returns the estimator itself.
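
The contract described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions: `ParamsMixin` and `ToyNormal` are hypothetical names, and the code is neither skpro's nor scikit-learn's actual implementation.

```python
import inspect


class ParamsMixin:
    """Hypothetical stand-in for the get_params/set_params contract."""

    def get_params(self):
        # Read the names of the __init__ arguments (self is excluded for a
        # bound method) and look up the attribute of the same name.
        names = list(inspect.signature(self.__init__).parameters)
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        # Returning self allows chained calls.
        return self


class ToyNormal(ParamsMixin):
    """Hypothetical distribution holding only its two parameters."""

    def __init__(self, loc=0.0, scale=1.0):
        self.loc = loc
        self.scale = scale


d = ToyNormal()
print(d.get_params())                      # {'loc': 0.0, 'scale': 1.0}
print(d.set_params(loc=3.0).get_params())  # {'loc': 3.0, 'scale': 1.0}
```

The convention that every `__init__` argument is stored under an attribute of the same name is what makes this generic lookup possible.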

In [3]:
n = NormalDistribution()

print("default parameters :")
params = n.get_params()
print(params) 
print("\t")

print("modified parameters :")
n.set_params(loc = 3.0)
params = n.get_params()
print(params)
print("\t")

print("overwritten parameters :")
parameters = {'loc': 0.0, 'scale': 2.0}
n.set_params(**parameters)
print(n.get_params())

default parameters :
{'loc': 0.0, 'scale': 1.0}
	
modified parameters :
{'loc': 3.0, 'scale': 1.0}
	
overwritten parameters :
{'loc': 0.0, 'scale': 2.0}


Basic member accessors are implemented for each distribution class:
- __name()__ : returns the string tag of the distribution
- __support()__ : returns the support object of the distribution (with a method 'inSupport(x)' returning True if x is in the support)
- __dtype()__ : returns the type of the distribution ['UNDEFINED', 'DISCRETE', 'CONTINUOUS', 'MIXED']
- __variateSize()__ : returns the dimension (i.e. 1 for univariate)
- __vectorSize()__ : returns the vector size (the number of distributions held by the object)

Basic stats methods:
- __point()__ : returns what is defined as the point estimate of the distribution
- __mean()__
- __variance()__
- __std()__
- __mode()__

Basic dpqr methods:
- __pdf(X)__ : returns the pdf evaluated at X if the distribution type is 'CONTINUOUS' or 'MIXED', otherwise raises an error
- __pmf(X)__ : returns the pmf evaluated at X if the distribution type is 'DISCRETE' or 'MIXED', otherwise raises an error
- __cdf(X)__ : returns the cdf evaluated at X
- __squared_norm()__ : returns the L2 norm of the distribution if implemented, otherwise raises an error


In [4]:
n = NormalDistribution()
print(n)
print('name : ' + str(n.name()))
print('type : ' + str(n.dtype()))
print("size : " + str(n.vectorSize()))
print("dimension : " + str(n.variateSize()))
print(n.support())
print("\t")

x = 0
if(n.support().inSupport(x)):
    print("pdf(0) :" + str(n.pdf(x)))
    print("cdf(0) :" + str(n.cdf(x)))

      

NormalDistribution(loc=0.0, scale=1.0)
name : normal
type : distType.CONTINUOUS
size : 1
dimension : 1
<skpro.distributions.component.support.RealContinuousSupport object at 0x000002D68A9CFF88>
	
pdf(0) :[0.39894228]
cdf(0) :[0.5]
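
The two values printed for the standard normal can be cross-checked against the closed-form normal pdf and cdf. The sketch below uses only the Python standard library; `normal_pdf` and `normal_cdf` are helper names introduced here for illustration and are not part of skpro.

```python
import math


def normal_pdf(x, loc=0.0, scale=1.0):
    # Density of a normal distribution with mean `loc` and standard deviation `scale`.
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def normal_cdf(x, loc=0.0, scale=1.0):
    # Cumulative distribution function expressed through the error function.
    z = (x - loc) / scale
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(normal_pdf(0.0), 8))  # 0.39894228
print(normal_cdf(0.0))            # 0.5
```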


## 2. Vectorization

A distribution object can be vectorized within a single object instantiation. A dictionary of parameters is processed within the distributionBase '__init__' method and kept as a member. Subsets of the vector can be accessed through:
- '__get_params(index)__' , which returns a dict of parameters for the given index. It is implemented as an override of the 'BaseEstimator' 'get_params'. Without an index it simply calls the base "get_params"; otherwise it returns only the parameter set at that index.
- '__getitem(slice)__' , which enables the evaluation of self[slice]. It returns a copy of the distribution object containing only the sliced subset of the distribution.
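
The indexed parameter lookup can be pictured as picking one entry out of every list-valued parameter. The helper below is a minimal sketch of that idea; `params_at` is a hypothetical function written for this illustration, not the code skpro uses internally.

```python
def params_at(params, index):
    """Pick the entry at `index` from every list-valued parameter."""
    return {
        name: value[index] if isinstance(value, (list, tuple)) else value
        for name, value in params.items()
    }


# Parameter dictionary of a size 3 vectorized univariate normal.
params = {"loc": [5, 10, 0], "scale": [20, 30, 10]}
print(params_at(params, 1))  # {'loc': 10, 'scale': 30}
```

Scalar parameters are passed through unchanged, so the same helper also works for a non-vectorized parameter set.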

In [5]:
n = NormalDistribution([5, 10, 0], [20, 30, 10])
print("size : " + str(n.vectorSize()))
print("dimension : " + str(n.variateSize()))
print("\t")

print("base parameter sets dictionary :")
print(n.get_params())
print("\t")

print("first distribution function parameters set :")
print(n.get_params(0))
print("\t")

print("second distributon function parameters set :")
print(n.get_params(1))
print("\t")

print("third distributon function parameters set :")
print(n.get_params(2))
print("\t")

print("second distribution :")
print(n[1])
print("\t")

print("first two distribution :")
print(n[0:1])
print("\t")

print("second distributon function parameters set :")
print(n[1].get_params())


size : 3
dimension : 1
	
base parameter sets dictionary :
{'loc': [5, 10, 0], 'scale': [20, 30, 10]}
	
first distribution function parameters set :
{'loc': 5, 'scale': 20}
	
second distributon function parameters set :
{'loc': 10, 'scale': 30}
	
third distributon function parameters set :
{'loc': 0, 'scale': 10}
	
second distribution :
NormalDistribution(loc=10, scale=30)
	
first two distribution :
NormalDistribution(loc=[5, 10], scale=[20, 30])
	
second distributon function parameters set :
{'loc': 10, 'scale': 30}


For a vectorized distribution object, the evaluation functions (cdf, pdf, pmf, ...) can be called in different modes. Assume a distribution object of size m and a sample of n evaluation points:

In [6]:
# using a size 2 vectorized univariate normal object
n = NormalDistribution([0, 0.1], [1, 1.2])
evaluation_array = [0, 0.25, 0.5]

1. '__batch__' evaluation mode [active by default] : evaluates on an each-for-each basis,
    i.e. returns an n x m matrix if (n > 1) or a vector of size m if (n = 1)

In [7]:
#@point pdf/cdf 
print("pdf 'single 0' evaluation:")
print(n.pdf(0))
print("\t")

print("cdf 'single 0' evaluation:")
print(n.cdf(0))

pdf 'single 0' evaluation:
[0.39894228 0.33129956]
	
cdf 'single 0' evaluation:
[0.5        0.46679325]


In [8]:
#@batch_wise pdf/cdf

print("pdf 'batch' evaluation:")
print(n.pdf(evaluation_array))
print("\t")

print("cdf 'batch' evaluation:")
print(n.cdf(evaluation_array))

pdf 'batch' evaluation:
[[0.39894228 0.33129956]
 [0.38666812 0.32986474]
 [0.35206533 0.31448602]]
	
cdf 'batch' evaluation:
[[0.5        0.46679325]
 [0.59870633 0.54973822]
 [0.69146246 0.63055866]]
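
The layout of the batch result (one row per evaluation point, one column per distribution) can be reproduced with a nested loop over the closed-form normal pdf. This is a standard-library sketch for illustration; `normal_pdf` is a helper defined here, not a skpro function.

```python
import math


def normal_pdf(x, loc, scale):
    # Density of a normal distribution with mean `loc` and standard deviation `scale`.
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


locs, scales = [0, 0.1], [1, 1.2]  # m = 2 distributions
points = [0, 0.25, 0.5]            # n = 3 evaluation points

# Every point is evaluated against every distribution: an n x m result.
batch = [
    [normal_pdf(x, loc, scale) for loc, scale in zip(locs, scales)]
    for x in points
]

for row in batch:
    print([round(value, 8) for value in row])
```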


2. __'element_wise'__ mode evaluates on a one-for-one basis. It cycles through the sequence of distributions p_1, ..., p_m until there is one distribution per evaluation point, i.e. p_1, ..., p_m, p_1, ..., p_m, p_1, ..., p_r, where r is the remainder of dividing n by m. The output is thus an array of size n.

The 'Mode' can be accessed or changed using the following methods:
- '__getMode__' : returns the currently active mode
- '__setMode(.)__' : sets the current mode to (.). Accepts a 'Mode' enum argument [Mode.ELEMENT_WISE, Mode.BATCH]

In [9]:
#@element_wise pdf/cdf on size 2 vectorized univariate
from skpro.distributions.distribution_base import Mode

n = NormalDistribution([0, 0.1], [1, 1.2])
print('default mode: ' + str(n.getMode()))

n.setMode(Mode.ELEMENT_WISE)
print('reset to: ' + str(n.getMode()))
print("\t")

print("pdf 'element-wise' evaluation:")
print(n.pdf(evaluation_array))
print("\t")

print("cdf 'element-wise' evaluation:")
print(n.cdf(evaluation_array))


default mode: Mode.BATCH
reset to: Mode.ELEMENT_WISE
	
pdf 'element-wise' evaluation:
[0.3989422804014327, 0.3298647391206246, 0.3520653267642995]
	
cdf 'element-wise' evaluation:
[0.5, 0.5497382248301129, 0.691462461274013]
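
The one-for-one pairing can be sketched with `itertools.cycle`: the two distributions are repeated until each of the three evaluation points has a partner. The code below is a standard-library illustration of that pairing, with `normal_pdf` as a helper defined here; it reproduces the density values 0.3989..., 0.3299..., 0.3521... from the cell above but is not skpro's implementation.

```python
import math
from itertools import cycle


def normal_pdf(x, loc, scale):
    # Density of a normal distribution with mean `loc` and standard deviation `scale`.
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


dists = [(0, 1), (0.1, 1.2)]  # m = 2 (loc, scale) pairs
points = [0, 0.25, 0.5]       # n = 3 evaluation points

# zip stops at the end of `points`, so the distributions are cycled as
# p_1, p_2, p_1 and the result has exactly n entries.
element_wise = [
    normal_pdf(x, loc, scale)
    for x, (loc, scale) in zip(points, cycle(dists))
]

print([round(value, 8) for value in element_wise])
```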


## 3. Multivariate distributions

Multivariate distributions can be instantiated in a similar way. The example below uses a bivariate normal.

In [10]:
#@ instantiation of a bivariate normal
covariance = [[1, 0],[0, 1]]
nbiv = MultiVariateNormal([0, 0.5], covariance)
    
print("size : " + str(nbiv.vectorSize()))
print("dimension : " + str(nbiv.variateSize()) + "\n")

print("instanciated parameters :")
print(nbiv.get_params())
print("\t")

size : 1
dimension : 2

instanciated parameters :
{'cov': <skpro.distributions.covariance.CovarianceMatrix object at 0x000002D68A9D7D08>, 'loc': [0, 0.5]}
	


Here the multivariate normal distribution processes the covariance array and converts it, within the '__init__' method, into a 'CovarianceMatrix' object member.

In [11]:
covobj = nbiv.get_params()['cov']

print('cov object type:')
print(type(covobj))
print("\t")

print('cov print:')
print(covobj)


cov object type:
<class 'skpro.distributions.covariance.CovarianceMatrix'>
	
cov print:
<skpro.distributions.covariance.CovarianceMatrix object at 0x000002D68A9D7D08>
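
As a rough picture of what wrapping a raw covariance array in an object can involve, the sketch below defines a hypothetical `ToyCovariance` class. It is not skpro's 'CovarianceMatrix': the validation steps, the `dimension` method and the custom `__repr__` are all assumptions made for this illustration.

```python
class ToyCovariance:
    """Hypothetical wrapper around a covariance array (not skpro's class)."""

    def __init__(self, matrix):
        rows = [[float(value) for value in row] for row in matrix]
        size = len(rows)
        # A covariance matrix must be square ...
        if any(len(row) != size for row in rows):
            raise ValueError("covariance must be a square matrix")
        # ... and symmetric.
        for i in range(size):
            for j in range(i):
                if abs(rows[i][j] - rows[j][i]) > 1e-12:
                    raise ValueError("covariance must be symmetric")
        self.matrix = rows

    def dimension(self):
        # Number of variates described by the matrix.
        return len(self.matrix)

    def __repr__(self):
        return f"ToyCovariance({self.matrix})"


cov = ToyCovariance([[1, 0.5], [0.5, 1]])
print(cov.dimension())  # 2
print(cov)              # ToyCovariance([[1.0, 0.5], [0.5, 1.0]])
```

Defining `__repr__` is what replaces the default `<... object at 0x...>` text with a readable summary when such an object is printed.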


Multivariate distributions can also be vectorized in a similar way:

In [12]:
#@ vectorization of a size 2 bivariate normal
covariance_1 = [[1, 0],[0, 1]]
covariance_2 = [[1, 0.5],[0.5, 1]]
n = MultiVariateNormal(loc = [[0, 0.5], [0, 0]], cov = [covariance_1, covariance_2])

print("size : " + str(n.vectorSize()))
print("dimension : " + str(n.variateSize()) + "\n")

print("base parameter sets dictionary :")
print(n.get_params())
print("\t")

print("first bivariate distribution parameters set :")
print(n.get_params(0))
print("\t")

print("second bivariate distributon parameters set :")
print(n.get_params(1))
print("\t")

size : 2
dimension : 2

base parameter sets dictionary :
{'cov': [<skpro.distributions.covariance.CovarianceMatrix object at 0x000002D68A9DCA48>, <skpro.distributions.covariance.CovarianceMatrix object at 0x000002D68A9DC3C8>], 'loc': [[0, 0.5], [0, 0]]}
	
first bivariate distribution parameters set :
{'cov': <skpro.distributions.covariance.CovarianceMatrix object at 0x000002D68A9DCA48>, 'loc': [0, 0.5]}
	
second bivariate distributon parameters set :
{'cov': <skpro.distributions.covariance.CovarianceMatrix object at 0x000002D68A9DC3C8>, 'loc': [0, 0]}
	
