# SkPro distribution description


SkPro provides an object-oriented implementation of distribution classes. This document describes its main design principles.

The __'DistributionBase'__ class serves as the abstract base class for all concrete distribution classes. <br>
'DistributionBase' itself inherits from the following 3 classes:

- __BaseEstimator__ : scikit-learn base class for estimators, used to inherit the 'get_params' and 'set_params' scikit-learn functionalities.

<t>

- __DPQRMixin__ : mixin class that contains the pdf/pmf/cdf abstract methods that should be overridden in the distribution subclass (otherwise a "not implemented"-like error is raised).

<t>

- __BasicStatsMixin__ : mixin class for some basic statistics abstract methods that should likewise be overridden in the distribution subclass (otherwise a "not implemented"-like error is raised).


### UML

<img src="skpro_distribution.png">

### 1. Distribution class initialisation and parameters storage

The initialization method of a concrete distribution must call the base class constructor (i.e. `super().__init__()`). The base class constructor takes as arguments several skpro modular sub-components that specify the distribution attributes. The full list of these components, with a description of each, is given below.



__DistributionBase class init() pseudo code :__        


```python
class DistributionBase(BaseEstimator, BasicStatsMixin, DPQRMixin) :
    
    def __init__(self,
             dtype = distType.UNDEFINED, 
             vectorSize = 1, 
             variateComponent = VariateInfos(), 
             support = NulleSupport(),
             mode = Mode.BATCH):
             (...)

```


__DistributionBase class init() arguments :__


|Parameter |Description|
|:---|:---|
|dtype | Enum specifying the distribution type [CONTINUOUS, DISCRETE, MIXED, UNDEFINED]|
|vectorSize | int specifying the distribution vector size (1 by default) |
|variateComponent |  VariateInfos structure (skpro.distributions.component.VariateInfos) that stores the dimension information (int size and variateEnum [UNIVARIATE, MULTIVARIATE])|
|support | support object (skpro.distributions.component.Support). To be used to assess the validity of the evaluation point passed to the pdf/pmf/cdf methods (through its method inSupport(.)) |
| mode | Enum specifying the default evaluation mode to be used for the pdf/pmf/cdf methods|

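As a rough illustration of the components listed in the table, the enums and the 'VariateInfos' structure could look like the sketch below. This is not the skpro source: the member values, the 'VariateEnum' class name and the dataclass layout are assumptions, and only the member names are taken from the table.

```python
from dataclasses import dataclass
from enum import Enum, auto


class distType(Enum):
    """Distribution type (member names taken from the table)."""
    CONTINUOUS = auto()
    DISCRETE = auto()
    MIXED = auto()
    UNDEFINED = auto()


class Mode(Enum):
    """Evaluation mode for the pdf/pmf/cdf methods."""
    BATCH = auto()
    ELEMENT_WISE = auto()


class VariateEnum(Enum):
    """Hypothetical class name for the variate enum."""
    UNIVARIATE = auto()
    MULTIVARIATE = auto()


@dataclass
class VariateInfos:
    """Assumed layout: an int size plus a variate form."""
    size: int = 1
    variate: VariateEnum = VariateEnum.UNIVARIATE


info = VariateInfos()
print(info.size, info.variate.name)
```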
__Concrete distribution init() example :__


```python
class NormalDistribution(DistributionBase) :

    def __init__(self, loc = 0.0, scale = 1.0):
        
        self.loc = loc
        self.scale = scale 

        super().__init__(
                name = 'normal', 
                dtype = distType.CONTINUOUS,
                vectorSize = utils.dim(loc), 
                support = RealContinuousSupport()
        )
```
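The 'utils.dim' helper used above is not defined in this document. A minimal sketch of what such a helper might do, assuming it returns 1 for a scalar parameter and the length for a list-like one (the function below is a hypothetical stand-in, not the skpro implementation):

```python
def dim(value):
    """Hypothetical stand-in for utils.dim: size of a scalar or list-like parameter."""
    # Strings are treated as scalars even though they define __len__.
    if isinstance(value, (str, bytes)) or not hasattr(value, "__len__"):
        return 1
    return len(value)


print(dim(0.0))         # scalar parameter
print(dim([0.0, 0.1]))  # vectorized parameter of size 2
```

Under this assumption, constructing the distribution with 'loc = [0.0, 0.1]' would pass a 'vectorSize' of 2 to the base constructor.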

### 2. DPQR implicit interface

The implementation of the evaluation methods relies on an __implicit interface__ design, which separates the processing and checks common to all distributions from the distribution-specific implementation details. 

The interface methods (pdf/pmf/cdf) are the ones visible to and called by the user. Each of them first performs a series of checks: 1. it assesses the validity of the distribution type for the function considered (e.g. pdf should only accept a continuous or mixed distribution), 2. it ensures that the argument lies within the support, and 3. it manages the evaluation mode ([BATCH, ELEMENT_WISE], described in section 3). 

It then calls the corresponding implied method (pdf_imp/pmf_imp/cdf_imp), which must contain the implementation details. These methods are declared in the DPQRMixin and must be overridden in the concrete subclass. If no override is implemented, the DPQRMixin default implementation raises an error.


__DPQR interface pseudo-code :__


```python
class DPQRMixin():
    """Mixin class containing the abstract implied methods declaration
    """
    
    def pdf_imp(self, X):
        raise ValueError('pdf function not implemented')
        
    (...)

      
class DistributionBase(..., DPQRMixin):  
    (...)
        
    def pdf(self, X):
        """ Main interface for the pdf method.
        """
        # pdf is only defined for continuous or mixed distributions
        if self.dtype in [distType.DISCRETE, distType.UNDEFINED]:
            raise ValueError('pdf function not permitted for non continuous distribution')

        # the evaluation point must lie within the support
        if not self.support().inSupport(X):
            raise ValueError('X is outside permitted support')

        # wrap the implied method locally so that it is not re-decorated on every call
        if self.mode_ is Mode.ELEMENT_WISE:
            func = self.elementWiseDecorator(self.pdf_imp)
            return func(X)

        return self.pdf_imp(X)
```
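The split between interface and implied methods can be tried out with the self-contained sketch below. It is a simplified stand-in rather than the skpro code: 'SketchBase' and 'SketchNormal' are hypothetical names, the support is assumed to be a plain interval, the evaluation mode is left out, and the default implied method raises 'NotImplementedError' instead of the 'ValueError' used in the pseudo-code.

```python
import math
from enum import Enum, auto


class distType(Enum):
    CONTINUOUS = auto()
    DISCRETE = auto()


class SketchBase:
    """Simplified stand-in for DistributionBase (checks only, no modes)."""

    def __init__(self, dtype, lower=-math.inf, upper=math.inf):
        self.dtype = dtype
        self.lower, self.upper = lower, upper  # assumed interval support

    def pdf_imp(self, x):
        # Default implied method: subclasses are expected to override it.
        raise NotImplementedError("pdf function not implemented")

    def pdf(self, x):
        """Public interface: run the shared checks, then delegate."""
        if self.dtype is not distType.CONTINUOUS:
            raise ValueError("pdf function not permitted for non continuous distribution")
        if not (self.lower <= x <= self.upper):
            raise ValueError("X is outside permitted support")
        return self.pdf_imp(x)


class SketchNormal(SketchBase):
    def __init__(self, loc=0.0, scale=1.0):
        self.loc, self.scale = loc, scale
        super().__init__(dtype=distType.CONTINUOUS)

    def pdf_imp(self, x):
        # Normal density only; the checks already ran in pdf().
        z = (x - self.loc) / self.scale
        return math.exp(-0.5 * z * z) / (self.scale * math.sqrt(2.0 * math.pi))


print(round(SketchNormal().pdf(0.0), 4))  # 0.3989
```

The subclass only supplies the formula, while the type and support checks stay in one place in the base class.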


### 3. Mode Mechanism for evaluation

For a vectorized distribution object, the evaluation functions (cdf, pdf, pmf, ...) can be called in two different modes. 
Assume a distribution object of size m and a sample of n evaluation points:
                
- __[BATCH]__ evaluation mode (active by default) evaluates every point against every distribution, i.e. it returns an n×m matrix if n > 1, or an m×1 vector if n = 1.

<t>
                   
- __[ELEMENT_WISE]__ evaluation mode evaluates on a one-for-one basis. It repeats the sequence of distributions p_i until all n points are covered, i.e. p_1,...,p_m,p_1,p_2,...,p_m,p_1,...,p_n', where n' is the remainder of dividing n by m. It therefore outputs an array of size n.

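To make the two output shapes concrete, the following sketch evaluates m = 2 normal distributions at n = 3 points in both modes. It uses plain Python rather than skpro: 'normal_pdf', 'batch' and 'element_wise' are hypothetical helper names, and the cyclic pairing assumes n ≥ m.

```python
import math


def normal_pdf(x, loc, scale):
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


locs, scales = [0.0, 0.1], [1.0, 1.2]   # m = 2 distributions
points = [0.0, 0.5, 1.0]                # n = 3 evaluation points


def batch(points):
    # Each point against each distribution: n rows, m columns.
    return [[normal_pdf(x, l, s) for l, s in zip(locs, scales)] for x in points]


def element_wise(points):
    # Point i is paired with distribution i mod m (cyclic repetition).
    m = len(locs)
    return [normal_pdf(x, locs[i % m], scales[i % m]) for i, x in enumerate(points)]


b = batch(points)
e = element_wise(points)
print(len(b), len(b[0]))  # 3 2
print(len(e))             # 3
```

Each element-wise value also appears in the batch matrix: entry i of the element-wise result equals row i, column i mod m of the batch result.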

Every implied evaluation method should be implemented assuming BATCH mode. The distribution parameters should be read through the private method 'get_cached_param' declared in the distribution base class. It returns the vectorized parameters sliced by the private 'cached_index' member. By default 'cached_index' is set (and reset) to 'slice(None)', which corresponds to a BATCH evaluation where all distributions are evaluated. 


__Default BATCH evaluation mode pseudo-code :__

```python
class DistributionBase(...):
    
    def __init__(self, ...) :
        (...)
        self.cached_index_ = slice(None)
        self._register()
    
    (...)
    
    def reset(self):
        self.cached_index_ = slice(None)
    
    def get_cached_param(self, key):
        """ private method that return a list containing the keyed parameter sliced by the 'cached_index' member. 
        """
        if not isinstance(key, str):
             raise ValueError('key index must be a parameter string')

        return np.array(self.paramsFrame_.getParameter(key)[self.cached_index_])

    
class NormalDistribution(DistributionBase) :  
    
    def __init__(self, loc = 0.0, scale = 1.0):
        self.loc = loc
        self.scale = scale
        (...)
    
    def pdf_imp(self, X):
        loc = self.get_cached_param('loc')
        scale = self.get_cached_param('scale')        
        (...)
            
        return results   
```

The __ELEMENT_WISE mode__ (if activated) operates by decorating the implied methods before they are called in the interface methods. The 'elementWiseDecorator' wraps the implied method in a new method that loops through the distributions and evaluates, for each distribution, only the corresponding samples (according to the element-wise rule). The results are then aggregated back into a result list.

The iterative subsetting of the vectorized distribution is done simply by updating 'cached_index' inside the loop. The implied evaluation methods then automatically pick up the matching distribution parameters (through get_cached_param()).

__ELEMENT_WISE evaluation mode pseudo-code :__


```python
class DistributionBase(...):
    (...)
    
    def elementWiseDecorator(self, fn):
        """ Decorate the pdf/pmf/cdf implied methods to perform an element wise evaluation.
        """
        def wrapper(X, *args):
  
            result = [0]*dim(X)
            step = min(self.vectorSize, dim(X))
            
            for index in range(step) :
                self.cached_index_ = index
                s = slice(index, dim(X) , step)
                at = X[s]
                result[s] = fn(at)
                
            self.reset()
            
            return result
        
        return wrapper


    def pdf(self, X):
        """ Main interface for the pdf method.
        """
        (...)

        if(self.mode_ is Mode.ELEMENT_WISE):
            func = self.elementWiseDecorator(self.pdf_imp)
            return func(X)
        
        return self.pdf_imp(X)
```
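The cached-index mechanism can be exercised end to end with the plain-Python sketch below. It is an illustration under stated assumptions, not the skpro implementation: 'SketchNormal' is a hypothetical class, parameters are stored in a dict instead of 'paramsFrame_', lists replace NumPy arrays, and the 'try/finally' around the loop is an addition so that the index is reset even if the implied method raises.

```python
import math


class SketchNormal:
    """Plain-Python stand-in for a vectorized normal distribution."""

    def __init__(self, loc, scale):
        self.params = {"loc": list(loc), "scale": list(scale)}
        self.vectorSize = len(self.params["loc"])
        self.cached_index_ = slice(None)  # BATCH default: all distributions

    def reset(self):
        self.cached_index_ = slice(None)

    def get_cached_param(self, key):
        # Always return a list, whether the index is a slice or an int.
        selected = self.params[key][self.cached_index_]
        return selected if isinstance(selected, list) else [selected]

    def pdf_imp(self, X):
        # Written for BATCH mode: one row per point, one column per distribution.
        loc = self.get_cached_param("loc")
        scale = self.get_cached_param("scale")
        return [[math.exp(-0.5 * ((x - l) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                 for l, s in zip(loc, scale)] for x in X]

    def element_wise(self, fn):
        def wrapper(X):
            result = [0.0] * len(X)
            step = min(self.vectorSize, len(X))
            try:
                for index in range(step):
                    self.cached_index_ = index        # select one distribution
                    s = slice(index, len(X), step)    # its share of the points
                    result[s] = [row[0] for row in fn(X[s])]
            finally:
                self.reset()                          # restore BATCH slicing
            return result
        return wrapper


n = SketchNormal([0.0, 0.1], [1.0, 1.2])
X = [0.0, 0.5, 1.0]
batch = n.pdf_imp(X)
elementwise = n.element_wise(n.pdf_imp)(X)
print(len(batch), len(batch[0]), len(elementwise))  # 3 2 3
```

Note that 'pdf_imp' itself never changes: only the cached index decides which parameters it sees.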

The 'Mode' of a distribution object can be read or changed by the user with the following methods:
- __getMode()__ returns the currently active mode.
- __setMode(.)__ sets the current mode to (.). It accepts a 'Mode' enum argument [Mode.ELEMENT_WISE, Mode.BATCH].

```python
# element-wise pdf/cdf on a size-2 vectorized univariate distribution
from skpro.distributions.distribution_base import Mode
from skpro.distributions.distribution_normal import Normal

n = Normal([0, 0.1], [1, 1.2])
print('default mode: ' + str(n.getMode()))

n.setMode(Mode.ELEMENT_WISE)
print('reset to: ' + str(n.getMode()))
```

```
default mode: Mode.BATCH
reset to: Mode.ELEMENT_WISE
```
