# SkPro distribution description


SkPro provides an object-oriented implementation of distribution classes. This document outlines the main principles of that implementation.

The __'DistributionBase'__ class is the abstract base class for all concrete distribution classes. <br>
'DistributionBase' itself inherits from the following three classes:

- __BaseEstimator__ : base class for all estimators in scikit-learn. Every distribution class therefore provides the scikit-learn 'get_params' and 'set_params' methods.

<t>

- __DPQRMixin__ : mixin class that declares the pdf/pmf/cdf abstract methods, which should be overridden in the distribution subclass (otherwise a "not implemented"-like error is raised).

<t>

- __BasicStatsMixin__ : mixin class that declares some basic statistics abstract methods, which should likewise be overridden in the distribution subclass (otherwise a "not implemented"-like error is raised).
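As a minimal sketch of how the two mixins are meant to behave, the snippet below uses stand-in classes rather than the skpro source: each mixin method raises by default, and a concrete subclass overrides only what it supports. The method names 'mean' and 'variance', the 'cdf_imp' counterpart of 'pdf_imp', and the 'UniformSketch' class are assumptions made for illustration.

```python
class DPQRMixin:
    """Stand-in mixin: default implementations raise until overridden."""

    def pdf_imp(self, x):
        raise ValueError("pdf function not implemented")

    def cdf_imp(self, x):
        raise ValueError("cdf function not implemented")


class BasicStatsMixin:
    """Stand-in mixin for basic statistics (method names are assumed)."""

    def mean(self):
        raise ValueError("mean function not implemented")

    def variance(self):
        raise ValueError("variance function not implemented")


class UniformSketch(BasicStatsMixin, DPQRMixin):
    """Toy uniform distribution on [low, high] overriding part of each mixin."""

    def __init__(self, low=0.0, high=1.0):
        self.low = low
        self.high = high

    def pdf_imp(self, x):
        inside = self.low <= x <= self.high
        return 1.0 / (self.high - self.low) if inside else 0.0

    def mean(self):
        return 0.5 * (self.low + self.high)


dist = UniformSketch(0.0, 2.0)
print(dist.pdf_imp(1.0))  # overridden in the subclass
print(dist.mean())        # overridden in the subclass
try:
    dist.cdf_imp(1.0)     # not overridden: falls back to the mixin default
except ValueError as err:
    print(err)
```

Methods that a subclass leaves untouched keep the mixin default, so a missing implementation surfaces as an explicit error at call time rather than as a silent wrong value.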


### Distribution class initialisation and parameter storage

The `__init__()` method of every concrete distribution class must call the base class constructor (i.e. `super().__init__()`). Depending on the distribution's attributes, the relevant skpro modular sub-components should be instantiated from the `__init__()` arguments and passed to `super().__init__()`. These components are listed and described below.

For easy and flexible access, every concrete distribution stores its (vectorized) parameters in a single skpro container object called 'parametersFrame'. A 'parametersFrame' takes a dictionary of parameter lists and stores it as a Pandas DataFrame member of shape (number of distribution samples, number of parameters).

<t>
    
This 'parametersFrame' is declared as a private member of the distribution base class. On initialization, the distribution base calls its private method '_register', which retrieves the distribution parameters (with 'get_params()') and populates the 'parametersFrame' private member.


$\;\;\;\;\;\;$



__DistributionBase class init() pseudo code :__        


```python
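import pandas as pd
from sklearn.base import BaseEstimator


class parametersFrame:
    """Container holding the distribution parameters as a DataFrame."""

    def __init__(self, data=None):
        # None instead of a mutable default; pd.DataFrame(None) is an empty frame.
        self.data_ = pd.DataFrame(data)

    def setData(self, data):
        self.data_ = pd.DataFrame(data)

    (...)


class DistributionBase(BaseEstimator, BasicStatsMixin, DPQRMixin):

    def __init__(self,
                 dtype=distType.UNDEFINED,
                 vectorSize=1,
                 variateComponent=VariateInfos(),
                 support=NulleSupport(),
                 mode=Mode.BATCH):

        (...)

        # Collect the parameters once the members are set up.
        self._register()

    def _register(self):
        if self.vectorSize() > 1:
            # Vectorized case: each parameter is already list-like.
            self.paramsFrame_.setData(self.get_params())
        else:
            # Scalar case: wrap each value in a list to get a one-row frame.
            dic = {key: [val] for key, val in self.get_params().items()}
            self.paramsFrame_.setData(dic)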
```
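To make the two branches of '_register' more concrete, here is a small sketch that uses plain dictionaries instead of the skpro container. It shows how scalar and vectorized parameters end up with the same column layout. The helper name 'to_columns' is hypothetical, and the sketch assumes that vectorized parameters are passed as sequences of equal length; the resulting dictionary is the kind of object that 'setData' would hand to the DataFrame constructor.

```python
def to_columns(params, vector_size):
    """Return a dict of equal-length lists, one list per parameter."""
    if vector_size > 1:
        # Vectorized case: every parameter is assumed to be a sequence already.
        return {key: list(val) for key, val in params.items()}
    # Scalar case: wrap each value so the layout has exactly one row.
    return {key: [val] for key, val in params.items()}


scalar = to_columns({"loc": 0.0, "scale": 1.0}, vector_size=1)
vector = to_columns(
    {"loc": [0.0, 1.0, 2.0], "scale": [1.0, 1.0, 0.5]}, vector_size=3
)

print(scalar)  # one row, two parameter columns
print(vector)  # three rows, two parameter columns
```

In both cases each key becomes a column and each list position becomes a row, which matches the (samples, parameters) shape described above.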
$\;\;\;\;\;\;$
        

__DistributionBase class init() arguments :__       


| Argument | Description |
| :--- | :--- |
| dtype | Enum specifying the distribution type [CONTINUOUS, DISCRETE, MIXED, UNDEFINED] |
| vectorSize | int specifying the distribution vector size (1 by default) |
| variateComponent | VariateInfos structure (skpro.distributions.component.VariateInfos) that stores the dimension information (int size and variateEnum [UNIVARIATE, MULTIVARIATE]) |
| support | Support object (skpro.distributions.component.Support), used to check the validity of the evaluation point passed to the pdf/pmf/cdf methods (through its inSupport(.) method) |
| mode | Enum specifying the default evaluation mode used by the pdf/pmf/cdf methods [BATCH, ELEMENT_WISE] |
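The components in this table can be pictured with the following sketch. It is not the skpro source: the enum member values, the rule that derives the variate type from the size, and the 'IntervalSupport' class are assumptions chosen only to show how such components could fit together.

```python
from enum import Enum


class distType(Enum):
    CONTINUOUS = 1
    DISCRETE = 2
    MIXED = 3
    UNDEFINED = 4


class Mode(Enum):
    BATCH = 1
    ELEMENT_WISE = 2


class variateEnum(Enum):
    UNIVARIATE = 1
    MULTIVARIATE = 2


class VariateInfos:
    """Stores the dimension size and the matching variate type."""

    def __init__(self, size=1):
        self.size = size
        # Assumed rule: size 1 is univariate, anything larger is multivariate.
        self.variate = (
            variateEnum.UNIVARIATE if size == 1 else variateEnum.MULTIVARIATE
        )


class IntervalSupport:
    """Hypothetical support object accepting points in [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def inSupport(self, x):
        return self.low <= x <= self.high


support = IntervalSupport(0.0, 1.0)
print(support.inSupport(0.5))
print(support.inSupport(2.0))
print(VariateInfos(3).variate)
```

Keeping the support as a separate object means the validity check on the evaluation point can be written once in the base class and reused by every distribution.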



$\;\;\;\;\;\;$


__Concrete distribution init() example :__


```python
class NormalDistribution(DistributionBase):

    def __init__(self, loc=0.0, scale=1.0):

        # Store the parameters under their argument names so that
        # get_params() can retrieve them.
        self.loc = loc
        self.scale = scale

        super().__init__(
            name='normal',
            dtype=distType.CONTINUOUS,
            vectorSize=utils.dim(loc),  # vector size follows the dimension of loc
            support=RealContinuousSupport()
        )
```
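The 'utils.dim' helper is not shown in this document. One plausible reading, sketched below as an assumption rather than the skpro implementation, is that it returns 1 for a scalar and the number of elements for a list-like argument, so that passing a list of locations yields a vectorized distribution.

```python
from numbers import Number


def dim(value):
    """Hypothetical stand-in for utils.dim: 1 for scalars, length otherwise."""
    if isinstance(value, Number):
        return 1
    return len(value)


print(dim(0.0))              # scalar parameter -> 1
print(dim([0.0, 1.0, 2.0]))  # vectorized parameter -> 3
```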






### DPQR implicit interface

The pdf/pmf/cdf methods rely on an __implicit interface implementation__. The interface methods are declared in the base distribution class and are the only methods a user calls to evaluate the distribution.

Each of them may perform a series of checks before calling its respective implementation method (pdf_imp/pmf_imp/cdf_imp), which contains the implementation details. Where needed, they can: 1. check that the distribution type is valid for the function considered (e.g. pdf should only accept continuous or mixed distributions), 2. ensure that the argument lies within the support, and 3. handle the evaluation mode ([BATCH, ELEMENT_WISE]).

The base class inherits an abstract version of the implementation methods from the DPQRMixin. They should then be overridden in every concrete distribution class (with the details specific to that distribution). If no concrete override is implemented, the DPQRMixin implementation raises an error (of type ValueError('function not implemented')).


$\;\;\;\;\;\;$

__DPQR interface pseudo-code :__


```python
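class DPQRMixin:

    def pdf_imp(self, X):
        # Default implementation: concrete distributions must override it.
        raise ValueError('pdf function not implemented')

    (...)


class DistributionBase(..., DPQRMixin):

    (...)

    def pdf(self, X):
        # 1. The pdf is only defined for continuous or mixed distributions.
        if self.dtype in [distType.DISCRETE, distType.UNDEFINED]:
            raise ValueError('pdf function not permitted for non continuous distribution')

        # 2. The evaluation point must lie within the support.
        if not self.support().inSupport(X):
            raise ValueError('X is outside permitted support')

        # 3. Wrap the implementation locally so that repeated calls do not
        #    stack decorators on the instance attribute.
        impl = self.pdf_imp
        if self.mode_ is Mode.ELEMENT_WISE:
            impl = self.elementWiseDecorator(impl)

        return impl(X)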
```
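For readers who want to run the pattern end to end, the following self-contained sketch reproduces the same three checks with toy classes. Everything suffixed with 'Sketch' is hypothetical. The document does not define the two evaluation modes, so the sketch assumes that ELEMENT_WISE means "apply the scalar implementation to each element of X"; the actual skpro semantics may differ.

```python
import math
from enum import Enum


class distType(Enum):
    CONTINUOUS = 1
    DISCRETE = 2
    MIXED = 3
    UNDEFINED = 4


class Mode(Enum):
    BATCH = 1
    ELEMENT_WISE = 2


class RealSupportSketch:
    """Hypothetical support covering every finite real number."""

    def inSupport(self, X):
        values = X if isinstance(X, (list, tuple)) else [X]
        return all(math.isfinite(v) for v in values)


class DistributionSketch:
    """Toy base class: pdf() validates, then delegates to pdf_imp()."""

    def __init__(self, dtype, support, mode=Mode.BATCH):
        self.dtype = dtype
        self.support_ = support
        self.mode_ = mode

    def pdf_imp(self, X):
        raise ValueError("pdf function not implemented")

    @staticmethod
    def elementWiseDecorator(func):
        # Assumed behavior: map a scalar implementation over each element.
        def wrapper(X):
            return [func(x) for x in X]
        return wrapper

    def pdf(self, X):
        if self.dtype in (distType.DISCRETE, distType.UNDEFINED):
            raise ValueError("pdf function not permitted for non continuous distribution")
        if not self.support_.inSupport(X):
            raise ValueError("X is outside permitted support")
        impl = self.pdf_imp
        if self.mode_ is Mode.ELEMENT_WISE:
            impl = self.elementWiseDecorator(impl)
        return impl(X)


class StandardNormalSketch(DistributionSketch):
    """Concrete toy distribution: only pdf_imp() is overridden."""

    def __init__(self, mode=Mode.BATCH):
        super().__init__(distType.CONTINUOUS, RealSupportSketch(), mode)

    def pdf_imp(self, x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


single = StandardNormalSketch()
print(single.pdf(0.0))

elementwise = StandardNormalSketch(mode=Mode.ELEMENT_WISE)
print(elementwise.pdf([-1.0, 0.0, 1.0]))
```

The concrete class never repeats the checks: it only supplies 'pdf_imp', while the public 'pdf' in the base class stays the single entry point for users.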


### Mode Mechanism

### UML Summary

<img src="skpro_distribution.png">