# Sets of random variables

## Introduction

Fesslix lets you define sets of random variables, where a set is a collection of random variables. Sets and random variables are addressed by their IDs. Each set has a unique ID of type `rvSetID`, and each random variable within a set has an ID of type `rvID` that is unique within that set. The full ID (`rvFullID`) of a random variable combines the set ID with the variable's internal ID: `rvSetID::rvID`.
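
The `rvSetID::rvID` convention can be illustrated with a few lines of plain Python. This is only a sketch: the helper names `split_rv_full_id` and `make_rv_full_id` are hypothetical and are not part of the Fesslix API.

```python
def split_rv_full_id(full_id: str) -> tuple[str, str]:
    """Split a full ID of the form 'rvSetID::rvID' into (set_id, rv_id).

    Hypothetical helper for illustration; not provided by Fesslix.
    """
    set_id, sep, rv_id = full_id.partition("::")
    if not sep or not set_id or not rv_id:
        raise ValueError(f"expected 'rvSetID::rvID', got {full_id!r}")
    return set_id, rv_id


def make_rv_full_id(set_id: str, rv_id: str) -> str:
    """Combine a set ID and an internal ID into a full ID."""
    return f"{set_id}::{rv_id}"


print(split_rv_full_id("rv_set_a::rv1"))    # ('rv_set_a', 'rv1')
print(make_rv_full_id("rv_set_b", "rv1"))   # rv_set_b::rv1
```

Because the set ID is part of the full ID, the same internal ID (such as `rv1`) can be reused in different sets without ambiguity.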

In [1]:
import fesslix as flx
flx.load_engine()
import numpy as np

Random Number Generator: MT19937 - initialized with rand()=775222211;
Random Number Generator: MT19937 - initialized with 1000 initial calls.


## Sets of general random variables
### Definition

```{eval-rst}
.. function:: flx.rv_set

    Syntax:
        ``flx.rv_set( config_set, rv_lst )``

    Description:
        Returns a set of general random variables of type :class:`flx.rvset`.
        
    :param config_set: The configuration of the set of random variables to create. The following keys are allowed in `config_set`:
    
         - ``name`` (type :type:`rvSetID`): The name of the set of random variables to create.
         - ``is_Nataf`` (*bool*, default: *False*): ``True``: the set is based on the Nataf transformation; ``False``: the set is based on the Rosenblatt transformation.

     In case of the **Rosenblatt transformation** (i.e., for ``config_set['is_Nataf']=False``), additionally, the following keys can be specified:
    
         - ``parents`` (*list*, default: *[]*): A list of already defined sets of random variables on which the set to be defined is conditioned. The list entries must be of type :type:`rvSetID`. Use this key if the parameters of random variables in the current set depend on the values of random variables in another set.
         - ``allow_x2y`` (*bool*, default: *False*): By default, only the transformation from standard normal space to original space is supported by this set. However, sometimes the reverse transformation is required – which is more involved from a mathematical and numerical point of view. By setting this parameter to ``True``, the reverse transformation is activated if possible.
     
     In case of the **Nataf transformation** (i.e., for ``config_set['is_Nataf']=True``), additionally, the following keys can be specified:
    
         - ``corr`` (*list*, default: *[]*): A list of correlations. The entries in the list must be of type *dict*; the following keys are allowed:

            - ``rv_1`` (:type:`rvID`): identifier (name) of random variable in current set
            - ``rv_2`` (:type:`rvID`): identifier (name) of random variable in current set (must be different from `rv_1`)
            - ``value`` (*float*): value of the correlation coefficient of the pair of random variables `rv_1` and `rv_2`.
            - ``corr_approx`` (*bool*, default: *True*):  ``True``: an approximate empirical relationship is used to determine the correlation coefficient of the underlying pair of standard Normal random variables. ``False``: this value is evaluated numerically by means of the algorithm described in TODO. This parameter is only relevant if `rhogauss` is set to *False*.
            - ``rhogauss`` (*bool*, default: *False*): If *True*, the specified correlation coefficient is associated with the underlying standard Normal random variables (i.e., the correlation coefficient is applied to the random variables transformed to standard Normal space). Activating this option can reduce the computational costs of assembling the correlation matrix.
     
         - ``is_Nataf_only_once`` (*bool*, default: *True*): If ``True``, the parameters of the marginal distributions of the Nataf transformation are evaluated only once. Only experienced users should consider setting the value of this parameter to ``False``.
     
    :type config_set: dict
    :param rv_lst: A list with configurations for the random variables to create in the set. The list entries must be of type :type:`flx_rv_config`.
    
     In case of the **Rosenblatt transformation** (i.e., for ``config_set['is_Nataf']=False``), additionally, the following keys can be specified for the individual entries:
    
         - ``corr`` (*dict*, *optional*): Defines a correlation coefficient between this random variable and another random variable in the set. The following keys are allowed:

            - ``rv_name`` (:type:`rvID`): identifier (name) of random variable in current set
            - ``value`` (*float*): value of the correlation coefficient of the current random variable with the random variable specified in `rv_name`.
            - ``fix`` (*bool*, default: *False*): ``True``: The correlation coefficient is evaluated a single time and the parameters of the involved random variables are constant. In this case the correlation of the underlying standard normal random variables is a constant and, thus, needs to be evaluated only once. ``False``:  The correlation of the underlying standard normal random variables is not treated as a constant and needs to be evaluated anew every time a new realization is generated. This mode can be computationally demanding – and should be used only if really needed.
         
    :type rv_lst: list
    :rtype: :class:`flx.rvset`
    
```
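
To make the structure of the ``corr`` list more concrete, the following sketch assembles a symmetric correlation matrix from a list of pair entries and checks that it is positive definite. It uses only NumPy and does not call Fesslix. The function names are hypothetical, and treating unlisted pairs as uncorrelated is an assumption of this sketch, not documented Fesslix behavior.

```python
import numpy as np


def corr_matrix_from_pairs(rv_names, corr_lst):
    """Assemble a symmetric correlation matrix from pairwise entries.

    Each entry is a dict with the keys 'rv_1', 'rv_2' and 'value',
    mirroring the layout of the ``corr`` list described above.
    Pairs that are not listed are assumed to be uncorrelated.
    """
    idx = {name: i for i, name in enumerate(rv_names)}
    R = np.eye(len(rv_names))
    for entry in corr_lst:
        i, j = idx[entry["rv_1"]], idx[entry["rv_2"]]
        if i == j:
            raise ValueError("rv_1 and rv_2 must be different")
        R[i, j] = R[j, i] = entry["value"]
    return R


def is_positive_definite(R):
    """Return True if the Cholesky factorization of R succeeds."""
    try:
        np.linalg.cholesky(R)
    except np.linalg.LinAlgError:
        return False
    return True


corr_lst = [
    {"rv_1": "rv1", "rv_2": "rv2", "value": 0.95},
    {"rv_1": "rv1", "rv_2": "rv3", "value": 0.7},
    {"rv_1": "rv2", "rv_2": "rv3", "value": 0.8},
]
R = corr_matrix_from_pairs(["rv1", "rv2", "rv3"], corr_lst)
print(R)
print("positive definite:", is_positive_definite(R))
```

A check of this kind is useful before defining a set, since an arbitrary combination of pairwise coefficients does not necessarily form a valid correlation matrix.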

```{eval-rst}
.. py:type:: rvSetID
   :canonical: Word
   
   Syntax:
       ``Word``

   Description:
       This data-type assigns a unique identifier (of type :type:`Word`) to a set (i.e., a collection) of random variables.
```

### Working with sets of random variables

```{eval-rst}
.. class:: flx.rvset

   A set (i.e., collection) of random variables.

   .. py:method:: get_name()

      Retrieves the name of the set of random variables.

      :returns: name of set
      :rtype: :type:`rvSetID`

   .. py:method:: get_values(mode)

      Returns an array of quantities of all entries contained in the set of random variables.

      :param mode: Specifies the mode of the operation. 
      
          The following keywords are allowed:
          
              - ``x``: Return an array with the current realizations of the random variables in the set.
              - ``y``: Return an array with the standard Normal transformed values of the current realizations of the random variables in the set.
              - ``mean``: Return an array with the mean values of the random variables in the set.
              - ``sd``: Return an array with the standard deviations of the random variables in the set.
              
      :type mode: :type:`Word`
      
      :rtype: numpy.ndarray
```
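
The relation between the ``x`` and ``y`` modes can be sketched for the simplest case of a single, independent random variable, where the standard Normal value is obtained as y = Φ⁻¹(F(x)) with F the CDF of the marginal distribution. The snippet below uses only the Python standard library. The helpers `x_to_y` and `y_to_x` are hypothetical names; for sets with dependencies, the Rosenblatt or Nataf transformation is more involved than this.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal distribution (mean 0, sd 1)


def x_to_y(x, cdf):
    """Map a realization x to standard Normal space: y = Phi^-1(F(x))."""
    return STD_NORMAL.inv_cdf(cdf(x))


def y_to_x(y, inv_cdf):
    """Map a standard Normal value y back to original space: x = F^-1(Phi(y))."""
    return inv_cdf(STD_NORMAL.cdf(y))


# Example marginal: Normal distribution with mean 2 and standard deviation 1
marginal = NormalDist(mu=2.0, sigma=1.0)

y = x_to_y(3.0, marginal.cdf)      # one standard deviation above the mean
x = y_to_x(y, marginal.inv_cdf)    # round trip back to original space
print(f"y = {y:.4f}, x = {x:.4f}")
```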

```{eval-rst}
.. function:: flx.get_rv_from_set

    Syntax:
        ``flx.get_rv_from_set( rv_name )``

    Description:
        Retrieve random variable `rv_name` from a set of random variables.
        
    :param rv_name: A unique global identifier of the targeted random variable.
    :type rv_name: :type:`rvFullID`
    :rtype: :class:`flx.rv`
    
```

```{eval-rst}
.. py:type:: rvFullID
   :canonical: str
   
   Syntax:
       ``rvSetID::rvID``

   Description:
       This data-type assigns a unique identifier (of type :type:`Word`) to a random variable (of type :class:`flx.rv`).
```

```{eval-rst}
.. function:: FlxFunction.rbrv

    Syntax:
        ``rbrv( rv_name )``

    Description:
        This :type:`FlxFunction` returns the current realization of random variable `rv_name`.
        
    :param rv_name: A unique global identifier of the targeted random variable.
    :type rv_name: :type:`rvFullID`
    
```

### Generating random samples

```{eval-rst}
.. class:: flx.sampler

   Used to generate random realizations from a set (or multiple sets) of random variables.

   .. method:: __init__(config)

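      Initialize the `flx.sampler` instance for the given sets of random variables.
      
      :param config: A list with the names of the sets of random variables to sample from, e.g. ``['rv_set_a']``. Sets listed as ``parents`` of a given set are added implicitly.
      :type config: list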

   .. py:method:: sample()

      Generate a random realization for a collection of sets of random variables. The generated realizations can be accessed using :meth:`flx.rv.get_value` or :meth:`flx.rvset.get_values`.

      :rtype: None

```

### Examples

#### Set without dependencies (i.e., parents)

In [2]:
## ================================================================
## Set without dependencies (i.e., parents)
## ================================================================

## ------------------------------------------
## Definition
## ------------------------------------------
config_rv_a1 = { 'name':'rv1', 'type':'stdn' }
config_rv_a2 = { 'name':'rv2', 'type':'logn', 'mu':1., 'sd':2. }
rv_set_a = flx.rv_set( {'name':'rv_set_a'}, [ config_rv_a1, config_rv_a2 ] )

## ------------------------------------------
## Retrieve random variables from the set
## ------------------------------------------
rv_a1 = flx.get_rv_from_set("rv_set_a::rv1")
rv_a2 = flx.get_rv_from_set("rv_set_a::rv2")
print( rv_a1.info() )

## ------------------------------------------
## Generate random samples
## ------------------------------------------
sampler_a = flx.sampler(['rv_set_a'])
for i in range(10):
    sampler_a.sample()
    print(f"sample {i+1:2.0f}: {rv_a1.get_value():8.2f}, {rv_a2.get_value():8.2f}" )

{'type': 'stdn', 'name': 'rv_set_a::rv1', 'mean': 0.0, 'sd': 1.0, 'entropy': 1.4189385332046727}
sample  1:     0.02,     0.11
sample  2:    -1.33,     0.07
sample  3:     1.28,     0.76
sample  4:    -0.89,     0.90
sample  5:    -0.71,     1.41
sample  6:    -0.51,     0.13
sample  7:    -0.46,     0.29
sample  8:    -0.42,     0.39
sample  9:    -0.90,     4.60
sample 10:     1.42,     1.24


#### Set that depends on another set

In [3]:
## ================================================================
## Set that depends on 'rv_set_a'
## ================================================================

## ------------------------------------------
## Definition
## ------------------------------------------
config_rv_b1 = { 'name':'rv1', 'type':'normal', 'mu':"rbrv(rv_set_a::rv2)", 'sd':0.1 }
config_rv_b2 = { 'name':'rv2', 'type':'normal', 'mu':"rbrv(rv_set_b::rv1)", 'sd':0.05 }
rv_set_b = flx.rv_set( {'name':'rv_set_b', 'parents':['rv_set_a']}, [ config_rv_b1, config_rv_b2 ] )

## ------------------------------------------
## Retrieve random variables from the set
## ------------------------------------------
rv_b1 = flx.get_rv_from_set("rv_set_b::rv1")
rv_b2 = flx.get_rv_from_set("rv_set_b::rv2")

## ------------------------------------------
## Generate random samples
## ------------------------------------------
sampler_b = flx.sampler(['rv_set_b'])   ## 'rv_set_a' is added implicitly!
for i in range(10):
    sampler_b.sample()
    print(f"sample {i+1:2.0f}: {rv_a1.get_value():8.2f}, {rv_a2.get_value():8.2f}, {rv_b1.mean():8.2f}, {rv_b1.get_value():8.2f}, {rv_b2.mean():8.2f}, {rv_b2.get_value():8.2f}" )


sample  1:     0.68,     0.19,     0.19,     0.30,     0.30,     0.36
sample  2:    -1.67,     0.10,     0.10,     0.34,     0.34,     0.43
sample  3:    -0.56,     0.14,     0.14,     0.25,     0.25,     0.30
sample  4:    -1.93,     0.32,     0.32,     0.35,     0.35,     0.23
sample  5:     0.62,     2.40,     2.40,     2.44,     2.44,     2.43
sample  6:    -0.30,     1.19,     1.19,     1.18,     1.18,     1.21
sample  7:    -0.38,     0.35,     0.35,     0.27,     0.27,     0.27
sample  8:     1.42,     0.15,     0.15,     0.40,     0.40,     0.34
sample  9:     0.75,     1.48,     1.48,     1.60,     1.60,     1.55
sample 10:     1.01,     0.65,     0.65,     0.74,     0.74,     0.71
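
The dependency chain in this example (the mean of `rv_set_b::rv1` follows `rv_set_a::rv2`, and the mean of `rv_set_b::rv2` follows `rv_set_b::rv1`) amounts to sampling the parent first and then each child conditional on it. The NumPy sketch below mimics that idea without calling Fesslix; the parent distribution is a stand-in chosen for illustration and is not the one used in the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Stand-in parent variable (assumption: any positive-valued distribution will do here)
a2 = rng.lognormal(mean=0.0, sigma=0.5, size=N)

# Children are sampled conditional on the current parent realization
b1 = rng.normal(loc=a2, scale=0.1)    # b1 | a2 ~ Normal(a2, 0.1)
b2 = rng.normal(loc=b1, scale=0.05)   # b2 | b1 ~ Normal(b1, 0.05)

# The conditional scatter should match the specified standard deviations
print(f"sd(b1 - a2) = {np.std(b1 - a2):.3f}")   # close to 0.1
print(f"sd(b2 - b1) = {np.std(b2 - b1):.3f}")   # close to 0.05
print(f"sd(b2 - a2) = {np.std(b2 - a2):.3f}")   # close to sqrt(0.1**2 + 0.05**2)
```

This also explains the pattern in the printed table: in each row, the mean of a child equals the realization of the variable it is conditioned on.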


#### Set with correlated random variables

In [4]:
## ================================================================
## Set with correlated random variables
## ================================================================

## ------------------------------------------
## Definition
## ------------------------------------------
config_rv_c1 = { 'name':'rv1', 'type':'normal', 'mu':2., 'sd':1. }
config_rv_c2 = { 'name':'rv2', 'type':'stdn', 'corr':{ 'rv_name': 'rv1', 'value':0.95, 'fix':True } }
config_rv_c3 = { 'name':'rv3', 'type':'logn', 'mu':5., 'sd':5., 'corr':{ 'rv_name': 'rv1', 'value':0.7, 'fix':True } }
rv_set_c = flx.rv_set( {'name':'rv_set_c'}, [ config_rv_c1, config_rv_c2, config_rv_c3 ] )

## ------------------------------------------
## Retrieve random variables from the set
## ------------------------------------------
rv_c1 = flx.get_rv_from_set("rv_set_c::rv1")
rv_c2 = flx.get_rv_from_set("rv_set_c::rv2")
rv_c3 = flx.get_rv_from_set("rv_set_c::rv3")

## ------------------------------------------
## Output mean and std.dev. vector
## ------------------------------------------
print( "mean:", rv_set_c.get_values('mean') )
print( "sd:", rv_set_c.get_values('sd') )

## ------------------------------------------
## Generate random samples
## ------------------------------------------
N = 10000   ## number of samples to generate
sampler_c = flx.sampler(['rv_set_c'])  
smpl_mtx = np.empty((N, 3))
for i in range(N):
    sampler_c.sample()
    smpl_mtx[i] = rv_set_c.get_values('x')
    
## ------------------------------------------
## evaluate correlation of sample matrix
## ------------------------------------------
corr_mtx = np.corrcoef( smpl_mtx, rowvar=False )
print(corr_mtx)

mean: [2. 0. 5.]
sd: [1. 1. 5.]
[[1.         0.95007326 0.70769658]
 [0.95007326 1.         0.67198567]
 [0.70769658 0.67198567 1.        ]]
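
The printed `mean` and `sd` vectors suggest that, for type `logn`, the keys `mu` and `sd` denote the mean and standard deviation of the lognormal variable itself rather than the parameters of the underlying Normal distribution. That reading is inferred from the output above, not from the Fesslix source. Under this assumption, the parameters of the underlying Normal distribution follow from the standard moment relations, sketched here with NumPy (the function name `lognormal_params` is hypothetical):

```python
import math
import numpy as np


def lognormal_params(mean, sd):
    """Convert mean and standard deviation of a lognormal variable into
    the mean (lam) and standard deviation (zeta) of ln(X)."""
    zeta2 = math.log(1.0 + (sd / mean) ** 2)
    lam = math.log(mean) - 0.5 * zeta2
    return lam, math.sqrt(zeta2)


lam, zeta = lognormal_params(5.0, 5.0)
print(f"lam = {lam:.4f}, zeta = {zeta:.4f}")

# Cross-check by sampling: the sample moments should be close to (5, 5)
rng = np.random.default_rng(1)
x = rng.lognormal(mean=lam, sigma=zeta, size=1_000_000)
print(f"sample mean = {x.mean():.2f}, sample sd = {x.std():.2f}")
```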


#### Set based on the Nataf transformation

In [5]:
## ================================================================
## Set based on the Nataf transformation
## ================================================================

## ------------------------------------------
## Definition
## ------------------------------------------
config_rv_d1 = { 'name':'rv1', 'type':'normal', 'mu':2., 'sd':1. }
config_rv_d2 = { 'name':'rv2', 'type':'stdn' }
config_rv_d3 = { 'name':'rv3', 'type':'logn', 'mu':5., 'sd':5. }
rv_set_d = flx.rv_set( {'name':'rv_set_d', 
                        'is_Nataf':True, 
                        'allow_x2y':True,
                        'corr': [ {'rv_1':'rv1', 'rv_2':'rv2', 'value':0.95 }, 
                                  {'rv_1':'rv1', 'rv_2':'rv3', 'value':0.7 } , 
                                  {'rv_1':'rv2', 'rv_2':'rv3', 'value':0.8 } 
                                ]
                       }, [ config_rv_d1, config_rv_d2, config_rv_d3 ] )

## ------------------------------------------
## Generate random samples
## ------------------------------------------
N = 10000   ## number of samples to generate
sampler_d = flx.sampler(['rv_set_d'])  
smpl_mtx = np.empty((N, 3))
for i in range(N):
    sampler_d.sample()
    smpl_mtx[i] = rv_set_d.get_values('x')
    
## ------------------------------------------
## evaluate correlation of sample matrix
## ------------------------------------------
corr_mtx = np.corrcoef( smpl_mtx, rowvar=False )
print(corr_mtx)

[[1.         0.95042585 0.69454652]
 [0.95042585 1.         0.79314782]
 [0.69454652 0.79314782 1.        ]]
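
The idea behind the Nataf transformation can be sketched with NumPy alone: correlated standard Normal variables are generated first and then mapped to the marginal distributions. The sketch below is not the Fesslix implementation. It assumes that `mu` and `sd` of the `logn` variable are the mean and standard deviation of the variable itself, and it uses the closed-form relation for a Normal–lognormal pair, ρ_z = ρ_x · δ / ζ (with δ the coefficient of variation and ζ the standard deviation of ln X), to convert the target correlations to standard Normal space.

```python
import math
import numpy as np

# Marginals: rv1 ~ Normal(2, 1), rv2 ~ standard Normal, rv3 ~ lognormal(mean 5, sd 5)
mean3, sd3 = 5.0, 5.0
delta = sd3 / mean3                          # coefficient of variation of rv3
zeta = math.sqrt(math.log(1.0 + delta**2))   # sd of ln(rv3)
lam = math.log(mean3) - 0.5 * zeta**2        # mean of ln(rv3)

# Target correlation coefficients in original space
rho12, rho13, rho23 = 0.95, 0.7, 0.8

# Correlations in standard Normal space:
# Normal-Normal pairs are unchanged; Normal-lognormal pairs are scaled by delta / zeta
f = delta / zeta
R_z = np.array([
    [1.0,       rho12,     rho13 * f],
    [rho12,     1.0,       rho23 * f],
    [rho13 * f, rho23 * f, 1.0],
])

# Correlated standard Normal samples via the Cholesky factor of R_z
L = np.linalg.cholesky(R_z)
rng = np.random.default_rng(42)
N = 200_000
z = rng.standard_normal((N, 3)) @ L.T

# Map each column to its marginal distribution
x = np.column_stack([
    2.0 + 1.0 * z[:, 0],
    z[:, 1],
    np.exp(lam + zeta * z[:, 2]),
])

corr_x = np.corrcoef(x, rowvar=False)
print(np.round(corr_x, 2))
```

The off-diagonal entries of the sample correlation matrix should land close to the target values 0.95, 0.7 and 0.8, similar to the Fesslix output above.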
