# `tools` - a collection of useful features


```{todo}
Write this section.
```


```python
import fesslix as flx
flx.load_engine()
import fesslix.tools
import fesslix.model_templates as flx_model_templates
import fesslix.plot as flx_plot

import matplotlib.pyplot as plt
import numpy as np
```

```
Random Number Generator: MT19937 - initialized with rand()=66259881;
Random Number Generator: MT19937 - initialized with 1000 initial calls.
```


## Working with files

```{eval-rst}
.. function:: fesslix.tools.replace_in_template

    Syntax:
        ``fesslix.tools.replace_in_template(fn_in, fn_out, dmap, var_indi_start="@{", var_indi_end="}")``

    Description:
        Replaces all expressions of the form ``@{VARNAME}`` in file ``fn_in`` with the corresponding values from ``dmap`` and writes the processed file to ``fn_out``. The delimiters ``@{`` and ``}`` can be changed with the parameters ``var_indi_start`` and ``var_indi_end``.

        If ``VARNAME`` starts with ``!``, the characters after the ``!`` are interpreted as an expression of type :type:`rvFullID`, and the value of the associated random variable is inserted.
        
    :param fn_in: File name of the template input file.
    :type fn_in: str
    :param fn_out: File name of the output file to generate.
    :type fn_out: str
    :param dmap: A dictionary of all variables that can potentially appear in the file *fn_in*.
    :type dmap: dict
    :param var_indi_start: String that marks the beginning of an expression to replace. It must be unique with respect to the content of the file, i.e., it must not occur in *fn_in* for any other purpose.
    :type var_indi_start: str
    :param var_indi_end: String that marks the end of an expression to replace. It must be unique with respect to the content of the file.
    :type var_indi_end: str
    :rtype: None
```
**Example:**

```python
## ================================================================
## Generate a dictionary
## ================================================================
dmap = {}
dmap['var1'] = 42.42
dmap['var2'] = "Hello world!"

## ================================================================
## Generate a set of random variables
## ================================================================
config_rv_a1 = { 'name':'rv1', 'type':'stdn' }
config_rv_a2 = { 'name':'rv2', 'type':'logn', 'mu':1., 'sd':2. }
rv_set_a = flx.rv_set( {'name':'rv_set_a'}, [ config_rv_a1, config_rv_a2 ] )
sampler_a = flx.sampler(['rv_set_a'])
sampler_a.sample()

## ================================================================
## Open template file and start replacing
## ================================================================
fesslix.tools.replace_in_template(fn_in="../data/sample_text.txt", fn_out="modified_text.txt", dmap=dmap)
```

The content of the template file is:

```ipython
!cat ../data/sample_text.txt
```

```
@{var2} » This is an example text file @{var1}.

The content is to be modified by »fesslix.tools.replace_in_template«.

Value of random variable: @{!rv_set_a::rv1}
```

The content of the generated output file is:

```ipython
!cat modified_text.txt
```

```
Hello world! » This is an example text file 42.42.

The content is to be modified by »fesslix.tools.replace_in_template«.

Value of random variable: 0.155968
```
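The substitution described above can be approximated with a regular expression. This is a minimal sketch, not the Fesslix implementation: `replace_in_template_sketch` is a hypothetical name, it works on a string instead of on files, the `!` lookup is delegated to a user-supplied callback `rv_lookup` (a stand-in for the random-variable lookup Fesslix performs), values are formatted with `str()` (Fesslix may format numbers differently), and unknown variable names raise a `KeyError` (the behavior of the real function in that case is not documented here).

```python
import re


def replace_in_template_sketch(text, dmap, rv_lookup=None,
                               var_indi_start="@{", var_indi_end="}"):
    """Hypothetical sketch: replace <start>NAME<end> placeholders in a string."""
    # Non-greedy match of everything between the two (escaped) delimiters
    pattern = re.compile(re.escape(var_indi_start) + r"(.*?)" + re.escape(var_indi_end))

    def substitute(match):
        name = match.group(1)
        if name.startswith("!"):
            # '!' prefix: value of a random variable, delegated to a callback
            if rv_lookup is None:
                raise KeyError(f"no rv_lookup given for {name!r}")
            return str(rv_lookup(name[1:]))
        return str(dmap[name])  # KeyError if the name is missing from dmap

    return pattern.sub(substitute, text)


template = "@{var2} » value @{var1}; rv: @{!rv_set_a::rv1}"
out = replace_in_template_sketch(template, {'var1': 42.42, 'var2': "Hello world!"},
                                 rv_lookup=lambda rv_id: 0.155968)
print(out)  # Hello world! » value 42.42; rv: 0.155968
```

Escaping both delimiters with `re.escape` is what allows arbitrary start and end strings, mirroring the `var_indi_start` and `var_indi_end` parameters.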

## Discretization

```{eval-rst}
.. function:: fesslix.tools.discretize_x

    Syntax:
        ``fesslix.tools.discretize_x(x_low, x_up, x_disc_N=int(1e3), x_disc_shift=False, x_disc_on_log=False)``

    Description:
        Discretizes a domain between `x_low` and `x_up` either linearly or in log scale.
        
    :param x_low: Lower bound of the discretization interval.
    :type x_low: float
    :param x_up: Upper bound of the discretization interval.
    :type x_up: float
    :param x_disc_N: Number of discretization points to generate.
    :type x_disc_N: int
    :param x_disc_shift: If `True`, the first and the last point are shifted inward by half the mesh size (for a linear discretization, the mesh size is `(x_up-x_low)/x_disc_N`), so that `x_low` and `x_up` are not part of the returned set of discretization points.
    :type x_disc_shift: bool
    :param x_disc_on_log: If `True`, the discretization is performed in log-space (both `x_low` and `x_up` must then be strictly positive).
    :type x_disc_on_log: bool
    :returns: The `x_disc_N` points of discretization.
    :rtype: numpy.ndarray
```
**Example:**

```python
print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0., x_up=1., x_disc_N=5, x_disc_shift=True, x_disc_on_log=False) )
print( fesslix.tools.discretize_x(x_low=0.01, x_up=1., x_disc_N=5, x_disc_shift=False, x_disc_on_log=True) )
```

```
[0.   0.25 0.5  0.75 1.  ]
[0.1 0.3 0.5 0.7 0.9]
[0.01       0.03162278 0.1        0.31622777 1.        ]
```
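The three outputs of this example can be reproduced with plain NumPy, which makes the meaning of `x_disc_shift` and `x_disc_on_log` explicit. This is a minimal sketch under assumptions, not the Fesslix implementation: `discretize_x_sketch` is a hypothetical name, log-space is taken as base 10, and when both flags are set the shift is assumed to be applied in log-space (the example does not show that case).

```python
import numpy as np


def discretize_x_sketch(x_low, x_up, x_disc_N=1000, x_disc_shift=False, x_disc_on_log=False):
    """Hypothetical sketch of a linear or log-space discretization of [x_low, x_up]."""
    if x_disc_on_log:
        if x_low <= 0. or x_up <= 0.:
            raise ValueError("x_low and x_up must be strictly positive in log mode")
        a, b = np.log10(x_low), np.log10(x_up)  # work in log10-space
    else:
        a, b = x_low, x_up
    if x_disc_shift:
        h = (b - a) / x_disc_N  # mesh size: x_disc_N cells of equal width
        pts = np.linspace(a + 0.5 * h, b - 0.5 * h, x_disc_N)  # cell mid-points
    else:
        pts = np.linspace(a, b, x_disc_N)  # end points included
    return 10.0 ** pts if x_disc_on_log else pts


print(discretize_x_sketch(0., 1., x_disc_N=5, x_disc_shift=True))  # same values as the second output above
```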


```{eval-rst}
.. function:: fesslix.tools.discretize_x_get_diff

    Syntax:
        ``fesslix.tools.discretize_x_get_diff(x_low, x_up, x_disc_N=int(1e3), x_disc_on_log=False)``

    Description:
        Discretizes a domain between `x_low` and `x_up`. In addition to the discretization points, information about the grid size is returned.

        Internally, this function calls :func:`fesslix.tools.discretize_x` with ``x_disc_shift=True``. The returned discretization points ``x`` are then interpreted as the mid-points of a discretization grid and the size of each grid element is returned as a second vector ``dx``. 
        
    :param x_low: Lower bound of the discretization interval.
    :type x_low: float
    :param x_up: Upper bound of the discretization interval.
    :type x_up: float
    :param x_disc_N: Number of discretization points to generate.
    :type x_disc_N: int
    :param x_disc_on_log: If `True`, the discretization is performed in log-space (both `x_low` and `x_up` must then be strictly positive).
    :type x_disc_on_log: bool
    :returns: ``(x, dx)``, where ``x`` contains the mid-points of the grid elements and ``dx`` the size of each grid element.
    :rtype: (numpy.ndarray, numpy.ndarray)
```
**Example:**

```python
x, dx = fesslix.tools.discretize_x_get_diff(x_low=0., x_up=1., x_disc_N=5, x_disc_on_log=False)
print(x)
print(dx)
```

```
[0.1 0.3 0.5 0.7 0.9]
[0.2 0.2 0.2 0.2 0.2]
```
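Returning both mid-points and element sizes is exactly what a midpoint-rule integral needs. The following is a minimal sketch with a hypothetical name (`discretize_x_get_diff_sketch`), not the Fesslix implementation. It assumes that in log mode the element edges are equally spaced in log10-space, that the mid-points are then the geometric means of the edges, and that `dx` is measured in the original x-space; the Fesslix behavior in log mode is not shown in the example.

```python
import numpy as np


def discretize_x_get_diff_sketch(x_low, x_up, x_disc_N=1000, x_disc_on_log=False):
    """Hypothetical sketch: mid-points x and widths dx of x_disc_N grid elements."""
    if x_disc_on_log:
        # Assumption: edges equally spaced in log10-space
        edges = np.logspace(np.log10(x_low), np.log10(x_up), x_disc_N + 1)
        x = np.sqrt(edges[:-1] * edges[1:])  # mid-points in log-space (geometric means)
    else:
        edges = np.linspace(x_low, x_up, x_disc_N + 1)
        x = 0.5 * (edges[:-1] + edges[1:])  # arithmetic mid-points
    dx = np.diff(edges)  # width of each grid element in x-space
    return x, dx


x, dx = discretize_x_get_diff_sketch(0., 1., x_disc_N=5)
# Midpoint rule: approximate the integral of x**2 over [0, 1] (exact value 1/3)
approx = np.sum(x**2 * dx)
print(x, dx, approx)
```

With five elements the midpoint rule gives 0.33 instead of 1/3; refining `x_disc_N` reduces the error quadratically.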


```{eval-rst}
.. function:: fesslix.tools.discretize_stdNormal_space

    Syntax:
        ``fesslix.tools.discretize_stdNormal_space(q_low=1e-3, q_up=None, x_disc_N=int(1e3))``

    Description:
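        Returns equally spaced discretization points in standard Normal space. The bounds of the discretized interval are the standard Normal quantiles associated with the probabilities ``q_low`` and ``q_up``.

    :param q_low: Probability level of the lower bound (the bound itself is the corresponding standard Normal quantile).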
    :type q_low: float
    :param q_up: Quantile of an upper bound. If ``None`` is specified, ``q_up=1.-q_low`` is assigned.
    :type q_up: float
    :param x_disc_N: Number of discretization points to generate.
    :type x_disc_N: int
    :returns: The `x_disc_N` points of discretization.
    :rtype: numpy.ndarray
```
**Example:**

```python
print( fesslix.tools.discretize_stdNormal_space( x_disc_N=5 ) )
```

```
[-3.09023231 -1.54511615  0.          1.54511615  3.09023231]
```
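The bounds in this example are $\Phi^{-1}(10^{-3}) \approx -3.0902$ and $\Phi^{-1}(1-10^{-3}) \approx 3.0902$, with equally spaced points in between. The sketch below reproduces this with Python's standard-library `statistics.NormalDist` and NumPy; the name `discretize_stdNormal_space_sketch` is hypothetical and this is not the Fesslix implementation.

```python
from statistics import NormalDist

import numpy as np


def discretize_stdNormal_space_sketch(q_low=1e-3, q_up=None, x_disc_N=1000):
    """Hypothetical sketch: equally spaced points between two standard Normal quantiles."""
    if q_up is None:
        q_up = 1. - q_low  # symmetric default, as documented
    std_normal = NormalDist(mu=0., sigma=1.)
    z_low = std_normal.inv_cdf(q_low)  # Phi^{-1}(q_low)
    z_up = std_normal.inv_cdf(q_up)    # Phi^{-1}(q_up)
    return np.linspace(z_low, z_up, x_disc_N)


z = discretize_stdNormal_space_sketch(x_disc_N=5)
print(z)
```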


```{eval-rst}
.. function:: fesslix.tools.detect_bounds_x

    Syntax:
        ``fesslix.tools.detect_bounds_x(rv, config_dict, q_low=1e-3, q_up=None, mode='ignore')``

    Description:
        Evaluates the bounds `x_low` and `x_up` as the quantiles of random variable `rv` associated with the probabilities `q_low` and `q_up`, and stores them in `config_dict`. Whether existing values in `config_dict` are overwritten is controlled by `mode`.

    :param rv: Random variable.
    :type rv: :class:`flx.rv`
    :param config_dict: Configuration dictionary. This function ensures that the keys ``x_low`` and ``x_up`` are assigned.
    :type config_dict: :type:`flx_plot_config`
    :param q_low: Quantile value for lower bound.
    :type q_low: :type:`flx_pr`
    :param q_up: Quantile value for upper bound.
    :type q_up: :type:`flx_pr`
    :param mode: Controls how **existing** values of `x_low` and `x_up` in `config_dict` are handled.
        The following keywords are allowed:

        - ``'ignore'``: if `x_low` and `x_up` are already set in `config_dict`, keep them and ignore the bounds of `rv`
        - ``'overwrite'``: use the bounds of `rv`, even if `x_low` and `x_up` are already set in `config_dict`
        - ``'minmax'``: if `x_low` and `x_up` are already set in `config_dict`, use the smaller of the two lower bounds and the larger of the two upper bounds

    :type mode: str
    :returns: `None`
```
**Example:**

```python
rv = flx.rv({'type':'stdn'})
era_dict = { }
fesslix.tools.detect_bounds_x(rv, era_dict, q_low=1e-6, q_up=0.99)
print(era_dict)
```

```
{'x_low': -4.753424308822899, 'x_up': 2.3263478740408408}
```
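The three values of `mode` amount to a small per-key decision. The sketch below illustrates that logic without Fesslix: `detect_bounds_x_sketch` is a hypothetical name, the quantile function of the random variable is passed in as a callable (here `statistics.NormalDist().inv_cdf` for a standard Normal variable), the default `q_up=1-q_low` is borrowed from `discretize_stdNormal_space` as an assumption, each key is handled independently, and ``'minmax'`` is interpreted as keeping the wider interval.

```python
from statistics import NormalDist


def detect_bounds_x_sketch(inv_cdf, config_dict, q_low=1e-3, q_up=None, mode='ignore'):
    """Hypothetical sketch of the documented `mode` handling for x_low / x_up."""
    if mode not in ('ignore', 'overwrite', 'minmax'):
        raise ValueError(f"unknown mode: {mode!r}")
    if q_up is None:
        q_up = 1. - q_low  # assumption: symmetric default
    candidates = (('x_low', inv_cdf(q_low), min), ('x_up', inv_cdf(q_up), max))
    for key, new_value, pick in candidates:
        if key not in config_dict or mode == 'overwrite':
            config_dict[key] = new_value
        elif mode == 'minmax':
            config_dict[key] = pick(config_dict[key], new_value)  # keep the wider interval
        # mode == 'ignore': the existing value is kept


std_normal = NormalDist()
d = {}
detect_bounds_x_sketch(std_normal.inv_cdf, d, q_low=1e-6, q_up=0.99)
print(d)
```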


## Data fitting

```{eval-rst}
.. function:: fesslix.tools.discretize_x_from_data

    Syntax:
        ``fesslix.tools.discretize_x_from_data(data, config={}, data_is_sorted=False, lower_bound=None, upper_bound=None)``

    Description:
        Discretizes the parameter space into bins based on a data array.
        
    :param data: vector of data/samples
    :type data: numpy.ndarray
    :param config: configuration dictionary
    :type config: dict
    :param data_is_sorted: Set this to ``True`` if the values in *data* are sorted from smallest to largest.
    :type data_is_sorted: bool
    :param lower_bound: Value of an absolute lower bound. Set *lower_bound* to ``None`` if a lower bound does not exist.
    :type lower_bound: float | None
    :param upper_bound: Value of an absolute upper bound. Set *upper_bound* to ``None`` if an upper bound does not exist.
    :type upper_bound: float | None

    Configuration dictionary:
        The following keys are allowed in the configuration dictionary *config*:

        - ``mode`` (:type:`Word`, default: ``adaptive``): Mode for the discretization of the data into bins. The following modes are supported:

            - ``adaptive``: The bin size is selected adaptively based on a minimum number of data-points per bin and on a minimum bin size.

                For this mode, the following keys are additionally accepted in the configuration dictionary:

                - ``N_points_per_bin_min`` (*int*, default: 100): Minimum number of data-points per bin. Any bin must contain at least *N_points_per_bin_min* data-points. The specified integer value must be positive.
                - ``dx_min`` (*float*): Minimum size of a bin in parameter space. Any bin must have a width of at least *dx_min*. If *dx_min* is not specified, the default value is chosen such that at most 8 bins fit into the interval between the 25% and the 75% quantile of the data.
                
            - ``equidist_p``: An equidistant grid in probability space is used to generate the bins.
            
                For this mode, the following keys are additionally accepted in the configuration dictionary:
                
                - ``N_bins`` (*int*, *optional*): Total number of bins.
                - ``N_points_per_bin`` (*int*, default: 100): Number of data-points per bin. This parameter is only considered if *N_bins* is not specified.
                
            - ``fixed_p``: The user provides the grid layout.
            
                For this mode, the following key must be specified in the configuration dictionary:
                
                - ``p_vec`` (*numpy.ndarray*): A numpy array with the probabilities of the discretization points of the grid (i.e., the edges of the bins). The first entry in *p_vec* must be *zero* and the last entry must equal *one*.
                
        - ``tail_upper`` (*dict*, default: *None*): Information about the start of the upper tail. The following key-value pairs are accepted:

            - ``p`` (*float*): Probability that the random variable is smaller than or equal to the starting value of the tail.
            - ``x`` (*float*): Starting value of the tail.
            - ``data`` (*numpy.ndarray*, optional): A vector of samples in the tail. If specified, these samples are used to fit the tail instead of the samples in the global data array.

        - ``tail_lower`` (*dict*, default: *None*): Information about the start of the lower tail. The configuration is identical to ``tail_upper``.
    
    :returns: A Python dictionary that contains the configuration of *data* into bins. 

        The returned *dict* has the following structure:

            - ``N_total`` (*int*): The total number of data-points in the array specified by the input parameter *data*. 
            - ``N_bins`` (*int*): The total number of bins generated. 
            - ``q_vec`` (*numpy.ndarray*): Vector of quantiles of the edges of the bins (of size ``N_bins+1``). 
            - ``p_vec`` (*numpy.ndarray*): Vector of probabilities associated with the values in *q_vec* (of size ``N_bins+1``).
            - ``N_vec`` (*numpy.ndarray*): Number of data-points that fall into the individual bins (of size ``N_bins+1``).
            - ``tail_upper`` (*dict*): A configuration dictionary for modelling the upper tail. The structure of the *dict* corresponds to the one returned by :func:`fesslix.tools.fit_tail_to_data`.
            - ``tail_lower`` (*dict*): A configuration dictionary for modelling the lower tail. The structure of the *dict* corresponds to the one returned by :func:`fesslix.tools.fit_tail_to_data`.

            - ``type`` (:type:`Word`): Set to ``quantiles``, so that the returned configuration dictionary can directly be used to generate a :ref:`content:rv:quantiles`.
            - ``interpol`` (:type:`Word`): Mode of interpolation. For documentation, see key ``interpol`` in the configuration of :ref:`content:rv:quantiles`.
            - ``use_tail_fit`` (*bool*): See key ``use_tail_fit`` in the configuration of :ref:`content:rv:quantiles`.
            - ``bin_rvbeta_params`` (*bool*): See key ``bin_rvbeta_params`` in the configuration of :ref:`content:rv:quantiles`.
            - ``bin_rvlinear_params`` (*bool*): See key ``bin_rvlinear_params`` in the configuration of :ref:`content:rv:quantiles`.
    
        The returned *dict* can directly (i.e., without modification) be used as configuration dictionary to generate a :ref:`content:rv:quantiles`.
        
    :rtype: dict

    Examples:
        Usage of this function is demonstrated in the examples of the :ref:`content:rv:quantiles`.
```
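The ``equidist_p`` mode and the keys ``N_total``, ``N_bins``, ``p_vec``, ``q_vec`` and ``N_vec`` of the returned dictionary can be illustrated with NumPy's empirical quantiles. This is a minimal sketch, not the Fesslix implementation: `discretize_equidist_p_sketch` is a hypothetical name, the default ``N_bins = N_total // N_points_per_bin`` is an assumption, ``np.quantile`` uses NumPy's default (linear) interpolation, and bounds, tails and the interpolation-related keys are omitted.

```python
import numpy as np


def discretize_equidist_p_sketch(data, N_bins=None, N_points_per_bin=100):
    """Hypothetical sketch of mode 'equidist_p': bins with equal empirical probability."""
    data = np.sort(np.asarray(data, dtype=float))
    N_total = data.size
    if N_bins is None:
        # Assumption: as many bins as fit with N_points_per_bin points each (at least one)
        N_bins = max(1, N_total // N_points_per_bin)
    p_vec = np.linspace(0., 1., N_bins + 1)    # equidistant grid in probability space
    q_vec = np.quantile(data, p_vec)           # empirical quantiles at the bin edges
    N_vec, _ = np.histogram(data, bins=q_vec)  # number of data-points per bin
    return {'N_total': N_total, 'N_bins': N_bins,
            'q_vec': q_vec, 'p_vec': p_vec, 'N_vec': N_vec}


rng = np.random.default_rng(1)
cfg = discretize_equidist_p_sketch(rng.normal(size=1000))
print(cfg['N_bins'], cfg['N_vec'])
```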

```{eval-rst}
.. function:: fesslix.tools.fit_tail_to_data

    Syntax:
        ``fesslix.tools.fit_tail_to_data(tail_data_transformed, bound=None)``

    Description:
        Fits a probabilistic model to the data-points associated with the tails of a distribution.

        This function is called internally by :func:`fesslix.tools.discretize_x_from_data` for fitting the lower and upper tail to data.

    :param tail_data_transformed: The data-points associated with the tail. The values must be transformed such that they are all non-negative: a value of *zero* corresponds to the cut-off quantile of the tail, and the larger the value, the further the point lies in the tail.
    :type tail_data_transformed: numpy.ndarray
    :param bound: An absolute upper bound for the tail, or ``None`` if the tail is unbounded. The value must be transformed in the same way as the data in *tail_data_transformed*.
    :type bound: float | None

    Supported probabilistic models for the tail:
        Currently, the following probabilistic distribution models for the tail are supported:

        - ``genpareto`` » :ref:`content:rv:genpareto`
        - ``logn`` » :ref:`content:rv:logn`
        - ``beta`` » :ref:`content:rv:beta` (only if *bound* is not *None*)
    
    :returns: A Python dictionary that contains the configuration for the probabilistic model of the tail. 

        The returned *dict* has the following structure:

            - ``models`` (*dict*): A Python dictionary that contains all fitted models (the supported models are listed above). Each *key* (of type :type:`flx_rv_type`) corresponds to the respective model. Each value is a Python dictionary with the following structure:

                - ``type`` (:type:`flx_rv_type`): The type of the tail model.
                - ``pdf_0`` (*float*): The value of the PDF at *zero*.
                - ``nll`` (*float*): Value of the negative log-likelihood of the data (w.r.t. the fitted probabilistic model).
                - ``kstest_D`` (*float*): KS test statistic
                - ``kstest_p`` (*float*): p value from KS test

                Additionally, for each probabilistic model, all parameters are specified such that the returned *dict* can be used as configuration *dict* to define a random variable (:func:`flx.rv.__init__`).         
                
            - ``best_model`` (:type:`flx_rv_type`): A reference to the model in *models* with the smallest value of the negative log likelihood (*nll*).
            - ``use_model`` (:type:`flx_rv_type`): A reference to the model in *models* that should actually be used as probabilistic model for the tail. By default, this value is set equal to the value of *best_model*. This value needs to be modified by the user if a model different from the one with the largest likelihood should be used. If ``use_model`` is set to ``"None"`` (as string), no probabilistic distribution model is associated with the tail.
            
    :rtype: dict
```
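The model selection described above (fit each candidate, compute the negative log-likelihood and a KS test, pick the smallest NLL) can be sketched with SciPy. This is an illustration under assumptions, not the Fesslix implementation: `fit_tail_sketch` is a hypothetical name, the location of every model is fixed at zero (the cut-off of the tail), the beta model is fixed to the support ``[0, bound]``, and the fitted parameters use SciPy's parameterization rather than the Fesslix random-variable configuration.

```python
import numpy as np
from scipy import stats


def fit_tail_sketch(tail_data_transformed, bound=None):
    """Hypothetical sketch: fit candidate tail models by maximum likelihood, rank by NLL."""
    data = np.asarray(tail_data_transformed, dtype=float)
    candidates = {
        # location fixed at zero: the tail starts at the cut-off quantile
        'genpareto': (stats.genpareto, {'floc': 0.}),
        'logn': (stats.lognorm, {'floc': 0.}),
    }
    if bound is not None:
        # a beta model on [0, bound] only makes sense for a bounded tail
        candidates['beta'] = (stats.beta, {'floc': 0., 'fscale': bound})
    models = {}
    for name, (dist, fixed) in candidates.items():
        params = dist.fit(data, **fixed)  # maximum-likelihood fit
        ks = stats.kstest(data, dist.cdf, args=params)
        models[name] = {
            'params': params,  # SciPy parameterization, not a Fesslix rv config
            'pdf_0': float(dist.pdf(0., *params)),
            'nll': float(-np.sum(dist.logpdf(data, *params))),
            'kstest_D': float(ks.statistic),
            'kstest_p': float(ks.pvalue),
        }
    best = min(models, key=lambda m: models[m]['nll'])
    return {'models': models, 'best_model': best, 'use_model': best}


rng = np.random.default_rng(0)
tail = rng.exponential(scale=2., size=500)
res = fit_tail_sketch(tail, bound=1.1 * tail.max())
print(res['best_model'], {m: round(v['nll'], 1) for m, v in res['models'].items()})
```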

```{eval-rst}
.. function:: fesslix.tools.fit_pdf_based_on_qvec

    Syntax:
        ``fesslix.tools.fit_pdf_based_on_qvec(data, config)``

    Description:
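        Fits a piecewise-linear PDF (i.e., a PDF that is linearly interpolated between the edges of the bins) to a data vector, using the bin layout stored in *config*.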
        
    :param data: vector of data/samples
    :type data: numpy.ndarray
    :param config: Configuration dictionary, as returned by :func:`fesslix.tools.discretize_x_from_data`. 
    :type config: dict

    :returns: *None*

    Modification of the configuration dictionary:
        The configuration dictionary *config* is extended by this function.
        Specifically, the key ``pdf_vec`` is added and the key ``interpol`` is changed to ``pdf_linear`` (compare :ref:`content:rv:quantiles`).

    Examples:
        Usage of this function is demonstrated in the examples of the :ref:`content:rv:quantiles`.

```