# Probability Modelling in fmdkit
Here we demonstrate fmdkit's ability to find a set of scenarios that represents the effects of a given fault mode.

In [1]:
#First, import the fault propagation library as well as the model
#since the package is in a parallel location to examples...
import sys
sys.path.append('../')

import fmdkit.faultprop as fp
import fmdkit.resultproc as rp
from ex_pump import *
from IPython.display import HTML
mdl = Pump()

## Theory and Goals
There are two reasons to run a list of scenarios in fmdkit.
- to determine the expected utility or cost of faults, and
- to determine the relative priority of faults

That is, when running scenarios, one would prefer the expected cost over the set of scenarios to equal the expected cost over the individual modes:
$C = \sum_{s \in S} P_s*C_s = \sum_{m \in M} \int f_m*C_m \, dt$


Scenarios have several different aspects:
- which faults are initiated to cause the scenario
- what time the faults are initiated

In general, when working with fault models, the assumption is that joint faults are very rare, although work is also being done to incorporate probability models for joint scenarios in this model. However, it may not be clear *when in time* faults should be injected so that the set of scenarios considered for a fault represents the risk from that fault mode.

$C = \sum P_s*C_s $

$C= \sum P_s* \int_{t=t_F}^{t_E} c_s(t) dt$

$C= \sum_m \int_{t_F=0}^{t_E} f_m(t_F)* (\int_{t=t_F}^{t_E} c_m(t) dt) dt_F$
where:
- $P_s$ is the probability of a scenario
- $t$ is time
- $t_F$ is the fault injection time
- $t_E$ is the final modelling time
- $c_s(t)$ is the cost rate of a scenario over time (loss per unit time)
- $m$ is a mode
- $f_m(t)$ is the probability density function of a fault mode over the modelled time
- $c_m(t)$ is the cost rate of the scenario that results from injecting mode $m$
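
As a rough illustration of the double integral above, the sketch below evaluates it numerically for a single mode. It does not use fmdkit: `scenario_cost`, `expected_mode_cost`, the constant density, and the constant cost rate are all hypothetical stand-ins chosen so that the result can be checked by hand.

```python
def scenario_cost(cost_rate, t_fault, t_end, steps=500):
    """Inner integral: cost_rate(t) from t_fault to t_end (midpoint rule)."""
    dt = (t_end - t_fault) / steps
    return sum(cost_rate(t_fault + (k + 0.5) * dt) for k in range(steps)) * dt


def expected_mode_cost(fault_density, cost_rate, t_end, steps=200):
    """Outer integral over the injection time t_F from 0 to t_end."""
    dt_f = t_end / steps
    total = 0.0
    for k in range(steps):
        t_fault = (k + 0.5) * dt_f
        cost = scenario_cost(cost_rate, t_fault, t_end)
        total += fault_density(t_fault) * cost * dt_f
    return total


# Hypothetical values: constant density of 1e-3 per unit time, constant cost
# rate of 100 per unit time, and a final modelling time of 10.
C_mode = expected_mode_cost(lambda t: 1e-3, lambda t: 100.0, t_end=10.0)
print(C_mode)
```

For these values the integral can be done by hand, $10^{-3} * 100 * 10^2/2 = 5$, which gives a check on the numerical result. Note that this brute-force version calls the inner integral once per outer step, which is the expense the later sections try to avoid.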

### Solution strategy - take advantage of operational phases
The first thing to consider about $f_m(t)$ and $c_m(t)$ in the early design phase is that they both vary with the operational phase of the system. That is, we can break the modelled time into intervals (phases) based on how the system is operating at that time. In the simplest case we might consider an off-state and an on-state for the system, each with a different fault rate that is constant within the interval. To better align with failure data, we might consider a base fault rate $r_c$ for each component, which is then split between its $w$ modes with the vector $[r_{m_1}, r_{m_2}...r_{m_w}] =  [v_1, v_2...v_w] * r_c$. Then the rates over the $n$ phases are:
$r_{m_{j_{ti}}} = v_j * r_c * s_{m_{j_{ti}}}$ for $j = 1...w$ and $i = 1...n$,

where $r_{m_{j_{ti}}}$ is the individual rate for mode $j$ in phase $i$, and $s_{m_{j_{ti}}}$ is the corresponding per-mode, per-phase multiplier. By representing the rate as constant within each operational phase, we can now write the above integral as:
$C= \sum_m \sum_{i} r_{m_i} \int_{t_F=t_i}^{t_{i+1}} (\int_{t=t_F}^{t_E} c_m(t) dt) dt_F$
where $r_{m_i}$ is the rate for mode $m$ in phase $i$ (given above), $t_{i+1}-t_i$ is the duration of the phase, and $t_F$ is the time within the phase $[t_i, t_{i+1}]$ at which the fault is injected.

However, evaluating this formula still presents a problem. While we could run a simulation at each timestep to get the value of the integral $\int_{t_F=t_i}^{t_{i+1}} (\int_{t=t_F}^{t_E} c_m(t) dt) dt_F$, doing so would not be computationally efficient, which is contrary to the point of a model like this. Instead, we would like to run one simulation that is "representative" of the entire phase. The next few sections present several sampling strategies to approximate this integral, followed by a comparison of each.
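
Before turning to the sampling strategies, the phase-wise bookkeeping described in this section can be sketched in plain Python. Everything here is hypothetical: the base rate, the split `v`, the multipliers `s`, the phase boundaries, and the per-phase costs are made-up numbers, and `phase_costs` stands in for whatever representative scenario cost a simulation would return for each phase and mode.

```python
# All numbers below are hypothetical illustration values.
r_c = 1e-4                       # base component fault rate (per unit time)
v = [0.6, 0.3, 0.1]              # share of the base rate for each of w = 3 modes
s = [[1.0, 1.0, 0.0],            # s[i][j]: multiplier for phase i, mode j
     [2.0, 0.5, 1.0]]
phases = [(0.0, 20.0), (20.0, 55.0)]    # (t_i, t_{i+1}) for each phase

# rates[i][j] = v_j * r_c * s_ij, constant within each phase
rates = [[v_j * r_c * s_ij for v_j, s_ij in zip(v, s_i)] for s_i in s]


def expected_cost(rates, phases, phase_costs):
    """Sum rate * phase duration * representative cost over phases and modes."""
    total = 0.0
    for rate_row, (t_start, t_stop), cost_row in zip(rates, phases, phase_costs):
        for rate, cost in zip(rate_row, cost_row):
            total += rate * (t_stop - t_start) * cost
    return total


# phase_costs[i][j]: representative scenario cost for mode j injected in phase i
phase_costs = [[1000.0, 500.0, 0.0],
               [2000.0, 800.0, 10000.0]]
C = expected_cost(rates, phases, phase_costs)
print(C)
```

The product of a rate and a phase duration acts as the probability weight of the scenario, so the only expensive quantity left is the representative cost for each phase and mode.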

### Idea: Linear Approximation (sampling the center)
Suppose the cost rate of each mode is constant within a phase, $c_m(t) = c_{m_i}$, so that the total cost of a scenario is linear in the injection time $t_F$. This is a poor assumption (don't we expect odd nonlinear behaviors?), but it may be a good enough approximation to get an overall utility number for most faults. Under this assumption,
$\int_{t_F=t_i}^{t_{i+1}} (\int_{t=t_F}^{t_E} c_{m_i} dt) dt_F = \int_{t_F=t_i}^{t_{i+1}} c_{m_i} (t_E - t_F) dt_F = (t_{i+1}-t_i) * c_{m_i} * (t_E - (t_i+t_{i+1})/2)$
Because the integrand is linear in $t_F$ and the fault rate is constant over the interval, the integral equals the phase duration times the integrand evaluated at the midpoint of the interval, $t_F = (t_i+t_{i+1})/2$. Thus the full integral is:
$C= \sum_m \sum_{i} r_{m_i}*(t_{i+1}-t_i)* c_{m_i} *(t_E - (t_i+t_{i+1})/2)$, or $C= \sum_m \sum_{i} r_{m_i}*(t_{i+1}-t_i)* C_{m_i}$, where $C_{m_i}$ is the cost of the scenario injected at $t_F = (t_i+t_{i+1})/2$.
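
A small numerical check of the midpoint argument, assuming a constant cost rate so that the scenario cost is linear in the injection time. The function names and the numbers (cost rate 100, phase from 20 to 55, final time 60) are hypothetical and only serve to compare a fine-grained sum over injection times against a single midpoint evaluation.

```python
def scenario_cost(cost_rate, t_fault, t_end):
    """Total cost of a fault injected at t_fault under a constant cost rate."""
    return cost_rate * (t_end - t_fault)


def phase_integral_numeric(cost_rate, t_start, t_stop, t_end, steps=1000):
    """Integrate the scenario cost over injection times in [t_start, t_stop]."""
    dt = (t_stop - t_start) / steps
    costs = (scenario_cost(cost_rate, t_start + (k + 0.5) * dt, t_end)
             for k in range(steps))
    return sum(costs) * dt


def phase_integral_midpoint(cost_rate, t_start, t_stop, t_end):
    """Phase duration times the scenario cost at the midpoint of the phase."""
    t_mid = 0.5 * (t_start + t_stop)
    return (t_stop - t_start) * scenario_cost(cost_rate, t_mid, t_end)


numeric = phase_integral_numeric(100.0, 20.0, 55.0, 60.0)
midpoint = phase_integral_midpoint(100.0, 20.0, 55.0, 60.0)
print(numeric, midpoint)
```

The two values agree up to floating-point rounding because the integrand is linear; with a nonlinear cost the single midpoint run would only be an approximation.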


### Idea: Even multi-point Sampling
We could also approximate the integral with multiple points, where:

$C= \sum_m \sum_{i} r_{m_i}*(t_{i+1}-t_i)* \frac{1}{n} \sum_{t = t_{i}}^{t_{i+1}}  C_{m_{i_t}}$

With even sampling, $n$ simulations are run at evenly spaced injection times $t$ between $t_i$ and $t_{i+1}$, each giving its own $C_{m_{i_t}}$, and the sum in the formula runs over those $n$ times. The cost used for the phase is then the average cost of those scenarios.
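
A minimal sketch of even sampling follows. Placing the samples at the centres of $n$ equal sub-intervals is an assumption (the endpoints could be used instead), and `cost_of_fault_at` is a hypothetical stand-in for running one simulation with the fault injected at a given time.

```python
def even_sample_times(t_start, t_stop, n):
    """n evenly spaced times, one at the centre of each equal sub-interval."""
    width = (t_stop - t_start) / n
    return [t_start + (k + 0.5) * width for k in range(n)]


def mean_cost(cost_of_fault_at, times):
    """Average scenario cost over the sampled injection times."""
    return sum(cost_of_fault_at(t) for t in times) / len(times)


def cost_of_fault_at(t_fault, t_end=60.0, cost_rate=100.0):
    """Hypothetical stand-in for one simulation run (constant cost rate)."""
    return cost_rate * (t_end - t_fault)


times = even_sample_times(20.0, 55.0, 5)
average = mean_cost(cost_of_fault_at, times)

# Contribution of this phase and mode: rate * duration * average cost
rate = 1e-4                                  # hypothetical phase rate
phase_cost = rate * (55.0 - 20.0) * average
print(times, average, phase_cost)
```

Each additional sample point costs one more simulation run, so $n$ trades accuracy against computation time.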

### Idea: Naive Monte Carlo
Instead of running the simulations at even intervals (which could cause problems if $C_{m_{i_t}}$ has any sort of periodic regularity), we could sample the injection times at random. The formula for cost would still be:

$C= \sum_m \sum_{i} r_{m_i}*(t_{i+1}-t_i)* \frac{1}{n} \sum_{t = t_{i}}^{t_{i+1}}  C_{m_{i_t}}$

Using a random sampling technique, $n$ simulations would be run at random times between $t_i$ and $t_{i+1}$, each giving its own $C_{m_{i_t}}$. The cost used for the phase would then be the average cost of those scenarios.
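
The sketch below illustrates the periodicity concern with a deliberately contrived case. The cost function is hypothetical and has a period equal to the even-sample spacing, so every evenly spaced sample lands on the same point of the cycle; uniform random samples, drawn here with a fixed seed for repeatability, do not share that problem.

```python
import math
import random


def periodic_cost(t_fault, period=7.0):
    """Hypothetical scenario cost that oscillates around 1 with a fixed period."""
    return 1.0 + math.sin(2.0 * math.pi * t_fault / period)


def even_sample_times(t_start, t_stop, n):
    """n evenly spaced times, one at the centre of each equal sub-interval."""
    width = (t_stop - t_start) / n
    return [t_start + (k + 0.5) * width for k in range(n)]


def random_sample_times(t_start, t_stop, n, seed=0):
    """n injection times drawn uniformly at random from the phase."""
    rng = random.Random(seed)
    return [rng.uniform(t_start, t_stop) for _ in range(n)]


def mean_cost(cost_of_fault_at, times):
    """Average scenario cost over the sampled injection times."""
    return sum(cost_of_fault_at(t) for t in times) / len(times)


# Phase from 20 to 55 spans exactly five periods, so the true mean cost is 1.
even_estimate = mean_cost(periodic_cost, even_sample_times(20.0, 55.0, 5))
mc_estimate = mean_cost(periodic_cost, random_sample_times(20.0, 55.0, 2000))
print(even_estimate, mc_estimate)
```

In this contrived case the even estimate is biased no matter how the run is repeated, while the Monte Carlo estimate approaches the true mean as $n$ grows. The price is that many more runs are needed for the random estimate to settle.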

### Idea: Find approximate sample time from multiple points/full integral
One of the reasons we would like this simulation to remain quick is to be able to run it in the context of a design optimization problem. In this case, we might be willing to sample several points at the beginning to get an idea of the representative sample time, which we would then use throughout optimization as the single point approximation (which could be periodically updated/validated at the end). To find the average cost, use:

$C= \sum_m \sum_{i} r_{m_i}*(t_{i+1}-t_i)* \frac{1}{n} \sum_{t = t_{i}}^{t_{i+1}}  C_{m_{i_t}}$

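Then, to find the representative sample time, we could minimize the error $|C-C_{m_{i_t}}|$ over the sample points using a quadratic approximation. For three sample points $t_l < t_m < t_u$ with error values $f_l$, $f_m$, $f_u$ (here the subscript $m$ denotes the middle point, not a mode; see Arora):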

$a_2 = \frac{1}{t_u - t_m}[\frac{f_u-f_l}{t_u-t_l} - \frac{f_m-f_l}{t_m-t_l}]$

$a_1 = \frac{f_m-f_l}{t_m-t_l} - a_2 (t_l + t_m)$

$a_0 = f_l - a_1*t_l - a_2*t_l^2$

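The new point is $t_{new} = -\frac{a_1}{2*a_2}$, the stationary point of the fitted quadratic (a minimum when $a_2 > 0$). Since we would only be doing one iteration, calculate $f_{new}$, and if it is better than the sampled points, the representative point is $t_{new}$.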
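
The three-point fit can be sketched as follows, using the standard quadratic interpolation in which the $a_1$ term carries $a_2 (t_l + t_m)$. The function names are hypothetical, and the test values come from a made-up error curve $f(t) = (t-3)^2 + 1$ so that the answer is known in advance.

```python
def quadratic_fit(t_l, f_l, t_m, f_m, t_u, f_u):
    """Coefficients (a0, a1, a2) of the quadratic through three points."""
    slope_lu = (f_u - f_l) / (t_u - t_l)
    slope_lm = (f_m - f_l) / (t_m - t_l)
    a2 = (slope_lu - slope_lm) / (t_u - t_m)
    a1 = slope_lm - a2 * (t_l + t_m)
    a0 = f_l - a1 * t_l - a2 * t_l ** 2
    return a0, a1, a2


def quadratic_minimizer(t_l, f_l, t_m, f_m, t_u, f_u):
    """Stationary point -a1 / (2 a2), or None if the fit has no minimum."""
    _, a1, a2 = quadratic_fit(t_l, f_l, t_m, f_m, t_u, f_u)
    if a2 <= 0.0:
        return None  # fitted curve is flat or opens downward
    return -a1 / (2.0 * a2)


def error(t):
    """Hypothetical error curve with its minimum at t = 3."""
    return (t - 3.0) ** 2 + 1.0


a0, a1, a2 = quadratic_fit(1.0, error(1.0), 2.0, error(2.0), 5.0, error(5.0))
t_new = quadratic_minimizer(1.0, error(1.0), 2.0, error(2.0), 5.0, error(5.0))
print(a0, a1, a2, t_new)
```

Because the made-up error curve is itself quadratic, the fit recovers it exactly and `t_new` lands on the true minimum. For a real cost curve a single iteration only gives a candidate, which is why $f_{new}$ still has to be evaluated and compared.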

## Experiments

(Model description)

### Baseline - the full integral

- what is C calculated to be?

### Linear Approx

### Even sampling (with # of pts)

### Monte Carlo

### Approximation

## Conclusions
(which worked best, which didn't)

( maybe a plot of error vs. computation time )

(risks RE approximation (reconfiguration could change representative sample time))