vignettes/failure_rate.Rmd

---
title: "Failure rate distributions"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Failure rate distributions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

One of the most important quantities in reliability theory and
survival analysis is the failure rate, often denoted by $\lambda$.
For a system with a random lifetime $T$, its failure rate over some
interval $(t, t + \Delta t)$ is the ratio of the probability of
failure in that interval $(t, t + \Delta t)$ given that the
system survived up to the time $t$ divided by the length
of the interval $\Delta t$.
That is,
$$
  \lambda = \frac{\Pr(T \leq t + \Delta t | T > t)}{\Delta t}.
$$

For instance, if we have a system with a failure rate of $\lambda$
over the interval $(t, t + \Delta t)$, then the probability of failure
over this interval, given that it has survived up to time $t$, is
$\lambda \Delta t$.
Of course, for most systems, the failure rate changes over time. For
instance, a system may be more likely to fail when it is new, or when
it is old.

At the limit, as $\Delta t \to 0$, we have what is known as the
instaneous failure rate, or the hazard function,

$$
h(t) = \lim_{\Delta t \to 0}
  \frac{\Pr\{T \leq t + \Delta t | T > t\}}{\Delta t}.
$$

We can rewrite the above expression as
$$
\begin{align}
h(t)
  &= \lim_{\Delta t \to 0}
    \frac{\Pr\{t < T \leq t + \Delta t\}}{ \Pr\{T > t\} \Delta t}\\
  &= \lim_{\Delta t \to 0} \left(\frac{F(t + \Delta t) - F(t)}{\Delta t}\right)
      \frac{1}{ 1 - F(t) }\\
  &= \frac{f(t)}{S(t)},
\end{align}
$$
where $f$ is the pdf, $F$ is the cdf, and $S$ is the survival function of $T$.

The concept of the failure rate or hazard function is the basis of
many important lifetime models in survival analysis and reliability theory.
In what follows, we consider the dynamic failure rate (DFR) family of
distributions, which can be a function of both time and any other
predictors or covariates, i.e., $h(t, x)$ may change with respect to both
$t$ and the covariate vector $x$.

## Dynamic failure rate (DFR) distribution

Now, we consider the case where the failure rate is not constant. In fact,
we generalize it to be a function of both time and any other predictors, say
$t$ and $x$ where $x$ is a vector of covariates $(x_1,\ldots,x_p)$. We denote this function
hazard function by $h(t, x)$.

First, we define the cumulative hazard function. It is defined as
$$
H(t, x) = \int_0^t h(u, x) \, du,
$$
which is the expected number of failures up to time $t$. This is useful
in its own right, but it is also useful in defining the survival function.
The survival function, $\Pr\{T > t\}$, is given by
$$
S(t|x) = \exp\left(-\int_0^t h(s, x) \, ds\right).
$$

There is a well-known relationship between the survival function, the
harzard function, and the pdf,
$$
  h(t, x) =   \frac{f(t|x)}{S(t|x)}.
$$
We may rewrite the above expression to define the pdf in terms of the
hazard and survival functions,
$$
f(t|x) = h(t, x) S(t|x).
$$

#### Common failure rate distributions

In what follows, we show how two well-known distributions, the exponential
and the Weibull, are defined in this framework.

1. Exponential distribution:
   $$
     h(t, x) = h(t) = \lambda.
   $$

2. Weibull distribution:
   $$
     h(t, x) = h(t) =
     \left(\frac{k}{\lambda}\right)\left(\frac{t}{\lambda}\right)^{k-1}.
   $$

Of course, we can parameterize any family of distributions in terms of
the hazard function. Sometimes, the parameterization yields an analytical
form for the cumulative hazard function, which is useful in computing
the other related distribution functions, like the survival and pdf.

Even simple hazard functions, when combined with other predictors, can
yield complex distribution functions, but first let's consider a simpler
case where the hazard function is only a function of some other predictor
$x$,
$$
h(t, x) = h(x) = \exp(\beta_0 + \beta_1 x),
$$
which is just a family of hazard function for an exponential distribution with
rate $\lambda = \exp(\beta_0 + \beta_1 x)$. This is pretty straightforward
to estimate, and in fact may be rewritten as:
$$
h(x) = h_0 \exp(\beta_1 x),
$$
where $h_0$ is the same for each observation, and $\beta_1$ is the coefficient
for the predictor $x$. We can generalize $h_0$ to be a function of $t$, in which
case we have the proportional hazards model, which is straightforward to
estimate,
$$
h(t,x) = h_0(t) \exp(\beta_1 x).
$$

Things become more complicated if we have a hazard function  in which $t$
and $x$ interact, e.g., 
$$
h(t, x) = \exp(\beta0 + \beta_1 t + \beta_1 x + \beta_{1 2} x t).
$$
In this case, we no longer have a proportional hazards model, and the estimation of
the parameters is more complicated.

### Likelihood function and maximum likelihood estimation

The likelihoon function is a sufficient statistic for the parameters of
the distribution. The likelihood function for an observation of an exact
failure time is given by
$$
\ell_i(\theta|t_i,x_i) = f(t_i|x_i,\theta),
$$
where $\theta$ is the vector of parameters of the distribution, $t_i$ 
is the exact failure time, and $x_i$ is the vector of predictors 
for the $i$th observation in the sample. We denote $l_i$ as a shorthand
for $\ell_i(\theta|t_i,x_i)$, and we call this a likelihood contribution.

If we have a sample in which $n$ are exact failure times, the total
likelihood function over these observations is given by
$$
\ell(\theta|t,X) = \prod_{i=1}^n \ell_i(\theta|t_i,X_{i:}),
$$
where $t$ is the vector of failure times, $X$ is an $n \times p$ matrix
of predictors, and $X_{i:}$ is the $i$-th row of $X$, a $1 \times p$
vector of predictors for the $i$th row of $X$ corresponding to the $i$th
observation in the exact failure time sample.

It is more convenient to work with the log-likelihood function normally,
so we define the log-likelihood contribution as
$$
L_i = \log \ell_i.
$$
A particularly nice feature of the log-likelihood function is that it
is additive, so the total log-likelihood function is given by
$$
L(\theta|t,X) = \sum_{i=1}^n L_i,
$$
and thus by the CLT, the log-likelihood function is asymptotically
normally distributed. This comes in handy for the MLE, which is
asymptotically normally distributed with a mean equal to the true
parameter value $\theta$ and a variance equal to the inverse of the Fisher
information matrix, assuming the following regularity conditions hold:

(1) The support of the distribution does not depend on the parameter
    $\theta$.
(2) The parameter space is open.
(3) The gradient of the log-likelihood function with respect to
    $\theta$ exists and is continuous.
(4) The expectation of the gradient of the log-likelihood function
    with respect to $\theta$ is zero.

For our DFR (dynamic failure rate) model, the $i$-th log-likelihood contribution
for the exact failure time is given by
\begin{align}
\ell_i &= h(t_i, x_i) S(t_i|x_i)\\
       &= h(t_i, x_i) \exp\left(-H(t_i, x_i)\right)
\end{align}
where $h$ is the hazard function, $S$ is the survival function, and $H$
is the cumulative hazard function.
The log-likelihood function is given by
$$
\begin{align}
\ell_i &= \log \ell_i\\
       &= \log h(t_i, x_i) - H(t_i, x_i),
\end{align}
$$
which has a nice form if $H$ has an analytical form, which is the case
for many of the well-known distributions, but otherwise we can use the
definition of $H$ to compute it numerically.


## Simulation
We construct a data frame showing mean and variance computed in different ways.

### Installation

We load the R package `dfr.dist` which contains all of the
relevant material in this section with:

```{r setup}
if (!require(dfr.dist)) {
  devtools::install_github("queelius/dfr_dist")
}
library(dfr.dist)
library(algebraic.dist)
```


```{r}
expdist <- dfr_dist(rate = function(t, par, ...) par, par = 1)
is_dfr_dist(expdist)
params(expdist, 100)
params(expdist)
h <- hazard(expdist)
h
```


```{r eval=F}
n <- 1000

sup(expdist)

mu.hat <- expectation(
  expdist,
  function(x) x, n = n)$value

(var.hat <- expectation(
  dfr.dist,
  function(x) (x - mu.hat)^2, n = n)$value)

df <- data.frame(
  analytical = c(mean(dfr.dist),
                 vcov(dfr.dist)),
  sample = c(mean(data), var(data))
)
row.names(df) <- c("mean", "variance")
print(df)
```