-
Notifications
You must be signed in to change notification settings - Fork 0
/
failure_rate.Rmd
262 lines (224 loc) · 8.2 KB
/
failure_rate.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
---
title: "Failure rate distributions"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{Failure rate distributions}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Introduction
One of the most important quantities in reliability theory and
survival analysis is the failure rate, often denoted by $\lambda$.
For a system with a random lifetime $T$, its failure rate over some
interval $(t, t + \Delta t)$ is the ratio of the probability of
failure in that interval $(t, t + \Delta t)$ given that the
system survived up to the time $t$ divided by the length
of the interval $\Delta t$.
That is,
$$
\lambda = \frac{\Pr(T \leq t + \Delta t | T > t)}{\Delta t}.
$$
For instance, if we have a system with a failure rate of $\lambda$
over the interval $(t, t + \Delta t)$, then the probability of failure
over this interval, given that it has survived up to time $t$, is
$\lambda \Delta t$.
Of course, for most systems, the failure rate changes over time. For
instance, a system may be more likely to fail when it is new, or when
it is old.
At the limit, as $\Delta t \to 0$, we have what is known as the
instaneous failure rate, or the hazard function,
$$
h(t) = \lim_{\Delta t \to 0}
\frac{\Pr\{T \leq t + \Delta t | T > t\}}{\Delta t}.
$$
We can rewrite the above expression as
$$
\begin{align}
h(t)
&= \lim_{\Delta t \to 0}
\frac{\Pr\{t < T \leq t + \Delta t\}}{ \Pr\{T > t\} \Delta t}\\
&= \lim_{\Delta t \to 0} \left(\frac{F(t + \Delta t) - F(t)}{\Delta t}\right)
\frac{1}{ 1 - F(t) }\\
&= \frac{f(t)}{S(t)},
\end{align}
$$
where $f$ is the pdf, $F$ is the cdf, and $S$ is the survival function of $T$.
The concept of the failure rate or hazard function is the basis of
many important lifetime models in survival analysis and reliability theory.
In what follows, we consider the dynamic failure rate (DFR) family of
distributions, which can be a function of both time and any other
predictors or covariates, i.e., $h(t, x)$ may change with respect to both
$t$ and the covariate vector $x$.
## Dynamic failure rate (DFR) distribution
Now, we consider the case where the failure rate is not constant. In fact,
we generalize it to be a function of both time and any other predictors, say
$t$ and $x$ where $x$ is a vector of covariates $(x_1,\ldots,x_p)$. We denote this function
hazard function by $h(t, x)$.
First, we define the cumulative hazard function. It is defined as
$$
H(t, x) = \int_0^t h(u, x) \, du,
$$
which is the expected number of failures up to time $t$. This is useful
in its own right, but it is also useful in defining the survival function.
The survival function, $\Pr\{T > t\}$, is given by
$$
S(t|x) = \exp\left(-\int_0^t h(s, x) \, ds\right).
$$
There is a well-known relationship between the survival function, the
harzard function, and the pdf,
$$
h(t, x) = \frac{f(t|x)}{S(t|x)}.
$$
We may rewrite the above expression to define the pdf in terms of the
hazard and survival functions,
$$
f(t|x) = h(t, x) S(t|x).
$$
#### Common failure rate distributions
In what follows, we show how two well-known distributions, the exponential
and the Weibull, are defined in this framework.
1. Exponential distribution:
$$
h(t, x) = h(t) = \lambda.
$$
2. Weibull distribution:
$$
h(t, x) = h(t) =
\left(\frac{k}{\lambda}\right)\left(\frac{t}{\lambda}\right)^{k-1}.
$$
Of course, we can parameterize any family of distributions in terms of
the hazard function. Sometimes, the parameterization yields an analytical
form for the cumulative hazard function, which is useful in computing
the other related distribution functions, like the survival and pdf.
Even simple hazard functions, when combined with other predictors, can
yield complex distribution functions, but first let's consider a simpler
case where the hazard function is only a function of some other predictor
$x$,
$$
h(t, x) = h(x) = \exp(\beta_0 + \beta_1 x),
$$
which is just a family of hazard function for an exponential distribution with
rate $\lambda = \exp(\beta_0 + \beta_1 x)$. This is pretty straightforward
to estimate, and in fact may be rewritten as:
$$
h(x) = h_0 \exp(\beta_1 x),
$$
where $h_0$ is the same for each observation, and $\beta_1$ is the coefficient
for the predictor $x$. We can generalize $h_0$ to be a function of $t$, in which
case we have the proportional hazards model, which is straightforward to
estimate,
$$
h(t,x) = h_0(t) \exp(\beta_1 x).
$$
Things become more complicated if we have a hazard function in which $t$
and $x$ interact, e.g.,
$$
h(t, x) = \exp(\beta0 + \beta_1 t + \beta_1 x + \beta_{1 2} x t).
$$
In this case, we no longer have a proportional hazards model, and the estimation of
the parameters is more complicated.
### Likelihood function and maximum likelihood estimation
The likelihoon function is a sufficient statistic for the parameters of
the distribution. The likelihood function for an observation of an exact
failure time is given by
$$
\ell_i(\theta|t_i,x_i) = f(t_i|x_i,\theta),
$$
where $\theta$ is the vector of parameters of the distribution, $t_i$
is the exact failure time, and $x_i$ is the vector of predictors
for the $i$th observation in the sample. We denote $l_i$ as a shorthand
for $\ell_i(\theta|t_i,x_i)$, and we call this a likelihood contribution.
If we have a sample in which $n$ are exact failure times, the total
likelihood function over these observations is given by
$$
\ell(\theta|t,X) = \prod_{i=1}^n \ell_i(\theta|t_i,X_{i:}),
$$
where $t$ is the vector of failure times, $X$ is an $n \times p$ matrix
of predictors, and $X_{i:}$ is the $i$-th row of $X$, a $1 \times p$
vector of predictors for the $i$th row of $X$ corresponding to the $i$th
observation in the exact failure time sample.
It is more convenient to work with the log-likelihood function normally,
so we define the log-likelihood contribution as
$$
L_i = \log \ell_i.
$$
A particularly nice feature of the log-likelihood function is that it
is additive, so the total log-likelihood function is given by
$$
L(\theta|t,X) = \sum_{i=1}^n L_i,
$$
and thus by the CLT, the log-likelihood function is asymptotically
normally distributed. This comes in handy for the MLE, which is
asymptotically normally distributed with a mean equal to the true
parameter value $\theta$ and a variance equal to the inverse of the Fisher
information matrix, assuming the following regularity conditions hold:
(1) The support of the distribution does not depend on the parameter
$\theta$.
(2) The parameter space is open.
(3) The gradient of the log-likelihood function with respect to
$\theta$ exists and is continuous.
(4) The expectation of the gradient of the log-likelihood function
with respect to $\theta$ is zero.
For our DFR (dynamic failure rate) model, the $i$-th log-likelihood contribution
for the exact failure time is given by
\begin{align}
\ell_i &= h(t_i, x_i) S(t_i|x_i)\\
&= h(t_i, x_i) \exp\left(-H(t_i, x_i)\right)
\end{align}
where $h$ is the hazard function, $S$ is the survival function, and $H$
is the cumulative hazard function.
The log-likelihood function is given by
$$
\begin{align}
\ell_i &= \log \ell_i\\
&= \log h(t_i, x_i) - H(t_i, x_i),
\end{align}
$$
which has a nice form if $H$ has an analytical form, which is the case
for many of the well-known distributions, but otherwise we can use the
definition of $H$ to compute it numerically.
## Simulation
We construct a data frame showing mean and variance computed in different ways.
### Installation
We load the R package `dfr.dist` which contains all of the
relevant material in this section with:
```{r setup}
if (!require(dfr.dist)) {
devtools::install_github("queelius/dfr_dist")
}
library(dfr.dist)
library(algebraic.dist)
```
```{r}
expdist <- dfr_dist(rate = function(t, par, ...) par, par = 1)
is_dfr_dist(expdist)
params(expdist, 100)
params(expdist)
h <- hazard(expdist)
h
```
```{r eval=F}
n <- 1000
sup(expdist)
mu.hat <- expectation(
expdist,
function(x) x, n = n)$value
(var.hat <- expectation(
dfr.dist,
function(x) (x - mu.hat)^2, n = n)$value)
df <- data.frame(
analytical = c(mean(dfr.dist),
vcov(dfr.dist)),
sample = c(mean(data), var(data))
)
row.names(df) <- c("mean", "variance")
print(df)
```