-
Notifications
You must be signed in to change notification settings - Fork 0
/
lab1.Rmd
239 lines (173 loc) · 7.02 KB
/
lab1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
---
title: "Lab 1"
author: "[Student Name]"
date: "05-19-20"
output:
html_document:
toc: true
number_sections: true
toc_float: true
df_print: paged
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = NA)
```
# Function `limit_e()`
Create a function called `limit_e()` that takes one argument, `n`. Argument
`n` should be a vector of integers greater than zero. Function `limit_e()` will
compute and return the evaluated quantity <br>
$$\bigg(1 + \frac{1}{n}\bigg)^n.$$
<br>
From calculus you know that the mathematical constant $e$ is defined as
<br>
$$e = \lim_{n \to \infty}\bigg(1 + \frac{1}{n}\bigg)^n.$$
<br>
```{r limit-e-fcn}
```
After you write your function, test it with the following function calls.
What do you notice happens? Why is this happening? **Be sure to remove the**
**chunk option `eval=FALSE` to see the output when you knit your document.**
```{r e-test, eval=FALSE}
limit_e(100)
limit_e(1000000)
limit_e(c(1, 1000000, 100000000000))
limit_e(c(1, 1000000, 1000000000000000000))
```
*You do not need to build checks in this function*.
# Pareto distribution
## Introduction
R provides functions that return useful characteristics of many common
probability distributions. The naming convention for these functions is a
prefix, which identifies what the function does, followed by an abbreviation
of the probability distribution's name. These prefixes are:
+ `p` for "probability", the cumulative distribution function (CDF)
+ `q` for "quantile", the inverse CDF
+ `d` for "density", the density function (PDF)
+ `r` for "random", a random variable having the specified distribution.
For the normal distribution, these functions are `pnorm`, `qnorm`, `dnorm`,
and `rnorm`, where the norm portion reminds us this is for the normal
distribution. For the binomial distribution, these functions are `pbinom`,
`qbinom`, `dbinom`, and `rbinom`. Click [here](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Distributions.html)
for a list of probability distributions in the `Base` R package.
The [Pareto distribution](https://en.wikipedia.org/wiki/Pareto_distribution)
is not available in base R, so we're going to code it ourselves. For this
lab, we'll just code the quantile function, i.e., `qpareto()`.
Here's a bit of background on deriving the Pareto's quantile function.
The Pareto family of distributions is parameterized by $\alpha$ and $x_0$, and
has probability density function
\[
f(x) = \begin{cases}
\frac{(\alpha - 1)x_0^{\alpha - 1}}{x^{\alpha}}, &x > x_0,\\
0, &x \leq x_0.
\end{cases}
\]
From the PDF it is relatively easy to compute the CDF, which is given by
\[
F(x) = \begin{cases}
0 & x < x_0\\
1 - \left(\frac{x_0}{x} \right)^{\alpha - 1} & x \geq x_0.
\end{cases}
\]
The quantile function is defined for $0 \le p \le 1$, and it returns the
value $x_p$ such that $F(x_p) = p$. For the Pareto distribution,
the quantile function is given by
\[
Q(p) = Q(p, \alpha, x_0) = {x_0}{(1-p)^{-\frac{1}{\alpha - 1}}}.
\]
Using the definition of $Q(p)$, we can compute the $p$th quantile for
specific values of $p$. For example, here are the medians ($0.5$ quantiles)
of Pareto distributions with $x_0 = 1, \alpha = 3.5$;
$x_0 = 6\times 10^8, \alpha = 2.34$; and the $0.92$ quantile of the
Pareto distribution with $x_0 = 1\times 10^6, \alpha = 2.5$.
```{r examples, results='hold'}
1 * (1 - 0.5) ^ (-1 / (3.5 - 1))
6e8 * (1 - 0.5) ^ (-1 / (2.34 - 1))
1e6 * (1 - 0.92) ^ (-1 / (2.5 - 1))
```
It would be helpful to have a function that automated this process,
both so we don't have to remember the form of the quantile function for the
Pareto distribution, and so we avoid making mistakes.
We will build our function, `qpareto()`, in a sequence of steps.
## Step 1
### `qpareto_1`
Write a function called `qpareto_1()` that takes arguments
`p`, `alpha`, and `x0` and returns $Q(p, \alpha, x_0)$ as defined above.
Check to make sure your function returns the same answers as the three above.
```{r}
```
```{r qpareto_1, eval=FALSE, results='hold'}
qpareto_1(p = 0.5, alpha = 3.5, x0 = 1)
qpareto_1(p = 0.5, alpha = 2.34, x0 = 6e8)
qpareto_1(p = 0.92, alpha = 2.5, x0 = 1e6)
```
## Step 2
### `qpareto_2()`
Most of the quantile functions in R have an argument `lower.tail` that is
either `TRUE` or `FALSE`. If `TRUE`, the function returns the $p$th quantile.
If `FALSE`, the function returns the $(1-p)$th quantile, i.e., returns the
value $x_p$ such that $F(x_p) = 1 - p$.
Create a function `qpareto_2()` that has an additional argument `lower.tail`
which is by default set to `TRUE`. Your `qpareto_2` function should test
whether `lower.tail` is `FALSE`. If it is `FALSE`, the function should replace
$p$ by $1-p$. Then pass either $p$ or $1-p$ to `qpareto_1()` to compute the
appropriate quantile, i.e., `qpareto_1()` is called from inside of
`qpareto_2()`. Test your function with the two function calls below.
```{r}
```
```{r qpareto_2-examples, eval = FALSE, results='hold'}
qpareto_2(p = 0.5, alpha = 3.5, x0 = 1)
qpareto_2(p = 0.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
```
There is a downside to writing the function the way we have.
We need `qpareto_1()`
to be in the work space when `qpareto_2()` is called,
but there is a big advantage.
If we discover a better way to calculate quantiles of the Pareto
distribution, we
can rewrite `qpareto_1()` and the new version will automatically
be used in `qpareto_2()`.
## Step 3
### `qpareto()`
Next, let's add some check with regards to the function's arguments. In the
case of the Pareto quantile function, we need $0\leq p\leq 1$, $\alpha > 1$,
and $x_0 > 0$.
Write a function named `qpareto()` that adds these checks to your code from
function `qpareto_2()`.
R Markdown will not compile if your R function stops due to the `stopifnot()`
function. You can, and should, tell the offending R code chunks to ignore the
stop call, by including `error=TRUE` as a chunk option.
Test your function on the five function calls below.
Remember to set the chunk option `error=TRUE` so your document will knit.
```{r}
```
```{r qpareto-examples, eval=FALSE}
qpareto(p = 0.5, alpha = 3.5, x0 = 1)
qpareto(p = 0.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
qpareto(p = 1.08, alpha = 2.5, x0 = 1e6, lower.tail = FALSE)
qpareto(p = 0.5, alpha = 0.5, x0 = -4)
qpareto(p = 0.5, alpha = 2, x0 = -4)
```
*Is your function vectorized?*
# Arithmetic gone awry
R, as does most software, uses floating point arithmetic, which is not
the same as the arithmetic we know. Computers cannot represent all
numbers exactly.
For your mind to be blown, run the following examples.
```{r cpu-arithmetic, eval=FALSE}
# example 1
0.2 == 0.6 / 3
# example 2
point3 <- c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4)
point3
point3 == 0.3
```
To work around these issues, use `all.equal()` for checking the equality of
two double quantities in R.
```{r cpu-arithmetic2, eval=FALSE}
# example 1, all.equal()
all.equal(0.2, 0.6 / 3)
# example 2, all.equal()
point3 <- c(0.3, 0.4 - 0.1, 0.5 - 0.2, 0.6 - 0.3, 0.7 - 0.4)
point3
all.equal(point3, rep(.3, length(point3)))
```