#LyX 2.3 created this file. For more info see http://www.lyx.org/
\lyxformat 544
\begin_document
\begin_header
\save_transient_properties true
\origin unavailable
\textclass article
\begin_preamble
\newcommand{\fl}{\operatorname{fl}}
\end_preamble
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman "default" "default"
\font_sans "default" "default"
\font_typewriter "default" "default"
\font_math "auto" "auto"
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100 100
\font_tt_scale 100 100
\use_microtype false
\use_dash_ligatures false
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry true
\use_package amsmath 2
\use_package amssymb 2
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\use_minted 0
\index Index
\shortcut idx
\color #008000
\end_index
\topmargin 1in
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\is_math_indent 0
\math_numbering_side default
\quotes_style english
\dynamic_quotes 0
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
Some myths about floating-point arithmetic
\end_layout
\begin_layout Author
Steven G.
Johnson
\end_layout
\begin_layout Standard
(This list is adapted from the
\begin_inset Quotes eld
\end_inset
prevalent misconceptions about floating-point arithmetic
\begin_inset Quotes erd
\end_inset
in William Kahan's 2004 presentation
\begin_inset Quotes eld
\end_inset
How Java’s Floating-Point Hurts Everyone Everywhere.
\begin_inset Quotes erd
\end_inset
)
\end_layout
\begin_layout Standard
As in Trefethen's book, we denote floating-point operations by
\begin_inset Formula $\oplus,\otimes,\ldots$
\end_inset
.
We denote the set of floating-point numbers by
\begin_inset Formula $\mathbb{F}$
\end_inset
, and
\begin_inset Formula $\fl(x)$
\end_inset
denotes the closest element of
\begin_inset Formula $\mathbb{F}$
\end_inset
to
\begin_inset Formula $x\in\mathbb{R}$
\end_inset
.
Assuming
\begin_inset Formula $x$
\end_inset
does not overflow or underflow (exceed the max/min exponent), two key facts
are that
\begin_inset Formula $|\fl(x)-x|\leq\epsilon|x|$
\end_inset
, where
\begin_inset Formula $\epsilon$
\end_inset
is the machine precision, and that (assuming IEEE
\begin_inset Quotes eld
\end_inset
correct rounding
\begin_inset Quotes erd
\end_inset
)
\begin_inset Formula $x\odot y=\fl(x\cdot y)$
\end_inset
for binary operations
\begin_inset Formula $\cdot\in\{\times,\pm,/\}$
\end_inset
.
A third key fact is that
\begin_inset Formula $\mathbb{F}$
\end_inset
is a specific set of rational numbers: integers of at most
\begin_inset Formula $p$
\end_inset
digits multiplied by powers of two (in binary floating point) or by powers of 10 (in decimal floating point), with the exponent confined to a finite range.
\end_layout
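\begin_layout Standard
These facts can be checked with a short Python sketch. It assumes that Python's float is an IEEE double (binary64, with
\begin_inset Formula $p=53$
\end_inset
bits), which is the case on essentially all current platforms. The sketch uses the standard fractions module to obtain exact rational values. The helper fl is a hypothetical name chosen to mirror the notation above. It relies on CPython rounding float(Fraction) to the nearest double. Note that sys.float_info.epsilon is the gap
\begin_inset Formula $2^{-52}$
\end_inset
between 1 and the next double. This is twice the machine precision
\begin_inset Formula $\epsilon=2^{-53}$
\end_inset
used above.
\end_layout
\begin_layout LyX-Code
```python
import sys
from fractions import Fraction


def fl(q: Fraction) -> float:
    # Hypothetical helper mirroring fl(x) above: CPython converts a Fraction
    # to the nearest double (ties to even), i.e. it rounds correctly.
    return float(q)


# Every double is an integer of at most 53 bits times a power of two,
# so its exact value is a rational number with a power-of-two denominator.
num, den = (0.1).as_integer_ratio()
print(num, den)  # 3602879701896397 36028797018963968
assert den == 2**55

# Correct rounding: the computed sum is fl of the exact sum of the stored inputs.
a, b = 0.1, 0.2
assert a + b == fl(Fraction(a) + Fraction(b))

# |fl(x) - x| <= eps*|x| with eps = 2**-53, half of sys.float_info.epsilon.
eps = Fraction(sys.float_info.epsilon) / 2
x = Fraction(1, 10)
assert abs(Fraction(fl(x)) - x) <= eps * abs(x)
```
\end_layout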
\begin_layout Standard
A number of pernicious myths about floating-point arithmetic are prevalent.
They include:
\end_layout
\begin_layout Itemize
An unpredictable random number of order
\begin_inset Formula $\epsilon$
\end_inset
is added to every result.
For example,
\begin_inset Formula $1\oplus1$
\end_inset
may give
\begin_inset Formula $2\pm\epsilon$
\end_inset
, and
\begin_inset Formula $0\otimes x$
\end_inset
may give
\begin_inset Formula $\pm\epsilon$
\end_inset
.
\series bold
False
\series default
.
(e.g.
\begin_inset Formula $1\oplus1$
\end_inset
always gives exactly 2, and
\begin_inset Formula $0\otimes x$
\end_inset
always gives exactly 0 [unless
\begin_inset Formula $x$
\end_inset
is
\begin_inset Formula $\pm$
\end_inset
Inf or NaN], since
\begin_inset Formula $2$
\end_inset
and
\begin_inset Formula $0$
\end_inset
are exactly representable, and correct rounding returns any representable result unchanged.)
\end_layout
\begin_layout Itemize
Integer arithmetic is more accurate than floating-point arithmetic.
\series bold
False
\series default
.
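(See above: integer addition, subtraction, and multiplication are performed exactly in floating point, provided the operands and result are representable, e.g. integers of magnitude at most
\begin_inset Formula $2^{53}$
\end_inset
in double precision.)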
\end_layout
\begin_layout Itemize
Integer arithmetic is much faster than floating-point arithmetic.
\series bold
False
\series default
on any modern general-purpose CPU.
(Possibly true in the 1980s, or on small embedded systems without floating-point hardware.)
\end_layout
\begin_layout Itemize
Computational
\series bold
precision
\series default
(the number of digits stored) is the same thing as the computational
\series bold
accuracy
\series default
.
\series bold
False
\series default
.
(Computed results can be much more accurate than the number of stored digits suggests, e.g. integers are stored exactly, or much less accurate, e.g. due to the accumulation of rounding errors.)
\end_layout
\begin_layout Itemize
\begin_inset Quotes eld
\end_inset
Arithmetic much more precise than the data it operates upon is needless,
and wasteful.
\begin_inset Quotes erd
\end_inset
(Kahan)
\series bold
False
\series default
: even if you only need 3 significant digits in the final result, you may
need many more digits at intermediate steps.
\end_layout
\begin_layout Itemize
Floating-point arithmetic incurs rounding errors in representing typical
decimal fractions, e.g.
0.1 or 3.1415.
\series bold
True
\series default
for
\emph on
binary
\emph default
floating-point, but
\series bold
False
\series default
for
\emph on
decimal
\emph default
floating-point (available in many software libraries and some hardware).
\end_layout
\begin_layout Itemize
\begin_inset Quotes eld
\end_inset
In floating-point arithmetic nothing is ever exactly 0; but if it is, no
useful purpose is served by distinguishing +0 from -0.
\begin_inset Quotes erd
\end_inset
(Kahan)
\series bold
False.
\series default
(Signed zeros are useful for tracking
\emph on
underflow
\emph default
, for indicating which side of a branch cut a value lies on, and in other situations.)
\end_layout
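\begin_layout Standard
The myth is that a random error of order
\begin_inset Formula $\epsilon$
\end_inset
is added to every result. A quick Python check shows otherwise. It assumes that Python's float is an IEEE double, which holds on essentially all current platforms. Exactly representable results are returned exactly. Inexact results are rounded deterministically rather than perturbed at random. The sample values are arbitrary.
\end_layout
\begin_layout LyX-Code
```python
import math

# Exactly representable results are returned exactly.
assert 1.0 + 1.0 == 2.0
for x in [3.7, -1e300, 1e-310, 12345.678]:
    assert 0.0 * x == 0.0        # exactly zero (-0.0 when x < 0, which == 0.0)

# The exceptions: 0 * Inf and 0 * NaN give NaN.
assert math.isnan(0.0 * math.inf)
assert math.isnan(0.0 * math.nan)

# Inexact results are rounded deterministically, not perturbed at random:
# 0.1 + 0.2 always gives the double nearest the exact sum of the stored inputs.
assert 0.1 + 0.2 == 0.30000000000000004
```
\end_layout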
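\begin_layout Standard
The myth is that integer arithmetic is more accurate than floating point. The Python sketch below tests this claim, assuming IEEE doubles. It checks that floating-point addition, subtraction, and multiplication of integers are exact when operands and results stay within
\begin_inset Formula $2^{53}$
\end_inset
in magnitude. It also shows where exactness ends. The randomly chosen operands are only for illustration.
\end_layout
\begin_layout LyX-Code
```python
import random

# Exact while |operands| and |result| <= 2**53 in double precision.
rng = random.Random(0)              # fixed seed so the check is reproducible
for _ in range(10_000):
    a = rng.randint(-2**26, 2**26)
    b = rng.randint(-2**26, 2**26)
    fa, fb = float(a), float(b)     # both exactly representable
    assert fa + fb == a + b         # Python compares int and float exactly
    assert fa - fb == a - b
    assert fa * fb == a * b         # |a * b| <= 2**52, still exact

# Beyond 2**53 not every integer is representable, so results are rounded.
n = 2**53
assert float(n) + 1.0 == float(n)   # n + 1 rounds to n (ties to even)
assert float(n) + 2.0 == n + 2      # n + 2 is representable again
```
\end_layout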
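\begin_layout Standard
The myth is that precision is the same as accuracy. A Python sketch can separate the two, assuming IEEE doubles. The stored precision is fixed at 53 bits. Accuracy, however, can be perfect, as for integers. It can also be much worse, when rounding errors accumulate over many operations or are amplified by cancellation. The cancellation case is an extra illustration chosen here, and the numbers are arbitrary.
\end_layout
\begin_layout LyX-Code
```python
import math
import sys

# Precision: an IEEE double stores p = 53 significant bits (about 16 digits).
assert sys.float_info.mant_dig == 53

# Accuracy can be far better: integers up to 2**53 are stored with no error.
assert float(2**53 - 1) == 2**53 - 1

# ... or far worse: rounding errors accumulate over many operations,
assert sum([0.1] * 10) != 1.0          # naive summation gives 0.9999999999999999
assert math.fsum([0.1] * 10) == 1.0    # fsum returns an accurately rounded sum

# ... and cancellation can expose earlier rounding: here the relative error
# of the computed difference is about 11%, despite 53-bit precision.
d = (1.0 + 1e-15) - 1.0
assert abs(d - 1e-15) / 1e-15 > 0.1
```
\end_layout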
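\begin_layout Standard
Kahan's point that arithmetic should be more precise than the data can be shown with a small, admittedly contrived Python sketch. It uses the standard decimal module because that module lets the working precision be set to exactly 3 significant digits. The one-pass variance formula and the helper name variance_one_pass are hypothetical illustrative choices, not taken from Kahan. The data have only 3 significant digits, and the answer is wanted only to 3 digits. Even so, 3-digit arithmetic returns 0, while 28-digit intermediates give about
\begin_inset Formula $6.67\times10^{-5}$
\end_inset
.
\end_layout
\begin_layout LyX-Code
```python
from decimal import Decimal, localcontext


def variance_one_pass(data, digits):
    # Hypothetical example: population variance as mean(x^2) - mean(x)^2,
    # with every operation rounded to `digits` significant decimal digits.
    with localcontext() as ctx:
        ctx.prec = digits
        n = Decimal(len(data))
        mean = sum(data, Decimal(0)) / n
        mean_sq = sum((x * x for x in data), Decimal(0)) / n
        return mean_sq - mean * mean


data = [Decimal("1.01"), Decimal("1.02"), Decimal("1.03")]  # 3-digit data

# 3-digit arithmetic: every significant digit is lost to rounding.
assert variance_one_pass(data, 3) == 0

# 28-digit arithmetic: about 6.67e-5, correct to far more than 3 digits.
v = variance_one_pass(data, 28)
assert abs(v - Decimal("0.0000667")) < Decimal("0.0000001")
```
\end_layout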
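\begin_layout Standard
The myth is that floating point always incurs rounding errors for decimal fractions. Python's standard decimal module, a software implementation of decimal floating point, allows a minimal comparison with binary doubles. The specific values are illustrative. Decimal floating point still rounds numbers that have no finite decimal expansion, such as 1/3.
\end_layout
\begin_layout LyX-Code
```python
from decimal import Decimal
from fractions import Fraction

# Binary: the literal 0.1 is rounded to a nearby multiple of a power of two.
assert Fraction(0.1) != Fraction(1, 10)
assert 0.1 * 3 != 0.3                      # gives 0.30000000000000004

# Decimal: 0.1 and 3.1415 are stored exactly as integers times powers of 10.
assert Decimal("0.1").as_tuple() == (0, (1,), -1)
assert Decimal("3.1415").as_tuple() == (0, (3, 1, 4, 1, 5), -4)
assert Decimal("0.1") * 3 == Decimal("0.3")

# Decimal floating point still rounds 1/3 (to 28 digits by default).
assert Decimal(1) / Decimal(3) * 3 != 1
```
\end_layout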
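\begin_layout Standard
The myth is that signed zeros serve no useful purpose. The following Python sketch, assuming IEEE doubles and using arbitrary numbers, shows a tiny negative product underflowing to -0.0. The sign of that zero then selects the side of the branch cut along the negative real axis for cmath.phase and cmath.sqrt. Python raises ZeroDivisionError for 1.0 / -0.0, so the sketch uses these functions instead of 1/x.
\end_layout
\begin_layout LyX-Code
```python
import cmath
import math

# A tiny negative product underflows to zero but keeps its sign as -0.0.
y = -1e-200 * 1e-200
assert y == 0.0 and math.copysign(1.0, y) == -1.0

# Functions with a branch cut along the negative real axis use that sign
# to return the value on the correct side of the cut.
assert cmath.phase(complex(-1.0, 0.0)) == math.pi
assert cmath.phase(complex(-1.0, y)) == -math.pi
assert cmath.sqrt(complex(-4.0, 0.0)).imag == 2.0
assert cmath.sqrt(complex(-4.0, y)).imag == -2.0

# -0.0 and +0.0 compare equal, so ordinary comparisons are unaffected.
assert -0.0 == 0.0
```
\end_layout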
\end_body
\end_document