-
Notifications
You must be signed in to change notification settings - Fork 0
/
RA.tex
269 lines (223 loc) · 9.46 KB
/
RA.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
% template from http://nook.cs.ucdavis.edu/~koehl/Teaching/ECS20/homeworks.html
\documentclass[11pt]{article}
\usepackage{amsthm,amsmath, amssymb}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\textheight}{9.0in}
\setlength{\textwidth}{6.5in}
\setlength{\topmargin}{-0.5in}
\usepackage{graphicx}
\usepackage{listings}
\lstset{xleftmargin=.25in} % https://tex.stackexchange.com/questions/10828/
\usepackage{multicol}
\usepackage{courier}
\usepackage{verbatim}
\lstset{basicstyle=\footnotesize\ttfamily,breaklines=true}
\usepackage{fancyhdr}
\usepackage{indentfirst}
\pagestyle{fancy}
\usepackage{hyperref}
% Due ???
\newcommand{\titleVar}{ECS 122B Spring 2019 - Randomized Algorithm Analysis}
\lhead{\titleVar{}} \chead{} \rhead{Eric Li}
\lfoot{} \cfoot{\thepage} \rfoot{}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
\setlength\headsep{0.333in}
\title{\bf \titleVar{}}
\author{Eric Li}
\newcommand{\insertGraph}[3]{
\centerline{\includegraphics[scale=#1]{#2}}
\centerline{#3}
}
\newcommand{\insertTwoGraphs}[3]{
\begin{multicols}{2}
\insertGraph{#1}{#2.png}{#2}
\insertGraph{#1}{#3.png}{#3}
\end{multicols}
}
\begin{document}
\maketitle
\renewcommand\thesubsection{\arabic{section}.\Alph{subsection}}
This analysis adapts the idea in Introduction to Algorithms, third edition,
chapter 7.4.2. % Page 181
Recall the selection algorithm. % Page 216
The non-base case is first partitioning the array, then call RandomizedSelect
on the partition of interest.
The only time the algorithm compares is when partitioning the array, and similar
to Lemma 7.1, we can see that the running time of RandomizedSelect is
$O(n + X)$, where $X$ is the number of comparisons by the Partition algorithm.
Suppose we can sort the input array as $z_1, z_2, ..., z_n$, where $z_1$ is the
smallest. Also, for simplicity, assume that $\forall i \neq j, z_i \neq z_j$.
From similar reason for Quicksort, the time $z_i$ and $z_j$ are compared is at
most $1$. Let $X_{ij} = I\{z_i \text{ is compared to } z_j\}$. There is
$$X = \sum_{i=0}^n{\sum_{j=i+1}^n{X_{ij}}}$$
And since they are independent from each other,
$$E[X] = E\left[\sum_{i=0}^n{\sum_{j=i+1}^n{X_{ij}}}\right]
= \sum_{i=0}^n{\sum_{j=i+1}^n{Pr\{z_i \text{ is compared to } z_j\}}}$$
So we need to compute $Pr\{z_i \text{ is compared to } z_j\}$. The difference
to Quicksort is that this time we have another number $k$, which is the number
of element in the list that we want to find.
There are two cases. When $i \le k \le j$ (Note that there is $i < j$), $z_i$
and $z_j$ get compared only when $z_i$ or $z_j$ is the first vertex chosen
among $z_i, z_{i+1}, ..., z_j$. There are $j - i + 1$ vertices in this list,
so $Pr\{z_i \text{ is compared to } z_j\} = \displaystyle\frac{2}{j-i+1}$.
The second case is when $k$ is not between $i$ and $j$. Due to symmetricity, we
only discuss the case where $k < i < j$ (case 2.a).
Among $z_k, z_{k+1}, ..., z_i, z_{i+1}, ..., z_j$, if one of them other than
$z_i$ and $z_j$ is chosen, $z_i$ and $z_j$ will not be compared.
The reason is that if a number between $k$ and $i$ is chosen, the partition
$z_i$ and $z_j$ belong to will not be recursed on; if a number between $i$ and
$j$ is chosen, $z_i$ and $z_j$ will belong to different partitions. Now, there
are $j - k + 1$ numbers to choose from, and
$Pr\{z_i \text{ is compared to } z_j\} = \displaystyle\frac{2}{j-k+1}$.
Symmetrically, when $i < j < k$ (case 2.b),
$Pr\{z_i \text{ is compared to } z_j\} = \displaystyle\frac{2}{k-i+1}$.
Note that when $i = k$ or $j = k$, the first case is equivalent to the second
case (2.a and 2.b, respectively).
So $$
\begin{aligned}
E[X] & = \sum_{i=0}^n{\sum_{j=i+1}^n{Pr\{z_i \text{ cmp. to } z_j\}}} \\
& = \sum_{i=0}^{k-1}{\sum_{j=i+1}^n{Pr\{z_i \text{ cmp. to } z_j\}}}
+ \sum_{i=k}^n{\sum_{j=i+1}^n{Pr\{z_i \text{ cmp. to } z_j\}}} \\
& = \sum_{i=0}^{k-1}{\left(
\sum_{j=i+1}^k{Pr\{z_i \text{ cmp. to } z_j\}}
+ \sum_{j=k+1}^n{Pr\{z_i \text{ cmp. to } z_j\}}
\right)}
+ \sum_{i=k}^n{\sum_{j=i+1}^n{\displaystyle\frac{2}{j-k+1}}} \\
& = \sum_{i=0}^{k-1}{\left(
\sum_{j=i+1}^k{\displaystyle\frac{2}{k-i+1}}
+ \sum_{j=k+1}^n{\displaystyle\frac{2}{j-i+1}}
\right)}
+ \sum_{i=k}^n{\sum_{j=i+1}^n{\displaystyle\frac{2}{j-k+1}}} \\
& = \sum_{i=0}^{k-1}{\sum_{j=i+1}^k{\displaystyle\frac{2}{k-i+1}}}
+ \sum_{i=0}^{k-1}{\sum_{j=k+1}^n{\displaystyle\frac{2}{j-i+1}}}
+ \sum_{i=k}^n{\sum_{j=i+1}^n{\displaystyle\frac{2}{j-k+1}}} \\
& = \sum_{i=0}^{k-1}{\displaystyle\frac{2\,(k-i)}{k-i+1}}
+ \sum_{i=0}^{k-1}{\sum_{j=k+1}^n{\displaystyle\frac{2}{j-i+1}}}
+ \sum_{j=k+1}^n{\sum_{i=k}^{j-1}{\displaystyle\frac{2}{j-k+1}}} \\
& = \sum_{i=0}^{k-1}{\displaystyle\frac{2\,(k-i)}{k-i+1}}
+ \sum_{i=0}^{k-1}{\sum_{j=k+1}^n{\displaystyle\frac{2}{j-i+1}}}
+ \sum_{j=k+1}^n{\displaystyle\frac{2\,(j-k)}{j-k+1}} \\
& \le \sum_{i=0}^{k-1}{\displaystyle\frac{2\,(k-i+1)}{k-i+1}}
+ \sum_{i=0}^{k-1}{\sum_{j=k+1}^n{\displaystyle\frac{2}{j-i+1}}}
+ \sum_{j=k+1}^n{\displaystyle\frac{2\,(j-k+1)}{j-k+1}} \\
& = \sum_{i=0}^{k-1}{2}
+ \sum_{i=0}^{k-1}{\sum_{j=k+1}^n{\displaystyle\frac{2}{j-i+1}}}
+ \sum_{j=k+1}^n{2} \\
& = 2\,n
+ \sum_{i=0}^{k-1}{\sum_{j=k+1}^n{\displaystyle\frac{2}{j-i+1}}}\\
& = 2\,n
+ \sum_{i=0}^{k-1}{\sum_{j=1}^{n-k}{\displaystyle\frac{2}{j-k-i+1}}}\\
& = 2\,n
+ \sum_{i=-k}^{-1}{\sum_{j=1}^{n-k}{\displaystyle\frac{2}{j-i+1}}}\\
& = 2\,n
+ \sum_{i=1}^k{\sum_{j=1}^{n-k}{\displaystyle\frac{2}{j+i+1}}} \\
& = 2\,n
+ 2\,\sum_{i=1}^k{\sum_{j=1}^{n-k}{\displaystyle\frac{1}{j+i+1}}}
\end{aligned}
$$
\begin{comment}
def check(n=10, k=4) :
a = set()
for i in range(k, n+1) :
for j in range(i+1, n+1) :
a.add((i, j))
b = set()
for j in range(k+1, n+1) :
for i in range(k, j-1+1) :
b.add((i, j))
assert a == b
return a, b
\end{comment}
Now, to prove that $X = O(n)$, we only need to prove that
$$A = \sum_{i=1}^k{\sum_{j=1}^{n-k}{\displaystyle\frac{1}{j+i+1}}} = O(n)$$
\begin{comment}
Since the series does not always grow from $1$, we have to derive something
similar to Equation A.10. % Page 1154
(Note that $\lg$ has a base of $2$, as in Introduction to Algorithms).
$$
\begin{aligned}
\sum_{k=m}^n{\frac{1}{k}}
& \le \sum_{i=\lfloor\lg{m}\rfloor}^{\lfloor\lg{n}\rfloor}{
\sum_{j=0}^{2^i-1}{\frac{1}{2^i+j}}} \\
& \le \sum_{i=\lfloor\lg{m}\rfloor}^{\lfloor\lg{n}\rfloor}{
\sum_{j=0}^{2^i-1}{\frac{1}{2^i}}} \\
& = \sum_{i=\lfloor\lg{m}\rfloor}^{\lfloor\lg{n}\rfloor}{1} \\
& \le \lg{n} - \lg{m} + 1
\end{aligned}
$$
So
$$
\begin{aligned}
\sum_{i=1}^k{\sum_{j=1}^{n-k}{\displaystyle\frac{1}{j+i+1}}}
& \le \sum_{i=1}^k{\lg{(n-k+i+1)} - \lg{(i+2)} + 1} \\
& \le k + \sum_{i=1}^k{\lg{(n-k+i+1)} - \lg{(i+2)}} \\
& = k + \sum_{i=1}^k{\lg{(n-k+i+1)}} - \sum_{i=1}^k{\lg{(i+2)}} \\
& = k + \sum_{i=n-k+2}^{n+1}{\lg{i}} - \sum_{i=3}^{k+2}{\lg{i}} \\
\end{aligned}
$$
We have $k = O(n)$, and let
$$A = \sum_{i=n-k+2}^{n+1}{\lg{i}} - \sum_{i=3}^{k+2}{\lg{i}}$$
We need to prove that $A = O(n)$. We can consider whether the two summation
have overlap items and split into two cases
Case 1: when $n - k > k$ (i.e. $k < n/2$), let $k' = k$, so
$$A = \sum_{i=n-k'+2}^{n+1}{\lg{i}} - \sum_{i=3}^{k'+2}{\lg{i}}$$
Case 2: when $n - k \le k$ (i.e. $k \ge n/2$), let $k' = n - k$
$$
\begin{aligned}
A & = \sum_{i=n-n+k'+2}^{n+1}{\lg{i}} - \sum_{i=3}^{n-k'+2}{\lg{i}} \\
& = \sum_{i=k'+2}^{n+1}{\lg{i}} - \sum_{i=3}^{n-k'+2}{\lg{i}} \\
& = \sum_{i=k'+2}^{n+1}{\lg{i}} - \sum_{i=k'+2}^{n-k'+2}{\lg{i}}
+ \sum_{i=k'+2}^{n-k'+2}{\lg{i}} - \sum_{i=3}^{n-k'+2}{\lg{i}} \\
& = \sum_{i=n-k'+2}^{n+1}{\lg{i}} - \sum_{i=3}^{k'+2}{\lg{i}} \\
\end{aligned}
$$
So we can see that whichever the case is, $\exists k' \le n/2$ such that
$$A = \sum_{i=n-k'+2}^{n+1}{\lg{i}} - \sum_{i=3}^{k'+2}{\lg{i}}$$
Now we can drop the lower term
$$
\begin{aligned}
A & = \sum_{i=n-k'+2}^{n+1}{\lg{i}} - \sum_{i=3}^{k'+2}{\lg{i}} \\
& \le \sum_{i=n-k'+2}^{n+1}{\lg{i}} \\
& \le \sum_{i=n-\lfloor\frac{n}2\rfloor+2}^{n+1}{\lg{i}} \\
& \le \sum_{i=\lceil\frac{n}2\rceil}^{n+1}{\lg{i}} \\
0/0
\end{aligned}
$$
\end{comment}
Due to symmetricity, we can assume that $k \le n/2$, which implies $k \le n-k$.
The summation becomes
$$
\begin{aligned}
A & =
% \frac{1}{3} + \frac{2}{4} + \frac{3}{5} + ... + \frac{k}{k+2} + \frac{k}{k+3}
% + \frac{k}{k+4} + ... + \frac{k}{n-k+2} + \frac{k-1}{k-k+3} + ...
% + \frac{1}{n+1} \\
\sum_{i=1}^k{\sum_{j=1}^{n-k}{\displaystyle\frac{1}{j+i+1}}} \\
& = \sum_{i=1}^k{\sum_{j=1+i}^{n-k+i}{\displaystyle\frac{1}{j+1}}} \\
& = \sum_{i=1}^k{\left(
\sum_{j=1+i}^{k}{\displaystyle\frac{1}{j+1}} +
\sum_{j=k}^{n-k+1}{\displaystyle\frac{1}{j+1}} +
\sum_{j=n-k+1}^{n-k+i}{\displaystyle\frac{1}{j+1}}
\right)} \\
& = \sum_{i=1}^k{\sum_{j=1+i}^{k}{\displaystyle\frac{1}{j+1}}} +
\sum_{i=1}^k{\sum_{j=k}^{n-k+1}{\displaystyle\frac{1}{j+1}}} +
\sum_{i=1}^k{\sum_{j=n-k+1}^{n-k+i}{\displaystyle\frac{1}{j+1}}} \\
& = \sum_{j=2}^k{\sum_{i=1}^{j-1}{\displaystyle\frac{1}{j+1}}} +
\sum_{j=k}^{n-k+1}{\displaystyle\frac{k}{j+1}} +
\sum_{j=n-k+1}^{n-k+k}{\sum_{i=j-n+k}^{k}{\displaystyle\frac{1}{j+1}}} \\
& = \sum_{j=2}^k{\displaystyle\frac{j-1}{j+1}} +
\sum_{j=k}^{n-k+1}{\displaystyle\frac{k}{j+1}} +
\sum_{j=n-k+1}^{n}{\displaystyle\frac{n-j+1}{j+1}} \\
& \le \sum_{j=2}^k{1} +
\sum_{j=k}^{n-k+1}{1} +
\sum_{j=n-k+1}^{n}{1} \\
& \le \sum_{j=2}^n{1} \\
& \le n
\end{aligned}
$$
So $A = O(n)$.
So $E[X] = O(n)$.
So the expected running time of RandomizedSelect is $O(n + X) = O(n)$.
\end{document}