Proofread and corrected errors in preface (EN)
liuxinyu95 committed Jan 28, 2024
1 parent 12f601a commit ac0071c
Showing 2 changed files with 22 additions and 18 deletions.
38 changes: 20 additions & 18 deletions others/preface/preface-en.tex
@@ -434,24 +434,24 @@ \section*{Summary}
\vspace{3mm}
For example, for $X = [3, 1, 3, 5, 4]$, the missing number is $x = 2$ and the duplicated one is $y = 3$. We give 4 methods: (1) divide and conquer; (2) pigeon hole sort; (3) sign encoding; and (4) equations.

-\textbf{Divide and conquer}: Partition the numbers with the middle point $m = \lfloor \dfrac{1 + n}{2} \rfloor$: the left $as = [a \leq m, a \gets X]$, and the right $bs = [b > m, b \gets X]$. If the length of $|as| < m$, then the missing number is on the left, let $s = 1 + 2 + ... + m = \dfrac{m (m + 1)}{2}$, then $x = s - sum(as)$. We can also calculate the missing one on the right. Let $s' = (m + 1) + (m + 2) + ... + n = \dfrac{(n + m + 1)(n - m)}{2}$, then $y = sum(bs) - s'$. If the length of $|as| > m$, then the duplicated number is on the left. Use the similar method, we calculate the missing number $x = s' - sum(bs)$, and the duplicated number $y = sum(as) - s$. Otherwise if the length $|as| = m$, then there are $m$ numbers not greater than $m$. But we don't know whether they are some permutation of 1, 2, ..., $m$. We can calculate and compare $sum(as)$ and $s$. If equal, then we can drop all numbers on the left, then recursively find $x$ and $y$ on the right; otherwise, we drop the right, recursively find on the left. In recursive finding, we need replace the lower bound of 1 with $l$. Because we halve the list every time, the overall performance is $O(n)$ according to the master theorem.
+\textbf{Divide and conquer}: Partition the numbers at the middle point $m = \lfloor \frac{1 + n}{2} \rfloor$: the left is $as = [a \leq m, a \gets X]$; the right is $bs = [b > m, b \gets X]$. If the length $|as| < m$, then the missing number is on the left. Let $s = 1 + 2 + \dotsb + m = \frac{m (m + 1)}{2}$, then $x = s - sum(as)$. The duplicated one is on the right. Let $s' = (m + 1) + (m + 2) + \dotsb + n = \frac{(n + m + 1)(n - m)}{2}$, then $y = sum(bs) - s'$. If the length $|as| > m$, then the duplicated number is on the left. In the same way, calculate the missing number $x = s' - sum(bs)$ and the duplicated number $y = sum(as) - s$. Otherwise, if the length $|as| = m$, then there are $m$ numbers not greater than $m$. To know whether they are some permutation of 1, 2, ..., $m$, we calculate and compare $sum(as)$ and $s$. If they are equal, then drop all numbers on the left and recursively find $x$ and $y$ on the right; otherwise, drop the right and recursively find them on the left. In the recursion, we need to replace the lower bound 1 with $l$ and the upper bound $n$ with $u$. Because we halve the list every time, the running time satisfies $T(n) = T(n/2) + O(n)$, which is $O(n)$ by the master theorem.

\begin{Haskell}
import Data.List (partition)

-- Returns (missing, duplicated) for a list of 1..n in which
-- one number has been replaced by a duplicate of another.
missDup xs = solve xs 1 (length xs) where
  solve xs@(_:_:_) l u | k < m - l + 1 = (sl - sl', sr' - sr)
                       | k > m - l + 1 = (sr - sr', sl' - sl)
                       | sl == sl' = solve bs (m + 1) u
                       | otherwise = solve as l m
    where
      m = (l + u) `div` 2
      (as, bs) = partition (<=m) xs
      k = length as                       -- how many numbers fall in [l, m]
      sl = (l + m) * (m - l + 1) `div` 2  -- expected sum of l..m
      sr = (m + 1 + u) * (u - m) `div` 2  -- expected sum of (m + 1)..u
      (sl', sr') = (sum as, sum bs)       -- actual sums of the two parts
\end{Haskell}

-\textbf{Pigeon hole sort}. Since all numbers are within the range from 1 to $n$, we can do pigeon hole sort. Scan from left to right, for every number $x$ at position $i$, if $x \neq i$, we swap it with number $y$ at position $x$. We find the duplicated number if $x = y$, besides, we find the missing number $i$. Repeat this till $x = i$ or meet the duplicated number. Because every number is swapped to its right position a time, the total performance is $O(n)$.
+\textbf{Pigeon hole sort}. We apply pigeon hole sort since all numbers are within the range from 1 to $n$. Scan from left to right: for every number $x$ at position $i$, if $x \neq i$, swap it with the number $y$ at position $x$. If $x = y$, we have found the duplicated number, and we record $i$ as the missing number. Repeat this until $x = i$ or we meet the duplicated number. Because every number is swapped to its correct position at most once, the total performance is $O(n)$.

\begin{Bourbaki}
(Int, Int) missDup([Int] xs) {
@@ -470,9 +470,10 @@ \section*{Summary}
}
}
return (miss, dup)
+}
\end{Bourbaki}
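
Because part of the listing above is elided in this diff, the following is a minimal, runnable Python sketch of the pigeon hole idea rather than a copy of the author's code. The function name `miss_dup_pigeonhole` is hypothetical, the input is assumed to be a list of length $n$ holding 1 to $n$ with exactly one number replaced by a duplicate, and the final scan for the misplaced position is a simplification chosen here, not something taken from the original listing.

```python
def miss_dup_pigeonhole(xs):
    """Return (missing, duplicated) using pigeon hole (cyclic) sort.

    Assumes xs holds the numbers 1..n with exactly one of them
    replaced by a duplicate of another. Works on a copy of xs.
    """
    xs = list(xs)
    n = len(xs)
    for i in range(n):
        # Keep swapping until slot i holds i + 1, or until the value's
        # home slot already holds the same value (the duplicate).
        while xs[i] != i + 1:
            j = xs[i] - 1              # home slot of the value at slot i
            if xs[j] == xs[i]:         # home already occupied: duplicate
                break
            xs[i], xs[j] = xs[j], xs[i]
    # After sorting, exactly one slot does not hold its own number:
    # its index is the missing number and its content is the duplicate.
    for i, x in enumerate(xs, start=1):
        if x != i:
            return i, x
    return None  # input did not satisfy the assumption


print(miss_dup_pigeonhole([3, 1, 3, 5, 4]))
```

Each swap puts one number into its home slot and a slot that is home is never disturbed again, which is why the total number of swaps stays within $n$.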

-\textbf{Sign encoding}. Setup an array of $n$ flags. For every number $x$, mark the $x$-th flag in the array true. When meet the duplicated number, the corresponding flag was marked before. Let the duplicated number be $d$, we know $s = 1 + 2 + ... + n = \dfrac{n (n + 1)}{2}$, and the sum $s'$ of all numbers. We can calculate the missing number $m = d + s - s'$. However, this method need additional $n$ flags. The existence of a number is a type of binary information (yes/no), we can encode it as the positive/negative sign, hence re-use the space. For every $x$, flip the number at position $|x|$ to negative, where $|x|$ is the absolute value. If a number of some position is already negative, it's the duplicated one, and we can next calculate the missing one.
+\textbf{Sign encoding}. Set up an array of $n$ flags. For every number $x$, mark the $x$-th flag in the array true. When we meet the duplicated number, its flag has already been marked. Let the duplicated number be $d$, the sum $s = 1 + 2 + \dotsb + n = \frac{n (n + 1)}{2}$, and $s'$ be the sum of all numbers. The missing number is $m = d + s - s'$. However, this method needs $n$ additional flags. The existence of a number is binary information (yes/no), so we can encode it as the positive/negative sign and re-use the space. For every $x$, flip the number at position $|x|$ to negative, where $|x|$ is the absolute value. If the number at that position is already negative, then $|x|$ is the duplicated one, from which we then calculate the missing one.

\begin{Bourbaki}
(Int, Int) missDup([Int] xs) {
@@ -489,9 +490,10 @@ \section*{Summary}
xs[j] = -abs(xs[j])
}
return (miss, dup)
+}
\end{Bourbaki}
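
As the listing above is only partly visible in this diff, here is a small Python sketch of the sign encoding idea under the same input assumption (1 to $n$ with one number replaced by a duplicate). The name `miss_dup_sign` is hypothetical; the missing number is computed with the formula $m = d + s - s'$ from the text, and the sketch works on a copy so the caller's list is left untouched.

```python
def miss_dup_sign(xs):
    """Return (missing, duplicated) by using the sign of xs[v - 1]
    as the 'seen' flag for the value v.

    Assumes xs holds 1..n with exactly one number replaced by a
    duplicate of another.
    """
    xs = list(xs)                      # copy: signs are flipped in place
    n = len(xs)
    dup = None
    for i in range(n):
        j = abs(xs[i]) - 1             # slot whose sign flags value |xs[i]|
        if xs[j] < 0:                  # flag already set: |xs[i]| was seen
            dup = j + 1
        else:
            xs[j] = -xs[j]             # set the flag
    s = n * (n + 1) // 2               # s  = 1 + 2 + ... + n
    s1 = sum(abs(x) for x in xs)       # s' = sum of the original numbers
    miss = dup + s - s1                # m = d + s - s'
    return miss, dup


print(miss_dup_sign([3, 1, 3, 5, 4]))
```

The absolute value is needed when reading `xs[i]` because an earlier step may already have flipped that entry to negative.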

-\textbf{Equation}. Consider a simplified problem: random drop a number after shuffle 1 to $n$, how to find it? We sum all the numbers, then subtract it from $\dfrac{n (n + 1)}{2}$:
+\textbf{Equation}. Consider a simplified problem: randomly drop a number after shuffling 1 to $n$; how do we find it? Sum all the remaining numbers, then subtract the sum from $\frac{n (n + 1)}{2}$:

\[
m = s - s'
@@ -504,27 +506,27 @@ \section*{Summary}
\label{eq:miss-dup-1}
\ee

-Where the left hand is the sum of the $i$-th number minus $i$. Can we figure out a second equation? We can use square: sum the difference between the square of the $i$-th number and the square of $i$:
+Where the left-hand side is the sum of the differences between the $i$-th number and $i$. We use squares to obtain a second equation: sum the differences between the square of the $i$-th number and the square of $i$:

\be
\sum (x[i]^2 - i^2) = d^2 - m^2 = (d + m)(d - m)
\label{eq:miss-dup-2}
\ee

-Since $d - m \neq 0$, we can divide \cref{eq:miss-dup-1} by \cref{eq:miss-dup-2} on both sides to get another equation:
+Since $d - m \neq 0$, divide \cref{eq:miss-dup-2} by \cref{eq:miss-dup-1} on both sides to get another equation:

\be
\sum (x[i]^2 - i^2) / \sum (x[i] - i) = d + m
\label{eq:miss-dup-3}
\ee

-Compare equation \cref{eq:miss-dup-1} and \cref{eq:miss-dup-3}, there are two equations with two unknowns. We can solve them:
+Comparing \cref{eq:miss-dup-1} and \cref{eq:miss-dup-3}, we have two equations with two unknowns. Solve this system:

\[
-\begin{cases}
-m = \dfrac{1}{2} (\dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} - \sum (x[i] - i)) \\
-d = \dfrac{1}{2} (\dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} + \sum (x[i] - i)) \\
-\end{cases}
+\begin{dcases}
+m & = \dfrac{1}{2} \left[ \dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} - \sum (x[i] - i) \right] \\
+d & = \dfrac{1}{2} \left[ \dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} + \sum (x[i] - i) \right]
+\end{dcases}
\]
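
The closed-form solution can be checked with a short Python sketch. This is an illustration under the same input assumption as before, not the author's implementation (which follows in Haskell and is cut off in this diff); the name `miss_dup_eq` and the local names `a`, `b`, `t` are hypothetical.

```python
def miss_dup_eq(xs):
    """Return (missing m, duplicated d) by solving
         sum(x[i] - i)     = d - m
         sum(x[i]^2 - i^2) = (d + m)(d - m)
    Assumes xs holds 1..n with exactly one number replaced by a
    duplicate of another, so d - m is never zero.
    """
    a = sum(x - i for i, x in enumerate(xs, start=1))          # d - m
    b = sum(x * x - i * i for i, x in enumerate(xs, start=1))  # d^2 - m^2
    t = b // a                     # d + m; exact because b = (d + m) * a
    return (t - a) // 2, (t + a) // 2


print(miss_dup_eq([3, 1, 3, 5, 4]))
```

For $X = [3, 1, 3, 5, 4]$ the two sums are $1$ and $5$, so $d + m = 5$ and $d - m = 1$, giving $m = 2$ and $d = 3$. Integer division is safe here because both quotients are exact under the stated assumption.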

\begin{Haskell}
2 changes: 2 additions & 0 deletions others/preface/preface-zh-cn.tex
@@ -432,6 +432,7 @@ \section*{小结}
}
}
return (miss, dup)
+}
\end{Bourbaki}

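\textbf{Sign encoding}. Suppose there is a flag array of length $n$. For every number $x$ in the sequence, we mark the $x$-th position of the flag array. When we meet the duplicated element, we find that this position has already been marked. Let the duplicated number be $d$; we know $s = 1 + 2 + ... + n = \dfrac{n (n + 1)}{2}$ and the sum $s'$ of all numbers in the sequence. We can calculate the missing number $m = d + s - s'$. However, this method needs an additional flag array of length $n$. Since the existence of a number is binary information (present or absent), we can encode it as the sign of the number and thereby reuse the sequence being searched. For every number $x$ in the sequence, we mark the element at position $|x|$ as negative, where $|x|$ denotes the absolute value. If some position is found to be negative already, we have found the duplicated element, and we can then calculate the missing number.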
@@ -451,6 +452,7 @@ \section*{小结}
xs[j] = -abs(xs[j])
}
return (miss, dup)
+}
\end{Bourbaki}

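\textbf{Solving equations}. Consider a simplified problem: given a list from 1 to $n$, remove one element and then shuffle the sequence; how can we quickly find the removed element? We can add up all the elements in the list and subtract the result from $\dfrac{n (n + 1)}{2}$ to get the answer. This idea can be expressed as the following equation: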