diff --git a/others/preface/preface-en.tex b/others/preface/preface-en.tex index de28ce7f6..b5b785b3f 100644 --- a/others/preface/preface-en.tex +++ b/others/preface/preface-en.tex @@ -434,7 +434,7 @@ \section*{Summary} \vspace{3mm} For example $X$ = [3, 1, 3, 5, 4], the missing number $x = 2$, the duplicated one $y = 3$. We give 4 methods: (1) divide and conquer; (2) pigeon hole sort; (3) sign encoding; and (4) equations. -\textbf{Divide and conquer}: Partition the numbers with the middle point $m = \lfloor \dfrac{1 + n}{2} \rfloor$: the left $as = [a \leq m, a \gets X]$, and the right $bs = [b > m, b \gets X]$. If the length of $|as| < m$, then the missing number is on the left, let $s = 1 + 2 + ... + m = \dfrac{m (m + 1)}{2}$, then $x = s - sum(as)$. We can also calculate the missing one on the right. Let $s' = (m + 1) + (m + 2) + ... + n = \dfrac{(n + m + 1)(n - m)}{2}$, then $y = sum(bs) - s'$. If the length of $|as| > m$, then the duplicated number is on the left. Use the similar method, we calculate the missing number $x = s' - sum(bs)$, and the duplicated number $y = sum(as) - s$. Otherwise if the length $|as| = m$, then there are $m$ numbers not greater than $m$. But we don't know whether they are some permutation of 1, 2, ..., $m$. We can calculate and compare $sum(as)$ and $s$. If equal, then we can drop all numbers on the left, then recursively find $x$ and $y$ on the right; otherwise, we drop the right, recursively find on the left. In recursive finding, we need replace the lower bound of 1 with $l$. Because we halve the list every time, the overall performance is $O(n)$ according to the master theorem. +\textbf{Divide and conquer}: Partition the numbers with the middle point $m = \lfloor \frac{1 + n}{2} \rfloor$: the left is $as = [a \leq m, a \gets X]$; the right is $bs = [b > m, b \gets X]$. If the length of $|as| < m$, then the missing number is on the left. Let $s = 1 + 2 + \dotsb + m = \frac{m (m + 1)}{2}$, then $x = s - sum(as)$. 
The duplicated one is on the right. Let $s' = (m + 1) + (m + 2) + \dotsb + n = \frac{(n + m + 1)(n - m)}{2}$, then $y = sum(bs) - s'$. If $|as| > m$, then the duplicated number is on the left. In the same way, calculate the missing number $x = s' - sum(bs)$ and the duplicated number $y = sum(as) - s$. Otherwise $|as| = m$, and there are $m$ numbers not greater than $m$. To know whether they are a permutation of $1, 2, \dotsc, m$, we compare $sum(as)$ with $s$. If they are equal, drop all numbers on the left and recursively find $x$ and $y$ on the right; otherwise, drop the right and recursively find on the left. In the recursion, we need to replace the lower bound 1 with $l$. Because we halve the list every time, the overall performance is $O(n)$ by the master theorem. \begin{Haskell} missDup xs = solve xs 1 (length xs) where @@ -442,16 +442,16 @@ \section*{Summary} | k > m - l + 1 = (sr - sr', sl' - sl) | sl == sl' = solve bs (m + 1) u | otherwise = solve as l m - where - m = (l + u) `div` 2 - (as, bs) = partition (<=m) xs - k = length as - sl = (l + m) * (m - l + 1) `div` 2 - sr = (m + 1 + u) * (u - m) `div` 2 - (sl', sr') = (sum as, sum bs) + where + m = (l + u) `div` 2 + (as, bs) = partition (<=m) xs + k = length as + sl = (l + m) * (m - l + 1) `div` 2 + sr = (m + 1 + u) * (u - m) `div` 2 + (sl', sr') = (sum as, sum bs) \end{Haskell} -\textbf{Pigeon hole sort}. Since all numbers are within the range from 1 to $n$, we can do pigeon hole sort. Scan from left to right, for every number $x$ at position $i$, if $x \neq i$, we swap it with number $y$ at position $x$. We find the duplicated number if $x = y$, besides, we find the missing number $i$. Repeat this till $x = i$ or meet the duplicated number. Because every number is swapped to its right position a time, the total performance is $O(n)$. +\textbf{Pigeon hole sort}. We apply pigeon hole sort since all numbers are within the range from 1 to $n$.
Scan from left to right: for every number $x$ at position $i$, if $x \neq i$, swap it with the number $y$ at position $x$. If $x = y$, we find the duplicated number; besides, we find the missing number $i$. Repeat this until $x = i$ or we meet the duplicated number. Because every number is swapped to its right position at most once, the total performance is $O(n)$. \begin{Bourbaki} (Int, Int) missDup([Int] xs) { @@ -470,9 +470,10 @@ \section*{Summary} } } return (miss, dup) +} \end{Bourbaki} -\textbf{Sign encoding}. Setup an array of $n$ flags. For every number $x$, mark the $x$-th flag in the array true. When meet the duplicated number, the corresponding flag was marked before. Let the duplicated number be $d$, we know $s = 1 + 2 + ... + n = \dfrac{n (n + 1)}{2}$, and the sum $s'$ of all numbers. We can calculate the missing number $m = d + s - s'$. However, this method need additional $n$ flags. The existence of a number is a type of binary information (yes/no), we can encode it as the positive/negative sign, hence re-use the space. For every $x$, flip the number at position $|x|$ to negative, where $|x|$ is the absolute value. If a number of some position is already negative, it's the duplicated one, and we can next calculate the missing one. +\textbf{Sign encoding}. Set up an array of $n$ flags. For every number $x$, mark the $x$-th flag in the array true. When we meet the duplicated number, its flag has already been marked. Let the duplicated number be $d$, the sum $s = 1 + 2 + \dotsb + n = \frac{n (n + 1)}{2}$, and $s'$ be the sum of all numbers. The missing number is $m = d + s - s'$. However, this method needs $n$ additional flags. The existence of a number is binary information (yes/no), so we encode it as the positive/negative sign and reuse the space. For every $x$, flip the number at position $|x|$ to negative, where $|x|$ is the absolute value.
If a number at some position is already negative, it's the duplicated one, from which we then calculate the missing one. \begin{Bourbaki} (Int, Int) missDup([Int] xs) { @@ -489,9 +490,10 @@ \section*{Summary} xs[j] = -abs(xs[j]) } return (miss, dup) +} \end{Bourbaki} -\textbf{Equation}. Consider a simplified problem: random drop a number after shuffle 1 to $n$, how to find it? We sum all the numbers, then subtract it from $\dfrac{n (n + 1)}{2}$: +\textbf{Equation}. Consider a simplified problem: randomly drop a number after shuffling 1 to $n$; how do we find it? We sum all the remaining numbers, then subtract the sum from $\frac{n (n + 1)}{2}$: \[ m = s - s' @@ -504,27 +506,27 @@ \section*{Summary} \label{eq:miss-dup-1} \ee -Where the left hand is the sum of the $i$-th number minus $i$. Can we figure out a second equation? We can use square: sum the difference between the square of the $i$-th number and the square of $i$: +Where the left-hand side is the sum of the differences between the $i$-th number and $i$. We use squares to obtain the second equation: sum the differences between the square of the $i$-th number and the square of $i$: \be \sum (x[i]^2 - i^2) = d^2 - m^2 = (d + m)(d - m) \label{eq:miss-dup-2} \ee -Since $d - m \neq 0$, we can divide \cref{eq:miss-dup-1} by \cref{eq:miss-dup-2} on both sides to get another equation: +Since $d - m \neq 0$, divide \cref{eq:miss-dup-2} by \cref{eq:miss-dup-1} on both sides to get another equation: \be \sum (x[i]^2 - i^2) / \sum (x[i] - i) = d + m \label{eq:miss-dup-3} \ee -Compare equation \cref{eq:miss-dup-1} and \cref{eq:miss-dup-3}, there are two equations with two unknowns. We can solve them: +Together, \cref{eq:miss-dup-1} and \cref{eq:miss-dup-3} are two equations with two unknowns.
Solve this system: \[ -\begin{cases} -m = \dfrac{1}{2} (\dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} - \sum (x[i] - i)) \\ -d = \dfrac{1}{2} (\dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} + \sum (x[i] - i)) \\ -\end{cases} +\begin{dcases} +m & = \dfrac{1}{2} [\dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} - \sum (x[i] - i)] \\ +d & = \dfrac{1}{2} [\dfrac{\sum (x[i]^2 - i^2)}{\sum (x[i] - i)} + \sum (x[i] - i)] +\end{dcases} \] \begin{Haskell} diff --git a/others/preface/preface-zh-cn.tex b/others/preface/preface-zh-cn.tex index 1f1464b8e..b6422bd85 100644 --- a/others/preface/preface-zh-cn.tex +++ b/others/preface/preface-zh-cn.tex @@ -432,6 +432,7 @@ \section*{小结} } } return (miss, dup) +} \end{Bourbaki} \textbf{符号编码}。假设存在一个长度为$n$的标记数组,对于序列中的每个数字$x$,我们都将标记数组中的第$x$个位置做上标记。当我们遇到重复元素时,我们会发现这个位置上的标记已经做过了。记重复的数字为$d$,我们知道$s = 1 + 2 + ... + n = \dfrac{n (n + 1)}{2}$,以及序列中所有的数字和$s'$。我们可以计算出丢失的数字$m = d + s - s'$。 但是这一方法需要额外长度为$n$的空间用作标记数组。由于数字的存在与否是一种二值化的信息(有、无),我们可以将其编码为数字的正负号,从而复用待查找的数字序列。对于序列中的每个数字$x$,我们将序列中第$|x|$位置上的元素标记为负数,其中$|x|$表示绝对值。如果发现某一位置已经为负了,我们就找到了重复的元素,接下来我们就可以计算出丢失的数字。 @@ -451,6 +452,7 @@ \section*{小结} xs[j] = -abs(xs[j]) } return (miss, dup) +} \end{Bourbaki} \textbf{解方程}。考虑一个简化的问题:给定一个从1到$n$的列表,去掉一个元素,然后打乱序列的顺序,怎样能够快速找出去掉的元素呢?我们可以将列表中所有的元素相加,然后从$\dfrac{n (n + 1)}{2}$减去这一结果就得出了答案。这一思路可以表示为如下方程: