Sqrt Decomposition is a method / data structure that allows you to perform some commpn operations in O( sqrt(n))
operations, which is faster than O(n)
for the trivial algorithm.
The common problems:
-
finding sum of the elements in sub-array,
-
finding minimal / maximal element, etc.
-
First we describe the data structure for one of the simplest applications of this idea,
-
then show how to generalize it to solve some other problems,
-
and finally look at a slightly different use of this idea:
splitting the input requests into sqrt blocks.
Given an array a[0... ... n-1]
, implement a data structure that allows to find the sum of the elements a[l---r]
for arbitrary l
and r
in O(sqrt(n))
operations.
The basic idea of sqrt decomposition is preprocessing. We'll divide the array a
into blocks of length approximately sqrt(n)
, and for each block i
we'll precalculate the sum of elements in it b[i]
.
We can assume that both the size of the block and the number of blocks are equal to sqrt(n)
rounded up:
s = floor[ sqrt(n) ]
Then the array a
is divided into blocks:
b[0] = a[0] + a[1] + ... + a[s-1],
b[1] = a[s] + a[s+1]+... + a[2s-1],
... ... ...
b[s-1] = ... ... + a[n-1],
The last block may have fewer elements than the others (if n not a multiple of s), it is not important to the discussion (as it can be handled easily). Thus, for each block k, we know the sum of elements on it b[k]:
b[k] = \sum\limits_{i=k\cdot s}^{\min {(n-1,(k+1)\cdot s - 1})} a[i]
Thus, in order to calculate the sum of elements on the interval [l, r] we only need to sum the elements of the two "tails": t1 and t2 , and sum the values b[i] in all the blocks from k+1 to p-1:
\sum\limits\_{i=l}^r a[i] = \sum\limits\_{i=l}^{(k+1) \cdot s-1} a[i] + \sum\limits\_{i=k+1}^{p-1} b[i] + \sum\limits\_{i=p\cdot s}^r a[i]
Note: When k = p
, i.e. l and r belong to the same block, the formula can't be applied, and the sum should be calculated trivially.
This approach allows us to significantly reduce the number of operations. Indeed, the size of each "tail" does not exceed the block length s
, and the number of blocks in the sum does not exceed s
. Since we have chosen s ~~ sqrt(n)
, the total number of operations required to find the sum of elements on the interval [l,r] is sqrt(n)
.
// input data
int n;
vector<int> a (n);
// preprocessing
int len = (int) sqrt (n + .0) + 1; // size of the block and the number of blocks
vector<int> b (len);
for (int i=0; i<n; ++i)
b[i / len] += a[i];
// answering the queries
for (;;) {
int l, r;
// read input data for the next query
int sum = 0;
for (int i=l; i<=r; )
if (i % len == 0 && i + len - 1 <= r) {
// if the whole block starting at i belongs to [l, r]
sum += b[i / len];
i += len;
}
else {
sum += a[i];
++i;
}
}
This one has too much division operations. Better
int sum = 0;
int c_l = l / len, c_r = r / len;
if (c_l == c_r)
for (int i=l; i<=r; ++i)
sum += a[i];
else {
for (int i=l, end=(c_l+1)*len-1; i<=end; ++i)
sum += a[i];
for (int i=c_l+1; i<=c_r-1; ++i)
sum += b[i];
for (int i=c_r*len; i<=r; ++i)
sum += a[i];
}
So far we were discussing the problem of finding the sum of elements of a continuous subarray. This problem can be extended to allow to update individual array elements. If an element a[i] changes, it's sufficient to update the value of b[k] for the block to which this element belongs (k = i/s) in one operation:
b[k] += a_{new}[i] - a_{old}[i]
On the other hand, the task of finding the sum of elements can be replaced with the task of finding minimal/maximal element of a subarray.
If this problem has to address individual elements' updates as well, updating the value of b[k] is also possible, but it will require iterating through all values of block k in O(s) = O(sqrt(n))
operations.
Sqrt decomposition can be applied in a similar way to a whole class of other problems:
- finding the number of zero elements,
- finding the first non-zero element,
- counting elements which satisfy a certain property etc.
Another class of problems appears when we need to update array elements on intervals
: increment existing elements or replace them with a given value.
For example, let's say we can do two types of operations on an array: add a given value
Finally, those two classes of problems can be combined if the task requires doing both element updates on an interval and queries on an interval. Both operations can be done with
There exist other problems which can be solved using sqrt decomposition, for example, a problem about maintaining a set of numbers which would allow adding/deleting numbers, checking whether a number belongs to the set and finding
A similar idea, based on sqrt decomposition, can be used to answer range queries (
The idea is to answer the queries in a special order based on the indices. We will first answer all queries which have the left index in block 0, then answer all queries which have left index in block 1 and so on. And also we will have to answer the queries of a block is a special order, namely sorted by the right index of the queries.
As already said we will use a single data structure. This data structure will store information about the range. At the beginning this range will be empty. When we want to answer the next query (in the special order), we simply extend or reduce the range, by adding/removing elements on both sides of the current range, until we transformed it into the query range. This way, we only need to add or remove a single element once at a time, which should be pretty easy operations in our data structure.
Since we change the order of answering the queries, this is only possible when we are allowed to answer the queries in offline mode.
In Mo's algorithm we use two functions for adding an index and for removing an index from the range which we are currently maintaining.
void remove(idx); // TODO: remove value at idx from data structure
void add(idx); // TODO: add value at idx from data structure
int get_answer(); // TODO: extract the current answer of the data structure
int block_size;
struct Query {
int l, r, idx;
bool operator<(Query other) const
{
return make_pair(l / block_size, r) <
make_pair(other.l / block_size, other.r);
}
};
vector<int> mo_s_algorithm(vector<Query> queries) {
vector<int> answers(queries.size());
sort(queries.begin(), queries.end());
// TODO: initialize data structure
int cur_l = 0;
int cur_r = -1;
// invariant: data structure will always reflect the range [cur_l, cur_r]
for (Query q : queries) {
while (cur_l > q.l) {
cur_l--;
add(cur_l);
}
while (cur_r < q.r) {
cur_r++;
add(cur_r);
}
while (cur_l < q.l) {
remove(cur_l);
cur_l++;
}
while (cur_r > q.r) {
remove(cur_r);
cur_r--;
}
answers[q.idx] = get_answer();
}
return answers;
}
Based on the problem we can use a different data structure and modify the add/remove/get_answer functions accordingly. For example if we are asked to find range sum queries then we use a simple integer as data structure, which is
For answering mode-queries, we can use a binary search tree (e.g. map<int, int>) for storing how often each number appears in the current range, and a second binary search tree (e.g. set<pair<int, int>>) for keeping counts of the numbers (e.g. as count-number pairs) in order. The add method removes the current number from the second BST, increases the count in the first one, and inserts the number back into the second one. remove does the same thing, it only decreases the count. And get_answer just looks at second tree and returns the best value in
Sorting all queries will take
How about the other operations? How many times will the add and remove be called?
Let's say the block size is
If we only look at all queries having the left index in the same block, the queries are sorted by the right index. Therefore we will call add(cur_r) and remove(cur_r) only
The value of cur_l can change by at most
For
- Block size of precisely
$\sqrt{N}$ doesn't always offer the best runtime. For example, if$\sqrt{N}=750$ then it may happen that block size of$700$ or$800$ may run better. More importantly, don't compute the block size at runtime - make it const. Division by constants is well optimized by compilers. - In odd blocks sort the right index in ascending order and in even blocks sort it in descending order. This will minimize the movement of right pointer, as the normal sorting will move the right pointer from the end back to the beginning at the start of every block. With the improved version this resetting is no more necessary.
You can read about even faster sorting approach here.