# 3.9-Pattern matching

***

### 22. Pattern Matching

Pattern matching is the ubiquitous problem of finding a specific sequence (a pattern) within a larger body of data (a text). This is the core operation of a "find" command in a text editor, a search for a specific gene in a DNA sequence, or a query in a large database. Quantum computers offer two different types of speedups for this problem: a quadratic speedup for the general worst-case scenario and a more powerful superpolynomial speedup for average-case scenarios.

* **Complexity**: **Varies** (Polynomial to Superpolynomial Speedup)
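    * **Worst-Case**: The algorithm of Ramesh and Vinay, which combines Grover's search with the classical technique of deterministic sampling, finds a pattern of length $m$ in a text of length $n$ in **$\tilde{O}(\sqrt{n} + \sqrt{m})$** time [217]. This is a quadratic speedup over the optimal classical $O(n + m)$ (for example, Knuth-Morris-Pratt). A simpler construction that nests Grover's search over start positions with a Grover-style mismatch check uses $O(\sqrt{n}\sqrt{m})$ queries, compared with $O(nm)$ for the naive classical search.
    * **Average-Case**: A more advanced quantum algorithm solves the problem in **$\tilde{O}(\sqrt{n/m}\, 2^{O(\sqrt{\log m})})$** time for random strings (the one-dimensional case of a more general bound), provided the pattern length $m$ is sufficiently large relative to $\log n$. This provides a **superpolynomial** speedup over the best classical algorithms in this regime [215].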

* **Implementation Libraries**: These are theoretical algorithms demonstrating quantum query complexity. They are **not implemented in standard quantum libraries**.
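Because no library implementation is available, the quickest way to get a feel for the worst-case bounds is to tabulate their leading-order terms. The sketch below is only that: it drops all constants and logarithmic factors, and the function name `cost_estimates` and its dictionary keys are hypothetical labels chosen for this illustration. The `classical_linear` entry assumes a linear-time classical matcher such as Knuth-Morris-Pratt, and `ramesh_vinay` uses the bound from the title of [217].

```python
import math


def cost_estimates(n: int, m: int) -> dict:
    """Leading-order operation counts for text length n and pattern length m.

    Constants and logarithmic factors are dropped, so these numbers only
    indicate scaling, not real running times.
    """
    return {
        # Naive classical search: try every start position, compare m symbols.
        "classical_naive": n * m,
        # Linear-time classical matching (assumption: e.g. Knuth-Morris-Pratt).
        "classical_linear": n + m,
        # Grover over start positions, Grover-style mismatch check inside.
        "nested_grover": math.sqrt(n) * math.sqrt(m),
        # Bound from the title of reference [217].
        "ramesh_vinay": math.sqrt(n) + math.sqrt(m),
    }


if __name__ == "__main__":
    for name, cost in cost_estimates(10**6, 100).items():
        print(f"{name:>18}: {cost:,.0f}")
```

For $n = 10^6$ and $m = 100$ the four entries are $10^8$, about $10^6$, $10^4$, and about $10^3$ respectively, which shows how much of the gap depends on the classical baseline chosen.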

***

### **Detailed Theory 🧠**

The two quantum speedups come from two completely different algorithmic approaches.

**Part 1: The Worst-Case Speedup via Grover's Search**

This approach provides a robust, guaranteed quadratic speedup for any pair of strings.

1.  **Framing as a Search**: The problem can be viewed as a search over all possible starting positions of the pattern $P$ in the text $T$. There are $N = n-m+1$ possible starting indices. We are searching for a "winning" index $i$ where the substring of $T$ starting at $i$ matches $P$.
2.  **The Oracle**: We can construct a quantum oracle that takes an index $i$ as input and checks if the substring $T[i \dots i+m-1]$ is equal to the pattern $P$. If it is, the oracle marks the state as a winner.
3.  **Applying Grover's Algorithm**: We can directly apply **Grover's search algorithm (Algorithm #14)** to the space of all possible starting indices. Since there are approximately $n$ possible starting positions, Grover's algorithm can find the correct one in about $O(\sqrt{n})$ queries.
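4.  **The Full Complexity**: Each query to the oracle involves comparing two strings of length $m$. Done character by character, this costs $O(m)$ per oracle call, for $O(\sqrt{n}\, m)$ in total. The comparison can itself be phrased as a search for a mismatching position and accelerated with Grover's algorithm, which lowers the per-call cost to $O(\sqrt{m})$ and the total to $O(\sqrt{n}\sqrt{m})$, up to logarithmic factors for error reduction. This is a solid quadratic speedup over the naive $O(nm)$ classical approach. The refined algorithm of Ramesh and Vinay improves this further to $\tilde{O}(\sqrt{n} + \sqrt{m})$ [217].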
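The steps above can be sketched as a small classical simulation of Grover's amplitude dynamics over the $N = n-m+1$ start positions. This is an illustration under stated assumptions, not a quantum implementation: the oracle is evaluated classically for every index (so the simulation itself takes $O(nm)$ time), the number of matches is assumed known when choosing the iteration count, and the function name `grover_pattern_search` is hypothetical.

```python
import math


def grover_pattern_search(text: str, pattern: str):
    """Simulate Grover's search over all start positions of `pattern` in `text`.

    Returns (probabilities, iterations), where probabilities[i] is the chance
    of measuring start index i after the final Grover iteration.
    """
    n, m = len(text), len(pattern)
    num_positions = n - m + 1

    # Oracle truth table: which start indices are "winners".
    marked = [text[i:i + m] == pattern for i in range(num_positions)]
    num_marked = sum(marked)
    if num_marked == 0:
        raise ValueError("pattern does not occur; nothing to amplify")

    # Start in the uniform superposition over start indices.
    amps = [1.0 / math.sqrt(num_positions)] * num_positions

    # Standard iteration count: floor(pi/4 * sqrt(N / k)) for k marked items.
    iterations = math.floor(math.pi / 4 * math.sqrt(num_positions / num_marked))

    for _ in range(iterations):
        # Oracle: flip the sign of every matching index.
        amps = [-a if hit else a for a, hit in zip(amps, marked)]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amps) / num_positions
        amps = [2 * mean - a for a in amps]

    return [a * a for a in amps], iterations


if __name__ == "__main__":
    probs, its = grover_pattern_search("ACGTACGTTAGCCATGAC", "CAT")
    best = max(range(len(probs)), key=probs.__getitem__)
    print(f"{its} iterations, most likely index {best}, p = {probs[best]:.3f}")
```

In this example there are 16 candidate positions and one match at index 12, so three iterations suffice to push the success probability to about 0.96, against 1/16 for a uniformly random guess.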



**Part 2: The Average-Case Superpolynomial Speedup via Hidden Shift**

This second, more advanced algorithm achieves a much more dramatic speedup, but only works for "average" or "random" strings, not carefully constructed worst-case inputs.

1.  **A New Perspective**: Instead of treating the problem as a simple search, this algorithm cleverly **reduces it to the Hidden Shift Problem (Algorithm #20)**. This is a highly non-trivial step that connects this practical problem to an abstract algebraic one.
2.  **The Reduction**: At a high level, the algorithm constructs two functions, one based on the pattern $P$ and another based on the text $T$. If the pattern $P$ appears in the text $T$ starting at a secret location $s$, then the function for the text will look like a "shifted" and "noisy" version of the function for the pattern. The problem is now to find this hidden shift $s$.
3.  **Solving a Noisy, Multi-Dimensional Hidden Shift**: The hidden shift problem that arises from this reduction is not a simple, clean one. It has noise (because other parts of the text won't match the pattern) and is multi-dimensional.
4.  **Generalizing Kuperberg's Sieve**: The algorithm to solve this is a powerful generalization of **Kuperberg's sieve algorithm** [66], which was originally developed for the Dihedral Hidden Subgroup Problem. The algorithm is adapted to handle the noise and dimensionality inherent in the pattern matching reduction. For a shift among $N$ possibilities, Kuperberg's sieve runs in time $2^{O(\sqrt{\log N})}$, which is sub-exponential in the input size $\log N$.
5.  **The Speedup**: In this context, the relevant size is the number of bits needed to describe the shift, which is about $\log m$, so the sieve contributes a factor of $2^{O(\sqrt{\log m})}$. This factor grows more slowly than any fixed power of $m$. The full quantum runtime is $\tilde{O}(\sqrt{n/m}\, 2^{O(\sqrt{\log m})})$ [215], where the $\sqrt{n/m}$ factor has the form of a Grover-style search over about $n/m$ candidate regions of the text. For a classical algorithm, searching a random string still takes time polynomial in $n$ and $m$. When the pattern is long enough relative to $\log n$, the quantum algorithm is therefore superpolynomially faster than any classical one.
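The "shifted and noisy" picture from the reduction can be illustrated classically. The sketch below is not the reduction from [215], whose functions are defined differently. It simply plants a random pattern in a random text and counts, for every candidate shift $s$, how many positions of $T[s \dots s+m-1]$ disagree with $P$. The helper names `plant_pattern` and `mismatch_profile`, the four-letter alphabet, and the fixed seed are all assumptions made for this illustration.

```python
import random


def plant_pattern(n: int, m: int, shift: int, alphabet: str = "ACGT", seed: int = 0):
    """Build a random text of length n containing a random pattern at `shift`."""
    rng = random.Random(seed)
    pattern = "".join(rng.choice(alphabet) for _ in range(m))
    text = [rng.choice(alphabet) for _ in range(n)]
    text[shift:shift + m] = pattern  # overwrite a window with the pattern
    return "".join(text), pattern


def mismatch_profile(text: str, pattern: str) -> list:
    """For each candidate shift s, count positions x with text[s + x] != pattern[x]."""
    n, m = len(text), len(pattern)
    return [
        sum(text[s + x] != pattern[x] for x in range(m))
        for s in range(n - m + 1)
    ]


if __name__ == "__main__":
    text, pattern = plant_pattern(n=200, m=20, shift=57)
    profile = mismatch_profile(text, pattern)
    print("mismatches at the planted shift:", profile[57])
    print("average mismatches over all shifts:", sum(profile) / len(profile))
```

At the planted shift the mismatch count is zero. At other shifts of a uniformly random four-letter string, each position disagrees with probability 3/4, so the profile is expected to sit near $0.75\,m$. That background of non-matching positions is the noise a hidden shift solver has to tolerate.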

---

### **Significance and Use Cases 🏛️**

* **Bioinformatics and Data Mining**: Pattern matching is the heart of computational biology (e.g., finding gene sequences in a genome) and large-scale text analysis. The average-case algorithm is particularly tantalizing for these fields, as genomic data and large text corpora often have random-like statistical properties, suggesting that quantum computers could one day offer a major advantage in analyzing these massive datasets.

* **Sophisticated Problem Reduction**: The average-case algorithm is a triumph of theoretical computer science. It shows how a messy, practical problem like pattern matching can be transformed into a highly structured, abstract quantum problem like the Hidden Shift problem. This ability to find and exploit hidden algebraic structure is a recurring theme in many powerful quantum algorithms.

* **Expanding the Quantum Toolkit**: This work is significant for extending the applicability of Kuperberg's sieve algorithm far beyond its original algebraic context. It demonstrates that advanced quantum techniques can be made robust enough to handle the "noise" and complexity of real-world application domains.

---

### **References**

* [217] Ramesh, H., & Vinay, V. (2001). *String matching in $\tilde{O}(\sqrt{n} + \sqrt{m})$ quantum time*. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms.
* [215] Montanaro, A. (2017). *Quantum pattern matching fast on average*. Algorithmica, 77(1), 16-39.
* [66] Kuperberg, G. (2005). *A subexponential-time quantum algorithm for the dihedral hidden subgroup problem*. SIAM Journal on Computing, 35(1), 170-188.