# Intro

This page introduces the algorithms and data structures section.

## Prefix sum

The prefix sum of a sequence $x_1, x_2, \ldots, x_n$ is the sequence $y_1, y_2, \ldots, y_{n+1}$, where

$$y_i = \sum^{i-1}_{j=1} x_j, \quad i = \overline{1, n+1}.$$

In particular, $y_1 = 0$ (an empty sum), and $y_i = y_{i-1} + x_{i-1}$ for $i = \overline{2, n+1}$.

The following table illustrates the idea.

| $i$      | $1$      | $2$                         | $3$                                     | $4$                                       | $\ldots$                                   | $n$                                        | $n+1$                                     |
|----------|----------|-----------------------------|------------------------------------------|--------------------------------------------|---------------------------------------------|---------------------------------------------|---------------------------------------------|
| $x_i$    | $x_1$    | $x_2$                       | $x_3$                                    | $x_4$                                      | $\ldots$                                    | $x_n$                                      |                                             |
| $y_i$    | $0$      | $x_1$                       | $x_1 + x_2 = y_2 + x_2$                  | $x_1 + x_2 + x_3 = y_3 + x_3$              | $\ldots$                                    | $\sum^{n-1}_{j=1} x_j = y_{n-1} + x_{n-1}$ | $\sum^{n}_{j=1} x_j = y_n + x_n$          |


The prefix sum can be computed from the original sequence in $O(n)$ time.

---

The following cell shows how to compute the prefix sum.

In [None]:
def count_prefix_sum(seq: list[float | int]) -> list[float | int]:
    # y_1 = 0; each next element adds one more x: y_i = y_{i-1} + x_{i-1}
    ans: list[float | int] = [0]
    for x in seq:
        ans.append(ans[-1] + x)
    return ans

count_prefix_sum([1, 2, 3, 4])

[0, 1, 3, 6, 10]

### Range sum

The sum of the elements from position $a$ to position $b$ inclusive ($1 \leq a \leq b \leq n$) can be computed with the formula:

$$\sum_{i=a}^b x_i = y_{b + 1} - y_a$$

With the prefix sum precomputed, each such range sum takes $O(1)$ time.

To **prove** this, expand the difference $y_{b+1} - y_a$ using the definition of $y_i$:

$$y_{b+1} - y_a = \sum_{i=1}^{b} x_i - \sum_{i=1}^{a-1} x_i = \sum_{i=a}^b x_i$$
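A minimal sketch of the $O(1)$ query in Python. It assumes that positions $a$ and $b$ are 1-based as in the formula, while Python lists are 0-based, so $y_i$ is stored at index $i - 1$. The helper name `range_sum` is hypothetical, and the prefix list is built with `itertools.accumulate` instead of `count_prefix_sum` only to keep the block self-contained.

```python
from itertools import accumulate


def range_sum(prefix: list[int], a: int, b: int) -> int:
    """Sum of x_a, ..., x_b (1-based, inclusive), given prefix = [y_1, ..., y_{n+1}]."""
    # y_i lives at index i - 1, so y_{b+1} is prefix[b] and y_a is prefix[a - 1]
    return prefix[b] - prefix[a - 1]


# Prefix sum of [1, 2, 3, 4]: [0, 1, 3, 6, 10]
prefix = [0, *accumulate([1, 2, 3, 4])]
print(range_sum(prefix, 2, 4))  # 2 + 3 + 4 = 9
```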


### Transformation

The prefix sum becomes much more versatile when it is built over a transformed sequence $f(x_1), f(x_2), \ldots, f(x_n)$ rather than over the original one.

For example, if the task requires counting the number of elements $x_i > k$ for $i=\overline{a,b}$, apply the following transformation:

$$
f(x_i) = \begin{cases}
    1, x_i > k; \\
    0, x_i \leq k.
\end{cases}
$$

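Then build the prefix sum over the transformed sequence: its range sum over $[a, b]$ is exactly the number of elements $x_i > k$ with $i = \overline{a, b}$, and each such query takes $O(1)$ time.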
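A sketch of this counting example, assuming the same 1-based query bounds $a, b$ as in the range-sum formula. The names `build_greater_prefix` and `count_greater` are hypothetical; the point is that the transformation $f$ is applied once while building the prefix sum, after which each query is a single subtraction.

```python
from itertools import accumulate


def build_greater_prefix(seq: list[int], k: int) -> list[int]:
    # Apply f(x) = 1 if x > k else 0, then take the prefix sum [y_1, ..., y_{n+1}]
    return [0, *accumulate(1 if x > k else 0 for x in seq)]


def count_greater(prefix: list[int], a: int, b: int) -> int:
    # Number of x_i > k for i = a..b (1-based, inclusive): y_{b+1} - y_a
    return prefix[b] - prefix[a - 1]


seq = [5, 1, 8, 3, 9, 2]
prefix = build_greater_prefix(seq, k=4)  # [0, 1, 1, 2, 2, 3, 3]
print(count_greater(prefix, 2, 4))  # among 1, 8, 3 only 8 exceeds 4 -> 1
```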

## Two pointers

The two-pointers technique uses two indices (pointers) that move through the array within the same loop.

**Example task**:

Given a sorted array of numbers, count all pairs of elements $(a, b)$ for which $a - b > D$.

---

The following cell shows how to solve this task with the two-pointers technique.

In [10]:
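def find_pair(lst: list[int], D: int) -> int:
    l, r = 0, 1
    count = 0
    lst_len = len(lst)
    while r < lst_len:
        if l < r and lst[r] - lst[l] > D:
            # lst is sorted, so every r' >= r also gives lst[r'] - lst[l] > D
            count += lst_len - r
            l += 1
        else:
            # The difference is too small (or l caught up with r): advance r
            r += 1
    return count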

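The idea is to compare the elements at indices $l$ and $r$.

If $x_r - x_l > D$, then all pairs $(l, r), (l, r+1), \ldots, (l, N-1)$ (0-based indices, $N - r$ pairs in total) satisfy the condition: the array is sorted, so $x_{r+t} \geq x_r$ and therefore $x_{r+t} - x_l \geq x_r - x_l > D$. Count them and move the $l$ pointer.

Otherwise, move $r$.

Each pointer only moves forward and passes each index at most once, so the loop makes at most $2N$ iterations, i.e. the algorithm runs in $O(N)$ time.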

The following cell shows the result of applying the algorithm.

In [None]:
lst = [1, 2, 3, 4, 5, 6]
find_pair(lst, 2)

6
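To check `find_pair` on small inputs, a quadratic brute force that tries every pair is handy. Below is a sketch (the name `count_pairs_brute_force` is hypothetical). It relies on `itertools.combinations`, which yields pairs in their original order, so for a sorted list the second element of each pair is never smaller than the first.

```python
from itertools import combinations


def count_pairs_brute_force(lst: list[int], D: int) -> int:
    # Every pair of positions i < j is checked once: O(N^2) comparisons
    return sum(1 for first, second in combinations(lst, 2) if second - first > D)


print(count_pairs_brute_force([1, 2, 3, 4, 5, 6], 2))  # 6
```

Because it is this simple, the brute force can serve as a reference when testing the $O(N)$ version on random sorted lists.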

## Binary search

Binary search is a fast way to find an item in a **sorted** list.

Here’s how it works:

1. Look at the middle element.
2. If it’s what you want, you’re done.
3. If it’s smaller than what you want, search the right half.
4. If it’s bigger, search the left half.
5. Repeat until you find the item or the list is empty.

Binary search is quick because it halves the list each step. It works only if the list is **already sorted**.

For a more detailed and strict description, check the [binary search](binary_search.ipynb) page.

---

The following code shows an implementation of binary search that finds the index of a given element. At each step, it updates either the **left** (`l`) or the **right** (`r`) boundary based on the value at `(l + r) // 2`. Each step halves the open interval $(l, r)$ that can still contain the element, so the search takes $O(\log n)$ steps.

In [5]:
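def find_index(lst: list[int], num: int) -> int | None:
    # Search the open interval (l, r); positions -1 and len(lst) act as sentinels
    l, r = -1, len(lst)
    while l + 1 < r:
        p = (l + r) // 2
        if lst[p] == num:
            return p
        elif lst[p] < num:
            l = p  # num can only be to the right of p
        else:
            r = p  # num can only be to the left of p
    return None  # the interval is empty: num is not in lst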

The following cell shows that it works.

In [7]:
find_index([1, 3, 7, 12, 88, 104], 1)

0

## Event sort

The event sort approach is used for tasks involving events that occur at different times (or any other orderable entity). It typically assumes a system with multiple states, where events are used to transition between these states.

The following picture illustrates a typical task solved with event sort:

<svg width=700 height=200 stroke="black" stroke-width="3" style="font-family: 'LatinModern'" font-style="italic" font-size=30>
  <line x1=0 y1=150 x2=600 y2=150 />
  <line x1=600 y1=150 x2=570 y2=160 />
  <line x1=600 y1=150 x2=570 y2=140 />
  <g stroke-width="2">
    <line x1=30 y1=130 x2=130 y2=130 />
    <line x1=160 y1=130 x2=300 y2=130 />
    <line x1=190 y1=110 x2=480 y2=110 />
    <line x1=320 y1=130 x2=440 y2=130 />
    <line x1=350 y1=110 x2=530 y2=110 />
    <g stroke-dasharray="4">
    <line x1=30 y1=130 x2=30 y2=150 />
      <line x1=130 y1=130 x2=130 y2=150 />
      <line x1=160 y1=130 x2=160 y2=150 />
      <line x1=300 y1=130 x2=300 y2=150 />
      <line x1=190 y1=110 x2=190 y2=150 />
      <line x1=480 y1=110 x2=480 y2=150 />
      <line x1=320 y1=130 x2=320 y2=150 />
      <line x1=440 y1=130 x2=440 y2=150 />
      <line x1=350 y1=110 x2=350 y2=150 />
      <line x1=530 y1=110 x2=530 y2=150 />
    </g>
  </g>
  <circle cx=30 cy=130 r=2 />
  <circle cx=130 cy=130 r=2 />
  <circle cx=160 cy=130 r=2 />
  <circle cx=300 cy=130 r=2 />
  <circle cx=190 cy=110 r=2 />
  <circle cx=480 cy=110 r=2 />
  <circle cx=320 cy=130 r=2 />
  <circle cx=440 cy=130 r=2 />
  <circle cx=350 cy=110 r=2 />
  <circle cx=530 cy=110 r=2 />
  <g stroke-width=0>
    <text x=580 y=180>t</text>
    <text x=25 y=175>t</text><text x=35 y=180 font-size=15>1</text>
    <text x=125 y=175>t'</text><text x=135 y=180 font-size=15>1</text>
    <text x=155 y=175>t</text><text x=165 y=180 font-size=15>2</text>
    <text x=295 y=175>t'</text><text x=305 y=180 font-size=15>2</text>
    <text x=185 y=175>t</text><text x=195 y=180 font-size=15>3</text>
    <text x=475 y=175>t'</text><text x=485 y=180 font-size=15>3</text>
    <text x=315 y=175>t</text><text x=325 y=180 font-size=15>4</text>
    <text x=435 y=175>t'</text><text x=445 y=180 font-size=15>4</text>
    <text x=345 y=175>t</text><text x=355 y=180 font-size=15>5</text>
    <text x=525 y=175>t'</text><text x=535 y=180 font-size=15>5</text>
  </g>
</svg>

The typical input is an array of $(t_i, t'_i)$ pairs, representing the start and end of some process. The values $t_i$ and $t'_i$ can have a variety of physical meanings — for example, the time a customer spends in a shop or on a website, distances between objects, and so on.

The main idea of the approach is to iterate over the $(t_i, t'_i)$ pairs and create a sequence of events $(\tau_j, \text{type}_j, \overline{x}_j)$, where:

* $\tau_j$ is the timestamp of the event.
* $\text{type}_j$ indicates the type of the event — typically one value for the start of a process and another for the end.
* $\overline{x}_j$ is a vector containing additional information about the event, which varies depending on the specific task.

The array of events $(\tau_j, \text{type}_j, \overline{x}_j)$ is then sorted, usually lexicographically, so events with lower $\tau_j$ go first. Ordering by $\text{type}_j$ matters when timestamps coincide: assign lower type values to the events that must be processed first (for example, this decides whether a process ending at $\tau$ and another starting at $\tau$ count as overlapping).

A second loop then iterates through the sorted events and updates the state according to the contents of each event.
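A minimal sketch of the approach on a classic task: the maximum number of processes running at the same time. Here the state is just the number of active processes, and the extra vector $\overline{x}_j$ is empty. The sketch assumes half-open intervals $[t_i, t'_i)$, so a process ending at $\tau$ and another starting at $\tau$ do not overlap; that is why end events get the lower type and are processed first at equal timestamps. The function name `max_overlap` is hypothetical.

```python
END, START = 0, 1  # END < START: at equal timestamps, ends are processed first


def max_overlap(intervals: list[tuple[int, int]]) -> int:
    # First loop: turn every (t, t') pair into two events (tau, type)
    events: list[tuple[int, int]] = []
    for start, end in intervals:
        events.append((start, START))
        events.append((end, END))
    events.sort()  # by timestamp, then by type

    # Second loop: replay the events, tracking how many processes are active
    active = best = 0
    for _, e_type in events:
        if e_type == START:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best


print(max_overlap([(30, 130), (160, 300), (190, 480), (320, 440), (350, 530)]))  # 3
```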

---

As an example, consider the program that generates the SVG illustration above. Its input is a list of pairs: the beginning and the end of each line segment.

The difficulty is that overlapping segments (segments $i$ and $j$ with $t_i < t_j$ overlap when $t'_i > t_j$) must be drawn at different levels, while a level released by a segment that has ended should be reused by the next one. The program handles this with events: at each start event it remembers the current level of the new segment and moves the level for the following segments one step up; at each end event it moves the current level back to the ending segment's level if that level is lower.
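The level rule is the subtle part of this task. Restoring only the level of the segment that has just ended does not check the segments that are still active, so two overlapping segments can land on the same level: in the picture above, segments 3 and 5 both sit at $y = 110$ although they overlap on $[350, 480]$. A minimal sketch of a stricter rule keeps the released levels in a min-heap (`heapq`) and always reuses the lowest free one. The function name `assign_levels` and the level numbering (0 is the level closest to the axis) are assumptions of this sketch; in the program's coordinates, level $\ell$ would correspond to $y = 130 - 20\ell$.

```python
import heapq

START, END = 0, 1  # same convention as the program: at equal x, starts go first


def assign_levels(segments: list[tuple[int, int]]) -> list[int]:
    """Give every (start, end) segment a level so that overlapping segments
    never share one; level 0 is the closest to the axis."""
    events: list[tuple[int, int, int]] = []
    for i, (start, end) in enumerate(segments):
        events.append((start, START, i))
        events.append((end, END, i))
    events.sort()

    levels = [0] * len(segments)
    free: list[int] = []  # min-heap of levels released by finished segments
    next_level = 0  # lowest level that has never been used
    for _, e_type, i in events:
        if e_type == START:
            if free:
                levels[i] = heapq.heappop(free)  # reuse the lowest free level
            else:
                levels[i] = next_level  # open a new level
                next_level += 1
        else:
            heapq.heappush(free, levels[i])  # the segment's level is free again
    return levels


coordinates = [(30, 130), (160, 300), (190, 480), (320, 440), (350, 530)]
print([130 - 20 * level for level in assign_levels(coordinates)])  # [130, 130, 110, 130, 90]
```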

In [None]:
from typing import Callable
template = """
<svg width=700 height=200 stroke="black" stroke-width="3" style="font-family: 'LatinModern'" font-style="italic" font-size=30>
  <line x1=0 y1=150 x2=600 y2=150 />
  <line x1=600 y1=150 x2=570 y2=160 />
  <line x1=600 y1=150 x2=570 y2=140 />
  <g stroke-width="2">
    {lines}
    <g stroke-dasharray="4">
    {projections}
    </g>
  </g>
  {circles}
  <g stroke-width=0>
    <text x=580 y=180>t</text>
    {coord_denotes}
  </g>
</svg>
"""

coordinates = [(30, 130), (160, 300), (190, 480), (320, 440), (350, 530)]

# Building an events sequence
START, END = 0, 1
events: list[tuple[int, int, int, int]] = []
for i, (start, end) in enumerate(coordinates):
    events.append((start, START, end, i))
    events.append((end, END, 0, i))

events.sort()

positions: dict[int, int] = {}
y_coord = 130

lines: list[str] = []
line_template = "<line x1={x1} y1={y} x2={x2} y2={y} />" 
circles: list[str] = []
circle_template = "<circle cx={x} cy={y} r=2 />"
projections: list[str] = []
projection_template = "<line x1={x} y1={y} x2={x} y2=150 />"
coord_denote: Callable[[int, int, str], str] = lambda x, i, text: (
    f"<text x={x} y=175>{text}</text>"
    f"<text x={x + 10} y=180 font-size=15>{i}</text>"
)
coord_denotes: list[str] = []

# Perform actions according to events
for x_coord, e_type, end, i in events:
    if e_type == START:
        lines.append(line_template.format(x1=x_coord, y=y_coord, x2=end))
        circles.append(circle_template.format(x=x_coord, y=y_coord))
        circles.append(circle_template.format(x=end, y=y_coord))
        projections.append(projection_template.format(x=x_coord, y=y_coord))
        projections.append(projection_template.format(x=end, y=y_coord))
        coord_denotes.append(coord_denote(x=x_coord - 5, i=(i + 1), text="t"))
        coord_denotes.append(coord_denote(x=end - 5, i=(i + 1), text="t'"))

        positions[i] = y_coord
        y_coord -= 20
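    else:
        # End event: if the finished segment sat lower than the current level,
        # the next segment starts from the finished segment's level
        if y_coord < positions[i]:
            y_coord = positions[i]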

print(template.format(
    lines="\n    ".join(lines),
    projections="\n      ".join(projections),
    circles="\n  ".join(circles),
    coord_denotes="\n    ".join(coord_denotes)
))


<svg width=700 height=200 stroke="black" stroke-width="3" style="font-family: 'LatinModern'" font-style="italic" font-size=30>
  <line x1=0 y1=150 x2=600 y2=150 />
  <line x1=600 y1=150 x2=570 y2=160 />
  <line x1=600 y1=150 x2=570 y2=140 />
  <g stroke-width="2">
    <line x1=30 y1=130 x2=130 y2=130 />
    <line x1=160 y1=130 x2=300 y2=130 />
    <line x1=190 y1=110 x2=480 y2=110 />
    <line x1=320 y1=130 x2=440 y2=130 />
    <line x1=350 y1=110 x2=530 y2=110 />
    <g stroke-dasharray="4">
    <line x1=30 y1=130 x2=30 y2=150 />
      <line x1=130 y1=130 x2=130 y2=150 />
      <line x1=160 y1=130 x2=160 y2=150 />
      <line x1=300 y1=130 x2=300 y2=150 />
      <line x1=190 y1=110 x2=190 y2=150 />
      <line x1=480 y1=110 x2=480 y2=150 />
      <line x1=320 y1=130 x2=320 y2=150 />
      <line x1=440 y1=130 x2=440 y2=150 />
      <line x1=350 y1=110 x2=350 y2=150 />
      <line x1=530 y1=110 x2=530 y2=150 />
    </g>
  </g>
  <circle cx=30 cy=130 r=2 />
  <circle cx=130 cy=130 r=2 /