# Geometric and Topological Methods in Machine Learning: Exercises 1

### Ex 1: Draw the Voronoi diagram and the Delaunay triangulation of the 10 points of Fig.1

![voronoi](images/ex1.png)

### Ex 2: (Computing the median) Propose an algorithm to compute the median of a set of n real numbers, and analyse its complexity.

Tony Hoare's [Quickselect](https://rcoh.me/posts/linear-time-median-finding/) algorithm finds the k-th smallest element of an unordered list. The median of n numbers is the element of (0-based) index $k = \lfloor n/2 \rfloor$ when n is odd, and the average of the elements of index $n/2 - 1$ and $n/2$ when n is even:

- Pick an element of the **n**-element list uniformly at random. This element is called the **pivot**
- Split the list into 3 groups: elements lesser than, equal to, and greater than the pivot
- Recurse only on the group that contains the element of index **k**
    - if index **k** falls in the equal group, the pivot is the answer
    - if the greater group is chosen, **k** is updated to **k - len(lesser group) - len(equal group)**

```python
import random

def quickselect_median(l):
    """Median of l: middle element if len(l) is odd, else mean of the two middle ones."""
    if len(l) % 2 == 1:
        return quickselect(l, len(l) // 2)
    else:
        # Integer division keeps k an int (len(l) / 2 would give a float)
        return 0.5 * (quickselect(l, len(l) // 2 - 1) +
                      quickselect(l, len(l) // 2))


def quickselect(l, k):
    """
    Select the kth smallest element in l (0 based)
    """
    if len(l) == 1:
        assert k == 0
        return l[0]

    pivot = random.choice(l)
    print(pivot)  # trace the chosen pivot

    lows = [el for el in l if el < pivot]
    highs = [el for el in l if el > pivot]
    pivots = [el for el in l if el == pivot]

    if k < len(lows):
        # The kth element is among the elements smaller than the pivot
        return quickselect(lows, k)
    elif k < len(lows) + len(pivots):
        # The kth element is equal to the pivot
        return pivots[0]
    else:
        # Skip the smaller and equal elements and shift k accordingly
        return quickselect(highs, k - len(lows) - len(pivots))
```

```python
import numpy as np

l = np.random.randint(0, 10, size=(11,))
l
```

```
array([5, 2, 5, 9, 7, 6, 8, 5, 4, 8, 6])
```

```python
quickselect_median(l)
```

The three pivots printed during the recursion, followed by the returned median:

```
2
8
6

6
```

<u>Breakdown:</u>

**step 1**
> l = [5, 2, 5, 9, 7, 6, 8, 5, 4, 8, 6]
>
> 11 elements -> we want the 6th smallest element (index k = 5 in the 0-based code)

**step 2**
> random pick: element 2 at index 1
>
> Split between:
>
> $<$ = [], pivot = [2], $>$ = [5, 5, 9, 7, 6, 8, 5, 4, 8, 6]

**step 3**
> the $<$ and pivot groups together hold only 1 element (fewer than 6), so we recurse on $>$
>
> k is updated to 5 (6 - len($<$) - len(pivot) = 6 - 0 - 1): we now want the 5th smallest element of $>$

**step 4**
> random pick: element 8 at index 5 (of $>$)
>
> Split between:
>
> $<$ = [5, 5, 7, 6, 5, 4, 6], pivot = [8, 8], $>$ = [9]

**step 5**
> $<$ holds 7 elements (at least 5), so we recurse on $<$ and k stays 5

**step 6**
> random pick: element 6 at index 3 (of $<$)
>
> Split between:
>
> $<$ = [5, 5, 5, 4], pivot = [6, 6], $>$ = [7]

**step 7**
> $<$ holds only 4 elements (fewer than 5), while $<$ and pivot together hold 6 (at least 5): the 5th smallest element is the pivot, so the algorithm returns 6
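To replay this breakdown deterministically, the hypothetical helper `quickselect_trace` below uses the same partitioning as `quickselect`, written as a loop, with `random.choice` replaced by a given sequence of pivots (here 2, 8, 6, as printed above). It is only a sketch for checking the steps; it records, for each level, the pivot, the 0-based k, and the sizes of the $<$, pivot and $>$ groups.

```python
def quickselect_trace(l, k, pivots_to_use):
    """Quickselect (0-based k) with pivots taken from a given sequence.

    Returns (result, trace), where each trace entry is
    (pivot, k, len(lows), len(pivots), len(highs)) for one recursion level.
    """
    picks = iter(pivots_to_use)
    trace = []
    while True:
        if len(l) == 1:
            assert k == 0
            return l[0], trace
        pivot = next(picks)  # replaces random.choice(l)
        assert pivot in l, "a replayed pivot must belong to the current list"
        lows = [el for el in l if el < pivot]
        highs = [el for el in l if el > pivot]
        pivots = [el for el in l if el == pivot]
        trace.append((pivot, k, len(lows), len(pivots), len(highs)))
        if k < len(lows):
            l = lows                      # target is smaller than the pivot
        elif k < len(lows) + len(pivots):
            return pivots[0], trace       # target equals the pivot
        else:
            k -= len(lows) + len(pivots)  # skip smaller and equal elements
            l = highs


data = [5, 2, 5, 9, 7, 6, 8, 5, 4, 8, 6]
result, trace = quickselect_trace(data, len(data) // 2, [2, 8, 6])
print(result)  # 6
for level in trace:
    print(level)  # (2, 5, 0, 1, 10), then (8, 4, 7, 2, 1), then (6, 4, 4, 2, 1)
```

The 0-based k goes 5, 4, 4, which matches the 1-based ranks 6, 5, 5 used in the steps.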

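<u>Proof of **O(n)** expected complexity:</u>

Each call scans its list a constant number of times, so a call on $m$ elements costs $O(m)$. The pivot is drawn uniformly at random: each element is chosen with probability $\frac{1}{m}$, so the pivot's rank is uniformly distributed and, on average, the split leaves lesser and greater groups of comparable size.

In the idealized case where every pivot halves the list, the recursion operates on successively halved lists and the total work is:

> $\underset{i=0}{\overset{+\infty}{\sum}}\frac{n}{2^i} = 2n$

Without this idealization, the pivot falls in the middle half of the list (rank between $\frac{m}{4}$ and $\frac{3m}{4}$) with probability about $\frac{1}{2}$, and the recursion then continues on at most $\frac{3m}{4}$ elements. On average about 2 calls are needed to shrink the list by a factor $\frac{3}{4}$, so for some constant $c$ the expected total work is at most about:

> $\underset{i=0}{\overset{+\infty}{\sum}} 2c\left(\frac{3}{4}\right)^i n = 8cn$

We have a linear expected complexity:

> $O(n)$

In the worst case (the pivot is always the smallest or largest element), the complexity is $O(n^2)$.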
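The linear behaviour can also be checked empirically. The sketch below assumes that the total number of elements scanned over all recursion levels is a fair proxy for running time; `quickselect_work` is a hypothetical loop version of `quickselect` without the `print`. If the complexity is linear, the ratio `work / n` should stay bounded by a small constant as n grows.

```python
import random


def quickselect_work(l, k):
    """Return (k-th smallest element of l, total number of elements scanned)."""
    work = 0
    while True:
        work += len(l)  # one level scans the current list
        if len(l) == 1:
            return l[0], work
        pivot = random.choice(l)
        lows = [el for el in l if el < pivot]
        highs = [el for el in l if el > pivot]
        n_eq = len(l) - len(lows) - len(highs)  # size of the pivot group
        if k < len(lows):
            l = lows
        elif k < len(lows) + n_eq:
            return pivot, work
        else:
            k -= len(lows) + n_eq
            l = highs


random.seed(0)
for n in [1_000, 10_000, 100_000]:
    data = [random.random() for _ in range(n)]
    value, work = quickselect_work(data, n // 2)
    assert value == sorted(data)[n // 2]  # same answer as sorting
    print(n, work / n)                    # ratio should not grow with n
```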

### Ex 3: (Building a kd-tree) Using the previous question, write the recurrence relationship for the construction of a standard kd-tree in dimension d. (Hint: adapt the recurrence seen for Quicksort.)
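A possible answer, sketched under stated assumptions: each node cuts on the median of one coordinate, the axes are used cyclically (axis = depth mod d), and the median is found in expected linear time with Quickselect as in Ex 2. Splitting n points then costs $O(n)$ and, when coordinates are distinct, leaves two halves of about $n/2$ points each, so

$$T(n) = 2\,T\!\left(\frac{n}{2}\right) + O(n), \qquad T(1) = O(1),$$

which is the balanced Quicksort recurrence and solves to $T(n) = O(n \log n)$ in expectation (in the worst case if a deterministic linear-time median such as median of medians is used). Here d only enters through the choice of the axis, as long as points are moved by reference; copying all d coordinates at each level would give $O(dn)$ per level instead. The names `Node`, `select_kth`, `build_kdtree` and `points_in` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point             # point stored at this node (the median)
    axis: int                # coordinate used for the cut
    left: Optional["Node"]   # points with coordinate < point[axis]
    right: Optional["Node"]  # points with coordinate >= point[axis]


def select_kth(values: List[float], k: int) -> float:
    """k-th smallest value (0 based), expected O(len(values)) as in Ex 2."""
    while True:
        pivot = random.choice(values)
        lows = [v for v in values if v < pivot]
        highs = [v for v in values if v > pivot]
        n_eq = len(values) - len(lows) - len(highs)
        if k < len(lows):
            values = lows
        elif k < len(lows) + n_eq:
            return pivot
        else:
            k -= len(lows) + n_eq
            values = highs


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Standard kd-tree: median cut on one coordinate, axes used cyclically."""
    if not points:
        return None
    axis = depth % len(points[0])                                # O(1)
    m = select_kth([p[axis] for p in points], len(points) // 2)  # O(n)
    median_point = next(p for p in points if p[axis] == m)       # O(n)
    left, right, placed = [], [], False
    for p in points:                                             # O(n) split
        if p is median_point and not placed:
            placed = True                                        # kept in the node
        elif p[axis] < m:
            left.append(p)
        else:
            right.append(p)
    # Two recursive calls on about n/2 points each: T(n) = 2T(n/2) + O(n)
    return Node(median_point, axis,
                build_kdtree(left, depth + 1),
                build_kdtree(right, depth + 1))


def points_in(node: Optional[Node]) -> List[Point]:
    """All points stored in a subtree (in-order)."""
    if node is None:
        return []
    return points_in(node.left) + [node.point] + points_in(node.right)
```

Points sharing the median coordinate go to the right subtree, so with many ties the two halves can be unbalanced; the recurrence above assumes distinct coordinates.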

### Ex 4: (kd-trees with cuts maximizing the variance) Consider the idea of replacing the coordinate axes used in standard kd-trees by directions which maximize the variance of the projected points.
- **Explain how to do this using Principal Components Analysis (assuming you know PCA)**
- **Explain the impact of this modification on the complexity of the tree construction**
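A minimal sketch of one PCA cut, assuming the points of a node are given as an (n, d) NumPy array: the cut direction is the eigenvector of the covariance matrix with the largest eigenvalue (the first principal component), and the threshold is the median of the projections, the analogue of the median coordinate. `pca_split` is a hypothetical name. Under these assumptions the comments give the per-node costs: a coordinate cut costs $O(n)$, whereas a PCA cut costs $O(nd^2)$ for the covariance plus $O(d^3)$ for the eigendecomposition, so building a balanced tree goes from $O(n \log n)$ to roughly $O(nd^2 \log n + nd^3)$. Descending the tree also needs an $O(d)$ dot product per node instead of a single coordinate comparison.

```python
import numpy as np


def pca_split(points):
    """Cut an (n, d) array of points across its direction of maximal variance.

    Returns (direction, threshold, left, right): points whose projection on
    `direction` is below `threshold` go left, the others go right.
    """
    centered = points - points.mean(axis=0)         # O(n d)
    cov = centered.T @ centered / len(points)       # O(n d^2)
    _, eigvecs = np.linalg.eigh(cov)                # O(d^3), symmetric matrix
    direction = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order
    proj = points @ direction                       # O(n d)
    threshold = np.median(proj)  # median projection, like the median coordinate
    left = points[proj < threshold]
    right = points[proj >= threshold]
    return direction, threshold, left, right


# Usage: points stretched along (1, 1) should be cut across that direction
rng = np.random.default_rng(0)
t = rng.normal(size=500)
pts = 5 * np.column_stack([t, t]) + rng.normal(scale=0.1, size=(500, 2))
direction, threshold, left, right = pca_split(pts)
print(direction, len(left), len(right))  # direction close to ±(0.707, 0.707), 250 / 250
```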

### Ex 5: (Searching metric trees) Using the triangle inequality, prove the correctness of the method presented to search for the exact nearest neighbor in a metric tree.
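A sketch of the key step, assuming (as in ball trees; the method presented in the course may differ in detail) that each node stores a center $c$ and a radius $r$ such that $d(c, x) \le r$ for every point $x$ below the node. For a query $q$, the triangle inequality and the symmetry of $d$ give a lower bound on the distance from $q$ to any point of the node:

$$
d(q, c) \le d(q, x) + d(x, c) \le d(q, x) + r
\quad\Longrightarrow\quad
d(q, x) \ge d(q, c) - r .
$$

If $d(q, c) - r \ge d_{\text{best}}$, the distance from $q$ to the best candidate found so far, then no point of the node is strictly closer than that candidate, so pruning the node never discards the true nearest neighbor. Every node that is not pruned is explored, so the returned point is an exact nearest neighbor.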