# Homework 5

## Problem 3

### Solution
We show that it is possible to shatter $N = 15$ points with the given hypothesis set (15 parameters and polynomials up to fourth order).

### Step 1:

- We first generate a list of all $2^N = 2^{15}$ dichotomies. To do so, we take the numbers $0, 1, \dots, 2^{15} - 1$ and convert each of them to its binary representation, zero-padded to 15 digits, i.e., we first generate the strings

```
00000 00000 00000    <---- 0
00000 00000 00001    <---- 1
00000 00000 00010    <---- 2
00000 00000 00011    <---- 3
...
11111 11111 11111    <---- 2^15 - 1
```

- These strings are then turned into the $2^{15}$ dichotomies by replacing every character '0' with the integer -1 and every character '1' with the integer +1. This gives lists of the form:

```
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1 -1 -1]
...
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
```
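
The same table can also be built without intermediate strings. The following is only a minimal sketch, assuming NumPy; the helper name `dichotomies_vectorized` is hypothetical and not part of the solution below. It reads the bits of every integer with shifts and then maps 0 to -1 and 1 to +1.

```python
import numpy as np

def dichotomies_vectorized(N):
    # integers 0 .. 2^N - 1 as a column vector
    numbers = np.arange(2**N)[:, None]
    # bit positions, most significant bit (N-1) first
    shifts = np.arange(N - 1, -1, -1)
    bits = (numbers >> shifts) & 1      # shape (2^N, N), entries 0 or 1
    return 2 * bits - 1                 # map 0 -> -1 and 1 -> +1

D = dichotomies_vectorized(3)
print(D.shape)
print(D)
```

The rows come out in the same order as the binary strings above, so row `i` is the dichotomy for the number `i`.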

In [1]:
# STEP 1

import numpy as np
import matplotlib.pyplot as plt 

def get_dichotomies(N):

    # each number from 0 to 2^N-1 is converted into its binary representation
    # with length N (we pad zeros such that the string has length N)
    binaries = []
    for i in range(2**N):
        b = bin(i)[2:].zfill(N)
        binaries.append([int(x) for x in b])


    # Use the binary strings to generate 2^N dichotomies (classifications)
    dichotomies = []
    for binary_list in binaries:
        L = []
        for x in binary_list:
            if x == 0:
                L.append(-1)
            else:
                L.append(1)
        dichotomies.append(np.array(L))

        
    return dichotomies

### Step 2:

- We generate 15 randomly distributed points $(x_1, x_2)$ in the box $[-1,1] \times [-1,1]$. 

- We create the feature matrix $\mathbf{Z}$ (of size 15 x 15), where each row has the form $\mathbf{z} = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3, x_1^4, x_1^3 x_2, x_1^2 x_2^2, x_1 x_2^3, x_2^4)$.
- For each dichotomy $\mathbf{y_{class}}$ (which is a vector of length 15 with $+1$ and $-1$ entries) we compute a weight vector $\mathbf{\tilde{w}}$ via linear regression: $\mathbf{\tilde{w}} = (\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T \mathbf{y_{class}}$
- We then check if that weight vector creates the correct dichotomy via the comparison $\mathbf{y_{class}} == \text{sign}(\mathbf{Z}\mathbf{\tilde{w}})$ .
- If we managed to generate all $2^N = 2^{15}$ distinct dichotomies, then we have successfully shattered $N = 15$ points. 
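
Why regression is enough here: for $N = 15$ the matrix $\mathbf{Z}$ is square, and whenever it is invertible the pseudo-inverse $(\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T$ reduces to $\mathbf{Z}^{-1}$, so $\mathbf{Z}\mathbf{\tilde{w}} = \mathbf{y_{class}}$ holds exactly and the sign check passes. The sketch below illustrates this for a single dichotomy. The helper `quartic_features`, the fixed seed, and the use of `np.linalg.solve` in place of the explicit pseudo-inverse are assumptions made for this illustration only.

```python
import numpy as np

def quartic_features(x1, x2):
    # hypothetical helper: all monomials x1^a * x2^b with a + b <= 4,
    # in the same column order as the feature vector z above (15 columns)
    cols = [x1**(d - k) * x2**k for d in range(5) for k in range(d + 1)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)       # fixed seed only for reproducibility
N = 15
x1 = rng.uniform(-1, 1, N)
x2 = rng.uniform(-1, 1, N)
Z = quartic_features(x1, x2)         # shape (15, 15)

y = rng.choice([-1, 1], size=N)      # one arbitrary dichotomy
w = np.linalg.solve(Z, y)            # solves Z w = y directly

# True whenever Z w reproduces the signs of y
print(np.array_equal(np.sign(Z @ w), y))
```

Solving the square system directly avoids forming $\mathbf{Z}^T \mathbf{Z}$, which squares the condition number; the regression formula gives the same weights in exact arithmetic.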

In [2]:
# STEP 2


N = 15
dichotomies = get_dichotomies(N)


def shatter_N_randomly_distributed_points(N, dichotomies):

    # Create N random points in the box [-1,1] x [-1,1]
    x1 = np.random.uniform(-1, 1, N)
    x2 = np.random.uniform(-1, 1, N)

    # feature matrix Z
    Z = np.array([np.ones(N), x1, x2,
                     x1**2, x1*x2, x2**2,
                     x1**3, x1**2 * x2, x1* x2**2, x2**3,
                      x1**4, x1**3 * x2, x1**2 * x2**2, x1 * x2**3, x2**4]).T

    # see lecture 3, slide 17
    Z_dagger = np.dot(np.linalg.inv(np.dot(Z.T, Z)), Z.T)


    # -----------------------------------------------------

    count_correct_classifications = 0

    # go through all 2^N dichotomies
    for y_class in dichotomies:    

        # Use linear regression to get weight vector
        w_tilde = np.dot(Z_dagger, y_class)

        # check if weight vector generates correct dichotomy
        if sum(y_class != np.sign(np.dot(Z, w_tilde)) ) == 0:
            count_correct_classifications += 1


    return count_correct_classifications    


#------------------------

num_generated_dichotomies = shatter_N_randomly_distributed_points(N, dichotomies)

print("number of correct classifications:", num_generated_dichotomies)

if num_generated_dichotomies == 2**N:
    print("We shattered", N, "points.")
else:
    print("N =", N, "is either a break point or the points were not positioned optimally.")
    


number of correct classifications: 32768
We shattered 15 points.


The program shows that we can shatter 15 points.

### Step 3:

To gain confidence in the correctness of our program, we try to shatter 15 points that all lie on the horizontal line $x_2 = 0.7$. We suspect that these collinear points cannot be shattered (see also [lecture 5](https://youtu.be/SEYAnnLazMU?t=33m1s)).

In [3]:
# STEP 3

# Check if we can shatter 15 points that are all on a line at y = 0.7
N = 15
dichotomies = get_dichotomies(N)


def shatter_N_points_on_a_horizontal_line(N, dichotomies):

    x1 = np.random.uniform(-1, 1, N)
    x2 = np.array([0.7]*N)


    Z = np.array([np.ones(N), x1, x2,
                     x1**2, x1*x2, x2**2,
                     x1**3, x1**2 * x2, x1* x2**2, x2**3,
                      x1**4, x1**3 * x2, x1**2 * x2**2, x1 * x2**3, x2**4]).T


    Z_dagger = np.dot(np.linalg.inv(np.dot(Z.T, Z)), Z.T)


    # -----------------------------------------------------

    count_correct_classifications = 0

    # go through all 2^N dichotomies
    for y_class in dichotomies:    

        # Use linear regression to get weight vector
        w_tilde = np.dot(Z_dagger, y_class)

        # check if weight vector generates correct dichotomy
        if sum(y_class != np.sign(np.dot(Z, w_tilde)) ) == 0:
            count_correct_classifications += 1


    return count_correct_classifications
        

# -----------------------

num_generated_dichotomies = shatter_N_points_on_a_horizontal_line(N, dichotomies) 

print("number of correct classifications:", num_generated_dichotomies)

if num_generated_dichotomies == 2**N:
    print("We shattered", N, "points.")
else:
    print("N =", N, "is either a break point or the points were not positioned well.")
    

number of correct classifications: 10
N = 15 is either a break point or the points were not positioned well.


And indeed, we often cannot generate $2^{15}$ dichotomies with this collinear configuration of 15 points. In fact, for this collinear configuration the number of generated dichotomies appears to be much smaller than $2^{15}$.


Notes:

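- Do not put all the points on the x-axis, because then all $x_2$ equal zero and the feature matrix contains columns of zeros, which makes $\mathbf{Z}^T \mathbf{Z}$ singular. That's why we chose the line at $x_2 = 0.7$. Note, however, that a constant $x_2$ still leaves the columns of $\mathbf{Z}$ linearly dependent (the $x_2$ column is $0.7$ times the column of ones), so $\mathbf{Z}^T \mathbf{Z}$ is singular in exact arithmetic here as well. `np.linalg.inv` presumably returns a result only because of floating-point rounding, so the exact counts for this configuration should be read with caution.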

- Do not enter a value for N that is less than 15. The feature matrix then has fewer rows than columns, i.e. more degrees of freedom than data points, and $\mathbf{Z}^T \mathbf{Z}$ cannot be inverted.
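
A quick way to see how degenerate the collinear configuration is: compute the rank of its feature matrix. This is a sketch under the same setup as Step 3 (15 points with $x_2 = 0.7$); the fixed seed and the use of `np.linalg.matrix_rank` are choices made for this illustration. With constant $x_2$, every column is a constant multiple of one of $1, x_1, x_1^2, x_1^3, x_1^4$, so the rank can be at most 5.

```python
import numpy as np

rng = np.random.default_rng(1)       # fixed seed only for reproducibility
N = 15
x1 = rng.uniform(-1, 1, N)
x2 = np.full(N, 0.7)                 # all points on the line x2 = 0.7

# same 15 monomial features as in the solution, built column by column
Z = np.column_stack([x1**(d - k) * x2**k for d in range(5) for k in range(d + 1)])

print("shape of Z:", Z.shape)
print("numerical rank of Z:", np.linalg.matrix_rank(Z))
```

A rank well below 15 means that $\mathbf{Z}^T \mathbf{Z}$ has no true inverse for these points.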

### Step 4:

Let's check if we can find a weight vector for 15 points on a line if we choose the simple dichotomy $[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]$ . 


This should be possible, right? For example, we can simply put a horizontal line at $y = 5$ as a separating boundary.

Let's see what linear regression finds.

In [4]:
# STEP 4

# We only consider a single dichotomy and try to find out if linear regression
# will find a weight vector that generates this dichotomy
dichotomies = [np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])]
N = 15

# We repeat the experiment 10 times
for i in range(10):    
    num_generated_dichotomies = shatter_N_points_on_a_horizontal_line(N, dichotomies)
    print("number of generated dichotomies:", num_generated_dichotomies)

number of generated dichotomies: 1
number of generated dichotomies: 1
number of generated dichotomies: 0
number of generated dichotomies: 0
number of generated dichotomies: 0
number of generated dichotomies: 0
number of generated dichotomies: 0
number of generated dichotomies: 0
number of generated dichotomies: 0
number of generated dichotomies: 0


**Result:**

Linear regression often does not return a weight vector $\mathbf{\tilde{w}}$ that generates the "simple" dichotomy $[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]$ . It seems that it is often not able to find the "correct" weight vector, e.g. one that is equivalent to a horizontal line as separation boundary.

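One reason may be that linear regression does not try to find the correct classification but instead tries to minimize the squared errors. For this particular dichotomy, however, $\mathbf{\tilde{w}} = (1, 0, \dots, 0)$ already fits the targets with zero squared error, so a more likely cause is numerical: for points on the line $x_2 = 0.7$ the matrix $\mathbf{Z}^T \mathbf{Z}$ is singular (the $x_2$ column is a multiple of the column of ones), and its explicitly computed inverse is unreliable.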
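
To separate the two explanations, one can repeat the experiment with a solver that does not invert $\mathbf{Z}^T \mathbf{Z}$. The sketch below swaps in `np.linalg.lstsq`, which is SVD-based and returns the minimum-norm least-squares solution; this is a different technique from the explicit pseudo-inverse used in the cells above, and the fixed seed is an assumption made for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(2)       # fixed seed only for reproducibility
N = 15
x1 = rng.uniform(-1, 1, N)
x2 = np.full(N, 0.7)                 # collinear points as in Step 3
Z = np.column_stack([x1**(d - k) * x2**k for d in range(5) for k in range(d + 1)])

y = np.ones(N)                       # the "simple" all-(+1) dichotomy

# lstsq handles rank-deficient Z; it never forms or inverts Z^T Z
w, residuals, rank, singular_values = np.linalg.lstsq(Z, y, rcond=None)

print("rank reported by lstsq:", rank)
print("dichotomy reproduced:", np.array_equal(np.sign(Z @ w), y))
```

Because the column of ones lies in the column space of $\mathbf{Z}$, the least-squares fit to the all-(+1) target is exact up to rounding, which suggests that the failures above come from the inversion step rather than from the squared-error criterion itself.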

### Step 5:

And finally, let's try to shatter $N = 16$ randomly distributed points, which should not be possible with only 15 parameters.

In [5]:
N = 16
dichotomies = get_dichotomies(N)

num_generated_dichotomies = shatter_N_randomly_distributed_points(N, dichotomies) 


print("number of correct classifications:", num_generated_dichotomies)

if num_generated_dichotomies == 2**N:
    print("We shattered", N, "points.")
else:
    print("N =", N, "is either a break point or the points were not positioned well.")


number of correct classifications: 45110
N = 16 is either a break point or the points were not positioned well.
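
For comparison, the polynomial bound on the growth function, $\sum_{i=0}^{d_{vc}} \binom{N}{i}$, can be evaluated directly. The sketch assumes $d_{vc} = 15$ for this hypothesis set (at least 15 by Step 2, and at most 15 for a model that is linear in 15 features); the function name `dichotomy_bound` is hypothetical.

```python
from math import comb

def dichotomy_bound(N, d_vc):
    # sum_{i=0}^{d_vc} C(N, i): upper bound on the number of dichotomies
    return sum(comb(N, i) for i in range(d_vc + 1))

for N in (15, 16):
    print("N =", N, "bound:", dichotomy_bound(N, 15), "2^N:", 2**N)
```

For $N = 15$ the bound equals $2^{15}$, while for $N = 16$ it is $2^{16} - 1$, so at least one dichotomy must be missing. The count found by regression can be lower than the bound, since regression is only one way of choosing weights.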


### Conclusion

- For 15 randomly distributed points in the box [-1,1] x [-1,1] we managed to generate $2^{15}$ dichotomies, and thus shatter $N = 15$ points.
- If we put the 15 points on a line at $y=0.7$, then linear regression cannot find all $2^{15}$ dichotomies.
- It's interesting to see that for the collinear points linear regression often cannot find a weight vector for the simple dichotomy with all points classified as +1. Linear regression minimizes the squared errors rather than the classification error, but here the more likely culprit is that $\mathbf{Z}^T \mathbf{Z}$ is singular for points with constant $x_2$, so its numerically computed inverse is unreliable.