## Name: Check co-linearity of N points in an N-dimensional space.
### Date: 01/11/2024
### Status: Done. Interesting findings all around
### Idea: 
Check if the points are co-linear by:

1. Take the difference of two of them (generating the `slope` vector): $\text{slope} = X[0] - X[1]$.
2. Check that every other point is a multiple of this slope vector: $(\text{slope} - X[i])/\text{slope} = \hat{\lambda}$ (element-wise division), where $\hat{\lambda}$ is a vector whose entries all equal the same constant $\lambda$, i.e. its std is 0.
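The derivation below writes out why step 2 works. Suppose the points lie on a line through the origin with direction $b$ (in the cases below, `base_array`), so $X[i] = s_i\, b$. Let $\oslash$ denote element-wise division (NumPy's `/` on arrays), and assume $b$ has no zero entries. Without an offset the ratio is the same constant in every component. With an offset $c$, i.e. $X[i] = s_i\, b + c$, the slope is unchanged, but an extra term appears that is generally not constant. This is why the data has to be centered first:

$$
\begin{aligned}
\text{slope} &= X[0] - X[1] = (s_0 - s_1)\, b \\
(\text{slope} - X[i]) \oslash \text{slope} &= \frac{s_0 - s_1 - s_i}{s_0 - s_1}\,\mathbf{1} = \hat{\lambda} && \text{(no offset)} \\
(\text{slope} - X[i]) \oslash \text{slope} &= \frac{s_0 - s_1 - s_i}{s_0 - s_1}\,\mathbf{1} - \frac{c \oslash b}{s_0 - s_1} && \text{(offset } c\text{)}
\end{aligned}
$$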

Points of interest:
1. The check in step 2 assumes the line passes through the origin. To account for an offset of the data in any dimension, we therefore first need to center each dimension at 0, e.g. with scikit-learn's `StandardScaler()`.
2. This process is similar to calling `np.linalg.matrix_rank`, but printing the values gives a more qualitative view of the results in the presence of noise.
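The sketch below illustrates point 1. It assumes plain NumPy mean subtraction, which is what scikit-learn's `StandardScaler(with_std=False)` does. The default `StandardScaler()` additionally divides each dimension by its std, an affine map that keeps collinear points collinear. After centering, the offset line passes through the origin, so both checks behave as in the zero-centered case. `scaled_diff_stds` is a hypothetical helper that mirrors `print_diffs` without printing.

```python
import numpy as np

def scaled_diff_stds(points):
    """Std of (slope - X[i]) / slope for every point after the first two, like print_diffs."""
    slope = points[0] - points[1]
    return np.array([((slope - p) / slope).std() for p in points[2:]])

# Rebuild the offset example: 19 points on one line in 9 dimensions, shifted by 10.
base_array = np.arange(1, 10)
scales = np.arange(-10, 10)
scales = scales[scales != 0]
space_with_offset = np.outer(scales, base_array) + 10

# Center every dimension (column) at 0; same as StandardScaler(with_std=False).
centered = space_with_offset - space_with_offset.mean(axis=0)

print(np.linalg.matrix_rank(space_with_offset))  # 2: the shifted line misses the origin
print(np.linalg.matrix_rank(centered))           # 1: centering removes the offset
print(scaled_diff_stds(centered).max())          # ~0, up to floating-point error
```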

### Results:
Works!
Better to use the hand-crafted method in the presence of noise; otherwise, use the rank.


## Case 1 : No offset (zero-centered), No noise


```python
import numpy as np

# The base array in 9 dimensions
base_array = np.arange(1, 10)

# The 19 non-zero scaled variants of it (-10..9, excluding 0)
scales = np.arange(-10, 10)
scales = scales[scales != 0]

print("Base", base_array)
print("Scales", scales)
```

```
Base [1 2 3 4 5 6 7 8 9]
Scales [-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   1   2   3   4   5   6   7   8
   9]
```


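```python
print('The whole space with the 19 points')
# Outer product: row i is scales[i] * base_array, so all rows lie on one line through the origin
space = np.einsum('i,j->ij', scales, base_array)

print("The rank is:")
print(np.linalg.matrix_rank(space))
```

```
The whole space with the 19 points
The rank is:
1
```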

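```python
def print_diffs(space, print_std=True):
    """For every point after the first two, compute (slope - point) / slope,
    where slope = space[0] - space[1]. Print and return either the std of each
    scaled difference (print_std=True) or the scaled differences themselves."""
    start_point = space[0] - space[1]  # the `slope` vector
    to_return = []
    for other_point in space[2:]:
        # Element-wise ratio: constant (std 0) if other_point is a multiple of the slope
        scaled_diff = (start_point - other_point) / start_point
        if print_std:
            std = scaled_diff.std()
            to_return.append(std)
            print(std)
        else:
            print(scaled_diff)
            to_return.append(scaled_diff)
    return np.array(to_return)

print_diffs(space);
```

```
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
```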


As expected, the std of every scaled difference is 0: each point is an exact multiple of the slope vector.


## Case 1 : With offset, No noise

Let's shift the line on which the points lie away from the origin by adding 10 to every coordinate.

```python
space_with_offset = space + 10
print(np.linalg.matrix_rank(space_with_offset))
```

```
2
```
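The rank jumps to 2 because `matrix_rank` measures linear (through-the-origin) dependence. Adding 10 to every entry adds a second rank-one term, $10 \cdot \mathbf{1}\mathbf{1}^\top$. The all-ones direction is not parallel to `base_array`, so the two terms together span a 2-dimensional subspace. The sketch below checks that decomposition. `offset_term` is an illustrative name, not part of the notebook.

```python
import numpy as np

base_array = np.arange(1, 10)
scales = np.arange(-10, 10)
scales = scales[scales != 0]
space = np.outer(scales, base_array)
space_with_offset = space + 10

# The offset is itself an outer product: ones(19) x (10 * ones(9)).
offset_term = np.outer(np.ones(len(scales)), np.full(base_array.shape, 10))
assert np.array_equal(space_with_offset, space + offset_term)

# Each term has rank 1; their sum has rank 2.
print(np.linalg.matrix_rank(space), np.linalg.matrix_rank(offset_term),
      np.linalg.matrix_rank(space_with_offset))  # 1 1 2
```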

In [32]:
print_diffs(space_with_offset);

2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914
2.6885326206904914


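As we can see, the matrix rank is now 2 instead of 1. This is because `matrix_rank` tests for linear dependence, and the shifted line no longer passes through the origin. The std method fails as well (we expected 0 everywhere).

However, the std method still carries information: the std is identical for every point. The offset contributes the same vector to every scaled difference, so all points have the same spread relative to the original difference (slope) vector. This is what we would expect if they are collinear.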
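We can make that constant explicit. Write $X[i] = s_i\, b + c$, with $b$ = `base_array`, $c = 10\cdot\mathbf{1}$, $s_0 - s_1 = -1$, and $\oslash$ for element-wise division. The offset then adds the same vector $-\,c \oslash ((s_0 - s_1)\, b) = 10 \oslash b$ to every scaled difference, so every std equals $10 \cdot \operatorname{std}(1 \oslash b)$. The sketch below checks that number.

It also tries one possible fix, a variant of our own rather than the notebook's method: measure each point relative to $X[0]$, i.e. use $(X[i] - X[0]) \oslash \text{slope}$. This cancels $c$ and brings the std back to 0 without centering.

```python
import numpy as np

base_array = np.arange(1, 10)
scales = np.arange(-10, 10)
scales = scales[scales != 0]
space_with_offset = np.outer(scales, base_array) + 10

slope = space_with_offset[0] - space_with_offset[1]  # equals -base_array here

# The notebook's test: std of (slope - X[i]) / slope for every i >= 2.
stds = np.array([((slope - p) / slope).std() for p in space_with_offset[2:]])
predicted = 10 * np.std(1 / base_array)  # offset / |s_0 - s_1| * std(1 / b)
print(stds[0], predicted)  # both ~2.68853

# Variant (an assumption, not the notebook's method): measure points from X[0].
ref_stds = np.array([((p - space_with_offset[0]) / slope).std()
                     for p in space_with_offset[2:]])
print(ref_stds.max())  # 0.0: the offset cancels in p - X[0]
```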

## Case 2 : With offset, and noise

```python
# Uniform noise in [-eps, eps) added to every coordinate
eps = 10
noise = np.random.uniform(low=-eps, high=+eps, size=space.shape)
space_with_offset_and_noise = space_with_offset + noise
```

```python
print(np.linalg.matrix_rank(space_with_offset_and_noise))
```

```
9
```

```python
print_diffs(space_with_offset_and_noise);
```

```
12.828156857526189
7.834757438911808
6.5121009952652456
3.5881983720222403
7.214284610980746
2.57907001209209
2.2153344533740884
1.9577785528471494
5.645144933006594
12.0913806257172
10.365411394945516
9.865728831634378
16.476541328769436
18.060358202349033
20.41647078925691
17.618416244471895
25.465458452893188
```


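As we can see, with noise this large both methods fail. The matrix rank is 9 (full rank), and the stds vary widely from point to point. The noise is not seeded, so the exact values change between runs.

However, when the noise is more manageable or realistic, the hand-crafted std method can be more helpful than the rank.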
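The rank side can also be given a qualitative look, using a swapped-in technique rather than what the notebook does: print the singular values of the centered data. `np.linalg.matrix_rank` counts the singular values above a tolerance. With small noise, one singular value dominates and the rest sit near the noise level, so an explicit `tol` recovers rank 1. The seed, `eps = 0.001`, and `tol=0.1` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible

base_array = np.arange(1, 10)
scales = np.arange(-10, 10)
scales = scales[scales != 0]
space_with_offset = np.outer(scales, base_array) + 10

eps = 0.001
noisy = space_with_offset + rng.normal(loc=0, scale=eps, size=space_with_offset.shape)
centered = noisy - noisy.mean(axis=0)  # remove the offset first

# Singular values in decreasing order: one large value, the rest at the noise level.
sv = np.linalg.svd(centered, compute_uv=False)
print(sv)

print(np.linalg.matrix_rank(noisy))              # 9 with the default tolerance
print(np.linalg.matrix_rank(centered, tol=0.1))  # 1 once tiny singular values are ignored
```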

```python
# Much smaller Gaussian noise (std = eps)
eps = 0.001
noise = np.random.normal(loc=0, scale=eps, size=space.shape)
space_with_offset_and_noise = space_with_offset + noise
print(f"Rank: {np.linalg.matrix_rank(space_with_offset_and_noise)}")
print_diffs(space_with_offset_and_noise);
```

```
Rank: 9
2.688318536165365
2.6878411189758427
2.6877857432418977
2.6874410898954757
2.687138044410977
2.68679818976248
2.6866864050259696
2.6859656080542473
2.6859909276638954
2.6855619609114747
2.6859144359666036
2.6853230165563216
2.6854586101255595
2.6845495206171104
2.6851336665163856
2.6847815891729585
2.684312538755308
```


Although the stds are no longer identical, they are all very close to each other: every value lies between 2.684 and 2.689, near the noise-free value of 2.6885. Meanwhile, the rank already jumps to 9.
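One option for turning "very close to each other" into a number is the relative spread of the per-point stds, $(\max - \min)/\text{mean}$. This is a hypothetical heuristic, not part of the notebook. For these collinear points it is 0 without noise and stays small under small noise. The `eps = 10` uniform noise gives a spread comparable to the values themselves. `scaled_diff_stds` and `relative_spread` are hypothetical helper names, and the seed is an assumption. Any threshold on the score would have to be tuned to the expected noise level.

```python
import numpy as np

def scaled_diff_stds(points):
    """Std of (slope - X[i]) / slope for every point after the first two, like print_diffs."""
    slope = points[0] - points[1]
    return np.array([((slope - p) / slope).std() for p in points[2:]])

def relative_spread(stds):
    """Hypothetical score: (max - min) / mean of the per-point stds."""
    return (stds.max() - stds.min()) / stds.mean()

rng = np.random.default_rng(0)  # seeded so the run is reproducible
base_array = np.arange(1, 10)
scales = np.arange(-10, 10)
scales = scales[scales != 0]
space_with_offset = np.outer(scales, base_array) + 10

clean = relative_spread(scaled_diff_stds(space_with_offset))
small = relative_spread(scaled_diff_stds(
    space_with_offset + rng.normal(0, 0.001, size=space_with_offset.shape)))
large = relative_spread(scaled_diff_stds(
    space_with_offset + rng.uniform(-10, 10, size=space_with_offset.shape)))
print(clean, small, large)
```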