# Intro

Consider computing the distance (dissimilarity) between two vectors<br>
whose components are normalized to the range from -1 to 1, as in user preference applications.<br>
A distance of 0 must mean that the two vectors are identical;<br>
larger distances mean less similarity.<br><br>

Classical methods such as Euclidean distance and cosine similarity<br>
have their limitations and downsides.<br>
Using the dot product as a similarity measure is even farther from ideal.<br><br>

Euclidean distance pays no attention to the direction of the vectors,<br>
while cosine distance ignores their magnitude. Meanwhile, the dot product as a measure<br>
defies common sense.<br><br>

One possible solution to the problem of evaluating how similar two vectorised personalities are<br>
is the AnsNorm Similarity method given below.<br><br>

Here you can find several cases where the classic measures fall short<br>
and try the ansnorm() function on the same examples.<br><br>

FYI:<br>
cosine distance returns values from 0 to 2<br>
Euclidean distance returns values from 0 to +inf<br>
the dot product is any number from -inf to +inf<br><br>
As for <b>ansnorm</b>,<br>
for input vectors normalised from -1 to 1<br>
it outputs distance values from 0 to 1:<br>
0 for identical entities and 1 for the most dissimilar.
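
The ranges above can be checked on the two extreme cases, identical and opposite vectors. Below is a minimal sketch that re-states the three classic measures with the standard library only; the helper names `dot`, `euclidean_distance` and `cosine_distance` are illustrative and not part of this notebook, which uses NumPy and SciPy.

```python
import math

def dot(u, v):
    """Plain dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cos(angle); undefined when either vector has zero length."""
    return 1.0 - dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

same = ([1, 1, 1], [1, 1, 1])
opposite = ([1, 1, 1], [-1, -1, -1])

# Print each measure for the identical pair and for the opposite pair.
for name, fn in [("cosine", cosine_distance),
                 ("euclidean", euclidean_distance),
                 ("dot", dot)]:
    print(name, fn(*same), fn(*opposite))
```

Only the cosine distance is bounded on both sides; the Euclidean distance and the dot product keep growing with magnitude and with the number of dimensions.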


# Initiate

In [1]:
import numpy as np
from scipy.spatial.distance import cosine, euclidean

# Definitions

In [2]:
def get_cosine(vec_1, vec_2, epsilon=10**-6):
    """
    cosine() is incapable of evaluating
    the [0, 0] to [1, 1] distance
    due to zero division.
    Thus we manually bring it to life here
    by replacing zero components with a small epsilon.
    """
    vec_1 = [i if i else epsilon for i in vec_1]
    vec_2 = [i if i else epsilon for i in vec_2]
    return cosine(vec_1, vec_2)

In [3]:
def ansnorm(vec_1, vec_2):
    """
    intended to work with -1 to 1 normalized vectors
    combines two approaches so that negatives of cosine similarity
    are smoothened as well as the negatives of euclidean alone
    """
    euc = euclidean(vec_1, vec_2)
    cos = get_cosine(vec_1, vec_2)
    # let's scale both distances to [0, 1]
    # since cosine() lies between 0 and 2, halve it
    cos /= 2
    # now normalize the euclidean distance output
    # the largest distance in our case is between
    # [-1, -1, ... , -1] and [1, 1, ... , 1]
    # and its value depends on the number of dimensions,
    # so let's take that into account
    ones = np.ones(len(vec_1))
    largest_euc = euclidean(-ones, ones)
    euc /= largest_euc
    # combine both parts; the sum is at most 1/2 + 1 + 1/2 = 2
    dist = euc/2 + euc*cos + cos/2
    # scale dist to [0, 1]
    dist /= 2
    # return euc, cos, dist  # an option for testing the function
    return dist
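
Written out, and assuming the cell above is read correctly, `ansnorm()` computes the following for two $n$-dimensional vectors $u$ and $v$. The symbols $e$, $c$ and $d$ are introduced here only for illustration, and the zero replacement done by `get_cosine()` is left out:

$$
e = \frac{\lVert u - v \rVert_2}{2\sqrt{n}}, \qquad
c = \frac{1 - \cos\theta(u, v)}{2}, \qquad
d = \frac{1}{2}\left(\frac{e}{2} + e\,c + \frac{c}{2}\right)
$$

Here $2\sqrt{n}$ is the distance between $[-1, \dots, -1]$ and $[1, \dots, 1]$. For components in $[-1, 1]$ both $e$ and $c$ lie in $[0, 1]$, so the bracket is at most $2$ and $d$ stays within $[0, 1]$.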

# Distance calculation showcases

## Dot product

### Dot product 1: Not applicable for measuring similarity

In [4]:
np.dot([0,0,0], [0,0,0])  # read: they are the same. TRUE

0

In [5]:
np.dot([0,0,0], [1,1,1])  # read: they are the same. FALSE

0

In [6]:
np.dot([1,1,1], [1,1,1])  # read: they are different. FALSE

3

In [7]:
np.dot([-1,-1,-1], [1,1,1])  # read: they are super-close. FALSE

-3

#### Dot product 1 VS Right answers

In [8]:
print(f'{np.dot([0,0,0], [0,0,0])} vs {ansnorm([0,0,0], [0,0,0])}')

0 vs 0.0


In [9]:
print(f'{np.dot([0,0,0], [1,1,1])} vs {ansnorm([0,0,0], [1,1,1])}')

0 vs 0.125


In [10]:
print(f'{np.dot([1,1,1], [1,1,1])} vs {ansnorm([1,1,1], [1,1,1])}')

3 vs 0.0


In [11]:
print(f'{np.dot([-1,-1,-1], [1,1,1])} vs {ansnorm([-1,-1,-1], [1,1,1])}')

-3 vs 1.0


### Dot product 2: Farther is closer

In [12]:
np.dot([0.1,0.1,0.1], [-0.1,-0.1,-0.1]) > np.dot([1,1,1], [-1,-1,-1])  # read: farther is closer. FALSE

True

#### Dot product 2 Right answer

In [13]:
ansnorm([0.1,0.1,0.1], [-0.1,-0.1,-0.1]) > ansnorm([1,1,1], [-1,-1,-1])

False

### Dot product 3: Same things aren't same. In different ways

In [14]:
np.dot([100], [100]) > np.dot([25], [25])  # read: two identical vectors are less similar if their magnitude is larger. FALSE

True

In [15]:
np.dot([1], [1]) > np.dot([0.25], [0.25])  # read: two identical vectors are less similar if their magnitude is larger. FALSE

True

#### Dot product 3 Right answers

In [16]:
ansnorm([100], [100]) > ansnorm([25], [25])  # be careful, ansnorm is not intended for use outside of (-1, 1)

False

In [17]:
ansnorm([1], [1]) > ansnorm([0.25], [0.25])

False

## Euclidean

### Euclidean 1: Being on the farthest corner is evaluated differently for different n-dims

The more dimensions there are, the larger the distance between the same two corners of the space,<br>
e.g. between the all-zeros and the all-ones vectors.<br>
This does not come in handy when trying to draw a conclusion on how similar vectorized personalities are.

In [60]:
euclidean([0], [1])

1.0

In [61]:
euclidean([0,0], [1,1])

1.4142135623730951

In [62]:
euclidean([0,0,0], [1,1,1])

1.7320508075688772

#### Euclidean 1: Human-interpretable answers

Regardless of the number of dimensions, similarity between the vectors<br>
is evaluated consistently.

In [65]:
ansnorm([0], [1])

0.125

In [66]:
ansnorm([0,0], [1,1])

0.125

In [67]:
ansnorm([0,0,0], [1,1,1])

0.125

In [70]:
ansnorm([0.5], [0.75])

0.03125

In [71]:
ansnorm([0.5, 0.5], [0.75, 0.75])

0.03125
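
The consistency across dimensions comes from dividing by the largest possible distance in the space. Here is a small sketch of that step alone, using only the standard library; `scaled_euclidean` is a hypothetical helper name, and components are assumed to lie in [-1, 1].

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def scaled_euclidean(u, v):
    """Euclidean distance divided by the distance between the corners
    [-1, ..., -1] and [1, ..., 1] of the same dimensionality."""
    n = len(u)
    largest = euclidean_distance([-1] * n, [1] * n)  # equals 2 * sqrt(n)
    return euclidean_distance(u, v) / largest

for n in (1, 2, 3, 10):
    raw = euclidean_distance([0] * n, [1] * n)    # grows like sqrt(n)
    scaled = scaled_euclidean([0] * n, [1] * n)   # stays at about 0.5
    print(n, round(raw, 4), round(scaled, 4))
```

The scaled value of about 0.5 is the `euc` term inside `ansnorm()`. With a cosine term of 0 it gives (0.5 / 2) / 2 = 0.125, which matches the all-zeros versus all-ones outputs above.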

### Euclidean 2: Opposite vectors considered to be similar

In [22]:
# Pairwise difference = 0.2
# See the opposing vectors:
print(f'opposing vectors: {euclidean([0.1,0.1,0.1], [-0.1,-0.1,-0.1]):.2f}')

opposing vectors: 0.35


In [23]:
# Pairwise difference = 0.2
# See the identically oriented vectors:
print(f'identically oriented vectors: {euclidean([0.1,0.1,0.1], [0.3,0.3,0.3]):.2f}')


identically oriented vectors: 0.35


read: the OPPOSING things are no different from identically inclined things. FALSE <br>
E.g. someone enjoying swimming more than you do is equivalent to someone hating swimming as much as you love it. FALSE

#### Euclidean 2: Right answers

In [24]:
# Pairwise difference = 0.2
# See the opposing vectors:
print(f'opposing vectors: {ansnorm([0.1,0.1,0.1], [-0.1,-0.1,-0.1]):.2f}')

opposing vectors: 0.33


In [25]:
# Pairwise difference = 0.2
# See the identically oriented vectors:
print(f'identically oriented vectors: {ansnorm([0.1,0.1,0.1], [0.3,0.3,0.3]):.2f}')

identically oriented vectors: 0.03


read: co-oriented things are less far apart than those opposing each other. TRUE<br>
compare against: dot product (the opposite of correct), euclidean (indifferent to the case)

#### Euclidean 2 vs Right answers

In [26]:
f'{euclidean([0.1,0.1,0.1], [-0.1,-0.1,-0.1]):.2f} vs {ansnorm([0.1,0.1,0.1], [-0.1,-0.1,-0.1]):.2f}'

'0.35 vs 0.33'

In [27]:
f'{euclidean([0.1,0.1,0.1], [0.3,0.3,0.3]):.2f} vs {ansnorm([0.1,0.1,0.1], [0.3,0.3,0.3]):.2f}'

'0.35 vs 0.03'

## Cosine

### Cosine 1: Zero division issue

In [None]:
cosine([0,0,0], [1,1,1])  # does not work properly by default

  dist = 1.0 - uv / np.sqrt(uu * vv)


0

#### Cosine 1: Solution

In [66]:
# replace zeros with an arbitrary small epsilon value:
get_cosine([0,0,0], [1,1,1], epsilon=10**-6)

0
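
The workaround can also be sketched without SciPy, which makes the failure explicit: in plain Python the zero vector raises `ZeroDivisionError` instead of emitting a warning. The helper names below are illustrative only; the epsilon replacement mirrors `get_cosine()`.

```python
import math

def cosine_distance(u, v):
    """1 - cos(angle between u and v); fails if either vector has zero length."""
    uv = sum(a * b for a, b in zip(u, v))
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    return 1.0 - uv / math.sqrt(uu * vv)

def guarded_cosine_distance(u, v, epsilon=1e-6):
    """Replace exact zeros with epsilon before computing the cosine distance."""
    u = [a if a else epsilon for a in u]
    v = [b if b else epsilon for b in v]
    return cosine_distance(u, v)

try:
    cosine_distance([0, 0, 0], [1, 1, 1])
except ZeroDivisionError:
    print("cosine distance is undefined for a zero vector")

# With the guard the zero vector is treated as a tiny vector along [1, 1, 1],
# so the result is approximately 0.
print(guarded_cosine_distance([0, 0, 0], [1, 1, 1]))
```

Note that the replacement acts on every zero component, not only on all-zero vectors, so a vector such as `[0, 1]` is evaluated as `[1e-6, 1]`.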

### Cosine 2: Absence of a feature gets confused with the fullest extent of it

In [32]:
e = 10**-9  # approaches ZERO
cosine([e,e,e], [1,1,1])  # read: no difference. FALSE

0

In [40]:
# zero is no different from infinity:
cosine([10**-9,10**-9,10**-9], [10**9,10**9,10**9])  # read: no difference. FALSE

0

#### Cosine 2 VS Right answers

In [99]:
print(f'{cosine([e,e,e], [1,1,1]):.2f} vs {ansnorm([e,e,e], [1,1,1]):.2f}')

0.00 vs 0.12


In [101]:
print(f'{cosine([10**-9,10**-9,10**-9], [10**9,10**9,10**9]):.2f} vs {ansnorm([10**-9,10**-9,10**-9], [10**9,10**9,10**9]):.2f}')  # caution: ansnorm is intended for (-1, 1) space

0.00 vs 125000000.00


### Cosine 3: The slightest difference may be evaluated as maximum distance

In [41]:
# at the same time zero and zero are the most different things ever:
cosine([-e,-e,-e], [e,e,e])  # read: ZERO and ZERO are super-different. FALSE

2.0

#### Cosine 3 vs Right answer

In [103]:
print(f'{cosine([-e,-e,-e], [e,e,e]):.2f} vs {ansnorm([-e,-e,-e], [e,e,e]):.2f}')

2.00 vs 0.25


## Ans-Norm Similarity

### Same is same, different is different:

In [60]:
ansnorm([0,0,0], [0,0,0])  # read: vectors are the same. TRUE

0.0

In [61]:
ansnorm([1,1,1], [1,1,1])  # read: vectors are the same. TRUE
# compare against: dot product

0.0

In [None]:
ansnorm([-1,-1,-1], [1,1,1]) # read: vectors are the MOST different. TRUE
# compare against: dot product (False), euclidean (uninterpretable)

1.0

### Farther away things are farther away:

In [46]:
e = 10**-9  # approaches ZERO
ansnorm([-e,-e,-e], [e,e,e])  # read: ZERO and ZERO are NOT super-different. TRUE
# closer to being similar
# compare against: cosine

0.25000000075

In [None]:
ansnorm([-0.5,-0.5,-0.5], [0.5,0.5,0.5])
# read: opposite vec with half magnitude is closer to the half of being farthest apart. TRUE
# compare against: cosine

0.625

In [47]:
ansnorm([-1,-1,-1], [1,1,1])
# read: opposite vectors with full magnitude are the farthest apart. TRUE
# compare against: dot product, euclidean

1.0

### Magnitude counts:

In [91]:
ansnorm([0.1], [0.2])

0.0125

In [93]:
ansnorm([0.001], [0.002])

0.000125

In [95]:
# compare with cosine:
cosine([0.1], [0.2]) == cosine([0.001], [0.002])

True

### Direction counts:

In [80]:
# difference = 0.5, negative
ansnorm([0.25], [-0.25])

0.4375

In [81]:
# difference = 0.5, positive
ansnorm([0.25], [0.75])

0.0625

In [96]:
# compare with euclidean:
euclidean([0.25], [-0.25]) == euclidean([0.25], [0.75])

True
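
The two ansnorm results in this section can be retraced by hand. As a sketch, write $e$ for the Euclidean distance divided by the largest possible one ($2\sqrt{n}$, here $2$) and $c$ for the halved cosine distance, so that `ansnorm()` returns $d = \tfrac{1}{2}\left(\tfrac{e}{2} + e\,c + \tfrac{c}{2}\right)$. The symbols are introduced here for illustration only.

$$
\begin{aligned}
[0.25] \text{ vs } [-0.25]: &\quad e = \tfrac{0.5}{2} = 0.25,\; c = 1, &\quad d &= \tfrac{1}{2}\,(0.125 + 0.25 + 0.5) = 0.4375 \\
[0.25] \text{ vs } [0.75]: &\quad e = \tfrac{0.5}{2} = 0.25,\; c = 0, &\quad d &= \tfrac{1}{2}\,(0.125 + 0 + 0) = 0.0625
\end{aligned}
$$

Both pairs share the same $e$; only the cosine term separates them.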

### Test yourself:

In [None]:
vector_1 = [...same n dims please...]
vector_2 = [...same n dims please...]

ansnorm(vector_1, vector_2)
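
Beyond single pairs, the claimed properties can be probed on random inputs. The following is a rough sketch, not this notebook's own test: `ansnorm_sketch` re-states `ansnorm()` with the standard library only so that the block runs on its own, `check_properties` is a hypothetical helper, and clamping the cosine term to [0, 2] is an added assumption to absorb floating-point noise.

```python
import math
import random

def ansnorm_sketch(u, v, epsilon=1e-6):
    """Standard-library re-statement of ansnorm(), for illustration only."""
    n = len(u)
    # Euclidean part, scaled by the largest distance in [-1, 1]^n: 2 * sqrt(n).
    euc = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) / (2 * math.sqrt(n))
    # Cosine part: exact zeros are replaced by epsilon, as in get_cosine().
    gu = [a if a else epsilon for a in u]
    gv = [b if b else epsilon for b in v]
    uv = sum(a * b for a, b in zip(gu, gv))
    uu = sum(a * a for a in gu)
    vv = sum(b * b for b in gv)
    # Clamp 1 - cos to [0, 2] (assumption, see above), then halve it.
    cos = min(max(1.0 - uv / math.sqrt(uu * vv), 0.0), 2.0) / 2
    return (euc / 2 + euc * cos + cos / 2) / 2

def check_properties(trials=1000, seed=0, tol=1e-9):
    """Check range, symmetry and identity on random vectors in [-1, 1]^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 8)
        u = [rng.uniform(-1, 1) for _ in range(n)]
        v = [rng.uniform(-1, 1) for _ in range(n)]
        d = ansnorm_sketch(u, v)
        assert -tol <= d <= 1 + tol                  # stays within [0, 1]
        assert abs(d - ansnorm_sketch(v, u)) < tol   # symmetric
        assert ansnorm_sketch(u, u) < tol            # identical vectors give 0
    return True

print(check_properties())
```

A passing run does not prove the properties; it only shows that no counterexample turned up among the sampled vectors.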