# How cool is cucumber? Experiments with word embedding

We all know the good old, hackneyed examples typically used to explain intuitively what the **word embedding** technique is. We almost always come across a chart showing a simplified, 2-dimensional vector representation in which the words **queen** and **king** are separated by roughly the same offset as the words **woman** and **man**.

One of the most convenient ways to get **embedding vectors** for natural language is to use the pre-trained models distributed with the [**spaCy**](https://spacy.io/) library.

In [22]:
import numpy as np
import numpy.typing as npt
import spacy
# Requires the en_core_web_md model: python -m spacy download en_core_web_md

In [4]:
nlp = spacy.load('en_core_web_md')

In [6]:
fire = nlp('fire')
ice = nlp('ice')

In [11]:
len(fire.vector)

300

These are just a couple of simple function calls, but a lot of work is done behind the scenes. In fact, we can use the **nlp** object to process whole sentences (or documents) at once. For now, we only need to process single words.

## Adding a new axis

The experiment I'd like to conduct is to "draw" a straight line in n-dimensional space and treat it as a new axis. This is possible if we project the points under consideration onto that line.

## Line in n-dimensional space

First of all, we need the equation of a line generalized to n-dimensional space. Such a straight line is uniquely determined by a point lying on it and a so-called **direction vector**.

<center>
$\vec{d} = (l, m, n, ...)$
</center>

Given the points $A = (x, y, z, ...)$ and $B = (x_{1}, y_{1}, z_{1}, ...)$, it is simply computed as the elementwise difference between these two points, namely:

<center>
$ \vec{d} = A - B = (x, y, z, ...) - (x_{1}, y_{1}, z_{1}, ...) $
</center>

In our case, it can be defined as follows:

In [16]:
direction = fire.vector - ice.vector

The complete equation of the line, where $(x, y, z, ...)$ now denotes any point lying on it:

<center>
$ \frac{x - x_{1}}{l}  = \frac{y - y_{1}}{m} = \frac{z - z_{1}}{n} =   \ ...$
</center>
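
The same line can be written in parametric form, $P(t) = B + t\,\vec{d}$, which is often easier to work with in code. Below is a minimal sketch with made-up 3-dimensional points standing in for word vectors; the helper `point_on_line` and the sample values are assumptions for illustration only.

```python
import numpy as np

# Toy 3-dimensional points standing in for two word vectors (hypothetical values).
a = np.array([4.0, 1.0, 3.0])
b = np.array([1.0, 2.0, 1.0])

direction = a - b  # direction vector d = A - B


def point_on_line(t: float) -> np.ndarray:
    """Return B + t * d, the point at parameter t on the line through A and B."""
    return b + t * direction


p = point_on_line(0.25)

# In the symmetric equation every coordinate ratio (x - x1) / l is the same
# number, and that number is the parameter t.
ratios = (p - b) / direction
print(ratios)
```

Note that the elementwise division only works when no component of the direction vector is zero, which is the same restriction the symmetric form of the equation carries.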



We will also use the **midpoint** between the two initial points as the beginning of our new axis.
It can be calculated with the following formula:

<center>
$  M = (\frac{x + x_{1}}{2}, \frac{y + y_{1}}{2}, \frac{z + z_{1}}{2}, ...) $
</center>

Writing that as a function:

In [32]:
def midpoint(x: npt.NDArray, y: npt.NDArray) -> npt.NDArray:
    if (len(x) != len(y)):
        raise ValueError(
            'Vectors come from different spaces! ' + 
            'x: {} dimensions, y: {} dimensions'.format(len(x), len(y)))
    return (x + y) / 2

In [36]:
# midpoint(np.array([2, 3]), np.array([-1, 20]))
# midpoint(np.array([2, 3]), np.array([-1, 20, -45]))

In [37]:
mid = midpoint(fire.vector, ice.vector)

Two values that we need for the moment:
* ```mid```
* ```direction```

## Distance from a new point to the beginning of the axis

Next, we compute the difference vector between a new point $C$ and the beginning $M$ of the aforementioned new axis.

In [None]:
# https://math.stackexchange.com/questions/1905533/find-perpendicular-distance-from-point-to-line-in-3d
# https://onlinemschool.com/math/library/analytic_geometry/p_line/

In [38]:
cucumber = nlp('cucumber')

<center>
$\overline{MC} = M - C $
</center>

In [41]:
mc_dist = mid - cucumber.vector

## Projection

In [49]:
projection_dist = mc_dist @ direction

In [50]:
projection_dist

6.8615074
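
The number above is a plain dot product with the unnormalized direction vector, so it is scaled by the length of that vector rather than being a geometric length itself. A small sketch with hypothetical 2-dimensional points (not real embeddings) shows the difference between the raw value and the signed length along the axis:

```python
import numpy as np

# Hypothetical 2-dimensional stand-ins for the two poles and a new point.
x = np.array([3.0, 0.0])
y = np.array([-1.0, 0.0])
c = np.array([0.0, 5.0])

mid = (x + y) / 2      # [1, 0]
direction = x - y      # [4, 0]

# Dot product as used in the text: grows with the length of `direction`.
raw = (mid - c) @ direction

# Dividing by the norm of `direction` gives the signed length of the
# projection of (mid - c) onto the axis.
scalar = raw / np.linalg.norm(direction)

print(raw, scalar)
```

For comparing words on one fixed axis the raw value is enough, because every word is scaled by the same factor; the normalized version matters only when values from different axes are compared.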

Wrapping all of this in a class:

In [53]:
class axis:
    
    def __init__(self, x, y):
        self.x = x 
        self.y = y
        self.mid = midpoint(x, y)
        self.direction = x - y
        
        self.pole_x = self._compute(x)
        self.pole_y = self._compute(y)
        
    def __call__(self, vec):
        return self._compute(vec)
    
    def _compute(self, vec):
        # Use the instance's own midpoint rather than the global `mid`
        mc_dist = self.mid - vec
        return mc_dist @ self.direction

In [69]:
fire_ice_axis = axis(fire.vector, ice.vector)

In [70]:
# Fire (first argument, x)
fire_ice_axis.pole_x

# Ice (second argument, y)
fire_ice_axis.pole_y

33.697186
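
The pole values can also be worked out by hand, which gives a quick sanity check on the number above. A short derivation, assuming the class computes $(M - v) \cdot \vec{d}$ for an input $v$, with $M = \frac{X + Y}{2}$ and $\vec{d} = X - Y$, where $X$ and $Y$ are the first and second constructor arguments:

<center>
$ (M - Y) \cdot \vec{d} = \frac{X - Y}{2} \cdot (X - Y) = \frac{\lVert X - Y \rVert^{2}}{2}, \qquad (M - X) \cdot \vec{d} = -\frac{\lVert X - Y \rVert^{2}}{2} $
</center>

Under that assumption the two poles sit symmetrically around zero, and the printed 33.697186 would be half the squared Euclidean distance between the two pole vectors.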

In [79]:
fire_ice_axis(nlp('icecream').vector)

12.972902
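
Raw scores like the one above are easier to read once they are rescaled so that the two poles land at -1 and +1. The sketch below is one possible way to do that, not part of the original experiment: `normalized_score` is a hypothetical helper and the vectors are made-up 3-dimensional values.

```python
import numpy as np


def normalized_score(x: np.ndarray, y: np.ndarray, vec: np.ndarray) -> float:
    """Score `vec` on the axis between x and y, scaled so x -> -1 and y -> +1."""
    mid = (x + y) / 2
    direction = x - y
    # Both poles score +/- |x - y|^2 / 2 before scaling (see the dot product
    # of (mid - pole) with direction), so dividing by this value maps them to +/-1.
    half_squared_length = (direction @ direction) / 2
    return float(((mid - vec) @ direction) / half_squared_length)


# Made-up vectors standing in for the two poles (assumption, not real embeddings).
pole_x = np.array([2.0, 1.0, 0.0])
pole_y = np.array([0.0, -1.0, 2.0])

print(normalized_score(pole_x, pole_y, pole_x))
print(normalized_score(pole_x, pole_y, pole_y))
print(normalized_score(pole_x, pole_y, (pole_x + pole_y) / 2))
```

If the values printed earlier all come from the same fire/ice axis, this rescaling amounts to dividing each score by 33.697186, which would put cucumber at roughly 0.2 and icecream at roughly 0.38 on the way towards the ice pole.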
