# Semantics and Pragmatics, KIK-LG103

## Lab session 4, Part 3

---

In the lecture on distributional semantics, we learned about the **cosine similarity** measure. The equation for cosine similarity is the following: 
$$
\cos(v,w)
= \frac{ v \cdot w }{ \lvert v \rvert \lvert w \rvert }
= \frac{ \sum_{i=1}^{N} v_i w_i }{ \sqrt{\sum_{i=1}^{N} v_i^2} \sqrt{\sum_{i=1}^{N} w_i^2} }
$$

In this part, we will implement a function that calculates the cosine similarity between two vectors using NumPy, the Python library we worked with last time. If high school math isn't fresh in your mind, the equation might look scary. Don't worry: this part walks you through it step by step.

Let's first import NumPy and then get right into it.

----

In [None]:
import numpy as np

### Section 3.1: Dot product
---
Let's start by investigating the numerator in the equation:

$v \cdot w = \sum_{i=1}^{N} v_i w_i$

This is called a **dot product**. In plain English, the equation says: multiply the vectors $v$ and $w$ element-wise and sum the elements of the resulting vector. Let's calculate a dot product in two steps with some actual values.

---

**Vectors:** 

$i = \begin{bmatrix}2\\1\\3\end{bmatrix}$, $j = \begin{bmatrix}3\\-1\\0\end{bmatrix}$ 

**Element-wise multiplication:**

$i \circ j 
= \begin{bmatrix}2\\1\\3\end{bmatrix} \circ \begin{bmatrix}3\\-1\\0\end{bmatrix} 
= \begin{bmatrix}2\times3\\1\times(-1)\\3\times0\end{bmatrix}
= \begin{bmatrix}6\\-1\\0\end{bmatrix}$

**Summing the values in the resulting vector:**

$6 + (-1) + 0 = 5$
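
The same two steps can be traced in plain Python, without NumPy. This is only a minimal sketch of the arithmetic above; the names `products` and `total` are illustrative and not part of the exercise.

```python
# Plain-Python trace of the worked example above (no NumPy needed).
i = [2, 1, 3]
j = [3, -1, 0]

# Step 1: multiply the vectors element-wise.
products = [a * b for a, b in zip(i, j)]

# Step 2: sum the elements of the resulting list.
total = sum(products)

print(products)  # [6, -1, 0]
print(total)     # 5
```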

---

**Exercise 3.1.1** Implement the function `dot` in the code cell below. Recall from last session that element-wise multiplication of two vectors is done using the operator `*`, the same operator we have used for normal multiplication.

For computing the sums of the vector elements, NumPy offers a convenient function `np.sum()`:

    >>> v = np.array([1, 2, 3])
    >>> np.sum(v)
    6

In three steps:

1. Element-wise multiply the vectors v and w. You can assign the result to a variable called `mul_result`, for example.
2. Sum the elements of the resulting vector. Do this using `np.sum`. For example: `np.sum(mul_result)`.
3. Return the sum.

---

In [None]:
def dot(v, w):
    """Compute the dot product of the vectors v and w."""
    return -1.0


# FOR TESTING

v = np.array([2, 1, 3])
w = np.array([3, -1, 0])
print("Result:", dot(v, w)) # This should be 5

### Section 3.2: The length of a vector
----

Now that we have the dot product out of the way, we can move on to the denominator in the original equation.

$\lvert v \rvert \lvert w \rvert = \sqrt{\sum_{i=1}^{N} v_i^2} \sqrt{\sum_{i=1}^{N} w_i^2}$

$\lvert v \rvert$ is the length of the vector $v$, so in the equation above we multiply the lengths of the two vectors. Let's see how we can compute the length (also called **L2-norm**) of a single vector:

$\lvert v \rvert = \sqrt{\sum_{i=1}^{N} v_i^2}$

Again, in plain English, the equation says: square each element in the vector $v$, sum the elements of the resulting vector, and take the square root of the sum. Let's once again illustrate this with some actual values.

---

**Vector:** 

$i = \begin{bmatrix}2\\1\\3\end{bmatrix}$

**Vector squared:**

$\begin{bmatrix}2^2\\1^2\\3^2\end{bmatrix}
= \begin{bmatrix}4\\1\\9\end{bmatrix}$

**Summing:**

$4 + 1 + 9 = 14$

**Square root:**

$\sqrt{14} = 3.741...$
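
The three steps can also be traced in plain Python using the standard `math` module. This is a sketch of the arithmetic only; the names `squares`, `total`, and `length` are illustrative and not part of the exercise.

```python
import math

# Plain-Python trace of the worked example above (no NumPy needed).
i = [2, 1, 3]

# Step 1: square each element.
squares = [x ** 2 for x in i]

# Step 2: sum the squared elements.
total = sum(squares)

# Step 3: take the square root of the sum.
length = math.sqrt(total)

print(squares)  # [4, 1, 9]
print(total)    # 14
print(length)   # approximately 3.7417
```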

---


**Exercise 3.2.1** Implement the function `l2norm` in the code cell below.

You can square each element of a vector using the operator `**`:

    >>> v = np.array([1, 2, 3])
    >>> v**2
    array([1, 4, 9])
    
Summing is done as in Exercise 3.1.1. For calculating the square root, you can use the NumPy function `np.sqrt()`:

    >>> np.sqrt(16)
    4.0

In four steps:

1. Square each element in the vector v.
2. Sum the elements of the resulting vector. Do this using `np.sum()`.
3. Compute the square root of the sum using `np.sqrt()`.
4. Return the resulting value.

*Note: Feel free to use the function `dot` we implemented above if you can see the connection here. If you can't, no worries. Just follow the steps above.*

---

In [None]:
def l2norm(v):
    """Calculate the length (l2-norm) of the vector v."""
    return 0


# FOR TESTING

a = np.array([2, 1, 3])
print(l2norm(a)) # This should be 3.74165...

### Section 3.3: Putting it all together
---
We now have all the necessary building blocks for computing the cosine similarity.

$
\cos(v,w)
= \frac{ v \cdot w }{ \lvert v \rvert \lvert w \rvert }
$

We can calculate the dot product $v \cdot w$ with our function `dot` and the length of a vector $\lvert v \rvert$ with our function `l2norm`. Implementing a function for cosine similarity is now straightforward.
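
As a worked check, the vectors $i$ and $j$ from Section 3.1 can be plugged into the formula. Their dot product was computed there as $5$, and Section 3.2 gave $\lvert i \rvert = \sqrt{14}$. Following the same steps for $j$ gives $\lvert j \rvert = \sqrt{3^2 + (-1)^2 + 0^2} = \sqrt{10}$, so:

$$
\cos(i,j)
= \frac{ i \cdot j }{ \lvert i \rvert \lvert j \rvert }
= \frac{ 5 }{ \sqrt{14} \sqrt{10} }
= \frac{ 5 }{ \sqrt{140} }
\approx 0.4226
$$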

---

**Exercise 3.3.1** Implement the function `cosine_similarity` below.

---

In [None]:
def cosine_similarity(v, w):
    """Compute the cosine similarity between the vectors v and w."""
    return 0


# FOR TESTING

v = np.array([1, 2, 3])
w = np.array([1, 2, 3])

print(cosine_similarity(v, w)) # This should be 1.0

v = np.array([1, 2, 3])
w = np.array([-1, -2, -3])

print(cosine_similarity(v, w)) # This should be -1.0
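
If you want to cross-check your own functions, one option is to compare them against NumPy's built-in `np.dot` and `np.linalg.norm`. The sketch below is not the intended solution to the exercises; `reference_cosine` is a hypothetical helper name introduced here only for checking.

```python
import numpy as np


def reference_cosine(v, w):
    """Cosine similarity via NumPy built-ins (hypothetical helper, for checking only)."""
    # np.dot computes the dot product; np.linalg.norm defaults to the L2-norm for 1-D arrays.
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))


# The vectors from the dot product example in Section 3.1.
v = np.array([2, 1, 3])
w = np.array([3, -1, 0])

print(reference_cosine(v, w))  # roughly 0.4226
```

Comparing with a small tolerance, for example using `np.isclose`, is usually safer than testing floating-point results for exact equality.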

When you have completed all class assignments, you can continue with the home assignment.