# Term vector similarity

In this exercise, you will complete the code that computes the similarity between two documents represented by their term vectors.

In [4]:
import ipytest
import math
import pytest

ipytest.autoconfig()

## Jaccard similarity

This metric is a set similarity: it captures only the presence or absence of terms, with no regard to their frequency. Put simply, it is the ratio of the number of terms shared by the two documents to the number of distinct terms that occur in at least one of them.

$$sim_{Jaccard}(x,y) = \frac{|X \cap Y|}{|X \cup Y|}$$

where $X$ and $Y$ denote the set of terms in documents $x$ and $y$, respectively.
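The formula above can be sketched in a few lines. This is one possible approach, not a reference solution. It assumes that a term vector is a list of term counts in which a non-zero entry means the term is present, so the sets $X$ and $Y$ are the indices of the non-zero entries. The names `term_set` and `jaccard_sketch` are illustrative, and returning 0 when both documents are empty is an assumed convention.

```python
def term_set(vector):
    """Returns the set of indices whose term count is non-zero."""
    return {i for i, count in enumerate(vector) if count != 0}


def jaccard_sketch(x, y):
    """Illustrative Jaccard similarity between two term vectors."""
    terms_x, terms_y = term_set(x), term_set(y)
    union = terms_x | terms_y
    if not union:
        # Assumed convention: two empty documents get similarity 0.
        return 0.0
    # Shared terms divided by distinct terms in either document.
    return len(terms_x & terms_y) / len(union)
```

Because only presence matters, `jaccard_sketch([1, 0, 2], [7, 0, 9])` and `jaccard_sketch([1, 0, 1], [1, 0, 1])` give the same result: the counts differ, but the sets of terms are identical.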

In [1]:
def jaccard(x, y):
    """Computes the Jaccard similarity between two term vectors."""
    # TODO
    return -1

Tests.

In [3]:
%%run_pytest[clean]

def test_no_common_terms():
    x = [0, 0, 0, 1, 2, 1]
    y = [1, 5, 3, 0, 0, 0]
    assert jaccard(x, y) == 0

def test_only_common_terms():
    x = [0, 1, 2, 1, 0, 1]
    y = [0, 5, 3, 7, 0, 1]
    assert jaccard(x, y) == 1

def test_some_common_terms():
    x = [0, 1, 1, 0, 1, 1]
    y = [5, 0, 3, 0, 7, 0]
    assert jaccard(x, y) == 0.4

...                                                                                [100%]
3 passed in 0.01s


## Cosine similarity

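$$sim_{cos}(x,y) = \frac{\mathbf{x} \cdot \mathbf{y}}{||\mathbf{x}||~||\mathbf{y}||} = \frac{\sum_{i=1}^n x_i y_i}{\sqrt{\sum_{i=1}^n x_i^2} \sqrt{\sum_{i=1}^n y_i^2}}$$

where $x_i$ and $y_i$ denote the $i$-th components of the term vectors $\mathbf{x}$ and $\mathbf{y}$ of documents $x$ and $y$, respectively, and $n$ is the length of the vectors.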
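The right-hand side of the formula translates almost term by term into code. The following is a minimal sketch under stated assumptions rather than a reference solution: both vectors are assumed to have the same length, the name `cosine_sketch` is illustrative, and returning 0 when either vector is all zeros is an assumed convention to avoid division by zero.

```python
import math


def cosine_sketch(x, y):
    """Illustrative cosine similarity between two equal-length term vectors."""
    # Numerator: dot product of the two vectors.
    dot = sum(x_i * y_i for x_i, y_i in zip(x, y))
    # Denominator: product of the Euclidean norms.
    norm_x = math.sqrt(sum(x_i * x_i for x_i in x))
    norm_y = math.sqrt(sum(y_i * y_i for y_i in y))
    if norm_x == 0 or norm_y == 0:
        # Assumed convention: an empty document has similarity 0 to anything.
        return 0.0
    return dot / (norm_x * norm_y)
```

For `x = [4, 2]` and `y = [1, 3]`, the dot product is 10 and the norms are $\sqrt{20}$ and $\sqrt{10}$, so the similarity is $10 / \sqrt{200} = \sqrt{2} / 2$. Since the result is a floating-point number, comparisons are best made with a tolerance, for example `pytest.approx`.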

In [3]:
def cosine(x, y):
    """Computes the cosine similarity between two term vectors."""
    # TODO
    return -1

Tests.

In [5]:
%%run_pytest[clean]

def test_no_common_terms():
    x = [0, 0, 0, 1, 2, 1]
    y = [1, 5, 3, 0, 0, 0]
    assert cosine(x, y) == 0

def test_identical_docs():
    x = [0, 0, 0, 1, 2, 1]
    assert cosine(x, x) == pytest.approx(1, rel=1e-3)

def test_short_docs():
    x = [4, 2]
    y = [1, 3]
    assert cosine(x, y) == pytest.approx(math.sqrt(2) / 2, rel=1e-3)

...                                                                                [100%]
3 passed in 0.01s


## Feedback

Please give (anonymous) feedback on this exercise by filling out [this form](https://forms.gle/2jPayczbFhEcC9K68).