# Rocchio feedback

In [1]:
import ipytest
import pytest

ipytest.autoconfig()

Vocabulary

In [2]:
VOC = ['news', 'about', 'presidential', 'campaign', 'food', 'text']

Query vector

In [3]:
Q = [1, 1, 1, 1, 0, 0]

Document-term matrix (each row corresponds to a document vector)

In [4]:
DT = [
    [1.5, 0.1, 0, 0, 0, 0],
    [1.5, 0.1, 0, 2, 2, 0],
    [1.5, 0, 3, 2, 0, 0],
    [1.5, 0, 4, 2, 0, 0], 
    [1.5, 0, 0, 6, 2, 0]
]

Feedback: IDs (indices) of positive and negative documents

In [5]:
D_POS = [2, 3]
D_NEG = [0, 1, 4]

## Rocchio feedback

Compute the updated query according to:
$$\vec{q}_m = \alpha \vec{q} + \frac{\beta}{|D^+|}\sum_{d \in D^+}\vec{d} - \frac{\gamma}{|D^-|}\sum_{d \in D^-}\vec{d}$$

where
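  - $\vec{q}$ is the original query vector and $\vec{q}_m$ is the modified query vector
  - $D^+, D^-$ are the sets of relevant and non-relevant feedback documents
  - $\alpha, \beta, \gamma$ are parameters that control how far the query moves: $\alpha$ weights the original query, $\beta$ the centroid of the relevant documents, and $\gamma$ the centroid of the non-relevant documents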
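
As a worked example, the update can be traced by hand for single terms using the data defined above. The weights $\alpha = 0.6$, $\beta = 0.2$, $\gamma = 0.2$ are an illustrative choice, not a recommendation. For the third term (index 2), the query weight is 1, the positive documents (2 and 3) have weights 3 and 4, and the negative documents (0, 1 and 4) all have weight 0:

$$q_{m,2} = 0.6 \cdot 1 + \frac{0.2}{2}(3 + 4) - \frac{0.2}{3}(0 + 0 + 0) = 1.3$$

For the fifth term (index 4), the query weight is 0, both positive documents have weight 0, and the negative documents have weights 0, 2 and 2:

$$q_{m,4} = 0.6 \cdot 0 + \frac{0.2}{2}(0 + 0) - \frac{0.2}{3}(0 + 2 + 2) \approx -0.267$$

A term that occurs only in non-relevant documents therefore ends up with a negative weight in the updated query.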

In [10]:
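def get_updated_query(q, d_pos, d_neg, alpha, beta, gamma):
    """Return the Rocchio-updated query vector as a list of floats.

    q is the original query vector; d_pos and d_neg hold row indices into
    the global document-term matrix DT for the positive and negative
    feedback documents; alpha, beta, gamma are the Rocchio weights.
    """
    # Start from the original query scaled by alpha.
    q_m = [alpha * t for t in q]

    # Positive feedback docs: add beta times their centroid.
    for idx in d_pos:
        for t in range(len(VOC)):
            q_m[t] += beta / len(d_pos) * DT[idx][t]

    # Negative feedback docs: subtract gamma times their centroid.
    for idx in d_neg:
        for t in range(len(VOC)):
            q_m[t] -= gamma / len(d_neg) * DT[idx][t]

    return q_m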
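
The same update can also be written with array operations. The following is a minimal sketch, assuming NumPy is available; the function name `rocchio_numpy` and its `dt` parameter are hypothetical and not part of the exercise. It takes the document-term matrix as an argument instead of reading the global `DT`, and it skips a feedback term when the corresponding set is empty.

```python
import numpy as np


def rocchio_numpy(q, dt, d_pos, d_neg, alpha, beta, gamma):
    """Vectorized Rocchio update (hypothetical helper, not the exercise API)."""
    q = np.asarray(q, dtype=float)
    dt = np.asarray(dt, dtype=float)

    # Start from the original query scaled by alpha.
    q_m = alpha * q

    # Add beta times the centroid of the positive documents, if any.
    if len(d_pos) > 0:
        q_m = q_m + beta * dt[list(d_pos)].mean(axis=0)

    # Subtract gamma times the centroid of the negative documents, if any.
    if len(d_neg) > 0:
        q_m = q_m - gamma * dt[list(d_neg)].mean(axis=0)

    return q_m


# Same data as in the cells above, repeated so the sketch runs on its own.
dt = [
    [1.5, 0.1, 0, 0, 0, 0],
    [1.5, 0.1, 0, 2, 2, 0],
    [1.5, 0, 3, 2, 0, 0],
    [1.5, 0, 4, 2, 0, 0],
    [1.5, 0, 0, 6, 2, 0],
]
q_m = rocchio_numpy([1, 1, 1, 1, 0, 0], dt, [2, 3], [0, 1, 4], 0.6, 0.2, 0.2)
print(np.round(q_m, 3))
```

Taking the mean over the selected rows is equivalent to summing them and dividing by the set size, which is what the loop version does term by term.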

Tests.

In [11]:
%%run_pytest[clean]

def test_no_expansion():
    q_m = get_updated_query(Q, D_POS, D_NEG, 1, 0, 0)
    assert q_m == Q

def test_expansion():
    q_m = get_updated_query(Q, D_POS, D_NEG, 0.6, 0.2, 0.2)
    assert q_m == pytest.approx([0.600, 0.587, 1.300, 0.467, -0.267, 0], rel=1e-2)

..                                                                                 [100%]
2 passed in 0.01s


## Feedback

Please give (anonymous) feedback on this exercise by filling out [this form](https://forms.gle/2jPayczbFhEcC9K68).