# Feedback

## Types of feedback

1. Relevance feedback
   - The user indicates which of the returned search results are relevant and which are not.
2. Pseudo feedback
   - Assumes top K documents retrieved by the search engine are relevant.
   - Obtain additional relevant words **not in the query** from these top K documents, e.g., using a background language model to filter out common words.
   - These additional relevant words can be used to expand the original query.
   - **No user interaction** is required beyond entering the query.
3. Implicit feedback
   - Infers the user's preferences indirectly from their behavior, e.g., from clickthrough logs.
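
The pseudo-feedback loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference method: `pseudo_feedback_terms`, its parameters `k` and `m`, and the scoring rule (a word's relative frequency in the top-K documents divided by its background probability $P(w \mid C)$) are hypothetical choices made for the example.

```python
from collections import Counter


def pseudo_feedback_terms(query_terms, ranked_docs, collection_prob, k=10, m=5):
    """Return up to m expansion terms, not in the query, from the top-k documents.

    ranked_docs: token lists ordered best-first (assumed search-engine output).
    collection_prob: background P(w | C), assumed > 0 for every word seen.
    Scoring rule (hypothetical): P(w | top-k docs) / P(w | C), which favours
    words that are frequent in the feedback set but rare in the collection.
    """
    counts = Counter()
    for doc in ranked_docs[:k]:  # pseudo feedback: assume the top k are relevant
        counts.update(doc)
    total = sum(counts.values())
    scores = {
        w: (c / total) / collection_prob[w]
        for w, c in counts.items()
        if w not in query_terms  # keep only words not already in the query
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:m]
```

The returned terms would then be appended to the original query, typically with lower weights than the user's own terms.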

## Feedback for vector space model

### General method: query modification

- Adjust query vector 
  - Add new (weighted) terms
  - Adjust weights of old terms

### Rocchio

- Adjust the query vector
  - to be more similar to the "averaged vector" (centroid) of the relevant documents and
  - less similar to the centroid of the irrelevant ones.

Given a query vector $\mathbf{q}$, the updated vector $\mathbf{q_m}$ computed using weights
$\alpha, \beta, \gamma \in \mathbb{R}_{+}$ is

\begin{align}
\mathbf{q_m} = \alpha 
                \underbrace{\mathbf{q}}_{\text{Original query}} 
                +
                \beta
                \underbrace {
                \sum_{ \mathbf{d} \in \mathcal{D_r}}
               \frac{\mathbf{d}}
                    {\lvert 
                     \underbrace{\mathcal{D_r}}_{ \substack{ \text{Relevant}\\\text{docs} } }  
                     \rvert}
                }_{ \text{Average relevant doc} } -               
               \gamma
               \underbrace {
                \sum_{ \mathbf{d} \in \mathcal{D_n}}
               \frac{\mathbf{d}}
                    {\lvert 
                     \underbrace{\mathcal{D_n}}_{ \substack{ \text{Irrelevant}\\\text{docs} } }  
                     \rvert}
               }_{ \text{Average irrelevant doc} }
\end{align}
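
The Rocchio update can be sketched in Python over sparse vectors stored as `{term: weight}` dicts. The dict representation, the function names, and the default weights ($\alpha = 1$, $\beta = 0.75$, $\gamma = 0.15$) are illustrative assumptions, not tuned values.

```python
def centroid(docs):
    """Average of sparse term-weight vectors (dicts); an empty list gives {}."""
    total = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant)."""
    updated = {term: alpha * weight for term, weight in query.items()}
    for term, weight in centroid(relevant).items():
        updated[term] = updated.get(term, 0.0) + beta * weight
    for term, weight in centroid(nonrelevant).items():
        updated[term] = updated.get(term, 0.0) - gamma * weight
    return updated
```

Terms that appear only in irrelevant documents end up with negative weights; the formula itself does not say whether to keep or clip them, so that is left as a design choice.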

### In practice

1. The irrelevant centroid is usually less important: irrelevant documents cover a wide range of topics, so subtracting their centroid drags the query vector in no consistent direction. Hence $\gamma$ is usually kept small.
2. The centroid vectors are often truncated to keep only their highest-weighted terms, since a full centroid adds many low-weight terms that make the expanded query expensive to score.
3. Keep relatively high weights on the original query terms, since they were entered by the user and are therefore likely to be precise.
4. $\beta$ should be higher for relevance feedback than for pseudo feedback: in the former, the user explicitly marked the documents as relevant, whereas in the latter they are only assumed to be relevant.
5. Overall, Rocchio feedback is usually robust.
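
Several of these practical points can be combined into a small sketch: the relevant centroid is truncated to its `k` highest-weighted terms (point 2), the irrelevant centroid is dropped entirely, i.e., $\gamma = 0$, as a simplification of point 1, and the original query keeps a relatively high weight `alpha` (point 3). The names `truncate` and `expand_query` and the defaults are hypothetical; for pseudo feedback one would pass a smaller `beta` (point 4).

```python
def truncate(vector, k):
    """Keep only the k highest-weighted terms (ties broken alphabetically)."""
    top = sorted(vector.items(), key=lambda item: (-item[1], item[0]))[:k]
    return dict(top)


def expand_query(query, relevant, k=2, alpha=1.0, beta=0.75):
    """Rocchio with gamma = 0 and a truncated centroid of the relevant docs."""
    total = {}
    for doc in relevant:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    centroid = {term: weight / len(relevant) for term, weight in total.items()}
    updated = {term: alpha * weight for term, weight in query.items()}
    for term, weight in truncate(centroid, k).items():  # only the top-k terms
        updated[term] = updated.get(term, 0.0) + beta * weight
    return updated
```

With `k=2`, low-weight centroid terms such as a stopword that appears in every relevant document are discarded, keeping the expanded query short.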

## Feedback in language model approaches

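**Problem:** Query likelihood cannot naturally support relevance feedback: it treats the query as a sample generated by the document language model, so there is no explicit query model into which feedback information can be folded.

**Solution:**
- Use the Kullback-Leibler (KL) divergence retrieval model, a generalization of query likelihood.
- Feedback is achieved through query model estimation/updating.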

## Kullback-Leibler (KL) Divergence Retrieval Model

Query likelihood model
\begin{align}
f(q, d) 
&= 
\sum_{w \in q \cap d} c(w, q) \log \frac{P_{\text{Seen}}(w \mid d)}{\alpha_d P(w \mid C)}
+
\lvert q \rvert \log \alpha_d.
\end{align}
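
To make the formula concrete, here is a minimal Python sketch assuming Jelinek-Mercer smoothing (an assumption; the formula applies to any smoothing method of this form), where a document is smoothed as `(1 - lam) * c(w, d) / |d| + lam * P(w | C)`, so that $\alpha_d$ equals `lam`. The query-length factor in front of $\log \alpha_d$ is `len(query)`. The function names are hypothetical. The second function computes the full $\log P(q \mid d)$, which differs from $f(q, d)$ only by $\sum_{w \in q} c(w, q) \log P(w \mid C)$, a term that does not depend on $d$.

```python
import math
from collections import Counter


def ql_score(query, doc, p_collection, lam=0.5):
    """Rank-equivalent query-likelihood score f(q, d) with Jelinek-Mercer smoothing."""
    q_counts, d_counts = Counter(query), Counter(doc)
    score = 0.0
    for w, c_wq in q_counts.items():
        if w in d_counts:  # sum only over w in q ∩ d
            p_seen = (1 - lam) * d_counts[w] / len(doc) + lam * p_collection[w]
            score += c_wq * math.log(p_seen / (lam * p_collection[w]))
    return score + len(query) * math.log(lam)  # |q| * log(alpha_d)


def log_likelihood(query, doc, p_collection, lam=0.5):
    """Full log P(q | d) under the same smoothed document model."""
    d_counts = Counter(doc)  # missing words count as 0
    return sum(
        math.log((1 - lam) * d_counts[w] / len(doc) + lam * p_collection[w])
        for w in query
    )
```

Only words shared by the query and the document contribute to the sum, which is what makes this form efficient to evaluate with an inverted index.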

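KL-divergence model
\begin{align}
f(q, d)
&=
\sum_{w \in d, P(w \mid \hat{\theta}_Q) > 0 } 
\underbrace{
P(w \mid \hat{\theta}_{Q}) 
}_{\substack{\text{Query LM; ML estimate} \\ \text{is } \frac{c(w, q)}{\lvert q \rvert} }}
\log \frac{P_{\text{Seen}}(w \mid d)}{\alpha_d P(w \mid C)}
+
\log \alpha_d.
\end{align}

With the maximum-likelihood estimate $P(w \mid \hat{\theta}_Q) = c(w, q) / \lvert q \rvert$, this score equals the query-likelihood score divided by the query length, so the two models rank documents identically.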
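
A matching sketch of the KL-divergence score, again assuming Jelinek-Mercer smoothing with weight `lam` (so $\alpha_d$ equals `lam`); `kl_score` and `ml_query_model` are hypothetical names. Because the query enters only through the dict `query_model`, a better-estimated query model (for example, one learned from feedback documents) can replace `ml_query_model` without changing the scoring code.

```python
import math
from collections import Counter


def kl_score(query_model, doc, p_collection, lam=0.5):
    """KL-divergence retrieval score for a query model {w: P(w | theta_Q)}."""
    d_counts = Counter(doc)
    score = 0.0
    for w, p_wq in query_model.items():
        if p_wq > 0 and w in d_counts:  # sum over w in d with P(w | theta_Q) > 0
            p_seen = (1 - lam) * d_counts[w] / len(doc) + lam * p_collection[w]
            score += p_wq * math.log(p_seen / (lam * p_collection[w]))
    return score + math.log(lam)  # log(alpha_d), not scaled by query length


def ml_query_model(query):
    """Maximum-likelihood query model c(w, q) / |q|."""
    return {w: c / len(query) for w, c in Counter(query).items()}
```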

So the goal is to estimate $P(w \mid \hat{\theta}_{Q})$. One way to incorporate feedback into this estimate is to first obtain a set of feedback documents assumed to be relevant, $\mathcal{F} = \{d_i\}_{i=1}^n$, and then estimate $P(w \mid \hat{\theta}_{Q})$ as the distribution most likely to have generated $\mathcal{F}$.

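To do this, we model the feedback documents as generated by a mixture of a topic model $\theta$ and the background (collection) model $P(w \mid C)$, and use the estimated topic model as $P(w \mid \hat{\theta}_{Q})$: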

\begin{align}
P(\mathcal{F} \mid \theta) &= 
\prod_{i=1}^n \prod_{w \in d_i}
\left( (1 - \lambda) \underbrace{P(w \mid \theta)}_{\text{Topic model}} + 
\lambda \underbrace{P(w \mid C)}_{ \substack{\text{Background} \\ \text{(collection) model}}} \right)
^{ \overbrace{c\left(w, d_i\right)}^{\text{Count of } w \text{ in } d_i} } 
\text{ where } \lambda \in [0, 1]
\end{align}
and we seek $\theta^\ast$ such that

\begin{align}
\theta^\ast 
&= \underset{\theta}{\arg \max} \log P(\mathcal{F} \mid \theta) \\
&= \underset{\theta}{\arg \max} \sum_{i=1}^n \sum_{w \in d_i} c(w, d_i) 
                                \log \left( (1 - \lambda) P(w \mid \theta) + 
                                \lambda P(w \mid C) \right)
\end{align}
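
A common way to compute $\theta^\ast$ is the EM algorithm. The E-step computes, for each word, the probability that an occurrence came from the topic model, $z(w) = \frac{(1-\lambda) P(w \mid \theta)}{(1-\lambda) P(w \mid \theta) + \lambda P(w \mid C)}$; the M-step sets $P(w \mid \theta) \propto \sum_i c(w, d_i)\, z(w)$. Because $z(w)$ depends only on $w$, the counts can be pooled over $\mathcal{F}$ first. The sketch below assumes $\lambda$ is fixed with $0 \le \lambda < 1$ and $P(w \mid C) > 0$ for every word in $\mathcal{F}$; the function name and iteration count are hypothetical.

```python
from collections import Counter


def estimate_feedback_model(feedback_docs, p_collection, lam=0.5, iterations=200):
    """EM estimate of theta* for the topic/background mixture over feedback docs.

    feedback_docs: list of token lists (the set F).
    p_collection: background P(w | C), assumed > 0 for every word in F.
    lam: fixed weight of the background component, assumed 0 <= lam < 1.
    """
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc)  # c(w, F) = sum_i c(w, d_i)
    theta = {w: 1.0 / len(counts) for w in counts}  # uniform starting point
    for _ in range(iterations):
        # E-step: probability that an occurrence of w came from the topic model.
        z = {}
        for w in counts:
            topic = (1 - lam) * theta[w]
            z[w] = topic / (topic + lam * p_collection[w])
        # M-step: renormalise the counts weighted by z(w).
        weighted = {w: counts[w] * z[w] for w in counts}
        total = sum(weighted.values())
        theta = {w: v / total for w, v in weighted.items()}
    return theta
```

As a worked example, suppose the pooled counts are `the`: 10, `a`: 6, `retrieval`: 5, `feedback`: 4, the background probabilities are 0.5, 0.3, 0.1, 0.1, and $\lambda = 0.5$. The estimate converges to about 0.3, 0.18, 0.3, 0.22, at which point the mixture reproduces the empirical word distribution of $\mathcal{F}$ exactly. Compared with the raw frequencies (0.4, 0.24, 0.2, 0.16), the background model absorbs mass from common words and the topical words `retrieval` and `feedback` gain it. The resulting $\theta^\ast$ can then be used as $P(w \mid \hat{\theta}_Q)$ in the KL-divergence score.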
