# Evaluation of IR

Computing precision, recall, and F-measure on the results of IR models.

This lab demonstrates how the **Vector Space Model (VSM)** can be applied to rank documents based on **cosine similarity** between **TF-IDF vectors** of documents and a query.

---

## Dataset

We use a small set of **movie review snippets** as documents:

| DocID | Content |
|-------|---------|
| D1    | I loved this movie the plot was exciting and the characters were amazing |
| D2    | Terrible movie waste of time and I would not recommend it |
| D3    | An average film some good moments but overall it was predictable |
| D4    | Fantastic performance by the lead actor brilliant cinematography |
| D5    | Bad script poor direction not worth watching at all |

---

## Query

We use a sample query to find relevant documents: `movie plot exciting characters`.

The **relevant documents** for evaluation are `{D1, D4}`.

---

## Preprocessing

1. **Tokenization**: Split text into lowercase word tokens.  
2. **Vocabulary creation**: Build a list of unique terms across all documents.  
3. **TF-IDF Calculation**:
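   - **Term Frequency (TF)**: Number of times a term appears in a document (raw count).
   - **Inverse Document Frequency (IDF)**: Measures how rare a term is across all documents. The script below uses the smoothed form \(\ln\frac{N + 1}{\mathrm{df}_t + 1} + 1\), where \(N\) is the number of documents and \(\mathrm{df}_t\) is the number of documents containing term \(t\).
   - **TF-IDF**: Multiply TF by IDF to compute the weight of each term in a document or query.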
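
As a rough worked example of these weights, the sketch below assumes the smoothed IDF variant used by this lab's script, `log((N + 1) / (df + 1)) + 1`. The helper name `smoothed_idf` is introduced here for illustration only, and the document frequencies are counted by hand from the five documents in the Dataset table.

```python
from math import log


def smoothed_idf(n_docs, doc_freq):
    """Smoothed IDF (illustrative helper): ln((N + 1) / (df + 1)) + 1."""
    return log((n_docs + 1) / (doc_freq + 1)) + 1


n_docs = 5  # D1..D5

# "movie" occurs in D1 and D2 (df = 2); "plot" occurs only in D1 (df = 1).
idf_movie = smoothed_idf(n_docs, 2)
idf_plot = smoothed_idf(n_docs, 1)

# "the" occurs twice in D1 (tf = 2) and appears in D1 and D4 (df = 2).
tfidf_the_d1 = 2 * smoothed_idf(n_docs, 2)

print(round(idf_movie, 4))     # 1.6931
print(round(idf_plot, 4))      # 2.0986
print(round(tfidf_the_d1, 4))  # 3.3863
```

The rarer term ("plot") receives the larger IDF, so it contributes more to the similarity score than a term shared by several documents.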

---

## Cosine Similarity

To rank documents, we compute the cosine similarity between each document vector \(D\) and the query vector \(Q\):

\[
\text{cosine\_sim}(D, Q) = \frac{D \cdot Q}{\|D\| \|Q\|}
\]

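- Measures the similarity between a **document vector** and the **query vector** as the cosine of the angle between them.  
- Range: 0 (no shared terms) to 1 (vectors pointing in the same direction), because TF-IDF weights are non-negative.  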

Documents are ranked in descending order of cosine similarity with the query.
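
The behaviour at the two ends of the range can be checked with a small sketch. The vectors below are made-up sparse weight dictionaries, not values from the lab, and the function name `cosine` is chosen for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


query = {"plot": 1.0, "movie": 2.0}

same_direction = {"plot": 3.0, "movie": 6.0}  # query scaled by 3
disjoint = {"script": 4.0}                    # no shared terms
partial = {"movie": 2.0, "script": 2.0}       # one shared term

score_same = cosine(query, same_direction)
score_disjoint = cosine(query, disjoint)
score_partial = cosine(query, partial)

print(round(score_same, 4), round(score_disjoint, 4), round(score_partial, 4))
```

Scaling a vector does not change its score, which is why cosine similarity does not favour a document merely for being longer.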

---

## Evaluation Metrics

We use **Precision, Recall, and F1-score** to evaluate retrieval effectiveness:

- **Precision (P)**: Fraction of retrieved documents that are relevant.  
- **Recall (R)**: Fraction of relevant documents that are retrieved.  
- **F1-score (F1)**: Harmonic mean of precision and recall.

\[
P = \frac{TP}{TP + FP}, \quad
R = \frac{TP}{TP + FN}, \quad
F1 = \frac{2PR}{P + R}
\]

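Where:  
- TP = True Positives (relevant documents that were retrieved)  
- FP = False Positives (non-relevant documents that were retrieved)  
- FN = False Negatives (relevant documents that were not retrieved)  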
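
To make the definitions concrete, the following sketch computes the three metrics for an assumed result list containing D1 and D2, judged against the relevant set `{D1, D4}` from the Query section. The function name `prf` is illustrative only.

```python
def prf(retrieved, relevant):
    """Return (precision, recall, F1) for two sets of document IDs."""
    tp = len(retrieved & relevant)   # relevant and retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


retrieved = {"D1", "D2"}   # assumed example result list
relevant = {"D1", "D4"}

p, r, f1 = prf(retrieved, relevant)
print(p, r, f1)  # 0.5 0.5 0.5
```

Here TP = 1 (D1), FP = 1 (D2), and FN = 1 (D4), so precision and recall are both 1/2 and their harmonic mean is also 1/2.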

---

## Lab Workflow

1. Tokenize documents and query.  
2. Compute TF-IDF vectors for all documents and the query.  
3. Calculate cosine similarity between each document and the query.  
4. Rank documents based on similarity scores.  
5. Evaluate retrieval using Precision, Recall, and F1 at each cutoff k, taking the top-k ranked documents as the retrieved set.

---

## Notes

- This demo uses a small dataset of movie reviews for simplicity.  
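- The same workflow applies to larger collections and longer queries; only the cost of building the vectors and computing the similarities grows.  
- VSM assigns each document a graded score and so supports **ranking documents**, unlike Boolean retrieval, where a document either matches the query or does not.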
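
The contrast with Boolean retrieval can be sketched as follows. This is a minimal illustration, assuming simple whitespace tokenization and treating the query as either an AND or an OR of its terms; `boolean_match` is a hypothetical helper, not part of the lab script.

```python
docs = {
    "D1": "I loved this movie the plot was exciting and the characters were amazing",
    "D2": "Terrible movie waste of time and I would not recommend it",
    "D3": "An average film some good moments but overall it was predictable",
    "D4": "Fantastic performance by the lead actor brilliant cinematography",
    "D5": "Bad script poor direction not worth watching at all",
}
query_terms = {"movie", "plot", "exciting", "characters"}


def boolean_match(docs, terms, mode="and"):
    """Return the set of DocIDs that match; each document is either in or out."""
    matched = set()
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        if mode == "and":
            hit = terms <= tokens       # every query term must be present
        else:
            hit = bool(terms & tokens)  # at least one query term is present
        if hit:
            matched.add(doc_id)
    return matched


and_hits = boolean_match(docs, query_terms, "and")
or_hits = boolean_match(docs, query_terms, "or")

print(sorted(and_hits))  # ['D1']
print(sorted(or_hits))   # ['D1', 'D2']
```

Neither result set carries an ordering: under OR, D1 and D2 match equally, whereas cosine similarity gives D1 a much higher score than D2.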


In [1]:
# vsm_lab_demo_movies.py
from math import log, sqrt
from collections import Counter
import re

# Sample movie review documents
docs = [
    "I loved this movie the plot was exciting and the characters were amazing",
    "Terrible movie waste of time and I would not recommend it",
    "An average film some good moments but overall it was predictable",
    "Fantastic performance by the lead actor brilliant cinematography",
    "Bad script poor direction not worth watching at all",
]

doc_ids = [f"D{i + 1}" for i in range(len(docs))]

# Example query
query = "movie plot exciting characters"

# Set of relevant documents for evaluation
relevant_set = {"D1", "D4"}

token_pattern = re.compile(r"[a-zA-Z]+")

def tokenize(text):
    return [t.lower() for t in token_pattern.findall(text)]

# Tokenize documents
tokenized_docs = [tokenize(d) for d in docs]
N = len(docs)

# Compute document frequencies
df = Counter()
for terms in tokenized_docs:
    for t in set(terms):
        df[t] += 1

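# Compute smoothed IDF: ln((N + 1) / (df + 1)) + 1.
# The weight stays positive even for a term that occurs in every document.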
idf = {t: log((N + 1) / (df_t + 1)) + 1 for t, df_t in df.items()}

# Build a sparse TF-IDF vector (term -> weight).
# Terms with no IDF entry (not seen in any document) are skipped.
def tfidf_vector(terms):
    tf = Counter(terms)
    vec = {}
    for t, freq in tf.items():
        if t in idf:
            vec[t] = freq * idf[t]
    return vec

# Cosine similarity
def cosine_sim(v1, v2):
    dot = sum(v1.get(t, 0) * v2.get(t, 0) for t in set(v1) | set(v2))
    n1 = sqrt(sum(w * w for w in v1.values()))
    n2 = sqrt(sum(w * w for w in v2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)

# Compute document and query vectors (reuse the already tokenized documents)
doc_vectors = [tfidf_vector(terms) for terms in tokenized_docs]
query_vec = tfidf_vector(tokenize(query))

# Compute cosine similarity scores
scores = [
    (doc_id, cosine_sim(query_vec, vec)) for doc_id, vec in zip(doc_ids, doc_vectors)
]
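# Sort by score, highest first. sorted() is stable, so documents with equal
# scores (here D3, D4, and D5 at 0.0) keep their original input order.
ranked = sorted(scores, key=lambda x: x[1], reverse=True)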

# Precision, Recall, F1 calculation
def precision_recall_f1(retrieved_ids, relevant_ids):
    retrieved_set = set(retrieved_ids)
    tp = len(retrieved_set & relevant_ids)
    fp = len(retrieved_set - relevant_ids)
    fn = len(relevant_ids - retrieved_set)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (
        (2 * precision * recall / (precision + recall))
        if (precision + recall) > 0
        else 0.0
    )
    return precision, recall, f1

# Main execution
if __name__ == "__main__":
    print("Ranking (DocID, CosineSimilarity):")
    for d, s in ranked:
        print(d, round(s, 4))
    for k in range(1, len(docs) + 1):
        topk = [doc_id for doc_id, _ in ranked[:k]]
        P, R, F1 = precision_recall_f1(topk, relevant_set)
        print(f"@{k} -> P={P:.4f} R={R:.4f} F1={F1:.4f}")


Ranking (DocID, CosineSimilarity):
D1 0.5469
D2 0.112
D3 0.0
D4 0.0
D5 0.0
@1 -> P=1.0000 R=0.5000 F1=0.6667
@2 -> P=0.5000 R=0.5000 F1=0.5000
@3 -> P=0.3333 R=0.5000 F1=0.4000
@4 -> P=0.5000 R=1.0000 F1=0.6667
@5 -> P=0.4000 R=1.0000 F1=0.5714
