# Information Retrieval: Vector Space Model & Evaluation

This notebook covers key concepts in Information Retrieval including:
- Vector Space Model
- TF-IDF Weighting
- Cosine Similarity
- Evaluation Metrics (Precision, Recall, F-measure, MAP)

---


In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import seaborn as sns

# Set style for better visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

## 1. Vector Space Model (VSM)

### Concept
The Vector Space Model represents documents and queries as vectors in a high-dimensional space where:
- Each dimension corresponds to a term in the vocabulary
- Documents with similar content will be close in this space
- Similarity can be computed using geometric measures (e.g., cosine similarity)

### Mathematical Representation
- Document: $\vec{d} = (w_{1,d}, w_{2,d}, ..., w_{n,d})$
- Query: $\vec{q} = (w_{1,q}, w_{2,q}, ..., w_{n,q})$
- Where $w_{i,j}$ is the weight of term $i$ in document/query $j$
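
To make the representation concrete, here is a minimal sketch that maps two toy documents and a query onto raw term-count vectors over a shared vocabulary. The helper `to_count_vector` and the `toy_*` names are hypothetical and exist only for this illustration; raw counts stand in for the weights $w_{i,j}$.

```python
from collections import Counter

def to_count_vector(text, vocabulary):
    """Map a text to raw term counts over a fixed, ordered vocabulary."""
    counts = Counter(text.lower().split())
    return [counts.get(term, 0) for term in vocabulary]

toy_docs = ["the cat sat", "the dog sat"]
toy_query = "cat"

# One dimension per distinct term; sorting keeps the dimension order stable
toy_vocabulary = sorted({term for doc in toy_docs for term in doc.lower().split()})
toy_doc_vectors = [to_count_vector(doc, toy_vocabulary) for doc in toy_docs]
toy_query_vector = to_count_vector(toy_query, toy_vocabulary)

print(toy_vocabulary)      # ['cat', 'dog', 'sat', 'the']
print(toy_doc_vectors)     # [[1, 0, 1, 1], [0, 1, 1, 1]]
print(toy_query_vector)    # [1, 0, 0, 0]
```

The query and both documents live in the same four-dimensional space, which is what makes a geometric comparison between them possible.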

In [None]:
# Example document collection
documents = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "Cats and dogs are animals",
    "The animal sat on the chair"
]

query = "cat and dog"

print("Document Collection:")
for i, doc in enumerate(documents, 1):
    print(f"Doc {i}: {doc}")
print(f"\nQuery: {query}")

## 2. Term Frequency (TF)

### Raw Term Frequency
$$TF(t, d) = \text{count of term } t \text{ in document } d$$

### Normalized Term Frequency
$$TF(t, d) = \frac{\text{count of term } t \text{ in document } d}{\text{total terms in document } d}$$

### Log-scaled Term Frequency
$$TF(t, d) = \begin{cases} 1 + \log_{10}(\text{count of term } t \text{ in document } d) & \text{if count} > 0 \\ 0 & \text{otherwise} \end{cases}$$
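
As a quick check of the three definitions, take the term "the" in Doc 1, "The cat sat on the mat", which has 6 tokens after lowercasing and whitespace splitting. The subscripts below are labels added for this example only, and the logarithm is assumed to be base 10, matching `np.log10` in the code:

$$\begin{aligned}
TF_{raw}(\text{the}, d_1) &= 2 \\
TF_{norm}(\text{the}, d_1) &= \frac{2}{6} \approx 0.333 \\
TF_{log}(\text{the}, d_1) &= 1 + \log_{10} 2 \approx 1.301
\end{aligned}$$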

In [None]:
def calculate_term_frequency(document):
    """Calculate raw term frequency for a document"""
    terms = document.lower().split()
    tf = Counter(terms)
    return dict(tf)

def calculate_normalized_tf(document):
    """Calculate normalized term frequency"""
    terms = document.lower().split()
    tf = Counter(terms)
    total_terms = len(terms)
    return {term: count/total_terms for term, count in tf.items()}

def calculate_log_tf(document):
    """Calculate log-scaled term frequency"""
    terms = document.lower().split()
    tf = Counter(terms)
    return {term: 1 + np.log10(count) if count > 0 else 0 for term, count in tf.items()}

# Example with first document
example_doc = documents[0]
print(f"Document: '{example_doc}'\n")
print("Raw TF:", calculate_term_frequency(example_doc))
print("\nNormalized TF:", calculate_normalized_tf(example_doc))
print("\nLog TF:", calculate_log_tf(example_doc))

## 3. Inverse Document Frequency (IDF)

### Concept
IDF measures how informative a term is across the entire document collection.
- Rare terms have high IDF (more informative)
- Common terms have low IDF (less informative)

### Formula
$$IDF(t) = \log_{10}\frac{N}{df(t)}$$

Where:
- $N$ = total number of documents
- $df(t)$ = number of documents containing term $t$

### Alternative (smoothed) Formula
$$IDF(t) = \log\frac{N + 1}{df(t) + 1} + 1$$
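
The smoothed variant can be sketched as follows. The function name `calculate_smoothed_idf` and the toy collection are hypothetical, and the natural logarithm is an assumption (the formula above does not fix a base; scikit-learn's documentation describes the same formula with a natural log).

```python
import math

def calculate_smoothed_idf(docs):
    """Smoothed IDF: log((N + 1) / (df + 1)) + 1, using the natural log."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        # set() ensures each document counts a term at most once
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {
        term: math.log((n_docs + 1) / (df + 1)) + 1
        for term, df in doc_freq.items()
    }

smoothing_docs = ["the cat sat", "the dog sat", "the bird flew"]
smoothed_idf = calculate_smoothed_idf(smoothing_docs)

print(f"the: {smoothed_idf['the']:.3f}")   # appears in every document
print(f"cat: {smoothed_idf['cat']:.3f}")   # appears in one document
```

A term that occurs in every document gets an unsmoothed IDF of 0, which removes it from the vector entirely. With smoothing, the same term keeps a weight of 1, while rarer terms still score higher.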

In [None]:
def calculate_idf(documents):
    """Calculate IDF for all terms in document collection"""
    N = len(documents)

    # Count document frequency for each term
    df = {}
    for doc in documents:
        terms = set(doc.lower().split())
        for term in terms:
            df[term] = df.get(term, 0) + 1

    # Calculate IDF
    idf = {term: np.log10(N / doc_freq) for term, doc_freq in df.items()}
    return idf, df

idf_scores, doc_frequencies = calculate_idf(documents)

# Create a DataFrame for better visualization
idf_df = pd.DataFrame({
    'Term': list(idf_scores.keys()),
    'Document Frequency': [doc_frequencies[term] for term in idf_scores.keys()],
    'IDF Score': list(idf_scores.values())
}).sort_values('IDF Score', ascending=False)

print("IDF Scores for Terms:")
print(idf_df.to_string(index=False))

# Visualize IDF scores
plt.figure(figsize=(12, 6))
plt.bar(idf_df['Term'], idf_df['IDF Score'], color='steelblue')
plt.xlabel('Terms')
plt.ylabel('IDF Score')
plt.title('IDF Scores: Rare vs Common Terms')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

print("\n📌 Observation: Terms like 'the' have low IDF (common), while 'mat', 'log' have higher IDF (rare)")

## 4. TF-IDF Weighting

### Formula
$$TF\text{-}IDF(t, d) = TF(t, d) \times IDF(t)$$

### Intuition
- Combines term importance within a document (TF) with term rarity across documents (IDF)
- High TF-IDF means: term appears frequently in this document but rarely in others
- Low TF-IDF means: term is rare in this document, common across the collection (e.g., "the", "and"), or both

In [None]:
def calculate_tfidf_manual(documents):
    """Calculate TF-IDF manually for better understanding"""
    # Get IDF scores
    idf_scores, _ = calculate_idf(documents)

    # Calculate TF-IDF for each document
    tfidf_vectors = []
    for doc in documents:
        tf = calculate_term_frequency(doc)
        tfidf = {}
        for term, freq in tf.items():
            tfidf[term] = freq * idf_scores[term]
        tfidf_vectors.append(tfidf)

    return tfidf_vectors

# Calculate TF-IDF manually
tfidf_manual = calculate_tfidf_manual(documents)

print("TF-IDF Vectors (Manual Calculation):\n")
for i, tfidf_vec in enumerate(tfidf_manual, 1):
    print(f"Document {i}: {documents[i-1]}")
    sorted_terms = sorted(tfidf_vec.items(), key=lambda x: x[1], reverse=True)
    for term, score in sorted_terms[:5]:  # Top 5 terms
        print(f"  {term}: {score:.4f}")
    print()

In [None]:
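# Using sklearn's TfidfVectorizer for comparison.
# Its defaults apply smoothed IDF with a natural log and L2-normalize each row,
# so the values differ from the manual calculation above.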
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)
feature_names = vectorizer.get_feature_names_out()

# Create DataFrame for visualization
tfidf_df = pd.DataFrame(
    tfidf_matrix.toarray(),
    columns=feature_names,
    index=[f'Doc {i+1}' for i in range(len(documents))]
)

print("TF-IDF Matrix (using sklearn):")
print(tfidf_df.round(3))

# Heatmap visualization
plt.figure(figsize=(14, 6))
sns.heatmap(tfidf_df, annot=True, fmt='.3f', cmap='YlOrRd', cbar_kws={'label': 'TF-IDF Score'})
plt.title('TF-IDF Matrix Heatmap')
plt.xlabel('Terms')
plt.ylabel('Documents')
plt.tight_layout()
plt.show()
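
The scikit-learn scores differ from the manual ones because the weighting recipe differs. The sketch below approximates the documented `TfidfVectorizer` defaults (raw counts, smoothed IDF with a natural log, L2 normalization) in plain Python. The function `tfidf_l2_rows` is hypothetical, and whitespace tokenization is an assumption: scikit-learn's default tokenizer also drops single-character tokens and punctuation, so results can differ on other text.

```python
import math
from collections import Counter

def tfidf_l2_rows(docs):
    """TF-IDF rows: raw counts x smoothed IDF (natural log), then L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((n_docs + 1) / (df + 1)) + 1 for t, df in doc_freq.items()}

    rows = []
    for tokens in tokenized:
        weights = {t: count * idf[t] for t, count in Counter(tokens).items()}
        # Scale the row to unit Euclidean length (assumes a non-empty document)
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()})
    return rows

sketch_docs = ["the cat sat on the mat", "the dog sat on the log"]
sketch_rows = tfidf_l2_rows(sketch_docs)
for row in sketch_rows:
    print({t: round(w, 3) for t, w in row.items()})
```

Because every row has unit length, the dot product of two rows is already their cosine similarity.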

## 5. Cosine Similarity

### Formula
$$\text{cosine similarity}(\vec{d}, \vec{q}) = \frac{\vec{d} \cdot \vec{q}}{||\vec{d}|| \times ||\vec{q}||} = \frac{\sum_{i=1}^{n} d_i \times q_i}{\sqrt{\sum_{i=1}^{n} d_i^2} \times \sqrt{\sum_{i=1}^{n} q_i^2}}$$

### Properties
- Range: [-1, 1] in general; [0, 1] for text, because TF-IDF weights are non-negative
- 1 = identical direction (same relative term weights)
- 0 = orthogonal (no shared terms)
- Independent of vector magnitude, so document length alone does not change the score
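
These properties can be checked on small hand-made vectors. The sketch below uses a hypothetical helper `cosine` and illustrative vectors: one vector is a scaled copy of another, and a third shares no non-zero dimension with the first.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

short_doc = [1, 2, 0]
long_doc = [3, 6, 0]    # same direction, three times the magnitude
unrelated = [0, 0, 5]   # shares no non-zero dimension with short_doc

print(f"same direction: {cosine(short_doc, long_doc):.4f}")
print(f"orthogonal:     {cosine(short_doc, unrelated):.4f}")
```

Scaling a vector leaves the score at 1 (up to floating-point rounding), which is why a long document is not favored over a short one with the same term proportions.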

In [None]:
def cosine_similarity_manual(vec1, vec2):
    """Calculate cosine similarity between two vectors"""
    # Get all unique terms
    all_terms = set(vec1.keys()) | set(vec2.keys())

    # Create vectors
    v1 = np.array([vec1.get(term, 0) for term in all_terms])
    v2 = np.array([vec2.get(term, 0) for term in all_terms])

    # Calculate cosine similarity
    dot_product = np.dot(v1, v2)
    norm_v1 = np.linalg.norm(v1)
    norm_v2 = np.linalg.norm(v2)

    if norm_v1 == 0 or norm_v2 == 0:
        return 0

    return dot_product / (norm_v1 * norm_v2)

# Calculate query TF-IDF
query_tf = calculate_term_frequency(query)
idf_scores, _ = calculate_idf(documents)
query_tfidf = {term: freq * idf_scores.get(term, 0) for term, freq in query_tf.items()}

print(f"Query: '{query}'")
print(f"Query TF-IDF: {query_tfidf}\n")

# Calculate similarity with each document
similarities = []
for i, doc_tfidf in enumerate(tfidf_manual, 1):
    sim = cosine_similarity_manual(query_tfidf, doc_tfidf)
    similarities.append((i, sim))
    print(f"Similarity with Doc {i}: {sim:.4f}")

# Rank documents
ranked_docs = sorted(similarities, key=lambda x: x[1], reverse=True)
print("\n📊 Ranking:")
for rank, (doc_id, sim) in enumerate(ranked_docs, 1):
    print(f"{rank}. Doc {doc_id} (score: {sim:.4f}): {documents[doc_id-1]}")

In [None]:
# Visualize cosine similarity
query_vector = vectorizer.transform([query])
similarities_sklearn = cosine_similarity(query_vector, tfidf_matrix)[0]

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Bar plot
doc_labels = [f'Doc {i+1}' for i in range(len(documents))]
ax1.bar(doc_labels, similarities_sklearn, color='coral')
ax1.set_xlabel('Documents')
ax1.set_ylabel('Cosine Similarity')
ax1.set_title(f'Query-Document Similarity\nQuery: "{query}"')
ax1.set_ylim(0, 1)

# Add values on bars
for i, v in enumerate(similarities_sklearn):
    ax1.text(i, v + 0.02, f'{v:.3f}', ha='center', va='bottom')

# Radar plot
angles = np.linspace(0, 2 * np.pi, len(documents), endpoint=False)
similarities_plot = np.concatenate((similarities_sklearn, [similarities_sklearn[0]]))
angles_plot = np.concatenate((angles, [angles[0]]))

ax2.remove()  # drop the Cartesian axes created by plt.subplots before adding a polar one
ax2 = fig.add_subplot(1, 2, 2, projection='polar')
ax2.plot(angles_plot, similarities_plot, 'o-', linewidth=2, color='coral')
ax2.fill(angles_plot, similarities_plot, alpha=0.25, color='coral')
ax2.set_xticks(angles)
ax2.set_xticklabels(doc_labels)
ax2.set_ylim(0, 1)
ax2.set_title('Similarity Radar Plot')
ax2.grid(True)

plt.tight_layout()
plt.show()

## 6. Evaluation Metrics

### Precision and Recall

**Precision**: What fraction of retrieved documents are relevant?
$$Precision = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|} = \frac{TP}{TP + FP}$$

**Recall**: What fraction of relevant documents are retrieved?
$$Recall = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant}|} = \frac{TP}{TP + FN}$$

### F-Measure
Harmonic mean of precision and recall:
$$F_1 = 2 \times \frac{Precision \times Recall}{Precision + Recall}$$

Weighted F-measure:
$$F_\beta = (1 + \beta^2) \times \frac{Precision \times Recall}{\beta^2 \times Precision + Recall}$$
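
To see what $\beta$ does, the sketch below evaluates $F_\beta$ for three values of $\beta$ at a fixed precision and recall. The helper `f_beta` and the sample values are illustrative assumptions, not part of the notebook's own functions.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / denominator

sample_precision, sample_recall = 0.4, 0.5  # illustrative values

for beta in (0.5, 1.0, 2.0):
    score = f_beta(sample_precision, sample_recall, beta)
    print(f"beta={beta}: F = {score:.4f}")
# beta=0.5: F = 0.4167
# beta=1.0: F = 0.4444
# beta=2.0: F = 0.4762
```

With $\beta > 1$ the score moves toward recall (0.5 here); with $\beta < 1$ it moves toward precision (0.4 here).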

In [None]:
def calculate_precision_recall(retrieved, relevant):
    """
    Calculate precision and recall

    Args:
        retrieved: set of retrieved document IDs
        relevant: set of relevant document IDs
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)

    true_positives = len(retrieved_set & relevant_set)

    precision = true_positives / len(retrieved_set) if len(retrieved_set) > 0 else 0
    recall = true_positives / len(relevant_set) if len(relevant_set) > 0 else 0

    return precision, recall

def calculate_f_measure(precision, recall, beta=1):
    """Calculate F-measure"""
    if precision + recall == 0:
        return 0
    return (1 + beta**2) * (precision * recall) / (beta**2 * precision + recall)

# Example: System retrieved docs [1, 2, 3, 4, 5]
# Ground truth relevant docs: [2, 3, 6, 7]
retrieved = [1, 2, 3, 4, 5]
relevant = [2, 3, 6, 7]

precision, recall = calculate_precision_recall(retrieved, relevant)
f1 = calculate_f_measure(precision, recall)

print("Evaluation Example:")
print(f"Retrieved documents: {retrieved}")
print(f"Relevant documents:  {relevant}")
print(f"\nTrue Positives: {set(retrieved) & set(relevant)}")
print(f"False Positives: {set(retrieved) - set(relevant)}")
print(f"False Negatives: {set(relevant) - set(retrieved)}")
print(f"\nPrecision: {precision:.3f} ({len(set(retrieved) & set(relevant))}/{len(retrieved)})")
print(f"Recall:    {recall:.3f} ({len(set(retrieved) & set(relevant))}/{len(relevant)})")
print(f"F1-Score:  {f1:.3f}")

In [None]:
# Visualize Precision-Recall Tradeoff
def plot_precision_recall_tradeoff():
    """Visualize how precision and recall change with different thresholds"""
    # Simulated ranked results with relevance judgments
    ranked_results = [
        (1, True),   # (doc_id, is_relevant)
        (2, False),
        (3, True),
        (4, True),
        (5, False),
        (6, True),
        (7, False),
        (8, False),
        (9, True),
        (10, False)
    ]

    total_relevant = sum(1 for _, rel in ranked_results if rel)

    precisions = []
    recalls = []

    # Calculate precision and recall at each cutoff
    for k in range(1, len(ranked_results) + 1):
        retrieved = [doc_id for doc_id, _ in ranked_results[:k]]
        relevant_retrieved = sum(1 for _, rel in ranked_results[:k] if rel)

        precision = relevant_retrieved / k
        recall = relevant_retrieved / total_relevant

        precisions.append(precision)
        recalls.append(recall)

    # Plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

    # Precision and Recall vs K
    k_values = list(range(1, len(ranked_results) + 1))
    ax1.plot(k_values, precisions, 'o-', label='Precision', linewidth=2, markersize=8)
    ax1.plot(k_values, recalls, 's-', label='Recall', linewidth=2, markersize=8)
    ax1.set_xlabel('K (number of documents retrieved)')
    ax1.set_ylabel('Score')
    ax1.set_title('Precision and Recall vs K')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.set_ylim(0, 1.1)

    # Precision-Recall Curve
    ax2.plot(recalls, precisions, 'o-', linewidth=2, markersize=8, color='purple')
    ax2.set_xlabel('Recall')
    ax2.set_ylabel('Precision')
    ax2.set_title('Precision-Recall Curve')
    ax2.grid(True, alpha=0.3)
    ax2.set_xlim(0, 1.1)
    ax2.set_ylim(0, 1.1)

    # Add annotations for key points
    for i in [0, 4, 9]:
        ax2.annotate(f'K={i+1}', xy=(recalls[i], precisions[i]),
                    xytext=(10, 10), textcoords='offset points',
                    bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
                    arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))

    plt.tight_layout()
    plt.show()

    return precisions, recalls

precisions, recalls = plot_precision_recall_tradeoff()
print("\n📌 Observation: As we retrieve more documents (increase K), recall never decreases, while precision typically falls")

## 7. Average Precision (AP)

### Formula
$$AP = \frac{1}{R} \sum_{k=1}^{n} Precision(k) \times rel(k)$$

Where:
- $R$ = total number of relevant documents
- $n$ = number of retrieved documents
- $rel(k)$ = 1 if document at rank $k$ is relevant, 0 otherwise

### Interpretation
- Rewards systems that rank relevant documents higher
- Takes into account the position of relevant documents
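
A small comparison shows the position sensitivity. In this sketch, two hypothetical rankings return the same four documents, so precision and recall at cutoff 4 are identical, yet AP differs. The helper `average_precision` and the document IDs are made up for illustration.

```python
def average_precision(ranking, relevant):
    """AP = (1/R) * sum of Precision(k) over ranks k that hold a relevant document."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranking, 1):
        if doc_id in relevant:
            hits += 1
            total += hits / k   # Precision(k) at this relevant rank
    return total / len(relevant)

relevant_ids = {"a", "b"}
relevant_first = ["a", "b", "x", "y"]   # relevant documents at ranks 1 and 2
relevant_last = ["x", "y", "a", "b"]    # relevant documents at ranks 3 and 4

print(f"relevant first: {average_precision(relevant_first, relevant_ids):.4f}")  # 1.0000
print(f"relevant last:  {average_precision(relevant_last, relevant_ids):.4f}")   # 0.4167
```

Set-based precision and recall cannot tell these two rankings apart; AP can.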

In [None]:
def calculate_average_precision(ranked_results, relevant_docs):
    """
    Calculate Average Precision

    Args:
        ranked_results: list of retrieved document IDs in ranked order
        relevant_docs: set of relevant document IDs
    """
    relevant_set = set(relevant_docs)

    precisions_at_relevant = []
    num_relevant_seen = 0

    for i, doc_id in enumerate(ranked_results, 1):
        if doc_id in relevant_set:
            num_relevant_seen += 1
            precision_at_i = num_relevant_seen / i
            precisions_at_relevant.append(precision_at_i)

    if len(precisions_at_relevant) == 0:
        return 0

    return sum(precisions_at_relevant) / len(relevant_set)

# Example
ranked_results = [3, 7, 1, 5, 2, 8, 4, 6, 9, 10]
relevant_docs = [1, 2, 3, 5]

print("Average Precision Example:")
print(f"Ranked results: {ranked_results}")
print(f"Relevant docs:  {relevant_docs}\n")

# Show step-by-step calculation
print("Step-by-step calculation:")
num_relevant_seen = 0
precisions_at_relevant = []

for i, doc_id in enumerate(ranked_results, 1):
    if doc_id in relevant_docs:
        num_relevant_seen += 1
        precision_at_i = num_relevant_seen / i
        precisions_at_relevant.append(precision_at_i)
        print(f"Rank {i}: Doc {doc_id} is relevant. Precision@{i} = {num_relevant_seen}/{i} = {precision_at_i:.4f}")

ap = calculate_average_precision(ranked_results, relevant_docs)
print(f"\nAverage Precision = ({' + '.join([f'{p:.4f}' for p in precisions_at_relevant])}) / {len(relevant_docs)}")
print(f"                  = {sum(precisions_at_relevant):.4f} / {len(relevant_docs)}")
print(f"                  = {ap:.4f}")

## 8. Mean Average Precision (MAP)

### Formula
$$MAP = \frac{1}{Q} \sum_{q=1}^{Q} AP(q)$$

Where:
- $Q$ = number of queries
- $AP(q)$ = Average Precision for query $q$

### Interpretation
- Single-number metric for overall system performance
- Average of AP scores across all queries

In [None]:
def calculate_map(queries_results):
    """
    Calculate Mean Average Precision

    Args:
        queries_results: list of tuples (ranked_results, relevant_docs) for each query
    """
    aps = []
    for ranked_results, relevant_docs in queries_results:
        ap = calculate_average_precision(ranked_results, relevant_docs)
        aps.append(ap)

    return sum(aps) / len(aps) if aps else 0

# Example with multiple queries
queries_results = [
    # Query 1
    ([3, 7, 1, 5, 2], [1, 2, 3]),
    # Query 2
    ([1, 4, 2, 6, 3], [1, 3, 4]),
    # Query 3
    ([5, 2, 8, 1, 3], [1, 2, 5])
]

print("Mean Average Precision Example:\n")
aps = []
for i, (ranked_results, relevant_docs) in enumerate(queries_results, 1):
    ap = calculate_average_precision(ranked_results, relevant_docs)
    aps.append(ap)
    print(f"Query {i}:")
    print(f"  Ranked: {ranked_results}")
    print(f"  Relevant: {relevant_docs}")
    print(f"  AP = {ap:.4f}\n")

map_score = calculate_map(queries_results)
print(f"MAP = ({' + '.join([f'{ap:.4f}' for ap in aps])}) / {len(queries_results)}")
print(f"    = {map_score:.4f}")

In [None]:
# Visualize AP for different queries
query_names = [f'Query {i+1}' for i in range(len(queries_results))]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Bar plot of AP scores
colors = ['skyblue', 'lightcoral', 'lightgreen']
ax1.bar(query_names, aps, color=colors)
ax1.axhline(y=map_score, color='red', linestyle='--', linewidth=2, label=f'MAP = {map_score:.4f}')
ax1.set_ylabel('Average Precision')
ax1.set_title('Average Precision per Query')
ax1.legend()
ax1.set_ylim(0, 1.1)

# Add value labels
for i, (name, ap) in enumerate(zip(query_names, aps)):
    ax1.text(i, ap + 0.02, f'{ap:.3f}', ha='center', va='bottom', fontweight='bold')

# Box plot showing distribution
ax2.boxplot([aps])
ax2.set_xticklabels(['AP Scores'])  # avoids the boxplot `labels` keyword, deprecated in newer Matplotlib
ax2.scatter([1]*len(aps), aps, color=colors, s=100, zorder=3)
ax2.axhline(y=map_score, color='red', linestyle='--', linewidth=2, label=f'MAP = {map_score:.4f}')
ax2.set_ylabel('Average Precision')
ax2.set_title('Distribution of AP Scores')
ax2.legend()
ax2.set_ylim(0, 1.1)

plt.tight_layout()
plt.show()

## 9. Comprehensive Example: End-to-End IR System

In [None]:
class SimpleIRSystem:
    """A simple Information Retrieval System"""

    def __init__(self, documents):
        self.documents = documents
        self.vectorizer = TfidfVectorizer()
        self.tfidf_matrix = self.vectorizer.fit_transform(documents)

    def search(self, query, top_k=5):
        """Search for documents relevant to query"""
        query_vector = self.vectorizer.transform([query])
        similarities = cosine_similarity(query_vector, self.tfidf_matrix)[0]

        # Get top-k results
        top_indices = np.argsort(similarities)[::-1][:top_k]

        results = []
        for idx in top_indices:
            results.append({
                'doc_id': int(idx),  # 0-based index into self.documents
                'document': self.documents[idx],
                'score': similarities[idx]
            })

        return results

    def evaluate(self, query, relevant_doc_ids, top_k=5):
        """Evaluate search results"""
        results = self.search(query, top_k)
        retrieved_ids = [r['doc_id'] for r in results]

        precision, recall = calculate_precision_recall(retrieved_ids, relevant_doc_ids)
        f1 = calculate_f_measure(precision, recall)
        ap = calculate_average_precision(retrieved_ids, relevant_doc_ids)

        return {
            'results': results,
            'precision': precision,
            'recall': recall,
            'f1': f1,
            'ap': ap
        }

# Create a larger document collection
doc_collection = [
    "Machine learning is a subset of artificial intelligence",
    "Deep learning uses neural networks with multiple layers",
    "Natural language processing enables computers to understand human language",
    "Computer vision allows machines to interpret visual information",
    "Reinforcement learning is inspired by behavioral psychology",
    "Supervised learning requires labeled training data",
    "Unsupervised learning finds patterns in unlabeled data",
    "Transfer learning reuses pre-trained models for new tasks",
    "Generative models can create new data similar to training data",
    "Neural networks are inspired by biological neurons in the brain"
]

# Initialize IR system
ir_system = SimpleIRSystem(doc_collection)

# Test query
test_query = "neural networks and deep learning"
relevant_docs = [1, 9]  # Ground truth: 0-based indices into doc_collection

# Evaluate
evaluation = ir_system.evaluate(test_query, relevant_docs, top_k=5)

print(f"Query: '{test_query}'\n")
print("="*80)
print("Search Results:")
print("="*80)
for i, result in enumerate(evaluation['results'], 1):
    relevance = "✓ RELEVANT" if result['doc_id'] in relevant_docs else "✗ Not relevant"
    print(f"{i}. [Doc {result['doc_id']}] {relevance}")
    print(f"   Score: {result['score']:.4f}")
    print(f"   Text: {result['document']}")
    print()

print("="*80)
print("Evaluation Metrics:")
print("="*80)
print(f"Precision:  {evaluation['precision']:.4f}")
print(f"Recall:     {evaluation['recall']:.4f}")
print(f"F1-Score:   {evaluation['f1']:.4f}")
print(f"Avg Precision: {evaluation['ap']:.4f}")

## 10. Summary and Key Takeaways

### Vector Space Model
- ✅ Represents documents and queries as vectors
- ✅ Enables similarity computation
- ✅ Foundation for modern IR systems

### TF-IDF
- ✅ Balances term frequency (local) with inverse document frequency (global)
- ✅ Reduces impact of common words
- ✅ Highlights discriminative terms

### Cosine Similarity
- ✅ Measures angle between vectors
- ✅ Normalized (independent of document length)
- ✅ Range for non-negative term weights: 0 (no shared terms) to 1 (identical direction)

### Evaluation Metrics
- **Precision**: Quality of results (how many retrieved are relevant?)
- **Recall**: Coverage of results (how many relevant are retrieved?)
- **F-measure**: Harmonic mean of precision and recall
- **Average Precision**: Rewards ranking relevant documents higher
- **MAP**: Overall system performance across queries

### Trade-offs
- Precision ↔ Recall: Improving one often hurts the other
- Ranking matters: AP/MAP consider position of relevant documents
- Context matters: Different applications prioritize different metrics

## 11. Practice Exercises

Try these exercises to reinforce your understanding:

In [None]:
# Exercise 1: Calculate TF-IDF for a new document collection
exercise_docs = [
    "Python is a popular programming language",
    "Java is used for enterprise applications",
    "Python is great for data science and machine learning",
    "JavaScript is essential for web development"
]

# TODO: Calculate TF-IDF and find the most important terms for each document
# Your code here

print("Exercise 1: Implement TF-IDF calculation for the exercise_docs collection")

In [None]:
# Exercise 2: Given these search results, calculate precision, recall, and AP
retrieved = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
relevant = [1, 2, 5, 7]

# TODO: Calculate precision, recall, F1, and Average Precision
# Your code here

print("Exercise 2: Calculate evaluation metrics for the given results")

In [None]:
# Exercise 3: Compare two different ranking algorithms
ranking_a = [2, 1, 5, 3, 7, 4, 6, 8]
ranking_b = [1, 3, 2, 5, 4, 7, 6, 8]
relevant = [1, 2, 3, 5]

# TODO: Calculate AP for both rankings and determine which is better
# Your code here

print("Exercise 3: Compare the two ranking algorithms using AP")

## References and Further Reading

1. **Introduction to Information Retrieval** by Manning, Raghavan, and Schütze
   - Chapter 6: Scoring, term weighting, and the vector space model
   - Chapter 8: Evaluation in information retrieval

2. **Modern Information Retrieval** by Baeza-Yates and Ribeiro-Neto

3. **Scikit-learn Documentation**
   - TfidfVectorizer: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

4. **TREC Evaluation Resources**: https://trec.nist.gov/