# 18. Multimedia Information Retrieval (Concepts)

Traditional IR focuses on text. Multimedia IR (MIR) deals with **Images**, **Audio**, and **Video**.

Because this notebook sticks to vanilla Python and avoids heavy libraries such as OpenCV and librosa, it demonstrates the **concepts** and **algorithms** on simplified, simulated data rather than real media files.

In [3]:
import math    # sqrt for the Euclidean distance
import random  # simulated pixel values

## 1. Image Retrieval: Color Histograms

One simple way to compare images is by their color distribution.

**Algorithm**:
1. Quantize the color space into a small number of bins (here just three: Red, Green, Blue).
2. Count the pixels that fall into each bin.
3. Compare the resulting histograms using Euclidean distance or cosine similarity.

In [4]:
# Simulating image data
# Each "image" is a flat list of 100 RGB tuples (a 10x10 grid), every channel in 0-255.

def generate_dummy_image(dominant_color):
    # dominant_color: 'R', 'G', or 'B' (any other value is treated as 'B')
    pixels = []
    for _ in range(100):
        if dominant_color == 'R':
            p = (random.randint(200, 255), random.randint(0, 50), random.randint(0, 50))
        elif dominant_color == 'G':
            p = (random.randint(0, 50), random.randint(200, 255), random.randint(0, 50))
        else:
            p = (random.randint(0, 50), random.randint(0, 50), random.randint(200, 255))
        pixels.append(p)
    return pixels

img1 = generate_dummy_image('R') # Mostly Red
img2 = generate_dummy_image('R') # Another Red Image
img3 = generate_dummy_image('B') # Mostly Blue Image

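def compute_histogram(pixels):
    # Simplified 3-bin histogram: each pixel votes for its strictly dominant channel.
    # Bins are [red, green, blue]; everything else, including ties, lands in the blue bin.
    hist = [0, 0, 0]
    for r, g, b in pixels:
        if r > g and r > b:
            hist[0] += 1
        elif g > r and g > b:
            hist[1] += 1
        else:
            hist[2] += 1
    return hist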

h1 = compute_histogram(img1)
h2 = compute_histogram(img2)
h3 = compute_histogram(img3)

print(f"Hist 1 (Red): {h1}")
print(f"Hist 2 (Red): {h2}")
print(f"Hist 3 (Blue): {h3}")

Hist 1 (Red): [100, 0, 0]
Hist 2 (Red): [100, 0, 0]
Hist 3 (Blue): [0, 0, 100]
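
The three-bin count above only records which channel dominates each pixel. A more conventional color histogram quantizes every channel into a few levels and counts pixels per (R, G, B) cell. The following is a minimal sketch of that idea, assuming 2 levels per channel (8 bins); the helper name `quantized_histogram` and its `levels` parameter are illustrative and not part of the notebook.

```python
def quantized_histogram(pixels, levels=2):
    """Count pixels per quantized (R, G, B) cell; returns a list of levels**3 bins.

    `quantized_histogram` and `levels` are hypothetical names used for illustration.
    """
    step = 256 // levels  # width of each channel bucket
    hist = [0] * (levels ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bucket index in [0, levels - 1].
        ri, gi, bi = (min(c // step, levels - 1) for c in (r, g, b))
        hist[(ri * levels + gi) * levels + bi] += 1
    return hist

# Two hand-written pixels: one bright red, one dark gray.
example = [(250, 10, 10), (20, 20, 20)]
print(quantized_histogram(example))  # [1, 0, 0, 0, 1, 0, 0, 0]
```

With more levels the histogram separates, say, dark red from bright red, at the cost of more bins to store and compare.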


## 2. Comparing Histograms

We compare two histograms $h_1$ and $h_2$ with the Euclidean distance, summing over the bins $i$:

$$ D(h_1, h_2) = \sqrt{\sum_i (h_1[i] - h_2[i])^2} $$
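
As a worked check, take the all-red and all-blue histograms printed earlier, $h_1 = [100, 0, 0]$ and $h_3 = [0, 0, 100]$:

$$ D(h_1, h_3) = \sqrt{(100 - 0)^2 + (0 - 0)^2 + (0 - 100)^2} = \sqrt{20000} \approx 141.42 $$

Two identical histograms give $D = 0$.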

In [5]:
def euclidean_distance(h1, h2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

d12 = euclidean_distance(h1, h2)
d13 = euclidean_distance(h1, h3)

print(f"Distance (Red1 vs Red2): {d12:.2f} (Should be small)")
print(f"Distance (Red1 vs Blue): {d13:.2f} (Should be large)")

Distance (Red1 vs Red2): 0.00 (Should be small)
Distance (Red1 vs Blue): 141.42 (Should be large)
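
Step 3 of the algorithm also mentions cosine similarity, which the notebook does not implement. Euclidean distance on raw counts grows with the number of pixels, whereas cosine similarity depends only on the shape of the histogram. A minimal sketch follows; the function name `cosine_similarity` and the choice to return 0.0 for an empty histogram are assumptions.

```python
import math

def cosine_similarity(h1, h2):
    """Cosine of the angle between two histograms; 1.0 means identical shape."""
    dot = sum(a * b for a, b in zip(h1, h2))
    norm1 = math.sqrt(sum(a * a for a in h1))
    norm2 = math.sqrt(sum(b * b for b in h2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumed convention for an all-zero histogram
    return dot / (norm1 * norm2)

print(cosine_similarity([100, 0, 0], [100, 0, 0]))  # 1.0
print(cosine_similarity([100, 0, 0], [0, 0, 100]))  # 0.0
print(cosine_similarity([100, 0, 0], [400, 0, 0]))  # 1.0
```

The last call compares a 100-pixel red image with a 400-pixel red image: the similarity is still 1.0, while their Euclidean distance would be 300.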


## 3. Audio Retrieval: Audio Fingerprinting (Shazam-style)

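To match audio, we don't compare raw waveforms, which change with noise, volume, and the point where the clip starts. We compare **spectrogram peaks** instead, because the strongest time-frequency points tend to survive such distortions.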

**Concept**:
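1. Convert the audio to a spectrogram (energy over time and frequency).
2. Extract the "constellation map": the local energy peaks of the spectrogram.
3. Hash pairs of peaks: each anchor point is paired with peaks in a target zone shortly after it.
4. Look up the clip's hashes in the database.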
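
Step 3 can be sketched in a few lines. The code below assumes the constellation map is already available as a list of `(time, frequency)` peaks and builds hash strings of the form `freq1:freq2:delta_time`. The function name `fingerprint`, the parameters `fan_out` and `max_dt`, and the peak values are all illustrative assumptions, not taken from a real system.

```python
def fingerprint(peaks, fan_out=2, max_dt=30):
    """Pair each anchor peak with up to `fan_out` later peaks within `max_dt` time units.

    `peaks` is a list of (time, frequency) tuples (hypothetical data layout).
    Returns (hash, anchor_time) pairs with hashes shaped "freq1:freq2:delta_time".
    """
    peaks = sorted(peaks)  # order by time so the target zone lies after the anchor
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt or paired == fan_out:
                break  # left the target zone or reached the pairing limit
            if dt > 0:  # skip peaks that occur at the same instant as the anchor
                hashes.append((f"{f1}:{f2}:{dt}", t1))
                paired += 1
    return hashes

constellation = [(0, 120), (10, 440), (15, 300), (20, 500)]
for h, t in fingerprint(constellation):
    print(t, h)
```

Hashing pairs rather than single peaks makes each hash far more specific, and keeping the anchor time alongside the hash allows a later alignment check between clip and song.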

In [6]:
# Simulated Audio Fingerprints
# Hash format: the string "freq1:freq2:delta_time"

song_db = {
    "Song A": ["120:440:10", "300:500:5", "100:200:20"],
    "Song B": ["500:600:10", "100:100:2", "900:1000:5"]
}

sample_clip = ["120:440:10", "300:500:5"] # Matches Song A

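def identify_song(clip_hashes, database):
    # Return the song sharing the most hashes with the clip, or None if nothing matches.
    best_match = None
    max_hits = 0

    for name, db_hashes in database.items():
        db_set = set(db_hashes)  # a set gives constant-time membership tests
        hits = sum(1 for h in clip_hashes if h in db_set)

        # Strict '>' keeps the first song seen when two songs tie.
        if hits > max_hits:
            max_hits = hits
            best_match = name

    return best_match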

match = identify_song(sample_clip, song_db)
print(f"\nSample clip matched: {match}")


Sample clip matched: Song A
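
Counting shared hashes ignores where in the song each hash occurs. Fingerprinting systems of this kind typically also require the matching hashes to agree on a single time offset between clip and song. The sketch below illustrates that idea under the assumption that every hash is stored together with its anchor time; `best_aligned_match`, `timed_db`, and `timed_clip` are hypothetical names and data.

```python
from collections import Counter

def best_aligned_match(clip, database):
    """Return (song, votes) for the song whose matches agree on one time offset.

    `clip` and each database entry are lists of (hash, anchor_time) pairs.
    """
    best_song, best_votes = None, 0
    for name, entries in database.items():
        times_by_hash = {}
        for h, t in entries:
            times_by_hash.setdefault(h, []).append(t)
        # Vote for the offset (song time - clip time) implied by every hash match.
        offsets = Counter(
            song_t - clip_t
            for h, clip_t in clip
            for song_t in times_by_hash.get(h, [])
        )
        votes = max(offsets.values(), default=0)
        if votes > best_votes:
            best_song, best_votes = name, votes
    return best_song, best_votes

timed_db = {
    "Song A": [("120:440:10", 40), ("300:500:5", 55), ("100:200:20", 70)],
    "Song B": [("120:440:10", 5), ("300:500:5", 90)],
}
timed_clip = [("120:440:10", 0), ("300:500:5", 15)]
print(best_aligned_match(timed_clip, timed_db))  # ('Song A', 2)
```

Both songs contain the clip's two hashes, so plain hit counting would tie. Only in Song A do the matches share one offset (40), which is why it wins with two votes.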
