### Dot Product measures similarity

In [6]:
'''
SET-1: Why Dot Product Measures Similarity (Foundations)

Q1. Why is dot product called an "agreement score"?
Ans. Because it adds up how much corresponding features in two vectors agree with each other.
Higher agreement gives a higher dot product.
'''
# Example
v = [1, 2, 3]
u = [1, 2, 3]
# Dot product = 1*1 + 2*2 + 3*3 = 14 (strong agreement)


'''
Q2. How does dot product compare features one by one?
Ans. Each term multiplies matching features, checking whether both are large, small, or opposite.
'''
# Example
v = [2, 0, 1]
u = [2, 5, 1]
# Dot product = 2*2 + 0*5 + 1*1 = 5
# Matching non-zero features contribute most


'''
Q3. Why does alignment matter more than size in dot product similarity?
Ans. Because alignment (direction) determines meaning, while size only scales strength.
'''
# Example
v = [1, 1]
u = [10, 10]
# Dot product = 1*10 + 1*10 = 20
# The score is positive because the directions match; the length of u only scales it up


'''
Q4. Why is dot product zero for perpendicular vectors?
Ans. Because perpendicular vectors share no common direction, so their agreement is zero.
'''
# Example
v = [1, 0]
u = [0, 1]
# Dot product = 1*0 + 0*1 = 0 → no similarity


'''
Q5. Why does dot product become negative for opposite directions?
Ans. Because features disagree, indicating conflicting meaning.
'''
# Example
v = [1, 0]
u = [-1, 0]
# Dot product = 1*(-1) + 0*0 = -1 → opposite meaning


'''
Q6. How does geometry explain dot product as similarity?
Ans. Geometrically, dot product depends on vector lengths and cos(theta), which measures alignment.
'''
# Example
# v · u = |v||u|cos(theta)
# Small theta → large cos(theta) → high similarity


'''
Q7. Why is dot product useful for representing meaning in vectors?
Ans. Because it combines feature-level agreement with directional alignment into one score.
'''
# Example
# Features: [likes_math, likes_ai, likes_music]
v = [1, 1, 0]
u = [1, 0, 0]
# Dot product = 1*1 + 1*0 + 0*0 = 1 → partial agreement
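
The feature-by-feature view in Set 1 can be made concrete with a small helper. This is a minimal sketch in plain Python; the function name `dot_with_contributions` is an illustrative assumption, not something from the notebook. It returns the total score together with the individual products that add up to it.

```python
def dot_with_contributions(v, u):
    """Return the dot product and the per-feature products that sum to it."""
    if len(v) != len(u):
        raise ValueError("vectors must have the same length")
    contributions = [a * b for a, b in zip(v, u)]
    return sum(contributions), contributions

# Vectors taken from the Q2 and Q5 examples
score, parts = dot_with_contributions([2, 0, 1], [2, 5, 1])
print(score, parts)   # 5 [4, 0, 1]

score, parts = dot_with_contributions([1, 0], [-1, 0])
print(score, parts)   # -1 [-1, 0]
```

Looking at the list of contributions shows which features drive the score: a zero in either vector silences that feature, and opposite signs pull the total down.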


#### Set 2

In [7]:
'''
SET-2: Why Dot Product Measures Similarity (AI & Applications)

Q1. Why is dot product widely used in AI models?
Ans. Because it gives a single numeric score that represents how much two meanings match.
'''
# Example
query_embedding = [0.2, 0.8]
doc_embedding   = [0.3, 0.7]
# Dot product = 0.2*0.3 + 0.8*0.7 = 0.62 (high) → document is relevant to the query


'''
Q2. How does dot product work in Attention mechanisms?
Ans. In Attention, dot product measures how relevant a key is to a query.
Higher score means more attention.
'''
# Example
query = [1, 0]
key   = [1, 1]
# query · key = 1*1 + 0*1 = 1 → relevant token


'''
Q3. Why is dot product used in RAG (Retrieval-Augmented Generation)?
Ans. It helps rank documents by matching query embeddings with document embeddings.
'''
# Example
query = [0.6, 0.4]
doc1  = [0.5, 0.5]
doc2  = [0.1, 0.9]
# query · doc1 = 0.50 > query · doc2 = 0.42 → doc1 retrieved first


'''
Q4. Why does raw dot product sometimes give misleading similarity?
Ans. Raw dot product can be misleading because it depends on vector magnitude, not just direction.
Larger vectors can produce high scores even when their meaning is no more similar; only their scale is larger.
'''
# Example
v = [1, 1]       # small vector
u = [100, 100]   # same direction, much larger

dot = v[0]*u[0] + v[1]*u[1]  # = 200
# High dot product due to size, not meaning.
# This is why we use cosine similarity, which normalizes the vectors first
# so that the magnitude (scale) factor is removed.


'''
Q5. Why do we normalize vectors before using dot product?
Ans. Normalization removes magnitude and keeps only direction, giving true semantic similarity.
'''
# Example
v = [3, 4]
u = [6, 8]
# After normalization, v̂ · û = 1 → same direction (identical meaning)


'''
Q6. How is cosine similarity related to dot product?
Ans. Cosine similarity is the dot product of normalized vectors.
'''
# Example
# cosine_similarity(v, u) = (v · u) / (‖v‖ ‖u‖)
v = [1, 0]
u = [0, 1]
# Cosine similarity = 0 → unrelated meanings


'''
Q7. Why does dot product capture both agreement and conflict?
Ans. Positive values show agreement, zero shows no relation, and negative shows conflict.
'''
# Example
good = [1, 0]
bad  = [-1, 0]
# good · bad = -1 → opposite sentiment


'''
Q8. What is the final mental model for dot product in AI?
Ans. Dot product answers: "How strongly do these two meanings align?"
'''
# Example
user_interest = [0.9, 0.1]
item_profile  = [0.8, 0.2]
# Dot product = 0.9*0.8 + 0.1*0.2 = 0.74 (high) → good recommendation
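
Q5 and Q6 state that normalizing first turns the dot product into cosine similarity. The sketch below checks this on the `[3, 4]` and `[6, 8]` vectors from Q5 using NumPy; the helper name `normalize` is an assumption made for illustration, and it assumes the input is not the zero vector.

```python
import numpy as np

def normalize(x):
    """Scale x to unit length; assumes x is not the zero vector."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

v = np.array([3, 4])
u = np.array([6, 8])

raw = v @ u                           # 3*6 + 4*8 = 50, inflated by vector length
unit = normalize(v) @ normalize(u)    # dot product of unit vectors = cosine similarity

print(raw)                            # 50
print(np.isclose(unit, 1.0))          # True: both vectors point the same way
```

Comparing with `np.isclose` rather than `==` avoids being misled by floating-point rounding in the last digit.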


### Cos Table

| Angle (θ) |        cos θ       | Meaning (Vector Intuition)         |
| --------: | :----------------: | ---------------------------------- |
|        0° |          1         | Same direction                     |
|       30° |  √3/2 ≈ **0.866**  | Very similar direction             |
|       45° |  1/√2 ≈ **0.707**  | Quite similar                      |
|       60° |    1/2 = **0.5**   | Moderately similar                 |
|       90° |          0         | No similarity (orthogonal)         |
|      120° |   −1/2 = **−0.5**  | Somewhat opposite direction        |
|      135° | −1/√2 ≈ **−0.707** | Strongly opposite                  |
|      150° | −√3/2 ≈ **−0.866** | Very opposite                      |
|      180° |         −1         | Exact opposite direction           |
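
The values in the table can be reproduced with the standard `math` module. This is a small sketch; the helper name `cosine_of_degrees` is an illustrative assumption, and rounding to three decimals is a choice made here to hide floating-point noise.

```python
import math

def cosine_of_degrees(angle_deg):
    """Cosine of an angle given in degrees, rounded to 3 decimals."""
    return round(math.cos(math.radians(angle_deg)), 3)

# Angles listed in the table, in degrees
for angle in [0, 30, 45, 60, 90, 120, 135, 150, 180]:
    print(f"{angle:>3} deg  cos = {cosine_of_degrees(angle):+.3f}")
```

Without the rounding, `math.cos(math.radians(90))` returns a tiny non-zero number rather than an exact 0, which is worth remembering when testing for orthogonality in code.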


### Short Visual

In [8]:
# Angle   Cosine     Meaning
# 0°        1.0        Same
# 90°       0.0        Unrelated
# 180°     -1.0        Opposite

### Dot Product is an AGREEMENT SCORE

In [9]:
import numpy as np

v = np.array([1, 2, 3])
u = np.array([1, 2, 3])

print(v @ u)   # 14

# Each pair of matching features multiplies to a positive term,
# so every feature adds to the overall agreement score: 1*1 + 2*2 + 3*3 = 14.
# A feature that is zero in either vector would contribute nothing.

14


### Only Non-Zero features contribute

In [10]:
v = np.array([2, 0, 1])
u = np.array([2, 5, 1])

print(v @ u)   # 5

# The middle feature contributes 0*5 = 0.
# Only features that are non-zero in both vectors contribute: 2*2 + 1*1 = 5.

5


### Preventing Size/Magnitude from giving the illusion of a good Dot Product

In [12]:
v = np.array([1, 1])
u = np.array([10, 10])

print(v @ u)                              # 20 (large, because u is long)
print((v @ u) / (np.linalg.norm(v) * np.linalg.norm(u)))  # ≈ 1.0, up to floating-point rounding

# The second print divides by both vector lengths, which removes the magnitude (scale) factor.
# The raw score of 20 reflects the length of u; the matching direction alone gives a cosine of 1.

20
0.9999999999999998
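
To see the scale effect more directly, the sketch below stretches `u` along the same direction and prints both scores. The function name `cosine_similarity` and the chosen scale factors are assumptions made for illustration.

```python
import numpy as np

def cosine_similarity(v, u):
    """Dot product divided by the product of the two vector lengths."""
    return (v @ u) / (np.linalg.norm(v) * np.linalg.norm(u))

v = np.array([1.0, 1.0])
for scale in (1, 10, 100):
    u = scale * np.array([1.0, 1.0])      # same direction, growing length
    # The raw dot product grows with the scale; the cosine stays at about 1
    print(scale, v @ u, round(cosine_similarity(v, u), 6))
```

The raw dot product goes from 2 to 20 to 200 while the cosine does not move, which is the reason similarity search often compares normalized vectors.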


### Perpendicular Vectors (No Agreement)

In [13]:
v = np.array([1, 0])
u = np.array([0, 1])

print(v @ u)   # 0
# No contribution from perpendicular vectors

0


### Opposite Direction Vectors (Conflicting)

In [14]:
v = np.array([1, 0])
u = np.array([-1, 0])

print(v @ u)   # -1

# Negative score = conflict / opposite meaning.

-1


### Dot Product in Attention

In [16]:
query = np.array([1, 0])   # what I am looking for (current word / question)
key   = np.array([1, 1])   # what this token represents (stored meaning)

print(query @ key)   # 1

# Dot product = 1*1 + 0*1 = 1

# Positive score → vectors point in roughly the same direction
# Means: query and key share some common meaning

# Bigger dot product → stronger alignment
# Smaller or zero → weak or no relation
# Negative → opposite meanings (rare but possible)

# IMPORTANT:
# Dot product measures BOTH:
# 1) Direction similarity (meaning)
# 2) Magnitude (strength / importance)

# That's why raw dot product can be misleading
# (longer vectors give bigger scores even if the meaning is the same)

# In Attention:
# - Query asks: "Is this token relevant to me?"
# - Key answers: "Here is what I contain"
# - Dot product = relevance score

# Higher score → more attention weight
# Lower score → less attention paid


1
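
The notes above stop at the raw relevance score. In the standard scaled dot-product attention formulation, which is an addition here and not something this notebook states, the scores are divided by the square root of the key dimension and passed through a softmax to become weights. The sketch below shows that step; the second and third keys are hypothetical values added only to give the softmax something to compare.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    shifted = scores - np.max(scores)     # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

query = np.array([1.0, 0.0])
keys = np.array([
    [1.0, 1.0],    # the key from the cell above
    [0.0, 1.0],    # hypothetical key, perpendicular to the query
    [-1.0, 0.0],   # hypothetical key, opposite to the query
])

d_k = query.shape[0]                      # key dimension, 2 in this toy example
scores = keys @ query / np.sqrt(d_k)      # one scaled dot product per key
weights = softmax(scores)

print(scores)
print(weights)
```

The key that aligns with the query receives the largest weight, the perpendicular key a smaller one, and the opposite key the smallest, but none of them drops to exactly zero because softmax outputs are always positive.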


### Dot Product in RAG

In [17]:
query = np.array([0.6, 0.4])
doc1  = np.array([0.5, 0.5])
doc2  = np.array([0.1, 0.9])

print(query @ doc1)  # 0.5 (higher)
print(query @ doc2)  # ≈ 0.42 (lower)
# Dot product helps rank documents by relevance to the query

0.5
0.42000000000000004
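
With more than a couple of documents, the same comparison is usually done in one matrix product followed by a sort. The following is a minimal sketch using the vectors from the cell above; the `names` list and the variable names are assumptions for illustration, and a real retrieval system would typically use a vector index instead of a full sort.

```python
import numpy as np

query = np.array([0.6, 0.4])
docs = np.array([
    [0.5, 0.5],   # doc1
    [0.1, 0.9],   # doc2
])
names = ["doc1", "doc2"]

dot_scores = docs @ query                 # one dot product per document row
cos_scores = dot_scores / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# argsort on the negated scores gives indices from highest to lowest score
ranking = [names[i] for i in np.argsort(-dot_scores)]
print(ranking)   # ['doc1', 'doc2']
```

Here the dot product and the cosine similarity agree on the order. They can disagree when document vectors have very different lengths, which is the case the magnitude section warns about.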


### Cosine Similarity = Normalised Dot Product

In [18]:
v = np.array([1, 0])
u = np.array([0, 1])

cos_sim = (v @ u) / (np.linalg.norm(v) * np.linalg.norm(u))
print(cos_sim)   # 0.0

# Cosine similarity = normalized dot product.
# Cosine similarity = 0 means the vectors are perpendicular (no similarity).
# Only features that are non-zero in both vectors contribute, and here there are none.


0.0


### Agreement vs Conflict

In [19]:
good = np.array([1, 0])
bad  = np.array([-1, 0])

print(good @ bad)   # -1
# Negative score = conflict / opposite meaning.
# A positive output would indicate agreement, which is not the case here.
# A zero output would indicate no relation, which is also not the case here.

# 📌 Positive = agreement
# 📌 Zero = unrelated
# 📌 Negative = conflict

-1
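
The three-way reading of the sign can be wrapped in a small function. This is a sketch in plain Python; the name `relation` and the tolerance `tol` are assumptions introduced here, the tolerance being there because floating-point dot products rarely land on exactly zero.

```python
import math

def relation(v, u, tol=1e-9):
    """Label a pair of vectors by the sign of their dot product."""
    score = sum(a * b for a, b in zip(v, u))
    if math.isclose(score, 0.0, abs_tol=tol):
        return "unrelated"
    return "agreement" if score > 0 else "conflict"

print(relation([1, 0], [-1, 0]))        # conflict
print(relation([1, 0], [0, 1]))         # unrelated
print(relation([1, 2, 3], [1, 2, 3]))   # agreement
```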
