# module02_vector_mathematics


## vectors.py
Explanation of Vectors
Vectors are mathematical objects with magnitude and direction.
In machine learning, they represent data points in multi-dimensional space.
Example: [0.00567, 0.00678, 0.79876] - a 3D vector


In [None]:
import numpy as np  # Import NumPy for vector operations


In [None]:
# Creating vectors
vector1 = np.array([0.00567, 0.00678, 0.79876])  # Create a 3D vector as a NumPy array
vector2 = np.array([0.1, 0.2, 0.3])  # Create another 3D vector
vector3 = np.array([1, 2, 3, 4, 5])  # Create a 5D vector (different dimension)


In [None]:
print("Vector 1:", vector1)  # Print vector1 to show its components
print("Vector 2:", vector2)  # Print vector2 to show its components
print("Vector 3:", vector3)  # Print vector3 to show its components (5 dimensions)


In [None]:
# Vector operations
print("\nVector Operations:")  # Print section header


In [None]:
# Addition
result_add = vector1 + vector2  # Add corresponding components of the two vectors
print("Addition:", result_add)  # Print the result of vector addition


In [None]:
# Subtraction
result_sub = vector2 - vector1  # Subtract corresponding components of vector1 from vector2
print("Subtraction:", result_sub)  # Print the result of vector subtraction


In [None]:
# Scalar multiplication
scalar = 2.5  # Define a scalar value
result_scalar = scalar * vector1  # Multiply each component of vector1 by the scalar
print("Scalar multiplication:", result_scalar)  # Print the result of scalar multiplication


In [None]:
# Magnitude (norm)
magnitude1 = np.linalg.norm(vector1)  # Calculate the Euclidean norm (magnitude) of vector1
magnitude2 = np.linalg.norm(vector2)  # Calculate the Euclidean norm (magnitude) of vector2
print("Magnitude of vector1:", magnitude1)  # Print magnitude of vector1
print("Magnitude of vector2:", magnitude2)  # Print magnitude of vector2


In [None]:
# Unit vector (normalization)
unit_vector1 = vector1 / magnitude1  # Divide vector1 by its magnitude to get unit vector
print("Unit vector of vector1:", unit_vector1)  # Print the unit vector (length = 1)
print("Magnitude of unit vector:", np.linalg.norm(unit_vector1))  # Verify unit vector has magnitude 1


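Vector similarity concepts:
- Euclidean distance: straight-line distance between two points (sensitive to magnitude)
- Cosine similarity: cosine of the angle between vectors (direction similarity, independent of magnitude)
- Dot product: sum of component-wise products; equals the length of one vector's projection onto the other, multiplied by the other's magnitude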
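
As a rough illustration of the three measures listed above, the following sketch computes each one for the same pair of vectors. The values of `u` and `v` are made up for this example and are not taken from the notebook cells.

```python
import numpy as np

# Illustrative vectors (values chosen for this sketch only)
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

euclidean_distance = np.linalg.norm(u - v)  # length of the difference vector
dot_product = np.dot(u, v)                  # 1*4 + 2*5 + 3*6 = 32
cosine_sim = dot_product / (np.linalg.norm(u) * np.linalg.norm(v))

print(f"Euclidean distance: {euclidean_distance:.3f}")
print(f"Dot product:        {dot_product:.1f}")
print(f"Cosine similarity:  {cosine_sim:.3f}")
```

The three numbers answer different questions: how far apart the points are, how large the combined overlap is, and how closely the directions agree.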


In RAG and NLP:
- Document embeddings are high-dimensional vectors
- Similarity search finds vectors close in vector space
- Clustering groups similar vectors together
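
A minimal sketch of similarity search, assuming tiny hand-written 3-dimensional "embeddings" in place of real document embeddings (which typically have hundreds of dimensions). The helper name `top_k_cosine` is hypothetical and exists only for this illustration.

```python
import numpy as np

def top_k_cosine(query, embeddings, k=2):
    """Return (indices, similarities) of the k rows most similar to `query`."""
    # Normalize rows and the query so a plain dot product equals cosine similarity
    emb_unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    similarities = emb_unit @ query_unit
    # argsort is ascending, so reverse it and keep the first k entries
    order = np.argsort(similarities)[::-1][:k]
    return order, similarities[order]

# Toy embeddings (made-up values, one row per document)
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.8, 0.3, 0.1],
])
query_embedding = np.array([1.0, 0.2, 0.0])

indices, sims = top_k_cosine(query_embedding, doc_embeddings, k=2)
print("Closest documents:", indices)
print("Similarities:", np.round(sims, 3))
```

This brute-force scan compares the query against every row; production systems usually swap in an approximate nearest-neighbor index once the collection grows large.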


## cosine_similarity.py
Explanation of Cosine Similarity
Cosine similarity measures the cosine of the angle between two vectors.
It's used to determine how similar two documents or items are, regardless of their magnitude.
Values range from -1 (opposite direction) through 0 (orthogonal) to 1 (same direction).
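
To make the "regardless of their magnitude" point concrete, here is a small sketch that scales one vector by 10 and compares the effect on cosine similarity and on Euclidean distance. The vectors and the helper name `cosine` are assumptions made for this illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Illustrative vectors (values chosen for this sketch only)
u = np.array([1.0, 2.0, 3.0])
v = np.array([3.0, 1.0, 2.0])
v_scaled = 10 * v  # same direction as v, ten times the length

print(f"cosine(u, v):        {cosine(u, v):.4f}")
print(f"cosine(u, 10 * v):   {cosine(u, v_scaled):.4f}")  # unchanged by scaling
print(f"distance(u, v):      {np.linalg.norm(u - v):.4f}")
print(f"distance(u, 10 * v): {np.linalg.norm(u - v_scaled):.4f}")  # changes with scaling
print(f"cosine(u, -u):       {cosine(u, -u):.4f}")  # opposite direction
```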


In [None]:
import numpy as np  # Import NumPy for vector operations
from sklearn.metrics.pairwise import cosine_similarity  # Import cosine similarity from scikit-learn


In [None]:
def cosine_similarity_manual(vec1, vec2):  # Define function to calculate cosine similarity manually
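    """
    Calculate cosine similarity manually: dot(vec1, vec2) / (|vec1| * |vec2|).
    Raises ValueError for a zero vector, where the angle is undefined.
    """
    dot_product = np.dot(vec1, vec2)  # Calculate dot product of the two vectors
    norm1 = np.linalg.norm(vec1)  # Calculate Euclidean norm (magnitude) of first vector
    norm2 = np.linalg.norm(vec2)  # Calculate Euclidean norm (magnitude) of second vector
    if norm1 == 0 or norm2 == 0:  # Guard against division by zero
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot_product / (norm1 * norm2)  # Return cosine similarity: dot product divided by product of magnitudes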


In [None]:
# Example vectors (could represent document embeddings)
vec1 = np.array([1, 2, 3, 4])  # Create first example vector
vec2 = np.array([1, 2, 3, 5])  # Create second vector (similar to vec1)
vec3 = np.array([-1, -2, -3, -4])  # Create third vector (opposite direction to vec1)


In [None]:
print("Manual Cosine Similarity:")  # Print section header
print(f"vec1 vs vec2: {cosine_similarity_manual(vec1, vec2):.3f}")  # Print similarity between vec1 and vec2
print(f"vec1 vs vec3: {cosine_similarity_manual(vec1, vec3):.3f}")  # Print similarity between vec1 and vec3


In [None]:
# Using scikit-learn
vectors = np.array([vec1, vec2, vec3])  # Create array of all vectors for pairwise comparison
similarity_matrix = cosine_similarity(vectors)  # Calculate pairwise cosine similarities


In [None]:
print("\nCosine Similarity Matrix:")  # Print section header
print(similarity_matrix)  # Print the similarity matrix showing all pairwise similarities


BM25Okapi offers a keyword-based alternative to cosine similarity for comparing a query with documents: it scores term matches rather than the angle between vectors.
The get_scores() method in rank_bm25 returns one relevance score per document for the tokenized query.


In [None]:
from rank_bm25 import BM25Okapi  # Import BM25 ranking algorithm


In [None]:
# Sample documents and query
documents = [  # Define sample documents for BM25 demonstration
    "The cat sat on the mat",  # First document
    "The dog played in the park",  # Second document
    "Cats and dogs are pets"  # Third document
]


In [None]:
# Tokenize documents
tokenized_docs = [doc.lower().split() for doc in documents]  # Convert documents to lowercase and split into words


In [None]:
# Create BM25 model
bm25 = BM25Okapi(tokenized_docs)  # Initialize BM25 model with tokenized documents


In [None]:
# Query
query = "cat dog"  # Define search query
tokenized_query = query.lower().split()  # Tokenize the query


In [None]:
# Get BM25 scores
scores = bm25.get_scores(tokenized_query)  # Calculate BM25 relevance scores for the query against all documents
print(f"\nBM25 scores for query '{query}': {scores}")  # Print the BM25 scores for each document


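Note: get_scores() returns one score per document, in corpus order, not one score per keyword.
Tokens are matched exactly, so "cat" does not match "cats" unless the text is stemmed or lemmatized first. If the query repeats a keyword, consider deduplicating its tokens before scoring.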
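
One possible way to process the query differently is to drop repeated tokens before scoring, then sort the documents by the returned scores. The sketch below uses plain Python only; the helper names `unique_tokens` and `rank_documents` are hypothetical, and `example_scores` holds placeholder numbers, not real BM25 output.

```python
def unique_tokens(tokens):
    """Drop repeated tokens while keeping first-occurrence order."""
    return list(dict.fromkeys(tokens))

def rank_documents(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

tokenized_query = "cat dog cat".lower().split()
print(unique_tokens(tokenized_query))  # ['cat', 'dog']

# Placeholder scores, one per document (not real BM25 output)
example_scores = [0.4, 0.9, 0.0]
print(rank_documents(example_scores))  # [1, 0, 2]
```

Whether deduplication is desirable depends on the scorer: an implementation that loops over every query token would count a repeated keyword once per occurrence, which may or may not be intended.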


## dot_product.py
Explanation of Dot Product
The dot product (scalar product) is a mathematical operation between two vectors.
It results in a scalar value that reflects both how closely the vectors' directions align and how large their magnitudes are.
Formula: A · B = Σ(a_i * b_i) = |A| * |B| * cos(θ)
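
The two sides of the formula can be checked against each other numerically. The sketch below assumes two 2D vectors whose angle can be measured independently with `np.arctan2`, so the geometric form does not rely on the dot product itself. The vectors are made up for this illustration.

```python
import numpy as np

a = np.array([3.0, 0.0])  # points along the x-axis
b = np.array([2.0, 2.0])  # points 45 degrees above the x-axis

# Algebraic form: sum of component-wise products
algebraic = np.sum(a * b)

# Geometric form: |A| * |B| * cos(theta), with theta taken from each vector's polar angle
theta = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
geometric = np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta)

print(f"Algebraic: {algebraic:.6f}")
print(f"Geometric: {geometric:.6f}")
print("Forms agree:", np.isclose(algebraic, geometric))
```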


In [None]:
import numpy as np  # Import NumPy for vector operations


In [None]:
def dot_product_manual(vec1, vec2):  # Define function to calculate dot product manually
    """
    Calculate dot product manually.
    """
    if len(vec1) != len(vec2):  # Check if vectors have the same length
        raise ValueError("Vectors must have the same length")  # Raise error if lengths don't match
    return sum(a * b for a, b in zip(vec1, vec2))  # Sum the products of corresponding elements


In [None]:
# Example vectors
vec1 = np.array([1, 2, 3])  # Create first example vector
vec2 = np.array([4, 5, 6])  # Create second example vector
vec3 = np.array([-1, 2, -3])  # Create third vector (obtuse angle with vec1: their dot product is -6)


In [None]:
print("Vector 1:", vec1)  # Print vec1 components
print("Vector 2:", vec2)  # Print vec2 components
print("Vector 3:", vec3)  # Print vec3 components


In [None]:
# Manual calculation
manual_dot12 = dot_product_manual(vec1, vec2)  # Calculate dot product of vec1 and vec2 manually
manual_dot13 = dot_product_manual(vec1, vec3)  # Calculate dot product of vec1 and vec3 manually


In [None]:
print("\nManual Dot Products:")  # Print section header
print(f"vec1 · vec2 = {manual_dot12}")  # Print manual dot product result
print(f"vec1 · vec3 = {manual_dot13}")  # Print manual dot product result


In [None]:
# Using NumPy
numpy_dot12 = np.dot(vec1, vec2)  # Calculate dot product using NumPy's dot function
numpy_dot13 = np.dot(vec1, vec3)  # Calculate dot product using NumPy's dot function


In [None]:
print("\nNumPy Dot Products:")  # Print section header
print(f"vec1 · vec2 = {numpy_dot12}")  # Print NumPy dot product result
print(f"vec1 · vec3 = {numpy_dot13}")  # Print NumPy dot product result


In [None]:
# Relationship with cosine similarity
def cosine_similarity_from_dot(vec1, vec2):  # Define function to calculate cosine similarity using dot product
    dot = np.dot(vec1, vec2)  # Calculate dot product
    norm1 = np.linalg.norm(vec1)  # Calculate magnitude of vec1
    norm2 = np.linalg.norm(vec2)  # Calculate magnitude of vec2
    return dot / (norm1 * norm2)  # Return cosine similarity: dot product divided by product of magnitudes


In [None]:
cos_sim12 = cosine_similarity_from_dot(vec1, vec2)  # Calculate cosine similarity between vec1 and vec2
cos_sim13 = cosine_similarity_from_dot(vec1, vec3)  # Calculate cosine similarity between vec1 and vec3


In [None]:
print("\nCosine Similarities:")  # Print section header
print(f"cos(θ) for vec1, vec2: {cos_sim12:.3f}")  # Print cosine similarity with 3 decimal places
print(f"cos(θ) for vec1, vec3: {cos_sim13:.3f}")  # Print cosine similarity with 3 decimal places


Applications in ML:
- Neural network forward pass (weighted sum)
- Cosine similarity for text/document comparison
- Projection of vectors onto axes
- Matrix multiplication (dot product of rows/columns)
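
The applications listed above can be sketched with small made-up numbers. The inputs, weights, bias, and matrices below are assumptions for illustration only and do not come from any trained model.

```python
import numpy as np

# 1. Weighted sum in a single neuron (forward pass before the activation function)
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -1.0, 2.0])
bias = 0.5
pre_activation = np.dot(weights, inputs) + bias  # 0.5 - 2.0 + 6.0 + 0.5 = 5.0

# 2. Each entry of a matrix product is the dot product of a row and a column
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B
entry_01 = np.dot(A[0, :], B[:, 1])  # 1*6 + 2*8 = 22, the same value as C[0, 1]

# 3. Projecting onto an axis: dot product with that axis's unit vector
v = np.array([3.0, 4.0])
x_axis = np.array([1.0, 0.0])
x_component = np.dot(v, x_axis)  # 3.0

print("Neuron pre-activation:", pre_activation)
print("C[0, 1] from matmul:", C[0, 1], "| from row-column dot product:", entry_01)
print("x-component of v:", x_component)
```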
