Business Problem

In [None]:
import numpy as np
import matplotlib.pyplot as plt
# Data structure: [restaurant_id, 2021, 2022, 2023, 2024]
sales_data = np.array([
    [1, 150000, 180000, 220000, 250000],  # Paradise Biryani
    [2, 120000, 140000, 160000, 190000],  # Beijing Bites
    [3, 200000, 230000, 260000, 300000],  # Pizza Hub
    [4, 180000, 210000, 240000, 270000],  # Burger Point
    [5, 160000, 185000, 205000, 230000]   # Chai Point
])
print(sales_data)
print(sales_data.shape,sales_data.ndim,sales_data.dtype)
print("3rd row:", sales_data[2])                 # index 2 is the third row
print("First 3 rows:\n", sales_data[:3])          # the stop index is exclusive
print("Sales only (id column removed):\n", sales_data[:, 1:])

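Total sales per restaurant

In [None]:
# Sum across the year columns only; column 0 holds the restaurant id
print(np.sum(sales_data[:, 1:], axis=1))
print(sales_data[:, 1:].sum(axis=1))  # equivalent method form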

Min and max sales per restaurant

In [None]:
print(np.min(sales_data[:,1:],axis=1))
print(np.max(sales_data[:,1:],axis=1))
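The min and max values alone do not say in which year they occurred. A minimal sketch, assuming the same sales table as above and a hypothetical `years` array that labels the four sales columns, uses `np.argmin` and `np.argmax` to recover the year:

```python
import numpy as np

# Assumed labels for the four sales columns (not part of the original array)
years = np.array([2021, 2022, 2023, 2024])

sales_data = np.array([
    [1, 150000, 180000, 220000, 250000],  # Paradise Biryani
    [2, 120000, 140000, 160000, 190000],  # Beijing Bites
    [3, 200000, 230000, 260000, 300000],  # Pizza Hub
    [4, 180000, 210000, 240000, 270000],  # Burger Point
    [5, 160000, 185000, 205000, 230000],  # Chai Point
])

yearly = sales_data[:, 1:]  # drop the restaurant_id column

# argmax/argmin return column positions; indexing `years` turns them into labels
best_year = years[np.argmax(yearly, axis=1)]
worst_year = years[np.argmin(yearly, axis=1)]

print("Best year per restaurant: ", best_year)
print("Worst year per restaurant:", worst_year)
```

Because every restaurant's sales rise each year in this data set, the best year is always the last column and the worst is always the first.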

Average sales per restaurant

In [None]:
print(np.mean(sales_data[:,1:],axis=1))

Cumulative sum

In [1]:
cumsum = np.cumsum(sales_data[:,1:],axis=1)
print(cumsum)

plt.figure(figsize=(10,5))
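years = [2021, 2022, 2023, 2024]
plt.plot(years, np.mean(cumsum, axis=0), marker='o')  # mean across restaurants, per year
plt.xticks(years)
plt.title('Average cumulative sales across restaurants')
plt.xlabel('Year')
plt.ylabel('Cumulative sales')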
plt.grid(True)
plt.show()
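As a sanity check on the cumulative sum, the last running total in each row should equal that restaurant's four-year total, and differencing the running totals should give back the yearly figures. A small sketch, assuming the same sales table:

```python
import numpy as np

sales_data = np.array([
    [1, 150000, 180000, 220000, 250000],
    [2, 120000, 140000, 160000, 190000],
    [3, 200000, 230000, 260000, 300000],
    [4, 180000, 210000, 240000, 270000],
    [5, 160000, 185000, 205000, 230000],
])

yearly = sales_data[:, 1:]            # drop the restaurant_id column
running = np.cumsum(yearly, axis=1)   # running total per restaurant
totals = yearly.sum(axis=1)           # four-year total per restaurant

# The final running total of each row is the row total
print(np.array_equal(running[:, -1], totals))

# np.diff with prepend=0 undoes the cumulative sum
recovered = np.diff(running, axis=1, prepend=0)
print(np.array_equal(recovered, yearly))
```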


VECTOR AND DOT PRODUCT (ADVANCED THEORY)

1. Vector Definition:
   - A vector is an ordered list of numbers representing magnitude and direction.
   - In R^n space, a vector has n components.
   - Example: v = [v1, v2, v3, ..., vn]

2. Basic Vector Operations:
   - Addition: u + v = [u1+v1, u2+v2, ..., un+vn]
   - Scalar Multiplication: c*u = [c*u1, c*u2, ..., c*un]
   - Norm (Length): ||u|| = sqrt(u1^2 + u2^2 + ... + un^2)

3. Dot Product (Scalar Product):
   - Formula: u · v = u1*v1 + u2*v2 + ... + un*vn
   - Result is a scalar (not a vector).
   - It measures "directional similarity" between vectors.

4. Geometric Interpretation:
   - u · v = ||u|| * ||v|| * cos(θ)
   - θ = angle between u and v
   - For nonzero vectors:
   - If u · v > 0 → angle < 90° (vectors point in broadly the same direction)
   - If u · v = 0 → vectors are orthogonal (perpendicular)
   - If u · v < 0 → angle > 90° (vectors point in broadly opposite directions)
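The link between the sign of the dot product and the angle can be checked numerically. The sketch below uses a hypothetical helper `angle_deg` and three hand-picked 2D vectors at 45°, 90° and 135° from a reference vector:

```python
import numpy as np

def angle_deg(u, v):
    """Angle between two nonzero vectors, in degrees."""
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # clip guards against rounding errors pushing the value outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

ref = np.array([1.0, 0.0])
cases = {
    "similar":    np.array([1.0, 1.0]),   # dot > 0
    "orthogonal": np.array([0.0, 2.0]),   # dot = 0
    "opposing":   np.array([-1.0, 1.0]),  # dot < 0
}

for name, w in cases.items():
    print(f"{name:10s} dot = {np.dot(ref, w):5.1f}  angle = {angle_deg(ref, w):6.1f}")
```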

5. Applications in Data Science:
   - Cosine Similarity:
       cos(θ) = (u · v) / (||u|| * ||v||)
       Used in text similarity, recommendation systems, word embeddings.
   - Projections:
       Projection of u on v = (u · v / ||v||^2) * v
       Used in dimensionality reduction and regression.
   - Matrix Multiplication:
       Each element of matrix product is a dot product of a row and a column.
   - Machine Learning:
       Neural networks compute weighted sums (dot products) in each layer.

6. Advanced Example Context:
   - Consider two word embeddings in NLP:
       u = vector of "king"
       v = vector of "man"
       w = vector of "woman"
     The relationship "king - man + woman ≈ queen" is vector arithmetic (u - v + w);
     the word closest to the result is then typically found with cosine similarity.
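The analogy can be illustrated with a toy example. The vectors below are hypothetical, hand-made 2D values (axis 0 standing loosely for "royalty", axis 1 for "gender"), not real trained embeddings; they only show the mechanics of the arithmetic followed by a cosine-similarity lookup:

```python
import numpy as np

# Hypothetical toy "embeddings", chosen by hand for illustration only
vocab = {
    "king":   np.array([1.0,  1.0]),
    "man":    np.array([0.0,  1.0]),
    "woman":  np.array([0.0, -1.0]),
    "queen":  np.array([1.0, -1.0]),
    "prince": np.array([0.8,  1.0]),
}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Vector arithmetic: king - man + woman
target = vocab["king"] - vocab["man"] + vocab["woman"]

# Search the remaining words for the one most similar to the target
query_words = {"king", "man", "woman"}
candidates = {w: vec for w, vec in vocab.items() if w not in query_words}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))

print("Closest word:", best)
```

With real embeddings the match is only approximate, which is why the nearest neighbour is searched for rather than tested for equality.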

7. Summary:
   - Vector → direction + magnitude
   - Dot product → scalar measuring similarity + angle
   - Core tool in ML: similarity, projection, optimization, embeddings.


In [None]:
import numpy as np

# Example vectors (higher dimension: like word embeddings, 5D)
u = np.array([2, 3, -1, 4, 0])
v = np.array([1, -2, 3, 0, 5])

# 1. Basic Dot Product
dot = np.dot(u, v)
print("Dot Product:", dot)

# 2. Magnitudes (norms)
norm_u = np.linalg.norm(u)
norm_v = np.linalg.norm(v)
print("Norm of u:", norm_u)
print("Norm of v:", norm_v)

# 3. Cosine Similarity
cos_sim = dot / (norm_u * norm_v)
print("Cosine Similarity:", cos_sim)

# 4. Angle between vectors (in degrees)
angle_rad = np.arccos(np.clip(cos_sim, -1.0, 1.0))  # clip guards against rounding just outside [-1, 1]
angle_deg = np.degrees(angle_rad)
print("Angle between u and v (degrees):", angle_deg)

# 5. Projection of u onto v
proj_u_on_v = (np.dot(u, v) / np.dot(v, v)) * v
print("Projection of u onto v:", proj_u_on_v)

# 6. Realistic Example: Document similarity
doc1 = np.array([1, 2, 0, 1, 3])  # e.g., word frequencies
doc2 = np.array([0, 1, 1, 0, 2])

dot_docs = np.dot(doc1, doc2)
cos_docs = dot_docs / (np.linalg.norm(doc1) * np.linalg.norm(doc2))
print("\nDocument Cosine Similarity:", cos_docs)

# 7. Matrix Multiplication via Dot Products
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8], [9, 10], [11, 12]])
C = np.dot(A, B)
print("\nMatrix Multiplication Result:\n", C)


SCALARS AND VECTORS

1. Scalars:
   - A scalar is just a single number (magnitude only, no direction).
   - Examples: temperature = 30°C, mass = 70kg, speed = 50 km/h.
   - Represented as: s ∈ ℝ (real number).

2. Vectors:
   - A vector has both magnitude AND direction.
   - Examples: velocity = (60 km/h north), force = (10 N upward).
   - Represented as: v = [v1, v2, v3, ...] in n-dimensional space.

3. Relation between Vector and Scalar:
   - Vectors can be scaled by scalars:
        If v = [2, 3], then 2*v = [4, 6].
   - The **length (magnitude)** of a vector is a scalar.
   - The **dot product** of two vectors → scalar.
   - The **cross product** of two vectors → vector.

4. Operations involving Scalars:
   - Scalar × Scalar → Scalar
   - Scalar × Vector → Vector
   - Vector · Vector → Scalar
   - Vector × Vector → Vector (3D only)


In [None]:
import numpy as np

# Define vector and scalar
v = np.array([2, 3, 4])
s = 5

# 1. Scalar multiplication
scaled_v = s * v
print("Scalar * Vector:", scaled_v)

# 2. Dot Product (only vector · vector)
u = np.array([1, -1, 2])
dot = np.dot(v, u)
print("Dot Product (v·u):", dot)

# 3. Cross Product (only vector × vector in 3D)
cross = np.cross(v, u)
print("Cross Product (v×u):", cross)

# Passing a scalar where a vector is expected:
# np.dot(v, s)   # No error: with a scalar argument np.dot is equivalent to s * v
# np.cross(v, s) # Raises an error: the cross product needs two vectors
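Two properties make the cross product easy to verify: the result is perpendicular to both inputs, and swapping the inputs flips its sign. A short check, reusing the same `v` and `u` as in the cell above:

```python
import numpy as np

v = np.array([2, 3, 4])
u = np.array([1, -1, 2])

cross = np.cross(v, u)
print("v x u:", cross)

# Perpendicular to both inputs: both dot products are zero
print("cross . v:", np.dot(cross, v))
print("cross . u:", np.dot(cross, u))

# Anti-commutative: u x v = -(v x u)
print("u x v:", np.cross(u, v))

# With a scalar argument np.dot reduces to ordinary scaling
print("np.dot(v, 5):", np.dot(v, 5))
```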


EUCLIDEAN DISTANCE vs COSINE SIMILARITY

1. Euclidean Distance:
   - Definition: Straight-line distance between two vectors (points).
   - Formula:
       d(u, v) = sqrt( (u1 - v1)^2 + (u2 - v2)^2 + ... + (un - vn)^2 )
   - Always >= 0.
   - Small value → vectors/points are close.
   - Used in: kNN (k-Nearest Neighbors), clustering, image retrieval.

2. Cosine Similarity:
   - Definition: Measures cosine of angle between two vectors.
   - Formula:
       cos(θ) = (u · v) / (||u|| * ||v||)
   - Value range: [-1, 1]
       1   → vectors point in same direction
       0   → vectors are orthogonal (no similarity)
      -1   → vectors point in opposite directions
   - Focuses on **orientation**, not magnitude.
   - Used in: text similarity (TF-IDF vectors), recommendation systems.

3. Key Difference:
   - Euclidean Distance → measures absolute distance (magnitude matters).
   - Cosine Similarity → measures directional similarity (magnitude ignored).

4. Example:
   u = [1, 2], v = [2, 4]
   - Euclidean distance = sqrt((1-2)^2 + (2-4)^2) = sqrt(1+4) = √5
   - Cosine similarity = (u·v)/(||u||*||v||) = (1*2+2*4)/(√5*√20) = 10/10 = 1
   → Meaning: u and v point in same direction but are at different lengths.
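The worked example can be reproduced in code. The sketch also rescales `v` by a few hypothetical factors to show that the Euclidean distance changes with length while the cosine similarity does not:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([2.0, 4.0])

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

dist = np.linalg.norm(u - v)   # sqrt(5)
sim = cosine(u, v)             # 1, since v = 2 * u
print("Euclidean distance:", dist)
print("Cosine similarity: ", sim)

# Stretching v changes the distance but leaves the cosine unchanged
for scale in (1, 10, 100):
    print(scale, np.linalg.norm(u - scale * v), cosine(u, scale * v))
```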


In [None]:
import numpy as np

# Example vectors
u = np.array([1, 2, 3])
v = np.array([2, 3, 4])

# 1. Euclidean Distance
euclidean = np.linalg.norm(u - v)
print("Euclidean Distance:", euclidean)

# 2. Cosine Similarity
dot = np.dot(u, v)
norm_u = np.linalg.norm(u)
norm_v = np.linalg.norm(v)
cosine_sim = dot / (norm_u * norm_v)
print("Cosine Similarity:", cosine_sim)

# 3. Show difference clearly
x = np.array([1, 0])
y = np.array([100, 0])

print("\nExample with same direction but different length:")
print("Euclidean Distance:", np.linalg.norm(x - y))   # Large
print("Cosine Similarity:", np.dot(x, y) / (np.linalg.norm(x)*np.linalg.norm(y)))  # = 1
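The two measures become interchangeable once the vectors are scaled to unit length: for unit vectors, the squared Euclidean distance equals 2 - 2·cos(θ). This follows from expanding ||a - b||² = ||a||² + ||b||² - 2(a · b). A quick numerical check:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 3.0, 4.0])

# Scale both vectors to unit length
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)

cos_sim = np.dot(u_hat, v_hat)             # cosine similarity of u and v
dist_sq = np.sum((u_hat - v_hat) ** 2)     # squared distance between the unit vectors

print("Squared distance:", dist_sq)
print("2 - 2*cos:       ", 2 - 2 * cos_sim)
```

On normalized data, ranking neighbours by smallest Euclidean distance therefore gives the same order as ranking by largest cosine similarity.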


VECTORIZATION IN PYTHON

1. Definition:
   - Vectorization means applying operations to entire arrays/vectors at once,
     instead of looping through elements one by one.
   - The loop runs inside NumPy's compiled C code (with BLAS/LAPACK for linear algebra),
     which is much faster than a loop executed by the Python interpreter.

2. Why Vectorization?
   - Python loops (for/while) are slow for large datasets.
   - NumPy runs the loop in compiled code, which can also use SIMD
     (Single Instruction, Multiple Data) CPU instructions.
   - Example:
       Without vectorization: for i in range(n): c[i] = a[i] + b[i]
       With vectorization:    c = a + b

3. Benefits:
   - Faster execution (important in ML & Data Science).
   - Cleaner, more concise code.
   - Easy to read (like math notation).

4. Examples of Vectorization:
   - Element-wise operations (add, subtract, multiply, divide).
   - Matrix operations (dot product, cross product, multiplication).
   - Statistical operations (mean, variance, std).
   - Logical operations (masking, filtering).

5. When to Use Vectorization?
   - Whenever working with arrays/matrices (NumPy, pandas).
   - For ML preprocessing, feature engineering, matrix multiplications.
   - For avoiding explicit Python loops on large datasets.
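As a sketch of replacing a conditional loop with masking, the example below computes the mean of all values above a threshold in both styles. The data, the seed and the threshold are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so the run is repeatable
values = rng.random(10_000)
threshold = 0.5

# Loop version: test and accumulate one element at a time
total, count = 0.0, 0
for x in values:
    if x > threshold:
        total += x
        count += 1
loop_mean = total / count

# Vectorized version: build a boolean mask, then reduce the selected elements
mask = values > threshold
vec_mean = values[mask].mean()

# np.where applies an element-wise "if/else" without a Python loop
capped = np.where(mask, threshold, values)

print("Loop mean:      ", loop_mean)
print("Vectorized mean:", vec_mean)
print("Max after cap:  ", capped.max())
```

The two means agree up to floating-point rounding, because NumPy may add the numbers in a different order than the loop does.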


In [None]:
import numpy as np
import time

# Example arrays
a = np.arange(1_000_000)
b = np.arange(1_000_000)

# -------------------------------
# 1. Without Vectorization (Loop)
# -------------------------------
start = time.perf_counter()  # perf_counter is the clock intended for timing short intervals
c_loop = np.zeros_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] + b[i]
end = time.perf_counter()
print("Loop Time:", end - start)

# -------------------------------
# 2. With Vectorization (NumPy)
# -------------------------------
start = time.perf_counter()
c_vec = a + b
end = time.perf_counter()
print("Vectorized Time:", end - start)

# -------------------------------
# 3. Element-wise Operations
# -------------------------------
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print("\nElement-wise add:", x + y)
print("Element-wise multiply:", x * y)
print("Element-wise square:", x ** 2)

# -------------------------------
# 4. Statistical Vectorization
# -------------------------------
data = np.random.rand(1000000)  # 1M random numbers
print("\nMean:", np.mean(data))
print("Standard Deviation:", np.std(data))

# -------------------------------
# 5. Masking (Logical Vectorization)
# -------------------------------
arr = np.array([1, 2, 3, 4, 5, 6])
mask = arr > 3
print("\nMask:", mask)          # [False False False  True  True  True]
print("Filtered:", arr[mask])   # [4 5 6]


BROADCASTING IN PYTHON (NumPy)

1. Definition:
   - Broadcasting allows NumPy to perform operations on arrays of different shapes
     without explicitly copying or looping.
   - The smaller array is "stretched" (broadcasted) across the larger one
     so they can be combined element-wise.

2. Broadcasting Rules:
   - Compare dimensions from right to left (trailing dimension first);
     a missing leading dimension is treated as 1.
   - Two dimensions are compatible if:
       (a) they are equal, OR
       (b) one of them is 1.
   - If compatible, NumPy virtually "expands" the smaller array
     without making actual copies.
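The rules above can be written out as a few lines of plain Python. `broadcast_shape` is a hypothetical helper written for this note; it is compared against NumPy's own `np.broadcast_shapes`, which assumes NumPy 1.20 or newer:

```python
from itertools import zip_longest

import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Result shape of broadcasting two shapes, following the rules above."""
    result = []
    # Walk both shapes from the right; a missing dimension counts as 1
    for x, y in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

for a, b in [((2, 3), (3,)), ((3, 1), (3,)), ((4, 1, 5), (3, 1))]:
    # Cross-check the hand-written rule against NumPy
    assert broadcast_shape(a, b) == np.broadcast_shapes(a, b)
    print(a, "with", b, "->", broadcast_shape(a, b))
```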

3. Examples:
   - Vector + Scalar:
       [1, 2, 3] + 5 → [6, 7, 8]
   - Matrix + Vector:
       [[1,2,3],
        [4,5,6]] + [10,20,30]
       → [[11,22,33],
          [14,25,36]]

4. Benefits:
   - Memory efficient (no data duplication).
   - Fast (uses optimized C loops).
   - Cleaner code, no need for explicit tiling or loops
     (a reshape is still sometimes needed to line up the axes).
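The "no data duplication" point can be made visible with `np.broadcast_to`, which returns the stretched array as a read-only view. In this sketch the view is compared with `np.tile`, which builds a real copy:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])
v = np.array([10, 20, 30])

view = np.broadcast_to(v, M.shape)   # read-only view, no data copied
tiled = np.tile(v, (2, 1))           # physical copy with the same values

# A stride of 0 along axis 0 means every "row" re-reads the same memory
print("View strides:", view.strides)
print("View shares memory with v: ", np.shares_memory(view, v))
print("Tiled shares memory with v:", np.shares_memory(tiled, v))

# Both give the same result as letting NumPy broadcast implicitly
print(np.array_equal(M + v, M + tiled))
```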

5. Limitations:
   - If shapes are not compatible → ValueError.
   - Example: (3,) and (4,) cannot broadcast.


In [None]:
import numpy as np

# 1. Vector + Scalar
a = np.array([1, 2, 3])
print("a + 5:", a + 5)   # [6 7 8]

# 2. Matrix + Vector
M = np.array([[1, 2, 3],
              [4, 5, 6]])
v = np.array([10, 20, 30])
print("\nMatrix + Vector:\n", M + v)

# 3. Column vector + row vector: shapes (3,1) and (3,) broadcast to (3,3)
X = np.array([[1], [2], [3]])   # Shape (3,1)
Y = np.array([10, 20, 30])      # Shape (3,)
print("\nShapes:", X.shape, Y.shape)
print("Broadcasted Sum:\n", X + Y)

# 4. Broadcasting in Math
arr = np.array([1, 2, 3, 4, 5])
print("\nSquare all elements:", arr ** 2)  # [1 4 9 16 25]

# 5. Incompatible Shapes (Error)
try:
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6, 7])   # Different length
    print(a + b)
except ValueError as e:  # incompatible shapes raise ValueError
    print("\nError Example:", e)
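Broadcasting also applies directly to the restaurant sales table from the start of the notebook. The sketch below is one possible use, not part of the original analysis: it indexes each restaurant's sales to its own 2021 value with a (5,1) column, and computes each restaurant's share of the yearly total with a (4,) row:

```python
import numpy as np

sales_data = np.array([
    [1, 150000, 180000, 220000, 250000],
    [2, 120000, 140000, 160000, 190000],
    [3, 200000, 230000, 260000, 300000],
    [4, 180000, 210000, 240000, 270000],
    [5, 160000, 185000, 205000, 230000],
])

yearly = sales_data[:, 1:].astype(float)   # shape (5, 4)

# (5, 4) / (5, 1): each row is divided by its own 2021 value
baseline = yearly[:, :1]
growth_index = yearly / baseline

# (5, 4) / (4,): each column is divided by that year's total across restaurants
year_totals = yearly.sum(axis=0)
share = yearly / year_totals

print("Sales relative to 2021:\n", np.round(growth_index, 2))
print("Share of each year's total:\n", np.round(share, 3))
```

Slicing with `[:, :1]` instead of `[:, 0]` keeps the baseline two-dimensional, which is what makes it line up with the rows rather than the columns.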
