In [1]:
import numpy as np

__Exercise__: Create a small 5×5 array to represent a simple image (maybe a diagonal line of 255s on a background of 0s). Print out the array and verify the positions of high values form the diagonal. This simulates creating a simple image pattern with numpy.

In [2]:
a = np.diag([255] * 5)

In [3]:
a

array([[255,   0,   0,   0,   0],
       [  0, 255,   0,   0,   0],
       [  0,   0, 255,   0,   0],
       [  0,   0,   0, 255,   0],
       [  0,   0,   0,   0, 255]])

In [4]:
# Verify positions of 255

for i in range(a.shape[0]):
    if a[i, i] == 255:
        print(f'255 found at position {i}, {i}')

255 found at position 0, 0
255 found at position 1, 1
255 found at position 2, 2
255 found at position 3, 3
255 found at position 4, 4


---

In [5]:
b = np.zeros((5, 5), dtype=int)
np.fill_diagonal(b, 255)        # modifies b in place and returns None
b

array([[255,   0,   0,   0,   0],
       [  0, 255,   0,   0,   0],
       [  0,   0, 255,   0,   0],
       [  0,   0,   0, 255,   0],
       [  0,   0,   0,   0, 255]])

---

__NumPy arrays__ are __homogeneous__, meaning every element must have the same data type (`dtype`).
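
A quick way to see this in practice is to mix types and inspect `dtype`. The sketch below uses illustrative variable names (`ints`, `mixed`) that are not part of the original notebook; it suggests how NumPy picks one common type for the whole array.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # a single float makes every element a float

print(ints.dtype.kind)          # 'i' (a signed integer type)
print(mixed.dtype)              # float64

# Storing a float in an integer array keeps the integer dtype,
# so the fractional part is dropped.
ints[0] = 7.9
print(ints)                     # [7 2 3]
```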

---

In [6]:
students = np.array([
    [1, 85, 78],
    [2, 90, 88],
    [3, 75, 85]
], dtype=np.int32)

students

array([[ 1, 85, 78],
       [ 2, 90, 88],
       [ 3, 75, 85]])

In [7]:
students[:, 1].mean()   # mean of second column (Math scores)

83.33333333333333

---

__Real-World Example (Loading CSV data):__ You might have a CSV file with rows of data. While the pandas library is often used for tabular data, NumPy can also load simple numeric data. For instance, if data.csv contains:

In [8]:
# height,weight,age
# 170,65,25
# 160,50,30
# 180,80,22

We can load it with NumPy (using `genfromtxt` or `loadtxt`):

In [9]:
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
data

array([[170.,  65.,  25.],
       [160.,  50.,  30.],
       [180.,  80.,  22.]])

In [10]:
# Notice by default loadtxt gave floats; we can specify dtype=int if we want integers.
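
As a minimal sketch of the `dtype=int` option, the example below reads the same three rows from an in-memory string instead of a file, so it runs without `data.csv` on disk. The names `csv_text` and `data_int` are assumptions made for illustration.

```python
import io
import numpy as np

# Stand-in for data.csv, held in memory so the example is self-contained
csv_text = "height,weight,age\n170,65,25\n160,50,30\n180,80,22\n"

data_int = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=int)

print(data_int.dtype.kind)     # 'i': integers instead of the default floats
print(data_int[:, 0].mean())   # mean height: 170.0
```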

---
---

### Key Linear Algebra Concepts in Machine Learning
---

#### 1. Vectors

Description: In mathematics, __a vector__ is an ordered list of numbers. Geometrically, you can think of a vector as a point in space (like a coordinate) or an arrow from the origin to that point. For example, [3, 5] in 2D represents a point 3 units along the x-axis and 5 units along the y-axis. Vectors have a _magnitude_ (length) and _direction_. In linear algebra, vectors are often written as column vectors (like a column of numbers), but in NumPy we usually use 1D arrays to represent them.

In data science, a vector is a convenient way to represent a single data instance or a set of features. A __feature vector__ is a list of features describing one sample. For example, a patient described by [height, weight, age] is a feature vector in 3-dimensional space. Vectors are also used to represent words in NLP (word embeddings), the pixel values of an image (flattened into one long vector), or a time series of sensor readings.

In [11]:
# create a vector (as a 1D numpy array)

v = np.array([2, 5, 1])
w = np.array([3, 4, 1])

print('Vector v:', v)
print('Vector w:', w)
print('Shape of v:', v.shape)

Vector v: [2 5 1]
Vector w: [3 4 1]
Shape of v: (3,)


In [12]:
# addition - add corresponding elements

print("v + w =", v + w)

v + w = [5 9 2]


In [13]:
# subtraction

print("v - w =", v - w)

v - w = [-1  1  0]


NumPy will perform __element-wise__ addition/subtraction automatically since v and w have the same shape.

In [14]:
# Scalar Multiplication: Multiply each element by a number.

print("2 * v =", 2 * v)

2 * v = [ 4 10  2]


In [15]:
# Magnitude (Length): ||v|| = sqrt(v_1^2 + v_2^2 + ... ).

mag_v = np.linalg.norm(v)    # Euclidean norm (length) of v

print("||v|| =", mag_v)

||v|| = 5.477225575051661


In [16]:
# the same value computed manually

np.sqrt((v**2).sum())

5.477225575051661

---
---

#### 2. Matrices 

In linear algebra, matrices are used to solve systems of linear equations, to represent linear transformations (like rotating or scaling coordinates), and much more.

In [17]:
# create a 2 x 3 matrix
M = np.array([[1, 2, 3],
             [4, 5, 6]])

In [18]:
# first row
M[0, :]

array([1, 2, 3])

In [19]:
# 3rd column
M[:, 2]

array([3, 6])

We can do __operations__ on matrices __element-wise__, just as with vectors (addition, subtraction, scalar multiplication, etc.), __as long as the shapes align__ or can be matched __via broadcasting__.
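
Broadcasting is easiest to see with shapes that differ. The following is a small sketch, with `row` and `col` as illustrative names, of how a 1D array or a column is stretched across a 2×3 matrix.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,): added to every row
col = np.array([[100], [200]])     # shape (2, 1): added to every column

print(M + row)   # [[11 22 33] [14 25 36]]
print(M + col)   # [[101 102 103] [204 205 206]]
```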

In [20]:
M * 2

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [21]:
N = np.array([[7, 8, 9],
             [1, 2, 3]])

M + N

array([[ 8, 10, 12],
       [ 5,  7,  9]])

---

__Use in Data/ML__:

As mentioned, treating the whole dataset as a matrix allows vectorized computations. For instance, if X is an (N×M) matrix of data and w is an (M×1) weight vector, then X @ w yields an N×1 vector of predictions (one per data point). This is how we express making predictions for multiple data points in one go.
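
The shapes described above can be checked with a toy example. The data and weights here are made up purely for illustration (N=3 samples, M=2 features).

```python
import numpy as np

# Hypothetical data: 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # shape (3, 2)
w = np.array([[0.5],
              [-1.0]])       # shape (2, 1)

y_pred = X @ w               # one prediction per row of X
print(y_pred.shape)          # (3, 1)
print(y_pred.ravel())        # [-1.5 -2.5 -3.5]
```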

----
----

#### 3. Matrix Multiplication

Matrix multiplication is __not__ done element-wise; it follows a specific rule: if A is of shape (p×q) and B is of shape (q×r), then their product C = A × B is of shape (p×r). Each element of C is computed by taking a row of A and a column of B and computing their dot product (multiply corresponding elements and sum them up).

For matrix multiplication to be valid, the __inner dimensions__ must match (the number of columns of the first matrix must equal the number of rows of the second matrix). If you have incompatible shapes, you cannot multiply them in the standard linear algebra sense.

(Note: pay attention to order – matrix multiplication is _not commutative_, meaning AB ≠ BA in general.)
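
Both points (order matters, and inner dimensions must match) can be checked directly. This is a small sketch with illustrative matrices; `P` is a hypothetical (2, 3) array used only to trigger the shape error.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)                          # [[2 1] [4 3]]: columns of A swapped
print(B @ A)                          # [[3 4] [1 2]]: rows of A swapped
print(np.array_equal(A @ B, B @ A))   # False

# Inner dimensions must match: (2, 3) @ (2, 3) is not a valid product
P = np.ones((2, 3))
try:
    P @ P
except ValueError as err:
    print("shape mismatch:", err)
```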

__Python/NumPy Example__: We can use `np.dot()` or the `@` operator to do matrix multiplication in NumPy.

In [22]:
A = np.array([[1, 2, 3],
             [4, 5, 6]])   # shape (2, 3)
B = np.array([[7, 8],
             [9, 10],
             [11, 12]])    # shape (3, 2)

In [23]:
C = A.dot(B)
C

array([[ 58,  64],
       [139, 154]])

In [24]:
A @ B

array([[ 58,  64],
       [139, 154]])

In [25]:
np.dot(A, B)

array([[ 58,  64],
       [139, 154]])

---

__Use in Data/ML__: Matrix multiplication is everywhere in machine learning:

- __Linear Regression/Linear Models__: If X is your data matrix (N samples × M features) and β is a parameter vector (M × 1), then the predictions $\hat{y}$ for all N samples can be computed as the matrix product X × β (result is N × 1). This is essentially performing N dot products (one for each sample).
- __Neural Networks__: The computation in each layer of a neural network is often a matrix multiply: if you have an input vector, it’s multiplied by a weight matrix to produce an output vector for the next layer. When you process multiple inputs at once (batch processing), you actually use matrix multiplication between a batch matrix and the weight matrix.
- __Word Embeddings__: In NLP, if you represent the vocabulary as vectors (one-hot encodings), multiplying a one-hot vector (which is mostly zeros and a 1 for the target word index) by an __embedding matrix__ yields the vector for that word. That’s matrix multiplication under the hood: one-hot (1×V) times embedding matrix (V×D) = word embedding (1×D).
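
The word-embedding case can be sketched with a toy matrix. The sizes `V` and `D` and the contents of `E` are arbitrary assumptions; the point is only that a one-hot row times the matrix selects one row of it.

```python
import numpy as np

V, D = 4, 3                                       # hypothetical vocabulary and embedding sizes
E = np.arange(V * D, dtype=float).reshape(V, D)   # toy embedding matrix, shape (V, D)

word_index = 2
one_hot = np.zeros((1, V))
one_hot[0, word_index] = 1.0                      # shape (1, V)

embedding = one_hot @ E                           # shape (1, D)
print(embedding)                                  # [[6. 7. 8.]]
print(np.array_equal(embedding[0], E[word_index]))  # True: same as a row lookup
```

In practice a direct row lookup such as `E[word_index]` gives the same result without building the one-hot vector.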


---
---

#### 4. Dot Product

Description: The __dot product__ (also called scalar product or inner product) is an operation that takes two vectors of the same length and returns a single number (a scalar). If $a$ and $b$ are vectors $(a_1, a_2, ..., a_n)$ and $(b_1, b_2, ..., b_n)$, then

$$
 a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n. 
$$

It's essentially multiplying corresponding components and summing them. We saw this concept inside matrix multiplication (each entry was a row dotted with a column). The dot product has a geometric interpretation: $a \cdot b = \|a\|\|b\|\cos\theta$, where $\theta$ is the angle between the two vectors. So if two vectors point in similar directions ($\theta < 90^\circ$), their dot product is positive; if they are orthogonal ($\theta = 90^\circ$), it is 0; and if they point in roughly opposite directions ($\theta > 90^\circ$), it is negative.
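
The geometric interpretation can be checked numerically. The helper `angle_deg` below is a hypothetical function written for this illustration; it solves $a \cdot b = \|a\|\|b\|\cos\theta$ for $\theta$ and assumes both vectors are non-zero.

```python
import numpy as np

def angle_deg(a, b):
    """Angle between two non-zero vectors, in degrees."""
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # clip guards against rounding slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

x = np.array([1.0, 0.0])
same = np.array([2.0, 0.0])
orthogonal = np.array([0.0, 3.0])
opposite = np.array([-1.0, 0.0])

for other in (same, orthogonal, opposite):
    # positive dot at 0 degrees, zero at 90, negative at 180
    print(np.dot(x, other), angle_deg(x, other))
```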

__Python/NumPy Example__: Dot product of two vectors.

In [26]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot = np.dot(a, b)   # equivalently a.dot(b) or a @ b; for 1D arrays all three give the dot product
dot

32

In [27]:
# We can also confirm this by breaking it down:

elementwise = a * b
elementwise.sum()

32

---

**Use in Data/ML:**
- **Feature Weights:** If you have a feature vector and a weight vector, the prediction of a linear model is a dot product $w \cdot x$ (plus maybe a bias). For instance, in linear regression or in a single neuron of a neural net, you compute weighted sum of inputs – that’s a dot product.
- **Similarity:** In information retrieval or recommender systems, you might compute how similar two users are by taking the dot product of their preference vectors. Cosine similarity between two vectors is basically $\frac{a \cdot b}{\|a\|\|b\|}$. If vectors are normalized to length 1, cosine similarity is exactly the dot product. Word embeddings are often compared via dot product to find similar words ([Linear Algebra Required for Data Science | GeeksforGeeks](https://www.geeksforgeeks.org/linear-algebra-required-for-data-science/#:~:text=%2A%20NLP%20,dot%20products%20alongside%20matrix%20multiplication)).
- **Orthogonality:** As noted, if $a \cdot b = 0$, the vectors are orthogonal (for mean-centered vectors, this is the same as being uncorrelated). In ML, this concept appears in orthogonal feature vectors or orthogonal weight initialization in neural networks, etc., meaning components that capture independent information.
- **Matrix multiplication connection:** When we do $X @ w$ for predictions, each output is a dot product of a data row with the weight vector. So dot product is the elemental operation inside matrix multiplication.
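
The similarity point above can be sketched in a few lines. `cosine_similarity` is a hypothetical helper defined here for illustration, not a NumPy function; it assumes neither vector is all zeros.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# After normalizing to unit length, the plain dot product is the cosine similarity
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

print(cosine_similarity(a, b))
print(np.dot(a_unit, b_unit))   # same value up to rounding
```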


---
---

### Basic Arithmetic Operations

In [28]:
x = np.array([1, 2, 3])
y = np.array([10, 10, 10])

In [29]:
x + y 

array([11, 12, 13])

In [30]:
x * y

array([10, 20, 30])

In [31]:
x + 5

array([6, 7, 8])

In [32]:
x ** 2

array([1, 4, 9])

---

Note: In NumPy, `*` is __not__ matrix multiplication; it is element-wise multiplication. For matrix multiplication, use `@` or `np.dot` as discussed earlier.



In [33]:
A = np.array([[1, 2],
             [3, 4]])

B = np.array([[5, 6],
             [7, 8]])

In [36]:
print('A * B (element-wise):\n', A * B)
print('A @ B (matrix multiply):\n', A @ B)

A * B (element-wise):
 [[ 5 12]
 [21 32]]
A @ B (matrix multiply):
 [[19 22]
 [43 50]]


---

NumPy also has built-in __universal__ functions (ufuncs) that apply to each element:

- `np.sqrt(x)` – square root of each element.
- `np.exp(x)` – exponential ($e^x$) of each element.
- `np.sin`, `np.log`, `np.abs`, etc. also operate element-wise on an array.
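
A short sketch of a few of these functions applied to small arrays (the input values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

print(np.sqrt(x))                     # [1. 2. 3.]
print(np.abs(np.array([-2, 0, 2])))   # [2 0 2]
print(np.log(np.exp(x)))              # recovers x up to rounding
```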


In [46]:
# You can combine operations; for example:

np.sin(np.array([0, np.pi/2, np.pi]))   # sin of 0°, 90°, 180°

array([0.0000000e+00, 1.0000000e+00, 1.2246468e-16])

In [47]:
np.pi

3.141592653589793

---

### Indexing and Slicing

In [48]:
M = np.array([[5, 6, 7],
             [8, 9, 10],
             [1, 2, 3]])

In [53]:
# the first row
M[0] 

array([5, 6, 7])

In [54]:
# gives the second column as a 1D array
M[:, 1]   

array([6, 9, 2])

In [55]:
# You can slice multiple axes
M[0:2, 1:3]

array([[ 6,  7],
       [ 9, 10]])

---

If you assign to a sliced portion, it will modify the original array, because slicing returns a *view*, not a copy, in NumPy. For example, `sub = M[:2, :2]; sub[:] = 0` sets the top-left 2×2 block of M to zeros.


In [56]:
M

array([[ 5,  6,  7],
       [ 8,  9, 10],
       [ 1,  2,  3]])

In [57]:
sub = M[:2, :2]
sub

array([[5, 6],
       [8, 9]])

In [58]:
sub[:] = 0

sub

array([[0, 0],
       [0, 0]])

In [59]:
M

array([[ 0,  0,  7],
       [ 0,  0, 10],
       [ 1,  2,  3]])

✅ __What is a "view" in NumPy?__
- A __view__ is a new array object that looks at the same data in memory as the original array.

- So when you change the view, you also change the original — because they __share the same underlying data buffer__.

- But the view itself is a different Python object (that's why the two have different `id`s).
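
One way to check these three statements is with `is`, the `.base` attribute, and `np.shares_memory`. This is a minimal sketch using a fresh copy of the matrix from this section:

```python
import numpy as np

M = np.array([[5, 6, 7],
              [8, 9, 10],
              [1, 2, 3]])
sub = M[:2, :2]

print(sub is M)                   # False: two different Python objects
print(sub.base is M)              # True: sub is a view onto M's data
print(np.shares_memory(sub, M))   # True: same underlying buffer
```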



In [62]:
# if you want to create a true copy

copy = M[:2, :2].copy()

Then `copy` will have __its own separate memory__, and changing it won’t affect `M`.



---

NumPy also supports __fancy indexing__ (using arrays of indices) and __boolean indexing__ (using a boolean mask array to pick elements).

In [63]:
M

array([[ 0,  0,  7],
       [ 0,  0, 10],
       [ 1,  2,  3]])

In [65]:
M[[0, 2]]    # picks rows 0 and 2

array([[0, 0, 7],
       [1, 2, 3]])

In [68]:
M[M % 2 == 0] # picks out all even numbers 

array([ 0,  0,  0,  0, 10,  2])
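
Unlike slicing, fancy and boolean indexing return copies when used to read values, but they can still modify the array when used on the left-hand side of an assignment. The sketch below illustrates both cases on a fresh copy of `M`; the name `rows` is illustrative.

```python
import numpy as np

M = np.array([[0, 0, 7],
              [0, 0, 10],
              [1, 2, 3]])

rows = M[[0, 2]]    # fancy indexing returns a copy
rows[:] = -1
print(M[0])         # [0 0 7]: M is unchanged

M[M > 5] = 5        # a boolean mask on the left-hand side modifies M in place
print(M)            # [[0 0 5] [0 0 5] [1 2 3]]
```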

---


### Reshaping and Resizing


In [69]:
A = np.arange(1, 13)
B = A.reshape(3, 4)
B

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [71]:
C = B.reshape(2, 2, 3)
C

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

The number of elements must remain the same: 3×4 is 12 elements and 2×2×3 is also 12, so the reshape is valid.
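
For contrast, a reshape whose target shape has a different number of elements fails. A small sketch:

```python
import numpy as np

B = np.arange(1, 13).reshape(3, 4)   # 12 elements

print(B.reshape(6, 2).shape)         # (6, 2): 6 * 2 == 12, so this works

try:
    B.reshape(5, 3)                  # 5 * 3 == 15, which does not match 12
except ValueError as err:
    print("invalid reshape:", err)
```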

In [74]:
D = B.reshape(2, -1)      # -1 means "infer this dimension" (here, the number of columns)
D

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

---

#### Flattening

__flatten/ravel__: to collapse an N-D array into 1D. `arr.flatten()` returns a copy, `arr.ravel()` returns a view if possible.

In [75]:
B

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [78]:
flat = B.flatten()   # copy as 1D
flat

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

This gives a new array (if you modify it, B won’t change).

In [79]:
flat[0] = 0
print('flat: ', flat)
print('B: ', B)

flat:  [ 0  2  3  4  5  6  7  8  9 10 11 12]
B:  [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


`B.ravel()` returns a view when B is contiguous in memory, which it is here, so changes to the result of `ravel` are reflected in B.

In [80]:
B

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [81]:
rav = B.ravel()
rav

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [82]:
rav[0] = 0
print('rav:', rav)
print('B:', B)

rav: [ 0  2  3  4  5  6  7  8  9 10 11 12]
B: [[ 0  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
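
Whether `ravel` returns a view or a copy depends on the memory layout. The sketch below uses `np.shares_memory` to compare three cases on a fresh array:

```python
import numpy as np

B = np.arange(1, 13).reshape(3, 4)

print(np.shares_memory(B.ravel(), B))     # True: B is C-contiguous, so ravel gives a view
print(np.shares_memory(B.T.ravel(), B))   # False: the transpose is not C-contiguous, so ravel copies
print(np.shares_memory(B.flatten(), B))   # False: flatten always copies
```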


---

#### Transposing

In [83]:
B

array([[ 0,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [84]:
B.T

array([[ 0,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])
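
A few properties of `.T` that are easy to miss, sketched on small illustrative arrays: the transpose swaps the shape, shares memory with the original, and leaves a 1D array unchanged.

```python
import numpy as np

B = np.arange(12).reshape(3, 4)
Bt = B.T

print(B.shape, Bt.shape)         # (3, 4) (4, 3)
print(np.shares_memory(B, Bt))   # True: the transpose is a view
print(B[1, 2] == Bt[2, 1])       # True: indices are swapped

v = np.array([1, 2, 3])
print(v.T.shape)                 # (3,): transposing a 1D array changes nothing
print(v.reshape(-1, 1).shape)    # (3, 1): an explicit column vector
```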

---

### Stacking and Splitting

__Stacking__: concatenating arrays along a certain axis. If shapes align appropriately, you can use:
- `np.concatenate` (general purpose, you specify axis).
- `np.vstack` (stack vertically, i.e., one on top of another, which is axis=0 concatenation for 2D).
- `np.hstack` (stack horizontally, axis=1 for 2D).
- `np.stack` (to create a new axis and stack along it, if needed).
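
Of the four functions listed, `np.stack` is the one that adds a dimension. A brief sketch comparing it with `np.concatenate` on two 1D arrays:

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

print(np.stack((p, q), axis=0).shape)   # (2, 3): p and q become rows
print(np.stack((p, q), axis=1))         # [[1 4] [2 5] [3 6]]: p and q become columns
print(np.concatenate((p, q)).shape)     # (6,): joined along the existing axis
```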


In [85]:
p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

In [86]:
np.hstack((p, q))

array([1, 2, 3, 4, 5, 6])

In [87]:
np.vstack((p, q))

array([[1, 2, 3],
       [4, 5, 6]])

If we have two 2D arrays with the same number of columns, we can `vstack` them (adding more rows). If they have the same number of rows, we can `hstack` them (adding more columns).



---

In [88]:
X1 = np.array([[1, 2], [3, 4]])
X2 = np.array([[5, 6], [7, 8]])

In [90]:
X1

array([[1, 2],
       [3, 4]])

In [92]:
cat0 = np.concatenate((X1, X2), axis=0)   # stack rows -> shape (4,2)
cat0

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [93]:
cat1 = np.concatenate((X1, X2), axis=1)   # stack cols -> shape (2, 4)
cat1

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

In [95]:
np.hstack((X1, X2))

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

---

__Splitting__: dividing an array into multiple sub-arrays:
- `np.split` (specify indices or sections).
- `np.hsplit`, `np.vsplit` for specific directions in 2D.


In [96]:
Z = np.arange(10)
Z

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [98]:
part1, part2, part3 = np.split(Z, [4, 7])
print(part1, part2, part3)

[0 1 2 3] [4 5 6] [7 8 9]


The indices [4, 7] mean: break before index 4 (so part1 is Z[0:4]) and before index 7 (so part2 is Z[4:7]); the rest is part3 (Z[7:]).

For a 2D array, `np.vsplit` could split into submatrices by rows, `np.hsplit` by columns.
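
A minimal sketch of both functions on a 4×4 array (the array `G` and the split points are chosen only for illustration):

```python
import numpy as np

G = np.arange(16).reshape(4, 4)

top, bottom = np.vsplit(G, 2)      # two blocks of 2 rows each
left, right = np.hsplit(G, [1])    # the first column, then the remaining three

print(top.shape, bottom.shape)     # (2, 4) (2, 4)
print(left.shape, right.shape)     # (4, 1) (4, 3)
```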

---
---