In data science and machine learning, we typically organize data into structured collections like __tables__, __vectors__, or __matrices__ for convenient processing.

__Why Linear Algebra?__ _Linear algebra_ is the branch of mathematics that deals with vectors (1D arrays of numbers), matrices (2D arrays of numbers), and linear transformations.

A dataset table with _m_ rows (samples) and _n_ columns (features) can be seen as an m×n matrix of numbers. Each data sample is a vector, and the whole dataset is a matrix.
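As a small illustration of this rows-as-samples view, the sketch below builds a matrix from three made-up samples. The values and the feature names (`height_cm`, `weight_kg`) are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical dataset: 3 samples (rows) x 2 features (columns: height_cm, weight_kg)
dataset = np.array([
    [170.0, 65.0],
    [160.0, 50.0],
    [180.0, 80.0],
])

m, n = dataset.shape        # m samples, n features
first_sample = dataset[0]   # one row = one sample vector of length n

print(dataset.shape)        # (3, 2)
print(first_sample)         # [170.  65.]
```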


Methods like __word embeddings__ (e.g. Word2Vec or GloVe) turn words into __high-dimensional vectors__, and linear algebra operations (dot products, matrix multiplications) are used to measure similarities or relationships between words.

For __dimensionality reduction__ (simplifying datasets by reducing features), methods like __Principal Component Analysis (PCA)__ rely on linear algebra concepts of eigenvalues and eigenvectors to find new axes (principal components) that capture the most variance in the data. PCA transforms data into a smaller set of variables while preserving important information.

---

In [1]:
import numpy as np

__Exercise__: Create a small 5×5 array to represent a simple image (for example, a diagonal line of 255s on a background of 0s). Print out the array and verify that the positions of the high values form the diagonal. This simulates creating a simple image pattern with NumPy.

In [2]:
a = np.diag([255] * 5)

In [3]:
a

array([[255,   0,   0,   0,   0],
       [  0, 255,   0,   0,   0],
       [  0,   0, 255,   0,   0],
       [  0,   0,   0, 255,   0],
       [  0,   0,   0,   0, 255]])

In [4]:
# Verify positions of 255

for i in range(a.shape[0]):
    if a[i, i] == 255:
        print(f'255 found at position {i}, {i}')

255 found at position 0, 0
255 found at position 1, 1
255 found at position 2, 2
255 found at position 3, 3
255 found at position 4, 4


---

In [5]:
b = np.zeros((5, 5), dtype=int)
np.fill_diagonal(b, 255)        # modifies the input array in-place, it does not return a value
b

array([[255,   0,   0,   0,   0],
       [  0, 255,   0,   0,   0],
       [  0,   0, 255,   0,   0],
       [  0,   0,   0, 255,   0],
       [  0,   0,   0,   0, 255]])

---

__NumPy arrays__ are _homogeneous_, meaning every element must be the same type.

---

In [6]:
students = np.array([
    [1, 85, 78],
    [2, 90, 88],
    [3, 75, 85]
], dtype=np.int32)

students

array([[ 1, 85, 78],
       [ 2, 90, 88],
       [ 3, 75, 85]])

In [7]:
students[:, 1].mean()   # mean of second column (Math scores)

83.33333333333333

---

__Real-World Example (Loading CSV data):__ You might have a CSV file with rows of data. While the pandas library is often used for tabular data, NumPy can also load simple numeric data. For instance, if data.csv contains:

In [8]:
# height,weight,age
# 170,65,25
# 160,50,30
# 180,80,22

We can load it with NumPy (using `genfromtxt` or `loadtxt`):

In [9]:
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
data

array([[170.,  65.,  25.],
       [160.,  50.,  30.],
       [180.,  80.,  22.]])

In [10]:
# Notice by default loadtxt gave floats; we can specify dtype=int if we want integers.
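A minimal sketch of the `dtype=int` variant mentioned in the comment above. To keep it runnable without a file on disk, it reads the same rows from an in-memory string (`io.StringIO`) instead of `data.csv`; that substitution is an assumption made for the example.

```python
import io
import numpy as np

# In-memory stand-in for data.csv so the example runs without a file on disk
csv_text = "height,weight,age\n170,65,25\n160,50,30\n180,80,22\n"

# skiprows=1 skips the header line; dtype=int asks for integers instead of floats
data_int = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, dtype=int)

print(data_int.dtype.kind)   # 'i' (an integer dtype; the exact width is platform-dependent)
print(data_int)
```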

---
---

*Exercise*: Create a NumPy array with some made-up tabular data (e.g., a 4x3 array for 4 cars with [horsepower, weight, MPG]). Try to compute something like the correlation between two columns using NumPy (hint: you can compute mean of each, subtract, multiply and average). This will give practice in selecting and combining columns.



In [11]:
# 4 cars: [HP, Weight, MPG]
data = np.array([
    [150, 3000, 28],
    [180, 3200, 25],
    [130, 2800, 32],
    [200, 3500, 22]
])

In [12]:
# Let's compute the correlation between horsepower and MPG.

hp = data[:, 0]   # horsepower column
mpg = data[:, 2]  # mpg column

In [13]:
hp

array([150, 180, 130, 200])

In [14]:
mpg

array([28, 25, 32, 22])

In [15]:
# Means
hp_mean = hp.mean()
mpg_mean = mpg.mean()

In [16]:
# Deviations
hp_dev = hp - hp_mean
mpg_dev = mpg - mpg_mean

In [17]:
hp_dev

array([-15.,  15., -35.,  35.])

In [18]:
# Covariance (population form: np.mean divides by N, not N - 1)
cov = np.mean(hp_dev * mpg_dev)

In [19]:
# Standard Deviations
hp_std = np.sqrt(np.mean(hp_dev ** 2))
mpg_std = np.sqrt(np.mean(mpg_dev ** 2))

In [20]:
np.sqrt(1/4)

0.5

In [21]:
# Correlation
corr = cov / (hp_std * mpg_std)
corr

-0.9913021199060593

---
__Alternative: NumPy built-in correlation__

In [22]:
np.corrcoef(hp, mpg)   # full 2x2 correlation matrix; entry [0, 1] is the hp-mpg correlation

array([[ 1.        , -0.99130212],
       [-0.99130212,  1.        ]])

In [23]:
np.corrcoef(hp, mpg)[0, 1]

-0.9913021199060593
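The manual covariance above divides by N. As a hedged cross-check, the sketch below compares it with `np.cov`, which divides by N - 1 unless `bias=True` is passed. The arrays are re-created so the block runs on its own.

```python
import numpy as np

hp = np.array([150, 180, 130, 200])
mpg = np.array([28, 25, 32, 22])

# Manual population covariance, as in the cells above (divide by N)
cov_manual = np.mean((hp - hp.mean()) * (mpg - mpg.mean()))

# np.cov defaults to the sample covariance (divide by N - 1);
# bias=True switches to the population form (divide by N)
cov_population = np.cov(hp, mpg, bias=True)[0, 1]
cov_sample = np.cov(hp, mpg)[0, 1]

print(cov_manual, cov_population, cov_sample)
```

The correlation coefficient does not depend on this choice, because the same normalization factor appears in the numerator and the denominator and cancels.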

---
----

### Key Linear Algebra Concepts in Machine Learning
---

#### 1. Vectors

Description: In mathematics, __a vector__ is an ordered list of numbers. Geometrically, you can think of a vector as a point in space (like a coordinate) or an arrow from the origin to that point. For example, [3, 5] in 2D represents a point 3 units along the x-axis and 5 units along the y-axis. Vectors have a _magnitude_ (length) and _direction_. In linear algebra, vectors are often written as column vectors (like a column of numbers), but in NumPy we usually use 1D arrays to represent them.

In data science, a vector is a convenient way to represent a single data instance or a set of features. A __feature vector__ is a list of features describing one sample. For example, if we have a patient with [height, weight, age], that's a feature vector in 3-dimensional space. Vectors are used to represent words in NLP (word embeddings), pixel values of an image (flattened into one long vector), a time series of sensor readings, etc.

In [24]:
# create a vector (as a 1D numpy array)

v = np.array([2, 5, 1])
w = np.array([3, 4, 1])

print('Vector v:', v)
print('Vector w:', w)
print('Shape of v:', v.shape)

Vector v: [2 5 1]
Vector w: [3 4 1]
Shape of v: (3,)


In [25]:
# addition - add corresponding elements

print("v + w =", v + w)

v + w = [5 9 2]


In [26]:
# subtraction

print("v - w =", v - w)

v - w = [-1  1  0]


NumPy will perform __element-wise__ addition/subtraction automatically since v and w have the same shape.

In [27]:
# Scalar Multiplication: Multiply each element by a number.

print("2 * v =", 2 * v)

2 * v = [ 4 10  2]


In [28]:
# Magnitude (Length): ||v|| = sqrt(v_1^2 + v_2^2 + ... ).

mag_v = np.linalg.norm(v)    # Euclidean norm (length) of v

print("||v|| =", mag_v)

||v|| = 5.477225575051661


In [29]:
# manually it can be done

np.sqrt((v**2).sum())

5.477225575051661

---
---

#### 2. Matrices 

In linear algebra, matrices are used to solve systems of linear equations, to represent linear transformations (like rotating or scaling coordinates), and much more.

In [30]:
# create a 2 x 3 matrix
M = np.array([[1, 2, 3],
             [4, 5, 6]])

In [31]:
# first row
M[0, :]

array([1, 2, 3])

In [32]:
# 3rd column
M[:, 2]

array([3, 6])

We can do __operations__ on matrices __element-wise__ similar to vectors (addition, subtraction, scalar multiply, etc.), __as long as shapes align__ or __via broadcasting__.

In [33]:
M * 2

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [34]:
N = np.array([[7, 8, 9],
             [1, 2, 3]])

M + N

array([[ 8, 10, 12],
       [ 5,  7,  9]])

---

__Use in Data/ML__:

As mentioned, treating the whole dataset as a matrix allows vectorized computations. For instance, if X is an (N×M) matrix of data and w is an (M×1) weight vector, then X @ w yields an N×1 vector of predictions (one per data point). This is how we express making predictions for multiple data points in one go.
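A minimal sketch of that batched prediction, with made-up numbers for `X` and `w` (both are assumptions for illustration, not fitted values):

```python
import numpy as np

# Hypothetical data: N = 3 samples, M = 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([[0.5],
              [-1.0]])          # weight vector, shape (2, 1)

preds = X @ w                   # shape (3, 1): one prediction per row of X
print(preds.shape)              # (3, 1)
print(preds.ravel())            # [-1.5 -2.5 -3.5]
```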

----
----

#### 3. Matrix Multiplication

This is __not__ done element-wise, but follows a specific rule: if A is of shape (p×q) and B is of shape (q×r), then their product C = A × B is of shape (p×r). Each element of C is computed by taking a row of A and a column of B and computing their dot product (multiply corresponding elements and sum them up).

For matrix multiplication to be valid, the __inner dimensions__ must match (the number of columns of the first matrix must equal the number of rows of the second matrix). If you have incompatible shapes, you cannot multiply them in the standard linear algebra sense.

(Note: pay attention to order, because matrix multiplication is _not commutative_, meaning AB ≠ BA in general.)
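Both points (order matters, and inner dimensions must match) can be checked with a short sketch. The matrices `P`, `Q` and `R` are arbitrary examples introduced here.

```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])

print(P @ Q)    # [[2 1], [4 3]]: the columns of P are swapped
print(Q @ P)    # [[3 4], [1 2]]: the rows of P are swapped
print(np.array_equal(P @ Q, Q @ P))   # False

# Inner dimensions must match: (2, 3) @ (2, 3) is invalid
R = np.ones((2, 3))
try:
    R @ R
    shape_error = False
except ValueError:
    shape_error = True
print(shape_error)   # True
```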

__Python/NumPy Example__: We can use `np.dot()` or the `@` operator to do matrix multiplication in NumPy.

In [22]:
A = np.array([[1, 2, 3],
             [4, 5, 6]])   # shape (2, 3)
B = np.array([[7, 8],
             [9, 10],
             [11, 12]])    # shape (3, 2)

In [23]:
C = A.dot(B)
C

array([[ 58,  64],
       [139, 154]])

In [24]:
A @ B

array([[ 58,  64],
       [139, 154]])

In [25]:
np.dot(A, B)

array([[ 58,  64],
       [139, 154]])

---

__Use in Data/ML__: Matrix multiplication is everywhere in machine learning:

- __Linear Regression/Linear Models__: If X is your data matrix (N samples × M features) and β is a parameter vector (M × 1), then predictions $\hat{y}$ for all N samples can be computed as the matrix product X × β (result is N × 1). This is essentially performing N dot products (one for each sample).
- __Neural Networks__: The computation in each layer of a neural network is often a matrix multiply: if you have an input vector, it's multiplied by a weight matrix to produce an output vector for the next layer. When you process multiple inputs at once (batch processing), you actually use matrix multiplication between a batch matrix and the weight matrix.
- __Word Embeddings__: In NLP, if you represent the vocabulary as vectors (one-hot encodings), multiplying a one-hot vector (which is mostly zeros and a 1 for the target word index) by an __embedding matrix__ yields the vector for that word. That's matrix multiplication under the hood: one-hot (1×V) times embedding matrix (V×D) = word embedding (1×D).
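The one-hot case from the last bullet can be sketched as follows. The sizes `V` and `D` and the contents of the embedding matrix `E` are placeholders, not trained embeddings.

```python
import numpy as np

V, D = 4, 3                                      # hypothetical vocabulary and embedding sizes
E = np.arange(V * D, dtype=float).reshape(V, D)  # stand-in embedding matrix, shape (V, D)

word_index = 2
one_hot = np.zeros((1, V))
one_hot[0, word_index] = 1.0                     # shape (1, V)

embedding = one_hot @ E                          # shape (1, D)
print(embedding)                                 # same values as row 2 of E
print(np.array_equal(embedding[0], E[word_index]))   # True
```

Because the product simply selects one row of `E`, the same result can be obtained by indexing (`E[word_index]`) without performing the multiplication.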


---
---

#### 4. Dot Product

Description: The __dot product__ (also called scalar product or inner product) is an operation that takes two vectors of the same length and returns a single number (a scalar). If $a$ and $b$ are vectors $(a_1, a_2, ..., a_n)$ and $(b_1, b_2, ..., b_n)$, then

$$
 a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n. 
$$

It's essentially multiplying corresponding components and summing them. We saw this concept inside matrix multiplication (each entry was a row dot a column). The dot product has a geometric interpretation: $a \cdot b = \|a\|\|b\|\cos\theta$, where $\theta$ is the angle between the two vectors. So if two vectors point in similar directions, their dot product is large (and positive); if they are orthogonal (90° apart), the dot product is 0; if they point in opposite directions, the dot product is negative.

__Python/NumPy Example__: Dot product of two vectors.

In [26]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
dot = np.dot(a, b) # or a.dot(b) or a @ b (for 1D does dot)
dot

32

In [27]:
# We can also confirm this by breaking it down:

elementwise = a * b
elementwise.sum()

32

---

**Use in Data/ML:**
- **Feature Weights:** If you have a feature vector and a weight vector, the prediction of a linear model is a dot product $w \cdot x$ (plus maybe a bias). For instance, in linear regression or in a single neuron of a neural net, you compute a weighted sum of inputs, and that's a dot product.
- **Similarity:** In information retrieval or recommender systems, you might compute how similar two users are by taking the dot product of their preference vectors. Cosine similarity between two vectors is basically $\frac{a \cdot b}{\|a\|\|b\|}$. If vectors are normalized to length 1, cosine similarity is exactly the dot product. Word embeddings are often compared via dot product to find similar words ([Linear Algebra Required for Data Science | GeeksforGeeks](https://www.geeksforgeeks.org/linear-algebra-required-for-data-science/#:~:text=%2A%20NLP%20,dot%20products%20alongside%20matrix%20multiplication)).
- **Orthogonality:** As noted, if $a \cdot b = 0$, the vectors are orthogonal (uncorrelated in a sense). In ML, this concept appears in orthogonal feature vectors or orthogonal weight initialization in neural networks, etc., meaning components that capture independent information.
- **Matrix multiplication connection:** When we do $X @ w$ for predictions, each output is a dot product of a data row with the weight vector. So dot product is the elemental operation inside matrix multiplication.
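The cosine-similarity formula from the Similarity bullet can be sketched as a small helper. The function name `cosine_similarity` and the test vectors are introduced here for illustration; the helper assumes neither vector is all zeros.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
same_direction = 2 * a                    # parallel to a
orthogonal = np.array([3.0, 0.0, -1.0])   # a . orthogonal = 3 + 0 - 3 = 0
opposite = -a

print(cosine_similarity(a, same_direction))  # close to 1
print(cosine_similarity(a, orthogonal))      # 0.0
print(cosine_similarity(a, opposite))        # close to -1
```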


---
---

#### 5. Eigenvalues and Eigenvectors


Given a square matrix A, an eigenvector v is a non-zero vector such that when A multiplies v, the result is just a scalar multiple of v. That is:

Av = λv

where v is the eigenvector and λ (lambda) is the corresponding eigenvalue (a scalar). In plain language, applying the transformation A to v keeps it on the same line through the origin: v is only scaled by the factor λ (a negative λ reverses its direction).

A square matrix can have such special vectors (eigenvectors) whose direction is preserved by the transformation, and the scalar by which each one is scaled is its eigenvalue. Not every real matrix has real eigenvectors, though: a 2D rotation by 90°, for example, leaves no real direction unchanged.

__Why are eigenvectors/values useful in ML?__ They help identify principal directions in data:

- __Principal Component Analysis (PCA)__: We compute eigenvectors of the covariance matrix of the data. The eigenvector with the largest eigenvalue indicates the direction of maximum variance in the data (first principal component), the next gives the second principal component, and so on. By projecting data onto the top eigenvectors, we reduce dimensionality while retaining most variance.

...

In [38]:
A = np.array([[2, 1],
             [1, 2]])

# Compute eigenvalues and eigenvectors
eig_vals, eig_vecs = np.linalg.eig(A)

In [39]:
print('Eigenvalues:', eig_vals)
print('Eigenvectors:', eig_vecs)

Eigenvalues: [3. 1.]
Eigenvectors: [[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]


For matrix A = [[2,1],[1,2]], the eigenvalues turn out to be 3 and 1. The eigenvectors are normalized versions of [1,1] and [1,-1] respectively:

- λ=3 eigenvector ~ [0.707, 0.707] (which is [1,1] normalized),
- λ=1 eigenvector ~ [-0.707, 0.707] (which is [1,-1] normalized, up to sign; an eigenvector multiplied by -1 is still an eigenvector).


Here the first column is the eigenvector for eigenvalue 3, second column for eigenvalue 1. We can verify: if we multiply A by the first eigenvector, it should equal 3 times that eigenvector.

In [40]:
v = eig_vecs[:, 0] # eigenvector corresponding to eig_vals[0]
lambda_val = eig_vals[0]
print('A @ v:', A.dot(v))
print('lambda * v:', lambda_val * v)

A @ v: [2.12132034 2.12132034]
lambda * v: [2.12132034 2.12132034]


__Use in Data/ML__:

- __Dimensionality Reduction (PCA)__: As mentioned, eigenvectors of the covariance matrix give principal components. If you have high-dimensional data, you can compute these to find a lower-dimensional representation capturing most variance.
- __Feature Engineering__: Sometimes, combinations of features (eigenvectors are essentially linear combinations) can be more informative than the original features. PCA provides those combinations.
- __Explaining Variance__: Eigenvalues in PCA tell you how much variance each principal component (eigenvector direction) accounts for. A large eigenvalue means that axis has a lot of the data's variance.
- __Markov Chains/Transition Matrices__: The steady-state distribution of a Markov chain is an eigenvector of the transition matrix with eigenvalue 1 (a left eigenvector when the rows of the matrix sum to 1). This concept is used in algorithms like PageRank.
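The PCA bullets above can be sketched in a few lines. This is a minimal illustration on made-up 2D data, not a full PCA implementation. It uses `np.linalg.eigh` rather than the `np.linalg.eig` call shown earlier, a substitution made here because the covariance matrix is symmetric.

```python
import numpy as np

# Made-up 2D data: 5 samples x 2 features (the second feature roughly tracks the first)
X = np.array([[2.0, 1.9],
              [0.5, 0.7],
              [3.1, 3.0],
              [1.2, 1.0],
              [4.0, 4.2]])

X_centered = X - X.mean(axis=0)              # center each feature
cov = np.cov(X_centered, rowvar=False)       # 2x2 covariance matrix (columns = features)

# eigh is intended for symmetric matrices; it returns eigenvalues in ascending order
eig_vals, eig_vecs = np.linalg.eigh(cov)
order = np.argsort(eig_vals)[::-1]           # re-order to descending variance
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

explained = eig_vals / eig_vals.sum()        # fraction of variance per component
X_reduced = X_centered @ eig_vecs[:, :1]     # project onto the first principal component

print(explained)
print(X_reduced.shape)                       # (5, 1)
```

The variance of the projected data equals the largest eigenvalue, which is the sense in which eigenvalues measure how much variance each component accounts for.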


----
---

In [46]:
X = np.array([[5.2, 7.1, 3.0],
              [2.0, 9.3, 1.1]])

In [53]:
X.shape # a tuple of dimensions

(2, 3)

In [54]:
X.dtype

dtype('float64')

In [56]:
X.ndim

2

In [57]:
X.size # total number of elements (product of dimensions).

6

---
---

### Basic Arithmetic Operations

In [41]:
x = np.array([1, 2, 3])
y = np.array([10, 10, 10])

In [42]:
x + y 

array([11, 12, 13])

In [43]:
x * y

array([10, 20, 30])

In [44]:
x + 5

array([6, 7, 8])

In [45]:
x ** 2

array([1, 4, 9])

---

Note: In NumPy, `*` is __not__ matrix multiplication; it is element-wise multiplication. For matrix multiplication, use `@` or `np.dot` as discussed earlier.



In [33]:
A = np.array([[1, 2],
             [3, 4]])

B = np.array([[5, 6],
             [7, 8]])

In [34]:
print('A * B (element-wise):\n', A * B)
print('A @ B (matrix multiply):\n', A @ B)

A * B (element-wise):
 [[ 5 12]
 [21 32]]
A @ B (matrix multiply):
 [[19 22]
 [43 50]]


---

NumPy also has built-in __universal__ functions (ufuncs) that apply to each element:

- `np.sqrt(x)`: square root of each element.
- `np.exp(x)`: exponential (e^x) of each element.
- `np.sin`, `np.log`, `np.abs`, etc. operate element-wise on an array.


In [35]:
# You can combine operations; for example:

np.sin(np.array([0, np.pi/2, np.pi]))   # (sin of 0, 90°, 180°)

array([0.0000000e+00, 1.0000000e+00, 1.2246468e-16])

In [36]:
np.pi

3.141592653589793

---

### Indexing and Slicing

In [59]:
M = np.array([[5, 6, 7],
             [8, 9, 10],
             [1, 2, 3]])

In [38]:
# the first row
M[0] 

array([5, 6, 7])

In [39]:
# gives the second column as a 1D array
M[:, 1]   

array([6, 9, 2])

In [40]:
# You can slice multiple axes
M[0:2, 1:3]

array([[ 6,  7],
       [ 9, 10]])

---

If you assign to a sliced portion, it will modify the original array, because slicing returns a *view*, not a copy, in NumPy. For example: `sub = M[:2, :2]; sub[:] = 0` will set the top-left 2x2 block of M to zeros.


In [60]:
M

array([[ 5,  6,  7],
       [ 8,  9, 10],
       [ 1,  2,  3]])

In [61]:
sub = M[:2, :2]
sub

array([[5, 6],
       [8, 9]])

In [62]:
sub[:] = 0

sub

array([[0, 0],
       [0, 0]])

In [64]:
M

array([[ 0,  0,  7],
       [ 0,  0, 10],
       [ 1,  2,  3]])

In [67]:
# now let's change M and see what happens to sub
M[:2, :2] = -1

In [68]:
M

array([[-1, -1,  7],
       [-1, -1, 10],
       [ 1,  2,  3]])

In [69]:
sub

array([[-1, -1],
       [-1, -1]])

✅ __What is a "view" in NumPy?__
- A __view__ is a new array object that looks at the same data in memory as the original array.

- So when you change the view, you also change the original, because they __share the same underlying data buffer__.

- But the view itself is a different Python object (that's why the two have different `id`s).



In [66]:
# if you want to create a true copy

copy = M[:2, :2].copy()

Then `copy` will have __its own separate memory__, and changing it won't affect `M`.
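One way to check whether two arrays share a buffer, rather than inferring it from side effects, is `np.shares_memory` together with the `.base` attribute. The sketch below uses fresh names (`grid`, `grid_view`, `grid_copy`) so it does not disturb `M` from the cells above.

```python
import numpy as np

grid = np.array([[5, 6, 7],
                 [8, 9, 10],
                 [1, 2, 3]])

grid_view = grid[:2, :2]           # basic slicing -> view
grid_copy = grid[:2, :2].copy()    # explicit copy

print(np.shares_memory(grid, grid_view))   # True
print(np.shares_memory(grid, grid_copy))   # False
print(grid_view.base is grid)              # True: the view's data is owned by grid
print(grid_view is grid)                   # False: separate Python objects
```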



---

NumPy also supports __fancy indexing__ (using arrays of indices) and __boolean indexing__ (using a boolean mask array to pick elements).

In [70]:
M

array([[-1, -1,  7],
       [-1, -1, 10],
       [ 1,  2,  3]])

In [71]:
M[[0, 2]]    # would pick row 0 and row 2

array([[-1, -1,  7],
       [ 1,  2,  3]])

In [72]:
# boolean mask
M[M % 2 == 0] # picks out all even numbers 

array([10,  2])

In [73]:
scores = np.array([65, 80, 90, 50, 85])
passed = scores[scores > 60]

In [74]:
print('Scores:', scores)
print('Passed scores:', passed)

Scores: [65 80 90 50 85]
Passed scores: [65 80 90 85]


---


### Reshaping and Resizing


In [83]:
A = np.arange(1, 13)
B = A.reshape(3, 4)
B

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [84]:
C = B.reshape(2, 2, 3)
C

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

The number of elements must remain the same. Here 3x4 is 12 elements and 2x2x3 is also 12, so the reshape is valid.

In [85]:
D = B.reshape(2, -1)
D                         # -1 means "infer the number of columns"

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

---

In [77]:
A

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [81]:
A.shape

(12,)

In [82]:
A.reshape((1, 12))

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])

---

---

#### Flattening

__flatten/ravel__: to collapse an N-D array into 1D. `arr.flatten()` returns a copy, `arr.ravel()` returns a view if possible.

In [86]:
B

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [87]:
flat = B.flatten()   # copy as 1D
flat

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

This gives a new array (if you modify it, B won't change).

In [88]:
flat[0] = 0
print('flat: ', flat)
print('B: ', B)

flat:  [ 0  2  3  4  5  6  7  8  9 10 11 12]
B:  [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


---

`B.ravel()` returns a view when B is contiguous in memory, which it is here, so changes to the output of `ravel` are reflected in B.

In [89]:
B

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [90]:
rav = B.ravel()
rav

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [91]:
rav[0] = 0
print('rav:', rav)
print('B:', B)

rav: [ 0  2  3  4  5  6  7  8  9 10 11 12]
B: [[ 0  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


---

#### Transposing

In [92]:
B

array([[ 0,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [93]:
B.T

array([[ 0,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

---

### Stacking and Splitting

__Stacking__: concatenating arrays along a certain axis. If shapes align appropriately, you can use:
- `np.concatenate` (general purpose, you specify axis).
- `np.vstack` (stack vertically, i.e., one on top of another, which is axis=0 concatenation for 2D).
- `np.hstack` (stack horizontally, axis=1 for 2D).
- `np.stack` (to create a new axis and stack along it, if needed).


In [94]:
p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

In [95]:
np.hstack((p, q))

array([1, 2, 3, 4, 5, 6])

In [96]:
np.vstack((p, q))

array([[1, 2, 3],
       [4, 5, 6]])

If we had two 2D arrays with same number of columns, we could vstack (adding more rows). If same number of rows, we could hstack (adding more columns).
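`np.stack` from the list above is not exercised in the cells, so here is a minimal sketch. Unlike `hstack` and `vstack`, it always creates a new axis, and the `axis` argument decides where that axis goes.

```python
import numpy as np

p = np.array([1, 2, 3])
q = np.array([4, 5, 6])

s0 = np.stack((p, q), axis=0)   # new leading axis -> shape (2, 3), same as vstack here
s1 = np.stack((p, q), axis=1)   # new trailing axis -> shape (3, 2), p and q become columns

print(s0.shape, s1.shape)       # (2, 3) (3, 2)
print(s1)
```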



---

In [97]:
X1 = np.array([[1, 2], [3, 4]])
X2 = np.array([[5, 6], [7, 8]])

In [98]:
X1

array([[1, 2],
       [3, 4]])

In [99]:
cat0 = np.concatenate((X1, X2), axis=0)   # stack rows -> shape (4,2)
cat0

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [100]:
cat1 = np.concatenate((X1, X2), axis=1)   # stack cols -> shape (2, 4)
cat1

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

In [101]:
np.hstack((X1, X2))

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

---

__Splitting__: dividing an array into multiple sub-arrays:
- `np.split` (specify indices or sections).
- `np.hsplit`, `np.vsplit` for specific directions in 2D.


In [102]:
Z = np.arange(10)
Z

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [103]:
part1, part2, part3 = np.split(Z, [4, 7])
print(part1, part2, part3)

[0 1 2 3] [4 5 6] [7 8 9]


The indices [4,7] meant: break before index 4 (so part1 is Z[0:4]), and before index 7 (part2 is Z[4:7]), and the rest is part3 (Z[7:]).

For a 2D array, `np.vsplit` could split into submatrices by rows, `np.hsplit` by columns.
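A short sketch of both 2D splits, using an arbitrary 4x4 array (`grid` is a name introduced for this example):

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

top, bottom = np.vsplit(grid, 2)     # two (2, 4) blocks, split by rows
left, right = np.hsplit(grid, 2)     # two (4, 2) blocks, split by columns

print(top.shape, bottom.shape)       # (2, 4) (2, 4)
print(left.shape, right.shape)       # (4, 2) (4, 2)
```

Stacking the pieces back together with `np.vstack((top, bottom))` or `np.hstack((left, right))` recovers the original array.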

---
---

## Broadcasting

Broadcasting is a mechanism in NumPy that allows arithmetic operations on arrays of different shapes by automatically "stretching" one array to match the shape of the other __without actually copying data__.

In [104]:
data = np.array([1.0, 2.0, 3.0])
print(data * 2)        # Here 2 is treated as [2,2,2] to match shape -> [2. 4. 6.]

[2. 4. 6.]


Combining a scalar with an array is the simplest broadcast: the scalar is applied to each element.

---

In [105]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)

b = np.array([10, 20, 30])   # shape (3, )

C = A + b
print('A + b =\n', C)

A + b =
 [[11 22 33]
 [14 25 36]]


Here b is 1D with length 3, which matches the number of columns of A. NumPy treats b as if it were _replicated_ once per row of A, so it is added to each of the 2 rows.

Conceptually, `b` was broadcast to shape (2,3). This is very convenient: we didn't have to manually stack `b` to match `A`; NumPy did it for us.



---

In [106]:
A

array([[1, 2, 3],
       [4, 5, 6]])

In [107]:
c = np.array([7, 8])   
c.shape

(2,)

In [108]:
A + c

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

(2,3) and (2,) are not compatible. NumPy aligns shapes starting from the last dimension, so c is treated as shape (1,2). Comparing the last dimensions gives 3 (from A) versus 2 (from c); they are unequal and neither is 1, so the shapes cannot be broadcast and NumPy raises a ValueError.
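If the intent was to add one value of `c` to each row of `A`, one possible fix is to give `c` an explicit column shape with `np.newaxis`. The sketch below assumes that intent.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
c = np.array([7, 8])           # shape (2,)

c_col = c[:, np.newaxis]       # shape (2, 1): one value per row of A
result = A + c_col             # (2, 3) + (2, 1) -> (2, 3)
print(result)
# [[ 8  9 10]
#  [12 13 14]]
```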

__Use of broadcasting__:

Broadcasting is useful whenever you want to apply an operation row-wise or column-wise without writing loops. For example, to subtract each column's mean from that column: if col_mean has shape (3,) for 3 columns, then A - col_mean subtracts the corresponding mean from each column of A (because (3,) broadcasts to (2,3)).
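The column-centering example can be sketched as below. The row-wise variant with `keepdims=True` is an addition for comparison, showing how to keep the reduced axis so the shapes still line up.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # shape (2, 3)

col_mean = A.mean(axis=0)           # shape (3,): [2.5 3.5 4.5]
centered = A - col_mean             # (2, 3) - (3,) broadcasts row by row

row_mean = A.mean(axis=1, keepdims=True)   # shape (2, 1) so it lines up with rows
row_centered = A - row_mean

print(centered)
print(row_centered)
```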


---
---

## Aggregation Functions

NumPy provides a suite of __aggregation__ (reduction) functions that compute a summary statistic over the elements of an array. Common ones include:

- `np.sum`: sum of all elements (or along an axis)
- `np.prod`: product of all elements
- `np.mean`: average
- `np.std`, `np.var`: standard deviation, variance
- `np.min`, `np.max`: minimum, maximum
- `np.argmin`, `np.argmax`: indices of min/max
- etc.


These can operate on the entire array or along a specific axis:

- If you do `array.sum()`, it sums up everything in the array (flattened).
- If you do `array.sum(axis=0)`, it will sum __down the rows__ (i.e., produce a sum for each column).
- `array.sum(axis=1)` would sum __across the columns__ (a sum for each row).

In [109]:
M = np.array([[2, 4, 6],
              [1, 3, 5]])

In [110]:
print('Total sum:', M.sum())
print('Sum by columns:', M.sum(axis=0))
print('Sum by rows:', M.sum(axis=1))
print('Mean of all elements:', M.mean())
print('Max value:', M.max(), 'at index', M.argmax())

Total sum: 21
Sum by columns: [ 3  7 11]
Sum by rows: [12  9]
Mean of all elements: 3.5
Max value: 6 at index 2


In the above, `M.argmax()` returns the index into the flattened array by default; with an `axis` argument it returns indices along that axis.
A flat index can be converted to a 2D index with `np.unravel_index` if needed. You can also do `M.argmax(axis=1)` to get the index of the max in each row, for example.
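A minimal sketch of converting the flat `argmax` result into a (row, column) pair, using the same `M` as above:

```python
import numpy as np

M = np.array([[2, 4, 6],
              [1, 3, 5]])

flat_index = M.argmax()                            # 2 (position in the flattened array)
row, col = np.unravel_index(flat_index, M.shape)   # row 0, column 2
print(row, col)                                    # 0 2
print(M.argmax(axis=1))                            # [2 2]: column of the max in each row
```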

In [44]:
# cumulative sum

np.cumsum(M)

array([ 2,  6, 12, 13, 16, 21])

With the default `axis=None`, `np.cumsum` computes the cumulative sum over the flattened array.

In [45]:
np.cumsum(M, axis=0)

array([[ 2,  4,  6],
       [ 3,  7, 11]])

In [46]:
np.cumsum(M, axis=1)

array([[ 2,  6, 12],
       [ 1,  4,  9]])

---


__Note on axes__: The axis number in NumPy corresponds to the dimension index. For a 2D array, axis=0 means operate down each column (collapsing the 0th index, which is the row index, so the result has one value per column), and axis=1 means operate across each row. For a 3D array, axis=0 collapses the first dimension, and so on. The rule of thumb: axis specifies which dimension is eliminated (reduced) by the operation. E.g., sum with axis=1 on shape (2,3) gives a result of shape (2,), because dimension 1 (columns) is eliminated and one result per row remains.
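The rule of thumb is easy to check on a 3D array by looking only at the result shapes. The array `T` is an arbitrary example; the `keepdims=True` line is included to show the alternative where the reduced axis is kept with length 1.

```python
import numpy as np

T = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4)

print(T.sum(axis=0).shape)   # (3, 4): axis 0 removed
print(T.sum(axis=1).shape)   # (2, 4): axis 1 removed
print(T.sum(axis=2).shape)   # (2, 3): axis 2 removed
print(T.sum(axis=1, keepdims=True).shape)   # (2, 1, 4): reduced axis kept with length 1
```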



---

In [47]:
M

array([[2, 4, 6],
       [1, 3, 5]])

__Unique elements__: `np.unique(array)` gives sorted unique values (useful for classification labels etc).

In [48]:
np.unique(M)

array([1, 2, 3, 4, 5, 6])

---

- __Copy vs View__: As mentioned, slices are views. If you want a true copy of an array (that you can modify independently), use `array.copy()`.


- __Sorting__: `np.sort(array)` returns a sorted copy. `array.sort()` sorts in-place. For multi-dim arrays, you can sort along an axis.

- __Matrix inverse and linear algebra__: NumPy provides linear algebra routines in the `np.linalg` module. For example, `np.linalg.inv(A)` computes the inverse of a matrix, `np.linalg.det(A)` the determinant, and `np.linalg.solve(A, b)` solves the linear system A x = b. Use inversion with caution: it is expensive and rarely needed (`np.linalg.solve` is usually the better choice for solving a system), and the matrix must be square and full-rank.

- __Save/Load__: You can save arrays to disk and load them. `np.save('file.npy', arr)` and `np.load('file.npy')` for NumPy's binary format; or `np.savetxt('file.txt', arr, delimiter=',')` for text.
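A hedged round-trip sketch of the save/load calls from the last bullet. It writes into a temporary directory (an assumption made so the example leaves no files behind) and uses placeholder file names.

```python
import os
import tempfile
import numpy as np

arr = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "arr.npy")
    txt_path = os.path.join(tmp_dir, "arr.txt")

    np.save(npy_path, arr)                 # binary format: dtype and shape are stored
    loaded = np.load(npy_path)

    np.savetxt(txt_path, arr, delimiter=',')
    loaded_txt = np.loadtxt(txt_path, delimiter=',')   # text: read back as floats by default

print(loaded.dtype == arr.dtype, np.array_equal(loaded, arr))   # True True
print(loaded_txt.dtype)                                          # float64
```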


For a square matrix A (say of size n × n), its inverse $A^{-1}$ is another matrix such that when you multiply A by it, you get the identity matrix: $A A^{-1} = A^{-1} A = I$.
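A quick numerical check of the inverse property, together with `np.linalg.solve` for comparison. The matrix `A` and the vector `b` are arbitrary examples chosen to be invertible.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])      # square and invertible (determinant = 5)
b = np.array([3.0, 5.0])

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True, up to floating-point rounding

# Solving A x = b directly avoids forming the inverse explicitly
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))               # True
print(np.allclose(x, A_inv @ b))           # True
```

The comparisons use `np.allclose` rather than exact equality because the computed inverse is only accurate up to floating-point rounding.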

---