Linear algebra plays a fundamental role in data science, as it provides the mathematical foundation for understanding data structures, machine learning models, and optimization techniques.
Linear algebra is concerned with scalars, vectors, and matrices.
### **Scalars**
Most everyday mathematics
consists of manipulating
numbers one at a time.
Formally, we call these values *scalars*.
For example, the temperature in Palo Alto
is a balmy $72$ degrees Fahrenheit.
If you wanted to convert the temperature to Celsius
you would evaluate the expression
$c = \frac{5}{9}(f - 32)$, setting $f$ to $72$.
In this equation, the values
$5$, $9$, and $32$ are constant scalars.
The variables $c$ and $f$
in general represent unknown scalars.
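
As a quick check, the conversion above can be evaluated directly in Python. This is a minimal sketch; the function name `fahrenheit_to_celsius` is a hypothetical helper introduced only for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    # 5, 9, and 32 are the constant scalars; f is the scalar input.
    return 5 / 9 * (f - 32)

c = fahrenheit_to_celsius(72)
print(round(c, 2))  # 22.22
```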

We denote scalars
by ordinary lower-cased letters
(e.g., $x$, $y$, and $z$)
and the space of all (continuous)
*real-valued* scalars by $\mathbb{R}$.
The expression $x \in \mathbb{R}$
is a formal way to say that $x$ is a real-valued scalar.
The symbol $\in$ (pronounced "in")
denotes membership in a set.
For example, $x, y \in \{0, 1\}$
indicates that $x$ and $y$ are variables
that can only take values $0$ or $1$.

**Scalars are implemented as values
that contain only one element.**
Below, we assign two scalars
and perform the familiar addition, multiplication,
division, and exponentiation operations.
  

In [1]:
x = 3
y = 10
x + y, x * y, x / y, x**y

(13, 30, 0.3, 59049)


### **Vectors**

- **Vectors**: A vector is an ordered list of numbers. Geometrically, it can also be viewed as an arrow in space that represents a piece of data. Vectors are used to represent data points in data science (e.g., a feature vector in a machine learning model).
When vectors represent examples from real-world datasets, their values hold some real-world significance.
We often visualize vectors 
by stacking their elements vertically.

$$\mathbf{x} =\begin{bmatrix}x_{1}  \\ \vdots  \\x_{n}\end{bmatrix}.$$

Here $x_1, \ldots, x_n$ are elements of the vector.
There is also a difference between such *column vectors*
and *row vectors* whose elements are stacked horizontally.
  

In [2]:
# Example 1. Declaring a vector in Python using a list
v = [3, 2]
print(v)


[3, 2]


In [None]:
# Example 2. Declaring a vector in Python using NumPy
import numpy as np
v = np.array([3, 2])
print(v)


In [None]:
# Example 3. Declaring a three-dimensional vector in Python using NumPy
import numpy as np
v = np.array([4, 1, 2])
print(v)

# we can declare vectors of different dimensions

 **Operations**:
  - **Addition**: We can combine the movements of two vectors into a single vector by adding their values in each respective direction.



In [None]:
from numpy import array
v = array([3,2,4])
w = array([2,-1,10])
# sum the vectors
v_plus_w = v + w
# display summed vector
print(v_plus_w) # [ 5  1 14]

  - **Scalar Multiplication**:
  We can grow or shrink a vector by multiplying it by a scalar.
  Scaling changes a vector's magnitude but not its direction, except that multiplying by a negative number flips the vector to point the opposite way. Even then, the new vector lies on the same line.

In [None]:
# Scaling a vector in Python using NumPy
from numpy import array
v = array([3,1])
# scale the vector
scaled_v = 2.0 * v
# display scaled vector (a float array, because the scalar 2.0 is a float)
print(scaled_v) # [6. 2.]

 - **Span and Linear Dependence**:
  So far we have looked at adding vectors and scaling them. With these two operations, two vectors that point in different directions can be scaled and added to create any new vector in the plane.
  The set of all vectors that can be reached this way is called the *span* of the two vectors. Linear dependence limits the span: if the two vectors lie on the same line (one is a scaled copy of the other), every new vector created from them lies on that same line.
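
One way to check for linear dependence numerically is to stack the vectors as the columns of a matrix and compute its rank. The following is a minimal sketch using `np.linalg.matrix_rank`; the vectors `v`, `w`, and `u` are chosen only for illustration.

```python
import numpy as np

# Two vectors pointing in different directions: together they span the whole plane.
v = np.array([3, 2])
w = np.array([2, -1])
# A scaled copy of v: it lies on the same line as v, so (v, u) is linearly dependent.
u = 2 * v

# The rank counts how many independent directions the columns provide.
rank_independent = np.linalg.matrix_rank(np.column_stack([v, w]))
rank_dependent = np.linalg.matrix_rank(np.column_stack([v, u]))
print(rank_independent)  # 2
print(rank_dependent)    # 1
```

A rank of 2 means the pair spans the plane, while a rank of 1 means the span collapses to a single line.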

### **Matrices**

Just as scalars are $0^{\textrm{th}}$-order
and vectors are $1^{\textrm{st}}$-order,
matrices are $2^{\textrm{nd}}$-order arrays of numbers.
We denote matrices by bold capital letters
(e.g., $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$),
and represent them in code by arrays (tensors) with two axes.
The expression $\mathbf{A} \in \mathbb{R}^{m \times n}$
indicates that a matrix $\mathbf{A}$ 
contains $m \times n$ real-valued scalars,
arranged as $m$ rows and $n$ columns.
When $m = n$, we say that a matrix is *square*.
Visually, we can illustrate any matrix as a table.
To refer to an individual element,
we subscript both the row and column indices, e.g.,
$a_{ij}$ is the value that belongs to $\mathbf{A}$'s
$i^{\textrm{th}}$ row and $j^{\textrm{th}}$ column:

$$\mathbf{A}=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix}.$$


In code, we represent a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$
by a $2^{\textrm{nd}}$-order tensor (in NumPy, a two-dimensional array) with shape ($m$, $n$).
**We can convert any array that contains $m \times n$ elements
into an $m \times n$ matrix**
by passing the desired shape to `reshape`.
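
A minimal NumPy sketch of this conversion is shown here. The choice of six elements and the shape (3, 2) is an assumption made for illustration.

```python
import numpy as np

# Six values in a flat, one-axis array.
a = np.arange(6)
# Rearranged into a matrix with m = 3 rows and n = 2 columns.
A = a.reshape(3, 2)
print(A.shape)  # (3, 2)
print(A)
```

The number of elements must match the requested shape; reshaping six values into a (4, 2) matrix would raise an error.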

- **Matrix**: A matrix is a 2D array of numbers arranged in rows and columns. In data science, matrices are used to represent datasets, with rows as data points and columns as features.
  - Notation: \(A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\)
  - Example: \( A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \) is a 2x3 matrix.

  **Operations**:
  - **Matrix Addition**: \( A + B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix} \)
  - **Scalar Multiplication**: \( c \cdot A = 2 \cdot \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} \)
  - **Matrix Multiplication**:
    - Rule: Multiply row elements by corresponding column elements and sum the products.
    - \( A \cdot B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \cdot \begin{bmatrix} 2 & 0 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} (1 \cdot 2 + 2 \cdot 1) & (1 \cdot 0 + 2 \cdot 3) \\ (3 \cdot 2 + 4 \cdot 1) & (3 \cdot 0 + 4 \cdot 3) \end{bmatrix} = \begin{bmatrix} 4 & 6 \\ 10 & 12 \end{bmatrix} \)
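
The three operations above can be reproduced in NumPy as a quick check. This is a sketch that reuses the matrices from the examples; the variable name `C` for the second factor of the product is an assumption made to avoid reusing `B`.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[2, 0], [1, 3]])

matrix_sum = A + B  # elementwise addition
scaled = 2 * A      # every entry multiplied by the scalar 2
product = A @ C     # matrix multiplication (rows of A times columns of C)

print(matrix_sum)
print(scaled)
print(product)
```

Note that `A * C` would multiply entries elementwise, which is a different operation from the matrix product `A @ C`.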

### 3. **Matrix Transpose**

- **Transpose**: The transpose of a matrix flips it over its diagonal, turning rows into columns and vice versa.
  - Notation: \(A^T\)
  - Example: If \(A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\), then \(A^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\).
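
In NumPy, the transpose is available through the `.T` attribute. The sketch below repeats the example and adds a non-square matrix `M` (an illustrative assumption) to show how the shape changes.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_T = A.T  # rows become columns
print(A_T)

# For a non-square matrix, the shape (2, 3) becomes (3, 2).
M = np.array([[1, 2, 3], [4, 5, 6]])
print(M.T.shape)  # (3, 2)
```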

### 4. **Determinants and Inverses**

- **Determinant**: The determinant is a scalar value that can be computed from a square matrix. It gives information about the matrix properties, such as whether it is invertible.
  - Example: For a 2x2 matrix \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\), the determinant is \(\text{det}(A) = ad - bc\).

- **Matrix Inverse**: The inverse of a matrix \(A^{-1}\) is such that \(A \cdot A^{-1} = I\), where \(I\) is the identity matrix.
  - Example: For a 2x2 matrix, the inverse of \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\) is \(A^{-1} = \frac{1}{\text{det}(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\), provided \(\text{det}(A) \neq 0\).
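
The determinant and inverse formulas can be checked numerically. The following sketch uses `np.linalg.det` and `np.linalg.inv` on the matrix with \(a = 1, b = 2, c = 3, d = 4\), values chosen here only for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

det_A = np.linalg.det(A)  # ad - bc = 1*4 - 2*3 = -2
A_inv = np.linalg.inv(A)

# Closed-form 2x2 inverse: (1 / det) * [[d, -b], [-c, a]]
closed_form = (1 / det_A) * np.array([[4.0, -2.0], [-3.0, 1.0]])

print(np.isclose(det_A, -2.0))             # True
print(np.allclose(A_inv, closed_form))     # True
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```

Floating-point results are compared with `np.isclose` and `np.allclose` rather than `==`, since the computed values may differ from the exact ones by rounding error.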

### 5. **Linear Equations**

In data science, systems of linear equations arise when modeling relationships between variables.

- **System of Equations**:
  \[
  \begin{aligned}
  2x + 3y &= 5 \\
  4x - y &= 2
  \end{aligned}
  \]
  can be represented in matrix form as:
  \[
  \begin{bmatrix} 2 & 3 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}
  \]
  Solving these equations gives \(x = \tfrac{11}{14}\) and \(y = \tfrac{8}{7}\).
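
The system above can be solved numerically from its matrix form. This is a minimal sketch using `np.linalg.solve`; the names `A`, `b`, and `solution` are assumptions introduced for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system.
A = np.array([[2.0, 3.0], [4.0, -1.0]])
b = np.array([5.0, 2.0])

solution = np.linalg.solve(A, b)
x, y = solution
print(x, y)  # approximately 0.7857 and 1.1429, i.e. x = 11/14 and y = 8/7
```

Using `np.linalg.solve` is generally preferable to computing the inverse of `A` explicitly, since it avoids an unnecessary intermediate matrix.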

### 6. **Eigenvalues and Eigenvectors**

Eigenvalues and eigenvectors are important in dimensionality reduction techniques like **Principal Component Analysis (PCA)**.

- **Eigenvalue**: A scalar \(\lambda\) such that for a matrix \(A\), the equation \(A \mathbf{v} = \lambda \mathbf{v}\) holds, where \(\mathbf{v}\) is a non-zero vector called the eigenvector.
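  - Example: For \(A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}\), the characteristic equation \(\text{det}(A - \lambda I) = 0\) becomes \(\lambda^2 - 7\lambda + 10 = 0\), which yields the eigenvalues \(\lambda = 2\) and \(\lambda = 5\). PCA applies the same computation to the covariance matrix of a dataset.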
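
The eigenvalues and eigenvectors of the example matrix can be computed with `np.linalg.eig`. This is a sketch; the order in which NumPy returns the eigenvalues is not guaranteed, so they are sorted before printing.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])

# Each column of `eigenvectors` is the eigenvector for the matching eigenvalue.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.sort(eigenvalues))  # approximately [2. 5.]

# Verify the defining equation A v = lambda v for every pair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))  # True
```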

### 7. **Linear Transformations**

A matrix can be viewed as a linear transformation that maps one vector to another. This is fundamental in data science for transforming features or reducing dimensionality.

- Example: A transformation \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) represented by the matrix \( A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \) will stretch vectors along the x-axis by 2 and along the y-axis by 3.
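
Applying the transformation amounts to a matrix-vector product. In this sketch the input vector `v` is an assumption chosen for illustration.

```python
import numpy as np

A = np.array([[2, 0], [0, 3]])
v = np.array([1, 1])

# The x-component is doubled and the y-component is tripled.
transformed = A @ v
print(transformed)  # [2 3]
```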

### 8. **Dot Product and Norms**

- **Dot Product**: The dot product of two vectors \( \mathbf{a} \cdot \mathbf{b} \) is a scalar value that provides information about the angle between them. It’s used extensively in machine learning algorithms like linear regression and neural networks.
  - Example: \( \mathbf{a} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \mathbf{b} = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \)
  - Dot product: \( \mathbf{a} \cdot \mathbf{b} = (1 \cdot 3) + (2 \cdot 4) = 3 + 8 = 11 \)
  
- **Norm (Magnitude)**: The norm of a vector \( \| \mathbf{v} \| \) measures its length or magnitude.
  - Example: \( \mathbf{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \)
  - Norm: \( \| \mathbf{v} \| = \sqrt{3^2 + 4^2} = 5 \)
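
Both quantities are available in NumPy. The sketch below reproduces the two examples and, as an additional illustration not worked out above, uses them to recover the cosine of the angle between `a` and `b`.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])
dot = np.dot(a, b)
print(dot)  # 11

v = np.array([3, 4])
norm = np.linalg.norm(v)  # Euclidean (L2) norm
print(norm)  # 5.0

# cos(theta) = (a . b) / (||a|| ||b||)
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_theta)  # approximately 0.98
```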

### 9. **Applications in Data Science**

- **Linear Regression**:
  - In matrix form, linear regression can be written as \( \mathbf{y} = X \mathbf{\beta} + \epsilon \), where \(X\) is the matrix of input features, \(\beta\) is the vector of coefficients, and \(\epsilon\) is the error term.
  
- **Principal Component Analysis (PCA)**: PCA uses eigenvalues and eigenvectors to reduce the dimensionality of the data by finding the directions (principal components) that maximize variance.
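
The PCA idea can be sketched in a few lines: center the data, compute its covariance matrix, and project onto the eigenvector with the largest eigenvalue. This is a simplified illustration, not a full PCA implementation, and the toy dataset `data` is hypothetical.

```python
import numpy as np

# Hypothetical toy dataset: 5 data points (rows) with 2 features (columns).
data = np.array([[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 4.8]])

# Center each feature, then compute the 2x2 covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh is suited to symmetric matrices such as a covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# The principal component is the eigenvector with the largest eigenvalue.
top_component = eigenvectors[:, np.argmax(eigenvalues)]

# Project the 2D data onto that single direction (2 features -> 1).
projected = centered @ top_component
print(projected.shape)  # (5,)
```

The variance of the projected data equals the largest eigenvalue, which is the sense in which this direction maximizes variance.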

---

### Example: Solving Linear Regression Using Algebra

Given a dataset:

\[
\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]

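The goal is to find the best-fit line \(y = \beta_0 + \beta_1 x\). The first column of ones in \(\mathbf{X}\) corresponds to the intercept \(\beta_0\), and the second column holds the \(x\) values. In matrix form, the solution is: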

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
\]
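
The formula can be evaluated for the dataset above. This sketch solves the equivalent linear system \((X^T X)\hat{\beta} = X^T y\) with `np.linalg.solve` instead of forming the inverse explicitly, and compares the result with `np.linalg.lstsq`; the variable names are assumptions introduced for illustration.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

# Normal equations: (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately beta_0 = 0 and beta_1 = 1

# Cross-check with NumPy's least-squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_hat, beta_lstsq))  # True
```

The data lie exactly on the line \(y = x\), so the fitted intercept is 0 and the fitted slope is 1, up to floating-point rounding.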

