# Math Prerequisites for Deep Learning

## Linear Algebra Terminology

### Scalar

A scalar has no dimensions.

A **scalar** is a single value, in contrast to more complex structures such as vectors, matrices, or tensors. The term is used in several contexts:



#### In Mathematics:

- A **scalar** is a single numerical value. It can be any real number, complex number, etc.
- It represents magnitude without direction (unlike vectors, which have both magnitude and direction).


#### In Programming:

- A **scalar** typically refers to a single value of a basic data type (such as integer, float, boolean, etc.) as opposed to collections like arrays, lists, or more complex structures like objects.
- In Python, for example, a scalar could be an `int`, `float`, `bool`, or `str`.



#### In Machine Learning and Deep Learning:
- In libraries like NumPy, TensorFlow, and PyTorch, a **scalar** is a 0-dimensional tensor.
  - **NumPy example:**

    ```python
    import numpy as np

    scalar = np.array(5)
    print(scalar.ndim)  # Output: 0
    print(scalar)       # Output: 5
    ```

  - **PyTorch example:**

    ```python
    import torch

    scalar = torch.tensor(5)
    print(scalar.dim())  # Output: 0
    print(scalar)        # Output: tensor(5)
    ```


Scalars are fundamental building blocks in many fields and are used in various ways depending on the context. 

### Vector

When stored as an array, a vector has exactly one dimension.


#### In linear algebra

A vector is a geometric entity that has both magnitude and direction. It is typically represented as an ordered list of numbers, which can be visualized as a directed line segment from the origin to a point in space.
A vector can be written as a "column vector" (n × 1) or a "row vector" (1 × n).

#### In Machine Learning and Deep Learning

In machine learning (ML) and deep learning (DL), a vector is often used to represent data, features, or weights. Vectors are fundamental for various computations and transformations in these fields. They are utilized to encode information in a format that can be processed by algorithms.
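
The ideas above can be sketched in NumPy. This is a minimal illustration, and the values are made up: a vector is stored as a 1-D array, its magnitude and direction can be computed from its components, and row or column vectors are commonly stored as 2-D arrays with a single row or a single column.

```python
import numpy as np

# A vector with three components, stored as a 1-D array.
v = np.array([3.0, 4.0, 0.0])
print(v.ndim)   # number of array dimensions (axes)
print(v.shape)  # number of components along that axis

# Magnitude (Euclidean length) and direction (unit vector).
magnitude = np.linalg.norm(v)
direction = v / magnitude
print(magnitude, direction)

# Row and column vectors are commonly stored as 2-D arrays
# with a single row or a single column.
row_vector = v.reshape(1, -1)
column_vector = v.reshape(-1, 1)
print(row_vector.shape, column_vector.shape)
```

Note that the row and column forms have two array dimensions, even though they still describe the same vector.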

### Matrix

#### In Linear Algebra

A matrix is a rectangular array of numbers arranged in rows and columns.

Notation: Usually denoted by uppercase letters (e.g., 𝐴, 𝐵, or 𝐶).

#### In ML/DL

A matrix in ML/DL is similar in structure but often represents data, features, weights, or activations within neural networks.

**Uses**:

* Data Representation: _Rows represent samples_, and _columns represent features_.

* Weight Matrices: Weights of connections between layers in neural networks are stored in matrices.

* Input and Output: Matrices are used for input data, intermediate activations, and output predictions.

* Batch Processing: In deep learning, processing multiple samples (batch) simultaneously is common, often represented by matrices.
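
As a rough sketch of these uses, the snippet below builds a data matrix and a weight matrix in NumPy and multiplies them. The sizes (4 samples, 3 features, 2 outputs) and the random values are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical batch: 4 samples (rows), each with 3 features (columns).
X = rng.normal(size=(4, 3))

# Hypothetical weights of a layer that maps 3 inputs to 2 outputs.
W = rng.normal(size=(3, 2))
b = np.zeros(2)

# One matrix multiplication processes the whole batch at once.
activations = X @ W + b
print(X.shape, W.shape, activations.shape)
```

Each row of `activations` is the output for the corresponding row (sample) of `X`.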

#### Images

In image recognition and processing, a grayscale image is represented by a matrix. Each cell of the matrix corresponds to a single pixel, and its value gives the brightness of that pixel.

A color image can be represented by a 3D tensor, where each of the three dimensions corresponds to a different aspect of the image:

* Height (number of rows).
* Width (number of columns).
* Channels (RGB color channels)

For example, a color image that is 256 pixels high and 256 pixels wide corresponds to a 3D tensor with the shape `(256, 256, 3)`.
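
A small NumPy sketch of both representations follows. The pixel coordinates and values are arbitrary, and the 8-bit `uint8` storage (brightness from 0 to 255) is an assumption, although a common one.

```python
import numpy as np

# Grayscale image: a matrix of brightness values, one per pixel.
gray = np.zeros((256, 256), dtype=np.uint8)
gray[100, 50] = 255  # pixel at row 100, column 50 set to white

# Color image: height x width x channels (R, G, B).
color = np.zeros((256, 256, 3), dtype=np.uint8)
color[100, 50] = [255, 0, 0]  # the same pixel set to pure red

print(gray.ndim, gray.shape)
print(color.ndim, color.shape)
```

Some libraries order the axes differently (for example, channels first), so it is worth checking the convention of the library in use.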

### Tensor

A tensor can have any number of dimensions, from zero upward.

#### In Linear Algebra

**Definition**: 
In linear algebra, a tensor is a geometric object that generalizes scalars, vectors, and matrices to higher dimensions. A tensor can be thought of as a multi-dimensional array of numerical values.

_Notation_: Tensors are usually denoted with uppercase letters (e.g., 𝐴, 𝐵, or 𝐶), sometimes with additional indices to specify their dimensions.

**Uses**: 
* Coordinate Systems: Tensors are used to describe physical quantities in different coordinate systems.
* Tensor Calculus: Tensors are fundamental in differential geometry and the theory of relativity, where they describe the curvature of space-time.

**Transformations**: Tensors transform according to specific rules under a change of coordinates, preserving the structure of the physical laws they represent.

#### In ML/DL

**Definition**:

In ML/DL, a tensor is a data structure that can represent high-dimensional data. It extends the concept of vectors and matrices to an arbitrary number of dimensions (this number is also called the tensor's order or rank).

Notation: Tensors are also denoted with uppercase letters (e.g., 𝑇, 𝑊), and their dimensions are often referred to as their shape.

**Uses**:

* Data Representation: Tensors are used to store datasets in various dimensions, such as images (3D tensors: height, width, channels) or sequences of data.
* Weights and Activations: In neural networks, tensors represent the weights of connections and the activations (outputs) of neurons in each layer.
* Operations: Tensors are manipulated through operations like matrix multiplication, element-wise addition, and non-linear transformations.
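
The uses listed above can be sketched with NumPy arrays standing in for tensors. The batch size, image size, and number of outputs are hypothetical, and ReLU is used here only as one example of a non-linear transformation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical batch of 8 color images, 32x32 pixels: a 4-D tensor
# with axes (batch, height, width, channels).
batch = rng.normal(size=(8, 32, 32, 3))
print(batch.ndim, batch.shape)

# Element-wise addition keeps the shape unchanged.
shifted = batch + 1.0

# Non-linear transformation (ReLU), applied element-wise.
relu = np.maximum(batch, 0.0)

# Matrix multiplication: flatten each image into a vector,
# then apply a hypothetical weight matrix with 10 outputs.
flat = batch.reshape(8, -1)
W = rng.normal(size=(flat.shape[1], 10))
out = flat @ W
print(flat.shape, out.shape)
```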

## Disambiguation of Data Types 

In Computer Science, a data type refers to the "format of data storage", such as integer, floating-point, or boolean.

In Statistics, a data type means the "category of data". Examples are categorical, numerical, ordinal, and ratio data.

Cross-reference of these structures in ML/DL:

| #    | Math | NumPy | PyTorch |
|------|------|-------|---------|
|1|Scalar|`ndarray` (0-D)|`Tensor` (0-D)|
|2|Vector|`ndarray` (1-D)|`Tensor` (1-D)|
|3|Matrix|`ndarray` (2-D)|`Tensor` (2-D)|
|4|Tensor|`ndarray` (N-D)|`Tensor` (N-D)|
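
To make the distinction concrete, here is a minimal NumPy sketch with illustrative values. All four mathematical objects are stored in the same container type and differ only in their number of dimensions (`ndim`), while `dtype` is the data type in the Computer Science sense of a storage format.

```python
import numpy as np

objects = {
    "scalar": np.array(5),
    "vector": np.array([1.0, 2.0, 3.0]),
    "matrix": np.array([[1, 2], [3, 4]]),
    "tensor": np.zeros((2, 3, 4), dtype=np.float32),
}

for name, arr in objects.items():
    # Container type, number of dimensions, and storage format.
    print(name, type(arr).__name__, arr.ndim, arr.dtype)
```

The default integer `dtype` depends on the platform, so the printed storage formats may vary between machines.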



## Converting Reality to Numbers

__Reality__ can be divided into two types: 
* Continuous reality: numeric, with many (possibly infinitely many) distinct values. Examples: height, income, exam scores.
* Categorical reality: discrete, with a limited number (typically a few) of distinct values. Examples: landscape (sea vs. mountain), disease diagnosis, gender.

### Representing Categorical Data

#### Dummy-Coding vs. One-Hot Coding

**Dummy Coding**: uses a single vector to describe one binary feature, with 0/1 or True/False as the values. Examples: exam (pass/fail), house (sold/on the market), fraud detection.

**One-hot Coding**: uses a matrix to describe a feature with several categories. There is one column per category, and 0/1 or True/False marks which category each sample belongs to.


<table>
  <thead>
    <tr>
      <th colspan="2">Dummy Coding</th>
      <th colspan="4">One-Hot Coding</th>
    </tr>
    <tr>
      <th>Reality: exam</th>
      <th>y (representation)</th>
      <th>Reality: Genre</th>
      <th>History</th>
      <th>Scifi</th>
      <th>Kids</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Pass</td>
      <td>1</td>
      <td>y1</td>
      <td>0</td>
      <td>1</td>
      <td>0</td>
    </tr>
    <tr>
      <td>Fail</td>
      <td>0</td>
      <td>y2</td>
      <td>0</td>
      <td>0</td>
      <td>1</td>
    </tr>
    <tr>
      <td>Fail</td>
      <td>0</td>
      <td>y3</td>
      <td>1</td>
      <td>0</td>
      <td>0</td>
    </tr>
    <tr>
      <th></th>
      <th>A column vector is created to represent the reality:
          <br>[1<br> 0<br> 0]
      </th>
      <th></th>
      <th colspan="3">A matrix is used to represent the reality: <br>
      [0 1 0<br>
       0 0 1<br>
       1 0 0]
      </th>
    </tr>
  </tbody>
</table>
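
The two codings in the table can be reproduced with a few lines of NumPy. This is a minimal sketch rather than a recommended pipeline: the variable names are hypothetical, and the column order of the one-hot matrix simply follows the order of the `categories` list.

```python
import numpy as np

# Dummy coding: one binary feature becomes one column of 0/1 values.
exam = ["Pass", "Fail", "Fail"]
y = np.array([1 if result == "Pass" else 0 for result in exam]).reshape(-1, 1)
print(y)

# One-hot coding: one column per category, exactly one 1 in each row.
categories = ["History", "Scifi", "Kids"]
genres = ["Scifi", "Kids", "History"]  # genres of samples y1, y2, y3
indices = [categories.index(genre) for genre in genres]
one_hot = np.eye(len(categories), dtype=int)[indices]
print(one_hot)
```

Indexing the identity matrix with the category indices picks, for each sample, the row that has a 1 in that category's column.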