

# Essential Python programming concepts

## Variables

## Lists

## Dictionaries

## Functions

## Classes

## Inheritance

## Magic methods



# Essential Mathematical Concepts


## Tensors

> In machine learning, a tensor is a generalisation of a matrix to any number of dimensions (so a vector is a 1-dimensional tensor, and a matrix is a 2-dimensional tensor)
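NumPy arrays are one common way to hold tensors in Python. The minimal sketch below starts from a vector, stacks copies of it into a matrix, then stacks matrices into a 3-dimensional tensor. The values are arbitrary; `ndim` counts the dimensions and `shape` gives the size of each one.

```python
import numpy as np

# A vector is a 1-dimensional tensor
vector = np.array([1, 2, 3, 4])

# Stacking vectors of the same length gives a 2-dimensional tensor (a matrix)
matrix = np.stack([vector, vector * 10, vector * 100])

# Stacking matrices of the same shape gives a 3-dimensional tensor
tensor = np.stack([matrix, matrix + 1])

for name, t in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    print(f"{name}: ndim={t.ndim}, shape={t.shape}")
# vector: ndim=1, shape=(4,)
# matrix: ndim=2, shape=(3, 4)
# tensor: ndim=3, shape=(2, 3, 4)
```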

# TODO ![](./images/vector-matrix-tensor.png)

### Vector & matrix notation

We typically represent vectors with lower-case mathematical symbols, and matrices with upper-case mathematical symbols.

A vector element might have a subscript to represent its position in the vector.

A matrix element might have a double subscript to represent its position in the matrix, where the first subscript represents the row position and the second the column position.
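In code, these subscripts map onto array indexing. In this small NumPy sketch the values are arbitrary, picked so that each matrix entry's digits spell its row and column. Note that maths usually counts positions from 1, while NumPy counts from 0.

```python
import numpy as np

v = np.array([10, 20, 30])

# Each entry's digits spell its (row, column) position, counting from 1
A = np.array([
    [11, 12, 13],
    [21, 22, 23],
])

# v_2 (the second element of v) is v[1], because NumPy indexes from 0
print(v[1])     # 20

# A_{2,3} (row 2, column 3) is A[1, 2]: row index first, then column index
print(A[1, 2])  # 23
```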

### Matrix multiplication

### Broadcasting

In many cases when programming, the dimensions of a calculation don't match mathematically, but the calculation still works, such as when adding a vector to a matrix, or multiplying them element-wise.
This is because NumPy assumes that you want to _broadcast_ the vector across the matrix, conceptually repeating it for each row so that the shapes match.

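```python
import numpy as np

vector = np.array([10, 20, 30, 40])  # shape (4,)

matrix = np.array([                  # shape (3, 4)
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
])

print("Vector shape:", vector.shape)
print("Matrix shape:", matrix.shape)

# The vector is broadcast across each row of the matrix,
# so both operations below are applied element-wise
print()
print(vector * matrix)
print()
print(vector + matrix)
```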

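Broadcasting works when the shapes are compatible: NumPy compares the dimensions from right to left (starting with the trailing dimension), and two dimensions are compatible when they are equal or when one of them is 1. If one array has fewer dimensions than the other, its missing leading dimensions are treated as 1. See more [here](https://numpy.org/doc/stable/user/basics.broadcasting.html#broadcastable-arrays).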
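The rule can be checked directly. This sketch uses arbitrary example arrays: a column vector of shape `(3, 1)`, a 1-dimensional vector of shape `(4,)`, and a `(3, 4)` matrix, plus one pair of shapes that cannot be broadcast, for which NumPy raises a `ValueError`.

```python
import numpy as np

matrix = np.ones((3, 4))            # shape (3, 4)
column = np.array([[1], [2], [3]])  # shape (3, 1): a column vector
row = np.array([10, 20, 30, 40])    # shape (4,): treated as (1, 4)

# (3, 1) vs (3, 4): trailing dims are 1 and 4, so the column is stretched across
print((column * matrix).shape)  # (3, 4)

# (3, 1) vs (1, 4): both arrays are stretched, so every column entry meets every row entry
print(column + row)

# (3,) vs (3, 4): trailing dims are 3 and 4, neither is 1, so this fails
try:
    np.array([1, 2, 3]) + matrix
except ValueError as err:
    print("Cannot broadcast:", err)
```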


### Dot product & Cosine similarity

The dot product is a function of two vectors of the same length: it multiplies them element-wise and then sums the result.

$$a \cdot b = \sum_{i=1}^{d} a_i b_i$$

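The dot product is closely related to the _cosine similarity_, a function used to express the similarity of two vectors. The cosine similarity is the dot product divided by the product of the vectors' lengths, $\cos(\theta) = \frac{a \cdot b}{\|a\| \|b\|}$, where $\theta$ is the angle between the vectors, so for vectors of length 1 the two are equal. The cosine similarity is 1 when the vectors point in the same direction, 0 when they are orthogonal, and -1 when they point in opposite directions.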
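A minimal NumPy sketch of both functions, written out by hand so the element-wise multiply and sum are visible; `dot` and `cosine_similarity` are illustrative helper names, and `np.dot` is NumPy's built-in equivalent. The example vectors are arbitrary.

```python
import numpy as np

def dot(a, b):
    """Element-wise multiplication followed by a sum."""
    return float(np.sum(a * b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vectors' lengths (L2 norms)."""
    return dot(a, b) / float(np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(dot(a, b))     # 1*4 + 2*5 + 3*6 = 32.0
print(np.dot(a, b))  # NumPy's built-in dot product gives the same value

print(cosine_similarity(a, 2 * a))  # same direction: 1 (up to rounding)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # orthogonal: 0.0
print(cosine_similarity(a, -a))     # opposite directions: -1 (up to rounding)

# For unit-length vectors, the dot product and the cosine similarity coincide
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(dot(a_unit, b_unit), cosine_similarity(a, b))
```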

### e & log

$e$ is an important irrational number, approximately equal to 2.71828.

It is important mathematically because $e^x$ is its own derivative: for any number $x$, $\frac{d}{dx}e^x = e^x$.

That is, if you draw the curve of $a^x$ (for $a > 0$) alongside the curve of its gradient,
- for every number $a > e$, the gradient curve is above the original
- for every number $a$ with $0 < a < e$, the gradient curve is below the original
- for $a = e$, the gradient curve is exactly the same as the original
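The comparison above can be checked numerically. This sketch estimates the gradient of $a^x$ with a central finite difference (a numerical approximation; the step size `h` and the point `x = 1.5` are arbitrary choices) and divides it by $a^x$. The ratio works out to $\ln(a)$, which is below 1 for $a < e$, exactly 1 for $a = e$, and above 1 for $a > e$.

```python
import math

def gradient_ratio(a, x, h=1e-6):
    """Gradient of a**x at x (central finite difference), divided by a**x itself."""
    gradient = (a ** (x + h) - a ** (x - h)) / (2 * h)
    return gradient / a ** x

x = 1.5
for a in [2.0, math.e, 3.0]:
    # Below 1: gradient curve under the original; above 1: over it
    print(f"a = {a:.5f}: gradient / value = {gradient_ratio(a, x):.4f}")
```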

The function $\log(x)$
- Is the inverse of $e^x$
- Can be read as "the power of $e$ which gives $x$"
- Is also known as the natural logarithm, $\ln(x)$

That is, $\log(e^x) = x$ for any $x$, and $e^{\log(x)} = x$ for any $x > 0$.
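In Python, `math.log(x)` with a single argument is the natural logarithm (as is `numpy.log`), while the base-10 logarithm is the separate `math.log10`. A short sketch of the inverse relationship, with `x = 5.0` chosen arbitrarily:

```python
import math

x = 5.0
print(math.log(x))            # the power of e which gives 5
print(math.exp(math.log(x)))  # e^{log(x)}: approximately 5.0
print(math.log(math.exp(x)))  # log(e^x): approximately 5.0
print(math.log10(100))        # base-10 logarithm, for comparison: 2.0
```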

## Sets and Real numbers

$x \in \mathbb{R}$
- is read "x is in the set of real numbers"
- simply means it's a scalar

$x \in \mathbb{R}^d$
- is read "x is in the set of $d$-dimensional vectors full of real numbers"
- simply means it's a vector with $d$ elements

$X \in \mathbb{R}^{d \times k}$
- is read "X is in the set of $d$ by $k$ matrices full of real numbers"
- simply means it's a matrix with $d$ rows and $k$ columns

$X \in \mathbb{R}^{d \times k \times h}$
- is read "X is in the set of $d$ by $k$ by $h$ tensors full of real numbers"
- simply means it's a tensor where the first dimension has size $d$, the second dimension has size $k$, and the third dimension has size $h$
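In NumPy, the superscript corresponds to an array's `shape`. This sketch uses arbitrary sizes `d, k, h = 4, 3, 2` and zero-filled arrays, since only the shapes matter here.

```python
import numpy as np

d, k, h = 4, 3, 2  # example sizes, chosen arbitrarily

x_scalar = 2.5                  # x ∈ R
x_vector = np.zeros(d)          # x ∈ R^d
X_matrix = np.zeros((d, k))     # X ∈ R^(d × k)
X_tensor = np.zeros((d, k, h))  # X ∈ R^(d × k × h)

print(np.shape(x_scalar))  # ()
print(x_vector.shape)      # (4,)
print(X_matrix.shape)      # (4, 3)
print(X_tensor.shape)      # (4, 3, 2)
```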

## Differentiation

Differentiation is the act of taking a function $f(x)$ and calculating its _derivative_.

The derivative is a function which, when evaluated at any value of $x$, tells you how steep $f$ is at that point.

The derivative of the function $f(x)$ is read "the derivative of $f$ with respect to $x$".

It may be written as:
- $f'(x)$
- $\frac{df}{dx}$

The steepness is also known as:
- The slope
- The __gradient__ - we will be using this term a lot
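One way to see what the gradient measures is to compute the slope of a straight line through two nearby points on the curve, and shrink the gap between them. This is a numerical approximation (a forward finite difference), not an exact derivative. The sketch uses $f(x) = x^2$, whose derivative is $f'(x) = 2x$, so the slopes at $x = 3$ should approach 6; the step sizes are arbitrary.

```python
def slope(f, x, h):
    """Slope of the straight line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x ** 2  # its derivative is f'(x) = 2x, so f'(3) = 6

# As h shrinks, the slope between the two points approaches the gradient at x = 3
for h in [1.0, 0.1, 0.001, 0.00001]:
    print(f"h = {h}: slope = {slope(square, 3.0, h):.5f}")
```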

# TODO gradient diagram

<!-- - $\frac{\partial f}{\partial x} -->

## Probability Distributions

Throughout machine learning you will see probability distributions.

A probability distribution over discrete classes can be represented as a vector, where each element is the probability assigned to a different class.

Every discrete probability distribution satisfies two key properties:
- Every value is between 0 and 1
- All of the values sum to 1
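Both properties are easy to check in code. In this sketch `is_probability_distribution` is a hypothetical helper name, and `np.isclose` is used for the sum because floating-point values such as 0.7 + 0.2 + 0.1 may not add up to exactly 1.0. The last lines show one simple way to turn non-negative scores (here, made-up counts) into a distribution: divide by their sum.

```python
import numpy as np

def is_probability_distribution(p):
    """Check the two key properties of a discrete probability distribution."""
    p = np.asarray(p, dtype=float)
    all_between_0_and_1 = bool(np.all((p >= 0) & (p <= 1)))
    # np.isclose allows for floating-point rounding in the sum
    sums_to_1 = bool(np.isclose(p.sum(), 1.0))
    return all_between_0_and_1 and sums_to_1

print(is_probability_distribution([0.7, 0.2, 0.1]))  # True
print(is_probability_distribution([0.5, 0.6]))       # False: sums to 1.1
print(is_probability_distribution([1.5, -0.5]))      # False: values outside [0, 1]

# Non-negative counts become a distribution when divided by their total
counts = np.array([3, 1, 4])
p = counts / counts.sum()
print(p, is_probability_distribution(p))
```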

Many machine learning models are actually conditional probability distributions, like $p(y|x)$: the probability of an output $y$ given an input $x$.
And even regression models can be considered in the same way... but that's a topic for another day.