**Deep Learning**
- a subfield of machine learning that trains neural networks on massive datasets, using accelerated computing on GPUs (graphics processing units)

**PyTorch** - open-source Python framework

- from Facebook's AI Research team (FAIR)
- used for developing deep neural networks

**Neural Network**

- loosely imitates how a brain works
- as with neurons in a brain
    - the *input* nodes are one-way streets to a calculation node
    - some decision happens there, and it produces a one-way action to the output
        - in a brain the neuron either fires or it doesn't, but here the node always produces an output, just with different values


- basically, it finds a boundary between different categories (like flower species)

**Features**
- the inputs; the data fed into a neural network for it to learn from

**Targets** 
- the desired outcomes
- denoted by $y$

**Prediction** 
- the output; the prediction of how items are categorized
- the raw output, before any activation function is applied, is aka the **score** or **logits**
- denoted by $\hat{y}$

**Linear Boundaries**

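- a boundary that's a straight line (in 2 dimensions)
- this is the simplest solution/output produced by a neural network (ie to differentiate between the target categories)
- calculated with $Wx + b = 0$, where $W$ is a vector of weights, $x$ is the input, and $b$ is the bias
    - points where $Wx + b > 0$ fall on one side of the boundary, and points where $Wx + b < 0$ fall on the other


- in higher dimensions the linear boundary becomes a plane (3 dimensions) or a hyperplane; curved boundaries are non-linear and need more than one linear equation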
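
To make $Wx + b$ concrete, here is a minimal sketch in plain Python (not PyTorch). The weights, bias, and points are made-up values for illustration only; the sign of the score says which side of the boundary a point is on.

```python
def boundary_score(weights, point, bias):
    """Return Wx + b for one data point (a dot product plus the bias)."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def side_of_boundary(score):
    """Describe where a point sits relative to the boundary Wx + b = 0."""
    if score > 0:
        return "positive side"
    if score < 0:
        return "negative side"
    return "on the boundary"


# Hypothetical boundary: 2*x1 + 1*x2 - 4 = 0
weights = [2.0, 1.0]
bias = -4.0

for point in [(1.0, 1.0), (1.0, 2.0), (3.0, 1.0)]:
    score = boundary_score(weights, point, bias)
    print(point, score, side_of_boundary(score))
```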

**Perceptron**
- a building block of neural networks
- a visualisation of an equation as a small graph, where
    - each input $x$ is in a node
    - the weights $w$ are labeled on the *edges* (the arrows) of the input nodes
    - the bias $b$ can either be in the calculation node or viewed as an *input* node
        - usually the bias is considered a *weight* for an input of constant $1$, as in this image
    - the calculation is also in a node
    - the prediction is the return value

![perceptron with summing equation](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/perceptron-summation.PNG)

**Examples of Perceptrons as Logical Operators**

An AND perceptron:
![an AND perceptron visualized](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/and-perceptron.PNG)

An OR perceptron:
![an OR perceptron visualized](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/perceptron-or.PNG)
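
A rough sketch of the AND and OR perceptrons in plain Python. The weights and biases here are one choice that works, picked for illustration; the images may use different values. The step rule (score of 0 or more means "yes") is also an assumption.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, then a step: 1 if the score is >= 0, else 0."""
    score = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if score >= 0 else 0


def and_perceptron(x1, x2):
    # Only 1 + 1 - 1.5 = 0.5 reaches 0, so both inputs must be on.
    return perceptron((x1, x2), (1.0, 1.0), -1.5)


def or_perceptron(x1, x2):
    # Same weights, but a smaller bias: one input being on is enough.
    return perceptron((x1, x2), (1.0, 1.0), -0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", and_perceptron(x1, x2), "OR:", or_perceptron(x1, x2))
```

Notice that going from AND to OR only needed a change in the bias, which shifts the linear boundary without rotating it.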


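An XOR perceptron, which says "yes" to either input, but not both and not neither. A single perceptron can't draw this boundary with one straight line, so XOR is built by combining other perceptrons: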
![an XOR perceptron visualized](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/perceptron-xor.PNG)


**2 different visuals for the XOR perceptron**
![an XOR neural network](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/xor-neuralnetwork.PNG)

![an XOR multi-layer perceptron](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/xor-multi-layer-perceptron.PNG)
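
One way to sketch XOR as a tiny two-layer network in plain Python: feed the inputs to an OR perceptron and a NAND perceptron, then AND their outputs. This particular combination and all the weights are assumptions for illustration; the images above may wire it differently.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, then a step: 1 if the score is >= 0, else 0."""
    score = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if score >= 0 else 0


def xor_network(x1, x2):
    # First layer: two perceptrons look at the same inputs.
    or_out = perceptron((x1, x2), (1.0, 1.0), -0.5)      # at least one is on
    nand_out = perceptron((x1, x2), (-1.0, -1.0), 1.5)   # not both are on
    # Second layer: AND the two first-layer outputs.
    return perceptron((or_out, nand_out), (1.0, 1.0), -1.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "XOR:", xor_network(x1, x2))
```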

**Perceptron Algorithm**
- adjusts the linear boundary by "asking" each data point whether it's been classified correctly or incorrectly; each misclassified point pulls the boundary a little closer to itself
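
A minimal sketch of the perceptron algorithm in plain Python, using the standard update rule: for every misclassified point, nudge the weights and bias towards that point. The learning rate, the number of epochs, and the AND-style training data are made-up values for illustration.

```python
def predict(weights, bias, point):
    """Step function on Wx + b: 1 if the score is >= 0, else 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else 0


def train_perceptron(points, labels, learning_rate=1.0, epochs=20):
    """Repeatedly ask each point whether it is classified correctly."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for point, label in zip(points, labels):
            # error is 0 when correct, +1 or -1 when misclassified
            error = label - predict(weights, bias, point)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, point)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical, linearly separable data (the AND truth table)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(points, labels)
print([predict(weights, bias, p) for p in points])
```

This only settles down if the data can actually be split by a straight line; on data like XOR the boundary would keep moving.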

**One-Hot Encoding**
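- the act of binary-izing multiple classes, or multiple outcomes: each class gets its own column, holding $1$ for that class and $0$ for every other class
    - in math terms, stacking the encodings of all the classes gives an **identity matrix**
    - the identity matrix is the matrix equivalent of $1$:
        - you can multiply ANY matrix (of a compatible size) by an identity matrix and you'll get the original matrix back!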
        ![it's an identity matrix](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/Identity_matrix.PNG)
        
![one-hot encoding](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/one-hot_encoding.PNG)
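
A small sketch of one-hot encoding in plain Python. The flower labels are made up for illustration, and sorting the class names to decide the column order is just one possible convention.

```python
def one_hot(labels):
    """Return (classes, rows), with one 0/1 column per distinct class."""
    classes = sorted(set(labels))
    column = {name: i for i, name in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[column[label]] = 1  # a single "hot" entry per row
        rows.append(row)
    return classes, rows


# Hypothetical labels
labels = ["setosa", "virginica", "versicolor", "setosa"]
classes, rows = one_hot(labels)

print(classes)
for label, row in zip(labels, rows):
    print(label, row)
```

Encoding each class exactly once, in column order, gives the identity matrix mentioned above.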

**Activation Functions**
- a function applied to a node's calculation result; it shapes the output and prepares it for the next step

**Step Function**
- the yes/no kind
- for binary predictions
- returns $1$ ("yes") if $Wx + b \geq 0$, and $0$ ("no") otherwise
    - ie whether a data point is above or below the *linear boundary* $Wx + b = 0$
    - aka if the data point's score is positive or negative

**Sigmoid Function**
- gives the probability of an output being above the linear boundary (ie belonging to the "yes" class)
- large positive inputs approach 1, large negative inputs approach 0, and the middle is 50% (right at the linear boundary, where the input is 0)
- this can be for binary predictions or multi-layer networks
- denoted as either of the following (they are the same thing):
    - $\sigma(x)$
    - $1/(1 + e^{-x})$
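
A quick sketch of the sigmoid in plain Python, written straight from the formula above. The sample inputs are arbitrary. For very large negative inputs `math.exp(-x)` can overflow, so this simple version is only meant for moderate values.

```python
import math


def sigmoid(x):
    """1 / (1 + e^(-x)): squashes any score into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


for score in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(score, sigmoid(score))
```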

**Softmax Function**
- for multi-class networks
- this "squishes" the outputs, or **normalizes** them into probabilities, where all the results must add up to 1
- this also eliminates the problem of negative outputs (because $e^{x}$ is positive even when $x$ is negative)
- the formula for class $i$, given the scores $z_1, \dots, z_n$, is $e^{z_i} / \sum_{j=1}^{n} e^{z_j}$
![probability of class i](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/softmax_function.PNG)
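
A minimal softmax sketch in plain Python. The scores are made up. Subtracting the largest score before exponentiating is a common numerical-stability trick that is not in the notes above; it leaves the resulting probabilities unchanged.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that add up to 1."""
    largest = max(scores)  # shift so the biggest exponent is e^0 = 1
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for 3 classes, including a negative one
probabilities = softmax([2.0, 1.0, -1.0])
print(probabilities, sum(probabilities))
```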

**More Activation Functions**
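- **TanH** is shaped like the sigmoid, but squishes the output into the range $-1$ to $1$
- **ReLU** returns $\max(0, x)$; it is the most popular, simplest, and apparently extremely effective!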
![more activation functions](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/more_activation_functions.PNG)
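
A short sketch of these two in plain Python, using the usual definitions: tanh comes from the standard library, and ReLU is just "keep positives, zero out negatives". The sample inputs are arbitrary.

```python
import math


def tanh(x):
    """Hyperbolic tangent: output is between -1 and 1."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


for value in (-2.0, 0.0, 3.5):
    print(value, tanh(value), relu(value))
```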

**Error Functions**
- tell us how far we are from the solution; the distance from the goal
- aka **loss**
- aka **cost**

There are different kinds of error functions:
- for discrete outputs (yes/no)
- for continuous outputs (probabilities)
    - log-loss (for 2 possible outputs)
    - cross entropy (for 3+ possible outputs)

**Log Loss**

**Cross Entropy**
- a log-loss function
- we usually calculate with natural logs ($\ln$), ie base $e$
- we take the negative of the log of the probabilities
    - the negative is there to get all-positive numbers


The log of anything between $0$ and $1$ is a negative number, and the log of $1$ is $0$.

We know that a high probability of success approaches 1 (100%).
When we take the negative log of that probability, we get a low number. So think of this *negative log* as the **errors**.

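Because for each input the outcome is either "yes" ($y = 1$) or "no" ($y = 0$), only one probability counts per input: we take $-\ln(\hat{y})$ when the target is "yes" and $-\ln(1 - \hat{y})$ when it is "no", then add these up over all the inputs.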
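
A minimal sketch of the negative-log idea for the yes/no case, in plain Python. The targets and probabilities are made-up values, and each probability is assumed to be strictly between 0 and 1 (the log of 0 is undefined).

```python
import math


def cross_entropy(targets, probabilities):
    """Sum of -ln(p) for "yes" targets and -ln(1 - p) for "no" targets."""
    total = 0.0
    for y, p in zip(targets, probabilities):
        # y is 1 or 0, so exactly one of the two terms survives
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


targets = [1, 0, 1]
confident_and_right = [0.9, 0.2, 0.8]   # hypothetical good predictions
confident_and_wrong = [0.2, 0.9, 0.1]   # hypothetical bad predictions

print(cross_entropy(targets, confident_and_right))
print(cross_entropy(targets, confident_and_wrong))
```

The good predictions give the smaller number, which is why the negative log works as an error.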

**Mean Squared Loss**
- often used in regression and binary classification problems
![mean squared loss equation](https://raw.githubusercontent.com/shamicker/pytorch-challenge/master/images/mean_squared_loss_equation.png)
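
A rough sketch of mean squared loss in plain Python, assuming the common form: the average of the squared differences between targets and predictions. The equation in the image may use a different constant factor (for example $1/2n$ instead of $1/n$), and the numbers below are made up.

```python
def mean_squared_loss(targets, predictions):
    """Average of (target - prediction)^2 over all data points."""
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical targets and predictions
print(mean_squared_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```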

**Gradient Descent**
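- a way of minimizing the loss by repeatedly stepping the weights in the direction of the negative slope (the negative gradient) of the loss function
    - negative because the gradient points in the direction of steepest ascent, but we want descent
- the gradient is the vector of partial derivatives of the loss, with respect to the weights
    - it points in the direction of fastest change (steepest ascent)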
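
A minimal gradient descent sketch in plain Python. The loss here is a made-up bowl-shaped function of two weights, chosen because its partial derivatives are easy to write by hand; the learning rate and step count are also arbitrary. A real network would get its gradient from the loss over the training data.

```python
def loss(weights):
    """Hypothetical loss with its minimum at w1 = 3, w2 = -1."""
    w1, w2 = weights
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2


def gradient(weights):
    """Vector of partial derivatives of the loss with respect to each weight."""
    w1, w2 = weights
    return [2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)]


def gradient_descent(weights, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient (downhill)."""
    weights = list(weights)
    for _ in range(steps):
        grad = gradient(weights)
        # minus sign: the gradient points uphill, we want to go downhill
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


final_weights = gradient_descent([0.0, 0.0])
print(final_weights, loss(final_weights))
```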