# Regression

**Prediction**: Linear Regression

**Classification**: Logistic Regression

# Perceptrons


!["The Perceptron"](NeuralNetworksFiles/perceptron.png "The Perceptron, a fundamental part of Neural Networks")
_The Perceptron, a fundamental part of Neural Networks_

## Perceptron Formula
### Discrete Activation Function
$$f(x_1, x_2, x_3, \ldots, x_m) = \left\{
                \begin{array}{ll}
                  0 \text{ if } b + \sum w_i x_i < 0\\
                  1 \text{ if } b + \sum w_i x_i \geq 0
                \end{array}
              \right.$$
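
A minimal sketch of the step-function perceptron in plain Python. The weights and bias are hypothetical values chosen so that the perceptron behaves like a logical AND; treating a score of exactly 0 as output 1 is an assumption.

```python
def perceptron(inputs, weights, bias):
    """Discrete perceptron: 1 if b + sum(w_i * x_i) >= 0, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if score >= 0 else 0

# Hypothetical parameters that make the perceptron act like logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

# Evaluate on (0,0), (0,1), (1,0), (1,1) in that order.
outputs = [perceptron([x1, x2], and_weights, and_bias)
           for x1 in (0, 1) for x2 in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```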
              
### Continuous Activation Function
_Sigmoid Function:_

$$ \sigma(x) = \frac{1}{1 + e^{-x}}$$
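
A small sketch of the sigmoid used as a continuous activation: instead of a hard 0/1 the perceptron returns a value between 0 and 1. The function name `continuous_perceptron` and the sample weights are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Sigmoid: 1 / (1 + e^(-x)). Fine for moderate x; very negative x can overflow."""
    return 1.0 / (1.0 + math.exp(-x))

def continuous_perceptron(inputs, weights, bias):
    """Perceptron with a sigmoid activation applied to b + sum(w_i * x_i)."""
    score = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(score)

# Hypothetical weights: the score is 0.5, so the output lies above 0.5.
activation = continuous_perceptron([1.0, 1.0], [1.0, 1.0], -1.5)
print(sigmoid(0.0), activation)
```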

## Softmax
$$p_i = \frac{e^{z_i}}{ \sum_{j=1}^{n} e^{z_j}}$$ where $p_i$ is the probability of class $i$, $z_i$ is the score of class $i$, and $n$ is the number of classes
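
A sketch of softmax in plain Python. Subtracting the largest score before exponentiating is a common numerical-stability trick, not part of the formula above; it leaves the result unchanged. The example scores are made up.

```python
import math

def softmax(scores):
    """Turn a list of scores z_i into probabilities e^(z_i) / sum_j e^(z_j)."""
    shift = max(scores)  # stability trick: the shift cancels in the ratio
    exps = [math.exp(z - shift) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print(probs)       # roughly [0.659, 0.242, 0.099]
print(sum(probs))  # sums to 1 up to rounding
```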

## One Hot Encoding
One column per category, with binary values: 1 if the category is present, 0 if it is not.
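
A minimal sketch of one-hot encoding in plain Python. The helper `one_hot` and the sample labels are illustrative; columns are ordered alphabetically, which is an arbitrary choice.

```python
def one_hot(labels):
    """Return the sorted categories and one binary row per label."""
    categories = sorted(set(labels))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[column[label]] = 1  # exactly one column is set per row
        rows.append(row)
    return categories, rows

categories, encoded = one_hot(["duck", "beaver", "duck", "walrus"])
print(categories)  # ['beaver', 'duck', 'walrus']
print(encoded)     # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```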

## Maximum Likelihood
Pick the model that gives the existing labels the highest probability.
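
As an illustration, the sketch below compares two hypothetical models on four labeled points, assuming the points are independent so that the probabilities multiply. The numbers are made up.

```python
def likelihood(labels, predicted_probs):
    """Probability a model assigns to the observed labels (independent points).

    predicted_probs[i] is the model's probability that label i equals 1.
    """
    result = 1.0
    for y, p in zip(labels, predicted_probs):
        result *= p if y == 1 else (1.0 - p)
    return result

labels = [1, 1, 0, 0]
model_a = [0.6, 0.2, 0.9, 0.3]  # hypothetical predictions of P(label = 1)
model_b = [0.7, 0.9, 0.2, 0.4]

likelihood_a = likelihood(labels, model_a)  # about 0.0084
likelihood_b = likelihood(labels, model_b)  # about 0.3024
best = "model_b" if likelihood_b > likelihood_a else "model_a"
print(likelihood_a, likelihood_b, best)
```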

## Cross Entropy
The sum of the negative logarithms of the probabilities the model assigns to the actual labels. Smaller is better: the higher the predicted probability of the correct label, the closer its negative logarithm is to zero.

$$ \text{Cross Entropy } = - \sum_{i=1}^{m} \left[ y_i \ln(p_i) + (1-y_i) \ln(1-p_i) \right]$$
or
$$ H(p,q) = - \sum_{x} p(x) \log q(x) $$
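
A sketch of binary cross entropy in plain Python, using the standard form in which the second term is weighted by $1 - y_i$. The two sets of predictions are hypothetical; the one that gives the actual labels higher probability gets the lower cross entropy.

```python
import math

def cross_entropy(labels, probs):
    """Binary cross entropy: -sum(y*ln(p) + (1-y)*ln(1-p)).

    probs[i] is the predicted probability that label i equals 1 (0 < p < 1).
    """
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs))

labels = [1, 1, 0, 0]
ce_a = cross_entropy(labels, [0.6, 0.2, 0.9, 0.3])  # poor predictions
ce_b = cross_entropy(labels, [0.7, 0.9, 0.2, 0.4])  # better predictions
print(ce_a, ce_b)  # ce_b is the smaller of the two
```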

## Multi-Class Cross Entropy

$$ \text{ Cross Entropy } = - \sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij} \ln(p_{ij}) $$
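
A sketch of the double sum in plain Python. Here each row is assumed to be one data point and each column one class, with one-hot rows in `one_hot_labels`; that layout and the sample numbers are assumptions for illustration.

```python
import math

def multiclass_cross_entropy(one_hot_labels, probs):
    """-sum_i sum_j y_ij * ln(p_ij) over all rows i and columns j."""
    total = 0.0
    for y_row, p_row in zip(one_hot_labels, probs):
        for y, p in zip(y_row, p_row):
            if y:  # terms with y_ij = 0 contribute nothing
                total -= y * math.log(p)
    return total

one_hot_labels = [[1, 0, 0], [0, 1, 0]]
probs = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]  # hypothetical softmax outputs
loss = multiclass_cross_entropy(one_hot_labels, probs)
print(loss)  # equals -(ln 0.7 + ln 0.6)
```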

...

## Back-Propagation

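*A backward pass that propagates the prediction error from the output back through the network, computing how much each weight contributed to it so that the weights can be adjusted.*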

### Gradient Descent
_**Gradient:** partial derivatives of the loss function with respect to all of the weights._

_**Gradient Descent:** Adjusting the weights to reduce the prediction error (minimizing the error function) by moving them in the direction opposite to the gradient, with a step size scaled by the learning rate._


Pseudocode: `x = x - learning_rate * gradient_of_x`
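
The update rule can be tried on a toy problem. The sketch below minimizes the hypothetical function $f(x) = (x - 3)^2$, whose gradient is $2(x - 3)$; the learning rate and step count are arbitrary choices.

```python
def gradient(x):
    """Gradient of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

x = 0.0              # starting point
learning_rate = 0.1  # arbitrary choice for this example

for _ in range(100):
    # Move against the gradient, scaled by the learning rate.
    x = x - learning_rate * gradient(x)

print(x)  # very close to the minimum at x = 3
```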

![image.png](attachment:image.png)


#### Neural Network as a Function Composition
_A neural network is the composition of several functions, usually two per node (linear + sigmoid)_

_Therefore Gradient Descent needs to compute the partial derivative w.r.t. a node's inputs for each node._

#### Chain Rule
$$ \frac{\partial (f \circ g)}{\partial x} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial x}$$
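
A quick numerical check of the chain rule for one node, taking $g(x) = wx + b$ as the linear part and $f = \sigma$ as the sigmoid, so that the derivative of the composition is $\sigma(g)(1 - \sigma(g)) \cdot w$. The values of `w`, `b`, and the test point are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 2.0, -1.0  # hypothetical weight and bias

def g(x):
    """Linear part of the node."""
    return w * x + b

def composed(x):
    """The node as a composition: sigmoid(g(x))."""
    return sigmoid(g(x))

def chain_rule_derivative(x):
    s = sigmoid(g(x))
    df_dg = s * (1.0 - s)  # derivative of the sigmoid with respect to its input
    dg_dx = w              # derivative of the linear part with respect to x
    return df_dg * dg_dx

def numerical_derivative(func, x, eps=1e-6):
    """Central finite difference, used only to check the analytic result."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

analytic = chain_rule_derivative(0.3)
numeric = numerical_derivative(composed, 0.3)
print(analytic, numeric)  # the two values agree to several decimal places
```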


### Stochastic Gradient Descent
_Gradient Descent in which each step computes the error on a randomly sampled batch of data points rather than on the whole dataset._
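
A minimal sketch of stochastic gradient descent in plain Python, fitting a line to synthetic noise-free data generated from $y = 2x + 1$. The dataset, batch size, learning rate, and step count are all assumptions made for the example; the loss is the mean squared error on each batch.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic data from the hypothetical target y = 2x + 1.
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(50)]

w, b = 0.0, 0.0
learning_rate = 0.02
batch_size = 8

for step in range(3000):
    # Random sample of data points on which the error is computed.
    batch = random.sample(data, batch_size)
    # Gradients of the mean squared error on this batch.
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / batch_size
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / batch_size
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # approaches w = 2, b = 1
```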



# TensorFlow

Hello world (TensorFlow 1.x API):

In [None]:
import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

![image.png](attachment:image.png)
_Data is stored as n-dimensional tensors_


`tf.placeholder()` returns a tensor whose value is supplied at runtime through `feed_dict`, so its size can adjust to the dataset.

In [None]:
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})

_The_ `tf.Variable` _class creates a tensor with an initial value that can be modified, which suits weights and biases._ Variables must be initialized before use:

In [None]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

`tf.truncated_normal()` initializes a tensor with random values from a normal distribution, re-drawing any value more than 2 standard deviations from the mean.

`tf.zeros()` initializes a tensor to all zeros.

In [None]:
n_features = 120
n_labels = 5
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

In [None]:
n_labels = 5
bias = tf.Variable(tf.zeros(n_labels))

### Multinomial Logistic Classification
`X -> Linear Model -> Logits -> Softmax -> Predicted Probabilities, 1-Hot Encoded Actual Labels -> Cross Entropy`

or 

`Linear Model(X) -> Softmax(Logits) -> Cross Entropy(Predicted Probabilities, 1-Hot Encoded Actual Labels)`
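
The pipeline above can be sketched end to end in plain Python for a single input. This is an illustration only, not TensorFlow code: the helper names, the tiny 2-feature, 3-label shapes, and all the numbers are assumptions.

```python
import math

def linear(x, weights, bias):
    """Linear model: logits_j = bias_j + sum_i x_i * weights[i][j]."""
    return [bias[j] + sum(x[i] * weights[i][j] for i in range(len(x)))
            for j in range(len(bias))]

def softmax(logits):
    shift = max(logits)  # stability trick; does not change the result
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot_label):
    """Negative log of the probability assigned to the actual class."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y)

x = [1.0, 2.0]                   # one input with 2 features
weights = [[0.5, -0.5, 0.0],     # shape: (n_features, n_labels)
           [0.25, 0.25, -0.5]]
bias = [0.0, 0.0, 0.0]
label = [1, 0, 0]                # 1-hot encoded actual label

logits = linear(x, weights, bias)   # linear model produces the logits
probs = softmax(logits)             # logits become probabilities
loss = cross_entropy(probs, label)  # compare with the actual label
print(logits, probs, loss)
```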