## Neural Networks Basics for Beginners

### 1. Introduction to Neural Networks
Before diving into more complex concepts, it helps to understand what a neural network is. At its core, a neural network is a series of algorithms that attempts to recognize underlying relationships in a set of data through a process loosely inspired by the way the human brain operates.

**Resource:** For a fundamental understanding, read [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) by Michael Nielsen.

### 2. Activation Functions
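Activation functions are crucial: they define the output of a node (or neuron) given a set of inputs. Essentially, they decide whether, and how strongly, a neuron is activated. Because they are non-linear, they also let a network learn patterns that a purely linear model cannot.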
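
To make "the output of a node given a set of inputs" concrete: a neuron typically computes a weighted sum of its inputs plus a bias, then passes that sum through an activation function. Below is a minimal sketch using the sigmoid (implemented later in this section) as the activation; `neuron_output` and the numbers are illustrative, not taken from any library.

```python
import math

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias term (the neuron's "pre-activation")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function (sigmoid here) turns z into the neuron's output
    return 1 / (1 + math.exp(-z))

# Hypothetical inputs, weights, and bias chosen for illustration
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))  # z = 0.0, so the output is 0.5
```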

Common Activation Functions:

* Sigmoid: Maps any real number to a value between 0 and 1. It's commonly used in the output layer for binary classification.
* Tanh (hyperbolic tangent): Maps any real number to a value between -1 and 1, with outputs centered around 0.
* ReLU (Rectified Linear Unit): Passes positive values through unchanged and outputs 0 for every negative input.
* Leaky ReLU: Passes positive values through unchanged, but instead of outputting 0 for negative inputs it multiplies them by a small slope (0.1 in the implementation below, so an input of -2 becomes -0.2).
* Softmax: Converts a vector of raw scores into probabilities that sum to 1. It's often used in the final layer of a neural network-based classifier for multi-class classification.
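
Softmax is the one function in this list that this section does not implement. Here is a minimal NumPy sketch, assuming the input is a 1-D array of raw scores (logits) for a single example; the function name `softmax` and the sample scores are illustrative. Subtracting the maximum score first is a common trick to keep `np.exp` from overflowing.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability; the result is unchanged
    # because softmax is invariant to adding a constant to every score
    shifted = z - np.max(z)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs for 3 classes
probs = softmax(scores)
print(probs)        # approximately [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0, up to floating-point rounding
```

The largest score always receives the largest probability, and the outputs can be read as the model's confidence in each class.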

**Implementation of the Sigmoid activation function**

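```python
import math

def sigmoid(x):
    # Logistic sigmoid: maps any real number into the range (0, 1).
    # Note: math.exp(-x) raises OverflowError for very negative x (e.g. x = -1000).
    return 1/(1 + math.exp(-x))

print(sigmoid(100))  # 1.0
```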

* `import math`: This line of code tells Python to import a module called "math," which provides mathematical functions and constants.

* `def sigmoid(x):`: This line defines a function named sigmoid that takes one argument called `x`. In Python, you create functions using the def keyword followed by the function name and a pair of parentheses that can contain input parameters.

* `return 1/(1 + math.exp(-x))`: This line is the body of the sigmoid function and contains the code that computes the sigmoid function's value for the given input `x`.

* `math.exp(-x)`: Calculates `e` raised to the power of `-x`, where `e` is approximately 2.71828.

* `1 + math.exp(-x)`: Adds 1 to the result of `math.exp(-x)`.

* `1 / (1 + math.exp(-x))`: Calculates 1 divided by the result from the previous step. This division results in the sigmoid function's value for the input `x`. The sigmoid function maps any real number `x` to a value between 0 and 1, commonly used in machine learning and neural networks for binary classification tasks.

So, when you call `sigmoid(x)` with a specific value for `x`, it returns the sigmoid of that value. For example, `sigmoid(0)` returns exactly 0.5, and `sigmoid(1)` returns approximately 0.73106.
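
The function above raises `OverflowError` for very negative inputs such as `sigmoid(-1000)`, because `math.exp(1000)` is too large for a float. A common workaround, sketched below as a hypothetical `stable_sigmoid`, uses the algebraically equivalent form e^x / (1 + e^x) when `x` is negative, so `math.exp` only ever receives a non-positive argument.

```python
import math

def stable_sigmoid(x: float) -> float:
    if x >= 0:
        # Here -x <= 0, so math.exp(-x) <= 1 and cannot overflow
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, 1 / (1 + e^-x) equals e^x / (1 + e^x);
    # math.exp(x) < 1 here, so it cannot overflow either
    z = math.exp(x)
    return z / (1.0 + z)

print(stable_sigmoid(0))      # 0.5
print(stable_sigmoid(-1000))  # 0.0 instead of an OverflowError
```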

**Implementation of the tanh activation function**

```python
import math

def tanh(x):
    # Hyperbolic tangent written out with exponentials; the output lies in (-1, 1)
    return (math.exp(x) - math.exp(-x))/(math.exp(x) + math.exp(-x))

print(tanh(-56))  # -1.0
```

* `import math`: This line of code imports the "math" module, which provides various mathematical functions and constants for our use.

* `def tanh(x):`: This line defines a function named `tanh` that takes one argument `x`. In Python, you define functions using the `def` keyword, followed by the function name and a pair of parentheses that may contain input parameters.

* `return (math.exp(x) - math.exp(-x))/(math.exp(x) + math.exp(-x))`: This line is the body of the `tanh` function and contains the code that calculates the hyperbolic tangent (tanh) of the given input `x`. Let's break down this expression:

    - `math.exp(x)`: Calculates the exponential of `x`. The `math.exp()` function computes `e` raised to the power of `x`, where `e` is approximately 2.71828.

    - `math.exp(-x)`: Calculates the exponential of the negative value of `x`, which is `e` raised to the power of `-x`.

    - `(math.exp(x) - math.exp(-x))`: Calculates the difference between `math.exp(x)` and `math.exp(-x)`.

    - `(math.exp(x) + math.exp(-x))`: Calculates the sum of `math.exp(x)` and `math.exp(-x)`.

The final result is `(math.exp(x) - math.exp(-x))/(math.exp(x) + math.exp(-x))`, which represents the hyperbolic tangent (`tanh`) of `x`.

The hyperbolic tangent function, `tanh(x)`, maps any real number `x` to a value between -1 and 1. It is a common mathematical function used in various applications, including neural networks and signal processing.

For example, `tanh(0)` returns 0.0, and `tanh(1)` returns approximately 0.76159.
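
Python's standard library already provides `math.tanh`, and tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1. The sketch below checks both facts numerically, with `tanh_manual` as an illustrative copy of the function above. `math.tanh` also avoids the `OverflowError` that the hand-written formula hits for large inputs such as `x = 1000`.

```python
import math

def tanh_manual(x: float) -> float:
    # Same formula as the tanh() function above
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # Hand-written formula vs. the standard library
    assert math.isclose(tanh_manual(x), math.tanh(x), abs_tol=1e-12)
    # Identity: tanh(x) = 2 * sigmoid(2x) - 1
    assert math.isclose(2 * sigmoid(2 * x) - 1, math.tanh(x), abs_tol=1e-12)

print(math.tanh(1000))  # 1.0, while tanh_manual(1000) would raise OverflowError
```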


**Implementation of the ReLU activation function**

```python
def relu(x):
    # Rectified Linear Unit: returns x for positive inputs and 0 otherwise
    return max(0, x)

print(relu(-100))  # 0
print(relu(100))   # 100
```

* `def relu(x):`: This line defines a function named `relu` that takes one argument `x`. In Python, functions are defined using the def keyword, followed by the function name and a pair of parentheses that may contain input parameters.

* `return max(0, x)`: This line is the body of the `relu` function and contains the code that calculates the Rectified Linear Unit (ReLU) of the given input `x`.

* `max(0, x)`: Python's built-in `max()` function returns the largest of its arguments. Here, it compares 0 and `x`.

If `x` is greater than 0, `max(0, x)` returns `x` because `x` is the larger of the two values.

If `x` is less than or equal to 0, `max(0, x)` returns 0.

The ReLU function, `relu(x)`, is commonly used in neural networks. It replaces a negative input with 0 and leaves positive inputs unchanged.

For example, `relu(3)` would return 3, `relu(-2)` would return 0, and `relu(0)` would return 0.

This function helps introduce non-linearity into neural networks, making them capable of learning complex patterns and representations in data.
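
The `relu` above works on one number at a time. In practice, activations are applied element-wise to whole arrays, and training also needs the derivative. A minimal NumPy sketch follows; the names `relu_np` and `relu_grad` are illustrative, and the derivative at exactly 0 (where ReLU is not differentiable) is set to 0 by convention.

```python
import numpy as np

def relu_np(x: np.ndarray) -> np.ndarray:
    # Element-wise max(0, x): negatives become 0, positives pass through
    return np.maximum(0, x)

def relu_grad(x: np.ndarray) -> np.ndarray:
    # Derivative: 1 for x > 0, 0 for x < 0; at x == 0 it is undefined,
    # so 0 is used here by convention
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu_np(x))    # values 0.0, 0.0, 3.0
print(relu_grad(x))  # values 0.0, 0.0, 1.0
```

Because the gradient is exactly 0 for every negative input, a neuron whose inputs are always negative stops learning; this is the motivation for Leaky ReLU.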


**Implementation of the Leaky ReLU activation function**

```python
def leaky_relu(x):
    # Leaky ReLU with slope 0.1 for negative inputs: returns x if x >= 0, else 0.1 * x
    return max(0.1 * x, x)

print(leaky_relu(-100))  # -10.0
```

* `def leaky_relu(x):`: This line defines a function named `leaky_relu` that takes one argument x. In Python, functions are defined using the def keyword, followed by the function name and a pair of parentheses that may contain input parameters.

* `return max(0.1 * x, x)`: This line is the body of the `leaky_relu` function and contains the code that calculates the Leaky Rectified Linear Unit (Leaky ReLU) of the given input `x`.

* `0.1 * x`: This expression calculates 0.1 times x, resulting in `0.1x`. This is the "leak" component of the Leaky ReLU.

* `max(0.1 * x, x)`: The `max()` function in Python takes two arguments and returns the maximum of the two values. In this case, it takes `0.1 * x` and `x` as arguments.

If `x` is greater than or equal to 0, `max(0.1 * x, x)` will return `x` because `x` is the larger of the two values.

If `x` is less than 0, `max(0.1 * x, x)` will return `0.1 * x` because `0.1 * x` is the larger of the two values, and this introduces a small "leak" for negative inputs.

The Leaky ReLU function, `leaky_relu(x)`, is another activation function used in neural networks and machine learning. It is similar to the standard ReLU but keeps a small, non-zero gradient for negative inputs, which helps mitigate the "dying ReLU" problem, where neurons that output 0 for every input stop receiving gradient updates and stop learning.

For example, `leaky_relu(3)` would return 3, `leaky_relu(-2)` would return -0.2, and `leaky_relu(0)` would return 0.
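
The same idea in vectorized form, with the negative-side slope exposed as a parameter (`alpha`, an illustrative name; 0.1 matches `leaky_relu` above, and 0.01 is another common choice). The gradient function shows the difference from ReLU: for negative inputs the derivative is `alpha` rather than 0, so those neurons still receive updates. The derivative at exactly 0 is set to 1 here by convention.

```python
import numpy as np

def leaky_relu_np(x: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    # x where x >= 0, otherwise alpha * x
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_grad(x: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    # Gradient is 1 for x >= 0 and alpha (not 0) for x < 0
    return np.where(x >= 0, 1.0, alpha)

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu_np(x))    # values -0.2, 0.0, 3.0
print(leaky_relu_grad(x))  # values 0.1, 1.0, 1.0
```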

### 3. Loss Functions
A loss function measures how well the neural network is performing. It quantifies how far the model's predictions are from the true values, so a lower loss means the model fits the given data better.

Common Loss Functions:
* Mean Squared Error, Mean Absolute Error, and other error metrics: Commonly used for regression tasks.
* Cross-Entropy: Often used in multi-class classification tasks.
* Binary Cross-Entropy: Used for binary classification tasks.


**Implementation of the Mean Absolute Error**

```python
import numpy as np

# Example data used by the loss functions in this section
y_predicted = np.array([1,1,0,0,1])
y_true = np.array([0.7,0.3,1,0,0.5])

def mean_absolute_error(y_true, y_predicted):
    # Average of the absolute differences between targets and predictions
    return np.mean(np.abs(y_true - y_predicted))

mae = mean_absolute_error(y_true, y_predicted)
print("Mean Absolute Error:", mae)  # Mean Absolute Error: 0.5
```


* `import numpy as np` imports the NumPy library as `np`.

* `mean_absolute_error` is a function that takes two NumPy arrays `y_true` and `y_predicted` as input.

* `np.abs(y_true - y_predicted)` calculates the absolute differences between the elements of `y_true` and `y_predicted`.

* `np.mean()` computes the mean (average) of these absolute differences, giving you the Mean Absolute Error.

You can use this function to calculate the MAE for any pair of arrays `y_true` and `y_predicted` as shown in the example usage.
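
Worked by hand with the example arrays above (writing $y_i$ for `y_true` and $\hat{y}_i$ for `y_predicted`), the absolute differences are 0.3, 0.7, 1, 0, and 0.5, which reproduces the printed result:

$$
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
= \frac{0.3 + 0.7 + 1 + 0 + 0.5}{5} = \frac{2.5}{5} = 0.5
$$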

**Implementation of the Mean Squared Error**

```python
import numpy as np

def mean_squared_error(y_true, y_predicted):
    # Average of the squared differences between targets and predictions
    return np.mean((y_true - y_predicted)**2)

mse = mean_squared_error(y_true, y_predicted)
print("Mean Squared Error:", mse)  # Mean Squared Error: 0.366
```


* `mean_squared_error` is a function that takes two `NumPy` arrays `y_true` and `y_predicted` as input.

* `(y_true - y_predicted)**2` calculates the squared differences between the elements of `y_true` and `y_predicted`.

* `np.mean()` computes the mean (average) of these squared differences, giving you the Mean Squared Error.

You can use this function to calculate the MSE for any pair of arrays `y_true` and `y_predicted` as shown in the example usage.
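
Squaring the same differences gives 0.09, 0.49, 1, 0, and 0.25. The largest error (1) accounts for more than half of the total here, compared with 40% of the MAE total, which illustrates how MSE penalizes large errors more heavily:

$$
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
= \frac{0.09 + 0.49 + 1 + 0 + 0.25}{5} = \frac{1.83}{5} = 0.366
$$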

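**Implementation of the Log Cross-Entropy** - For multiclass classification

Cross-entropy takes the logarithm of the predicted values, and the logarithm of 0 is negative infinity. NumPy returns `-inf` (and emits a `RuntimeWarning`) rather than raising an error:

```python
import numpy as np

print(np.log([0]))  # [-inf], plus "RuntimeWarning: divide by zero encountered in log"
```

Since `y_predicted` above contains zeros, a direct implementation, `-np.sum(y_true * np.log(y_predicted))`, prints `Log Cross-Entropy: nan`: one term is `1 * -inf` and another is `0 * -inf`, which is undefined. To avoid this, clip the predictions to the range `[epsilon, 1 - epsilon]` with a small `epsilon` before taking the logarithm:

```python
import numpy as np

epsilon = 1e-15  # keeps predictions away from exactly 0 and 1

def log_cross_entropy(y_true, y_predicted):
    # Clip predictions so np.log never receives 0 (which would give -inf)
    y_predicted = np.clip(np.asarray(y_predicted, dtype=float), epsilon, 1 - epsilon)
    return -np.sum(y_true * np.log(y_predicted))

log_ce = log_cross_entropy(y_true, y_predicted)
print("Log Cross-Entropy:", log_ce)
```

With the clipping, the result is finite (approximately 34.54 for the example arrays) instead of `nan`. Nearly all of it comes from the third element, where the true value is 1 but the prediction is 0: after clipping, that term contributes -log(1e-15) ≈ 34.54.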
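
The function above sums over all elements without averaging. A common alternative is to average over samples, using one-hot labels for multi-class problems and the two-term formula for binary cross-entropy (listed above but not implemented in this section). A minimal sketch follows; the function names, the `EPSILON` constant, and the sample arrays are illustrative.

```python
import numpy as np

EPSILON = 1e-15  # same clipping constant as the epsilon used above

def binary_cross_entropy(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    # y_true holds 0/1 labels, y_prob the predicted probability of class 1
    p = np.clip(y_prob, EPSILON, 1 - EPSILON)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def categorical_cross_entropy(y_onehot: np.ndarray, probs: np.ndarray) -> float:
    # Rows are samples; y_onehot marks the true class, each row of probs sums to 1
    p = np.clip(probs, EPSILON, 1 - EPSILON)
    return float(-np.mean(np.sum(y_onehot * np.log(p), axis=1)))

labels = np.array([1, 0, 1, 1])              # hypothetical binary labels
predicted = np.array([0.9, 0.2, 0.6, 0.99])  # hypothetical probabilities
print(binary_cross_entropy(labels, predicted))

onehot = np.array([[1, 0, 0], [0, 1, 0]])    # hypothetical 3-class labels
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(onehot, probs))  # mean of -log(0.7) and -log(0.8)
```

Averaging keeps the loss on the same scale regardless of how many samples are in a batch.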


### 4. Derivatives and Gradients
Understanding derivatives is crucial in neural networks as they are fundamental to the process of learning, specifically in optimizing the loss function using methods like gradient descent.

* Derivative: Measures how a function changes as its input changes. In machine learning, it's used to find the rate of change of the loss function with respect to the weights.
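* Gradient: A vector containing all the partial derivatives of a function with respect to each of its inputs (for example, the loss with respect to every weight). It points in the direction of steepest increase, so gradient descent updates the weights in the opposite direction.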
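
Both ideas can be checked numerically. The sketch below (all names and data are illustrative) approximates the derivative of the sigmoid with a central finite difference and compares it with the known analytic form sigmoid(x) * (1 - sigmoid(x)). It then runs plain gradient descent on a single weight `w` for the model y_hat = w * x with an MSE loss, stepping against the gradient each iteration.

```python
import math
import numpy as np

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def numerical_derivative(f, x: float, h: float = 1e-5) -> float:
    # Central difference: approximate slope of f around x
    return (f(x + h) - f(x - h)) / (2 * h)

# Analytic derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))
x0 = 0.5
analytic = sigmoid(x0) * (1 - sigmoid(x0))
print(numerical_derivative(sigmoid, x0), analytic)  # nearly identical

# Gradient descent on one weight w for y_hat = w * x with MSE loss
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])  # hypothetical data generated with w = 2
w = 0.0
learning_rate = 0.05
for step in range(100):
    y_hat = w * xs
    # dL/dw for L = mean((y_hat - y)^2) is mean(2 * (y_hat - y) * x)
    grad = np.mean(2 * (y_hat - ys) * xs)
    w -= learning_rate * grad  # move against the gradient
print(w)  # close to 2.0
```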

To learn more about derivatives, see [Khan Academy's Introduction to Derivatives](https://www.khanacademy.org/math/old-differential-calculus/derivative-intro-dc).