<a href="https://colab.research.google.com/github/SSSpock/skillspire/blob/main/intro_to_ai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://www.3blue1brown.com/lessons/neural-networks

https://www.3blue1brown.com/lessons/gradient-descent

https://www.3blue1brown.com/lessons/backpropagation

# Section 1: Introduction to Neural Networks

In this section, we will cover the basics of neural networks, their applications, structure, and activation functions.

1.1 What are neural networks?

Neural networks are a type of machine learning model inspired by the human brain. They consist of interconnected nodes (or neurons) organized in layers that work together to process and learn patterns from input data. Neural networks can learn complex patterns and representations, making them well-suited for tasks such as image recognition, natural language processing, and game playing.

1.2 Applications of neural networks

Neural networks have found widespread use in various domains, some of which include:

- Image classification and object detection
- Speech recognition and synthesis
- Natural language processing (translation, sentiment analysis, question-answering)
- Game playing (chess, Go)
- Time-series prediction and anomaly detection

1.3 Structure of neural networks

A neural network typically consists of three types of layers:

- Input layer: This layer receives input data and feeds it into the network.
- Hidden layer(s): These are intermediate layers that process the input data, extracting patterns and features. A neural network can have multiple hidden layers, forming a "deep" neural network.
- Output layer: This layer produces the final output or prediction from the processed data.

Each layer contains nodes (neurons) connected to nodes in the adjacent layers. These connections have associated weights, which determine the strength of the connection between nodes. The data flows through the network, undergoing transformations at each node, ultimately producing the output.

1.4 Activation functions

Activation functions are used to introduce non-linearity in neural networks, allowing them to learn complex, non-linear patterns. These functions are applied to the weighted sum of the inputs at each node. Some popular activation functions include:

- Sigmoid: The sigmoid function squashes the input value into the range (0, 1). It's commonly used in the output layer of binary classification problems.
- Tanh: The hyperbolic tangent function is similar to the sigmoid function but has a range of (-1, 1). It's useful for problems where the output needs to be centered around 0.
- ReLU (Rectified Linear Unit): The ReLU function returns the input value if it's positive and 0 otherwise. ReLU is popular in hidden layers due to its simplicity and efficiency.
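
As a rough illustration of the three functions listed above, here is a minimal NumPy sketch. The function names and the sample inputs are chosen for this example only.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real input into the range (-1, 1), centered on 0
    return np.tanh(z)

def relu(z):
    # Passes positive inputs through unchanged and maps the rest to 0
    return np.maximum(0.0, z)

# Sample inputs (illustrative values)
z = np.array([-2.0, 0.0, 2.0])
print("sigmoid:", sigmoid(z))
print("tanh:   ", tanh(z))
print("relu:   ", relu(z))
```

Comparing the three printed rows for the same inputs shows the different output ranges: sigmoid stays between 0 and 1, tanh is symmetric around 0, and ReLU zeroes out the negative input.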

In the next section, we will explore the concept of a perceptron, the simplest type of neural network.

# Section 2: Perceptron

In this section, we will discuss the concept of a perceptron, linear separability, the perceptron learning algorithm, its limitations, and an example of building a perceptron in Python.

2.1 Concept of a perceptron

A perceptron is the simplest type of neural network, consisting of a single layer with one or more input nodes and a single output node. It serves as the foundation for more complex neural network models. The perceptron receives input values (features), multiplies them by their corresponding weights, sums them up, and applies an activation function to produce the output.

Mathematically, the output (y) of a perceptron can be represented as:

y = f(w1*x1 + w2*x2 + ... + wn*xn + b)

where:
- x1, x2, ..., xn are the input features
- w1, w2, ..., wn are the corresponding weights
- b is the bias term
- f is the activation function
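
To make the formula concrete, the following sketch evaluates it for two inputs. The weights (0.5, 0.5), the bias (-0.7), and the step activation are hypothetical values picked for illustration; they are not learned.

```python
import numpy as np

def step(z):
    # Step activation: 1 when the weighted sum is non-negative, else 0
    return 1 if z >= 0 else 0

# Hypothetical weights and bias, chosen by hand for this example
w = np.array([0.5, 0.5])
b = -0.7

def perceptron_output(x):
    # y = f(w1*x1 + w2*x2 + b)
    z = np.dot(w, x) + b
    return z, step(z)

z_a, y_a = perceptron_output(np.array([1.0, 0.0]))  # 0.5 - 0.7, about -0.2
z_b, y_b = perceptron_output(np.array([1.0, 1.0]))  # 1.0 - 0.7, about 0.3
print(y_a, y_b)
```

With these particular values the weighted sum only becomes non-negative when both inputs are 1, so the perceptron outputs 0 for the first input and 1 for the second.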

2.2 Linear separability

A perceptron can only learn patterns that are linearly separable, meaning that the input data points can be separated into their respective classes using a straight line (or a hyperplane in higher dimensions). If the data is not linearly separable, no choice of weights and bias classifies every point correctly, so the learning algorithm never settles on a solution.
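
One informal way to see the difference is to search a grid of candidate weights and biases for a line that separates the classes. The sketch below does this for the AND and XOR functions. The grid range and resolution are arbitrary assumptions, and finding nothing on a grid is not a proof, although it agrees with the known result that XOR is not linearly separable.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

# Candidate values for w1, w2 and b (arbitrary grid chosen for illustration)
grid = np.linspace(-2, 2, 21)

def separable_on_grid(y):
    # Try every (w1, w2, b) combination and check whether the
    # thresholded weighted sum reproduces all four target labels
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X @ np.array([w1, w2]) + b >= 0).astype(int)
        if np.array_equal(pred, y):
            return True
    return False

results = {name: separable_on_grid(y) for name, y in targets.items()}
print(results)
```

The search finds a separating line for AND but none for XOR, which is why a single perceptron can learn the first function and not the second.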

2.3 Perceptron learning algorithm

The perceptron learning algorithm is an iterative process used to find the optimal weights and bias for the perceptron. The algorithm updates the weights and bias based on the errors made in the predictions. The steps of the algorithm are:

1. Initialize the weights and bias to small random values.
2. For each input data point:
   a. Calculate the output of the perceptron.
   b. Update the weights and bias based on the error (difference between predicted and true output).
3. Repeat step 2 until the error converges or a maximum number of iterations is reached.
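
Step 2 can be sketched as a single function that processes one data point. The helper name `perceptron_update`, the zero starting weights, and the sample point are assumptions made for this illustration.

```python
import numpy as np

def step_function(z):
    return 1 if z >= 0 else 0

def perceptron_update(weights, bias, xi, target, learning_rate=0.1):
    # Step 2a: calculate the output for this data point
    output = step_function(np.dot(xi, weights) + bias)
    # Step 2b: move the weights and bias in proportion to the error
    error = target - output
    new_weights = weights + learning_rate * error * xi
    new_bias = bias + learning_rate * error
    return new_weights, new_bias, error

# One update starting from zero weights (illustrative values)
w, b, err = perceptron_update(np.zeros(2), 0.0, np.array([0.0, 1.0]), target=0)
print(w, b, err)
```

Here the perceptron initially outputs 1 for a point whose target is 0, so the error is -1 and the update lowers both the bias and the weight attached to the active input. When the prediction is already correct the error is 0 and nothing changes.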

2.4 Limitations of perceptrons

Perceptrons have some limitations that hinder their ability to solve complex problems:

- They can only learn linearly separable patterns.
- They cannot represent complex, non-linear relationships between input features and output.
- A single perceptron has only one output node, so on its own it is limited to binary classification problems.

2.5 Coding example: Building a perceptron in Python

In this example, we will implement a simple perceptron using Python to learn the AND function, which is linearly separable.



In [1]:
import numpy as np

# Activation function
def step_function(x):
    return 1 if x >= 0 else 0

# Perceptron training function
def train_perceptron(X, y, learning_rate=0.1, epochs=100):
    # Initialize weights and bias
    weights = np.random.rand(X.shape[1])
    bias = np.random.rand()

    for epoch in range(epochs):
        for xi, target in zip(X, y):
            # Calculate the output
            weighted_sum = np.dot(xi, weights) + bias
            output = step_function(weighted_sum)

            # Update weights and bias based on the error
            error = target - output
            weights += learning_rate * error * xi
            bias += learning_rate * error

    return weights, bias

# Input data (AND function)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

# Train the perceptron
weights, bias = train_perceptron(X, y)

# Test the perceptron
for xi in X:
    weighted_sum = np.dot(xi, weights) + bias
    output = step_function(weighted_sum)
    print(f"Input: {xi}, Output: {output}")


Input: [0 0], Output: 0
Input: [0 1], Output: 0
Input: [1 0], Output: 0
Input: [1 1], Output: 1


In this example, we first define the step function as our activation function, and then implement the train_perceptron function to train the perceptron. We use the AND function as our input data and train the perceptron with a learning rate of 0.1 for 100 epochs. After training, we test the perceptron on the same input data and print the results, which should correctly represent the AND function.

# Section 3: Multi-Layer Perceptron (MLP)

In this section, we will introduce the Multi-Layer Perceptron (MLP), discuss its layers, describe forward propagation, and provide a coding example of building an MLP in Python.

3.1 Introduction to MLP

A Multi-Layer Perceptron (MLP) is a type of neural network that consists of multiple layers of nodes (neurons) organized in an input layer, one or more hidden layers, and an output layer. MLPs can learn more complex patterns and representations compared to a single-layer perceptron, making them suitable for a wide range of tasks.

3.2 Layers in an MLP

An MLP has three types of layers:

- Input layer: This layer receives input data and feeds it into the network.
- Hidden layer(s): These are intermediate layers that process the input data, extracting patterns and features. An MLP can have multiple hidden layers, which allows it to learn more complex and abstract representations.
- Output layer: This layer produces the final output or prediction from the processed data.

3.3 Forward propagation

Forward propagation is the process of passing input data through the MLP to obtain the final output. During forward propagation, the input data is transformed at each layer of the network by applying weights, biases, and activation functions. The steps of forward propagation are:

1. Calculate the weighted sum of the inputs for each node in the first hidden layer.
2. Apply the activation function to the weighted sum to obtain the output of the first hidden layer.
3. Repeat steps 1 and 2 for each subsequent hidden layer, using the output of the previous layer as the input for the current layer.
4. Calculate the weighted sum of the inputs for each node in the output layer.
5. Apply the activation function (if applicable) to the weighted sum to obtain the final output.
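
The steps above can be sketched as a loop over layers. This is a minimal illustration that assumes sigmoid activations in every layer, including the output layer. The `forward` helper, the layer sizes, and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass x through a list of (weights, bias) pairs, one pair per layer."""
    activation = x
    for weights, bias in layers:
        # Weighted sum of the previous layer's output, then the activation
        activation = sigmoid(activation @ weights + bias)
    return activation

# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(2, 3)), np.zeros(3)),
    (rng.normal(size=(3, 1)), np.zeros(1)),
]

batch = np.array([[0.0, 1.0], [1.0, 1.0]])  # two input rows
out = forward(batch, layers)
print(out.shape)  # (2, 1): one output per input row
```

Because each layer only needs the output of the previous one, adding another hidden layer amounts to appending one more `(weights, bias)` pair to the list.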

3.4 Coding example: Building an MLP in Python

In this example, we will implement a simple MLP using Python to solve the XOR problem, which is not linearly separable.



In [4]:
import numpy as np

# Activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, written in terms of the sigmoid's output:
# x must already be sigmoid(z), not the raw input z
def sigmoid_derivative(x):
    return x * (1 - x)

# MLP architecture
input_nodes = 2
hidden_nodes = 2
output_nodes = 1

# Initialize weights and biases
hidden_weights = np.random.rand(input_nodes, hidden_nodes)
hidden_bias = np.random.rand(hidden_nodes)
output_weights = np.random.rand(hidden_nodes, output_nodes)
output_bias = np.random.rand(output_nodes)

# Input data (XOR function)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Train the MLP
learning_rate = 0.5
epochs = 5000

for epoch in range(epochs):
    # Forward propagation
    hidden_layer_input = np.dot(X, hidden_weights) + hidden_bias
    hidden_layer_output = sigmoid(hidden_layer_input)
    output_layer_input = np.dot(hidden_layer_output, output_weights) + output_bias
    output_layer_output = sigmoid(output_layer_input)

    # Backpropagation
    output_error = y - output_layer_output
    output_delta = output_error * sigmoid_derivative(output_layer_output)

    hidden_error = np.dot(output_delta, output_weights.T)
    hidden_delta = hidden_error * sigmoid_derivative(hidden_layer_output)

    # Update weights and biases
    output_weights += np.dot(hidden_layer_output.T, output_delta) * learning_rate
    output_bias += np.sum(output_delta, axis=0).reshape(-1) * learning_rate
    hidden_weights += np.dot(X.T, hidden_delta) * learning_rate
    hidden_bias += np.sum(hidden_delta, axis=0) * learning_rate

# Test the MLP
for xi in X:
    hidden_layer_input = np.dot(xi, hidden_weights) + hidden_bias
    hidden_layer_output = sigmoid(hidden_layer_input)
    output_layer_input = np.dot(hidden_layer_output, output_weights) + output_bias
    output_layer_output = sigmoid(output_layer_input)
    print(f"Input: {xi}, Output: {output_layer_output}")



Input: [0 0], Output: [0.03029795]
Input: [0 1], Output: [0.97338029]
Input: [1 0], Output: [0.97334647]
Input: [1 1], Output: [0.02794162]


In this example, we implement a simple MLP to solve the XOR problem. We first define the sigmoid activation function and its derivative, and then set up the MLP architecture with input, hidden, and output nodes. We initialize weights and biases, and use the XOR function as our input data.

We train the MLP using forward propagation and backpropagation for 5000 epochs with a learning rate of 0.5. After training, we test the MLP on the same input data and print the results. Outputs close to 0 or 1, as in the run above, correctly represent the XOR function. Because the weights are initialized randomly without a fixed seed, the exact values differ between runs, and with only two hidden nodes training can occasionally stall without learning XOR.

# Section 4: Gradient Descent

In this section, we will discuss gradient descent as an optimization technique, its types, the learning rate, and loss functions. We will also provide a visualization of gradient descent using Plotly.

4.1 What is gradient descent?

Gradient descent is an optimization algorithm used to minimize a function (usually a loss function) by iteratively moving in the direction of the steepest descent as defined by the negative of the gradient. In the context of neural networks, gradient descent is used to find the optimal weights and biases that minimize the loss function.

4.2 Types of gradient descent

There are three main types of gradient descent:

1. Batch gradient descent: Updates the weights and biases using the entire dataset at each iteration.
2. Stochastic gradient descent (SGD): Updates the weights and biases using a single data point at each iteration.
3. Mini-batch gradient descent: Updates the weights and biases using a small subset (batch) of the dataset at each iteration.
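
The three variants differ only in how many data points feed each update, which the sketch below captures with a single `batch_size` parameter. It fits a line to noiseless synthetic data. The function name, the data, and the hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def run_gradient_descent(X, y, batch_size, learning_rate=0.1, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)  # shuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = (w * X[idx] + b) - y[idx]
            # Gradient of the mean squared error over the current batch
            w -= learning_rate * 2 * np.mean(error * X[idx])
            b -= learning_rate * 2 * np.mean(error)
    return w, b

# Synthetic data on the line y = 3x + 1 (no noise)
X = np.linspace(0, 1, 20)
y = 3.0 * X + 1.0

for name, batch_size in [("batch", 20), ("mini-batch", 5), ("stochastic", 1)]:
    w, b = run_gradient_descent(X, y, batch_size)
    print(f"{name:>10}: w={w:.3f}, b={b:.3f}")
```

On this noiseless data all three settings should approach w = 3 and b = 1. They differ in cost per update: batch gradient descent makes one update per epoch using all 20 points, while the stochastic variant makes 20 cheaper updates per epoch.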

4.3 Learning rate

The learning rate is a hyperparameter that controls how much the weights and biases are updated at each iteration. A small learning rate results in slow convergence but better fine-tuning, while a large learning rate results in faster convergence but may cause overshooting and instability.
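
A small experiment on the function f(x) = x**2 illustrates this trade-off. The `descend` helper, the starting point, and the three learning rates are values chosen for this sketch.

```python
def descend(x, learning_rate, steps=30):
    # Repeatedly step against the gradient of f(x) = x**2, which is 2*x
    for _ in range(steps):
        x = x - learning_rate * 2 * x
    return x

for lr in (0.01, 0.1, 1.1):
    print(f"learning rate {lr}: x after 30 steps = {descend(-7.0, lr):.4f}")
```

The minimum is at x = 0. With a learning rate of 0.01 the point moves toward 0 but is still far away after 30 steps, with 0.1 it gets very close, and with 1.1 every step overshoots the minimum by more than the previous one, so the value grows instead of shrinking.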

4.4 Loss functions

A loss function is a measure of how well a neural network is performing in terms of its predictions. It quantifies the difference between the predicted output and the actual output (ground truth). Common loss functions include mean squared error (MSE), cross-entropy loss, and mean absolute error (MAE).
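
The three losses named above can be written in a few lines of NumPy. In this sketch the cross-entropy is the binary form, the `eps` clipping is an assumption added to avoid taking the logarithm of 0, and the labels and predictions are made-up values.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: penalizes large errors more heavily
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean absolute error: penalizes all errors in proportion to their size
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions so that log(0) never occurs
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Made-up labels and predicted probabilities
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.9, 0.8, 0.3])

print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("Binary cross-entropy:", binary_cross_entropy(y_true, y_pred))
```

All three values shrink toward 0 as the predictions move closer to the true labels, which is the property gradient descent relies on when it minimizes the loss.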


In [6]:
import numpy as np
import plotly.graph_objects as go

# Quadratic function and its derivative
def func(x):
    return x**2

def derivative(x):
    return 2 * x

# Gradient descent algorithm
def gradient_descent(x_start, learning_rate, epochs):
    x_values = [x_start]
    y_values = [func(x_start)]

    for epoch in range(epochs):
        x_start = x_start - learning_rate * derivative(x_start)
        x_values.append(x_start)
        y_values.append(func(x_start))

    return x_values, y_values

# Parameters
x_start = -7
learning_rate = 0.1
epochs = 30

x_values, y_values = gradient_descent(x_start, learning_rate, epochs)

# Plot the function and gradient descent steps
x = np.linspace(-10, 10, 1000)
y = func(x)

fig = go.Figure()

fig.add_trace(go.Scatter(x=x, y=y, mode='lines', name='Function'))
fig.add_trace(go.Scatter(x=x_values, y=y_values, mode='markers', name='Gradient Descent Steps'))

fig.update_layout(title='Gradient Descent Visualization', xaxis_title='x', yaxis_title='y', showlegend=True)

fig.show()

