We have 2x2 matrices where each cell is black (1) or white (0). The possible classifications are:
Solid: All black or all white.
Vertical: Each column is a single color (e.g., [[1, 0], [1, 0]]).
Horizontal: Each row is a single color (e.g., [[1, 1], [0, 0]]).
Diagonal: The cells along each diagonal match (e.g., [[1, 0], [0, 1]]).
The input size is 4 (flattened 2x2 matrix), and the output size is 4 (one-hot encoded classification).
Step 2.1: Data Representation
Flatten the 2x2 matrix into a 1D array for simplicity. Each matrix will be represented as [x1, x2, x3, x4].
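As a minimal sketch (the exact dataset isn't given here, so the sample matrices, the names X and Y, and the column-per-sample layout are illustrative assumptions), one flattened example per class with one-hot labels could look like this:

```python
import numpy as np

# One illustrative example per class, flattened row by row into [x1, x2, x3, x4]
solid      = np.array([1, 1, 1, 1])  # [[1, 1], [1, 1]]
vertical   = np.array([1, 0, 1, 0])  # [[1, 0], [1, 0]]
horizontal = np.array([1, 1, 0, 0])  # [[1, 1], [0, 0]]
diagonal   = np.array([1, 0, 0, 1])  # [[1, 0], [0, 1]]

# Inputs as columns (shape 4 x 4): one sample per column
X = np.stack([solid, vertical, horizontal, diagonal], axis=1)

# One-hot labels, one column per sample, rows ordered solid/vertical/horizontal/diagonal
Y = np.eye(4)
```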
Step 2.2: Initialize Parameters
We'll use a single hidden layer neural network:
- Input layer: 4 neurons (for the flattened matrix).
- Hidden layer: Choose, say, 8 neurons with ReLU activation.
- Output layer: 4 neurons (Softmax for classification).
Weights and biases are randomly initialized.
Step 2.3: Forward Propagation
- Compute the hidden layer activations: Z1 = W1 · X + b1 and A1 = ReLU(Z1).
- Compute the output layer: Z2 = W2 · A1 + b2 and A2 = Softmax(Z2).
Step 2.4: Loss Function
Use cross-entropy loss: Loss = -(1/m) * Σ (y_true * log(y_pred)), where the sum runs over all classes and all m samples.
Step 2.5: Backward Propagation
- Compute gradients for output and hidden layers.
- Update weights and biases using gradient descent.
Step 2.6: Train the Model
- Train the network on a set of examples.
- Monitor loss and accuracy during training.
This is a ReLU (Rectified Linear Unit) function. It takes a number and:
- If the number is negative, it makes it 0.
- If it’s positive, it keeps it as is.
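A minimal NumPy sketch of this behavior; the function name relu matches the one used later in the forward pass, but the exact implementation here is an assumption:

```python
import numpy as np

def relu(x):
    # Negative values become 0; positive values pass through unchanged.
    return np.maximum(0, x)
```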
The function relu_derivative(x) computes the derivative of the ReLU (Rectified Linear Unit) activation function.
Explanation:
1. ReLU Activation Function:
The ReLU function is defined as:
f(x) = x if x > 0
f(x) = 0 if x ≤ 0
Its derivative is:
f'(x) = 1 if x > 0
f'(x) = 0 if x ≤ 0
What the function does:
- The expression (x > 0) creates a boolean array where each element is True if the corresponding element of x is greater than 0 and False otherwise.
- .astype(float) converts this boolean array to a float array, where True becomes 1.0 and False becomes 0.0.
- Thus, the function returns 1.0 for elements of x that are greater than 0 and 0.0 for elements less than or equal to 0, which corresponds to the derivative of the ReLU function.
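Putting that together, a sketch of relu_derivative assembled from the description above:

```python
import numpy as np

def relu_derivative(x):
    # (x > 0) gives a boolean array; .astype(float) maps True -> 1.0 and False -> 0.0,
    # matching the ReLU derivative: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)
```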
np.max(x, axis=0, keepdims=True)
- We find the biggest number in x.
- This keeps the math safe and avoids super big numbers when we exponentiate (it’s called a "stability fix").
- Example: If x = [2, 3, 5], the biggest number is 5.
x - np.max(x, axis=0, keepdims=True)
- Subtract the biggest number from every number in x.
- Example: [2, 3, 5] - 5 becomes [-3, -2, 0].
np.exp(...)
- Now we take the exponential of the adjusted numbers.
- Example: np.exp([-3, -2, 0]) becomes approximately [0.05, 0.14, 1].
np.sum(exp_x, axis=0, keepdims=True)
- Add up all the new numbers.
- Example: [0.05, 0.14, 1] adds up to 1.19.
exp_x / np.sum(...)
- Divide each number by the total to turn them into probabilities.
- Example:
- 0.05 / 1.19 ≈ 0.04
- 0.14 / 1.19 ≈ 0.12
- 1 / 1.19 ≈ 0.84
- Now you have probabilities: [0.04, 0.12, 0.84].
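Combining the four steps, a sketch of the softmax function (assuming a column-per-sample layout, which is why axis=0 is used):

```python
import numpy as np

def softmax(x):
    # Subtract the column-wise maximum first (the "stability fix").
    exp_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    # Divide by the column-wise sum so each column becomes a probability distribution.
    return exp_x / np.sum(exp_x, axis=0, keepdims=True)

# The walkthrough's example, as a single column:
print(softmax(np.array([[2.0], [3.0], [5.0]])))  # ≈ [0.042, 0.114, 0.844], matching the walkthrough up to rounding
```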
y_true.shape[1]
- We’re counting how many examples (or data points) we’re working with.
- The shape gives the dimensions of y_true; the second entry (index 1) tells us how many samples we have.
- Example: If y_true is a matrix with shape (3, 5), there are 5 samples.
np.log(y_pred)
- Take the natural logarithm of each predicted probability in y_pred.
- This step is important for how cross-entropy works mathematically.
y_true * np.log(y_pred)
- Multiply the true labels (y_true) with the logarithm of the predicted probabilities (np.log(y_pred)).
- This ensures we only consider the predictions for the correct labels.
- Example: If y_true = [1, 0, 0] and y_pred = [0.7, 0.2, 0.1], only 0.7 (the probability of the correct label) is used.
np.sum(...)
- Add up all the values from the previous step for all samples.
- Example: If you have predictions for 3 samples, you’ll sum the contributions from all 3.
- (negative sign)
- Cross-entropy involves taking the negative of the sum. This makes the loss a positive value.
/ m
- Divide by the number of samples (m) to get the average loss per sample.
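A sketch of cross_entropy_loss assembled from these steps; the tiny epsilon inside the log is an extra safeguard against log(0) and is an assumption, not necessarily part of the original code:

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred):
    m = y_true.shape[1]                        # number of samples
    # The small epsilon guards against log(0); it is an added assumption.
    log_probs = y_true * np.log(y_pred + 1e-12)
    return -np.sum(log_probs) / m              # negative sum, averaged over the samples
```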
- We’re defining a function called initialize_parameters. It takes three inputs:
- input_size: How many features go into the network (number of input neurons).
- hidden_size: How many neurons are in the hidden layer.
- output_size: How many outputs the network produces (number of output neurons).
np.random.seed(42)
- Setting a random seed ensures that the random numbers generated are always the same every time you run the code. This helps with reproducibility, so results don’t vary randomly.
W1 = np.random.randn(hidden_size, input_size) * 0.01
- np.random.randn(hidden_size, input_size) generates a random matrix of size hidden_size x input_size with values from a standard normal distribution (mean = 0, standard deviation = 1).
- Multiplying by 0.01 scales these values down to make them small. This helps the network start learning without large gradients that might destabilize training.
- Example: If hidden_size = 3 and input_size = 2, then W1 will be a 3×2 matrix.
b1 = np.zeros((hidden_size, 1))
- np.zeros((hidden_size, 1)) creates a matrix of zeros with dimensions hidden_size x 1.
- Biases are initialized to zero because they don’t need random starting values.
W2 = np.random.randn(output_size, hidden_size) * 0.01
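Assembling the lines above into one function (b2 is assumed to be initialized to zeros like b1, and returning a plain tuple is an assumption about the original interface):

```python
import numpy as np

def initialize_parameters(input_size, hidden_size, output_size):
    np.random.seed(42)                                      # reproducible random numbers
    W1 = np.random.randn(hidden_size, input_size) * 0.01    # small random hidden-layer weights
    b1 = np.zeros((hidden_size, 1))                         # hidden-layer biases start at zero
    W2 = np.random.randn(output_size, hidden_size) * 0.01   # small random output-layer weights
    b2 = np.zeros((output_size, 1))                         # output-layer biases start at zero (assumed)
    return W1, b1, W2, b2
```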
- Z1 = np.dot(W1, X) + b1: Multiply inputs (X) by weights (W1) and add biases (b1). This gives the hidden layer's signals.
- A1 = relu(Z1): Apply ReLU so only the positive signals pass through.
- Z2 = np.dot(W2, A1) + b2: Multiply hidden layer signals (A1) by weights (W2) and add biases (b2). This gives the final layer's signals.
- A2 = softmax(Z2): Apply Softmax to turn the final signals into probabilities for classification.
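A sketch of forward_propagation built from those four lines, using the relu and softmax helpers sketched earlier; returning all intermediate values (needed later for backpropagation) is an assumption about the original code's interface:

```python
def forward_propagation(X, W1, b1, W2, b2):
    Z1 = np.dot(W1, X) + b1   # hidden-layer signals
    A1 = relu(Z1)             # hidden-layer activations
    Z2 = np.dot(W2, A1) + b2  # output-layer signals
    A2 = softmax(Z2)          # class probabilities
    return Z1, A1, Z2, A2
```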
- dZ2: Difference between the robot’s guess and the true answer.
- dW2: Adjustments for the final layer weights.
- db2: Adjustments for the final layer biases.
- dA1: Feedback to the first layer.
- dZ1: Adjustments for the first layer.
- dW1, db1: Adjustments for the first layer weights and biases.
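A sketch of backward_propagation that computes the quantities listed above for a softmax output with cross-entropy loss; the argument list and the averaging over m samples are assumptions consistent with the loss definition:

```python
def backward_propagation(X, y_true, Z1, A1, A2, W2):
    m = X.shape[1]
    dZ2 = A2 - y_true                              # guess minus true answer (softmax + cross-entropy)
    dW2 = np.dot(dZ2, A1.T) / m                    # adjustments for the final layer weights
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m   # adjustments for the final layer biases
    dA1 = np.dot(W2.T, dZ2)                        # feedback sent back to the first layer
    dZ1 = dA1 * relu_derivative(Z1)                # only neurons that were active pass gradient
    dW1 = np.dot(dZ1, X.T) / m                     # adjustments for the first layer weights
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m   # adjustments for the first layer biases
    return dW1, db1, dW2, db2
```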
- initialize_parameters: Give the neurons their starting rules.
- forward_propagation: Make guesses.
- cross_entropy_loss: Measure how wrong the guesses are.
- backward_propagation: Calculate how to improve.
- update_parameters: Teach the neurons better rules.
- Repeat for a set number of iterations.
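Putting it all together, a minimal training-loop sketch using the helpers above; the learning rate, iteration count, logging interval, and the update_parameters signature are illustrative assumptions:

```python
def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate=0.1):
    # Plain gradient descent: nudge each parameter against its gradient.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2

def train(X, Y, hidden_size=8, iterations=1000, learning_rate=0.1):
    W1, b1, W2, b2 = initialize_parameters(X.shape[0], hidden_size, Y.shape[0])
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)          # make guesses
        loss = cross_entropy_loss(Y, A2)                                 # measure how wrong they are
        dW1, db1, dW2, db2 = backward_propagation(X, Y, Z1, A1, A2, W2)  # calculate how to improve
        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2,
                                           dW1, db1, dW2, db2, learning_rate)
        if i % 100 == 0:
            # Monitor loss and accuracy during training
            accuracy = np.mean(np.argmax(A2, axis=0) == np.argmax(Y, axis=0))
            print(f"iteration {i}: loss = {loss:.4f}, accuracy = {accuracy:.2f}")
    return W1, b1, W2, b2
```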