### Deep Learning Curriculum

#### Module 8: Deep Learning Fundamentals
##### LO 8.1: Understand Neural Networks
- PC 8.1.1: Define artificial neural networks and their components (neurons, layers). 
- PC 8.1.2: Understand activation functions (ReLU, Sigmoid, Softmax).
- PC 8.1.3: Implement feedforward neural networks using Keras.

##### LO 8.2: Train Deep Learning Models
- PC 8.2.1: Use backpropagation and gradient descent for training models.
- PC 8.2.2: Understand overfitting and apply regularization techniques.
- PC 8.2.3: Implement convolutional neural networks (CNN) for image recognition tasks.

  

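#### What is deep learning?
- Deep Learning (DL) is a subfield of Machine Learning.
- Its models are loosely inspired by the structure of the human nervous system.
- These models are built from stacked layers of neurons; "deep" refers to having many such layers.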

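#### Why do we use Deep Learning?
1. Traditional Machine Learning models tend to perform poorly on unstructured data such as images, audio, and text.
2. The accuracy of traditional Machine Learning models tends to plateau once the amount of data passes a certain threshold, whereas deep learning models usually keep improving with more data.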

### What is an Artificial Neural Network (ANN)?

An artificial neural network (ANN) is a machine learning model that processes data through a network of interconnected nodes, loosely modeled on the human brain. ANNs are the building blocks of deep learning: a deep learning model is an ANN with many layers, and it can help computers learn and make decisions in a human-like way.

##### How it works 
- ANNs are made up of layers of nodes, or neurons, that process and transmit information.
- ANNs assign a weight to each connection between neurons; the weight sets how strongly one neuron's output influences the next.
- ANNs compare their predicted outputs with the actual outputs and adjust the weights to reduce the difference. This process is called "training".
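
The points above can be sketched in plain NumPy. This is only an illustration, not how Keras implements it: the layer sizes, the random weights, the made-up targets, the helper name `forward`, and the use of mean squared error to measure the gap are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical network: 3 inputs -> 4 hidden neurons -> 1 output neuron
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # weights and biases, layer 1
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # weights and biases, layer 2

def forward(x):
    """Pass a batch of inputs through both layers and return predictions."""
    hidden = np.maximum(0, x @ W1 + b1)           # hidden layer with ReLU
    return 1 / (1 + np.exp(-(hidden @ W2 + b2)))  # output layer with sigmoid

x = rng.random((5, 3))                        # 5 example inputs, 3 features each
y_true = np.array([[1], [0], [1], [1], [0]])  # made-up actual outputs

y_pred = forward(x)
error = np.mean((y_pred - y_true) ** 2)  # gap between predicted and actual outputs
print(y_pred.shape, error)
```

Training consists of repeatedly adjusting `W1`, `b1`, `W2`, and `b2` so that `error` gets smaller; that update step is left out here.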

##### What it can do 
- ANNs can recognize complex patterns.
- ANNs can learn from changing sets of data.
- ANNs can make predictions in real time.
- ANNs can solve complicated problems, like summarizing documents or recognizing faces.

Examples of ANN applications include facial recognition, real-time translation, Google Photos, autonomous cars, and generative AI.

### What is an Activation Function?
An activation function in deep learning is a mathematical function that determines whether a neuron should be activated. It transforms the input signal of a neuron into an output signal that is passed on to the next layer. 

##### Why are activation functions important? 
- They allow neural networks to learn complex patterns in data
- They enable neural networks to model non-linear relationships between inputs and outputs
- They are crucial for training neural networks that generalize well and provide accurate predictions

##### How do activation functions work?
- A node receives a set of input signals 
- The activation function decides whether the neuron should be activated or not 
- The activation function transforms the input signal into an output signal 
- The output signal is passed on to the next layer 
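
The steps above can be sketched for a single neuron in plain Python. The input values, weights, bias, and the helper name `neuron` are made up for illustration.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def relu(z):
    return max(0.0, z)

inputs = [0.5, 0.2, 0.9]    # made-up input signals
weights = [0.4, -0.6, 0.1]  # made-up connection weights
bias = -0.3

# The weighted sum here is negative (about -0.13), so ReLU outputs 0.0,
# while sigmoid outputs a value a little below 0.5.
print(neuron(inputs, weights, bias, relu))
print(neuron(inputs, weights, bias, sigmoid))
```

The same weighted sum produces different output signals depending on the activation function, and that output is what the next layer receives.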

##### What are some examples of activation functions?
- Sigmoid: Maps inputs to a range between zero and one 
- Tanh: A scaled and shifted version of the sigmoid that outputs values from -1 to +1 
- Swish: A smooth activation function, x · sigmoid(x), that dips slightly below 0 for negative inputs and then rises again, approaching x for large positive inputs 
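
As a rough sketch, the three functions listed above can be written in a few lines of NumPy. The sample inputs are arbitrary, and the helper names are chosen for this example only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)  # equal to 2 * sigmoid(2 * x) - 1

def swish(x):
    return x * sigmoid(x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])  # arbitrary sample inputs
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("swish", swish)]:
    print(name, np.round(fn(x), 3))
```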

The two most widely used activation functions are Sigmoid and ReLU.

### Sigmoid Activation Function
The sigmoid activation function is a mathematical function that converts a real number input into a value between 0 and 1. It's used in neural networks to control the output of a layer. 

##### How it works
- Inputs: Large negative inputs result in outputs close to 0, while large positive inputs result in outputs close to 1. An input of 0 gives exactly 0.5. 
- Shape: The sigmoid function has an "S"-shaped curve. 
- Outputs: The outputs can be interpreted as probabilities, making it useful for classification and probability prediction. 
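
A small sketch of this behavior, using only the standard library. The 0.5 decision threshold is a common convention assumed here for illustration, not something the sigmoid itself prescribes.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

for x in (-10, -2, 0, 2, 10):
    p = sigmoid(x)
    label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
    print(f"x={x:>3}  sigmoid={p:.4f}  predicted class={label}")
```

Reading the output as "probability of class 1" is what makes the sigmoid a natural fit for the last layer of a binary classifier.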

##### Why it's used 
- The sigmoid function was historically important in the development of neural networks.
- It's useful for predicting probabilities, which makes it natural for binary classification problems.

##### Limitations
- The sigmoid function has drawbacks that have reduced its use in hidden layers in recent years. 
- The sigmoid function can suffer from the "vanishing gradient" problem: its derivative is at most 0.25 and approaches 0 for large positive or negative inputs, which can make learning in deep neural networks slow or even halt it. 
- The sigmoid and its variants require computing an exponential, which makes them more expensive to evaluate than simple functions such as ReLU. 
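
The vanishing gradient effect can be illustrated with a short sketch. It is a simplification: real backpropagation also multiplies by the layer weights, which are ignored here.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)  # derivative of the sigmoid

print(sigmoid_grad(0))  # the derivative peaks at x = 0 with value 0.25
print(sigmoid_grad(6))  # far from 0 the derivative is close to zero

# Backpropagation multiplies one such factor per sigmoid layer.
# Even in the best case of 0.25 per layer, the product shrinks quickly.
for depth in (1, 5, 10, 20):
    print(depth, 0.25 ** depth)
```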

##### Alternatives 
The ReLU function is a faster alternative to the sigmoid function.

### ReLU Activation Function
The rectified linear unit (ReLU) activation function is a mathematical function used in artificial neural networks to introduce non-linearity. It's also known as the rectifier activation function. 

##### How it works
- If the input value is positive, the ReLU function outputs the same value. 
- If the input value is negative, the ReLU function outputs zero. 
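
In code, the two rules above collapse into a single `max(0, x)`. A minimal NumPy sketch, with arbitrary sample inputs:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)  # element-wise max(0, x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # arbitrary sample inputs
print(relu(x))  # negative inputs become 0, positive inputs pass through unchanged
```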

##### Benefits
- Non-linearity: ReLU introduces non-linearity to the network, which allows it to learn complex patterns in the data. 
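- Computational efficiency: ReLU is computationally inexpensive because it only needs a comparison with zero, max(0, x), with no exponentials. It also produces sparse activations, since neurons with negative inputs output zero. 
- Generalization: In practice, networks using ReLU often train faster and generalize well to unseen data, although this depends on the task and architecture. 
- Mitigates vanishing gradient problem: ReLU's gradient is exactly 1 for positive inputs, so it does not shrink as it passes through active neurons, which helps mitigate the vanishing gradient problem. 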

##### Applications 
ReLU is used in computer vision, speech recognition, and computational neuroscience. 

##### Drawbacks 
Dying ReLU: A ReLU neuron can become "dead" if its input stays negative for every training example. It then always outputs zero, its gradient is zero, and its weights stop updating.
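
A hedged sketch of a dead neuron follows. The weights and the large negative bias are made up to force the situation; in practice a neuron usually ends up here after an unlucky weight update rather than by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((1000, 3))  # inputs in [0, 1), similar in shape to the dummy data in this notebook

w = np.array([0.5, -0.2, 0.3])  # made-up weights
b = -5.0                        # large negative bias keeps every pre-activation below 0

z = x @ w + b                 # pre-activations, all negative here
out = np.maximum(0, z)        # ReLU output
grad = (z > 0).astype(float)  # ReLU derivative: 1 where z > 0, else 0

print(out.max(), grad.sum())  # both are 0.0: no output and no gradient signal
```

Because the gradient is zero for every example, gradient descent has nothing to update the neuron's weights with, so it stays dead.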

In [1]:
#import required packages
from keras.models import Sequential
from keras.layers import Dense
import numpy as np


#generate data
#dummy train data for 1000 students and dummy test data for 500
#columns: age, hours of study and average previous test score
#(each value is drawn uniformly from [0, 1) in this demo)

#set seed so the generated data is reproducible
np.random.seed(200)


In [2]:
train_data,test_data = np.random.random((1000,3)), np.random.random((500,3))

In [3]:
train_data

array([[0.94763226, 0.22654742, 0.59442014],
       [0.42830868, 0.76414069, 0.00286059],
       [0.35742368, 0.90969489, 0.45608099],
       ...,
       [0.13403753, 0.56654667, 0.06479317],
       [0.62502594, 0.02876147, 0.70924028],
       [0.87752115, 0.20447666, 0.32131087]])

In [4]:
#generate dummy results for 1000 students: Passed (1) or Failed (0)
labels = np.random.randint(2,size=(1000,1))

In [5]:
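# Define the model structure: the layers of neurons, their activation
# functions, the loss, and the optimizer.
from keras.layers import Input  # explicit input layer, preferred over input_dim

model = Sequential()
model.add(Input(shape=(3,)))               # 3 input features per student
model.add(Dense(5, activation='relu'))     # first hidden layer: 5 neurons
model.add(Dense(4, activation='relu'))     # second hidden layer: 4 neurons
model.add(Dense(1, activation='sigmoid'))  # output: probability of passing
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])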


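
To get a feel for the size of this model, the number of trainable parameters can be worked out by hand. The helper `dense_params` below is a hypothetical name used only for this sketch; calling `model.summary()` in Keras should report the same total.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per input per unit) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [3, 5, 4, 1]  # input features, then the units in each Dense layer
counts = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [20, 24, 5] 49
```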


In [6]:
#Train the model for 10 epochs with a batch size of 32
model.fit(train_data, labels, epochs=10, batch_size=32)

Epoch 1/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 0.4793 - loss: 0.7010 
Epoch 2/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.4806 - loss: 0.6994 
Epoch 3/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.4547 - loss: 0.6977 
Epoch 4/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.4857 - loss: 0.6954 
Epoch 5/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.4442 - loss: 0.6964
Epoch 6/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.4887 - loss: 0.6952
Epoch 7/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.4999 - loss: 0.6934
Epoch 8/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - accuracy: 0.5068 - loss: 0.6927
Epoch 9/10
[1m32/32[0m [32m━━━━━━━━━━━━━━━━━━━━[

<keras.src.callbacks.history.History at 0x136b84880>
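
The loss values in the training log stay close to 0.69, and the accuracy stays close to 0.5. That is what one would expect here: the labels were generated at random, independently of the features, so the model has nothing to learn. A model that always predicts 0.5 has a binary cross-entropy of ln 2 ≈ 0.693. The sketch below reproduces that number with a hand-written loss; the function is a simplified stand-in for illustration, not Keras's implementation, and the clipping value `eps` is an assumption.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]  # made-up labels

print(binary_crossentropy(y_true, [0.5, 0.5, 0.5, 0.5]))  # always guessing 0.5
print(math.log(2))                                        # the same value, ln 2
print(binary_crossentropy(y_true, [0.9, 0.1, 0.8, 0.2]))  # confident and correct: lower loss
```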

In [10]:
#make predictions
predictions = model.predict(test_data)

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step

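
Because the output layer uses a sigmoid, `model.predict` returns one probability per student rather than a Passed/Failed label. A common way to turn these into class labels is to apply a threshold. In the sketch below, the `predictions` array is a made-up stand-in with the same shape convention, (n_samples, 1), and the 0.5 cut-off is an assumption.

```python
import numpy as np

# Stand-in for model.predict(test_data): made-up probabilities, shape (n_samples, 1)
predictions = np.array([[0.12], [0.58], [0.50], [0.97], [0.43]])

threshold = 0.5  # assumed cut-off between Failed (0) and Passed (1)
predicted_labels = (predictions >= threshold).astype(int).ravel()
print(predicted_labels)  # [0 1 1 1 0]
```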