In [None]:
'''
Neural networks consist of interconnected layers (input, hidden, output) that process data using weighted connections and activation functions. Key aspects include structure (layers/nodes),
architecture types (feedforward, CNN, RNN), training processes (backpropagation), and applications like image recognition or predictive analytics.
Key Aspects of Neural Networks:
Structure and Components:
Layers: Input layer (receives data), hidden layers (process information), and output layer (produces results).
Nodes/Neurons: Artificial neurons receive input, apply weights and biases, and pass the data through an activation function.
Weights and Biases: Parameters adjusted during training to minimize error, allowing the network to learn.

Architecture Types:
Feedforward Neural Networks (FNN): Data moves in one direction, from input to output.
Convolutional Neural Networks (CNN): Designed for image processing and spatial data.
Recurrent Neural Networks (RNN): Handle sequential data by feeding the hidden state from each step back in at the next step.
Generative Adversarial Networks (GANs): Used for creating new data instances.

Training and Learning:
Backpropagation: The fundamental algorithm for training; it computes how much each weight contributes to the error so that an optimizer can adjust the weights.
Activation Functions: Non-linear functions (e.g., ReLU, Sigmoid) that allow networks to learn complex patterns.

Performance and Challenges:
Overfitting: When a network learns training data too well, failing to generalize to new data; mitigated by dropout or regularization.
Interpretability: The "black box" nature of deep learning, making it hard to explain how decisions are made.

These networks are crucial for diverse applications, including computer vision, natural language processing, and predictive modeling.
'''

In [None]:
'''
Common Types of Neural Network Layers:
Input Layer: The initial layer that receives raw input data, with one neuron per input feature.
Dense / Fully Connected Layer: Connects every neuron in one layer to every neuron in the next, often used in final classification tasks.
Convolutional Layer (CONV): The core of Convolutional Neural Networks (CNNs), it uses filters to detect spatial features in data like images.
Pooling Layer (POOL): Reduces the dimensionality of feature maps (e.g., Max Pooling) to reduce computational load and control overfitting.
Recurrent Layer (RNN): Processes sequential data (like text or time series) by maintaining a hidden state that carries information from previous steps.
Normalization Layer (e.g., Batch Normalization, BN): Stabilizes training by normalizing a layer's activations (in batch normalization, across each mini-batch); often used in deep networks.
Dropout Layer: A regularization layer that randomly sets input units to 0 during training to prevent overfitting.
Activation Layer (ReLU, Sigmoid, Tanh): Introduces non-linear transformations into the network, enabling it to learn complex, non-linear relationships.
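Deconvolutional Layer (Deconv, also called transposed convolution): A learnable upsampling layer used in tasks like image segmentation. Despite the name, it does not mathematically invert a convolution; it maps a small feature map to a larger one.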
Output Layer: The final layer that produces the network's prediction or classification.
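
As an illustration of the pooling layer described above, here is a minimal NumPy sketch of 2x2 max pooling on a single-channel feature map. The function name max_pool_2x2 and the sample values are made up for this note, and the sketch assumes even height and width; real frameworks provide pooling as a built-in layer.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map by keeping the max of each 2x2 block."""
    h, w = feature_map.shape
    # Assumption of this sketch: height and width are even.
    assert h % 2 == 0 and w % 2 == 0
    # Split rows and columns into pairs, then take the max over each pair.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])
pooled = max_pool_2x2(feature_map)
print(pooled)  # 2x2 result: rows [4, 2] and [2, 8]
```

The 4x4 map shrinks to 2x2, which is the dimensionality reduction the pooling entry refers to.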

What is a layer weight:
Layer weights in a neural network are trainable numerical parameters associated with connections between neurons,
determining the strength, influence, and importance of input signals. They multiply incoming data to dictate how
much each feature contributes to the next layer's output. These weights are adjusted during training to optimize accuracy.
Key Aspects of Layer Weights:
Role in Learning: Weights are typically initialized randomly and then updated during training, using gradients computed by backpropagation, to minimize the error between predicted and actual outputs.
Significance: A weight with a large magnitude indicates a strong connection, while a weight near zero suggests the input has little influence on the output.
Directionality: Positive weights increase the activation of the next neuron, while negative weights decrease it, acting as inhibitory connections.
Computation: In a layer, input values are multiplied by corresponding weights, summed together, and often added to a bias term before passing through an activation function.
Structure: They are organized in matrices that connect neurons in one layer to those in the next (exposed, for example, as the coefs_ attribute of scikit-learn's MLPClassifier).
Effectively, weights represent the learned "knowledge" of the network, enabling it to recognize patterns and make predictions.
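
The computation described above (multiply inputs by weights, sum, add a bias, apply an activation) can be sketched in NumPy as follows. The layer sizes, the random initialization scale, and the name dense_forward are illustrative assumptions, not taken from any particular framework.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_neurons = 3, 2
# Weights start out random; one column per neuron in the layer.
W = rng.normal(scale=0.1, size=(n_inputs, n_neurons))
b = np.zeros(n_neurons)

def dense_forward(x, W, b):
    """Weighted sum of the inputs plus bias, followed by a ReLU activation."""
    z = x @ W + b
    return np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0])
out = dense_forward(x, W, b)
print(out.shape)  # (2,)
```

Each column of W holds the weights feeding one neuron, so the matrix shape is (inputs, neurons).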

What is a layer bias:
Layer bias in a neural network is a learnable parameter added to the weighted sum of inputs for each neuron in a layer, serving as an offset (\(b\)) that controls the activation threshold.
It acts like the intercept in a linear equation (\(y=mx+c\)): it shifts the input to the activation function (e.g., ReLU, Sigmoid), and so the point at which the neuron activates, independent of the input values,
which makes the model more flexible.
Key Aspects of Layer Bias:
Definition & Function: Bias is a trainable parameter that is initialized and then updated during training alongside the weights.
It ensures that even if all input features are zero, a neuron can still output a non-zero value and propagate information.
Mathematical Representation: The output of a neuron is calculated as \(y=f(\sum (w_{i}\times x_{i})+b)\), where \(w\) represents weights, \(x\) is input, and \(b\) is the bias.
Role in Learning: Without bias, a neuron might fail to activate if the weighted sum is always zero or negative (e.g., with ReLU).
It allows the model to better fit the data, particularly for patterns that do not pass through the origin \((0,0)\).
Difference from Weights: While weights determine the strength of the connection between neurons, the bias controls the activation threshold (sensitivity) of the neuron itself.
Implementation: In deep learning frameworks like TensorFlow or PyTorch, biases are typically included by default in linear layers.
In summary, layer bias provides the necessary freedom to shift the decision boundary, enabling the network to learn more complex, accurate functions.
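
A small sketch of the threshold effect described above, using a single ReLU neuron. The weights, inputs, and bias values are arbitrary numbers chosen for illustration, and relu_neuron is a hypothetical helper name.

```python
import numpy as np

def relu_neuron(x, w, b):
    """y = f(sum(w_i * x_i) + b) with f = ReLU."""
    z = np.dot(w, x) + b
    return max(0.0, float(z))

w = np.array([0.5, 0.5])

# All-zero input: without a bias the neuron stays silent.
zero_input = np.zeros(2)
print(relu_neuron(zero_input, w, b=0.0))  # 0.0
print(relu_neuron(zero_input, w, b=2.0))  # 2.0

# A negative bias raises the threshold the weighted sum must exceed.
ones = np.ones(2)
print(relu_neuron(ones, w, b=0.0))   # 1.0
print(relu_neuron(ones, w, b=-2.0))  # 0.0
```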

What is an activation function:
An activation function in a neural network is a mathematical function applied to a neuron's output, introducing non-linearity to enable the network to learn complex patterns beyond
simple linear relationships, essentially deciding if and how a neuron should "fire" or pass information to the next layer, with common examples being ReLU, Sigmoid, and Tanh.
Without it, a deep network acts like a simple linear model, limiting its power for real-world tasks.
Key Functions & Roles
Introduce Non-Linearity: This is the most crucial role, allowing neural networks to model complex, non-linear data (like images, speech) instead of just straight lines.
Decision Making: It acts like a switch, determining if a neuron's input is significant enough to activate and pass on.
Output Transformation: It transforms the weighted sum of inputs and bias into a specific output range, which helps stabilize training.
Gradient Propagation: Its derivative is vital during backpropagation, influencing how weights are updated during training.
Common Types of Activation Functions
ReLU (Rectified Linear Unit): max(0, x). Popular for hidden layers, it's computationally efficient.
Sigmoid (Logistic): 1 / (1 + exp(-x)). Squashes output between 0 and 1, useful for binary classification.
Tanh (Hyperbolic Tangent): Squashes output between -1 and 1, often preferred over sigmoid in hidden layers.
Softmax: Used in the output layer for multi-class classification, converting scores into probabilities.
Leaky ReLU/PReLU: Variants of ReLU that allow a small, non-zero gradient for negative inputs to prevent "dying neurons".
Why They Matter
By adding these non-linear transformations, activation functions allow deep neural networks to build complex decision boundaries and represent intricate relationships in data, making them
powerful tools for AI.
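
The functions listed above can be written in a few lines of NumPy. This is a sketch for intuition only: the alpha value for Leaky ReLU is an assumed default, and the softmax here handles a single 1-D score vector.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is the small slope kept for negative inputs (assumed value).
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(scores):
    # Subtracting the max does not change the result but avoids overflow.
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # negatives become 0, positives pass through
print(leaky_relu(x))  # negatives are scaled by alpha instead of zeroed
print(sigmoid(x))     # every value lies between 0 and 1
print(np.tanh(x))     # every value lies between -1 and 1
print(softmax(x))     # non-negative values that sum to 1
```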

What is backpropagation:
Backpropagation (backward propagation of errors) is the fundamental algorithm for training neural networks. It enables them to learn from data by calculating the gradients of a loss function, which an optimizer then uses to update
the network's weights. It works backward from the output layer to the input layer, using the chain rule of calculus to efficiently compute how much each weight and bias contributes to the error,
allowing for precise weight adjustments to minimize it.
Key Aspects of Backpropagation
Goal: To minimize the error (loss) of a network's prediction compared to the actual label by updating weights.
Process: It computes the gradient (derivative) of the loss function with respect to each parameter in the network.
Efficiency: Instead of calculating the gradient for each weight separately, it uses the chain rule to compute gradients from the last layer to the first (backward), avoiding redundant calculations.
Workflow:
Forward Pass: Input data goes through the network to produce an output.
Loss Calculation: The difference between the prediction and actual output is measured.
Backward Pass: The error is propagated backward, calculating gradients for each weight.
Weight Update: An optimizer (like gradient descent) uses these gradients to update the weights.
Why it Matters
Without backpropagation, training deep neural networks would be computationally infeasible due to the millions of parameters involved. It provides the "correction" step that allows
neural networks to learn, making it essential for deep learning applications like image recognition, natural language processing, and speech recognition.
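
The four workflow steps above can be sketched by hand in NumPy for a tiny 2 -> 4 -> 1 network. Everything specific here is an assumption made for illustration: the XOR toy data, the tanh hidden layer, the mean squared error loss, the learning rate, and the number of steps. Frameworks compute these gradients automatically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR inputs and targets (chosen only for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Parameters of a 2 -> 4 -> 1 network.
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros(1)

learning_rate = 0.1
losses = []

for step in range(500):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2

    # Loss calculation (mean squared error).
    error = y_hat - y
    losses.append(float(np.mean(error ** 2)))

    # Backward pass: chain rule from the output layer back to the first layer.
    d_y_hat = 2.0 * error / len(X)
    d_W2 = h.T @ d_y_hat
    d_b2 = d_y_hat.sum(axis=0)
    d_h = d_y_hat @ W2.T
    d_z1 = d_h * (1.0 - h ** 2)  # derivative of tanh
    d_W1 = X.T @ d_z1
    d_b1 = d_z1.sum(axis=0)

    # Weight update (plain gradient descent).
    W1 -= learning_rate * d_W1
    b1 -= learning_rate * d_b1
    W2 -= learning_rate * d_W2
    b2 -= learning_rate * d_b2

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Note how d_h reuses d_y_hat instead of recomputing it; that reuse of upstream gradients is what the efficiency point refers to.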

'''

In [None]:
'''

What is compiling a model:
Compiling a model is the step that configures the model for training; depending on the framework, it can also prepare the model for efficient execution on the target hardware.
It happens after the network's layers are defined and before the actual learning process begins. Where the efficiency part applies, the network's abstract definition is turned into an optimized series of operations (often matrix operations) tailored for the CPU or GPU.

Key Purposes of Compilation
Configuring the Learning Process: The primary function of compilation is to define the essential components and algorithms the model will use to learn from data. This includes specifying:
Optimizer: The algorithm that determines how the model's weights are updated during training to minimize the error. Common choices include 'adam', 'sgd' (Stochastic Gradient Descent), and 'rmsprop'. https://keras.io/api/optimizers/
Loss Function: The mathematical function that quantifies the difference between the model's predictions and the actual target values. The goal of training is to minimize this value.
Common loss functions include 'mean_squared_error' (for regression) and 'binary_crossentropy' or 'categorical_crossentropy' (for classification). https://keras.io/api/losses/
Metrics: Optional metrics used to evaluate the model's performance, reported during training (e.g., 'accuracy' for classification problems). More at https://keras.io/api/metrics/
Optimizing for Hardware: Depending on the framework (for example TensorFlow or PyTorch), a compilation step can also optimize the network's structure for faster execution on the available hardware.
  This can involve:
    Backend Selection: The software backend automatically chooses an efficient way to represent the network for training and prediction runs.
    Performance Enhancements: Compilation can remove overhead and apply optimizations (such as JIT compilation through torch.compile in PyTorch 2.0) that improve training and inference speed,
    making the model suitable for deployment in various applications and devices.
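
As a minimal sketch of the configuration part in Keras, assuming TensorFlow is installed: the layer sizes, the input shape, and the learning rate below are placeholder values, not recommendations.

```python
from tensorflow import keras

# A small binary classifier; the shapes are illustrative only.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# compile() attaches the optimizer, loss, and metrics; no training happens yet.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

model.summary()
```

Passing an optimizer object instead of the string 'adam' is what makes the learning rate adjustable.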

Possible ways to improve model:
Tuning Hyperparameters: A good hyperparameter to start with is the learning rate for the Adam optimizer. What about the batch size and number of epochs?
Network Depth: What happens if we remove or add more fully-connected layers? How does that affect training and/or the model’s final performance?
Activations: What if we use an activation other than ReLU, e.g. sigmoid?
Dropout: What if we tried adding Dropout layers, which are known to prevent overfitting?
Validation: We can also pass a validation dataset during training. Keras will evaluate the model on the validation set at the end of each epoch and report the loss
and any metrics we asked for. This allows us to monitor our model’s progress over time during training, which can be useful to identify overfitting and even support early stopping. If the testing dataset is used for this, the final test score is no longer an unbiased estimate, so a separate held-out validation split is preferable.
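
Several of these ideas can be tried together in Keras. The sketch below assumes TensorFlow is installed and uses synthetic stand-in data; the dropout rate, learning rate, batch size, epoch count, and patience are arbitrary starting points. It holds out a slice of the training data for validation, which is a choice made for this sketch.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
# Synthetic stand-in data: 200 samples, 10 features, binary labels.
X = rng.normal(size=(200, 10)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")
X_train, y_train = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dropout(0.2),  # randomly zeroes 20% of units during training
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.005),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Stop once the validation loss has not improved for 3 epochs in a row.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)

history = model.fit(
    X_train, y_train,
    epochs=20, batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[early_stop],
    verbose=0,
)
print(sorted(history.history.keys()))
```

The history object keeps the per-epoch training and validation values, so the two curves can be compared to spot overfitting.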