# Introduction to Neural Networks

Welcome to this introduction to neural networks. While this course focuses on building **state-of-the-art models** and **practical implementations in Lightning**, we will also cover the **foundations of neural networks**. The foundations covered here are not exhaustive, since the topic could fill an entire course on its own. Instead, we draw on the best currently available **open-source resources** to give you enough background to navigate the subsequent topics.

---

Some recommended resources for learning more about this topic include:

1.  **[Dive into Deep Learning](https://d2l.ai/chapter_introduction/index.html)** - For a more coding-oriented approach.

2.  **[Understanding Deep Learning](https://udlbook.github.io/udlbook/)** - Covering deep learning content from both theoretical and practical standpoints.

3.  **[DeepLearning.AI](https://www.deeplearning.ai/)** - Courses produced by Professor Andrew Ng, which have had a significant impact on the machine learning and deep learning community.

---

Let us now get accustomed to some **standard terms** used in machine learning and deep learning.

---

## Supervised Learning

In supervised learning, a **model** learns the relationship between **one or more inputs** and **one or more outputs** from examples in which the correct output is already known. For example, a model might take **multiple features of an image** as input to recognize an **output class of animals** in the image, such as "cat" or "dog."

---

### Mathematical Representation - Classification

Let's represent the concept of classification mathematically:

* **Input ($X$):**
    If the model receives multiple features, we can represent the input as a vector:
    $$X = [x_1, x_2, \ldots, x_n]$$
    Here, $n$ is the total number of input features. In our image recognition example, $X$ could be a flattened array of pixel values or a set of extracted visual features from the image.

* **Output ($Y$):**
    For classification, the output is typically a probability distribution over the possible classes. For instance, for "cat" and "dog" classes:
    $$Y = [P(\text{cat}), P(\text{dog})]$$
    Here, $P(\text{cat})$ is the predicted probability that the image contains a cat, and $P(\text{dog})$ is the predicted probability that it contains a dog. The model's final prediction would be the class with the highest probability.

* **The Model ($f$):**
    The model itself is essentially a function that maps the input to the output. This function, $f$, encapsulates all the learned relationships:
    $$Y = f(X)$$
    In a neural network, this function $f$ is a complex arrangement of interconnected nodes (neurons) that perform linear transformations followed by non-linear activation functions. The "learning" part involves adjusting the internal **parameters** (weights and biases) of this function during training.
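
To make the mapping from $X$ to $Y$ concrete, here is a minimal sketch of a one-layer classifier. The feature values, the weight matrix `W`, and the bias vector `b` are hypothetical numbers chosen for illustration, and softmax is assumed as the way to turn raw scores into probabilities. In a real model the parameters would be learned during training.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Convert raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

def f(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A one-layer model: linear transformation followed by softmax."""
    return softmax(W @ X + b)

classes = ["cat", "dog"]

# Hypothetical input with n = 3 features (e.g., extracted visual features).
X = np.array([0.5, -1.2, 3.0])

# Hypothetical parameters; in practice these are learned during training.
W = np.array([[0.2, -0.4, 0.1],
              [-0.3, 0.1, 0.5]])
b = np.array([0.0, 0.1])

Y = f(X, W, b)                               # Y = [P(cat), P(dog)]
prediction = classes[int(np.argmax(Y))]      # class with the highest probability
print({c: round(float(p), 3) for c, p in zip(classes, Y)}, "->", prediction)
```

The final prediction is read off with `argmax`, matching the rule that the model picks the class with the highest probability.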

---

### Mathematical Representation - Regression

If you recall, we discussed estimating an output function using multiple input variables in Chapter 1. Here, we'll demonstrate a regression example where we predict a continuous output $y$.

The equation below defines the underlying true function that generates our **Output Variable** $y$:
$$y = 10x_{1}^{2} + 5x_{2}^{2} + 2x_{1}x_{2} + 3x_{1} + 4x_{2} + \varepsilon$$
**The Model**, often a Fully Connected Neural Network (or Multi-Layer Perceptron), aims to approximate this true function and can be generally expressed as $f(x_{1}, x_{2}, \phi)$.

The terms in these expressions are defined as follows:
* **Input variable** $x_1$ follows a continuous uniform distribution between -10 and 10, denoted as $x_{1} \sim \mathcal{U}(-10, 10)$.
* **Input variable** $x_{2}$ follows a continuous uniform distribution between 0 and 5, denoted as $x_{2} \sim \mathcal{U}(0, 5)$.
* $\varepsilon$ represents Gaussian noise with a mean of 0 and a standard deviation of 2 (variance $2^{2} = 4$), denoted as $\varepsilon \sim \mathcal{N}(0, 2^{2})$.
* $\phi$ represents the **parameters** (e.g., weights and biases) that the model learns in order to approximate this quadratic function.
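
The data-generating process described above can be sketched in a few lines of NumPy. The function name `generate_quadratic_data`, the sample count, and the seed are illustrative choices, not part of the course material.

```python
import numpy as np

def generate_quadratic_data(num_samples: int = 1000, seed: int = 0):
    """Sample inputs (x1, x2) and noisy outputs y from the true function."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(-10, 10, num_samples)   # x1 ~ U(-10, 10)
    x2 = rng.uniform(0, 5, num_samples)      # x2 ~ U(0, 5)
    eps = rng.normal(0, 2, num_samples)      # eps ~ N(0, 2^2); the argument is the std dev
    y = 10 * x1**2 + 5 * x2**2 + 2 * x1 * x2 + 3 * x1 + 4 * x2 + eps
    X = np.column_stack([x1, x2])            # shape: (num_samples, 2)
    return X, y

X, y = generate_quadratic_data()
print(X.shape, y.shape)
```

Note that `rng.normal` takes the standard deviation (2), not the variance (4), as its second argument.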

---

### General Representation of Supervised Learning

In simple terms, supervised learning always tries to estimate $Y$ (a single output or several) from one or more inputs, $X$. The model contains **parameters** $\phi$, and a particular choice of these parameters represents the learned relationship between $X$ and $Y$:
$$Y = f(X, \phi)$$

### What is Learning? How Does a Model Estimate the $\hat{\phi}$ Parameters?

At its core, **learning** in supervised machine learning is the process of finding the optimal **parameters ($\hat{\phi}$)** for a model such that its estimated output ($\hat{Y}$) is as close as possible to the true, actual output ($Y_{actual}$). We achieve this by exposing the model to a **training dataset**, which consists of numerous examples of input-output pairs ($X, Y_{actual}$).

During this training process, we quantify the model's performance using a **loss function ($L$)**. It returns a single scalar value that summarizes how far the model's predictions are from the true values across the entire training dataset. A **lower loss value indicates a better fit**, meaning the model's predictions are closer to the actual values. The discrepancy between the model's prediction and the true value for a single training example is often referred to as the **error ($e_i$)** for that specific example $i$.

Therefore, the parameters $\hat{\phi}$ are estimated by **minimizing the loss** over the training dataset. Mathematically, this objective is represented as:

$$\hat{\phi} = \underset{\phi}{\text{argmin}} \ L(\phi)$$

This equation states that we are searching for the set of parameters $\phi$ that minimizes the loss function $L$.
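
As a rough illustration of what the argmin means, the sketch below evaluates a loss on a grid of candidate parameters and keeps the best one. The synthetic data, the linear model $f(X, \phi) = wX + b$, the use of mean squared error as $L$, and the grid ranges are all assumptions made for this example. Real training uses far more efficient search strategies than brute force.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, 200)
Y_actual = 2.0 * X + 5.0 + rng.normal(0, 2.0, 200)   # hypothetical true relationship

def L(phi: tuple[float, float]) -> float:
    """Mean squared error of the linear model f(X, phi) = w * X + b."""
    w, b = phi
    return float(np.mean((w * X + b - Y_actual) ** 2))

# Candidate parameters: every (w, b) pair on a coarse grid.
w_grid = np.linspace(-5, 5, 101)
b_grid = np.linspace(-10, 15, 101)
candidates = [(w, b) for w in w_grid for b in b_grid]

# argmin over phi: keep the candidate with the smallest loss.
phi_hat = min(candidates, key=L)
print("phi_hat =", phi_hat, "loss =", L(phi_hat))
```

The selected `phi_hat` should land near the values used to generate the data, limited by the grid spacing and the noise.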

### What is Loss?

Now, let's clarify the loss function. The simplest conceptualization of loss is the difference between the predicted and actual values. However, raw differences can be positive or negative and would cancel out when summed over the dataset, and optimization needs a function that is differentiable. For these reasons, loss functions are usually defined using operations like squaring the difference for regression or using cross-entropy for classification.

A more precise representation of the loss function, taking into account the entire dataset and commonly used forms, would be:

$$L(\phi) = \frac{1}{M} \sum_{i=1}^{M} \text{Loss}(\hat{Y}_i, Y_{actual,i})$$

Where:
* $M$ is the total number of examples in the training dataset.
* $\hat{Y}_i = f(X_i, \phi)$ is the model's predicted output for the $i$-th input example $X_i$, using the current parameters $\phi$.
* $Y_{actual,i}$ is the true, known output for the $i$-th input example.
* $\text{Loss}(\cdot, \cdot)$ represents a specific function that quantifies the mismatch between the predicted and actual values for a single example. Common examples include:
    * **Squared error** for regression: $\text{Loss}(\hat{Y}_i, Y_{actual,i}) = (\hat{Y}_i - Y_{actual,i})^2$. Averaging it over all $M$ examples gives the **Mean Squared Error (MSE)**.
    * **Binary Cross-Entropy** or **Categorical Cross-Entropy** for classification.
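
The following sketch computes two of the losses listed above, averaged over a small dataset. The binary cross-entropy formula shown in the docstring is the standard definition and is not spelled out in the text above. The function names, the clipping constant `eps`, and the sample arrays are illustrative.

```python
import numpy as np

def mse_loss(y_hat: np.ndarray, y_actual: np.ndarray) -> float:
    """L = (1/M) * sum_i (y_hat_i - y_actual_i)^2"""
    return float(np.mean((y_hat - y_actual) ** 2))

def binary_cross_entropy(p_hat: np.ndarray, y_actual: np.ndarray, eps: float = 1e-12) -> float:
    """L = -(1/M) * sum_i [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]"""
    p = np.clip(p_hat, eps, 1 - eps)  # keep probabilities away from 0 and 1 to avoid log(0)
    return float(-np.mean(y_actual * np.log(p) + (1 - y_actual) * np.log(1 - p)))

# Regression: three predictions against three true values.
mse = mse_loss(np.array([2.0, 4.0, 6.0]), np.array([1.0, 4.0, 8.0]))
print(f"MSE = {mse:.4f}")

# Classification: predicted P(class = 1) against 0/1 labels.
bce = binary_cross_entropy(np.array([0.9, 0.2, 0.6]), np.array([1.0, 0.0, 1.0]))
print(f"BCE = {bce:.4f}")
```

Both functions return a single scalar, which is what the optimizer needs in order to compare one choice of $\phi$ against another.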

The minimization process typically involves an optimization algorithm (like Gradient Descent) that iteratively adjusts $\phi$ in the direction that most rapidly reduces $L(\phi)$ until a satisfactory minimum is reached. This iterative adjustment is the essence of "learning."
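
A minimal sketch of gradient descent for a linear model with an MSE loss is shown below. The synthetic data, the learning rate, and the number of steps are assumptions chosen so that this small example converges. They are not recommended defaults.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(-10, 10, 50)
Y_actual = 2.0 * X + 5.0 + rng.normal(0, 2.0, 50)   # hypothetical data

w, b = 0.0, 0.0          # initial parameters phi = (w, b)
learning_rate = 0.01     # assumed step size
for step in range(2000):
    error = (w * X + b) - Y_actual          # e_i for each example
    grad_w = 2.0 * np.mean(error * X)       # dL/dw for the MSE loss
    grad_b = 2.0 * np.mean(error)           # dL/db for the MSE loss
    w -= learning_rate * grad_w             # step against the gradient
    b -= learning_rate * grad_b

loss = np.mean(((w * X + b) - Y_actual) ** 2)
print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.3f}")
```

Each iteration moves the parameters a small step opposite to the gradient, which is the direction in which the loss decreases fastest locally.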

### Sample Code to Generate Interactive Plots
```python
import numpy as np
import plotly.graph_objects as go
from ipywidgets import interact, FloatSlider, Layout
from IPython.display import display

# --- 1. Data Generation Function ---
def generate_regression_data(num_samples: int = 50, true_w: float = 2.0, true_b: float = 5.0, noise_std: float = 2.0) -> tuple[np.ndarray, np.ndarray]:
    """
    Generates synthetic linear regression data for demonstration purposes.

    The data follows a linear relationship with added Gaussian noise:
    y_true = true_w * X + true_b + noise

    Args:
        num_samples (int): The number of data points to generate.
        true_w (float): The true slope (weight) of the underlying linear relationship.
        true_b (float): The true intercept (bias) of the underlying linear relationship.
        noise_std (float): The standard deviation of the Gaussian noise added to the outputs.

    Returns:
        tuple[np.ndarray, np.ndarray]:
            A tuple containing:
            - X (np.ndarray): The input features, uniformly distributed between -10 and 10.
            - y_true (np.ndarray): The true output values corresponding to X, with noise.
    """
    np.random.seed(42) # for reproducibility
    X = np.random.uniform(-10, 10, num_samples) # Input features
    y_true = true_w * X + true_b + np.random.normal(0, noise_std, num_samples) # True outputs with noise
    return X, y_true

# --- 2. Model Prediction Function ---
def linear_regression_predict(X: np.ndarray, w: float, b: float) -> np.ndarray:
    """
    Predicts outputs using a simple linear regression model.

    The model's prediction is given by:
    y_hat = w * X + b

    Args:
        X (np.ndarray): Input features for which to make predictions.
        w (float): The weight (slope) parameter of the linear model.
        b (float): The bias (intercept) parameter of the linear model.

    Returns:
        np.ndarray: The predicted output values (y_hat).
    """
    return w * X + b

# --- 3. Loss Function ---
def mean_squared_error(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """
    Calculates the Mean Squared Error (MSE) between predicted and true values.

    MSE is a common loss function for regression problems, defined as:
    MSE = (1/N) * sum((y_pred_i - y_true_i)^2)

    Args:
        y_pred (np.ndarray): The array of predicted output values from the model.
        y_true (np.ndarray): The array of true (actual) output values from the dataset.

    Returns:
        float: The calculated Mean Squared Error.
    """
    return np.mean((y_pred - y_true)**2)

# --- 4. Function to Setup Initial Plotly FigureWidget ---
def setup_interactive_plot(X_train: np.ndarray, y_train: np.ndarray, initial_w: float, initial_b: float) -> go.FigureWidget:
    """
    Sets up and returns an initial Plotly FigureWidget for the interactive
    linear regression visualization. It initializes all traces (actual data,
    predicted line, error lines, predicted points) and the plot layout.

    Args:
        X_train (np.ndarray): The input training data features.
        y_train (np.ndarray): The true output training data values.
        initial_w (float): The initial weight (slope) parameter for the predicted line.
        initial_b (float): The initial bias (intercept) parameter for the predicted line.

    Returns:
        go.FigureWidget: The initialized Plotly FigureWidget instance.
    """
    fig = go.FigureWidget()

    # Define a static range for the regression line to ensure it covers the plot width
    x_line_for_plot = np.array([-10, 10])

    # Calculate initial predictions and loss based on initial_w and initial_b
    initial_y_pred_line = linear_regression_predict(x_line_for_plot, initial_w, initial_b)
    initial_y_pred_points = linear_regression_predict(X_train, initial_w, initial_b)
    initial_loss = mean_squared_error(initial_y_pred_points, y_train)

    # Add traces to the figure
    # Trace 0: Actual Training Data (Scatter plot)
    fig.add_trace(go.Scatter(x=X_train, y=y_train, mode='markers',
                             name='Actual Training Data',
                             marker=dict(color='blue', opacity=0.7, size=8)))

    # Trace 1: Predicted Line (Line plot)
    fig.add_trace(go.Scatter(x=x_line_for_plot, y=initial_y_pred_line, mode='lines',
                             name=f'Predicted Line: y = {initial_w:.2f}x + {initial_b:.2f}',
                             line=dict(color='red', width=3)))

    # Trace 2: Individual Error Lines (Scatter plot with 'lines' mode and None for breaks)
    error_x = []
    error_y = []
    for i in range(len(X_train)):
        error_x.extend([X_train[i], X_train[i], None]) # 'None' creates a break between segments
        error_y.extend([y_train[i], initial_y_pred_points[i], None])

    fig.add_trace(go.Scatter(x=error_x, y=error_y, mode='lines',
                             line=dict(color='gray', width=1, dash='dot'),
                             hoverinfo='none', # Disable hover info to keep it clean
                             showlegend=False)) # Hide from legend as it's a visual aid, not a main data series


    # Trace 3: Predicted Points (Scatter plot on the regression line)
    fig.add_trace(go.Scatter(x=X_train, y=initial_y_pred_points, mode='markers',
                             marker=dict(color='green', symbol='circle', size=6, opacity=0.7,
                                         line=dict(color='green', width=1)),
                             hoverinfo='none', # Disable hover info
                             showlegend=False)) # Hide from legend


    # Update general layout properties of the figure
    fig.update_layout(
        xaxis_title='Input Feature X', # Plain text: Plotly renders LaTeX only when the whole string is wrapped in $...$
        yaxis_title='Output Y',
        xaxis_range=[-11, 11],
        yaxis_range=[-18, 28],
        title_text='Interactive Linear Regression: Estimating Parameters ɸ', # Unicode phi for main title
        title_x=0.5, # Center the main title
        hovermode='closest', # Optimizes hover interactions
        template="plotly_white", # Sets a clean white background theme
        
        # --- Legend Position Adjustment ---
        legend=dict(
            x=1.05,        # X-coordinate relative to the plot area (1.0 is right edge)
            y=1,           # Y-coordinate (1.0 is top edge)
            xanchor='left', # Anchor the legend's left side at 'x'
            yanchor='top',  # Anchor the legend's top side at 'y'
            bgcolor='rgba(255,255,255,0.7)', # Semi-transparent background
            bordercolor='Black',
            borderwidth=1
        ),
        # --- Add a margin to the right to make space for the legend ---
        margin=dict(r=150) # Right margin in pixels. Adjust as needed.
    )

    # Add the Mean Squared Error (MSE) loss annotation
    fig.add_annotation(
        x=0.02, y=1.05, xref="paper", yref="paper", # Position: 2% from left, 105% from bottom (above plot)
        text=f'Mean Squared Error (MSE) Loss: {initial_loss:.4f}',
        showarrow=False, # Do not show an arrow pointing from the annotation
        bgcolor="yellow", # Background color for the text box
        opacity=0.9,
        borderpad=4,
        bordercolor="black",
        borderwidth=1,
        font=dict(size=12),
        align="left",
        yanchor="bottom" # Anchor the bottom of the text box at the y-coordinate
    )

    return fig

# --- 5. Function for Interactive Update Logic ---
def create_update_function(fig: go.FigureWidget, X_train: np.ndarray, y_train: np.ndarray, x_line_for_plot: np.ndarray):
    """
    Creates the inner callback function that ipywidgets.interact will execute
    whenever the slider values (w or b) change. This function updates the
    existing Plotly FigureWidget in place for smooth, live interaction.
    """
    def update_plot_plotly(w: float, b: float):
        """
        Updates the Plotly FigureWidget's traces and annotations based on
        the current weight (w) and bias (b) slider values.
        """
        # Calculate new predictions and loss based on current parameters
        y_pred_line = linear_regression_predict(x_line_for_plot, w, b)
        y_pred_points = linear_regression_predict(X_train, w, b)
        current_loss = mean_squared_error(y_pred_points, y_train)

        # Update Predicted Line trace (Trace 1)
        fig.data[1].x = x_line_for_plot # Ensure x-data is consistent
        fig.data[1].y = y_pred_line
        fig.data[1].name = f'Predicted Line: y = {w:.2f}x + {b:.2f}' # Update legend label

        # Update Error Lines trace (Trace 2)
        # Reconstruct x and y data for error lines for each update
        error_x_updated = []
        error_y_updated = []
        for i in range(len(X_train)):
            error_x_updated.extend([X_train[i], X_train[i], None])
            error_y_updated.extend([y_train[i], y_pred_points[i], None])
        fig.data[2].x = error_x_updated
        fig.data[2].y = error_y_updated

        # Update Predicted Points trace (Trace 3)
        fig.data[3].x = X_train # Ensure x-data is consistent
        fig.data[3].y = y_pred_points

        # Update the text of the MSE loss annotation (located at index 0 in annotations list)
        if fig.layout.annotations: # Robustly check if annotations exist
            fig.layout.annotations[0].text = f'Mean Squared Error (MSE) Loss: {current_loss:.4f}'

    return update_plot_plotly

# --- 6. Main Orchestration Function ---
def run_interactive_regression_demo():
    """
    Orchestrates the entire interactive linear regression demonstration.
    This function generates the data, sets up the Plotly FigureWidget,
    creates the interactive widgets, and links them to the plot update function.
    """
    # Define initial parameters for the demo (matching screenshot's starting values)
    initial_w, initial_b = 1.78, 6.20
    num_samples = 50

    # 1. Generate the synthetic training data
    X_train, y_train = generate_regression_data(num_samples=num_samples)

    # 2. Set up the initial Plotly FigureWidget
    interactive_fig = setup_interactive_plot(X_train, y_train, initial_w, initial_b)

    # 3. Create the update function, passing the figure and data
    x_line_for_plot = np.array([-10, 10]) # This range is static for the line
    update_func = create_update_function(interactive_fig, X_train, y_train, x_line_for_plot)

    # 4. Create interactive FloatSlider widgets for weight (w) and bias (b)
    w_slider = FloatSlider(min=-5.0, max=5.0, step=0.01, value=initial_w,
                           description='Weight ɸ₁:', # Using Unicode subscript 1
                           continuous_update=True, layout=Layout(width='auto'))
    b_slider = FloatSlider(min=-10.0, max=15.0, step=0.01, value=initial_b,
                           description='Bias ɸ₂:', # Using Unicode subscript 2
                           continuous_update=True, layout=Layout(width='auto'))

    # 5. Display the Plotly FigureWidget in the Jupyter output
    display(interactive_fig)

    # 6. Link the sliders to the update function using ipywidgets.interact
    # This establishes the dynamic connection between slider movements and plot updates.
    interact(update_func, w=w_slider, b=b_slider)

    # Print guiding instructions for the user
    print("\nAdjust the sliders above to see how changing the model's parameters (ɸ₁ and ɸ₂) affects the predicted line, individual errors, and the overall MSE loss.")
    print("The goal of 'learning' is to find the ɸ₁ and ɸ₂ values that result in the lowest possible MSE loss, meaning the red line best fits the blue data points.")

# --- Execute the main function to run the demo ---

run_interactive_regression_demo()
```