# Curve Fitting

Traditional curve fitting uses a mathematical model, typically a polynomial function, to approximate the underlying relationship between variables in a dataset. Fitting adjusts the parameters of the chosen model to minimize the difference between the model's predictions and the observed data points, often measured by the sum of squared errors. The technique is widely used in fields ranging from physics to finance, wherever a relationship between variables needs to be quantified and predictions are required based on observed trends.
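
As a minimal sketch of the idea, the snippet below fits a degree-2 polynomial to a small noisy dataset by least squares. The toy data (`x_demo`, `y_demo`) and its generating coefficients are assumptions made for illustration; they are not part of this notebook's datasets.

```python
import numpy as np

# Hypothetical toy data: a noisy quadratic (not the notebook's dataset)
rng = np.random.default_rng(0)
x_demo = np.linspace(-3, 3, 50)
y_demo = 0.5 * x_demo**2 - 1.0 * x_demo + 2.0 + rng.normal(0, 0.3, x_demo.size)

# Fit a degree-2 polynomial by minimizing the sum of squared errors.
# np.polyfit returns coefficients ordered from the highest power down.
coeffs = np.polyfit(x_demo, y_demo, deg=2)
y_fit = np.polyval(coeffs, x_demo)

# The quantity that the fit minimizes
sse = np.sum((y_demo - y_fit) ** 2)
print("fitted coefficients:", coeffs)
print("sum of squared errors:", sse)
```

The fitted coefficients should land close to the generating values (0.5, -1.0, 2.0), with the gap set by the noise level.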

Since neural networks extend this concept to more complex and high-dimensional data, we begin with a refresher on the traditional models used in curve fitting: linear models. This notebook explores such models in both regression and binary classification tasks, the latter through the logistic regression formulation of a linear model.

## Data descriptions

#### Linear regression examples
Synthetic data with linear behavior, distributed along a line (or, in the higher-dimensional second example, a plane), was generated to allow learners to **tune** the param

#### Logistic regression exercise
The sample data used here represents a binary classification problem, where each sample belongs to either `Class: 0` or `Class: 1`. This is quite common in tasks that require identifying examples that are `positive` for a particular condition, disease, diagnosis, or some other classification criterion.

The binary classification task takes two input `features`. We can think of features as characteristics of an example. If we consider humans as examples, then the two features can be height & weight, age & gender, gender & income, etc. If we consider this in a medical sense, then the two features can be test results A & B, lifestyle & age, or whatever pair of characteristics is available to us.

In our activity, we will refer to these as `Feature 0` and `Feature 1`.

In [10]:
# # run this cell
# !git clone https://github.com/mikedataCrunch/GMS5204.git
# !mv ./GMS5204/* .
# !rm -rf ./GMS5204

In [11]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider, Layout, widgets
from IPython.display import display
from mpl_toolkits.mplot3d import Axes3D



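def create_line(slope, intercept):
    """Return the two endpoints of the line y = slope * x + intercept.

    Relies on the global array `x` (defined in the data cell below) for the
    horizontal extent, so call it only after `x` exists.
    """
    x_vals = np.array([np.min(x), np.max(x)])
    y_vals = intercept + slope * x_vals
    return x_vals, y_vals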


# Define the model function for the surface
def model_surface(x, y, a, b, bias):
    return a * x + b * y + bias
    
def calculate_bce_loss(y_true, y_pred):
    """
    Calculate the binary cross-entropy loss.

    Parameters:
    -----------
    y_true (array-like): True binary labels (0 or 1).
    y_pred (array-like): Predicted probabilities, between 0 and 1.

    Returns:
    --------
    float: The average binary cross-entropy loss.
    """
    # Ensure that y_pred does not contain values exactly equal to 0 or 1,
    # as log(0) is undefined and can cause computation errors.
    epsilon = 1e-10
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)

    # Calculate binary cross-entropy loss
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss
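
To get a feel for the scale of this loss, here is a small sketch that evaluates it on three hand-made prediction vectors. The helper `bce_demo` repeats the formula above only so that the snippet runs on its own; the labels and probabilities are hypothetical values chosen for illustration.

```python
import numpy as np

def bce_demo(y_true, y_pred, epsilon=1e-10):
    """Same formula as calculate_bce_loss, repeated so this sketch is self-contained."""
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Hypothetical labels and three hypothetical sets of predicted probabilities
labels = np.array([0, 0, 1, 1])
confident_right = np.array([0.1, 0.1, 0.9, 0.9])
uninformative = np.array([0.5, 0.5, 0.5, 0.5])
confident_wrong = np.array([0.9, 0.9, 0.1, 0.1])

loss_right = bce_demo(labels, confident_right)
loss_flat = bce_demo(labels, uninformative)
loss_wrong = bce_demo(labels, confident_wrong)

print(f"confident and right: {loss_right:.4f}")
print(f"always 0.5:          {loss_flat:.4f}")
print(f"confident and wrong: {loss_wrong:.4f}")
```

Predicting 0.5 for every sample gives a loss of ln 2 (about 0.693), which is a useful baseline: a model is only informative if its loss falls below that value, and confident wrong answers are penalized much more heavily than confident right answers are rewarded.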


## Linear Regression

In [12]:
# Generate some sample data
np.random.seed(42)
x_pos = np.random.rand(500) * 10  # random data points between 0 and 10
x_neg = np.random.rand(500) * -10  # random data points between 0 and -10
x = np.concatenate([x_pos, x_neg])
y = 2 * x + 1 + np.random.randn(1000) * 2  # model with noise


# Function to update the plot with the new line
def update_plot(slope, intercept):
    x_vals, y_vals = create_line(slope, intercept)
    plt.scatter(x, y, color='blue', label='Data Points', alpha=0.5)
    plt.plot(x_vals, y_vals, color='red', label='Fit Line')
    plt.title('Interactive Linear Regression')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.legend()
    plt.grid(True)
    plt.show()

slider_style = {'description_width': 'initial', 'handle_color': 'lightblue'}  # Adjust handle color and description width
layout = Layout(width='600px')
# Set up interactive widget
interactive_plot = interactive(
    update_plot,
    slope=FloatSlider(value=-2.0, min=-2.5, max=2.5, step=0.05, style=slider_style, layout=layout, description='Slope:'),
    intercept=FloatSlider(value=-10, min=-10, max=10, step=0.1, style=slider_style, layout=layout, description='Intercept:')
)
display(interactive_plot)


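
After tuning the sliders by eye, it can help to compare against the least-squares answer. The sketch below regenerates the same synthetic data (same seed and recipe as the cell above) under the assumed names `x_data` and `y_data`, and uses a hypothetical helper `mse` to score any slope and intercept pair.

```python
import numpy as np

# Recreate the notebook's synthetic data (same seed and recipe as the cell above)
np.random.seed(42)
x_pos = np.random.rand(500) * 10
x_neg = np.random.rand(500) * -10
x_data = np.concatenate([x_pos, x_neg])
y_data = 2 * x_data + 1 + np.random.randn(1000) * 2

def mse(slope, intercept):
    """Mean squared error of the line y = slope * x + intercept on the data."""
    return np.mean((y_data - (slope * x_data + intercept)) ** 2)

# Least-squares estimate of slope and intercept (degree-1 polynomial fit)
best_slope, best_intercept = np.polyfit(x_data, y_data, deg=1)

print(f"slider defaults (-2.0, -10): MSE = {mse(-2.0, -10):.2f}")
print(f"least squares ({best_slope:.2f}, {best_intercept:.2f}): "
      f"MSE = {mse(best_slope, best_intercept):.2f}")
```

The least-squares values should sit near the slope of 2 and intercept of 1 used to generate the data, and no slider setting can achieve a lower mean squared error on this dataset.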

## Linear Regression with 2 input features

In [13]:
# Generate the grid for the surface plot
x = np.linspace(-10, 10, 20)
y = np.linspace(-10, 10, 20)
x, y = np.meshgrid(x, y)

# Generate simulated data points
np.random.seed(42)
x_points = np.random.uniform(-10, 10, 100)
y_points = np.random.uniform(-10, 10, 100)
z_points = model_surface(x_points, y_points, 1, 1, 0)  # Initial data points on the surface

def plot_surface(a, b, bias, azimuth, elevation):
    z = model_surface(x, y, a, b, bias)
    
    fig = plt.figure(figsize=(10, 7))
    ax = fig.add_subplot(111, projection='3d')
    
    # Plotting the surface
    ax.plot_surface(x, y, z, cmap='viridis', edgecolor='none', alpha=0.6)
    
    # Adding scatter plot for the data points
    ax.scatter(x_points, y_points, z_points, color='red', label='Data Points')
    
    # Setting labels and title
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_zlabel('Z (Output)')
    ax.set_title('Interactive 3D Surface Plot')
    ax.legend()
    
    # Set the view angle
    ax.view_init(elev=elevation, azim=azimuth)
    
    plt.show()

slider_style = {'description_width': 'initial', 'handle_color': 'lightblue'}  # Adjust handle color and description width
layout = Layout(width='600px')

# Create interactive widget
interactive_plot = interactive(
    plot_surface,
    a=FloatSlider(value=1, min=0, max=2, step=0.1, description='Coeff. a:', style=slider_style, layout=layout),
    b=FloatSlider(value=1, min=0, max=2, step=0.1, description='Coeff. b:', style=slider_style, layout=layout),
    bias=FloatSlider(value=0, min=-5, max=5, step=0.1, description='Bias:', style=slider_style, layout=layout),
    azimuth=FloatSlider(value=45, min=0, max=360, step=1, description='Azimuth (deg):', style=slider_style, layout=layout),
    elevation=FloatSlider(value=30, min=0, max=90, step=1, description='Elevation (deg):', style=slider_style, layout=layout)
)
display(interactive_plot)


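
With two input features the same least-squares idea applies, only with a plane instead of a line. The sketch below is one possible way to recover the coefficients with `np.linalg.lstsq`; it regenerates the data points from the cell above, and the names `design`, `a_hat`, `b_hat`, and `bias_hat` are introduced here for illustration.

```python
import numpy as np

# Recreate the notebook's simulated points (same seed and recipe as the cell above)
np.random.seed(42)
x_points = np.random.uniform(-10, 10, 100)
y_points = np.random.uniform(-10, 10, 100)
z_points = 1 * x_points + 1 * y_points + 0  # same plane as model_surface(x, y, 1, 1, 0)

# Design matrix: one column per feature plus a column of ones for the bias
design = np.column_stack([x_points, y_points, np.ones_like(x_points)])

# Solve for the parameters that minimize ||design @ params - z_points||^2
params, residuals, rank, _ = np.linalg.lstsq(design, z_points, rcond=None)
a_hat, b_hat, bias_hat = params

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, bias = {bias_hat:.3f}")
```

Because these points were generated without noise, the recovered parameters match the generating plane (a = 1, b = 1, bias = 0) up to floating-point error.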

## Class Activity: Logistic Regression

In [14]:
# Define the logistic regression function
def logistic_regression(feature_0, feature_1, weights, bias):
    """Return P(Class: 1) for each sample from two features, two weights, and a bias."""
    logit = (weights[0] * feature_0) + (weights[1] * feature_1) + bias
    return 1 / (1 + np.exp(-logit))  # sigmoid function
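
The sigmoid turns the linear score (the logit) into a probability, and the 0.5 threshold corresponds to the line where the logit equals zero. The sketch below illustrates this with hypothetical parameters (`w0`, `w1`, `b`) and three hand-picked points; none of these values come from the notebook.

```python
import numpy as np

def sigmoid(logit):
    """Map any real-valued logit to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-logit))

# Hypothetical parameters chosen for illustration only
w0, w1, b = 1.0, 1.0, -7.0

# Three hypothetical points: below, on, and above the line w0*f0 + w1*f1 + b = 0
points = np.array([[2.0, 2.0], [3.5, 3.5], [5.0, 5.0]])
logits = w0 * points[:, 0] + w1 * points[:, 1] + b
probs = sigmoid(logits)
predicted_class = (probs >= 0.5).astype(int)

def boundary_f1(f0):
    """Feature 1 value on the decision boundary for a given Feature 0 (requires w1 != 0)."""
    return -(w0 * f0 + b) / w1

for point, logit, prob, cls in zip(points, logits, probs, predicted_class):
    print(f"point={point}, logit={logit:+.1f}, probability={prob:.4f}, class={cls}")
```

A point with a logit of exactly zero gets a probability of 0.5, so changing the weights rotates the boundary line and changing the bias shifts it.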

In [15]:
# Generate synthetic data
np.random.seed(42)
# Class 0
feature_0_class_0 = np.random.normal(2, 1, 100)  # Feature 0 for class 0
feature_1_class_0 = np.random.normal(2, 1, 100)  # Feature 1 for class 0
# Class 1
feature_0_class_1 = np.random.normal(5, 1, 100)  # Feature 0 for class 1
feature_1_class_1 = np.random.normal(5, 1, 100)  # Feature 1 for class 1

features = np.vstack((np.column_stack((feature_0_class_0, feature_1_class_0)),
                      np.column_stack((feature_0_class_1, feature_1_class_1))))
y_true = np.array([0]*100 + [1]*100)

(200,)

In [22]:
y_true[:y_true.size // 2]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [35]:
def plot_decision_boundary(weight1, weight2, bias):
    # Grid for decision boundary visualization
    feature_0, feature_1 = np.meshgrid(
        np.linspace(
            min(np.concatenate([feature_0_class_0, feature_0_class_1])), 
            max(np.concatenate([feature_0_class_0, feature_0_class_1])), 
            50
        ),
        np.linspace(
            min(np.concatenate([feature_1_class_0, feature_1_class_1])), 
            max(np.concatenate([feature_1_class_0, feature_1_class_1])), 
            50
        ),
    )
    
    # Calculate z values for the contour
    zz = logistic_regression(feature_0, feature_1, [weight1, weight2], bias)
    zz = zz.reshape(feature_0.shape)
    
    plt.figure(figsize=(10, 8))
    plt.scatter(
        features[:,0][:y_true.size // 2], 
        features[:,1][:y_true.size // 2], 
        c='blue',
        label='Class: 0',
        alpha=0.8
    )
    plt.scatter(
        features[:,0][y_true.size // 2:], 
        features[:,1][y_true.size // 2:], 
        c='red',
        label='Class: 1',
        alpha=0.8
    )
    
    contour = plt.contourf(
        feature_0, 
        feature_1,    
        zz, 
        levels=[0, 0.5, 1],
        cmap="coolwarm", 
        alpha=0.3)
    
    plt.colorbar(contour)
    # Decision boundary line for the zz = 0.5 threshold
    plt.contour(
        feature_0, 
        feature_1,
        zz, 
        levels=[0.5], 
        colors='k', 
        vmin=0, vmax=1, 
        linestyles='dashed')
    
    plt.title('Interactive Logistic Regression Decision Boundary')
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.legend()
    plt.grid(True)
    plt.show()

slider_style = {'description_width': 'initial', 'handle_color': 'lightblue'}  # Adjust handle color and description width
layout = Layout(width='600px')

# Create interactive widget
interactive_plot = interactive(
    plot_decision_boundary,                       
    weight1=FloatSlider(value=1, min=-15, max=15, step=0.01, style=slider_style, layout=layout, description='Weight 1:'),                 
    weight2=FloatSlider(value=-1, min=-15, max=15, step=0.01, style=slider_style, layout=layout, description='Weight 2:'),       
    bias=FloatSlider(value=0, min=-20, max=20, step=0.01, style=slider_style, layout=layout, description='Bias:'))
display(interactive_plot)


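
A quick numeric check can complement the visual one. The sketch below regenerates the same synthetic data and computes the classification accuracy for a given slider setting. The helper `accuracy` and the candidate setting (1, 1, -7) are assumptions introduced for illustration, not values prescribed by the activity.

```python
import numpy as np

# Recreate the notebook's synthetic data (same seed and draw order as the data cell)
np.random.seed(42)
feature_0_class_0 = np.random.normal(2, 1, 100)
feature_1_class_0 = np.random.normal(2, 1, 100)
feature_0_class_1 = np.random.normal(5, 1, 100)
feature_1_class_1 = np.random.normal(5, 1, 100)
features = np.vstack((np.column_stack((feature_0_class_0, feature_1_class_0)),
                      np.column_stack((feature_0_class_1, feature_1_class_1))))
y_true = np.array([0] * 100 + [1] * 100)

def accuracy(weight1, weight2, bias):
    """Fraction of samples whose predicted class (probability >= 0.5) matches the label."""
    logit = weight1 * features[:, 0] + weight2 * features[:, 1] + bias
    prob = 1 / (1 + np.exp(-logit))
    return np.mean((prob >= 0.5).astype(int) == y_true)

acc_default = accuracy(1.0, -1.0, 0.0)    # the sliders' starting values
acc_candidate = accuracy(1.0, 1.0, -7.0)  # hypothetical setting: boundary f0 + f1 = 7

print(f"default sliders:   accuracy = {acc_default:.3f}")
print(f"candidate setting: accuracy = {acc_candidate:.3f}")
```

The starting slider values draw a boundary that runs along the direction separating the two clusters, so they do little better than guessing, while a boundary that passes between the cluster centers classifies most samples correctly.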

In [28]:
weight_1 = widgets.FloatText(
    value=0.0,
    description='Weight 1:',
    step=0.1,
    style={'description_width': 'initial'}
)

weight_2 = widgets.FloatText(
    value=0.0,
    description='Weight 2:',
    step=0.1,
    style={'description_width': 'initial'}
)

bias = widgets.FloatText(
    value=0.0,
    description='Bias:',
    step=0.1,
    style={'description_width': 'initial'}
)

# Display the widget
display(weight_1, weight_2, bias)

# Output widget to display the results
output = widgets.Output()
display(output)

# Function to update the output based on the inputs
def update_output(*args):
    with output:
        output.clear_output()
        
        y_pred = logistic_regression(
            features[:, 0], 
            features[:, 1], 
            weights=[weight_1.value, weight_2.value], 
            bias=bias.value
        )
            
        # Calculate loss
        loss = calculate_bce_loss(y_true, y_pred)
        
        print(f"Weight 1 Value: {weight_1.value}")
        print(f"Weight 2 Value: {weight_2.value}")
        print(f"Bias Value: {bias.value}")

        print(f"BCE LOSS (lower better): {loss:.4f}")

# Observe changes in each widget and call update_output when any value changes
weight_1.observe(update_output, names='value')
weight_2.observe(update_output, names='value')
bias.observe(update_output, names='value')

# Show the loss for the initial values without waiting for the first change
update_output()

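
Manually searching for weights that lower the BCE loss is what an optimizer automates. As a rough sketch, the snippet below runs plain gradient descent on the same synthetic data. Several choices here are assumptions made for illustration: the learning rate, the number of steps, the helper `bce`, and the decision to center the features first (which keeps the updates stable) before converting the bias back to the original feature scale.

```python
import numpy as np

# Recreate the notebook's synthetic data (same seed and draw order as the data cell)
np.random.seed(42)
f0_c0 = np.random.normal(2, 1, 100)
f1_c0 = np.random.normal(2, 1, 100)
f0_c1 = np.random.normal(5, 1, 100)
f1_c1 = np.random.normal(5, 1, 100)
features = np.vstack((np.column_stack((f0_c0, f1_c0)),
                      np.column_stack((f0_c1, f1_c1))))
y_true = np.array([0] * 100 + [1] * 100)

def bce(y, p, epsilon=1e-10):
    """Binary cross-entropy, same formula as calculate_bce_loss."""
    p = np.clip(p, epsilon, 1 - epsilon)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Center the features so that gradient descent is well conditioned (an assumption
# of this sketch; the notebook's sliders work on the raw features)
feature_mean = features.mean(axis=0)
centered = features - feature_mean

weights = np.zeros(2)
bias_centered = 0.0
learning_rate = 0.5  # assumed value
n_steps = 2000       # assumed value

initial_loss = bce(y_true, np.full(y_true.shape, 0.5))  # loss with all parameters at zero
for _ in range(n_steps):
    prob = 1 / (1 + np.exp(-(centered @ weights + bias_centered)))
    error = prob - y_true  # gradient of the BCE loss with respect to the logit
    weights -= learning_rate * (centered.T @ error) / y_true.size
    bias_centered -= learning_rate * error.mean()

# Convert the bias back so the parameters apply to the raw features
bias = bias_centered - weights @ feature_mean
final_prob = 1 / (1 + np.exp(-(features @ weights + bias)))
final_loss = bce(y_true, final_prob)

print(f"initial loss: {initial_loss:.4f}")
print(f"final loss:   {final_loss:.4f}")
print(f"weights: {weights}, bias: {bias:.3f}")
```

The printed weights and bias can be typed into the input boxes above to check that the notebook reports the same loss.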

## End.