# Explainable Artificial Intelligence with MicroPython
#### A comprehensive codebook by Prof. Dr. habil. Dennis Klinkhammer (2025)

## Statistical Basics - I
### Mean, Variance and Standard Deviation

One variable of the **trees dataset**, provided by Atkinson, A. C. (1985): *Plots, Transformations and Regression* via Oxford University Press:

In [1]:
# Girth (x) of Black Cherry Trees
x = [8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11, 11.1, 11.2, 11.3, 
     11.4, 11.4, 11.7, 12, 12.9, 12.9, 13.3, 13.7, 13.8, 14, 
     14.2, 14.5, 16, 16.3, 17.3, 17.5, 17.9, 18, 18, 20.6]

The **mean**, also known as the arithmetic mean, is one of the most common measures of central tendency in statistics. It represents the average value of a dataset and provides a single value that summarizes the entire data distribution. To calculate the mean, you sum all values in your dataset and divide this total by the number of values:

In [2]:
# Mean
def mean(data):
    return sum(data) / len(data)

The **sample variance** is a measure of how spread out the values in a dataset are. It quantifies the average squared deviation from the mean, giving insight into the variability within the sample. Unlike the population variance, which divides by $n$, it divides by $n-1$ to account for the lost degree of freedom, which makes it an unbiased estimator of the population variance when working with a sample:

In [3]:
# Variance
def variance(data):
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)
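
To make the difference to the population variance concrete, here is a minimal sketch that computes both on a small set of made-up values. The names `population_variance` and `sample_variance` are illustrative and not part of the codebook; the two results differ exactly by the factor n / (n - 1):

```python
def mean(data):
    return sum(data) / len(data)

def population_variance(data):
    # Divide the sum of squared deviations by n
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def sample_variance(data):
    # Divide the sum of squared deviations by n - 1
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical values, mean = 5
n = len(data)
pop_var = population_variance(data)  # 32 / 8 = 4.0
samp_var = sample_variance(data)     # 32 / 7, roughly 4.571
print("Population variance:", pop_var)
print("Sample variance:", samp_var)
print("Ratio:", samp_var / pop_var, "vs n / (n - 1):", n / (n - 1))
```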

The **standard deviation** is the square root of the variance and provides a measure of spread in the same units as the original data. It indicates how much the values in a dataset typically deviate from the mean, making it easier to interpret than variance in practical terms:

In [4]:
# Standard Deviation
def std_dev(data):
    return variance(data) ** 0.5

These are **application examples** for mean, sample variance and standard deviation in MicroPython:

In [5]:
# Application Examples
print("Mean", mean(x))
print("Variance", variance(x))
print("Standard Deviation", std_dev(x))

Mean 13.248387096774193
Variance 9.847913978494624
Standard Deviation 3.1381386168387504


## Statistical Basics - II
### Covariance, Correlation and Simple Linear Regression

Two variables of the **trees dataset**, provided by Atkinson, A. C. (1985): *Plots, Transformations and Regression* via Oxford University Press:

In [6]:
# Girth (x) and Volume (y) of Black Cherry Trees
x = [8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11, 11, 11.1, 11.2, 11.3, 
     11.4, 11.4, 11.7, 12, 12.9, 12.9, 13.3, 13.7, 13.8, 14, 
     14.2, 14.5, 16, 16.3, 17.3, 17.5, 17.9, 18, 18, 20.6]

y = [10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9,
     24.2, 21, 21.4, 21.3, 19.1, 22.2, 33.8, 27.4, 25.7, 24.9,
     34.5, 31.7, 36.3, 38.3, 42.6, 55.4, 55.7, 58.3, 51.5, 51, 77]

The **covariance** measures the directional relationship between two variables. A positive covariance indicates that the variables tend to increase together, while a negative covariance suggests that as one increases, the other tends to decrease. It's a foundational concept in statistics for understanding how two variables vary together:

In [7]:
# Covariance
def covariance(x, y):
    mx = mean(x)
    my = mean(y)
    return sum((x[i] - mx) * (y[i] - my) for i in range(len(x))) / (len(x) - 1)

The **correlation** quantifies the strength and direction of the linear relationship between two variables. It standardizes the covariance by dividing it by the product of the standard deviations, resulting in a value between -1 and 1. A correlation close to 1 or -1 indicates a strong relationship, while a value near 0 suggests little to no linear association:

In [8]:
# Correlation
def correlation(x, y):
    return covariance(x, y) / (std_dev(x) * std_dev(y))
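
Because the correlation divides the covariance by both standard deviations, it does not depend on the units of measurement, whereas the covariance does. A minimal sketch for the first five trees, assuming the girth is given in inches and converting it to centimetres (factor 2.54); the helper functions repeat the definitions above so that the block runs on its own:

```python
def mean(data):
    return sum(data) / len(data)

def variance(data):
    m = mean(data)
    return sum((v - m) ** 2 for v in data) / (len(data) - 1)

def std_dev(data):
    return variance(data) ** 0.5

def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((x[i] - mx) * (y[i] - my) for i in range(len(x))) / (len(x) - 1)

def correlation(x, y):
    return covariance(x, y) / (std_dev(x) * std_dev(y))

# First five trees: girth (assumed to be in inches) and volume
girth_in = [8.3, 8.6, 8.8, 10.5, 10.7]
volume = [10.3, 10.3, 10.2, 16.4, 18.8]
girth_cm = [g * 2.54 for g in girth_in]   # rescale to centimetres

print("Covariance (in):", covariance(girth_in, volume))
print("Covariance (cm):", covariance(girth_cm, volume))    # 2.54 times larger
print("Correlation (in):", correlation(girth_in, volume))
print("Correlation (cm):", correlation(girth_cm, volume))  # unchanged
```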

A **simple linear regression** models the relationship between two variables by fitting a straight line to the data using ordinary least squares. It calculates the slope $b$ and the intercept $a$ of the line $y = a + bx$, where $b$ indicates how much $y$ changes for each unit increase in $x$, and $a$ is the predicted value of $y$ when $x = 0$:

In [9]:
# Simple Linear Regression
def linear_regression(x, y):
    b = covariance(x, y) / variance(x)
    a = mean(y) - b * mean(x)
    return a, b
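
Written as formulas, the function above estimates the slope and the intercept by least squares, with $s_{xy}$ the sample covariance, $s_x^2$ the sample variance of $x$, and $\bar{x}$, $\bar{y}$ the means. Since both $s_{xy}$ and $s_x^2$ divide by $n-1$, this factor cancels:

$$
b = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}
$$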

The **predict function** computes the predicted y value for a given x value from the intercept a and the slope b:

In [10]:
# Predict Function
def predict(x_new, a, b):
    return a + b * x_new

**Residuals** represent the differences between the observed values and the predicted values from a linear regression model. They indicate how well the model fits the data: a residual close to 0 means a good fit, while larger residuals suggest that the model doesn't capture the data as accurately. The residuals can be used to assess the assumptions of linear regression and identify any outliers:

In [11]:
# Residuals
def residuals(x, y, a, b):
    return [y[i] - (a + b * x[i]) for i in range(len(x))]
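
A useful property for checking an implementation: for a least-squares line with an intercept, the residuals sum to zero (up to rounding) and are uncorrelated with x. A minimal self-contained sketch on the first five trees, with the codebook's helper names redefined so that the block runs on its own:

```python
def mean(data):
    return sum(data) / len(data)

def linear_regression(x, y):
    # Least-squares slope and intercept (the n - 1 factors cancel)
    mx, my = mean(x), mean(y)
    sxy = sum((x[i] - mx) * (y[i] - my) for i in range(len(x)))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residuals(x, y, a, b):
    return [y[i] - (a + b * x[i]) for i in range(len(x))]

x = [8.3, 8.6, 8.8, 10.5, 10.7]
y = [10.3, 10.3, 10.2, 16.4, 18.8]
a, b = linear_regression(x, y)
res = residuals(x, y, a, b)

print("Sum of residuals:", sum(res))                                  # close to 0
print("Sum of residuals * x:", sum(r * xi for r, xi in zip(res, x)))  # close to 0
```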

The **coefficient of determination** measures the proportion of variance in the dependent variable that is explained by the independent variable in a regression model. It indicates the goodness of fit: a coefficient of determination close to 1 means that the model explains most of the variance, while a value near 0 suggests that the model captures little of the variability:

In [12]:
# Coefficient of Determination
def r_squared(x, y, a, b):
    y_mean = mean(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    ss_res = sum((y[i] - (a + b * x[i])) ** 2 for i in range(len(y)))
    return 1 - ss_res / ss_tot
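
For a simple linear regression, the coefficient of determination equals the squared correlation, which offers a quick cross-check: in the application example below, 0.9671² ≈ 0.9353 matches the printed value. A self-contained sketch on the first five trees:

```python
def mean(data):
    return sum(data) / len(data)

def correlation(x, y):
    # Pearson correlation (the n - 1 factors cancel)
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def r_squared(x, y, a, b):
    y_mean = mean(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    ss_res = sum((y[i] - (a + b * x[i])) ** 2 for i in range(len(y)))
    return 1 - ss_res / ss_tot

x = [8.3, 8.6, 8.8, 10.5, 10.7]
y = [10.3, 10.3, 10.2, 16.4, 18.8]

# Least-squares fit
mx, my = mean(x), mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

print("R squared:", r_squared(x, y, a, b))
print("r squared:", correlation(x, y) ** 2)
```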

These are **application examples** for covariance, correlation and the simple linear regression with the corresponding prediction, residuals and coefficient of determination in MicroPython:

In [13]:
# Application Examples
print("Covariance:", covariance(x, y))
print("Correlation:", correlation(x, y))

a, b = linear_regression(x, y)
print("\nSingle Linear Regression: y = {:.2f} + {:.2f} * x".format(a, b))
print("Predictions for x = 11.4:", predict(11.4, a, b))
print("\nResiduals:", residuals(x, y, a, b))
print("\nCoefficient of Determination:", r_squared(x, y, a, b))

Covariance: 49.888118279569895
Correlation: 0.9671193682556305

Single Linear Regression: y = -36.94 + 5.07 * x
Predictions for x = 11.4: 20.8073040958404

Residuals: [5.196850814975267, 3.6770938881221404, 2.563922603553383, 0.15196668471898533, 1.5387954001502386, 1.9322097578658521, -3.1809615267028963, -0.5809615267028967, 3.3124528310127275, 0.10586718872835377, 3.8992815464439694, 0.19269590415959925, 0.5926959041595978, -1.0270610226935268, -4.74681794954666, -6.20608873010605, 5.393911269893948, -3.0324312992435623, -6.758773868381059, -8.065359510665438, 0.5214692047658076, -3.291702079802949, -0.21145900665607087, -5.810243640921726, -3.030000567774856, 4.704143009381376, 3.990971724812624, 4.564629155675121, -2.7419564866092543, -3.2419564866092543, 9.586816813996947]

Coefficient of Determination: 0.93531987245517


## Machine Learning: Regression
### Multiple Linear Regression

Three variables of the **trees dataset**, provided by Atkinson, A. C. (1985): *Plots, Transformations and Regression* via Oxford University Press:

In [14]:
# Girth (x1), Height (x2) and Volume (y) of Black Cherry Trees 
X = [[8.3, 70], [8.6, 65], [8.8, 63], [10.5, 72], [10.7, 81], [10.8, 83], [11, 66], [11, 75], [11.1, 80], [11.2, 75],
    [11.3, 79], [11.4, 76], [11.4, 76], [11.7, 69], [12, 75], [12.9, 74], [12.9, 85], [13.3, 86], [13.7, 71], [13.8, 64],
    [14, 78], [14.2, 80], [14.5, 74], [16, 72], [16.3, 77], [17.3, 81], [17.5, 82], [17.9, 80], [18, 80], [18, 80], [20.6, 87]]

y = [10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9,
     24.2, 21, 21.4, 21.3, 19.1, 22.2, 33.8, 27.4, 25.7, 24.9,
     34.5, 31.7, 36.3, 38.3, 42.6, 55.4, 55.7, 58.3, 51.5, 51, 77]

**Matrix inversion** is essential for solving systems of linear equations, particularly in methods like multiple linear regression. The following code inverts a matrix via Gauss-Jordan elimination with partial pivoting: for each column, it swaps the row with the largest absolute pivot into place and raises an error if no non-zero pivot can be found, in which case the matrix is not invertible:

In [15]:
# Mathematical Basics - Matrix Inversion
def invert_matrix(matrix):
    n = len(matrix)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    m = [row[:] for row in matrix]

    for i in range(n):
        max_row = i
        max_val = abs(m[i][i])
        for k in range(i + 1, n):
            if abs(m[k][i]) > max_val:
                max_val = abs(m[k][i])
                max_row = k

        if max_val == 0:
            raise ValueError("Matrix is not invertible!")

        if max_row != i:
            m[i], m[max_row] = m[max_row], m[i]
            identity[i], identity[max_row] = identity[max_row], identity[i]

        factor = m[i][i]
        for j in range(n):
            m[i][j] /= factor
            identity[i][j] /= factor

        for k in range(n):
            if k != i:
                factor = m[k][i]
                for j in range(n):
                    m[k][j] -= factor * m[i][j]
                    identity[k][j] -= factor * identity[i][j]

    return identity
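
For a 2 × 2 matrix, the inverse also has a closed form, which is handy for cross-checking the Gauss-Jordan result by hand and shows when a matrix is not invertible: its determinant is zero. A minimal sketch; `invert_2x2` and `identity_check` are illustrative helpers, not part of the codebook:

```python
def invert_2x2(m):
    # Closed-form inverse: 1 / det * [[d, -b], [-c, a]]
    a, b = m[0]
    c, d = m[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("Matrix is not invertible!")
    return [[d / det, -b / det], [-c / det, a / det]]

def identity_check(m, m_inv, tol=1e-12):
    # Multiply m by its inverse and compare the product with the identity matrix
    n = len(m)
    for i in range(n):
        for j in range(n):
            val = sum(m[i][k] * m_inv[k][j] for k in range(n))
            if abs(val - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

A = [[4.0, 7.0], [2.0, 6.0]]       # det = 4 * 6 - 7 * 2 = 10
A_inv = invert_2x2(A)
print(A_inv)                        # [[0.6, -0.7], [-0.2, 0.4]]
print(identity_check(A, A_inv))     # True

try:
    invert_2x2([[1.0, 2.0], [2.0, 4.0]])   # second row = 2 * first row
except ValueError as e:
    print(e)                        # Matrix is not invertible!
```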

**Matrix transposition** involves flipping a matrix over its diagonal, converting rows into columns and vice versa. The resulting matrix is called the transpose of the original matrix. Transposition is commonly used in linear algebra, especially in operations like solving systems of equations or adjusting data representations:

In [16]:
# Mathematical Basics - Matrix Transposition
def transpose(matrix):
    return [[row[i] for row in matrix] for i in range(len(matrix[0]))]
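
A quick usage sketch: transposing a 2 × 3 matrix yields a 3 × 2 matrix, and transposing twice returns the original. Python's built-in `zip(*matrix)` gives the same result as tuples, which can be converted to lists; the `transpose` function is repeated so that the block runs on its own:

```python
def transpose(matrix):
    return [[row[i] for row in matrix] for i in range(len(matrix[0]))]

M = [[1, 2, 3],
     [4, 5, 6]]

MT = transpose(M)
print(MT)                               # [[1, 4], [2, 5], [3, 6]]
print(transpose(MT) == M)               # True

# Equivalent one-liner with the built-in zip
print([list(col) for col in zip(*M)])   # [[1, 4], [2, 5], [3, 6]]
```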

**Matrix multiplication** combines two matrices into a new one: each entry of the result is the sum of the products of a row of the first matrix and a column of the second, which requires the number of columns of the first matrix to equal the number of rows of the second. This operation is essential in many areas of linear algebra, including solving systems of linear equations and applying transformations. It is important for multiple linear regression because the coefficients are obtained by multiplying the inverse of $X^\top X$, where $X$ is the design matrix, with $X^\top y$:

In [17]:
# Mathematical Basics - Matrix Multiplication
def matmul(A, B):
    result = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            val = sum(A[i][k] * B[k][j] for k in range(len(B)))
            row.append(val)
        result.append(row)
    return result
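
The `matmul` function above assumes that the number of columns of `A` equals the number of rows of `B`; otherwise it either silently ignores surplus columns of `A` or fails with an index error. A minimal sketch of a variant with an explicit check; the name `matmul_checked` is hypothetical:

```python
def matmul_checked(A, B):
    # The number of columns of A must match the number of rows of B
    if len(A[0]) != len(B):
        raise ValueError("Incompatible shapes: {}x{} and {}x{}".format(
            len(A), len(A[0]), len(B), len(B[0])))
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]         # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]          # 3 x 2

print(matmul_checked(A, B))   # [[58, 64], [139, 154]]

try:
    matmul_checked(A, A)      # 2 x 3 times 2 x 3 is undefined
except ValueError as e:
    print(e)                  # Incompatible shapes: 2x3 and 2x3
```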

With these mathematical basics, the **multiple linear regression** can be calculated as follows:

In [18]:
# Multiple Linear Regression
def multivariate_regression(X_raw, y):
    # Prepend a column of ones so that the first coefficient is the intercept
    X = [[1] + row for row in X_raw]
    y_vec = [[val] for val in y]

    # Normal equation: beta = (X^T X)^-1 X^T y
    XT = transpose(X)
    XTX = matmul(XT, X)
    XTX_inv = invert_matrix(XTX)
    XTy = matmul(XT, y_vec)

    beta = matmul(XTX_inv, XTy)
    return [b[0] for b in beta]
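
In matrix notation, the function solves the normal equation, where $X$ is the design matrix with a leading column of ones for the intercept and $y$ is the vector of observed values:

$$
X = \begin{pmatrix} 1 & x_{11} & x_{12} \\ \vdots & \vdots & \vdots \\ 1 & x_{n1} & x_{n2} \end{pmatrix}, \qquad
\hat{\beta} = (X^\top X)^{-1} X^\top y = \begin{pmatrix} a \\ b_1 \\ b_2 \end{pmatrix}
$$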

A slightly modified **predict function** computes the predicted y values for a list of cases from the coefficients in beta:

In [19]:
# Predict Function
def predict_multi(X_raw, beta):
    X = [[1] + row for row in X_raw]
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

Again, **residuals** represent the differences between the observed values and the predicted values:

In [20]:
# Residuals
def residuals_multi(X_raw, y, beta):
    y_pred = predict_multi(X_raw, beta)
    return [yi - y_hat for yi, y_hat in zip(y, y_pred)]

The **coefficient of determination** for a multiple linear regression model measures how well the model's predictions match the observed data. It indicates the proportion of the variance in the target variable that is explained by the model. Its interpretation is therefore similar to the coefficient of determination of a simple linear regression model: for a least-squares model with an intercept, it ranges from 0 to 1, where a value close to 1 means that the model explains most of the variance and a value close to 0 means that it explains little of it:

In [21]:
# Coefficient of Determination
def r_squared_multi(X_raw, y, beta):
    y_pred = predict_multi(X_raw, beta)
    y_mean = sum(y) / len(y)
    
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    ss_res = sum((yi - y_hat) ** 2 for yi, y_hat in zip(y, y_pred))
    
    return 1 - ss_res / ss_tot if ss_tot != 0 else 0

Finally, these are **application examples** for the coefficients of the multiple linear regression with the corresponding prediction for one case, the residuals of the model and the coefficient of determination in MicroPython:

In [22]:
# Application Example
beta = multivariate_regression(X, y)
print("Coefficients:")
for i, b in enumerate(beta):
    if i == 0:
        print("a =", b)
    else:
        print("b{} = {}".format(i, b))

x_case_13 = [X[12]]
y_pred_13 = predict_multi(x_case_13, beta)[0]
print("\nPredictions for x1 = 11.4 and x2 = 76:", y_pred_13)

residual_values = residuals_multi(X, y, beta)
print("\nResiduals:", residual_values)

r2 = r_squared_multi(X, y, beta)
print("\nCoefficient of Determination:", r2)

Coefficients:
a = -57.987658918381555
b1 = 4.708160503017467
b2 = 0.3392512342447134

Predictions for x1 = 11.4 and x2 = 76: 21.46846461861579

Residuals: [5.462340346206641, 5.746148366524974, 5.3830187344109, 0.5258847710787862, -1.069008437727124, -1.3183269565183018, -0.5926880749616661, -1.0459491831640868, 1.186978595310606, -0.2875812837675795, 2.184597728951818, -0.4684646186157906, -0.068464618615792, 0.7938458701919693, -4.854109686181552, -5.652202904652565, 2.2160335186555855, -6.4064819167961105, -4.900977604332386, -3.797035014921157, 0.11181560504937238, -4.30831896404354, 0.9147402905194895, -3.4689979955172845, -2.2777023176460958, 4.457132242357581, 3.4762489075093868, 4.871487174791824, -2.399328875509923, -2.899328875509923, 8.484695176931666]

Coefficient of Determination: 0.9479500377816745


## Machine Learning: Classification
### Multiple Logistic Regression

Three variables of the **trees dataset**, provided by Atkinson, A. C. (1985): *Plots, Transformations and Regression* via Oxford University Press. The dependent variable has been dichotomized: a volume greater than 20 is coded as 1, otherwise as 0:

In [23]:
# Girth (x1), Height (x2) and Binary Volume (y) of Black Cherry Trees
X = [[8.3, 70], [8.6, 65], [8.8, 63], [10.5, 72], [10.7, 81], [10.8, 83], [11, 66], [11, 75], [11.1, 80], [11.2, 75],
    [11.3, 79], [11.4, 76], [11.4, 76], [11.7, 69], [12, 75], [12.9, 74], [12.9, 85], [13.3, 86], [13.7, 71], [13.8, 64],
    [14, 78], [14.2, 80], [14.5, 74], [16, 72], [16.3, 77], [17.3, 81], [17.5, 82], [17.9, 80], [18, 80], [18, 80], [20.6, 87]]


y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

The **sigmoid function** in (multiple) logistic regression maps any real-valued input to the range between 0 and 1. Its S-shaped curve allows the result to be interpreted as a probability, which makes it well suited for binary classification, for example with 0 = no and 1 = yes:

In [24]:
# Mathematical Basics - Sigmoid Function
def sigmoid(z):
    return 1 / (1 + pow(2.71828, -z))
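
Since 2.71828 only approximates Euler's number e, the sigmoid above deviates very slightly from the exact function. A minimal sketch comparing it with `math.exp`, assuming the `math` module is available (as in CPython and most MicroPython ports); `sigmoid_exact` is an illustrative name:

```python
import math

def sigmoid(z):
    return 1 / (1 + pow(2.71828, -z))

def sigmoid_exact(z):
    return 1 / (1 + math.exp(-z))

for z in [-6, -2, 0, 2, 6]:
    diff = abs(sigmoid(z) - sigmoid_exact(z))
    print("z =", z, "approx =", sigmoid(z), "exact =", sigmoid_exact(z), "diff =", diff)

# At z = 0 both versions return exactly 0.5, the threshold used for classification
```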

The **log function** approximates the natural logarithm numerically via the limit definition $\ln(x) = \lim_{n \to \infty} n\,(x^{1/n} - 1)$ with a finite $n = 1000$, which is useful when no built-in logarithm is available in MicroPython. The natural logarithm (ln) is the inverse of the exponential function: it returns the power to which $e \approx 2.71828$ must be raised to obtain a given number:

In [25]:
# Mathematical Basics - Log Function
def log(x):
    n = 1000.0
    return n * ((x**(1/n)) - 1)
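
Because n is finite, the approximation carries a small error that grows with the distance of x from 1, roughly ln(x)² / (2n). A minimal sketch comparing it with `math.log`, assuming the `math` module is available for the comparison; note that the function is only defined for x > 0:

```python
import math

def log(x):
    n = 1000.0
    return n * ((x ** (1 / n)) - 1)

for x in [0.1, 0.5, 1.0, 2.0, 10.0]:
    approx = log(x)
    exact = math.log(x)
    # The error is approximately ln(x)**2 / (2 * n) and always slightly positive
    print("x =", x, "approx =", approx, "exact =", exact, "error =", approx - exact)
```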

Based on these mathematical basics, a function for the **prediction of probabilities** is required: it computes the weighted sum of the inputs plus the bias (the logit) and passes it through the sigmoid function:

In [26]:
# Mathematical Basics - Prediction of Probability
def predict_proba(x_input, weights, bias):
    z = bias
    for i in range(len(x_input)):
        z += weights[i] * x_input[i]
    return sigmoid(z)

This function trains a logistic regression model using **gradient descent**. In every epoch, it computes the predicted probabilities of all cases via the sigmoid function, accumulates the gradient of the binary cross-entropy (log loss) with respect to the weights and the bias, and moves the weights and the bias a small step, scaled by the learning rate `lr`, against this gradient. By repeating this for a fixed number of epochs, the model gradually learns to classify the input data:

In [27]:
# Multivariate Logistic Regression via Gradient Descent
def train_logistic_regression(X, y, lr=0.01, epochs=5000):
    n_samples = len(X)
    n_features = len(X[0])
    weights = [0] * n_features
    bias = 0

    for _ in range(epochs):
        grad_w = [0] * n_features
        grad_b = 0
        for i in range(n_samples):
            z = bias
            for j in range(n_features):
                z += weights[j] * X[i][j]
            p = sigmoid(z)
            error = p - y[i]
            for j in range(n_features):
                grad_w[j] += error * X[i][j]
            grad_b += error
        for j in range(n_features):
            weights[j] -= lr * grad_w[j] / n_samples
        bias -= lr * grad_b / n_samples

    return weights, bias
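
The quantity minimized by these updates is the binary cross-entropy (log loss). A minimal self-contained sketch that computes it before and after a short training run, so that the decrease can be observed. It assumes the `math` module for the logarithm and the exponential and uses a small hypothetical one-feature dataset; `log_loss` and `train` are illustrative names, with `train` following the same update rule as `train_logistic_regression`. The small constant `eps` keeps the logarithm away from 0:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def log_loss(X, y, weights, bias, eps=1e-12):
    # Mean binary cross-entropy: -[y * ln(p) + (1 - y) * ln(1 - p)]
    total = 0.0
    for xi, yi in zip(X, y):
        z = bias + sum(w * v for w, v in zip(weights, xi))
        p = min(max(sigmoid(z), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def train(X, y, lr=0.1, epochs=500):
    # Same gradient descent update as train_logistic_regression above
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias

# Hypothetical toy data: one feature, label 1 for larger values
X_toy = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y_toy = [0, 0, 0, 1, 1, 1]

loss_before = log_loss(X_toy, y_toy, [0.0], 0.0)   # ln(2), about 0.693
w, b = train(X_toy, y_toy)
loss_after = log_loss(X_toy, y_toy, w, b)
print("Loss before training:", loss_before)
print("Loss after training:", loss_after)
```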

Again, a slightly modified **predict function** is required. It replaces the earlier `predict` function and returns both the predicted class (1 if the probability is at least 0.5, otherwise 0) and the probability itself:

In [28]:
# Binary Prediction of Multivariate Logistic Regression
def predict(x_input, weights, bias):
    p = predict_proba(x_input, weights, bias)
    return 1 if p >= 0.5 else 0, p

The **application examples** for the multiple logistic regression report the weights and the bias (intercept) of the model and return the logits and probabilities of all cases. A final classification example illustrates how the model assigns a new case to a class:

In [29]:
# Application Examples
weights, bias = train_logistic_regression(X, y)
print("Weights:", weights)
print("Intercept:", bias)

print("\nLogits and Probabilities:")
for i in range(len(X)):
    z = bias + sum([weights[j] * X[i][j] for j in range(len(weights))])
    p = sigmoid(z)
    print("x =", X[i], "Logit =", z, "P(y=1) =", p)

classification = predict([11.4, 76], weights, bias)
print("\nPredicted Class:", classification)

Weights: [4.6434554996184625, -0.6104014099036235]
Intercept: -0.6030681183051588

Logits and Probabilities:
x = [8.3, 70] Logit = -4.790486164725556 P(y=1) = 0.008239982480800134
x = [8.6, 65] Logit = -0.34544246532190825 P(y=1) = 0.4144881023070313
x = [8.8, 63] Logit = 1.8040514544090351 P(y=1) = 0.8586412534285582
x = [10.5, 72] Logit = 4.204313114627809 P(y=1) = 0.9852885767622177
x = [10.7, 81] Logit = -0.3606084745811121 P(y=1) = 0.4108123380995226
x = [10.8, 83] Logit = -1.1170657444265109 P(y=1) = 0.2465561029564774
x = [11, 66] Logit = 10.188449323858778 P(y=1) = 0.9999623990060196
x = [11, 75] Logit = 4.694836634726166 P(y=1) = 0.990940436398861
x = [11.1, 80] Logit = 2.1071751351698973 P(y=1) = 0.8915984734384245
x = [11.2, 75] Logit = 5.623527734649856 P(y=1) = 0.9964011082757194
x = [11.3, 79] Logit = 3.6462676449972173 P(y=1) = 0.9745749171336164
x = [11.4, 76] Logit = 5.941817424669929 P(y=1) = 0.9973796234329447
x = [11.4, 76] Logit = 5.941817424669929 P(y=1) = 0.99737

## Machine Learning: Clustering
### K-Means based upon Within Cluster Sum of Squares

Two variables and 15 cases of the original **trees dataset**, provided by Atkinson, A. C. (1985): *Plots, Transformations and Regression* via Oxford University Press. The other 15 cases are simulated trees of another type. The class variable y therefore indicates black cherry trees from the original dataset by 0 and simulated trees by 1. It is commented out below because k-means does not use it; it only serves as a reference for the resulting cluster labels:

In [30]:
# Girth (x1), Height (x2) and Class (y) of Black Cherry Trees and Simulated Trees
X = [
    [8.3, 70], [8.6, 65], [8.8, 63], [10.5, 72], [10.7, 81], [10.8, 83], [11.0, 66], [11.0, 75], [11.1, 80],
    [11.2, 75], [11.3, 79], [11.4, 76], [11.7, 69], [12.0, 75], [12.9, 74], [5.2, 45], [5.5, 48], [6.0, 50],
    [6.3, 46], [6.7, 49], [7.0, 51], [7.2, 47], [7.4, 52], [7.5, 50], [7.7, 46], [7.9, 53], [8.1, 49],
    [8.4, 47], [8.5, 54], [8.7, 52]
]

# y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

The **Euclidean distance** measures the straight-line distance between two points in a multi-dimensional space, calculated as the square root of the sum of the squared differences between corresponding coordinates. It is commonly used in clustering and classification tasks to determine the similarity between data points:

In [31]:
# Mathematical Basics - Euclidean Distance
def euclidean_distance(p1, p2):
    return sum((p1[i] - p2[i])**2 for i in range(len(p1))) ** 0.5
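
As a worked example, the distance between the first two trees of the clustering dataset, [8.3, 70] and [8.6, 65], is √(0.3² + 5²) = √25.09 ≈ 5.009. The sketch below shows that the height difference contributes almost all of it, because height is measured on a larger scale than girth; this is worth keeping in mind when interpreting the clusters. The function is repeated so that the block runs on its own:

```python
def euclidean_distance(p1, p2):
    return sum((p1[i] - p2[i])**2 for i in range(len(p1))) ** 0.5

p1 = [8.3, 70]
p2 = [8.6, 65]

d = euclidean_distance(p1, p2)
print("Distance:", d)                        # about 5.009

# Share of each feature in the squared distance
squared = [(p1[i] - p2[i]) ** 2 for i in range(len(p1))]
for name, s in zip(["Girth", "Height"], squared):
    print(name, "share:", s / sum(squared))  # Height dominates (about 0.996)
```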

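The **initializing centroids function** sets the starting points for the cluster centers. This implementation simply copies the first k data points, which makes the results reproducible. The choice of starting points influences the convergence of the algorithm and the quality of the final clusters, as it determines how the data is grouped during the iterative process: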

In [32]:
# Initializing Centroids Function
def initialize_centroids(X, k):
    return [X[i][:] for i in range(k)]

The **assigning clusters function** groups data points into clusters based on their proximity to the centroids. For each point, it calculates the Euclidean distance to each centroid and assigns the point to the cluster of the closest centroid, so that each cluster contains the points nearest to its respective centroid:

In [33]:
# Assigning Clusters Function
def assign_clusters(X, centroids):
    clusters = [[] for _ in centroids]
    for point in X:
        distances = [euclidean_distance(point, centroid) for centroid in centroids]
        min_index = distances.index(min(distances))
        clusters[min_index].append(point)
    return clusters

The **computing centroids function** calculates the new centroids as the mean of all points within each cluster. For each cluster, it averages the values of each feature across all points, so that the centroid represents the center of that cluster. Empty clusters are skipped, so fewer than k centroids may be returned:

In [34]:
# Computing Centroids Function
def compute_centroids(clusters):
    new_centroids = []
    for cluster in clusters:
        if not cluster:
            continue
        n_features = len(cluster[0])
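        # Sum each feature over the cluster, then divide by the cluster size
        # (named centroid to avoid shadowing the mean() function defined above)
        centroid = [0] * n_features
        for point in cluster:
            for i in range(n_features):
                centroid[i] += point[i]
        centroid = [val / len(cluster) for val in centroid]
        new_centroids.append(centroid)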
    return new_centroids

The **within cluster sum of squares** is defined as the total squared distance between each point and its assigned cluster centroid. It measures the compactness of the clusters, with smaller values indicating tighter clusters. The code computes this by summing the squared differences for all points in each cluster, relative to the centroid of that cluster:

In [35]:
# Within Cluster Sum of Squares
def wcss(clusters, centroids):
    total = 0
    for i in range(len(clusters)):
        for point in clusters[i]:
            total += sum((point[j] - centroids[i][j])**2 for j in range(len(point)))
    return total
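
A small hand-checkable example with made-up points: two clusters with the centroids [2, 1] and [10, 10]. The first cluster contributes 1 + 1 = 2 and the second 0, so the within cluster sum of squares is 2. Moving the first centroid away from the cluster mean increases the value. The function is repeated so that the block runs on its own:

```python
def wcss(clusters, centroids):
    total = 0
    for i in range(len(clusters)):
        for point in clusters[i]:
            total += sum((point[j] - centroids[i][j])**2 for j in range(len(point)))
    return total

clusters = [[[1, 1], [3, 1]],   # cluster 0
            [[10, 10]]]         # cluster 1
centroids = [[2, 1], [10, 10]]  # cluster means

print(wcss(clusters, centroids))              # 2
print(wcss(clusters, [[1, 1], [10, 10]]))     # 4, since [1, 1] is not the cluster mean
```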

The **k-means algorithm** groups data points into k clusters. It iteratively assigns the points to the closest centroids and recalculates the centroids until they no longer change or the maximum number of iterations is reached. Afterwards, it returns the final within cluster sum of squares to assess the clustering quality, together with the cluster label of each data point and the final centroids:

In [36]:
# K-Means Algorithm
def kmeans_wcss(X, k=2, max_iter=100):
    centroids = initialize_centroids(X, k)
    for _ in range(max_iter):
        clusters = assign_clusters(X, centroids)
        new_centroids = compute_centroids(clusters)
        if new_centroids == centroids:
            break
        centroids = new_centroids

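    # Map every point back to its index in X; the assigned flags ensure
    # that duplicate points each receive their own label
    labels = [0] * len(X)
    assigned = [False] * len(X)
    for cluster_index, cluster_points in enumerate(clusters):
        for point in cluster_points:
            for idx, original_point in enumerate(X):
                if not assigned[idx] and original_point == point:
                    labels[idx] = cluster_index
                    assigned[idx] = True
                    break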

    return wcss(clusters, centroids), labels, centroids

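The **k-means indicator** shows how the within cluster sum of squares changes as the number of centroids increases. Since the within cluster sum of squares generally decreases with every additional centroid, the number of clusters is usually chosen at the "elbow", i.e., the point after which additional centroids only lead to small reductions: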

In [37]:
# K-Means Indicator
def kmeans_indicator(X, max_k=10):
    wcss_values = []
    for k in range(1, max_k + 1):
        wcss_value, _, _ = kmeans_wcss(X, k)
        wcss_values.append(wcss_value)
    return wcss_values

The **application examples** show the within cluster sum of squares for each number of clusters from 1 to 6. The sharp drop from k = 1 to k = 2 suggests two clusters in this dataset, which matches its construction. The k-means algorithm is therefore applied with k = 2, and the resulting cluster labels and the positions of the centroids are printed:

In [38]:
# Application Examples
wcss_list = kmeans_indicator(X, max_k=6)

print("WCSS values for k = 1 to 6:")
for k, w in enumerate(wcss_list, 1):
    print("k =", k, "-> WCSS =", w)

wcss_value, cluster_labels, final_centroids = kmeans_wcss(X, k=2)

print("\nWCSS:", wcss_value)
print("\nCluster Labels:", cluster_labels)
print("\nCentroids:", final_centroids)

WCSS values for k = 1 to 6:
k = 1 -> WCSS = 5162.787999999997
k = 2 -> WCSS = 651.9133333333333
k = 3 -> WCSS = 282.73966666666666
k = 4 -> WCSS = 230.22711111111107
k = 5 -> WCSS = 155.58683333333332
k = 6 -> WCSS = 148.796

WCSS: 651.9133333333333

Cluster Labels: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Centroids: [[10.753333333333334, 73.53333333333333], [7.206666666666667, 49.266666666666666]]
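
Because the class variable y is known for this dataset, the cluster labels can be compared with it. Cluster numbers are arbitrary, so for k = 2 the sketch below checks both possible label assignments and reports the better agreement; `cluster_agreement` is an illustrative helper, and the labels are copied from the output above:

```python
def cluster_agreement(labels, y):
    # Share of matching labels, allowing for swapped cluster numbers (k = 2)
    n = len(y)
    same = sum(1 for l, t in zip(labels, y) if l == t) / n
    swapped = sum(1 for l, t in zip(labels, y) if (1 - l) == t) / n
    return max(same, swapped)

y = [0] * 15 + [1] * 15
cluster_labels = [0] * 15 + [1] * 15   # as printed above

print("Agreement:", cluster_agreement(cluster_labels, y))   # 1.0
```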


## Machine Learning: Dimensionality Reduction
### Exploratory Factor Analysis

These are 10 variables from the **bfi dataset** by Revelle, W., Wilt, J. and Rosenthal, A. (2010): *Individual Differences in Cognition: New Methods for examining the Personality-Cognition Link* via Springer, with 5 items each for the two personality dimensions extraversion and neuroticism, observed for 20 cases:

In [39]:
# Extraversion (x1:x5) and Neuroticism (x6:x10) of the Big Five Inventory
X = [
    [2, 1, 6, 5, 6, 3, 5, 2, 2, 3],
    [3, 6, 4, 2, 1, 6, 3, 2, 6, 4],
    [1, 3, 2, 5, 4, 3, 3, 4, 2, 3],
    [3, 4, 3, 6, 5, 2, 4, 2, 2, 3],
    [2, 1, 2, 5, 2, 2, 2, 2, 2, 2],
    [2, 2, 4, 6, 6, 4, 4, 4, 6, 6],
    [3, 2, 5, 5, 6, 2, 3, 3, 1, 1],
    [1, 1, 6, 6, 6, 2, 3, 1, 2, 1],
    [2, 4, 4, 2, 6, 3, 3, 5, 3, 2],
    [1, 2, 6, 5, 4, 1, 4, 2, 2, 5],
    [1, 2, 6, 5, 5, 5, 4, 4, 3, 1],
    [1, 2, 4, 5, 5, 3, 2, 4, 1, 2],
    [6, 6, 2, 1, 1, 1, 2, 1, 3, 6],
    [3, 4, 3, 2, 3, 5, 3, 4, 4, 3],
    [6, 6, 3, 2, 2, 2, 2, 2, 4, 1],
    [3, 4, 3, 3, 5, 5, 6, 5, 5, 4],
    [3, 2, 3, 6, 5, 1, 2, 1, 2, 1],
    [4, 3, 4, 4, 4, 2, 2, 3, 3, 3],
    [3, 3, 2, 5, 4, 2, 3, 1, 3, 2],
    [6, 4, 4, 4, 3, 2, 2, 3, 4, 5]
]

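Mean centering via a **mean center function** subtracts the mean of each variable from its values, so that every variable has a mean of 0. This is a common preprocessing step and is required here, because the correlation matrix below is computed from sums of cross-products, which equal Pearson correlations only for mean-centered data: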

In [40]:
# Mean Center Function
def mean_center(X):
    cols = len(X[0])
    rows = len(X)
    means = [sum(X[i][j] for i in range(rows)) / rows for j in range(cols)]
    centered = [[X[i][j] - means[j] for j in range(cols)] for i in range(rows)]
    return centered, means

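The **correlation matrix** summarizes the correlations between all pairs of variables in one matrix, each quantifying the strength and direction of a linear relationship. This implementation divides the sum of cross-products of two columns by the product of their root sums of squares; for mean-centered columns, this is exactly Pearson's correlation coefficient: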

In [41]:
# Correlation Matrix
def correlation_matrix(X):
    rows = len(X)
    cols = len(X[0])
    corr = [[0]*cols for _ in range(cols)]

    for i in range(cols):
        for j in range(cols):
            xi = [row[i] for row in X]
            xj = [row[j] for row in X]
            num = sum(xi[k] * xj[k] for k in range(rows))
            denom_i = sum(xi[k]**2 for k in range(rows)) ** 0.5
            denom_j = sum(xj[k]**2 for k in range(rows)) ** 0.5
            corr[i][j] = num / (denom_i * denom_j)
    return corr
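
The sketch below illustrates why the mean centering matters: applied to raw data, the formula of `correlation_matrix` yields the cosine similarity of the columns, which generally differs from Pearson's r, whereas after centering both agree. It uses two made-up columns and compact versions of the functions above; `cross_product_corr` is an illustrative name:

```python
def mean_center(X):
    cols, rows = len(X[0]), len(X)
    means = [sum(X[i][j] for i in range(rows)) / rows for j in range(cols)]
    return [[X[i][j] - means[j] for j in range(cols)] for i in range(rows)], means

def cross_product_corr(X, i, j):
    # Same formula as correlation_matrix: sum(xi*xj) / sqrt(sum(xi^2) * sum(xj^2))
    xi = [row[i] for row in X]
    xj = [row[j] for row in X]
    num = sum(a * b for a, b in zip(xi, xj))
    return num / (sum(a * a for a in xi) ** 0.5 * sum(b * b for b in xj) ** 0.5)

X_raw = [[1, 5], [2, 3], [3, 4], [4, 1]]   # hypothetical, negatively related columns
X_centered, _ = mean_center(X_raw)

print("Raw data:", cross_product_corr(X_raw, 0, 1))            # about 0.69, misleading
print("Centered data:", cross_product_corr(X_centered, 0, 1))  # Pearson's r, about -0.83
```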

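The **power iteration function** is an algorithm for computing the dominant eigenvalues and eigenvectors of a matrix. Starting from a vector of ones, it repeatedly multiplies the matrix by the vector and normalizes the result to avoid overflow or underflow, so that the vector converges to the eigenvector of the largest (in absolute value) eigenvalue, which is then estimated via the Rayleigh quotient. To obtain further factors, the component found is removed from the matrix (deflation) and the procedure is repeated. Since the deflation modifies the matrix in place, a copy should be passed: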

In [42]:
# Power Iteration Function 
def power_iteration(A, num_vectors=2, iterations=100):
    n = len(A)
    eigenvectors = []
    eigenvalues = []

    for _ in range(num_vectors):
        b = [1.0]*n
        for _ in range(iterations):
            # Multiply A * b
            Ab = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
            norm = sum(x**2 for x in Ab) ** 0.5
            b = [x / norm for x in Ab]
        # Rayleigh quotient for eigenvalue
        Ab = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
        eigval = sum(b[i] * Ab[i] for i in range(n))
        eigenvalues.append(eigval)
        eigenvectors.append(b)

        # Deflation
        for i in range(n):
            for j in range(n):
                A[i][j] -= eigval * b[i] * b[j]

    return eigenvalues, eigenvectors
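
To check the procedure on a case with a known solution: the symmetric matrix [[4, 1], [1, 3]] has the eigenvalues (7 ± √5) / 2, i.e., about 4.618 and 2.382. The sketch repeats the power iteration with two small additions, which are assumptions of this sketch rather than part of the codebook: it works on a copy of the matrix, and it guards against a zero norm, which would otherwise cause a division by zero if the starting vector happened to be orthogonal to the remaining eigenvectors after deflation:

```python
def power_iteration(A, num_vectors=2, iterations=100):
    n = len(A)
    A = [row[:] for row in A]          # work on a copy, so the input is not deflated
    eigenvalues, eigenvectors = [], []
    for _ in range(num_vectors):
        b = [1.0 / n ** 0.5] * n       # normalized vector of ones
        for _ in range(iterations):
            Ab = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
            norm = sum(x ** 2 for x in Ab) ** 0.5
            if norm == 0:              # start vector orthogonal to what is left
                break
            b = [x / norm for x in Ab]
        Ab = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
        eigval = sum(b[i] * Ab[i] for i in range(n))   # Rayleigh quotient
        eigenvalues.append(eigval)
        eigenvectors.append(b)
        for i in range(n):             # deflation: remove the component found
            for j in range(n):
                A[i][j] -= eigval * b[i] * b[j]
    return eigenvalues, eigenvectors

M = [[4.0, 1.0], [1.0, 3.0]]
vals, vecs = power_iteration(M)
print("Eigenvalues:", vals)            # about [4.618, 2.382]
print("Expected:", [(7 + 5 ** 0.5) / 2, (7 - 5 ** 0.5) / 2])
```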

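The **factor loadings function** computes the factor loadings from the eigenvalues and eigenvectors; the correlation matrix only determines the number of variables. Each loading is the corresponding eigenvector entry multiplied by the square root of the eigenvalue, which corresponds to an unrotated extraction via the principal component method. In factor analysis, factor loadings represent the relationships between the observed variables and the underlying latent factors: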

In [43]:
# Factor Loadings Function
def factor_loadings(corr_matrix, eigenvalues, eigenvectors):
    loadings = []
    for i in range(len(corr_matrix)):
        row = []
        for j in range(len(eigenvectors)):
            loading = eigenvectors[j][i] * (eigenvalues[j] ** 0.5)
            row.append(loading)
        loadings.append(row)
    return loadings
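
Because the eigenvectors are normalized to length 1, the squared loadings of each factor sum to its eigenvalue, and the squared loadings of a variable across all factors give its communality. A minimal sketch for a hypothetical 2 × 2 correlation matrix with r = 0.6, whose eigenvalues 1 ± r and eigenvectors (1, 1)/√2 and (1, -1)/√2 are known in closed form:

```python
def factor_loadings(corr_matrix, eigenvalues, eigenvectors):
    loadings = []
    for i in range(len(corr_matrix)):
        row = []
        for j in range(len(eigenvectors)):
            row.append(eigenvectors[j][i] * (eigenvalues[j] ** 0.5))
        loadings.append(row)
    return loadings

r = 0.6
R = [[1.0, r], [r, 1.0]]
s = 2 ** -0.5                          # 1 / sqrt(2)
eigvals = [1 + r, 1 - r]
eigvecs = [[s, s], [s, -s]]

L = factor_loadings(R, eigvals, eigvecs)
print("Loadings:", L)                  # about [[0.894, 0.447], [0.894, -0.447]]

for j in range(2):
    # Sum of squared loadings per factor equals its eigenvalue (1.6 and 0.4)
    print("Factor", j + 1, "sum of squared loadings:", sum(row[j] ** 2 for row in L))
for i, row in enumerate(L):
    # Communality across both factors (1.0 here, since all variance is extracted)
    print("V" + str(i + 1), "communality:", sum(v ** 2 for v in row))
```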

This **application example** shows how to compute the correlation matrix, the eigenvalues as well as the corresponding factor loadings for identifying the underlying factors:

In [44]:
# Application Example
X_centered, means = mean_center(X)
R = correlation_matrix(X_centered)
eigvals, eigvecs = power_iteration([row[:] for row in R], num_vectors=2)
loadings = factor_loadings(R, eigvals, eigvecs)

print("Correlation matrix:")
for row in R:
    print(["{0:.2f}".format(x) for x in row])

print("\nEigenvalues:")
for i, val in enumerate(eigvals):
    print("Factor", i+1, ":", round(val, 3))

print("\nFactor Loadings:")
for i, row in enumerate(loadings):
    print("V" + str(i+1), ":", ["{0:.2f}".format(x) for x in row])

Correlation matrix:
['1.00', '0.70', '-0.41', '-0.56', '-0.57', '-0.26', '-0.39', '-0.26', '0.33', '0.26']
['0.70', '1.00', '-0.46', '-0.82', '-0.65', '0.19', '-0.18', '0.01', '0.54', '0.32']
['-0.41', '-0.46', '1.00', '0.30', '0.51', '0.11', '0.35', '0.06', '-0.16', '-0.12']
['-0.56', '-0.82', '0.30', '1.00', '0.62', '-0.27', '0.16', '-0.17', '-0.45', '-0.25']
['-0.57', '-0.65', '0.51', '0.62', '1.00', '-0.01', '0.47', '0.33', '-0.33', '-0.30']
['-0.26', '0.19', '0.11', '-0.27', '-0.01', '1.00', '0.46', '0.59', '0.62', '0.08']
['-0.39', '-0.18', '0.35', '0.16', '0.47', '0.46', '1.00', '0.34', '0.23', '0.21']
['-0.26', '0.01', '0.06', '-0.17', '0.33', '0.59', '0.34', '1.00', '0.24', '0.08']
['0.33', '0.54', '-0.16', '-0.45', '-0.33', '0.62', '0.23', '0.24', '1.00', '0.51']
['0.26', '0.32', '-0.12', '-0.25', '-0.30', '0.08', '0.21', '0.08', '0.51', '1.00']

Eigenvalues:
Factor 1 : 3.836
Factor 2 : 2.552

Factor Loadings:
V1 : ['0.79', '-0.29']
V2 : ['0.91', '0.11']
V3 : ['-0.59', '0.25'

## Deep Learning
### Pretrained Neural Network (with Weights and Biases from TensorFlow)

These are five variables from the **iris dataset** by Fisher, R. (1936): *The use of multiple measurements in taxonomic problems* via John Wiley & Sons. The four independent variables are the length and width of the sepal (x1 and x2) and of the petal (x3 and x4). All independent variables are standardized. Additionally, the dependent variable distinguishes between the two iris species versicolor (0) and virginica (1):

In [45]:
# Standardized independent variables (X) and dichotomized dependent variable (y)
X = [[ 0.81575475, -0.21746808, -0.12904165, -0.65303909],
         [ 0.05761837,  1.59476592,  0.84485761,  1.71304456],
         [ 0.96738203,  0.68864892, -0.00730424, -0.41643072],
         [ 2.02877297,  0.38660992,  2.06223168,  1.00321947],
         [ 1.42226386,  0.99068792,  1.33180724,  0.29339437],
         [ 0.81575475,  0.99068792,  1.21006983,  1.4764362 ],
         [-1.00377258,  0.38660992, -0.49425387, -0.41643072],
         [ 0.05761837, -0.51950708, -0.00730424,  0.29339437],
         [ 0.36087292,  0.38660992,  1.08833242,  1.23982783],
         [ 0.66412748,  0.38660992,  0.35790798,  1.4764362 ],
         [ 0.05761837,  0.08457092,  0.84485761,  0.29339437],
         [-0.70051802, -0.51950708,  0.23617057,  0.53000274],
         [ 0.20924564, -0.21746808,  0.84485761,  1.00321947],
         [-0.24563619,  0.08457092, -0.25077906, -0.65303909],
         [-2.06516352, -1.42562408, -1.95510276, -1.59947255],
         [-1.15539985, -1.42562408, -1.34641572, -1.36286418],
         [ 0.05761837, -1.12358508, -0.00730424, -0.41643072],
         [ 0.20924564,  0.08457092, -0.73772869, -0.88964745],
         [-0.39726347, -0.51950708,  0.23617057, -0.17982236],
         [ 0.5125002 ,  0.08457092, -0.37251647, -0.88964745]]

y = [0,1,0,1,1,1,0,1,1,1,1,1,1,0,0,0,0,0,0,0]

In principle, all functions could be coded manually in MicroPython. However, the **math library** is imported here to simplify the computation of the exponential function.

In [46]:
# Libraries
import math

Neural networks are built from neurons, and **activation functions** determine how strongly a neuron responds to its weighted input, i.e. whether and to what extent its signal is passed on in the process of prediction. Common choices, all based on simple mathematical operations, are the Rectified Linear Unit (ReLU), the Leaky Rectified Linear Unit (Leaky ReLU), the Hyperbolic Tangent (Tanh), the logistic function (Sigmoid) and Softmax. Each of the following functions takes a list of values and returns a list of the same length.

In [47]:
# ReLU
def relu(x):
    y = []
    for i in range(len(x)):
        if x[i] >= 0:
            y.append(x[i])
        else:
            y.append(0)
    return y

# Leaky ReLU
def leaky_relu(x, alpha=0.01):
    p = []
    for i in range(len(x)):
        if x[i] >= 0:
            p.append(x[i])
        else:
            p.append(alpha * x[i])
    return p

# Tanh
def tanh(x):
    t = [(math.exp(x[val]) - math.exp(-x[val])) / (math.exp(x[val]) + math.exp(-x[val])) for val in range(len(x))]
    return t

# Sigmoid
def sigmoid(x):
    z = [1 / (1 + math.exp(-x[val])) for val in range(len(x))]
    return z

# Softmax
def softmax(x):
    # Subtract the maximum for numerical stability; the result is unchanged
    max_x = max(x)
    exp_x = [math.exp(val - max_x) for val in x]
    sum_exp_x = sum(exp_x)
    s = [j / sum_exp_x for j in exp_x]
    return s
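The softmax above subtracts the largest input before exponentiating. This does not change the result, because the common factor cancels in numerator and denominator, but it keeps the exponential function from overflowing for large inputs. A minimal sketch that compares a naive and a shifted variant; the names softmax_naive and softmax_stable are illustrative and not part of the codebook:

```python
import math

def softmax_naive(x):
    # Direct formula: exp(x_i) / sum of exp(x_j)
    e = [math.exp(v) for v in x]
    s = sum(e)
    return [v / s for v in e]

def softmax_stable(x):
    # Shift by the maximum first, so every exponent is <= 0 and exp() stays <= 1
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

print(softmax_naive(small))
print(softmax_stable(small))  # same values up to rounding
print(softmax_stable(large))  # same as for `small`, since only the differences matter

try:
    print(softmax_naive(large))
except (OverflowError, ValueError):
    # CPython raises OverflowError here; other interpreters may behave differently
    print("naive softmax failed: exp() overflowed")
```

Only the differences between the inputs matter for softmax, which is why shifting all of them by the same constant is safe.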

A **single neuron** multiplies each input by its weight, sums the results, adds a bias and passes this value through one of the previously defined activation functions. In MicroPython it can be defined as follows:

In [48]:
# Single Neuron
def neuron(x, w, b, activation):
    # x: one row per input variable, one column per observation
    tmp = zero_dim(x[0])

    # Weighted sum over all inputs, computed for every observation at once
    for i in range(len(x)):
        tmp = add_dim(tmp, [(float(w[i]) * float(x[i][j])) for j in range(len(x[0]))])

    # Add the bias before applying the activation function
    z = [tmp[i] + b for i in range(len(tmp))]

    if activation == "sigmoid":
        yp = sigmoid(z)
    elif activation == "relu":
        yp = relu(z)
    elif activation == "leaky_relu":
        yp = leaky_relu(z)
    elif activation == "tanh":
        yp = tanh(z)
    elif activation == "softmax":
        yp = softmax(z)
    else:
        raise ValueError("Function unknown: " + str(activation))

    return yp
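The function neuron() processes all observations at once. For a single observation, the same computation reduces to a weighted sum plus bias, z = w·x + b, followed by the activation. A minimal sketch with made-up numbers; the helper name neuron_single and all values are illustrative only:

```python
import math

def neuron_single(x, w, b, activation="relu"):
    # Weighted sum of the inputs plus the bias
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if activation == "relu":
        return max(0.0, z)
    elif activation == "sigmoid":
        return 1 / (1 + math.exp(-z))
    raise ValueError("Unknown activation: " + activation)

# z = 0.5*2.0 + (-1.0)*1.0 + 0.25 = 0.25
print(neuron_single([2.0, 1.0], [0.5, -1.0], 0.25, "relu"))     # 0.25
print(neuron_single([2.0, 1.0], [0.5, -1.0], 0.25, "sigmoid"))  # about 0.562
```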

For the data to be processed by a neural network, some basic operations on vectors and matrices as well as a way to build layers from neurons must be defined first. The function neuron() above relies on zero_dim() and add_dim(), which only need to exist by the time the network is evaluated. These **mathematical basics** of a neural network can be defined as follows:

In [49]:
# Mathematical Basics - I
def zero_dim(x):
    z = [0 for i in range(len(x))]
    return z

# Mathematical Basics - II
def add_dim(x, y):
    z = [x[i] + y[i] for i in range(len(x))]
    return z

# Mathematical Basics - III
def zeros(rows, cols):
    M = []
    while len(M) < rows:
        M.append([])
        while len(M[-1]) < cols:
            M[-1].append(0.0)
    return M

# Mathematical Basics - IV
def transpose(M):
    if not isinstance(M[0], list):
        M = [M]
    rows = len(M)
    cols = len(M[0])
    MT = zeros(cols, rows)
    for i in range(rows):
        for j in range(cols):
            MT[j][i] = M[i][j]
    return MT

# Mathematical Basics - V
def print_matrix(M, decimals=3):
    for row in M:
        print([round(x, decimals) + 0 for x in row])

# Mathematical Basics - VI
def dense(nunit, x, w, b, activation):
    res = []
    for i in range(nunit):
        z = neuron(x, w[i], b[i], activation)
        res.append(z)
    return res
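The helper transpose() swaps rows and columns, which is why X is passed as transpose(X) to the network: neuron() expects one row per input variable rather than one row per observation. A more compact alternative, assuming the interpreter supports zip() with argument unpacking, is sketched below; transpose_zip is a hypothetical name:

```python
def transpose_zip(M):
    # Treat a flat list as a single row, like transpose() does
    if not isinstance(M[0], list):
        M = [M]
    # zip(*M) groups the i-th element of every row into one tuple
    return [list(col) for col in zip(*M)]

# Two observations (rows) with four variables (columns) ...
X_demo = [[1, 2, 3, 4],
          [5, 6, 7, 8]]
# ... become four variables (rows) with two observations (columns)
print(transpose_zip(X_demo))      # [[1, 5], [2, 6], [3, 7], [4, 8]]
print(transpose_zip([0.5, 1.5]))  # [[0.5], [1.5]]
```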

The architecture of a neural network can be reconstructed in MicroPython with the **weights and biases** of an already pretrained deep learning model. They can be transferred from TensorFlow, a deep learning library for Python, to MicroPython. In the structure below, w1 has four rows (one per independent variable) and two columns (one per neuron of the input layer), and b1 holds the two corresponding biases. Following the same pattern, the first hidden layer consists of three neurons with w2 and b2, the second hidden layer consists of two neurons with w3 and b3, and the output layer is a single neuron with w4 and b4. In total, this neural network therefore consists of eight neurons (2 + 3 + 2 + 1). Since dense() expects one row of weights per neuron, all weight matrices are transposed after they have been included.

In [50]:
# Include Parameters from TensorFlow
w1 = [[-0.75323504, -0.25906014],
      [-0.46379513, -0.5019245 ],
      [ 2.1273055 ,  1.7724446 ],
      [ 1.1853403 ,  0.88468695]]
b1 = [0.53405946, 0.32578036]
w2 = [[-1.6785783,  2.0158117,  1.2769054],
      [-1.4055765,  0.6828738,  1.5902631]]
b2 = [ 1.18362  , -1.1555661, -1.0966455]
w3 = [[ 0.729278  , -1.0240695 ],
      [-0.80972326,  1.4383037 ],
      [-0.90892404,  1.6760625 ]]
b3 = [0.10695826, 0.01635581]
w4 = [[-0.2019448],
      [ 1.5772797]]
b4 = [-1.2177287]

# Transpose
w1 = transpose(w1)
w2 = transpose(w2)
w3 = transpose(w3)
w4 = transpose(w4)
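Because TensorFlow stores the kernel of a dense layer with one row per input and one column per neuron, each layer contributes inputs × neurons weights plus one bias per neuron. A short sketch that counts the trainable parameters from the layer sizes used above (4 inputs, then 2, 3, 2 and 1 neurons); the function count_parameters is illustrative:

```python
def count_parameters(layer_sizes):
    # layer_sizes[0] is the number of inputs, the rest are neurons per layer
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = n_in * n_out   # one weight per input for every neuron
        biases = n_out           # one bias per neuron
        print(n_in, "->", n_out, ":", weights, "weights +", biases, "biases")
        total += weights + biases
    return total

sizes = [4, 2, 3, 2, 1]
print("Neurons:", sum(sizes[1:]))              # 8
print("Parameters:", count_parameters(sizes))  # 30
```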

Based on the transferred weights and biases, the **architecture of the neural network** can be defined in MicroPython as follows. This specifies the number of neurons within each layer and the activation function used by the neurons of each layer.

In [51]:
# Neural Network Architecture
yout1 = dense(2, transpose(X), w1, b1, 'relu') # input layer (2 neurons)
yout2 = dense(3, yout1, w2, b2, 'sigmoid') # hidden layer (3 neurons)
yout3 = dense(2, yout2, w3, b3, 'relu') # hidden layer (2 neurons)
ypred = dense(1, yout3, w4, b4,'sigmoid') # output layer (1 neuron)
print(ypred)

[[0.21977697810066976, 0.9762814497719644, 0.21977697810066976, 0.9763437580183316, 0.9752601081958471, 0.9763223651158681, 0.21977697810066976, 0.9324981536051185, 0.9763252950730309, 0.975354442588011, 0.9757636857966179, 0.975140989570476, 0.9762884065437853, 0.21977697810066976, 0.21977697810066976, 0.21977697810066976, 0.7158511603338569, 0.21977697810066976, 0.9567955464832789, 0.21977697810066976]]
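To make the layer-by-layer computation tangible, the first observation of X can be traced through the network by hand. The sketch below reuses the weights and biases from In [50] in their original (inputs × neurons) layout, so no transposition is needed; the helper layer() and the names tf_w1 to tf_b4 are hypothetical. It should reproduce the first value of ypred printed above (about 0.2198):

```python
import math

def layer(inputs, W, b, activation):
    # W has one row per input and one column per neuron (TensorFlow layout)
    out = []
    for j in range(len(b)):
        z = sum(inputs[i] * W[i][j] for i in range(len(inputs))) + b[j]
        if activation == "relu":
            out.append(max(0.0, z))
        elif activation == "sigmoid":
            out.append(1 / (1 + math.exp(-z)))
        else:
            raise ValueError("Unknown activation: " + activation)
    return out

# Weights and biases as listed in In [50]
tf_w1 = [[-0.75323504, -0.25906014], [-0.46379513, -0.5019245],
         [2.1273055, 1.7724446], [1.1853403, 0.88468695]]
tf_b1 = [0.53405946, 0.32578036]
tf_w2 = [[-1.6785783, 2.0158117, 1.2769054],
         [-1.4055765, 0.6828738, 1.5902631]]
tf_b2 = [1.18362, -1.1555661, -1.0966455]
tf_w3 = [[0.729278, -1.0240695], [-0.80972326, 1.4383037],
         [-0.90892404, 1.6760625]]
tf_b3 = [0.10695826, 0.01635581]
tf_w4 = [[-0.2019448], [1.5772797]]
tf_b4 = [-1.2177287]

x0 = [0.81575475, -0.21746808, -0.12904165, -0.65303909]  # first row of X
h1 = layer(x0, tf_w1, tf_b1, "relu")     # both neurons are inactive: [0.0, 0.0]
h2 = layer(h1, tf_w2, tf_b2, "sigmoid")
h3 = layer(h2, tf_w3, tf_b3, "relu")
p = layer(h3, tf_w4, tf_b4, "sigmoid")
print(p)  # about 0.2198, the first entry of ypred
```

Since both input-layer neurons output 0 for this observation, the prediction depends only on the biases of the first layer onwards, which explains why several observations share exactly the same value 0.2198 in ypred.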


A **confusion matrix**, also known as an error matrix, is a table that visualizes the performance of a classification model by comparing its predictions against the actual results. It's a two-dimensional matrix that displays the counts of true positives, true negatives, false positives, and false negatives, providing a detailed view of where a model's predictions are correct and where it's making errors.

In [52]:
# Confusion Matrix Basics
def classification_report(y, ypred):
    TP = TN = FP = FN = 0
    for true, pred in zip(y, ypred):
        if true == pred:
            if true == 1:
                TP += 1
            else:
                TN += 1
        else:
            if true == 1:
                FN += 1
            else:
                FP += 1
    accuracy = (TP + TN) / len(y)
    print("Accuracy: {:.3f}".format(accuracy))
    print("Confusion Matrix:")
    print("TN: {}, FP: {}".format(TN, FP))
    print("FN: {}, TP: {}".format(FN, TP))
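Since classification_report() prints its results and returns None, a variant that returns the four counts can be easier to reuse, for example to derive further metrics. A minimal sketch with the hypothetical name confusion_counts, applied to the labels y and to the predicted classes of the pretrained network that are printed in the next cell:

```python
def confusion_counts(ytrue, ypred):
    # Count true negatives, false positives, false negatives and true positives
    TN = FP = FN = TP = 0
    for t, p in zip(ytrue, ypred):
        if t == 1 and p == 1:
            TP += 1
        elif t == 0 and p == 0:
            TN += 1
        elif t == 0 and p == 1:
            FP += 1
        else:
            FN += 1
    return TN, FP, FN, TP

y_true = [0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_hat = [0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
TN, FP, FN, TP = confusion_counts(y_true, y_hat)
accuracy = (TP + TN) / (TN + FP + FN + TP)
print(TN, FP, FN, TP)  # 8 2 0 10
print(accuracy)        # 0.9
```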

The **performance of the pretrained neural network** can be viewed via the following MicroPython code:

In [53]:
# Confusion Matrix
ypred_class = [1 if i > 0.5 else 0 for i in ypred[0]]
print(ypred_class)
classification_report(y, ypred_class)

[0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
Accuracy: 0.900
Confusion Matrix:
TN: 8, FP: 2
FN: 0, TP: 10


## Self-Learning Neural Network

Again, the five variables from the **iris dataset** by Fisher, R. (1936): *The use of multiple measurements in taxonomic problems* via John Wiley & Sons will be used. The four independent variables are the length and width of the sepal (x1 and x2) and of the petal (x3 and x4). All independent variables are standardized. In addition, the dependent variable distinguishes between two species of iris flowers: versicolor (0) and virginica (1).

In [54]:
# Standardized independent variables (X) and dichotomized dependent variable (y)
X = [[ 0.81575475, -0.21746808, -0.12904165, -0.65303909],
         [ 0.05761837,  1.59476592,  0.84485761,  1.71304456],
         [ 0.96738203,  0.68864892, -0.00730424, -0.41643072],
         [ 2.02877297,  0.38660992,  2.06223168,  1.00321947],
         [ 1.42226386,  0.99068792,  1.33180724,  0.29339437],
         [ 0.81575475,  0.99068792,  1.21006983,  1.4764362 ],
         [-1.00377258,  0.38660992, -0.49425387, -0.41643072],
         [ 0.05761837, -0.51950708, -0.00730424,  0.29339437],
         [ 0.36087292,  0.38660992,  1.08833242,  1.23982783],
         [ 0.66412748,  0.38660992,  0.35790798,  1.4764362 ],
         [ 0.05761837,  0.08457092,  0.84485761,  0.29339437],
         [-0.70051802, -0.51950708,  0.23617057,  0.53000274],
         [ 0.20924564, -0.21746808,  0.84485761,  1.00321947],
         [-0.24563619,  0.08457092, -0.25077906, -0.65303909],
         [-2.06516352, -1.42562408, -1.95510276, -1.59947255],
         [-1.15539985, -1.42562408, -1.34641572, -1.36286418],
         [ 0.05761837, -1.12358508, -0.00730424, -0.41643072],
         [ 0.20924564,  0.08457092, -0.73772869, -0.88964745],
         [-0.39726347, -0.51950708,  0.23617057, -0.17982236],
         [ 0.5125002 ,  0.08457092, -0.37251647, -0.88964745]]

y = [0,1,0,1,1,1,0,1,1,1,1,1,1,0,0,0,0,0,0,0]

The **random library** and **math library** are imported to simplify the execution of some functions required for self-learning neural networks.

In [55]:
# Libraries
import random
import math

Self-learning neural networks require not only **activation functions** but also their derivatives. The derivative of a function represents its instantaneous rate of change at a specific point. During training, these derivatives indicate how a small change in a neuron's input changes its output, which is what allows the weights to be adjusted.

In [56]:
# Sigmoid
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Derivative of Sigmoid (expects the sigmoid output, not its input)
def sigmoid_derivative(output):
    return output * (1 - output)

# ReLU
def relu(x):
    return max(0, x)

# Derivative of ReLU (1 for positive values, otherwise 0)
def relu_derivative(output):
    return 1 if output > 0 else 0
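Whether a derivative is implemented correctly can be checked numerically with a central finite difference, (f(x + h) - f(x - h)) / (2h), which approximates the slope for small h. A sketch for the sigmoid, whose derivative is expressed through its output s as s · (1 - s); the step size h = 1e-5 and the helper numeric_derivative are assumptions. For ReLU, the derivative is 1 for positive and 0 for negative inputs; at exactly 0 it is undefined, and relu_derivative() simply uses 0 there.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(output):
    # Expects the sigmoid OUTPUT, not the raw input
    return output * (1 - output)

def numeric_derivative(f, x, h=1e-5):
    # Central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x_val in [-2.0, 0.0, 1.5]:
    analytic = sigmoid_derivative(sigmoid(x_val))
    numeric = numeric_derivative(sigmoid, x_val)
    print(x_val, round(analytic, 6), round(numeric, 6))
```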

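Since the neural network is supposed to learn the weights and biases by itself, the layers and neurons of the neural network are **initialized with small random values** drawn uniformly between -0.5 and 0.5. Because no random seed is set, these starting values, and therefore the loss values reported below, differ from run to run.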

In [57]:
# Function for Initializing Weights and Biases
def init_layer(input_size, output_size):
    weights = [[random.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(output_size)]
    biases = [random.uniform(-0.5, 0.5) for _ in range(output_size)]
    return weights, biases
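Each row of weights belongs to one neuron and holds one weight per input, so init_layer(4, 3) returns a 3 × 4 weight matrix and three biases. A sketch that fixes the random seed to make such experiments reproducible; the seed value 42 is arbitrary, and random.seed() is assumed to be available on the target port:

```python
import random

def init_layer(input_size, output_size):
    # One row per neuron, one weight per input, drawn uniformly from [-0.5, 0.5]
    weights = [[random.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(output_size)]
    biases = [random.uniform(-0.5, 0.5) for _ in range(output_size)]
    return weights, biases

random.seed(42)
w_a, b_a = init_layer(4, 3)
random.seed(42)
w_b, b_b = init_layer(4, 3)

print(len(w_a), "x", len(w_a[0]), "weights,", len(b_a), "biases")  # 3 x 4 weights, 3 biases
print(w_a == w_b and b_a == b_b)  # True: same seed, same start values
```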

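In neural networks, **forward propagation** is the process of passing input data through the network's layers to generate a prediction. **Backward propagation**, on the other hand, is the mechanism used to train the network: it calculates the error between the prediction and the actual output, propagates this error backwards through the layers via the chain rule, and adjusts the weights and biases to reduce that error. Both steps are essential for the learning ability of a neural network. In the implementation below, dense_forward() also returns the pre-activation values, which dense_backward() needs for the ReLU derivative, and dense_backward() updates the weights and biases in place while returning the gradients for the preceding layer.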

In [58]:
# Forward Propagation
def dense_forward(inputs, weights, biases, activation='relu'):
    outputs = []
    pre_activations = []
    for w, b in zip(weights, biases):
        z = sum(i*w_ij for i, w_ij in zip(inputs, w)) + b
        pre_activations.append(z)
        if activation == 'sigmoid':
            outputs.append(sigmoid(z))
        elif activation == 'relu':
            outputs.append(relu(z))
        else:
            raise Exception("Unknown activation")
    return outputs, pre_activations

# Backward Propagation
def dense_backward(inputs, grad_outputs, outputs, pre_activations, weights, biases, activation='relu', lr=0.01):
    input_grads = [0.0 for _ in range(len(inputs))]
    for j in range(len(weights)):
        if activation == 'sigmoid':
            delta = grad_outputs[j] * sigmoid_derivative(outputs[j])
        elif activation == 'relu':
            delta = grad_outputs[j] * relu_derivative(pre_activations[j])
        else:
            raise Exception("Unknown activation")
        for i in range(len(inputs)):
            input_grads[i] += weights[j][i] * delta
            weights[j][i] -= lr * delta * inputs[i]
        biases[j] -= lr * delta
    return input_grads
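The update rule in dense_backward() is plain gradient descent: each weight moves against its gradient, w ← w - lr · δ · input, where δ is the error signal of the neuron. A minimal sketch with a single sigmoid neuron and one made-up training example (all numbers are illustrative) shows that one such step lowers the binary cross-entropy loss; the helpers bce and forward are hypothetical:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def bce(p, t):
    # Binary cross-entropy for one prediction p and one target t
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def forward(w, b, x):
    # Weighted sum plus bias, followed by the sigmoid activation
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

x_demo = [0.5, -1.0]   # illustrative input vector
t = 1                  # target class
w = [0.1, 0.2]         # illustrative initial weights
b = 0.0                # initial bias
step = 0.5             # learning rate for this example

p = forward(w, b, x_demo)
loss_before = bce(p, t)

# Backward pass: chain rule dL/dz = dL/dp * dp/dz gives the error signal delta
dL_dp = -t / p + (1 - t) / (1 - p)
delta = dL_dp * p * (1 - p)

# Gradient descent update, the same rule as in dense_backward()
w = [wi - step * delta * xi for wi, xi in zip(w, x_demo)]
b = b - step * delta

loss_after = bce(forward(w, b, x_demo), t)
print("Loss before:", round(loss_before, 4), "after:", round(loss_after, 4))
```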

Furthermore, a **loss function** quantifies the difference between a deep learning model's prediction and the actual outcome, essentially acting as a measure of the model's error. Cross-entropy, a specific type of loss function, is commonly used for classification problems, especially when the model outputs probabilities. The small constant epsilon in the code below prevents taking the logarithm of zero or dividing by zero.

In [59]:
# Loss Function
def binary_cross_entropy(predicted, target):
    epsilon = 1e-7
    return - (target * math.log(predicted + epsilon) + (1 - target) * math.log(1 - predicted + epsilon))

def binary_cross_entropy_derivative(predicted, target):
    epsilon = 1e-7
    return -(target / (predicted + epsilon)) + (1 - target) / (1 - predicted + epsilon)
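With t as target and p as predicted probability (ignoring epsilon), the two functions above implement the following formulas. Combined with the sigmoid derivative p(1 - p), the error signal of the output neuron simplifies to p - t, a convenient property that makes sigmoid and cross-entropy a common pairing for binary classification:

$$
L(p, t) = -\big[\, t \log p + (1 - t)\log(1 - p) \,\big]
$$

$$
\frac{\partial L}{\partial p} = -\frac{t}{p} + \frac{1 - t}{1 - p}, \qquad
\frac{\partial p}{\partial z} = p\,(1 - p)
$$

$$
\frac{\partial L}{\partial z} = \frac{\partial L}{\partial p}\,\frac{\partial p}{\partial z}
= -t\,(1 - p) + (1 - t)\,p = p - t
$$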

This time the **architecture of the neural network** consists of four independent variables, which are forwarded to three neurons in the input layer and then to one neuron in the output layer. This is a very simple neural network of four neurons in two layers with corresponding weights (w1 and w2) and biases (b1 and b2).

In [60]:
# Initialize Weights and Biases
w1, b1 = init_layer(4, 3)
w2, b2 = init_layer(3, 1)

Finally, the number of **epochs and the learning rate** need to be specified in MicroPython. In neural networks, an epoch represents one complete pass of the entire training dataset through the model. The learning rate determines how strongly the model's weights are adjusted during each update step of the training process. Both are crucial hyperparameters that influence training and model performance. The loss printed every ten epochs (and after the last one) is the binary cross-entropy summed over all observations.

In [61]:
# Epochs and Learning Rate for Training
epochs = 100
lr = 0.05

for epoch in range(epochs):
    total_loss = 0
    for xi, yi in zip(X, y):
        # Forward pass
        out1, pre1 = dense_forward(xi, w1, b1, 'relu')
        out2, pre2 = dense_forward(out1, w2, b2, 'sigmoid')
        loss = binary_cross_entropy(out2[0], yi)
        total_loss += loss

        # Backward pass
        dL_dout2 = [binary_cross_entropy_derivative(out2[0], yi)]
        dL_dout1 = dense_backward(out1, dL_dout2, out2, pre2, w2, b2, 'sigmoid', lr)
        _ = dense_backward(xi, dL_dout1, out1, pre1, w1, b1, 'relu', lr)

    if epoch % 10 == 0 or epoch == epochs - 1:
        print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")

Epoch 1, Loss: 13.6811
Epoch 11, Loss: 5.7964
Epoch 21, Loss: 2.9631
Epoch 31, Loss: 1.8962
Epoch 41, Loss: 1.3641
Epoch 51, Loss: 1.0455
Epoch 61, Loss: 0.8331
Epoch 71, Loss: 0.6832
Epoch 81, Loss: 0.5741
Epoch 91, Loss: 0.4921
Epoch 100, Loss: 0.4341
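The effect of the learning rate can be seen on a one-dimensional toy problem that is not part of the codebook's network. The sketch below minimizes f(w) = (w - 3)² with gradient descent: a very small learning rate makes slow progress, a moderate one moves w close to the minimum at 3, and one that is too large overshoots further with every step and diverges. The function gradient_descent and the chosen rates are illustrative:

```python
def gradient_descent(rate, steps=50, w=0.0):
    # Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
    for _ in range(steps):
        w -= rate * 2 * (w - 3)
    return w

for rate in [0.01, 0.1, 1.1]:
    print("learning rate", rate, "-> w =", gradient_descent(rate))
```

Each step multiplies the distance to the minimum by (1 - 2 · rate), so the iteration only converges when this factor lies between -1 and 1.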


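The **outcome of the neural network** can be predicted with the following code in MicroPython. An observation is assigned to virginica (1) if the output of the final neuron exceeds 0.5, otherwise to versicolor (0):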

In [62]:
def predict(x):
    out1, _ = dense_forward(x, w1, b1, 'relu')
    out2, _ = dense_forward(out1, w2, b2, 'sigmoid')
    return 1 if out2[0] > 0.5 else 0

ypred = [predict(xi) for xi in X]

As in the pretrained neural network before, a **confusion matrix** can be used to evaluate the performance of the neural network.

In [63]:
def classification_report(ytrue, ypred):
    TP = TN = FP = FN = 0
    for true, pred in zip(ytrue, ypred):
        if true == pred:
            if true == 1:
                TP += 1
            else:
                TN += 1
        else:
            if true == 1:
                FN += 1
            else:
                FP += 1
    accuracy = (TP + TN) / len(ytrue)
    print("Accuracy: {:.3f}".format(accuracy))
    print("Confusion Matrix:")
    print("TN: {}, FP: {}".format(TN, FP))
    print("FN: {}, TP: {}".format(FN, TP))

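Finally, the **performance of the neural network** can be inspected. Note that the network is evaluated on the same 20 observations it was trained on, so the result reflects how well it fits the training data rather than how well it generalizes to new flowers: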

In [64]:
# Generate predictions
ypred = [predict(xi) for xi in X]

# Show classification metrics
classification_report(y, ypred)

Accuracy: 1.000
Confusion Matrix:
TN: 10, FP: 0
FN: 0, TP: 10
