diff --git a/PackageInfo.g b/PackageInfo.g
index cca34fa..a83ba77 100644
--- a/PackageInfo.g
+++ b/PackageInfo.g
@@ -10,7 +10,7 @@ SetPackageInfo( rec(
PackageName := "MachineLearningForCAP",
Subtitle := "Exploring categorical machine learning in CAP",
-Version := "2024.07-17",
+Version := "2024.07-20",
Date := (function ( ) if IsBound( GAPInfo.SystemEnvironment.GAP_PKG_RELEASE_DATE ) then return GAPInfo.SystemEnvironment.GAP_PKG_RELEASE_DATE; else return Concatenation( ~.Version{[ 1 .. 4 ]}, "-", ~.Version{[ 6, 7 ]}, "-01" ); fi; end)( ),
License := "GPL-2.0-or-later",
diff --git a/README.md b/README.md
index 6277692..c1f0490 100644
--- a/README.md
+++ b/README.md
@@ -42,3 +42,502 @@ To obtain current versions of all dependencies, `git clone` (or `git pull` to up
[code-img]: https://img.shields.io/badge/-View%20code-blue?logo=github
[code-url]: https://github.com/homalg-project/MachineLearningForCAP#top
+
+### Introduction
+Let $A$ and $B$ be two sets and $\Theta$ be a set of parameters. A parametrized map is a map of the form $f:\Theta\times A \to B$ where $A$ is the domain of the input variables, $\Theta$ the domain of parameters and $B$ the codomain. For each fixed $\theta \in \Theta$, we get the map $f_\theta:A \to B,~~a \mapsto f_\theta(a):=f(\theta,a)$.
+
+In machine learning, both prediction maps and loss maps can be seen as parametrized maps, and they play distinct but complementary roles in the model training process. Let us break down how each of these fits into the concept of parametrized maps.
+
+Let $\mathcal{D}$ be the **training set** containing $N$ _labeled examples_: $\mathcal{D}=$ { $(x^{[i]},y^{[i]})|i=1,2,\dots,N$ } $\subset X\times Y$.
+- $x^{[i]}$ is the input-vector (feature vector) and it belongs to the input-space $X$.
+- $y^{[i]}$ is the label-vector of $x^{[i]}$ and it belongs to the output-space $Y$.
+
+A **prediction map** can be defined as a parametrized map $f:\Theta \times X \to Y$. For a given parameter-vector $\theta\in\Theta$ and an input-vector $x\in X$, the prediction is $\hat{y}=f_\theta(x)\in Y$.
+
+For example, let $\mathcal{D}=$ { $(1,2.9), (2,5.1), (3,7.05)$ } $\subset \mathbb{R}^1\times \mathbb{R}^1 \simeq \mathbb{R}^2$. For a linear regression model, the prediction map might be defined as follows:
+
+$$f:\mathbb{R}^2\times \mathbb{R}^1 \to \mathbb{R}^1;~~~(\theta,x) \mapsto \theta_1 x + \theta_2$$
+
+where $\theta=(\theta_1,\theta_2)\in \mathbb{R}^2$ is a parameter-vector. For instance, if $\theta=(2,1)$, then the prediction associated to the input $x=2$ is $\hat{y}=2\cdot 2 + 1=5$.
+
+A **loss map** can be defined as a parametrized map $\ell:\Theta \times X \times Y \to \mathbb{R}$. It quantifies the discrepancy between the predicted output $\hat{y}=f_\theta(x) \in Y$ and the true output $y\in Y$. Given a training set, the **total loss** map is defined by
+
+$$\mathcal{L}:\Theta \to \mathbb{R};~~~\theta \mapsto \frac{1}{N}\sum_{i=1}^{N}\ell(\theta,x^{[i]},y^{[i]})$$
+
+where $N=|\mathcal{D}|$ (the number of labeled examples in the training set).
+
+For a linear regression model, the loss map can be defined as follows:
+
+$$\ell:\mathbb{R}^2\times \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}, (\theta,x,y) \mapsto (\hat{y} - y)^2=(\theta_1 x + \theta_2 - y)^2$$
+
+where $\theta=(\theta_1,\theta_2)\in \mathbb{R}^2$ is a parameter-vector. For instance, if $\theta=(2,1)$, then the total loss on the above training set $\mathcal{D}$ is $\frac{(2\cdot 1 + 1-2.9)^2+(2\cdot 2 + 1-5.1)^2+(2\cdot 3 + 1-7.05)^2}{3}=0.0075$.
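+
+The same number can be recomputed in a plain GAP session (only built-in list operations are used here, no functionality of the package):
+
+```julia
+gap> D := [ [ 1, 2.9 ], [ 2, 5.1 ], [ 3, 7.05 ] ];;
+gap> theta := [ 2, 1 ];;
+gap> # squared error of each labeled example
+gap> losses := List( D, d -> ( theta[1] * d[1] + theta[2] - d[2] ) ^ 2 );;
+gap> # the mean of the squared errors is the total loss: 0.0075 up to floating-point rounding
+gap> Sum( losses ) / Length( D );;
+```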
+
+Given a training set $\mathcal{D}\subset X\times Y$ and a loss map $\ell:\Theta \times X \times Y \to \mathbb{R}$, we would like to "learn" a parameter-vector $\theta\in\Theta$ such that the total loss $\mathcal{L}(\theta)$ is as small as possible.
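+
+In symbols, the goal of training is to (approximately) solve
+
+$$\theta^{\ast} \in \operatorname{argmin}_{\theta\in\Theta}~\mathcal{L}(\theta).$$
+
+We illustrate the learning procedure via the next two examples.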
+
+## Example 1: Neural Network to Perform Linear-Regression
+
+In this example, we consider a training dataset consisting of the three points $\mathcal{D}=$ { $( 1, 2.9 ), ( 2, 5.1 ), ( 3, 7.05 )$ } $\subset\mathbb{R}^2$.
+
+
+

+
+
+We aim to compute a line that fits $\mathcal{D}$.
+```julia
+gap> LoadPackage( "MachineLearningForCAP" );
+true
+
+gap> Para := CategoryOfParametrisedMorphisms( SkeletalSmoothMaps );
+CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
+
+gap> D := [ [ 1, 2.9 ], [ 2, 5.1 ], [ 3, 7.05 ] ];;
+```
+Let us create a neural network with the following architecture:
+
+
+

+
+
+where the activation map applied on the output layer is the identity function _IdFunc_. Its input dimension is 1, its output dimension is 1, and it has no hidden layers.
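+
+A fully connected layer from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$ has $m\cdot(n+1)$ parameters ($m\cdot n$ weights and $m$ biases). Here $n=m=1$, so the network has $1\cdot(1+1)=2$ parameters; this is the underlying (parameter) object $\mathbb{R}^2$ appearing in the displays below.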
+
+```julia
+gap> input_dim := 1;; hidden_dims := [ ];; output_dim := 1;;
+
+gap> f := PredictionMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "IdFunc" );;
+```
+As a parametrized map this neural network is defined as:
+
+$$f:\mathbb{R}^2\times \mathbb{R}^1 \to \mathbb{R}^1,~~ (\theta_1,\theta_2,x) \mapsto \begin{bmatrix}x & 1\end{bmatrix}\cdot\begin{bmatrix}\theta_{1}\\ \theta_{2}\end{bmatrix}=\theta_1 x + \theta_2$$
+
+Note that $(\theta_1,\theta_2)$ represents the parameters-vector while $(x)$ represents the input-vector. Hence, the above output is an affine transformation of $(x)\in \mathbb{R}^1$.
+```julia
+gap> input := ConvertToExpressions( [ "theta_1", "theta_2", "x" ] );
+[ theta_1, theta_2, x ]
+
+gap> Display( f : dummy_input := input );
+ℝ^1 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^2
+
+Underlying Morphism:
+-------------------
+ℝ^3 -> ℝ^1
+
+‣ theta_1 * x + theta_2
+```
+Let us now evaluate this prediction map at the parameter-vector $\theta=(2,1)\in\mathbb{R}^2$ and the input-vector $x=(2)\in \mathbb{R}^1$ from the example above.
+
+```julia
+gap> theta := [ 2, 1 ];; x := [ 2 ];;
+
+gap> Eval( f, [ theta, x ] );
+[ 5 ]
+```
+
+To train the neural network, we need to specify a loss map that will be used to learn the weights by minimizing the total loss. Since the activation map applied on the output layer is _IdFunc_, we use the _Quadratic-Loss_ map:
+
+$$
+\ell:\mathbb{R}^2\times \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R},~~ (\theta_1,\theta_2,x,y) \mapsto \text{Quadratic-Loss}\left( f((\theta_1, \theta_2,x)), (y) \right) = (\theta_1 x + \theta_2 - y)^2
+$$
+
+Note that $(\theta_1,\theta_2)$ represents the parameters-vector while $(x,y)$ represents an example in the training set. The loss map $\ell$ quantifies the discrepancy between the predicted vector $f((\theta_1, \theta_2,x)) \in \mathbb{R}^1$ and the true output $(y) \in \mathbb{R}^1$.
+
+In the following we construct the aforementioned loss-map:
+
+```julia
+gap> ell := LossMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "IdFunc" );;
+
+gap> input := ConvertToExpressions( [ "theta_1", "theta_2", "x", "y" ] );
+[ theta_1, theta_2, x, y ]
+
+gap> Display( ell : dummy_input := input );
+ℝ^2 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^2
+
+Underlying Morphism:
+-------------------
+ℝ^4 -> ℝ^1
+
+‣ (theta_1 * x + theta_2 - y) ^ 2 / 1
+```
+In order to learn the parameters we need to specify an optimization procedure. In this example, we will use the Gradient-Descent-Optimizer. Starting from initial values, it computes the gradient of the loss function and updates the parameters in the opposite direction of the gradient, scaled by a learning rate. This process is repeated until the loss converges to a (local) minimum.
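+
+Schematically, with learning rate $\alpha$, one update step reads
+
+$$\theta \leftarrow \theta - \alpha\cdot\nabla_{\theta}\,\ell(\theta,x,y),$$
+
+where $(x,y)$ runs over the (batches of) labeled examples; one **epoch** is one such pass over the whole training set.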
+
+```julia
+gap> Lenses := CategoryOfLenses( SkeletalSmoothMaps );
+CategoryOfLenses( SkeletalSmoothMaps )
+
+gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.01 );;
+```
+
+Now we compute the One-Epoch-Update-Lens using _batch size_ = 1:
+
+```julia
+gap> batch_size := 1;;
+
+gap> one_epoch_update := OneEpochUpdateLens( ell, optimizer, D, batch_size );
+(ℝ^2, ℝ^2) -> (ℝ^1, ℝ^0) defined by:
+
+Get Morphism:
+----------
+ℝ^2 -> ℝ^1
+
+Put Morphism:
+----------
+ℝ^2 -> ℝ^2
+```
+The _Get Morphism_ computes the total loss associated to a parameter-vector $\theta \in \mathbb{R}^2$ and the _Put Morphism_ performs a one-epoch update of the parameter-vector (the Gradient-Descent-Optimizer needs no auxiliary weights, so the extended parameter-vector coincides with $\theta$).
+
+Let us initialize a parameter-vector:
+
+```julia
+gap> theta := [ 0.1, -0.1 ];;
+```
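+
+Before fitting, we can query the total loss at this initial $\theta$ via the _Get Morphism_ of the lens; the printed value should agree with the `Epoch 0` loss reported by _Fit_ below:
+
+```julia
+gap> Eval( GetMorphism( one_epoch_update ), theta );
+[ 26.7775 ]
+```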
+
+To perform _nr_epochs_ = 15 updates on $\theta\in\mathbb{R}^2$ we can use the _Fit_ operation:
+```julia
+gap> nr_epochs := 15;;
+
+gap> theta := Fit( one_epoch_update, nr_epochs, theta );
+Epoch 0/15 - loss = 26.777499999999993
+Epoch 1/15 - loss = 13.002145872163197
+Epoch 2/15 - loss = 6.3171942035316935
+Epoch 3/15 - loss = 3.0722513061917534
+Epoch 4/15 - loss = 1.4965356389126505
+Epoch 5/15 - loss = 0.73097379078374358
+Epoch 6/15 - loss = 0.35874171019291579
+Epoch 7/15 - loss = 0.1775574969062125
+Epoch 8/15 - loss = 0.089228700384937534
+Epoch 9/15 - loss = 0.046072054531129378
+Epoch 10/15 - loss = 0.024919378473509772
+Epoch 11/15 - loss = 0.014504998499450883
+Epoch 12/15 - loss = 0.0093448161379050161
+Epoch 13/15 - loss = 0.0067649700132868147
+Epoch 14/15 - loss = 0.0054588596501628835
+Epoch 15/15 - loss = 0.0047859930295160499
+[ 2.08995, 0.802632 ]
+```
+
+The parameter-vector after 15 epochs is $\theta=[2.08995, 0.802632]$. That is, the prediction map is $f_{\theta}:\mathbb{R}^1 \to \mathbb{R}^1,~~x \mapsto 2.08995x + 0.802632$.
+
+```julia
+gap> theta := SkeletalSmoothMaps.Constant( theta );
+ℝ^0 -> ℝ^2
+
+gap> f_theta := ReparametriseMorphism( f, theta );
+ℝ^1 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^0
+
+Underlying Morphism:
+-------------------
+ℝ^1 -> ℝ^1
+
+gap> f_theta := UnderlyingMorphism( f_theta );;
+
+gap> Display( f_theta );
+ℝ^1 -> ℝ^1
+
+‣ 2.08995 * x1 + 0.802632
+```
+
+Let us compute the predicted values associated to $x\in$ {1,2,3}.
+```julia
+gap> Eval( f_theta, [ 1 ] );
+[ 2.89259 ]
+
+gap> Eval( f_theta, [ 2 ] );
+[ 4.98254 ]
+
+gap> Eval( f_theta, [ 3 ] );
+[ 7.07249 ]
+```
+
+The following image illustrates the lines defined by the parameter-vectors over the course of the 15 epochs.
+
+

+
+
+## Example 2: Neural Network to Perform Multi-Class Classification
+In this example, we consider a training dataset consisting of points in the two-dimensional Euclidean space, $\mathbb{R}^2$. Each point in this space is represented by a pair of real numbers (coordinates). The goal is to classify these points into one of three distinct categories.
+
+
+

+
+
+To facilitate this classification, we use a one-hot encoding scheme for the labels. There are three possible classes for the data points. We denote these classes as _class-1_ (red), _class-2_ (green), and _class-3_ (blue). The labels are encoded using one-hot vectors. A one-hot vector is a binary vector with a length equal to the number of classes, where only the element corresponding to the true class is 1, and all other elements are 0. That is
+ - _class-1_ : [1, 0, 0]
+ - _class-2_ : [0, 1, 0]
+ - _class-3_ : [0, 0, 1]
+
+That is, the training set (the set of labeled examples) is a finite subset of $\mathbb{R}^2 \times \mathbb{R}^3 \simeq \mathbb{R}^5$.
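+
+For illustration, such one-hot vectors can be produced by a small helper written in plain GAP (the name `OneHot` is hypothetical and not part of the package; in the session below the labels are simply written out explicitly):
+
+```julia
+gap> OneHot := function( class, nr_classes ) local v; v := ListWithIdenticalEntries( nr_classes, 0 ); v[class] := 1; return v; end;;
+
+gap> OneHot( 2, 3 ); # the label of a point in class-2
+[ 0, 1, 0 ]
+```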
+
+```julia
+gap> LoadPackage( "MachineLearningForCAP" );
+true
+
+gap> Para := CategoryOfParametrisedMorphisms( SkeletalSmoothMaps );
+CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
+
+gap> class_1 := Concatenation( List( [ -2 .. 3 ], i -> [ [ i, i - 1, 1, 0, 0 ], [ i + 1, i - 1, 1, 0, 0 ] ] ) );;
+
+gap> class_2 := Concatenation( List( [ -3 .. -1 ], i -> List( [ i + 1 .. - i - 1 ], j -> [ i, j, 0, 1, 0 ] ) ) );;
+
+gap> class_3 := Concatenation( List( [ 1 .. 3 ], i -> List( [ - i + 1 .. i - 1 ], j -> [ j, i, 0, 0, 1 ] ) ) );;
+
+gap> D := Concatenation( class_1, class_2, class_3 );
+[ [ -2, -3, 1, 0, 0 ], [ -1, -3, 1, 0, 0 ], [ -1, -2, 1, 0, 0 ], [ 0, -2, 1, 0, 0 ], [ 0, -1, 1, 0, 0 ],
+ [ 1, -1, 1, 0, 0 ], [ 1, 0, 1, 0, 0 ], [ 2, 0, 1, 0, 0 ], [ 2, 1, 1, 0, 0 ], [ 3, 1, 1, 0, 0 ],
+ [ 3, 2, 1, 0, 0 ], [ 4, 2, 1, 0, 0 ], [ -3, -2, 0, 1, 0 ], [ -3, -1, 0, 1, 0 ], [ -3, 0, 0, 1, 0 ],
+ [ -3, 1, 0, 1, 0 ], [ -3, 2, 0, 1, 0 ], [ -2, -1, 0, 1, 0 ], [ -2, 0, 0, 1, 0 ], [ -2, 1, 0, 1, 0 ],
+ [ -1, 0, 0, 1, 0 ], [ 0, 1, 0, 0, 1 ], [ -1, 2, 0, 0, 1 ], [ 0, 2, 0, 0, 1 ], [ 1, 2, 0, 0, 1 ],
+ [ -2, 3, 0, 0, 1 ], [ -1, 3, 0, 0, 1 ], [ 0, 3, 0, 0, 1 ], [ 1, 3, 0, 0, 1 ], [ 2, 3, 0, 0, 1 ] ]
+
+```
+
+Let us create a neural network with the following architecture:
+
+

+
+
+where the activation map applied on the output layer is _Softmax_.
+
+Its input dimension is 2, its output dimension is 3, and it has no hidden layers.
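+
+As in the first example, the fully connected layer has $m\cdot(n+1)$ parameters, here with $n=2$ inputs and $m=3$ outputs, i.e. $3\cdot(2+1)=9$ parameters; this is the underlying (parameter) object $\mathbb{R}^9$ in the displays below.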
+```julia
+gap> input_dim := 2;; hidden_dims := [ ];; output_dim := 3;;
+
+gap> f := PredictionMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "Softmax" );;
+```
+
+As a parametrized map this neural network is defined as:
+
+$$f:\mathbb{R}^9\times \mathbb{R}^2 \to \mathbb{R}^3,~~ (\theta_1,\dots,\theta_9,x_1,x_2) \mapsto \text{Softmax}\left(\begin{bmatrix}x_{1} & x_{2} & 1\end{bmatrix}\cdot\begin{bmatrix}\theta_{1} & \theta_{4} & \theta_{7}\\ \theta_{2} & \theta_{5} & \theta_{8}\\ \theta_{3} & \theta_{6} & \theta_{9}\end{bmatrix}\right)$$
+
+Note that $(\theta_1,\dots,\theta_9)$ represents the parameters-vector while $(x_{1},x_{2})$ represents the input-vector. Hence, the above output is the _Softmax_ of an affine transformation of $(x_{1},x_{2})$.
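+
+Here, _Softmax_ turns a vector of three real numbers into a probability vector:
+
+$$\text{Softmax}(z_1,z_2,z_3) := \left(\frac{e^{z_1}}{e^{z_1}+e^{z_2}+e^{z_3}},\,\frac{e^{z_2}}{e^{z_1}+e^{z_2}+e^{z_3}},\,\frac{e^{z_3}}{e^{z_1}+e^{z_2}+e^{z_3}}\right),$$
+
+which is exactly the expression shown in the display below.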
+```julia
+gap> input := ConvertToExpressions( [ "theta_1", "theta_2", "theta_3", "theta_4", "theta_5", "theta_6", "theta_7", "theta_8", "theta_9", "x1", "x2" ] );
+[ theta_1, theta_2, theta_3, theta_4, theta_5, theta_6, theta_7, theta_8, theta_9, x1, x2 ]
+
+gap> Display( f : dummy_input := input );
+ℝ^2 -> ℝ^3 defined by:
+
+Underlying Object:
+-----------------
+ℝ^9
+
+Underlying Morphism:
+-------------------
+ℝ^11 -> ℝ^3
+
+‣ Exp( (theta_1 * x1 + theta_2 * x2 + theta_3) )
+ / (Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ))
+‣ Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) )
+ / (Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ))
+‣ Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) )
+ / (Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ))
+```
+Let us now evaluate this prediction map at a sample parameter-vector in $\mathbb{R}^9$ and an input-vector in $\mathbb{R}^2$.
+
+```julia
+gap> theta := [ 0.1, -0.1, 0, 0.1, 0.2, 0, -0.2, 0.3, 0 ];;
+gap> input_vec := [ 1, 2 ];;
+
+gap> prediction_vec := Eval( f, [ theta, input_vec ] );
+[ 0.223672, 0.407556, 0.368772 ]
+
+gap> PositionMaximum( prediction_vec );
+2
+```
+That is, the input-vector $x=[1, 2]$ is predicted to belong to _class-2_ (which is of course incorrect, as the neural network has not yet been trained).
+
+To train the neural network, we need to specify a loss map that will be used to learn the weights by minimizing the total loss. Since the activation map applied on the output layer is _Softmax_, we use the _Cross-Entropy_ loss map:
+
+$$
+\ell:\mathbb{R}^9\times \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R},~~ (\theta_1,\dots,\theta_9,x_{1},x_{2},y_{1},y_{2},y_{3}) \mapsto \text{Cross-Entropy}\left( f((\theta_1, \dots, \theta_9,x_{1},x_{2})), (y_{1}, y_{2}, y_{3}) \right)
+$$
+
+Note that $(\theta_1,\dots,\theta_9)$ represents the parameters-vector while $(x_{1},x_{2},y_{1},y_{2},y_{3})$ represents an example in the training set. The loss map $\ell$ quantifies the discrepancy between the predicted probabilities vector $f((\theta_1, \dots, \theta_9,x_{1},x_{2})) \in \mathbb{R}^3$ and the true label $(y_{1}, y_{2}, y_{3}) \in$ { $(1,0,0),(0,1,0),(0,0,1)$ } $\subset \mathbb{R}^3$.
+
+More explicitly, if $(\hat{y}_1,\hat{y}_2,\hat{y}_3) := f((\theta_1, \dots, \theta_9,x_{1},x_{2})) \in \mathbb{R}^3$, then
+$$
+\text{Cross-Entropy}((\hat{y}_1,\hat{y}_2,\hat{y}_3),(y_{1},y_{2},y_{3})) := -\frac{1}{3}\left(y_1\log(\hat{y}_1)+y_2\log(\hat{y}_2)+y_3\log(\hat{y}_3)\right)
+$$
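+
+For instance, the loss of the (untrained) prediction $(0.223672,\, 0.407556,\, 0.368772)$ computed above for the point $(1,2)$, whose true label is $(0,0,1)$, is $-\tfrac{1}{3}\log(0.368772) \approx 0.3325$.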
+
+In the following we construct the aforementioned loss-map:
+
+```julia
+gap> ell := LossMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "Softmax" );;
+
+gap> input := ConvertToExpressions( [ "theta_1", "theta_2", "theta_3", "theta_4", "theta_5", "theta_6", "theta_7", "theta_8", "theta_9", "x1", "x2", "y1", "y2", "y3" ] );;
+
+gap> Display( ell : dummy_input := input );
+ℝ^5 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^9
+
+Underlying Morphism:
+-------------------
+ℝ^14 -> ℝ^1
+
+‣ ((Log( Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ) ) - (theta_1 * x1 + theta_2 * x2 + theta_3)) * y1
+ + (Log( Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ) ) - (theta_4 * x1 + theta_5 * x2 + theta_6)) * y2
+ + (Log( Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ) ) - (theta_7 * x1 + theta_8 * x2 + theta_9)) * y3) / 3
+```
+Until now, we have the training set $\mathcal{D} \subseteq \mathbb{R}^2\times \mathbb{R}^3 \simeq \mathbb{R}^5$ and a loss-map $\ell:\mathbb{R}^9\times \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$. In order to learn the parameters we need to specify an optimization procedure.
+
+In this example, we want to use the Adam-Optimizer procedure. The Adam-Optimizer is an extension of stochastic gradient descent that computes adaptive learning rates for each parameter. When using the Adam-Optimizer, additional auxiliary weights are maintained for each parameter to store moment estimates. Specifically, for a model with $n$ parameters, the Adam-Optimizer maintains:
+
+ - Time Step ($\mathbf{t}$): one auxiliary weight.
+ - First Moment Estimates ($\mathbf{m}$): $n$ auxiliary weights.
+ - Second Moment Estimates ($\mathbf{v}$): $n$ auxiliary weights.
+ - Original Model Weights: $n$ parameters.
+
+So, in total, the Adam-Optimizer uses $1+3n$ weights. Since our network has $9$ parameters, the Adam-Optimizer uses $1+3\cdot 9=28$ weights: the first $1+2\cdot 9=19$ weights are auxiliary weights and the last $9$ weights are the original model weights.
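+
+For reference, a standard per-coordinate Adam update with gradient $g$ at time step $t$ reads (using the hyper-parameters below and a small constant $\varepsilon$; the exact conventions implemented in the package may differ in minor details):
+
+$$m \leftarrow \beta_1 m + (1-\beta_1)\,g,\qquad v \leftarrow \beta_2 v + (1-\beta_2)\,g^2,\qquad \theta \leftarrow \theta - \alpha\,\frac{m/(1-\beta_1^{\,t})}{\sqrt{v/(1-\beta_2^{\,t})}+\varepsilon}.$$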
+
+```julia
+gap> Lenses := CategoryOfLenses( SkeletalSmoothMaps );
+CategoryOfLenses( SkeletalSmoothMaps )
+
+gap> optimizer := Lenses.AdamOptimizer( : learning_rate := 0.01, beta_1 := 0.9, beta_2 := 0.999 );;
+
+gap> optimizer( 9 );
+(ℝ^28, ℝ^28) -> (ℝ^9, ℝ^9) defined by:
+
+Get Morphism:
+----------
+ℝ^28 -> ℝ^9
+
+Put Morphism:
+----------
+ℝ^37 -> ℝ^28
+```
+
+Now we compute the One-Epoch-Update-Lens using _batch size_ = 1:
+
+```julia
+gap> batch_size := 1;;
+
+gap> one_epoch_update := OneEpochUpdateLens( ell, optimizer, D, batch_size );
+(ℝ^28, ℝ^28) -> (ℝ^1, ℝ^0) defined by:
+
+Get Morphism:
+----------
+ℝ^28 -> ℝ^1
+
+Put Morphism:
+----------
+ℝ^28 -> ℝ^28
+```
+The _Get Morphism_ computes the total loss associated to the extended parameter-vector and the _Put Morphism_ updates the extended parameter-vector.
+
+
+Let us initialize the extended parameter-vector:
+
+```julia
+gap> t := [ 1 ];; # one entry
+gap> m := [ 0, 0, 0, 0, 0, 0, 0, 0, 0 ];; # 9 entries
+gap> v := [ 0, 0, 0, 0, 0, 0, 0, 0, 0 ];; # 9 entries
+gap> theta := [ 0.1, -0.1, 0, 0.1, 0.2, 0, -0.2, 0.3, 0 ];; # 9 entries
+
+gap> w := Concatenation( t, m, v, theta );;
+```
+
+To compute the total loss associated to the extended parameter-vector $w$ we can use the _Get Morphism_:
+
+```julia
+gap> current_loss := Eval( GetMorphism( one_epoch_update ), w );
+[ 0.345836 ]
+```
+
+And to update the extended parameter-vector (one-epoch update) we use the _Put Morphism_:
+
+```julia
+gap> new_w := Eval( PutMorphism( one_epoch_update ), w );
+[ 31, 0.104642, -0.355463, -0.197135, -0.109428, -0.147082, 0.00992963,
+ 0.00478679, 0.502546, 0.187206, 0.0105493, 0.00642903, 0.00211548,
+ 0.00660062, 0.00274907, 0.00110985, 0.00278786, 0.0065483, 0.00112838,
+ 5.45195, -1.26208, 3.82578, -5.40639, -0.952146, -3.42835, -2.79496, 3.09008, -6.80672 ]
+```
+
+To perform _nr_epochs_ = 4 updates on $w$ we can use the _Fit_ operation:
+
+```julia
+gap> nr_epochs := 4;;
+
+gap> w := Fit( one_epoch_update, nr_epochs, w );
+Epoch 0/4 - loss = 0.34583622811001763
+Epoch 1/4 - loss = 0.6449437167091393
+Epoch 2/4 - loss = 0.023811108587716449
+Epoch 3/4 - loss = 0.0036371652708073405
+Epoch 4/4 - loss = 0.0030655216725219204
+[ 121, -4.57215e-05, -0.00190417, -0.0014116, -0.00181528, 0.00108949, 0.00065518, 0.001861, 0.000814679,
+ 0.000756424, 0.0104885, 0.00846858, 0.0022682, 0.00784643, 0.00551702, 0.0014626, 0.00351408, 0.00640225,
+ 0.00115053, 5.09137, -4.83379, 3.06257, -5.70976, 0.837175, -4.23622, -1.71171, 5.54301, -4.80856 ]
+```
+
+Now let us use the updated theta (the last $9$ entries of $w$) to predict the label $\in$ {_class-1_, _class-2_, _class-3_} of the point $[1,-1]\in\mathbb{R}^2$.
+
+```julia
+gap> theta := SplitDenseList( w, [ 19, 9 ] )[2];
+[ 5.09137, -4.83379, 3.06257, -5.70976, 0.837175, -4.23622, -1.71171, 5.54301, -4.80856 ]
+
+gap> theta := SkeletalSmoothMaps.Constant( theta );
+ℝ^0 -> ℝ^9
+
+gap> f_theta := ReparametriseMorphism( f, theta );
+ℝ^2 -> ℝ^3 defined by:
+
+Underlying Object:
+-----------------
+ℝ^0
+
+Underlying Morphism:
+-------------------
+ℝ^2 -> ℝ^3
+
+gap> f_theta := UnderlyingMorphism( f_theta );;
+
+gap> Display( f_theta );
+ℝ^2 -> ℝ^3
+
+‣ Exp( (5.09137 * x1 + (- 4.83379) * x2 + 3.06257) ) /
+ (Exp( 5.09137 * x1 + (- 4.83379) * x2 + 3.06257 ) + Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) ) + Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) ))
+‣ Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) ) /
+ (Exp( 5.09137 * x1 + (- 4.83379) * x2 + 3.06257 ) + Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) ) + Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) ))
+‣ Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) ) /
+ (Exp( 5.09137 * x1 + (- 4.83379) * x2 + 3.06257 ) + Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) ) + Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) ))
+
+gap> input_vec := [ 1, -1 ];;
+
+gap> predictions_vec := Eval( f_theta, input_vec );
+[ 1., 4.74723e-11, 1.31974e-11 ]
+
+gap> PositionMaximum( predictions_vec );
+1
+```
+
+That is, the predicted label of the input-vector $[1, -1]$ is _class-1_, which is indeed correct.
+
+```julia
+gap> predictions_vec := Eval( f_theta, [ 1, 3 ] );
+[ 7.13122e-08, 2.40484e-08, 1. ]
+
+gap> PositionMaximum( predictions_vec );
+3
+```
+
+That is, the predicted label of the input-vector $[1, 3]$ is _class-3_, which is indeed correct.
+
+The following image illustrates the predictions of the points {$(0.5i,0.5j)|i,j\in${$-6,-5,\dots,5,6$}} over the course of the $4$ epochs.
+
+
+

+
\ No newline at end of file
diff --git a/examples/CategoryOfParametrisedMorphisms.g b/examples/CategoryOfParametrisedMorphisms.g
index 76dc979..259bc8c 100644
--- a/examples/CategoryOfParametrisedMorphisms.g
+++ b/examples/CategoryOfParametrisedMorphisms.g
@@ -26,22 +26,22 @@ f := Smooth.Softmax( 3 );
f := MorphismConstructor( Para, R1 / Para, [ R2, f ], R3 / Para );
#! ℝ^1 -> ℝ^3 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^2
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^3 -> ℝ^3
Display( f );
#! ℝ^1 -> ℝ^3 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^2
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^3 -> ℝ^3
#!
#! ‣ Exp( x1 ) / (Exp( x1 ) + Exp( x2 ) + Exp( x3 ))
@@ -59,22 +59,22 @@ Display( r );
g := ReparametriseMorphism( f, r );
#! ℝ^1 -> ℝ^3 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^2
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^3 -> ℝ^3
Display( g );
#! ℝ^1 -> ℝ^3 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^2
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^3 -> ℝ^3
#!
#! ‣ Exp( Sqrt( x1 ) ) / (Exp( Sqrt( x1 ) ) + Exp( Cos( x2 ) ) + Exp( x3 ))
@@ -83,32 +83,32 @@ Display( g );
l := Para.AffineTransformation( 3, 2 );
#! ℝ^3 -> ℝ^2 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^8
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^11 -> ℝ^2
h := PreCompose( g, l );
#! ℝ^1 -> ℝ^2 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^10
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^11 -> ℝ^2
Display( h );
#! ℝ^1 -> ℝ^2 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^10
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^11 -> ℝ^2
#!
#! ‣ x1 * (Exp( Sqrt( x9 ) ) / (Exp( Sqrt( x9 ) ) + Exp( Cos( x10 ) ) + Exp( x11 )))
@@ -123,22 +123,22 @@ r := Smooth.Constant( constants );
t := ReparametriseMorphism( h, r );
#! ℝ^1 -> ℝ^2 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^0
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^1 -> ℝ^2
Display( t );
#! ℝ^1 -> ℝ^2 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^0
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^1 -> ℝ^2
#!
#! ‣ 0.91 * (2.39116 / (5.10727 + Exp( x1 ))) + 0.24 * (2.71611 / (5.10727 + Exp( x1 )))
@@ -148,22 +148,22 @@ Display( t );
s := SimplifyMorphism( t, infinity );
#! ℝ^1 -> ℝ^2 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^0
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^1 -> ℝ^2
Display( s );
#! ℝ^1 -> ℝ^2 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^0
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^1 -> ℝ^2
#!
#! ‣ (1.47 * Exp( x1 ) + 5.84111) / (Exp( x1 ) + 5.10727)
@@ -175,22 +175,22 @@ ApplyFunctor( iota, Smooth.( 1 ) );
psi := ApplyFunctor( iota, Smooth.Sum( 2 ) );
#! ℝ^2 -> ℝ^1 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^0
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^2 -> ℝ^1
Print( DisplayString( psi ) );
#! ℝ^2 -> ℝ^1 defined by:
#!
-#! Parameter Object:
+#! Underlying Object:
#! -----------------
#! ℝ^0
#!
-#! Parametrised Morphism:
-#! ----------------------
+#! Underlying Morphism:
+#! -------------------
#! ℝ^2 -> ℝ^1
#!
#! ‣ x1 + x2
diff --git a/gap/CategoryOfParametrisedMorphisms.gi b/gap/CategoryOfParametrisedMorphisms.gi
index 10cbe10..fd0035f 100644
--- a/gap/CategoryOfParametrisedMorphisms.gi
+++ b/gap/CategoryOfParametrisedMorphisms.gi
@@ -546,9 +546,9 @@ InstallMethod( ViewString,
" -> ",
ViewString( Target( f ) ),
" defined by:",
- "\n\nParameter Object:\n-----------------\n",
+ "\n\nUnderlying Object:\n-----------------\n",
ViewString( UnderlyingObject( f ) ),
- "\n\nParametrised Morphism:\n----------------------\n",
+ "\n\nUnderlying Morphism:\n-------------------\n",
ViewString( UnderlyingMorphism( f ) ) );
end );
@@ -565,9 +565,9 @@ InstallMethod( DisplayString,
" -> ",
ViewString( Target( f ) ),
" defined by:",
- "\n\nParameter Object:\n-----------------\n",
+ "\n\nUnderlying Object:\n-----------------\n",
ViewString( UnderlyingObject( f ) ),
- "\n\nParametrised Morphism:\n----------------------\n",
+ "\n\nUnderlying Morphism:\n-------------------\n",
DisplayString( UnderlyingMorphism( f ) ) );
end );
@@ -584,9 +584,9 @@ InstallMethod( Display,
" -> ",
ViewString( Target( f ) ),
" defined by:",
- "\n\nParameter Object:\n-----------------\n",
+ "\n\nUnderlying Object:\n-----------------\n",
ViewString( UnderlyingObject( f ) ),
- "\n\nParametrised Morphism:\n----------------------\n" ) );
+ "\n\nUnderlying Morphism:\n-------------------\n" ) );
Display( UnderlyingMorphism( f ) );
diff --git a/pictures/linear_regression.gif b/pictures/linear_regression.gif
new file mode 100644
index 0000000..300d3db
Binary files /dev/null and b/pictures/linear_regression.gif differ
diff --git a/pictures/logistic_regression.gif b/pictures/logistic_regression.gif
new file mode 100644
index 0000000..84b4019
Binary files /dev/null and b/pictures/logistic_regression.gif differ
diff --git a/pictures/network-1.png b/pictures/network-1.png
new file mode 100644
index 0000000..5307bba
Binary files /dev/null and b/pictures/network-1.png differ
diff --git a/pictures/network-2.png b/pictures/network-2.png
new file mode 100644
index 0000000..166a9b9
Binary files /dev/null and b/pictures/network-2.png differ
diff --git a/pictures/training_data_1.png b/pictures/training_data_1.png
new file mode 100644
index 0000000..4f16217
Binary files /dev/null and b/pictures/training_data_1.png differ
diff --git a/pictures/training_data_2.png b/pictures/training_data_2.png
new file mode 100644
index 0000000..2c4445a
Binary files /dev/null and b/pictures/training_data_2.png differ
diff --git a/tst/categorical_vs_direct_constructions.tst b/tst/categorical_vs_direct_constructions.tst
index 0c69b63..a59cce7 100644
--- a/tst/categorical_vs_direct_constructions.tst
+++ b/tst/categorical_vs_direct_constructions.tst
@@ -12,23 +12,23 @@ gap> Assert( 0, Para.AffineTransformation_( 3, 4 ) = Para.AffineTransformation(
gap> l := Para.AffineTransformation( 3, 4 );
ℝ^3 -> ℝ^4 defined by:
-Parameter Object:
+Underlying Object:
-----------------
ℝ^16
-Parametrised Morphism:
-----------------------
+Underlying Morphism:
+-------------------
ℝ^19 -> ℝ^4
gap> dummy_input := DummyInputForAffineTransformation( 3, 4, "w", "b", "x" );;
gap> Display( l : dummy_input := dummy_input );
ℝ^3 -> ℝ^4 defined by:
-Parameter Object:
+Underlying Object:
-----------------
ℝ^16
-Parametrised Morphism:
-----------------------
+Underlying Morphism:
+-------------------
ℝ^19 -> ℝ^4
‣ w1_1 * x1 + w2_1 * x2 + w3_1 * x3 + b_1
diff --git a/tst/example-in-readme-1.tst b/tst/example-in-readme-1.tst
new file mode 100644
index 0000000..00be118
--- /dev/null
+++ b/tst/example-in-readme-1.tst
@@ -0,0 +1,113 @@
+gap> LoadPackage( "MachineLearningForCAP" );
+true
+
+gap> Para := CategoryOfParametrisedMorphisms( SkeletalSmoothMaps );
+CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
+
+gap> training_set := [ [ 1, 2.9 ], [ 2, 5.1 ], [ 3, 7.05 ] ];
+[ [ 1, 2.9 ], [ 2, 5.1 ], [ 3, 7.05 ] ]
+
+gap> input_dim := 1;; output_dim := 1;; hidden_dims := [ ];;
+
+gap> f := PredictionMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "IdFunc" );;
+
+gap> Display( f );
+ℝ^1 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^2
+
+Underlying Morphism:
+-------------------
+ℝ^3 -> ℝ^1
+
+‣ x1 * x3 + x2
+
+gap> parameters := [ 2, 1 ];; x := [ 2 ];;
+
+gap> Eval( f, [ parameters, x ] );
+[ 5 ]
+
+gap> ell := LossMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "IdFunc" );;
+
+gap> Display( ell );
+ℝ^2 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^2
+
+Underlying Morphism:
+-------------------
+ℝ^4 -> ℝ^1
+
+‣ (x1 * x3 + x2 - x4) ^ 2 / 1
+
+gap> Lenses := CategoryOfLenses( SkeletalSmoothMaps );
+CategoryOfLenses( SkeletalSmoothMaps )
+
+gap> optimizer := Lenses.GradientDescentOptimizer( : learning_rate := 0.01 );;
+
+gap> batch_size := 1;;
+
+gap> one_epoch_update := OneEpochUpdateLens( ell, optimizer, training_set, batch_size );;
+
+gap> parameters := [ 0.1, -0.1 ];;
+
+gap> nr_epochs := 15;;
+
+gap> parameters := Fit( one_epoch_update, nr_epochs, parameters );
+Epoch 0/15 - loss = 26.777499999999993
+Epoch 1/15 - loss = 13.002145872163197
+Epoch 2/15 - loss = 6.3171942035316935
+Epoch 3/15 - loss = 3.0722513061917534
+Epoch 4/15 - loss = 1.4965356389126505
+Epoch 5/15 - loss = 0.73097379078374358
+Epoch 6/15 - loss = 0.35874171019291579
+Epoch 7/15 - loss = 0.1775574969062125
+Epoch 8/15 - loss = 0.089228700384937534
+Epoch 9/15 - loss = 0.046072054531129378
+Epoch 10/15 - loss = 0.024919378473509772
+Epoch 11/15 - loss = 0.014504998499450883
+Epoch 12/15 - loss = 0.0093448161379050161
+Epoch 13/15 - loss = 0.0067649700132868147
+Epoch 14/15 - loss = 0.0054588596501628835
+Epoch 15/15 - loss = 0.0047859930295160499
+[ 2.08995, 0.802632 ]
+
+gap> theta := SkeletalSmoothMaps.Constant( parameters );
+ℝ^0 -> ℝ^2
+
+gap> Display( theta );
+ℝ^0 -> ℝ^2
+
+‣ 2.08995
+‣ 0.802632
+
+gap> f_theta := ReparametriseMorphism( f, theta );;
+
+gap> Display( f_theta );
+ℝ^1 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^0
+
+Underlying Morphism:
+-------------------
+ℝ^1 -> ℝ^1
+
+‣ 2.08995 * x1 + 0.802632
+
+gap> f_theta := UnderlyingMorphism( f_theta );
+ℝ^1 -> ℝ^1
+
+gap> Eval( f_theta, [ 1 ] );
+[ 2.89259 ]
+
+gap> Eval( f_theta, [ 2 ] );
+[ 4.98254 ]
+
+gap> Eval( f_theta, [ 3 ] );
+[ 7.07249 ]
diff --git a/tst/example-in-readme-2.tst b/tst/example-in-readme-2.tst
new file mode 100644
index 0000000..04100ae
--- /dev/null
+++ b/tst/example-in-readme-2.tst
@@ -0,0 +1,159 @@
+gap> LoadPackage( "MachineLearningForCAP" );
+true
+
+gap> Para := CategoryOfParametrisedMorphisms( SkeletalSmoothMaps );
+CategoryOfParametrisedMorphisms( SkeletalSmoothMaps )
+
+gap> class_1 := Concatenation( List( [ -2 .. 3 ], i -> [ [ i, i - 1, 1, 0, 0 ], [ i + 1, i - 1, 1, 0, 0 ] ] ) );;
+gap> class_2 := Concatenation( List( [ -3 .. -1 ], i -> List( [ i + 1 .. - i - 1 ], j -> [ i, j, 0, 1, 0 ] ) ) );;
+gap> class_3 := Concatenation( List( [ 1 .. 3 ], i -> List( [ - i + 1 .. i - 1 ], j -> [ j, i, 0, 0, 1 ] ) ) );;
+
+gap> training_set := Concatenation( class_1, class_2, class_3 );
+[ [ -2, -3, 1, 0, 0 ], [ -1, -3, 1, 0, 0 ], [ -1, -2, 1, 0, 0 ], [ 0, -2, 1, 0, 0 ], [ 0, -1, 1, 0, 0 ],
+ [ 1, -1, 1, 0, 0 ], [ 1, 0, 1, 0, 0 ], [ 2, 0, 1, 0, 0 ], [ 2, 1, 1, 0, 0 ], [ 3, 1, 1, 0, 0 ],
+ [ 3, 2, 1, 0, 0 ], [ 4, 2, 1, 0, 0 ], [ -3, -2, 0, 1, 0 ], [ -3, -1, 0, 1, 0 ], [ -3, 0, 0, 1, 0 ],
+ [ -3, 1, 0, 1, 0 ], [ -3, 2, 0, 1, 0 ], [ -2, -1, 0, 1, 0 ], [ -2, 0, 0, 1, 0 ], [ -2, 1, 0, 1, 0 ],
+ [ -1, 0, 0, 1, 0 ], [ 0, 1, 0, 0, 1 ], [ -1, 2, 0, 0, 1 ], [ 0, 2, 0, 0, 1 ], [ 1, 2, 0, 0, 1 ],
+ [ -2, 3, 0, 0, 1 ], [ -1, 3, 0, 0, 1 ], [ 0, 3, 0, 0, 1 ], [ 1, 3, 0, 0, 1 ], [ 2, 3, 0, 0, 1 ] ]
+
+gap> input_dim := 2;; output_dim := 3;; hidden_dims := [ ];;
+
+gap> f := PredictionMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "Softmax" );;
+
+gap> input := ConvertToExpressions( [ "theta_1", "theta_2", "theta_3", "theta_4", "theta_5", "theta_6", "theta_7", "theta_8", "theta_9", "x1", "x2" ] );;
+
+gap> Display( f : dummy_input := input );
+ℝ^2 -> ℝ^3 defined by:
+
+Underlying Object:
+-----------------
+ℝ^9
+
+Underlying Morphism:
+-------------------
+ℝ^11 -> ℝ^3
+
+‣ Exp( (theta_1 * x1 + theta_2 * x2 + theta_3) )
+ / (Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ))
+‣ Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) )
+ / (Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ))
+‣ Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) )
+ / (Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ))
+
+gap> parameters := [ 0.1, -0.1, 0, 0.1, 0.2, 0, -0.2, 0.3, 0 ];;
+gap> x := [ 1, 2 ];;
+
+gap> prediction_x := Eval( f, [ parameters, x ] );
+[ 0.223672, 0.407556, 0.368772 ]
+
+gap> PositionMaximum( prediction_x );
+2
+
+gap> ell := LossMorphismOfNeuralNetwork( Para, input_dim, hidden_dims, output_dim, "Softmax" );;
+
+gap> input := ConvertToExpressions( [ "theta_1", "theta_2", "theta_3", "theta_4", "theta_5", "theta_6", "theta_7", "theta_8", "theta_9", "x1", "x2", "y1", "y2", "y3" ] );;
+
+gap> Display( ell : dummy_input := input );
+ℝ^5 -> ℝ^1 defined by:
+
+Underlying Object:
+-----------------
+ℝ^9
+
+Underlying Morphism:
+-------------------
+ℝ^14 -> ℝ^1
+
+‣ ((Log( Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ) ) - (theta_1 * x1 + theta_2 * x2 + theta_3)) * y1
+ + (Log( Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ) ) - (theta_4 * x1 + theta_5 * x2 + theta_6)) * y2
+ + (Log( Exp( theta_1 * x1 + theta_2 * x2 + theta_3 ) + Exp( (theta_4 * x1 + theta_5 * x2 + theta_6) ) + Exp( (theta_7 * x1 + theta_8 * x2 + theta_9) ) ) - (theta_7 * x1 + theta_8 * x2 + theta_9)) * y3) / 3
+
+gap> Lenses := CategoryOfLenses( SkeletalSmoothMaps );
+CategoryOfLenses( SkeletalSmoothMaps )
+
+gap> optimizer := Lenses.AdamOptimizer( : learning_rate := 0.01, beta_1 := 0.9, beta_2 := 0.999 );;
+
+gap> batch_size := 1;;
+
+gap> one_epoch_update := OneEpochUpdateLens( ell, optimizer, training_set, batch_size );
+(ℝ^28, ℝ^28) -> (ℝ^1, ℝ^0) defined by:
+
+Get Morphism:
+----------
+ℝ^28 -> ℝ^1
+
+Put Morphism:
+----------
+ℝ^28 -> ℝ^28
+
+gap> parameters := [ 0.1, -0.1, 0, 0.1, 0.2, 0, -0.2, 0.3, 0 ];;
+
+gap> w := Concatenation( [ 1 ], 0 * parameters, 0 * parameters, parameters );;
+
+gap> Eval( GetMorphism( one_epoch_update ), w );
+[ 0.345836 ]
+
+gap> new_w := Eval( PutMorphism( one_epoch_update ), w );
+[ 31, 0.104642, -0.355463, -0.197135, -0.109428, -0.147082, 0.00992963,
+ 0.00478679, 0.502546, 0.187206, 0.0105493, 0.00642903, 0.00211548,
+ 0.00660062, 0.00274907, 0.00110985, 0.00278786, 0.0065483, 0.00112838,
+ 5.45195, -1.26208, 3.82578, -5.40639, -0.952146, -3.42835, -2.79496, 3.09008, -6.80672 ]
+
+gap> nr_epochs := 4;;
+
+gap> w := Fit( one_epoch_update, nr_epochs, w );
+Epoch 0/4 - loss = 0.34583622811001763
+Epoch 1/4 - loss = 0.6449437167091393
+Epoch 2/4 - loss = 0.023811108587716449
+Epoch 3/4 - loss = 0.0036371652708073405
+Epoch 4/4 - loss = 0.0030655216725219204
+[ 121, -4.57215e-05, -0.00190417, -0.0014116, -0.00181528, 0.00108949, 0.00065518, 0.001861, 0.000814679,
+ 0.000756424, 0.0104885, 0.00846858, 0.0022682, 0.00784643, 0.00551702, 0.0014626, 0.00351408, 0.00640225,
+ 0.00115053, 5.09137, -4.83379, 3.06257, -5.70976, 0.837175, -4.23622, -1.71171, 5.54301, -4.80856 ]
+
+gap> theta := SplitDenseList( w, [ 19, 9 ] )[2];
+[ 5.09137, -4.83379, 3.06257, -5.70976, 0.837175, -4.23622, -1.71171, 5.54301, -4.80856 ]
+
+gap> theta := SkeletalSmoothMaps.Constant( theta );
+ℝ^0 -> ℝ^9
+
+gap> f_theta := ReparametriseMorphism( f, theta );;
+
+gap> Display( f_theta );
+ℝ^2 -> ℝ^3 defined by:
+
+Underlying Object:
+-----------------
+ℝ^0
+
+Underlying Morphism:
+-------------------
+ℝ^2 -> ℝ^3
+
+‣ Exp( (5.09137 * x1 + (- 4.83379) * x2 + 3.06257) )
+ / (Exp( 5.09137 * x1 + (- 4.83379) * x2 + 3.06257 )
+ + Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) )
+ + Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) ))
+‣ Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) )
+ / (Exp( 5.09137 * x1 + (- 4.83379) * x2 + 3.06257 )
+ + Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) )
+ + Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) ))
+‣ Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) )
+ / (Exp( 5.09137 * x1 + (- 4.83379) * x2 + 3.06257 )
+ + Exp( ((- 5.70976) * x1 + 0.837175 * x2 + (- 4.23622)) )
+ + Exp( ((- 1.71171) * x1 + 5.54301 * x2 + (- 4.80856)) ))
+
+gap> f_theta := UnderlyingMorphism( f_theta );
+ℝ^2 -> ℝ^3
+
+gap> predictions_vec := Eval( f_theta, [ 1, -1 ] );
+[ 1., 4.74723e-11, 1.31974e-11 ]
+
+gap> PositionMaximum( predictions_vec );
+1
+
+gap> predictions_vec := Eval( f_theta, [ 1, 3 ] );
+[ 7.13122e-08, 2.40484e-08, 1. ]
+
+gap> PositionMaximum( predictions_vec );
+3
diff --git a/tst/functor.tst b/tst/functor.tst
index cac1106..b4da613 100644
--- a/tst/functor.tst
+++ b/tst/functor.tst
@@ -10,12 +10,12 @@ gap> dummy_input := ConvertToExpressions( [ "w1", "w2", "b1", "x1", "x2", "y" ]
gap> Display( ell : dummy_input := dummy_input );
ℝ^3 -> ℝ^1 defined by:
-Parameter Object:
+Underlying Object:
-----------------
ℝ^3
-Parametrised Morphism:
-----------------------
+Underlying Morphism:
+-------------------
ℝ^6 -> ℝ^1
‣ (w1 * x1 + w2 * x2 + b1 - y) ^ 2 / 1
@@ -24,12 +24,12 @@ Embedding into category of parametrised morphisms
gap> Rf := ApplyFunctor( R, ell );
(ℝ^3, ℝ^3) -> (ℝ^1, ℝ^1) defined by:
-Parameter Object:
+Underlying Object:
-----------------
(ℝ^3, ℝ^3)
-Parametrised Morphism:
-----------------------
+Underlying Morphism:
+-------------------
(ℝ^6, ℝ^6) -> (ℝ^1, ℝ^1) defined by:
Get Morphism:
@@ -42,12 +42,12 @@ Put Morphism:
gap> Display( Rf );
(ℝ^3, ℝ^3) -> (ℝ^1, ℝ^1) defined by:
-Parameter Object:
+Underlying Object:
-----------------
(ℝ^3, ℝ^3)
-Parametrised Morphism:
-----------------------
+Underlying Morphism:
+-------------------
(ℝ^6, ℝ^6) -> (ℝ^1, ℝ^1) defined by:
Get Morphism: