# **DiffKt** Model

 **Copyright (c) Meta Platforms, Inc. and affiliates.**
 
 This source code is licensed under the MIT license found in the
 LICENSE file in the root directory of this source tree.

## Introduction

This notebook discusses the Model API, which is in __[diffkt/kotlin/api/src/main/kotlin/org/diffkt/model](https://github.com/facebookresearch/diffkt/tree/main/kotlin/api/src/main/kotlin/org/diffkt/model)__.
The Model API is used to build deep neural networks using the automatic differentiation in **DiffKt**.
This notebook uses a simple linear regression example to show how to use the Model API. It is based on the example __[Linear Regression](https://github.com/facebookresearch/diffkt/tree/main/kotlin/examples/src/main/kotlin/examples/linreg)__.

There are additional examples using the Model API:

__[Iris](https://github.com/facebookresearch/diffkt/tree/main/kotlin/examples/src/main/kotlin/examples/iris)__, a classification example using dense layers in a neural network.

__[MNIST](https://github.com/facebookresearch/diffkt/tree/main/kotlin/examples/src/main/kotlin/examples/mnist)__, an image processing example using a convolutional neural network.

__[RESNET](https://github.com/facebookresearch/diffkt/tree/main/kotlin/examples/src/main/kotlin/examples/resnet)__, an image processing example using a deep convolutional neural network.


### Housekeeping

The following jars need to be included in the notebook.

In [1]:
@file:DependsOn("../kotlin/api/build/libs/api.jar")
@file:DependsOn("../kotlin/data/build/libs/data.jar")

Import the following classes for the notebook.

In [2]:
import org.diffkt.*
import org.diffkt.data.Data
import org.diffkt.model.*
import org.diffkt.tracing.jit
import kotlin.math.min
import kotlin.random.Random

## Linear Regression

This notebook implements a simple linear regression model of $labels = m * features + b$, where

$labels$ - output

$features$ - input

$m$ - the weight

$b$ - the bias

For this notebook, the model that generates the data is $labels = 5 * features + 2$

The goal of linear regression is to recover the weight and the bias given the model, the input data, and the output data.
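
As a point of reference for what "recovering" the weight and bias means, the sketch below fits a line with ordinary least squares, a closed-form technique that is different from the gradient descent used in the rest of this notebook. It is plain Kotlin with no DiffKt types, and `fitLine` is a hypothetical helper introduced only for illustration.

```kotlin
// Ordinary least squares for a single feature. This is a closed-form
// reference, not the gradient descent method used by the Model API.
// fitLine is a hypothetical helper, not part of DiffKt.
fun fitLine(features: FloatArray, labels: FloatArray): Pair<Float, Float> {
    require(features.size == labels.size && features.size >= 2)
    val meanX = features.average()   // Double
    val meanY = labels.average()     // Double
    var covXY = 0.0
    var varX = 0.0
    for (i in features.indices) {
        val dx = features[i] - meanX
        covXY += dx * (labels[i] - meanY)
        varX += dx * dx
    }
    require(varX > 0.0) { "features must not all be equal" }
    val m = covXY / varX             // estimated weight
    val b = meanY - m * meanX        // estimated bias
    return Pair(m.toFloat(), b.toFloat())
}
```

On noise-free data generated from $labels = 5 * features + 2$, this returns a weight of 5 and a bias of 2 up to rounding, which are the values the trained model should approach.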

## The Model API

The `Model` API uses gradient descent to estimate the coefficients for the linear regression model. The linear regression model uses a single layer of a neural network.

A number of steps are required to use the Model API.

1) Create some training data to use to build the model,

2) Create an iterator over the data for training the model,

3) Create a linear regression model that inherits from the Model API,

4) Create a loss function,

5) Create an optimizer,

6) Create a learning class,

7) Train the model.

### Setup

The training data set size is 100 data points.

In [3]:
// Setup

val trainingDataSize = 100
val random = Random(1234567)

### Training Data

The function `makeTrainingData()` creates a vector (a 1D tensor) of 100 random inputs, the features, each drawn from the range [0, 1). Tensor arithmetic then produces the labels vector, where $labels = features * trueWeight + trueBias$. Here $trueWeight = 5.0$ and $trueBias = 2.0$, so the model to be learned is $labels = 5.0 * features + 2.0$. The features, labels, `trueWeight`, and `trueBias` are returned in an object of class `TrainingData`. The `trueWeight` and `trueBias` are stored with the data so that the accuracy of the trained model can be checked afterwards.

In [4]:
// Training Data

class TrainingData(val features : FloatTensor, 
                   val labels : FloatTensor, 
                   val trueWeight : FloatScalar, 
                   val trueBias : FloatScalar) {
       
    companion object {
              
        fun makeTrainingData(trainingDataSize: Int, random : Random ) : TrainingData {    

            val trueWeight = FloatScalar(5f)
            val trueBias = FloatScalar(2f)
            
            val features = FloatTensor(Shape(trainingDataSize)) { random.nextFloat() }
            val labels = (features * trueWeight + trueBias) as FloatTensor
        
            return TrainingData(features, labels, trueWeight, trueBias)
        }
    }
} 

### Create the Training Data

In [5]:
val trainingData = TrainingData.makeTrainingData(trainingDataSize, random)

### Data Iterator

The `SimpleDataIterator` class creates an iterator over class `Data`. Class `Data` is located in __[diffkt/kotlin/data/src/main/kotlin/org/diffkt/data/Data.kt](https://github.com/facebookresearch/diffkt/blob/main/kotlin/data/src/main/kotlin/org/diffkt/data/Data.kt)__. Class `Data` holds the labels and features for a training set and provides an iterator over the data to provide data to the learning algorithm in batches.

In [6]:
// Data Iterator

class SimpleDataIterator(val features: FloatTensor,
                         val labels: FloatTensor,
                         val batchSize: Int = 1): Iterable<Data> {

    // you need the same number of labels as features
    init {
        require(features.shape.first == labels.shape.first)
    }

    // number of training examples
    private val n = features.shape.first

    fun withBatchSize(batchSize: Int) = SimpleDataIterator(features, labels, batchSize)

    // gets a slice of data from the tensors
    override fun iterator(): Iterator<Data> = object : Iterator<Data> {
        var loc = 0
        override fun hasNext(): Boolean = loc < n
        override fun next(): Data {
            require(hasNext())

            val start = loc
            val end = min(loc + batchSize, n)
            val f = features.slice(start, end)
            val l = labels.slice(start, end)
            loc = end
            
            // The actual data for this iteration is stored in a Data object
            return Data(f, l)
        }
    }
}
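
The index bookkeeping in `next()` can be checked in isolation. The sketch below is plain Kotlin with no DiffKt types; `batchRanges` is a hypothetical helper, not part of the library. It assumes that `slice(start, end)` treats `end` as exclusive, which the `loc = end` update above suggests.

```kotlin
// Half-open [start, end) index ranges that cover n examples in order,
// mirroring the loc / end arithmetic of the iterator above.
// batchRanges is a hypothetical helper, not part of DiffKt.
fun batchRanges(n: Int, batchSize: Int): List<Pair<Int, Int>> {
    require(n >= 0 && batchSize > 0)
    val ranges = mutableListOf<Pair<Int, Int>>()
    var loc = 0
    while (loc < n) {
        val end = minOf(loc + batchSize, n)  // the last batch may be short
        ranges.add(Pair(loc, end))
        loc = end
    }
    return ranges
}
```

Under these assumptions, `batchRanges(100, 32)` gives `[(0, 32), (32, 64), (64, 96), (96, 100)]`, so the final batch holds only 4 examples, while a batch size of 1 gives 100 single-example batches.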

### Create the Data Iterator

Even though the number of data points is 100, the batchSize in the iterator will default to 1.

In [7]:
val dataIterator = SimpleDataIterator(trainingData.features, trainingData.labels)

### Linear Regression Model

`LinearRegression` is a single-layer neural network built on an __[AffineTransform](https://github.com/facebookresearch/diffkt/blob/main/kotlin/api/src/main/kotlin/org/diffkt/model/AffineTransform.kt)__ layer. The `AffineTransform` layer takes an input tensor, $features$, an estimated weight tensor, $m$, and an estimated bias tensor, $b$, and calculates the estimated tensor $labels$, where $labels = m * features + b$, using tensor operations.

`LinearRegression` inherits from __[Model](https://github.com/facebookresearch/diffkt/blob/main/kotlin/api/src/main/kotlin/org/diffkt/model/Model.kt)__. `Model` is an abstract class, so `layers`, `withLayers()`, `hashCode()`, and `equals()` have to be implemented.

`DTensor` and `DScalar` are like `val`s in Kotlin: they are fixed in value once they are initialized. The linear regression model, however, has to learn the estimated weight and estimated bias. To have a tensor that can be modified in the learning process, use a __[TrainableTensor](https://github.com/facebookresearch/diffkt/blob/main/kotlin/api/src/main/kotlin/org/diffkt/model/TrainableTensor.kt)__ instead. The estimated weight $m$ and the estimated bias $b$ are each wrapped in a `TrainableTensor` and initialized to a random value. If the learning is successful, the estimated weight and estimated bias will approximate the true weight and true bias.

In [8]:
// Linear Regression

class LinearRegression(val l: AffineTransform): Model<LinearRegression>() {
    
    constructor(m: DScalar, b: DScalar) : this(AffineTransform(TrainableTensor(m), TrainableTensor(b)))
    constructor(random: Random) : this(FloatScalar(random.nextFloat()), FloatScalar(random.nextFloat()))

    override val layers: List<Layer<*>> = listOf(l)

    override fun withLayers(newLayers: List<Layer<*>>): LinearRegression {
        require(newLayers.size == 1)
        val newLayer = newLayers[0] as AffineTransform
        return LinearRegression(newLayer)
    }

    override fun hashCode(): Int = combineHash("LinearRegression", l)
    override fun equals(other: Any?): Boolean = other is LinearRegression &&
            other.l == l
}

### Create the Linear Regression Model

In [9]:
val linReg = LinearRegression(random)

### Loss Function

The loss function is necessary for gradient descent optimization. The loss function is an L2 loss function, which is the sum of the squared differences between the predicted value of a label and the actual value of a label.

In [10]:
fun lossFun(predictions: DTensor, labels: DTensor): DScalar {
    val diff = predictions - labels
    return (diff * diff).sum()
}
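
For reference, the gradient of this loss can be written out by hand. Writing $x_i$ for the features and $y_i$ for the labels (symbols introduced here for illustration, not used elsewhere in the notebook), and assuming predictions of the form $m x_i + b$:

$$L(m, b) = \sum_i (m x_i + b - y_i)^2$$

$$\frac{\partial L}{\partial m} = 2 \sum_i (m x_i + b - y_i)\, x_i \qquad \frac{\partial L}{\partial b} = 2 \sum_i (m x_i + b - y_i)$$

Both derivatives vanish when every prediction matches its label, which for the data in this notebook happens at $m = 5$ and $b = 2$.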

### Optimizer

The optimizer uses the built-in __[FixedLearningRateOptimizer](https://github.com/facebookresearch/diffkt/blob/main/kotlin/api/src/main/kotlin/org/diffkt/model/FixedLearningRateOptimizer.kt)__, which implements a simple gradient descent optimization algorithm. The learning rate is `0.5F / trainingDataSize`, which is 0.005 for 100 data points.

In [11]:
val optimizer = FixedLearningRateOptimizer<LinearRegression>(0.5F / trainingDataSize)
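
The update rule behind simple gradient descent with a fixed learning rate can be sketched in plain Kotlin. This is an illustration of the general technique, not the source of `FixedLearningRateOptimizer`; `LineParams` and `sgdStep` are hypothetical names introduced here.

```kotlin
// Hypothetical stand-in for the two trainable values of the model.
data class LineParams(val m: Float, val b: Float)

// One fixed-learning-rate step: move each parameter against its gradient.
// Returns a new LineParams rather than mutating the old one.
fun sgdStep(p: LineParams, gradM: Float, gradB: Float, learningRate: Float): LineParams =
    LineParams(
        m = p.m - learningRate * gradM,
        b = p.b - learningRate * gradB
    )
```

For example, `sgdStep(LineParams(1f, 1f), gradM = 2f, gradB = -4f, learningRate = 0.5f)` returns `LineParams(m=0.0, b=3.0)`. Returning a fresh value instead of mutating matches the way the notebook's training code threads a new model through each step.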

### Learner

A `Learner` class needs to be created to train the model. At this point in time it does not inherit from a class in the model directory and needs to be implemented from scratch. The `modelTrainStep()` function does the actual training: it calculates the derivatives of the loss with respect to the model and passes them on to the optimizer to be used in the gradient descent algorithm. The fold that produces `optimizedModel` in `train()` feeds each batch of data to `modelTrainStep()`, repeating the pass over the data for the number of epochs given in the `train()` call.

In [12]:
class Learner<T : Model<T>>(val batchedData: Iterable<Data>,
                            val lossFunc: (predictions: DTensor, labels: DTensor) -> DScalar,
                            val optimizer: Optimizer<T>) 
{
    var totalTime = 0L

    /**
     * Trains the given model for [epochs] epochs, processing the data in the batches supplied by
     * [batchedData].  Returns the trained model.
     */
    fun train(model: T,
              epochs: Int,
              printProgress: Boolean = false): T 
    {
        

        // The model training step function, which could possibly be optimized.
        fun modelTrainStep(model2: T, batch: Data): Pair<DScalar, T> 
        {
            val (loss, tangent) = primalAndReverseDerivative(
                x = model2,
                f = { model3: T ->
                    val output = model3.predict(batch.features)
                    val loss = lossFunc(output, batch.labels)
                    loss
                },
                extractDerivative = { model3: T,
                                      loss: DScalar,
                                      extractor: (input: DTensor, output: DTensor) -> DTensor ->
                                                model3.extractTangent(loss, extractor)
                }
            )

            val trainedModel: T = optimizer.train(model2, tangent)
            return Pair(loss, trainedModel)
        }
        
        // batch management 
        val optimizedModel = (0 until epochs).fold(model) { model1: T, e: Int ->
            var lossTotal: DScalar = FloatScalar.ZERO
            
            val trainedModel = batchedData.fold(model1) { model2: T, batch: Data ->
                
                // get some data
                val batchOnDevice = batch.to(Device.CPU)
                
                // time the training step
                val t1 = System.nanoTime()
                val (loss, trainedModel) = modelTrainStep(model2, batchOnDevice)
                val t2 = System.nanoTime()
                totalTime += t2 - t1  
                
                if (printProgress) lossTotal += loss
                trainedModel
            }
            
            if (printProgress && ((e % 10) == 0)) println("Epoch $e Cumulative Loss: $lossTotal")
            trainedModel
        }
        
        return optimizedModel
    }

    private fun e(n: Long) = n / 1e9f

    fun dumpTimes() {
        println("running time:  ${e(totalTime)} sec")
    }
}
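
To see the shape of the computation without the differentiation machinery, the sketch below runs the same kind of training loop in plain Kotlin with hand-written gradients. It is an analogue under stated assumptions, not a translation of the `Learner`: `trainLine` and `demoTrainLine` are hypothetical names, the batch size is fixed at 1, and the random numbers drawn will not match those of the notebook.

```kotlin
import kotlin.random.Random

// Per-example gradient descent on labels = m * features + b with the L2 loss.
// Hypothetical helper; no DiffKt types are used.
fun trainLine(
    features: FloatArray,
    labels: FloatArray,
    epochs: Int,
    learningRate: Float,
    random: Random
): Pair<Float, Float> {
    require(features.size == labels.size)
    var m = random.nextFloat()   // random initial weight
    var b = random.nextFloat()   // random initial bias
    repeat(epochs) {
        for (i in features.indices) {             // one example per step
            val error = m * features[i] + b - labels[i]
            val gradM = 2f * error * features[i]  // d(error^2)/dm
            val gradB = 2f * error                // d(error^2)/db
            m -= learningRate * gradM
            b -= learningRate * gradB
        }
    }
    return Pair(m, b)
}

// Same data recipe as the notebook: 100 features in [0, 1), labels = 5x + 2.
fun demoTrainLine(): Pair<Float, Float> {
    val random = Random(1234567)
    val features = FloatArray(100) { random.nextFloat() }
    val labels = FloatArray(100) { i -> 5f * features[i] + 2f }
    return trainLine(features, labels, epochs = 230, learningRate = 0.5f / 100, random = random)
}
```

Calling `demoTrainLine()` should return a pair close to `(5.0, 2.0)`. The DiffKt version differs in that `primalAndReverseDerivative` computes the gradients automatically and the model is replaced by a new instance at each step instead of being mutated.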


### Build the Model

An instance of the `Learner` is created and `train()` is called for 230 epochs.

The loss is displayed every 10 epochs. The total training time is displayed.

Compare `trueWeight` with the estimated weight, and `trueBias` with the estimated bias, to see how closely the trained model matches the model that generated the data.

In [13]:
// Build the model

val learner = Learner(batchedData = dataIterator,
                      lossFunc = ::lossFun,
                      optimizer = optimizer)

val trainedModel = learner.train(linReg, 230, printProgress = true)

println()
learner.dumpTimes()
println()
println("trueWeight = ${trainingData.trueWeight}, estimated trueWeight = ${trainedModel.l.m.tensor.toString()}")
println("trueBias = ${trainingData.trueBias}, estimated trueBias = ${trainedModel.l.b.tensor.toString()}")


Epoch 0 Cumulative Loss: 453.91916
Epoch 10 Cumulative Loss: 17.30477
Epoch 20 Cumulative Loss: 5.533401
Epoch 30 Cumulative Loss: 1.7693163
Epoch 40 Cumulative Loss: 0.56576484
Epoch 50 Cumulative Loss: 0.18091545
Epoch 60 Cumulative Loss: 0.057852346
Epoch 70 Cumulative Loss: 0.018499354
Epoch 80 Cumulative Loss: 0.005913419
Epoch 90 Cumulative Loss: 0.0018901364
Epoch 100 Cumulative Loss: 6.049421E-4
Epoch 110 Cumulative Loss: 1.9333517E-4
Epoch 120 Cumulative Loss: 6.179842E-5
Epoch 130 Cumulative Loss: 1.9667383E-5
Epoch 140 Cumulative Loss: 6.244504E-6
Epoch 150 Cumulative Loss: 2.044776E-6
Epoch 160 Cumulative Loss: 5.9208685E-7
Epoch 170 Cumulative Loss: 1.5495567E-7
Epoch 180 Cumulative Loss: 4.2282977E-8
Epoch 190 Cumulative Loss: 2.2936376E-8
Epoch 200 Cumulative Loss: 1.7647608E-8
Epoch 210 Cumulative Loss: 1.7243167E-8
Epoch 220 Cumulative Loss: 1.7243167E-8

running time:  0.7214423 sec

trueWeight = 5.0, estimated trueWeight = 4.9999514
trueBias = 2.0, estimated trueBias

## The End