## 3D Nonlinear Regression

In this notebook, let's use DiffKt to fit a nonlinear regression to a dataset with two input variables $ x_1 $ and $ x_2 $ and an output variable $ y $. First, bring in the DiffKt library, which provides tensors and automatic differentiation.

In [1]:
```kotlin
@file:DependsOn("../kotlin/api/build/libs/api.jar")
```

The dataset can be found [here](https://bit.ly/35ReT3i); it is visualized as a 3D scatterplot below.

![](./resources/ybgAGvOQXT.mp4)

We see a 3D parabola-like shape above, and we are going to fit the following function:

$$
y = ax_1^2 + bx_2^2 + c 
$$

$ x_1 $ and $ x_2 $ are the input variables, $ y $ is the output variable, and $ a $, $ b $, and $ c $ are the coefficients we will use gradient descent to solve for. 

Next, let's import what we need: `URL` from `java.net`, `Random` from `kotlin.random`, and everything in the `org.diffkt` package.

In [2]:
```kotlin
import java.net.URL
import kotlin.random.Random
import org.diffkt.*
```

Then let's declare a `Point` data class that holds the two input variables $ x_1 $ and $ x_2 $ and the output variable $ y $. We use `URL(...).readText()` to download the CSV stored [here](https://bit.ly/3ty5BRZ) and split the text into lines with a regular expression. A second regular expression keeps only lines made up of digits, minus signs, decimal points, and commas, which drops the header line of column names (and any blank lines). We split each remaining line on commas and convert the values to `Float`. Finally, we package each trio of values into a `Point`, giving a `List<Point>`.

In [3]:
```kotlin
data class Point(val x1: Float, val x2: Float, val y: Float)

val points = URL("https://bit.ly/35ebET5")               // location of the CSV file
    .readText().split(Regex("\\r?\\n"))                  // download the text and split it into lines
    .filter { it.matches(Regex("[-,.0-9]+")) }           // keep only numeric records (drops the header)
    .map { line -> line.split(",").map { it.toFloat() } } // split commas into Float columns
    .map { (x1, x2, y) -> Point(x1, x2, y) }             // map to Point objects
```
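To experiment with the parsing steps without downloading anything, here is a minimal sketch that applies the same pipeline to CSV text held in a string. It assumes the file has a header row followed by plain decimal numbers; values in scientific notation such as `1e-3` would be rejected by the numeric-line regex. `parsePoints` is a hypothetical helper, and `Point` is repeated so the block compiles on its own.

```kotlin
// Same Point class as in the notebook, repeated so this block compiles on its own.
data class Point(val x1: Float, val x2: Float, val y: Float)

// Hypothetical helper: the notebook's parsing steps applied to CSV text that is
// already in memory, instead of text downloaded with URL(...).readText().
fun parsePoints(csv: String): List<Point> {
    val numericRecord = Regex("[-,.0-9]+")           // only digits, minus signs, dots, commas
    return csv.split(Regex("\\r?\\n"))               // split into lines (LF or CRLF endings)
        .filter { it.matches(numericRecord) }        // drops the header row and blank lines
        .map { line -> line.split(",").map { it.toFloat() } }
        .map { (x1, x2, y) -> Point(x1, x2, y) }     // first three columns become a Point
}
```

For example, `parsePoints("x1,x2,y\n1.5,-2.0,3.25\n")` returns a list holding one point with `x1 = 1.5`, `x2 = -2.0`, and `y = 3.25`; the header line contains letters, so it never matches the regex and is skipped.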

Next we need to turn these points into DiffKt tensors. We pass values mapped from `points` to the `tensorOf()` function and then `reshape()` the flat result into a matrix: the input tensor `x` has shape `(points.size, 3)` and the output tensor `y` has shape `(points.size, 1)`. Notice that each row of `x` gets a third value that is simply $ 1 $, which adds a column of 1's next to the `x1` and `x2` input variables. Why is this necessary? The column is a placeholder for the intercept coefficient $ c $: because $ 1^2 = 1 $, squaring the inputs leaves it unchanged, so its coefficient is added as a constant to every prediction. Without it, we would only learn the coefficients $ a $ and $ b $ on $ x_1^2 $ and $ x_2^2 $, with no intercept.

In [4]:
```kotlin
// map variables to input and output tensors
// add a placeholder "1" column to the input tensor to produce the intercept
val x = tensorOf(points.flatMap { listOf(it.x1, it.x2, 1f) }.map(::FloatScalar)).reshape(points.size, 3)
val y = tensorOf(points.map { it.y }.map(::FloatScalar)).reshape(points.size, 1)
```
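The layout of `x` can be mimicked without DiffKt. This minimal sketch, using the hypothetical helpers `designMatrix` and `squareAll`, builds the same row-major values that `flatMap { listOf(it.x1, it.x2, 1f) }` produces before `reshape(points.size, 3)`, so row `i` occupies indices `3*i` through `3*i + 2`. It uses a plain `FloatArray` in place of a DiffKt tensor.

```kotlin
// Hypothetical stand-in for the notebook's `x` tensor: a row-major FloatArray of
// shape (n, 3) where row i is [x1, x2, 1], built with the same flatMap idea.
fun designMatrix(x1s: List<Float>, x2s: List<Float>): FloatArray {
    require(x1s.size == x2s.size) { "x1 and x2 must have the same length" }
    return x1s.indices
        .flatMap { i -> listOf(x1s[i], x2s[i], 1f) }   // one row per point
        .toFloatArray()
}

// Element-wise square, the role x.pow(2) plays in the loss: [x1, x2, 1] -> [x1^2, x2^2, 1].
fun squareAll(m: FloatArray): FloatArray = FloatArray(m.size) { k -> m[k] * m[k] }
```

Because $ 1^2 = 1 $, every third entry is still 1 after squaring, so the third coefficient is simply added to each prediction as the intercept.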

To represent our three coefficients $ a $, $ b $, and $ c $, we will use a $ 3 \times 1 $ float tensor. Let's initialize it with random values between $ 0 $ and $ 1 $.

In [5]:
```kotlin
// initialize coefficients
var coeffs: DTensor = FloatTensor.random(Random, Shape(3, 1))
```

To visualize the tensor operations, suppose our coefficient vector $ A $ (holding $ a $, $ b $, and $ c $) were initialized with the following values.

$ A = \left[\begin{matrix}0.1\\0.2\\0.5\end{matrix}\right] $ 

And let's say we have 3 records of $ X $ inputs with the added column of 1's. 

$ X = \left[\begin{matrix}2 & 10 & 1\\4 & 20 & 1\\10 & 30 & 1\end{matrix}\right] $ 

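To get the predicted $ \hat{Y} $ values, we square each element of $ X $ (the column of 1's stays 1) and then matrix-multiply the result by the coefficients $ A $, which takes the dot product of each row with $ A $. Here $ X^2 $ denotes this element-wise square, not the matrix product $ X \cdot X $.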

$ \hat{Y} = X^2 \cdot A $ 

$ \hat{Y} = \left[\begin{matrix}2 & 10 & 1\\4 & 20 & 1\\10 & 30 & 1\end{matrix}\right]^2 \cdot \left[\begin{matrix}0.1\\0.2\\0.5\end{matrix}\right] $ 

$ \hat{Y} = \left[\begin{matrix}4 & 100 & 1\\16 & 400 & 1\\100 & 900 & 1\end{matrix}\right] \cdot \left[\begin{matrix}0.1\\0.2\\0.5\end{matrix}\right] $ 

$ \hat{Y} =  \left[\begin{matrix}(4 \times 0.1) + (100 \times 0.2) + (1 \times 0.5) \\(16 \times 0.1) + (400 \times 0.2) + (1 \times 0.5) \\(100 \times 0.1) + (900 \times 0.2) + (1 \times 0.5) \end{matrix}\right] $

$ \hat{Y} = \left[\begin{matrix}20.9\\82.1\\190.5\end{matrix}\right] $

So that would yield predictions of $ 20.9 $, $ 82.1 $, and $ 190.5 $. 
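The worked example can be checked numerically. This is a minimal plain-Kotlin sketch using `DoubleArray` rows instead of DiffKt tensors; `squareElementwise`, `matVec`, and `model` are hypothetical helpers. It also evaluates the per-point formula $ y = ax_1^2 + bx_2^2 + c $ to show that it agrees with the matrix form row by row.

```kotlin
// Square every element of a matrix (element-wise, not a matrix product).
fun squareElementwise(m: Array<DoubleArray>): Array<DoubleArray> =
    Array(m.size) { i -> DoubleArray(m[i].size) { j -> m[i][j] * m[i][j] } }

// Matrix-vector product: entry i is the dot product of row i with v.
fun matVec(m: Array<DoubleArray>, v: DoubleArray): DoubleArray =
    DoubleArray(m.size) { i ->
        require(m[i].size == v.size) { "row $i has ${m[i].size} columns, vector has ${v.size}" }
        m[i].indices.sumOf { j -> m[i][j] * v[j] }
    }

// The model written per point, y = a*x1^2 + b*x2^2 + c, for comparison with the matrix form.
fun model(x1: Double, x2: Double, a: Double, b: Double, c: Double): Double =
    a * x1 * x1 + b * x2 * x2 + c

// The example inputs X (with the column of 1's) and coefficients A from the text.
val xExample = arrayOf(
    doubleArrayOf(2.0, 10.0, 1.0),
    doubleArrayOf(4.0, 20.0, 1.0),
    doubleArrayOf(10.0, 30.0, 1.0)
)
val aExample = doubleArrayOf(0.1, 0.2, 0.5)
val yHat = matVec(squareElementwise(xExample), aExample)
```

`yHat` comes out as 20.9, 82.1, and 190.5 up to floating-point rounding, and `model(2.0, 10.0, 0.1, 0.2, 0.5)` matches its first entry.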

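To get predictions on all the data given the current coefficients, use DiffKt's `matmul()` function. Note that the cell below multiplies the raw `x` tensor by the coefficients; predictions from the model $ \hat{Y} = X^2 \cdot A $ need the inputs squared first, `x.pow(2).matmul(coeffs)`, exactly as the `loss()` function below does.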

In [6]:
```kotlin
val yPredictions = x.matmul(coeffs)
yPredictions
```

```
[[0.0944885], [0.42068002], [-0.09135753], [0.011326373], [1.3516259], [0.971581], [1.208416], [0.6007465], [0.6806519], [0.14446294], [0.8846638], [1.4533465], [0.42978024], [0.7038765], [0.5565234], [1.1836759], [0.28992015], [1.3788936], [1.2073104], [0.6339359], [0.16737527], [0.5773643], [0.8801312], [0.33763707], [0.930137], [0.2672027], [0.8772792], [0.45577544], [1.4415357], [0.24095264], [0.5018616], [0.32369173], [1.2470438], [0.060830057], [0.8686333], [0.46965322], [0.98861235], [0.65131944], [0.51845235], [0.47044143], [0.6836105], [1.0125384], [1.3029736], [1.0642189], [0.12467849]]
```

To calculate the total loss, let's use the sum of squared errors: subtract the predicted $ \hat{Y} $ values from the actual $ Y $ values, square those differences, and sum them.

$ E = \sum{(Y - \hat{Y})^2 } $ 

Let's say we have these predicted $ \hat{Y} $ and actual $ Y $ values. 

$ \hat{Y} = \left[\begin{matrix}20.9\\82.1\\190.5\end{matrix}\right] $

$ Y = \left[\begin{matrix}21.2\\85.3\\189.1\end{matrix}\right] $

Here is how we would calculate the sum of squares. 

$ E = \sum{(Y - \hat{Y})^2 } $ 

$ E = \sum{( \left[\begin{matrix}21.2\\85.3\\189.1\end{matrix}\right] - \left[\begin{matrix}20.9\\82.1\\190.5\end{matrix}\right])^2 } $ 

$ E = \sum{(\left[\begin{matrix}0.3\\3.2\\-1.4\end{matrix}\right])^2} $ 

$ E = \sum{\left[\begin{matrix}0.09\\10.24\\1.96\end{matrix}\right]} $ 

$ E = 12.29 $ 
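The same arithmetic as a minimal plain-Kotlin sketch; `sumOfSquaredErrors` is a hypothetical helper that works on `DoubleArray`s rather than DiffKt tensors. On the example values the residuals are 0.3, 3.2, and -1.4, whose squares 0.09, 10.24, and 1.96 sum to 12.29 (up to floating-point rounding).

```kotlin
// Sum of squared errors, E = sum((Y - Yhat)^2), for plain DoubleArrays.
fun sumOfSquaredErrors(actual: DoubleArray, predicted: DoubleArray): Double {
    require(actual.size == predicted.size) { "actual and predicted must have the same length" }
    return actual.indices.sumOf { i ->
        val residual = actual[i] - predicted[i]   // Y - Yhat for record i
        residual * residual
    }
}
```

Because each difference is squared, subtracting in the other order ($ \hat{Y} - Y $) gives the same total.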

We can implement this as a `loss()` function in Kotlin using DiffKt, as shown below. Remember that the predicted $ \hat{Y} $ values are the matrix product of the element-wise squared `x` (`x.pow(2)`) and the coefficients.

In [7]:
```kotlin
// sum of squared errors between y and the predictions x^2 · coeffs for the given coefficients
fun loss(coeffs: DTensor): DScalar =
    (y - x.pow(2).matmul(coeffs)).pow(2).sum()
```

Let's perform gradient descent. For $ 2,000 $ iterations, we use a learning rate of $ 0.0001 $ and take the reverse derivative of the `loss()` function with respect to the `coeffs` tensor. This returns the gradient for each of the $ a $, $ b $, and $ c $ coefficients, which we multiply by the learning rate and subtract from the `coeffs` tensor. We subtract because the gradient points in the direction in which the loss increases fastest, so stepping against it moves us toward a lower loss.

In [8]:
```kotlin
// The learning rate
val lr = 0.0001F

// The number of iterations to perform gradient descent
val iterations = 2000

// Perform gradient descent (0 until iterations runs exactly `iterations` times)
for (i in 0 until iterations) {

    // gradient of the loss with respect to the coefficients a, b, and c
    val betaGradients = reverseDerivative(coeffs, ::loss)

    // update the coefficients by subtracting (learning rate) * (gradient)
    coeffs -= betaGradients * lr
}
print("betas=$coeffs")
```

```
betas=[[0.022444753], [-0.046475355], [-1.1381321]]
```
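`reverseDerivative(coeffs, ::loss)` computes the gradient automatically. For this particular loss the gradient can also be derived by hand: with $ z_i $ the $ i $-th row of the element-wise squared inputs, $ \partial E / \partial A_j = -2 \sum_i (y_i - z_i \cdot A)\, z_{ij} $. The sketch below is a plain-Kotlin illustration of that formula and of one update step, not DiffKt's implementation; `residual`, `sseLoss`, `sseGradient`, and `gradientStep` are hypothetical names.

```kotlin
// Residual y_i - z_i . A for one record, where zRow = [x1^2, x2^2, 1].
fun residual(zRow: DoubleArray, yi: Double, coeffs: DoubleArray): Double =
    yi - zRow.indices.sumOf { j -> zRow[j] * coeffs[j] }

// Sum of squared errors over all records, mirroring loss() in the notebook.
fun sseLoss(z: Array<DoubleArray>, y: DoubleArray, coeffs: DoubleArray): Double =
    z.indices.sumOf { i ->
        val r = residual(z[i], y[i], coeffs)
        r * r
    }

// Hand-derived gradient: dE/dA_j = -2 * sum_i (y_i - z_i . A) * z_ij
fun sseGradient(z: Array<DoubleArray>, y: DoubleArray, coeffs: DoubleArray): DoubleArray {
    val grad = DoubleArray(coeffs.size)
    for (i in z.indices) {
        val r = residual(z[i], y[i], coeffs)
        for (j in coeffs.indices) grad[j] -= 2.0 * r * z[i][j]
    }
    return grad
}

// One gradient-descent update, the same rule as `coeffs -= betaGradients * lr`.
fun gradientStep(z: Array<DoubleArray>, y: DoubleArray, coeffs: DoubleArray, lr: Double): DoubleArray {
    val g = sseGradient(z, y, coeffs)
    return DoubleArray(coeffs.size) { j -> coeffs[j] - lr * g[j] }
}
```

For the three example records with $ A = [0.1, 0.2, 0.5]^T $ and $ Y = [21.2, 85.3, 189.1]^T $, `sseGradient` returns about `[175.2, -100.0, -4.2]`, which agrees with a central finite-difference estimate of the derivative, and one `gradientStep` with a learning rate of `1e-7` lowers the loss below its starting value.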

Now we have fitted our function to the data! Here is a visualization of the gradient descent in action.


![](./resources/nipuzkhOdO.mp4)