# Linear Regression using La Classy

In [2]:
import { parse } from "https://deno.land/std@0.204.0/csv/parse.ts";
import {
  Matrix,
  useSplit,
} from "https://deno.land/x/vectorizer@v0.3.5/mod.ts";
import {
  GradientDescentSolver,
  mse,
  adamOptimizer
} from "https://deno.land/x/classylala@v1.0.0/mod.ts";


Downloading https://github.com/retraigo/classy-lala/releases/download/v1.0.0/classy.dll


We first load our dataset, `winequality-red.csv`.

In [3]:
const data = parse(Deno.readTextFileSync("../datasets/winequality-red.csv"));

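Skip the first row (the header). `shift()` removes it from `data` and returns it, which is why the cell prints the column names.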

In [4]:
data.shift()

[
  "fixed acidity",
  "volatile acidity",
  "citric acid",
  "residual sugar",
  "chlorides",
  "free sulfur dioxide",
  "total sulfur dioxide",
  "density",
  "pH",
  "sulphates",
  "alcohol",
  "quality"
]

We can now get the predictor and target variables from the dataset.

In [5]:
// Predictors: columns 0-7 and 9-10. Column 8 (pH) is the target and column 11 (quality) is not used.
const x = data.map((fl) => [...fl.slice(0, 8), ...fl.slice(9, 11)]);

const X = new Matrix<"f64">(Float64Array.from(x.flat()), [data.length])
X.slice(0, 10)

idx,0,1,2,3,4,5,6,7,8,9
0,7.4,0.7,0.0,1.9,0.076,11,34,0.9978,0.56,9.4
1,7.8,0.88,0.0,2.6,0.098,25,67,0.9968,0.68,9.8
2,7.8,0.76,0.04,2.3,0.092,15,54,0.997,0.65,9.8
3,11.2,0.28,0.56,1.9,0.075,17,60,0.998,0.58,9.8
4,7.4,0.7,0.0,1.9,0.076,11,34,0.9978,0.56,9.4
5,7.4,0.66,0.0,1.8,0.075,13,40,0.9978,0.56,9.4
6,7.9,0.6,0.06,1.6,0.069,15,59,0.9964,0.46,9.4
7,7.3,0.65,0.0,1.2,0.065,15,21,0.9946,0.47,10.0
8,7.8,0.58,0.02,2.0,0.073,9,18,0.9968,0.57,9.5
9,7.5,0.5,0.36,6.1,0.071,17,102,0.9978,0.8,10.5
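
The two `slice` calls above keep every column except index 8 (pH) and index 11 (quality). A minimal sketch of the same selection done by column name, assuming the header printed earlier; `excluded`, `featureIndices`, and `featureNames` are names made up for this illustration and are not part of vectorizer:

```typescript
// Column names as printed by the header row of the dataset.
const header: string[] = [
  "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
  "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
  "pH", "sulphates", "alcohol", "quality",
];

// Columns that are not predictors: the target (pH) and the unused quality score.
const excluded = new Set<string>(["pH", "quality"]);

// Collect the positions of every column we keep.
const featureIndices: number[] = [];
header.forEach((name, i) => {
  if (!excluded.has(name)) featureIndices.push(i);
});

// Look the names back up so the selection can be checked by eye.
const featureNames = featureIndices.map((i) => header[i]);

console.log(featureIndices); // [0, 1, 2, 3, 4, 5, 6, 7, 9, 10]
console.log(featureNames.length); // 10
```

Selecting by name gives the same ten columns as the index-based slices, and it keeps working if the column order in the CSV changes.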


Let's use the pH value (column 8) as our target variable.

In [6]:
const y = new Matrix<"f64">(Float64Array.from(data.map((fl) => fl[8])), [data.length]);
y.slice(0, 10)

idx,0
0,3.51
1,3.2
2,3.26
3,3.16
4,3.51
5,3.51
6,3.3
7,3.39
8,3.36
9,3.35


In [7]:
[X.shape, y.shape]

[ [ 1599, 10 ], [ 1599, 1 ] ]

We now split our dataset for training and testing purposes. 

In [8]:
const [[x_train, y_train], [x_test, y_test]] = useSplit(
  { ratio: [7, 3], shuffle: true },
  X,
  y
);
x_train.slice(0, 10)

idx,0,1,2,3,4,5,6,7,8,9
0,7.4,0.7,0.0,1.9,0.076,11,34,0.9978,0.56,9.4
1,7.8,0.88,0.0,2.6,0.098,25,67,0.9968,0.68,9.8
2,7.8,0.76,0.04,2.3,0.092,15,54,0.997,0.65,9.8
3,7.4,0.66,0.0,1.8,0.075,13,40,0.9978,0.56,9.4
4,7.3,0.65,0.0,1.2,0.065,15,21,0.9946,0.47,10.0
5,7.8,0.58,0.02,2.0,0.073,9,18,0.9968,0.57,9.5
6,7.5,0.5,0.36,6.1,0.071,17,102,0.9978,0.8,10.5
7,7.5,0.5,0.36,6.1,0.071,17,102,0.9978,0.8,10.5
8,5.6,0.615,0.0,1.6,0.089,16,59,0.9943,0.52,9.9
9,7.8,0.61,0.29,1.6,0.114,9,29,0.9974,1.56,9.1
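
To make the 7:3 ratio concrete, here is a rough sketch of a shuffled split written in plain TypeScript. It is not the vectorizer implementation: `splitRows` is a hypothetical helper, and both the Fisher-Yates shuffle and the rounding rule (flooring the training size) are assumptions. Splitting row indices rather than rows keeps predictors and targets aligned.

```typescript
// Hypothetical helper: split an array into two parts according to a ratio.
function splitRows<T>(rows: T[], ratio: [number, number], shuffle: boolean): [T[], T[]] {
  const copy = rows.slice(); // leave the caller's array untouched
  if (shuffle) {
    // Fisher-Yates shuffle: swap each position with a random earlier-or-equal one.
    for (let i = copy.length - 1; i > 0; i -= 1) {
      const j = Math.floor(Math.random() * (i + 1));
      const tmp = copy[i];
      copy[i] = copy[j];
      copy[j] = tmp;
    }
  }
  // Assumed rounding rule: the first part gets the floor of its share.
  const nFirst = Math.floor((copy.length * ratio[0]) / (ratio[0] + ratio[1]));
  return [copy.slice(0, nFirst), copy.slice(nFirst)];
}

// 1599 rows, as reported by X.shape in the notebook.
const rowIndices = Array.from({ length: 1599 }, (_, i) => i);
const [trainIdx, testIdx] = splitRows(rowIndices, [7, 3], true);

console.log(trainIdx.length, testIdx.length); // 1119 480
```

With 1599 rows this rule leaves 480 rows for testing, which agrees with the shape of the predictions later in the notebook.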


Now that we have prepared our inputs, we can initialize our solver. Since we are performing linear regression, let's use a gradient descent solver that minimizes the mean squared error (MSE) with the Adam optimizer.

In [9]:
const solver = new GradientDescentSolver({
    loss: mse(),
    optimizer: adamOptimizer(11, 1)
});
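
For intuition about what such a solver does, here is a minimal sketch of gradient descent on the MSE loss for a linear model. It uses plain batch gradient descent, not Adam, and it is not how La Classy is implemented; `fitLinear` and the toy data are hypothetical and exist only for this illustration.

```typescript
// Fit y ≈ X·w + b by repeatedly stepping against the gradient of the MSE.
function fitLinear(
  X: number[][],
  y: number[],
  learningRate: number,
  epochs: number,
): { weights: number[]; bias: number } {
  if (X.length === 0 || X.length !== y.length) {
    throw new Error("X and y must be non-empty and have the same number of rows");
  }
  const n = X.length;
  const d = X[0].length;
  const weights = new Array<number>(d).fill(0);
  let bias = 0;

  for (let epoch = 0; epoch < epochs; epoch += 1) {
    const gradW = new Array<number>(d).fill(0);
    let gradB = 0;
    for (let i = 0; i < n; i += 1) {
      // Prediction for row i with the current parameters.
      let pred = bias;
      for (let j = 0; j < d; j += 1) pred += weights[j] * X[i][j];
      const err = pred - y[i];
      // d(MSE)/dw_j = (2/n) * sum(err * x_ij), d(MSE)/db = (2/n) * sum(err)
      for (let j = 0; j < d; j += 1) gradW[j] += (2 / n) * err * X[i][j];
      gradB += (2 / n) * err;
    }
    // Move each parameter a small step against its gradient.
    for (let j = 0; j < d; j += 1) weights[j] -= learningRate * gradW[j];
    bias -= learningRate * gradB;
  }
  return { weights, bias };
}

// Toy data generated from y = 2x + 1.
const toyX = [[0], [1], [2], [3], [4]];
const toyY = [1, 3, 5, 7, 9];
const toyModel = fitLinear(toyX, toyY, 0.05, 5000);
console.log(toyModel.weights[0].toFixed(3), toyModel.bias.toFixed(3)); // 2.000 1.000
```

Adam follows the same loop but scales each step using running averages of the gradient and its square, which usually makes it less sensitive to the choice of learning rate.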

We can then train our model using the data we acquired.

In [10]:
solver.train(x_train, y_train, {
  fit_intercept: true,
  learning_rate: 0.01,
  epochs: 500
});

The model is now trained, so it is time to evaluate its performance on the test set.

In [11]:
const res = solver.predict(x_test)
res.shape

[ 480, 1 ]

In [12]:
[res.row(0), y_test.row(0)]

[ Float64Array(1) [ 3.1828660946326504 ], Float64Array(1) [ 3.16 ] ]

Let's calculate the RMSE.

In [13]:
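let se = 0
for (let i = 0; i < res.nRows; i += 1) {
  // Squared error: square the difference between the actual and predicted values.
  se += (y_test.item(i, 0) - res.item(i, 0)) ** 2
}
// Average the squared errors, then take the square root.
se /= res.nRows;
console.log(`RMSE: ${Math.sqrt(se)}`)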
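
RMSE is the square root of the mean of the squared differences between actual and predicted values. As a small sketch on plain arrays, the same calculation can be wrapped in a reusable function; `rmse` is a hypothetical helper written for this illustration and is not part of La Classy or vectorizer.

```typescript
// Root mean squared error: sqrt( (1/n) * sum( (actual_i - predicted_i)^2 ) )
function rmse(actual: number[], predicted: number[]): number {
  if (actual.length === 0 || actual.length !== predicted.length) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  let sumSquared = 0;
  for (let i = 0; i < actual.length; i += 1) {
    const diff = actual[i] - predicted[i];
    sumSquared += diff * diff;
  }
  return Math.sqrt(sumSquared / actual.length);
}

// Two exact predictions and one that is off by 3: sqrt(9 / 3) = sqrt(3).
const exampleRmse = rmse([3, 4, 5], [3, 4, 8]);
console.log(exampleRmse.toFixed(3)); // 1.732
```

Because the error is squared before averaging, RMSE is expressed in the same unit as the target (here, pH) and penalizes large misses more heavily than small ones.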
