# Linear Vector Quantization

One of the biggest disadvantages of _k_-Nearest Neighbors is that the training data must be kept in its entirety for inference, given that it isn't used at all to train the algorithm. Thus, as the volume of data grows, also does the prediction time. 

In order to combat this issue, we have a very similar algorithm called __Linear Vector Quantization__. In a nutshell, what LVQ does is keeping a subset of the data that best represents the patterns and nuances of it. It, then, at inference time, uses the same methodology of _k_-Nearest Neighbors to produce either a category or a value.

As happens in KNN, LVQ makes predictions by finding the best match among the library or collection of patterns. The difference is that this collection of patterns is learned from the training data, and it is called _codebook vectors_. Hence, each patterns in this collection is called a _codebook_.

These _codebook vectors_ are initialized as randomly selected examples from the training set, and then are tuned during a given number of epochs.

Once the _codebook vectors_ have been prepared, then the _k_-Nearest Neighbors inference algorithm is used with _k=1_. Although LVQ was initially developed for classification tasks, it can also be used for regression problems. 

Let's start our implementation by loading the code and libraries we'll need. We will build our solution on top of the ones we implemented in the [previous notebook](https://github.com/jesus-a-martinez-v/toy-ml/blob/master/src/main/scala/notebooks/k_nearest_neighbors.ipynb).

In [1]:
import $ivy.`com.github.tototoshi::scala-csv:1.3.5`
import $file.^.datasmarts.ml.toy.scripts.KNearestNeighbors, KNearestNeighbors._
import scala.util.Random

[32mimport [39m[36m$ivy.$                                      
[39m
[32mimport [39m[36m$file.$                                            , KNearestNeighbors._
[39m
[32mimport [39m[36mscala.util.Random[39m

## Data

We'll use the [Ionosphere](https://archive.ics.uci.edu/ml/machine-learning-databases/ionosphere/ionosphere.data) dataset. It involves the prediction of structure in the atmosphere given radar returns targeting free electrons in the ionosphere. It is a binary classification task.

Let's load the data:

In [2]:
val BASE_DATA_PATH = "../../resources/data"
val ionospherePath = s"$BASE_DATA_PATH/14/ionosphere.csv"

val rawData = loadCsv(ionospherePath)
val numberOfRows = rawData.length
val numberOfColumns = rawData.head.length
println(s"Number of rows in dataset: $numberOfRows")
println(s"Number of columns in dataset: $numberOfColumns")

val (data, lookUpTable) = {
    val dataWithNumericColumns = (0 until (numberOfColumns - 1)).toVector.foldLeft(rawData) { (d, i) => textColumnToNumeric(d, i) }
    categoricalColumnToNumeric(dataWithNumericColumns, numberOfColumns - 1)
}

Number of rows in dataset: 351
Number of columns in dataset: 35


[36mBASE_DATA_PATH[39m: [32mString[39m = [32m"../../resources/data"[39m
[36mionospherePath[39m: [32mString[39m = [32m"../../resources/data/14/ionosphere.csv"[39m
[36mrawData[39m: [32mVector[39m[[32mVector[39m[[32mData[39m]] = [33mVector[39m(
  [33mVector[39m(
    Text(1),
    Text(0),
    Text(0.99539),
    Text(-0.05889),
    Text(0.85243),
    Text(0.02306),
    Text(0.83398),
    Text(-0.37708),
    Text(1),
    Text(0.03760),
[33m...[39m
[36mnumberOfRows[39m: [32mInt[39m = [32m351[39m
[36mnumberOfColumns[39m: [32mInt[39m = [32m35[39m
[36mdata[39m: [32mVector[39m[[32mVector[39m[[32mData[39m]] = [33mVector[39m(
  [33mVector[39m(
    Numeric(1.0),
    Numeric(0.0),
    Numeric(0.99539),
    Numeric(-0.05889),
    Numeric(0.85243),
    Numeric(0.02306),
    Numeric(0.83398),
    Numeric(-0.37708),
    Numeric(1.0),
    Numeric(0.0376),
[33m...[39m
[36mlookUpTable[39m: [32mMap[39m[[32mData[39m, [32mInt[39m] = [33mMap[39m(Text

## Euclidean Distance

In this notebook we'll use Euclidean distance as a similarity measure between two rows or vectors. Here's the equation:

$$ distance(X,Y) = \sqrt{\sum_{i=1}^n{(X_i - Y_i)^2}}$$

We implemented this function in the [previous notebook](https://github.com/jesus-a-martinez-v/toy-ml/blob/master/src/main/scala/notebooks/k_nearest_neighbors.ipynb). Please take a look at it before moving on ;)

## Best Matching Unit

The Best Matching Unit is the codebook vector that is most similar to a new piece of data. In order to determine this BMU we must calculate the measure of similarity between the example or new piece of data and each codebook vector.

Le's implement a function to get the Best Matching Unit for a particular example:

In [3]:
def getBestMatchingUnit(codebooks: Vector[Vector[Numeric]], testRow: Vector[Numeric]) = {
  val codebooksDistances = for {
    codebook <- codebooks
  } yield {
    val distance = euclideanDistance(codebook, testRow)
    (codebook, distance)
  }

  codebooksDistances.minBy(_._2)._1
}

defined [32mfunction[39m [36mgetBestMatchingUnit[39m

Good. Let's test it with a mock dataset:

In [4]:
val mockDataset = Vector(
  (2.7810836, 2.550537003, 0),
  (1.465489372, 2.362125076, 0),
  (3.396561688, 4.400293529, 0),
  (1.38807019, 1.850220317, 0),
  (3.06407232, 3.005305973, 0),
  (7.627531214, 2.759262235, 1),
  (5.332441248, 2.088626775, 1),
  (6.922596716, 1.77106367, 1),
  (8.675418651, -0.242068655, 1),
  (7.673756466, 3.508563011, 1)
) map { case (x1, x2, y) => Vector(Numeric(x1), Numeric(x2), Numeric(y))}

val testRow = mockDataset.head

mockDataset.foreach { r => 
    println(euclideanDistance(testRow, r))
}

0.0
1.3290173915275787
1.9494646655653247
1.5591439385540549
0.5356280721938492
4.850940186986411
2.592833759950511
4.214227042632867
6.522409988228337
4.985585382449795


[36mmockDataset[39m: [32mVector[39m[[32mVector[39m[[32mNumeric[39m]] = [33mVector[39m(
  [33mVector[39m([33mNumeric[39m([32m2.7810836[39m), [33mNumeric[39m([32m2.550537003[39m), [33mNumeric[39m([32m0.0[39m)),
  [33mVector[39m([33mNumeric[39m([32m1.465489372[39m), [33mNumeric[39m([32m2.362125076[39m), [33mNumeric[39m([32m0.0[39m)),
  [33mVector[39m([33mNumeric[39m([32m3.396561688[39m), [33mNumeric[39m([32m4.400293529[39m), [33mNumeric[39m([32m0.0[39m)),
  [33mVector[39m([33mNumeric[39m([32m1.38807019[39m), [33mNumeric[39m([32m1.850220317[39m), [33mNumeric[39m([32m0.0[39m)),
  [33mVector[39m([33mNumeric[39m([32m3.06407232[39m), [33mNumeric[39m([32m3.005305973[39m), [33mNumeric[39m([32m0.0[39m)),
  [33mVector[39m([33mNumeric[39m([32m7.627531214[39m), [33mNumeric[39m([32m2.759262235[39m), [33mNumeric[39m([32m1.0[39m)),
  [33mVector[39m([33mNumeric[39m([32m5.332441248[39m), [33mNumeric[39m(

In [5]:
getBestMatchingUnit(mockDataset, mockDataset.head)

[36mres4[39m: [32mVector[39m[[32mNumeric[39m] = [33mVector[39m([33mNumeric[39m([32m2.7810836[39m), [33mNumeric[39m([32m2.550537003[39m), [33mNumeric[39m([32m0.0[39m))

## Training Codebooks Vectors

The first step is to initialize a set of codebook vectors with random features extracted from the training set. Let's now implement a function that does this:

In [6]:
def randomCodebook(train: Dataset) = {
  val numberOfRecords = train.length
  val numberOfFeatures = train.head.length

  (0 until numberOfFeatures).map { index =>
    train(Random.nextInt(numberOfRecords))(index)
  }.toVector
}

defined [32mfunction[39m [36mrandomCodebook[39m

The next step is to adapt our randomly generated codebooks to best summarize or represent the training data. In order to do this, we'll apply the following iterative recipe:
  
  1. The BMU for each training example is found and __only this BMU is updated__.
  2. The difference between the training example and the BMU is calculated. This is the __error__.
  3. Their class values are compared. If it's a match, the error is added to the BMU to bring it closer to the training example. Otherwise, it is subtracted to push it farther away from the training pattern.
  4. The __learning rate__ is used to control the porportion of adjustment to be applied to each BMU. So, for instance, a learning rate of 0.3 means that each BMU will be adjusted only by the 30% of the error between the BMUs and the training examples. 
  5. A decaying learning rate is used to prevent overshooting in the training process as we progress towards convergence. The formula used is:
  
      $$ rate = learningRate * (1 - \frac{currentEpoch}{totalEpochs})$$
      
Let's create a function that performs the process described above:

In [7]:
def trainCodebooks(train: Dataset, numberOfCodebooks: Int, learningRate: Double, numberOfEpochs: Int) = {
  var codebooks = (0 until numberOfCodebooks).map(_ => randomCodebook(train).asInstanceOf[Vector[Numeric]]).toVector

  for (epoch <- 0 until numberOfEpochs) {
    val rate = learningRate * (1.0 - (epoch / numberOfEpochs))

    for (row <- train) {
      val numericRow = row.asInstanceOf[Vector[Numeric]]
      var bestMatchingUnit = getBestMatchingUnit(codebooks, numericRow)
      val bestMatchingUnitIndex = codebooks.indexOf(bestMatchingUnit)

      val rowFeaturesIndices = row.indices.take(row.length - 2)
      rowFeaturesIndices.foreach { i =>
        val error = numericRow(i).value - bestMatchingUnit(i).value
        val updatedValue = Numeric {
          if (bestMatchingUnit.last == numericRow.last) {
            bestMatchingUnit(i).value + error * rate
          } else {
            bestMatchingUnit(i).value - error * rate
          }
        }

        bestMatchingUnit = updatedVector(bestMatchingUnit, updatedValue, i)
      }

      codebooks = updatedVector(codebooks, bestMatchingUnit, bestMatchingUnitIndex)
    }
  }

  codebooks
}

defined [32mfunction[39m [36mtrainCodebooks[39m

Let's test this on our mock dataset:

In [8]:
trainCodebooks(mockDataset, 2, 0.3, 10)

[36mres7[39m: [32mVector[39m[[32mVector[39m[[32mNumeric[39m]] = [33mVector[39m(
  [33mVector[39m([33mNumeric[39m([32m37.76095858243376[39m), [33mNumeric[39m([32m2.362125076[39m), [33mNumeric[39m([32m0.0[39m)),
  [33mVector[39m([33mNumeric[39m([32m-25.08963663414342[39m), [33mNumeric[39m([32m3.005305973[39m), [33mNumeric[39m([32m0.0[39m))
)

## Make Predictions

Let's implement a function that allow us to make predictions using our trained codebooks!

In [9]:
def predictWithCodebooks(codebooks: Vector[Vector[Numeric]], testRow: Vector[Numeric]) = {
  getBestMatchingUnit(codebooks, testRow).last
}

defined [32mfunction[39m [36mpredictWithCodebooks[39m

In [10]:
def learningVectorQuantization(train: Dataset, test: Dataset, parameters: Map[String, Any]) = {
    val numberOfEpochs = parameters("numberOfEpochs").asInstanceOf[Int]
    val numberOfCodebooks = parameters("numberOfCodebooks").asInstanceOf[Int]
    val learningRate = parameters("learningRate").asInstanceOf[Double]
    
  val codebooks = trainCodebooks(train, numberOfCodebooks, learningRate, numberOfEpochs)

  test.map { row =>
    predictWithCodebooks(codebooks, row.asInstanceOf[Vector[Numeric]])
  }
}

defined [32mfunction[39m [36mlearningVectorQuantization[39m

Good.

Let's now use our new algorithm to test it on the Ionosphere dataset.

We'll start by running a baseline model on it and then our freshly implemented Linear Vector Quantization algorithm and then we will compare their performance.

As a baseline for classification we will use a __zero rule classifier__.

In [11]:
val baselineAccuracy = evaluateAlgorithmUsingTrainTestSplit[Numeric](
        data, 
        (train, test, parameters) => zeroRuleClassifier(train, test), 
        Map.empty, 
        accuracy, 
        trainProportion=0.8)

println(s"Zero Rule Algorithm accuracy: $baselineAccuracy")

Zero Rule Algorithm accuracy: 0.5633802816901409


[36mbaselineAccuracy[39m: [32mDouble[39m = [32m0.5633802816901409[39m

In [12]:
val linearVectorQuantizationAccuracy = evaluateAlgorithmUsingTrainTestSplit[Numeric](
    data,
    learningVectorQuantization,
    Map("numberOfEpochs" -> 50, "numberOfCodebooks" -> 20, "learningRate" -> 0.3),
    accuracy,
    trainProportion=0.8)

println(s"Linear Vector Quantization accuracy: $linearVectorQuantizationAccuracy")

Linear Vector Quantization accuracy: 0.8732394366197183


[36mlinearVectorQuantizationAccuracy[39m: [32mDouble[39m = [32m0.8732394366197183[39m

As we can see, there's a noticeable difference in performance between LVQ and our baseline Zero Rule Classifier.

The magic of LVQ resides in its balance of simplicity and speed. For inference it uses a very simple and understandable approach, as happens with k-Nearest Neighbors, but it is much more efficient due to not being as _lazy_ and making some work beforehand determining the codebooks that best describe the data.