
Float to Integer Mapping


Introduction

This page discusses why Nearist hardware uses integer data for vector comparisons instead of floating point data. We show that mapping floating point data to integer values does not significantly impact the accuracy as long as an appropriate mapping function is chosen. We also provide general recommendations about how to choose a mapping function to convert your floating point data into integer data.

Why Use Integers?

To maximize the number of parallel calculators in Nearist hardware, we make a number of simplifications to the k-NN search which allow us to improve acceleration with minimal or no loss in accuracy. One important simplification is in the data precision: by using integer values instead of floating point values, Nearist hardware is able to perform vector comparisons at a much faster rate. Perhaps surprisingly, this occurs with minimal or no loss in accuracy. In our experiments, we have found that mapping 32-bit floating point feature values to lower precision 16-bit or even 8-bit integers can actually serve to remove noise from the features and improve accuracy.

For example, in the experiments below we compare the performance of floating point vectors (such as MNIST vectors obtained from a convolutional neural net) against the same vectors transformed to 8-bit integers. Across several benchmarks and distance metrics the accuracy is nearly identical despite the loss in input data precision:

| Benchmark | 32-bit float | 8-bit int |
|-----------|--------------|-----------|
| MNIST | 98.5% | 98.5% |
| word2vec analogies | 73.6% | 73.7% |
| Reuters Text | 93.6% | 93.0% |

As an aside, you might ask: if there's no accuracy loss when using lower precision data values, why don't we always use integer values for k-NN, regardless of the hardware it's running on? On traditional CPUs and GPUs, floating point operations are so common and important that companies like Intel and Nvidia have put a lot of design effort into accelerating linear algebra routines on floating point data. An interesting consequence of this is that working with low-precision integers on CPUs and GPUs paradoxically provides no speed improvement.

Of course, when running your specific task on Nearist hardware it's important to validate that accuracy is not significantly impacted by the transformation to integer values. If accuracy is significantly impacted, you should consider either:

  • Increasing the integer precision (from, say, 8-bit integers to 16-bit or 32-bit)
  • Using a more appropriate mapping function

What Mapping Function Should Be Used to Transform Floating Point Data to Integer Data?

While it's true that we can expect good accuracy from integer-valued data, this only holds when we are careful about how we map our data from float values to lower precision integer values.

Since we must transform high precision data to a discrete range of integer values, we want the transformation to distribute our data evenly and representatively over this new, lower precision datatype; otherwise, we run the risk of unnecessarily losing important information in our dataset. Before choosing a mapping function, then, we must first understand what our dataset looks like.
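For example, a quick way to inspect the per-component distributions is sketched below (our own illustrative NumPy snippet, not part of any Nearist library; it assumes your dataset is an array X with one row per vector and one column per component):

```python
import numpy as np

# Hypothetical dataset: an (n_samples, n_features) array of float32 vectors.
X = np.random.randn(10000, 128).astype(np.float32)

# Per-component summary statistics to help choose a mapping function.
mins, maxs = X.min(axis=0), X.max(axis=0)
means, stds = X.mean(axis=0), X.std(axis=0)
medians = np.median(X, axis=0)

# Rough per-component skew estimate: (mean - median) / std.
skew = (means - medians) / (stds + 1e-12)

print("average component range:", float(mins.mean()), "to", float(maxs.mean()))
print("average |skew| estimate:", float(np.abs(skew).mean()))
```

If the components look roughly uniform, skewed, or bell-shaped, the linear, logarithmic, or logistic mappings discussed below are reasonable starting points, respectively.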

Linear Mapping

The linear transformation rescales the data to a new range by subtracting the minimum value from each value in the dataset and dividing by the overall range (max minus min). This maps the data onto the new range without altering the shape of its distribution.

Here is the equation for a linear mapping function that transforms data to the (0, 255) range:

y = round(255 * (x - min(x)) / (max(x) - min(x)))

In this equation, min(x) and max(x) are the minimum and maximum values of x, respectively, and 'round' rounds the value to the nearest integer. Note that this mapping must be done for each component of your vector; if the different components have different ranges or distributions, they should be treated separately.
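As a minimal sketch of this transformation (our own illustration in NumPy, not Nearist library code; it assumes the data is an array X with one row per vector and one column per component):

```python
import numpy as np

def linear_to_uint8(X):
    """Linearly map each column of X to the integer range [0, 255]."""
    X = np.asarray(X, dtype=np.float64)
    mins = X.min(axis=0)
    maxs = X.max(axis=0)
    spans = np.where(maxs > mins, maxs - mins, 1.0)  # avoid divide-by-zero on constant columns
    return np.rint(255.0 * (X - mins) / spans).astype(np.uint8)

# Example usage on random float data
X = np.random.rand(1000, 64).astype(np.float32)
X_int = linear_to_uint8(X)
print(X_int.dtype, int(X_int.min()), int(X_int.max()))  # uint8 0 255
```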

The linear mapping equation distributes the available precision evenly over the range of values. If the floating point data is already widely distributed across its full range, then the linear mapping equation is an appropriate choice. For example, on Nearist's vectorized MNIST dataset, although the data distribution is somewhat skewed, the linear transformation preserves enough information to make the classification performance almost identical:

| Metric | Algorithm | Accuracy |
|--------|-----------|----------|
| Floating Point, L1 | KNN, k=10 | 98.59% |
| Linear Transformed uint8, L1 | KNN, k=10 | 98.58% |

In some cases, however, the data values are not evenly distributed, and a non-linear mapping function may be more appropriate.


Logarithmic Mapping

The logarithmic transformation scales data logarithmically and maps the log-transformed data onto a new range.

Here is the equation for a logarithmic mapping function that transforms data to the (0, 255) range (each component is first shifted so that its minimum value is zero):

y = round(255 * log(1 + x - min(x)) / log(1 + max(x) - min(x)))
The logarithmic transformation is particularly useful for reducing skew in highly skewed data distributions. For example, consider a highly skewed dataset drawn from a Pareto distribution with alpha = 2. A linear mapping would send the majority of the values to just a few integer values, so instead we use the logarithmic mapping function. This reduces skew and distributes the densest portion of the distribution more evenly and representatively across the new discrete range of 256 integer values.

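A sketch of this kind of logarithmic mapping (again our own NumPy illustration, not Nearist library code; it shifts each component so its minimum is zero before applying log(1 + x)):

```python
import numpy as np

def log_to_uint8(X):
    """Map each column of X to [0, 255] on a logarithmic scale."""
    X = np.asarray(X, dtype=np.float64)
    shifted = X - X.min(axis=0)          # shift each column so its minimum is zero
    logged = np.log1p(shifted)           # log(1 + x) keeps the zero point at zero
    peaks = logged.max(axis=0)
    peaks = np.where(peaks > 0, peaks, 1.0)
    return np.rint(255.0 * logged / peaks).astype(np.uint8)

# Example usage on heavily skewed data drawn from a Pareto distribution (alpha = 2)
X = np.random.pareto(2.0, size=(10000, 32))
X_int = log_to_uint8(X)
print(np.bincount(X_int[:, 0], minlength=256)[:10])  # values spread across bins rather than piling into a few
```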

The distribution of Nearist's vectorized MNIST dataset is somewhat skewed; here the logarithmic transformation not only preserves the information from the floating point data, it actually improves classification performance slightly:

| Metric | Algorithm | Accuracy |
|--------|-----------|----------|
| Floating Point, L1 | KNN, k=10 | 98.59% |
| Log Transformed uint8, L1 | KNN, k=10 | 98.63% |

Logistic Mapping

The logistic transformation scales data via the logistic function and maps the transformed data onto a new range.

Here is the equation for a logistic mapping function that transforms data to the (0, 255) range (one common form standardizes each component by its mean and standard deviation before applying the logistic function):

y = round(255 / (1 + exp(-(x - mean(x)) / std(x))))

The logistic transformation is a good choice for data that is normally distributed. The logistic function spreads the densest central portion of a normal distribution across a wider range of integer values, meaning fatter tails in the mapped distribution and more precision for the values in the most common ranges.
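A sketch of one possible logistic mapping is below (our own NumPy illustration; centering each component on its mean and scaling by its standard deviation is an assumption you may need to adjust for your data):

```python
import numpy as np

def logistic_to_uint8(X):
    """Map each column of X to [0, 255] through the logistic function."""
    X = np.asarray(X, dtype=np.float64)
    means = X.mean(axis=0)
    stds = np.where(X.std(axis=0) > 0, X.std(axis=0), 1.0)
    z = (X - means) / stds                   # standardize each component (assumed scale choice)
    squashed = 1.0 / (1.0 + np.exp(-z))      # logistic function maps to (0, 1)
    return np.rint(255.0 * squashed).astype(np.uint8)

# Example usage on roughly normal data (e.g. embedding-style features)
X = np.random.randn(10000, 300).astype(np.float32)
X_int = logistic_to_uint8(X)
print(round(float(X_int.mean()), 1))  # close to 127.5 for zero-mean, symmetric data
```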

The distribution of Nearist's vectorized word2vec analogies dataset is approximately normal; here the logistic transformation not only preserves the information from the floating point data, it actually improves classification performance slightly:

| Metric | Algorithm | Accuracy |
|--------|-----------|----------|
| Floating Point, Cosine | KNN, k=10 | 73.59% |
| Logistic Transformed uint8, Cosine | KNN, k=10 | 73.66% |

Summary

We have shown that converting floating point data to integer data for use on Nearist hardware does not significantly impact the accuracy of similarity searches, provided an appropriate mapping function is used. Of course, there are plenty of transformation functions that you can use to help ensure your data is evenly distributed across a discrete integer range, but the ones discussed on this page should provide a useful starting point. In summary, before transforming your data you should look at the distribution of values. Then, for:

  • Skewed data, try using something similar to the logarithmic mapping function.
  • Uniformly distributed data, try using the linear mapping function.
  • Normally distributed data, try using the logistic mapping function.

As a final safety check, when running your similarity search be sure to benchmark the performance using your transformed integer values against the performance when using the original floating point data. Above all other considerations, this will tell you whether you need to adjust your mapping function or use higher precision integer values.
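One way to run that check is sketched below (our own illustration using scikit-learn's brute-force KNeighborsClassifier with an L1 metric and a linear uint8 mapping fitted on the training split; swap in your own data, metric, and mapping function):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(train_X, train_y, test_X, test_y, metric="manhattan", k=10):
    """Fit a brute-force k-NN classifier and return its test accuracy."""
    knn = KNeighborsClassifier(n_neighbors=k, metric=metric, algorithm="brute")
    knn.fit(train_X, train_y)
    return knn.score(test_X, test_y)

# Hypothetical data: replace with your own feature vectors and labels.
rng = np.random.default_rng(0)
train_X = rng.random((2000, 64)).astype(np.float32)
train_y = rng.integers(0, 10, 2000)
test_X = rng.random((500, 64)).astype(np.float32)
test_y = rng.integers(0, 10, 500)

# Fit the linear mapping on the training data and apply it to both splits.
mins, maxs = train_X.min(axis=0), train_X.max(axis=0)
spans = np.where(maxs > mins, maxs - mins, 1.0)
to_uint8 = lambda X: np.rint(255.0 * np.clip((X - mins) / spans, 0.0, 1.0)).astype(np.uint8)

float_acc = knn_accuracy(train_X, train_y, test_X, test_y)
int_acc = knn_accuracy(to_uint8(train_X), train_y, to_uint8(test_X), test_y)
print(f"float32 accuracy: {float_acc:.4f}   uint8 accuracy: {int_acc:.4f}")
```

If the two accuracies diverge noticeably, revisit the choice of mapping function or increase the integer precision as described above.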