A repo to allow validation of performance results in the knor paper and provide a fast, scalable k-means implementation.
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


Build Status



The k-means NUMA Optimized Routine library or knor is a highly optimized and fast library for computing k-means in parallel with accelerations for Non-Uniform Memory Access (NUMA) architectures.

knor can perform at 10-100 times the speed of popular packages like Spark's MLlib, Dato (GraphLab) and H2O.

knor can operate:

  1. On a single machine with all data in-memory -- knori.
  2. In a cluster in distributed memory -- knord.
  3. On a single machine with some data in-memory and the rest on disk (Semi-External Memory) -- knors.

These modules are incarnations of algorithms in our publication which will appear in the annuls of HPDC 2017.

By default knor can use a scalable adaption of Elkan's algorithm that can drastically reduce the number of distance computations required for k-means without the memory bloat traditionally associated with Elkan's algorithm.


We expose the knori to users through Python. We support Linux and Mac OSX.

For instructions on how to install the Python package please see the Python repository.

In most cases all that is necessary is:

pip install knor


We expose the knori to users through R. We support Linux, Mac OSX, and Windows.

For instructions on how to install the R package please see the R repository.

In most cases all that is necessary is:


knor's backbone

knor relies on the following:

  • p-threads for multithreading.
  • numa for NUMA optimization.
  • FlashGraph for a semi-external memory vertex-centric interface.

System Requirements

  • Linux or Mac OS 10.*
  • At least 4 (GB) of RAM
  • Administrative privileges


The following is Tested on Ubuntu 14.04 - 16.04. We require a fairly modern compiler to take advantage of compile time optimizations:

Native Auto-Install

We assume users have git already installed.

git clone --recursive https://github.com/flashxio/knor.git
cd knor

Docker -- Run in a Container

Users can also choose to use a Docker container to run knor.

Mac Docker Users NOTE:

  • You must use the -O flag for knori and knord to use OpenMP due to a lack of hardware support for our optimizations.
  • knors for Mac is unsupported due to the lack of support for low-level interfaces.

Quick docker image (Stable release)

Users can get a version that is stable, but possibly not up to date as follows:

docker run -i -t flashxio/knor

If you already have the knor docker image then updating to the newest stable release can be done as follows:

docker pull flashxio/knor

Manual build (Bleeding edge release)

Users may also obtain a bleeding-edge up to date stable docker image by building the image themselves. We assume your OS has curl and docker already installed.

curl -O https://raw.githubusercontent.com/flashxio/knor/master/Dockerfile
docker build -t knor .
docker run -i -t knor

Usage (Running knor)

Assume the following:

  • k is the number of clusters
  • dim is the dimensionality of the features
  • nsamples is the number of samples
  • $KNOR_HOME is the directory created when knor was cloned from Github.
  • nproc is the number of MPI processes to use for knord.
  • nthread is the number of threads per process. For knori this is the total number of threads of execution. NOTE: For knord the total number of threads of executions is nproc * nthread.

To run modules from the $KNOR_HOME directory, you may do the following:

Test Data

We maintain a very small dataset for testing: $KNOR_HOME/test-data/matrix_r50_c5_rrw.bin.

This dataset has:

  • nsamples = 50
  • dim = 5

NOTE: knor will fail if you provide more units of parallelism than the nsamples in the dataset. So use small -t and -T values like 2


For a help message and to see valid flags:


An example of how to process with file datafile.bin:

exec/knori datafile.bin nsamples dim k \
    -t random \
    -T 2 \
    -i 10 \
    -o outdir

For comparison (slower speed) run our algorithm using OpenMP as follows:

exec/knori datafile.bin nsamples dim k \
    -t random \
    -T nthread \
    -i 10 \
    -o outdirOMP -O

It is also possible to disable computataion pruning i.e., using Minimal triangle inequality algorithm by using the -P flag.


For a help message and to see valid flags:


An example of how to process with file datafile.bin:

mpirun.mpich -n nproc exec/knord datafile nsamples dim k \
    -t random -i 10 -T nthread -o outdir

See the mpirun list of flags like to allow your processes to distribute correctly across a cluster. Flags of note are:

  • --map-by ppr:n:node, determining how many processes are mapped to a node.
  • --map-by ppr:n:socket, determining how processes per socket.


For semi-external memory use, users must configure the underlying user-space filesystem SAFS as described here

For a help message:


An example of how to process with file matrix_r50_c5_rrw.adj :

exec/knors libsem/FlashX/flash-graph/conf/run_graph.txt test-data/matrix_r50_c5_rrw.adj 50 5 8

The following flags can further aid to accelerate knors performance.

-r: size of the row cache in gb

Specifies another layer within the memory hierarchy to allow for caching at the granularity of rows of data rather than a page like traditional caches. In practice a small cache of 0.5GB-1GB can accelerate the performance of 10 Million+ samples.

-I: row cache update interval

The row cache lazily updates users must specify how often to refresh the cache in iterations of the algorithm. Note that as iterations proceed the cache will automatically update less frequently as computation stabilizes.

Output file from knor

The output file obtained by using the -o flag within knor produces a YAML file that is easily readable by any YAML reader.

For instance, within Python we can use PyYAML which is installable via by pip install pyyaml to read the file as follows:

from yaml import load
with open("kmeans_t.yml", "r") as f:
    kms = load(f) # Returns a python dictionary

> ['niter', 'dim', 'nsamples', 'cluster', 'k', 'centroids', 'size'

The fields are as follows:

  • niter: The number of iterations performed.
  • dim: The number of attributes within the dataset.
  • nsamples: The number of samples within the dataset.
  • cluster: To which cluster, [0-k), each sample is assigned.
  • k: The number of clusters requested.
  • centroids: The k x dim centroids matrix with each row representing a cluster's centroid.
  • size: The number of samples placed within each cluster.

Data format

We use two different formats dependent upon if you operate fully in-memory or use semi-external memory.

knori and knord

In-memory and distributed memory routines (knori, knord) use row-major binary matrices of type double i.e., 16 Bytes per entry. For example the matrix:

1 2 3 4
5 6 7 8

Must be written out in row-major form i.e., the bytes would be organized such that on disk they would look like:

1 2 3 4 5 6 7 8

NOTE: The row-major data on disk will be non-human readable since it is in binary format.


Semi-external memory (knors) data is stored in row-major format with a leading 4KB FlashGraph header. You can convert binary or text data to knors format using the convert_matrix utility.

Data Conversion

We provide a single threaded data conversion script to convert data from one format to the other.

Plain text (space separated values) to row-major binary

We consider a file named example.txt in our knor root directory, $KNOR_HOME with the following content:

1 2 3 4
5 6 7 8

To convert this file from text readable (tsv) format one would do the following:

utils/convert_matrix # gives help on how to use the converter

> usage: ./convert_matrix in_filename in_format [text/knori/knord/knors]\
    out_filename out_format[text/knori/knord/knors] nsamples dim

utils/convert_matrix example.txt \
    text \
    2 4

Formatting data directly from other languages

For knori and knord the format can be produced by any binary writing utility in any language. **NOTE: ** knors is a custom format that should be generated using the convert_matrix utility. Below are a couple examples of writing knori and knord format from some higher-level languages.


The following is an example of how to take an R data frame and write out to raw binary row-major format:

# `mtcars` is a data frame loaded by default into R

mtcars # Take a look at the data

nbytes.per.elem = 8 # NOTE: Always true
mtvec <- as.vector(t(mtcars)) # Turn dataframe into matrix (`t`) then vector
writeBin(mtvec, "yourfilename.bin", size=nbytes.per.elem)


The following is an example of how to take a numpy ndarray and write it out to raw binary row-major format:

import numpy as np
np.random.seed(1) # For consistency
dat = np.random.rand(16, 8)
dat.shape # We have a 16 x 8 matrix

type(dat[0][0]) # Check type is double i.e. 8 bytes = 64 bits
# dat = dat.astype(np.float64) # if data type is wrong


Updating knor

To update knor simply run the update script which pulls directly from Github:


Points of note

  • Different knor routines may not produce the same result dependent upon which initialization routine is used, but every routine will give the same answer over multiple runs.
  • Found a bug? Please create an issue on Github.


Mhembere, D., Zheng, D., Vogelstein, J. T., Priebe, C. E., & Burns, R. (2017). knor: A NUMA-optimized In-memory, Distributed and Semi-external-memory k-means Library. arXiv preprint arXiv:1606.08905.