Measuring the performance of KD Trees for solving the nearest neighbor problem

Authors: Katerina Bosko, Thean Lim, Mubarak Nsamba, Amit Pai

Final Project for CS5800 Algorithms class at the Northeastern University

My contribution

I proposed the project idea and was responsible for implementing the testing part using recoloring approach, including measuring the performance.

Project Description

The Nearest Neighbor Problem is the problem of finding the point in a dataset that is closest to a given query point. The generalized k-Nearest Neighbors algorithm (KNN) that finds k-number of most similar points is very popular in Machine Learning for classification tasks. There are many algorithms for solving the nearest neighbor problem. In this project, we explored one of them - the KD Tree algorithm. Developed by Jon Louis Bentley in 1975, KD Tree has been widely adopted due to its simplicity and efficiency and is used in different fields Clustering, Recommendation, Computer Vision, and many others.

The project’s scope:

To implement KD Tree algorithm
To test the implementation on an image dataset
To measure the performance of KD Tree versus the naive approach

Testing KD Tree implementation

The KD-Tree implementation was tested using the image recoloring approach. This involves mapping the colors of an input image to a specified color palette, which can be formulated as finding the nearest neighbor for each pixel's color from the set of colors in the palette. The output of the algorithm is a recolored version of the input image.

To implement the recoloring approach, we need

color palettes
We created seven color palettes that contain 128, 256, 512, 1024, 2048, 4096, and 8192 colors. Each bigger color palette contains a subset of smaller color palettes. The colors were selected from the whole universe of possible RGB colors, which is 256x256x256 or about 16,78 million colors.
input image
As an input image, a squared color wheel was chosen because it contains 68,850 unique colors equally distributed across RGB color channels. Choosing an image with a very high number of unique colors ensures that the algorithm will calculate a high number of unique distances, which is desired in order to see the differences in performance.

Nearest neighbor problem formulation

In this formulation of the nearest neighbor problem as a recoloring problem, the “nearest” is judged by the Euclidean distances between colors. For instance, if we have two colors - paletteColor: [13, 57, 71] and imgColor [194, 191, 186], the distance between them is:

KD Tree implementation benchmarks

We compare the performance of our KD-Tree implementation against two other benchmarks:

the Naive algorithm: finds the minimum distance for each pixel’s color by brute force iteration through each color in a color palette
Scipy algorithm: the KD-Tree implementation from the scipy library in Python, which is one of the primary libraries for scientific computing.

All algorithms were implemented in Python and are called converters that extend the BaseConverter class:

NaiveConverter
ScipyKDTreeConverter
CustomKDTreeConverter

A script was written in Python to automate the task of measuring the performance of all three converters, each running on seven different color palettes.

Operationalization of performance

The algorithm’s performance was measured by timing how long it took to find the nearest neighbor for each pixel’s color from the color palette. Given that we have 7 color palettes and 3 algorithms, this results in 21 different measurements. perf_counter() was used as a timing function before starting the search and immediately after the search was finished. The final time elapsed is the difference between starting and ending time.

Results

Image recoloring outputs for different color palettes:

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
algos		algos
assets		assets
color_palette		color_palette
input_images		input_images
output_images		output_images
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
results.csv		results.csv
script.py		script.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

algos

algos

assets

assets

color_palette

color_palette

input_images

input_images

output_images

output_images

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

main.py

main.py

requirements.txt

requirements.txt

results.csv

results.csv

script.py

script.py

Repository files navigation

Measuring the performance of KD Trees for solving the nearest neighbor problem

My contribution

Project Description

Testing KD Tree implementation

Nearest neighbor problem formulation

KD Tree implementation benchmarks

Operationalization of performance

Results

About

Releases

Packages

Languages

License

k-bosko/kd_tree

Folders and files

Latest commit

History

Repository files navigation

Measuring the performance of KD Trees for solving the nearest neighbor problem

My contribution

Project Description

Testing KD Tree implementation

Nearest neighbor problem formulation

KD Tree implementation benchmarks

Operationalization of performance

Results

About

Topics

Resources

License

Stars

Watchers

Forks

Languages