Skip to content

A simple Binary Search Tree implementation for the Advanced Programming 2019-2020 course @ SISSA.

License

Notifications You must be signed in to change notification settings

matteosecli/AP_BST

Repository files navigation

Advanced Programming - Binary Search Tree

Build Status Documentation Status GitHub issues GitHub forks GitHub stars GitHub license

A simple Binary Search Tree implementation for the Advanced Programming 2019-2020 course @ SISSA.

Authors:

  • Angela Carraro <angela_carraro@hotmail.it>
  • Matteo Seclì <secli.matteo@gmail.com>

Documentation

You can read a Doxygen-generated documentation at ap-bst.readthedocs.io.

How to Compile & Run

Using the included Makefile

  • To compile the source files and run a sample program use

    make
    ./AP_BST

    A debug build can be generated by specifying the macro __DEBUG_AP_BST. Run the Makefile via make EXTRA_CXXFLAGS=-D__DEBUG_AP_BST.

  • To generate the documentation (Doxygen is needed) run

    make doc

    The documentation is built into ./doc/_build/; to open the HTML documentation, open the file ./doc/_build/html/index.html. Notice that the HTML documentation is already automatically built on each commit at ap-bst.readthedocs.io.

  • To compile and run the unit tests, written in Catch2, run

    make test
    cd test
    ./test

    The test binary supports a series of options, like e.g. the -s switch that prints out the successful tests as well. A full list of the available options can be obtained via

    ./test --help
  • To compile and run the benchmark binary use

    make benchmark
    cd benchmark
    ./benchmark

    The benchmark binary can take four optional arguments:

    ./benchmark nStart nIncr nEnd nSampl

    More info on the optional arguments in the benchmarks section.

Using qmake (no Qt needed)

You can automatically generate a Makefile through qmake; Makefiles generated in this way support out-of-source builds as well.

The Makefile for the sample program and for building the documentation can be generated by running

qmake

in a terminal; then, follow the steps outlined in the previous section. A debug build can be generated via qmake CONFIG+=debug.

The Makefiles for the test and benchmark targets can be generated by running the qmake command inside their respective folders, and then a simple make will suffice.

Code Design

Structure

The code is structured in three differerent files contained in the folder src:

  • Node.hpp: implements the template class Node, that is used to represent a node in our binary search tree;
  • Iterator.hpp: implements the interal template class __iterator, that serves as a basis for the iterator and const_iterator members of the binary search tree class;
  • BST.hpp: the file contains the actual implementation of a binary search tree class, called bst.

The end user does not need to use Node.hpp and Iterator.hpp; he can directly include BST.hpp.

These thee classes are also separated in two different namespaces:

  • APutils contains Node and __iterator;
  • APbst contains bst.

In this way, the user is not directly exposed to neither Node nor __iterator when including BST.hpp.

Examples of the usage of each class is provided in the file main.cpp, compiled into the binary AP_BST.

Tests

We provide a suite of tests, that cover the majority of the features of our three classes, in the folder test. The tests are written by using the industry-standard Catch2 library, that allow for a meaningful and easy-to-read syntax in C++ tests; other alternatives could have been e.g. Google's library. A main() function is automatically provided by Catch2 in the file 000-CatchMain.cpp; the other files caontains the tests themselves. All the files are compiled into a single binary, test.

Continuous Integration

We extensively checked the quality and standard-compliance of our code by compiling it on different OS's and with different compilers. We mainly worked on the code on MacOS 10.14.6 (Apple Clang 10.0.1, GNU GCC 7.4.0 and 8.3.0 via Homebrew and Intel C++ Compiler 19.0.5) and Ubuntu 18.04 (Clang 6.0.0 and GNU GCC 7.4.0), but we also set up a continuous integration on Travis.

On Travis, we build on Ubuntu 16.04 VMs with GNU GCC 6.5.0, 7.4.0, 8.3.0 and Clang 7.0.0, 8.0.1, 9.0.1. For each build configuration, we

  • compile main.cpp and the tests, in debug mode and with all the error flags activated;
  • run valgrind on the AP_BST binary (except on recent Clang builds due to internal errors of the system libraries);
  • run test -s in order to execute and print all the tests.

We've made sure not to have any compiler warnings, failed tests or memory leaks in all our builds.

A further continuous integration mechanism, this time just for the documentation, has been set-up on Read the Docs.

Benchmarks

We benchmarked our binary search tree, both balanced and unbalanced, against the STL map std::map (which is the closest relative to our bst class). In all cases, we used an int key (and an int value).

We randomly looked for existing keys in each of the containers, for different tree sizes ranging from nStart to nEnd in steps of nIncr. As measuring the lookup time of a single key is dominated by systematic errors out of our control, we opted to measure nSampl at a time until exhausting all the available keys in the container, and then we just divided the result by nSampl (though this method tends to underestimate the error bars). For high enough nSampl, different values of nSampl itself should yield compatible values of the lookup times per key; we've found that nSampl = 50 is a good tradeoff value and we used it thoughout all of our plots.

We stress that this is just one way of measuring the execution times; different methods yield of course different results, so performance comparisons can only be done by using the same methodology.

We run our benchmarks on a desktop workstation equipped with an Intel i7-3770, 16 GB of memory and RHEL 7 with GNU GCC 8.3.0.

The results are pictured below (lower is better).

benchmark_RHEL

As expected, the balanced tree has better lookup times than the unbalanced one; what's surprising, though, is that our balanced tree does better than the standard map and that all the three lines have an "elbow" at roughly n = 20000, after which there seems to be a monotonic growth of lookup times for each container.

Another surprising feature is that we don't see a O(log N) behavior of the lookup times, as we instead expect. There is actually what could be a small logarithmic tail on the left, but for large n the lookup times just increase linearly.

Extra: Benchmarks on Our Portable Machines

On our portable Ubuntu machine, with an Intel i5-3317U, 4GB of memory, GNU GCC 7.3.0 and Ubuntu 18.04, we get some different results.

benchmark_Ubuntu

The lookup times increase linearly with n but we don't see anymore the strange "elbow" seen on the previous machine. Again, we don't see a clear O(log N) behavior; but indeed, as hinted before, if we look at lower n's we see what it seems to be a logarithmic scaling (picture below, data points and errorbars omitted for clarity).

benchmark_Ubuntu_low

On our portable MacOS machine, with an Intel i7-7820HQ, 16GB of memory, Apple's Clang 10.0.1 and MacOS 10.14.6, the benchmarks look quite different.

benchmark_Mac

First of all, they're extremely noisy; the balanced tree is still performing better than the unbalanced one, but the fluctuations are quite large (as you can see both in the oscillating behavior and in the large error bars). The std::map, instead, performs extremely better; the lookup times are O(1). We think that's because the std::map provided by Apple's system libraries doesn't use a tree-like implementation (though we've not looked into that).

TL;DR

We got benchmark results that strongly depend on operating system, compiler and vendor of the STL libraries.

About

A simple Binary Search Tree implementation for the Advanced Programming 2019-2020 course @ SISSA.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages