An end-to-end toy database management system code named BuzzDB. All the code are written in C++17.
Here's a list of software dependencies that you will need to install in the OS.
apt-get install -yqq cmake
apt-get install -yqq llvm
apt-get install -yqq clang-tidy
apt-get install -yqq clang
apt-get install -yqq clang-format
apt-get install -yqq valgrind
apt-get install -yqq libgl-dev libglu-dev libglib2.0-dev libsm-dev libxrender-dev libfontconfig1-dev libxext-dev
In this lab, you will implement a B+-Tree index for your database using the header src/include/index/btree.h
.
The B+-Tree must support the following operations:
lookup
Returns a value or indicates that the key was not found.insert
Inserts a new key-value pair into the tree.erase
Deletes a specified key. You may simplify the logic by accepting under full pages.
You will use page ids and the buffer manager instead of pointers to resolve child nodes in the tree. You can assume that the buffer_manager
methods fix_page
and unfix_page
work as described in the lecture. We added a fake buffer manager that allows you to test your code without I/O, so you can focus only on implementing the B+-Tree.
The index should be implemented as C++ template that accepts the parameters key type, value type, comparator and page size.
Note that this task does not use a dynamic page size but compile-time constants.
You therefore have to implement the B+-Tree primarily in the header src/include/index/btree.h
but you are able to compute a more convenient node layout at compile-time.
You can test your implementation using the tests in test/unit/index/btree_test.cc
.
The tests will instantiate your B+-Tree as BTree<uint64_t, uint64_t, std::less<uint64_t>, 1024>
.
Enter BuzzDB's directory and run
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
To run the entire test suite, use:
ctest
ctest has a flag option to emit verbose output. Please refer to ctest manual.
We have provided all the test cases for this lab. Gradescope will only test your code against these test-cases.
Similar to labs 1 and 2, your implementation will be checked for memory leaks. You can check for memory leaks using valgrind.
ctest -V -R btree_test_valgrind
You will implement the B+-Tree in the src/include/index/btree.h
file. You are provided with three functions in the skeleton code for the three operations introduced above.
std::optional<ValueT> lookup(const KeyT &key)
- looks up and returns the value of the node with key specified by the parameterkey
.void insert(const KeyT &key, const ValueT &value)
- Inserts a new key-value pair specified by the parameterskey
andvalue
into the tree.void erase(const KeyT &key)
- Deletes the key specified by the parameterkey
.
In addition to these functions, we have also provided the skeleton code for structures that represent different nodes in the B+-Tree:
Node
- This is the base structure for all the nodes in the B+-Tree.InnerNode
- This structure should be used to represent the inner nodes of the tree.LeafNode
- This structure should be used to represent the leaf nodes of the tree.
Note that InnerNode
and LeafNode
inherit members from the base Node
object.
Following are some of the important members from InnerNode
and LeafNode
that you will need to implement:
static constexpr uint32_t kCapacity
- The capacity of a node.std::pair<uint32_t, bool> lower_bound(const KeyT &key)
- Get the index of the first key that is not less than than a provided key.void insert(const KeyT &key, uint64_t split_page)
- Insert a key.KeyT split(std::byte* buffer)
- Split the node.void erase(const KeyT &key)
- Erase a key (Used only with leaf nodes).std::vector<KeyT> get_key_vector()
- Returns the keys (Used only while testing your implementation).std::vector<uint64_t> get_child_vector()
- Returns the child page ids (Used only while testing your implementation).
This is a rough outline of the steps you need to follow to implement the above methods.
-
Lookup
- Locate the leaf node corresponding to the key.
- Start with the root node.
- Is the current node a leaf?
- if yes, return the current page.
- if no, find next node to traverse (hint: use
inner_node->lower_bound
method). - repeat b until leaf node is found.
- Once leaf node is found, find first entry ≥ search key (use binary_search).
- If such entry is found, return it, else return no key is found.
- Locate the leaf node corresponding to the key.
-
Insert
- Create a new root if needed(code snippet provided).
- Locate the appropriate leaf page (same as step (i) in
lookup
). - Starting with the leaf node, execute the following actions.
- Is there free space on the node(leaf/inner)?
- If yes, insert entry and stop (use
node->insert
).
- If yes, insert entry and stop (use
- Split the node(leaf/inner) into two (use
node->split
method). - Insert key into the appropriate node(leaf/inner).
- Insert return value of
node->split
as separator into parent.- If the parent overflows, follow steps 4 through 7 on the parent node until node with free space is found.
Note: If any node in the process does not have a parent, create new root and update its children.
-
Erase
- Lookup the appropriate leaf page (same as step 1 in
lookup
). - Remove the entry from the current page (use
leaf_node->erase
method). (note thaterase
becomes lot easier in our case because we allow under full pages)
- Lookup the appropriate leaf page (same as step 1 in
-
Capacity - Here, you will implement the logic to compute the capacity of a given node.