KDB

A hashtable that seamlessly expands itself without blocking thanks to "incremental rehash" algorithm.

This is the same logic that Redis applies for its internal backbone data structure.

Build

make
./benchmark <data_structure> <data_file>

Supported data structures are adhtable, dhtable and list. This benchmark program will output the operation measurements to report_set.txt and report_get.txt.

Test

cd tests/
./test.sh

valgrind is required for memory leak checking.

Architecture

Test data

A text file contains over 466K English words, with numbers and symbols. words.txt

Benchmark

The measurement unit is CPU clock, in my case CLOCKS_PER_SEC == 1000000, which means 1 CPU clock is equivalent with 1 microsecond.

Advanced Dynamic Hash Table (adhtable - with incremental migration)

Most of the time, the complexity seems to be O(1), except few spikes that happens occasionally when the table expands itself. Thus, this is called constant armotized time complexity. The above table used djb2 hash algorithm. I tried sdbm and the result is not that much different

Here is the detailed figures, djb2 seems to be a little bit better than sdbm

avg_set_djb2	avg_set_sdbm	avg_get_djb2	avg_get_sdbm
1.57724450274461	1.58685331292828	1.18986563098139	1.20032536635866

Dynamic Hash Table (dhtable - without incremental migration)

The pattern of expansion is pretty much the same. GET is very consistent, but SET's performance is signficantly degraded. Let's compare each of them.

adhtable is 3 times faster than dhtable.

Linked-list as a "lookup table" (duh)

This is to see how stupid it is to use the wrong data structure for the job.

SET operation is surely O(1), but GET is painfully slow with O(n). Now, we compare it with the glorious adhtable

Compared to linked-list, adhtable seems to be almost linear.

To-do

Explain about hash table, load factor, rehash, incremental migration...
Explain and improve the architecture diagram
Explain the project structure

Name		Name	Last commit message	Last commit date
Latest commit History 89 Commits
etc		etc
tests		tests
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
adhtable.c		adhtable.c
adhtable.h		adhtable.h
benchmark.c		benchmark.c
dhtable.c		dhtable.c
dhtable.h		dhtable.h
entry.c		entry.c
entry.h		entry.h
fhtable.c		fhtable.c
fhtable.h		fhtable.h
hash.c		hash.c
hash.h		hash.h
helper.c		helper.c
helper.h		helper.h
list.c		list.c
list.h		list.h
list_string_entry.c		list_string_entry.c
list_string_entry.h		list_string_entry.h
string.c		string.c
string.h		string.h
string_entry.c		string_entry.c
string_entry.h		string_entry.h

hoang-khoi/kdb

Folders and files

Latest commit

History

Repository files navigation

KDB

Build

Test

Architecture

Test data

Benchmark

Advanced Dynamic Hash Table (adhtable - with incremental migration)

Dynamic Hash Table (dhtable - without incremental migration)

Linked-list as a "lookup table" (duh)

Other reading

To-do

About

Topics

Resources

Stars

Watchers

Forks

Languages