Mega-KV is a high-throughput in-memory key-value store (cache) that takes a novel approach: it offloads the index data structure and the corresponding operations to the GPU.
Mega-KV is currently implemented on top of the NVIDIA CUDA APIs and Intel DPDK on Linux, but it can be ported to other GPGPU programming frameworks, such as OpenCL, and to other operating systems as well.
If you intend to run Mega-KV on AWS p2.xlarge instances using the AMIs listed on Deep Learning AMI CUDA 9 Ubuntu Version, the script in bin/setup.sh may work for you. If you have a different environment to set up, or wish to understand better what is going on here, please follow the USAGE instructions below.
We were able to rent an AWS p2.xlarge instance at the almost-too-cheap-to-meter AWS spot price of $0.1301 per hour, many times cheaper than purchasing the equivalent CPUs, GPUs, motherboard, RAM, PSU, case/rack and other system components.
This fulfils the cloud's standard promise of a dramatically lower capex outlay.
Using htop and nvidia-smi combined with the Mega-KV src (see USAGE Steps 4-6), we found that for the default workload both insert and search performance was still CPU-bound, with GPU utilisation at 1-2% and 4-5% for the insert and search phases respectively.
Future work could look into why this is the case and investigate how to attain higher utilisation of the available GPU resource, for example by offloading more of the CPU-bound work onto the GPU itself, onto other parts of the system such as the networking stack, or even onto other AWS instances.
We found that modifying the values NUM_QUEUE_PER_PORT and MAX_WORKER_NUM in macros.h from 7 and 12 respectively to 1 and 1 improved out-of-the-box Mega-KV insert phase throughput on our p2.xlarge instance from ~0.5 to ~2 Mops, and also improved search phase performance from ~8 to ~18 Mops, a ~4x and ~2x improvement respectively for our machine.
This most likely means DPDK-enabled network interfaces are not available on P2 instances, only X1 instances, at the time of writing.
Future work could wait for a P2, P3 or CG1 CUDA-enabled instance to also have DPDK / ENA support or, as Kai Zhang et al. suggested earlier in this README, consider ports to other GPGPU programming frameworks, such as OpenCL, or support for other operating systems as well.
- Jun 1, 2015: megakv-0.1-alpha. Initial release; basic interfaces for an in memory key-value store. This is a demo and is not ready for production use yet. Bugs are expected.
- Nov 1, 2017: For MongoDB Skunkworks. Updates to run on AWS p2.xlarge instances, Intel DPDK v16.11, CUDA 9 and Ubuntu gcc 5.4.0.
Mega-KV currently uses a simple self-defined protocol for efficient communication.
- A request packet has a 16-bit magic number at the beginning: 0x1234.
- A request packet has a 16-bit ending mark at the end: 0xFFFF.
- Each GET query in the packet has the format: 16-bit Job Type(0x2), 16-bit Key Length, and the key.
- Each SET query in the packet has the format: 16-bit Job Type(0x3), 16-bit Key Length, 32-bit Value Length, and the key and value.
Anyone can improve or modify this protocol to suit their practical needs.
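The packet layout described above can be sketched in a few lines of Python. This is an illustration only and not part of the Mega-KV code base: the function names `build_request` and `parse_request` are hypothetical, and the little-endian byte order is an assumption, since the protocol description does not state one.

```python
import struct

# Constants taken from the protocol description.
MAGIC = 0x1234
END_MARK = 0xFFFF
JOB_GET = 0x2
JOB_SET = 0x3

# ASSUMPTION: little-endian, unpadded fields ("<"). The README does not
# specify a byte order, so adjust this to match the real implementation.


def build_request(queries):
    """Build one request packet.

    queries is a list of ("GET", key) or ("SET", key, value) tuples,
    where key and value are bytes objects.
    """
    out = bytearray(struct.pack("<H", MAGIC))
    for query in queries:
        if query[0] == "GET":
            key = query[1]
            # 16-bit job type, 16-bit key length, then the key.
            out += struct.pack("<HH", JOB_GET, len(key)) + key
        elif query[0] == "SET":
            key, value = query[1], query[2]
            # 16-bit job type, 16-bit key length, 32-bit value length,
            # then the key followed by the value.
            out += struct.pack("<HHI", JOB_SET, len(key), len(value))
            out += key + value
        else:
            raise ValueError("unknown query type: %r" % (query[0],))
    out += struct.pack("<H", END_MARK)
    return bytes(out)


def parse_request(packet):
    """Parse a packet produced by build_request back into query tuples."""
    (magic,) = struct.unpack_from("<H", packet, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number: 0x%04x" % magic)
    pos = 2
    queries = []
    while True:
        (job,) = struct.unpack_from("<H", packet, pos)
        pos += 2
        if job == END_MARK:
            return queries
        if job == JOB_GET:
            (key_len,) = struct.unpack_from("<H", packet, pos)
            pos += 2
            queries.append(("GET", packet[pos:pos + key_len]))
            pos += key_len
        elif job == JOB_SET:
            key_len, value_len = struct.unpack_from("<HI", packet, pos)
            pos += 6
            key = packet[pos:pos + key_len]
            pos += key_len
            value = packet[pos:pos + value_len]
            pos += value_len
            queries.append(("SET", key, value))
        else:
            raise ValueError("unknown job type: 0x%04x" % job)
```

For example, `build_request([("SET", b"k1", b"value"), ("GET", b"k1")])` yields a single packet that carries both queries between the magic number and the ending mark.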
- NIC: Intel 10 Gigabit NIC that is supported by Intel DPDK SDK.
- CPU: Intel CPU that supports the SSE instruction set required by the Intel DPDK SDK.
- GPU: NVIDIA GPU newer than GTX680. We have conducted experiments on GTX780.
- Set up the network with Intel DPDK. We recommend installing Intel DPDK 1.7.1, which is known to work with Mega-KV. Newer versions of DPDK may have some compilation problems with Mega-KV. Then run export RTE_SDK=$(PATH_TO_DPDK), where PATH_TO_DPDK is the path of the DPDK directory.
- Go to the libgpuhash directory and edit Makefile to set the correct CUDA installation path. We recommend installing CUDA SDK 6.5, which is known to work with Mega-KV.

  Some important macros in gpu_hash.h:
  - MEM_P: 2^MEM_P bytes of GPU device memory space for the hash table.
  - HASH_CUCKOO/HASH_2CHOICE: cuckoo hash or two-choice hash.
- Run make. This should compile the CUDA hash table library, including cuckoo hash or two-choice hash. Macros can be set in gpu_hash.h. This will generate libgpuhash.a in the lib directory, which is used by Mega-KV as the GPU hash table library.
- Go to the src directory and edit Makefile to set the correct CUDA installation path. Set up other macros in Makefile and macros.h for test or production use. Edit the config variables in mega.c for different GPUs or configurations.

  In the Makefile, a macro is disabled with the _0 suffix. You can enable the macro by removing the suffix.

  Some important macros in Makefile:
  - PREFETCH_BATCH: enable batch prefetching to improve performance.
  - PRELOAD: preload key/value items into Mega-KV before the test.
  - LOCAL_TEST: run Mega-KV locally, just for testing.
  - SIGNATURE: enable a simple signature algorithm instead of the one used for testing. You can implement a new signature algorithm under this macro.

  Some important macros in macros.h:
  - CPU_FREQUENCY_US: set the CPU frequency for the timers.
  - MEM_LIMIT: set the memory limit to avoid using virtual memory.
  - NUM_QUEUE_PER_PORT: number of queues per NIC port. Each queue will have one receiver and one sender.
- Edit the CPU core mappings in mega.c. Three functions launch the Receivers, the Senders, and the Scheduler: mega_launch_receivers, mega_launch_senders, and mega_launch_scheduler. You can edit the context->core_id assignment to change the core mapping for these threads.

  To maximize resource utilization, Hyper-threading is recommended. The Nth Receiver and the Nth Sender can be assigned to two virtual cores located on the same physical core. Please note that one physical core should be reserved for the Scheduler so that it is not affected by other threads.

  Corresponding DPDK parameters may also need to be modified in line 527.
- Run make. This should compile Mega-KV. Then Mega-KV can be run with ./build/megakv

  The above currently defaults to insert jobs for about a minute, prints ========================== Hash table has been loaded ========================== and then switches to search jobs, periodically reporting statistics to the terminal.
- Benchmark.

  Go to the benchmark directory. This is also based on Intel DPDK 1.7.1. Modify the macros in benchmark.h, and modify the CPU core mappings between line 792 and line 815.

  Run make, then run sudo ./build/benchmark

  This benchmark currently only supports 8-byte key and 8-byte value generation. NOTE: LOAD_FACTOR, PRELOAD_CNT, and TOTAL_CNT should be the same as in Mega-KV if Mega-KV preloads key-value items locally for testing.

  Some important macros in benchmark.h:
  - DIS_ZIPF/DIS_UNIFORM: key popularity distribution.
  - WORKLOAD_ID: 100% GET or 95% GET
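To make the two benchmark knobs above more concrete, here is a rough Python sketch of what a key popularity distribution and a GET ratio mean for a generated workload. It is not the benchmark's actual generator: the function names, the skew value of 0.99, the little-endian 8-byte key encoding, and the choice to treat the non-GET share as SET operations are all assumptions made for illustration.

```python
import random
import struct
from itertools import accumulate


def make_key(index):
    # 8-byte key, matching the benchmark's 8-byte key size.
    # ASSUMPTION: the key is simply the item index, little-endian.
    return struct.pack("<Q", index)


def zipf_cum_weights(num_keys, skew):
    # Cumulative weights proportional to 1 / rank**skew, rank starting at 1.
    return list(accumulate(1.0 / (rank ** skew)
                           for rank in range(1, num_keys + 1)))


def generate_ops(count, num_keys, get_ratio,
                 distribution="uniform", skew=0.99, seed=0):
    """Return a list of (operation, key) pairs.

    get_ratio is 1.0 for a 100% GET workload or 0.95 for 95% GET.
    ASSUMPTION: the remaining operations are SETs.
    """
    if distribution not in ("uniform", "zipf"):
        raise ValueError("distribution must be 'uniform' or 'zipf'")
    rng = random.Random(seed)
    cum = zipf_cum_weights(num_keys, skew) if distribution == "zipf" else None
    ops = []
    for _ in range(count):
        if cum is None:
            # Uniform: every key is equally likely.
            index = rng.randrange(num_keys)
        else:
            # Zipf-like: low-numbered (hot) keys are picked far more often.
            index = rng.choices(range(num_keys), cum_weights=cum, k=1)[0]
        kind = "GET" if rng.random() < get_ratio else "SET"
        ops.append((kind, make_key(index)))
    return ops
```

Comparing `generate_ops(5000, 100, 0.95, "zipf")` with the same call using `"uniform"` shows how a skewed distribution concentrates requests on a few hot keys, which is the behaviour DIS_ZIPF is meant to model.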
It should be possible to run the following Linux system utility programs to identify the system's performance bottlenecks:
- CPU/RAM bottlenecks - top or htop
- GPU bottlenecks - nvidia-smi
There may also be a need for additional, more specific tools to investigate performance bottlenecks. For a brief overview, please see this AskUbuntu.
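If you want to log GPU utilisation while Mega-KV runs rather than watch it by eye, nvidia-smi's CSV query mode can be scripted. The following is a minimal Python sketch under that assumption; the helper names `parse_gpu_stats` and `sample_gpu_stats` are hypothetical, and the parser assumes every field is numeric (some GPUs report values such as `[N/A]`, which would need extra handling).

```python
import subprocess

# nvidia-smi query mode: one CSV line per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into a list of per-GPU dictionaries."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append({"util_percent": int(util), "mem_used_mib": int(mem)})
    return stats


def sample_gpu_stats():
    """Run nvidia-smi once and return the parsed statistics.

    Requires an NVIDIA driver with nvidia-smi on the PATH.
    """
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `sample_gpu_stats()` in a loop with a short sleep during the insert and search phases gives a simple utilisation trace to set against the CPU load reported by top or htop.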
- The UPDATE command is not supported yet.
- Other memcached fields, such as expiration time, are not supported. However, they are easy to implement and are planned in the roadmap.
- LOCAL_TEST may not be accurate, because the overhead of key generation is very large, especially with zipf key generation.
Go to http://kay21s.github.io/megakv for documentation and other development notices. You can contact the author at kay21s [AT] gmail [DOT] com.
This software is not supported by MongoDB, Inc. under any of their commercial support subscriptions or otherwise. Any usage of Mega-KV is at your own risk. Bug reports, feature requests and questions can be posted in the Issues section on GitHub.