memcp: Made for Big Data

A High-Performance, Open-Source Columnar In-Memory Database

Protocol compatible drop-in replacement for MySQL

Introduction

In modern server and mainframe hardware, the memory bandwidth between CPU and RAM has become the new bottleneck. In-RAM compression will be a mayor contribution towards solving that issue.

What is memcp?

memcp is an open-source, high-performance, columnar in-memory database that can handle both OLAP and OLTP workloads. It provides an alternative to proprietary analytical databases and aims to bring the benefits of columnar storage to the open-source world.

memcp is written in Golang and is designed to be portable and extensible, allowing developers to embed the database into their applications with ease. It is also designed with a focus on scalability and performance, making it a suitable choice for distributed applications.

Features

fast: MemCP is built with parallelization in mind. The parallelization pattern is made for minimal overhead.
efficient: The average compression ratio is 1:5 (80% memory saving) compared to MySQL/MariaDB
modern: MemCP is built for modern hardware with caches, NUMA memory, multicore CPUs, NVMe SSDs
versatile: Use it in big mainframes to gain analytical performance, use it in embedded systems to conserve flash lifetime
Columnar storage: Stores data column-wise instead of row-wise, which allows for better compression, faster query execution, and more efficient use of memory.
In-memory database: Stores all data in memory, which allows for extremely fast query execution.
Build fast REST APIs directly in the database (they are faster because there is no network connection / SQL layer in between)
OLAP and OLTP support: Can handle both online analytical processing (OLAP) and online transaction processing (OLTP) workloads.
Compression: Lots of compression formats are supported like bit-packing and dictionary encoding
Scalability: Designed to scale on a single node with huge NUMA memory
Adjustable persistency: Decide whether you want to persist a table or not or to just keep snapshots of a period of time

Project Website: memcp.org

Screenshots

Getting Started 🚶

Using Docker

# first time: build the image
docker build . -t memcp

# run with the interactive scheme shell for debugging and development
mkdir data
docker run -v data:/data -it -p 4321:4321 -p 3307:3307 memcp

# run a specific application
docker run -e PARAMS="lib/main.scm apps/minigame.scm" -v data:/data -it -p 4321:4321 -p 3307:3307 memcp

# run for productive use
mkdir /var/memcp
docker run -v /var/memcp:/data -di -p 4321:4321 -p 3307:3307 --restart unless-stopped memcp

Compile From Source

Make sure, go is installed on your computer.

Compile the project with

make # executes go build

Run the engine with

./memcp

MemCP Scheme Shell

It will drop you at the scheme shell:

~/memcp$ ./memcp
memcp Copyright (C) 2023   Carl-Philip Hänsch
    This program comes with ABSOLUTELY NO WARRANTY;
    This is free software, and you are welcome to redistribute it
    under certain conditions;
Welcome to memcp
Hello World
MySQL server listening on port 3307 (connect with mysql -P 3307 -u user -p)
listening on http://localhost:4321
>

now you can type any scheme expression like:

> (+ 1 2)
= 3
> (* 4 5)
= 20
> (show) /* shows all databases */
= ()
> (createdatabase "yo")
= "ok"
> (show) /* shows all databases */
= ("yo")
> (show "yo") /* shows all tables */
= ()
> (rebuild) /* optimizes memory layout */
= "124.503µs"
> (print (stat)) /* memory usage statistics */
= "Alloc = 0 MiB	TotalAlloc = 1 MiB	Sys = 16 MiB	NumGC = 1"
> (loadCSV "yo" "customers" "customers.csv" ";") /* loads CSV */

MySQL Connection

connect to it via

mysql -u root -p -P 3307 # password is 'admin'

You can try queries like:

SHOW DATABASES
SHOW TABLES
CREATE TABLE foo(bar string, amount int)
INSERT INTO foo(bar, amount) VALUES ('Man', 4), ('Horse', 6)
SELECT * FROM foo
SELECT SUM(amount) FROM foo

If you want to import whole databases from your old MySQL or MariaDB database, do the following:

$ ./tools/mysqldump-to-json.py -H localhost -u [user] -p [password] [database] > dump.jsonl
$ ./memcp
memcp Copyright (C) 2023   Carl-Philip Hänsch
    This program comes with ABSOLUTELY NO WARRANTY;
    This is free software, and you are welcome to redistribute it
    under certain conditions;
Welcome to memcp
Hello World
MySQL server listening on port 3307 (connect with mysql -P 3307 -u user -p)
listening on http://localhost:4321
> (createdatabase "my_database")
"ok"
> (loadJSON "my_database" "dump.jsonl")
"1.454ms"

REST API

curl --user root:admin 'http://localhost:4321/sql/system/SHOW%20DATABASES'
curl --user root:admin 'http://localhost:4321/sql/system/SHOW%20TABLES'
curl --user root:admin 'http://localhost:4321/sql/system/SELECT%20*%20FROM%20user'

Securing the database from external access 🔒

The standard username/password is root/admin. To change that, type the following into scheme console:

(eval (parse_sql "" "ALTER USER root IDENTIFIED BY 'new_password'"))
(eval (parse_sql "" "CREATE USER user2 IDENTIFIED BY 'new_password'"))

Example REST API App 📕

You can take a look at https://github.com/launix-de/rdfop which is a RDF hypertext processor based on MemCP

Contributing 🌿

We welcome contributions to memcp. If you would like to contribute, please follow these steps:,

Fork the repository and create a new branch for your changes.
Make your changes and commit them to your branch.
Push your branch to your fork and create a pull request.

Before submitting a pull request, please make sure that your changes pass the existing tests and add new tests if necessary.

How it Works? ❓

MemCP structures its data into databases and tables
Every table has multiple columns and multiple data shards
Every data shard stores ~64,000 items and is meant to be processed in ~100ms
Parallelization is done over shards
Every shard consists of two parts: main storage and delta storage
main storage is column-based, fixed-size and is compressed
Delta storage is a list of row-based insertions and deletions that is overlaid over a main storage
(rebuild) will merge all main+delta storages into new compressed main storages with empty delta storages
every dataset has a shard-local so-called recordId to re-identify a dataset

Available column compression formats 📃

uncompressed & untyped: versatile storage with JSON-esque freedom
bit-size reduced integer storage with offset: savings of 80% and more for small integers
integer sequences: >99% compression ratio possible with ascending IDs
string-storage: more compact than C strings, cache-friendly
string-dictionary: >90% memory savings for repeating strings like (male, female, male, male, male)
float storage
sparse storage: efficient with lots of NULL values

the best suitable compression technique for a column is detected for a column automatically

Frequently Asked Questions 🤔

What is an in-memory database?

Unlike traditional databases, which store data on disks, in-memory databases (IMDBs) keep data in RAM. This results in much faster access times.

Why it is used?

An in-memory database (IMDB) stores and retrieves data primarily in a computer's RAM, enabling exceptionally fast data processing and retrieval, making it suitable for real-time applications requiring rapid access to data.

What are the benefits of columnar storage?

With columnar storage, data is much more homogeneous than in row-based storage. This enables a technique called "column compression" where compression ratios of around 1:5 (i.e. 80% savings) can be achieved just by a different data representation. This reduces the amount of cache lines that must be transferred from main memory to CPU and thus increases performance, reduces power consumption and decreases latency.

Also, columnar storages are a better fit for analytical queries where only a few out of possibly hundreds of columns are processed in the SQL query. An example of an analytical query is calculating the sum of revenue over a timespan from a lot of data points.

Can in-memory databases be used for my web project?

Yes. MemCP is meant as a drop-in replacement for MySQL and will make your application run faster.

Why does MemCP consume less RAM than MySQL even though MySQL is a hard disk-based database

In order to run fast, MySQL already has to cache all data in RAM. However, MySQL is not capable of compression, so it will consume about 5x the amount of RAM compared to MemCP for the same size of data cache.

Isn't it dangerous to keep all data in RAM? What happens during a crash?

MemCP of course supports some kind of hard disk persistency. The difference to a hard-disk-based database is that in MemCP you can choose how much IO bandwidth you want to sacrifice to achieve full crash safety. In other words: Your accounting data can still be secured with per-transaction write barriers while you can increase the write performance for sensor data by loosening persistency guarantees.

What happens if memory is full?

Usually, the net amount of data in databases is very low. You will be amazed, at how much data fits into your RAM when properly compressed. If that still exceeds the memory of your machine, just remember how slow it would be on the hard disk. Upgrade your RAM if you don't want to be on your swap partition. When you really go big data, a shared memory cluster is the way for you to go.

What's the current development status of MemCP?

We are still in the early alpha phase. MemCP already supports some basic SQL statements but it is not production-ready yet. The best way to use MemCP in a productive environment is over the internal scheme scripting language where you can hand-craft efficient query plans. Contribution to the SQL compiler is of course welcome.

What are MemCP REST services?

Normally, REST applications are implemented in any programming language, make a connection to an SQL server and do their queries. This induces IO overhead for that additional network layer between application and database and for the string-print-send-receive-parse pipeline. With MemCP, you can script MemCP to open a REST server and offer your REST routes directly in the process space of the database. You can prepare SQL statements which can be directly invoked inside the database. And don't be afraid of crashes: a crash in MemCPs scheme scripts will never bring down the whole database process.

Name		Name	Last commit message	Last commit date
Latest commit History 636 Commits
apps		apps
assets		assets
lib		lib
scm		scm
storage		storage
tools		tools
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
go.mod		go.mod
main.go		main.go
memcp.singularity.recipe		memcp.singularity.recipe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

memcp: Made for Big Data

A High-Performance, Open-Source Columnar In-Memory Database

Protocol compatible drop-in replacement for MySQL

Introduction

What is memcp?

Features

Screenshots

Getting Started 🚶

Using Docker

Compile From Source

MemCP Scheme Shell

MySQL Connection

REST API

Securing the database from external access 🔒

Example REST API App 📕

Contributing 🌿

How it Works? ❓

Available column compression formats 📃

Frequently Asked Questions 🤔

What is an in-memory database?

Why it is used?

What are the benefits of columnar storage?

Can in-memory databases be used for my web project?

Why does MemCP consume less RAM than MySQL even though MySQL is a hard disk-based database

Isn't it dangerous to keep all data in RAM? What happens during a crash?

What happens if memory is full?

What's the current development status of MemCP?

What are MemCP REST services?

Further Reading 📚

About

Releases

Packages

Contributors 4

Languages

launix-de/memcp

Folders and files

Latest commit

History

Repository files navigation

memcp: Made for Big Data

A High-Performance, Open-Source Columnar In-Memory Database

Protocol compatible drop-in replacement for MySQL

Introduction

What is memcp?

Features

Screenshots

Getting Started 🚶

Using Docker

Compile From Source

MemCP Scheme Shell

MySQL Connection

REST API

Securing the database from external access 🔒

Example REST API App 📕

Contributing 🌿

How it Works? ❓

Available column compression formats 📃

Frequently Asked Questions 🤔

What is an in-memory database?

Why it is used?

What are the benefits of columnar storage?

Can in-memory databases be used for my web project?

Why does MemCP consume less RAM than MySQL even though MySQL is a hard disk-based database

Isn't it dangerous to keep all data in RAM? What happens during a crash?

What happens if memory is full?

What's the current development status of MemCP?

What are MemCP REST services?

Further Reading 📚

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages