Skip to content

launix-de/memcp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

memcp: Made for Big Data

A High-Performance, Open-Source Columnar In-Memory Database

Protocol compatible drop-in replacement for MySQL

Go MySQL HTML5 JavaScript REST JSON


memcp >

Introduction

In modern server and mainframe hardware, the memory bandwidth between CPU and RAM has become the new bottleneck. In-RAM compression will be a mayor contribution towards solving that issue.

What is memcp?

memcp is an open-source, high-performance, columnar in-memory database that can handle both OLAP and OLTP workloads. It provides an alternative to proprietary analytical databases and aims to bring the benefits of columnar storage to the open-source world.

memcp is written in Golang and is designed to be portable and extensible, allowing developers to embed the database into their applications with ease. It is also designed with a focus on scalability and performance, making it a suitable choice for distributed applications.


Features

  • fast: MemCP is built with parallelization in mind. The parallelization pattern is made for minimal overhead.
  • efficient: The average compression ratio is 1:5 (80% memory saving) compared to MySQL/MariaDB
  • modern: MemCP is built for modern hardware with caches, NUMA memory, multicore CPUs, NVMe SSDs
  • versatile: Use it in big mainframes to gain analytical performance, use it in embedded systems to conserve flash lifetime
  • Columnar storage: Stores data column-wise instead of row-wise, which allows for better compression, faster query execution, and more efficient use of memory.
  • In-memory database: Stores all data in memory, which allows for extremely fast query execution.
  • Build fast REST APIs directly in the database (they are faster because there is no network connection / SQL layer in between)
  • OLAP and OLTP support: Can handle both online analytical processing (OLAP) and online transaction processing (OLTP) workloads.
  • Compression: Lots of compression formats are supported like bit-packing and dictionary encoding
  • Scalability: Designed to scale on a single node with huge NUMA memory
  • Adjustable persistency: Decide whether you want to persist a table or not or to just keep snapshots of a period of time

Project Website: memcp.org

Screenshots

Benchmark
mysql client connecting to memcp

Getting Started ๐Ÿšถ

Using Docker

# first time: build the image
docker build . -t memcp

# run with the interactive scheme shell for debugging and development
mkdir data
docker run -v data:/data -it -p 4321:4321 -p 3307:3307 memcp

# run a specific application
docker run -e PARAMS="lib/main.scm apps/minigame.scm" -v data:/data -it -p 4321:4321 -p 3307:3307 memcp

# run for productive use
mkdir /var/memcp
docker run -v /var/memcp:/data -di -p 4321:4321 -p 3307:3307 --restart unless-stopped memcp


Compile From Source

Make sure, go is installed on your computer.

Compile the project with

make # executes go build

Run the engine with

./memcp

MemCP Scheme Shell

It will drop you at the scheme shell:

~/memcp$ ./memcp
memcp Copyright (C) 2023   Carl-Philip Hรคnsch
    This program comes with ABSOLUTELY NO WARRANTY;
    This is free software, and you are welcome to redistribute it
    under certain conditions;
Welcome to memcp
Hello World
MySQL server listening on port 3307 (connect with mysql -P 3307 -u user -p)
listening on http://localhost:4321
>  

now you can type any scheme expression like:

> (+ 1 2)
= 3
> (* 4 5)
= 20
> (show) /* shows all databases */
= ()
> (createdatabase "yo")
= "ok"
> (show) /* shows all databases */
= ("yo")
> (show "yo") /* shows all tables */
= ()
> (rebuild) /* optimizes memory layout */
= "124.503ยตs"
> (print (stat)) /* memory usage statistics */
= "Alloc = 0 MiB	TotalAlloc = 1 MiB	Sys = 16 MiB	NumGC = 1"
> (loadCSV "yo" "customers" "customers.csv" ";") /* loads CSV */

MySQL Connection

connect to it via

mysql -u root -p -P 3307 # password is 'admin'

You can try queries like:

SHOW DATABASES
SHOW TABLES
CREATE TABLE foo(bar string, amount int)
INSERT INTO foo(bar, amount) VALUES ('Man', 4), ('Horse', 6)
SELECT * FROM foo
SELECT SUM(amount) FROM foo

If you want to import whole databases from your old MySQL or MariaDB database, do the following:

$ ./tools/mysqldump-to-json.py -H localhost -u [user] -p [password] [database] > dump.jsonl
$ ./memcp
memcp Copyright (C) 2023   Carl-Philip Hรคnsch
    This program comes with ABSOLUTELY NO WARRANTY;
    This is free software, and you are welcome to redistribute it
    under certain conditions;
Welcome to memcp
Hello World
MySQL server listening on port 3307 (connect with mysql -P 3307 -u user -p)
listening on http://localhost:4321
> (createdatabase "my_database")
"ok"
> (loadJSON "my_database" "dump.jsonl")
"1.454ms"

REST API

curl --user root:admin 'http://localhost:4321/sql/system/SHOW%20DATABASES'
curl --user root:admin 'http://localhost:4321/sql/system/SHOW%20TABLES'
curl --user root:admin 'http://localhost:4321/sql/system/SELECT%20*%20FROM%20user'

Securing the database from external access ๐Ÿ”’

The standard username/password is root/admin. To change that, type the following into scheme console:
(eval (parse_sql "" "ALTER USER root IDENTIFIED BY 'new_password'"))
(eval (parse_sql "" "CREATE USER user2 IDENTIFIED BY 'new_password'"))

Example REST API App ๐Ÿ“•

You can take a look at https://github.com/launix-de/rdfop which is a RDF hypertext processor based on MemCP


Contributing ๐ŸŒฟ

We welcome contributions to memcp. If you would like to contribute, please follow these steps:,

  • Fork the repository and create a new branch for your changes.
  • Make your changes and commit them to your branch.
  • Push your branch to your fork and create a pull request.

Before submitting a pull request, please make sure that your changes pass the existing tests and add new tests if necessary.


How it Works? โ“

  • MemCP structures its data into databases and tables
  • Every table has multiple columns and multiple data shards
  • Every data shard stores ~64,000 items and is meant to be processed in ~100ms
  • Parallelization is done over shards
  • Every shard consists of two parts: main storage and delta storage
  • main storage is column-based, fixed-size and is compressed
  • Delta storage is a list of row-based insertions and deletions that is overlaid over a main storage
  • (rebuild) will merge all main+delta storages into new compressed main storages with empty delta storages
  • every dataset has a shard-local so-called recordId to re-identify a dataset

Available column compression formats ๐Ÿ“ƒ

  • uncompressed & untyped: versatile storage with JSON-esque freedom
  • bit-size reduced integer storage with offset: savings of 80% and more for small integers
  • integer sequences: >99% compression ratio possible with ascending IDs
  • string-storage: more compact than C strings, cache-friendly
  • string-dictionary: >90% memory savings for repeating strings like (male, female, male, male, male)
  • float storage
  • sparse storage: efficient with lots of NULL values

the best suitable compression technique for a column is detected for a column automatically


Frequently Asked Questions ๐Ÿค”

What is an in-memory database?

Unlike traditional databases, which store data on disks, in-memory databases (IMDBs) keep data in RAM. This results in much faster access times.

Why it is used?

An in-memory database (IMDB) stores and retrieves data primarily in a computer's RAM, enabling exceptionally fast data processing and retrieval, making it suitable for real-time applications requiring rapid access to data.

What are the benefits of columnar storage?

With columnar storage, data is much more homogeneous than in row-based storage. This enables a technique called "column compression" where compression ratios of around 1:5 (i.e. 80% savings) can be achieved just by a different data representation. This reduces the amount of cache lines that must be transferred from main memory to CPU and thus increases performance, reduces power consumption and decreases latency.

Also, columnar storages are a better fit for analytical queries where only a few out of possibly hundreds of columns are processed in the SQL query. An example of an analytical query is calculating the sum of revenue over a timespan from a lot of data points.

Can in-memory databases be used for my web project?

Yes. MemCP is meant as a drop-in replacement for MySQL and will make your application run faster.

Why does MemCP consume less RAM than MySQL even though MySQL is a hard disk-based database

In order to run fast, MySQL already has to cache all data in RAM. However, MySQL is not capable of compression, so it will consume about 5x the amount of RAM compared to MemCP for the same size of data cache.

Isn't it dangerous to keep all data in RAM? What happens during a crash?

MemCP of course supports some kind of hard disk persistency. The difference to a hard-disk-based database is that in MemCP you can choose how much IO bandwidth you want to sacrifice to achieve full crash safety. In other words: Your accounting data can still be secured with per-transaction write barriers while you can increase the write performance for sensor data by loosening persistency guarantees.

What happens if memory is full?

Usually, the net amount of data in databases is very low. You will be amazed, at how much data fits into your RAM when properly compressed. If that still exceeds the memory of your machine, just remember how slow it would be on the hard disk. Upgrade your RAM if you don't want to be on your swap partition. When you really go big data, a shared memory cluster is the way for you to go.

What's the current development status of MemCP?

We are still in the early alpha phase. MemCP already supports some basic SQL statements but it is not production-ready yet. The best way to use MemCP in a productive environment is over the internal scheme scripting language where you can hand-craft efficient query plans. Contribution to the SQL compiler is of course welcome.

What are MemCP REST services?

Normally, REST applications are implemented in any programming language, make a connection to an SQL server and do their queries. This induces IO overhead for that additional network layer between application and database and for the string-print-send-receive-parse pipeline. With MemCP, you can script MemCP to open a REST server and offer your REST routes directly in the process space of the database. You can prepare SQL statements which can be directly invoked inside the database. And don't be afraid of crashes: a crash in MemCPs scheme scripts will never bring down the whole database process.


Further Reading ๐Ÿ“š

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages