s7v7nislands/datafuse

 
 

Datafuse

Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture

Datafuse is a real-time data processing and analytics DBMS with a cloud-native architecture, written in Rust. It is inspired by ClickHouse, powered by arrow-rs, and built to make it easy to power the Data Cloud.

Principles

  • Fearless

    • No data races, no unsafe code, minimal unhandled errors
  • High Performance

    • Everything is Parallel
  • High Scalability

    • Everything is Distributed
  • High Reliability

    • True Separation of Storage and Compute

Architecture

[Figure: Datafuse architecture diagram]

Performance

  • In-memory SIMD vector processing performance only
  • Dataset: 100,000,000,000 (100 billion) rows
  • Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
  • Rust: rustc 1.49.0 (e1884a8e3 2020-12-29)
  • Built with link-time optimization and CPU-specific instructions
  • ClickHouse server version 21.2.1 revision 54447

| Query | FuseQuery (v0.1) | ClickHouse (v21.2.1) |
|---|---|---|
| SELECT avg(number) FROM numbers_mt(100000000000) | (3.11 s.) | ×3.14 slow, (9.77 s.) 10.24 billion rows/s., 81.92 GB/s. |
| SELECT sum(number) FROM numbers_mt(100000000000) | (2.96 s.) | ×2.02 slow, (5.97 s.) 16.75 billion rows/s., 133.97 GB/s. |
| SELECT min(number) FROM numbers_mt(100000000000) | (3.57 s.) | ×3.90 slow, (13.93 s.) 7.18 billion rows/s., 57.44 GB/s. |
| SELECT max(number) FROM numbers_mt(100000000000) | (3.59 s.) | ×4.09 slow, (14.70 s.) 6.80 billion rows/s., 54.44 GB/s. |
| SELECT count(number) FROM numbers_mt(100000000000) | (1.76 s.) | ×2.22 slow, (3.91 s.) 25.58 billion rows/s., 204.65 GB/s. |
| SELECT sum(number+number+number) FROM numbers_mt(100000000000) | (23.14 s.) | ×5.47 slow, (126.67 s.) 789.47 million rows/s., 6.32 GB/s. |
| SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | (3.09 s.) | ×1.96 slow, (6.07 s.) 16.48 billion rows/s., 131.88 GB/s. |
| SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | (6.73 s.) | ×4.01 slow, (27.59 s.) 3.62 billion rows/s., 28.99 GB/s. |
| SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 1000 | (6.91 s.) | ×1.42 slow, (9.83 s.) 1.02 billion rows/s., 8.14 GB/s. |
| SELECT max(number),sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | (10.87 s.) | ×1.95 fast, (5.58 s.) 179.23 million rows/s., 1.43 GB/s. |
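
The ratio and throughput figures above can be reproduced from the raw timings. The following is a minimal sketch for checking them, not part of Datafuse: the function names are hypothetical, and it assumes each row is a single 8-byte `UInt64` value, which matches the ratio between the listed GB/s and rows/s figures. Dividing the row count by the ClickHouse time reproduces the listed rows/s values, which suggests the throughput lines are ClickHouse's own output.

```rust
/// Rows processed per second for a query that scanned `rows` rows in `seconds`.
fn rows_per_second(rows: u64, seconds: f64) -> f64 {
    rows as f64 / seconds
}

/// Bytes processed per second, assuming (hypothetically) 8 bytes per row,
/// i.e. one `UInt64` column.
fn bytes_per_second(rows: u64, seconds: f64) -> f64 {
    rows_per_second(rows, seconds) * 8.0
}

/// How many times longer ClickHouse took than FuseQuery.
/// A result below 1.0 would mean ClickHouse was the faster engine.
fn slowdown(clickhouse_seconds: f64, fusequery_seconds: f64) -> f64 {
    clickhouse_seconds / fusequery_seconds
}
```

For the `avg(number)` query, `slowdown(9.77, 3.11)` is about 3.14 and `rows_per_second(100_000_000_000, 9.77)` is about 10.24 billion rows/s, in line with the listed values. Small differences in the GB/s figures are expected because the published timings are rounded.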

Note:

  • ClickHouse system.numbers_mt uses 16-way parallel processing
  • FuseQuery system.numbers_mt uses 16-way parallel processing
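
As a rough illustration of what N-way parallel processing over a number range means, the sketch below splits `0..n` into contiguous chunks and sums each chunk on its own thread. It uses only the standard library (`std::thread::scope`, available since Rust 1.63) and is not FuseQuery's actual `numbers_mt` implementation; `parallel_sum` and its parameters are hypothetical names chosen for this example.

```rust
use std::thread;

/// Sums the integers in `0..n` by splitting the range into `ways` contiguous
/// chunks and summing each chunk on its own scoped thread.
fn parallel_sum(n: u64, ways: u64) -> u128 {
    assert!(ways > 0, "need at least one worker");
    // Ceiling division so that `ways` chunks always cover the whole range.
    let chunk = (n + ways - 1) / ways;
    thread::scope(|s| {
        // Spawn every worker first, then join them, so the chunks run concurrently.
        let handles: Vec<_> = (0..ways)
            .map(|i| {
                let start = (i * chunk).min(n);
                let end = ((i + 1) * chunk).min(n);
                s.spawn(move || (start..end).map(u128::from).sum::<u128>())
            })
            .collect();
        // Combine the partial sums produced by the workers.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u128>()
    })
}
```

Calling `parallel_sum(n, 16)` mirrors the 16-way setting in the note above. The partial sums are accumulated as `u128` so that large ranges do not overflow.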

Status

General

  • SQL Parser
  • Query Planner
  • Query Optimizer
  • Predicate Push Down
  • Limit Push Down
  • Projection Push Down
  • Type coercion
  • Parallel Query Execution
  • Distributed Query Execution
  • Hash GroupBy
  • Merge-Sort OrderBy
  • Joins (WIP)
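
To make the "Hash GroupBy" item concrete, here is a minimal single-threaded sketch of hash-based grouping, modeled on the `GROUP BY number % 3, number % 4, number % 5` benchmark query. It is an illustration of the general technique under simplifying assumptions, not Datafuse's implementation: `MaxSum` and `group_max_sum` are hypothetical names, and the input is just the range `0..n`.

```rust
use std::collections::HashMap;

/// Running aggregate state for one group: the max and sum seen so far.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MaxSum {
    max: u64,
    sum: u64,
}

/// Computes max(number) and sum(number) over `0..n`, grouped by
/// (number % 3, number % 4, number % 5), using one hash table keyed
/// by the three remainders.
fn group_max_sum(n: u64) -> HashMap<(u64, u64, u64), MaxSum> {
    let mut groups: HashMap<(u64, u64, u64), MaxSum> = HashMap::new();
    for number in 0..n {
        let key = (number % 3, number % 4, number % 5);
        groups
            .entry(key)
            // Existing group: fold the new value into its state.
            .and_modify(|state| {
                state.max = state.max.max(number);
                state.sum += number;
            })
            // First row for this key: start a new state.
            .or_insert(MaxSum { max: number, sum: number });
    }
    groups
}
```

Because 3, 4, and 5 are pairwise coprime, this key yields at most 60 groups regardless of `n`, so the hash table stays small while the input grows.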

SQL Support

  • Projection
  • Filter (WHERE)
  • Limit
  • Aggregate Functions
  • Scalar Functions
  • User-Defined Functions (UDFs)
  • SubQueries
  • Sorting
  • Joins (WIP)
  • Window (TODO)

Getting Started

Learn Datafuse

Try Datafuse

Contributing

Roadmap

  • 0.1 Support aggregation select (2021.02)
  • 0.2 Support distributed query (2021.03)
  • 0.3 Support group by (2021.04)
  • 0.4 Support order by (2021.04)
  • 0.5 Support join
  • 1.0 Support TPC-H benchmark

License

Datafuse is licensed under Apache 2.0.

