
Documentation

Updated Jul 11, 2018
  

Improve the open source documentation of RubiX. A backlog of appropriately tagged issues already exists. We should also capture new areas for improvement.

Right now only a small set of metrics is reported from the coordinator to monitoring solutions. Send more metrics to a monitoring solution using StatsD.
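As a rough illustration of what reporting a metric through StatsD involves, the sketch below formats a datagram in the plain StatsD line format (`<name>:<value>|<type>`) and sends it over UDP. The metric name `rubix.cache.hit`, the helper names, and the default host and port are assumptions made for the example; they are not taken from the RubiX code base.

```python
import socket


def format_metric(name, value, metric_type="c"):
    """Build one StatsD datagram: <name>:<value>|<type> (c=counter, g=gauge, ms=timer)."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_metric(name, value, metric_type="c", host="127.0.0.1", port=8125):
    """Send one metric over UDP. The send is fire-and-forget: nothing is acknowledged."""
    payload = format_metric(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_metric("rubix.cache.hit", 1)             # increment a counter
    send_metric("rubix.cache.size_mb", 512, "g")  # report a gauge
```

Because the transport is UDP, a missing or slow StatsD daemon does not block the caller, which is usually the reason this protocol is chosen for hot paths.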

Performance Investigation

Updated Jul 11, 2018

We have noticed that RubiX provides better performance on Spark than on Presto. We also need to understand how similar projects such as Raptor, the LLAP cache, and Spark RDDs perform. This will help us match and then surpass the performance of these projects.

Rubix Client

Updated Jul 11, 2018

Design a RubiX client that can interact with RubiX without an engine such as Spark or Presto. The client is also useful in product tests. It should be forward compatible so that it can work with more capable servers in the future.

Consistent Hashing

Updated Jul 29, 2018

The node a file is allocated to is decided by consistent hashing. The main purpose of consistent hashing is to minimize changes in the file-to-node assignment when a new node is added or an existing node is removed.
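The property described above can be sketched with a small hash ring. This is a generic illustration of consistent hashing with virtual nodes, not the RubiX implementation; the class name `HashRing`, the use of MD5, and the default of 100 virtual nodes per node are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring. MD5 is used only because it is stable across runs."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper, not RubiX code)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at several positions so load spreads more evenly.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, file_path):
        """Return the node owning the first ring position after the file's hash."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(file_path)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    files = [f"s3://bucket/table/part-{i}" for i in range(1000)]  # hypothetical paths
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {f: ring.node_for(f) for f in files}
    ring.add_node("node-d")
    moved = [f for f in files if ring.node_for(f) != before[f]]
    print(f"{len(moved)} of {len(files)} files changed node after adding node-d")
```

When a node is added, a file changes owner only if it now maps to the new node. When a node is removed, only the files that node owned are reassigned. Every other assignment stays put, which is what keeps cached data useful across membership changes.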

Monitoring and Alerting

Updated Aug 1, 2018

No description

File Metadata Caching

Updated Jun 8, 2018

Cache columnar file metadata on all nodes to reduce non-local reads of file footers.

Replicate local data on a configurable number of nodes to improve availability, so that users running ad-hoc queries get better performance from data locality.

Data Tiering

Updated Jun 8, 2018

Reserve memory for recently accessed data in RubiX so that users running ad-hoc queries get better performance.

Asynchronous Cache Warmup

Updated Jun 8, 2018

Reduce the cache warm-up time by serving the first reads directly from S3 while downloading the data into the cache in parallel, so that users running ad-hoc queries get better performance.
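One way to picture this idea is a reader that answers a cache miss straight from the remote store and, at the same time, schedules a background download so that later reads are served locally. The sketch below is a simplified in-memory model under that assumption. The class `WarmingReader`, the `fetch` callable, and the example path are hypothetical, and a real cache would store blocks on disk rather than whole files in a dictionary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class WarmingReader:
    """Serve misses from the remote store while warming the cache in the background."""

    def __init__(self, fetch, max_workers=4):
        # fetch(path, offset, length) -> bytes; length=None means "read to the end".
        self._fetch = fetch
        self._cache = {}    # path -> full file contents
        self._pending = {}  # path -> Future of an in-flight download
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _download(self, path):
        try:
            data = self._fetch(path, 0, None)
            with self._lock:
                self._cache[path] = data
        finally:
            with self._lock:
                self._pending.pop(path, None)

    def read(self, path, offset, length):
        """Return (bytes, source), where source is 'cache' or 'remote'."""
        with self._lock:
            cached = self._cache.get(path)
            if cached is None and path not in self._pending:
                # Start warming once per file; do not make the caller wait for it.
                self._pending[path] = self._pool.submit(self._download, path)
        if cached is not None:
            return cached[offset:offset + length], "cache"
        return self._fetch(path, offset, length), "remote"

    def wait_for_warmup(self):
        """Block until all downloads scheduled so far have finished."""
        with self._lock:
            futures = list(self._pending.values())
        wait(futures)


if __name__ == "__main__":
    store = {"s3://bucket/file": b"0123456789"}  # stand-in for the object store

    def fetch(path, offset, length):
        data = store[path]
        return data[offset:] if length is None else data[offset:offset + length]

    reader = WarmingReader(fetch)
    print(reader.read("s3://bucket/file", 2, 3))  # first read is served remotely
    reader.wait_for_warmup()
    print(reader.read("s3://bucket/file", 2, 3))  # later read is served from cache
```

The design choice illustrated here is that the first query pays only the cost of the ranged remote read it needs, while the cost of the full download is moved off the query path.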

Pluggable Cache Eviction Policy

Updated Jun 8, 2018

No description

Access Control to the Cached Data

Updated Jun 8, 2018

This includes encrypting the raw data and masking the local storage path which is mapped to the remote object store path.

Metrics for Cache Effectiveness

Updated Jun 8, 2018

No description

