Storage Engine Redesign for improving scalability #31

ambud · 2017-06-25T03:54:28Z

Summary:

Redesign storage engine to support hundreds of thousands of independent time series.

Description:

The current Sidewinder Disk Storage engine stores one unique time series bucket per file. The bucket size is configurable; however, this still poses a restriction on how many unique time series there can be on a given server, because the number of open files is limited.

While commit dc4d448 does try to avoid the max-open-files issue by closing each file as soon as its MappedByteBuffer is created, this only pushes the envelope so far; the fundamental issue remains unresolved.
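The approach in dc4d448 relies on a documented property of Java NIO: once `FileChannel.map` returns, the mapping no longer depends on the channel, and closing the channel does not invalidate it. A minimal sketch of the idea; the class and method names (`MapThenClose`, `mapAndClose`) are hypothetical and this is not the actual Sidewinder code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating "map the file, then close it right away".
class MapThenClose {
    // Maps `size` bytes of `file` read-write and closes the channel immediately.
    static MappedByteBuffer mapAndClose(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Ensure the file is at least `size` bytes long by writing one zero byte at the end.
            ch.write(ByteBuffer.allocate(1), size - 1);
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } // the channel (and its file descriptor) is closed here; the mapping stays valid
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bucket", ".dat");
        MappedByteBuffer buf = mapAndClose(file, 1024);
        buf.putLong(0, 42L);                // write after the channel was closed
        System.out.println(buf.getLong(0)); // prints 42
    }
}
```

The descriptor is freed, but each bucket still needs its own file and mapping, which is why this only pushes the limit further out.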

The LRU-based design I proposed earlier can only mitigate the issue when there are not many concurrent writes across time series. When there are, it would cause a large number of cache evictions and frequent cache swapping, further degrading performance.
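The thrashing concern can be illustrated with a small simulation. It assumes a hypothetical LRU cache of open files built on `java.util.LinkedHashMap` in access order, and a workload that writes to every series in turn. With more actively written series than cache slots, LRU evicts each file shortly before it is needed again, so every write has to reopen its file. The names `LruThrashDemo` and `missRatio` are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of an LRU "open files" cache under round-robin writes.
class LruThrashDemo {
    // Returns the fraction of writes that find their series' file closed.
    static double missRatio(int capacity, int series, int rounds) {
        // accessOrder = true: iteration order is least recently accessed first
        Map<Integer, Boolean> cache = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity; // evict ("close") the least recently used file
            }
        };
        long misses = 0;
        long total = 0;
        for (int r = 0; r < rounds; r++) {
            for (int s = 0; s < series; s++) {
                total++;
                if (cache.get(s) == null) { // file not open: it must be (re)opened
                    misses++;
                    cache.put(s, Boolean.TRUE);
                }
            }
        }
        return (double) misses / total;
    }

    public static void main(String[] args) {
        System.out.println(missRatio(1000, 800, 10));  // prints 0.1 (only first-pass misses)
        System.out.println(missRatio(1000, 1200, 10)); // prints 1.0 (every write misses)
    }
}
```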

Proposal:

The new storage engine design proposes to decouple compression and persistence responsibilities and to combine multiple series into one file, while keeping the concept of time series buckets. The whole design is based on a memory allocator that grants buffers to series buckets on request; these buffers are slices of a memory-mapped file segment. Once a file reaches a certain size, a new file is created and the existing file is closed. This redesign refactors many components in the StorageEngine while preserving its interface as much as possible, so there is minimal impact on the writer and reader components of the database. Additional tests are added as well to improve the reliability of the system.
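A minimal sketch of the allocator idea described above, assuming fixed-size segments and a simple bump-pointer allocation; `SegmentAllocator`, `allocate`, `currentFileIndex` and the `segment-N.dat` naming are hypothetical and not Sidewinder's actual API. Each bucket receives a `ByteBuffer` slice that shares the mapped memory of the current segment; when a request does not fit, a new segment file is mapped.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical allocator: hands out slices of a memory-mapped segment file to
// series buckets and rolls over to a new file when the segment is full.
class SegmentAllocator {
    private final Path dir;
    private final int segmentSize;
    private MappedByteBuffer segment; // current segment, null before first use
    private int fileIndex = -1;       // index of the current segment file
    private int nextOffset;           // first unallocated byte in the segment

    SegmentAllocator(Path dir, int segmentSize) {
        this.dir = dir;
        this.segmentSize = segmentSize;
    }

    synchronized ByteBuffer allocate(int size) throws IOException {
        if (size <= 0 || size > segmentSize) throw new IllegalArgumentException("bad size: " + size);
        if (segment == null || nextOffset + size > segmentSize) {
            rollOver();
        }
        ByteBuffer view = segment.duplicate(); // independent position/limit, same memory
        view.position(nextOffset);
        view.limit(nextOffset + size);
        nextOffset += size;
        return view.slice(); // zero-based buffer backed by the mapped region
    }

    synchronized int currentFileIndex() { return fileIndex; }

    // Maps a fresh segment file; the channel is closed right after mapping.
    private void rollOver() throws IOException {
        fileIndex++;
        Path file = dir.resolve("segment-" + fileIndex + ".dat");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.allocate(1), segmentSize - 1); // size the file
            segment = ch.map(FileChannel.MapMode.READ_WRITE, 0, segmentSize);
        }
        nextOffset = 0;
    }

    public static void main(String[] args) throws IOException {
        SegmentAllocator alloc =
                new SegmentAllocator(Files.createTempDirectory("segments"), 4096);
        ByteBuffer a = alloc.allocate(1500); // segment-0.dat, bytes [0, 1500)
        ByteBuffer b = alloc.allocate(1500); // segment-0.dat, bytes [1500, 3000)
        ByteBuffer c = alloc.allocate(1500); // does not fit: rolls over to segment-1.dat
        a.putLong(0, 1L);
        b.putLong(0, 2L);
        System.out.println(alloc.currentFileIndex()); // prints 1
    }
}
```

Because many buckets share one mapped segment, the number of files grows with total data volume rather than with the number of series. A real engine would also need to persist each bucket's file and offset so buffers can be recovered after a restart.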

OLD:

Summary:

Limit maximum number of open files.

Description:

Create a Least Recently Used (LRU) based eviction system to automatically close data files that are not being written to or read from. Operating systems limit the maximum number of open files; if a user or system exceeds that limit, an exception is thrown that cannot be recovered from unless files are closed. The LRU-based module will prevent this exception by proactively keeping the number of open files below the limit. This feature is especially helpful for series storing historical data.
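A minimal sketch of such an LRU module, assuming data files are accessed through `FileChannel` and using `LinkedHashMap`'s access order plus `removeEldestEntry` to close the least recently used channel once a hypothetical `maxOpen` limit is exceeded. `OpenFileLru`, `get` and `openCount` are illustrative names, not part of Sidewinder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache of open data files: at most `maxOpen` channels stay open.
class OpenFileLru {
    private final Map<Path, FileChannel> open;

    OpenFileLru(int maxOpen) {
        if (maxOpen < 1) throw new IllegalArgumentException("maxOpen must be >= 1");
        this.open = new LinkedHashMap<Path, FileChannel>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, FileChannel> eldest) {
                if (size() <= maxOpen) return false;
                try {
                    eldest.getValue().close(); // release the file descriptor
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true; // drop the closed channel from the cache
            }
        };
    }

    // Returns an open channel for `file`, reopening it if it was evicted earlier.
    synchronized FileChannel get(Path file) throws IOException {
        FileChannel ch = open.get(file); // a hit also marks it most recently used
        if (ch == null || !ch.isOpen()) {
            ch = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
            open.put(file, ch);
        }
        return ch;
    }

    synchronized int openCount() { return open.size(); }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lru");
        OpenFileLru files = new OpenFileLru(2);
        FileChannel a = files.get(dir.resolve("a.dat"));
        FileChannel b = files.get(dir.resolve("b.dat"));
        files.get(dir.resolve("a.dat")); // touch a: b is now least recently used
        files.get(dir.resolve("c.dat")); // exceeds the limit: b is closed
        System.out.println(a.isOpen() + " " + b.isOpen()); // prints "true false"
    }
}
```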

ambud · 2017-08-15T05:15:23Z

Pending items:

Write more consistency tests for the system

ambud · 2017-08-16T03:34:13Z

Committed to development branch

* Clustering enhancements and Graphite Server (#75)
* Adding file channel close as a temporary solution to max open files limit '#31'
* New storage engine implementation #31 (#66) Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues fixing potential concurrency issue
* Adding basic GRPC code for writes with disruptor (#67) Fixing rpc build issue
* Fixing javadoc and javadoc builds
* #60 and #74 Adding increment size Refactoring code to make default methods in StorageEngine from the implementations, fixing scripts. Adding clustering implementations Changing back to disktagindex Adding Location Aware Routing Engine Fixing javadocs Adding graphite server for with support for plaintext protocol #73 Fixing bad javadoc
* [ci skip] Updating build-info file
* [ci skip]prepare release sidewinder-parent-0.0.27
* [ci skip]prepare for next development iteration

ambud added New Feature Enhancement Help Wanted and removed New Feature labels Jun 25, 2017

ambud added Bug and removed Enhancement labels Jul 15, 2017

ambud mentioned this issue Jul 15, 2017

Configurable time bucket constant & persistent metadata #43

Closed

ambud added a commit that referenced this issue Jul 31, 2017

Adding file channel close as a temporary solution to max open files limit '#31'

dc4d448

ambud added a commit that referenced this issue Aug 6, 2017

#31 Improving hash function for tags

3e8ea49

Cleaning up archival moving sql package

ambud added a commit that referenced this issue Aug 6, 2017

#31 Refactoring compression classes to externalize byte buffer

a544974

ambud added a commit that referenced this issue Aug 7, 2017

#31 refactoring persistent time series

888bd65

ambud added a commit that referenced this issue Aug 9, 2017

#31 Adding tests for persistent time series and refactoring code for time series buckets

6c67029

Fixing bugs with refactor

ambud added Enhancement and removed Bug Help Wanted labels Aug 9, 2017

ambud self-assigned this Aug 9, 2017

ambud changed the title ~~Limit max open files~~ Storage Engine Redesign for increasing series count Aug 9, 2017

ambud changed the title ~~Storage Engine Redesign for increasing series count~~ Storage Engine Redesign for improving scalability Aug 9, 2017

ambud added a commit that referenced this issue Aug 10, 2017

#31 Fixing compilation issues, refactoring code for memstorage engine and diskstorage engine

45b92fa

ambud added a commit that referenced this issue Aug 10, 2017

#31 Fixing metadat path issues

25f10a0

ambud added a commit that referenced this issue Aug 10, 2017

#31 fixing compression tests

f78a245

ambud added a commit that referenced this issue Aug 11, 2017

#31 Adding RAT plugin to scan for missing licenses

f71bedd

ambud added a commit that referenced this issue Aug 12, 2017

#31 adding validation check

8d03fb2

ambud added a commit that referenced this issue Aug 12, 2017

#31 Adding ptr file for buffer recovery

0c79eb1

ambud added a commit that referenced this issue Aug 12, 2017

Merge branch '#31' into development

f73cba4

ambud added a commit that referenced this issue Aug 13, 2017

#31 Adding series recovery, fixing unit tests

fad0b52

ambud added a commit that referenced this issue Aug 13, 2017

New storage engine implementation #31

00787bb

Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests

ambud added a commit that referenced this issue Aug 13, 2017

#31 Fixing unit tests for windows (NOTE: file deletes are not working on windows)

bde7534

ambud added a commit that referenced this issue Aug 13, 2017

New storage engine implementation #31

e03b947

Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows)

ambud added a commit that referenced this issue Aug 13, 2017

#31 Fixing lock bug for byzantine writer

f2aa682

adding unit tests for persistent measurement

ambud added a commit that referenced this issue Aug 14, 2017

#31 fixing test buffer size

d17cb3d

ambud added a commit that referenced this issue Aug 14, 2017

#31 Fixing comparison operator

c416d9e

ambud added the New Feature label Aug 15, 2017

ambud mentioned this issue Aug 16, 2017

New storage engine implementation #31 #66

Merged

ambud closed this as completed in #66 Aug 16, 2017

ambud mentioned this issue Sep 6, 2017

Master version update (#79) #80

Merged

ambud added a commit that referenced this issue Feb 10, 2018

Adding file channel close as a temporary solution to max open files limit '#31'

ba906d8
