Storage Engine Redesign for improving scalability #31
Comments
ambud added a commit that referenced this issue on Jul 31, 2017
ambud added a commit that referenced this issue on Aug 6, 2017
ambud added a commit that referenced this issue on Aug 7, 2017
ambud added a commit that referenced this issue on Aug 9, 2017:
time series buckets Fixing bugs with refactor
ambud changed the title from "Limit max open files" to "Storage Engine Redesign for increasing series count" on Aug 9, 2017
ambud changed the title from "Storage Engine Redesign for increasing series count" to "Storage Engine Redesign for improving scalability" on Aug 9, 2017
ambud added a commit that referenced this issue on Aug 11, 2017
ambud added a commit that referenced this issue on Aug 12, 2017
ambud added a commit that referenced this issue on Aug 12, 2017
ambud added a commit that referenced this issue on Aug 13, 2017
ambud added a commit that referenced this issue on Aug 13, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests
ambud added a commit that referenced this issue on Aug 13, 2017
ambud added a commit that referenced this issue on Aug 13, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows)
ambud added a commit that referenced this issue on Aug 13, 2017:
adding unit tests for persistent measurement
ambud added a commit that referenced this issue on Aug 14, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement
ambud added a commit that referenced this issue on Aug 15, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31
Pending items:
ambud added a commit that referenced this issue on Aug 15, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug
ambud added a commit that referenced this issue on Aug 15, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements
ambud added a commit that referenced this issue on Aug 16, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage
ambud added a commit that referenced this issue on Aug 16, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues
Committed to development branch
ambud added a commit that referenced this issue on Aug 20, 2017:
Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues fixing potential concurrency issue
ambud added a commit that referenced this issue on Aug 23, 2017:
* Adding file channel close as a temporary solution to max open files limit '#31' * New storage engine implementation #31 (#66) Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues fixing potential concurrency issue * Adding basic GRPC code for writes with disruptor (#67) Fixing rpc build issue
ambud added a commit that referenced this issue on Aug 23, 2017:
* Adding file channel close as a temporary solution to max open files limit '#31' * New storage engine implementation #31 (#66) Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues fixing potential concurrency issue * Adding basic GRPC code for writes with disruptor (#67) Fixing rpc build issue Fixing pom issue
ambud added a commit that referenced this issue on Aug 23, 2017:
* Adding file channel close as a temporary solution to max open files limit '#31' * New storage engine implementation #31 (#66) Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues fixing potential concurrency issue * Adding basic GRPC code for writes with disruptor (#67) Fixing rpc build issue * Fixing javadoc and javadoc builds
ambud added a commit that referenced this issue on Sep 6, 2017:
* Adding file channel close as a temporary solution to max open files limit '#31' * New storage engine implementation #31 (#66) Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues fixing potential concurrency issue * Adding basic GRPC code for writes with disruptor (#67) Fixing rpc build issue * Fixing javadoc and javadoc builds * #60 and #74 Adding increment size Refactoring code to make default methods in StorageEngine from the implementations, fixing scripts. Adding clustering implementations Changing back to disktagindex Adding Location Aware Routing Engine Fixing javadocs Adding graphite server for with support for plaintext protocol #73 Fixing bad javadoc
ambud added a commit that referenced this issue on Sep 6, 2017:
* Clustering enhancements and Graphite Server (#75) * Adding file channel close as a temporary solution to max open files limit '#31' * New storage engine implementation #31 (#66) Cleaning up archival moving sql package time series buckets Fixing bugs with refactor and diskstorage engine Validating and fixing unit tests windows) adding unit tests for persistent measurement Adding concurrency fixes #31 Fixing recovery bug Fixing broken recovery system due to missing offsets, fixing unit tests for recovery, adding parallel recovery for measurements Improving test converage Fixing minor code issues fixing potential concurrency issue * Adding basic GRPC code for writes with disruptor (#67) Fixing rpc build issue * Fixing javadoc and javadoc builds * #60 and #74 Adding increment size Refactoring code to make default methods in StorageEngine from the implementations, fixing scripts. Adding clustering implementations Changing back to disktagindex Adding Location Aware Routing Engine Fixing javadocs Adding graphite server for with support for plaintext protocol #73 Fixing bad javadoc * [ci skip] Updating build-info file * [ci skip]prepare release sidewinder-parent-0.0.27 * [ci skip]prepare for next development iteration
ambud added seven commits that referenced this issue on Feb 10, 2018; the six that carry messages repeat the Aug 20, Aug 23, and Sep 6, 2017 commit messages above verbatim.
Summary:
Redesign storage engine to support hundreds of thousands of independent time series.
Description:
The current Sidewinder disk storage engine stores one unique time series bucket per file. The bucket size is configurable; however, this still poses a restriction on how many unique time series a given server can hold, because the number of open files is limited.
While commit dc4d448 tries to avoid the max-open-files issue by closing each file as soon as its MappedByteBuffer is created, this only pushes the envelope so far, and the fundamental issue remains unresolved.
The LRU-based design I proposed earlier only mitigates the issue when few time series are written concurrently. When many are, it would cause frequent cache evictions and swapping, which degrades performance.
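A minimal Java sketch of this kind of workaround, assuming it amounts to mapping the file and closing the channel immediately afterwards. The names `MapThenClose` and `mapAndClose` are hypothetical and are not taken from commit dc4d448; the sketch relies only on the documented `FileChannel.map` behavior that a mapping stays valid after the channel that created it is closed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not Sidewinder's actual code.
final class MapThenClose {
    private MapThenClose() {}

    /** Maps the first {@code size} bytes of {@code file}, then closes the channel. */
    static MappedByteBuffer mapAndClose(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping does not depend on the channel once it is established.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } // try-with-resources closes the channel here
    }
}
```

Every bucket still needs its own file and its own live mapping in this approach, which is consistent with the point that it only postpones the limit rather than removing it.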
Proposal
The new storage engine design decouples compression from persistence and combines multiple series into one file while keeping the concept of time series buckets. The whole design is based on a memory allocator that grants buffers to series buckets on request; these buffers are slices of a memory-mapped file segment. Once a file reaches a certain size, a new file is created and the existing file is closed. The redesign refactors many components in the StorageEngine while preserving the interface as much as possible, so there is minimal impact on the writer and reader components of the database. Additional tests are added as well to help improve the reliability of the system.
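The allocator idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's implementation: the class name `SliceAllocator`, the fixed slice size, and the `segment-N.dat` file naming are all hypothetical. It shows one mapped file being carved into fixed-size slices that are handed out on request, with a new file mapped once the current one is exhausted.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical allocator sketch; names and sizes are illustrative only.
final class SliceAllocator {
    private final Path dir;
    private final int fileSize;   // bytes mapped per file
    private final int sliceSize;  // bytes granted per request
    private MappedByteBuffer current;
    private int fileCount;

    SliceAllocator(Path dir, int fileSize, int sliceSize) {
        if (sliceSize <= 0 || sliceSize > fileSize) {
            throw new IllegalArgumentException("sliceSize must be in (0, fileSize]");
        }
        this.dir = dir;
        this.fileSize = fileSize;
        this.sliceSize = sliceSize;
    }

    /** Grants the next slice, rolling to a new file when the current one is full. */
    synchronized ByteBuffer allocate() throws IOException {
        if (current == null || current.remaining() < sliceSize) {
            rollFile();
        }
        int start = current.position();
        // Narrow the limit so the slice's capacity is exactly sliceSize.
        current.limit(start + sliceSize);
        ByteBuffer slice = current.slice();
        // Restore the limit and advance past the region just granted.
        current.limit(current.capacity());
        current.position(start + sliceSize);
        return slice;
    }

    synchronized int fileCount() {
        return fileCount;
    }

    private void rollFile() throws IOException {
        Path file = dir.resolve("segment-" + fileCount + ".dat");
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The previous mapping stays valid for slices already handed out.
            current = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
        }
        fileCount++;
    }
}
```

With a 64-byte file and 32-byte slices, the third call to `allocate()` maps a second file. In this sketch the number of files grows with the total bytes allocated rather than with the number of series, which is the property the proposal is after.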
OLD:
Summary:
Limit maximum number of open files.
Description:
Create a Least Recently Used (LRU) based eviction system to automatically close data files that are not being written to or read from. Operating systems limit the maximum number of open files; if a user or system exceeds that limit, an exception is thrown that can't be recovered from unless files are closed. The LRU-based module will prevent this exception by proactively keeping the number of open files under the limit. This feature is especially helpful for series storing historical data.
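A rough Java sketch of such an LRU module, assuming an access-ordered `LinkedHashMap` whose `removeEldestEntry` hook closes the least recently used channel. The class name `LruFileCache` and its methods are hypothetical and do not reflect Sidewinder's code; a real module would also have to handle callers that still hold a channel when it is evicted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache of open file channels; illustrative only.
final class LruFileCache {
    private final LinkedHashMap<Path, FileChannel> open;

    LruFileCache(final int maxOpen) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.open = new LinkedHashMap<Path, FileChannel>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, FileChannel> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // close the evicted file
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    /** Returns an open channel for {@code file}, opening it if it is not cached. */
    synchronized FileChannel get(Path file) throws IOException {
        FileChannel channel = open.get(file); // also marks the entry as recently used
        if (channel == null) {
            channel = FileChannel.open(file,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.READ,
                    StandardOpenOption.WRITE);
            open.put(file, channel); // may evict and close the eldest entry
        }
        return channel;
    }

    synchronized int openCount() {
        return open.size();
    }
}
```

When more series are written concurrently than the cache can hold, every `get` evicts a file that will be needed again shortly. That is the cache-swapping cost the newer description gives as the reason for moving away from this design.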