
lossless data compression #73

Closed
deathfromheaven opened this issue Mar 3, 2022 · 4 comments

Comments

@deathfromheaven

Hello, I want to know the name of the lossless data compression algorithm you use in openHistorian and where to find it.

@ritchiecarroll
Member

Technically the openHistorian, being based on SnapDB, can use any compression algorithm.

The default one is designed to do a good job with a variety of streaming time-series data sources, we casually refer to this as Time-Series Special Compression (or TSSC) - but there are variations of this, e.g., the one used in STTP. Most of the compression algorithms were developed by @StevenChisholm.

For the current implementation in openHistorian, you can find the code here; check the Encode and Decode methods:

@deathfromheaven
Author

Where can I find specific information about the TSSC algorithm, or how can I learn it? Is there any literature you would recommend on this algorithm?

@ritchiecarroll
Member

ritchiecarroll commented Mar 21, 2022

openHistorian can archive any type of time-series data in a streaming fashion, so the goal is to apply general-purpose streaming compression rules to specific data elements, longitudinally, and, based on the nature of the data, produce good compression ratios with minimal CPU impact.

The TSSC algorithm used by openHistorian is simpler than the one used by STTP in the number of measurement states it maintains. When the IEEE 2664 standard is released, it will contain a section and an appendix describing TSSC in greater detail.

In general, TSSC takes each element of a time-series measurement and handles each data type with a separate compression algorithm, creating parallel compression streams for each data element in the measurement. The nature of the data element being compressed then informs the compression algorithm tuning needed to produce the best results.
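To make the parallel-streams idea concrete, here is a minimal Python sketch (not the openHistorian code, which is C#; the encoder class and element names are hypothetical). Each element of a measurement feeds its own stateful encoder, so each stream can be tuned independently:

```python
# Minimal sketch of parallel per-element streams (hypothetical names, not
# actual openHistorian classes). Each encoder keeps its own prior state.

class XorDeltaEncoder:
    """Generic stateful encoder: emit only the bits that changed."""
    def __init__(self):
        self.prev = 0

    def encode(self, value: int) -> int:
        changed_bits = value ^ self.prev
        self.prev = value
        return changed_bits

# One independent stream per element of the measurement.
streams = {
    "timestamp": XorDeltaEncoder(),
    "id": XorDeltaEncoder(),
    "flags": XorDeltaEncoder(),
}

def encode_measurement(measurement: dict) -> dict:
    """Route each element to its own compression stream."""
    return {name: enc.encode(measurement[name]) for name, enc in streams.items()}

# Slowly varying elements produce mostly-zero outputs that pack very small.
print(encode_measurement({"timestamp": 1000, "id": 42, "flags": 0}))
print(encode_measurement({"timestamp": 1001, "id": 42, "flags": 0}))
# second call -> {'timestamp': 1, 'id': 0, 'flags': 0}
```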

For archived data, timestamps will be near each other, normally varying by no more than a few seconds. For 64-bit timestamps, this means the data variation may occur only in the bottom 16 of the 64 bits. With the bulk of the bits repeating invariably, the full bit set needs to be archived only once, or on substantial change; after that, only the changing bits need to be archived.

Additionally, if the timestamps vary less, the algorithm can automatically adjust and archive even fewer bits. This same type of pattern works well for identification numbers, which are finite in number, and state flags, which vary little. Data values, however, need special attention.
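A rough Python sketch of that timestamp idea (illustrative only, not TSSC's actual encoding; the function names, the little-endian byte order, and the one-byte length prefix are all assumptions for the example):

```python
# Sketch: XOR against the previous 64-bit timestamp so only the low,
# changing bits are nonzero, then emit just enough bytes to cover them.

def encode_timestamp(prev: int, curr: int) -> bytes:
    """Return the changed bits of a 64-bit timestamp as a short byte string."""
    delta = (prev ^ curr) & 0xFFFFFFFFFFFFFFFF
    nbytes = max(1, (delta.bit_length() + 7) // 8)  # adapt to how much changed
    return bytes([nbytes]) + delta.to_bytes(nbytes, "little")

def decode_timestamp(prev: int, encoded: bytes) -> int:
    """Reinflate the full timestamp from the prior value and the changed bits."""
    nbytes = encoded[0]
    delta = int.from_bytes(encoded[1:1 + nbytes], "little")
    return prev ^ delta

# Nearby timestamps differ only in the low bits, so the payload stays tiny:
t0 = 0x08D9FC0000000000   # hypothetical 64-bit tick value
t1 = t0 + 333_333         # ~33 ms later at 100 ns ticks
packet = encode_timestamp(t0, t1)
assert decode_timestamp(t0, packet) == t1
print(len(packet), "bytes instead of 8")   # -> 4 bytes instead of 8
```

When timestamps vary even less, `delta.bit_length()` shrinks, so the encoder automatically emits fewer bytes, matching the adaptive behavior described above.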

Data value elements for a given measurement can be rather random; however, many values change slowly over time, from measurement to measurement. For example, a measured frequency value tends to change only incrementally over several data measurements; in fact, other frequencies in the same subscription may differ by only a few bits. With this knowledge, a few extra opcodes can be maintained that represent the unvarying bits of many types of measurements. Then only the changed bit values need to be encoded into the archive, so that the stream can be reinflated without loss upon reception.
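Here is a hedged illustration of that value-compression idea in Python (the opcode table below is invented for the example; TSSC's actual opcodes differ): XOR the new value's bit pattern against the prior value, and emit a short opcode plus only the differing bits.

```python
# Sketch of opcode-based value compression for 32-bit floats
# (hypothetical opcodes, not the real TSSC table).

import struct

def float_bits(x: float) -> int:
    """Reinterpret a 32-bit float as its raw bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def encode_value(prev: float, curr: float) -> tuple[int, bytes]:
    """Return a (hypothetical) opcode and payload for one measurement value."""
    delta = float_bits(prev) ^ float_bits(curr)
    if delta == 0:
        return 0, b""                          # opcode 0: value repeated exactly
    if delta <= 0xFFFF:
        return 1, delta.to_bytes(2, "little")  # opcode 1: only low 16 bits changed
    return 2, delta.to_bytes(4, "little")      # opcode 2: full 32-bit delta

# A slowly drifting frequency mostly triggers the short opcodes:
samples = [59.999, 59.999, 59.998, 60.000]
prev = samples[0]
for curr in samples[1:]:
    op, payload = encode_value(prev, curr)
    print(f"opcode={op}, payload={len(payload)} bytes")
    prev = curr
```

Because consecutive frequency readings share nearly all of their bits, most samples fall into the zero-payload or two-byte cases rather than the full four-byte delta.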

@deathfromheaven
Author

Thank you very much! I got it.
