
lossless data compression #73

Closed
deathfromheaven opened this issue Mar 3, 2022 · 4 comments

Comments

@deathfromheaven

Hello, I want to know the name of the lossless data compression algorithm you use in openHistorian and where to find it.

@ritchiecarroll
Member

Technically the openHistorian, being based on SnapDB, can use any compression algorithm.

The default one is designed to do a good job with a variety of streaming time-series data sources, we casually refer to this as Time-Series Special Compression (or TSSC) - but there are variations of this, e.g., the one used in STTP. Most of the compression algorithms were developed by @StevenChisholm.

For the current implementation in openHistorian, you can find the code here; check the Encode and Decode methods:

@deathfromheaven
Author

Where can I find specific information about the TSSC algorithm, or how can I learn it? Is there any literature you would recommend on this algorithm?

@ritchiecarroll
Member

ritchiecarroll commented Mar 21, 2022

openHistorian can archive any type of time-series data in a streaming fashion, so the goal is to apply general-purpose streaming compression rules to specific data elements, longitudinally, and, based on the nature of the data, produce good compression ratios with minimal CPU impact.

The TSSC algorithm used by openHistorian is simpler than the one used by STTP in the number of measurement states it maintains. When the IEEE 2664 standard is released, it will contain a section and an appendix describing TSSC in greater detail.

In general, TSSC takes each element of a time-series measurement and handles each data type with a separate compression algorithm, creating parallel compression streams for each data element in the measurement. The nature of the data element being compressed then informs the compression algorithm tuning needed to produce the best results.
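To make the parallel-streams idea concrete, here is a minimal Python sketch (not the openHistorian code, which is C#; the encoder class and element names are hypothetical). Each element of a measurement feeds its own stateful encoder, so each stream can be tuned independently:

```python
# Minimal sketch of parallel per-element streams (hypothetical names, not
# actual openHistorian classes). Each encoder keeps its own prior state.

class XorDeltaEncoder:
    """Generic stateful encoder: emit only the bits that changed."""
    def __init__(self):
        self.prev = 0

    def encode(self, value: int) -> int:
        changed_bits = value ^ self.prev
        self.prev = value
        return changed_bits

# One independent stream per element of the measurement.
streams = {
    "timestamp": XorDeltaEncoder(),
    "id": XorDeltaEncoder(),
    "flags": XorDeltaEncoder(),
}

def encode_measurement(measurement: dict) -> dict:
    """Route each element to its own compression stream."""
    return {name: enc.encode(measurement[name]) for name, enc in streams.items()}

# Slowly varying elements produce mostly-zero outputs that pack very small.
print(encode_measurement({"timestamp": 1000, "id": 42, "flags": 0}))
print(encode_measurement({"timestamp": 1001, "id": 42, "flags": 0}))
# second call -> {'timestamp': 1, 'id': 0, 'flags': 0}
```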

For archived data, timestamps will be near each other, normally varying by no more than a few seconds. For 64-bit timestamps, this means the data variation may occur only in the bottom 16 of the 64 bits. With the bulk of the bits repeating invariably, the full bit set needs to be archived only once, or on substantial change; after that, only the changing bits need to be archived.

Additionally, if the timestamps vary less, the algorithm can automatically adjust and archive even fewer bits. This same type of pattern works well for identification numbers, which are finite in number, and state flags, which vary little. Data values, however, need special attention.
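A rough Python sketch of that timestamp idea (illustrative only, not TSSC's actual encoding; the function names, the little-endian byte order, and the one-byte length prefix are all assumptions for the example):

```python
# Sketch: XOR against the previous 64-bit timestamp so only the low,
# changing bits are nonzero, then emit just enough bytes to cover them.

def encode_timestamp(prev: int, curr: int) -> bytes:
    """Return the changed bits of a 64-bit timestamp as a short byte string."""
    delta = (prev ^ curr) & 0xFFFFFFFFFFFFFFFF
    nbytes = max(1, (delta.bit_length() + 7) // 8)  # adapt to how much changed
    return bytes([nbytes]) + delta.to_bytes(nbytes, "little")

def decode_timestamp(prev: int, encoded: bytes) -> int:
    """Reinflate the full timestamp from the prior value and the changed bits."""
    nbytes = encoded[0]
    delta = int.from_bytes(encoded[1:1 + nbytes], "little")
    return prev ^ delta

# Nearby timestamps differ only in the low bits, so the payload stays tiny:
t0 = 0x08D9FC0000000000   # hypothetical 64-bit tick value
t1 = t0 + 333_333         # ~33 ms later at 100 ns ticks
packet = encode_timestamp(t0, t1)
assert decode_timestamp(t0, packet) == t1
print(len(packet), "bytes instead of 8")   # -> 4 bytes instead of 8
```

When timestamps vary even less, `delta.bit_length()` shrinks, so the encoder automatically emits fewer bytes, matching the adaptive behavior described above.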

Data value elements for a given measurement can be rather random; however, many values change slowly over time, from measurement to measurement. For example, a measured frequency value tends to change only incrementally over several data measurements; in fact, other frequencies in the same subscription may differ by only a few bits. With this knowledge, a few extra opcodes can be maintained that represent the unvarying bits of many types of measurements. Then only the changed bit values need to be encoded into the archive, so that the stream can be reinflated without loss upon reception.
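Here is a hedged illustration of that value-compression idea in Python (the opcode table below is invented for the example; TSSC's actual opcodes differ): XOR the new value's bit pattern against the prior value, and emit a short opcode plus only the differing bits.

```python
# Sketch of opcode-based value compression for 32-bit floats
# (hypothetical opcodes, not the real TSSC table).

import struct

def float_bits(x: float) -> int:
    """Reinterpret a 32-bit float as its raw bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def encode_value(prev: float, curr: float) -> tuple[int, bytes]:
    """Return a (hypothetical) opcode and payload for one measurement value."""
    delta = float_bits(prev) ^ float_bits(curr)
    if delta == 0:
        return 0, b""                          # opcode 0: value repeated exactly
    if delta <= 0xFFFF:
        return 1, delta.to_bytes(2, "little")  # opcode 1: only low 16 bits changed
    return 2, delta.to_bytes(4, "little")      # opcode 2: full 32-bit delta

# A slowly drifting frequency mostly triggers the short opcodes:
samples = [59.999, 59.999, 59.998, 60.000]
prev = samples[0]
for curr in samples[1:]:
    op, payload = encode_value(prev, curr)
    print(f"opcode={op}, payload={len(payload)} bytes")
    prev = curr
```

Because consecutive frequency readings share nearly all of their bits, most samples fall into the zero-payload or two-byte cases rather than the full four-byte delta.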

@deathfromheaven
Author

Thank you very much! I got it.
