Merge pull request #1719 from iain-buclaw-sociomantic/ceresdocs
Ceres: Add detailed documentation
obfuscurity committed Nov 29, 2016
2 parents 5f82fd6 + c0eb83f commit 6696a95
Showing 6 changed files with 125 additions and 3 deletions.
102 changes: 102 additions & 0 deletions docs/ceres.rst
@@ -0,0 +1,102 @@
The Ceres Database
====================

Ceres is a time-series database format intended to replace Whisper as the default storage format
for Graphite. In contrast with Whisper, Ceres is not a fixed-size database and is designed to
better support sparse data of arbitrary fixed-size resolutions. This allows Graphite to distribute
individual time-series across multiple servers or mounts.


Storage Overview
----------------
A Ceres database consists of a single tree, rooted at a single path on disk, that stores all
metrics as nodes in nested directories.

A Ceres node represents a single time-series metric and is composed of at least two data files: a slice
that stores all data points, and a metadata file of arbitrary key-value pairs. The only metadata a node
requires is ``'timeStep'``, which sets the finest resolution that can be used for writing. A Ceres
node can, however, contain and read data with other, less-precise resolutions in its underlying slice data.

Other metadata keys that may be set for compatibility with Graphite are ``'retentions'``, ``'xFilesFactor'``,
and ``'aggregationMethod'``.

A Ceres slice contains the actual data points in a file. The only other information a slice holds is the
timestamp of the oldest data point and the resolution, both of which are encoded in its filename
in the format ``timestamp@resolution``.

Data points in Ceres are stored on disk as a contiguous list of big-endian double-precision floats. The
timestamp of a data point is not stored with the value; rather, it is calculated as the timestamp
of the slice plus the index offset of the value multiplied by the resolution.
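
The timestamp arithmetic described above can be sketched in Python. This is a minimal illustration and not the Ceres library's own reader: the helper names ``parse_slice_name`` and ``read_points`` are hypothetical, and the sketch assumes the file name carries the ``timestamp@resolution`` pair as integers (any file extension is split off first).

```python
import os
import struct

POINT_FORMAT = "!d"                         # big-endian double-precision float
POINT_SIZE = struct.calcsize(POINT_FORMAT)  # 8 bytes per data point


def parse_slice_name(filename):
    """Hypothetical helper: split 'timestamp@resolution' into two integers."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    start, resolution = stem.split("@")
    return int(start), int(resolution)


def read_points(raw, start, resolution):
    """Hypothetical helper: pair each stored value with its computed timestamp."""
    count = len(raw) // POINT_SIZE
    values = struct.unpack("!%dd" % count, raw[:count * POINT_SIZE])
    # timestamp = slice timestamp + index offset * resolution
    return [(start + index * resolution, value)
            for index, value in enumerate(values)]
```

Because only the values are stored, a point's position in the file is the sole record of when it was measured.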

The timestamp is the number of seconds since the UNIX Epoch (01-01-1970). The data value is parsed by the
Python `float() <http://docs.python.org/library/functions.html#float>`_ function and as such behaves in
the same way for special strings such as ``'inf'``. Maximum and minimum values are determined by the
Python interpreter's allowable range for float values which can be found by executing::

  python -c 'import sys; print(sys.float_info)'
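
As a short illustration of the parsing behavior described above, the following uses only the built-in ``float()`` function and ``sys.float_info``. Nothing in it is specific to Ceres, and the variable names are chosen for this example only.

```python
import math
import sys

# Special strings are accepted by float() and yield IEEE 754 special values.
positive_infinity = float('inf')
not_a_number = float('nan')

# The largest finite value the interpreter can represent as a float.
largest_finite = sys.float_info.max

is_infinite = math.isinf(positive_infinity)
exceeds_finite_range = positive_infinity > largest_finite
is_nan = math.isnan(not_a_number)
```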


Slices: Precision and Fragmentation
-----------------------------------
Ceres databases contain one or more slices, each with a specific data resolution and a timestamp to mark
the beginning of the slice. Slices are ordered from the most recent timestamp to the oldest timestamp.
The resolution of the data is not considered when reading from a slice. It matters only when writing, which
requires that a slice with the finest precision configured for the node exists.

Gaps in data are handled in Ceres by padding slices with null datapoints. If the gap is too big, however,
a new slice is created instead. If a Ceres node accumulates too many slices, read performance
can suffer. This can be caused by intermittently reported data. To mitigate slice fragmentation, there is
a tolerance for how much space can be wasted within a slice file to avoid creating a new one. That
tolerance level is determined by ``'MAX_SLICE_GAP'``, which is the number of consecutive null datapoints
allowed in a slice file.

If set very low, Ceres will waste less of the small amount of disk space that this feature consumes, but
will be prone to performance problems caused by slice fragmentation, which can be severe.

If set very high, Ceres will waste a bit more disk space. Although each null datapoint wastes only 8 bytes,
you must keep in mind your filesystem's block size. If you suffer slice fragmentation issues, you should
increase this value or defragment the data more often. However, you should not set it to a huge value, because
a large but allowed gap has to be filled in when it occurs: instead of a simple 8-byte write to a
new file, Ceres could end up doing an ``(8 * MAX_SLICE_GAP)``-byte write to the latest slice.
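
The trade-off above can be sketched as a small decision function. This is an illustration only and not how Ceres itself is implemented: ``plan_write`` is a hypothetical name, the value given to ``MAX_SLICE_GAP`` here is just an example, and ``slice_end`` is assumed to be the timestamp one step past the last point stored in the latest slice.

```python
DATAPOINT_SIZE = 8   # bytes per double-precision value
MAX_SLICE_GAP = 80   # illustrative value for this sketch only


def plan_write(slice_end, timestamp, time_step, max_slice_gap=MAX_SLICE_GAP):
    """Hypothetical helper: decide whether to pad the latest slice or start a new one.

    Returns a tuple of (action, bytes_written).
    """
    # Number of null datapoints needed to bridge the gap.
    gap_points = (timestamp - slice_end) // time_step
    if gap_points > max_slice_gap:
        # Gap too large: a single 8-byte write to a new slice file.
        return ("new_slice", DATAPOINT_SIZE)
    # Gap tolerated: write the padding nulls plus the new value.
    return ("pad", DATAPOINT_SIZE * (gap_points + 1))
```

With a larger tolerance, fewer slice files are created, but the worst-case padded write grows in proportion to the tolerance.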


Rollup Aggregation
------------------
Expected features such as roll-up aggregation and data expiration are not provided by Ceres itself, but
instead are implemented as maintenance plugins.

Ceres includes such a rollup plugin, which aggregates data points in a way that is similar to the behavior of
Whisper archives: multiple data points are collapsed and written to a lower-precision slice, and
data points outside of the set slice retentions are trimmed. By default, an average function is used;
alternative methods can be chosen by changing the metadata.
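
The collapsing step can be sketched as follows. This is not the rollup plugin's code: ``rollup`` is a hypothetical function, null points are represented as ``None``, and the use of ``x_files_factor`` as the minimum fraction of known points per window is an assumption borrowed from how Whisper interprets ``xFilesFactor``.

```python
def rollup(points, factor, x_files_factor=0.5):
    """Hypothetical helper: average each run of `factor` fine-grained values.

    A window yields None (null) when too few of its points are known.
    """
    coarse = []
    for offset in range(0, len(points), factor):
        window = points[offset:offset + factor]
        known = [value for value in window if value is not None]
        if known and len(known) / float(factor) >= x_files_factor:
            coarse.append(sum(known) / float(len(known)))
        else:
            coarse.append(None)
    return coarse
```

For example, collapsing 30-second data into 60-second data corresponds to ``factor=2``.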


Retrieval Behavior
------------------
When data is retrieved (scoped by a time range), the first slice which has data within the requested
interval is used. If the time period overlaps a slice boundary, then both slices are read, with their
values joined together. Any missing data between them are filled with null data points.
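
Joining two slices across a boundary can be sketched like this. It is a simplified illustration and not the Ceres read path: ``join_slices`` is a hypothetical name, both slices are assumed to share one resolution and not to overlap, and null data points are represented as ``None``.

```python
def join_slices(older, newer, resolution):
    """Hypothetical helper: concatenate two slices, filling the gap with nulls.

    Each slice is a (start_timestamp, values) pair; `older` precedes `newer`.
    """
    old_start, old_values = older
    new_start, new_values = newer
    # Timestamp one step past the last point of the older slice.
    old_end = old_start + len(old_values) * resolution
    # Number of missing points between the two slices.
    gap = max(0, (new_start - old_end) // resolution)
    return old_start, list(old_values) + [None] * gap + list(new_values)
```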

There is currently no support in Ceres for handling slices with mixed resolutions in the same way that
is done with Whisper archives.


Database Format
---------------
.. csv-table::
  :delim: |
  :widths: 10, 10, 10

  CeresSlice|*Data*
  |Data|*Point+*

Data types in Python's `struct format <http://docs.python.org/library/struct.html#format-strings>`_:

.. csv-table::
  :delim: |

  Point|``!d``

Metadata for Ceres is stored in `JSON format <https://docs.python.org/3/library/json.html>`_::

  {"retentions": [[30, 1440]], "timeStep": 30, "xFilesFactor": 0.5, "aggregationMethod": "average"}
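
Reading such a metadata document takes only the standard ``json`` module. The sketch below is illustrative: it treats each ``retentions`` entry as a pair of (seconds per point, number of points), which is an assumption based on how Whisper archives are described and is not stated by this document.

```python
import json

raw_metadata = (
    '{"retentions": [[30, 1440]], "timeStep": 30, '
    '"xFilesFactor": 0.5, "aggregationMethod": "average"}'
)

metadata = json.loads(raw_metadata)
time_step = metadata["timeStep"]

# Assumption: each retention is (seconds per point, number of points).
precision, points = metadata["retentions"][0]
covered_seconds = precision * points
```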
8 changes: 6 additions & 2 deletions docs/config-local-settings.rst
@@ -78,7 +78,7 @@ CONF_DIR

STORAGE_DIR
`Default: GRAPHITE_ROOT/storage`
The base directory from which WHISPER_DIR, RRD_DIR, LOG_DIR, and INDEX_FILE default paths are referenced.
The base directory from which WHISPER_DIR, RRD_DIR, CERES_DIR, LOG_DIR, and INDEX_FILE default paths are referenced.

STATIC_ROOT
`Default: See below`
@@ -112,13 +112,17 @@ WHISPER_DIR
`Default: /opt/graphite/storage/whisper`
The location of Whisper data files.

CERES_DIR
`Default: /opt/graphite/storage/ceres`
The location of Ceres data files.

RRD_DIR
`Default: /opt/graphite/storage/rrd`
The location of RRD data files.

STANDARD_DIRS
`Default: [WHISPER_DIR, RRD_DIR]`
The list of directories searched for data files. By default, this is the value of WHISPER_DIR and RRD_DIR (if rrd support is detected). If this setting is defined, the WHISPER_DIR and RRD_DIR settings have no effect.
The list of directories searched for data files. By default, this is the value of WHISPER_DIR and RRD_DIR (if rrd support is detected). If this setting is defined, the WHISPER_DIR, CERES_DIR, and RRD_DIR settings have no effect.

LOG_DIR
`Default: STORAGE_DIR/log/webapp`
1 change: 1 addition & 0 deletions docs/index.rst
@@ -20,6 +20,7 @@ Graphite Documentation
functions
dashboard
whisper
ceres
storage-backends
events
terminology
4 changes: 4 additions & 0 deletions docs/install.rst
@@ -87,6 +87,10 @@ Carbon and Graphite-web are installed in ``/opt/graphite/`` with the following l

Location for Whisper data files to be stored and read

- ``ceres``

Location for Ceres data files to be stored and read

- ``webapp/``

Graphite-web ``PYTHONPATH``
7 changes: 6 additions & 1 deletion webapp/graphite/readers.py
@@ -16,6 +16,11 @@
if bool(whisper):
    whisper__readHeader = whisper.__readHeader

try:
    import ceres
except ImportError:
    ceres = False

try:
    import rrdtool
except ImportError:
@@ -116,7 +121,7 @@ def merge(self, results1, results2):

class CeresReader(object):
    __slots__ = ('ceres_node', 'real_metric_path')
    supported = True
    supported = bool(ceres)

    def __init__(self, ceres_node, real_metric_path):
        self.ceres_node = ceres_node
6 changes: 6 additions & 0 deletions webapp/graphite/settings.py
@@ -187,6 +187,12 @@
        STANDARD_DIRS.append(WHISPER_DIR)
except ImportError:
    print >> sys.stderr, "WARNING: whisper module could not be loaded, whisper support disabled"
try:
    import ceres  # noqa
    if os.path.exists(CERES_DIR):
        STANDARD_DIRS.append(CERES_DIR)
except ImportError:
    pass
try:
    import rrdtool  # noqa
    if os.path.exists(RRD_DIR):