TSFDB is a Time Series Database built on top of FoundationDB. It uses the Connexion library on top of Flask.
TSFDB employs several processes with different purposes:

- tsfdb: uWSGI-powered web app which reads and writes metrics from/to FoundationDB
- tsfdb-consumer: process which consumes queues written by tsfdb and transforms metrics from bulk data to a key-value format
- tsfdb-retentions: process which applies a retention policy in order to conserve storage (optional)
- tsfdb-scraper: process which scrapes FoundationDB status data and stores it in the Time Series layer (optional)
#TODO
```bash
# adding the mist helm repo
helm repo add mist https://mist-charts.storage.googleapis.com
# installing the helm chart
helm install mist/tsfdb
```
The FoundationDB operator should be present on the cluster before installing tsfdb.
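Before installing, it can help to verify that the operator is actually deployed. A minimal sketch, assuming the upstream fdb-kubernetes-operator (the CRD and deployment names below are assumptions; adjust them to your install):

```bash
# check that the FoundationDBCluster CRD is registered (name assumes the
# upstream fdb-kubernetes-operator)
kubectl get crd foundationdbclusters.apps.foundationdb.org

# check that the operator deployment is running (name is an assumption)
kubectl get deployments --all-namespaces | grep fdb-kubernetes-operator
```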
Redundancy levels

| Redundancy | Storage Processes |
| ---------- | ----------------- |
| single     | 1-2               |
| double     | 3-4               |
| triple     | 5+                |
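For reference, the redundancy mode is set through fdbcli. A minimal sketch (the ssd storage engine here is an assumption; keep whatever engine your cluster already uses):

```bash
# switch the cluster to triple redundancy with the ssd storage engine
fdbcli --exec "configure triple ssd"
```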
Number of Storage & Log processes that have been tested successfully

| Storage | Log | Metrics |
| ------- | --- | ------- |
| 4       | 3   | 2.3K    |
| 8       | 4   | 10.4K   |
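To see how many storage and log processes a running cluster currently has, fdbcli's status output can be used:

```bash
# prints cluster configuration plus a per-process breakdown of roles
fdbcli --exec "status details"
```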
These instructions are based on the internal tsfdb metrics exposed as a Prometheus endpoint.
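The raw metrics can be inspected directly; a sketch assuming the conventional /metrics path and a service reachable as tsfdb on port 8080 (host, port and path are all assumptions about your deployment):

```bash
# dump the Prometheus exposition output of the tsfdb web app
curl http://tsfdb:8080/metrics
```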
1. Degraded processes

#TODO

2. Least Operating Space Storage / Log
If the least operating space is decreasing dramatically, try to either:
- Scale up the cluster (Storage or Log)
- Apply more aggressive retentions
3. Moving Data in Flight / Queue

Flight -> actual data being transferred
Queue -> data that is planned to move

If Moving Data in Flight (MDF) or Moving Data in Queue (MDQ) exceeds 10% of the total key-value space, it means that there is a lot of data transfer going on.
Try to either:
- Alter the knobs for data relocation (see the sketch below):
  - knob_relocation_parallelism_per_source_server=6
  - knob_fetch_keys_parallelism_bytes=1e+07
  - knob_max_outstanding=256
- Scale up the cluster (Storage or Log)
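Knobs are settings of the fdbserver processes themselves, so they take effect at process start rather than through fdbcli. A minimal sketch of two common ways to apply them (the operator's customParameters field is an assumption based on the upstream fdb-kubernetes-operator; verify against the version you run):

```bash
# Option 1: append the knobs as flags to your existing fdbserver invocation
fdbserver --knob_relocation_parallelism_per_source_server=6 \
          --knob_fetch_keys_parallelism_bytes=1e+07 \
          --knob_max_outstanding=256

# Option 2: with the fdb-kubernetes-operator, list the knobs as
# customParameters in the FoundationDBCluster spec, e.g.
#   customParameters:
#     - "knob_relocation_parallelism_per_source_server=6"
```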
4. Total Queues, Resources, Queue Sizes

If Total Queues == Resources, or avg(Queue Size) > 10, or metric data is not arriving in real time, then:
- Scale up the tsfdb producers, usually 1 producer per 5 monitored resources (see the sketch below)
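If the producers run on Kubernetes, scaling them out is a deployment scale. A sketch assuming the producer deployment is named tsfdb (both the name and the target count are assumptions):

```bash
# e.g. for ~50 monitored resources, run ~10 producers (1 per 5 resources)
kubectl scale deployment tsfdb --replicas=10
```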
5. Worst Storage / Log Queue Size

If Storage Queue > 900MB or Log Queue > 1.6GB, throttling happens (the sketch after this item shows how to check the current queue sizes).

Try to:
- Scale up the cluster (Storage or Log)
- Alter the knobs for data relocation, as in item 3:
  - knob_relocation_parallelism_per_source_server=6
  - knob_fetch_keys_parallelism_bytes=1e+07
  - knob_max_outstanding=256
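The current queue sizes are also visible from the cluster itself via `status json`; the qos field names below are from FDB 6.2, so verify them against your version:

```bash
# worst storage / log queue sizes in bytes
fdbcli --exec "status json" | \
  jq '.cluster.qos | {worst_queue_bytes_storage_server, worst_queue_bytes_log_server}'
```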
6. Storage Data lag / Worst lag between log and storage

If the lag is > 1 min and increasing, then there is definitely a problem. Try the same remedies as item 5; usually 5 and 6 happen together.
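The lag can likewise be read from `status json`. A sketch assuming FDB 6.2+ field names (older versions expose a version lag field instead, so check the exact name on your cluster):

```bash
# worst storage server data lag (field name is an assumption for your version)
fdbcli --exec "status json" | jq '.cluster.qos.worst_data_lag_storage_server'
```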