

Merge d6f61e2 into d28a1e0
c0c0n3 committed Oct 16, 2019
2 parents d28a1e0 + d6f61e2 commit b544c1e
Showing 5 changed files with 247 additions and 13 deletions.
75 changes: 70 additions & 5 deletions docs/manuals/admin/dataMigration.md
@@ -1,9 +1,74 @@
# Data-Migration-Tool
# Data Migration

Data-Migration-Tool is designed to automatically migrate data stored in [STH-Comet](https://github.com/telefonicaid/fiware-sth-comet) to [QuantumLeap](https://github.com/smartsdk/ngsi-timeseries-api) database. After data migration, data can be accessed by using QuantumLeap's [API](https://app.swaggerhub.com/apis/smartsdk/ngsi-tsdb/0.2).
A few tools are available to assist with migrating data to QuantumLeap.

This tool is developed in [Java](https://en.wikipedia.org/wiki/Java_(software_platform)) using [Eclipse](https://www.eclipse.org/). A python script is used to convert data in [MongoDB](https://github.com/mongodb/mongo) to be compatible for [CrateDB](https://github.com/crate/crate).

Tool is available [here](https://github.com/Data-Migration-Tool/STH-to-QuantumLeap).
## Migrating STH Comet data

User guide for the tool is available [here](https://github.com/Data-Migration-Tool/STH-to-QuantumLeap/blob/master/docs/manuals/README.md).
[Data-Migration-Tool][dmt] is a program designed to automatically
migrate data stored in [STH-Comet][comet] to a QuantumLeap [CrateDB][crate]
database. After migration, the data can be accessed through QuantumLeap's
[REST API][ql-api].

[Data-Migration-Tool][dmt] is developed in [Java][java] using the
[Eclipse IDE][eclipse]. A Python script transforms data in [MongoDB][mongo]
into the format expected by the QuantumLeap [CrateDB][crate] back end.

The tool can be downloaded [here][dmt] and the accompanying user guide
is also [available online][dmt-man].


## Migrating from QuantumLeap Crate to Timescale

QuantumLeap provides a self-contained Python script to help with
migrating tables from a QuantumLeap CrateDB database to a QuantumLeap
Timescale database. The script is located in the `timescale-container`
directory and is called `crate-exporter.py`.
It exports rows in a given Crate table and generates, on `stdout`,
all the SQL statements needed to import that data into Timescale.
These include creating a corresponding schema, table and hypertable
in PostgreSQL as needed. Note that the script generates DDL statements
that, when executed, will result in the exact same table structures
the QuantumLeap Timescale back end would have generated on seeing
NGSI entities corresponding to the rows stored in the Crate table.
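To give a feel for the data part of that output, here's a minimal Python sketch that turns exported rows into `INSERT` statements. It is not the code in `crate-exporter.py`: the helpers `quote_ident`, `quote_literal` and `rows_to_inserts` are hypothetical, the quoting rules are simplified, the example column names are made up, and the DDL for the schema, table and hypertable is left out.

```python
from datetime import datetime
from typing import Any, Iterable, List, Sequence


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: Any) -> str:
    """Render a Python value as a PostgreSQL literal (simplified)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, datetime):
        return "'" + value.isoformat() + "'"
    # Anything else becomes a string literal with single quotes doubled.
    return "'" + str(value).replace("'", "''") + "'"


def rows_to_inserts(schema: str, table: str, columns: Sequence[str],
                    rows: Iterable[Sequence[Any]]) -> List[str]:
    """Hypothetical helper: one INSERT statement per exported row."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    cols = ", ".join(quote_ident(c) for c in columns)
    return [
        f"INSERT INTO {target} ({cols}) VALUES "
        f"({', '.join(quote_literal(v) for v in row)});"
        for row in rows
    ]


for stmt in rows_to_inserts("mtyoutenant", "etdevice",
                            ["entity_id", "time_index", "temperature"],
                            [("d1", datetime(2019, 4, 16, 10, 0), 21.5)]):
    print(stmt)
```

One statement per row keeps the sketch short; for large tables, batching rows or using PostgreSQL's `COPY` would be the more efficient choice.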

Here's an example usage:

```sh
$ python crate-exporter.py --schema mtyoutenant --table etdevice \
    > mtyoutenant.etdevice-import.sql
```

where we export all the rows in the Crate table `mtyoutenant.etdevice`.
The generated file contains all the SQL statements to recreate the
table and insert the data in Timescale. You may want to put this file
in the `quantumleap-db-setup` script's init directory so that data
are migrated automatically for you when you bootstrap the QuantumLeap
DB on Timescale as explained in the [Timescale section][ts-admin].

By default the script exports all the rows in the Crate table, but
you can also use the `--query` argument to specify a query to select
only a subset of interest as shown below:

```sh
$ python crate-exporter.py --schema mtyoutenant --table etdevice --query \
    "SELECT * FROM mtyoutenant.etdevice where time_index > '2019-04-15';"
```

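If you drive the export from a Python job (say, to migrate data in time-based batches), a sketch along these lines builds the command line and saves the generated SQL to a file. Only the `--schema`, `--table` and `--query` options come from the text above; the helper names `export_args` and `run_export`, the `time_index` cutoff filter (modelled on the example query) and the output file naming are assumptions for illustration.

```python
import subprocess
import sys
from typing import List, Optional


def export_args(schema: str, table: str,
                since: Optional[str] = None) -> List[str]:
    """Build the crate-exporter.py argument list (hypothetical helper)."""
    args = [sys.executable, "crate-exporter.py",
            "--schema", schema, "--table", table]
    if since is not None:
        # Same shape as the --query example above. `since` is pasted into
        # the SQL text, so it must be a trusted ISO date string.
        args += ["--query",
                 f"SELECT * FROM {schema}.{table} "
                 f"WHERE time_index > '{since}';"]
    return args


def run_export(schema: str, table: str, since: Optional[str] = None) -> str:
    """Run the exporter and write its stdout to <schema>.<table>-import.sql."""
    out_file = f"{schema}.{table}-import.sql"
    with open(out_file, "w") as out:
        subprocess.run(export_args(schema, table, since),
                       stdout=out, check=True)
    return out_file
```

Passing an argument list rather than a single shell string sidesteps the shell quoting needed around the `--query` value in the command above.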



[comet]: https://github.com/telefonicaid/fiware-sth-comet
"FiWare STH Comet Home"
[crate]: https://crate.io
"CrateDB Home"
[dmt]: https://github.com/Data-Migration-Tool/STH-to-QuantumLeap
"Data-Migration-Tool Home"
[dmt-man]: https://github.com/Data-Migration-Tool/STH-to-QuantumLeap/blob/master/docs/manuals/README.md
"Data-Migration-Tool Manual"
[eclipse]: https://www.eclipse.org/
"Eclipse Home"
[java]: https://en.wikipedia.org/wiki/Java_(software_platform)
"Wikipedia - Java"
[mongo]: https://github.com/mongodb/mongo
"MongoDB Home"
[ql-api]: https://app.swaggerhub.com/apis/smartsdk/ngsi-tsdb/0.2
"QuantumLeap REST API"
[ts-admin]: ./timescale.md
"QuantumLeap Timescale"
27 changes: 19 additions & 8 deletions docs/manuals/admin/grafana.md
@@ -3,19 +3,22 @@
[**Grafana**](https://grafana.com/) is a powerful visualisation tool that we
can use to display graphics of the persisted data.

In order to read data from a [CrateDB](./crate.md) database for your dashboards
in Grafana, you should use the [Postgres Datasource](http://docs.grafana.org/features/datasources/postgres/).
The Postgres Datasource should come preinstalled in the latest Grafana versions.
You can easily connect Grafana to either the QuantumLeap [CrateDB](./crate.md)
or [Timescale](./timescale.md) back end to visualise QuantumLeap data on
your dashboards. In both cases, the Grafana data source to use is the
[Postgres Datasource](http://docs.grafana.org/features/datasources/postgres/)
which normally ships with recent versions of Grafana.

If you followed the [Installation Guide](./index.md), you already have Grafana
running in a Docker container. If you deployed it locally, it's probably at [http://0.0.0.0:3000](http://0.0.0.0:3000).

You can now follow Crate's recommendations on how to configure the datasource
by checking out [this post](https://crate.io/a/pair-cratedb-with-grafana-an-open-platform-for-time-series-data-visualization/).
If you already put some data in your database, you can jump directly to the
"Add a Data Source" section. The main parts of such post are convered below.
If you're using the CrateDB back end, we suggest you read
[this blog post](https://crate.io/a/pair-cratedb-with-grafana-an-open-platform-for-time-series-data-visualization/)
and follow Crate's recommendations on how to configure the Grafana
datasource, which we have summarised in the section below.

## Configuring the DataSource

## Configuring the DataSource for CrateDB

Browse to your deployed Grafana instance (e.g. [http://0.0.0.0:3000](http://0.0.0.0:3000)).
If you didn't change the default credentials, use `admin` as both user and
@@ -41,6 +44,14 @@ look like

Click *Save & Test* and you should get an OK message.


## Configuring the DataSource for PostgreSQL

The process is pretty much the same as outlined above and is well documented
in the Grafana [PostgreSQL data source manual](https://grafana.com/docs/features/datasources/postgres/).
Note that you should enable the *TimescaleDB* data source option.
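If you'd rather script the data source creation than click through the UI, Grafana's HTTP API (`POST /api/datasources`) accepts a JSON description of the data source. The sketch below builds such a payload and posts it using only the standard library. The field names follow our understanding of Grafana's PostgreSQL data source options (check them against your Grafana version's docs); the connection defaults mirror QuantumLeap's `POSTGRES_*` defaults, and the data source name and `admin`/`admin` login are placeholder assumptions.

```python
import base64
import json
import urllib.request


def timescale_datasource(host: str = "timescale:5432",
                         db: str = "quantumleap",
                         user: str = "quantumleap",
                         password: str = "*") -> dict:
    """Payload for Grafana's POST /api/datasources (placeholder values)."""
    return {
        "name": "QuantumLeap Timescale",
        "type": "postgres",
        "access": "proxy",
        "url": host,
        "database": db,
        "user": user,
        "secureJsonData": {"password": password},
        # The TimescaleDB option mentioned above is a jsonData flag.
        "jsonData": {"sslmode": "disable", "timescaledb": True},
    }


def create_datasource(grafana: str, payload: dict,
                      login: str = "admin", passwd: str = "admin") -> dict:
    """POST the payload to Grafana with basic auth; return the JSON reply."""
    token = base64.b64encode(f"{login}:{passwd}".encode()).decode()
    req = urllib.request.Request(
        f"{grafana}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For a local deployment you would call something like `create_datasource("http://0.0.0.0:3000", timescale_datasource())`.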


## Using the DataSource in your Graph

With your datasource set up, you can start using it in the different
1 change: 1 addition & 0 deletions docs/manuals/admin/ports.md
@@ -18,6 +18,7 @@ are used within the cluster.
|TCP | 27017 | Mongo database |
|TCP | 4200 | CrateDB Admin UI |
|TCP | 4300 | CrateDB Transport Protocol |
|TCP | 5432 | PostgreSQL Protocol |
|TCP | 6379 | Redis cache (used by geocoding) |

For more info on port numbers, you can always inspect the ports being exposed
156 changes: 156 additions & 0 deletions docs/manuals/admin/timescale.md
@@ -0,0 +1,156 @@
# Timescale

[Timescale][timescale] is one of the time series databases that can be
used with QuantumLeap as a back end to store NGSI entity time series.
Indeed, QuantumLeap provides full support for storing NGSI entities in
Timescale, including geographical features (encoded as GeoJSON or NGSI
Simple Location Format), structured types and arrays. Moreover, it is
possible to dynamically select, at runtime, which storage back end to
use (Crate or Timescale) depending on the tenant who owns the entity
being persisted. Also, QuantumLeap ships with tools to automate the
Timescale back end setup and generate Crate-to-Timescale migration
scripts---details in the [Data Migration section][admin.dm].


## Operation overview

QuantumLeap stores NGSI entities in Timescale using the existing
`notify` endpoint. The Timescale back end is made up of [PostgreSQL][postgres]
with both Timescale and [PostGIS][postgis] extensions enabled:

```
-------------------------
|  Timescale   PostGIS  |        ---------------
| --------------------- | <----- | QuantumLeap |-----O notify
|       Postgres        |        ---------------
-------------------------
```
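Since the back end is chosen per tenant, the piece of a notification that matters for routing is its tenant header. Here's a minimal sketch of the kind of request Orion would POST to QuantumLeap's `notify` endpoint. The `/v2/notify` path, port `8668` (QuantumLeap's usual default) and the `Fiware-Service` tenant header reflect QuantumLeap's standard setup as we understand it; the entity, subscription id and tenant name are made up. Note that `urllib` normalises header names, so `Fiware-Service` is stored as `Fiware-service`.

```python
import json
import urllib.request


def notify_request(entities: list, tenant: str,
                   ql_url: str = "http://localhost:8668") -> urllib.request.Request:
    """Build (but don't send) an NGSI notification POST for QuantumLeap."""
    body = {"subscriptionId": "example-sub", "data": entities}
    return urllib.request.Request(
        f"{ql_url}/v2/notify",
        data=json.dumps(body).encode("utf-8"),
        # The tenant decides which back end (Crate or Timescale) is used.
        headers={"Content-Type": "application/json",
                 "Fiware-Service": tenant},
        method="POST",
    )


entity = {
    "id": "Device:001",
    "type": "Device",
    "temperature": {"type": "Number", "value": 21.5},
}
req = notify_request([entity], tenant="t1")
# urllib.request.urlopen(req) would actually send it.
```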

PostgreSQL is a rock-solid, battle-tested, open source database,
and its PostGIS extension provides excellent support for advanced
spatial functionality while the Timescale extension has fairly
robust support for time series data. The mechanics of converting
an NGSI entity to tabular format stay pretty much the same as in
the Crate back end except for a few improvements:

* NGSI arrays are stored as (indexable & queryable) JSON as opposed
to the flat array of strings in the Crate back end.
* GeoJSON and NGSI Simple Location Format attributes are stored as
spatial data that can be indexed and queried---full support for
spatial attributes is still patchy in the Crate back end.
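As an illustration of the spatial handling, here's a sketch of how a `geo:point` attribute in NGSI Simple Location Format (a `"latitude, longitude"` string) maps onto a GeoJSON point, which lists longitude first. It only shows the coordinate conversion and is not QuantumLeap's own translation code; the function name is hypothetical and the other Simple Location Format types (`geo:line`, `geo:polygon`, `geo:box`) are left out.

```python
from typing import Any, Dict


def slf_point_to_geojson(value: str) -> Dict[str, Any]:
    """Convert an NGSI Simple Location Format point to a GeoJSON Point.

    SLF writes "lat, lon" whereas GeoJSON expects [lon, lat].
    """
    # Unpacking fails with ValueError unless there are exactly two parts.
    lat_str, lon_str = (part.strip() for part in value.split(","))
    lat, lon = float(lat_str), float(lon_str)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {value!r}")
    return {"type": "Point", "coordinates": [lon, lat]}


attr = {"type": "geo:point", "value": "41.3763726, 2.1864475"}
print(slf_point_to_geojson(attr["value"]))
```

On the PostGIS side, a GeoJSON string like this can be turned into a geometry with `ST_GeomFromGeoJSON`.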

The `test_timescale_insert.py` file in the QuantumLeap source base
contains quite a number of examples of how NGSI data are stored in
Timescale.

#### Note: querying & retrieving data
At the moment, QuantumLeap does **not** implement any querying or
retrieving of data through the QuantumLeap REST API as is available
for the Crate back end. This means that for now the only way to access
your data is to query the Timescale DB directly. However, data querying
and retrieval through the REST API is planned for the upcoming
QuantumLeap major release.
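In the meantime, a direct SQL query is the way in. The sketch below builds a parameterised query for a tenant's entity table. It assumes the Timescale tables follow the same naming as the Crate table in the Data Migration example (schema `mt<tenant>`, table `et<entity type>`, time column `time_index`), so check the names actually created in your database first. The `entity_query` helper is hypothetical; the `%s` placeholders are the style used by Python PostgreSQL drivers such as psycopg2.

```python
from typing import Any, Optional, Tuple


def entity_query(tenant: str, entity_type: str,
                 since: Optional[str] = None,
                 limit: int = 100) -> Tuple[str, Tuple[Any, ...]]:
    """Return (sql, params) for reading a tenant's entity time series.

    Assumes the "mt<tenant>"."et<type>" naming, lower-cased.
    """
    schema = "mt" + tenant.lower()
    table = "et" + entity_type.lower()
    # Basic guard: identifiers can't be passed as query parameters.
    if not (schema.isidentifier() and table.isidentifier()):
        raise ValueError("unexpected characters in tenant or entity type")
    sql = f"SELECT * FROM {schema}.{table}"
    params: Tuple[Any, ...] = ()
    if since is not None:
        sql += " WHERE time_index > %s"
        params += (since,)
    sql += " ORDER BY time_index DESC LIMIT %s"
    params += (limit,)
    return sql, params


# With psycopg2 (a third-party package), usage would look like:
#   with conn.cursor() as cur:
#       cur.execute(*entity_query("youtenant", "Device", since="2019-04-15"))
#       rows = cur.fetchall()
```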


## QuantumLeap Timescale DB setup

In order to start using the Timescale back end, a working PostgreSQL
installation is required. Specifically, QuantumLeap requires
**PostgreSQL server 10 or above with the Timescale and PostGIS
extensions already installed** on it. The Docker Compose file in the
`timescale-container/test` directory can be used to quickly spin up a Timescale
server back end to which QuantumLeap can connect, but for
production deployments a more sophisticated setup is likely to
be needed---e.g. configuring PostgreSQL for high availability.
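Before bootstrapping, it can be handy to check a server against these requirements. Here's a minimal sketch, assuming you've already fetched the output of `SHOW server_version_num` and of `SELECT name FROM pg_available_extensions` (the extensions installed on the server, as opposed to those enabled in a given database) with the client of your choice. The function name is hypothetical; `timescaledb` and `postgis` are the names the two extensions register with PostgreSQL.

```python
from typing import Iterable, List

REQUIRED_EXTENSIONS = {"timescaledb", "postgis"}
MIN_SERVER_VERSION_NUM = 100000  # server_version_num of PostgreSQL 10.0


def preflight_problems(server_version_num: str,
                       available_extensions: Iterable[str]) -> List[str]:
    """List reasons the server can't host the QuantumLeap DB (empty if OK)."""
    problems = []
    if int(server_version_num) < MIN_SERVER_VERSION_NUM:
        problems.append(
            f"PostgreSQL 10 or above required, got {server_version_num}")
    missing = REQUIRED_EXTENSIONS - set(available_extensions)
    for ext in sorted(missing):
        problems.append(f"extension not installed: {ext}")
    return problems
```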

Once Timescale is up and running, you will have to bootstrap the
QuantumLeap DB and perhaps you may also want to migrate some data
from Crate. QuantumLeap ships with a self-contained Python script
that can automate most of the steps in the process. The script file
is named `quantumleap-db-setup` and is located in the
`timescale-container` directory. It does these three things, in order:

1. Bootstrap the QuantumLeap database if it doesn't exist. It creates
a database for QuantumLeap with all required extensions as well as
an initial QuantumLeap role. If the specified QuantumLeap DB already
exists, the bootstrap phase is skipped.
2. Run any SQL script found in the specified init directory---defaults
to `./ql-db-init`. It picks up any `.sql` file in this directory
tree and, in turn, executes each one in ascending alphabetical
order, stopping at the first one that errors out, in which case
the script exits.
3. Load any data file found in the above init directory. A data file
is any file with a `.csv` extension found in the init directory
tree. Each data file is expected to contain a list of records in
CSV format to be loaded into a table in the QuantumLeap
database: fields are delimited by `,` and any quoted field must be
enclosed in single quotes (`'`). The file name without the `.csv`
extension is taken to be the fully qualified name (FQN) of the table
into which the data should be loaded, whereas the column spec is given
by the names in the CSV header, which the file is expected to contain.
Data files are loaded in turn in alphabetical order, stopping at
the first one that errors out, in which case the script exits.
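Steps (2) and (3) boil down to discovering files in the init directory tree and, for data files, working out the target table and columns. The sketch below mirrors the rules described above using only the standard library. It is an illustration, not the code in `quantumleap-db-setup`: the function names are hypothetical, sorting by path relative to the init directory is our reading of "alphabetical order", and actually executing or loading anything is left to your database client.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Tuple


def init_files(init_dir: str, suffix: str) -> List[Path]:
    """Files under init_dir (recursively) ending in suffix, sorted by
    their path relative to init_dir."""
    root = Path(init_dir)
    found = (p for p in root.rglob(f"*{suffix}") if p.is_file())
    return sorted(found, key=lambda p: p.relative_to(root).as_posix())


def read_data_file(path: Path) -> Tuple[str, List[str], List[List[str]]]:
    """Return (table FQN, column names, rows) for a .csv data file."""
    table_fqn = path.stem  # "mtyoutenant.etdevice.csv" -> "mtyoutenant.etdevice"
    with path.open(newline="") as f:
        # Comma-separated fields; quoted fields use single quotes.
        reader = csv.reader(f, delimiter=",", quotechar="'")
        columns = next(reader)  # the header row gives the column spec
        rows = list(reader)
    return table_fqn, columns, rows


if __name__ == "__main__":
    # Build a throwaway init directory and show what would be processed.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "02-more").mkdir()
        Path(d, "02-more", "a.sql").write_text("SELECT 1;")
        Path(d, "01-schema.sql").write_text("SELECT 1;")
        Path(d, "mtyoutenant.etdevice.csv").write_text(
            "entity_id,name\nd1,'Room, 1'\n")
        for sql_file in init_files(d, ".sql"):
            print("would execute", sql_file.name)
        for csv_file in init_files(d, ".csv"):
            print("would load", read_data_file(csv_file))
```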

(2) and (3) are mostly relevant for data migration (see the
[Data Migration section][admin.dm]), but the script can just as well be
used to execute arbitrary SQL statements. Note that the Docker Compose
file mentioned earlier spins up a Timescale container (with PostGIS)
and another container that runs the script using
`timescale-container/test/ql-db-init` as init directory,
providing a working Timescale DB, complete with some tables
and test data.


## Using the Timescale back end

Once you have a Postgres+Timescale+PostGIS server with a freshly
minted QuantumLeap DB in it, you are ready to connect QuantumLeap
to the DB server. To do that, some environment variables have to
be set and a YAML file edited. The environment variables to use
are:

* `POSTGRES_HOST`: the hostname or IP address of your Timescale server.
Defaults to `timescale` if not specified.
* `POSTGRES_PORT`: the server port to connect to, defaults to `5432`.
* `POSTGRES_DB_NAME`: the name of the QuantumLeap DB, defaults to
`quantumleap`.
* `POSTGRES_DB_USER`: the DB user QuantumLeap should use to connect,
defaults to `quantumleap`.
* `POSTGRES_DB_PASS`: the above user's password, defaults to `*`.
* `POSTGRES_USE_SSL`: should QuantumLeap connect to PostgreSQL using
SSL? If so, then set this variable to any of: `true`, `yes`, `1`, `t`.
Specify any other value or don't set the variable at all to use a
plain TCP connection.
* `QL_CONFIG`: absolute pathname of the QuantumLeap YAML configuration
file. If not set, the default configuration will be used where only
the Crate back end is available.
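Here's how those settings and their documented defaults might be read in Python. It's a sketch rather than QuantumLeap's own code: the `PostgresConfig` class and `load_postgres_config` function are hypothetical, values are parsed but not validated, and matching the SSL flag case-insensitively is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"true", "yes", "1", "t"}


@dataclass(frozen=True)
class PostgresConfig:
    host: str
    port: int
    db_name: str
    user: str
    password: str
    use_ssl: bool
    ql_config: Optional[str]  # None: default, Crate-only configuration


def load_postgres_config(env: Mapping[str, str] = os.environ) -> PostgresConfig:
    """Read the QuantumLeap Postgres settings, falling back to defaults."""
    # Case-insensitive match here is an assumption of this sketch.
    ssl_flag = env.get("POSTGRES_USE_SSL", "").strip().lower()
    return PostgresConfig(
        host=env.get("POSTGRES_HOST", "timescale"),
        port=int(env.get("POSTGRES_PORT", "5432")),
        db_name=env.get("POSTGRES_DB_NAME", "quantumleap"),
        user=env.get("POSTGRES_DB_USER", "quantumleap"),
        password=env.get("POSTGRES_DB_PASS", "*"),
        use_ssl=ssl_flag in TRUE_VALUES,
        ql_config=env.get("QL_CONFIG"),
    )
```

Taking the environment as a parameter (defaulting to `os.environ`) makes the function easy to exercise with a plain dict.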

The YAML configuration file specifies what back end to use for which
tenant as well as the default back end to use for any other tenant
not explicitly mentioned in the file. Here's an example YAML
configuration:

```yaml
tenants:
    t1:
        backend: Timescale
    t2:
        backend: Crate
    t3:
        backend: Timescale

default-backend: Crate
```

With this configuration, any NGSI entity coming in for tenant `t1`
or `t3` will be stored in Timescale whereas tenant `t2` will use
Crate. Any tenant other than `t1`, `t2`, or `t3` gets the default
Crate back end.
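The lookup rule is simple enough to sketch in a few lines. The dictionary below is what a YAML parser (for example PyYAML's `yaml.safe_load`) would produce from the configuration above; `backend_for` is a hypothetical helper. Falling back to Crate when `default-backend` is missing is our assumption, chosen to match the Crate-only behaviour described for an unset `QL_CONFIG`.

```python
from typing import Any, Dict

CONFIG: Dict[str, Any] = {
    "tenants": {
        "t1": {"backend": "Timescale"},
        "t2": {"backend": "Crate"},
        "t3": {"backend": "Timescale"},
    },
    "default-backend": "Crate",
}


def backend_for(tenant: str, config: Dict[str, Any] = CONFIG) -> str:
    """Return the back end configured for tenant, else the default one."""
    tenant_cfg = config.get("tenants", {}).get(tenant, {})
    default = config.get("default-backend", "Crate")  # fallback: assumption
    return tenant_cfg.get("backend", default)


for t in ("t1", "t2", "t3", "t4"):
    print(t, backend_for(t))
```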




[admin.dm]: ./dataMigration.md
"QuantumLeap Data Migration"
[postgres]: https://www.postgresql.org
"PostgreSQL Home"
[postgis]: https://postgis.net/
"PostGIS Home"
[timescale]: https://www.timescale.com
"Timescale Home"
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -18,5 +18,6 @@ nav:
- 'Sanity Check': 'admin/check.md'
- 'Ports': 'admin/ports.md'
- 'CrateDB': 'admin/crate.md'
- 'Timescale': 'admin/timescale.md'
- 'Grafana': 'admin/grafana.md'
- 'Data-Migration': 'admin/dataMigration.md'
