Skip to content

Commit

Permalink
Merge pull request #5 from neicnordic/docs/forgotten-updates
Browse files Browse the repository at this point in the history
Docs/forgotten updates
  • Loading branch information
blankdots committed Nov 5, 2020
2 parents 01d65f2 + a68b343 commit 868039f
Show file tree
Hide file tree
Showing 8 changed files with 7,422 additions and 9,486 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/spellcheck.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,4 +20,4 @@ jobs:
- name: Test spelling errors
shell: bash
run: |
bin/misspell -error docs/*
bin/misspell -error docs/*.rst
96 changes: 76 additions & 20 deletions docs/connection.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ The RabbitMQ message brokers of each SDA instance are the **only**
components with the necessary credentials to connect to Central EGA
message broker.

We call ``CEGAMQ`` and ``LocalMQ`` (Local Message Broker),
We call ``CEGAMQ`` and ``LocalMQ`` (Local Message Broker, also known as ``sda-mq``),
the RabbitMQ message brokers of, respectively, ``Central EGA``
and ``SDA``/``LocalEGA``.

Expand Down Expand Up @@ -45,6 +45,12 @@ The following environment variables can be used to configure the broker:
| | with CentralEGA |
+----------------------+----------------------------------------------+


.. note:: For SDA stand-alone do not use ``CEGA_CONNECTION`` and do not set up
``Intercept`` service. This will cause no messages to be shoveled to a
CentralEGA, whilst the queues stay the same. ``Orchestrator`` service
would need to be set up to send and recive messages between other services.

Central EGA connection
----------------------

Expand All @@ -58,6 +64,9 @@ creates the credentials to connect to that ``vhost`` in the form of a
amqp[s]://<user>:<password>@<cega-host>:<port>/<vhost>
.. note:: All the messages received from CEGA are intercepted by ``Intercept`` service
and forwarded to the right queue in the ``LocalMQ``


``CEGAMQ`` contains an exchange named ``localega.v1``. ``v1`` is used for
versioning and is internal to CentralEGA. The queues connected to that
Expand All @@ -77,17 +86,33 @@ exchange are also internal to CentralEGA.
| inbox | Notifications of uploaded files |
+-----------------+-------------------------------------------------+

``LocalMQ`` contains two exchanges named ``lega`` and ``cega``,
``LocalMQ`` contains two exchanges named ``sda`` and ``to_cega``,
and the following queues, in the default ``vhost``:

+-----------------+---------------------------------------+
| Name | Purpose |
+=================+=======================================+
| files | Trigger for file ingestion |
| archived | Archived files |
+-----------------+---------------------------------------+
| backup | Signal files to backup |
+-----------------+---------------------------------------+
| completed | Files are backed up |
+-----------------+---------------------------------------+
| error | User-related errors |
+-----------------+---------------------------------------+
| files | Receive notification for ingestion |
| | from ``CEGAMQ`` or Orchestrator |
+-----------------+---------------------------------------+
| inbox | Notifications of uploaded files |
+-----------------+---------------------------------------+
| archived | The file is in the archive |
| ingest | Trigger for file ingestion |
+-----------------+---------------------------------------+
| stableIDs | Receive Accession IDs from ``CEGAMQ`` |
| mappings | Received Dataset to file mapping |
+-----------------+---------------------------------------+
| accessionIDs | Receive Accession IDs from ``CEGAMQ`` |
| | or Orchestrator |
+-----------------+---------------------------------------+
| verified | Files ingested and verified |
+-----------------+---------------------------------------+

``LocalMQ`` registers ``CEGAMQ`` as an *upstream* and listens to the
Expand All @@ -97,24 +122,27 @@ are no messages to work on, ``LocalMQ`` will ask its upstream queue if
it has messages. If so, messages are moved downstream. If not the
Ingest Service will wait for messages to arrive.

.. note:: In order to start a standalone instance of the ``SDA``.
.. note:: More information can be found also at:
https://localega.readthedocs.io/en/latest/amqp.html#message-interface-api-cega-connect-lega


``CEGAMQ`` receives notifications from ``LocalMQ`` using a
*shovel*. Everything that is published to its ``cega`` exchange gets
forwarded to CentralEGA (using the same routing key). This is how we
propagate the different status of the workflow to CentralEGA, using
*shovel*. Everything that is published to its ``to_cega`` exchange gets
forwarded to CentralEGA (using the routing key based on the name ``files.<internal_queue_name>``).
We propagate the different status of the workflow to CentralEGA, using
the following routing keys:

+-----------------------+-------------------------------------------------------+
| Name | Purpose |
+=======================+=======================================================+
| files.verified | In case the file is properly ingested and verified |
+-----------------------+-------------------------------------------------------+
| files.completed | In case the file has been stored in the archive |
| files.completed | For back-up files, ready to be distributed |
+-----------------------+-------------------------------------------------------+
| files.error | In case a user-related error is detected |
+-----------------------+-------------------------------------------------------+
| files.inbox | For inbox file operations |
+-----------------------+-------------------------------------------------------+
| files.verified | For files ready to request accessionID |
+-----------------------+-------------------------------------------------------+

Note that we do not need at the moment a queue to store the completed
message, nor the errors, as we forward them to Central EGA.
Expand Down Expand Up @@ -150,6 +178,8 @@ It is necessary to agree on the format of the messages exchanged
between Central EGA and any Local EGAs. Central EGA's messages are
JSON-formatted.

The JSON schemas can be found in: https://github.com/neicnordic/sda-pipeline/tree/master/schemas

When a ``Submission Inbox`` sends a message to CentralEGA it contains the
following:

Expand All @@ -173,7 +203,16 @@ In order to identify the type of inbox activity,
* ``rename`` - when a file is renamed.

CentralEGA triggers the ingestion and the message sent to ``files`` queue
contains the same information.
contains the same information. In order to distinguish messages,
Central EGA adds a field named type to all outgoing messages.
There are 5 types of messages:

* ``type=ingest``: an ingestion trigger
* ``type=cancel``: an ingestion cancellation
* ``type=accession``: contains an accession id
* ``type=mapping``: contains a dataset to accession ids mapping
* ``type=heartbeat``: A mean to check if the Local EGA instance is “alive”


.. important:: The ``encrypted_checksums`` key is optional. If the key is not present
the sha256 checksum will be calculated by ``Ingest`` service.
Expand All @@ -186,8 +225,12 @@ The ``Ingest`` service upon successful operation will send a message to
{
"user":"john",
"filepath":"somedir/encrypted.file.gpg",
"file_checksum": "abcdefghijklmnopqrstuvwxyz"
"fileID": "1",
"filepath":"somedir/encrypted.file.c4gh",
"archivePath": "somedir/archived.file.c4gh",
"encrypted_checksums": [
{ "type": "sha256", "value": "12345678901234567890"}
]
}
``Verify`` service will consume set message and will forward to ``verified`` queue
Expand All @@ -198,30 +241,43 @@ which will respond with the same content, but adding the `Accession ID`.
{
"user":"john",
"filepath":"somedir/encrypted.file.gpg",
"file_checksum": "abcdefghijklmnopqrstuvwxyz",
"filepath":"somedir/encrypted.file.c4gh",
"decrypted_checksums": [
{ "type": "md5", "value": "abcdefghijklmnopqrstuvwxyz"},
{ "type": "sha256", "value": "12345678901234567890"}
]
}
``Finalize`` service should receive the message below and assign the `Accession ID` to the
corresponding file and send a message to ``completed`` queue.
corresponding file and send a message to ``backup`` queue for the backup services or in case there
is no backup service to the ``completed`` queue.

.. code-block:: javascript
{
"user":"john",
"filepath":"somedir/encrypted.file.gpg",
"accession_id": "EGAF001",
"filepath":"somedir/encrypted.file.c4gh",
"accession_id": "EGAF12345678901",
"decrypted_checksums": [
{ "type": "md5", "value": "abcdefghijklmnopqrstuvwxyz"},
{ "type": "sha256", "value": "12345678901234567890"}
]
}
``Mapper`` service after the file has been published should receive a message
containing accession IDs mapping between files and datasets

.. code-block:: javascript
{
"user":"john",
"filepath":"somedir/encrypted.file.c4gh",
"dataset_id": "EGAD12345678901",
"accession_ids": ["EGAF12345678901", "EGAF12345678902"]
}
.. |connect| unicode:: U+21cc .. <->
.. _RabbitMQ: http://www.rabbitmq.com
.. _RabbitMQ 3.7.8: https://hub.docker.com/_/rabbitmq
41 changes: 24 additions & 17 deletions docs/db.rst
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,12 @@ This is the core table of the schema, which holds file identifiers, status, meta
+------------------------------------------+--------------------+
| created_by | name |
+------------------------------------------+--------------------+
| decrypted_file_size | varchar |
+------------------------------------------+--------------------+
| decrypted_file_checksum_type | checksum_algorithm |
+------------------------------------------+--------------------+
| decrypted_file_size | int8 |
+------------------------------------------+--------------------+
| encryption_method | varchar |
+------------------------------------------+--------------------+
| header | text |
Expand Down Expand Up @@ -193,7 +199,7 @@ Checksums are recorded in order to keep track of already used session keys,

status
""""""
This table holds file statuses, which can range from INIT, IN_INGESTION, ARCHIVED, COMPLETED, READY, ERROR and DISABLED.
This table holds file statuses, which can range from ``INIT``, ``IN_INGESTION``, ``ARCHIVED``, ``COMPLETED``, ``READY``, ``ERROR`` and ``DISABLED``.

+-------------+-----------+
| Column Name | Data type |
Expand Down Expand Up @@ -243,53 +249,54 @@ check_session_keys_checksums_sha256
"""""""""""""""""""""""""""""""""""
It returns if the session key checksums are already found in the database.

* Inputs: checksums
* Inputs: ``checksums``

finalize_file
"""""""""""""
It flags files as READY, by setting their stable id and marking older ingestions as deprecated.

* Inputs: inbox_path, elixir_id, archive_file_checksum, archive_file_checksum_type, stable_id
* Target: local_ega.files
* Inputs: ``inbox_path``, ``elixir_id``, ``archive_file_checksum``, ``archive_file_checksum_type``, ``stable_id``
* Target: ``local_ega.files``

insert_error
""""""""""""
It adds an error entry of a file submission.

* Inputs: file_id, hostname, error_type, msg, from_user
* Target: local_ega.errors
* Inputs: ``file_id``, ``hostname``, ``error_type``, ``msg``, ``from_user``
* Target: ``local_ega.errors``

insert_file
"""""""""""
It adds a new file entry and deprecates old faulty submissions of the same file if present.

* Inputs: submission_file_path, submission_user
* Target: local_ega.main
* Inputs: ``submission_file_path``, ``submission_user``
* Target: ``local_ega.main``

is_disabled
"""""""""""
It returns whether a given entry is disabled or not.

* Input: file id:
* Input: ``file id``

main_updated
""""""""""""
It synchronises the timestamp for each row after update on main.

* Input: None
* Target: local_ega.main
* Target: ``local_ega.main``

mark_ready
""""""""""
When triggered after a file is marked as READY, it deactivates all errors of the given entry.

* Inputs: None
* Target: mark_ready
* Target: ``mark_ready``

local_ega_download tables
^^^^^^^^^^^^^^^^^^^^^^^^^

.. image:: /static/localega-download-schema.svg
:width: 300
:alt: localega download database schema

requests
Expand Down Expand Up @@ -358,23 +365,23 @@ download_complete
"""""""""""""""""
It marks a file download as complete, and calculates the download speed.
Inputs: requested file id, download size, speed
Target: local_ega_download.success
Target: ``local_ega_download.success``

insert_error
""""""""""""

It adds an error entry of a file download.

* Inputs: requested file id, hostname, error code, error description
* Target: local_ega_download.errors
* Target: ``local_ega_download.errors``

make_request
""""""""""""

It inserts a new request or reuses and old request entry of a given file.

* Inputs: stable id, user information, client ip, start coordinate and end coordinate
* Target: local_ega_download.requests
* Target: ``local_ega_download.requests``

local_ega_ebi tables
^^^^^^^^^^^^^^^^^^^^
Expand Down Expand Up @@ -419,12 +426,12 @@ local_ega_ebi views

file
""""
View for EBI Data-Out which contains all local_ega.main entries marked as ready.
View for EBI Data-Out which contains all ``local_ega.main`` entries marked as ready.

file_dataset
""""""""""""
Used to synchronise with the entity eu.elixir.ega.ebi.downloader.domain.entity.FileDataset.
Used to synchronise with the entity ``eu.elixir.ega.ebi.downloader.domain.entity.FileDataset``.

file_index_file
"""""""""""""""
Used to synchronise with the entity eu.elixir.ega.ebi.downloader.domain.entity.FileIndexFile.
Used to synchronise with the entity ``eu.elixir.ega.ebi.downloader.domain.entity.FileIndexFile``.
Binary file modified docs/static/CEGA-LEGA.png
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 868039f

Please sign in to comment.