Commit

Add logharbour wiki pages in the code repo
remiges-sachin committed Apr 27, 2024
1 parent f8d826e commit 705a1be
Showing 7 changed files with 984 additions and 0 deletions.
173 changes: 173 additions & 0 deletions wiki/A-central-log-repository.md

Large diffs are not rendered by default.

60 changes: 60 additions & 0 deletions wiki/Access-control.md
@@ -0,0 +1,60 @@
# The authorisation model and access control

## LogHarbour is multi-tenant

LogHarbour is designed from Day 1 to run as a shared service and maintain water-tight separation between organisations or user-groups, each of whom will have full access to manage their own LogHarbour data.

## Immutability of logs

LogHarbour does not export any interface to overwrite or modify any log entries. Short of breaking into the LogHarbour service or overcoming system security, no application has any means to change or delete any log entries, once written. There is no access capability or function in the API which provides such operations.

The LogHarbour repository is stored in an ElasticSearch index. Each organisation or user-group gets their own index. Multiple indices may reside on a common set of servers. Business applications which get access to write logs into LogHarbour do not get access to overwrite, modify or delete log entries.

## Of realms, users, applications and capabilities

LogHarbour supports the idea of **realms**. In the real world, a realm may map onto an organisation or a user group. If LogHarbour is operated as a SaaS service, a realm may map onto an "account".

A realm has a log repository with the three types of logs supported by LogHarbour: activity log, data-change log and debug log. There is one-to-one mapping between a log repository and a realm.

There is no concept of a human user of LogHarbour. A business application, or a group of applications, "uses" LogHarbour the way it uses a database. Therefore, there is no concept of a user account with respect to LogHarbour.

Within a realm, an authorised application with sufficient rights can perform all operations which LogHarbour supports. All data, configuration and resources used by LogHarbour are owned by a realm. No application has rights to perform any operation across more than one realm.

Realms are defined *via* channels other than the LogHarbour API. (A command-line program will be run on a server which has direct access to the LogHarbour data store, and will add and remove realms. Therefore, realm management will be done by the same system operations team which installs new versions of LogHarbour software, takes data backups, *etc*. The teams which manage the business applications which will write logs into LogHarbour cannot manage realms.)

LogHarbour does not have its own native UI where human users log in and perform operations. LogHarbour offers a client library, which application developers may use to integrate their application with it. If a UI is felt necessary, each business application may build its own UI and integrate their server-side code with the LogHarbour client library. (The library is currently in Go, and a Java version is slated for release before 4Q2024.)

Each realm of LogHarbour comes with two access tokens: a write token and a query token. Any piece of code which links with the LogHarbour client library and has the write token to a realm can insert log entries into the LogHarbour repository. Any application which has the query token can access the repository and fetch any log entries. The two tokens are not interchangeable. If the tokens are lost, they can be regenerated, by using the same administrative tools (command-line programs) which are used to provision a new realm. But this cannot be done by the business application -- it must be done by the same system operations team which provisions new realms.

A token is an opaque bag of bytes. It is always printable ASCII, since it's a base64-encoded binary stream, and it's always less than 5 Kbytes in length.

If an intruder manages to steal a token, he may be able to write software which will access LogHarbour and carry out all operations which business applications can perform. Therefore, tokens are not to be displayed or disclosed openly.
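
For instance, an application can keep its tokens out of source files and version control by reading them from the environment at startup, and can sanity-check the two properties stated above (valid base64, less than 5 Kbytes). A minimal Go sketch, assuming hypothetical environment variable names and the standard base64 alphabet:

```go
package main

import (
	"encoding/base64"
	"log"
	"os"
)

// loadToken reads a LogHarbour token from the environment so that it is never
// hard-coded in source files or committed to version control. It also checks
// the two properties stated above: the token is valid base64 and is less than
// 5 Kbytes long. The environment variable names are illustrative only.
func loadToken(envVar string) string {
	tok := os.Getenv(envVar)
	if tok == "" {
		log.Fatalf("%s is not set; obtain the token from the operations team", envVar)
	}
	if len(tok) >= 5*1024 {
		log.Fatalf("%s is too long to be a LogHarbour token", envVar)
	}
	if _, err := base64.StdEncoding.DecodeString(tok); err != nil {
		log.Fatalf("%s is not valid base64: %v", envVar, err)
	}
	return tok
}

func main() {
	writeToken := loadToken("LOGHARBOUR_WRITE_TOKEN")
	queryToken := loadToken("LOGHARBOUR_QUERY_TOKEN")
	_, _ = writeToken, queryToken // pass these to the client library's write and query calls
}
```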

## How a LogHarbour instance is bootstrapped

At the most fundamental level, LogHarbour is all about writing logs from one or more servers into a central LogHarbour repository -- everything else is a wrapper around this.

Therefore, when the LogHarbour API is first used by a new organisation:
* a new realm is defined for the organisation
* a new repository is initialised in the LogHarbour data store
* two tokens are created for this realm, and handed over to the team which manages the business applications

This is the starting point. From this point, the business application takes over, and uses the LogHarbour client library to read and write into LogHarbour.

The LogHarbour product suite includes command-line utilities which can generate and re-generate tokens for a realm. These tokens can then be embedded in the business applications which use LogHarbour, for writing log entries and querying.

## Master data for authorisation

LogHarbour has a private data store where it stores the following details about each realm:

* `id`: an automatically incremented mandatory unique integer
* `shortname`: mandatory, unique, a one-word string, always in lower-case, following the syntactic rules of identifiers in modern programming languages
* `longname`: mandatory, a descriptive string
* `createdat`: mandatory, timestamp
* `writetokens`, `querytokens`: one or more write tokens and query tokens associated with this realm. All the write tokens are exactly equivalent to each other, and ditto for all the query tokens. LogHarbour does not remember which specific token was used for a specific operation.
* `payload`: mandatory, JSONB, carrying all sorts of information about the LogHarbour repository, the ElasticSearch index which will be used to hold the log entries for this realm, *etc*
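
As an illustration of this master data, a realm record might be represented in Go roughly as follows. The struct is a sketch based on the field list above, not the actual LogHarbour schema:

```go
package meta

import (
	"encoding/json"
	"time"
)

// Realm mirrors the per-realm master data listed above. The actual storage
// schema is internal to LogHarbour; this struct is only an illustration.
type Realm struct {
	ID          int64           `json:"id"`          // auto-incremented unique integer
	ShortName   string          `json:"shortname"`   // one-word, lower-case identifier
	LongName    string          `json:"longname"`    // descriptive string
	CreatedAt   time.Time       `json:"createdat"`   // creation timestamp
	WriteTokens []string        `json:"writetokens"` // all write tokens are equivalent
	QueryTokens []string        `json:"querytokens"` // all query tokens are equivalent
	Payload     json.RawMessage `json:"payload"`     // JSONB: index name, servers, etc.
}
```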

## The API

LogHarbour comes with client libraries in Java and Go, so that business applications may build LogHarbour into their application code. The specs are given [in this page](Client-library). It is important to study that API to understand the authorisation model and operations of LogHarbour clearly.

55 changes: 55 additions & 0 deletions wiki/Architecture.md
@@ -0,0 +1,55 @@
LogHarbour runs on servers and stores a repository of all logs generated by a business application. LogHarbour does not have any UI or human users.

LogHarbour's software is in four parts:
* a client library which links with business application code and implements a Kafka producer. This library is used by application code to write log entries into the logs. Other functions in the library implement a query interface to extract logs from the LogHarbour database.
* a service, which implements a Kafka consumer, reads log entries from the Kafka stream, and writes them to an ElasticSearch database. This service is referred to as the "LogHarbour writer daemon"
* an ElasticSearch cluster hosting one or more indexes (databases)
* some administrative utilities, which are command-line programs run on demand

Multiple business applications running on multiple servers can all pump log entries into their respective LogHarbour repositories, but these repositories may be hosted on a common cluster of servers and managed with a single writer daemon.

## The ElasticSearch database

The database runs on a cluster of three or more servers, and organises data in multiple indexes. An index is dedicated for each realm. (For details about realms see [the page on access controls](Access-control).)

Every time a new realm is created
* a new ElasticSearch index is created,
* replication is set up for this index across the cluster of servers so that at least three copies of each record are stored.
* Two roles are created on the index, one for full read-write access to the index and one for read-only access.
* The read-write role's access credentials are appended to the realm ID and other data to create a string; this string is encrypted with a shared key known only to ElasticSearch (this shared key is referred to as the token encryption key), and the encrypted string is base64-encoded. The final output is called the write token for the realm. (A sketch of this construction appears below.)
* The read-only role's access credentials are appended to the realm ID, index URI and other data to create a string, and this string is base64-encoded. This base64-encoded output is called the query token for the realm.
* The realm-creation tools write the write token and query token for the realm to an output file, so that they may be shared with the business application which intends to use this new realm in LogHarbour.
* The two tokens, plus data about the index and servers, are all written to a special internal ElasticSearch index which holds only internal meta-data for LogHarbour's internal use. (For details of this data, see [Master data for authorisation](Access-control#master-data-for-authorisation).)

At this point, the LogHarbour index for the new realm is ready for use.
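
A minimal Go sketch of the token construction described above, assuming a NUL-separated field layout and AES-GCM for the token encryption key; neither the layout nor the cipher is specified by LogHarbour, so both are illustrative assumptions:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"strings"
)

// makeWriteToken sketches the write-token construction: the read-write role's
// credentials are joined with the realm ID and index data into one string,
// encrypted with the token encryption key, and base64-encoded. The field
// layout and the cipher (AES-GCM) are assumptions for illustration only.
func makeWriteToken(tokenKey []byte, realmID, serverURL, indexName, user, password string) (string, error) {
	plain := strings.Join([]string{realmID, serverURL, indexName, user, password}, "\x00")
	block, err := aes.NewCipher(tokenKey) // tokenKey must be 16, 24 or 32 bytes long
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, []byte(plain), nil)
	return base64.StdEncoding.EncodeToString(sealed), nil
}

// makeQueryToken sketches the query token: the read-only role's credentials
// and the same realm data, base64-encoded without encryption.
func makeQueryToken(realmID, serverURL, indexName, user, password string) string {
	plain := strings.Join([]string{realmID, serverURL, indexName, user, password}, "\x00")
	return base64.StdEncoding.EncodeToString([]byte(plain))
}

func main() {
	key := make([]byte, 32) // stands in for the shared token encryption key
	wt, err := makeWriteToken(key, "acme", "https://es.example.com:9200", "logs-acme", "acme_rw", "secret")
	if err != nil {
		panic(err)
	}
	fmt.Println("write token:", wt)
	fmt.Println("query token:", makeQueryToken("acme", "https://es.example.com:9200", "logs-acme", "acme_ro", "secret"))
}
```

Note that, as described above, only the write token is encrypted with the token encryption key; the query token is plain base64 and is decoded directly by the client library.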

## The client library

This library has functions which (a) queue log entries for eventual insertion into the ElasticSearch index, and (b) query the index to extract logs for display or analysis.

All functions which queue log entries for insertion require the write token for the realm, and all functions which query the log repository require the query token.

When a client library function needs to queue a message for eventual insertion, it writes a record in the local Kafka stream where the first field is the write token.
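
A minimal Go sketch of this queueing step, using the segmentio/kafka-go client. The topic name, the JSON record layout (write token as the first element) and the log-entry fields are assumptions for illustration, not LogHarbour's actual wire format:

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	"github.com/segmentio/kafka-go"
)

// queueLogEntry sketches the write path: the record pushed onto the Kafka
// stream carries the realm's write token as its first field, followed by the
// log entry itself. The topic name, the record layout and the choice of Kafka
// client are assumptions made for this illustration.
func queueLogEntry(ctx context.Context, w *kafka.Writer, writeToken string, entry map[string]any) error {
	payload, err := json.Marshal([]any{writeToken, entry})
	if err != nil {
		return err
	}
	return w.WriteMessages(ctx, kafka.Message{Value: payload})
}

func main() {
	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"), // local Kafka broker
		Topic: "logharbour",                // illustrative topic name
	}
	defer w.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	entry := map[string]any{"app": "orders", "type": "activity", "msg": "order created"}
	_ = queueLogEntry(ctx, w, "…write token…", entry)
}
```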

When a function needs to query the repository, it base64-decodes the query token and parses the resultant string. From the parsed data, it extracts
* the URI needed to connect to one of the servers which host the index which belongs to it
* the identity of the index which belongs to its realm
* the access credentials for read-only access to the index

Using this information, it connects to the index, fires the query and pulls out the result needed.
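
A minimal Go sketch of this query-side setup, using the go-elasticsearch client. The NUL-separated token layout matches the earlier illustrative sketches and is an assumption, not the real token format:

```go
package lhquery

import (
	"encoding/base64"
	"fmt"
	"strings"

	"github.com/elastic/go-elasticsearch/v8"
)

// OpenQueryClient sketches the read path: the query token is base64-decoded,
// the server URI, index name and read-only credentials are parsed out of it,
// and an ElasticSearch client is opened with them.
func OpenQueryClient(queryToken string) (*elasticsearch.Client, string, error) {
	raw, err := base64.StdEncoding.DecodeString(queryToken)
	if err != nil {
		return nil, "", err
	}
	parts := strings.Split(string(raw), "\x00")
	if len(parts) != 5 {
		return nil, "", fmt.Errorf("malformed query token")
	}
	serverURL, indexName, user, password := parts[1], parts[2], parts[3], parts[4]

	es, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{serverURL}, // a server hosting the realm's index
		Username:  user,                // read-only credentials
		Password:  password,
	})
	return es, indexName, err
}
```

The caller can then run searches against the returned index name, for example with `es.Search(es.Search.WithIndex(indexName), ...)`.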

## The writer daemon

This runs on one server (or two servers for redundancy) and reads messages coming in on the Kafka stream. For each message, it first separates out the write token, base64-decodes it, decrypts it, and then extracts from it various details like:
* the URI of the ElasticSearch index for this message's realm
* the index name and path
* the user credentials needed to insert records into this index

Using this information, it logs the record into ElasticSearch.
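
A Go sketch of this per-message handling, reusing the illustrative record layout and AES-GCM assumption from the earlier sketches; none of this is the actual LogHarbour implementation:

```go
package writerd

import (
	"bytes"
	"context"
	"crypto/aes"
	"crypto/cipher"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/segmentio/kafka-go"
)

// HandleMessage sketches the writer daemon's per-message work: split off the
// write token, base64-decode and decrypt it with the token encryption key,
// and index the log entry using the recovered connection details.
func HandleMessage(ctx context.Context, tokenKey []byte, msg kafka.Message) error {
	var fields []json.RawMessage
	if err := json.Unmarshal(msg.Value, &fields); err != nil || len(fields) != 2 {
		return fmt.Errorf("malformed record on the Kafka stream")
	}
	var token string
	if err := json.Unmarshal(fields[0], &token); err != nil {
		return err
	}

	// Recover the plaintext of the write token (AES-GCM assumed, as earlier).
	sealed, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return err
	}
	block, err := aes.NewCipher(tokenKey)
	if err != nil {
		return err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return err
	}
	if len(sealed) < gcm.NonceSize() {
		return fmt.Errorf("malformed write token")
	}
	plain, err := gcm.Open(nil, sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():], nil)
	if err != nil {
		return err
	}
	p := strings.Split(string(plain), "\x00") // realmID, serverURL, indexName, user, password
	if len(p) != 5 {
		return fmt.Errorf("malformed write token")
	}

	// Connect with the read-write credentials and insert the log entry.
	es, err := elasticsearch.NewClient(elasticsearch.Config{
		Addresses: []string{p[1]},
		Username:  p[3],
		Password:  p[4],
	})
	if err != nil {
		return err
	}
	res, err := es.Index(p[2], bytes.NewReader(fields[1]), es.Index.WithContext(ctx))
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.IsError() {
		return fmt.Errorf("ElasticSearch rejected the log entry: %s", res.Status())
	}
	return nil
}
```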

## The command-line tools

These tools perform three tasks:
* when a new realm is created, it creates a new ElasticSearch index, sets up replication for it across the cluster, creates two access roles in that index, one for writing and one for querying, creates tokens out of these two as described earlier, and writes all this information into a special private ElasticSearch index used only for LogHarbour metadata.
* when an additional write token or query token is required for a realm, it generates a new role in the ElasticSearch index for this realm, generates a token out of these credentials, adds the token to the set of tokens against the metadata of this realm in LogHarbour, and writes out the new token to a file so that it may then be shared with the business application.
* when a token needs to be deactivated, it removes the credentials from the ElasticSearch index so that those credentials cease to work any more, and then it removes the token from the LogHarbour metadata for the realm.
