JEP |
212 |
Title |
External Logging API Plugin |
Sponsor |
|
Status |
Deferred ⌛ |
Type |
Standards |
Created |
2018-07-24 |
BDFL-Delegate |
|
Discussions-To |
|
Requires |
On large-scale Jenkins instances master Disk and Network I/O become bottlenecks in particular cases. Build logging and Task reporting are one for the most intensive I/O consumers, hence it would be great to somehow redirect them to an external system. This is a continuation of the original story we had back in 2016 (see the public design document here). JEP-TODO defines requirements to the external logging system and Core APIs.
The External Logging API plugin is a separate plugin which offers shared functionality to External Logging implementations.
-
Configuration UI
-
Extension Points for External Logging implementations
-
Base classes for external logging, which simplify implementations
-
Other shared utility classes
Event is an atomic entry of the external logging. In the data model it includes the following fields:
-
id
(long) - ID of the message. It must be unique within a singleLoggingMethod
writer -
message
(string) - Contains the message body. It may be a single-line or a multi-line message. The message may also contain annotations. -
timestamp
(long) - Timestamp of the message, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC -
data
(Map<String, Serializable>
) - additional metadata. This metadata will be used to categorize event entries.
|
(svanoort) Timestamp when first part of message generated, or last? Or perhaps both? |
|
(svanoort) Why not enforce something simpler than Serializable for (meta)data which can be consumed by non-Java clients potentially, i.e. JSON perhaps? Better yet: JSON (or Map<String, JsonType> that gets converted to a JSON document) for most things, and a Map<String, Serializable> of JenkinsInternal stuff - with as little as possible in the latter category. To do cloud-native effectively, we need to have something consumable directly from the browser or that potentially be consumed by a non-Java source. (ab)use of java serialization is generally discouraged at this point - we should only be doing it where 100% required. Actually yeah, here’s my proposal: we have JSON field that can be trivially converted by Java since types will map over. Then we have a Map<String, Serializable> with serializables stored (edited)base64, which can be ignored by non-Java consumers. |
Some metadata is mandatory. Requited fields are defined below.
The following metadata entries are required to be present for all event entries:
-
jenkinsId
- Instance ID of the Jenkins instance -
loggableType
- Type of theLoggable
entry, e.g.run
-
loggableId
- Identifier of theLoggable
entry
|
(svanoort) Re: |
The following metadata entries are required to be present
for all hudson.model.Run
event entries:
-
All event metadata
-
jobId
- ID of the Job, should be provided by Unique ID Plugin
The plugin should offer the following extension points:
-
ExternalLoggingMethodFactory
- producedExternalLoggingMethod
instances for Runs and, eventually, other objects. This extension point implementsDescribable
and should be configurable via UI. -
ExternalLogBrowserFactory
- Same as above, but forExternalLogBrowser
This class defines how the logs should be sent to the storage. Logging method generally does not define the storage itself, because it may be pointing to intermediate log collectors like Fluentd or Logstash.
Produces ExternalLoggingMethod
instances for Runs and, eventually, other objects.
This extension point implements Describable
and should be configurable via UI.
ExternalLoggingMethod is a new extension point on the top of LoggingMethod
:
-
It implements
LoggingMethod
interfaces -
It provides a new
createWriter()
method, which produces aExternalLoggingEventWriter
class instance.-
This method automatically injects the mandatory metadata for events
-
An abstract
_createWriter()
method is offered for implementation by downstream implementations
-
-
Being compared to
LoggingMethod
, the implementation is designed to always perform logging from the agent side
ExternalLoggingEventWriter class:
-
The class is
Serializable
. It will be sent to the agent side. -
The class offers the following abstract methods:
-
void writeEvent(Event event) throws IOException
-
writing event to the remote storage
-
-
-
The class also stores metadata, which may be injected into the events
-
The class stores a Map of Serializable metadata entries
-
The class offers API, which allow setting the metadata. This API will be used by
ExternalLoggingMethod
implementations and other logic to provide additional metadata if required
-
|
(svanoort) Re: Map of Serializable metadata entries Similarly, consider a non-Java map of metadata that can be consumed by other sources, plus a smaller Java serialized grouping. This makes parsing easier for non-java consumers too |
Log Browser class is an instance, which refers ways to access the logs on the remote storage.
It should offer the following methods:
-
AnnotatedLargeText<T> overallLog()
- Get large text for the entire execution/run -
AnnotatedLargeText<T> stepLog(@CheckForNull String stepId, boolean completed)
- Get large text for a particular step
Some implementations should be also moved from Run
and generalized.
It will provide default convenience methods which can be overridden by implementations for better performance.
-
InputStream getLogInputStream() throws IOException
- gets the log as an input stream -
Reader getLogReader() throws IOException
- get the log as a Reader -
String getLog() throws IOException
- gets the entire log as a single String-
This method is deprecated in
hudson.model.Run
, and it should remain deprecated
-
-
List<String> getLog(int maxLines) throws IOException
- gets a number of log lines as a list of strings -
File getLogFile() throws IOException
- Compatibility method, which retrieves the log as aFile
.-
By default a temporary file will be created, unless an implementation offers something better
-
ExternalLogBrowser
will also provide an abstraction layer for
eventual consistency management.
This layer will be determined during reference implementation polishing.
The plugin should also implement Pipeline LogStorage
and LogStorageFactory
extension points
so that it transparently supports Pipeline with existing API.
Pipeline Storage JEP is documented in JEP-TODO.
The API Plugin should offer a convenience layer in order to support a number of most common logging providers like Logstash, Fluentd, Elasticsearch, AWS CloudWatch, etc.
This layer should provide the following features:
-
Utility classes for reading and writing JSON Events, including parsing/writing
ConsoleNote
objects -
Base classes for constructing JSON queries for fetching data
|
(svanoort) Consider GraphQL for an event querying API? |
Other convenience layers will be defined during prototyping.
External Logging API plugin should be fully configurable via WebUI and Configuration-as-Code Plugin. It includes:
-
Selection of
LoggingMethod
andLogStorage
factories -
Configuration of built-in External Logging and Log Browser factories
-
Any other configuration options
JEP-207 introduces a new API in the core for adding External logging features, but it does not provide neither configuration UI nor convenient API for implementing these storage engines. This API plugin does that. One may say that all bits in the current design could be implemented as a part of the Jenkins core. It is true, but detaching of the plugin has the following motivation:
-
The plugin will have a separate release cycle so that changes in it can be delivered and backported independently from the core’s release cycle
-
The approach allows keeping the patches on the core’s side minimal
-
The approach allows integrating with Pipeline Log Storage API introduced in Pipeline plugins (JEP-210)
All External Log Storage implementations are expected to extend this plugin
instead of just using API provided by the Core.
Core APIs may be still used to define custom LoggingMethodLocator
impelemtations,
e.g. to define a custom logger allocation logic.
After the initial prototyping it was decided to separate Logging Method and LogBrowser
to separate pluggable entities.
It is different from how Pipeline LogStorage
is implemented in
this pull request.
Reasons for such approach:
-
ExternalLoggingMethod
does not define where logs will be actually stored. For example, logging to Fluentd or Logstash may end up in various storages depending on their configuration (e.g. in Elasticsearch, Redis, AWS CloudWatch, etc.) -
Log browsing logic may be shared. E.g. with the current design logs can be browsed from Elasticsearch independently of how the logs get there (Logstash or direct push)
-
It gives more flexibility to Jenkins admins and plugin developers
Originally the separation was done inside the Jenkins Core as a part of JEP-207, but then it was decided to move it to External Logging API.
Jenkins project usually operates with logs as data streams and lines, especially on the agent side. On the other hand, modern log storage systems operate with "events" - atomic objects which may include multi-line strings and various metadata. Example: exception stacktraces may go to log storage as a single event and then they can be processed by external systems like Logstash if needed. The idea in this plugin is to offer bridge logic which converts stream-based logging into event-based logic.
Jesse Glick has raised the concern that Event layer may not be helpful taking the current state, because a lot of code would need to be updated so that the events get captured properly. Opinion of the JEP sponsor is that this JEP offers a foundation layer so that it may be implemented. Reworking the entire Jenkins API to events is NOT an objective for this JEP-212, but it may be added in subsequent JEPs.
Many popular log storage engines store events in a JSON format: Fluentd, Logstash, Elasticsearch, AWS CloudWatch, etc. Offering a JSON layer as a part of the API plugin could greatly simplify such implementations.
Some target storages are eventually consistent. One cannot just write the data to remote storage and then reliably read it. It is critical for log browsers:
-
When a run finishes, querying data does not guarantee we get all the data
-
Loggable#isCompleted()
call is not enough, some entries may be missing for "completed" entries
Such issue explains why we need a special watch layer to determine whether logs are actually completed. It explains why we need a special logic/queries to determine whether the log is actually complete.
|
(svanoort) Polling-based or notification-based watch layer? This strikes me as a wrinkle that may be deceptively complex. |
Full logic for the layer should be determined during reference implementation.
The External Logging API plugin will follow the compatibility requirements defined in the upstream JEP-TODO for the core. It will also offer API for plugins, which will allow reporting incompatibilities.
There is no special security requirements defined at this level. JEP-207 defines top-level security requirements.
There is no special infrastructure requirements defined for this JEP. Subsequent JEPs for the implementations may define such infrastructure requirements.
All tests will be implemented using Jenkins Test Harness or Acceptance Test Harness (ATH) frameworks.
The following use-cases must be covered:
-
Backward compatibility
-
Upgradeability - upgraded instances use the Filesystem Storage by default
-
Smoke tests - logging Method locators are invoked for new runs
Once JENKINS-TODO is implemented, integration tests with External Task Logging for Logstash Plugin
and other reference implementations should be added to the
essentialsTest()
run.
-
JEP-207 - External Build Logging support in the Jenkins Core