Telemetry with two implementations: basic usage logging and opentelemetry.io observability #92

Merged
merged 17 commits into from
Jan 23, 2024

nls-jajuko
Collaborator

@nls-jajuko nls-jajuko commented Jan 22, 2024

Fixes #90

This pull request adds telemetry to hakunapi.

  1. Adds interfaces, a configuration parser, a factory, and a no-op implementation to hakunapi-core, in package fi.nls.hakunapi.telemetry.

Example configuration

telemetry.mode=log-json
telemetry.logger=fi.nls.hakunapi.telemetry
telemetry.collections=*
telemetry.fields=username
telemetry.fields.username.header=remote-user

  2. Adds a new module, hakunapi-telemetry, which provides Log4j-based telemetry for basic usage logging.
  • Configuration: telemetry.mode=log-json
  • Example log output:
2024-01-22 15:23:16.992|INFO |fi.nls.hakunapi.telemetry|{"example-collection":5,"username":"testuser"}
  3. Adds a new module, hakunapi-telemetry-webapp-javax.
  • The module contains no code.
  • It is used only to pull in dependencies and telemetry log configuration for basic usage logging.
  • The general idea is to keep the default webapp free of extra telemetry dependencies.
  4. Adds a new module, hakunapi-telemetry-opentelemetry, which adds basic observability to hakunapi requests using https://opentelemetry.io/.

Example log output (using io.opentelemetry.exporter.logging.LoggingSpanExporter):

INFO: 'example-collection' : 7d5030e9639f7ccd445179778619a7f7 d0e0ce8d3b9ee4b8 SERVER [tracer: fi.nls.hakunapi.telemetry:] AttributesMap{data={count=5}, capacity=128, totalAddedValues=1}

@nls-jajuko nls-jajuko added this to the 1.3.0 milestone Jan 22, 2024
@nls-jajuko nls-jajuko added the enhancement New feature or request label Jan 22, 2024
@nls-jajuko nls-jajuko changed the title from "Telemetry with two implementations logging and opentelemetry.io" to "Telemetry with two implementations: basic usage logging and opentelemetry.io observability" Jan 22, 2024
telemetry.setHeaders(headersMap);
if (headers != null) {
for (String header : headers) {
String lookup = parser.get(String.format(TELEMETRY_FIELDS_HEADER, header), header);
Collaborator

Took a while to understand how this works, but it should.
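
For readers following along, here is a minimal sketch of what that lookup appears to do. It is an illustration, not the PR's code: the key pattern `telemetry.fields.%s.header` is inferred from the example configuration, `java.util.Properties` stands in for the project's own parser, and the class name, method name, and direction of the resulting map are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class HeaderLookupSketch {

    // Assumed key pattern, inferred from "telemetry.fields.username.header" in the example configuration.
    static final String TELEMETRY_FIELDS_HEADER = "telemetry.fields.%s.header";

    /** Maps each configured telemetry field to the HTTP header its value would be read from. */
    static Map<String, String> resolveHeaders(Properties config, String[] fields) {
        Map<String, String> headersMap = new LinkedHashMap<>();
        if (fields != null) {
            for (String field : fields) {
                String key = String.format(TELEMETRY_FIELDS_HEADER, field);
                // When no explicit header is configured, the field name doubles as the header name.
                String lookup = config.getProperty(key, field);
                headersMap.put(field, lookup);
            }
        }
        return headersMap;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("telemetry.fields", "username");
        config.setProperty("telemetry.fields.username.header", "remote-user");

        String[] fields = config.getProperty("telemetry.fields").split(",");
        System.out.println(resolveHeaders(config, fields));
    }
}
```

With the example configuration, the field `username` resolves to the header `remote-user`; a field without a `.header` entry falls back to its own name.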

if (collectionsWildcard == null) {
return ServiceTelemetry.NOP;

} else if (service != null && "*".equals(collectionsWildcard)) {
Collaborator

If service is null here, then everything is probably badly messed up and we should just fail.
Should use TELEMETRY_COLLECTIONS_WILDCARD instead of "*".

Collaborator Author

fixed
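
The thread does not show the fix itself, so the following is only a sketch of what the two suggestions could look like together: fail fast on a null service and compare against the named constant. The `Mode` enum, the `resolve` method, and the exception message are hypothetical and exist only to keep the example self-contained.

```java
import java.util.Objects;

class WildcardCheckSketch {

    // Value implied by the review comment: the constant replaces the literal "*".
    static final String TELEMETRY_COLLECTIONS_WILDCARD = "*";

    // Hypothetical stand-in for the different ServiceTelemetry variants.
    enum Mode { NOP, ALL_COLLECTIONS, LISTED_COLLECTIONS }

    static Mode resolve(Object service, String collectionsWildcard) {
        if (collectionsWildcard == null) {
            // Telemetry is not configured for any collection.
            return Mode.NOP;
        }
        // A null service at this point indicates a broken setup, so fail instead of continuing.
        Objects.requireNonNull(service, "service must be initialised before telemetry is configured");
        if (TELEMETRY_COLLECTIONS_WILDCARD.equals(collectionsWildcard)) {
            return Mode.ALL_COLLECTIONS;
        }
        return Mode.LISTED_COLLECTIONS;
    }
}
```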


public interface TelemetrySpan extends AutoCloseable {

default void counts(int count) {};
Collaborator

Not sure if the default implementation should be a NOP. I would've marked these as required to implement and only implemented these as empty functions in the NOP implementation.

Collaborator Author

I'll remove default method declarations

Collaborator Author

fixed
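
A rough sketch of the shape suggested above, with no default methods on the interface and the empty bodies living only in the NOP implementation. The merged code may differ: the wrapper class, the `NOP` field name, and the narrowing of `close()` so that it declares no checked exception are all assumptions made for this example.

```java
class NopSpanSketch {

    /** Every implementation must now provide counts(); nothing is silently inherited. */
    interface TelemetrySpan extends AutoCloseable {

        void counts(int count);

        // Assumption: close() is narrowed so callers need no catch block for a checked exception.
        @Override
        void close();
    }

    /** The only place where the methods are intentionally empty. */
    static final class NopTelemetrySpan implements TelemetrySpan {

        @Override
        public void counts(int count) {
            // Intentionally does nothing.
        }

        @Override
        public void close() {
            // Intentionally does nothing.
        }
    }

    static final TelemetrySpan NOP = new NopTelemetrySpan();
}
```

The practical effect is that a new implementation that forgets `counts` fails to compile instead of quietly dropping data.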

request.getFormat().getResponseHeaders(request).forEach((k, v) -> builder.header(k, v));
builder.entity(baos.toByteArray());
return builder.build();
span.counts(1);
Collaborator

If telemetry fails then the whole operation fails. This seems more auditlog-gy (logging the operation is an essential part of the operation) - this is probably by design, just wanted to make sure.

Collaborator Author

I think telemetry should not prevent operation.
Maybe we'll need a bunch of try catch-all additions to telemetry?

Collaborator

In GetCollectionItemById it's probably simple as we prefetch the feature into memory and always call counts(1), so probably something like:

final Response response = builder.entity(baos.toByteArray()).build();
// Nothing can fail after this line, except the Telemetry, but we can catch it
try (TelemetrySpan span = ftt.span()) {
  span.counts(1);
} catch (Exception e) {
  // log or ignore the exception
}
return response;

Collaborator Author

Spans have the ability to record the duration of operations.
We would lose that if we are not wrapping the actual operations with spans.

Maybe there should be a contract that "telemetry shall never fail"?

Collaborator

@jampukka jampukka Jan 23, 2024

In GetCollectionItemsOperation it is probably a bit trickier due to streaming. I guess we could move WriteReport report outside of the inner try block, e.g.:

WriteReport report = null;
try (FeatureStream features = producer.getFeatures(request, c);
                FeatureCollectionWriter writer = request.getFormat().getFeatureCollectionWriter()) {
    ...
    report = SimpleFeatureWriter.writeFeatureCollection(writer, ft, c.getProperties(), features, request.getOffset(), request.getLimit());
    ...
}
if (report != null) {
    try (TelemetrySpan span = ftt.span()) {
        span.counts(report);
    } catch (Exception e) {
        // Log or ignore
    }
}

Collaborator Author

same comment as before - we would lose duration tracking with that change

Collaborator

Oh okay. Yes then it would be best that telemetry implementations can never throw exceptions (checked or unchecked)

Collaborator Author

kept this as is to have the possibility to trace spans and span durations
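
One way the "telemetry shall never fail" contract could be enforced while the span still wraps the whole operation is a decorator that swallows whatever the real span throws. This is a sketch under assumptions, not what the PR does: `SafeSpan`, `handleRequest`, the logger name, and the simplified `TelemetrySpan` interface (with `close()` declaring no checked exception) are all hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class SafeSpanSketch {

    private static final Logger LOG = Logger.getLogger("telemetry-sketch");

    /** Simplified stand-in for the span interface discussed in this thread. */
    interface TelemetrySpan extends AutoCloseable {
        void counts(int count);

        @Override
        void close();
    }

    /** Forwards to the real span and never lets its failures reach the caller. */
    static final class SafeSpan implements TelemetrySpan {
        private final TelemetrySpan delegate;

        SafeSpan(TelemetrySpan delegate) {
            this.delegate = delegate;
        }

        @Override
        public void counts(int count) {
            try {
                delegate.counts(count);
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "telemetry counts() failed, ignoring", e);
            }
        }

        @Override
        public void close() {
            try {
                delegate.close();
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "telemetry close() failed, ignoring", e);
            }
        }
    }

    /** The span still surrounds the operation, so an implementation can measure its duration. */
    static String handleRequest(TelemetrySpan rawSpan) {
        try (TelemetrySpan span = new SafeSpan(rawSpan)) {
            String body = "feature"; // placeholder for building the real response
            span.counts(1);
            return body;
        }
    }
}
```

The trade-off is that telemetry errors become visible only in the log, which matches the "telemetry should not prevent operation" position rather than the audit-log one.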


int srid = request.getSRID();
int srid = request.getSRID();
Collaborator

Some whitespace changes here make it harder to tell what actually changed in this file. Not necessary to change, but we should look into moving towards some project-wide linting solution.

Collaborator Author

@nls-jajuko nls-jajuko Jan 23, 2024

Yes, that would be nice.
I had to reindent everything when coding jsonfg due to missing indentation settings, sorry about that.

public class TracingConfiguration {

/** The number of milliseconds between metric exports. */
private static final long METRIC_EXPORT_INTERVAL_MS = 800L;
Collaborator

Should this be configurable?

Collaborator Author

@nls-jajuko nls-jajuko Jan 23, 2024

Yes
I'll add that to parse()

Collaborator Author

fixed
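
As an illustration of making the interval configurable, the sketch below reads it from a property and falls back to the previous hard-coded 800 ms. The property key `telemetry.metric.export.interval.ms` is a made-up name, `java.util.Properties` stands in for the project's configuration parser, and the choice to fall back on invalid values is an assumption; the key and behaviour in the merged code may differ.

```java
import java.time.Duration;
import java.util.Properties;

class ExportIntervalSketch {

    // Hypothetical property key, used only for this example.
    static final String METRIC_EXPORT_INTERVAL_KEY = "telemetry.metric.export.interval.ms";

    /** Default taken from the constant shown in the diff above. */
    static final long DEFAULT_METRIC_EXPORT_INTERVAL_MS = 800L;

    /** Returns the configured interval, or the default when the value is missing or invalid. */
    static Duration parseExportInterval(Properties config) {
        Duration fallback = Duration.ofMillis(DEFAULT_METRIC_EXPORT_INTERVAL_MS);
        String raw = config.getProperty(METRIC_EXPORT_INTERVAL_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            long millis = Long.parseLong(raw.trim());
            // A zero or negative interval makes no sense for a periodic export.
            return millis > 0 ? Duration.ofMillis(millis) : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```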

@nls-jajuko
Collaborator Author

Added lifecycle methods for ServiceTelemetry and added telemetry to the toClose collection in the context listener.

@jampukka
Collaborator

Looks good to me

@jampukka jampukka merged commit 7e098a8 into nlsfi:main Jan 23, 2024
2 checks passed
@nls-jajuko nls-jajuko deleted the telemetry branch March 25, 2024 09:41