
New component json log exporter #10836

Closed
yotamloe opened this issue Jun 8, 2022 · 12 comments

@yotamloe (Contributor)

yotamloe commented Jun 8, 2022

The purpose and use-cases of the new component

This exporter lets you send log records as JSON bytes to a generic HTTP(S) endpoint.
It will be similar to the Fluent Bit and Fluentd HTTP outputs.
An example use case is forwarding logs to log management backends via Logstash (example1, example2).

Our team at Logz.io has already implemented such an exporter in our vendor-specific component logzioexporter (ref from my fork) to support logging with the OpenTelemetry Collector. We thought others in the community could find it useful, so we decided to contribute the log exporter as a vendor-neutral component if the community approves it.

Example configuration for the component

exporters:
  jsonlog:
    endpoint: https://api.example.com/logs
    gzip_compression: true

Plus exporterhelper options
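For illustration, the same configuration combined with the standard exporterhelper options might look like the sketch below (the `jsonlog` exporter itself is the proposal here and does not exist yet; the `timeout`, `retry_on_failure`, and `sending_queue` fields are the usual exporterhelper settings):

```yaml
exporters:
  jsonlog:                          # hypothetical exporter from this proposal
    endpoint: https://api.example.com/logs
    gzip_compression: true
    # standard exporterhelper options
    timeout: 10s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
    sending_queue:
      enabled: true
      queue_size: 5000
```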

Telemetry data types supported

Logs

@mx-psi added the "Sponsor Needed" (new component seeking sponsor) label on Jun 8, 2022
@djaglowski (Member)

What defines the format used here?

We have an experimental spec for serializing signals to json, here. Could this component use that format instead?

@bogdandrutu (Member)

Should this be part of a "file exporter" that allows exporting any signal in proto/JSON, using any "marshaler" that we support? @djaglowski, do we have such a thing?

@djaglowski (Member)

Not to my knowledge. There would be a lot of overlap between exporting signals as JSON to a file and exporting signals as JSON to an endpoint. They would likely share code, but should probably still be separate exporters.

@yotamloe (Contributor, Author)

yotamloe commented Jun 8, 2022

What defines the format used here?
We have an experimental spec for serializing signals to json, here. Could this component use that format instead?

@djaglowski Thanks for replying.
Right now the implementation I mentioned supports the Logz.io-specific log format (@timestamp and message fields, for example). It would be easy to change it to the format you mentioned, if you think that would be more useful for the community.

@atoulme (Contributor)

atoulme commented Jun 8, 2022

gzip_comprassion -> gzip_compression

See #7840 as well.

@tigrannajaryan (Member)

Is this some specific JSON format that differs from OTLP/JSON? If it is OTLP/JSON, then I think the best place for it is otlphttpexporter. We can add an option to that exporter to specify whether to use binary Protobuf or JSON as the format.

@yotamloe (Contributor, Author)

yotamloe commented Jun 9, 2022

Hi @tigrannajaryan

The format we use for a Logz.io single log entry is essentially a map[string]interface{} that includes the log record fields (severityNumber, traceId, spanId, timestamp, etc.), log record attributes, and resource attributes associated with the log record. In addition, we also try to flatten the log body if the value type is ValueTypeMap (code), and add more log fields.

Our general approach is to separate ExportLogsServiceRequest into individual log entries with all the metadata that we can collect.

Take this JSON representation of ExportLogsServiceRequest for example:

{
  "resourceLogs": [
    {
      "resource": {
        "attributes": [
          {
            "key": "resource-attr",
            "value": {
              "stringValue": "resource-attr-val-1"
            }
          }
        ]
      },
      "instrumentationLibraryLogs": [
        {
          "instrumentationLibrary": {},
          "logs": [
            {
              "timeUnixNano": "1581452773000000789",
              "severityNumber": "SEVERITY_NUMBER_INFO",
              "severityText": "Info",
              "name": "logA",
              "body": {
                "stringValue": "This is a log message"
              },
              "attributes": [
                {
                  "key": "app",
                  "value": {
                    "stringValue": "server"
                  }
                },
                {
                  "key": "instance_num",
                  "value": {
                    "intValue": "1"
                  }
                }
              ],
              "droppedAttributesCount": 1,
              "traceId": "08040201000000000000000000000000",
              "spanId": "0102040800000000"
            }
          ]
        }
      ]
    }
  ]
}

Will be converted to this format:

{
  "@timestamp": "1581452773000000789",
  "traceId": "08040201000000000000000000000000",
  "spanId": "0102040800000000",
  "level": "Info",
  "resource-attr": "resource-attr-val-1",
  "app": "server",
  "instance_num": "1",
  "message": "This is a log message"
}

If the ExportLogsServiceRequest contains more than one log, the JSON logs will be appended to the exported byte array, separated by newlines. (code)
I hope that makes sense.
Please feel free to share any comments or suggestions.

@tigrannajaryan (Member)

Will be converted to this format...

This conversion loses data (e.g. instrumentationLibrary/Scope data is missing) and has ambiguities (what happens if I have a LogRecord attribute named "resource-attr" or named "spanId"?). I do not see a justification for choosing this format as a canonical format endorsed by the Otel Collector.
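The ambiguity can be shown with a short sketch. Plain string maps stand in for pdata here, and the merge order is an assumption for illustration, but any flat map has some such collision behavior:

```go
package main

import "fmt"

// flatLog builds a flat representation: top-level record fields first,
// then log attributes merged over them. The key names mirror the flat
// format shown above; the overwrite order is an illustrative assumption.
func flatLog(spanID string, attrs map[string]string) map[string]string {
	flat := map[string]string{"spanId": spanID}
	for k, v := range attrs {
		flat[k] = v // an attribute named "spanId" silently clobbers the real one
	}
	return flat
}

func main() {
	flat := flatLog("0102040800000000",
		map[string]string{"spanId": "not-a-real-span"})
	fmt.Println(flat["spanId"]) // the real span ID is lost
}
```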

IMO, if we want to have an exporter that is given the generic name "jsonexporter" then it needs to:

  1. Be lossless and capable of unambiguously representing all pdata.
  2. If it is not OTLP/JSON, have a clear justification for why it deviates from it.
  3. Follow Otel recommendations for trace context in text formats (see e.g. "traceId" vs "trace_id").
  4. Follow Otel terminology (why "level" instead of "severity"?).

For now I don't see 1-4 fulfilled.

My preference to move forward with this initiative would be to:

  1. Add OTLP/JSON support to otlphttpexporter.
  2. If we want to give the user another JSON exporter with more control to choose the shape of the emitted JSON data then implement a new httpexporter or jsonhttpexporter, where the user has knobs to customize the shape of the created JSON data. If a specific JSON structure is somehow significant (how?) then make it the default for the new exporter. If this JSON structure proves to be nice and useful, make an OTEP to propose it as some form of standard for Otel.
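As a sketch of option 1, an encoding knob on otlphttpexporter could look like the fragment below (the `encoding` option name is illustrative of the proposal, not an existing setting at the time of this comment):

```yaml
exporters:
  otlphttp:
    endpoint: https://collector.example.com:4318
    # hypothetical knob: serialize payloads as OTLP/JSON instead of binary Protobuf
    encoding: json
```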

@yotamloe (Contributor, Author)

yotamloe commented Jun 9, 2022

Sure that makes sense.
I was presenting the (beta) format we use for vendor-specific preferences. I agree that this initiative should follow 1-4. Thanks for the feedback!

@halcyondude

halcyondude commented Jul 13, 2022

+1. In particular, it is critical that the JSON exporter retains full data fidelity and is lossless.

Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates (and therefore reduced media sizes).

@github-actions (bot)

github-actions bot commented Nov 9, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions (bot)

This issue has been closed as inactive because it has been stale for 120 days with no activity.

7 participants