Filelog receiver looses characters #31512

Closed
eloo-abi opened this issue Feb 29, 2024 · 16 comments
Labels
bug Something isn't working receiver/filelog

Comments

@eloo-abi

Component(s)

receiver/filelog

What happened?

Description

Hi,
we have been using the filelog receiver for a few weeks to collect our Kubernetes pod logs (JSON) and push them into Elasticsearch.
Recently we found that some characters are being lost for unknown reasons.

So far we have only observed this with huge log entries (~800 KB).

Steps to Reproduce

Expected Result

Actual Result

Here is a snippet of our log as output by the otel-collector:

"relativeEol":"2028-07-01T06:00:00Z","filter":{"reasons":null,"isPurchasable":true},"level,"isCompatible":true,"isLifetime":false,

As you can see, "level" has no closing quote, so the JSON here got corrupted.
We are also not sure whether more characters were lost here.

We have seen a different example where around 20 characters were lost (manually compared from the console log to the output of the otel collector).

So far we are not sure what could cause this issue, as the JSON in the console looks good.

Our log entries generally have the following structure. The issue occurs (at least, we have only observed it there) in the "short_message" field:

{
    "@timestamp": "...",
    "@version": "1",
    "short_message": " THIS IS THE LONG FIELD (700k characters) WITH THE LOST CHARACTERS",
    "logger_name": "...",
    "thread_name": "...",
    "level_name": "INFO",
    "level_value": 20000
    ...
}

Right now we cannot rule out that this missing-characters issue also affects smaller log entries.
So it's hard to determine at the moment whether the logs are reliable.
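
One way to narrow this down, as a rough sketch only: run every exported log body through a JSON parser and report the entries that fail, together with the character position where parsing stopped. The function name `find_corrupt_entries` and the two sample bodies are hypothetical; the second sample merely imitates the truncated "level" key from the snippet above.

```python
import json

def find_corrupt_entries(bodies):
    """Return (index, error position) for every body that is not valid JSON."""
    corrupt = []
    for index, body in enumerate(bodies):
        try:
            json.loads(body)
        except json.JSONDecodeError as err:
            # err.pos is the character offset where the parser gave up.
            corrupt.append((index, err.pos))
    return corrupt

# Hypothetical sample bodies; the second one has a key without a closing quote.
sample = [
    '{"filter":{"reasons":null,"isPurchasable":true},"level":"INFO","isCompatible":true}',
    '{"filter":{"reasons":null,"isPurchasable":true},"level,"isCompatible":true}',
]
print(find_corrupt_entries(sample))
```

This only detects lost characters that break the JSON syntax; characters lost inside a string value would need a comparison against the original file.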

Is anyone else observing this? Or are there known limitations?

Thanks

### Collector version

0.95.0

### Environment information

## Environment
Running as DaemonSet in Kubernetes (EKS)


### OpenTelemetry Collector configuration

```yaml
connectors:
  forward/...: {}
exporters:
  debug: {}
  logging:
    loglevel: info
extensions:
  health_check:
    endpoint: ${env:MY_POD_IP}:13133
processors:
  attributes/security-log:
    actions:
    - action: upsert
      key: tags
      value:
      - security_log
  batch: {}
  filter/namespace-abc:
    logs:
      exclude:
        match_type: regexp
        resource_attributes:
        - key: k8s.container.name
          value: (linkerd.+|mysql-jump-pod.+)
      include:
        match_type: strict
        resource_attributes:
        - key: k8s.namespace.name
          value: abc
  memory_limiter:
    check_interval: 5s
    limit_percentage: 80
    spike_limit_percentage: 25
  transform/json:
    error_mode: ignore
    log_statements:
    - context: log
      statements:
      - merge_maps(attributes, ParseJSON(body), \"upsert\") where IsMatch(body, \"^\\\\{.+\\\\}(\\\)?$\")
      - set(severity_text, attributes[\"level\"]) where attributes[\"level\"] != nil
      - set(body, \"\") where attributes[\"service\"] != nil
receivers:
  filelog:
    exclude: []
    include:
    - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
    - id: get-format
      routes:
      - expr: body matches \"^\\\\{\"
        output: parser-docker
      - expr: body matches \"^[^ Z]+ \"
        output: parser-crio
      - expr: body matches \"^[^ Z]+Z\"
        output: parser-containerd
      type: router
    - id: parser-crio
      regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: 2006-01-02T15:04:05.999999999Z07:00
        layout_type: gotime
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: \"\"
      id: crio-recombine
      is_last_entry: attributes.logtag == 'F'
      max_log_size: 0
      output: extract_metadata_from_filepath
      source_identifier: attributes[\"log.file.path\"]
      type: recombine
    - id: parser-containerd
      regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: \"\"
      id: containerd-recombine
      is_last_entry: attributes.logtag == 'F'
      max_log_size: 0
      output: extract_metadata_from_filepath
      source_identifier: attributes[\"log.file.path\"]
      type: recombine
    - id: parser-docker
      output: extract_metadata_from_filepath
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: json_parser
    - id: extract_metadata_from_filepath
      parse_from: attributes[\"log.file.path\"]
      regex: ^.*\\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\\-]+)\\/(?P<container_name>[^\\._]+)\\/(?P<restart_count>\\d+)\\.log$
      type: regex_parser
    - from: attributes.stream
      to: attributes[\"log.iostream\"]
      type: move
    - from: attributes.container_name
      to: resource[\"k8s.container.name\"]
      type: move
    - from: attributes.namespace
      to: resource[\"k8s.namespace.name\"]
      type: move
    - from: attributes.pod_name
      to: resource[\"k8s.pod.name\"]
      type: move
    - from: attributes.restart_count
      to: resource[\"k8s.container.restart_count\"]
      type: move
    - from: attributes.uid
      to: resource[\"k8s.pod.uid\"]
      type: move
    - from: attributes.log
      to: body
      type: move
    retry_on_failure:
      enabled: true
    start_at: end
service:
  extensions:
  - health_check
  - oidc
  pipelines:
    logs:
      exporters:
      - debug
      processors:
      - memory_limiter
      - batch
      receivers:
      - otlp
      - filelog
    logs/namespace-abc:
      exporters:
      - elasticsearch/abc
      processors:
      - filter/namespace-abc
      - transform/json
      - memory_limiter
      - batch
      receivers:
      - filelog
```

Log output

No response

Additional context

Maybe we should note that we have changed the max_log_size of the crio-recombine operator to 0:

    - combine_field: attributes.log
      combine_with: \"\"
      id: crio-recombine
      is_last_entry: attributes.logtag == 'F'
      max_log_size: 0

We did this because otherwise our logs would be split into multiple log entries (the Helm chart default was around 100k).

With this setting the logs have improved in general, but now we run into the missing-characters issue.
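
For readers unfamiliar with the recombine step, here is a minimal sketch of the idea behind the settings above: partial entries from the same file are concatenated until an entry with logtag 'F' arrives. This is an illustration under assumptions, not the operator's implementation; the entry shape (dicts with `path`, `logtag`, `log`) and the function name `recombine` are hypothetical, and size limits such as max_log_size are not modelled.

```python
from collections import defaultdict

def recombine(entries):
    """Join partial entries per source path until one with logtag 'F' arrives."""
    pending = defaultdict(list)
    for entry in entries:
        pending[entry["path"]].append(entry["log"])
        if entry["logtag"] == "F":
            # combine_with is "" in the config, so the parts are joined directly.
            yield entry["path"], "".join(pending.pop(entry["path"]))

# Hypothetical input: file a/0.log carries one JSON line split into two parts.
parts = [
    {"path": "a/0.log", "logtag": "P", "log": '{"short_message": "abc'},
    {"path": "b/0.log", "logtag": "F", "log": "other pod"},
    {"path": "a/0.log", "logtag": "F", "log": 'def"}'},
]
print(list(recombine(parts)))
```

Grouping by path mirrors the source_identifier setting: parts from different files must not be mixed, even when their entries interleave.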

@eloo-abi eloo-abi added bug Something isn't working needs triage New item requiring triage labels Feb 29, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

Thanks for reporting @eloo-abi.

manually compared from the console log to the output of otel collector

Can you clarify whether you are tailing the log files and viewing them independently of the collector, vs comparing to output of the debug exporter?


In general, it would be helpful if we can reduce the complexity of the problem space.

Are you able to capture a log file which is not being parsed correctly and then read it through a simpler configuration? This would demonstrate whether the problem is indeed due to the filelog receiver, vs some parsing or other problem. For example, something like below:

exporters:
  debug: {}
receivers:
  filelog:
    include:
    - local/copy.log
    include_file_path: true
    # no operators
service:
  pipelines:
    logs:
      receivers:
      - filelog
      exporters:
      - debug

@JDMooreMN

JDMooreMN commented Mar 19, 2024

I believe I am running into this issue too; my pipeline is very similar, extracting containerd logs from a k8s platform. I think it is related to the size of the message being sent to the regex_parser.

After some testing, here is what I observed: when a log event is longer than 16385 characters, the excess characters seem to be trimmed off and sent to the regex_parser as a separate log event.

Example Log Event with 16385 characters

2024-03-19T11:21:00.839338492-05:00 stdout P 2024-03-13 11:51:00,838 [scheduler-2] INFO  

Example Log Event with 16431 characters

2024-03-19T11:21:00.839338492-05:00 stdout P 2024-03-13 11:51:00,838 [scheduler-2] INFO  

Error in Log File

2024-03-19T16:21:52.232Z	error	helper/transformer.go:98	Failed to process entry	{"kind": "receiver", "name": "filelog/pod", "data_type": "logs", "operator_id": "parser-containerd", "operator_type": "regex_parser", "error": "regex pattern does not match", "action": "send", "entry": {"observed_timestamp":"2024-03-19T16:21:52.232389977Z","timestamp":"0001-01-01T00:00:00Z","body":"yfGWG8aXlbNNKW0iw2e5XVDb6RqBg7LLUAbDH5x8WM3OT2","attributes":{"container_name":"hello","log.file.path":"/var/log/pods/XXXXXXXXXXX/hello/0.log","namespace":"XXXXXX-testing","pod_name":"hello-app-68ffdb8cdc-7wlwp","restart_count":"0","uid":"cc7ff6c8-c5c4-494d-ab00-f67d42f17532"},"severity":0,"scope_name":""}}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*TransformerOperator).HandleEntryError
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/transformer.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ParseWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/parser.go:140
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/parser.go:112
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/parser.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/regex.(*Parser).Process
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/parser/regex/regex.go:99
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/writer.go:53
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/parser.go:122
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/parser.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/regex.(*Parser).Process
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/parser/regex/regex.go:99
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/helper/writer.go:53
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/file.(*Input).emit
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/operator/input/file/file.go:52
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/reader.(*Reader).ReadToEnd
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/fileconsumer/internal/reader/reader.go:106
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).consume.func1
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.89.0/fileconsumer/file.go:174

@djaglowski
Member

The cutoff may be related to scanner.DefaultBufferSize but I don't see the mechanism. Maybe something @ChrsMark or @OverOrion could look into?

@OverOrion
Contributor

OverOrion commented Mar 26, 2024

Hey @djaglowski, sure thing, I'll take a look at it 👀

@ChrsMark
Member

ChrsMark commented Mar 26, 2024

I ran some tests and it seems that there is indeed a problem with the regex_parser, as @JDMooreMN mentioned. However, I'm not sure if this is the cause of the originally reported issue.

So with the following config there is no error:

receivers:
  filelog:
    start_at: beginning
    include:
    - /var/log/busybox/long_files/xl_long_file.log

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []

But when I add a parser, it fails:

receivers:
  filelog:
    start_at: beginning
    include:
    - /var/log/busybox/long_files/xl_long_file.log
    operators:
      - id: parser-containerd
        regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: regex_parser

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []

The error I see is the same as in #31512 (comment):

2024-03-26T11:46:26.227+0200	error	helper/transformer.go:98	Failed to process entry	{"kind": "receiver", "name": "filelog", "data_type": "logs", "operator_id": "parser-containerd", "operator_type": "regex_parser", "error": "regex pattern does not match", "action": "send"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*TransformerOperator).HandleEntryError
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/operator/helper/transformer.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ParseWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/operator/helper/parser.go:140
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/operator/helper/parser.go:112
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/operator/helper/parser.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/regex.(*Parser).Process
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/operator/parser/regex/regex.go:106
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/operator/helper/writer.go:53
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/file.(*Input).emit
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/operator/input/file/file.go:52
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/reader.(*Reader).ReadToEnd
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/fileconsumer/internal/reader/reader.go:89
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).consume.func1
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.96.0/fileconsumer/file.go:181

Note that the error is not fatal, and the record is logged in the console, but apparently it's not properly parsed.

However, in my case it seems that it even fails for logs with less than 16385 characters like the /var/log/busybox/long_files/s_long_file.log 🤔 :

wc -c /var/log/busybox/long_files/*.log                                         
16385 /var/log/busybox/long_files/long_file.log
14698 /var/log/busybox/long_files/s_long_file.log
16431 /var/log/busybox/long_files/xl_long_file.log

@ChrsMark
Member

ChrsMark commented Mar 26, 2024

It seems that a log line like 2024-03-19T11:21:00.839338492-05:00 stdout P 2024-03-13 11:51:00,838 [scheduler-2] INFO dLphJ63kHp cannot be parsed by the containerd regexp parser ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$.

A proper containerd log looks like the following:

2021-06-22T10:27:25.813799277Z stdout F some log

[image: regexp]

It could be the timestamp 2024-03-19T11:21:00.839338492-05:00 part 🤔.

If I extend the proper containerd log to the size of 16527 it does not fail.

@JDMooreMN could you double check this please, so that we can establish whether the problem you spotted is a valid regexp mismatch or is related to the size of the log?

@OverOrion
Contributor

As @ChrsMark has already pointed out, @JDMooreMN, your problem is likely with the regexp:
^(?P<time>[^ ^Z]+Z) requires the timestamp field to end with a literal Z (the + is a quantifier on the preceding character class). But in your case it seems to be an ISO 8601 timestamp with a numeric UTC offset, so in order to support both you should use this one:
^(?P<time>[^ ]+(?:Z|[-+]\d{2}:\d{2})) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
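
The difference between the two patterns can be checked quickly outside the collector. The sketch below uses Python's `re` module as a stand-in for Go's regexp engine (an assumption; both accept this named-group syntax, and the patterns behave the same on these two inputs). The sample lines are the ones quoted earlier in this thread.

```python
import re

ORIGINAL = re.compile(
    r"^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$"
)
SUGGESTED = re.compile(
    r"^(?P<time>[^ ]+(?:Z|[-+]\d{2}:\d{2})) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$"
)

utc_line = "2021-06-22T10:27:25.813799277Z stdout F some log"
offset_line = (
    "2024-03-19T11:21:00.839338492-05:00 stdout P "
    "2024-03-13 11:51:00,838 [scheduler-2] INFO dLphJ63kHp"
)

# Print the captured timestamp, or None when the pattern does not match.
for name, pattern in (("original", ORIGINAL), ("suggested", SUGGESTED)):
    for line in (utc_line, offset_line):
        match = pattern.match(line)
        print(name, match.group("time") if match else None)
```

The original pattern only matches the line whose timestamp ends in Z, while the suggested one captures the full timestamp in both cases.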


Nevertheless I believe we need a simpler reproducer for the original issue.

@JDMooreMN

JDMooreMN commented Mar 26, 2024

@ChrsMark @OverOrion - I am using the containerd parser below; please use this instead. I am not getting a regex-related error with the entire message, but as I stated in my initial comment, it seems to be chunked, and the second chunk doesn't match because it is missing the expected format. I apologize for the confusion.

        - id: parser-containerd
          regex: ^(?P<time>.+) (?P<stream>stdout|stderr) (?P<logtag>\w) ?(?P<message>.*)
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%s%j'
            parse_from: attributes.time
          type: regex_parser

@ChrsMark
Member

Thanks @JDMooreMN!

I'm still not able to reproduce the issue with a minimal setup. In my case the collector manages to parse the whole line successfully.

Here is the config I use:

receivers:
  filelog:
    start_at: beginning
    include:
    - /var/log/busybox/long_files/xl_GH_long_file.log
    operators:
      - id: parser-containerd
        regex: ^(?P<time>.+) (?P<stream>stdout|stderr) (?P<logtag>\w) ?(?P<message>.*)
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%s%j'
          parse_from: attributes.time
        type: regex_parser

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []

And the target file xl_GH_long_file.log:

2024-03-19T11:21:00.839338492-05:00 stdout P 2024-03-13 11:51:00,838 [scheduler-2] INFO  

Where the log length is:

> wc -c /var/log/busybox/long_files/xl_GH_long_file.log
16436 /var/log/busybox/long_files/xl_GH_long_file.log

@JDMooreMN am I missing anything in the above scenario?

@JDMooreMN

@ChrsMark - The only difference I can see is the following: can you try start_at: end in your filelog receiver, and test by echoing the log event into the file? With 0.89.0 the error output includes the second chunk that fails to parse; in 0.95.0 it just reports a regex parsing error.

echo '<log_event>' >> xl_GH_long_file.log

2024-03-27T21:01:54.752Z	error	helper/transformer.go:98	Failed to process entry	{"kind": "receiver", "name": "filelog/pod", "data_type": "logs", "operator_id": "parser-containerd", "operator_type": "regex_parser", "error": "regex pattern does not match", "action": "send"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*TransformerOperator).HandleEntryError
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/transformer.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ParseWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/parser.go:140
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/parser.go:112
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/parser.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/regex.(*Parser).Process
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/parser/regex/regex.go:106
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/writer.go:53
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/parser.go:122
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWith
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/parser.go:98
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/regex.(*Parser).Process
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/parser/regex/regex.go:106
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/helper/writer.go:53
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/file.(*Input).emit
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/operator/input/file/file.go:52
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/reader.(*Reader).ReadToEnd
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/fileconsumer/internal/reader/reader.go:81
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer.(*Manager).consume.func1
	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.95.0/fileconsumer/file.go:182

@OverOrion
Contributor

Hey @JDMooreMN

I was able to reproduce it; the key seems to be the start_at: end option. I will investigate it!

It works fine with start_at: beginning (empty file, then echo ...); it only breaks when it is set to end 👀

@ChrsMark
Member

Thanks @JDMooreMN, +1, I can reproduce it as well with start_at: end.

The first part of the long line is properly parsed, and then the leftover fails to get parsed. The leftover looks like:

LogRecord #1
ObservedTimestamp: 2024-03-28 12:45:20.288996015 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(yfGWG8aXlbNNKW0iw2e5XVDb6RqBg7LLUAbDH5x8WM3OT424242)
Attributes:
     -> log.file.name: Str(xl_GH_long_file.log)
Trace ID: 
Span ID: 
Flags: 0
	{"kind": "exporter", "data_type": "logs", "name": "debug"}
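
To illustrate why a leftover like this fails in the parser, here is a toy model, not the stanza reader. It assumes data is read in 16 KiB chunks (a guess prompted by the roughly 16385-character threshold reported earlier) and that a flush may emit whatever is buffered before the end of the line has been seen. The names `split_lines`, `flush_partial` and `BUFFER_SIZE` are hypothetical.

```python
import re

BUFFER_SIZE = 16 * 1024  # assumed read size, for this toy model only

# The containerd parser regex posted earlier in this thread.
PARSER = re.compile(
    r"^(?P<time>.+) (?P<stream>stdout|stderr) (?P<logtag>\w) ?(?P<message>.*)"
)

def split_lines(data, flush_partial):
    """Yield entries from data consumed in BUFFER_SIZE chunks.

    With flush_partial=True, buffered text is emitted after every chunk even
    when no newline has been seen yet, which splits long lines in two.
    """
    pending = ""
    for start in range(0, len(data), BUFFER_SIZE):
        pending += data[start:start + BUFFER_SIZE]
        while "\n" in pending:
            line, pending = pending.split("\n", 1)
            yield line
        if flush_partial and pending:
            yield pending
            pending = ""
    if pending:
        yield pending

header = "2024-03-19T11:21:00.839338492-05:00 stdout P "
line = header + "x" * (16431 - len(header))  # 16431 characters in total

# Print (entry length, parser matched?) for each emitted entry.
for flush_partial in (True, False):
    entries = list(split_lines(line + "\n", flush_partial))
    print(flush_partial, [(len(e), bool(PARSER.match(e))) for e in entries])
```

In this model the premature flush yields two entries: the first still starts with the timestamp and parses, while the second is a bare tail with no timestamp or stream, so the regex cannot match it. Without the premature flush the line arrives whole.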

@OverOrion feel free to claim this one and let me know if I could help in any way :).

@OverOrion
Contributor

Hey @JDMooreMN!

Just opened a PR that should solve this issue. Could you take a moment to test it out on your end as well? Thanks!

@atoulme atoulme removed the needs triage New item requiring triage label Apr 5, 2024
djaglowski added a commit that referenced this issue Apr 23, 2024
**Description:**
Flush could have sent partial input before EOF was reached, this PR
fixes it.

**Link to tracking Issue:** #31512, #32170

**Testing:** Added unit test `TestFlushPeriodEOF`

**Documentation:** Added a note to `force_flush_period` option

---------

Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
@ChrsMark
Member

Since #32100 was merged it's likely that we have fixed that, right?

@crobert-1
Member

Thanks for following up, @ChrsMark, and for confirming this has been resolved, @OverOrion!

Closing as resolved by #32100.
