
[receiver/filelog] Improve receiver logging and monitoring metrics #31256

Open
ChrsMark opened this issue Feb 14, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request receiver/filelog release:required-for-ga Must be resolved before GA release

Comments

@ChrsMark
Member

ChrsMark commented Feb 14, 2024

Component(s)

receiver/filelog

Is your feature request related to a problem? Please describe.

At the moment, the OTel Collector can collect logs through the filelog receiver.
However, the receiver neither logs enough detail about its file-tracking actions nor exposes extensive summary metrics about them.

Describe the solution you'd like

The OTel Collector should be capable of offering the following:

  1. Sufficient logging when files are discovered, moved, truncated, etc.
  2. Observable metrics about its current state: how many files are opened, being harvested, stale, etc.: [pkg/stanza] Add monitoring metrics for open and harvested files in fileconsumer #31544

Describe alternatives you've considered

No response

Additional context

Sample logs:

2024-01-26T12:59:55.586+0200	debug	fileconsumer/file.go:135	matched files%!(EXTRA zapcore.Field={paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]})	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:55.586+0200	debug	fileconsumer/file.go:167	Consuming files{paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]}	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:55.786+0200	debug	fileconsumer/file.go:135	matched files%!(EXTRA zapcore.Field={paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]})	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:55.786+0200	debug	fileconsumer/file.go:167	Consuming files{paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]}	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:55.986+0200	debug	fileconsumer/file.go:135	matched files%!(EXTRA zapcore.Field={paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]})	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:55.986+0200	debug	fileconsumer/file.go:167	Consuming files{paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]}	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:56.185+0200	debug	fileconsumer/file.go:135	matched files%!(EXTRA zapcore.Field={paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]})	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:56.185+0200	debug	fileconsumer/file.go:167	Consuming files{paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]}	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:56.386+0200	debug	fileconsumer/file.go:135	matched files%!(EXTRA zapcore.Field={paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]})	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:56.386+0200	debug	fileconsumer/file.go:167	Consuming files{paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]}	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:56.586+0200	debug	fileconsumer/file.go:135	matched files%!(EXTRA zapcore.Field={paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]})	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-01-26T12:59:56.586+0200	debug	fileconsumer/file.go:167	Consuming files{paths 1 0  [/var/log/busybox/bar.log /var/log/busybox/bzr.log /var/log/busybox/foo.1.log /var/log/busybox/foo.log /var/log/busybox/foo1.log /var/log/busybox/log3.log /var/log/busybox/simple.log /var/log/busybox/some.log]}	{"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}

Sample metrics:

curl http://localhost:8888/metrics
# HELP otelcol_exporter_queue_capacity Fixed capacity of the retry queue (in batches)
# TYPE otelcol_exporter_queue_capacity gauge
otelcol_exporter_queue_capacity{exporter="otlp/elastic",service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 1000
# HELP otelcol_exporter_queue_size Current size of the retry queue (in batches)
# TYPE otelcol_exporter_queue_size gauge
otelcol_exporter_queue_size{exporter="otlp/elastic",service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 0
# HELP otelcol_exporter_sent_log_records Number of log record successfully sent to destination.
# TYPE otelcol_exporter_sent_log_records counter
otelcol_exporter_sent_log_records{exporter="otlp/elastic",service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 11
# HELP otelcol_process_cpu_seconds Total CPU user and system time in seconds
# TYPE otelcol_process_cpu_seconds counter
otelcol_process_cpu_seconds{service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 0.76
# HELP otelcol_process_memory_rss Total physical memory (resident set size)
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 1.84000512e+08
# HELP otelcol_process_runtime_heap_alloc_bytes Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc')
# TYPE otelcol_process_runtime_heap_alloc_bytes gauge
otelcol_process_runtime_heap_alloc_bytes{service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 4.588912e+07
# HELP otelcol_process_runtime_total_alloc_bytes Cumulative bytes allocated for heap objects (see 'go doc runtime.MemStats.TotalAlloc')
# TYPE otelcol_process_runtime_total_alloc_bytes counter
otelcol_process_runtime_total_alloc_bytes{service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 1.40668584e+08
# HELP otelcol_process_runtime_total_sys_memory_bytes Total bytes of memory obtained from the OS (see 'go doc runtime.MemStats.Sys')
# TYPE otelcol_process_runtime_total_sys_memory_bytes gauge
otelcol_process_runtime_total_sys_memory_bytes{service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 9.3970712e+07
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime{service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev"} 111.114353151
# HELP otelcol_receiver_accepted_log_records Number of log records successfully pushed into the pipeline.
# TYPE otelcol_receiver_accepted_log_records counter
otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev",transport=""} 11
# HELP otelcol_receiver_refused_log_records Number of log records that could not be pushed into the pipeline.
# TYPE otelcol_receiver_refused_log_records counter
otelcol_receiver_refused_log_records{receiver="filelog",service_instance_id="f9da8915-5bbd-444f-84cf-60a42107a83e",service_name="otelcontribcol",service_version="0.91.0-dev",transport=""} 0
@ChrsMark ChrsMark added enhancement New feature or request needs triage New item requiring triage labels Feb 14, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

Sufficient logging when files are discovered, moved, truncated, etc

We have some of this already and the rest should be reasonable to add. However, there are some caveats.

  • Discovered: already being logged, but we maintain a limited memory of the files we've previously seen.
  • Moved: We don't currently log this, but we should be able to detect it by cross-referencing file names, which I think we have in the necessary places. It would add some computation, so perhaps this should be configurable.
  • Truncated: We currently "detect" this but do not log it. I think the logic is not perfect and it is possible to miss truncations depending on timing. It'd be helpful to have other perspectives on how this could work performantly.

Observable metrics about its current state: how many files are opened, being harvested, stale, etc

What's your definition of stale?

Up-down counters for open & reading should be straightforward enough to maintain.

@ChrsMark
Member Author

What's your definition of stale?

Stale: when we have reached EOF and the receiver waits until it looks again for new lines. I'm not sure if this behavior exists in the receiver, though. If not, then we can just ignore this metric.

@djaglowski
Member

stale when we have reached EOF and the receiver would wait until it looks again for new lines. I'm not sure if this behavior exists though in the receiver. If not, then we can just ignore this metric.

This is pretty much the same as open less reading, but maybe there's some more nuance. We can consider when implementing.

In any case, I'm marking this issue as required-for-GA because these are very basic observability signals which we should have in place.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 16, 2024
@djaglowski djaglowski removed the Stale label Apr 16, 2024
djaglowski pushed a commit that referenced this issue Apr 24, 2024
…erface (#32662)

**Description:**
This PR resumes the work from
#31618.

The goal is to pass in the `component.TelemetrySettings` so as to use
them later in various ways:
1) report the filelog's state stats:
#31544
2) switch from `SugaredLogger` to `Logger`:
#32177


More about the breaking change decision at
#31618 (comment).

**Link to tracking Issue:**
#31256

**Testing:** Testing suite got updated.

#### Manual Testing

1.
```yaml
receivers:
  filelog:
    start_at: end
    include:
    - /var/log/busybox/refactroring_test.log
    operators:
      - id: addon
        type: add
        field: attributes.extra
        value: my-val
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
```
2. `./bin/otelcontribcol_linux_amd64 --config ~/otelcol/config.yaml`
3. `echo 'some line' >> /var/log/busybox/refactroring_test.log`
4.
```console
2024-04-24T09:29:17.104+0300	info	LogsExporter	{"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-04-24T09:29:17.104+0300	info	ResourceLog #0
Resource SchemaURL: 
ScopeLogs #0
ScopeLogs SchemaURL: 
InstrumentationScope  
LogRecord #0
ObservedTimestamp: 2024-04-24 06:29:17.005433317 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(some line)
Attributes:
     -> extra: Str(my-val)
     -> log.file.name: Str(1.log)
Trace ID: 
Span ID: 
Flags: 0
	{"kind": "exporter", "data_type": "logs", "name": "debug"}
```

**Documentation:** TBA.

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this issue May 8, 2024
jlg-io pushed a commit to jlg-io/opentelemetry-collector-contrib that referenced this issue May 14, 2024
djaglowski pushed a commit that referenced this issue May 22, 2024
…ileconsumer (#31544)

Blocked on
#31618

**Description:**

This PR adds support for filelog receiver to emit observable metrics
about its current state: how many files are opened, and harvested.

**Link to tracking Issue:**
#31256

**Testing:**

#### How to test this manually

1. Use the following collector config:
```yaml
receivers:
  filelog:
    start_at: end
    include:
    - /var/log/busybox/monitoring/*.log
exporters:
  debug:
    verbosity: detailed
service:
  telemetry:
    metrics:
      level: detailed
      address: ":8888"
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
```

2. Build and run the collector: `make otelcontribcol &&
./bin/otelcontribcol_linux_amd64 --config
~/otelcol/monitoring_telemetry/config.yaml`
3. Produce some logs:
```console
echo 'some line' >> /var/log/busybox/monitoring/1.log
while true; do echo -e "This is a log line" >> /var/log/busybox/monitoring/2.log; done
```
4. Verify that metrics are produced:
```console
curl 0.0.0.0:8888/metrics | grep _files
# HELP otelcol_fileconsumer_open_files Number of open files
# TYPE otelcol_fileconsumer_open_files gauge
otelcol_fileconsumer_open_files{service_instance_id="72b4899d-6ce3-41de-a25b-8f0370e22ec1",service_name="otelcontribcol",service_version="0.99.0-dev"} 2
# HELP otelcol_fileconsumer_reading_files Number of open files that are being read
# TYPE otelcol_fileconsumer_reading_files gauge
otelcol_fileconsumer_reading_files{service_instance_id="72b4899d-6ce3-41de-a25b-8f0370e22ec1",service_name="otelcontribcol",service_version="0.99.0-dev"} 1
```

**Documentation:**
Added a respective section in Filelog receiver's docs.

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
@ChrsMark
Member Author

Heads up on this: #31544 has been merged. That one provides the metrics about the open and reading files. The stale files are essentially the open minus the reading ones, so we can avoid reporting them explicitly. We can revisit this if we get any requests for that.

What is remaining from this issue now is the logging enhancements. I will be working on this soon.
