
[receiver/hostmetrics] change the log level when filesystem fails to scrape partition #18236

Closed
dloucasfx opened this issue Feb 1, 2023 · 8 comments
Assignees
Labels
bug Something isn't working receiver/hostmetrics

Comments

@dloucasfx
Contributor

dloucasfx commented Feb 1, 2023

Component(s)

receiver/hostmetrics filesystem scraper

What happened?

Description

This is a gray area between a bug and an improvement, but given the large number of "unnecessary" error messages in the logs, I am filing it as a bug.

Since change a0abefc, the filesystem scraper logs every partition that fails to be scraped by adding an error message through `errors.AddPartial`.

At first look this seems like the right approach. However, some partitions (for example, Windows partitions that are BitLocker-encrypted, or any partition we don't have access to) are known to fail. The problem is that the user has no way to filter them out before they get scraped, so error messages end up polluting the logs.

Steps to Reproduce

Run the hostmetrics/filesystem receiver/scraper on a system with a non-accessible partition, for example Windows with a BitLocker-locked drive.

Expected Result

No errors should be logged unless the agent log level is set to debug.
Alternatively, provide a way to filter out those partitions.

Actual Result

```shell
error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "failed collecting partitions information: \tError 0: This drive is locked by BitLocker Drive Encryption. You must unlock this drive from Control Panel.\n", "scraper": "filesystem"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	/builds/o11y-gdi/splunk-otel-collector-releaser/.go/pkg/mod/go.opentelemetry.io/collector@v0.64.1/receiver/scraperhelper/scrapercontroller.go:197
```

### Collector version

latest

### Environment information

## Environment
OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")


### OpenTelemetry Collector configuration

_No response_

### Log output

```shell
error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics", "error": "failed collecting partitions information: \tError 0: This drive is locked by BitLocker Drive Encryption. You must unlock this drive from Control Panel.\n", "scraper": "filesystem"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	/builds/o11y-gdi/splunk-otel-collector-releaser/.go/pkg/mod/go.opentelemetry.io/collector@v0.64.1/receiver/scraperhelper/scrapercontroller.go:197
```


### Additional context

A few ideas:
- Provide a way for the user to filter out partitions before we call gopsutil
- Provide an option specific to the filesystem scraper to skip those errors
- Extend `AddPartial` to accept a log level (e.g. debug)
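For reference, the contrib filesystem scraper does expose mount-point/device/fs-type filters; a minimal sketch of the first idea using the `exclude_mount_points` option might look like the fragment below (the `D:` mount point is an illustrative placeholder for a BitLocker-locked drive):

```yaml
receivers:
  hostmetrics:
    scrapers:
      filesystem:
        # Exclude a known-inaccessible partition from scraping.
        # Note: as discussed later in this thread, this filtering
        # currently happens after partitions are collected, so the
        # error may still be logged.
        exclude_mount_points:
          match_type: strict
          mount_points: ["D:"]
```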
@dloucasfx dloucasfx added bug Something isn't working needs triage New item requiring triage labels Feb 1, 2023
@github-actions
Contributor

github-actions bot commented Feb 1, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme atoulme removed the needs triage New item requiring triage label Feb 1, 2023
@atoulme atoulme self-assigned this Feb 2, 2023
@atoulme
Contributor

atoulme commented Feb 2, 2023

@dloucasfx for first option, it looks like this is now supported. Check out

@atoulme
Contributor

atoulme commented Feb 2, 2023

Did you also mention it would be good to offer a way to configure a zap.Filter on the logger?

@dloucasfx
Contributor Author

dloucasfx commented Feb 2, 2023

@dloucasfx for first option, it looks like this is now supported. Check out

@atoulme

The link is for the disk scraper; this issue is in the filesystem scraper. Regardless, the filesystem scraper does have filtering options, but the filtering happens after all the filesystem info is collected, i.e. after the error is logged.
You can see here that the filtering happens after the `errors.AddPartial` call.

@dloucasfx
Contributor Author

Did you also mention it would be good to offer a way to configure a zap.Filter on the logger?

Oh yeah, when I was looking into this issue I was hoping that our logging supported zapfilter (https://pkg.go.dev/moul.io/zapfilter), where the user can filter based on log messages. This is definitely an enhancement, but if we had it in place, we could work around this bug.

@jvoravong
Contributor

Taking a look into this issue now.

@jvoravong
Contributor

Merged in a small fix; it should be available in v0.73.0.

@dmitryax dmitryax closed this as completed Mar 1, 2023
@csmith-poppulo

csmith-poppulo commented Sep 27, 2024

I'm having a very similar issue with version 0.95.0 of the collector agent on Windows servers where the disks are locked by SIOS. Our Windows Application event logs are flooded with errors from the agent when it hits locked disks. The disks are only active on the SQL Server node where the roles are currently assigned, and if we have to fail over they will migrate. This is a dynamic setup, and we do need metrics from those disks whenever they are active on any node in the cluster.

Should I open a new bug ticket for this? I'm not completely familiar with the process but can definitely use this ticket as my guide along with the documentation for contributing.

```shell
1.7274589621483753e+09 error scraperhelper/scrapercontroller.go:200 Error scraping metrics {"kind": "receiver", "name": "hostmetrics/localhost_windows_system", "data_type": "metrics", "error": "failed collecting partitions information: \tError 0: Access is denied.\n", "scraper": "filesystem"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
	go.opentelemetry.io/collector/receiver@v0.95.0/scraperhelper/scrapercontroller.go:200
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
	go.opentelemetry.io/collector/receiver@v0.95.0/scraperhelper/scrapercontroller.go:176
```


Quick edit:

I went ahead and tried 0.110.0 and am having the same issue. I'm not sure how Logz.io repackages the collector, though, so this was only a quick test.
