Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First implementation of physical_disk collector #803

Merged

Conversation

rob-scheepens
Copy link
Contributor

This PR is to implement a first incarnation of the physical_disk collector, as derived from the logical_disk collector. This as a result from the request in #795 (comment).

Copy link
Collaborator

@carlpett carlpett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Some feedback below. Also the CI is failing because you added it as a default collector, which means the end-to-end tests need to be updated (see tools/e2e-output.txt)

collector/physical_disk.go Show resolved Hide resolved
return nil, err
}

for _, disk := range dst {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The disk label currently looks like this: "0 C:". Since I remember your use-case was in non-mounted disks, what does it look like under those circumstances @rob-scheepens?
I'm not super happy about having both disk number and mountpoint, to be honest. Depending on what the label is for non-mounted, perhaps we can do something that'll look reasonable for both cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-mounted disks you will only see the disk number. I agree not to have the label and number, I was actually surprised to see C: show up. ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, not yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this to be resolved before we merge, since once this is released it's breaking to change the labels. I can see three different cases that can occur, maybe missing some?

  • Not mounted
  • Mounted as disk root
  • Mounted as a directory inside another (eg D:\Foo\Bar)

Some possible resolutions:

  • Always use disk number, skip the mountpoint
  • Use mountpoint if available (I dislike this, since it'd mean you'd get a label change by mounting the disk)
  • Label with disk number as disk, create a separate _info metric which has the mountpoint

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for a fix that shows Raw as label for disks that do not have a partition, and thus are also not mounted. We could combine this with the option of always using disk number, so that in case of a raw disk we would see "1 Raw", and in the case of a mounted partition, we would see the behavior as it currently is. Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing Labels

Test vm's disks:

DISKPART> lis dis

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online          100 GB      0 B        *
  Disk 1    Offline          20 GB    20 GB
  Disk 2    Offline          20 GB    20 GB
  Disk 3    Offline          20 GB    20 GB
  Disk 4    Offline          20 GB    20 GB
  Disk 5    Offline          20 GB    20 GB
  Disk 6    Offline          20 GB    20 GB
  Disk 7    Online           50 GB    50 GB
  Disk 8    Online           50 GB    50 GB

windows_physical_disk_idle_seconds_total{disk="1"} 1742.6185977999999
windows_physical_disk_idle_seconds_total{disk="2"} 1742.6185988
windows_physical_disk_idle_seconds_total{disk="3"} 1742.6185944
windows_physical_disk_idle_seconds_total{disk="4"} 1742.6185948999998
windows_physical_disk_idle_seconds_total{disk="5"} 1742.6185957999999
windows_physical_disk_idle_seconds_total{disk="6"} 1742.6185965
windows_physical_disk_idle_seconds_total{disk="7"} 40.4170576
windows_physical_disk_idle_seconds_total{disk="8"} 27.7063651

Note: disk 0 is excluded by blacklisting C:. None of disks 1-8 contain any partition.

Created partition on disk 7 using "crea par pri". No change in labels:

windows_physical_disk_idle_seconds_total{disk="1"} 1871.906501
windows_physical_disk_idle_seconds_total{disk="2"} 1871.9065021
windows_physical_disk_idle_seconds_total{disk="3"} 1871.9064982
windows_physical_disk_idle_seconds_total{disk="4"} 1871.9064985999998
windows_physical_disk_idle_seconds_total{disk="5"} 1871.9064994999999
windows_physical_disk_idle_seconds_total{disk="6"} 1871.9065008
windows_physical_disk_idle_seconds_total{disk="7"} 169.68967189999998
windows_physical_disk_idle_seconds_total{disk="8"} 156.99427

Assigned letter X using "assign letter=x", no change in labels still.
Also after formatting no change in labels. Only after reboot the X: volume shows up in the metrics list:

windows_physical_disk_idle_seconds_total{disk="1"} 7.8161184
windows_physical_disk_idle_seconds_total{disk="2"} 7.8161192999999995
windows_physical_disk_idle_seconds_total{disk="3"} 7.8161203
windows_physical_disk_idle_seconds_total{disk="4"} 7.8161213
windows_physical_disk_idle_seconds_total{disk="5"} 7.8161219
windows_physical_disk_idle_seconds_total{disk="6"} 7.8161236999999995
windows_physical_disk_idle_seconds_total{disk="7 X:"} 7.816124599999999
windows_physical_disk_idle_seconds_total{disk="8"} 7.8161261

Is this an issue with the exporter? The interesting thing is that when I did the same workflow for Z: on disk 8 (crea par pri, format ntfs quick, assign letter z), it did show up immediately:

windows_physical_disk_idle_seconds_total{disk="1"} 0.00039109999999999997
windows_physical_disk_idle_seconds_total{disk="2"} 0.00039139999999999997
windows_physical_disk_idle_seconds_total{disk="3"} 0.0003917
windows_physical_disk_idle_seconds_total{disk="4"} 0.0003915
windows_physical_disk_idle_seconds_total{disk="5"} 0.00039139999999999997
windows_physical_disk_idle_seconds_total{disk="6"} 0.0003915
windows_physical_disk_idle_seconds_total{disk="7 X:"} 0.00039139999999999997
windows_physical_disk_idle_seconds_total{disk="8 Z:"} 0.0003917

Deleting the partion was also immediately reflected:

windows_physical_disk_idle_seconds_total{disk="1"} 90.2290264
windows_physical_disk_idle_seconds_total{disk="2"} 90.2290273
windows_physical_disk_idle_seconds_total{disk="3"} 90.2290281
windows_physical_disk_idle_seconds_total{disk="4"} 90.22902839999999
windows_physical_disk_idle_seconds_total{disk="5"} 90.229029
windows_physical_disk_idle_seconds_total{disk="6"} 90.2290299
windows_physical_disk_idle_seconds_total{disk="7 X:"} 90.2290305
windows_physical_disk_idle_seconds_total{disk="8"} 90.1772701

Creating two partitions on disk 8 also was immediately reflected:

windows_physical_disk_idle_seconds_total{disk="1"} 157.879834
windows_physical_disk_idle_seconds_total{disk="2"} 157.8798352
windows_physical_disk_idle_seconds_total{disk="3"} 157.8798363
windows_physical_disk_idle_seconds_total{disk="4"} 157.8798369
windows_physical_disk_idle_seconds_total{disk="5"} 157.87983739999999
windows_physical_disk_idle_seconds_total{disk="6"} 157.8798382
windows_physical_disk_idle_seconds_total{disk="7 X:"} 157.8798388
windows_physical_disk_idle_seconds_total{disk="8 Y: Z:"} 157.77886379999998

To me, this labeling looks consistent and in line with what one would expact. You agree @carlpett? If so, we can put this conversation to rest and merge the PR, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a brief review of this conversation I assume option 2 has been chosen?

Use mountpoint if available (I dislike this, since it'd mean you'd get a label change by mounting the disk)

If so I'd recommend documenting this labeling behaviour, in order to avoid user surprise. Could you add an entry for this collector in the docs/ directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #803 (comment), I'd like the behavior where the disk number is always shown, whether or not in combination with a partition letter or mount point, in case there is a mounted partition. If you agree, let me know and I'll create the doc.

Edit: thinking this through a bit more, let's just use the disk number (like node_exporter only lists the Linux device name, e.g. sda). In the future we could have a discussion on what data we would like to see in an info struct. Makes sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either the disk number only or having the disk number and drive letter as separate labels would be fine.
E.G. windows_physical_disk_idle_seconds_total{disk="7", drive_letter="X:"} 157.8798388

Keeping the two pieces of information in a single label would have users resorting to regexes for label matching: windows_physical_disk_idle_seconds_total{disk=".*X:"} or windows_physical_disk_idle_seconds_total{disk="7.*"}

@rob-scheepens
Copy link
Contributor Author

Nice! Some feedback below. Also the CI is failing because you added it as a default collector, which means the end-to-end tests need to be updated (see tools/e2e-output.txt)

What has your preference: remove it from the defaults, or shall I update the end-to-end tests?

@rob-scheepens
Copy link
Contributor Author

@carlpett : I've modified e2e-output and included it in this PR. Can you check if all is ok now?

@carlpett
Copy link
Collaborator

carlpett commented Dec 9, 2021

Thanks @rob-scheepens, apart from clarification on if one of the old review comments has been addressed, this looks good! Can you make sure to sign the DCO, though?

@rob-scheepens rob-scheepens force-pushed the rob-physical_disk-collector branch 2 times, most recently from 3fc4cc6 to 1372152 Compare December 9, 2021 15:36
@carlpett
Copy link
Collaborator

carlpett commented Dec 9, 2021

Thanks! The e2e test appear to fail now though, although it's not clear to me why. Do they work locally for you?

@rob-scheepens
Copy link
Contributor Author

Thanks! The e2e test appear to fail now though, although it's not clear to me why. Do they work locally for you?

When I execute "C:\git\rob-scheepens\windows_exporter>powershell -NonInteractive -ExecutionPolicy Bypass -File .\tools\end-to-end-test.ps1" that works fine, but when I trigger the script through make, it fails:

C:\git\rob-scheepens\windows_exporter>make e2e-test
powershell -NonInteractive -ExecutionPolicy Bypass -File .\tools\end-to-end-test.ps1
 
InputObject                                                                                                                                                     SideIndicator
-----------                                                                                                                                                     -------------
windows_exporter_collector_success{collector="physical_disk"} 1                                                                                                 <=
windows_exporter_collector_timeout{collector="physical_disk"} 0                                                                                                 <=
# HELP windows_physical_disk_idle_seconds_total Seconds that the disk was idle (PhysicalDisk.PercentIdleTime)                                                   <=
# TYPE windows_physical_disk_idle_seconds_total counter                                                                                                         <=
# HELP windows_physical_disk_read_bytes_total The number of bytes transferred from the disk during read operations (PhysicalDisk.DiskReadBytesPerSec)           <=
# TYPE windows_physical_disk_read_bytes_total counter                                                                                                           <=
# HELP windows_physical_disk_read_latency_seconds_total Shows the average time, in seconds, of a read operation from the disk (PhysicalDisk.AvgDiskSecPerRead)  <=
# TYPE windows_physical_disk_read_latency_seconds_total counter                                                                                                 <=
# HELP windows_physical_disk_read_seconds_total Seconds that the disk was busy servicing read requests (PhysicalDisk.PercentDiskReadTime)                       <=
# TYPE windows_physical_disk_read_seconds_total counter                                                                                                         <=
# HELP windows_physical_disk_read_write_latency_seconds_total Shows the time, in seconds, of the average disk transfer (PhysicalDisk.AvgDiskSecPerTransfer)     <=
# TYPE windows_physical_disk_read_write_latency_seconds_total counter                                                                                           <=
# HELP windows_physical_disk_reads_total The number of read operations on the disk (PhysicalDisk.DiskReadsPerSec)                                               <=
# TYPE windows_physical_disk_reads_total counter                                                                                                                <=
# HELP windows_physical_disk_requests_queued The number of requests queued to the disk (PhysicalDisk.CurrentDiskQueueLength)                                    <=
# TYPE windows_physical_disk_requests_queued gauge                                                                                                              <=
# HELP windows_physical_disk_split_ios_total The number of I/Os to the disk were split into multiple I/Os (PhysicalDisk.SplitIOPerSec)                          <=
# TYPE windows_physical_disk_split_ios_total counter                                                                                                            <=
# HELP windows_physical_disk_write_bytes_total The number of bytes transferred to the disk during write operations (PhysicalDisk.DiskWriteBytesPerSec)          <=
# TYPE windows_physical_disk_write_bytes_total counter                                                                                                          <=
# HELP windows_physical_disk_write_latency_seconds_total Shows the average time, in seconds, of a write operation to the disk (PhysicalDisk.AvgDiskSecPerWrite) <=
# TYPE windows_physical_disk_write_latency_seconds_total counter                                                                                                <=
# HELP windows_physical_disk_write_seconds_total Seconds that the disk was busy servicing write requests (PhysicalDisk.PercentDiskWriteTime)                    <=
# TYPE windows_physical_disk_write_seconds_total counter                                                                                                        <=
# HELP windows_physical_disk_writes_total The number of write operations on the disk (PhysicalDisk.DiskWritesPerSec)                                            <=
# TYPE windows_physical_disk_writes_total counter                                                                                                               <=
 
 
make: *** [Makefile:19: e2e-test] Error 1

I've attached a make -d log but that didn't show anything obvious to me. Also running ProcMon during make did not show an obvious problem, log also attached.

Is the exit code a standard Windows code? If so it means "file not found", but as said, I didn't see that yet in the ProcMon file or make debug log. Thoughts @carlpett ?

make -d log.txt
make_e2e-test.zip

@rob-scheepens
Copy link
Contributor Author

@carlpett Looking at the various logs generated I noticed that the physical_disk collector is not started during the e2e-test. Looks like I missed a few places where it should be included, working on that.

Could that be the reason of the error code 2? if so, then why would things work ok when running the PS script directly, without make?

@rob-scheepens
Copy link
Contributor Author

@carlpett : looks like the three additional changes and a rebuild fixed it - can you verify? I am now able to run make e2e-test:

Putting child 025bcc08 (e2e-test) PID 40258792 on the chain.
Live child 025bcc08 (e2e-test) PID 40258792
Main thread handle = 00000154
Reaping winning child 025bcc08 PID 40258792
Removing child 025bcc08 PID 40258792 from chain.
Successfully remade target file 'e2e-test'.```

@rob-scheepens
Copy link
Contributor Author

@carlpett : build and test succeeded now. :)

Copy link
Collaborator

@carlpett carlpett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peak season delays in reviews right now... Thanks for fixing the E2E!
Let's get the final label question resolved and then we can merge 🍰

return nil, err
}

for _, disk := range dst {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this to be resolved before we merge, since once this is released it's breaking to change the labels. I can see three different cases that can occur, maybe missing some?

  • Not mounted
  • Mounted as disk root
  • Mounted as a directory inside another (eg D:\Foo\Bar)

Some possible resolutions:

  • Always use disk number, skip the mountpoint
  • Use mountpoint if available (I dislike this, since it'd mean you'd get a label change by mounting the disk)
  • Label with disk number as disk, create a separate _info metric which has the mountpoint

What do you think?

@rob-scheepens
Copy link
Contributor Author

Sorry for the delay @carlpett, for some reason I was sure I had already replied... :(

Regarding the labeling: I suggest we use the PhysicalDisk number in combination with a mountpoint if appropriate. Technically speaking a disk (PhysicalDisk) is never mounted, only partitions (LogicalDisks) are. An example label for C: would then look like:

0 C:

An example label for a raw disk without any partitions would be:

1

This seems to match your third possible resolution mentioned above.

Would you agree?

@rob-scheepens
Copy link
Contributor Author

Sorry for the delay @carlpett, for some reason I was sure I had already replied... :(

Regarding the labeling: I suggest we use the PhysicalDisk number in combination with a mountpoint if appropriate. Technically speaking a disk (PhysicalDisk) is never mounted, only partitions (LogicalDisks) are. An example label for C: would then look like:

0 C:

An example label for a raw disk without any partitions would be:

1

This seems to match your third possible resolution mentioned above.

Would you agree?

@carlpett : friendly nudge. ;)

@rob-scheepens
Copy link
Contributor Author

@carlpett Just had another look at the labels on a test machine, and to me this looks good. Are you ok with this too?
image

@breed808
Copy link
Contributor

breed808 commented Dec 3, 2022

@rob-scheepens I don't think Calle's available at the moment.
I could step in and review based on the existing comments.

Could you rebase the feature branch on master so the CI passes? There's also a merge conflict with the Go module files.

@rob-scheepens
Copy link
Contributor Author

@rob-scheepens I don't think Calle's available at the moment. I could step in and review based on the existing comments.

Could you rebase the feature branch on master so the CI passes? There's also a merge conflict with the Go module files.

Hi @breed808 , I cleaned up the git commit history and backed out the unused import, so CI should be able to finish ok now.

@cbwest3-ntnx cbwest3-ntnx force-pushed the rob-physical_disk-collector branch 2 times, most recently from 3fbfb95 to a5812b7 Compare December 22, 2022 15:15
@breed808 breed808 added this to the v0.21.0 milestone Dec 24, 2022
@rob-scheepens
Copy link
Contributor Author

Closing this since we now have a working version of physical_disk collector using PDH instead of Perflib, as mentioned in #799 (comment). A PR for that will follow shortly.

@rob-scheepens
Copy link
Contributor Author

Eventually I could reopen this PR, due to GitHub issues. I closed the duplicate I created of this, PR 1211.

image

@rob-scheepens
Copy link
Contributor Author

@breed808 @carlpett : could you please (re)review this?

@cbwest3-ntnx
Copy link

@breed808 & @carlpett, this MR is tagged for milestone v0.21.0, however v0.22.0 is already released. Is it possible for this PR to go into the next release? tia!

@breed808
Copy link
Contributor

Looks like the CI isn't happy due to the logging dependency in the collector. Have a look at #1192 for context on the logging change.

rob-scheepens and others added 8 commits September 18, 2023 10:21
Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
…ter label

Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Ben Reedy <breed808@breed808.com>
Signed-off-by: Ben Reedy <breed808@breed808.com>
@breed808
Copy link
Contributor

Apologies for the delay in getting back to this. I've re-run the CI and resolved the issues it's found.

Are we happy with using the disk number in the disk flag for the physical_disk collector metrics?

@rob-scheepens
Copy link
Contributor Author

@breed808 thanks for checking, and yes, we're happy with the disk number.

@breed808 breed808 merged commit 07fff6a into prometheus-community:master Sep 22, 2023
5 checks passed
mansikulkarni96 pushed a commit to mansikulkarni96/prometheus-community-windows_exporter that referenced this pull request Oct 31, 2023
…#803)

* Added physical_disk collector.

Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>

* exporter.go: Added physical_disk to defaultCollectors

Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>

* Fix test cases for physicaldisk metrics

Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>

* physical_disk.go: cleanup gofmt error

Signed-off-by: Brantley West <brantley@nutanix.com>

* physical_disk.go: populate physical disk 'number' label

Signed-off-by: Brantley West <brantley@nutanix.com>

* Added docs/collector.physical_disk.md.

Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>

* physical_disk.go: change 'number' label to 'disk' to match node_exporter label

Signed-off-by: Brantley West <brantley@nutanix.com>

* physical_disk.go: adopt github.com/go-kit/log

Signed-off-by: Brantley West <brantley@nutanix.com>

* physical_disk.go: adopt include/exclude disk list

Signed-off-by: Brantley West <brantley@nutanix.com>

* fix: Add init config for physical_disk collector

Signed-off-by: Ben Reedy <breed808@breed808.com>

* chore: gofmt physical_disk collector

Signed-off-by: Ben Reedy <breed808@breed808.com>

---------

Signed-off-by: Rob Scheepens <rob.scheepens@nutanix.com>
Signed-off-by: Brantley West <brantley@nutanix.com>
Signed-off-by: Ben Reedy <breed808@breed808.com>
Co-authored-by: Brantley West <brantley@nutanix.com>
Co-authored-by: Ben Reedy <breed808@breed808.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants