Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correct smartctl_device_bytes_written & smartctl_device_bytes_read for NVMe #211

Merged
merged 1 commit into from
Apr 9, 2024

Conversation

robbat2
Copy link
Contributor

@robbat2 robbat2 commented Mar 6, 2024

The NVMe specification says that the controller is responsible for reporting "Data Units Read" & "Data Units Written" converted as needed for logicial block sizes other than 512-bytes. smartmontools already has the correct** behavior.

What is correct in this case? For now, track what smartmontools does: take the counter, multiply by 512*1000, report the value.

We should be clear that it means the drive has read/written at most that many bytes.

This has a few impacts:

  • NVME devices will now show these metrics, if they did not before.
  • NVME devices with blocksize other than 512-bytes may previously reported inflated metrics, but are now corrected (is this worthy of larger notice in changelogs?)

Reference: https://github.com/smartmontools/smartmontools/blob/11415ee0b9d5f4a22ddfb3722fdfb05e72372a03/smartmontools/nvmeprint.cpp#L394-L397
Closes: #122

Copy link

@neurodrone neurodrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments but LGTM otherwise.

smartctl.go Outdated Show resolved Hide resolved
smartctl.go Outdated Show resolved Hide resolved
…ad for NVMe

The NVMe specification says that the controller is responsible for
reporting "Data Units Read" & "Data Units Written" converted as needed
for logicial block sizes other than 512-bytes. smartmontools already has
the correct behavior.

What is correct in this case? For now, track what smartmontools does:
take the counter, multiply by 512*1000, report the value.

We should be clear that it means the drive has read/written at most
that many bytes.

This has a few impacts:
- NVME devices will now show these metrics, if they did not before.
- NVME devices with blocksize other than 512-bytes may have previously
  reported inflated metrics, but are now corrected (is this worthy of
  larger notice in changelogs?)

Reference: https://github.com/smartmontools/smartmontools/blob/11415ee0b9d5f4a22ddfb3722fdfb05e72372a03/smartmontools/nvmeprint.cpp#L394-L397
Closes: prometheus-community#122
Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
@robbat2
Copy link
Contributor Author

robbat2 commented Mar 20, 2024

@SuperQ bump

@robbat2
Copy link
Contributor Author

robbat2 commented Apr 8, 2024

@SuperQ bump again

Copy link
Contributor

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

We could work around the precision by wrapping the int128 counter with a 2^53 modulus. I've done this for converting uint64 counters.

@SuperQ SuperQ merged commit ea8a383 into prometheus-community:master Apr 9, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zero smartctl_device_bytes_written due to missing logical_block_size for some drives
3 participants