Add disk UUIDs as labels. #304

Closed
raypettersen opened this issue Sep 15, 2016 · 29 comments

Comments

@raypettersen

In short:

Today:
node_disk_sectors_written{device="sdj"}

Suggestion:
node_disk_sectors_written{device="sdj", uuid="e7821b62-64a0-4f24-a19a-85ed74da0c14"}

The reason for this request is that we have external USB devices that we want to monitor, but dashboards and the like break when devices are occasionally mapped to new device names. My understanding is that UUIDs are persistent, or are they not?

@raypettersen
Author

To answer my own question, UUIDs seem to be persistent as they are generated from device metadata. Ref: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/persistent_naming-uuid_and_others.html

@SuperQ
Member

SuperQ commented Sep 15, 2016

👍 I think this would be useful. The difficulty is getting the UUIDs based on the name of the device read from /proc/diskstats.

@raypettersen
Author

Since the UUIDs are generated from metadata, I guess it's just a matter of figuring out how they are generated and doing the same?

@SuperQ
Member

SuperQ commented Sep 15, 2016

Sorta, it's complicated. Many of the UUIDs come from reading the filesystem metadata, not from generating them programmatically.

For example:

# ls -l /dev/disk/by-uuid/ | grep sda1
lrwxrwxrwx 1 root root 10 Jul 24 08:29 1196ae70-dca7-4c89-8ea7-52456bf23052 -> ../../sda1
# tune2fs -l /dev/sda1 | grep UUID
Filesystem UUID:          1196ae70-dca7-4c89-8ea7-52456bf23052
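
As a rough sketch of the symlink lookup shown above (this is not node_exporter code; the function name and the directory parameter are made up for illustration), the by-uuid directory can be turned into a device-name-to-UUID map like this. It only sees what udev has linked into that directory.

```python
import os


def uuid_by_device(by_uuid_dir="/dev/disk/by-uuid"):
    """Map kernel device names (e.g. 'sda1') to UUIDs using udev's symlinks."""
    mapping = {}
    if not os.path.isdir(by_uuid_dir):
        return mapping
    for uuid in sorted(os.listdir(by_uuid_dir)):
        link = os.path.join(by_uuid_dir, uuid)
        if not os.path.islink(link):
            continue
        # Each entry is a symlink such as '../../sda1'; keep the last component.
        target = os.readlink(link)
        mapping[os.path.basename(target)] = uuid
    return mapping
```

Calling uuid_by_device() on the machine from the listing above would be expected to include an entry for sda1, assuming the symlink is still present.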

@raypettersen
Author

Investigating this further shows that you can in fact manipulate the UUIDs. This would break any logic in node_exporter if it were to auto-generate them. I guess the easiest thing to do would be to read /dev/disk/by-uuid, but that is not optimal. For example, my ZFS array is not visible there on my storage server. blkid, however, manages to fetch everything. Perhaps there is something in the blkid source worth looking at?
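
Short of reading the blkid source, one option might be to parse its output. The sketch below assumes that `blkid -o export` prints KEY=VALUE lines with a blank line between devices and a DEVNAME key per device; the function names are hypothetical, and blkid usually needs root to probe every device.

```python
import subprocess


def parse_blkid_export(text):
    """Parse `blkid -o export` style output into {devname: {key: value}}."""
    devices = {}
    current = {}
    # The trailing "" flushes the last record even without a final blank line.
    for line in text.splitlines() + [""]:
        if not line.strip():
            if "DEVNAME" in current:
                devices[current["DEVNAME"]] = current
            current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    return devices


def blkid_devices():
    """Run blkid and return its parsed output (requires blkid on PATH)."""
    result = subprocess.run(
        ["blkid", "-o", "export"], check=True, capture_output=True, text=True
    )
    return parse_blkid_export(result.stdout)
```

Values may be escaped differently depending on the util-linux version, so treat the parsed strings as approximate.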

@brian-brazil
Contributor

It wouldn't be appropriate to have both labels on the disk metrics, as each uniquely identifies a disk. This may be something best handled by the textfile collector.

@raypettersen
Author

/dev/sdx does not uniquely identify a disk. If you, for example, plug in random USB devices, you're going to get burned if you're using node_exporter to monitor these disks. The device label alone is not a good solution. UUIDs, on the other hand, do in fact uniquely identify a disk.

@brian-brazil
Contributor

That depends entirely on your use case.

@raypettersen
Author

From my point of view, I could argue that it makes more sense to have the UUID as a label than the /dev/ name. It offers superior identification, as it can even identify disks that are moved across servers/instances. I'm just sharing a suggestion that would make node_exporter work better in storage environments. If it's more hassle than it's worth, then I'll just close this issue and build my own workaround.

@SuperQ
Member

SuperQ commented Sep 15, 2016

I agree that device label is insufficient to uniquely identify devices. I'd prefer both device and UUID.

@brian-brazil
Contributor

Device label is sufficient, and labels should be minimal. You either get UUID or device as a label.

@raypettersen
Author

Respectfully disagree that device label is sufficient. It's not a unique identifier for a drive or a partition.

@brian-brazil
Contributor

You can't have two /dev/sda, thus it is unique. It uniquely identifies a controller. These are per-device stats, not per-partition stats.

@SuperQ
Member

SuperQ commented Sep 15, 2016

@brian-brazil Sorry, but that's not how hardware works. Device label is an indication of where it is connected, and UUID is an indication of what is connected. We want both.

@brian-brazil
Contributor

I am well aware of how hardware works. I consider the UUID to be an annotation, so it doesn't belong on these metrics. I'd expect the vast majority of our users not to care about UUIDs, and device name order is pretty consistent these days, particularly in cloud environments.

If you're looking for this then it should come as other annotations do, via another metric taking the machine roles approach.

@SuperQ
Member

SuperQ commented Sep 15, 2016

And that's the difference: I don't think they're annotations. They exist as separate unique identifying dimensions of the block device. "Where" and "Which".

@brian-brazil
Contributor

Generally when this happens you choose one to avoid having more labels to work with, and have the other via the machine roles approach. Of the two I believe the device name is what most users would want.

@SuperQ
Member

SuperQ commented Sep 15, 2016

@raypettersen So here's some data we found on how to get UUID information from udev.

Get the udev info from /sys

$ cat /sys/class/block/sda1/uevent
MAJOR=8
MINOR=1
DEVNAME=sda1
DEVTYPE=partition

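Then you can get the current udev data from /run. The file name is built from the MAJOR and MINOR numbers above, prefixed with b for block device (here b8:1).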

$ cat /run/udev/data/b8\:1
S:disk/by-uuid/1196ae70-dca7-4c89-8ea7-52456bf23052
S:disk/by-id/wwn-0x5001b449ce83154a-part1
S:disk/by-id/ata-SanDisk_SD5SG2256G1052E_132119402826-part1
S:disk/by-path/pci-0000:00:1f.2-ata-1-part1
W:18
I:1643451
E:ID_ATA=1
E:ID_ATA_DOWNLOAD_MICROCODE=1
E:ID_ATA_FEATURE_SET_APM=1
E:ID_ATA_FEATURE_SET_APM_CURRENT_VALUE=128
E:ID_ATA_FEATURE_SET_APM_ENABLED=1
E:ID_ATA_FEATURE_SET_HPA=1
E:ID_ATA_FEATURE_SET_HPA_ENABLED=1
E:ID_ATA_FEATURE_SET_PM=1
E:ID_ATA_FEATURE_SET_PM_ENABLED=1
E:ID_ATA_FEATURE_SET_SECURITY=1
E:ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E:ID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=18
E:ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=2
E:ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E:ID_ATA_FEATURE_SET_SMART=1
E:ID_ATA_FEATURE_SET_SMART_ENABLED=1
E:ID_ATA_ROTATION_RATE_RPM=0
E:ID_ATA_SATA=1
E:ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E:ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E:ID_ATA_WRITE_CACHE=1
E:ID_ATA_WRITE_CACHE_ENABLED=1
E:ID_BUS=ata
E:ID_MODEL=SanDisk_SD5SG2256G1052E
E:ID_MODEL_ENC=SanDisk\x20SD5SG2256G1052E\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E:ID_PART_TABLE_TYPE=dos
E:ID_PART_TABLE_UUID=00052f59
E:ID_PATH=pci-0000:00:1f.2-ata-1
E:ID_PATH_TAG=pci-0000_00_1f_2-ata-1
E:ID_REVISION=10.04.01
E:ID_SERIAL=SanDisk_SD5SG2256G1052E_132119402826
E:ID_SERIAL_SHORT=132119402826
E:ID_TYPE=disk
E:ID_WWN=0x5001b449ce83154a
E:ID_WWN_WITH_EXTENSION=0x5001b449ce83154a
E:ID_FS_UUID=1196ae70-dca7-4c89-8ea7-52456bf23052
E:ID_FS_UUID_ENC=1196ae70-dca7-4c89-8ea7-52456bf23052
E:ID_FS_VERSION=1.0
E:ID_FS_TYPE=ext2
E:ID_FS_USAGE=filesystem
E:ID_PART_ENTRY_SCHEME=dos
E:ID_PART_ENTRY_UUID=00052f59-01
E:ID_PART_ENTRY_TYPE=0x83
E:ID_PART_ENTRY_FLAGS=0x80
E:ID_PART_ENTRY_NUMBER=1
E:ID_PART_ENTRY_OFFSET=2048
E:ID_PART_ENTRY_SIZE=497664
E:ID_PART_ENTRY_DISK=8:0
G:systemd
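
The two steps above could be sketched roughly as follows. All function names are made up for illustration, and the layout of /run/udev/data is a udev implementation detail rather than a stable interface, so treat this as an experiment, not as how a collector should necessarily do it.

```python
def parse_uevent(text):
    """Parse the KEY=VALUE lines of /sys/class/block/<dev>/uevent."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def udev_data_path(uevent):
    """Build the udev database path from the device's major:minor numbers."""
    return "/run/udev/data/b{}:{}".format(uevent["MAJOR"], uevent["MINOR"])


def parse_udev_properties(text):
    """Keep only the 'E:KEY=VALUE' lines; S:, W:, I: and G: lines are skipped."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E:"):
            key, sep, value = line[2:].partition("=")
            if sep:
                props[key] = value
    return props


def read_device_properties(device):
    """Read udev properties for a block device name such as 'sda1'."""
    with open("/sys/class/block/{}/uevent".format(device)) as f:
        uevent = parse_uevent(f.read())
    with open(udev_data_path(uevent)) as f:
        return parse_udev_properties(f.read())
```

For the sda1 dump above, the returned dictionary would carry ID_FS_UUID alongside ID_SERIAL, ID_WWN and the other E: entries.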

@HyperDevil

I completely agree with @raypettersen, uuid is the only way of identifying a volume or partition. The /dev point is actually irrelevant for metrics. One is monitoring a disk or partition, not a mounting point. Mounting points can change for various reasons.

@brian-brazil
Contributor

I'd like to remind ye that these are per-block device stats we're talking about in this issue, not volumes, filesystems or mount points.

@raypettersen
Author

raypettersen commented Sep 15, 2016

We could probably discuss this for hours. I do not agree with your logic, Brian. Perhaps the solution is to create a new metric instead of messing with labels. I get that you want to keep labels to a minimum, but you should recognize that without a true identification of disks and partitions, metrics are at risk of becoming corrupt. Let's say you're monitoring backup storage that is mounted each night, and throughput performance is what you're after, along with a couple of alarms. Somehow the backup media is mounted with a new device label and the data you're getting is false. This would never happen if we had a way of pinpointing the device with the help of the UUID. This is a real-life scenario from where I'm coming from.

As SuperQ nailed it:

They exist as separate unique identifying dimensions of the block device. "Where" and "Which".

That is, as I see it, the key to this argument.

@SuperQ
Member

SuperQ commented Sep 15, 2016

What @brian-brazil is suggesting is that this be solved by having a metric that contains the UUID and device labels, and using PromQL to join them.

node_disk_sectors_written * on (device) group_left (uuid) node_disk_info

I consider the block device UUID to be more important than the device name, and I think the device and UUID are separate metric dimensions that should always be included, but I understand where Brian is coming from.
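
To make the join semantics concrete, here is a small simulation of what `* on (device) group_left (uuid)` does, under simplifying assumptions: series are modeled as (labels, value) pairs, the info series has value 1, and join_on_device is a made-up helper, not a Prometheus API.

```python
def join_on_device(samples, info, extra_labels=("uuid",)):
    """Mimic `samples * on (device) group_left (uuid) info` on plain dicts."""
    info_by_device = {}
    for labels, value in info:
        device = labels["device"]
        if device in info_by_device:
            # The "one" side must have at most one series per device.
            raise ValueError("multiple info series for device " + device)
        info_by_device[device] = (labels, value)

    result = []
    for labels, value in samples:
        match = info_by_device.get(labels.get("device"))
        if match is None:
            continue  # samples without a matching info series are dropped
        info_labels, info_value = match
        merged = dict(labels)
        for name in extra_labels:
            if name in info_labels:
                merged[name] = info_labels[name]
        # Multiplying by the info value (1) leaves the sample unchanged.
        result.append((merged, value * info_value))
    return result
```

The point of the design is that the source metric keeps only the device label, while the UUID is copied over at query time.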

@SuperQ
Member

SuperQ commented Sep 15, 2016

The one plus side to the info metric is that basically everything in the udev info can be included as labels. This allows for very flexible matching without having to include every possible info option in the source metric.
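
As a sketch of that idea, a line for such an info metric could be rendered from a dictionary of udev properties like this. The label names (uuid, serial, model) and the choice of udev keys are assumptions for illustration; only node_disk_info and the device label come from this thread.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline for the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_info_line(device, props, wanted=(
        ("uuid", "ID_FS_UUID"),
        ("serial", "ID_SERIAL_SHORT"),
        ("model", "ID_MODEL"))):
    """Render one node_disk_info sample; udev keys that are absent are skipped."""
    labels = [("device", device)]
    for label, key in wanted:
        if key in props:
            labels.append((label, props[key]))
    body = ",".join(
        '{}="{}"'.format(name, escape_label_value(value)) for name, value in labels
    )
    return "node_disk_info{{{}}} 1".format(body)
```

Adding another udev property is then just one more (label, key) pair in wanted.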

@raypettersen
Author

Sounds like a good solution/workaround.

@discordianfish
Member

I agree that technically the UUID shouldn't be a label because the metric is about a device, not a volume. A UUID is also unbounded. Not sure if this has practical implications, but I could imagine systems building/mounting images, which would always cause a new time series to be created.

I think the join that @SuperQ describes is the right way. But I think it would be nice if we could provide node_disk_info in the collector instead of having users manage that via the textfile collector.

@discordianfish
Member

If we can agree on this, I'd close the issue and create a new one for adding such a metric.

@raypettersen
Author

No complaints from me.

@SuperQ
Member

SuperQ commented Jan 15, 2017

Yes, let's handle this with the textfile collector for now. It could possibly be triggered/managed by the udev infrastructure.
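
Whatever triggers it (cron, a udev rule, something else), the script should avoid leaving a half-written file for the collector to read. A minimal sketch, assuming the collector only picks up files ending in .prom and that the target directory is the one configured for the textfile collector; write_prom_file is a hypothetical helper name.

```python
import os
import tempfile


def write_prom_file(lines, path):
    """Write metric lines to a temp file, then rename it into place atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for line in lines:
                f.write(line + "\n")
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The rename means a scrape sees either the old file or the new one, never a partial write.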

@n27051538

n27051538 commented Feb 26, 2021

What @brian-brazil is suggesting is that this be solved by having a metric that contains the UUID and device labels, and using PromQL to join them.

node_disk_sectors_written * on (device) group_left (uuid) node_disk_info

I consider the block device UUID to be more important than the device name, and I think the device and UUID are separate metric dimensions that should always be included, but I understand where Brian is coming from.

Thank you @SuperQ, your idea helped me.

I had lots of such metrics:
node_disk_read_bytes_total{device="dm-16", hostname="plat194", instance="192.168.1.1:9100", job="node_exporter"}
...

Then I created a script that generates a dictionary file for the textfile collector:

$ cat resolve-ora-disks.prom
node_disk_info{asmdev="DATA01",mpath="/dev/mapper/mpatha4",device="dm-16"} 1
node_disk_info{asmdev="DATA02",mpath="/dev/mapper/mpathb4",device="dm-29"} 1
node_disk_info{asmdev="DATA03",mpath="/dev/mapper/mpathc4",device="dm-19"} 1
node_disk_info{asmdev="DATA04",mpath="/dev/mapper/mpathd4",device="dm-17"} 1
node_disk_info{asmdev="DATA05",mpath="/dev/mapper/mpathe4",device="dm-20"} 1
node_disk_info{asmdev="DATA06",mpath="/dev/mapper/mpathf4",device="dm-18"} 1
node_disk_info{asmdev="DATA07",mpath="/dev/mapper/mpathg4",device="dm-27"} 1
node_disk_info{asmdev="DATA08",mpath="/dev/mapper/mpathh4",device="dm-26"} 1
node_disk_info{asmdev="DATA09",mpath="/dev/mapper/mpathi4",device="dm-24"} 1
node_disk_info{asmdev="DATA10",mpath="/dev/mapper/mpathj4",device="dm-22"} 1
node_disk_info{asmdev="FRA01",mpath="/dev/mapper/mpathl4",device="dm-25"} 1
node_disk_info{asmdev="FRA02",mpath="/dev/mapper/mpathm4",device="dm-21"} 1
node_disk_info{asmdev="FRA03",mpath="/dev/mapper/mpathn4",device="dm-28"} 1
node_disk_info{asmdev="MGMT01",mpath="/dev/mapper/mpathk4",device="dm-23"} 1

Then I got the sum for the asmdev values starting with DATA using this PromQL:

sum(irate(node_disk_read_bytes_total{hostname=~"plat194"}[30m]) * on (instance,device) group_left(asmdev,mpath) node_disk_info{asmdev=~"DATA.*"})

Cron refreshes resolve-ora-disks.prom every hour.
It works :)
