Implement process collector for Windows #596

Merged
merged 3 commits into from Jun 14, 2019

3 participants
@carlpett
Contributor

commented Jun 6, 2019

This PR implements a process collector for Windows, resolves #376.

Some discussion points:

  • The process_cpu_seconds_total metric uses an underlying data source that, as far as I understand, is incremented with a resolution of ~16 ms. There might be better APIs to use, but the ones I've found work in CPU cycles, which aren't easy to convert to seconds.
  • There aren't really "file descriptors" on Windows. I interpreted them as handles instead (which cover many things besides files). There is also a hard-coded maximum of 16M handles per process.
  • I'm not fully sure I mapped the Linux memory concepts to Windows correctly. I would love a sanity check there.
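
A minimal sketch of the time conversion behind the first point, assuming the data source is something like the Win32 GetProcessTimes call, which reports kernel and user time as FILETIME values (counts of 100-nanosecond intervals). The `filetime` type and both helper names are illustrative and declared locally so the example compiles on any platform; they are not taken from the PR.

```go
package main

import "fmt"

// filetime mirrors the layout of the Windows FILETIME structure: a 64-bit
// count of 100-nanosecond intervals split into two 32-bit halves.
// It is declared here (hypothetically) so the sketch builds on any OS.
type filetime struct {
	LowDateTime  uint32
	HighDateTime uint32
}

// filetimeToSeconds joins the two halves and converts the tick count to seconds.
func filetimeToSeconds(ft filetime) float64 {
	ticks := uint64(ft.HighDateTime)<<32 | uint64(ft.LowDateTime)
	return float64(ticks) / 1e7 // 1e7 ticks of 100 ns make one second
}

// cpuSecondsTotal adds kernel and user time, the value a
// process_cpu_seconds_total counter would expose.
func cpuSecondsTotal(kernel, user filetime) float64 {
	return filetimeToSeconds(kernel) + filetimeToSeconds(user)
}

func main() {
	kernel := filetime{LowDateTime: 2500000} // 0.25 s of kernel time
	user := filetime{LowDateTime: 12500000}  // 1.25 s of user time
	fmt.Printf("%.2f\n", cpuSecondsTotal(kernel, user))
}
```

The unit itself is fine-grained; the ~16 ms resolution mentioned above would come from how often the underlying counters are updated, not from the conversion.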
Implement process collector for Windows
Signed-off-by: Calle Pettersson <carlpett@users.noreply.github.com>

@carlpett carlpett force-pushed the carlpett:windows-process-metrics branch from 8652d6b to 427315c Jun 6, 2019

@carlpett

Contributor Author

commented Jun 7, 2019

ping @beorn7, what do you think? (Apart from my apparently having missed something with Go modules; I'll fix that.)

@beorn7
Member

left a comment

Looks very cool at a first glance. I'll have a closer look ASAP.

Also, it would make sense to have somebody with some Windows-fu look at this. (Anybody?)

@beorn7

Member

commented Jun 11, 2019

The test failures are because you haven't updated the go.mod file. (Let me know if you need help with the Go modules part.)

}
ch <- MustNewConstMetric(c.vsize, GaugeValue, float64(mem.WorkingSetSize))
ch <- MustNewConstMetric(c.maxVsize, GaugeValue, float64(mem.PeakWorkingSetSize))
ch <- MustNewConstMetric(c.rss, GaugeValue, float64(mem.PrivateUsage))

@brian-brazil

brian-brazil Jun 11, 2019

Member

I think RSS is WorkingSetSize

@beorn7

beorn7 Jun 11, 2019

Member

Yeah, the Windows terminology is kind of weird. The working set size (WSS) is normally something different from the RSS, but apparently, when Windows says "WSS", it means "RSS".

@carlpett

carlpett Jun 11, 2019

Author Contributor

Yes, that seems correct.

The "working set" of a process is the set of memory pages currently visible to the process in physical RAM memory. These pages are resident and available for an application to use without triggering a page fault

@beorn7
Member

left a comment

Left a few comments.

This looks good in general, we just need to figure out the memory nomenclature of MS Windows.

Resolved (outdated): prometheus/process_collector_other.go
Resolved: prometheus/process_collector_windows.go
return
}
ch <- MustNewConstMetric(c.vsize, GaugeValue, float64(mem.WorkingSetSize))
ch <- MustNewConstMetric(c.maxVsize, GaugeValue, float64(mem.PeakWorkingSetSize))

@beorn7

beorn7 Jun 11, 2019

Member

maxVsize is the maximum possible virtual memory size, not the observed peak. I'm not sure there is a way to get this from somewhere on MS Windows. Perhaps it has to be hard-coded depending on whether the system is 64-bit or 32-bit.

@carlpett

carlpett Jun 11, 2019

Author Contributor

It seems it might depend on the OS version as well. I found this source, https://blogs.technet.microsoft.com/markrussinovich/2008/11/17/pushing-the-limits-of-windows-virtual-memory/, which claims 8 TB on 64-bit. It has a reference table that goes up to Windows 2012. On my Windows lab host, I could reserve 128 TB, though.

I'll see if this is available through some API, but otherwise I don't know. Not super keen on the idea of maintaining a lookup table which is hard to know when it needs to be updated.

@carlpett

carlpett Jun 11, 2019

Author Contributor

Alternatively, on Linux it seems we represent unlimited with -1. Might make sense here too?
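
As a rough illustration of that convention (not the library's actual code), a limit that is reported as the string "unlimited", as in /proc/<pid>/limits on Linux, could be mapped to -1 like this. `parseLimit` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLimit is a hypothetical helper: it converts a limit string into a
// number and uses -1 as the sentinel for "unlimited".
func parseLimit(s string) (int64, error) {
	if s == "unlimited" {
		return -1, nil
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	for _, s := range []string{"unlimited", "8388608"} {
		v, err := parseLimit(s)
		fmt.Println(s, v, err)
	}
}
```

A gauge exported with this sentinel reads -1 whenever no finite limit applies, which avoids maintaining a table of per-version limits.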

@brian-brazil

brian-brazil Jun 11, 2019

Member

We could simply not export it. I doubt there are many 32-bit systems out there that would care (and it's a hard-coded value if they need it).

@carlpett

carlpett Jun 12, 2019

Author Contributor

That's also an option, of course. But the 32-bit value isn't actually something we can hard-code, if I understand it correctly. It depends on whether the OS is 32-bit, and whether it was booted with a 2/2 GB or 3/1 GB split between OS and application. On a 64-bit OS, it depends on whether the IMAGE_FILE_LARGE_ADDRESS_AWARE flag is set on the binary by the Go compiler, which seems to be the case: https://github.com/golang/go/blob/e883d000f4ce0c47711c3a7c59df8bb2f0ec557f/src/cmd/link/internal/ld/pe.go#L785-L788
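
For anyone who wants to check that flag on a given binary, here is a small sketch using the standard library's debug/pe package. The constant value 0x0020 is the IMAGE_FILE_LARGE_ADDRESS_AWARE bit of the PE header's Characteristics field; the helper names are illustrative.

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
)

// imageFileLargeAddressAware is the IMAGE_FILE_LARGE_ADDRESS_AWARE bit in the
// Characteristics field of the PE file header.
const imageFileLargeAddressAware = 0x0020

// isLargeAddressAware reports whether the flag is set in a Characteristics value.
func isLargeAddressAware(characteristics uint16) bool {
	return characteristics&imageFileLargeAddressAware != 0
}

// checkBinary opens a PE file and inspects its header (hypothetical helper).
func checkBinary(path string) (bool, error) {
	f, err := pe.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	return isLargeAddressAware(f.FileHeader.Characteristics), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: laa <path-to-exe>")
		return
	}
	laa, err := checkBinary(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("large address aware:", laa)
}
```

Even with the flag known, the usable address space still depends on the OS version and boot configuration, which is the maintenance concern raised above.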

@carlpett

carlpett Jun 12, 2019

Author Contributor

But again, it is not clear if those hoops are worth jumping through.

@beorn7

beorn7 Jun 12, 2019

Member

When in doubt, let's just not export this metric, as Brian suggested.

@carlpett

carlpett Jun 13, 2019

Author Contributor

Sounds good to me. In that case, should we drop the "max fds" too? That value is pretty uninteresting (although "correct"), and I added it to try to keep parity.

@beorn7

beorn7 Jun 13, 2019

Member

Let's keep it: it is the technically correct value, and keeping it correct doesn't imply any maintenance overhead.

c.reportError(ch, nil, err)
return
}
ch <- MustNewConstMetric(c.vsize, GaugeValue, float64(mem.WorkingSetSize))

@beorn7

beorn7 Jun 11, 2019

Member

Concluding from the comments below, the WorkingSetSize is more like the RSS. I have no clue how to get something like the vsize on Windows.

Resolved (outdated): prometheus/process_collector_windows.go
@beorn7

Member

commented Jun 11, 2019

Wild guess: For vsize, we have to check PagefileUsage. If it is zero, we have to use PrivateUsage.

But it would be really good if somebody with a good understanding of Windows's memory management could help out.

Update with review comments
Signed-off-by: Calle Pettersson <carlpett@users.noreply.github.com>

@carlpett carlpett force-pushed the carlpett:windows-process-metrics branch from 51cceab to 5c67f39 Jun 11, 2019

@carlpett

Contributor Author

commented Jun 11, 2019

@beorn7 From my understanding of PROCESS_MEMORY_COUNTERS_EX, PrivateUsage and PagefileUsage are the same, but the latter is deprecated and always zero (as some sort of backwards compatibility with the non-EX PROCESS_MEMORY_COUNTERS, I suppose)?

@beorn7

Member

commented Jun 11, 2019

It seemed to me that PagefileUsage could be non-zero on older Windows versions (which then would not have PrivateUsage?). The MS documentation is not really clear to me. Feels like a blast from the past...

@carlpett

Contributor Author

commented Jun 12, 2019

Yeah, it is weirdly formulated. My interpretation is that PagefileUsage=PrivateUsage, except on older versions where PagefileUsage=0 (but PrivateUsage is still set). But I may be wrong.
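
Under that interpretation, a defensive reading of the two fields could look like the sketch below. `memCounters` is a hypothetical stand-in for the relevant part of PROCESS_MEMORY_COUNTERS_EX, and `commitBytes` is an illustrative helper, not code from this PR.

```go
package main

import "fmt"

// memCounters is a hypothetical stand-in for the two fields of
// PROCESS_MEMORY_COUNTERS_EX that are discussed here.
type memCounters struct {
	PagefileUsage uint64
	PrivateUsage  uint64
}

// commitBytes prefers PagefileUsage when it is populated and falls back to
// PrivateUsage when PagefileUsage is reported as zero.
func commitBytes(m memCounters) uint64 {
	if m.PagefileUsage != 0 {
		return m.PagefileUsage
	}
	return m.PrivateUsage
}

func main() {
	older := memCounters{PagefileUsage: 0, PrivateUsage: 4096}
	newer := memCounters{PagefileUsage: 8192, PrivateUsage: 8192}
	fmt.Println(commitBytes(older), commitBytes(newer))
}
```

If the two fields really are equal whenever PagefileUsage is set, reading PrivateUsage alone gives the same result with less code.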

@beorn7

Member

commented Jun 13, 2019

Lacking better information, let's do it as discussed. I think the only remaining change is to remove the c.maxVsize metric. Then we can merge this.
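
To summarize where the discussion landed, here is a self-contained sketch of the resulting mapping: PrivateUsage as the virtual size, WorkingSetSize as the resident size, handles as file descriptors with the 16M maximum, and no maximum virtual size. The struct, the function name, and the metric names (chosen to follow the usual process_* naming) are assumptions for illustration; the real collector emits metrics through MustNewConstMetric as in the snippets above.

```go
package main

import (
	"fmt"
	"sort"
)

// processMemoryCounters is a hypothetical stand-in for the fields of
// PROCESS_MEMORY_COUNTERS_EX referenced in the review.
type processMemoryCounters struct {
	WorkingSetSize     uint64
	PeakWorkingSetSize uint64
	PrivateUsage       uint64
}

// maxHandles is the hard-coded 16M handle limit mentioned in the PR description.
const maxHandles = 16 * 1024 * 1024

// processGauges returns gauge values keyed by (assumed) metric name.
func processGauges(mem processMemoryCounters, handleCount uint32) map[string]float64 {
	return map[string]float64{
		"process_virtual_memory_bytes":  float64(mem.PrivateUsage),
		"process_resident_memory_bytes": float64(mem.WorkingSetSize),
		"process_open_fds":              float64(handleCount),
		"process_max_fds":               float64(maxHandles),
		// No maximum virtual memory gauge: it was dropped in review.
	}
}

func main() {
	mem := processMemoryCounters{WorkingSetSize: 20 << 20, PeakWorkingSetSize: 25 << 20, PrivateUsage: 30 << 20}
	gauges := processGauges(mem, 142)
	names := make([]string, 0, len(gauges))
	for name := range gauges {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order for printing
	for _, name := range names {
		fmt.Printf("%s %.0f\n", name, gauges[name])
	}
}
```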

@carlpett carlpett changed the title from "WIP: Implement process collector for Windows" to "Implement process collector for Windows" Jun 14, 2019

Remove maxVsize
Signed-off-by: Calle Pettersson <calle@cape.nu>

@carlpett carlpett force-pushed the carlpett:windows-process-metrics branch from e397318 to 09741ab Jun 14, 2019

@carlpett

Contributor Author

commented Jun 14, 2019

Done!

@beorn7

beorn7 approved these changes Jun 14, 2019

@beorn7 beorn7 merged commit c5f4190 into prometheus:master Jun 14, 2019

2 checks passed

DCO
continuous-integration/travis-ci/pr: The Travis CI build passed
@beorn7

Member

commented Jun 14, 2019

Thanks 1M!
