|
@codesome @roidelapluie Since I have had this patch running for almost a month on OpenBSD without any issues (no corruption or compaction errors), is there any chance of this being reviewed? Output of the |
|
Totally missed this PR, adding to my TODO list |
|
I can confirm your patch works on the ports version, prometheus 2.24.1 on OpenBSD -current (9/15/21 build). |
0c43c0f to 83031e5
|
@codesome But since this change also affects all other operating systems and workloads I'm not sure how to know there is no impact for those. |
|
This will require more extensive testing than just a review, for the reasons that you mentioned, so I have not been able to get to this yet. I can't give an ETA yet, but I will try my best to get to this soon. |
|
@codesome would be great to get to this PR :) |
|
Catching up on stuff, will try to dedicate some time to this |
|
Any update on this PR? I'm also having issues with Prometheus on OpenBSD |
|
I've been using this patch on OpenBSD for at least 6 months now with no issues. |
|
Just started using this branch today and so far it seems to have solved the issue |
Switch the port to use the official prometheus-web-ui package to avoid the nightmare of building the react project on the ports infrastructure. Include a patch from an open pull request that works around the TSDB issue on systems with no unified buffer cache by using mmap for both reads and writes. For more info about patch-mmap_openbsd check out: prometheus/prometheus#9085 and prometheus/prometheus#8877 OK sthen@
This still includes a patch to work around the UBC issues in TSDB. For more info check out prometheus/prometheus#8799 and prometheus/prometheus#9085 OK sthen@
|
We discussed this PR during our bug scrub. |
|
Hitting this again in the bug scrub. @jesusvazquez your input would still be welcome. Feel free to close if you think this is out of reach for now. |
jesusvazquez
left a comment
@ston1th I finally got a bit of time for this. I like the changes you've made and the fact that there is evidence from other users that this is making Prometheus work under OpenBSD.
To my understanding this branch requires a rebasing plus some testing for the fileutil files you added. Would you be up to get this done?
|
Hi @jesusvazquez, yes, I can look into it once I find some spare time. Regarding testing and performance: since I lack knowledge and available hardware in this field, do you have an idea of how we could benchmark such a change and see whether we get a performance regression (or maybe even a boost)? |
I had in mind to start with unit testing, particularly for your write.go file, to make sure that no mistake can mess up the implementation. These would be unit tests under this fileutil directory. Regarding benchmarking, we could do a high-level benchmark by writing chunks. Once the max chunk size is reached, the head chunks are mmapped and a new chunk is created; we can try creating chunks and measuring the write speed of the prior vs. the new implementation. Then we can run a query and also see which implementation is faster. In head_test.go you'll find some examples of how to create mmapped head chunks: Lines 3549 to 3555 in 501bc64 |
|
Hello from the bug scrub: @ston1th @jesusvazquez what can be done to move this along? Should we seek out more OpenBSD folks? Also, there's no OpenBSD test in CI; is that something feasible? Overall, since we don't have the OpenBSD tests, a way forward would be to rebase the PR, run Prombench and, if there's no big regression, merge the change and see feedback from both OpenBSD and non-OpenBSD communities. Prioritize fixing the problem over making it efficient on OpenBSD. |
|
Hello again from the bug-scrub! @aknuds1 you planned to look at this. Do you think it will actually happen anytime soon? |
|
@beorn7 I will have to ask @jesusvazquez and @codesome, since they've actually investigated the PR. |
|
Sorry, I forgot about this. |
|
@ston1th Thanks! I will await your rebasing the PR then, before pinging the others. |
This is a possible fix for the issues prometheus#8799 and prometheus#8877.

Changes made:
* introduce two mmaps:
  * mmapRw: to read and write an mmaped region
  * mmapRo: to read from an mmaped region
* implement a Writer interface for the RW mmap: `MmapWriter`
  * this is used to write the index data instead of using the underlying file descriptor
* the promql `ActiveQueryTracker` has been rewritten to also use the `MmapWriter`
* the dependency `github.com/edsrzf/mmap-go` has been removed
* the test `TestDBReadOnly` is still broken on OpenBSD. This is caused by the early call to `dbWritable.Close()`. This closes mmaps in the background which are accessed later here which causes a segfault: `require.Equal(t, expChunks, readOnlySeries, ...`

I ran the promql, tsdb and web tests with these changes on OpenBSD (amd64), Linux (amd64) and Windows (amd64). All tests (except TestDBReadOnly on OpenBSD) pass successfully.

Signed-off-by: ston1th <ston1th@giftfish.de>
|
@aknuds1 Sorry for the delay. The latest version is now available. |
|
Can you make CI pass first, please? |
Signed-off-by: ston1th <ston1th@giftfish.de>
|
I've asked @jesusvazquez and @codesome whether they can re-review. |
|
Generally we have been reducing the amount of memory-mapped IO, because it does not play well with the Go scheduler. I see the motivation for this is OpenBSD; it might be acceptable to do more mmap on that one platform but I don't think we should do it on Linux. |
|
@bboreham As of now, though, Prometheus uses this mmap on Linux (and every other OS). My plan was to work around the underlying issue of OpenBSD not having a unified buffer cache (where file and mmap I/O share the same memory in the kernel). This could probably also be solved by rewriting the reader part to not use mmap but to read the file instead. I'm not sure about the impact this has on the index, though. Maybe there was a reason to use mmap for read access. |
|
Hello, sorry I didn't spot this reply in March. Yes, changing both reading and writing to use file APIs would be the most consistent, but also the most work.
I wasn't there, but my guess is the reason was that it seems magical. I will also mention #15365 which adds direct I/O (skipping the buffer cache) for writing chunks; perhaps this could be extended to write the index and also extended to work on OpenBSD. |
|
This came round again at the bug-scrub. |
This is a possible fix for the issues #8799 and #8877.

Changes made:
* introduce two mmaps:
  * mmapRw: to read and write an mmaped region
  * mmapRo: to read from an mmaped region
* implement a Writer interface for the RW mmap: `MmapWriter`
  * this is used to write the index data instead of using the underlying file descriptor
* the promql `ActiveQueryTracker` has been rewritten to also use the `MmapWriter`
* the dependency `github.com/edsrzf/mmap-go` has been removed
* the test `TestDBReadOnly` is still broken on OpenBSD. This is caused by the early call to `dbWritable.Close()`. This closes mmaps in the background which are accessed later here which causes a segfault: `require.Equal(t, expChunks, readOnlySeries, ...`

I ran the promql, tsdb and web tests with these changes on OpenBSD (amd64), Linux (amd64) and Windows (amd64).
All tests (except TestDBReadOnly on OpenBSD) pass successfully.