Skip to content

tsdb: index write via mmap#9085

Closed
ston1th wants to merge 2 commits intoprometheus:mainfrom
ston1th:mmap_openbsd
Closed

tsdb: index write via mmap#9085
ston1th wants to merge 2 commits intoprometheus:mainfrom
ston1th:mmap_openbsd

Conversation

@ston1th
Copy link
Copy Markdown

@ston1th ston1th commented Jul 13, 2021

This is a possible fix for the issues #8799 and #8877.

Changes made:

  • introduce two mmaps:
    • mmapRw: to read and write an mmaped region
    • mmapRo: to read from an mmaped region
  • implement a Writer interface for the RW mmap: MmapWriter
    • this is used to write the index data instead of using the
      underlying file descriptor
  • the promql ActiveQueryTracker has been rewritten to also use
    the MmapWriter
    • the dependency github.com/edsrzf/mmap-go has been removed
  • the test TestDBReadOnly is still broken on OpenBSD. This is caused
    by the early call to dbWritable.Close(). This closes mmaps in
    the background which are accessed later here which causes a segfault:
    require.Equal(t, expChunks, readOnlySeries, ...

I ran the promql, tsdb and web tests with these changes on OpenBSD
(amd64), Linux (amd64) and Windows (amd64).
All tests (except TestDBReadOnly on OpenBSD) pass successfully.

@ston1th
Copy link
Copy Markdown
Author

ston1th commented Aug 23, 2021

@codesome @roidelapluie Since I have this patch running for almost a month on OpenBSD without any issues (no corruption or compaction errors), any chance for this being reviewed?

Output of the /status page:

Runtime Information
Start time	Mon, 26 Jul 2021 06:33:00 GMT
Working directory	/
Configuration reload	Successful
Last successful configuration reload	2021-07-26T06:33:01Z
WAL corruptions	0
Goroutines	43
GOMAXPROCS	2
GOGC	
GODEBUG	
Storage retention	30d

@codesome
Copy link
Copy Markdown
Member

Totally missed this PR, adding to my TODO list

@brianatpu
Copy link
Copy Markdown

I can confirm your patch works on the ports version, prometheus 2.24.1 on OpenBSD -current (9/15/21 build).

@ston1th ston1th force-pushed the mmap_openbsd branch 3 times, most recently from 0c43c0f to 83031e5 Compare October 9, 2021 18:01
@ston1th
Copy link
Copy Markdown
Author

ston1th commented Oct 9, 2021

@codesome
For me this is running pretty fine on OpenBSD and my rather small workload.

But since this change also affects all other operating systems and workloads I'm not sure how to know there is no impact for those.
I think there needs to be some good testing on the other platforms and different workloads to be sure we don't introduce leaks, performance issues or other types of regressions.

@codesome
Copy link
Copy Markdown
Member

This will require some extensive testing than just a review for the reasons that you mentioned, so I have not been able to get to this yet. Can't give an ETA yet, but will try my best to get to this soon.

@stale stale bot added the stale label Dec 12, 2021
@stale stale bot removed the stale label Jan 8, 2022
@roidelapluie
Copy link
Copy Markdown
Member

@codesome would be great to get to this PR :)

@codesome
Copy link
Copy Markdown
Member

codesome commented Feb 9, 2022

Catching up on stuff, will try to dedicate some time to this

@dantecatalfamo
Copy link
Copy Markdown

Any update on this PR? I'm also having issues with Prometheus on OpenBSD

@renatoaguiar
Copy link
Copy Markdown

I've been using this patch on OpenBSD for at least 6 months now with no issues.

@dantecatalfamo
Copy link
Copy Markdown

Just started using this branch today and so far it seems to have solved the issue

bob-beck pushed a commit to openbsd/ports that referenced this pull request Jun 28, 2022
Switch the port to use the official prometheus-web-ui package to avoid
the nightmare of building the react project on the ports infrastructure.

Include a patch from an open pull request that works around the issue
of TSDB on systems with no unified buffer cache by using mmap both for
read and writes. For more info about patch-mmap_openbsd check out:
   prometheus/prometheus#9085
and
   prometheus/prometheus#8877
OK sthen@
bob-beck pushed a commit to openbsd/ports that referenced this pull request Sep 13, 2022
This still includes a patch to workaround the UBC issues in TSDB.
For more info check out
prometheus/prometheus#8799 and
prometheus/prometheus#9085

OK sthen@
@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Aug 29, 2023

We discussed this PR during our bug scrub.
We think it would still be nice to get this done, so to make Prometheus work again on OpenBSD. But the interface changes here are fairly invasive. @jesusvazquez do you think this is something we can do?

@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Feb 1, 2024

Hitting this again in the bug scrub. @jesusvazquez your input would still be welcome. Feel free to close if you think this is out of reach for now.

Copy link
Copy Markdown
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ston1th I finally got a bit of time for this. I like the changes you've made and the fact that there is evidence from other users that his is making Prometheus work under OpenBSD.

To my understanding this branch requires a rebasing plus some testing for the fileutil files you added. Would you be up to get this done?

@ston1th
Copy link
Copy Markdown
Author

ston1th commented Feb 3, 2024

Hi @jesusvazquez yes I can look into it once I find some spare time.

Regarding testing and performance and since I lack knowledge and available hardware in this field: do you have an idea on how we could benchmark such a change and see if we get a performance regression (or maybe even a boost)?

@jesusvazquez
Copy link
Copy Markdown
Member

Regarding testing and performance and since I lack knowledge and available hardware in this field: do you have an idea on how we could benchmark such a change and see if we get a performance regression (or maybe even a boost)?

I had in mind to start with unit testing, particularly for your write.go file to make sure that no mistake can mess up the implementation. These would be unit tests under this fileutil directory.

Regarding benchmarking we could do a high level benchmark by writing chunks. Once the max chunk size is reached the head chunks are mmapped and a new chunk is created, we can try creating chunks and measuring write speed of prior vs actual implementation. Then we can do a query and see also which implementation is faster.

in head_test.go you'll find some examples of how to create mmapped head chunks

prometheus/tsdb/head_test.go

Lines 3549 to 3555 in 501bc64

head.mmapHeadChunks()
}
// There should be 20 mmap chunks in s1.
ms := head.series.getByHash(s1.Hash(), s1)
require.Len(t, ms.mmappedChunks, 25)
expMmapChunks := make([]*mmappedChunk, 0, 20)

@krajorama
Copy link
Copy Markdown
Member

Hello from the bug scrub : @ston1th @jesusvazquez what can be done to move this along? Should we seek out more OpenBSD folks? Also there's not OpenBSD test in CI, is that something feasible.

Overall, since we don't have the OpenBSD tests, a way forward would be to rebase the PR, run Prombench and if there's no big regression, merge the change and see feedback from both OpenBSD and non OpenBSD communities. Prioritize fixing the problem over making it efficient on OpenBSD.

@aknuds1 aknuds1 self-requested a review September 16, 2024 19:50
@beorn7
Copy link
Copy Markdown
Member

beorn7 commented Mar 4, 2025

Hello again from the bug-scrub!

@aknuds1 you planned to look at this. Do you think it will actually happen anytime soon?

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Mar 4, 2025

@beorn7 I will have to ask @jesusvazquez and @codesome, since they've actually investigated the PR.

@ston1th
Copy link
Copy Markdown
Author

ston1th commented Mar 4, 2025

Sorry, I forgot about this.
I already prepared a rebase but wanted to do a review before the push.
Will do this soon so we can continue on a updated version.

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Mar 5, 2025

@ston1th Thanks! I will await your rebasing the PR then, before pinging the others.

This is a possible fix for the issues prometheus#8799 and prometheus#8877.

Changes made:

* introduce two mmaps:
  * mmapRw: to read and write an mmaped region
  * mmapRo: to read from an mmaped region
* implement a Writer interface for the RW mmap: `MmapWriter`
  * this is used to write the index data instead of using the
    underlying file descriptor
* the promql `ActiveQueryTracker` has been rewritten to also use
  the `MmapWriter`
  * the dependency `github.com/edsrzf/mmap-go` has been removed
* the test `TestDBReadOnly` is still broken on OpenBSD. This is caused
  by the early call to `dbWritable.Close()`. This closes mmaps in
  the background which are accessed later here which causes a segfault:
  `require.Equal(t, expChunks, readOnlySeries, ...`

I ran the promql, tsdb and web tests with these changes on OpenBSD
 (amd64), Linux (amd64) and Windows (amd64).
All tests (except TestDBReadOnly on OpenBSD) pass successfully.

Signed-off-by: ston1th <ston1th@giftfish.de>
@ston1th
Copy link
Copy Markdown
Author

ston1th commented Mar 11, 2025

@aknuds1 Sorry for the delay. The latest version is now available.

@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Mar 12, 2025

Can you make CI pass first, please?

Signed-off-by: ston1th <ston1th@giftfish.de>
@aknuds1
Copy link
Copy Markdown
Contributor

aknuds1 commented Mar 18, 2025

I've asked @jesusvazquez and @codesome whether they can re-review.

@bboreham
Copy link
Copy Markdown
Member

Generally we have been reducing the amount of memory-mapped IO, because it does not play well with the Go scheduler.

I see the motivation for this is OpenBSD; it might be acceptable to do more mmap on that one platform but I don't think we should do it on Linux.

@ston1th
Copy link
Copy Markdown
Author

ston1th commented Mar 19, 2025

@bboreham as of now though prometheus uses this mmap on linux (and every other OS).

My plan was to work around the underlying issue of openbsd not having a unified buffer cache (file and mmap I/O share the same memory in the kernel).
Currently the index in written via file I/O and read via mmap. So I tried to keep the reading mmap part and work around the issue by also writing to the same mmap.

This could probably also be solved by rewriting the reader part to not use mmap but to read the file.
If the overall plan is to use less to no mmap's going forward, then maybe this should be considered.

Im not sure about the impact this has on the index, though. Maybe there was a reason to use mmap for read access.

@bboreham
Copy link
Copy Markdown
Member

Hello, sorry I didn't spot this reply in March.

Yes, changing both reading and writing to use file APIs would be the most consistent, but also the most work.
This blog explains in more detail why mmap can cause problems in Go programs: https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d

Maybe there was a reason to use mmap for read access.

I wasn't there, but my guess is the reason was that it seems magical.

I will also mention #15365 which adds direct I/O (skipping the buffer cache) for writing chunks; perhaps this could be extended to write the index and also extended to work on OpenBSD.

@bboreham
Copy link
Copy Markdown
Member

bboreham commented Jan 6, 2026

This came round again at the bug-scrub.
Closing, since the interaction between memory-mapped IO and the Go scheduler is so bad that we will never want to increase the usage of mmap in Prometheus.

@bboreham bboreham closed this Jan 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.