-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large minimum allocation size in AppendAlloc for SSIDS factor data #119
Comments
Many thanks for the detailed investigation @mjacobse, it seems to me that this is trying to second-guess the hardware by assuming a minimum page size of 8MB. Now while that may well have been the case ten years ago I see no reason why that should be the case now and I am generally very much against second-guessing the hardware, that is what the OS is for. From your extensive benchmarks I can see no downside to allowing 0MB page sizes, indeed the improvement is quite substantial for smaller matrices, so I would suggest switching to that as default. @tyronerees your thoughts on this? Also does MA97 do something similar with Page Sizes? If so this could maybe explain the recent memory issues we were seeing. |
I don't think PAGE_SIZE was ever meant to correspond to OS page size (which
is probably something like 4-16KB depending on the OS).
I suspect on systems that support unix-style mmap, you can probably get
more performance by replacing this whole mechanism with something that just
grows memory space using mmap (which may even zero things for you depending
on the exact OS).
Jonathan.
…On Mon, Jun 26, 2023 at 8:22 AM Jari ***@***.***> wrote:
Many thanks for the detailed investigation @mjacobse
<https://github.com/mjacobse>, it seems to me that this is trying to
second-guess the hardware by assuming a minimum page size of 8MB. Now while
that may well have been the case ten years ago I see no reason why that
should be the case now and I am generally very much against second-guessing
the hardware, that is what the OS is for.
From your extensive benchmarks I can see no downside to allowing 0MB page
sizes, indeed the improvement is quite substantial for smaller matrices, so
I would suggest switching to that as default. @tyronerees
<https://github.com/tyronerees> your thoughts on this? Also does MA97 do
something similar with Page Sizes? If so this could maybe explain the
recent memory issues we were seeing.
—
Reply to this email directly, view it on GitHub
<#119 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABAYXX5R7G7Y35YZKKKSZJ3XNGSMNANCNFSM6AAAAAAZSXK42M>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
That would be interesting to try I think, but not really trivial to implement I'd imagine? Never worked with |
Yes agreed, that feels like an enhancement to the existing codebase, I propose we switch to |
The
AppendAlloc
allocator that allocates memory for storing the factor data keeps aPool
ofPage
s of memory. These pages have a minimum size of 8MB:spral/src/ssids/cpu/AppendAlloc.hxx
Lines 67 to 72 in 70c4219
spral/src/ssids/cpu/AppendAlloc.hxx
Lines 87 to 91 in 70c4219
spral/src/ssids/cpu/AppendAlloc.hxx
Lines 18 to 20 in 70c4219
Initially I considered to replace the
calloc
on construction of aPage
with a normalmalloc
and onlymemset
the actually used parts to 0 inPage::allocate
. That did help.But then I wondered what the purpose of this minimum
PAGE_SIZE
actually is. From the naming it seems like the idea might be to always request a whole memory page/virtual page from the operating system at once? If so, without knowing much about memory pages, I don't believe that this is achieved with the current implementation. The fixed constant of 8 MB might not be the actual page size andmalloc
does not seem to be well-equipped to help in this task, something likemmap
would probably have to be used (see this stackoverflow question). Maybe the idea is just to reduce the amount of calls tomalloc
in general. But for that to be worth it, in usual matrices, are there really so many factor nodes to allocate separate blocks of memory for?I ended up running benchmarks with
PAGE_SIZE
= 8MB vs.PAGE_SIZE
= 0MB. I used google/benchmark with this source as runner and ran it on all real symmetric matrices from the SuiteSparse Matrix Collection that did not run into a 2min timeout. Parameters:--benchmark_min_time=1s
,--benchmark_repetitions=10
,OMP_NUM_THREADS=1
The plots below show the relative runtime of the
PAGE_SIZE
= 0MB version in % for each example. The bars show the comparison of the fastest measurement across all repetitions whereas the scattered dots show the comparison of all repetitions pairwise. The matrices are given from left to right in ascending order of fastest measured runtime. More plots of the separate groups can be found here and results for the same benchmark run with 4 threads here.Simply put, for small matrices,
PAGE_SIZE = 0
was unsurprisingly often much faster for me. That improvement grows smaller, the larger the matrices become. But even for larger matrices, there does not seem to be a clear advantage ofPAGE_SIZE
= 8MB. It looks like noise around the baseline to me.I understand that SSIDS is intended especially for really large matrices and that performance on small ones is probably less of a concern (if at all). But with these observations it seems to me that removing the overallocation might give performance on small matrices for free, without influencing performance on larger matrices. But perhaps I am missing something. Is the benchmark flawed in some way? Is there a benefit of overallocating once the matrices become really large? Are there platforms with a different implementation of
malloc
that would benefit? Or certain hardware?Would be happy to hear your thoughts, sorry for the lengthy description.
The text was updated successfully, but these errors were encountered: