Large minimum allocation size in AppendAlloc for SSIDS factor data #119

Closed
mjacobse opened this issue Jun 24, 2023 · 5 comments · Fixed by #120

@mjacobse (Collaborator) commented Jun 24, 2023

The AppendAlloc allocator that allocates memory for storing the factor data keeps a Pool of Pages of memory. These pages have a minimum size of 8MB:

class Pool {
   const size_t PAGE_SIZE = 8*1024*1024; // 8MB
public:
   Pool(size_t initial_size)
   : top_page_(new Page(std::max(PAGE_SIZE, initial_size)))
   {}
   // ... and later, inside the Pool's allocate(sz):
      ptr = top_page_->allocate(sz);
      if(!ptr) { // Insufficient space on current top page, make a new one
         top_page_ = new Page(std::max(PAGE_SIZE, sz), top_page_);
         ptr = top_page_->allocate(sz);
      }
So no matter how little memory is required next, if it does not fit on the current top page, an additional 8 MB is allocated and even zero'd:
/** A single fixed size page of memory with allocate function.
* We are required to guaruntee it is zero'd, so use calloc rather than anything
* else for the allocation.
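
For context, the Page behaves roughly like the following (a simplified sketch from memory, not the exact SSIDS code): the whole buffer is calloc'd up front, and allocate just hands out slices of it by bumping an offset.

#include <cstddef>
#include <cstdlib>

// Simplified sketch of a fixed-size page: calloc the whole buffer up front
// (so it is zero'd), then hand out slices by bumping an offset.
class Page {
public:
   explicit Page(size_t sz, Page* prev = nullptr)
   : mem_(static_cast<char*>(calloc(sz, 1))), size_(sz), used_(0), prev_(prev)
   {}
   void* allocate(size_t sz) {
      if(used_ + sz > size_) return nullptr; // does not fit on this page
      void* ptr = mem_ + used_;
      used_ += sz;
      return ptr;
   }
private:
   char* mem_;
   size_t size_;
   size_t used_;
   Page* prev_; // previous top page, kept so it can be freed later
};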
I saw this cause an unexpectedly large number of page faults for tiny matrices, completely dominating the runtime.

Initially I considered replacing the calloc on construction of a Page with a plain malloc and only memset-ing the actually used parts to zero in Page::allocate. That did help.
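
In code, that first attempt amounted to something like this (hypothetical sketch, showing only the two changed members of the Page sketch above):

#include <cstring>

// Hypothetical variant of the Page sketch above: malloc instead of calloc in
// the constructor, and zero only the bytes that are actually handed out.
explicit Page(size_t sz, Page* prev = nullptr)
: mem_(static_cast<char*>(malloc(sz))), size_(sz), used_(0), prev_(prev)
{}

void* allocate(size_t sz) {
   if(used_ + sz > size_) return nullptr; // does not fit on this page
   void* ptr = mem_ + used_;
   used_ += sz;
   memset(ptr, 0, sz); // keep the zero'd guarantee, but only for the requested bytes
   return ptr;
}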

But then I wondered what the purpose of this minimum PAGE_SIZE actually is. From the naming it seems like the idea might be to always request a whole memory page/virtual page from the operating system at once? If so, without knowing much about memory pages, I don't believe the current implementation achieves that: the fixed constant of 8 MB is not necessarily the actual page size, and malloc does not seem well-equipped for this task; something like mmap would probably have to be used (see this stackoverflow question). Maybe the idea is just to reduce the number of calls to malloc in general. But for that to be worth it, are there really so many factor nodes in typical matrices that each needs its own separately allocated block of memory?

I ended up running benchmarks with PAGE_SIZE = 8MB vs. PAGE_SIZE = 0MB. I used google/benchmark with this source as runner and ran it on all real symmetric matrices from the SuiteSparse Matrix Collection that did not run into a 2min timeout. Parameters:

  • --benchmark_min_time=1s, --benchmark_repetitions=10, OMP_NUM_THREADS=1
  • Ubuntu 22.04, gcc 11.2, AMD Ryzen 7 PRO 4750U

The plots below show the relative runtime of the PAGE_SIZE = 0MB version in % for each example. The bars show the comparison of the fastest measurement across all repetitions whereas the scattered dots show the comparison of all repetitions pairwise. The matrices are given from left to right in ascending order of fastest measured runtime. More plots of the separate groups can be found here and results for the same benchmark run with 4 threads here.

Simply put, for small matrices PAGE_SIZE = 0 was, unsurprisingly, often much faster for me. The improvement shrinks as the matrices become larger, but even for large matrices there does not seem to be a clear advantage to PAGE_SIZE = 8MB; it looks like noise around the baseline to me.

I understand that SSIDS is intended especially for really large matrices and that performance on small ones is probably less of a concern (if a concern at all). But with these observations it seems to me that removing the overallocation might improve performance on small matrices for free, without affecting performance on larger ones. Perhaps I am missing something, though. Is the benchmark flawed in some way? Is there a benefit to overallocating once the matrices become really large? Are there platforms with a different implementation of malloc that would benefit? Or certain hardware?

I would be happy to hear your thoughts; sorry for the lengthy description.

[Figures: benchmark_comparison_Boeing and benchmark_comparison_all (relative runtime plots described above)]

@jfowkes (Contributor) commented Jun 26, 2023

Many thanks for the detailed investigation @mjacobse. It seems to me that this is trying to second-guess the hardware by assuming a minimum page size of 8MB. While that may well have been the case ten years ago, I see no reason why it should be the case now, and I am generally very much against second-guessing the hardware; that is what the OS is for.

From your extensive benchmarks I can see no downside to allowing 0MB page sizes; indeed the improvement is quite substantial for smaller matrices, so I would suggest switching to that as the default. @tyronerees, your thoughts on this? Also, does MA97 do something similar with page sizes? If so, this could perhaps explain the recent memory issues we were seeing.

@jhogg41 (Contributor) commented Jun 27, 2023 via email

@jfowkes (Contributor) commented Jun 28, 2023

Many thanks @jhogg41 for the clarification! @mjacobse what are your thoughts on an mmap replacement implementation?

@mjacobse (Collaborator, Author) commented Jun 28, 2023

That would be interesting to try, I think, but not really trivial to implement, I'd imagine? I have personally never worked with mmap or any memory API lower-level than malloc. It would also be platform-dependent code, with a parallel implementation for Windows I suppose? I think the fallback of just using standard malloc and letting its implementation deal with things should probably stay in any case. And as mentioned, I get the impression that not doing manual overallocation might be a more natural default for that fallback. Roughly what I would imagine is sketched below.
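
Something along these lines is what I have in mind (untested sketch; the function name is made up; both anonymous mmap and VirtualAlloc hand back zero-filled pages, with calloc as the portable fallback):

#include <cstddef>
#include <cstdlib>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__unix__) || defined(__APPLE__)
#include <sys/mman.h>
#endif

// Untested sketch of a platform-dependent zeroed page allocation.
// Returns zero-filled memory, or nullptr on failure.
void* alloc_zeroed_pages(size_t sz) {
#if defined(_WIN32)
   // VirtualAlloc commits pages that the OS guarantees to be zero-initialized.
   return VirtualAlloc(nullptr, sz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#elif defined(__unix__) || defined(__APPLE__)
   // Anonymous private mappings are zero-filled by the kernel.
   void* p = mmap(nullptr, sz, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   return (p == MAP_FAILED) ? nullptr : p;
#else
   // Portable fallback: let the C library deal with it, zero'd via calloc.
   return calloc(sz, 1);
#endif
}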

@jfowkes (Contributor) commented Jun 29, 2023

Yes, agreed, that feels like an enhancement to the existing codebase. I propose we switch to PAGE_SIZE = 0MB for now.
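
In terms of the Pool snippet quoted in the issue, the change would amount to something like:

// Proposed change in the Pool class quoted above: with PAGE_SIZE = 0,
// std::max(PAGE_SIZE, sz) is just sz, so each Page is sized exactly to the request.
const size_t PAGE_SIZE = 0; // was 8*1024*1024 (8MB)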
