This issue was moved to a discussion.

You can continue the conversation there.

Comments for "https://os.phil-opp.com/allocator-designs/" #720

Closed
utterances-bot opened this issue Jan 20, 2020 · 46 comments

Comments

@utterances-bot

utterances-bot commented Jan 20, 2020

This is a general purpose comment thread for the Allocator Designs post.

@senseiod

Great!

@MikailBag

Thanks for the post! As usual, it is very interesting!

Since the Heap type of the linked_list_allocator crate does not implement GlobalAlloc (as it's not possible without locking).

Looks like a typo.


engstad commented Jan 20, 2020

The align_up function could be improved by removing the if-test. Notice that if addr is already aligned, then (addr + align - 1) / align * align equals addr, whereas if it is not, then it equals (addr + align) / align * align, or addr / align * align + align, or align_down(addr) + align.

fn align_up(addr: usize, align: usize) -> usize {
    (addr + align - 1) / align * align
}

Or, if you use powers of two as argument instead:

fn align_up_pow2(addr: usize, align_pow2: usize) -> usize {
    (addr + (1 << align_pow2) - 1) >> align_pow2 << align_pow2
}


engstad commented Jan 20, 2020

I should mention that when using a BumpAllocator, you could extend the functionality a bit.

First, while you normally can't delete entries from the allocator, you can delete the last entry. In other words, if you know what you are doing, you can allow a reset() function that takes the address of an allocation and resets the allocator to that point. Highly dangerous if you don't know what is going on, but you can safeguard it somewhat if you know that it was the last allocation. This makes it a "stack" allocator.

Second, it is common practice to allocate from both the front and the back of the memory region. This is useful where some of the allocations are temporary. You put temporary allocations at the end, while longer-lasting allocations are put in the front.

Anyway, both these methods are quite dangerous, but in terms of raw speed, there's no comparison.
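A minimal sketch of the double-ended idea, with made-up type and method names, plain integer addresses instead of real pointers, and no alignment handling:

```rust
/// Hypothetical double-ended bump allocator: long-lived allocations
/// grow from the front of the region, temporaries from the back.
struct DoubleEndedBump {
    front: usize,
    back: usize,
}

impl DoubleEndedBump {
    fn new(heap_start: usize, heap_end: usize) -> Self {
        DoubleEndedBump { front: heap_start, back: heap_end }
    }

    /// Long-lived allocation: bump the front pointer upwards.
    fn alloc_front(&mut self, size: usize) -> Option<usize> {
        let start = self.front;
        let end = start.checked_add(size)?;
        if end > self.back {
            return None; // out of memory: the two ends would collide
        }
        self.front = end;
        Some(start)
    }

    /// Temporary allocation: bump the back pointer downwards.
    fn alloc_back(&mut self, size: usize) -> Option<usize> {
        let start = self.back.checked_sub(size)?;
        if start < self.front {
            return None;
        }
        self.back = start;
        Some(start)
    }
}
```

Freeing all temporaries is then just resetting back to heap_end, without touching the long-lived allocations at the front.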

@phil-opp phil-opp changed the title https://os.phil-opp.com/allocator-designs/ Comments for "https://os.phil-opp.com/allocator-designs/" Jan 21, 2020
@phil-opp (Owner)

@engstad

Thanks for your comments!

The align_up function could be improved by removing the if-test.

Good point! I'll extend that section. The fastest variant I'm aware of is relying on the optimized align_offset implementation of Rust's standard library:

#[no_mangle]
fn align_up(addr: usize, align: usize) -> usize {
    let offset = (addr as *const u8).align_offset(align);
    addr + offset
}

First, while you normally can't delete entries from the allocator, you can delete the last entry. In other words, if you know what you are doing, you can allow a reset() function that takes the address of an allocation and resets the allocator to that point

One could also check in deallocate whether the end address of the freed allocation equals next. However, this way we could not recover any padding bytes inserted for alignment.
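That deallocate check might look something like this; a hypothetical sketch with made-up field names and plain integer addresses, not the post's actual implementation:

```rust
/// Simplified bump allocator state, loosely mirroring the post's fields.
struct BumpAllocator {
    heap_start: usize,
    heap_end: usize,
    next: usize,
    allocations: usize,
}

impl BumpAllocator {
    /// Reclaim memory when the freed block was the most recent allocation:
    /// its end address then equals `next`, so the pointer can be rolled back.
    fn dealloc(&mut self, addr: usize, size: usize) {
        self.allocations -= 1;
        if addr + size == self.next {
            // last allocation: roll the bump pointer back
            self.next = addr;
        } else if self.allocations == 0 {
            // everything was freed: reset the whole region
            self.next = self.heap_start;
        }
    }
}
```

Note that any alignment padding inserted in front of addr is still not recovered this way, since dealloc only sees the aligned address.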

Second, it is common practice to allocate from both the front and the back of the memory region. This is useful where some of the allocations are temporary. You put temporary allocations at the end, while longer-lasting allocations are put in the front.

Good point! While this is difficult to implement for a global allocator, it definitely works for manual allocations.

I'll try to update the bump allocator discussion section with both of your suggestions.

@phil-opp (Owner)

@engstad I prepared two updates for the post related to your comments. #721 adds more variants to implement align_up and briefly discusses their performance. #722 outlines the two bump allocator improvements you mentioned. What do you think about the changes?

@phil-opp (Owner)

@senseiod Thanks!

@phil-opp (Owner)

@MikailBag Thank you! Could you maybe clarify what the typo is? I don't see it right now…


amosonn commented Jan 26, 2020

Nice comparison, thanks!

For more on bump allocation, see this post (in Rust, even):
https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html
To summarize, his rounding method is:
(size + align - 1) & !(align - 1)
(which relies on align being a power of 2, of course); and he comments that allocation arithmetic should be checked for overflows (which is the case for all allocators), and that bumping from the end is then more performant.
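Combining those two points, a sketch of the bitmask rounding with an overflow check might look like this (the function name is made up, and align is assumed to be a power of two):

```rust
/// Round `addr` up to the next multiple of `align` (a power of two).
/// Returns `None` if the addition would overflow.
fn align_up_checked(addr: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    // adding `align - 1` and clearing the low bits rounds up;
    // already-aligned addresses are left unchanged
    Some(addr.checked_add(align - 1)? & !(align - 1))
}
```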

Also small typo in the beginning:
"This complexity is often undesired [...]"
(the "is" is missing).

@phil-opp (Owner)

@amosonn Thanks for your comment!

For more on bump allocation, see this post (in Rust, even):
https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html

I already link this post in the Discussion section as "can be optimized to just a few assembly operations". I deliberately decided against bumping from the end because the intention of the post is to explain a basic implementation, not to maximally optimize it. Regarding the alignment function: I think the align_offset function from the standard library should still be faster, given how optimized it is (see rust-lang/rust#50319).

allocation arithmetic should be checked for overflows (which is the case for all allocators)

Good point! I'll prepare a PR to fix this.

Also small typo in the beginning:

Thanks! Fixed in 4b8c902.

@phil-opp (Owner)

@amosonn

allocation arithmetic should be checked for overflows (which is the case for all allocators)

Good point! I'll prepare a PR to fix this.

I filed pull requests #726 and #727.

@amosonn

amosonn commented Jan 27, 2020

Ah sorry, I missed the link :).

Regarding the various alignment implementations: align_offset does something stronger: it checks how many elements of an arbitrary size (mem::size_of::<T>()) must be added to align the pointer. In this case, you only need to compute this for stride 1, which reduces in that implementation to something quite similar to the method above (but with a branch for already-aligned pointers). Most of the complexity in that method is for handling "stranger" sizes.

@phil-opp (Owner)

Thanks for investigating! I'll update #721 to use the (size + align - 1) & !(align - 1) method.

@Menschenkindlein (Contributor)

In the code for LinkedListAllocator, self.inner.lock() => self.lock()

@phil-opp (Owner)

@Menschenkindlein Thanks! Fixed in 00fedc8 and 670ac60.

@eagle2com

Thank you so much for these posts!
They are a real treasure trove of information presented with such clarity.
I can only imagine how much time it takes to craft the diagrams and the logical progression steps.

And all of this integrated into an actual working/testable(!) system, not just bits and pieces without the oh-so-necessary glue to make them work together.

I'm a beginner in both Rust and the nitty-gritty details of a kernel and this is the best source of information for it I ever found, period.

I'm looking forward to the next post!

@phil-opp (Owner)

@eagle2com Thank you so much for your comment! Yes, it is a lot of work to create these posts, but comments like yours make it definitely worth it :). It's great to hear that the blog is useful to you as a beginner!

Copy link

Hello!

I have a question about memory alignment. When we want to allocate a memory region with a specific alignment, we might increment the starting address of a memory region, and we return that address in our alloc function.

Then when the dealloc gets called we will get that same address. Must we not, somehow, determine by how much we had to increment the start of the list node in order to get it to align? Otherwise don't we leak that amount of memory?

Thank you!

Copy link

Regarding LinkedListAllocator, it seems that if align_up returns an aligned value alloc_start higher than region.start_addr() inside alloc_from_region, then alloc_start - region.start_addr() bytes will be permanently lost.

Copy link

That's what I think too. If the address must be aligned, then a new block must be created when allocating, which will start at the original block's address and end before the aligned address. This means that the space between the original block's address and the aligned address must be big enough to hold a block.

Alternatively, the original address could be stored in memory just before the returned address, which would waste some space.
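That second option (storing the original address just before the aligned block) might be sketched like this; the function names are made up and plain integers stand in for pointers:

```rust
use std::mem::size_of;

/// Space reserved for one header word in front of each allocation.
const HEADER: usize = size_of::<usize>();

/// Pick the aligned allocation start inside a free region, leaving room
/// for a header word that records where the region really began.
/// `align` is assumed to be a power of two.
fn place_allocation(region_start: usize, align: usize) -> usize {
    let min_start = region_start + HEADER;
    (min_start + align - 1) & !(align - 1)
}

/// `dealloc` would read the original region start back from this address.
fn header_addr(alloc_start: usize) -> usize {
    alloc_start - HEADER
}
```

The word at header_addr(alloc_start) would hold region_start, so dealloc could return the whole region, including the alignment padding, to the free list, at the cost of at least one usize per allocation.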

@Lowentwickler

There is no need for a new block, because the block is already there. But alloc_start must be moved to a higher address instead.

@phil-opp (Owner)

@Mandragorian @Lowentwickler Good catch, thanks a lot for reporting! Seems like I forgot to handle this case. (I handle it in the linked_list_allocator crate, but forgot it in the simplified example.)

Unfortunately, I don't have the time to fix it right now because it might require some non-trivial code changes. I therefore opened #796 to track this and I will do my best to fix it soon.


PoignardAzur commented May 7, 2020

This is pretty good! Reminds me of my CS university's course on re-implementing malloc.

Question: do you have any resources you would recommend for someone looking to reimplement a slightly more complex allocator?

I'm not looking to do anything state-of-the-art, I'm just itching to implement something a little more clean, with less fragmentation and better time complexity, with multiple fixed-sized pools and fallback allocators and the like.

@phil-opp (Owner)

phil-opp commented May 7, 2020

Great to hear that you like it. Unfortunately, I'm not aware of any such resources. I would recommend looking at the code and documentation of existing allocators such as jemalloc.


danwt commented May 14, 2020

impl LinkedListAllocator {
    /// Adds the given memory region to the front of the list.
    unsafe fn add_free_region(&mut self, addr: usize, size: usize) {
        // ensure that the freed region is capable of holding ListNode
        assert!(align_up(addr, mem::align_of::<ListNode>()) == addr);
        assert!(size >= mem::size_of::<ListNode>());

        // create a new list node and append it at the start of the list
        let mut node = ListNode::new(size);
        node.next = self.head.next.take();
        let node_ptr = addr as *mut ListNode;
        node_ptr.write(node);
        self.head.next = Some(&mut *node_ptr)
    }
}

In this code, where does the mut ListNode 'node' live? I know this must be my poor understanding of Rust, does the fact that ListNode::new is const have an effect?

@phil-opp (Owner)

@danwt The ListNode node is a normal struct stored in a local variable, so it lives on the stack. This isn't a problem because it isn't part of the list yet. We then write it into the freed memory block in the node_ptr.write(node) line.

does the fact that ListNode::new is const have an effect?

The only effect of const here is that the function can be used to initialize static variables. The const basically tells the compiler: "this is safe to create at compile time" (i.e. it doesn't use any unsupported operations internally).
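A minimal illustration of that point, using a toy type rather than the post's full ListNode:

```rust
struct ListNode {
    size: usize,
}

impl ListNode {
    // `const fn` tells the compiler this constructor may run at
    // compile time, which is what the static initializer below needs.
    const fn new(size: usize) -> Self {
        ListNode { size }
    }
}

// Without the `const`, this line fails with "calls in statics are
// limited to constant functions, tuple structs and tuple variants".
static HEAD: ListNode = ListNode::new(0);
```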

@danwt

danwt commented May 15, 2020

I see, thank you!

@RoidoChan

Hi Phil! It might be useful to mention, in brackets after the names of some allocators, other names they go by. For example, when I've come across a bump allocator before, it was always referred to as a "stack allocator"; likewise, the fixed-size block allocator may be known as a "pool" allocator. Those of us with plenty of experience know this, but for less experienced readers it may be helpful to learn synonyms for the concepts you present here.
Cheers,
Lloyd

@phil-opp (Owner)

@RoidoChan That's a good idea. Do you know any good resource that gives an overview of common names? Normally Wikipedia has something like this, but the articles on memory allocation are fairly general. Also, it seems like some names are not really well-defined and are used for quite different kinds of allocators (e.g. slab allocator).

@RoidoChan

@phil-opp, hmmm not off the top of my head. A cursory google brings up many cs course web pages, and a wikipedia page about mem allocation, but if searching for "mem allocator list", well, of course you get lots of articles about linked list allocators!

@phil-opp (Owner)

@RoidoChan Ok, I think it makes sense to just stick to the most common names then. I just pushed 5e37a0e to at least mention the "stack allocator" and "pool allocator" names you suggested.

@drzewiec

After going through the various allocator designs and following along with the implementation, it occurred to me that it would be useful to have a test which ensured that the block allocator is correctly falling back to the linked-list allocator. I wanted to post what I came up with to sanity-check my code as I'm not 100% confident that what I have written actually will correctly allocate a large block of memory on the heap.

#[test_case]
fn large_allocation() {
  serial_print!("large_allocation... ");

  let mut arr: [usize; 300] = [0; 300];
  for i in 0..arr.len() {
    arr[i] = i;
  }

  let x = Box::new(arr);
  for i in 0..arr.len() {
    assert_eq!(arr[i], x[i]);
  }

  serial_println!("[ok]");
}

I know that the size of this array should be 2400 bytes (usize * 300), which I have verified with core::mem::size_of. My understanding is that creating a Box around this array will therefore allocate 2400 bytes on the heap (thus validating that the fallback allocator did its job). Am I going about this right, and how could/should I verify for myself that the heap allocation did take place? I assume there is a way, but I'm just not seeing it due to being pretty far in the deep end of my programming abilities here. :)

Loved the series on memory btw! It's probably the hardest thing to wrap my head around yet, but with a lot of re-reading I think I'm coming to understand how everything fits together here.

@amosonn

amosonn commented Jul 21, 2020

@drzewiec Box-wrapping indeed allocates the required memory. The question of how to check that the memory was allocated depends on which alternative you are trying to rule out. All the suggestions I make here are, however, fit for demonstrating that it happened, not for writing a unit test (except perhaps the last one).

The most direct way to verify for yourself that the fallback allocator code is running is to put a debug print there. This is a little tricky, but you can look here for hints: https://fasterthanli.me/articles/small-strings-in-rust (a nice, but rather long article; the part with prints from the allocator is quite at the beginning). This is a bit of work though.

Simpler methods - but which will all only tell you memory was allocated, not by which allocator - include:

  • Checking that size_of::<Box<[usize; 300]>>() == 8 while size_of::<[usize; 300]>() == 2400. This only tells you it's a pointer type; also size_of::<&[usize; 300]>() == 8.
  • Looking at Box::into_raw(Box::new(arr)) as usize. It should be different from &arr as *const [usize; 300] as usize, but that could still imply stack allocation; it should also "look" different, because it's somewhere else entirely in memory. You can also use this to check which allocator was used, if you know that they work in different regions of memory.
  • Returning the array or a Box from one function to another, and comparing the pointers. The array should move, as it was in the stack frame of the callee, which doesn't exist anymore, whereas the Box stays safely on the heap. You should mark the function with #[inline(never)] to make sure it never gets inlined. Rust may still optimize this so that the array is already allocated in the caller's stack, but this is unlikely (I don't know if this optimization exists); if this happens, you can just make the function more complicated, e.g. by making two arrays and conditionally returning one.

This last one is something that you can actually write a unit test for; however this only checks that some heap allocation was done, not by which allocator.
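A rough sketch of that last suggestion, with hypothetical helper names; the array comparison is deliberately not asserted, since (as noted above) the compiler may construct the array in place:

```rust
/// Return a Box together with the payload address observed inside the callee.
#[inline(never)]
fn make_boxed() -> (usize, Box<[usize; 300]>) {
    let b = Box::new([7usize; 300]);
    let addr = b.as_ptr() as usize;
    (addr, b)
}

/// Same for a plain array, which lives in the callee's stack frame.
#[inline(never)]
fn make_array() -> (usize, [usize; 300]) {
    let a = [7usize; 300];
    let addr = a.as_ptr() as usize;
    (addr, a)
}

fn check() {
    // the Box payload is heap-allocated, so it keeps its address
    // even after the Box itself is moved to the caller
    let (inner, b) = make_boxed();
    assert_eq!(inner, b.as_ptr() as usize);

    // the array is typically copied out of the callee's frame, so its
    // address usually changes -- but this is not guaranteed, so we only
    // observe it rather than assert it
    let (inner, a) = make_array();
    let _array_moved = inner != a.as_ptr() as usize;
}
```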

Hope this helps!

@drzewiec

@amosonn That does help, thank you! I am somewhat ashamed that I didn't think of putting a debug print in the allocator code. I appreciate all the alternatives, as well!

@diminishedprime

First off, thanks for the excellent series, it's been extremely interesting to go through!

Just a small note, if you skip the implementation of the linked list allocator, you won't already have:

#![feature(const_fn)]

in your lib.rs. Just something I ran into.

phil-opp added a commit that referenced this issue Aug 19, 2020
…eBlockAllocator

The post explicitly allows readers to skip the `LinkedListAllocator` implementation, so we should not rely that the reader already enabled the unstable `const_fn` function there.

Reported in #720 (comment).
@phil-opp (Owner)

@diminishedprime Thanks for reporting! I pushed 10d84fa to fix this issue.

@jiayihu

jiayihu commented Sep 11, 2020

As a small note, list_index() should be FixedSizeBlockAllocator::list_index in the GlobalAlloc implementation of FixedSizeBlockAllocator

@phil-opp (Owner)

phil-opp commented Oct 8, 2020

@jiayihu Sorry for the late reply!

As a small note, list_index() should be FixedSizeBlockAllocator::list_index in the GlobalAlloc implementation of FixedSizeBlockAllocator

This depends on whether you declare the list_index function inside an impl FixedSizeBlockAllocator block or as a normal independent function. I implemented it as an independent function in this post, but putting it in the impl block is fine too, of course.

See:

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}


Sk3pz commented Oct 26, 2020

I found an issue with the allocators caused by using a mutable reference in a constant function (both in the linked list allocator and in the fixed-size allocator); removing that caused the error "calls in statics are limited to constant functions, tuple structs and tuple variants". The solution, for anybody coming across this, is to go to lib.rs and add #![feature(const_mut_refs)] to allow mutable references in constant functions! (I'm not sure this is the best way of doing it, but it's what I did!)

@phil-opp (Owner)

@DeathBySpork Thanks for sharing your problem and solution.

Yes, adding #![feature(const_mut_refs)] is the recommended way of doing it, and it's also mentioned in the post:

In order to get it to compile, we need to add #![feature(const_mut_refs)] to the beginning of our lib.rs.

If you haven't done so already for the LinkedListAllocator implementation, you also need to add #![feature(const_mut_refs)] to the beginning of your lib.rs.

(See https://os.phil-opp.com/allocator-designs/#the-allocator-type and https://os.phil-opp.com/allocator-designs/#the-allocator-type-1)


Sk3pz commented Nov 6, 2020

Ah, I guess I didn't notice the change. I am sorry!

@phil-opp (Owner)

phil-opp commented Nov 7, 2020

@DeathBySpork No worries, thanks for reporting your problems!


Ananta98 commented Feb 1, 2021

I still get the error "calls in statics are limited to constant functions, tuple structs and tuple variants". I've added #![feature(const_in_array_repeat_expressions)] and #![feature(const_mut_refs)] in main.rs but still get the error. How can I solve it? My code is at https://github.com/Ananta98/PetraOS. Thank you.

@phil-opp (Owner)

phil-opp commented Feb 2, 2021

@Ananta98 Looks like you forgot to make your Locked::new function a const fn:

diff --git a/src/mm/allocator.rs b/src/mm/allocator.rs
index 1ee24e3..8b911b1 100644
--- a/src/mm/allocator.rs
+++ b/src/mm/allocator.rs
@@ -15,7 +15,7 @@ pub struct Locked<A> {
 }
 
 impl<A> Locked<A> {
-    pub fn new(inner : A) -> Self {
+    pub const fn new(inner : A) -> Self {
         Locked {
             inner : Mutex::new(inner),
         }

After this change, it works for me.

@Ananta98

Ananta98 commented Feb 2, 2021

Thank you @phil-opp, it works now. Sorry, this was my fault.

@phil-opp (Owner)

phil-opp commented Feb 2, 2021

Great to hear that! No worries, I'm happy to help.

Repository owner locked and limited conversation to collaborators Jun 12, 2021
