
[scudo] Move the chunk update into functions #83493

Merged: 3 commits into llvm:main, May 16, 2024

Conversation


@ChiaHungDuan ChiaHungDuan commented Feb 29, 2024

The code paths for MTE enabled and disabled were interleaved, which made each path harder to read at both the source and assembly level. In this change, we move the parts with divergent logic into separate functions and apply minor refactoring to the code structure.


llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (ChiaHungDuan)

Changes

Memory tagging is a flag that requires a system reboot to enable, which means the code paths for memory tagging will never be executed while it is off. However, we only mark those paths with UNLIKELY(), which does not annotate the expectation properly here. As a result, the generated assembly always interleaves instructions from the paths with and without memory tagging enabled, so the I-cache may end up caching many instructions that are never used.

This change explicitly splits the paths into separate code blocks. It introduces a small amount of duplicated code, but it creates two independent execution paths and should improve cache locality.
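
To illustrate the pattern being applied (a minimal, standalone sketch rather than the actual scudo code; the function names are invented and the macros are stand-ins for scudo's):

#include <cstddef>

#define LIKELY(X) __builtin_expect(!!(X), 1)
#define UNLIKELY(X) __builtin_expect(!!(X), 0)
#define NOINLINE __attribute__((noinline))

// Before: UNLIKELY() only hints at branch probability. The compiler still
// emits both paths into one function body, so the never-executed tagging
// instructions share I-cache lines with the hot path.
void *allocateBefore(bool MemTagEnabled, std::size_t Size) {
  void *Ptr = nullptr;
  if (UNLIKELY(MemTagEnabled)) {
    // ... tagging logic interleaved with the common code ...
  }
  // ... common allocation logic ...
  (void)Size;
  return Ptr;
}

// After: the cold path is hoisted into a separate, non-inlined function, so
// the hot path's instructions stay contiguous.
NOINLINE void *allocateWithTagging(std::size_t Size) {
  (void)Size; // ... tagging logic lives here, out of the hot path ...
  return nullptr;
}

void *allocateAfter(bool MemTagEnabled, std::size_t Size) {
  if (LIKELY(!MemTagEnabled)) {
    // ... common allocation logic only ...
    return nullptr;
  }
  return allocateWithTagging(Size);
}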


Full diff: https://github.com/llvm/llvm-project/pull/83493.diff

2 Files Affected:

  • (modified) compiler-rt/lib/scudo/standalone/allocator_common.h (+1-1)
  • (modified) compiler-rt/lib/scudo/standalone/combined.h (+175-125)
diff --git a/compiler-rt/lib/scudo/standalone/allocator_common.h b/compiler-rt/lib/scudo/standalone/allocator_common.h
index 4d39956bf90503..2b77516ad11cae 100644
--- a/compiler-rt/lib/scudo/standalone/allocator_common.h
+++ b/compiler-rt/lib/scudo/standalone/allocator_common.h
@@ -53,7 +53,7 @@ template <class SizeClassAllocator> struct TransferBatch {
   void moveNToArray(CompactPtrT *Array, u16 N) {
     DCHECK_LE(N, Count);
     memcpy(Array, Batch + Count - N, sizeof(Batch[0]) * N);
-    Count -= N;
+    Count = static_cast<u16>(Count - N);
   }
   u16 getCount() const { return Count; }
   bool isEmpty() const { return Count == 0U; }
diff --git a/compiler-rt/lib/scudo/standalone/combined.h b/compiler-rt/lib/scudo/standalone/combined.h
index fa6077384d9826..6d5997525063f1 100644
--- a/compiler-rt/lib/scudo/standalone/combined.h
+++ b/compiler-rt/lib/scudo/standalone/combined.h
@@ -401,133 +401,18 @@ class Allocator {
       reportOutOfMemory(NeededSize);
     }
 
-    const uptr BlockUptr = reinterpret_cast<uptr>(Block);
-    const uptr UnalignedUserPtr = BlockUptr + Chunk::getHeaderSize();
-    const uptr UserPtr = roundUp(UnalignedUserPtr, Alignment);
-
-    void *Ptr = reinterpret_cast<void *>(UserPtr);
-    void *TaggedPtr = Ptr;
-    if (LIKELY(ClassId)) {
-      // We only need to zero or tag the contents for Primary backed
-      // allocations. We only set tags for primary allocations in order to avoid
-      // faulting potentially large numbers of pages for large secondary
-      // allocations. We assume that guard pages are enough to protect these
-      // allocations.
-      //
-      // FIXME: When the kernel provides a way to set the background tag of a
-      // mapping, we should be able to tag secondary allocations as well.
-      //
-      // When memory tagging is enabled, zeroing the contents is done as part of
-      // setting the tag.
-      if (UNLIKELY(useMemoryTagging<Config>(Options))) {
-        uptr PrevUserPtr;
-        Chunk::UnpackedHeader Header;
-        const uptr BlockSize = PrimaryT::getSizeByClassId(ClassId);
-        const uptr BlockEnd = BlockUptr + BlockSize;
-        // If possible, try to reuse the UAF tag that was set by deallocate().
-        // For simplicity, only reuse tags if we have the same start address as
-        // the previous allocation. This handles the majority of cases since
-        // most allocations will not be more aligned than the minimum alignment.
-        //
-        // We need to handle situations involving reclaimed chunks, and retag
-        // the reclaimed portions if necessary. In the case where the chunk is
-        // fully reclaimed, the chunk's header will be zero, which will trigger
-        // the code path for new mappings and invalid chunks that prepares the
-        // chunk from scratch. There are three possibilities for partial
-        // reclaiming:
-        //
-        // (1) Header was reclaimed, data was partially reclaimed.
-        // (2) Header was not reclaimed, all data was reclaimed (e.g. because
-        //     data started on a page boundary).
-        // (3) Header was not reclaimed, data was partially reclaimed.
-        //
-        // Case (1) will be handled in the same way as for full reclaiming,
-        // since the header will be zero.
-        //
-        // We can detect case (2) by loading the tag from the start
-        // of the chunk. If it is zero, it means that either all data was
-        // reclaimed (since we never use zero as the chunk tag), or that the
-        // previous allocation was of size zero. Either way, we need to prepare
-        // a new chunk from scratch.
-        //
-        // We can detect case (3) by moving to the next page (if covered by the
-        // chunk) and loading the tag of its first granule. If it is zero, it
-        // means that all following pages may need to be retagged. On the other
-        // hand, if it is nonzero, we can assume that all following pages are
-        // still tagged, according to the logic that if any of the pages
-        // following the next page were reclaimed, the next page would have been
-        // reclaimed as well.
-        uptr TaggedUserPtr;
-        if (getChunkFromBlock(BlockUptr, &PrevUserPtr, &Header) &&
-            PrevUserPtr == UserPtr &&
-            (TaggedUserPtr = loadTag(UserPtr)) != UserPtr) {
-          uptr PrevEnd = TaggedUserPtr + Header.SizeOrUnusedBytes;
-          const uptr NextPage = roundUp(TaggedUserPtr, getPageSizeCached());
-          if (NextPage < PrevEnd && loadTag(NextPage) != NextPage)
-            PrevEnd = NextPage;
-          TaggedPtr = reinterpret_cast<void *>(TaggedUserPtr);
-          resizeTaggedChunk(PrevEnd, TaggedUserPtr + Size, Size, BlockEnd);
-          if (UNLIKELY(FillContents != NoFill && !Header.OriginOrWasZeroed)) {
-            // If an allocation needs to be zeroed (i.e. calloc) we can normally
-            // avoid zeroing the memory now since we can rely on memory having
-            // been zeroed on free, as this is normally done while setting the
-            // UAF tag. But if tagging was disabled per-thread when the memory
-            // was freed, it would not have been retagged and thus zeroed, and
-            // therefore it needs to be zeroed now.
-            memset(TaggedPtr, 0,
-                   Min(Size, roundUp(PrevEnd - TaggedUserPtr,
-                                     archMemoryTagGranuleSize())));
-          } else if (Size) {
-            // Clear any stack metadata that may have previously been stored in
-            // the chunk data.
-            memset(TaggedPtr, 0, archMemoryTagGranuleSize());
-          }
-        } else {
-          const uptr OddEvenMask =
-              computeOddEvenMaskForPointerMaybe(Options, BlockUptr, ClassId);
-          TaggedPtr = prepareTaggedChunk(Ptr, Size, OddEvenMask, BlockEnd);
-        }
-        storePrimaryAllocationStackMaybe(Options, Ptr);
-      } else {
-        Block = addHeaderTag(Block);
-        Ptr = addHeaderTag(Ptr);
-        if (UNLIKELY(FillContents != NoFill)) {
-          // This condition is not necessarily unlikely, but since memset is
-          // costly, we might as well mark it as such.
-          memset(Block, FillContents == ZeroFill ? 0 : PatternFillByte,
-                 PrimaryT::getSizeByClassId(ClassId));
-        }
-      }
+    const uptr UserPtr = roundUp(
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize(), Alignment);
+    const uptr SizeOrUnusedBytes =
+        ClassId ? Size : SecondaryBlockEnd - (UserPtr + Size);
+
+    if (LIKELY(!useMemoryTagging<Config>(Options))) {
+      return initChunk(ClassId, Origin, Block, UserPtr, SizeOrUnusedBytes,
+                       FillContents);
     } else {
-      Block = addHeaderTag(Block);
-      Ptr = addHeaderTag(Ptr);
-      if (UNLIKELY(useMemoryTagging<Config>(Options))) {
-        storeTags(reinterpret_cast<uptr>(Block), reinterpret_cast<uptr>(Ptr));
-        storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
-      }
-    }
-
-    Chunk::UnpackedHeader Header = {};
-    if (UNLIKELY(UnalignedUserPtr != UserPtr)) {
-      const uptr Offset = UserPtr - UnalignedUserPtr;
-      DCHECK_GE(Offset, 2 * sizeof(u32));
-      // The BlockMarker has no security purpose, but is specifically meant for
-      // the chunk iteration function that can be used in debugging situations.
-      // It is the only situation where we have to locate the start of a chunk
-      // based on its block address.
-      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
-      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
-      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+      return initChunkWithMemoryTagging(ClassId, Origin, Block, UserPtr, Size,
+                                        SizeOrUnusedBytes, FillContents);
     }
-    Header.ClassId = ClassId & Chunk::ClassIdMask;
-    Header.State = Chunk::State::Allocated;
-    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
-    Header.SizeOrUnusedBytes =
-        (ClassId ? Size : SecondaryBlockEnd - (UserPtr + Size)) &
-        Chunk::SizeOrUnusedBytesMask;
-    Chunk::storeHeader(Cookie, Ptr, &Header);
-
-    return TaggedPtr;
   }
 
   NOINLINE void deallocate(void *Ptr, Chunk::Origin Origin, uptr DeleteSize = 0,
@@ -1143,6 +1028,171 @@ class Allocator {
            reinterpret_cast<uptr>(Ptr) - SizeOrUnusedBytes;
   }
 
+  ALWAYS_INLINE void *initChunk(const uptr ClassId, const Chunk::Origin Origin,
+                                void *Block, const uptr UserPtr,
+                                const uptr SizeOrUnusedBytes,
+                                const FillContentsMode FillContents) {
+    Block = addHeaderTag(Block);
+    // Only do content fill when it's from primary allocator because secondary
+    // allocator has filled the content.
+    if (ClassId != 0 && UNLIKELY(FillContents != NoFill)) {
+      // This condition is not necessarily unlikely, but since memset is
+      // costly, we might as well mark it as such.
+      memset(Block, FillContents == ZeroFill ? 0 : PatternFillByte,
+             PrimaryT::getSizeByClassId(ClassId));
+    }
+
+    Chunk::UnpackedHeader Header = {};
+
+    const uptr DefaultAlignedPtr =
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
+    if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
+      const uptr Offset = UserPtr - DefaultAlignedPtr;
+      DCHECK_GE(Offset, 2 * sizeof(u32));
+      // The BlockMarker has no security purpose, but is specifically meant for
+      // the chunk iteration function that can be used in debugging situations.
+      // It is the only situation where we have to locate the start of a chunk
+      // based on its block address.
+      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
+      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
+      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+    }
+
+    Header.ClassId = ClassId & Chunk::ClassIdMask;
+    Header.State = Chunk::State::Allocated;
+    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
+    Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
+    Chunk::storeHeader(Cookie, reinterpret_cast<void *>(addHeaderTag(UserPtr)),
+                       &Header);
+
+    return reinterpret_cast<void *>(UserPtr);
+  }
+
+  NOINLINE void *
+  initChunkWithMemoryTagging(const uptr ClassId, const Chunk::Origin Origin,
+                             void *Block, const uptr UserPtr, const uptr Size,
+                             const uptr SizeOrUnusedBytes,
+                             const FillContentsMode FillContents) {
+    const Options Options = Primary.Options.load();
+    DCHECK(useMemoryTagging<Config>(Options));
+
+    void *Ptr = reinterpret_cast<void *>(UserPtr);
+    void *TaggedPtr = Ptr;
+
+    if (LIKELY(ClassId)) {
+      // We only need to zero or tag the contents for Primary backed
+      // allocations. We only set tags for primary allocations in order to avoid
+      // faulting potentially large numbers of pages for large secondary
+      // allocations. We assume that guard pages are enough to protect these
+      // allocations.
+      //
+      // FIXME: When the kernel provides a way to set the background tag of a
+      // mapping, we should be able to tag secondary allocations as well.
+      //
+      // When memory tagging is enabled, zeroing the contents is done as part of
+      // setting the tag.
+
+      uptr PrevUserPtr;
+      Chunk::UnpackedHeader Header;
+      const uptr BlockSize = PrimaryT::getSizeByClassId(ClassId);
+      const uptr BlockUptr = reinterpret_cast<uptr>(Block);
+      const uptr BlockEnd = BlockUptr + BlockSize;
+      // If possible, try to reuse the UAF tag that was set by deallocate().
+      // For simplicity, only reuse tags if we have the same start address as
+      // the previous allocation. This handles the majority of cases since
+      // most allocations will not be more aligned than the minimum alignment.
+      //
+      // We need to handle situations involving reclaimed chunks, and retag
+      // the reclaimed portions if necessary. In the case where the chunk is
+      // fully reclaimed, the chunk's header will be zero, which will trigger
+      // the code path for new mappings and invalid chunks that prepares the
+      // chunk from scratch. There are three possibilities for partial
+      // reclaiming:
+      //
+      // (1) Header was reclaimed, data was partially reclaimed.
+      // (2) Header was not reclaimed, all data was reclaimed (e.g. because
+      //     data started on a page boundary).
+      // (3) Header was not reclaimed, data was partially reclaimed.
+      //
+      // Case (1) will be handled in the same way as for full reclaiming,
+      // since the header will be zero.
+      //
+      // We can detect case (2) by loading the tag from the start
+      // of the chunk. If it is zero, it means that either all data was
+      // reclaimed (since we never use zero as the chunk tag), or that the
+      // previous allocation was of size zero. Either way, we need to prepare
+      // a new chunk from scratch.
+      //
+      // We can detect case (3) by moving to the next page (if covered by the
+      // chunk) and loading the tag of its first granule. If it is zero, it
+      // means that all following pages may need to be retagged. On the other
+      // hand, if it is nonzero, we can assume that all following pages are
+      // still tagged, according to the logic that if any of the pages
+      // following the next page were reclaimed, the next page would have been
+      // reclaimed as well.
+      uptr TaggedUserPtr;
+      if (getChunkFromBlock(BlockUptr, &PrevUserPtr, &Header) &&
+          PrevUserPtr == UserPtr &&
+          (TaggedUserPtr = loadTag(UserPtr)) != UserPtr) {
+        uptr PrevEnd = TaggedUserPtr + Header.SizeOrUnusedBytes;
+        const uptr NextPage = roundUp(TaggedUserPtr, getPageSizeCached());
+        if (NextPage < PrevEnd && loadTag(NextPage) != NextPage)
+          PrevEnd = NextPage;
+        TaggedPtr = reinterpret_cast<void *>(TaggedUserPtr);
+        resizeTaggedChunk(PrevEnd, TaggedUserPtr + Size, Size, BlockEnd);
+        if (UNLIKELY(FillContents != NoFill && !Header.OriginOrWasZeroed)) {
+          // If an allocation needs to be zeroed (i.e. calloc) we can normally
+          // avoid zeroing the memory now since we can rely on memory having
+          // been zeroed on free, as this is normally done while setting the
+          // UAF tag. But if tagging was disabled per-thread when the memory
+          // was freed, it would not have been retagged and thus zeroed, and
+          // therefore it needs to be zeroed now.
+          memset(TaggedPtr, 0,
+                 Min(Size, roundUp(PrevEnd - TaggedUserPtr,
+                                   archMemoryTagGranuleSize())));
+        } else if (Size) {
+          // Clear any stack metadata that may have previously been stored in
+          // the chunk data.
+          memset(TaggedPtr, 0, archMemoryTagGranuleSize());
+        }
+      } else {
+        const uptr OddEvenMask =
+            computeOddEvenMaskForPointerMaybe(Options, BlockUptr, ClassId);
+        TaggedPtr = prepareTaggedChunk(Ptr, Size, OddEvenMask, BlockEnd);
+      }
+      storePrimaryAllocationStackMaybe(Options, Ptr);
+    } else {
+      Block = addHeaderTag(Block);
+      Ptr = addHeaderTag(Ptr);
+      storeTags(reinterpret_cast<uptr>(Block), reinterpret_cast<uptr>(Ptr));
+      storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
+    }
+
+    Chunk::UnpackedHeader Header = {};
+
+    const uptr DefaultAlignedPtr =
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
+    if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
+      const uptr Offset = UserPtr - DefaultAlignedPtr;
+      DCHECK_GE(Offset, 2 * sizeof(u32));
+      // The BlockMarker has no security purpose, but is specifically meant for
+      // the chunk iteration function that can be used in debugging situations.
+      // It is the only situation where we have to locate the start of a chunk
+      // based on its block address.
+      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
+      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
+      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+    }
+
+    Header.ClassId = ClassId & Chunk::ClassIdMask;
+    Header.State = Chunk::State::Allocated;
+    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
+    Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
+    Chunk::storeHeader(Cookie, Ptr, &Header);
+
+    return TaggedPtr;
+  }
+
   void quarantineOrDeallocateChunk(const Options &Options, void *TaggedPtr,
                                    Chunk::UnpackedHeader *Header,
                                    uptr Size) NO_THREAD_SAFETY_ANALYSIS {
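
One line in the allocator_common.h hunk above is worth unpacking: on a u16, Count -= N promotes both operands to int, and the compound assignment then implicitly narrows the int result back to u16, which fails builds that use -Wconversion with -Werror. A minimal reproduction (not from the PR):

#include <cstdint>

using u16 = std::uint16_t;

u16 shrink(u16 Count, u16 N) {
  // Count -= N;  // both operands promote to int; the implicit int -> u16
  //              // narrowing triggers -Wimplicit-int-conversion.
  Count = static_cast<u16>(Count - N); // the diff's fix: explicit narrowing
  return Count;
}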

@ChiaHungDuan ChiaHungDuan marked this pull request as draft February 29, 2024 22:31
@ChiaHungDuan
Copy link
Contributor Author

This continues the work https://reviews.llvm.org/D159392

@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code paths in quarantineOrDeallocateChunk() [scudo] Split the code path of memory tagging out from allocate() Feb 29, 2024
@ChiaHungDuan ChiaHungDuan marked this pull request as ready for review March 11, 2024 21:15
@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code path of memory tagging out from allocate() [scudo] Split the code path of memory tagging out from allocate() and quarantineOrDeallocateChunk() Mar 11, 2024

ChiaHungDuan commented Mar 11, 2024

I measured the cache-miss rate using simpleperf while running malloc-rss-benchmark (after removing some noise that may impact the result), 20 times for each configuration.

I don't see a significant difference (0.5934682 vs. 0.5927589, less than 0.5%). As mentioned in the comment, this change helps readability at both the source and assembly level, which I think is still a good reason to do the cleanup.

BTW, there were two changes, and I put them in the same pull request to simplify the process.

@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code path of memory tagging out from allocate() and quarantineOrDeallocateChunk() [scudo] Split the code paths which enable memory tagging Mar 11, 2024
// When memory tagging is enabled, zeroing the contents is done as part of
// setting the tag.

uptr PrevUserPtr;
Contributor

Super small nit, but it would probably be best if this was down with the TaggedUserPtr declaration.

Contributor Author

Nice catch!

storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
}

Chunk::UnpackedHeader Header = {};
Contributor

All of the header storage code looks very similar in both paths. Should this be made into a separate function? It might be that it's not as similar as I think though.

Contributor Author

I agree that they look similar, but I may have to review them carefully (the differences between the paths with and without memory tagging are usually nuanced). Thus I'd prefer doing this in a separate CL.

Contributor

Alternatively, move the code for writing the header back into allocate.

Contributor Author

The problem is:

  • On the non-MTE path, UserPtr is used for the header update and is the value returned.
  • On the MTE path, the header update requires UserPtr, but TaggedPtr is the value returned.

Currently, I'm not sure I can use one to infer the value of the other (and I prefer not to return two variables or something like that). As a result, I would suggest that I do another refactor to see if we can simplify the logic, and in this CL only move the code structure (trying to make the fewest logical changes).
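
One possible shape for that follow-up (a hypothetical sketch using the member names visible in the diff above; writeChunkHeader is an invented name and would live inside Allocator): the header-filling logic is identical in both paths, so only the pointer the header is stored through, and the pointer that gets returned, need to stay with each caller.

// Hypothetical shared helper: both paths fill the header identically from
// Block and the untagged UserPtr. StorePtr differs per path: UserPtr with a
// header tag on the non-MTE path, the already-tagged Ptr on the MTE path.
void writeChunkHeader(uptr ClassId, Chunk::Origin Origin, void *Block,
                      uptr UserPtr, uptr SizeOrUnusedBytes, void *StorePtr) {
  Chunk::UnpackedHeader Header = {};
  const uptr DefaultAlignedPtr =
      reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
  if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
    const uptr Offset = UserPtr - DefaultAlignedPtr;
    reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
    reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
    Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
  }
  Header.ClassId = ClassId & Chunk::ClassIdMask;
  Header.State = Chunk::State::Allocated;
  Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
  Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
  Chunk::storeHeader(Cookie, StorePtr, &Header);
}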

Contributor

I agree, do this in a follow-up.


void *BlockBegin;
if (UNLIKELY(useMemoryTagging<Config>(Options))) {
Contributor

Would it make sense to reverse the if and else clauses and check LIKELY(!useMemoryTagging...)?

It's a bit easier to read if the likely choice is first.

Contributor Author

Done. This also aligns with the order used elsewhere.

const u8 PrevTag = extractTag(reinterpret_cast<uptr>(TaggedPtr));
storeDeallocationStackMaybe(Options, Ptr, PrevTag, Size);
if (Header->ClassId) {
if (!TSDRegistry.getDisableMemInit()) {
Contributor

I know that the original code does this, but it would be better to have a single if instead of the two ifs since neither if has an else.
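
That is, something like (a sketch of the suggested shape, using the names from the quoted snippet):

// Neither if has an else, so the two conditions can be combined:
if (Header->ClassId && !TSDRegistry.getDisableMemInit()) {
  // ... retag and zero the primary chunk ...
}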

Contributor Author

Done

storeTags(reinterpret_cast<uptr>(Block), reinterpret_cast<uptr>(Ptr));
storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
}
return initChunkWithMemoryTagging(ClassId, Origin, Block, UserPtr, Size,
Collaborator

Dangling else statement after return. Remove the else { block?
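
In other words (a sketch of the suggested cleanup; the full argument lists are the ones shown in the diff above):

// The early return makes the else block redundant:
if (LIKELY(!useMemoryTagging<Config>(Options)))
  return initChunk(ClassId, Origin, Block, UserPtr, SizeOrUnusedBytes,
                   FillContents);
return initChunkWithMemoryTagging(ClassId, Origin, Block, UserPtr, Size,
                                  SizeOrUnusedBytes, FillContents);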

Contributor Author

Done

void *TaggedPtr = Ptr;

if (LIKELY(ClassId)) {
// We only need to zero or tag the contents for Primary backed
Collaborator

While we're refactoring, is it possible to pull this out into a helper function, so that this reads something like:

if (primary) {
  initPrimaryChunkWithTagging();
} else {
  initSecondaryChunkWithTagging(); // It's 5 lines, so maybe leave it just as those 5 lines of code
}

// code to store header
}

Contributor Author

I agree it's a good idea to move the primary chunk initialization into a function. However, given the number of referenced variables (it's 10), instead of passing all of them (even if we mark the function as always-inline), I'm thinking that maybe we also want to review and simplify the logic as well.

I added comments to highlight what each block is doing. I'm planning to do further refactoring in a follow-up CL; what do you think?

Collaborator

Feel free to refactor later, yeah. It's just a huge function right now that I've always had a hard time reading, as it spans about three monitor pages.

Contributor

FWIW, I've never understood this idea that code is more readable if split into multiple functions that are only called once. In my experience it generally makes the code less readable and less maintainable.

  • If I am reading the function, I'm reading it because I need to know what it does. That means I need to read the function and its callees. Having to go-to-definition in an IDE, read the callee, go back, etc just adds mental overhead and requires readers to use IDEs. That's much more annoying than the function being three pages long. At least with the three page function, the three pages are in one place. With separate functions it's 3.1 pages long due to the function signatures and scattered in various places in the file.
  • If I am modifying the function, the multiple functions just get in the way. For example if I now need some information in the callee, now I need to add arguments to the caller and callee for basically no reason other than the function having been arbitrarily split into multiple functions.

Contributor

While reviewing this CL, trying to work through a three-page-long function has been painful. Breaking it up into smaller, easier-to-understand pieces really helps in following what's going on, because I only need to understand what each function does. For example, in a large function, variable names tend to be reused, so it can get confusing trying to figure out where a variable was set and what it means on page one versus page two or page three. You can fix this by never reusing variable names, but at that point you are emulating breaking the function into smaller functions and might as well go all the way and refactor.

The advice to break up functions is almost universal, but I have seen it go too far the other way: a function that has two lines in it and is only used once doesn't make sense and should be folded into the caller. I don't think this change is doing that.

Now, if you don't think that this refactor is breaking up the function into easy to understand pieces, that would be a reason to keep modifying this. For example, if you don't think the comments make sense, or that the function partition is right, please mention it.

Otherwise, having had to read this very long function a number of times, I can tell you this new version makes it much easier to understand. Even just trying to follow the MTE-enabled code path was extra difficult, because going back and forth between the modifications made on the MTE-enabled path and the common code can be confusing. So this should really help in understanding the MTE and non-MTE paths on their own, which makes this worth doing, in my estimation.

Contributor

> it can get confusing trying to figure out where the variable was set and what it means at page one versus page two or page three

Yeah, this is one of the things that I would need an IDE/clangd for (find variable definition/references), but the thing is that I would need an IDE for this regardless of whether the function was broken up. And for me, the split into multiple functions would make the situation worse because I need to manually track the variable across function calls (find the matching argument in the caller/callee) as well, since an IDE typically will only find definition/references within the local function. I think the fact that there's a lot of code to handle allocations is just an intrinsic property of the code needing to do a lot of stuff and it's not clear to me that moving it around like this really helps.

> You can fix this by never re-using the variable names

I would personally be happier with a change like this. The code would still be friendly to IDEs and people who want to read the whole function and it would make it more obvious where the value for each variable comes from.

> The advice to break functions is almost universal

Fair enough, I'm not necessarily claiming that my experience is universal, it could just be the way my brain works. If breaking functions really does make the code easiest to read for everyone else, I wouldn't object to it and I'll just put up with it being harder to read for me.

> Now, if you don't think that this refactor is breaking up the function into easy to understand pieces, that would be a reason to keep modifying this. For example, if you don't think the comments make sense, or that the function partition is right, please mention it.

Yeah, my other comments are what I have to offer on that front. But since the question of how to structure code to make it easy to read is subjective, and my personal preference would be either not to make this change at all or to make a different change (like not reusing variables), it's difficult to offer more feedback than that.


pcc commented Apr 2, 2024

I still think this is a bad idea. People are going to update one copy of the code and forget to update the other one, which will cause all kinds of problems.


> I still think this is a bad idea. People are going to update one copy of the code and forget to update the other one, which will cause all kinds of problems.

I'm not sure whether the title and the description are misleading and make it sound like we are just creating two copies with a bunch of copy-pasted code. I can update them if that's the problem.

So far, this CL only splits the code that has independent logic (with and without memory tagging); other parts (like allocating blocks from Primary or Secondary) still share the same code. It's not creating two copies. In addition, if someone changes the header layout and doesn't run an MTE build, they may still forget to update the MTE path, right? In that case we should increase the test coverage; unit tests should help us catch that regardless of whether the MTE path is textually inline or not.


pcc commented Apr 3, 2024

Yeah, it looked like you were copy-pasting more code than you actually were. But it looks like this change is mixed in with a number of other changes that appear to have been intended to improve performance. Since your experiments showed that this change is performance neutral, I don't think we should make those changes. Please also update the description to not talk about performance.

void quarantineOrDeallocateChunk(const Options &Options, void *TaggedPtr,
void *HeaderTaggedPtr,
Contributor

No need to pass this in.

Contributor Author

Done.

&TaggedEnd);
}
}
void *BlockBegin;
Contributor

void *BlockBegin = getBlockBegin(untagPointer(Ptr), Header);

and then you can get rid of some complexity below.

Contributor Author

I assume you are saying we can remove this code:

      if (BypassQuarantine && allocatorSupportsMemoryTagging<Config>())
        Ptr = untagPointer(Ptr);
      BlockBegin = getBlockBegin(Ptr, Header);

Now, if BypassQuarantine == false, we don't call untagPointer() on Ptr, so I think this would change the behavior, no?

Besides, BlockBegin is overridden on the MTE path, so I think it's fine to leave it as it is. What do you think?

Contributor

BlockBegin is unused on the branch of the code where BypassQuarantine is false, so it doesn't make a difference. And BlockBegin is evaluated in the same way in reTagBlock.

Contributor Author

My question is, it seems that we have different logic here. If BypassQuarantine == true && allocatorSupportsMemoryTagging<Config>() == false, it doesn't untag the pointer (the BypassQuarantine == false case was trying to make the same point).

Maybe it's something we can simplify, but I would like to keep it as close to the original logic as possible.

Contributor

If allocatorSupportsMemoryTagging<Config>() == false then it shouldn't matter whether we untag or not. In that case, all pointers will be untagged so untagPointer will be a no-op. The reason why the old code was checking allocatorSupportsMemoryTagging<Config>() is as a minor optimization -- we don't need to emit the and instruction if we know it'll always be a no-op. But the additional logic being added here outweighs the value of this optimization.

In other words, the logic was like that before because it was the simplest way to write the code before. If you're proposing to reorganize the code into a new structure, then it should be kept simple, but in a way that fits the new structure. If the goal of this change is to make the code more readable, then it shouldn't be making the code less readable by needlessly copying logic from the old version of the code.

Contributor Author

inline NORETURN uptr untagPointer(uptr Ptr) {
  (void)Ptr;
  UNREACHABLE("memory tagging not supported");
}

When allocatorSupportsMemoryTagging<Config>() == false, untagPointer leads to abort.

Contributor

Oh, right. How about we add

uptr maybeUntagPointer(uptr Ptr) {
  return allocatorSupportsMemoryTagging<Config>() ? untagPointer(Ptr) : Ptr;
}

and change the other callers of untagPointer to call it?
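
A call site could then look like this (a sketch assuming the helper above; getBlockBegin is the existing accessor discussed in this thread):

// The support check folds into maybeUntagPointer, so the branch disappears:
void *BlockBegin = getBlockBegin(
    reinterpret_cast<void *>(maybeUntagPointer(reinterpret_cast<uptr>(Ptr))),
    Header);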

Contributor Author

Sorry, I was blocked by other stuff recently.

This is a little more than simply moving the code structure. If we want to do that, I'd suggest doing it in a follow-up so that we can catch any logic mismatch in this CL and do further cleanup later.

Contributor

Okay, land this as is then.

@@ -1213,6 +1259,36 @@ class Allocator {
}
}

NOINLINE void *unTagBlock(const Options &Options, void *TaggedPtr,
Contributor

This doesn't "untag" the block, it retags it as part of deallocation.

Contributor Author

renamed to reTagBlock


@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code paths which enable memory tagging [scudo] Move the chunk update into functions Apr 3, 2024
The code paths for MTE enabled and disabled were interleaved, which
increases the difficulty of reading each path at both the source and
assembly level. In this change, we move the parts with divergent logic
into functions and apply minor refactoring to the code structure.
@ChiaHungDuan

Squashed and rebased

@cferris1000 cferris1000 left a comment (Contributor)

LGTM.

But you might want to wait until pcc answers the one outstanding question.

Chunk::UnpackedHeader *Header, const uptr Size,
bool BypassQuarantine) {
DCHECK(useMemoryTagging<AllocatorConfig>(Options));
void *Ptr = HeaderTaggedPtr;
Contributor

Get rid of this line and just call the argument Ptr.

Contributor Author

Done

@ChiaHungDuan ChiaHungDuan left a comment (Contributor Author)

Also rename reTagBlock to retagBlock. It makes the style consistent with retagPointer.

@ChiaHungDuan

Verified on an MTE device again and fixed a leftover pointer untagging.

@ChiaHungDuan ChiaHungDuan merged commit 772b1b0 into llvm:main May 16, 2024
3 of 4 checks passed
ahomescu added a commit to ahomescu/llvm-project that referenced this pull request May 22, 2024
llvm#83493 slightly changed the order of computation of block addresses and pointers, causing the value of DefaultAlignedPtr to include the MTE tag. Move this computation earlier so it matches the old behavior.

This fixes a UBSan failure in Trusty:
secure os: UBSan: (overflow:-) external/scudo/standalone/combined.h:1070:35
secure os: Details: unsigned integer overflow: 8988807738704 - 144124176883594576 cannot be represented in type 'uptr'
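
To see why the tag produces that exact report (an illustrative, standalone reconstruction rather than code from the commit; it assumes a 64-bit target): MTE keeps the tag in bits 56..59 of the pointer, so a tagged address is numerically enormous, and subtracting it from the untagged user pointer wraps.

#include <cstdint>
#include <cstdio>

using uptr = std::uintptr_t;

int main() {
  // Values from the UBSan log above: the untagged user pointer, and the
  // block-derived pointer that accidentally kept MTE tag 0x2 in the top bits.
  const uptr UserPtr = 8988807738704ULL;
  const uptr DefaultAlignedPtr = (uptr(0x2) << 56) | UserPtr; // 144124176883594576
  // Untagged minus tagged wraps around, which is exactly what the
  // unsigned-integer-overflow sanitizer reports at combined.h:1070.
  const uptr Offset = UserPtr - DefaultAlignedPtr;
  std::printf("%llu\n", static_cast<unsigned long long>(Offset));
  return 0;
}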
ahomescu added a commit to ahomescu/llvm-project that referenced this pull request May 22, 2024
ahomescu added a commit to ahomescu/llvm-project that referenced this pull request May 23, 2024
ChiaHungDuan pushed a commit that referenced this pull request May 23, 2024
jrguzman-ms pushed a commit to msft-mirror-aosp/platform.external.scudo that referenced this pull request May 24, 2024
artem pushed a commit to artem/fuchsia-mirror that referenced this pull request May 24, 2024
… without tag (#92989)

bherrera pushed a commit to misttech/mist-os that referenced this pull request May 26, 2024
… without tag (#92989)

qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Jun 5, 2024