
[scudo] Move the chunk update into functions #83493

Merged: 3 commits into llvm:main, May 16, 2024

Conversation


@ChiaHungDuan ChiaHungDuan commented Feb 29, 2024

The code paths for MTE enabled and disabled were interleaved, which made each path harder to read at both the source and assembly level. In this change, we move the parts with divergent logic into separate functions and apply minor refactoring to the code structure.


llvmbot commented Feb 29, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (ChiaHungDuan)

Changes

Memory tagging is a flag that requires a system reboot to enable, which means the code paths for memory tagging will never be executed while it is off. However, we only mark those paths with UNLIKELY(), which does not annotate the expectation properly here. As a result, the generated assembly always interleaves instructions from the paths with and without memory tagging enabled, so the I-cache may end up caching many instructions that are never used.

This change explicitly splits the paths into separate code blocks. It introduces a small amount of duplicated code, but it creates two independent execution paths and should improve cache locality.
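
To illustrate the pattern being applied (a minimal, standalone sketch rather than the actual scudo code; the function names are invented and the macros are stand-ins for scudo's):

#include <cstddef>

#define LIKELY(X) __builtin_expect(!!(X), 1)
#define UNLIKELY(X) __builtin_expect(!!(X), 0)
#define NOINLINE __attribute__((noinline))

// Before: UNLIKELY() only hints at branch probability. The compiler still
// emits both paths into one function body, so the never-executed tagging
// instructions share I-cache lines with the hot path.
void *allocateBefore(bool MemTagEnabled, std::size_t Size) {
  void *Ptr = nullptr;
  if (UNLIKELY(MemTagEnabled)) {
    // ... tagging logic interleaved with the common code ...
  }
  // ... common allocation logic ...
  (void)Size;
  return Ptr;
}

// After: the cold path is hoisted into a separate, non-inlined function, so
// the hot path's instructions stay contiguous.
NOINLINE void *allocateWithTagging(std::size_t Size) {
  (void)Size; // ... tagging logic lives here, out of the hot path ...
  return nullptr;
}

void *allocateAfter(bool MemTagEnabled, std::size_t Size) {
  if (LIKELY(!MemTagEnabled)) {
    // ... common allocation logic only ...
    return nullptr;
  }
  return allocateWithTagging(Size);
}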


Full diff: https://github.com/llvm/llvm-project/pull/83493.diff

2 Files Affected:

  • (modified) compiler-rt/lib/scudo/standalone/allocator_common.h (+1-1)
  • (modified) compiler-rt/lib/scudo/standalone/combined.h (+175-125)
diff --git a/compiler-rt/lib/scudo/standalone/allocator_common.h b/compiler-rt/lib/scudo/standalone/allocator_common.h
index 4d39956bf90503..2b77516ad11cae 100644
--- a/compiler-rt/lib/scudo/standalone/allocator_common.h
+++ b/compiler-rt/lib/scudo/standalone/allocator_common.h
@@ -53,7 +53,7 @@ template <class SizeClassAllocator> struct TransferBatch {
   void moveNToArray(CompactPtrT *Array, u16 N) {
     DCHECK_LE(N, Count);
     memcpy(Array, Batch + Count - N, sizeof(Batch[0]) * N);
-    Count -= N;
+    Count = static_cast<u16>(Count - N);
   }
   u16 getCount() const { return Count; }
   bool isEmpty() const { return Count == 0U; }
diff --git a/compiler-rt/lib/scudo/standalone/combined.h b/compiler-rt/lib/scudo/standalone/combined.h
index fa6077384d9826..6d5997525063f1 100644
--- a/compiler-rt/lib/scudo/standalone/combined.h
+++ b/compiler-rt/lib/scudo/standalone/combined.h
@@ -401,133 +401,18 @@ class Allocator {
       reportOutOfMemory(NeededSize);
     }
 
-    const uptr BlockUptr = reinterpret_cast<uptr>(Block);
-    const uptr UnalignedUserPtr = BlockUptr + Chunk::getHeaderSize();
-    const uptr UserPtr = roundUp(UnalignedUserPtr, Alignment);
-
-    void *Ptr = reinterpret_cast<void *>(UserPtr);
-    void *TaggedPtr = Ptr;
-    if (LIKELY(ClassId)) {
-      // We only need to zero or tag the contents for Primary backed
-      // allocations. We only set tags for primary allocations in order to avoid
-      // faulting potentially large numbers of pages for large secondary
-      // allocations. We assume that guard pages are enough to protect these
-      // allocations.
-      //
-      // FIXME: When the kernel provides a way to set the background tag of a
-      // mapping, we should be able to tag secondary allocations as well.
-      //
-      // When memory tagging is enabled, zeroing the contents is done as part of
-      // setting the tag.
-      if (UNLIKELY(useMemoryTagging<Config>(Options))) {
-        uptr PrevUserPtr;
-        Chunk::UnpackedHeader Header;
-        const uptr BlockSize = PrimaryT::getSizeByClassId(ClassId);
-        const uptr BlockEnd = BlockUptr + BlockSize;
-        // If possible, try to reuse the UAF tag that was set by deallocate().
-        // For simplicity, only reuse tags if we have the same start address as
-        // the previous allocation. This handles the majority of cases since
-        // most allocations will not be more aligned than the minimum alignment.
-        //
-        // We need to handle situations involving reclaimed chunks, and retag
-        // the reclaimed portions if necessary. In the case where the chunk is
-        // fully reclaimed, the chunk's header will be zero, which will trigger
-        // the code path for new mappings and invalid chunks that prepares the
-        // chunk from scratch. There are three possibilities for partial
-        // reclaiming:
-        //
-        // (1) Header was reclaimed, data was partially reclaimed.
-        // (2) Header was not reclaimed, all data was reclaimed (e.g. because
-        //     data started on a page boundary).
-        // (3) Header was not reclaimed, data was partially reclaimed.
-        //
-        // Case (1) will be handled in the same way as for full reclaiming,
-        // since the header will be zero.
-        //
-        // We can detect case (2) by loading the tag from the start
-        // of the chunk. If it is zero, it means that either all data was
-        // reclaimed (since we never use zero as the chunk tag), or that the
-        // previous allocation was of size zero. Either way, we need to prepare
-        // a new chunk from scratch.
-        //
-        // We can detect case (3) by moving to the next page (if covered by the
-        // chunk) and loading the tag of its first granule. If it is zero, it
-        // means that all following pages may need to be retagged. On the other
-        // hand, if it is nonzero, we can assume that all following pages are
-        // still tagged, according to the logic that if any of the pages
-        // following the next page were reclaimed, the next page would have been
-        // reclaimed as well.
-        uptr TaggedUserPtr;
-        if (getChunkFromBlock(BlockUptr, &PrevUserPtr, &Header) &&
-            PrevUserPtr == UserPtr &&
-            (TaggedUserPtr = loadTag(UserPtr)) != UserPtr) {
-          uptr PrevEnd = TaggedUserPtr + Header.SizeOrUnusedBytes;
-          const uptr NextPage = roundUp(TaggedUserPtr, getPageSizeCached());
-          if (NextPage < PrevEnd && loadTag(NextPage) != NextPage)
-            PrevEnd = NextPage;
-          TaggedPtr = reinterpret_cast<void *>(TaggedUserPtr);
-          resizeTaggedChunk(PrevEnd, TaggedUserPtr + Size, Size, BlockEnd);
-          if (UNLIKELY(FillContents != NoFill && !Header.OriginOrWasZeroed)) {
-            // If an allocation needs to be zeroed (i.e. calloc) we can normally
-            // avoid zeroing the memory now since we can rely on memory having
-            // been zeroed on free, as this is normally done while setting the
-            // UAF tag. But if tagging was disabled per-thread when the memory
-            // was freed, it would not have been retagged and thus zeroed, and
-            // therefore it needs to be zeroed now.
-            memset(TaggedPtr, 0,
-                   Min(Size, roundUp(PrevEnd - TaggedUserPtr,
-                                     archMemoryTagGranuleSize())));
-          } else if (Size) {
-            // Clear any stack metadata that may have previously been stored in
-            // the chunk data.
-            memset(TaggedPtr, 0, archMemoryTagGranuleSize());
-          }
-        } else {
-          const uptr OddEvenMask =
-              computeOddEvenMaskForPointerMaybe(Options, BlockUptr, ClassId);
-          TaggedPtr = prepareTaggedChunk(Ptr, Size, OddEvenMask, BlockEnd);
-        }
-        storePrimaryAllocationStackMaybe(Options, Ptr);
-      } else {
-        Block = addHeaderTag(Block);
-        Ptr = addHeaderTag(Ptr);
-        if (UNLIKELY(FillContents != NoFill)) {
-          // This condition is not necessarily unlikely, but since memset is
-          // costly, we might as well mark it as such.
-          memset(Block, FillContents == ZeroFill ? 0 : PatternFillByte,
-                 PrimaryT::getSizeByClassId(ClassId));
-        }
-      }
+    const uptr UserPtr = roundUp(
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize(), Alignment);
+    const uptr SizeOrUnusedBytes =
+        ClassId ? Size : SecondaryBlockEnd - (UserPtr + Size);
+
+    if (LIKELY(!useMemoryTagging<Config>(Options))) {
+      return initChunk(ClassId, Origin, Block, UserPtr, SizeOrUnusedBytes,
+                       FillContents);
     } else {
-      Block = addHeaderTag(Block);
-      Ptr = addHeaderTag(Ptr);
-      if (UNLIKELY(useMemoryTagging<Config>(Options))) {
-        storeTags(reinterpret_cast<uptr>(Block), reinterpret_cast<uptr>(Ptr));
-        storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
-      }
-    }
-
-    Chunk::UnpackedHeader Header = {};
-    if (UNLIKELY(UnalignedUserPtr != UserPtr)) {
-      const uptr Offset = UserPtr - UnalignedUserPtr;
-      DCHECK_GE(Offset, 2 * sizeof(u32));
-      // The BlockMarker has no security purpose, but is specifically meant for
-      // the chunk iteration function that can be used in debugging situations.
-      // It is the only situation where we have to locate the start of a chunk
-      // based on its block address.
-      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
-      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
-      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+      return initChunkWithMemoryTagging(ClassId, Origin, Block, UserPtr, Size,
+                                        SizeOrUnusedBytes, FillContents);
     }
-    Header.ClassId = ClassId & Chunk::ClassIdMask;
-    Header.State = Chunk::State::Allocated;
-    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
-    Header.SizeOrUnusedBytes =
-        (ClassId ? Size : SecondaryBlockEnd - (UserPtr + Size)) &
-        Chunk::SizeOrUnusedBytesMask;
-    Chunk::storeHeader(Cookie, Ptr, &Header);
-
-    return TaggedPtr;
   }
 
   NOINLINE void deallocate(void *Ptr, Chunk::Origin Origin, uptr DeleteSize = 0,
@@ -1143,6 +1028,171 @@ class Allocator {
            reinterpret_cast<uptr>(Ptr) - SizeOrUnusedBytes;
   }
 
+  ALWAYS_INLINE void *initChunk(const uptr ClassId, const Chunk::Origin Origin,
+                                void *Block, const uptr UserPtr,
+                                const uptr SizeOrUnusedBytes,
+                                const FillContentsMode FillContents) {
+    Block = addHeaderTag(Block);
+    // Only do content fill when it's from primary allocator because secondary
+    // allocator has filled the content.
+    if (ClassId != 0 && UNLIKELY(FillContents != NoFill)) {
+      // This condition is not necessarily unlikely, but since memset is
+      // costly, we might as well mark it as such.
+      memset(Block, FillContents == ZeroFill ? 0 : PatternFillByte,
+             PrimaryT::getSizeByClassId(ClassId));
+    }
+
+    Chunk::UnpackedHeader Header = {};
+
+    const uptr DefaultAlignedPtr =
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
+    if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
+      const uptr Offset = UserPtr - DefaultAlignedPtr;
+      DCHECK_GE(Offset, 2 * sizeof(u32));
+      // The BlockMarker has no security purpose, but is specifically meant for
+      // the chunk iteration function that can be used in debugging situations.
+      // It is the only situation where we have to locate the start of a chunk
+      // based on its block address.
+      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
+      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
+      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+    }
+
+    Header.ClassId = ClassId & Chunk::ClassIdMask;
+    Header.State = Chunk::State::Allocated;
+    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
+    Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
+    Chunk::storeHeader(Cookie, reinterpret_cast<void *>(addHeaderTag(UserPtr)),
+                       &Header);
+
+    return reinterpret_cast<void *>(UserPtr);
+  }
+
+  NOINLINE void *
+  initChunkWithMemoryTagging(const uptr ClassId, const Chunk::Origin Origin,
+                             void *Block, const uptr UserPtr, const uptr Size,
+                             const uptr SizeOrUnusedBytes,
+                             const FillContentsMode FillContents) {
+    const Options Options = Primary.Options.load();
+    DCHECK(useMemoryTagging<Config>(Options));
+
+    void *Ptr = reinterpret_cast<void *>(UserPtr);
+    void *TaggedPtr = Ptr;
+
+    if (LIKELY(ClassId)) {
+      // We only need to zero or tag the contents for Primary backed
+      // allocations. We only set tags for primary allocations in order to avoid
+      // faulting potentially large numbers of pages for large secondary
+      // allocations. We assume that guard pages are enough to protect these
+      // allocations.
+      //
+      // FIXME: When the kernel provides a way to set the background tag of a
+      // mapping, we should be able to tag secondary allocations as well.
+      //
+      // When memory tagging is enabled, zeroing the contents is done as part of
+      // setting the tag.
+
+      uptr PrevUserPtr;
+      Chunk::UnpackedHeader Header;
+      const uptr BlockSize = PrimaryT::getSizeByClassId(ClassId);
+      const uptr BlockUptr = reinterpret_cast<uptr>(Block);
+      const uptr BlockEnd = BlockUptr + BlockSize;
+      // If possible, try to reuse the UAF tag that was set by deallocate().
+      // For simplicity, only reuse tags if we have the same start address as
+      // the previous allocation. This handles the majority of cases since
+      // most allocations will not be more aligned than the minimum alignment.
+      //
+      // We need to handle situations involving reclaimed chunks, and retag
+      // the reclaimed portions if necessary. In the case where the chunk is
+      // fully reclaimed, the chunk's header will be zero, which will trigger
+      // the code path for new mappings and invalid chunks that prepares the
+      // chunk from scratch. There are three possibilities for partial
+      // reclaiming:
+      //
+      // (1) Header was reclaimed, data was partially reclaimed.
+      // (2) Header was not reclaimed, all data was reclaimed (e.g. because
+      //     data started on a page boundary).
+      // (3) Header was not reclaimed, data was partially reclaimed.
+      //
+      // Case (1) will be handled in the same way as for full reclaiming,
+      // since the header will be zero.
+      //
+      // We can detect case (2) by loading the tag from the start
+      // of the chunk. If it is zero, it means that either all data was
+      // reclaimed (since we never use zero as the chunk tag), or that the
+      // previous allocation was of size zero. Either way, we need to prepare
+      // a new chunk from scratch.
+      //
+      // We can detect case (3) by moving to the next page (if covered by the
+      // chunk) and loading the tag of its first granule. If it is zero, it
+      // means that all following pages may need to be retagged. On the other
+      // hand, if it is nonzero, we can assume that all following pages are
+      // still tagged, according to the logic that if any of the pages
+      // following the next page were reclaimed, the next page would have been
+      // reclaimed as well.
+      uptr TaggedUserPtr;
+      if (getChunkFromBlock(BlockUptr, &PrevUserPtr, &Header) &&
+          PrevUserPtr == UserPtr &&
+          (TaggedUserPtr = loadTag(UserPtr)) != UserPtr) {
+        uptr PrevEnd = TaggedUserPtr + Header.SizeOrUnusedBytes;
+        const uptr NextPage = roundUp(TaggedUserPtr, getPageSizeCached());
+        if (NextPage < PrevEnd && loadTag(NextPage) != NextPage)
+          PrevEnd = NextPage;
+        TaggedPtr = reinterpret_cast<void *>(TaggedUserPtr);
+        resizeTaggedChunk(PrevEnd, TaggedUserPtr + Size, Size, BlockEnd);
+        if (UNLIKELY(FillContents != NoFill && !Header.OriginOrWasZeroed)) {
+          // If an allocation needs to be zeroed (i.e. calloc) we can normally
+          // avoid zeroing the memory now since we can rely on memory having
+          // been zeroed on free, as this is normally done while setting the
+          // UAF tag. But if tagging was disabled per-thread when the memory
+          // was freed, it would not have been retagged and thus zeroed, and
+          // therefore it needs to be zeroed now.
+          memset(TaggedPtr, 0,
+                 Min(Size, roundUp(PrevEnd - TaggedUserPtr,
+                                   archMemoryTagGranuleSize())));
+        } else if (Size) {
+          // Clear any stack metadata that may have previously been stored in
+          // the chunk data.
+          memset(TaggedPtr, 0, archMemoryTagGranuleSize());
+        }
+      } else {
+        const uptr OddEvenMask =
+            computeOddEvenMaskForPointerMaybe(Options, BlockUptr, ClassId);
+        TaggedPtr = prepareTaggedChunk(Ptr, Size, OddEvenMask, BlockEnd);
+      }
+      storePrimaryAllocationStackMaybe(Options, Ptr);
+    } else {
+      Block = addHeaderTag(Block);
+      Ptr = addHeaderTag(Ptr);
+      storeTags(reinterpret_cast<uptr>(Block), reinterpret_cast<uptr>(Ptr));
+      storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
+    }
+
+    Chunk::UnpackedHeader Header = {};
+
+    const uptr DefaultAlignedPtr =
+        reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
+    if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
+      const uptr Offset = UserPtr - DefaultAlignedPtr;
+      DCHECK_GE(Offset, 2 * sizeof(u32));
+      // The BlockMarker has no security purpose, but is specifically meant for
+      // the chunk iteration function that can be used in debugging situations.
+      // It is the only situation where we have to locate the start of a chunk
+      // based on its block address.
+      reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
+      reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
+      Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
+    }
+
+    Header.ClassId = ClassId & Chunk::ClassIdMask;
+    Header.State = Chunk::State::Allocated;
+    Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
+    Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
+    Chunk::storeHeader(Cookie, Ptr, &Header);
+
+    return TaggedPtr;
+  }
+
   void quarantineOrDeallocateChunk(const Options &Options, void *TaggedPtr,
                                    Chunk::UnpackedHeader *Header,
                                    uptr Size) NO_THREAD_SAFETY_ANALYSIS {
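
One line in the allocator_common.h hunk above is worth unpacking: on a u16, Count -= N promotes both operands to int, and the compound assignment then implicitly narrows the int result back to u16, which fails builds that use -Wconversion with -Werror. A minimal reproduction (not from the PR):

#include <cstdint>

using u16 = std::uint16_t;

u16 shrink(u16 Count, u16 N) {
  // Count -= N;  // both operands promote to int; the implicit int -> u16
  //              // narrowing triggers -Wimplicit-int-conversion.
  Count = static_cast<u16>(Count - N); // the diff's fix: explicit narrowing
  return Count;
}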

@ChiaHungDuan ChiaHungDuan marked this pull request as draft February 29, 2024 22:31
@ChiaHungDuan
Copy link
Contributor Author

This continues the work https://reviews.llvm.org/D159392

@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code paths in quarantineOrDeallocateChunk() [scudo] Split the code path of memory tagging out from allocate() Feb 29, 2024
@ChiaHungDuan ChiaHungDuan marked this pull request as ready for review March 11, 2024 21:15
@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code path of memory tagging out from allocate() [scudo] Split the code path of memory tagging out from allocate() and quarantineOrDeallocateChunk() Mar 11, 2024

ChiaHungDuan commented Mar 11, 2024

I measured the cache-miss rate using simpleperf while running malloc-rss-benchmark (after removing some noise that may impact the result), 20 times for each configuration.

I don't see a significant difference (0.5934682 vs. 0.5927589, less than 0.5%). As mentioned in the comment, this change helps readability at both the source and assembly level, which I think is still a good reason to do the cleanup.

BTW, there were two changes, and I put them in the same pull request to simplify the process.

@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code path of memory tagging out from allocate() and quarantineOrDeallocateChunk() [scudo] Split the code paths which enable memory tagging Mar 11, 2024
// When memory tagging is enabled, zeroing the contents is done as part of
// setting the tag.

uptr PrevUserPtr;
Contributor

Super small nit, but it would probably be best if this was down with the TaggedUserPtr declaration.

Contributor Author

Nice catch!

storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
}

Chunk::UnpackedHeader Header = {};
Contributor

All of the header storage code looks very similar in both paths. Should this be made into a separate function? It might be that it's not as similar as I think though.

Contributor Author

I agree that they look similar, but I may have to review them carefully (the differences between the paths with and without memory tagging are usually nuanced). Thus I'd prefer doing this in a separate CL.

Contributor

Alternatively, move the code for writing the header back into allocate.

Contributor Author

The problem is:

  • On the non-MTE path, UserPtr is used for the header update and is the value returned.
  • On the MTE path, the header update requires UserPtr, but TaggedPtr is the value returned.

Currently, I'm not sure I can use one to infer the value of the other (and I prefer not to return two variables or something like that). As a result, I would suggest that I do another refactor to see if we can simplify the logic, and in this CL only move the code structure (trying to make the fewest logical changes).
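
One possible shape for that follow-up (a hypothetical sketch using the member names visible in the diff above; writeChunkHeader is an invented name and would live inside Allocator): the header-filling logic is identical in both paths, so only the pointer the header is stored through, and the pointer that gets returned, need to stay with each caller.

// Hypothetical shared helper: both paths fill the header identically from
// Block and the untagged UserPtr. StorePtr differs per path: UserPtr with a
// header tag on the non-MTE path, the already-tagged Ptr on the MTE path.
void writeChunkHeader(uptr ClassId, Chunk::Origin Origin, void *Block,
                      uptr UserPtr, uptr SizeOrUnusedBytes, void *StorePtr) {
  Chunk::UnpackedHeader Header = {};
  const uptr DefaultAlignedPtr =
      reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
  if (UNLIKELY(DefaultAlignedPtr != UserPtr)) {
    const uptr Offset = UserPtr - DefaultAlignedPtr;
    reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
    reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
    Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
  }
  Header.ClassId = ClassId & Chunk::ClassIdMask;
  Header.State = Chunk::State::Allocated;
  Header.OriginOrWasZeroed = Origin & Chunk::OriginMask;
  Header.SizeOrUnusedBytes = SizeOrUnusedBytes & Chunk::SizeOrUnusedBytesMask;
  Chunk::storeHeader(Cookie, StorePtr, &Header);
}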

Contributor

I agree, do this in a follow-up.


void *BlockBegin;
if (UNLIKELY(useMemoryTagging<Config>(Options))) {
Contributor

Would it make sense to reverse the if and else clauses and check LIKELY(!useMemoryTagging...)?

It's a bit easier to read if the likely choice is first.

Contributor Author

Done. This also aligns with the order used elsewhere.

const u8 PrevTag = extractTag(reinterpret_cast<uptr>(TaggedPtr));
storeDeallocationStackMaybe(Options, Ptr, PrevTag, Size);
if (Header->ClassId) {
if (!TSDRegistry.getDisableMemInit()) {
Contributor

I know that the original code does this, but it would be better to have a single if instead of the two ifs since neither if has an else.
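
That is, something like (a sketch of the suggested shape, using the names from the quoted snippet):

// Neither if has an else, so the two conditions can be combined:
if (Header->ClassId && !TSDRegistry.getDisableMemInit()) {
  // ... retag and zero the primary chunk ...
}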

Contributor Author

Done

storeTags(reinterpret_cast<uptr>(Block), reinterpret_cast<uptr>(Ptr));
storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
}
return initChunkWithMemoryTagging(ClassId, Origin, Block, UserPtr, Size,
Collaborator

Dangling else statement after return. Remove the else { block?
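
In other words (a sketch of the suggested cleanup; the full argument lists are the ones shown in the diff above):

// The early return makes the else block redundant:
if (LIKELY(!useMemoryTagging<Config>(Options)))
  return initChunk(ClassId, Origin, Block, UserPtr, SizeOrUnusedBytes,
                   FillContents);
return initChunkWithMemoryTagging(ClassId, Origin, Block, UserPtr, Size,
                                  SizeOrUnusedBytes, FillContents);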

Contributor Author

Done

void *TaggedPtr = Ptr;

if (LIKELY(ClassId)) {
// We only need to zero or tag the contents for Primary backed
Collaborator

While we're refactoring, is it possible to pull this out into a helper function, so that this reads something like:

if (primary) {
  initPrimaryChunkWithTagging();
} else {
  initSecondaryChunkWithTagging(); // It's 5 lines, so maybe leave it just as those 5 lines of code
}

// code to store header
}

Contributor Author

I agree it's a good idea to move the primary chunk initialization into a function. However, given the number of referenced variables (it's 10), instead of passing all of them (even if we mark the function as always-inline), I'm thinking that maybe we also want to review and simplify the logic as well.

I added comments to highlight what each block is doing. I'm planning to do further refactoring in a follow-up CL; what do you think?

Collaborator

Feel free to refactor later, yeah. It's just a huge function right now that I've always had a hard time reading, as it spans about three monitor pages.

Contributor

FWIW, I've never understood this idea that code is more readable if split into multiple functions that are only called once. In my experience it generally makes the code less readable and less maintainable.

  • If I am reading the function, I'm reading it because I need to know what it does. That means I need to read the function and its callees. Having to go-to-definition in an IDE, read the callee, go back, etc just adds mental overhead and requires readers to use IDEs. That's much more annoying than the function being three pages long. At least with the three page function, the three pages are in one place. With separate functions it's 3.1 pages long due to the function signatures and scattered in various places in the file.
  • If I am modifying the function, the multiple functions just get in the way. For example if I now need some information in the callee, now I need to add arguments to the caller and callee for basically no reason other than the function having been arbitrarily split into multiple functions.

Contributor

While reviewing this CL, trying to work through a three-page-long function has been painful. Breaking it up into smaller, easier-to-understand pieces really helps in following what's going on, because I only need to understand what each function does. For example, in a large function, variable names tend to be reused, so it can get confusing trying to figure out where a variable was set and what it means on page one versus page two or page three. You can fix this by never reusing variable names, but at that point you are emulating breaking the function into smaller functions and might as well go all the way and refactor.

The advice to break up functions is almost universal, but I have seen it go too far the other way: a function that has two lines in it and is only used once doesn't make sense and should be folded into the caller. I don't think this change is doing that.

Now, if you don't think that this refactor is breaking up the function into easy to understand pieces, that would be a reason to keep modifying this. For example, if you don't think the comments make sense, or that the function partition is right, please mention it.

Otherwise, having had to read this very long function a number of times, I can tell you this new version makes it much easier to understand. Even just trying to follow the MTE-enabled code path was extra difficult, because going back and forth between the modifications made on the MTE-enabled path and the common code can be confusing. So this should really help in understanding the MTE and non-MTE paths on their own, which makes this worth doing, in my estimation.

Contributor

> it can get confusing trying to figure out where the variable was set and what it means at page one versus page two or page three

Yeah, this is one of the things that I would need an IDE/clangd for (find variable definition/references), but the thing is that I would need an IDE for this regardless of whether the function was broken up. And for me, the split into multiple functions would make the situation worse because I need to manually track the variable across function calls (find the matching argument in the caller/callee) as well, since an IDE typically will only find definition/references within the local function. I think the fact that there's a lot of code to handle allocations is just an intrinsic property of the code needing to do a lot of stuff and it's not clear to me that moving it around like this really helps.

> You can fix this by never re-using the variable names

I would personally be happier with a change like this. The code would still be friendly to IDEs and people who want to read the whole function and it would make it more obvious where the value for each variable comes from.

> The advice to break functions is almost universal

Fair enough, I'm not necessarily claiming that my experience is universal, it could just be the way my brain works. If breaking functions really does make the code easiest to read for everyone else, I wouldn't object to it and I'll just put up with it being harder to read for me.

> Now, if you don't think that this refactor is breaking up the function into easy to understand pieces, that would be a reason to keep modifying this. For example, if you don't think the comments make sense, or that the function partition is right, please mention it.

Yeah, my other comments are what I have to offer on that front. But since the question of how to structure code to make it easy to read is subjective, and my personal preference would be either not to make this change at all or to make a different change (like not reusing variables), it's difficult to offer more feedback than that.


pcc commented Apr 2, 2024

I still think this is a bad idea. People are going to update one copy of the code and forget to update the other one, which will cause all kinds of problems.


> I still think this is a bad idea. People are going to update one copy of the code and forget to update the other one, which will cause all kinds of problems.

I'm not sure whether the title and the description are misleading and make it sound like we are just creating two copies with a bunch of copy-pasted code. I can update them if that's the problem.

So far, this CL only splits the code that has independent logic (with and without memory tagging); other parts (like allocating blocks from Primary or Secondary) still share the same code. It's not creating two copies. In addition, if someone changes the header layout and doesn't run an MTE build, they may still forget to update the MTE path, right? In that case we should increase the test coverage; unit tests should help us catch that regardless of whether the MTE path is textually inline or not.


pcc commented Apr 3, 2024

Yeah, it looked like you were copy-pasting more code than you actually were. But it looks like this change is mixed in with a number of other changes that appear to have been intended to improve performance. Since your experiments showed that this change is performance neutral, I don't think we should make those changes. Please also update the description to not talk about performance.

void quarantineOrDeallocateChunk(const Options &Options, void *TaggedPtr,
void *HeaderTaggedPtr,
Contributor

No need to pass this in.

Contributor Author

Done.

&TaggedEnd);
}
}
void *BlockBegin;
Contributor

void *BlockBegin = getBlockBegin(untagPointer(Ptr), Header);

and then you can get rid of some complexity below.

Contributor Author

I assume you are saying we can remove this code:

      if (BypassQuarantine && allocatorSupportsMemoryTagging<Config>())
        Ptr = untagPointer(Ptr);
      BlockBegin = getBlockBegin(Ptr, Header);

Now, if BypassQuarantine == false, we don't call untagPointer() on Ptr, so I think this would change the behavior, no?

Besides, BlockBegin is overridden on the MTE path, so I think it's fine to leave it as it is. What do you think?

Contributor

BlockBegin is unused on the branch of the code where BypassQuarantine is false, so it doesn't make a difference. And BlockBegin is evaluated in the same way in reTagBlock.

Contributor Author

My question is, it seems that we have different logic here. If BypassQuarantine == true && allocatorSupportsMemoryTagging<Config>() == false, it doesn't untag the pointer (the BypassQuarantine == false case was trying to make the same point).

Maybe it's something we can simplify, but I would like to keep it as close to the original logic as possible.

Contributor

If allocatorSupportsMemoryTagging<Config>() == false then it shouldn't matter whether we untag or not. In that case, all pointers will be untagged so untagPointer will be a no-op. The reason why the old code was checking allocatorSupportsMemoryTagging<Config>() is as a minor optimization -- we don't need to emit the and instruction if we know it'll always be a no-op. But the additional logic being added here outweighs the value of this optimization.

In other words, the logic was like that before because it was the simplest way to write the code before. If you're proposing to reorganize the code into a new structure, then it should be kept simple, but in a way that fits the new structure. If the goal of this change is to make the code more readable, then it shouldn't be making the code less readable by needlessly copying logic from the old version of the code.

Contributor Author

inline NORETURN uptr untagPointer(uptr Ptr) {
  (void)Ptr;
  UNREACHABLE("memory tagging not supported");
}

When allocatorSupportsMemoryTagging<Config>() == false, untagPointer leads to abort.

Contributor

Oh, right. How about we add

uptr maybeUntagPointer(uptr Ptr) {
  return allocatorSupportsMemoryTagging<Config>() ? untagPointer(Ptr) : Ptr;
}

and change the other callers of untagPointer to call it?
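
A call site could then look like this (a sketch assuming the helper above; getBlockBegin is the existing accessor discussed in this thread):

// The support check folds into maybeUntagPointer, so the branch disappears:
void *BlockBegin = getBlockBegin(
    reinterpret_cast<void *>(maybeUntagPointer(reinterpret_cast<uptr>(Ptr))),
    Header);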

Contributor Author

Sorry, I was blocked by other stuff recently.

This is a little more than simply moving the code structure. If we want to do that, I'd suggest doing it in a follow-up so that we can catch any logic mismatch in this CL and do further cleanup later.

Contributor

Okay, land this as is then.

@@ -1213,6 +1259,36 @@ class Allocator {
}
}

NOINLINE void *unTagBlock(const Options &Options, void *TaggedPtr,
Contributor

This doesn't "untag" the block, it retags it as part of deallocation.

Contributor Author

renamed to reTagBlock


@ChiaHungDuan ChiaHungDuan changed the title [scudo] Split the code paths which enable memory tagging [scudo] Move the chunk update into functions Apr 3, 2024
The code paths for MTE enabled and disabled were interleaved, which
increases the difficulty of reading each path at both the source and
assembly level. In this change, we move the parts with divergent logic
into functions and apply minor refactoring to the code structure.
@ChiaHungDuan

Squashed and rebased

@cferris1000 cferris1000 left a comment (Contributor)

LGTM.

But you might want to wait until pcc answers the one outstanding question.

Chunk::UnpackedHeader *Header, const uptr Size,
bool BypassQuarantine) {
DCHECK(useMemoryTagging<AllocatorConfig>(Options));
void *Ptr = HeaderTaggedPtr;
Contributor

Get rid of this line and just call the argument Ptr.

Contributor Author

Done

@ChiaHungDuan ChiaHungDuan left a comment (Contributor Author)

Also rename reTagBlock to retagBlock. It makes the style consistent with retagPointer.

@ChiaHungDuan

Verified on an MTE device again and fixed a leftover pointer untagging.

@ChiaHungDuan ChiaHungDuan merged commit 772b1b0 into llvm:main May 16, 2024
3 of 4 checks passed
ahomescu added a commit to ahomescu/llvm-project that referenced this pull request May 22, 2024
llvm#83493 slightly changed the order of computation of block addresses and pointers, causing the value of DefaultAlignedPtr to include the MTE tag. Move this computation earlier so it matches the old behavior.

This fixes a UBSan failure in Trusty:
secure os: UBSan: (overflow:-) external/scudo/standalone/combined.h:1070:35
secure os: Details: unsigned integer overflow: 8988807738704 - 144124176883594576 cannot be represented in type 'uptr'
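
To see why the tag produces that exact report (an illustrative, standalone reconstruction rather than code from the commit; it assumes a 64-bit target): MTE keeps the tag in bits 56..59 of the pointer, so a tagged address is numerically enormous, and subtracting it from the untagged user pointer wraps.

#include <cstdint>
#include <cstdio>

using uptr = std::uintptr_t;

int main() {
  // Values from the UBSan log above: the untagged user pointer, and the
  // block-derived pointer that accidentally kept MTE tag 0x2 in the top bits.
  const uptr UserPtr = 8988807738704ULL;
  const uptr DefaultAlignedPtr = (uptr(0x2) << 56) | UserPtr; // 144124176883594576
  // Untagged minus tagged wraps around, which is exactly what the
  // unsigned-integer-overflow sanitizer reports at combined.h:1070.
  const uptr Offset = UserPtr - DefaultAlignedPtr;
  std::printf("%llu\n", static_cast<unsigned long long>(Offset));
  return 0;
}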
ahomescu added a commit to ahomescu/llvm-project that referenced this pull request May 22, 2024
ahomescu added a commit to ahomescu/llvm-project that referenced this pull request May 23, 2024
ChiaHungDuan pushed a commit that referenced this pull request May 23, 2024
jrguzman-ms pushed a commit to msft-mirror-aosp/platform.external.scudo that referenced this pull request May 24, 2024
artem pushed a commit to artem/fuchsia-mirror that referenced this pull request May 24, 2024
… without tag (#92989)

bherrera pushed a commit to misttech/mist-os that referenced this pull request May 26, 2024
… without tag (#92989)

qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Jun 5, 2024