rjmansfield
Contributor

Fixes #86644

@llvmbot
Member

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Ryan Mansfield (rjmansfield)

Changes

Fixes #86644


Full diff: https://github.com/llvm/llvm-project/pull/159574.diff

3 Files Affected:

  • (added) llvm/test/tools/llvm-size/macho-pagezero.test (+60)
  • (modified) llvm/tools/llvm-size/Opts.td (+2)
  • (modified) llvm/tools/llvm-size/llvm-size.cpp (+8-2)
diff --git a/llvm/test/tools/llvm-size/macho-pagezero.test b/llvm/test/tools/llvm-size/macho-pagezero.test
new file mode 100644
index 0000000000000..d53067504c3ad
--- /dev/null
+++ b/llvm/test/tools/llvm-size/macho-pagezero.test
@@ -0,0 +1,60 @@
+# Test the -z option to skip __PAGEZERO segment in Mach-O files
+
+# RUN: yaml2obj %s --docnum=1 -o %t-pagezero.o
+# RUN: llvm-size %t-pagezero.o | \
+# RUN:   FileCheck %s --check-prefix=NORMAL --match-full-lines \
+# RUN:                --strict-whitespace --implicit-check-not={{.}}
+# RUN: llvm-size -z %t-pagezero.o | \
+# RUN:   FileCheck %s --check-prefix=SKIP --match-full-lines \
+# RUN:                --strict-whitespace --implicit-check-not={{.}}
+
+# NORMAL:__TEXT	__DATA	__OBJC	others	dec	hex
+# NORMAL-NEXT:20	100	0	4096	4216	1078	
+
+# SKIP:__TEXT	__DATA	__OBJC	others	dec	hex
+# SKIP-NEXT:20	100	0	0	120	78	
+
+--- !mach-o
+FileHeader:
+  magic:           0xFEEDFACF
+  cputype:         0x100000C
+  cpusubtype:      0x0
+  filetype:        0x2
+  ncmds:           3
+  sizeofcmds:      216
+  flags:           0x2000
+  reserved:        0x0
+LoadCommands:
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         72
+    segname:         __PAGEZERO
+    vmaddr:          0x0
+    vmsize:          4096
+    fileoff:         0
+    filesize:        0
+    maxprot:         0
+    initprot:        0
+    nsects:          0
+    flags:           0
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         72
+    segname:         __TEXT
+    vmaddr:          0x100000000
+    vmsize:          20
+    fileoff:         248
+    filesize:        20
+    maxprot:         7
+    initprot:        5
+    nsects:          0
+    flags:           0
+  - cmd:             LC_SEGMENT_64
+    cmdsize:         72
+    segname:         __DATA
+    vmaddr:          0x100001000
+    vmsize:          100
+    fileoff:         268
+    filesize:        100
+    maxprot:         7
+    initprot:        3
+    nsects:          0
+    flags:           0
diff --git a/llvm/tools/llvm-size/Opts.td b/llvm/tools/llvm-size/Opts.td
index edae43f1abd24..65478730c2801 100644
--- a/llvm/tools/llvm-size/Opts.td
+++ b/llvm/tools/llvm-size/Opts.td
@@ -21,6 +21,8 @@ def grp_mach_o : OptionGroup<"kind">, HelpText<"OPTIONS (Mach-O specific)">;
 def arch_EQ : Joined<["--"], "arch=">, HelpText<"architecture(s) from a Mach-O file to dump">, Group<grp_mach_o>;
 def : Separate<["--", "-"], "arch">, Alias<arch_EQ>;
 def l : F<"l", "When format is darwin, use long format to include addresses and offsets">, Group<grp_mach_o>;
+def z : F<"z", "Do not include __PAGEZERO segment in totals">,
+        Group<grp_mach_o>;
 
 def : F<"A", "Alias for --format">, Alias<format_EQ>, AliasArgs<["sysv"]>;
 def : F<"B", "Alias for --format">, Alias<format_EQ>, AliasArgs<["berkeley"]>;
diff --git a/llvm/tools/llvm-size/llvm-size.cpp b/llvm/tools/llvm-size/llvm-size.cpp
index acc7843ffac8b..805f8ed1e6dcd 100644
--- a/llvm/tools/llvm-size/llvm-size.cpp
+++ b/llvm/tools/llvm-size/llvm-size.cpp
@@ -79,6 +79,7 @@ static bool DarwinLongFormat;
 static RadixTy Radix = RadixTy::decimal;
 static bool TotalSizes;
 static bool HasMachOFiles = false;
+static bool SkipPageZero = false;
 
 static std::vector<std::string> InputFilenames;
 
@@ -307,7 +308,9 @@ static void printDarwinSegmentSizes(MachOObjectFile *MachO) {
         }
       } else {
         StringRef SegmentName = StringRef(Seg.segname);
-        if (SegmentName == "__TEXT")
+        if (SkipPageZero && SegmentName == "__PAGEZERO")
+          ; // Skip __PAGEZERO segment
+        else if (SegmentName == "__TEXT")
           total_text += Seg.vmsize;
         else if (SegmentName == "__DATA")
           total_data += Seg.vmsize;
@@ -333,7 +336,9 @@ static void printDarwinSegmentSizes(MachOObjectFile *MachO) {
         }
       } else {
         StringRef SegmentName = StringRef(Seg.segname);
-        if (SegmentName == "__TEXT")
+        if (SkipPageZero && SegmentName == "__PAGEZERO")
+          ; // Skip __PAGEZERO segment
+        else if (SegmentName == "__TEXT")
           total_text += Seg.vmsize;
         else if (SegmentName == "__DATA")
           total_data += Seg.vmsize;
@@ -914,6 +919,7 @@ int llvm_size_main(int argc, char **argv, const llvm::ToolContext &) {
 
   ELFCommons = Args.hasArg(OPT_common);
   DarwinLongFormat = Args.hasArg(OPT_l);
+  SkipPageZero = Args.hasArg(OPT_z);
   TotalSizes = Args.hasArg(OPT_totals);
   StringRef V = Args.getLastArgValue(OPT_format_EQ, "berkeley");
   if (V == "berkeley")

Collaborator

@jh7370 jh7370 left a comment

Please make sure to update the llvm-size command guide (llvm/docs/CommandGuide/llvm-size.rst) with the new option.

This will need a Mach-O developer to review, since I don't know the intricacies of the segment layout for these files. @drodriguez / @danzimm / @alexander-shaposhnikov, can any of you help?

Comment on lines 5 to 9
# RUN: FileCheck %s --check-prefix=NORMAL --match-full-lines \
# RUN: --strict-whitespace --implicit-check-not={{.}}
# RUN: llvm-size -z %t-pagezero.o | \
# RUN: FileCheck %s --check-prefix=SKIP --match-full-lines \
# RUN: --strict-whitespace --implicit-check-not={{.}}
Collaborator

As this test isn't about the format of the output, you can get away with omitting the --strict-whitespace and --implicit-check-not options. You might want to keep --match-full-lines, so that you don't accidentally miss a leading/trailing digit in the first/last number.

@drodriguez
Contributor

If we use Xcode's size instead of llvm-size, I get the same display.

$ build/bin/yaml2obj llvm/test/tools/llvm-size/macho-pagezero.test --docnum=1 -o macho-pagezero.o
$ build/bin/llvm-size macho-pagezero.o
__TEXT	__DATA	__OBJC	others	dec	hex
20	100	0	4096	4216	1078
$ size macho-pagezero.o
__TEXT	__DATA	__OBJC	others	dec	hex
20	100	0	4096	4216	1078

The man page for size does not have a -z option (it has -m, -l, -x and -arch; apart from -arch, the rest seem to be supported). As @jh7370 says, maybe -z is not the best spelling for an option here, and a long one that makes clear it is not for compatibility makes more sense.

But in the Xcode man page one can also find the following:

Size (without the -m option) prints the (decimal) number of bytes required by the __TEXT, __DATA and __OBJC segments. All other segments are totaled and that size is listed in the `others' column.

Which explains the "others" category including __PAGEZERO.

If we want to provide this special behaviour because of a feature request, it should be obvious that it is not a compatibility flag or similar.
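
To illustrate the accounting that the man page excerpt describes, here is a minimal standalone sketch, not the actual llvm-size code. `Segment`, `Totals`, and `computeTotals` are hypothetical names invented for this example; only the segment names and the skip condition are taken from the diff above. Fed the three segments from the test input, it yields a total of 4216 by default and 120 when `__PAGEZERO` is skipped, matching the expected values in the test.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Mach-O segment load command. Only the two
// fields that matter for the Berkeley-style totals are modeled.
struct Segment {
  std::string Name;
  uint64_t VMSize;
};

// One field per output column (the dec/hex columns are just the sum).
struct Totals {
  uint64_t Text = 0;
  uint64_t Data = 0;
  uint64_t ObjC = 0;
  uint64_t Others = 0;
  uint64_t sum() const { return Text + Data + ObjC + Others; }
};

// __TEXT, __DATA and __OBJC get their own columns. Every other segment,
// including __PAGEZERO, is added to "others" unless SkipPageZero is set.
Totals computeTotals(const std::vector<Segment> &Segments, bool SkipPageZero) {
  Totals T;
  for (const Segment &Seg : Segments) {
    if (SkipPageZero && Seg.Name == "__PAGEZERO")
      continue; // Excluded from every column.
    if (Seg.Name == "__TEXT")
      T.Text += Seg.VMSize;
    else if (Seg.Name == "__DATA")
      T.Data += Seg.VMSize;
    else if (Seg.Name == "__OBJC")
      T.ObjC += Seg.VMSize;
    else
      T.Others += Seg.VMSize;
  }
  return T;
}
```

In a real linked executable the `__PAGEZERO` segment is typically far larger than the 4096 bytes used in the test, which is why it dominates the "others" column.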

@rjmansfield
Contributor Author

Something like --skip-pagezero or --exclude-pagezero maybe? From the issue, the counting of the 4G __PAGEZERO segment is making the totals appear skewed and causing some confusion. The bug report calls the default output "useless" because of it. While the default output is correct, a new option that excludes it would help people see the numbers they are expecting to see.

@jh7370
Collaborator

jh7370 commented Sep 24, 2025

I'd go with --exclude-pagezero, personally. Or you could go with a more general-purpose option of --exclude-segment=<segment-name>, but that's probably overkill.
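
For comparison, the more general `--exclude-segment=<segment-name>` idea could look roughly like the sketch below. That option was not implemented in this PR; `totalExcluding` and its parameters are hypothetical, and the sketch assumes the excluded names have already been collected into a set before the segments are walked.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Each entry is a (segment name, vmsize) pair. Hypothetical representation,
// chosen only to keep the example self-contained.
using SegmentList = std::vector<std::pair<std::string, uint64_t>>;

// Sums vmsize over every segment whose name is not listed in Excluded.
uint64_t totalExcluding(const SegmentList &Segments,
                        const std::set<std::string> &Excluded) {
  uint64_t Total = 0;
  for (const auto &Seg : Segments) {
    if (Excluded.count(Seg.first) != 0)
      continue; // The user asked for this segment to be left out.
    Total += Seg.second;
  }
  return Total;
}
```

With an empty set this matches the default behaviour, and with a set containing only `__PAGEZERO` it matches what `--exclude-pagezero` does, which is why the single-purpose flag was judged sufficient here.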

Address code review comments.
@rjmansfield rjmansfield changed the title from "[llvm-size] Add -z option for Mach-O to exclude __PAGEZERO size." to "[llvm-size] Add --exclude-pagezero option for Mach-O to exclude __PAGEZERO size." Sep 24, 2025
Collaborator

@jh7370 jh7370 left a comment

The documentation still needs updating, as stated in my earlier comment.

Also, the test only covers the LC_SEGMENT_64 case, but you've modified both that and the LC_SEGMENT path.
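
One way to keep the LC_SEGMENT and LC_SEGMENT_64 paths from drifting apart, and to make both easy to exercise, is to share the skip check through a template. This is only a sketch under stated assumptions: `SegmentCommand32`, `SegmentCommand64`, `segmentName`, and `shouldSkipSegment` are hypothetical stand-ins that model just `segname` and `vmsize`, not LLVM's real Mach-O structs. Since `segname` is a fixed 16-byte field, the helper stops at the first NUL or at the end of the array, whichever comes first.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical, simplified models of the 32-bit and 64-bit segment commands.
struct SegmentCommand32 {
  char segname[16];
  uint32_t vmsize;
};

struct SegmentCommand64 {
  char segname[16];
  uint64_t vmsize;
};

// Reads the fixed-size name field without running past its 16 bytes.
template <typename SegT> std::string segmentName(const SegT &Seg) {
  const char *Begin = std::begin(Seg.segname);
  const char *End = std::find(Begin, std::end(Seg.segname), '\0');
  return std::string(Begin, End);
}

// Single definition of the skip rule, usable from both code paths.
template <typename SegT>
bool shouldSkipSegment(const SegT &Seg, bool SkipPageZero) {
  return SkipPageZero && segmentName(Seg) == "__PAGEZERO";
}
```

A test for the 32-bit case then only needs a second YAML document with LC_SEGMENT commands, since the decision logic is identical for both widths.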

Add test for 32 bit __PAGEZERO segment.
.. option:: --exclude-pagezero

Do not include the ``__PAGEZERO`` segment when calculating size information
for Mach-O files. ``__PAGEZERO`` segment is a virtual memory region used
Collaborator

Suggested change
- for Mach-O files. ``__PAGEZERO`` segment is a virtual memory region used
+ for Mach-O files. The ``__PAGEZERO`` segment is a virtual memory region used

Collaborator

@jh7370 jh7370 left a comment

LGTM, thanks. Let me know if you need me to merge this for you again.

@rjmansfield
Contributor Author

@jh7370 If you could merge it, I'd appreciate it. Thanks.

@drodriguez drodriguez merged commit 30b0215 into llvm:main Sep 29, 2025
10 checks passed
@rjmansfield rjmansfield deleted the llvmsize_exclude_pagezero branch September 29, 2025 23:29
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…EZERO size. (llvm#159574)

Do not include the ``__PAGEZERO`` segment when calculating size information
for Mach-O files when `--exclude-pagezero` is used. The ``__PAGEZERO``
segment is a virtual memory region used for memory protection that does not
contribute to actual size, and excluding can provide a better representation of
actual size.

Fixes llvm#86644
Development

Successfully merging this pull request may close these issues.

Default llvm-size output format berkeley (but also darwin) is useless on macOS

4 participants