Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Soft Stop' solution on offset overflow issue: By Produceing a truncated but valid DWP file, discarding any DWO files that would not fit within the 32 bit/4GB limits of the format. #71902

Merged

Conversation

Labman-001
Copy link
Contributor

@Labman-001 Labman-001 commented Nov 10, 2023

When 'ContinueOnCuIndexOverflow' enables without this modification, the forcibly generated '.dwp' won't be recognized by Debugger(gdb 10.1) correctly.
image
it looks like there's a problem with processing the dwarf header, which makes debugging completely impossible.

About patch:
When llvm-dwp enables option '--continue-on-cu-index-overflow=soft-stop' and recognizes the overflow problem , it will stop packing and generate the '.dwp' file at once, discarding any DWO files that would not fit within the 32 bit/4GB limits of the format.
image

Copy link

github-actions bot commented Nov 10, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@Labman-001 Labman-001 force-pushed the users/huangjinjie/llvm-dwp-4gb-limit-modify branch from 7fd2678 to 0271dfa Compare November 10, 2023 08:38
@Labman-001 Labman-001 changed the title Solve llvm-dwp overflow problem, skipped over 4g dwo Solve llvm-dwp overflow problem, skipped over 4GB dwo Nov 10, 2023
@Labman-001 Labman-001 changed the title Solve llvm-dwp overflow problem, skipped over 4GB dwo Solve llvm-dwp overflow problem, skipped over 4GB '.dwo' files Nov 10, 2023
@dwblaikie
Copy link
Collaborator

When 'ContinueOnCuIndexOverflow' enables without this modification, the forcibly generated '.dwp' won't be recognized by Debugger correctly.

The whole point of the ContinueOnCuIndexOverflow is to continue(& allow tools like llvm-dwarfdump and lldb to continue with this corrupted index by reconstructing the .debug_info column in the DWP despite the overflow, by parsing the .debug_info section to find the boundary points/rebuild that column in the index) so this patch wouldn't be compatible with that goal.

We could add another flag that is sort of "best effort" DWP that emits a correct one with as many CUs as it can... but that doesn't seem /super/ useful?

@ayermolo

@ayermolo
Copy link
Contributor

I agree with @dwblaikie. It is possible to re-construct CU index. It was implemented in LLVM toolchain including LLDB. (String size is a different issue, but there are work arounds for it in llvm). Maybe GDB can do the same?

@Labman-001
Copy link
Contributor Author

Labman-001 commented Nov 13, 2023

The whole point of the ContinueOnCuIndexOverflow is to continue(& allow tools like llvm-dwarfdump and lldb to continue with this corrupted index by reconstructing the .debug_info column in the DWP despite the overflow, by parsing the .debug_info section to find the boundary points/rebuild that column in the index) so this patch wouldn't be compatible with that goal.

I see. But according to my current test result, gdb(10.1) can't recognise '.dwp' files emitted after the debug_info section exceeds 4GB, and gdb reports an error about the dwarf version.I'm not sure if it's because GDB10.1 can't rebuild the correct column. But it looks like there's a problem with processing the dwarf header, which makes debugging completely impossible.

I'm wondering if it's better to change the logic of 'ContinueOnCuIndexOverflow' in this case, or if it's better for me to add a extra flag like 'ContinueWithBestEffort'.

By the way, in the part shown below, when there is an overflow problem, it will just RETURN, rather than continue emitting, which seems like a bug?
image

@dwblaikie
Copy link
Collaborator

I see. But according to my current test result, gdb(10.1) can't recognise '.dwp' files emitted after the debug_info section exceeds 4GB, and gdb reports an error about the dwarf version.I'm not sure if it's because GDB10.1 can't rebuild the correct column. But it looks like there's a problem with processing the dwarf header, which makes debugging completely impossible.

Right - if you're using gdb the recommendation would be not to pass this flag, and to make changes to reduce debug info to fit into the current DWARF DWP limit of 4GB per section.

We could add another flag that's "give up and provide a valid but truncated dwp at 4GB" which seems a bit awkward, but perhaps no more/less awkward than the current "create a corrupted index that we know can be recovered from".

Alternatively you could go and work with gdb to do some similar recovery.

(I did just have the thought that our recovery, especially in DWARFv5, could be better - we could tombstone the .debug_info column values in the index (set them to 0xffffffff at dwp-generation time), then, rather than rescanning the /whole/ .debug_info section, we could skim the .debug_info column in the index, find the highest offset that's not tombstoned, and start scanning from there to only rebuild the overflowed part of the column - would require another coordination between consumer and producer, whereas the current solution (discarding the whole column and rebuilding) only requires the consumer to do something different and works with the old gold-dwp that overflows naturally, and with llvm-dwp that can be explicitly requested to overflow)

I'm wondering if it's better to change the logic of 'ContinueOnCuIndexOverflow' in this case, or if it's better for me to add a extra flag like 'ContinueWithBestEffort'.

Right, the ContinueOnCUIndexOverflow is intentionally doing what it's doing, and doesn't seem like what you want.

What you want is less a Continue and more a 'create valid DWP even if it can't include all the inputs'... (which is 'continue' instead of 'error' I guess). Not sure what we'd call that flag.

@ayermolo
Copy link
Contributor

ayermolo commented Nov 13, 2023

I think a list of .o/.dwo files can be specified to llvm-dwp to produce .dwp file, correct?
Maybe better alternative is to have that logic on a side that invokes llvm-dwp to include debug information that is necessary. Instead of dropping entries that happen to be over 4GB, that might or might not be important to actual debugging.

@qshanz373
Copy link
Contributor

How about the option AllowOverflow ?

@ayermolo
Copy link
Contributor

How about the option AllowOverflow ?

How is that different from an option that is there now?

@Labman-001
Copy link
Contributor Author

Labman-001 commented Nov 15, 2023

How about calling the new flag 'StopOnCuIndexOverflow'? Since the old patch is to continue packaging dwo when overflow, the new patch will stop packaging.

@dwblaikie
Copy link
Collaborator

Probably need some terminology to disambiguate between the current default "stop" behavior which is to abort/error/fail, and the new flag behavior of "bail out and emit a best-effort DWP".

(that said, I'd still ask whether this is the right tool to add at all - I know it's a complicated space for all of us hitting these limits, so I'm not ruling anything out for sure, but trying to reduce all these different slices as much as we can - could you build less code in your situation? Or modify your build system to drop certain debug info to keep the dwp under these limits?)

@Labman-001
Copy link
Contributor Author

Labman-001 commented Nov 19, 2023

Probably need some terminology to disambiguate between the current default "stop" behavior which is to abort/error/fail, and the new flag behavior of "bail out and emit a best-effort DWP".

I think this could be described as a 'soft stop' (or Graceful Stop), as opposed to the current default 'hard stop' (Abortive Stop).

(that said, I'd still ask whether this is the right tool to add at all - I know it's a complicated space for all of us hitting these limits, so I'm not ruling anything out for sure, but trying to reduce all these different slices as much as we can - could you build less code in your situation? Or modify your build system to drop certain debug info to keep the dwp under these limits?)

The business scenario we are dealing with involves a large number of different projects all experiencing overflow issues. We are indeed considering modifying our build system to generate DWP only for the DWO parts we are interested in.
However, currently, supporting the dynamic selection of DWO for different business requirements makes solving this issue complex and difficult. At present, we indeed need a solution like this kind of 'soft stop'.

@ayermolo
Copy link
Contributor

and using LLDB is no feasible?

@Labman-001
Copy link
Contributor Author

and using LLDB is no feasible?

currently only gdb

@ayermolo
Copy link
Contributor

Personally I don't have a strong feeling about this, and understand trying to make giant build system work for everyone.
One thing it that it probably should be controlled by a standalone option.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 20, 2023

@llvm/pr-subscribers-debuginfo

Author: Jinjie Huang (Labman-001)

Changes

When 'ContinueOnCuIndexOverflow' enables without this modification, the forcibly generated '.dwp' won't be recognized by Debugger(gdb 10.1) correctly.
<img width="657" alt="image" src="https://github.com/llvm/llvm-project/assets/150100070/31732775-2548-453a-a47a-fa04c6d05fe9">
it looks like there's a problem with processing the dwarf header, which makes debugging completely impossible.

About patch:
When llvm-dwp enables option 'ContinueOnCuIndexOverflow' and recognizes the overflow problem , it will stop packing and generate the '.dwp' file at once.
<img width="625" alt="image" src="https://github.com/llvm/llvm-project/assets/150100070/77d6be24-262b-4f4c-afc0-9a6c49e133c7">


Full diff: https://github.com/llvm/llvm-project/pull/71902.diff

2 Files Affected:

  • (modified) llvm/lib/DWP/DWP.cpp (+38-12)
  • (modified) llvm/tools/llvm-dwp/llvm-dwp.cpp (+3-1)
diff --git a/llvm/lib/DWP/DWP.cpp b/llvm/lib/DWP/DWP.cpp
index 89101ca7e5736ba..3080939133cfebc 100644
--- a/llvm/lib/DWP/DWP.cpp
+++ b/llvm/lib/DWP/DWP.cpp
@@ -197,11 +197,14 @@ static Error sectionOverflowErrorOrWarning(uint32_t PrevOffset,
   return make_error<DWPError>(Msg);
 }
 
-static Error addAllTypesFromDWP(
-    MCStreamer &Out, MapVector<uint64_t, UnitIndexEntry> &TypeIndexEntries,
-    const DWARFUnitIndex &TUIndex, MCSection *OutputTypes, StringRef Types,
-    const UnitIndexEntry &TUEntry, uint32_t &TypesOffset,
-    unsigned TypesContributionIndex, bool ContinueOnCuIndexOverflow) {
+static Error
+addAllTypesFromDWP(MCStreamer &Out,
+                   MapVector<uint64_t, UnitIndexEntry> &TypeIndexEntries,
+                   const DWARFUnitIndex &TUIndex, MCSection *OutputTypes,
+                   StringRef Types, const UnitIndexEntry &TUEntry,
+                   uint32_t &TypesOffset, unsigned TypesContributionIndex,
+                   bool ContinueOnCuIndexOverflow, bool &SeeOverflowFlag,
+                   bool StopOnCuIndexOverflow) {
   Out.switchSection(OutputTypes);
   for (const DWARFUnitIndex::Entry &E : TUIndex.getRows()) {
     auto *I = E.getContributions();
@@ -235,6 +238,9 @@ static Error addAllTypesFromDWP(
       if (Error Err = sectionOverflowErrorOrWarning(
               OldOffset, TypesOffset, "Types", ContinueOnCuIndexOverflow))
         return Err;
+      if (StopOnCuIndexOverflow)
+        SeeOverflowFlag = true;
+      return Error::success();
     }
   }
   return Error::success();
@@ -244,7 +250,8 @@ static Error addAllTypesFromTypesSection(
     MCStreamer &Out, MapVector<uint64_t, UnitIndexEntry> &TypeIndexEntries,
     MCSection *OutputTypes, const std::vector<StringRef> &TypesSections,
     const UnitIndexEntry &CUEntry, uint32_t &TypesOffset,
-    bool ContinueOnCuIndexOverflow) {
+    bool ContinueOnCuIndexOverflow, bool &SeeOverflowFlag,
+    bool StopOnCuIndexOverflow) {
   for (StringRef Types : TypesSections) {
     Out.switchSection(OutputTypes);
     uint64_t Offset = 0;
@@ -276,6 +283,9 @@ static Error addAllTypesFromTypesSection(
         if (Error Err = sectionOverflowErrorOrWarning(
                 OldOffset, TypesOffset, "types", ContinueOnCuIndexOverflow))
           return Err;
+        if (StopOnCuIndexOverflow)
+          SeeOverflowFlag = true;
+        return Error::success();
       }
     }
   }
@@ -583,7 +593,8 @@ Error handleSection(
 }
 
 Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
-            bool ContinueOnCuIndexOverflow) {
+            bool ContinueOnCuIndexOverflow,
+            bool StopOnCuIndexOverflow) {
   const auto &MCOFI = *Out.getContext().getObjectFileInfo();
   MCSection *const StrSection = MCOFI.getDwarfStrDWOSection();
   MCSection *const StrOffsetSection = MCOFI.getDwarfStrOffDWOSection();
@@ -613,6 +624,7 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
   uint32_t ContributionOffsets[8] = {};
   uint16_t Version = 0;
   uint32_t IndexVersion = 0;
+  bool SeeOverflowFlag = false;
 
   DWPStringPool Strings(Out, StrSection);
 
@@ -687,12 +699,17 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
         uint32_t SectionIndex = 0;
         for (auto &Section : Obj.sections()) {
           if (SectionIndex == Index) {
-            return sectionOverflowErrorOrWarning(
-                OldOffset, ContributionOffsets[Index], *Section.getName(),
-                ContinueOnCuIndexOverflow);
+            if (Error Err = sectionOverflowErrorOrWarning(
+                    OldOffset, ContributionOffsets[Index], *Section.getName(),
+                    ContinueOnCuIndexOverflow))
+              return Err;
           }
           ++SectionIndex;
         }
+        if (StopOnCuIndexOverflow) {
+          SeeOverflowFlag = true;
+          break;
+        }
       }
     }
 
@@ -722,6 +739,10 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
                     InfoSectionOffset, InfoSectionOffset + C.getLength32(),
                     "debug_info", ContinueOnCuIndexOverflow))
               return Err;
+            if (StopOnCuIndexOverflow) {
+              SeeOverflowFlag = true;
+              break;
+            }
           }
 
           UnitOffset += C.getLength32();
@@ -752,6 +773,8 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
               Info.substr(UnitOffset - C.getLength32(), C.getLength32()));
           InfoSectionOffset += C.getLength32();
         }
+        if (SeeOverflowFlag)
+          break;
       }
 
       if (!FoundCUUnit)
@@ -762,7 +785,7 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
         if (Error Err = addAllTypesFromTypesSection(
                 Out, TypeIndexEntries, TypesSection, CurTypesSection, CurEntry,
                 ContributionOffsets[getContributionIndex(DW_SECT_EXT_TYPES, 2)],
-                ContinueOnCuIndexOverflow))
+                ContinueOnCuIndexOverflow, SeeOverflowFlag, StopOnCuIndexOverflow))
           return Err;
       }
       continue;
@@ -860,9 +883,12 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
       if (Error Err = addAllTypesFromDWP(
               Out, TypeIndexEntries, TUIndex, OutSection, TypeInputSection,
               CurEntry, ContributionOffsets[TypesContributionIndex],
-              TypesContributionIndex, ContinueOnCuIndexOverflow))
+              TypesContributionIndex, ContinueOnCuIndexOverflow,
+              SeeOverflowFlag, StopOnCuIndexOverflow))
         return Err;
     }
+    if (SeeOverflowFlag)
+      break;
   }
 
   if (Version < 5) {
diff --git a/llvm/tools/llvm-dwp/llvm-dwp.cpp b/llvm/tools/llvm-dwp/llvm-dwp.cpp
index f976298dcadc2f4..8383f5e77cea8b4 100644
--- a/llvm/tools/llvm-dwp/llvm-dwp.cpp
+++ b/llvm/tools/llvm-dwp/llvm-dwp.cpp
@@ -144,6 +144,7 @@ int llvm_dwp_main(int argc, char **argv, const llvm::ToolContext &) {
 
   OutputFilename = Args.getLastArgValue(OPT_outputFileName, "");
   ContinueOnCuIndexOverflow = Args.hasArg(OPT_continueOnCuIndexOverflow);
+  SoftStopOnCuIndexOverflow = Args.hasArg(OPT_stopOnCuIndexOverflow);
 
   for (const llvm::opt::Arg *A : Args.filtered(OPT_execFileNames))
     ExecFilenames.emplace_back(A->getValue());
@@ -255,7 +256,8 @@ int llvm_dwp_main(int argc, char **argv, const llvm::ToolContext &) {
   if (!MS)
     return error("no object streamer for target " + TripleName, Context);
 
-  if (auto Err = write(*MS, DWOFilenames, ContinueOnCuIndexOverflow)) {
+  if (auto Err = write(*MS, DWOFilenames, ContinueOnCuIndexOverflow,
+      SoftStopOnCuIndexOverflow)) {
     logAllUnhandledErrors(std::move(Err), WithColor::error());
     return 1;
   }

@Labman-001 Labman-001 changed the title Solve llvm-dwp overflow problem, skipped over 4GB '.dwo' files 'Soft Stop' solution on offset overflow issue: By Produceing a truncated but valid DWP file, discarding any DWO files that would not fit within the 32 bit/4GB limits of the format. Nov 21, 2023
@Labman-001 Labman-001 force-pushed the users/huangjinjie/llvm-dwp-4gb-limit-modify branch from e30f401 to 1d6a206 Compare November 21, 2023 06:39
@Labman-001
Copy link
Contributor Author

Labman-001 commented Nov 21, 2023

@dwblaikie @ayermolo I have merged the 'soft-stop' and 'continue' operations into the 'OnCuIndexOverflow' enumeration.
I also reused the original 'continue-on-cu-index-overflow' option:

  • When this option is not given, llvm-dwp executes a 'hard-stop'.
  • When this option is given but no values specified, it executes 'continues' and continues to generate a DWP file with overflowed offset information..
  • When its value= 'soft-stop', it executes 'continues' and generates a 'best effort' DWP file.
  • Else: When its value is others, it also executes 'continues'

llvm/tools/llvm-dwp/Opts.td Outdated Show resolved Hide resolved
llvm/lib/DWP/DWP.cpp Outdated Show resolved Hide resolved
llvm/include/llvm/DWP/DWP.h Outdated Show resolved Hide resolved
@Labman-001
Copy link
Contributor Author

The the original continue-on-cu-index-overflow option's default value has been set to 'continue', please help to review it, thx.

@dwblaikie
Copy link
Collaborator

Generally looks OK to me - I guess this isn't practical to test due to the size needed to exceed the limit?

@Labman-001
Copy link
Contributor Author

Labman-001 commented Nov 28, 2023

Generally looks OK to me - I guess this isn't practical to test due to the size needed to exceed the limit?

Well, I think it's not totally impossible to test. I have compiled the modified project locally and used llvm-project/llvm/test/tools/llvm-dwp/Inputs/overflow/debug_info_v4.s and main_v4.s files to generate a .dwp file with debug_info exceeding 4GB.
Here is how I test it:

  1. This is what the help interface looks like:
    image
  2. Generate debug_info_v4.dwo(4GB debug_info) and main_v4.dwo (modified to be with 20M debug_info)
    ./bin/llvm-mc --triple=x86_64-unknown-linux --filetype=obj --split-dwarf-file=debug_info_v4.dwo -dwarf-version=4 ../llvm/test/tools/llvm-dwp/Inputs/overflow/debug_info_v4.s -o debug_info_v4.o
    ./bin/llvm-mc --triple=x86_64-unknown-linux --filetype=obj --split-dwarf-file=main_v4.dwo -dwarf-version=4 ../llvm/test/tools/llvm-dwp/Inputs/overflow/main_v4.s -o main_v4.o
  3. Generate .dwp file with 'soft-stop':
    ./bin/llvm-dwp debug_info_v4.dwo main_v4.dwo -continue-on-cu-index-overflow soft-stop -o overflow_soft.dwp
    Section Result:
    image
    image
  4. Generate .dwp file with 'continue':
    ./bin/llvm-dwp debug_info_v4.dwo main_v4.dwo -continue-on-cu-index-overflow continue -o overflow.dwp
    Section Result:
    image
    image

It can be observed that using the continue option, .dwp file increased the size by 20M, and when the soft-stop was enabled, the tool discarded the part that exceeded 4GB.
BTW, we have done some similar tests based on our business project to generate this kind of 'soft-stop' .dwp files and validated similar results.

@ayermolo
Copy link
Contributor

back from PTO. I'll take a look later today/tomorrow.

llvm/lib/DWP/DWP.cpp Outdated Show resolved Hide resolved
return Err;
if (AnySectionOverflow)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

llvm/lib/DWP/DWP.cpp Show resolved Hide resolved
Copy link
Contributor

@ayermolo ayermolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Please wait for David to approve also. :)

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the help with the review @ayermolo

@dwblaikie dwblaikie merged commit 4c44dcf into llvm:main Dec 1, 2023
3 checks passed
ayermolo added a commit to ayermolo/llvm-project that referenced this pull request Dec 15, 2023
This is follow up for llvm#71902. The
default option --continue-on-cu-index-overflow returned an error
--continue-on-cu-index-overflow: missing argument. Changed it so that it is the
same behavior as other flags like -gsplit-dwarf. Where
--continue-on-cu-index-overflow will default to continue, and user can set mode
with --continue-on-cu-index-overflow=<value>.
ayermolo added a commit that referenced this pull request Dec 18, 2023
This is follow up for #71902.
The
default option --continue-on-cu-index-overflow returned an error
--continue-on-cu-index-overflow: missing argument. Changed it so that it
is the
same behavior as other flags like -gsplit-dwarf. Where
--continue-on-cu-index-overflow will default to continue, and user can
set mode
with --continue-on-cu-index-overflow=\<value>.
qihangkong pushed a commit to rvgpu/rvgpu-llvm that referenced this pull request Apr 23, 2024
This is follow up for llvm/llvm-project#71902.
The
default option --continue-on-cu-index-overflow returned an error
--continue-on-cu-index-overflow: missing argument. Changed it so that it
is the
same behavior as other flags like -gsplit-dwarf. Where
--continue-on-cu-index-overflow will default to continue, and user can
set mode
with --continue-on-cu-index-overflow=\<value>.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants