-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[Sema] Implement fix as suggested by FIXME #143142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang Author: LUX (lux-QAQ) ChangesMove allocation of deduction failure info from the ASTContext to the heap. Implement corresponding type-safe delete calls in Full diff: https://github.com/llvm/llvm-project/pull/143142.diff 1 Files Affected:
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 66f84fc67b52f..0e059239e65cb 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -756,8 +756,8 @@ clang::MakeDeductionFailureInfo(ASTContext &Context,
case TemplateDeductionResult::DeducedMismatch:
case TemplateDeductionResult::DeducedMismatchNested: {
- // FIXME: Should allocate from normal heap so that we can free this later.
- auto *Saved = new (Context) DFIDeducedMismatchArgs;
+ // Allocate from normal heap so that we can free this later.
+ auto *Saved = new DFIDeducedMismatchArgs;
Saved->FirstArg = Info.FirstArg;
Saved->SecondArg = Info.SecondArg;
Saved->TemplateArgs = Info.takeSugared();
@@ -767,8 +767,8 @@ clang::MakeDeductionFailureInfo(ASTContext &Context,
}
case TemplateDeductionResult::NonDeducedMismatch: {
- // FIXME: Should allocate from normal heap so that we can free this later.
- DFIArguments *Saved = new (Context) DFIArguments;
+ // Allocate from normal heap so that we can free this later.
+ DFIArguments *Saved = new DFIArguments;
Saved->FirstArg = Info.FirstArg;
Saved->SecondArg = Info.SecondArg;
Result.Data = Saved;
@@ -779,8 +779,8 @@ clang::MakeDeductionFailureInfo(ASTContext &Context,
// FIXME: It's slightly wasteful to allocate two TemplateArguments for this.
case TemplateDeductionResult::Inconsistent:
case TemplateDeductionResult::Underqualified: {
- // FIXME: Should allocate from normal heap so that we can free this later.
- DFIParamWithArguments *Saved = new (Context) DFIParamWithArguments;
+ // Allocate from normal heap so that we can free this later.
+ DFIParamWithArguments *Saved = new DFIParamWithArguments;
Saved->Param = Info.Param;
Saved->FirstArg = Info.FirstArg;
Saved->SecondArg = Info.SecondArg;
@@ -799,7 +799,8 @@ clang::MakeDeductionFailureInfo(ASTContext &Context,
break;
case TemplateDeductionResult::ConstraintsNotSatisfied: {
- CNSInfo *Saved = new (Context) CNSInfo;
+ // Allocate from normal heap so that we can free this later.
+ CNSInfo *Saved = new CNSInfo;
Saved->TemplateArgs = Info.takeSugared();
Saved->Satisfaction = Info.AssociatedConstraintsSatisfaction;
Result.Data = Saved;
@@ -831,14 +832,21 @@ void DeductionFailureInfo::Destroy() {
case TemplateDeductionResult::IncompletePack:
case TemplateDeductionResult::Inconsistent:
case TemplateDeductionResult::Underqualified:
+ delete static_cast<DFIParamWithArguments*>(Data);
+ Data = nullptr;
+ break;
case TemplateDeductionResult::DeducedMismatch:
case TemplateDeductionResult::DeducedMismatchNested:
+ delete static_cast<DFIDeducedMismatchArgs*>(Data);
+ Data = nullptr;
+ break;
case TemplateDeductionResult::NonDeducedMismatch:
- // FIXME: Destroy the data?
+ // Destroy the data
+ delete static_cast<DFIArguments*>(Data);
Data = nullptr;
break;
- case TemplateDeductionResult::SubstitutionFailure:
+ case TemplateDeductionResult::SubstitutionFailure:{
// FIXME: Destroy the template argument list?
Data = nullptr;
if (PartialDiagnosticAt *Diag = getSFINAEDiagnostic()) {
@@ -846,16 +854,18 @@ void DeductionFailureInfo::Destroy() {
HasDiagnostic = false;
}
break;
-
- case TemplateDeductionResult::ConstraintsNotSatisfied:
- // FIXME: Destroy the template argument list?
+ }
+ case TemplateDeductionResult::ConstraintsNotSatisfied:{
+ // Destroy the data
+ CNSInfo *Ptr = static_cast<CNSInfo*>(Data);
+ delete Ptr;
Data = nullptr;
if (PartialDiagnosticAt *Diag = getSFINAEDiagnostic()) {
Diag->~PartialDiagnosticAt();
HasDiagnostic = false;
}
break;
-
+ }
// Unhandled
case TemplateDeductionResult::MiscellaneousDeductionFailure:
case TemplateDeductionResult::AlreadyDiagnosed:
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Move allocation of deduction failure info from the ASTContext to the heap. Implement corresponding type-safe delete calls in `DeductionFailureInfo::Destroy()` to prevent memory leaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure this is better? Having to manually delete
these is... not great. Have you benchmarked whether this improves memory usage or makes Clang faster?
See also this comment by Aaron on the issue you linked: #143129 (comment)
I ran performance tests by following the LLVM Test Suite Guide and obtained the following results. The (base) ***@DESKTOP-80T55HI:~/code/llvmleak$ llvm-test-suite/utils/compare.py -m compile_time results_old.json results_new.json
Tests: 3327
Metric: compile_time
Program compile_time
results_old results_new diff
SingleSour...tTests/2020-01-06-coverage-008 0.00 0.02 484.6%
SingleSour...tTests/2010-05-24-BitfieldTest 0.00 0.01 265.9%
SingleSour.../2006-01-29-SimpleIndirectCall 0.00 0.01 238.9%
SingleSour...006-12-07-Compare64BitConstant 0.00 0.01 186.8%
SingleSour...itTests/2003-07-06-IntOverflow 0.01 0.02 159.7%
SingleSour...UnitTests/2002-05-02-CastTest1 0.00 0.01 117.9%
tools/test/check_env 0.02 0.05 107.9%
tools/test/ret0 0.02 0.05 107.9%
tools/test/abrt 0.02 0.05 107.9%
tools/test/ret1 0.02 0.05 107.9%
tools/fpcmp-target 0.02 0.05 107.9%
SingleSour...UnitTests/2006-01-23-UnionInit 0.01 0.02 88.5%
SingleSour.../UnitTests/block-call-r7674133 0.01 0.01 76.2%
SingleSour...UnitTests/2005-05-12-Int64ToFP 0.01 0.02 64.2%
SingleSour.../UnitTests/SignlessTypes/cast2 0.01 0.02 62.4%
Geomean difference -5.4%
compile_time
run results_old results_new diff
count 3327.000000 3327.000000 454.000000
mean 0.197357 0.184994 -0.005927
std 2.103227 1.847044 0.393625
min 0.000000 0.000000 -0.797814
25% 0.000000 0.000000 -0.139927
50% 0.000000 0.000000 -0.031750
75% 0.000000 0.000000 0.055100
max 92.449400 75.736300 4.846154
(base) ***@DESKTOP-80T55HI:~/code/llvmleak$ llvm-test-suite/utils/compare.py --filter-short results_old.json results_new.json
Tests: 3327
Short Running: 2390 (filtered out)
Remaining: 937
Metric: exec_time
Program exec_time
results_old results_new
SingleSour...ut-C++/Shootout-C++-sieve.test 0.00 1.05
SingleSour...marks/CoyoteBench/lpbench.test 0.00 1.29
MultiSourc...lt/CrossingThresholds-flt.test 0.00 1.21
SingleSour...ce/Benchmarks/Misc/perlin.test 0.00 1.20
MultiSourc...CI_Purple/SMG2000/smg2000.test 0.00 1.23
MicroBench...HMARK_ANISTROPIC_DIFFUSION/256 20620.89 48887.01
MicroBench..._MemCmp<31, LessThanZero, Mid> 294.71 677.21
MicroBench...HMARK_ANISTROPIC_DIFFUSION/128 6420.34 13718.01
MicroBench...MemCmp<5, LessThanZero, First> 501.73 1004.02
MicroBench...test:benchAutoVecForBigLoopTC8 1.44 2.88
MicroBench...test:benchAutoVecForBigLoopTC2 1.51 3.01
MicroBench...t:BENCHMARK_asinf_novec_float_ 109.42 217.25
MicroBench...rks.test:benchForIC4VW4LoopTC4 2.04 4.01
MicroBench...BENCHMARK_ORDERED_DITHER/128/2 37.37 72.65
MicroBench...BENCHMARK_acos_autovec_double_ 158.15 302.43
exec_time
run results_old results_new
count 937.000000 937.000000
mean 2723.168099 2608.870807
std 30514.292833 28486.676194
min 0.000000 0.000000
25% 2.978695 3.087209
50% 17.409600 17.785316
75% 281.905940 283.165699
max 666889.746452 622151.000000 Performance Test ResultsFollowing the LLVM Test Suite Guide, I conducted performance tests to evaluate the impact of my changes. The results are encouraging. All tests passed successfully for both the baseline and modified versions. Furthermore, the preliminary data indicates a compile-time reduction of 5.4% with the modified Clang compared to the original. Caveats and Future WorkHowever, I want to be cautious with these initial findings. It is important to note that this is based on a single test run, and I suspect this performance difference could be attributed to random chance or environmental factors. I am currently busy with my final exams, but as soon as I have time, I plan to conduct more rigorous testing. My plan is to repeat the experiments on my WSL setup and on a separate Mac to obtain more robust and reliable results. Test Environment & ConfigurationTo ensure the test was reproducible, the following configuration was used. Host Compiler:
Build Configuration: # Environment variables were set to ensure optimization
unset CFLAGS
unset CXXFLAGS
unset LDFLAGS
unset LD_LIBRARY_PATH
unset CC
unset CXX
export CFLAGS="-O3 -march=native"
export CXXFLAGS="-O3 -march=native"
# CMake command
cmake -G "Ninja" \
-S /home/***/llvmtestfix/llvm \
-B /home/***/llvmtestfix/build \
-DCMAKE_INSTALL_PREFIX=/home/***/fixtestinstall \
-DCMAKE_INSTALL_RPATH=/home/***/fixtestinstall/lib \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi;libunwind;openmp" \
-DCMAKE_C_COMPILER=/usr/bin/clang \
-DCMAKE_CXX_COMPILER=/usr/bin/clang++ \
-DCMAKE_CXX_COMPILER_TARGET=x86_64-pc-linux-gnu \
-DCMAKE_C_COMPILER_TARGET=x86_64-pc-linux-gnu \
-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-pc-linux-gnu \
-DCMAKE_LINKER=/usr/bin/ld.lld \
-DLLVM_ENABLE_LLD=ON \
-DLLVM_PROFILE_GENERATE=OFF \
-DLLVM_INCLUDE_DOCS=OFF \
-DLLVM_BUILD_DOCS=OFF \
-DLIBCXX_INSTALL_MODULES=ON \
-DCMAKE_AR=/usr/bin/llvm-ar \
-DCMAKE_RANLIB=/usr/bin/llvm-ranlib \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_BUILD_LLVM_DYLIB=ON \
-DLLVM_LINK_LLVM_DYLIB=ON \
-DLLVM_ENABLE_RTTI=ON \
-DLLVM_ENABLE_EH=ON \
-DLLVM_ENABLE_EXCEPTIONS=ON \
-DLLVM_TARGETS_TO_BUILD="X86" |
You might want to get your branch on https://llvm-compile-time-tracker.com/index.php CC @nikic |
max-rss seems to have an improvement on tramp3d-v4 in multiple configurations, so that's probably not noise: https://llvm-compile-time-tracker.com/compare.php?from=e4060d3beab3b525b49baaa15484e3c595740a2f&to=aace43c7cc60347e1853e55ee7c033a19a9a65ea&stat=max-rss |
It seems that using manual |
Fix #143129
Implement fix as suggested by FIXME