-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LifetimeSafety] Add missing origins stats for lifetime analysis #166568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-temporal-safety Author: None (DEBADRIBASAK) ChangesThis PR adds the implementation for printing missing origin stats for lifetime analysis. Purpose: This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form: Approach:
Example output: For the file llvm-project/llvm/lib/Demangle/Demangle.cpp: Full diff: https://github.com/llvm/llvm-project/pull/166568.diff 6 Files Affected:
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
index 91ffbb169f947..eb532bc8be3a7 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h
@@ -23,7 +23,11 @@
#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LiveOrigins.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LoanPropagation.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
#include "clang/Analysis/AnalysisDeclContext.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
+#include <string>
namespace clang::lifetimes {
@@ -44,10 +48,6 @@ class LifetimeSafetyReporter {
Confidence Confidence) {}
};
-/// The main entry point for the analysis.
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
- LifetimeSafetyReporter *Reporter);
-
namespace internal {
/// An object to hold the factories for immutable collections, ensuring
/// that all created states share the same underlying memory management.
@@ -60,6 +60,7 @@ struct LifetimeFactory {
/// Running the lifetime safety analysis and querying its results. It
/// encapsulates the various dataflow analyses.
class LifetimeSafetyAnalysis {
+
public:
LifetimeSafetyAnalysis(AnalysisDeclContext &AC,
LifetimeSafetyReporter *Reporter);
@@ -82,6 +83,12 @@ class LifetimeSafetyAnalysis {
std::unique_ptr<LoanPropagationAnalysis> LoanPropagation;
};
} // namespace internal
+
+/// The main entry point for the analysis.
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+ LifetimeSafetyReporter *Reporter);
+
} // namespace clang::lifetimes
#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIMESAFETY_H
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
index ba138b078b379..26686a63e9204 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Origins.h
@@ -16,7 +16,10 @@
#include "clang/AST/Decl.h"
#include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
#include "clang/Analysis/Analyses/LifetimeSafety/Utils.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/Support/raw_ostream.h"
namespace clang::lifetimes::internal {
@@ -76,6 +79,8 @@ class OriginManager {
void dump(OriginID OID, llvm::raw_ostream &OS) const;
+ const llvm::StringMap<unsigned> getMissingOrigins() const;
+
private:
OriginID getNextOriginID() { return NextOriginID++; }
@@ -85,6 +90,7 @@ class OriginManager {
llvm::SmallVector<Origin> AllOrigins;
llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+ llvm::StringMap<unsigned> ExprTypeToMissingOriginCount;
};
} // namespace clang::lifetimes::internal
diff --git a/clang/include/clang/Sema/AnalysisBasedWarnings.h b/clang/include/clang/Sema/AnalysisBasedWarnings.h
index 4103c3f006a8f..604039ef61cb7 100644
--- a/clang/include/clang/Sema/AnalysisBasedWarnings.h
+++ b/clang/include/clang/Sema/AnalysisBasedWarnings.h
@@ -14,7 +14,10 @@
#define LLVM_CLANG_SEMA_ANALYSISBASEDWARNINGS_H
#include "clang/AST/Decl.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StringMap.h"
#include <memory>
namespace clang {
@@ -95,6 +98,9 @@ class AnalysisBasedWarnings {
/// a single function.
unsigned MaxUninitAnalysisBlockVisitsPerFunction;
+ /// Map from expressions missing origin in OriginManager to their counts.
+ llvm::StringMap<unsigned> MissingOriginCount;
+
/// @}
public:
@@ -116,6 +122,9 @@ class AnalysisBasedWarnings {
Policy &getPolicyOverrides() { return PolicyOverrides; }
void PrintStats() const;
+
+ void FindMissingOrigins(AnalysisDeclContext &AC,
+ clang::lifetimes::internal::FactManager &FactMgr);
};
} // namespace sema
diff --git a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
index 00c7ed90503e7..d183ce976f946 100644
--- a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
@@ -23,9 +23,11 @@
#include "clang/Analysis/AnalysisDeclContext.h"
#include "clang/Analysis/CFG.h"
#include "llvm/ADT/FoldingSet.h"
+#include "llvm/ADT/StringMap.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/raw_ostream.h"
#include <memory>
namespace clang::lifetimes {
@@ -69,9 +71,12 @@ void LifetimeSafetyAnalysis::run() {
}
} // namespace internal
-void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
- LifetimeSafetyReporter *Reporter) {
- internal::LifetimeSafetyAnalysis Analysis(AC, Reporter);
- Analysis.run();
+std::unique_ptr<internal::LifetimeSafetyAnalysis>
+runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
+ LifetimeSafetyReporter *Reporter) {
+ std::unique_ptr<internal::LifetimeSafetyAnalysis> Analysis =
+ std::make_unique<internal::LifetimeSafetyAnalysis>(AC, Reporter);
+ Analysis->run();
+ return Analysis;
}
} // namespace clang::lifetimes
diff --git a/clang/lib/Analysis/LifetimeSafety/Origins.cpp b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
index ea51a75324e06..453abf48261c2 100644
--- a/clang/lib/Analysis/LifetimeSafety/Origins.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/Origins.cpp
@@ -7,6 +7,9 @@
//===----------------------------------------------------------------------===//
#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/TypeBase.h"
+#include "llvm/ADT/StringMap.h"
namespace clang::lifetimes::internal {
@@ -22,6 +25,10 @@ void OriginManager::dump(OriginID OID, llvm::raw_ostream &OS) const {
OS << ")";
}
+const llvm::StringMap<unsigned> OriginManager::getMissingOrigins() const {
+ return ExprTypeToMissingOriginCount;
+}
+
Origin &OriginManager::addOrigin(OriginID ID, const clang::ValueDecl &D) {
AllOrigins.emplace_back(ID, &D);
return AllOrigins.back();
@@ -37,6 +44,17 @@ OriginID OriginManager::get(const Expr &E) {
auto It = ExprToOriginID.find(&E);
if (It != ExprToOriginID.end())
return It->second;
+
+ // if the expression has no specific origin, increment the missing origin
+ // counter.
+ std::string ExprStr(E.getStmtClassName());
+ ExprStr = ExprStr + "<" + E.getType().getAsString() + ">";
+ auto CountIt = ExprTypeToMissingOriginCount.find(ExprStr);
+ if (CountIt == ExprTypeToMissingOriginCount.end()) {
+ ExprTypeToMissingOriginCount[ExprStr] = 1;
+ } else {
+ CountIt->second++;
+ }
// If the expression itself has no specific origin, and it's a reference
// to a declaration, its origin is that of the declaration it refers to.
// For pointer types, where we don't pre-emptively create an origin for the
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 140b709dbb651..77d2013ff3a93 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -29,7 +29,9 @@
#include "clang/Analysis/Analyses/CFGReachabilityAnalysis.h"
#include "clang/Analysis/Analyses/CalledOnceCheck.h"
#include "clang/Analysis/Analyses/Consumed.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Facts.h"
#include "clang/Analysis/Analyses/LifetimeSafety/LifetimeSafety.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
#include "clang/Analysis/Analyses/ReachableCode.h"
#include "clang/Analysis/Analyses/ThreadSafety.h"
#include "clang/Analysis/Analyses/UninitializedValues.h"
@@ -52,7 +54,9 @@
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Casting.h"
#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
#include <algorithm>
#include <deque>
#include <iterator>
@@ -3065,7 +3069,11 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
if (EnableLifetimeSafetyAnalysis && S.getLangOpts().CPlusPlus) {
if (AC.getCFG()) {
lifetimes::LifetimeSafetyReporterImpl LifetimeSafetyReporter(S);
- lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+ std::unique_ptr<clang::lifetimes::internal::LifetimeSafetyAnalysis>
+ Analysis =
+ lifetimes::runLifetimeSafetyAnalysis(AC, &LifetimeSafetyReporter);
+ if (S.CollectStats)
+ FindMissingOrigins(AC, Analysis->getFactManager());
}
}
// Check for violations of "called once" parameter properties.
@@ -3131,9 +3139,27 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
}
}
+void clang::sema::AnalysisBasedWarnings::FindMissingOrigins(
+ AnalysisDeclContext &AC, lifetimes::internal::FactManager &FactMgr) {
+ if (AC.getCFG()) {
+ for (const auto &[expr, count] :
+ FactMgr.getOriginMgr().getMissingOrigins()) {
+ MissingOriginCount[expr] += count;
+ }
+ }
+}
+
void clang::sema::AnalysisBasedWarnings::PrintStats() const {
+ llvm::errs() << "\n*** LifetimeSafety Missing Origin Stats "
+ "(expression_type : count) :\n\n";
+ unsigned totalMissingOrigins = 0;
+ for (const auto &[expr, count] : MissingOriginCount) {
+ llvm::errs() << expr << " : " << count << '\n';
+ totalMissingOrigins += count;
+ }
+ llvm::errs() << "Total missing origins: " << totalMissingOrigins << "\n";
+ llvm::errs() << "\n****************************************\n";
llvm::errs() << "\n*** Analysis Based Warnings Stats:\n";
-
unsigned NumCFGsBuilt = NumFunctionsAnalyzed - NumFunctionsWithBadCFGs;
unsigned AvgCFGBlocksPerFunction =
!NumCFGsBuilt ? 0 : NumCFGBlocks/NumCFGsBuilt;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
One major comment is that this implementation exposes many internal:: details of lifetime safety to AnalysisBasedWarnings.h.
Layering-wise I propose the following changes:
- Define
struct LifetimeSafetyStats { llvm::StringMap<unsigned> MissingOriginCount; }inLifetimeSafety.h - Have an instance of
LifetimeSafetyStatsas a field ofAnalysisBasedWarningsinstead ofMissingOriginCount. - Change
runLifetimeSafetyAnalysisto accept this stats instance which it can modify if we have collection turned on:
void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
LifetimeSafetyReporter *Reporter, LifetimeSafetyStats& Stats, bool CollectStats)- Also add a
void clang::lifetimes::PrintStats(const LifetimeSafetyStats& Stats)in LifetimeSafety.h/.cpp inclang::lifetimesnamespace which prints the stats tolvm::errs(). - Call
clang::lifetimes::PrintStats(Stats)fromAnalysisBasedWarnings::PrintStats().
…and change the signature of runLifetimeSafetyAnalysis to not return the analysis object
usx95
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. This looks much better now.
One remaining concern I have is that we might be missing to record some missingOrigins here. Suggested a more black box collection style using RecursiveASTVisitor. Also it is important to not do this when -print-stats is disabled.
| auto CountIt = ExprTypeToMissingOriginCount.find(ExprStr); | ||
| if (CountIt == ExprTypeToMissingOriginCount.end()) { | ||
| ExprTypeToMissingOriginCount[ExprStr] = 1; | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| // utility function to print missing origin stats. | ||
| void PrintStats(const LifetimeSafetyStats &Stats); | ||
|
|
||
| /// The main entry point for the analysis. | ||
| void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC, | ||
| LifetimeSafetyReporter *Reporter, | ||
| LifetimeSafetyStats &Stats, bool CollectStats); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep the public/non-internal functions to the top and before the internal namespace.
| } // namespace internal | ||
|
|
||
| // utility function to print missing origin stats. | ||
| void PrintStats(const LifetimeSafetyStats &Stats); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Name starts with lower case printStats .
https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
| #include "clang/Analysis/Analyses/LifetimeSafety/Origins.h" | ||
| #include "clang/Analysis/AnalysisDeclContext.h" | ||
| #include "llvm/ADT/StringMap.h" | ||
| #include "llvm/Support/raw_ostream.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this looks like an unused header
| /// Running the lifetime safety analysis and querying its results. It | ||
| /// encapsulates the various dataflow analyses. | ||
| class LifetimeSafetyAnalysis { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unintentional new line ?
| void clang::sema::AnalysisBasedWarnings::PrintStats() const { | ||
| llvm::errs() << "\n*** Analysis Based Warnings Stats:\n"; | ||
|
|
||
| clang::lifetimes::PrintStats(LSStats); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go after llvm::errs() << "\n*** Analysis Based Warnings Stats:\n";. Maybe move to the end of the function.
| #include "llvm/Support/Casting.h" | ||
| #include "llvm/Support/Debug.h" | ||
| #include "llvm/Support/raw_ostream.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: These two look unrelated to changes in this file (although could be not unused).
| /// a single function. | ||
| unsigned MaxUninitAnalysisBlockVisitsPerFunction; | ||
|
|
||
| /// Map from expressions missing origin in OriginManager to their counts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this documentation generic and not specific to missing origins.
Suggestion:
/// Statistics collected during lifetime safety analysis.
/// These are accumulated across all analyzed functions and printed
/// when -print-stats is enabled.
| LifetimeSafetyStats &Stats, bool CollectStats) { | ||
| std::unique_ptr<internal::LifetimeSafetyAnalysis> Analysis = | ||
| std::make_unique<internal::LifetimeSafetyAnalysis>(AC, Reporter); | ||
| Analysis->run(); | ||
| for (const auto &[expr, count] : | ||
| Analysis->getFactManager().getOriginMgr().getMissingOrigins()) { | ||
| Stats.MissingOriginCount[expr] += count; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool CollectStats param is unused. We should be collecting the expensive statistitcs only when required and not in the default mode.
I would suggest to move this to another function say LifetimeSafetyAnalysis::collectStats(LifetimeSafetyStats &).
So it should look like:
void runLifetimeSafetyAnalysis(AnalysisDeclContext &AC,
LifetimeSafetyReporter *Reporter,
LifetimeSafetyStats &Stats, bool CollectStats) {
internal::LifetimeSafetyAnalysis Analysis(AC, Reporter);
Analysis.run();
if (CollectStats)
Analysis.collectStats(Stats);
}| // if the expression has no specific origin, increment the missing origin | ||
| // counter. | ||
| std::string ExprStr(E.getStmtClassName()); | ||
| ExprStr = ExprStr + "<" + E.getType().getAsString() + ">"; | ||
| auto CountIt = ExprTypeToMissingOriginCount.find(ExprStr); | ||
| if (CountIt == ExprTypeToMissingOriginCount.end()) { | ||
| ExprTypeToMissingOriginCount[ExprStr] = 1; | ||
| } else { | ||
| CountIt->second++; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of concerns and suggestions:
- This is quite expensive to do as it involves serialising expressions as
strings. This should be done only when required. - This captures the expressions which we tried to
getbut could not. This misses those expressions which we never calledgetas well. - I would suggest to collect this in a black-box fashion. We do this collection at the end of the analysis (and not during the analysis). We have something like
OriginManager::collectMissingOrigins(Stmt* FunctionBody, LifetimeSafetyStats &Stats). This can be called fromLifetimeSafetyAnalysis::collectStats(LifetimeSafetyStats &). You can get the function body from `AC.getBody(). - Now in
OriginManager::collectMissingOrigins, we can use aRecursiveASTVisitorto visit all expressions in the function body, and for each expression, if it shouldhasOriginthen we can query the origins to see if we have it or not. - (you can inline
hasOriginfrom FactsGenerator.cpp at the moment. at this point this is justE->isGLValue() || isPointerType(E->getType()))
Some AI-suggested implementation:
// Visitor to collect missing origins
class MissingOriginCollector
: public RecursiveASTVisitor<MissingOriginCollector> {
public:
MissingOriginCollector(const OriginManager &OM,
llvm::StringMap<unsigned> &MissingCount)
: OriginMgr(OM), MissingOriginCount(MissingCount) {}
bool VisitExpr(const Expr *E) {
// Skip if this expression doesn't need an origin
if (!hasOrigin(E))
return true;
// Check if we have an origin for this expression
auto It = OriginMgr.ExprToOriginID.find(E);
if (It == OriginMgr.ExprToOriginID.end()) {
// No origin found - count this as missing
std::string ExprStr = std::string(E->getStmtClassName()) + "<" +
E->getType().getAsString() + ">";
MissingOriginCount[ExprStr]++;
}
return true;
}
private:
const OriginManager &OriginMgr;
llvm::StringMap<unsigned> &MissingOriginCount;
};
} // anonymous namespace
void OriginManager::collectMissingOrigins(
const Stmt *FunctionBody,
LifetimeSafetyStats &Stats) const {
if (!FunctionBody)
return;
MissingOriginCollector Collector(*this, Stats.MissingOriginCount);
Collector.TraverseStmt(const_cast<Stmt *>(FunctionBody));
}
This PR adds the implementation for printing missing origin stats for lifetime analysis.
Purpose:
This capability is added to track the expression types with missing origin. While retrieving the origins from origin manager, some expressions show missing origins. Currently these are created on the fly using getOrCreate function. For analysing the coverage of the check, it will be necessary to see what kind of expressions have a missing origin. It prints the counts in this form:
StmtClassName<Type> : countApproach:
Example output:
For the file llvm-project/llvm/lib/Demangle/Demangle.cpp: