-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[Clang] [Diagnostics] Simplify filenames that contain '..' #143520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang-tools-extra @llvm/pr-subscribers-clang Author: None (Sirraide) ChangesThis can significantly shorten file paths to standard library headers, e.g. on my system, /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/ranges but with this change, we instead print /usr/include/c++/15/ranges This is of course just a heuristic, so there will definitely be paths that get longer as a result of this, but it helps with the standard library, and a lot of the diagnostics we print tend to originate in standard library headers (especially if you include notes listing overload candidates etc.). @AaronBallman pointed out that this might be problematic for network file systems since path resolution might take a while, so this is enabled only for paths that are part of a local filesystem. The file names are cached in Full diff: https://github.com/llvm/llvm-project/pull/143520.diff 2 Files Affected:
diff --git a/clang/include/clang/Frontend/TextDiagnostic.h b/clang/include/clang/Frontend/TextDiagnostic.h
index e2e88d4d648a2..9c77bc3e00e19 100644
--- a/clang/include/clang/Frontend/TextDiagnostic.h
+++ b/clang/include/clang/Frontend/TextDiagnostic.h
@@ -35,6 +35,7 @@ namespace clang {
class TextDiagnostic : public DiagnosticRenderer {
raw_ostream &OS;
const Preprocessor *PP;
+ llvm::StringMap<SmallString<128>> SimplifiedFileNameCache;
public:
TextDiagnostic(raw_ostream &OS, const LangOptions &LangOpts,
diff --git a/clang/lib/Frontend/TextDiagnostic.cpp b/clang/lib/Frontend/TextDiagnostic.cpp
index b9e681b52e509..edbad42b39950 100644
--- a/clang/lib/Frontend/TextDiagnostic.cpp
+++ b/clang/lib/Frontend/TextDiagnostic.cpp
@@ -738,12 +738,20 @@ void TextDiagnostic::printDiagnosticMessage(raw_ostream &OS,
}
void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
-#ifdef _WIN32
- SmallString<4096> TmpFilename;
-#endif
- if (DiagOpts.AbsolutePath) {
- auto File = SM.getFileManager().getOptionalFileRef(Filename);
- if (File) {
+ auto File = SM.getFileManager().getOptionalFileRef(Filename);
+
+ // Try to simplify paths that contain '..' in any case since paths to
+ // standard library headers especially tend to get quite long otherwise.
+ // Only do that for local filesystems though to avoid slowing down
+ // compilation too much.
+ auto AlwaysSimplify = [&] {
+ return File->getName().contains("..") &&
+ llvm::sys::fs::is_local(File->getName());
+ };
+
+ if (File && (DiagOpts.AbsolutePath || AlwaysSimplify())) {
+ SmallString<128> &CacheEntry = SimplifiedFileNameCache[Filename];
+ if (CacheEntry.empty()) {
// We want to print a simplified absolute path, i. e. without "dots".
//
// The hardest part here are the paths like "<part1>/<link>/../<part2>".
@@ -759,15 +767,15 @@ void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
// on Windows we can just use llvm::sys::path::remove_dots(), because,
// on that system, both aforementioned paths point to the same place.
#ifdef _WIN32
- TmpFilename = File->getName();
- llvm::sys::fs::make_absolute(TmpFilename);
- llvm::sys::path::native(TmpFilename);
- llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
- Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+ CacheEntry = File->getName();
+ llvm::sys::fs::make_absolute(CacheEntry);
+ llvm::sys::path::native(CacheEntry);
+ llvm::sys::path::remove_dots(CacheEntry, /* remove_dot_dot */ true);
#else
- Filename = SM.getFileManager().getCanonicalName(*File);
+ CacheEntry = SM.getFileManager().getCanonicalName(*File);
#endif
}
+ Filename = CacheEntry;
}
OS << Filename;
|
Actually, it just occurred to me that we could just cache whichever path ends up being shorter, the original one or the resolved one. |
I’m not exactly sure how to test this change since this is not only platform-dependent but also path-dependent since we may end up producing absolute paths here. |
The downside to it being in |
I think this is a case where maybe we want to use unit tests. We have |
I mean, that’s also true I suppose; the only thing is then that we’d be normalising them twice if I guess we could always compute both the absolute and the ‘short’ path for a file whenever @AaronBallman Thoughts? |
We definitely don't want to normalize twice. Could we parameterize
I think we could try it to see how it goes in terms of performance. Again, I think I'd be most worried about network builds -- I would expect a measurable different in performance even if there are no diagnostics issued just because we need the file information for |
One idea I just had is we could do something like: enum class DiagnosticFileNameMode {
Unmodified, // As specified by the user
Canonical, // Absolute path
Short, // Whichever is shorter
}
class FileManager {
// ...
StringRef getFileNameForDiagnostic(DiagnosticFileNameMode Mode);
}; And then have separate caches in |
I think that approach makes sense! Thought "short path" means something different to those of us old enough to remember DOS 8.3 filenames. :-D |
Ha, those I’m not planning to add support for thankfully... |
Actually, we could also just put it in |
I’ve done that. Also, |
Ok, I’ve fixed a crash involving a dangling reference and also disabled the check for a local filesystem on windows. It also seems that we do have a single test for this already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I'm in favor of this patch. Precommit CI found relevant failures that need to be fixed, but I think this is otherwise good to go once those are addressed.
#ifdef _WIN32 | ||
TempBuf = File->getName(); | ||
llvm::sys::fs::make_absolute(TempBuf); | ||
llvm::sys::path::native(TempBuf); | ||
llvm::sys::path::remove_dots(TempBuf, /* remove_dot_dot */ true); | ||
#else | ||
TempBuf = getFileManager().getCanonicalName(*File); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of pre-existing, but I am not sure doing something different for windows actually make sense.
Symlinks on Windows exist, they are just very rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I was interpreting the comment above this to mean that Windows itself doesn’t care about symlinks when resolving ..
and actually just deletes preceding path segments, but I’m not much of a Windows person so I don’t know to be fair...
if (!SimplifyPath) | ||
return Filename; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be more efficient to check the map first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not sure. This is a perf/
branch anyway so when I’m done fixing the tests (I think it’s just a clang-tidy test that’s left at this point?) we can just check if there are any regressions and try doing this instead if so.
Ok, looks like the clang-tidy test failure is related to the // Check that `-header-filter` operates on the same file paths as paths in
// diagnostics printed by ClangTidy.
#include "dir1/dir2/../header_alias.h"
// CHECK_HEADER_ALIAS: dir1/dir2/../header_alias.h:1:11: warning: single-argument constructors So, I guess my question is now, should the header filter apply to the original filename, the simplified filename, or both? @AaronBallman Thoughts? Or alternatively who do best ping for clang-tidy related questions? |
I mean, I guess this beacuse it’s what the user specified and it’s what we’re currently doing? It’s just that the end result might be weird, e.g. if a user writes |
CC @5chmidti @PiotrZSL @HerrCai0907 @LegalizeAdulthood for more opinions on this I would naively expect that I'm giving the tool a path, the tool will resolve all symlinks and relative parts, etc for me same as it would do when specifying the file to compile/tidy. |
My expectation would be that if I specify a header filter I'm not going to use weird paths like a/b/../foo.h, but just a/foo.h because that is where foo.h lives. |
What about symlinks though? Would you expect that passing |
In that case WYGIWYD (what you get is what you deserve) applies. Why would relative path simplification care about symlinks? My understanding is that this change simplies a/b/../ to a/ not chasing through arbitrary symlinks. |
Sgtm; I’ll leave the clang-tidy side of things as-is then and just update the test to reflect this. |
Ok; that’s done now. Thanks to that clang-tidy test I’ve also managed to figure out how to writer a proper regression test for this patch (only for linux though to be fair; not sure we support Windows-only tests that create files etc., and symlinks on windows require administrator permissions anyway from what I know). |
Also, looking at the compile-time tracker page for this branch, it doesn’t seem to have any performance impact. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Windows CI failures seem unrelated. |
This can significantly shorten file paths to standard library headers, e.g. on my system,
<ranges>
is currently printed as/usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/ranges
but with this change, we instead print
/usr/include/c++/15/ranges
This is of course just a heuristic, so there will definitely be paths that get longer as a result of this, but it helps with the standard library, and a lot of the diagnostics we print tend to originate in standard library headers (especially if you include notes listing overload candidates etc.).Update: We now always use whichever path ends up being shorter.@AaronBallman pointed out that this might be problematic for network file systems since path resolution might take a while, so this is enabled only for paths that are part of a local filesystem.
The file names are cached in
TextDiagnostic
. While we could move it up into e.g.TextDiagnosticPrinter
,DiagnosticsEngine
, or maybe even theFileManager
, to me this seems like something that we mainly care about when printing to the terminal (other diagnostics consumers probably don’t mind receiving the original file path). Moreover, this is already where we handle-fdiagnostics-absolute-paths
or whatever that flag is called again.