Skip to content

Commit

Permalink
[clang-format] Optimize processing .clang-format-ignore files (#76733)
Browse files Browse the repository at this point in the history
Reuse the patterns governing the previous input file being formatted if
the current input file is from the same directory.
  • Loading branch information
owenca committed Jan 4, 2024
1 parent 6ae7f66 commit 42ec976
Show file tree
Hide file tree
Showing 3 changed files with 69 additions and 27 deletions.
6 changes: 5 additions & 1 deletion clang/docs/ClangFormat.rst
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,9 @@ An easy way to create the ``.clang-format`` file is:
Available style options are described in :doc:`ClangFormatStyleOptions`.

.clang-format-ignore
====================

You can create ``.clang-format-ignore`` files to make ``clang-format`` ignore
certain files. A ``.clang-format-ignore`` file consists of patterns of file path
names. It has the following format:
Expand All @@ -141,7 +144,8 @@ names. It has the following format:
* A non-comment line is a single pattern.
* The slash (``/``) is used as the directory separator.
* A pattern is relative to the directory of the ``.clang-format-ignore`` file
(or the root directory if the pattern starts with a slash).
(or the root directory if the pattern starts with a slash). Patterns
containing drive names (e.g. ``C:``) are not supported.
* Patterns follow the rules specified in `POSIX 2.13.1, 2.13.2, and Rule 1 of
2.13.3 <https://pubs.opengroup.org/onlinepubs/9699919799/utilities/
V3_chap02.html#tag_18_13>`_.
Expand Down
25 changes: 19 additions & 6 deletions clang/test/Format/clang-format-ignore.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,13 +21,26 @@

// RUN: touch .clang-format-ignore
// RUN: clang-format -verbose foo.c foo.js 2> %t.stderr
// RUN: grep "Formatting \[1/2] foo.c" %t.stderr
// RUN: grep "Formatting \[2/2] foo.js" %t.stderr
// RUN: grep -Fx "Formatting [1/2] foo.c" %t.stderr
// RUN: grep -Fx "Formatting [2/2] foo.js" %t.stderr

// RUN: echo "*.js" > .clang-format-ignore
// RUN: clang-format -verbose foo.c foo.js 2> %t.stderr
// RUN: grep "Formatting \[1/2] foo.c" %t.stderr
// RUN: not grep "Formatting \[2/2] foo.js" %t.stderr
// RUN: grep -Fx "Formatting [1/2] foo.c" %t.stderr
// RUN: not grep -F foo.js %t.stderr

// RUN: cd ../../..
// RUN: rm -rf %t.dir
// RUN: cd ../..
// RUN: clang-format -verbose *.cc level1/*.c* level1/level2/foo.* 2> %t.stderr
// RUN: grep -x "Formatting \[1/5] .*foo\.c" %t.stderr
// RUN: not grep -F foo.js %t.stderr

// RUN: rm .clang-format-ignore
// RUN: clang-format -verbose *.cc level1/*.c* level1/level2/foo.* 2> %t.stderr
// RUN: grep -x "Formatting \[1/5] .*foo\.cc" %t.stderr
// RUN: grep -x "Formatting \[2/5] .*bar\.cc" %t.stderr
// RUN: grep -x "Formatting \[3/5] .*baz\.c" %t.stderr
// RUN: grep -x "Formatting \[4/5] .*foo\.c" %t.stderr
// RUN: not grep -F foo.js %t.stderr

// RUN: cd ..
// RUN: rm -r %t.dir
65 changes: 45 additions & 20 deletions clang/tools/clang-format/ClangFormat.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -571,6 +571,11 @@ static int dumpConfig(bool IsSTDIN) {
return 0;
}

using String = SmallString<128>;
static String IgnoreDir; // Directory of .clang-format-ignore file.
static StringRef PrevDir; // Directory of previous `FilePath`.
static SmallVector<String> Patterns; // Patterns in .clang-format-ignore file.

// Check whether `FilePath` is ignored according to the nearest
// .clang-format-ignore file based on the rules below:
// - A blank line is skipped.
Expand All @@ -586,45 +591,65 @@ static bool isIgnored(StringRef FilePath) {
if (!is_regular_file(FilePath))
return false;

using namespace llvm::sys::path;
SmallString<128> Path, AbsPath{FilePath};
String Path;
String AbsPath{FilePath};

using namespace llvm::sys::path;
make_absolute(AbsPath);
remove_dots(AbsPath, /*remove_dot_dot=*/true);

StringRef IgnoreDir{AbsPath};
do {
IgnoreDir = parent_path(IgnoreDir);
if (IgnoreDir.empty())
if (StringRef Dir{parent_path(AbsPath)}; PrevDir != Dir) {
PrevDir = Dir;

for (;;) {
Path = Dir;
append(Path, ".clang-format-ignore");
if (is_regular_file(Path))
break;
Dir = parent_path(Dir);
if (Dir.empty())
return false;
}

IgnoreDir = convert_to_slash(Dir);

std::ifstream IgnoreFile{Path.c_str()};
if (!IgnoreFile.good())
return false;

Path = IgnoreDir;
append(Path, ".clang-format-ignore");
} while (!is_regular_file(Path));
Patterns.clear();

std::ifstream IgnoreFile{Path.c_str()};
if (!IgnoreFile.good())
return false;
for (std::string Line; std::getline(IgnoreFile, Line);) {
if (const auto Pattern{StringRef{Line}.trim()};
// Skip empty and comment lines.
!Pattern.empty() && Pattern[0] != '#') {
Patterns.push_back(Pattern);
}
}
}

const auto Pathname = convert_to_slash(AbsPath);
for (std::string Line; std::getline(IgnoreFile, Line);) {
auto Pattern = StringRef(Line).trim();
if (Pattern.empty() || Pattern[0] == '#')
continue;
if (IgnoreDir.empty())
return false;

const bool IsNegated = Pattern[0] == '!';
const auto Pathname{convert_to_slash(AbsPath)};
for (const auto &Pat : Patterns) {
const bool IsNegated = Pat[0] == '!';
StringRef Pattern{Pat};
if (IsNegated)
Pattern = Pattern.drop_front();

if (Pattern.empty())
continue;

Pattern = Pattern.ltrim();

// `Pattern` is relative to `IgnoreDir` unless it starts with a slash.
// This doesn't support patterns containing drive names (e.g. `C:`).
if (Pattern[0] != '/') {
Path = convert_to_slash(IgnoreDir);
Path = IgnoreDir;
append(Path, Style::posix, Pattern);
remove_dots(Path, /*remove_dot_dot=*/true, Style::posix);
Pattern = Path.str();
Pattern = Path;
}

if (clang::format::matchFilePath(Pattern, Pathname) == !IsNegated)
Expand Down

4 comments on commit 42ec976

@hctim
Copy link
Collaborator

@hctim hctim commented on 42ec976 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, looks like this broke Format/clang-format-ignore.cpp on most of the sanitizer buildbots (https://lab.llvm.org/buildbot/#/builders/5/builds/39794). Some of the buildbots were already broken due to 53edf12, so it's easy to miss the notification.

On our aarch64-hwasan bot as well (https://lab.llvm.org/buildbot/#/builders/236/builds/8430), looks like additionally Clang :: Format/multiple-inputs-inplace.cpp was broken.

Snippet below:

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: Clang :: Format/clang-format-ignore.cpp (27935 of 79817)
******************** TEST 'Clang :: Format/clang-format-ignore.cpp' FAILED ********************
Exit Code: 1
Command Output (stderr):
--
RUN: at line 1: rm -rf /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.dir
+ rm -rf /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.dir
RUN: at line 2: mkdir -p /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.dir/level1/level2
+ mkdir -p /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.dir/level1/level2
RUN: at line 4: cd /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.dir
+ cd /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.dir
RUN: at line 5: echo "*" > .clang-format-ignore
+ echo '*'
RUN: at line 6: echo "level*/*.c*" >> .clang-format-ignore
+ echo 'level*/*.c*'
RUN: at line 7: echo "*/*2/foo.*" >> .clang-format-ignore
+ echo '*/*2/foo.*'
RUN: at line 8: touch foo.cc
+ touch foo.cc
RUN: at line 9: /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang-format -verbose .clang-format-ignore foo.cc 2> /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.stderr
+ /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang-format -verbose .clang-format-ignore foo.cc
--
********************

@hctim
Copy link
Collaborator

@hctim hctim commented on 42ec976 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that the runlines aren't particularly descriptive because stderr is captured to a file. Here's how I reproduced it locally, and the result is below:

$ cmake \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_USE_LINKER=lld \
-GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_FLAGS="-fsanitize=address" \
-DCMAKE_CXX_FLAGS="-fsanitize=address" \
-DLLVM_ENABLE_PROJECTS="'clang;lld;clang-tools-extra;mlir'" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
-DLLVM_LIBC_ENABLE_LINTING=OFF \
-DLLVM_USE_SANITIZER=Address \
-DLLVM_ENABLE_ASSERTIONS=On \
path/to/llvm/llvm

$ ninja check-clang-format

$ cat tools/clang/test/Format/Output/clang-format-ignore.cpp.tmp.stderr
=================================================================
==177235==ERROR: AddressSanitizer: heap-use-after-free on address 0x5120000001c0 at pc 0x559e0b23ed08 bp 0x7ffd4fcf5b80 sp 0x7ffd4fcf5320
READ of size 113 at 0x5120000001c0 thread T0
    #0 0x559e0b23ed07 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) /usr/local/google/home/mitchp/llvm/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:810:7
    #1 0x559e0b23f2f9 in bcmp /usr/local/google/home/mitchp/llvm/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:856:10
    #2 0x559e0b3033f1 in compareMemory /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/StringRef.h:69:14
    #3 0x559e0b3033f1 in equals /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/StringRef.h:166:15
    #4 0x559e0b3033f1 in operator== /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/StringRef.h:878:16
    #5 0x559e0b3033f1 in operator!= /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/StringRef.h:881:71
    #6 0x559e0b3033f1 in isIgnored /usr/local/google/home/mitchp/llvm/clang/tools/clang-format/ClangFormat.cpp:601:52
    #7 0x559e0b3033f1 in main /usr/local/google/home/mitchp/llvm/clang/tools/clang-format/ClangFormat.cpp:711:21
    #8 0x7f3d8c0456c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #9 0x7f3d8c045784 in __libc_start_main csu/../csu/libc-start.c:360:3
    #10 0x559e0b224450 in _start (/usr/local/google/home/mitchp/llvm-build/asan-test/bin/clang-format+0x392450)

0x5120000001c0 is located 0 bytes inside of 257-byte region [0x5120000001c0,0x5120000002c1)
freed by thread T0 here:
    #0 0x559e0b2bfdb6 in free /usr/local/google/home/mitchp/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
    #1 0x559e0b30699c in ~SmallVectorImpl /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/SmallVector.h:608:7
    #2 0x559e0b30699c in ~SmallVector /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/SmallVector.h:1207:3
    #3 0x559e0b30699c in isIgnored /usr/local/google/home/mitchp/llvm/clang/tools/clang-format/ClangFormat.cpp:660:1
    #4 0x559e0b30699c in main /usr/local/google/home/mitchp/llvm/clang/tools/clang-format/ClangFormat.cpp:711:21
    #5 0x7f3d8c0456c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

previously allocated by thread T0 here:
    #0 0x559e0b2c005e in malloc /usr/local/google/home/mitchp/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x559e0b37be3f in safe_malloc /usr/local/google/home/mitchp/llvm/llvm/include/llvm/Support/MemAlloc.h:26:18
    #2 0x559e0b37be3f in llvm::SmallVectorBase<unsigned long>::grow_pod(void*, unsigned long, unsigned long) /usr/local/google/home/mitchp/llvm/llvm/lib/Support/SmallVector.cpp:143:15
    #3 0x559e0b41f82e in grow_pod /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/SmallVector.h:141:11
    #4 0x559e0b41f82e in grow /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/SmallVector.h:529:41
    #5 0x559e0b41f82e in reserve /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/SmallVector.h:669:13
    #6 0x559e0b41f82e in llvm::SmallVectorImpl<char>::swap(llvm::SmallVectorImpl<char>&) /usr/local/google/home/mitchp/llvm/llvm/include/llvm/ADT/SmallVector.h:982:9
    #7 0x559e0b427ceb in llvm::sys::fs::make_absolute(llvm::Twine const&, llvm::SmallVectorImpl<char>&) /usr/local/google/home/mitchp/llvm/llvm/lib/Support/Path.cpp:928:10
    #8 0x559e0b4288dd in llvm::sys::fs::make_absolute(llvm::SmallVectorImpl<char>&) /usr/local/google/home/mitchp/llvm/llvm/lib/Support/Path.cpp:965:3
    #9 0x559e0b303381 in isIgnored /usr/local/google/home/mitchp/llvm/clang/tools/clang-format/ClangFormat.cpp:598:3
    #10 0x559e0b303381 in main /usr/local/google/home/mitchp/llvm/clang/tools/clang-format/ClangFormat.cpp:711:21
    #11 0x7f3d8c0456c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

SUMMARY: AddressSanitizer: heap-use-after-free /usr/local/google/home/mitchp/llvm/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:810:7 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long)
Shadow bytes around the buggy address:
  0x511fffffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x511fffffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x512000000000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x512000000080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x512000000100: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
=>0x512000000180: fa fa fa fa fa fa fa fa[fd]fd fd fd fd fd fd fd
  0x512000000200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x512000000280: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x512000000300: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x512000000380: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x512000000400: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==177235==ABORTING

@hctim
Copy link
Collaborator

@hctim hctim commented on 42ec976 Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static StringRef PrevDir is a stringref to Dir (:602), which itself is a reference to the function-scope String AbsPath.

The first time the function gets called, PrevDir is an empty reference. Then, it's set to a reference to the function scope AbsPath, then the function finishes up and PrevDir is now dangling.

The second time the function gets called, PrevDir gets dereferenced, and is a use-after-scope (a use-after-free in this case as the backing string is heap-allocated).

@owenca
Copy link
Contributor Author

@owenca owenca commented on 42ec976 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hctim thanks a lot! I should have known better. I have relanded the patch as b53628a after changing the type of PrevDir to String (i.e. SmallString<128>).

Please sign in to comment.