Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SpecialCaseList] Use glob by default #74809

Merged
merged 2 commits into from
Dec 11, 2023
Merged

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Dec 8, 2023

https://reviews.llvm.org/D154014 addes glob support and enables it when
#!special-case-list-v2 is the first line. This patch makes the glob
support the default (faster than regex after
https://reviews.llvm.org/D156046) and switches to the deprecated regex
support if #!special-case-list-v1 is the first line.

I have surveyed many ignore lists. All ignore lists I find only use
basic * . and don't use regex metacharacters such as ( and ).
(As neither src: nor fun: benefits from using regex.)
They are unaffected by the transition (with a caution that regex
src:x/a.pb.* matches x/axpbx but glob src:x/a.pb.* doesn't).

There is no deprecating warning. If a user finds
#!special-case-list-v1, they shall read that the old syntax is
deprecated.

Link: https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666

@llvmbot llvmbot added clang Clang issues not falling into any other category llvm:support labels Dec 8, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 8, 2023

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-clang

Author: Fangrui Song (MaskRay)

Changes

https://reviews.llvm.org/D154014 addes glob support and enables it when
#!special-case-list-v2 is the first line. This patch makes the glob
support the default (faster than regex after
https://reviews.llvm.org/D156046) and switches to the deprecated regex
support if #!special-case-list-v1 is the first line.

I have surveyed many ignore lists. All ignore lists I find only use
basic * . and don't use regex metacharacters such as ( and ).
(As neither src: nor fun: benefits from using regex.)
They are unaffected by the transition (with a caution that regex
src:x/a.pb.* matches x/axpbx but glob src:x/a.pb.* doesn't).

There is no deprecating warning. If a user finds
#!special-case-list-v1, they shall read that the old syntax is
deprecated.

Link: https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666


Full diff: https://github.com/llvm/llvm-project/pull/74809.diff

3 Files Affected:

  • (modified) clang/docs/SanitizerSpecialCaseList.rst (+11-7)
  • (modified) llvm/lib/Support/SpecialCaseList.cpp (+5-6)
  • (modified) llvm/unittests/Support/SpecialCaseListTest.cpp (+4-4)
diff --git a/clang/docs/SanitizerSpecialCaseList.rst b/clang/docs/SanitizerSpecialCaseList.rst
index ab39276b04395..c7fb0fa3f8a82 100644
--- a/clang/docs/SanitizerSpecialCaseList.rst
+++ b/clang/docs/SanitizerSpecialCaseList.rst
@@ -56,13 +56,18 @@ and lines starting with "#" are ignored.
 
 .. note::
 
-  In `D154014 <https://reviews.llvm.org/D154014>`_ we transitioned to using globs instead
-  of regexes to match patterns in special case lists. Since this was a
-  breaking change, we will temporarily support the original behavior using
-  regexes. If ``#!special-case-list-v2`` is the first line of the file, then
-  we will use the new behavior using globs. For more details, see
-  `this discourse post <https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666>`_.
+  Prior to Clang 18, section names and entries described below use a variant of
+  regex where ``*`` is translated to ``.*``. Clang 18 (`D154014
+  <https://reviews.llvm.org/D154014>`) switches to glob and plans to remove
+  regex support in Clang 19.
 
+  For Clang 18, regex is supported if ``#!special-case-list-v1`` is the first
+  line of the file.
+
+  Many special case lists use ``.`` to indicate the literal character and do
+  not use regex metacharacters such as ``(``, ``)``. They are unaffected by the
+  regex to glob transition. For more details, see `this discourse post
+  <https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666>`_.
 
 Section names are globs written in square brackets that denote
 which sanitizer the following entries apply to. For example, ``[address]``
@@ -80,7 +85,6 @@ tool-specific docs.
 
 .. code-block:: bash
 
-    #!special-case-list-v2
     # The line above is explained in the note above
     # Lines starting with # are ignored.
     # Turn off checks for the source file
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index ac8877cca8bc6..7a23421eaeb89 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -150,13 +150,12 @@ bool SpecialCaseList::parse(const MemoryBuffer *MB, std::string &Error) {
     return false;
   }
 
-  // In https://reviews.llvm.org/D154014 we transitioned to using globs instead
-  // of regexes to match patterns in special case lists. Since this was a
-  // breaking change, we will temporarily support the original behavior using
-  // regexes. If "#!special-case-list-v2" is the first line of the file, then
-  // we will use the new behavior using globs. For more details, see
+  // In https://reviews.llvm.org/D154014 we added glob support and planned to
+  // remove regex support in patterns. We temporarily support the original
+  // behavior using regexes if "#!special-case-list-v1" is the first line of the
+  // file. For more details, see
   // https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666
-  bool UseGlobs = MB->getBuffer().starts_with("#!special-case-list-v2\n");
+  bool UseGlobs = !MB->getBuffer().starts_with("#!special-case-list-v1\n");
 
   for (line_iterator LineIt(*MB, /*SkipBlanks=*/true, /*CommentMarker=*/'#');
        !LineIt.is_at_eof(); LineIt++) {
diff --git a/llvm/unittests/Support/SpecialCaseListTest.cpp b/llvm/unittests/Support/SpecialCaseListTest.cpp
index 81faeca5d6357..725d20a9b4def 100644
--- a/llvm/unittests/Support/SpecialCaseListTest.cpp
+++ b/llvm/unittests/Support/SpecialCaseListTest.cpp
@@ -25,8 +25,8 @@ class SpecialCaseListTest : public ::testing::Test {
                                                        std::string &Error,
                                                        bool UseGlobs = true) {
     auto S = List.str();
-    if (UseGlobs)
-      S = (Twine("#!special-case-list-v2\n") + S).str();
+    if (!UseGlobs)
+      S = (Twine("#!special-case-list-v1\n") + S).str();
     std::unique_ptr<MemoryBuffer> MB = MemoryBuffer::getMemBuffer(S);
     return SpecialCaseList::create(MB.get(), Error);
   }
@@ -46,8 +46,8 @@ class SpecialCaseListTest : public ::testing::Test {
     SmallString<64> Path;
     sys::fs::createTemporaryFile("SpecialCaseListTest", "temp", FD, Path);
     raw_fd_ostream OF(FD, true, true);
-    if (UseGlobs)
-      OF << "#!special-case-list-v2\n";
+    if (!UseGlobs)
+      OF << "#!special-case-list-v1\n";
     OF << Contents;
     OF.close();
     return std::string(Path.str());

Copy link
Contributor

@ellishg ellishg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for following up!

@dwblaikie
Copy link
Collaborator

Probably would be good to introduce the -v1 version and require it first, then eventually change the default - so people don't get a silent behavior change? Even the existing users only using * and . need to change their syntax to migrate to v2, right? They'll need to change .* to * and . to ? and if they don't, their matches will start behaving strangely without a specific/actionable error message?

@ellishg
Copy link
Contributor

ellishg commented Dec 8, 2023

Probably would be good to introduce the -v1 version and require it first, then eventually change the default - so people don't get a silent behavior change? Even the existing users only using * and . need to change their syntax to migrate to v2, right? They'll need to change .* to * and . to ? and if they don't, their matches will start behaving strangely without a specific/actionable error message?

Since #!special-case-list-v1 is just a comment and v1 is the default before this change, users can actually add it to the first line today and the behavior won't change until Clang 19.

@dwblaikie
Copy link
Collaborator

Probably would be good to introduce the -v1 version and require it first, then eventually change the default - so people don't get a silent behavior change? Even the existing users only using * and . need to change their syntax to migrate to v2, right? They'll need to change .* to * and . to ? and if they don't, their matches will start behaving strangely without a specific/actionable error message?

Since #!special-case-list-v1 is just a comment and v1 is the default before this change, users can actually add it to the first line today and the behavior won't change until Clang 19.

Right, but they wouldn't know to do that in advance of the behavior change, would they? If they do nothing/don't read any release notes, they'd get a silent behavior change, I think/if I'm understanding this correctly?

@MaskRay
Copy link
Member Author

MaskRay commented Dec 8, 2023

Probably would be good to introduce the -v1 version and require it first, then eventually change the default - so people don't get a silent behavior change? Even the existing users only using * and . need to change their syntax to migrate to v2, right? They'll need to change .* to * and . to ? and if they don't, their matches will start behaving strangely without a specific/actionable error message?

Since #!special-case-list-v1 is just a comment and v1 is the default before this change, users can actually add it to the first line today and the behavior won't change until Clang 19.

Right, but they wouldn't know to do that in advance of the behavior change, would they? If they do nothing/don't read any release notes, they'd get a silent behavior change, I think/if I'm understanding this correctly?

It will be silent but the main idea is that almost all special case lists are unaffected by this transition
https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666 and my first comment are to demonstrate this.

I have checked CodeSearch (and ignorelist.txt uses in projects such as blue/v8/chromiumos) and actually haven't found any non-glob metacharacters uses (( { etc).

This is understandable: src: and fun: are the most common entries. fun: is almost always an exact function name or a glob with a leading/trailing *.
Many people tend to think src: uses globs and do not escape .. We have envisioned src:a.(c|cpp) (though I haven't seen such a use case) which can be satisfied by the glob brace expansion src:a.{c,cpp}.

Going forward, glob is the only supported format and we do not want users to annotate their ignore list with #!special-case-list-v2 or #!glob.
Enforcing a #! header comment (by issuing a warning if not seen) would likely cause more disruption.

@dwblaikie
Copy link
Collaborator

Still seems like an unfortunate and subtle silent change in behavior to me. But shrug if folks who own these features think it's fine, so be it.

https://reviews.llvm.org/D154014 addes glob support and enables it when
`#!special-case-list-v2` is the first line. This patch makes the glob
support the default (faster than regex after
https://reviews.llvm.org/D156046) and switches to the deprecated regex
support if `#!special-case-list-v1` is the first line.

I have surveyed many ignore lists. All ignore lists I find only use
basic `*` `.` and don't use regex metacharacters such as `(` and `)`.
(As neither `src:` nor `fun:` benefits from using regex.)
They are unaffected by the transition (with a caution that regex
`src:x/a.pb.*` matches `x/axpbx` but glob `src:x/a.pb.*` doesn't).

There is no deprecating warning. If a user finds
`#!special-case-list-v1`, they shall read that the old syntax is
deprecated.

Link: https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666
@MaskRay MaskRay merged commit 81d1df2 into llvm:main Dec 11, 2023
4 of 5 checks passed
@MaskRay MaskRay deleted the SpecialCaseList branch December 11, 2023 23:30
@aeubanks
Copy link
Contributor

aeubanks commented Jan 5, 2024

This caused some ignorelist changes, e.g.

src:*third_party/vulkan_memory_allocator/include/vk_mem_alloc.h

didn't work anymore and the opt-out made it work again. Still investigating why.

@ellishg
Copy link
Contributor

ellishg commented Jan 6, 2024

This caused some ignorelist changes, e.g.

src:*third_party/vulkan_memory_allocator/include/vk_mem_alloc.h

didn't work anymore and the opt-out made it work again. Still investigating why.

Not sure if it's the reason, but the . in vk_mem_alloc.h matches any character for regex, but only the literal for glob (use ? for this). I think everything else is the same.

@aeubanks
Copy link
Contributor

aeubanks commented Jan 6, 2024

the file name is vk_mem_alloc.h so that shouldn't be the issue

@aeubanks
Copy link
Contributor

aeubanks commented Jan 6, 2024

ah it's because we something like

[cfi-unrelated-cast|cfi-derived-cast]

src:*third_party/vulkan_memory_allocator/include/vk_mem_alloc.h

it seems like the new system doesn't match [cfi-unrelated-cast|cfi-derived-cast]

@MaskRay
Copy link
Member Author

MaskRay commented Jan 7, 2024

ah it's because we something like

[cfi-unrelated-cast|cfi-derived-cast]

src:*third_party/vulkan_memory_allocator/include/vk_mem_alloc.h

it seems like the new system doesn't match [cfi-unrelated-cast|cfi-derived-cast]

The glob mode can use the section name [cfi-*-cast] or [cfi-{unrelated,derived}-cast]. https://clang.llvm.org/docs/SanitizerSpecialCaseList.html contains an example.

@dwblaikie
Copy link
Collaborator

(this sort of example reinforces my concerns expressed earlier that this kind of silent change in behavior is problematic - moreso in the wild, rather than in Google's fairly constrained environment (frequent updates, good test coverage, and good bisection infrastructure, etc - other folks wouldn't have such a good time figuring this out, I suspect))

@efriedma-quic
Copy link
Collaborator

A report from the field: we had an ignorelist that contained [cfi-vcall|cfi-nvcall|cfi-icall]. This was recommended syntax from the documentation (https://releases.llvm.org/17.0.1/tools/clang/docs/SanitizerSpecialCaseList.html)... but it broke with the transition. This took a significant effort to track down.

I see the following issues with the transition:

  • There was no release note, or any sort of announcement of this breaking change.
  • No version of clang produces a diagnostic if you use the affected syntax; the behavior just silently changes.
  • The proposed transition period is extremely short; it's impossible to write an ignorelist that works on both clang 17 and clang 19 if you use any regex/glob features other than "*".

@ellishg
Copy link
Contributor

ellishg commented Mar 14, 2024

A report from the field: we had an ignorelist that contained [cfi-vcall|cfi-nvcall|cfi-icall]. This was recommended syntax from the documentation (https://releases.llvm.org/17.0.1/tools/clang/docs/SanitizerSpecialCaseList.html)... but it broke with the transition. This took a significant effort to track down.

I see the following issues with the transition:

  • There was no release note, or any sort of announcement of this breaking change.

  • No version of clang produces a diagnostic if you use the affected syntax; the behavior just silently changes.

  • The proposed transition period is extremely short; it's impossible to write an ignorelist that works on both clang 17 and clang 19 if you use any regex/glob features other than "*".

I announced this change on discourse last year. I think others have seen this specific bug before, so maybe I should call out this case in that post.

@efriedma-quic
Copy link
Collaborator

CC @llvm/clang-vendors

I announced this change on discourse last year.

"Use glob instead of regex for SpecialCaseLists" doesn't mean anything for anyone not actively working on the relevant code. An announcement would be titled something like "Syntax change for -fsanitize-ignorelist"... and it would be posted when the change actually happened.

@pogo59
Copy link
Collaborator

pogo59 commented Mar 14, 2024

CC @llvm/clang-vendors

Thank you for that! I don't know whether our sanitizer guys were aware of this, I've filed an internal ticket to find out.

+1 to the complaint about no Release Note.

@MaskRay
Copy link
Member Author

MaskRay commented Mar 15, 2024

CC @llvm/clang-vendors

I announced this change on discourse last year.

"Use glob instead of regex for SpecialCaseLists" doesn't mean anything for anyone not actively working on the relevant code. An announcement would be titled something like "Syntax change for -fsanitize-ignorelist"... and it would be posted when the change actually happened.

Apologies. This could have been better handled. I've also seen a report of [cfi-vcall|cfi-nvcall|cfi-icall] from Chromium. Now I re-checked, https://reviews.llvm.org/D154014 did mention [cfi-vcall|cfi-icall]. Grepping \| (and ^\[.*\| for section names) in *san_ignorelist.txt files is probably good enough to know whether you affected.

@pogo59
Copy link
Collaborator

pogo59 commented Apr 17, 2024

FTR, got an internal report about this. Luckily it was my turn to catch new bugs and I recognized the issue.

@MaskRay Is it too late to add a Release Note for LLVM 18?

@MaskRay
Copy link
Member Author

MaskRay commented Apr 17, 2024

FTR, got an internal report about this. Luckily it was my turn to catch new bugs and I recognized the issue.

@MaskRay Is it too late to add a Release Note for LLVM 18?

If it is not too late, #89141 will add it to release/18.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category llvm:support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants