[clang] [Gnu] Improve GCCVersion parsing to match versions such as "10-win32" #69079

Merged
merged 4 commits into from Oct 26, 2023

Conversation

mstorsjo
Member

In earlier GCC versions, the Debian/Ubuntu provided mingw toolchains were packaged in /usr/lib/gcc/<triple> with version strings such as "5.3-win32", which were matched and found since 6afcd64. However, in recent versions, they have stopped including the minor version number and only have version strings such as "10-win32" and "10-posix".

Generalize the parsing code to accept a patch suffix on a version number that has only a major number.

Refactor the string parsing code to highlight the overall structure of the parsing. This implementation should yield the same result as before, except for when there's only one segment and it has trailing, non-number contents.

This allows Clang to find the GCC libraries and headers in Debian/Ubuntu provided MinGW cross compilers.
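
For readers without an LLVM checkout, the segment handling described above can be sketched roughly as follows. This is a standalone C++17 approximation, not the code in Gnu.cpp: the names ParsedVersion, splitOnce, parseNumber, parseLastNumber, parseGccVersion and checkSamples are hypothetical, std::string_view and std::from_chars stand in for StringRef and getAsInteger, and a failed parse is reported as std::nullopt instead of the BadVersion value.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for Generic_GCC::GCCVersion (not the LLVM type).
struct ParsedVersion {
  int Major = -1, Minor = -1, Patch = -1;
  std::string MajorStr, MinorStr, PatchSuffix;
};

// Split at the first occurrence of C; the second half is empty if C is absent.
static std::pair<std::string_view, std::string_view>
splitOnce(std::string_view S, char C) {
  std::size_t Pos = S.find(C);
  if (Pos == std::string_view::npos)
    return {S, std::string_view()};
  return {S.substr(0, Pos), S.substr(Pos + 1)};
}

// A non-final segment must be a number from start to end.
static bool parseNumber(std::string_view Seg, int &Out) {
  if (Seg.empty())
    return false;
  const char *End = Seg.data() + Seg.size();
  auto [Ptr, Ec] = std::from_chars(Seg.data(), End, Out);
  return Ec == std::errc() && Ptr == End && Out >= 0;
}

// The final segment may carry a non-number suffix after its leading digits.
static bool parseLastNumber(std::string_view Seg, int &Out,
                            std::string &OutStr, std::string &Suffix) {
  std::size_t End = Seg.find_first_not_of("0123456789");
  if (End == 0)
    return false; // no leading digits at all
  std::string_view Num = Seg.substr(0, End); // End == npos: whole segment
  if (!parseNumber(Num, Out))
    return false;
  OutStr = Num;
  Suffix = (End == std::string_view::npos) ? std::string()
                                           : std::string(Seg.substr(End));
  return true;
}

std::optional<ParsedVersion> parseGccVersion(std::string_view Text) {
  auto [MajorStr, Rest] = splitOnce(Text, '.');
  auto [MinorStr, PatchStr] = splitOnce(Rest, '.');
  ParsedVersion V;

  if (MinorStr.empty()) { // one segment: major is the last segment
    if (!parseLastNumber(MajorStr, V.Major, V.MajorStr, V.PatchSuffix))
      return std::nullopt;
    return V;
  }
  if (!parseNumber(MajorStr, V.Major))
    return std::nullopt;
  V.MajorStr = MajorStr;

  if (PatchStr.empty()) { // two segments: minor is the last segment
    if (!parseLastNumber(MinorStr, V.Minor, V.MinorStr, V.PatchSuffix))
      return std::nullopt;
    return V;
  }
  if (!parseNumber(MinorStr, V.Minor))
    return std::nullopt;
  V.MinorStr = MinorStr;

  // Third segment: a missing number is tolerated, so the result is ignored.
  std::string Unused;
  parseLastNumber(PatchStr, V.Patch, Unused, V.PatchSuffix);
  return V;
}

// Usage: the two Debian/Ubuntu style directory names from the description.
void checkSamples() {
  assert(parseGccVersion("5.3-win32")->Minor == 3);
  assert(parseGccVersion("10-win32")->Major == 10);
  assert(parseGccVersion("10-win32")->PatchSuffix == "-win32");
}
```

The only behavioural point the sketch is meant to show is that the "last segment" rule is applied to whichever segment happens to be last, which is what lets "10-win32" through.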

@mstorsjo mstorsjo added the clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' label Oct 14, 2023
@mstorsjo mstorsjo requested a review from MaskRay October 14, 2023 21:08
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Oct 14, 2023
@llvmbot
Collaborator

llvmbot commented Oct 14, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Martin Storsjö (mstorsjo)

Full diff: https://github.com/llvm/llvm-project/pull/69079.diff

3 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/Gnu.cpp (+54-28)
  • (modified) clang/unittests/Driver/CMakeLists.txt (+1)
  • (added) clang/unittests/Driver/GCCVersionTest.cpp (+49)
diff --git a/clang/lib/Driver/ToolChains/Gnu.cpp b/clang/lib/Driver/ToolChains/Gnu.cpp
index cdd911af9a73361..e6f94836c4110a1 100644
--- a/clang/lib/Driver/ToolChains/Gnu.cpp
+++ b/clang/lib/Driver/ToolChains/Gnu.cpp
@@ -2007,45 +2007,71 @@ Generic_GCC::GCCVersion Generic_GCC::GCCVersion::Parse(StringRef VersionText) {
   std::pair<StringRef, StringRef> First = VersionText.split('.');
   std::pair<StringRef, StringRef> Second = First.second.split('.');
 
-  GCCVersion GoodVersion = {VersionText.str(), -1, -1, -1, "", "", ""};
-  if (First.first.getAsInteger(10, GoodVersion.Major) || GoodVersion.Major < 0)
-    return BadVersion;
-  GoodVersion.MajorStr = First.first.str();
-  if (First.second.empty())
-    return GoodVersion;
+  StringRef MajorStr = First.first;
   StringRef MinorStr = Second.first;
-  if (Second.second.empty()) {
-    if (size_t EndNumber = MinorStr.find_first_not_of("0123456789")) {
-      GoodVersion.PatchSuffix = std::string(MinorStr.substr(EndNumber));
-      MinorStr = MinorStr.slice(0, EndNumber);
-    }
-  }
-  if (MinorStr.getAsInteger(10, GoodVersion.Minor) || GoodVersion.Minor < 0)
-    return BadVersion;
-  GoodVersion.MinorStr = MinorStr.str();
+  StringRef PatchStr = Second.second;
 
-  // First look for a number prefix and parse that if present. Otherwise just
-  // stash the entire patch string in the suffix, and leave the number
-  // unspecified. This covers versions strings such as:
-  //   5        (handled above)
+  GCCVersion GoodVersion = {VersionText.str(), -1, -1, -1, "", "", ""};
+
+  // Parse version number strings such as:
+  //   5
   //   4.4
   //   4.4-patched
   //   4.4.0
   //   4.4.x
   //   4.4.2-rc4
   //   4.4.x-patched
-  // And retains any patch number it finds.
-  StringRef PatchText = Second.second;
-  if (!PatchText.empty()) {
-    if (size_t EndNumber = PatchText.find_first_not_of("0123456789")) {
-      // Try to parse the number and any suffix.
-      if (PatchText.slice(0, EndNumber).getAsInteger(10, GoodVersion.Patch) ||
-          GoodVersion.Patch < 0)
-        return BadVersion;
-      GoodVersion.PatchSuffix = std::string(PatchText.substr(EndNumber));
+  //   10-win32
+  // Split on '.', handle 1, 2 or 3 such segments. Each segment must contain
+  // purely a number, except for the last one, where a non-number suffix
+  // is stored in PatchSuffix. The third segment is allowed to not contain
+  // a number at all.
+
+  auto HandleLastNumber = [&](StringRef Segment, int &Number,
+                              std::string &OutStr) -> bool {
+    // Look for a number prefix and parse that, and split out any trailing
+    // string into GoodVersion.PatchSuffix.
+
+    if (size_t EndNumber = Segment.find_first_not_of("0123456789")) {
+      StringRef NumberStr = Segment.slice(0, EndNumber);
+      if (NumberStr.getAsInteger(10, Number) || Number < 0)
+        return false;
+      OutStr = NumberStr;
+      GoodVersion.PatchSuffix = Segment.substr(EndNumber);
+      return true;
     }
+    return false;
+  };
+  auto HandleNumber = [](StringRef Segment, int &Number) -> bool {
+    if (Segment.getAsInteger(10, Number) || Number < 0)
+      return false;
+    return true;
+  };
+
+  if (MinorStr.empty()) {
+    // If no minor string, major is the last segment
+    if (!HandleLastNumber(MajorStr, GoodVersion.Major, GoodVersion.MajorStr))
+      return BadVersion;
+    return GoodVersion;
+  } else {
+    if (!HandleNumber(MajorStr, GoodVersion.Major))
+      return BadVersion;
+    GoodVersion.MajorStr = MajorStr;
+  }
+  if (PatchStr.empty()) {
+    // If no patch string, minor is the last segment
+    if (!HandleLastNumber(MinorStr, GoodVersion.Minor, GoodVersion.MinorStr))
+      return BadVersion;
+    return GoodVersion;
+  } else {
+    if (!HandleNumber(MinorStr, GoodVersion.Minor))
+      return BadVersion;
+    GoodVersion.MinorStr = MinorStr;
   }
 
+  // For the last segment, tolerate a missing number.
+  std::string DummyStr;
+  HandleLastNumber(PatchStr, GoodVersion.Patch, DummyStr);
   return GoodVersion;
 }
 
diff --git a/clang/unittests/Driver/CMakeLists.txt b/clang/unittests/Driver/CMakeLists.txt
index e37c158d7137a88..752037f78fb147d 100644
--- a/clang/unittests/Driver/CMakeLists.txt
+++ b/clang/unittests/Driver/CMakeLists.txt
@@ -9,6 +9,7 @@ set(LLVM_LINK_COMPONENTS
 add_clang_unittest(ClangDriverTests
   DistroTest.cpp
   DXCModeTest.cpp
+  GCCVersionTest.cpp
   ToolChainTest.cpp
   ModuleCacheTest.cpp
   MultilibBuilderTest.cpp
diff --git a/clang/unittests/Driver/GCCVersionTest.cpp b/clang/unittests/Driver/GCCVersionTest.cpp
new file mode 100644
index 000000000000000..91842a2ea959754
--- /dev/null
+++ b/clang/unittests/Driver/GCCVersionTest.cpp
@@ -0,0 +1,49 @@
+//===- unittests/Driver/GCCVersionTest.cpp --- GCCVersion parser tests ----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Unit tests for Generic_GCC::GCCVersion
+//
+//===----------------------------------------------------------------------===//
+
+#include "../../lib/Driver/ToolChains/Gnu.h"
+#include "gtest/gtest.h"
+
+using namespace clang::driver;
+using namespace clang;
+
+struct VersionParseTest {
+  std::string Text;
+
+  int Major, Minor, Patch;
+  std::string MajorStr, MinorStr, PatchSuffix;
+};
+
+const VersionParseTest TestCases[] = {
+    {"5", 5, -1, -1, "5", "", ""},
+    {"4.4", 4, 4, -1, "4", "4", ""},
+    {"4.4-patched", 4, 4, -1, "4", "4", "-patched"},
+    {"4.4.0", 4, 4, 0, "4", "4", ""},
+    {"4.4.x", 4, 4, -1, "4", "4", ""},
+    {"4.4.2-rc4", 4, 4, 2, "4", "4", "-rc4"},
+    {"4.4.x-patched", 4, 4, -1, "4", "4", ""},
+    {"not-a-version", -1, -1, -1, "", "", ""},
+    { "10-win32", 10, -1, -1, "10", "", "-win32" },
+};
+
+TEST(GCCVersionTest, Parse) {
+  for (const auto &TC : TestCases) {
+    auto V = toolchains::Generic_GCC::GCCVersion::Parse(TC.Text);
+    ASSERT_EQ(V.Text, TC.Text);
+    ASSERT_EQ(V.Major, TC.Major);
+    ASSERT_EQ(V.Minor, TC.Minor);
+    ASSERT_EQ(V.Patch, TC.Patch);
+    ASSERT_EQ(V.MajorStr, TC.MajorStr);
+    ASSERT_EQ(V.MinorStr, TC.MinorStr);
+    ASSERT_EQ(V.PatchSuffix, TC.PatchSuffix);
+  }
+}

@mstorsjo
Member Author

This goes on top of #69078; the first commit is reviewed there, so within this PR, please review only the second commit.

@github-actions

github-actions bot commented Oct 14, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@mstorsjo mstorsjo force-pushed the clang-gccversion branch 2 times, most recently from 322d9e0 to 468befb Compare October 18, 2023 09:38
@mstorsjo
Member Author

The prerequisite to this PR has been merged now.

@mstorsjo
Member Author

Ping

clang/lib/Driver/ToolChains/Gnu.cpp
}

// For the last segment, tolerate a missing number.
std::string DummyStr;
HandleLastNumber(PatchStr, GoodVersion.Patch, DummyStr);
Contributor

What if the segment after the - is only a number, e.g. 10-10? If I'm reading this correctly, I think in that case we end up leaving that out of the PatchSuffix.

Looks like https://semver.org/ allows this case in the grammar, though I'm not sure if GCC versions strictly adhere to that standard.

Member Author

This implementation does parse 10-10 as Major=10, the rest left at -1, and PatchSuffix="-10".

I'm not sure exactly which bit gives you the impression that case wouldn't get handled like that. The comment above ("For the last segment, tolerate a missing number") only means that for the case 4.4.x-patched, we don't return an error even if the last bit is x-patched, but we return what we've parsed up to that point.

Contributor

It's the first return false; in HandleLastNumber that is making me think that, since that skips setting PatchSuffix. Maybe my example should have been: 1.2.3-4

Member Author

Ah, I see.

As the snippet looks like this:

    if (size_t EndNumber = Segment.find_first_not_of("0123456789")) {
      StringRef NumberStr = Segment.slice(0, EndNumber);
      if (NumberStr.getAsInteger(10, Number) || Number < 0)
        return false;

Due to the find_first_not_of, the substring NumberStr can only contain the chars [0-9] (and EndNumber must be nonzero here), so the integer parsing really should succeed (unless it's out of range for a regular int?), and can't really be negative either (as the string can't contain a leading -).

In practice, 1.2.3-4 does get parsed as one would like. However the find_first_not_of also has the effect that the PatchSuffix doesn't really need to start with a dash either; if we parse 1.2.3x4, we get Major/Minor/Patch set as 1, 2, 3, and PatchSuffix set to x4.
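
To make the find_first_not_of behaviour concrete, here is a small standalone sketch using std::string_view, whose find_first_not_of plays the same role as the StringRef call in the snippet above. The helper name splitDigitPrefix is made up for illustration and is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper: split Seg into {leading digits, everything after}.
std::pair<std::string, std::string> splitDigitPrefix(std::string_view Seg) {
  std::size_t End = Seg.find_first_not_of("0123456789");
  if (End == std::string_view::npos) // all digits (or empty): no suffix
    return {std::string(Seg), std::string()};
  // End == 0 means the segment does not start with a digit, so the number
  // part comes back empty and the whole segment is the remainder.
  return {std::string(Seg.substr(0, End)), std::string(Seg.substr(End))};
}

// Usage: the suffix starts at the first non-digit, whatever that is.
void checkSplits() {
  assert(splitDigitPrefix("3-4").second == "-4");
  assert(splitDigitPrefix("3x4").second == "x4");
  assert(splitDigitPrefix("x-patched").first.empty());
}
```

The cut point is the first non-digit character of any kind, so "-4" and "x4" are both treated as suffixes, and a segment like "x-patched" has no number part at all.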

I'm not sure if this is the ideal implementation or not; I'm mostly keeping it untouched and just abstracting it so that it can be applied at any of the positions in the version string.

Contributor

shrug close enough for now. Thanks for explaining!

Contributor

@jroelofs jroelofs left a comment

LGTM

@mstorsjo mstorsjo merged commit 05dcfa4 into llvm:main Oct 26, 2023
2 of 3 checks passed
@mstorsjo mstorsjo deleted the clang-gccversion branch October 26, 2023 07:56
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Oct 26, 2023
…0-win32" (llvm#69079)
