Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clangd] [C++20] [Modules] Introduce initial support for C++20 Modules #66462

Merged
merged 8 commits into from
Jul 18, 2024

Conversation

ChuanqiXu9
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 commented Sep 15, 2023

Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas and the consensus so far. We want to have some initial support in clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
> try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
> build the preamble.

This patch reflects the above opinions.

Testing in real-world project

I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module consists of multiple module units). Both std module and asio module have 100k+ lines of code (maybe more, I didn't count). And async_simple itself has 8k lines of code. This is the scale of the project.

The result shows that it works pretty well, ..., well, except I need to wait roughly 10s after opening/editing any file. And this falls in our expectations. We know it is hard to make it perfect in the first move.

What this patch does in detail

  • Introduced an option --experimental-modules-support for the support for C++20 Modules. So that no matter how bad this is, it wouldn't affect current users. Following off the page, we'll assume the option is enabled.
  • Introduced two classes ModuleFilesInfo and ModuleDependencyScanner. Now ModuleDependencyScanner is only used by ModuleFilesInfo.
  • The class ModuleFilesInfo records the built module files for specific single source file. The module files can only be built by the static member function ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...).
  • The class PreambleData adds a new member variable with type ModuleFilesInfo. This refers to the needed module files for the current file. It means the module files info is part of the preamble, which is suggested in the first patch too.
  • In isPreambleCompatible(), we add a call to ModuleFilesInfo::CanReuse() to check if the built module files are still up to date.
  • When we build the AST for a source file, we will load the built module files from ModuleFilesInfo.

What we need to do next

Let's split the TODOs into clang part and clangd part to make things more clear.

The TODOs in the clangd part include:

  1. Enable reusing module files across source files. The may require us to bring a ModulesManager like thing which need to handle scheduling, the possibility of BMI version conflicts and various events that can invalidate the module graph.
  2. Get a more efficient method to get the <module-name> -> <module-unit-source> map. Currently we always scan the whole project during ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...). This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include:
    • Make a global scanner to monitor the state of every source file like I did in the first patch. The pain point is that we need to take care of the data races.
    • Ask the build systems to provide the map just like we ask them to provide the compilation database.
  3. Persist the module files. So that we can reuse module files across clangd invocations or even across clangd instances.

TODOs in the clang part include:

  1. Clang should offer an option/mode to skip writing/reading the bodies of the functions. Or even if we can requrie the parser to skip parsing the function bodies.

And it looks like we can say the support for C++20 Modules is initially workable after we made (1) and (2) (or even without (2)).

CC: @HighCommander4 @ilya-biryukov

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label Sep 15, 2023
@llvmbot llvmbot added the clangd label Sep 15, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 15, 2023

@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-clangd

@llvm/pr-subscribers-clang-modules

Changes (to be edited)

(not ready for review yet)

Patch is 66.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66462.diff

28 Files Affected:

  • (modified) clang-tools-extra/clangd/CMakeLists.txt (+3)
  • (modified) clang-tools-extra/clangd/ClangdServer.cpp (+1)
  • (modified) clang-tools-extra/clangd/ClangdServer.h (+3)
  • (modified) clang-tools-extra/clangd/GlobalCompilationDatabase.cpp (+21)
  • (modified) clang-tools-extra/clangd/GlobalCompilationDatabase.h (+6)
  • (added) clang-tools-extra/clangd/ModuleDependencyScanner.cpp (+86)
  • (added) clang-tools-extra/clangd/ModuleDependencyScanner.h (+78)
  • (added) clang-tools-extra/clangd/ModuleFilesInfo.cpp (+282)
  • (added) clang-tools-extra/clangd/ModuleFilesInfo.h (+118)
  • (modified) clang-tools-extra/clangd/ParsedAST.cpp (+8)
  • (modified) clang-tools-extra/clangd/Preamble.cpp (+17-6)
  • (modified) clang-tools-extra/clangd/Preamble.h (+7)
  • (modified) clang-tools-extra/clangd/TUScheduler.cpp (+13)
  • (modified) clang-tools-extra/clangd/TUScheduler.h (+3)
  • (modified) clang-tools-extra/clangd/test/CMakeLists.txt (+1)
  • (added) clang-tools-extra/clangd/test/modules.test (+79)
  • (modified) clang-tools-extra/clangd/tool/Check.cpp (+6-2)
  • (modified) clang-tools-extra/clangd/tool/ClangdMain.cpp (+8)
  • (modified) clang-tools-extra/clangd/unittests/CMakeLists.txt (+2)
  • (modified) clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp (+14-4)
  • (modified) clang-tools-extra/clangd/unittests/FileIndexTests.cpp (+4-1)
  • (added) clang-tools-extra/clangd/unittests/ModuleDependencyScannerTest.cpp (+173)
  • (added) clang-tools-extra/clangd/unittests/ModuleFilesInfoTest.cpp (+223)
  • (added) clang-tools-extra/clangd/unittests/ModulesTestSetup.h (+105)
  • (modified) clang-tools-extra/clangd/unittests/ParsedASTTests.cpp (+6-2)
  • (modified) clang-tools-extra/clangd/unittests/PreambleTests.cpp (+4-2)
  • (modified) clang-tools-extra/clangd/unittests/TestTU.cpp (+10-4)
  • (modified) clang-tools-extra/docs/ReleaseNotes.rst (+3)
diff --git a/clang-tools-extra/clangd/CMakeLists.txt b/clang-tools-extra/clangd/CMakeLists.txt
index 3911fb6c6c746a8..bcfb49551a02591 100644
--- a/clang-tools-extra/clangd/CMakeLists.txt
+++ b/clang-tools-extra/clangd/CMakeLists.txt
@@ -97,6 +97,8 @@ add_clang_library(clangDaemon
   IncludeFixer.cpp
   InlayHints.cpp
   JSONTransport.cpp
+  ModuleDependencyScanner.cpp
+  ModuleFilesInfo.cpp
   PathMapping.cpp
   Protocol.cpp
   Quality.cpp
@@ -161,6 +163,7 @@ clang_target_link_libraries(clangDaemon
   clangAST
   clangASTMatchers
   clangBasic
+  clangDependencyScanning
   clangDriver
   clangFormat
   clangFrontend
diff --git a/clang-tools-extra/clangd/ClangdServer.cpp b/clang-tools-extra/clangd/ClangdServer.cpp
index 13d788162817fb4..e4c85858b6882ae 100644
--- a/clang-tools-extra/clangd/ClangdServer.cpp
+++ b/clang-tools-extra/clangd/ClangdServer.cpp
@@ -202,6 +202,7 @@ ClangdServer::Options::operator TUScheduler::Options() const {
   Opts.UpdateDebounce = UpdateDebounce;
   Opts.ContextProvider = ContextProvider;
   Opts.PreambleThrottler = PreambleThrottler;
+  Opts.ExperimentalModulesSupport = ExperimentalModulesSupport;
   return Opts;
 }
 
diff --git a/clang-tools-extra/clangd/ClangdServer.h b/clang-tools-extra/clangd/ClangdServer.h
index a416602251428b0..dc546b118cb8f5e 100644
--- a/clang-tools-extra/clangd/ClangdServer.h
+++ b/clang-tools-extra/clangd/ClangdServer.h
@@ -112,6 +112,9 @@ class ClangdServer {
     /// This throttler controls which preambles may be built at a given time.
     clangd::PreambleThrottler *PreambleThrottler = nullptr;
 
+    /// Enable experimental support for modules.
+    bool ExperimentalModulesSupport = false;
+
     /// If true, ClangdServer builds a dynamic in-memory index for symbols in
     /// opened files and uses the index to augment code completion results.
     bool BuildDynamicSymbolIndex = false;
diff --git a/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp b/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
index d1833759917a30f..bcc8f4f0dd9e5ac 100644
--- a/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
+++ b/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
@@ -729,6 +729,20 @@ DirectoryBasedGlobalCompilationDatabase::getProjectInfo(PathRef File) const {
   return Res-&gt;PI;
 }
 
+std::vector&lt;std::string&gt;
+DirectoryBasedGlobalCompilationDatabase::getAllFilesInProjectOf(
+    PathRef File) const {
+  CDBLookupRequest Req;
+  Req.FileName = File;
+  Req.ShouldBroadcast = false;
+  Req.FreshTime = Req.FreshTimeMissing =
+      std::chrono::steady_clock::time_point::min();
+  auto Res = lookupCDB(Req);
+  if (!Res)
+    return {};
+  return Res-&gt;CDB-&gt;getAllFiles();
+}
+
 OverlayCDB::OverlayCDB(const GlobalCompilationDatabase *Base,
                        std::vector&lt;std::string&gt; FallbackFlags,
                        CommandMangler Mangler)
@@ -805,6 +819,13 @@ std::optional&lt;ProjectInfo&gt; DelegatingCDB::getProjectInfo(PathRef File) const {
   return Base-&gt;getProjectInfo(File);
 }
 
+std::vector&lt;std::string&gt;
+DelegatingCDB::getAllFilesInProjectOf(PathRef File) const {
+  if (!Base)
+    return {};
+  return Base-&gt;getAllFilesInProjectOf(File);
+}
+
 tooling::CompileCommand DelegatingCDB::getFallbackCommand(PathRef File) const {
   if (!Base)
     return GlobalCompilationDatabase::getFallbackCommand(File);
diff --git a/clang-tools-extra/clangd/GlobalCompilationDatabase.h b/clang-tools-extra/clangd/GlobalCompilationDatabase.h
index 2bf8c973c534c6f..eaeff8d627a0960 100644
--- a/clang-tools-extra/clangd/GlobalCompilationDatabase.h
+++ b/clang-tools-extra/clangd/GlobalCompilationDatabase.h
@@ -45,6 +45,10 @@ class GlobalCompilationDatabase {
     return std::nullopt;
   }
 
+  virtual std::vector&lt;std::string&gt; getAllFilesInProjectOf(PathRef File) const {
+    return {};
+  }
+
   /// Makes a guess at how to build a file.
   /// The default implementation just runs clang on the file.
   /// Clangd should treat the results as unreliable.
@@ -75,6 +79,7 @@ class DelegatingCDB : public GlobalCompilationDatabase {
   getCompileCommand(PathRef File) const override;
 
   std::optional&lt;ProjectInfo&gt; getProjectInfo(PathRef File) const override;
+  std::vector&lt;std::string&gt; getAllFilesInProjectOf(PathRef File) const override;
 
   tooling::CompileCommand getFallbackCommand(PathRef File) const override;
 
@@ -121,6 +126,7 @@ class DirectoryBasedGlobalCompilationDatabase
   /// Returns the path to first directory containing a compilation database in
   /// \p File&#x27;s parents.
   std::optional&lt;ProjectInfo&gt; getProjectInfo(PathRef File) const override;
+  std::vector&lt;std::string&gt; getAllFilesInProjectOf(PathRef File) const override;
 
   bool blockUntilIdle(Deadline Timeout) const override;
 
diff --git a/clang-tools-extra/clangd/ModuleDependencyScanner.cpp b/clang-tools-extra/clangd/ModuleDependencyScanner.cpp
new file mode 100644
index 000000000000000..d706d2eb2fc8d6e
--- /dev/null
+++ b/clang-tools-extra/clangd/ModuleDependencyScanner.cpp
@@ -0,0 +1,86 @@
+//===---------------- ModuleDependencyScanner.cpp ----------------*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include &quot;ModuleDependencyScanner.h&quot;
+
+namespace clang {
+namespace clangd {
+using P1689Rule = tooling::dependencies::P1689Rule;
+
+std::optional&lt;P1689Rule&gt; ModuleDependencyScanner::scan(PathRef FilePath) {
+  if (ScanningCache.count(FilePath))
+    return ScanningCache[FilePath];
+
+  std::optional&lt;tooling::CompileCommand&gt; Cmd = CDB.getCompileCommand(FilePath);
+
+  if (!Cmd)
+    return std::nullopt;
+
+  using namespace clang::tooling::dependencies;
+
+  llvm::SmallString&lt;128&gt; FilePathDir(FilePath);
+  llvm::sys::path::remove_filename(FilePathDir);
+  DependencyScanningTool ScanningTool(
+      Service,
+      TFS ? TFS-&gt;view(FilePathDir) : llvm::vfs::createPhysicalFileSystem());
+
+  llvm::Expected&lt;P1689Rule&gt; Result =
+      ScanningTool.getP1689ModuleDependencyFile(*Cmd, Cmd-&gt;Directory);
+
+  if (auto E = Result.takeError()) {
+    // Ignore any error.
+    llvm::consumeError(std::move(E));
+    return std::nullopt;
+  }
+
+  if (Result-&gt;Provides)
+    ModuleNameToSourceMapper[Result-&gt;Provides-&gt;ModuleName] = FilePath;
+
+  ScanningCache[FilePath] = *Result;
+  return *Result;
+}
+
+void ModuleDependencyScanner::globalScan(PathRef File) {
+  std::vector&lt;std::string&gt; AllFiles = CDB.getAllFilesInProjectOf(File);
+
+  for (auto &amp;File : AllFiles)
+    scan(File);
+}
+
+PathRef ModuleDependencyScanner::getSourceForModuleName(StringRef ModuleName) const {
+  if (!ModuleNameToSourceMapper.count(ModuleName))
+    return {};
+
+  return ModuleNameToSourceMapper[ModuleName];
+}
+
+std::vector&lt;std::string&gt;
+ModuleDependencyScanner::getRequiredModules(PathRef File) const {
+  if (!ScanningCache.count(File))
+    return {};
+
+  const P1689Rule &amp;CachedResult = ScanningCache[File];
+  std::vector&lt;std::string&gt; Result;
+
+  for (const auto &amp;Info : CachedResult.Requires)
+    Result.push_back(Info.ModuleName);
+
+  return Result;
+}
+
+StringRef ModuleDependencyScanner::getModuleName(PathRef File) const {
+  if (!ScanningCache.count(File))
+    return {};
+
+  if (!ScanningCache[File].Provides)
+    return {};
+
+  return ScanningCache[File].Provides-&gt;ModuleName;
+}
+} // namespace clangd
+} // namespace clang
diff --git a/clang-tools-extra/clangd/ModuleDependencyScanner.h b/clang-tools-extra/clangd/ModuleDependencyScanner.h
new file mode 100644
index 000000000000000..1d6eb58fda59e20
--- /dev/null
+++ b/clang-tools-extra/clangd/ModuleDependencyScanner.h
@@ -0,0 +1,78 @@
+//===-------------- ModuleDependencyScanner.h --------------------*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_MODULEDEPENDENCYSCANNER_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_MODULEDEPENDENCYSCANNER_H
+
+#include &quot;GlobalCompilationDatabase.h&quot;
+#include &quot;support/Path.h&quot;
+#include &quot;support/ThreadsafeFS.h&quot;
+
+#include &quot;clang/Tooling/DependencyScanning/DependencyScanningService.h&quot;
+#include &quot;clang/Tooling/DependencyScanning/DependencyScanningTool.h&quot;
+
+#include &quot;llvm/ADT/SmallString.h&quot;
+#include &quot;llvm/ADT/StringMap.h&quot;
+
+namespace clang {
+namespace clangd {
+
+/// A scanner to produce P1689 format for C++20 Modules.
+///
+/// The scanner can scan a single file with `scan(PathRef)` member function
+/// or scan the whole project with `globalScan(PathRef)` member function. See
+/// the comments of `globalScan` to see the details.
+class ModuleDependencyScanner {
+public:
+  ModuleDependencyScanner(const GlobalCompilationDatabase &amp;CDB,
+                          const ThreadsafeFS *TFS)
+      : CDB(CDB), TFS(TFS),
+        Service(tooling::dependencies::ScanningMode::CanonicalPreprocessing,
+                tooling::dependencies::ScanningOutputFormat::P1689) {}
+
+  /// Scanning the single file specified by \param FilePath.
+  std::optional&lt;clang::tooling::dependencies::P1689Rule&gt; scan(PathRef FilePath);
+
+  /// Scanning every source file in the current project to get the
+  /// &lt;module-name&gt; to &lt;module-unit-source&gt; map.
+  /// It looks unefficiency to scan the whole project especially for
+  /// every version of every file!
+  /// TODO: We should find a efficient method to get the &lt;module-name&gt;
+  /// to &lt;module-unit-source&gt; map. We can make it either by providing
+  /// a global module dependency scanner to monitor every file. Or we
+  /// can simply require the build systems (or even if the end users)
+  /// to provide the map.
+  void globalScan(PathRef File);
+
+  PathRef getSourceForModuleName(StringRef ModuleName) const;
+
+  /// Return the direct required modules. Indirect required modules are not
+  /// included.
+  std::vector&lt;std::string&gt; getRequiredModules(PathRef File) const;
+  StringRef getModuleName(PathRef File) const;
+
+  const ThreadsafeFS *getThreadsafeFS() const { return TFS; }
+
+  const GlobalCompilationDatabase &amp;getCompilationDatabase() const { return CDB; }
+
+private:
+  const GlobalCompilationDatabase &amp;CDB;
+  const ThreadsafeFS *TFS;
+
+  clang::tooling::dependencies::DependencyScanningService Service;
+
+  // Map source file to P1689 Result.
+  llvm::StringMap&lt;clang::tooling::dependencies::P1689Rule&gt; ScanningCache;
+  // Map module name to source file path.
+  llvm::StringMap&lt;std::string&gt; ModuleNameToSourceMapper;
+};
+
+} // namespace clangd
+} // namespace clang
+
+#endif
diff --git a/clang-tools-extra/clangd/ModuleFilesInfo.cpp b/clang-tools-extra/clangd/ModuleFilesInfo.cpp
new file mode 100644
index 000000000000000..845ff01ca09dff3
--- /dev/null
+++ b/clang-tools-extra/clangd/ModuleFilesInfo.cpp
@@ -0,0 +1,282 @@
+//===----------------- ModuleFilesInfo.cpp -----------------------*- C++-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include &quot;ModuleFilesInfo.h&quot;
+#include &quot;support/Logger.h&quot;
+
+#include &quot;clang/Frontend/FrontendAction.h&quot;
+#include &quot;clang/Frontend/FrontendActions.h&quot;
+#include &quot;clang/Serialization/ASTReader.h&quot;
+
+namespace clang {
+namespace clangd {
+
+namespace {
+llvm::SmallString&lt;128&gt; getAbsolutePath(const tooling::CompileCommand &amp;Cmd) {
+  llvm::SmallString&lt;128&gt; AbsolutePath;
+  if (llvm::sys::path::is_absolute(Cmd.Filename)) {
+    AbsolutePath = Cmd.Filename;
+  } else {
+    AbsolutePath = Cmd.Directory;
+    llvm::sys::path::append(AbsolutePath, Cmd.Filename);
+    llvm::sys::path::remove_dots(AbsolutePath, true);
+  }
+  return AbsolutePath;
+}
+} // namespace
+
+ModuleFilesInfo::ModuleFilesInfo(PathRef MainFile,
+                                 const GlobalCompilationDatabase &amp;CDB) {
+  std::optional&lt;ProjectInfo&gt; PI = CDB.getProjectInfo(MainFile);
+  if (!PI)
+    return;
+
+  llvm::SmallString&lt;128&gt; Result(PI-&gt;SourceRoot);
+  llvm::sys::path::append(Result, &quot;.cache&quot;);
+  llvm::sys::path::append(Result, &quot;clangd&quot;);
+  llvm::sys::path::append(Result, &quot;module_files&quot;);
+  llvm::sys::fs::create_directories(Result, /*IgnoreExisting=*/true);
+
+  llvm::sys::path::append(Result, llvm::sys::path::filename(MainFile));
+  llvm::sys::fs::createUniqueDirectory(Result, UniqueModuleFilesPathPrefix);
+
+  log(&quot;Initialized module files to {0}&quot;, UniqueModuleFilesPathPrefix.str());
+}
+
+ModuleFilesInfo::~ModuleFilesInfo() {
+  DependentModuleNames.clear();
+  Successed = false;
+
+  if (UniqueModuleFilesPathPrefix.empty())
+    return;
+
+  llvm::sys::fs::remove_directories(UniqueModuleFilesPathPrefix);
+  UniqueModuleFilesPathPrefix.clear();
+}
+
+llvm::SmallString&lt;256&gt;
+ModuleFilesInfo::getModuleFilePath(StringRef ModuleName) const {
+  llvm::SmallString&lt;256&gt; ModuleFilePath;
+
+  ModuleFilePath = UniqueModuleFilesPathPrefix;
+  auto [PrimaryModuleName, PartitionName] = ModuleName.split(&#x27;:&#x27;);
+  llvm::sys::path::append(ModuleFilePath, PrimaryModuleName);
+  if (!PartitionName.empty()) {
+    ModuleFilePath.append(&quot;-&quot;);
+    ModuleFilePath.append(PartitionName);
+  }
+  ModuleFilePath.append(&quot;.pcm&quot;);
+
+  return ModuleFilePath;
+}
+
+bool ModuleFilesInfo::IsModuleUnitBuilt(StringRef ModuleName) const {
+  if (!DependentModuleNames.count(ModuleName))
+    return false;
+
+  auto BMIPath = getModuleFilePath(ModuleName);
+  if (llvm::sys::fs::exists(BMIPath))
+    return true;
+
+  Successed = false;
+
+  DependentModuleNames.erase(ModuleName);
+  return false;
+}
+
+void ModuleFilesInfo::ReplaceHeaderSearchOptions(
+    HeaderSearchOptions &amp;Options) const {
+  if (!IsInited())
+    return;
+
+  Options.PrebuiltModulePaths.insert(Options.PrebuiltModulePaths.begin(),
+                                     UniqueModuleFilesPathPrefix.str().str());
+
+  for (auto Iter = Options.PrebuiltModuleFiles.begin();
+       Iter != Options.PrebuiltModuleFiles.end();) {
+    if (IsModuleUnitBuilt(Iter-&gt;first)) {
+      Iter = Options.PrebuiltModuleFiles.erase(Iter);
+      continue;
+    }
+
+    Iter++;
+  }
+}
+
+void ModuleFilesInfo::ReplaceCompileCommands(
+    tooling::CompileCommand &amp;Cmd) const {
+  if (!IsInited())
+    return;
+
+  std::vector&lt;std::string&gt; CommandLine(std::move(Cmd.CommandLine));
+
+  Cmd.CommandLine.emplace_back(CommandLine[0]);
+  Cmd.CommandLine.emplace_back(
+      llvm::Twine(&quot;-fprebuilt-module-path=&quot; + UniqueModuleFilesPathPrefix)
+          .str());
+
+  for (std::size_t I = 1; I &lt; CommandLine.size(); I++) {
+    const std::string &amp;Arg = CommandLine[I];
+    const auto &amp;[LHS, RHS] = StringRef(Arg).split(&quot;=&quot;);
+
+    // Remove original `-fmodule-file=&lt;module-name&gt;=&lt;module-path&gt;` form if it
+    // already built.
+    if (LHS == &quot;-fmodule-file&quot; &amp;&amp; RHS.contains(&quot;=&quot;)) {
+      const auto &amp;[ModuleName, _] = RHS.split(&quot;=&quot;);
+      if (IsModuleUnitBuilt(ModuleName))
+        continue;
+    }
+
+    Cmd.CommandLine.emplace_back(Arg);
+  }
+}
+
+void ModuleFilesInfo::ReplaceCompileCommands(tooling::CompileCommand &amp;Cmd,
+                                             StringRef OutputModuleName) const {
+  if (!IsInited())
+    return;
+
+  ReplaceCompileCommands(Cmd);
+
+  Cmd.Output = getModuleFilePath(OutputModuleName).str().str();
+}
+
+bool ModuleFilesInfo::buildModuleFile(PathRef ModuleUnitFileName,
+                                      ModuleDependencyScanner &amp;Scanner) {
+  if (ModuleUnitFileName.empty())
+    return false;
+
+  for (auto &amp;ModuleName : Scanner.getRequiredModules(ModuleUnitFileName)) {
+    // Return early if there are errors building the module file.
+    if (!IsModuleUnitBuilt(ModuleName) &amp;&amp;
+        !buildModuleFile(Scanner.getSourceForModuleName(ModuleName), Scanner)) {
+      log(&quot;Failed to build module {0}&quot;, ModuleName);
+      return false;
+    }
+  }
+
+  auto Cmd =
+      Scanner.getCompilationDatabase().getCompileCommand(ModuleUnitFileName);
+  if (!Cmd)
+    return false;
+
+  ReplaceCompileCommands(*Cmd, Scanner.getModuleName(ModuleUnitFileName));
+
+  ParseInputs Inputs;
+  Inputs.TFS = Scanner.getThreadsafeFS();
+  Inputs.CompileCommand = std::move(*Cmd);
+
+  IgnoreDiagnostics IgnoreDiags;
+  auto CI = buildCompilerInvocation(Inputs, IgnoreDiags);
+  if (!CI)
+    return false;
+
+  auto FS = Inputs.TFS-&gt;view(Inputs.CompileCommand.Directory);
+  auto AbsolutePath = getAbsolutePath(Inputs.CompileCommand);
+  auto Buf = FS-&gt;getBufferForFile(AbsolutePath);
+  if (!Buf)
+    return false;
+
+  // Hash the contents of input files and store the hash value to the BMI files.
+  // So that we can check if the files are still valid when we want to reuse the
+  // BMI files.
+  CI-&gt;getHeaderSearchOpts().ValidateASTInputFilesContent = true;
+
+  CI-&gt;getFrontendOpts().OutputFile = Inputs.CompileCommand.Output;
+  auto Clang =
+      prepareCompilerInstance(std::move(CI), /*Preamble=*/nullptr,
+                              std::move(*Buf), std::move(FS), IgnoreDiags);
+  if (!Clang)
+    retu...

@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Sep 15, 2023

The bot shows some escaping code issue on windows:

YAML:6:44: error: Unrecognized escape code

  | "file": "C:\Users\ContainerAdministrator\AppData\Local\Temp\lit-tmp-ns06ti05\modules-test-e111b6/foo.cppm",

And I'm like to skip the tests on windows temporarily since I don't have a windows development environment. I hope someone else can fix it later and I feel it may not be hard.

@Arthapz
Copy link

Arthapz commented Sep 15, 2023

Oh nice going to try this :)

@ChuanqiXu9
Copy link
Member Author

@sam-mccall gentle ping

Copy link
Collaborator

@sam-mccall sam-mccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay - this looks like a much more manageable scope!
Comments mostly on structure still...

@@ -45,6 +45,10 @@ class GlobalCompilationDatabase {
return std::nullopt;
}

virtual std::vector<std::string> getAllFilesInProjectOf(PathRef File) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The layering here looks close, but not quite right to me.

Rather than having GCDB expose "get all files" and building module-scanning on top of that, I think GCDB should expose module data: e.g. through a subobject interface like shared_ptr<ProjectModules> GlobalCDB::getModulesDatabase(PathRef File), returning something like the current ModulesDependencyScanner behind an interface.

(If the interface to ModuleDependencyScanner was narrower, it could also expose those queries directly)

Reasons to prefer this:

  • suppose we augment compile_commands.json with some module manifest, now we want to use a different strategy for the projects that have one - which is something we only know inside DirectoryBasedCDB
  • GlobalCDB is our general "build system integration" point - if we need to swap out the high level "what are the projects and how do we talk to their build systems" strategy, we likely need to reconsider module discovery too
  • getAllFilesInProjectOf exposes more internal structure than I'd like for general GlobalCDB: we use this abstraction to integrate in cases where there's no "project" structure and source files are not enumerable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compile_commands.json probably isn't the best place for it as it cannot be filled out at the same time (e.g., CMake knows the compile commands once it generates; it does not know the relevant module information until the build happens and updating the database would be a huge bottleneck).

I have a prototype (that needs some more work) that outlines such a module_commands.json-like file in this CMake branch. Of note that doesn't work:

  • flags is not filled out; and
  • the commands that merge the per-target databases into a top-level one does not work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the example for module_commands.json? So that we can get a rough image.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by adding ProjectModules class in ProjectModules.cpp file.

Copy link
Contributor

@mathstuf mathstuf Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably easiest to see it by the writer code here. Basically, it is:

{
  "version": 1,
  "revision": 0,
  "module_sets": [
    {
      "name": "unique name of module set",
      "visible_module_sets": [
        "module sets",
        "which can satisfy",
        "required modules within this module set",
      ],
      "modules": [
        {
          "work-directory": "/path/to/working/directory",
          "source": "path/to/source/file",
          "name": "name.of.provided.module",
          "private": false,
          "requires": [
            "list",
            "of",
            "required",
            "modules"
          ],
          "flag_sets": [
            "-fstandalone-flag",
            ["-fflag", "with-arg"],
          ]
        }
      ]
    }
  ]
}

The semantics are:

  • module sets may refer to other module sets named in order to satisfy their required modules;
  • however, private modules in other module sets are not usable;
  • work-directory is optional;
  • private defaults to false;
  • flag sets need to be used to compile the BMI in question (Daniel Ruoso's "local preprocessor arguments");
  • flags may be deduplicated/merged by top-level identities.

tooling::dependencies::ScanningOutputFormat::P1689) {}

/// Scanning the single file specified by \param FilePath.
std::optional<clang::tooling::dependencies::P1689Rule> scan(PathRef FilePath);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a little detail, but P1689Rule is a completely opaque name.

It's also undocumented (what's "PrimaryOutput", especially in a clangd context?), and doesn't include the source file, which would help make the interface here much more regular:

optional<ModuleNode> getModuleFromSource(PathRef);
optional<ModuleNode> getModuleFromName(StringRef);

WDYT about defining our own struct here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I added ModuleDependencyInfo struct.

struct ModuleDependencyInfo {
    // The name of the module if the file is a module unit.
    std::optional<std::string> ModuleName;
    // A list of names for the modules that the file directly depends.
    std::vector<std::string> RequiredModules;
  };

The complete P1689 information may be redundant for us.

I didn't add the suggested interfaces. Since I found the following two interfaces are sufficient and more straight forward in practice (from my mind. I feel like we don't need to look at the definition of ModuleNode now when we use the scanner):

PathRef getSourceForModuleName(StringRef ModuleName);
std::vector<std::string> getRequiredModules(PathRef File);

class ModuleDependencyScanner {
public:
ModuleDependencyScanner(const GlobalCompilationDatabase &CDB,
const ThreadsafeFS *TFS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFS is not optional and should also be a reference

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

namespace clang {
namespace clangd {

/// A scanner to produce P1689 format for C++20 Modules.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1689 describes a JSON format, we're not producing that here.

This is used to query the module graph, and the implementation strategy is scanning all source files in the CDB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/// a global module dependency scanner to monitor every file. Or we
/// can simply require the build systems (or even if the end users)
/// to provide the map.
void globalScan(PathRef File);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used for its side-effect, but we don't describe what that is (i.e. that getSourceForModuleName only works after calling this function).

We can either:

  • make that explicit, and have getSourceForModuleName assert that globalScan has been called
  • hide that detail, and make getSourceForModuleName call globalScan if that hasn't happened already

even though it hides important performance characteristics, I think I prefer the second as it provides an interface that makes sense for other (non-scanning) strategies

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the scanner are only used by ScanningAllProjectModules class. In the scanner itself, I add the assertion in getSourceForModuleName . And in the ScanningAllProjectModules class, I make it to call globalScan if that hasn't happened already.

/// The users should call `ReplaceHeaderSearchOptions(...)` or `ReplaceCompileCommands(CompileCommand&)`
/// member function to update the compilation commands to select the built module files first.
struct ModuleFilesInfo {
ModuleFilesInfo() = default;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why expose the default constructor if it's not meant to be used? similarly, why is this class movable?

edit: I see you use an empty instance when the feature is off. This contradicts the "should only be initialized" comment, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Now the default constructor is private. But it is still movable since we need to assign it to Preamble.

/// Return true if the modile file specified by ModuleName is built.
/// Note that this interface will only check the existence of the module
/// file instead of checking the validness of the module file.
bool IsModuleUnitBuilt(StringRef ModuleName) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function isn't used externally, remove?
(If it was used externally, I'm not sure how callers are meant to interpret the note)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used in unittests. I've added a comment.

if (!IsInited())
return true;

if (!Successed)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the behavior of this function depend on whether IsModuleUnitBuilt() was called is surprising and seems undesirable.

When the module files are not created, isn't there some other way to detect that error? Is it really too expensive to eagerly check whether they were created, given that we were going to write them?
Do we really want CanReuse to return false if the compilation failed, causing us to repeat it over and over again even though inputs haven't changed?

My suspicion is that Successed should just be removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by removing Successed.

StringRef ModuleName = Iter.first();
auto BMIPath = getModuleFilePath(ModuleName);
auto ReadResult =
Clang.getASTReader()->ReadAST(BMIPath, serialization::MK_MainFile,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too expensive to be a good end state for "can reuse preamble" - if we can't work out yet how to do it cheaply, maybe it's better just not to validate whether modules are out-of-date yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it sounds a little bit crazy to not validate whether modules are out-of-date to me. But maybe we can add an option to control this later. Also the compiler tried a lot to make it pretty fast to load the module file. In my experience, it won't take 1s to load a 200+MB modules. So I feel it may not be so hurting in practice. And after all, it should not be bad as the initial version of the patch.

if (!ModulesInfo.buildModuleFile(
Scanner.getSourceForModuleName(Info.ModuleName), Scanner)) {
log("Failed to build module {0}", Info.ModuleName);
return ModulesInfo;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we should either carry on and build all the modules we can, or return an empty/failed set - what's the value of returning half the modules?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ChuanqiXu9
Copy link
Member Author

@sam-mccall thanks for your high qualified comments! I think I've addressed most of them. I am not so familiar with the Github review system. Tell me if you're more happy with other practices.

@ChuanqiXu9
Copy link
Member Author

@sam-mccall gentle ping~

@bretbrownjr
Copy link

I highly recommend talking to @mathstuf of the CMake project about a way for the build system can emit a next generation set of build graph metadata (think modules appropriate compile commands, including dependencies).

I expect it will be more efficient for the build system, compiler, and LSP to all work together compared to implementing large subsets of a build system in the clangd LSP.

@ChuanqiXu9
Copy link
Member Author

I highly recommend talking to @mathstuf of the CMake project about a way for the build system can emit a next generation set of build graph metadata (think modules appropriate compile commands, including dependencies).

I expect it will be more efficient for the build system, compiler, and LSP to all work together compared to implementing large subsets of a build system in the clangd LSP.

Yeah, yeah, of course. This is listed as the second step of What we need to do next section. And for the patch itself, it is just an initial move in the clangd for C++20 modules. Since it is complex so that we decide to split the works into several patches to make sure every one could be in the same page. I've already left the space to extend how we find the map from module name to source files later. But now I prefer to not change the design even if it looks unefficient clearly.

@ChuanqiXu9
Copy link
Member Author

@sam-mccall gentle ping~

@sam-mccall sam-mccall self-assigned this Oct 23, 2023
Copy link
Collaborator

@sam-mccall sam-mccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks really close to me!
Tried to focus on how the components fit together, the details of module building and reuse can easily be tweaked later if needed.

virtual PathRef getSourceForModuleName(StringRef ModuleName) = 0;
};

class ScanningAllProjectModules : public ProjectModules {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this class only implements an interface, we could just expose a factory function from the header instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

clang-tools-extra/clangd/ModuleDependencyScanner.cpp Outdated Show resolved Hide resolved
GlobalScanned &&
"We should only call getSourceForModuleName after calling globalScan()");

if (!ModuleNameToSource.count(ModuleName))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rather if (auto It = find(); It != end()) to avoid the double-lookup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

clang-tools-extra/clangd/ModuleDependencyScanner.cpp Outdated Show resolved Hide resolved
llvm::sys::path::remove_filename(FilePathDir);
DependencyScanningTool ScanningTool(Service, TFS.view(FilePathDir));

llvm::Expected<P1689Rule> P1689Result =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: again, drop P1689 from the var name?
We can't do anything about the function being misnamed, but...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (!PI)
return;

llvm::SmallString<128> Result(PI->SourceRoot);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PI->SourceRoot can be empty, depending on the CDB strategy. For now it's OK to bail out in this case, but we need a FIXME

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by adding a FIXME.

llvm::SmallString<128> Result(PI->SourceRoot);
llvm::sys::path::append(Result, ".cache");
llvm::sys::path::append(Result, "clangd");
llvm::sys::path::append(Result, "module_files");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just modules I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I prefer using the term module files to describe on disk BMI files. Since the term "modules" may have too many meanings.

}

llvm::SmallString<256>
PrerequisiteModules::getModuleFilePath(StringRef ModuleName) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

giving the dir a random name and the modules predictable names, rather than just random names for the modules is an interesting choice

It makes things more complicated, but I can see maybe it's easier to debug.

Leave a comment somewhere motivating the layout?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by adding comments to the definition of PrerequisiteModules.

return false;

auto BMIPath = getModuleFilePath(ModuleName);
if (llvm::sys::fs::exists(BMIPath))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a FIXME somewhere in this file to make this logic VFS-safe

(You're mixing physical writes and reads with VFS-based reads. It's messy to resolve because the VFS infra is read-only and because DirBasedCDB is real-FS based IIRC. So let's not try to totally sort it out now)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by adding the FIXME to the constructor of PrerequisiteModules

std::vector<std::string> CommandLine(std::move(Cmd.CommandLine));

Cmd.CommandLine.emplace_back(CommandLine[0]);
Cmd.CommandLine.emplace_back(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this line? we should be passing all paths explicitly with -fmodule-file=, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Originally I just felt it was simpler. And now I feel passing -fmodule-file=<module-name>=<module-file> looks more neat indeed.

@ChuanqiXu9
Copy link
Member Author

@sam-mccall Thanks for you high quality comments! I think all the comments (except few ones I explained in inline comments) should be addressed in the latest force-push commit. I didn't click the Resolve Conversation button since there is a discussion for it: https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178/42.

The "significant" changes in the latest commit are mainly interface changes. They include:

  • Introduced a ModulesBuilder class. The class is responsible for building module files. Currently it follows the original style to build module files seperetely for every different source files. As we discussed, ModulesBuilder class can be used to reuse the module files across different source files.
  • Split buildPrerequisiteModulesFor from PrerequisiteModules to ModulesBuilder class. Now PrerequisiteModules are simply collection of module files. And it has two duties:
    • Check if all the module files are still up-to-date (or valid)
    • Adjust the header search options to make it to try search for the module files built by us first.
  • Split the original PrerequisiteModules to a pure virtual class and move the implementations to ModulesBuilder.cpp.
  • ScanningAllProjectModules are removed from the header files.
  • The suggested ModuleFile class are introduced as a private class to StandalonePrerequisiteModules. I think it is good now since all the module files are owned by StandalonePrerequisiteModules and they are private to others.

I can image, in the end of the day, we need to have a "public" ModuleFile class which are owned by ModulesBuilder to make it to be shared with other source files. And derived class from PrerequisiteModules will only have references to some module files. But I didn't move on. I feel they should be part of the later patches.

@Destroyerrrocket
Copy link
Contributor

Thank you for your work, @ChuanqiXu9, have the best of luck!

@ChuanqiXu9
Copy link
Member Author

@kadircet ping~

@kadircet
Copy link
Member

kadircet commented Jul 4, 2024

Hi @ChuanqiXu9, I was on vacation and just returned, i'll provide next round of feedback quite soon.

@chriselrod
Copy link

Hi, sorry if this is off topic --
How can I try this PR?

I've done the following:

  1. checked out the latest commit on the clangmodules branch (16541c8), and built it from source.
  2. Added clangd to my path, by sym-linking $LLVMBUILDDIR/bin/clangd to ~/.local/bin/clangd
$ clangd --version                                                                                                                (base) 
clangd version 19.0.0git (git@github.com:llvm/llvm-project.git 16541c8bf44cf38497ce6fd7377ee966d61be5aa)
  1. Built a cmake project after setting CXX=$LLVMBUILDDIR/bin/clang++, so that the BMI files that are produced are compatible with clangd.

Is there more I have to do? I'd prefer not to install it, but could if that's the easiest way to get everything needed in place.

I have a few major issues:

  1. This branch seems to crash a lot more often than 18.1.6 did; I never noticed the LSP disconnecting when using v18.1.6, but it is a regular occurrence with this PR.
  2. I need to run clang++ to see error messages, rather than getting diagnostics from clangd while viewing the files. That is a significant regression compared to using header files. clangd 18.1.6 didn't give diagnostics either, which is why I decided to try this PR.
  3. The diagnostics I do get tend to be false positives, e.g.
import Arena; // - In included file: 'std::ranges::__iswap::_IterSwap' with definition in module 'Allocator.<global>' has different definitions in diffe...
import Array; // - In included file: 'std::fpos' has different definitions in different modules; first difference is definition in module 'Pair.<global>...
import BoxOpt; // - In included file: 'std::array' has different definitions in different modules; first difference is definition in module 'BoxOpt.<glo...

Overall, the developer experience has been significantly worse with modules than it is with headers.

@ChuanqiXu9
Copy link
Member Author

Hi, sorry if this is off topic -- How can I try this PR?

I've done the following:

  1. checked out the latest commit on the clangmodules branch (16541c8), and built it from source.
  2. Added clangd to my path, by sym-linking $LLVMBUILDDIR/bin/clangd to ~/.local/bin/clangd
$ clangd --version                                                                                                                (base) 
clangd version 19.0.0git (git@github.com:llvm/llvm-project.git 16541c8bf44cf38497ce6fd7377ee966d61be5aa)
  1. Built a cmake project after setting CXX=$LLVMBUILDDIR/bin/clang++, so that the BMI files that are produced are compatible with clangd.

Is there more I have to do? I'd prefer not to install it, but could if that's the easiest way to get everything needed in place.

I have a few major issues:

  1. This branch seems to crash a lot more often than 18.1.6 did; I never noticed the LSP disconnecting when using v18.1.6, but it is a regular occurrence with this PR.
  2. I need to run clang++ to see error messages, rather than getting diagnostics from clangd while viewing the files. That is a significant regression compared to using header files. clangd 18.1.6 didn't give diagnostics either, which is why I decided to try this PR.
  3. The diagnostics I do get tend to be false positives, e.g.
import Arena; // - In included file: 'std::ranges::__iswap::_IterSwap' with definition in module 'Allocator.<global>' has different definitions in diffe...
import Array; // - In included file: 'std::fpos' has different definitions in different modules; first difference is definition in module 'Pair.<global>...
import BoxOpt; // - In included file: 'std::array' has different definitions in different modules; first difference is definition in module 'BoxOpt.<glo...

Overall, the developer experience has been significantly worse with modules than it is with headers.

It looks like you're missing enabling the modules support by -experimental-modules-support option.

@ChuanqiXu9
Copy link
Member Author

Given we are going to branch clang19 in the end of the month, it is highly possible that we can't land sufficient patches before that. Note that even if we land this patch before that, the support of modules in clangd is expected to very slow for real world applications. See the summary of the patch for details. So we may miss the train again.

To mitigate this, I opened https://github.com/ChuanqiXu9/clangd-for-modules. This contains the following patches to make the clangd sufficient to work with modules in practical workloads. Feedbacks and contributions are welcomed. Hope this can be helpful for people who are waiting this.

clang-tools-extra/clangd/ClangdLSPServer.cpp Outdated Show resolved Hide resolved
// Currently we simply bail out.
if (ModuleUnitFileName.empty())
return llvm::createStringError(llvm::formatv(
"Failed to build '{0}': Failed to get the primary source", ModuleName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i don't follow, i wasn't suggesting to drop module-name from the logs completely. i was just saying that we don't need to include ModuleName in the internal error we create, as it's a parameter passed in by the caller, hence they can prepend it to the log if they want.

e.g.:

llvm::Error buildModuleFile(....) {
  ...
  return llvm::createStringError("something went wrong");
}

void caller() {
  ....
   if (auto Err = buildModuleFile(ModuleName, ..)) {
          elog("Failed to build module '{0}': {1}", ModuleName, llvm::toString(Err));
   }
  ....
}

clang-tools-extra/clangd/ClangdServer.h Outdated Show resolved Hide resolved
@@ -146,6 +146,8 @@ class Checker {
ClangdLSPServer::Options Opts;
// from buildCommand
tooling::CompileCommand Cmd;
std::unique_ptr<GlobalCompilationDatabase> BaseCDB;
std::unique_ptr<OverlayCDB> CDB;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we dropped the support for --check completely. i think it's quite important and useful to be able to test/repro issues with it. can we keep that?

clang-tools-extra/clangd/ModulesBuilder.h Outdated Show resolved Hide resolved
@guijiyang
Copy link

@sam-mccall gentle ping~

@ChuanqiXu9 hi, can your patch add support msvc module with *.ixx suffix source? I use xmake to generate compile database, clangd cant work properly.

hi, did you meet problems when testing this? I took a quick look and I don't see I treated suffix specially. This patch find module interface unit by scanning instead of by suffixes.

hi, clangd got error
I[17:59:54.146] Scanning modules dependencies for C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\modules\std.ixx failed: error: no such file or directory: '/ifcOutput' error: no such file or directory: '/interface' error: cannot specify '/Fobuild\.objs\test\windows\x64\debug\0aee06d009cc4e4b8a8427a06257c7f1\std.ixx.obj' when compiling multiple source files
'/ifcOutput' and '/interface' are flags to msvc cl,why clangd ocurr this error,
and compile_commands.json like this:
{ "directory": "d:\\Code\\MyProject\\guis", "arguments": ["C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\VC\\Tools\\MSVC\\14.39.33519\\bin\\HostX64\\x64\\cl.exe", "/c", "/nologo", "/Zi", "/FS", "/Fdbuild\\windows\\x64\\debug\\compile.test.pdb", "/W3", "/WX", "/Od", "/std:c++latest", "/MDd", "/EHsc", "/TP", "/ifcOutput", "build\\.gens\\test\\windows\\x64\\debug\\rules\\bmi\\cache\\modules\\1382cb37\\std.ifc", "/interface", "/Fobuild\\.objs\\test\\windows\\x64\\debug\\0aee06d009cc4e4b8a8427a06257c7f1\\std.ixx.obj", "C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\VC\\Tools\\MSVC\\14.39.33519\\modules\\std.ixx", "-imsvc", "C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.22621.0\\cppwinrt", "-imsvc", "C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.22621.0\\shared", "-imsvc", "C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.22621.0\\ucrt", "-imsvc", "C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.22621.0\\um", "-imsvc", "C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.22621.0\\winrt", "-imsvc", "C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\VC\\Tools\\MSVC\\14.39.33519\\include"], "file": "C:\\Program Files\\Microsoft Visual Studio\\2022\\Professional\\VC\\Tools\\MSVC\\14.39.33519\\modules\\std.ixx" }

@Arthapz
Copy link

Arthapz commented Jul 15, 2024

Given we are going to branch clang19 in the end of the month, it is highly possible that we can't land sufficient patches before that. Note that even if we land this patch before that, the support of modules in clangd is expected to very slow for real world applications. See the summary of the patch for details. So we may miss the train again.

To mitigate this, I opened https://github.com/ChuanqiXu9/clangd-for-modules. This contains the following patches to make the clangd sufficient to work with modules in practical workloads. Feedbacks and contributions are welcomed. Hope this can be helpful for people who are waiting this.

i tried your branch (and modified my xmake a little to gen the module_map.txt, https://github.com/Arthapz/xmake/tree/clangd-module-map), i did get some of LSP support (neovim)

image

but the logs are full of errors like

[ERROR][2024-07-15 16:03:31] .../vim/lsp/rpc.lua:770    "rpc"   "/opt/llvm-git/bin/clangd"  "stderr"    "E[16:03:31.557] Scanning modules dependencies for build/../modules/stormkit/Core/Utils/Stacktrac
e.mpp failed: error: unable to handle compilation, expected exactly one compiler job in '/opt/llvm-git/bin/clang -c -Qunused-arguments -m64 -g -fvisibility=hidden -fvisibility-inlines-hidden -Wall -Wpe
dantic -Wextra -ffp-model=fast -O3 -mfma -mfpu=neon -mavx -mavx2 -msse -msse2 -msse3 -mssse3 -msse4.2 -stdlib=libc++ -cxx-isystem/opt/llvm-git/include/c++/v1 -Ibuild/.gens/include -I/home/arthapz/Repos
/StormKit/include -D_GLIBCXX_DEBUG -DSTORMKIT_BUILD -DSTORMKIT_BUILD_DEBUG -DSTORMKIT_ASSERT=1 -DSTORMKIT_STATIC -DBOOST_STACKTRACE_USE_ADDR2LINE -DBOOST_STACKTRACE_LINK -isystem /home/arthapz/.xmake/p
ackages/g/glm/1.0.1/e8b755c3b7d04ec88316715dbafe4cb6/include -isystem /home/arthapz/Repos/StormKit/build/.packages/f/frozen/latest/9a1f279102e34dbe91527197132695fd/include -isystem /home/arthapz/.xmake
/packages/u/unordered_dense/v4.4.0/9226fe6ee99949838863899207aec25e/include -isystem /home/arthapz/.xmake/packages/m/magic_enum/v0.9.5/c9d129e2b37f49a5897d1ef2a130e7c3/include -isystem /home/arthapz/.x
make/packages/t/tl_function_ref/v1.0.0/db82eb7eb9e64a40a64595c6319b711b/include -isystem /home/arthapz/.xmake/packages/b/boost/1.84.0/1be587ae28144669827487e346d4bc81/include -isystem /home/arthapz/.xm
ake/packages/l/libbacktrace/v1.0/5ff2e6e6560147f597b99dfa2bbea685/include -Wno-experimental-header-units -fcolor-diagnostics -fexperimental-library -fstrict-aliasing -Wstrict-aliasing -Wno-missing-fiel
d-initializers -Wno-include-angled-in-module-purview -Wno-unknown-attributes -Wno-deprecated-declarations -ggdb3 -Wno-language-extension-token -fmodule-file=std=build/.gens/stormkit-core/linux/x86_64/d
ebug/rules/bmi/cache/modules/500a1880/std.pcm -fmodule-file=stormkit.Core:Parallelism.ThreadUtils=build/.gens/stormkit-core/linux/x86_64/debug/rules/bmi/cache/modules/500a1880/stormkit.Core-Parallelism
.ThreadUtils.pcm --precompile -- build/../modules/stormkit/Core/Utils/Stacktrace.mpp'\n\nI[16:03:31.557] Recorded module unit path build/../modules/stormkit/Core/Utils/Stacktrace.mpp doesn't declare mo
dule stormkit.Core:Utils.Stacktrace\n"

and after some time some errors show up
image

i used these parameters for clangd

"--offset-encoding=utf-16", "--header-insertion=never", "--experimental-modules-support", "--compile-commands-dir=.vscode", "--modules-map-path=build/clangd_mapper.txt"

@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Jul 16, 2024

@Arthapz @guijiyang thanks for reporting. But could you try to give feedbacks to that repo instead of this page. I feel this page it too long now. (My browser needs seconds to load it). And this page is indeed not a good place to track that.


@guijiyang it looks like the reason why it can't work is that the clang tools can't handle with MSVC's option '/ifcOutput' and '/interface' . It may be better if you can compile your project with clang.exe on Windows and use a compile commands from that. And also, as far as I know, clang can't compile std module from MSSTL. Then clangd can't handle that too. Maybe we have to wait for clang to compile std module from MS STL.


@Arthapz For the first error, it says the module unit doesn't declare that module. Maybe it will helpful to check that. I noticed that there is a \n in the error message. So maybe it is related to the new line character.

For the second error, it should be a inconsistent configuration in the clangd side. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency. We skipped ODR check by default in clang but it looks like we forgot it in clangd. I've updated it in https://github.com/ChuanqiXu9/clangd-for-modules. You can give it another try.

@guijiyang
Copy link

@Arthapz @guijiyang thanks for reporting. But could you try to give feedbacks to that repo instead of this page. I feel this page it too long now. (My browser needs seconds to load it). And this page is indeed not a good place to track that.

@guijiyang it looks like the reason why it can't works is that the clang tools can't handle with MSVC's option '/ifcOutput' and '/interface' . It may be better if you can compile your project with clang.exe on Windows and use a compile commands from that. And also, as far as I know, clang can't compile std module from MSSTL. Then clangd can't handle that too. Maybe we have to need to wait for clang to compile std module from STL.

@Arthapz For the first error, it says the module unit doesn't declare that module. Maybe it will helpful to check that. For the second error, it should be a inconsistent configuration in the clangd side. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency. We skipped ODR check by default in clang but it looks like we forgot it in clangd.

Ok, i got it, and which repo can give feedback to you?

@ChuanqiXu9
Copy link
Member Author

@Arthapz @guijiyang thanks for reporting. But could you try to give feedbacks to that repo instead of this page. I feel this page it too long now. (My browser needs seconds to load it). And this page is indeed not a good place to track that.
@guijiyang it looks like the reason why it can't works is that the clang tools can't handle with MSVC's option '/ifcOutput' and '/interface' . It may be better if you can compile your project with clang.exe on Windows and use a compile commands from that. And also, as far as I know, clang can't compile std module from MSSTL. Then clangd can't handle that too. Maybe we have to need to wait for clang to compile std module from STL.
@Arthapz For the first error, it says the module unit doesn't declare that module. Maybe it will helpful to check that. For the second error, it should be a inconsistent configuration in the clangd side. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency. We skipped ODR check by default in clang but it looks like we forgot it in clangd.

Ok, i got it, and which repo can give feedback to you?

Please use this one: https://github.com/ChuanqiXu9/clangd-for-modules

@ChuanqiXu9
Copy link
Member Author

@kadircet Comments addressed. Maybe due to some github's issue, my replies are separated and didn't form a standalone list. Hope this won't make it harder.

Copy link
Member

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot, LGTM!

i think the only big item remaining is injecting resource-dir correctly when running clang-scan-deps.

the most important follow up (for functionality, rather than optimizations and usability) is indexing the PCMs as we build them (similar to preamble-indexing we run today). For any future travelers, please do start a discussion in advance for that one.


#include "ModulesBuilder.h"
#include "ScanningProjectModules.h"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop spaces between includes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Currently we simply bail out.
if (ModuleUnitFileName.empty())
return llvm::createStringError(
llvm::formatv("Failed to get the primary source"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for formatv

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (llvm::Error Err = buildModuleFile(RequiredModuleName, CDB, TFS, MDB,
ModuleFilesPrefix, BuiltModuleFiles))
return llvm::createStringError(
llvm::formatv("Failed to build dependency {0}", RequiredModuleName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add the underlying error? e.g.: "Failed to build dependency {0}: {1}", RequiredModuleName, llvm::toString(std::move(Err))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto CI = buildCompilerInvocation(Inputs, IgnoreDiags);
if (!CI)
return llvm::createStringError(
llvm::formatv("Failed to build compiler invocation"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for formatv

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto FS = Inputs.TFS->view(Inputs.CompileCommand.Directory);
auto Buf = FS->getBufferForFile(Inputs.CompileCommand.Filename);
if (!Buf)
return llvm::createStringError(llvm::formatv("Failed to create buffer"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for formatv

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::move(*Buf), std::move(FS), IgnoreDiags);
if (!Clang)
return llvm::createStringError(
llvm::formatv("Failed to prepare compiler instance"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for formatv

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Clang->ExecuteAction(Action);

if (Clang->getDiagnostics().hasErrorOccurred())
return llvm::createStringError(llvm::formatv("Compilation failed"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for formatv

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

clang-tools-extra/clangd/ModulesBuilder.cpp Show resolved Hide resolved
Arthapz referenced this pull request in ChuanqiXu9/clangd-for-modules Jul 16, 2024
@ChuanqiXu9
Copy link
Member Author

thanks a lot, LGTM!

i think the only big item remaining is injecting resource-dir correctly when running clang-scan-deps.

the most important follow up (for functionality, rather than optimizations and usability) is indexing the PCMs as we build them (similar to preamble-indexing we run today). For any future travelers, please do start a discussion in advance for that one.

Thanks for reviewing!

For future plans, I still like to chase for optimizations and usability first. For example, we need to reuse built module files across opened files. Otherwise every time we open a new file, we need to build all the modules dependent. This can't be tolerant in real workloads. For example, some modules and their dependent modules need 10+ seconds to build. Then it is a nightmare.

For index the PCMs, what's your suggestion and the path to do that? I don't have an idea for that yet.

@ChuanqiXu9
Copy link
Member Author

ChuanqiXu9 commented Jul 18, 2024

Given this is approved, I'd like to land this to give it more baking times. I think the current one is usable as long as the users can be tolerant about the long latency for openning each file. If users want it to be more faster by reusing module files, users can try https://github.com/ChuanqiXu9/clangd-for-modules. The changes there will be contributed to the upstream repo step by step. But the early days testing is pretty meaningful. There will be many dark corners can only be found in practice.

@ChuanqiXu9 ChuanqiXu9 merged commit fe6c240 into llvm:main Jul 18, 2024
7 of 8 checks passed
@@ -161,6 +163,7 @@ clang_target_link_libraries(clangDaemon
clangAST
clangASTMatchers
clangBasic
clangDependencyScanning
Copy link
Contributor

@nico nico Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency makes clangd depend (transitively) on clangCodeGen (and its dependencies, e.g. LLVM_TARGETS_TO_BUILD), which it didn't depend on previously.

This makes clangd take ~30% longer to build, increases its binary size, and it's conceptually strange that a language server contains code generation code.

Any chance we could cut the dependency on clangCodeGen again?

Copy link
Member Author

@ChuanqiXu9 ChuanqiXu9 Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it comes from

PCHContainerOps->registerReader(
std::make_unique<ObjectFilePCHContainerReader>());

but clangd didn't touch this function. So it might be possible to split that. I file an issue to track this: #99479

Harini0924 pushed a commit to Harini0924/llvm-project that referenced this pull request Jul 22, 2024
llvm#66462)

Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas and the consensus so far. We want to have
some initial support in clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
 to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
    > try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
    > build the preamble.

This patch reflects the above opinions.

# Testing in real-world project

I tested this with a modularized library:
https://github.com/alibaba/async_simple/tree/CXX20Modules. This library
has 3 modules (async_simple, std and asio) and 65 module units. (Note
that a module consists of multiple module units). Both `std` module and
`asio` module have 100k+ lines of code (maybe more, I didn't count). And
async_simple itself has 8k lines of code. This is the scale of the
project.

The result shows that it works pretty well, ..., well, except I need to
wait roughly 10s after opening/editing any file. And this falls in our
expectations. We know it is hard to make it perfect in the first move.

# What this patch does in detail

- Introduced an option `--experimental-modules-support` for the support
for C++20 Modules. So that no matter how bad this is, it wouldn't affect
current users. Following off the page, we'll assume the option is
enabled.
- Introduced two classes `ModuleFilesInfo` and
`ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by
`ModuleFilesInfo`.
- The class `ModuleFilesInfo` records the built module files for
specific single source file. The module files can only be built by the
static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef
File, ...)`.
- The class `PreambleData` adds a new member variable with type
`ModuleFilesInfo`. This refers to the needed module files for the
current file. It means the module files info is part of the preamble,
which is suggested in the first patch too.
- In `isPreambleCompatible()`, we add a call to
`ModuleFilesInfo::CanReuse()` to check if the built module files are
still up to date.
- When we build the AST for a source file, we will load the built module
files from ModuleFilesInfo.

# What we need to do next

Let's split the TODOs into clang part and clangd part to make things
more clear.

The TODOs in the clangd part include:
1. Enable reusing module files across source files. The may require us
to bring a ModulesManager like thing which need to handle `scheduling`,
`the possibility of BMI version conflicts` and `various events that can
invalidate the module graph`.
2. Get a more efficient method to get the `<module-name> ->
<module-unit-source>` map. Currently we always scan the whole project
during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`.
This is clearly inefficient even if the scanning process is pretty fast.
I think the potential solutions include:
- Make a global scanner to monitor the state of every source file like I
did in the first patch. The pain point is that we need to take care of
the data races.
- Ask the build systems to provide the map just like we ask them to
provide the compilation database.
3. Persist the module files. So that we can reuse module files across
clangd invocations or even across clangd instances.

TODOs in the clang part include:
1. Clang should offer an option/mode to skip writing/reading the bodies
of the functions. Or even if we can requrie the parser to skip parsing
the function bodies.

And it looks like we can say the support for C++20 Modules is initially
workable after we made (1) and (2) (or even without (2)).
sgundapa pushed a commit to sgundapa/upstream_effort that referenced this pull request Jul 23, 2024
llvm#66462)

Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas and the consensus so far. We want to have
some initial support in clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
 to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
    > try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
    > build the preamble.

This patch reflects the above opinions.

# Testing in real-world project

I tested this with a modularized library:
https://github.com/alibaba/async_simple/tree/CXX20Modules. This library
has 3 modules (async_simple, std and asio) and 65 module units. (Note
that a module consists of multiple module units). Both `std` module and
`asio` module have 100k+ lines of code (maybe more, I didn't count). And
async_simple itself has 8k lines of code. This is the scale of the
project.

The result shows that it works pretty well, ..., well, except I need to
wait roughly 10s after opening/editing any file. And this falls in our
expectations. We know it is hard to make it perfect in the first move.

# What this patch does in detail

- Introduced an option `--experimental-modules-support` for the support
for C++20 Modules. So that no matter how bad this is, it wouldn't affect
current users. Following off the page, we'll assume the option is
enabled.
- Introduced two classes `ModuleFilesInfo` and
`ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by
`ModuleFilesInfo`.
- The class `ModuleFilesInfo` records the built module files for
specific single source file. The module files can only be built by the
static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef
File, ...)`.
- The class `PreambleData` adds a new member variable with type
`ModuleFilesInfo`. This refers to the needed module files for the
current file. It means the module files info is part of the preamble,
which is suggested in the first patch too.
- In `isPreambleCompatible()`, we add a call to
`ModuleFilesInfo::CanReuse()` to check if the built module files are
still up to date.
- When we build the AST for a source file, we will load the built module
files from ModuleFilesInfo.

# What we need to do next

Let's split the TODOs into clang part and clangd part to make things
more clear.

The TODOs in the clangd part include:
1. Enable reusing module files across source files. The may require us
to bring a ModulesManager like thing which need to handle `scheduling`,
`the possibility of BMI version conflicts` and `various events that can
invalidate the module graph`.
2. Get a more efficient method to get the `<module-name> ->
<module-unit-source>` map. Currently we always scan the whole project
during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`.
This is clearly inefficient even if the scanning process is pretty fast.
I think the potential solutions include:
- Make a global scanner to monitor the state of every source file like I
did in the first patch. The pain point is that we need to take care of
the data races.
- Ask the build systems to provide the map just like we ask them to
provide the compilation database.
3. Persist the module files. So that we can reuse module files across
clangd invocations or even across clangd instances.

TODOs in the clang part include:
1. Clang should offer an option/mode to skip writing/reading the bodies
of the functions. Or even if we can requrie the parser to skip parsing
the function bodies.

And it looks like we can say the support for C++20 Modules is initially
workable after we made (1) and (2) (or even without (2)).
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
#66462)

Summary:
Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas and the consensus so far. We want to have
some initial support in clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
 to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
    > try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
    > build the preamble.

This patch reflects the above opinions.

# Testing in real-world project

I tested this with a modularized library:
https://github.com/alibaba/async_simple/tree/CXX20Modules. This library
has 3 modules (async_simple, std and asio) and 65 module units. (Note
that a module consists of multiple module units). Both `std` module and
`asio` module have 100k+ lines of code (maybe more, I didn't count). And
async_simple itself has 8k lines of code. This is the scale of the
project.

The result shows that it works pretty well, ..., well, except I need to
wait roughly 10s after opening/editing any file. And this falls in our
expectations. We know it is hard to make it perfect in the first move.

# What this patch does in detail

- Introduced an option `--experimental-modules-support` for the support
for C++20 Modules. So that no matter how bad this is, it wouldn't affect
current users. Following off the page, we'll assume the option is
enabled.
- Introduced two classes `ModuleFilesInfo` and
`ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by
`ModuleFilesInfo`.
- The class `ModuleFilesInfo` records the built module files for
specific single source file. The module files can only be built by the
static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef
File, ...)`.
- The class `PreambleData` adds a new member variable with type
`ModuleFilesInfo`. This refers to the needed module files for the
current file. It means the module files info is part of the preamble,
which is suggested in the first patch too.
- In `isPreambleCompatible()`, we add a call to
`ModuleFilesInfo::CanReuse()` to check if the built module files are
still up to date.
- When we build the AST for a source file, we will load the built module
files from ModuleFilesInfo.

# What we need to do next

Let's split the TODOs into clang part and clangd part to make things
more clear.

The TODOs in the clangd part include:
1. Enable reusing module files across source files. The may require us
to bring a ModulesManager like thing which need to handle `scheduling`,
`the possibility of BMI version conflicts` and `various events that can
invalidate the module graph`.
2. Get a more efficient method to get the `<module-name> ->
<module-unit-source>` map. Currently we always scan the whole project
during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`.
This is clearly inefficient even if the scanning process is pretty fast.
I think the potential solutions include:
- Make a global scanner to monitor the state of every source file like I
did in the first patch. The pain point is that we need to take care of
the data races.
- Ask the build systems to provide the map just like we ask them to
provide the compilation database.
3. Persist the module files. So that we can reuse module files across
clangd invocations or even across clangd instances.

TODOs in the clang part include:
1. Clang should offer an option/mode to skip writing/reading the bodies
of the functions. Or even if we can requrie the parser to skip parsing
the function bodies.

And it looks like we can say the support for C++20 Modules is initially
workable after we made (1) and (2) (or even without (2)).

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250983
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang-tools-extra clangd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet