[TableGen] Use a more efficient memory buffer for output #123353
TableGen writes all output to an in-memory buffer in case the -write-if-changed option is being used. Using raw_string_ostream for this is inefficient because all the data written so far has to be copied every time the underlying std::string outgrows its capacity and reallocates. Fix this by writing to a custom raw_ostream which stores the buffered data as the concatenation of a vector of strings of increasing capacity. Each string in the vector is never resized beyond its initial capacity to avoid unnecessary copying.
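For readers who want to try the idea outside LLVM, here is a minimal standalone sketch of the same chunked layout. It is not the code from this PR: `ChunkedBuffer` and its member names are hypothetical, and it uses only the C++17 standard library rather than `raw_ostream`. Like the patch, it assumes that moving a `std::string` inside the vector keeps its capacity, which holds on common implementations but is not spelled out by the standard.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the PR's stringvec_ostream, without LLVM.
class ChunkedBuffer {
  std::vector<std::string> Chunks; // a chunk is never grown past its capacity
  std::size_t Total = 0;           // total number of bytes appended so far

public:
  ChunkedBuffer() : Chunks(1) {}

  void append(std::string_view Data) {
    Total += Data.size();

    // Fill whatever room is left in the current chunk.
    std::size_t Room = Chunks.back().capacity() - Chunks.back().size();
    std::size_t Fit = std::min(Data.size(), Room);
    Chunks.back().append(Data.data(), Fit);
    Data.remove_prefix(Fit);
    if (Data.empty())
      return;

    // Open a new chunk at least twice as large as the previous one.
    // Compute the capacity first: emplace_back may invalidate references.
    std::size_t NewCap = std::max(Data.size(), Chunks.back().capacity() * 2);
    Chunks.emplace_back();
    Chunks.back().reserve(NewCap);
    assert(Chunks.back().capacity() >= Data.size());
    Chunks.back().append(Data.data(), Data.size());
  }

  std::size_t size() const { return Total; }
  std::size_t chunkCount() const { return Chunks.size(); }

  // Compare against a contiguous buffer without concatenating the chunks.
  bool equals(std::string_view Other) const {
    if (Total != Other.size())
      return false;
    std::size_t Offset = 0;
    for (const std::string &S : Chunks) {
      if (std::string_view(S) != Other.substr(Offset, S.size()))
        return false;
      Offset += S.size();
    }
    return true;
  }

  // Concatenate all chunks; only needed when a single string is required.
  std::string str() const {
    std::string Result;
    Result.reserve(Total);
    for (const std::string &S : Chunks)
      Result += S;
    return Result;
  }
};
```

The size check at the top of `equals` is what makes the per-chunk `substr` calls safe: once the totals match, every chunk's offset range lies inside `Other`.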
@llvm/pr-subscribers-tablegen
Author: Jay Foad (jayfoad)
Full diff: https://github.com/llvm/llvm-project/pull/123353.diff
1 file affected:
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index 55a99cbfc58acd..88bca04ec19a41 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -37,6 +37,52 @@
#include <utility>
using namespace llvm;
+class stringvec_ostream : public raw_ostream {
+  std::vector<std::string> V;
+
+  size_t Pos = 0;
+  uint64_t current_pos() const override { return Pos; }
+
+  void write_impl(const char *Ptr, size_t Size) override {
+    Pos += Size;
+
+    size_t ThisSize = std::min(Size, V.back().capacity() - V.back().size());
+    V.back().append(Ptr, ThisSize);
+    Ptr += ThisSize;
+    Size -= ThisSize;
+
+    if (Size != 0) {
+      size_t NewCapacity = std::max(Size, V.back().capacity() * 2);
+      V.emplace_back();
+      V.back().reserve(NewCapacity);
+      V.back().append(Ptr, Size);
+    }
+  }
+
+public:
+  stringvec_ostream() : V(1) { SetUnbuffered(); }
+
+  friend raw_ostream &operator<<(raw_ostream &OS,
+                                 const stringvec_ostream &RHS) {
+    for (const std::string &S : RHS.V)
+      OS << S;
+    return OS;
+  }
+
+  bool operator==(StringRef RHS) {
+    if (Pos != RHS.size())
+      return false;
+
+    size_t Offset = 0;
+    for (const std::string &S : V) {
+      if (S != RHS.slice(Offset, Offset + S.size()))
+        return false;
+      Offset += S.size();
+    }
+    return true;
+  }
+};
+
static cl::opt<std::string>
OutputFilename("o", cl::desc("Output filename"), cl::value_desc("filename"),
cl::init("-"));
@@ -130,8 +176,7 @@ int llvm::TableGenMain(const char *argv0,
// Write output to memory.
Timer.startBackendTimer("Backend overall");
- std::string OutString;
- raw_string_ostream Out(OutString);
+ stringvec_ostream Out;
unsigned status = 0;
// ApplyCallback will return true if it did not apply any callback. In that
// case, attempt to apply the MainFn.
@@ -158,7 +203,7 @@ int llvm::TableGenMain(const char *argv0,
// aren't any.
if (auto ExistingOrErr =
MemoryBuffer::getFile(OutputFilename, /*IsText=*/true))
- if (std::move(ExistingOrErr.get())->getBuffer() == OutString)
+ if (Out == std::move(ExistingOrErr.get())->getBuffer())
WriteFile = false;
}
if (WriteFile) {
@@ -167,7 +212,7 @@ int llvm::TableGenMain(const char *argv0,
if (EC)
return reportError(argv0, "error opening " + OutputFilename + ": " +
EC.message() + "\n");
- OutFile.os() << OutString;
+ OutFile.os() << Out;
if (ErrorsPrinted == 0)
OutFile.keep();
}
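The last two hunks touch the compare-then-write step: the buffered output is checked against the existing file, and the file is only rewritten when the contents differ. A rough standalone sketch of that pattern, using only the C++17 standard library, might look like the following. The helper name `writeIfChanged` is hypothetical, and the sketch reads the old file into one string instead of using `MemoryBuffer` and `ToolOutputFile` as `Main.cpp` does.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper, not part of LLVM. Returns true if the file was
// (re)written and false if it already held exactly NewContents.
// I/O errors are not reported in this sketch.
bool writeIfChanged(const std::filesystem::path &Path,
                    const std::string &NewContents) {
  assert(!Path.empty() && "caller must supply an output path");

  {
    // Read the existing file, if there is one, and compare.
    std::ifstream In(Path, std::ios::binary);
    if (In) {
      std::string Old((std::istreambuf_iterator<char>(In)),
                      std::istreambuf_iterator<char>());
      if (Old == NewContents)
        return false; // leave the file and its timestamp alone
    }
  }

  // Contents differ or the file does not exist: overwrite it.
  std::ofstream Out(Path, std::ios::binary | std::ios::trunc);
  Out << NewContents;
  return true;
}
```

Skipping the write leaves the file's modification time untouched, so a timestamp-based build system has no reason to rebuild the targets that depend on it.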
This is mostly just an RFC to see if the idea makes sense, and if anything like my
@jayfoad Would it be acceptable to call llvm-project/llvm/lib/TableGen/Main.cpp Lines 155 to 163 in 69d0c4c
That's a clever trick but it only helps if you're using But I guess if you're not using
I think we should use timestamp based approach rather than file comparison (see gcc's -MD/-MT/-MF/...).
Not sure what you're suggesting. Those gcc options are to do with generating dependencies, but we already get this mostly right. E.g. if I touch
Ah, right. A dependency-generating option already exists (
How much does stringvec_ostream improve the performance?
No, I wasn't thinking that far ahead, but, yes, we could write directly to the output file. As @MaskRay asks, I'm also curious about the performance with this PR. If we simply rely on
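On the performance question raised in the thread, one rough way to see how much copying a single growing `std::string` does is to count the bytes that must be moved each time its capacity changes. The sketch below is only a proxy under stated assumptions: `bytesCopiedByGrowth` is a hypothetical helper, the result depends on the standard library's growth policy, and it measures copy volume rather than wall-clock time. It is not a benchmark of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of LLVM. Appends Count copies of Piece to a
// single std::string and returns how many already-stored bytes had to be
// moved because the string's capacity changed.
std::size_t bytesCopiedByGrowth(const std::string &Piece, std::size_t Count) {
  std::string Buf;
  std::size_t Copied = 0;
  for (std::size_t I = 0; I < Count; ++I) {
    std::size_t OldCapacity = Buf.capacity();
    std::size_t OldSize = Buf.size();
    Buf += Piece;
    // A capacity change means the old contents were copied to new storage.
    if (Buf.capacity() != OldCapacity)
      Copied += OldSize;
  }
  assert(Buf.size() == Piece.size() * Count);
  return Copied;
}
```

Calling it with a piece size and count that resemble a real TableGen output, and comparing the result with the final buffer size, gives a first estimate of the copying the chunked layout is meant to avoid. Timing an actual TableGen run would still be needed to answer the question properly.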