-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[symbolizer] Empty string is not an error #92660
Conversation
After commit 1792852 ([symbolizer] Change reaction on invalid input) llvm-symbolizer issues an error on malformed command instead of echoing it to the standard output, as in previous versions. It turns out this behavior broke a use case when echoing was used to check if llvm-symbolizer is working (llvm@1792852#commitcomment-142161925). With this change an empty line as input is not considered as an error anymore and does not produce any output on stderr. llvm-symbolizer still respond on empty line with line not found, this is consistent with GNU addr2line.
@llvm/pr-subscribers-llvm-binary-utilities Author: Serge Pavlov (spavloff) ChangesAfter commit 1792852 ([symbolizer] Change reaction on invalid input) llvm-symbolizer issues an error on malformed command instead of echoing it to the standard output, as in previous versions. It turns out this behavior broke a use case when echoing was used to check if llvm-symbolizer is working With this change an empty line as input is not considered as an error anymore and does not produce any output on stderr. llvm-symbolizer still respond on empty line with line not found, this is consistent with GNU addr2line. Full diff: https://github.com/llvm/llvm-project/pull/92660.diff 2 Files Affected:
diff --git a/llvm/test/tools/llvm-symbolizer/get-input-file.test b/llvm/test/tools/llvm-symbolizer/get-input-file.test
index 8c21816591c81..fcf31ac8ff306 100644
--- a/llvm/test/tools/llvm-symbolizer/get-input-file.test
+++ b/llvm/test/tools/llvm-symbolizer/get-input-file.test
@@ -1,9 +1,9 @@
# If binary input file is not specified, llvm-symbolizer assumes it is the first
# item in the command.
-# No input items at all, complain about missing input file.
+# No input items at all. Report unknown line, but do not produce any output on stderr.
RUN: echo | llvm-symbolizer 2>%t.1.err | FileCheck %s --check-prefix=NOSOURCE
-RUN: FileCheck --input-file=%t.1.err --check-prefix=NOFILE %s
+RUN: FileCheck --input-file=%t.1.err --implicit-check-not={{.}} --allow-empty %s
# Only one input item, complain about missing addresses.
RUN: llvm-symbolizer "foo" 2>%t.2.err | FileCheck %s --check-prefix=NOSOURCE
@@ -32,8 +32,6 @@ RUN: FileCheck --input-file=%t.7.err --check-prefix=BAD-QUOTE %s
NOSOURCE: ??
NOSOURCE-NEXT: ??:0:0
-NOFILE: error: no input filename has been specified
-
NOADDR: error: 'foo': no module offset has been specified
NOTFOUND: error: 'foo': [[MSG]]
diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
index b98bdbc388faf..4fe1f4505a1b9 100644
--- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
@@ -337,6 +337,13 @@ static void symbolizeInput(const opt::InputArgList &Args,
object::BuildID BuildID(IncomingBuildID.begin(), IncomingBuildID.end());
uint64_t Offset = 0;
StringRef Symbol;
+
+ // Empty input string may be used to check if the process is alive. Do not
+ // emit a message on stderr in this case but respond on stdout.
+ if (InputString.empty()) {
+ printUnknownLineInfo(ModuleName, Printer);
+ return;
+ }
if (Error E = parseCommand(Args.getLastArgValue(OPT_obj_EQ), IsAddr2Line,
StringRef(InputString), Cmd, ModuleName, BuildID,
Symbol, Offset)) {
|
If llvm-symbolizer finds a malformed command, it echoes it to the standard output. New versions of binutils (starting from 2.39) allow to specify an address by a symbols. Implementation of this feature in llvm-symbolizer makes the current reaction on invalid input inappropriate. Almost any invalid command may be treated as a symbol name, so the right reaction should be "symbol not found" in such case. The exception are commands that are recognized but have incorrect syntax, like "FILE:FILE:". The utility must produce descriptive diagnostic for such input and route it to the stderr. This change implements the new reaction on invalid input and is a prerequisite for implementation of symbol lookup in llvm-symbolizer. Differential Revision: https://reviews.llvm.org/D157210
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honetly, I feel like the requested "check that llvm-symbolizer is responding to input by sending it an arbitrary string" is a bit of a hack too: what if llvm-symbolizer were to crash immediately after printing its response? I don't have any great alternative suggestions that meet the use-case though, so I'm willing to let this in, unless any other ideas appear better.
Some alternatives for potential discussion, which may or may not be better:
- Have llvm-symbolizer print something to stdout immediately before starting its waiting for input loop. In this case, the launching process could monitor llvm-symbolizer's stdout via a pipe or similar and when the relevant output has been received, it can assume it is safe to start sending input. We could put this additional output under a switch, so that the tool isn't spouting unnecessary output under other usage.
- Have a "magic" input string that when received causes llvm-symbolizer to print a specific output string to indicate it is ready. I'm not necessarily in favour of this ("magic" stuff , but I could see it being the start of a potential full interpreter functionality, where llvm-symbolizer receives various non-address inputs and performs certain operations on receipt (this could be useful to change output options mid-run, for example).
- Frame challenge: the use case is invalid as a good response doesn't mean that more responses will come from llvm-symbolizer (see above re. crash after a response for one possible example). Instead, the process communicating with llvm-symbolizer should just assume that the process is able to receive input unless its stdout pipe is closed or the process is no longer alive. (To be clear, I'm not saying this challenge is correct, but I want to make sure we're not adding edge case behaviour changes to handle a use case that isn't actually necessary or reliable, so it's more about provoking thoughts on the issue).
Co-authored-by: James Henderson <46713263+jh7370@users.noreply.github.com>
Co-authored-by: James Henderson <46713263+jh7370@users.noreply.github.com>
Actually this change implements this variant. The "maigc" string is just an empty string. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
You can test this locally with the following command:git-clang-format --diff 219476d20fcdb21644944b0c204b4ac6d6ef3760 010d4dca6143c740616fc66a830978575f07abaf -- llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp View the diff from clang-format here.diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
index beb70f2d7a..6d7953f310 100644
--- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
@@ -338,8 +338,9 @@ static void symbolizeInput(const opt::InputArgList &Args,
uint64_t Offset = 0;
StringRef Symbol;
- // An empty input string may be used to check if the process is alive and responding to input. Do not
- // emit a message on stderr in this case but respond on stdout.
+ // An empty input string may be used to check if the process is alive and
+ // responding to input. Do not emit a message on stderr in this case but
+ // respond on stdout.
if (InputString.empty()) {
printUnknownLineInfo(ModuleName, Printer);
return;
|
What does binutils addr2line do in this case? |
A good response means that the client program found the llvm-symbolizer executable and it reached its main loop, which is a useful thing to know on its own regardless of whether later requests fail. A bad response is a strong signal (not a guarantee) that there was a user error and the client program should tell the user to e.g. check their $PATH. A good response rules out this type of user error and means that if we see a crash later it means that there is a bug (either in the client program or llvm-symbolizer) and the client program can respond by telling the user to file a bug report. This is the case whether the crash was caused by responding to the
Looks like it prints |
Thanks. Would my suggestion 1. above (printing something immediately before the loop starts) be sufficient from your point of view? Alternatively suggestion 2 I think would satisfy this, though I'd prefer the input string to cause a response to be a little less magic, e.g. introduce a "ECHO" directive, a bit like we have
I'm largely ambivalent about whether we match GNU addr2line in GNU output mode, so if there's a preference for that, I'm okay with it (though if it makes the code significantly more complex, then that's a different story). As for the LLVM output mode, llvm-symbolizer also historically responded to "arglefargle" with "arglefargle": "\n" wasn't a special case (as far as I understand it) - it was just simply echoing the input whenever it didn't recognise the input as a valid address. I'm reluctant to introduce special behaviour for "\n" simply because it isn't obvious why this special behaviour should exist (assuming no knowledge about specific use cases of course). Hence my preference for one of the two other suggestions I made (both imply how they might be useful by carefully selecting the output/magic string, without further context needed). |
This is already implemented in llvm-symbolizer. The client can send empty line to it and get expected result
Echoing is in the past. We can implement it for empty string, but we should think twice. Now interface of llvm-symbolizer is compatible with addr2line (where functionality intersects) and is consistent. The tool responds to any input and does it uniformly. If the input is something that cannot be mapped into source line, llvm-symbolizer outputs fixed string. If it starts printing something new in such case, some users(scripts) may be confused, somebody's software could be broken. If we decide to make such change, we should mention it in release notes and have solid justification for breaking compatibility. Why |
If we were going to do something else,
I wouldn't say "just". Reading stdout while filtering out "harmless" errors from stderr makes the client program an order of magnitude more complex. For example, we now need to create a pipe for stderr and select on (stdout, stderr) every time we want to read something. It's also not necessarily clear when llvm-symbolizer has finished printing the error. For example, what if there are multiple errors or a single error over multiple lines, or what if llvm-symbolizer "succeeded" but printed a warning? Looks like there are already a couple of cases where llvm-symbolizer will print a warning. So the current interface is really only suitable for client programs that want to display any errors to the user (by not intercepting stderr). If we wanted to make things easier for client programs that want to do something else with errors, there should probably be a separate mode with one output stream that conveys both results and errors.
1792852 caused my llvm-symbolizer-utilizing program to break because it was sending |
Let's go with this then if there aren't any objections? To be clear, I'm happy with either a line starting with Regarding printing to stderr, I can see why that might be a problem under some usage. Probably the simplest solution, if there's a need for it, is to add a new switch that does the "print stderr to stdout" aspect. That should be another PR, if there's a request for it though. |
Compatibility is an important motivation, if we can maintain it for wider set of cases, we probably should. However if compatibility is your primary concern, the responce to empty string must be empty string, as this is the behavior of older versions. More complex patterns, like sending some non-empty commands don't address your concern, do they? |
Thanks, either approach works for me, with a slight preference for echoing the whole line back including the
My primary concern is that there should be a way to do this. I don't have a particularly strong opinion on how we do it, but it seems preferable if we can do it in a way that's backwards compatible. By that I mean that it should be possible for me to write my program so that it accepts the behavior of new versions of llvm-symbolizer as well as older versions. For example, if we adopt either variant of the |
"ECHO\n" looks good to me (compatible with pre 1792852 output).
|
If llvm-symbolizer reads input and sees "ECHO", it could be a command |
That's a minor pain, but I think we can work around this. How about this proposal:
This does have the limitation that it's impossible to simply echo a blank line. I don't have an idea for how to handle this, short of some sort of special character interpretation or similar, e.g. Thoughts? |
A simpler proposal: |
Is there a reason why users, who don't pass
Meaning of a command should be definite and not depend on the content of inspected file.
It would be a case of secret symbol, invisible but significant. We need to be clear, what problem is being solved. Initially it was something like "llvm-symbolizer needs a method of checking its liveness, compatible with the previous versions". Adding a new command does not solve this problem, - if a user is can adapt their scripts for using the new command, they can adapt them for the reply Now we have solutions for this problem:
I would vote for this patch, - it is not invasive, as the specified behavior of llvm-symbolier does not change. However the the last variant is also reasonable. |
Agree.
My original feeling was that I agree that the current patch looks better to me:
|
Honestly, I'm still super unclear why checking a symbolizer process is "alive" is a use case we need/want to support? Can you talk more about this @pcc ? Is this a problem that comes up frequently/is worth the improved ergonomics compared to the failure mode in the absence of this functionality? How/where/why does this come up so often it's worthwhile? |
It comes up often enough that compiler-rt explicitly tries to detect this case: llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_posix_libcdep.cpp Line 194 in 2ba0838
It does so by sleeping for 10ms and checking whether the process is still running. But this is clearly racy: 10ms is not necessarily enough time for the child process to reach the point where it would fail. I think there ought to be a non-racy way to do this. |
What bad thing happens if this isn't checked for, though? (presumably if the process has exited, the next thing you're going to do is try to write to its stdin, and that should fail fast, right?) And even if it can be checked for, as discussed, things can still fail (though perhaps less likely - but then again, what're the common failures we're trying to account for early on? Are they so common that they're worth this extra checking, special case, and extra functionality?) |
I answered it earlier:
Maybe the client program should always tell the user to check their $PATH if the read fails? But it seems like that could be confusing for users if there is an actual bug. |
Perhaps the best thing would be to print the command that was run, and tell the user to ensure that command works? Then if they run it themselves they'll get all the more descriptive output from their shell, etc, that'll help them understand what went wrong? (I guess that's part of the problem, yeah? If one runs the command from a shell you get more informative failures, but if it's just exec'd you only get an exit code and no user-readable description of the problem (file not found, permissions not executable, etc)) |
@spavloff, there was still ongoing discussion with this ticket (from @dwblaikie and myself - I was on PTO much of last week, so hadn't had time to review the latest comments yet). Merging a change while discussion is ongoing, even if there was one approver, is generally frowned upon. I haven't got the time now to read, digest, and respond to the latest version and comments yet, but I think this should be reverted pending the conclusion of these discussions. |
Sorry, I thoufgt that the discussion is over. Added a reverting PR. |
Reverts #92660 It needs more discussion.
As I understand it, the functionality is only needed by a tool to detect if llvm-symbolizer has entered interactive mode and is ready to receive input. I believe this is only possible when
I don't think it would be any more than other instances of options that take an empty string as the value. It's certainly not a "secret symbol", given that we'd document such a behaviour. What do we do about
To be clear, as I understand it adapting to the
Just to be clear, my issue with this particular solution is that it special cases an empty string argument, which I'd prefer to avoid. I'm not strongly opposed to it, but would prefer an alternative approach if we can find one that has consensus.
Incompatibility with addr2line is a non issue: there are several other ways in which llvm-symbolizer consumes input/produces output that is not compatible with addr2line. Specifically printing/echoing an empty string (or more correctly, a blank line, i.e.
I would hope you would vote for your own patch ;) |
If
Imagine a user who tries to reveal a problem using logs. The same (visually) commands produce different results. Yes, documenting this behavior must help, but who reads documentation? If possible, simple actions must be implemented with simple, "obvious" commands, that would not require studying documentation. Empty string is not the best solution. It is also invisible, but it also is obvious. Users expect some harmless reaction on pressing Enter.
All these commands require an argument, separated by one or more spaces. It cannot be a symbol name, at least
Exactly!
Agree, if we could invent something suitable instead of empty string, it could be a solution. The only possibility that comes to mind is ECHO with mandatory argument. It does not seem obvious and may be a source of troubles for a user, it they omit the argument, because in this case
Yes, Maybe we could consider this functionality from two viewpoints. First, the reaction on blank line. It cannot represent a symbol, so printing error message on stderr probably is not right. Second, a way to check if |
Sorry, you're right. I misunderstood some behaviour I was seeing when experimenting locally with it. In that case, I don't think the idea of using In that case, as I don't have any alternative suggestions, I'm okay with you relanding the original patch. @dwblaikie any objections? |
Blank lines get echoed? Yeah, I'm OK enough with it. |
This patch implements a bit different solution: llvm-symbolizer reacts on blank line as on unknown symbol, - it prints |
@dwblaikie Do you agree with the proposed behavior (printing |
Sure, I'm good with it along with @jh7370 |
This is recommit of #92660, reverted in #94424. Original commit message is below. After commit 1792852 ([symbolizer] Change reaction on invalid input) llvm-symbolizer issues an error on malformed command instead of echoing it to the standard output, as in previous versions. It turns out this behavior broke a use case when echoing was used to check if llvm-symbolizer is working (1792852#commitcomment-142161925). With this change an empty line as input is not considered as an error anymore and does not produce any output on stderr. llvm-symbolizer still respond on empty line with line not found, this is consistent with GNU addr2line.
This is recommit of llvm#92660, reverted in llvm#94424. Original commit message is below. After commit llvm@1792852 ([symbolizer] Change reaction on invalid input) llvm-symbolizer issues an error on malformed command instead of echoing it to the standard output, as in previous versions. It turns out this behavior broke a use case when echoing was used to check if llvm-symbolizer is working (llvm@1792852#commitcomment-142161925). With this change an empty line as input is not considered as an error anymore and does not produce any output on stderr. llvm-symbolizer still respond on empty line with line not found, this is consistent with GNU addr2line.
After commit 1792852 ([symbolizer] Change reaction on invalid input) llvm-symbolizer issues an error on malformed command instead of echoing it to the standard output, as in previous versions. It turns out this behavior broke a use case when echoing was used to check if llvm-symbolizer is working
(1792852#commitcomment-142161925).
With this change an empty line as input is not considered as an error anymore and does not produce any output on stderr. llvm-symbolizer still respond on empty line with line not found, this is consistent with GNU addr2line.