-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[lldb] colorize symbols in image lookup with a regex pattern #69422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
So, the automatic checkers here are going to complain about formatting and style - ignore them for now. We'll review all that once the important stuff is done. You should put this PR into "draft" mode, there is a button somewhere on the page (that only you can see). Just to make it clear to other reviewers. I have a few major things to mention, I wouldn't try to handle these all at once. Tackle them incrementally. I think you could extend the existing Dump functions by adding a new new name argument with a default value of nullptr. llvm-project/lldb/include/lldb/Target/Target.h Line 1193 in 9322a0c
For an example in tree. This means that all existing callers of all these dump functions can have the same result. Inside the functions you can say By doing that, you do not need to make new versions of each dump function. Second, I'm looking at this printRed function thinking that this is a thing lldb must have done already elsewhere. That we'd have a utility function for. And we do.
That file does essentially what you're doing using it. You'd use the strings found here instead of the ASNI_... So then you don't need to duplicate printRed everywhere. To be clear, I'm not criticizing you for not knowing this. Once you've been around a while you get a sense for these things and, well, it's also what reviewers are for :). Which brings me onto the fact that this feature will have to respect the
And you can test it out manually. Next I think you'll benefit from writing some tests. Best thing to do is adapt existing ones. The tests for There are other shell tests that check colour (I'm British, color :) ) output, or lack of. If you search To run just this test:
You showed some example output on Discourse and that's a decent starting point for the checks themselves. You had one where there was one match, many matches, etc. that's about all you'd need. And the reason I say to get some tests done is it'll preserve your sanity if you want to refactor the implementation. So my final thing is, I'm seeing This might be too big of a change for this feature though, so let's deal with everything else first. As it's possible that the symbol table breaks after the first match in the string, so it wouldn't be able to return the full set of matches anyway. So it may mean adding an interface that does (in Python terms) Anyway, that'll seem like a lot but take it one step at a time and feel free to update this as you go and/or ask more questions. Great work so far! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi David! We're done with point no. 1 i.e., extending the existing dump functions by adding name (searched symbol) argument in the function definition. Can you please have a look at the changes. Also here I'm doing a lot of this in these dump functions:
if(name) {Pass name to next dump function}
else {Use the next dump function without name}
Instead of this, should I call the next dump function with name parameter and handle the name=nullptr in the PrintRed function. Because at the end, the program is going to land in the PrintRed function in the case of if. Even if I land the program at the PrintRed function in case of else as well, will it be okay?
Then instead of having writing if else in every dump function, I'll have to write this in PrintRed function only:
if(name == nullptr){print without Red}
else(){Print with Red}
Also, Can you please confirm if the current version is done the right way as per you suggested :)
Next: We will be working on point 2 (updating PrintRed function) and point 3 (Making test cases) and will update you shortly.
lldb/source/Core/Address.cpp
Outdated
pointer_sc.DumpStopContext(s, exe_scope, so_addr, true, false, | ||
false, true, true, name); | ||
else | ||
pointer_sc.DumpStopContext(s, exe_scope, so_addr, true, false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can simplify this a lot.
If if (name)
is true, then name
is not nullptr. So:
pointer_sc.DumpStopContext(s, exe_scope, so_addr, true, false,
false, true, true, name);
Is fine. Ok so far.
In the else
, name
must be nullptr
. DumpStopContext
's name parameter defaults to nullptr
anyway, so by passing it here too, you wouldn't create any problems.
So the final code could be:
if(name)
pointer_sc.DumpStopContext(s, exe_scope, so_addr, true, false,
false, true, true, name);
else
pointer_sc.DumpStopContext(s, exe_scope, so_addr, true, false,
Here you can simplify this a lot.
If if (name)
is true, then name
is not nullptr. So:
pointer_sc.DumpStopContext(s, exe_scope, so_addr, true, false,
false, true, true, name);
Is fine. Ok so far.
In the else
, name
must be nullptr
. DumpStopContext
's name parameter defaults to nullptr
anyway, so by passing it here too, you wouldn't create any problems.
So the final code could be:
pointer_sc.DumpStopContext(s, exe_scope, so_addr, true, false,
false, true, true, name);
If name is nullptr, we pass a nullptr, if it isn't, we pass it on. DumpStopContext
is happy with either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may help you see what's going on behind the scenes: https://godbolt.org/z/vxc56798P
You can ignore the assembly specifics but for the fact that fn2 and fn3 generate identical code. Just in the source code one "explicitly" passes nullptr, and one "implicitly" passes it.
lldb/source/Symbol/SymbolContext.cpp
Outdated
@@ -185,6 +226,111 @@ bool SymbolContext::DumpStopContext(Stream *s, ExecutionContextScope *exe_scope, | |||
return dumped_something; | |||
} | |||
|
|||
void SymbolContext::GetDescription(Stream *s, lldb::DescriptionLevel level, | |||
Target *target, const char* name) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we need a second copy of this function? I assume it existed prior.
const std::string reset_color = ANSI_ESC_START + std::to_string(ANSI_CTRL_NORMAL) + ANSI_ESC_END; | ||
|
||
const char *match = text; | ||
size_t name_len = strlen(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.cppreference.com/w/c/string/byte/strlen
Specifically:
The behavior is undefined if str is not a pointer to a null-terminated byte string.
I won't go into what that means according to the standard but here specifically, if clang knew a call to PrintRed took a nullptr always, it could make essentially choose the result of strlen to be almost anything. Which is not what we want.
So, as you suggested in your comment on the PR page, starting with some if a nullptr then just print it out normally
is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.cppreference.com/w/c/string/byte/strstr
The behavior is undefined if either str or substr is not a pointer to a null-terminated byte string.
You see the theme, a lot of these functions do not check for you.
It may be helpful that a pointer to the empty string ""
will not be nullptr, a nullptr literally cannot point to anything, it's an invalid pointer (sort of, see the standards if you want the super specific language).
The point being that when you're coming from a language without strings that can be null and can only be empty string or some characters, that difference can be confusing.
Cool, some good improvements so far.
It would be a good idea to do this yes. I left some comments suggesting things to check there. I would do that and remove the redundant |
|
||
# CHECK: 3 symbols match the regular expression 'ma' in {{.*}} | ||
# The [[ confuses FileCheck so regex match it. | ||
# CHECK-NEXT: Name: {{.+}}31mma{{.+}}0min.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the lines that are supposed to have colour are important here, and ordering isn't really an issue.
Keep the "3 symbols match" check, just as a sanity check in case the output changes, but then only check for the lines with colour (you won't need -NEXT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, ordering is something we want to be correct - but that's tested by other tests. You can assume it will be correct.
UNSUPPORTED: system-windows | ||
|
||
# RUN: %clang_host -g %S/Inputs/main.c -o %t | ||
# RUN: %lldb %t -b -o 'image lookup -r -s ma' | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have brought this up earlier but the way you're coloring the output doesn't actually do a regex match, it looks for substrings. This works fine if the input is itself basically a substring like here, but would not for m[abc]
.
(keep these test lines they're good, but add more to cover this)
So you'll need some way to either do the matching again while formatting, or pass instead of name
, a list of matched locations from the initial search.
I suggest you start by doing the first idea, redo the regex match in the formatting function. That'll give you an idea of what data you'd require, if we were to instead pass the result of the first matching to the formatter.
Hi David, I thought writing a few simple and very basic test cases will be better before updating the
First test is a basic regex symbol search for "ma" Also, we've incorporated the previous feedback. Next we'll update the
Can you please confirm weather the test cases good for now and the future plan looks okay? ;) P.S: Currently The test case tests/Shell/Commands/command-target-modules-lookup is failing as it is not checking for ANSI color codes in the output. Once we update the |
Sounds good but see my comment about substrings vs. regex. If you go and special case all the characters you're gonna end up reinventing a regex engine, but you don't need to do that, we have functions to handle that already. |
Hi David! I've done the following:
Can you please confirm if passing the pointer to |
Yes but it can be better :) See my comments.
I suggested two more test cases, otherwise they look good to me. |
Hi David!
I searched for the function |
My point is not that It's that passing Imagine you are reading one of those first calls and you see it passes in an interpreter pointer. You would have to assume it could call anything on that object. This means you've got to go and read the entire set of calls to understand what happens to it. Ok that's not a problem now but what if I want to refactor interpreter, or this code in general? If instead, that call just did ``interpreter->GetDebugger().GetUseColor()` and passed the result down 2/3/4 levels of calls, that's much clearer. Now we know it only needs that one property. (and as it turns out, you might not even need another parameter for it, see my comment about the value of |
Also side note: if you ever find yourself passing a pointer that is never going to be nullptr, use a reference instead. This also helps refactoring later. |
I made an example to illustrate: https://godbolt.org/z/zxdq5jMGT The Also, anyone writing tests for the ...hope all of that makes some sense :) |
Hi David!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently I didn't submit the last set of review comments sorry about that.
The one about vector is the key one this time.
return; | ||
} | ||
|
||
bool use_color = interpreter->GetDebugger().GetUseColor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
If we only need this one part of the CommandInterpreter, we should only be passing that one thing down.
Try to find the earliest place you can to do the GetUseColor call, then pass the result down to here. -
You can combine that approach with one observation, that if
name
is nullptr, this will never use colour regardless of the setting. Therefore instead of having another parameter you could just set name to nullptr if colours are disabled.
As in:
- No regex search - name is nullptr, colour setting is unused
- Regex search - name is not nullptr, colour setting must be read
So if early in the callstack you know that GetUseColor returns false, you could just pass name=nullptr. No extra parameters needed. So it becomes something like:
interpreter->GetDebugger().GetUseColor() ? name : nullptr;
std::sregex_iterator end; | ||
|
||
std::string red_start = lldb_private::ansi::FormatAnsiTerminalCodes("${ansi.fg.red}", use_color); | ||
std::string reset_color = lldb_private::ansi::FormatAnsiTerminalCodes("${ansi.normal}", use_color); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally you'd use this by making a format string like:
${ansi.fg.red}%s${ansi.normal}
Then the function will remove the ansi bits as needed and you can use it as a format string to printf. strm.Printf(fmt, ...)
.
What you have isn't wrong, but changing it will save a call or two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though on second thought, doing it your way saves making a new string later when you do strm.Write(text + match.position(), match.length());
.
So leave it as is for now.
strm.PutCString(reset_color.c_str()); | ||
|
||
last_pos = match.position() + match.length(); | ||
++next; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally prefer a for loop for this sort of thing, so that one doesn't forget this last bit.
for ( ; next != end; ++next) {
...
}
//=========================================================================================== | ||
|
||
// This function is the one which colorizes the regex symbol searched | ||
static void PrintRed(Stream &strm, const char *text, const char *name, CommandInterpreter *interpreter= nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider changing name
to re_pattern
, something that tells us that it is in fact a regex.
|
||
bool use_color = interpreter->GetDebugger().GetUseColor(); | ||
|
||
std::string str_text(text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can avoid some string copies by:
- Passing the StringRef to this function, instead of converting it to std::string and then getting a char* from that.
- Then using StringRef's
.begin
and.end
in thenext
iterator. If it doesn't like the type you can build the begin/end from.bytes_begin
and.bytes_end
instead. Those give you char* instead.
Basically StringRef is a "view" onto a string, so as long as we know where it ends we can iterate over that instead of a new copy of it.
# CHECK4: Name: Scrt{{.+}}31m1{{.+}}0m.o | ||
# CHECK4: Summary: {{.+}}`completed.{{.+}}31m0{{.+}}0m | ||
# CHECK4: Name: __libc_start_main@GLIBC_{{.+}}31m2{{.+}}0m_{{.+}}31m3{{.+}}0m{{.+}}31m4{{.+}}0m | ||
# CHECK4: Name: __cxa_finalize@GLIBC_{{.+}}31m2{{.+}}0m_{{.+}}31m2{{.+}}0m_{{.+}}31m5{{.+}}0m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one will be a bit tricky running on all the OS we support. I suggest you keep your tests to symbols we know are defined in the user's program.
As if I go to FreeBSD for example, there will be no glibc in the process' address space, it'll be the FreeBSD libc with different names.
So stick to names from main.c.
# CHECK3: Name: {{.+}}31mma{{.+}}0min.c | ||
# CHECK3: Summary: {{.+}}`{{.+}}31mma{{.+}}0min at main.c:2 | ||
# CHECK3: Summary: {{.+}}`___lldb_unnamed_sy{{.+}}31mmb{{.+}}0mol36 | ||
# CHECK3: Summary: {{.+}}`___lldb_unnamed_sy{{.+}}31mmb{{.+}}0mol37 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to here is fine, I think you can drop the last test. It's fine to assume the regex engine works as expected, as long as you've demonstrated you've called it properly, which you have.
# CHECK3: 5 symbols match the regular expression 'm[abc]' in {{.*}} | ||
# CHECK3: Name: {{.+}}31mma{{.+}}0min.c | ||
# CHECK3: Summary: {{.+}}`{{.+}}31mma{{.+}}0min at main.c:2 | ||
# CHECK3: Summary: {{.+}}`___lldb_unnamed_sy{{.+}}31mmb{{.+}}0mol36 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also these __lldb_unnamed
tend to come from inlined functions or libc functions we don't have debug info for. Don't add checks for them as they'll vary by platform.
So I would say here to maybe drop the __lldb lines and not check for the 5 symbols match
line either. It's not part of what you're testing here.
@@ -0,0 +1,31 @@ | |||
UNSUPPORTED: system-windows | |||
|
|||
# RUN: %clang_host -g %S/Inputs/main.c -o %t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can, a test where the regular expression matches at the very end of the name would be a good addition.
This would check the tail end of your PrintRed loop, where you print any remaining text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also just for sanity checking, add one where you don't match anything at all. It shouldn't do any matching or even attempt to, so the test would just ensure that it doesn't try and end up crashing.
The classic "nothing in, nothing out" test case. So it will match 0 symbols but that's what we expect.
lldb/include/lldb/Core/Address.h
Outdated
@@ -247,7 +247,17 @@ class Address { | |||
bool Dump(Stream *s, ExecutionContextScope *exe_scope, DumpStyle style, | |||
DumpStyle fallback_style = DumpStyleInvalid, | |||
uint32_t addr_byte_size = UINT32_MAX, | |||
bool all_ranges = false) const; | |||
bool all_ranges = false, | |||
std::vector<std::pair<bool, const char*>>* info = nullptr) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna choose this one as the example. You're on the right lines again but implementation is a bit odd. Let's take it bit by bit.
You're using a pointer not a reference, which makes some sense as you can't default construct a reference. So just FYI in future, try to use const &
if you are never going to pass a nullptr
and can default construct the type (or better, use optional
).
Next bit, pair
is a decent choice but I don't see the need for a vector here. A single pair would do, so std::pair<bool, const char*>*
would suffice.
If you were to do that, you could default construct the parameter and pass via copy. No reference or pointer needed (and pointer/bool are small types cheap to pass by copy).
So you may choose to apply those idea maybe not after I tell you the next bit, which makes them academic.
If you were to write out a table of values of name
(aka the pattern) and whether we're using colour, what would it look like? This:
const char* name | use_colour | Do we use name? |
---|---|---|
non-null | false | No |
nullptr | false | No |
non-null | true | Yes |
nullptr | true | No |
(This is a https://en.wikipedia.org/wiki/Truth_table if you haven't seen one before)
Now look where we use the name. What's the equation for that state?
if (name is non-null) and (use_colour is true)
Which means that any function currently taking this pair/vector could just take name
. Except that we change the value of name depending on the value of use_colour
.
If name
is non-null but colour is disabled, make name
nullptr. The result is the same, but functions only need name
to know what to do.
Example: https://godbolt.org/z/1W3EWTPqa
By doing this you can combine name and use_colour earlier in the callstack, and only pass name to the subsequent functions.
If we later added a way to highlight names that did not use colour, then yes, you would need a separate bool. But that is not what you're doing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to put this is:
- When
name
is nullptr, we won't print any colours. - When
use_colour
is false, we won't print any colours.
So why not use name == nullptr
to also mean that colour is disabled? The effect is the same.
Oh and good work on the updates! (because that feedback is probably hard to see with all these comments :) ) |
Hi David!
As the program should only go inside the re-pattern (new name of printRed ;-) function if use-color is true and name is not null. Hence if this condition
Was facing some issues while checking for empty string using:
But getting this error:
Was having issues while using both But I'll look into both of these remaining things. For the test case maybe I will try to check the length of the output to be zero and for String reference I'll look more in to the |
Added the wrong test case in the previous commit by mistake. Have removed it in the new one :/ Sorry for the inconvenience |
Hi @JDevlieghere! did you got a chance to look at the code 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an awesome feature, thank you for working on this! I left a few comments but overall this PR is in pretty good shape.
lldb/include/lldb/Core/Address.h
Outdated
uint32_t addr_byte_size = UINT32_MAX, | ||
bool all_ranges = false) const; | ||
uint32_t addr_byte_size = UINT32_MAX, bool all_ranges = false, | ||
const char *pattern = nullptr) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is pattern
a const char*
and not a StringRef with an empty string as the default argument? Or potentially even better an std::optional<llvm::Regex>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to StringRef
. I could not make it work with std::optional<llvm::Regex>
. Maybe I was doing something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What errors did you get?
Remember that optionals are "pointer like" when you want to get at the value within them, *foo
, foo->method()
etc. that trips me up sometimes. Also the default value would be = std::nullopt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize, I should have noted the errors. But as far as I remember, it was related to use of deleted functions and type conversions. I also tried passing the parameter as a reference thinking the error could be caused by copying.
Remember that optionals are "pointer like" when you want to get at the value within them, *foo, foo->method() etc. that trips me up sometimes.
Hmm, I was trying to get pattern.value()
, perhaps that was part of the problem.
Also the default value would be = std::nullopt
At least I got that right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbf I had to look up nullopt, I don't see it much. .value
returns a reference to the contained value (assuming it has one), ->
does the same so either is fine.
Can you post the error message(s)? Don't need to update the PR, just describe how you're constructing and passing it, that'll likely be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DavidSpickett . Tried this incremental approach:
static void test_func1(Stream &strm, llvm::Regex *regex_param = nullptr) {
strm.Printf("Inside test_func1\n");
}
static void test_func2(Stream &strm, llvm::Regex ®ex_param) {
strm.Printf("Inside test_func2\n");
}
static void test_func3(Stream &strm, const std::optional<llvm::Regex> regex_param = std::nullopt) {
strm.Printf("Inside test_func3\n");
}
static void test_func4(Stream &strm, const std::optional<llvm::Regex> ®ex_param = std::nullopt) {
strm.Printf("Inside test_func4\n");
}
static uint32_t LookupSymbolInModule(CommandInterpreter &interpreter,
Stream &strm, Module *module,
const char *name, bool name_is_regex,
bool verbose, bool all_ranges) {
llvm::Regex regex1(name);
std::optional<llvm::Regex> regex2 = std::make_optional<llvm::Regex>(name);
test_func1(strm, ®ex1);
test_func2(strm, regex1);
// test_func3(strm, regex2);
test_func4(strm, regex2);
...
...
}
In test1, used the same logic like used previously i.e., function accepting pointer to llvm::Regex
and default value is null pointer.
In test2 passed llvm::regex
by reference but could not give a default value.
In test 3 used the std::optional<llvm::Regex>
but was getting the error mentioned above i.e., use of deleted functions. The exact error is:
error: use of deleted function ‘std::optional<llvm::Regex>::optional(const std::optional<llvm::Regex>&)’
1611 | test_func3(strm, regex2)
I believe it's due to the deletion of copy constructor which you mentioned.
In test 4 passing the reference of std::optional<llvm::Regex>
to the function and giving nullopt
as default value. It is working fine as expected.
I'm a little confused. Right now, we're passing the pattern as llvm::StringRef
, and finally when we reach PutCStringColorHighlighted
we convert it to llvm::Regex
:
llvm::Regex reg_pattern(pattern);
What will be the benefit of converting const char* name
to std::optional<llvm::Regex>
from the very beginning and than passing it till we reach PutCStringColorHighlighted
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't speak for Jonas here so just my interpretation -
If you pass this as llvm::Regex, you're using the type system to signal to anyone reading this code that this is a regex pattern. Whereas using const char*
and a comment that it is a regex could be missed by the reader.
test_func3 is the ideal here (though you may have to remove const, you'll find that out later). Deleting the copy constructor is usually done when you have a move only type, which is usually something that contains a resource where copying it doesn't make sense.
I think something like test_func3(strm, std::move(regex2));
might fix it. At least, that's the obvious way to try to move construct the regex inside the optional.
Can you pass std::make_optional<llvm::Regex>(name)
to func3 maybe?
If all this is too much hassle in the end, using a pointer is the fallback option. As long as the lifetimes are ok which I'm guessing they are. By which I mean, nothing earlier in the callstack is going to destroy the pattern before you end up using it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DavidSpickett! Sorry for the delay 🙏
test_func3(strm, std::move(regex2))
works but it gives the following problem.
When we use std::move(regex2)
to pass it to test_func3
, it's essentially moved, and after the move, the regex2
in LookupSymbolInModule
becomes garbage.
So in functions where pattern is moved multiple times like this:
static void DumpAddress(ExecutionContextScope *exe_scope,
const Address &so_addr, bool verbose, bool all_ranges,
Stream &strm, std::optional<llvm::Regex> pattern = std::nullopt) {
...
...
so_addr.Dump(&strm, exe_scope, Address::DumpStyleResolvedDescription,
Address::DumpStyleInvalid, UINT32_MAX, false, std::move(pattern));
...
...
if (verbose) {
strm.EOL();
so_addr.Dump(&strm, exe_scope, Address::DumpStyleDetailedSymbolContext,
Address::DumpStyleInvalid, UINT32_MAX, all_ranges, std::move(pattern));
}
...
}
The second std::move(pattern)
, does not move the original pattern value but some garbage value. The only way is to make a copy of the original pattern and then use it with std::move(copy_pattern)
every time (which doesn't look okay :-/).
This is the example code and output if I couldn't explain it properly above:
CODE:
static void test_func3(Stream &strm, const std::optional<llvm::Regex> pattern = std::nullopt) {
if (pattern.has_value() && pattern.value().match("main")) {
strm.Printf("Pattern matches 'main' inside test_func3\n");
}
}
static uint32_t LookupSymbolInModule(CommandInterpreter &interpreter,
Stream &strm, Module *module,
const char *name, bool name_is_regex,
bool verbose, bool all_ranges) {
// test_func1(strm, std::make_optional<llvm::Regex>(name));
std::optional<llvm::Regex> regex2 = std::make_optional<llvm::Regex>(name);
if (regex2.has_value()) {
strm.Printf("Pattern has a value\n");
if(regex2.value().match("main")) {
strm.Printf("Pattern matches main\n");
}
else{
strm.Printf("Pattern doesn't match main\n");
}
} else {
strm.Printf("Pattern doesn't have a value\n");
}
test_func3(strm, std::move(regex2));
if (regex2.has_value()) {
strm.Printf("Pattern has a value\n");
if(regex2.value().match("main")) {
strm.Printf("Pattern matches main\n");
}
else{
strm.Printf("Pattern doesn't match main\n");
}
} else {
strm.Printf("Pattern doesn't have a value\n");
} ...
...
}
OUTPUT:
talha@talha-ROG-Strix-G513QE-G513QE:~/Desktop/LLDB/build/bin$ ./lldb a.out
(lldb) target create "a.out"
Current executable set to '/home/talha/Desktop/LLDB/build/bin/a.out' (x86_64).
(lldb) image lookup -r -s main
Pattern has a value
Pattern matches main
Pattern matches 'main' inside test_func3
Pattern has a value
Pattern doesn't match main
...
...
So after std::move(pattern)
, pattern != "main"
within the same function
Similarly passing std::make_optional<llvm::Regex>(name)
to func3 also causes problem problem:
i.e., test_func3(strm, std::make_optional<llvm::Regex>(name));
When we directly pass std::make_optional<llvm::Regex>(name)
, it creates a temporary std::optional<llvm::Regex>
object that is passed directly to test_func3
. This temporary object can be moved without requiring an explicit std::move
because I guess temporaries are considered rvalues, and std::optional
has appropriate move constructors to handle them.
But storing the result of std::make_optional<llvm::Regex>(name)
in a variable and then passing it without using std::move
, if the variable is passed as an argument to the function, it attempts to perform a copy unless we use std::move
. So, initially we can pass it directly to the next dumping function but further passing it will require std::move
and the same problem comes back.
Also 1 last thing for clarification, as discussed later i.e.,
I think we may be able to reduce the boilerplate by putting the prefix/suffix/pattern in a struct, and passing that as an optional assuming we can figure out the move issue with regex. But this is for a future change if at all.
The struct will be optional and not the member variables inside it as we can check them in the very first function right?
So the struct will look like:
struct information {
pattern, prefix, sufix
}
And information will be passed as optional argument rather than
struct information {
optional pattern = EMPTY, optional prefix= EMPTY, optional sufix = EMPTY
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we use std::move(regex2) to pass it to test_func3, it's essentially moved, and after the move, the regex2 in LookupSymbolInModule becomes garbage.
This is where I get on my Rust powered soapbox and preach about C++ move not being what you'd think it is. :)
Good research anyway. Sound like it's what I suspected is that the regex is carrying around the match "engine"'s state.
On the struct, yes, the struct itself would be optional. If you do pass an instance of it you must provide all 3.
To be honest, once we're packing all this into a struct, that's quite an obvious signal to the programmer that this is no ordinary string. So you could do just try the struct idea without using the Regex
type.
And ok, it's not perfect because all 3 struct members have the same type still, but if making the pattern a Regex
is this much trouble it's probably not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in code that would be roughly:
struct information {
pattern, prefix, sufix
}
PutCStringHighlighted(....., std::optional<information> pattern_info);
Which tells the reader clearly that if they want to provide any one of the 3 they must provide all 3. Otherwise they must provide none of them.
@@ -1593,6 +1595,7 @@ static uint32_t LookupSymbolInModule(CommandInterpreter &interpreter, | |||
return 0; | |||
|
|||
SymbolContext sc; | |||
bool use_color = interpreter.GetDebugger().GetUseColor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike in llvm we do const
variables (and there's examples of that in the surrounding code) so let's make this const bool use_color
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
lldb/source/Utility/Stream.cpp
Outdated
return; | ||
} | ||
|
||
// If pattern is not nullptr, we should use color |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: seems more informative to have the inverse of this comment at the start of the function where we have an early return for the trivial case of not having a patter to highlight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Changed this.
lldb/source/Utility/Stream.cpp
Outdated
std::string format_str = lldb_private::ansi::FormatAnsiTerminalCodes( | ||
"${ansi.fg.red}%.*s${ansi.normal}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most (all?) other colors are configurable. I think this one should be too. Some examples of this are the show-autosuggestion-ansi-prefix
and show-progress-ansi-prefix
. This pattern should work here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I got this suggestion right. What I did was:
- Defined these properties in
lldb/source/Core/CoreProperties.td
:
def ShowRegexMatchAnsiPrefix: Property<"show-regex-match-ansi-prefix", "String">,
Global,
DefaultStringValue<"${ansi.fg.red}">,
Desc<"When displaying a regex match in a color-enabled terminal, use the ANSI terminal code specified in this format immediately before the match.">;
def ShowRegexMatchAnsiSuffix: Property<"show-regex-match-ansi-suffix", "String">,
Global,
DefaultStringValue<"${ansi.normal}">,
Desc<"When displaying a regex match in a color-enabled terminal, use the ANSI terminal code specified in this format immediately after the match.">;
- Declared the getter functions in
Debugger.h
and defined them inDebugger.cpp
following the pattern:
llvm::StringRef Debugger::GetRegexMatchAnsiPrefix() const {
const uint32_t idx = ePropertyShowRegexMatchAnsiPrefix;
return GetPropertyAtIndexAs<llvm::StringRef>(
idx, g_debugger_properties[idx].default_cstr_value);
}
llvm::StringRef Debugger::GetRegexMatchAnsiSuffix() const {
const uint32_t idx = ePropertyShowRegexMatchAnsiSuffix;
return GetPropertyAtIndexAs<llvm::StringRef>(
idx, g_debugger_properties[idx].default_cstr_value);
}
- Added two parameters to
Stream::PutCStringColorHighlighted
:prefix
andsuffix
. - In
Address.cpp
andSymbol.cpp
, there was aTarget
object that I used to get the debugger and its properties. InCommandObjectTarget.cpp
, there was anCommandInterpreter
object that was used for the same. But inSymbolContext.cpp
, there was aTargetSP
object and I tried using it to get the debugger properties, I got a segmentation fault, that's why I put the following two lines hard coded (while we don't solve this):
llvm::StringRef ansi_prefix = "${ansi.fg.red}";
llvm::StringRef ansi_suffix = "${ansi.normal}";
Please let me know if there's a better way to do this or if I completely misunderstood the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I was going to add a test case for different colors, but I will only add when that error is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to get it from the exe_scope like this does:
bool Address::Dump(Stream *s, ExecutionContextScope *exe_scope, DumpStyle style,
DumpStyle fallback_style, uint32_t addr_size,
bool all_ranges) const {
But you won't always have a target. Which is a bit odd because there'd always be a settings object somewhere, if only you could get to it.
Seems like symbolcontext can have a target or not, and methods that definitely need one have it as a parameter. So that's the other way to go, add a parameter for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have sworn I did this yesterday and it did not work. But I tried (again ?) and it works now. Just checking if that's okay... This is how it is now:
llvm::StringRef ansi_prefix = exe_scope->CalculateTarget()->GetDebugger().GetRegexMatchAnsiPrefix();
llvm::StringRef ansi_suffix = exe_scope->CalculateTarget()->GetDebugger().GetRegexMatchAnsiSuffix();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, before pushing these, check-lldb
failed because of this change.
You can check the log here: https://pastebin.com/atfvLauV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's either find another way to get to the settings, or assume no colour if the target isn't present.
Jonas may know a way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on this approach? (In SymbolContext.cpp
- line 98). I tested it and it only fails on the new test this PR creates, so we would only need to adapt it.
if (name) {
llvm::StringRef ansi_prefix;
llvm::StringRef ansi_suffix;
if (target_sp) {
ansi_prefix = target_sp->GetDebugger().GetRegexMatchAnsiPrefix();
ansi_suffix = target_sp->GetDebugger().GetRegexMatchAnsiSuffix();
}
s->PutCStringColorHighlighted(name.GetStringRef(), pattern, ansi_prefix,
ansi_suffix);
}
Or even forcing the red color when there is no target? Honestly, I do not completely understand what would mean not having a target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume if we have no target set we won't know whether to colour anything, so we'd default to no colours. Which means the prefix/suffix can default to empty string. Which is exactly what you've suggested.
Worst case in some situation it doesn't highlight and if the user cares they can open an issue for it.
I expect the SymbolContext can live without a target in some scenario, if you want to find out what, find somewhere in SymbolContext that null checks target, remove the check and see what tests fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed @JDevlieghere suggestions.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize it was going to be so tedious to pass the prefix and suffix down, but I still think it's the right thing to do, so thanks for making that work. It would be awesome if we can make the optional<Regex>
work, but that's not a blocker so this LGTM.
lldb/source/Utility/Stream.cpp
Outdated
if (pattern.empty()) { | ||
PutCString(text); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the prefix and suffix are empty, we can also take this shortcut, right?
@JDevlieghere thanks for reviewing. Once Jonas' last comment is addressed let me know if you're happy with the changes and PR message and I can merge this for you. I think we may be able to reduce the boilerplate by putting the prefix/suffix/pattern in a struct, and passing that as an optional assuming we can figure out the move issue with regex. But this is for a future change if at all. |
In the last commit:
I had to copy the same statements from line 99 here. Is it better if I declared llvm::StringRef ansi_prefix;
llvm::StringRef ansi_suffix;
if (target_sp) {
ansi_prefix = target_sp->GetDebugger().GetRegexMatchAnsiPrefix();
ansi_suffix = target_sp->GetDebugger().GetRegexMatchAnsiSuffix();
}
s->PutCStringColorHighlighted(symbol->GetName().GetStringRef(), pattern,
ansi_prefix, ansi_suffix); Also, since these changes were approved, we opted to finalize this PR like it is now. Since we are creating another PR to do the same thing for function search, we are planning on refactoring this with an optional struct and I would also like to understand better the relation between SymbolContext and the target (or another way of getting Debugger properties). |
llvm::StringRef ansi_prefix = | ||
interpreter.GetDebugger().GetRegexMatchAnsiPrefix(); | ||
llvm::StringRef ansi_suffix = | ||
interpreter.GetDebugger().GetRegexMatchAnsiSuffix(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are doing a follow up, I would put these calls inside the call to PutCStringColorHighlighted
. To limit the scope of the values even more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to suggest that too, but the Stream
class lives in utility so you can't pass in the debugger or the target without breaking the layering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be mistaking putting the calls inside PutCStringColorHighlighted
vs. putting the calls inside the call to PutCStringColorHighlighted
.
I'm suggesting:
PutCStringColorHighlighted(<...>, interpreter.GetDebugger().GetRegexMatchAnsiPrefix(), interpreter.GetDebugger().GetRegexMatchAnsiSuffix())
(should have included that example in my first comment tbh :) )
As far as I can tell, that's how you must do it, or at least, one of the simpler ways. As the call must be passed something if target is nulptr. You could do:
If you prefer. There's not much difference to it. One puts it all in a single call which is good because the scope of the values is limited. The other has 1 conditional instead of two. Leave it as is. Sounds like you will end up refactoring it anyway. |
I really thought about doing this, but it looked too long, so I gave up, good to know it's ok.
Nice! Thank you for all the help so far! |
Wait, one last thing :)
If the prefix and suffix are empty. Not prefix or suffix is empty. If there's no pattern or (the prefix is empty and the suffix is empty) -> just print the string normally. Also please add a some tests where you set only suffix, and only prefix, neither. To validate that. |
Or rather, set the suffix to |
I tried this now and I guess it's not correct. For example, if I have a prefix (red color) and no suffix, the text does not go back to normal and will be forever red. Or am I doing something wrong?
Yes, I'm adding new tests like this now. |
This is expected. We don't expect many people to do this but if they do, that's their mistake. A more realistic use might be prefix= |
I see now. Thank you for the explanation. The last commit adds the test cases and change the |
This is failing tests but I'm pretty sure it's a missing nullptr check. I'll fix it. |
Ok bots look good, so congratulations on landing this change! 🥳 (you will have some email from build bots, it's a good idea to go look at the failures just to see what they look like, but I have fixed them) Sounds like you already have ideas for follow ups. You should also add a release note (https://github.com/llvm/llvm-project/blob/main/llvm/docs/ReleaseNotes.rst#changes-to-lldb) at some point when you're happy with the feature overall. Or in other words, when it can be described in 1 or 2 sentences without a lot of caveats. I'm trying the feature myself and noticing the thing you said about the symbol context. Even when looking up symbols on a running process, when we print a (though one can work around this by adding Could be that there's never been a need for target here, and as you said, what you really want is the debugger settings. So perhaps the better route is to add a link back to those. |
Spoke too soon, had to disable the test on Windows: 810d09f Making it work there may require making function lookup work, not sure. And Mac OS: 61f1825 Which I think is a matter of finding the right regex for the CHECK. If either of you have access to Windows/Mac, please give it a go. Otherwise, @JDevlieghere please could you (/one of your colleagues) try the test on Mac and see what can be done? |
Sure will try to make it work for windows. Hoping it will be as easier to build lldb on windows as it's for linux 🙂 |
Follow-up to #69422. This PR puts all the highlighting settings into a single struct for easier handling Co-authored-by: Talha Tahir <talha.tahir@10xengineers.ai>
Fixes #57372
Previously some work has already been done on this. A PR was generated but it remained in review:
https://reviews.llvm.org/D136462
In short previous approach was following:
Changing the symbol names (making the searched part colorized) -> printing them -> restoring the symbol names back in their original form.
The reviewers suggested that instead of changing the symbol table, this colorization should be done in the dump functions itself. Our strategy involves passing the searched regex pattern to the existing dump functions responsible for printing information about the searched symbol. This pattern is propagated until it reaches the line in the dump functions responsible for displaying symbol information on screen.
At this point, we've introduced a new function called "PutCStringColorHighlighted," which takes the searched pattern and the expected output and applies colorization to highlight the pattern in the output. This approach aims to streamline the symbol search process to improve readability of search results.
Co-authored-by: José Junior josejunior@10xengineers.ai