Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow '.' in idents for golang #166

Merged
merged 2 commits into from
Oct 16, 2018

Conversation

psanford
Copy link
Contributor

@psanford psanford commented Oct 10, 2018

Go symbols are in the format 'package.function'. Allow '.' in
ident so that you can attach uprobes to go functions.

Prior to this patch, the following would error:

$ objdump -tT /tmp/main  | grep main.doWork
objdump: /tmp/main: not a dynamic object
0000000000485360 g     F .text  0000000000000049 main.doWork

$bpftrace -e 'uprobe:/tmp/main:main.doWork { printf("doWork\n"); }'
1.22: syntax error, unexpected ., expecting {

With this patch applied I am able to trace this function in a go binary.

Copy link
Member

@ajor ajor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd definitely be good to get in support for Go symbols. As it is, this change breaks a few other things in the language though due to "." being used for struct field access. We really need to get the CI working again...

It might be enough to just update the definition of wildcard in parser.yy with an extra "." rule (not sure though). Also, please add a parser test once this is working!

@psanford
Copy link
Contributor Author

Thanks for looking. I'm not surprised that it broke other things.

I'll see if I can update the parser properly and also run the tests!

@psanford
Copy link
Contributor Author

I pushed an update to this branch implementing this as suggested by @ajor. I also added a test.

@psanford
Copy link
Contributor Author

This is actually still insufficient for go. This fixes package level functions but methods on structs are more complicated. Given the following definition:

package main
type ExampleStruct struct {
}

func (s *ExampleStruct) PtrRecv() {
	fmt.Println("ptr")
}

The symbol shows up as main.(*ExampleStruct).PtrRecv.

Supporting this seems like it would be more complicated. Would you want to support escaping (\*)?

@jasonkeene
Copy link
Contributor

jasonkeene commented Oct 13, 2018

Thanks @psanford for working on this! Using your change, I was able to attach to some of my functions using uprobes. I'm hoping we can get Golang to be first class with bpftrace.

Another consideration that is relevant to this change is that Golang allows for any unicode letter or digit to be used in an identifier. For example:

package main

import "fmt"

func ⱑ() {
	fmt.Println("this works")
}

func main() {
	ⱑ()
}

This compiles to:

$ objdump -t unicode_test | grep main.ⱑ
0000000000484b10 g     F .text	000000000000006e main.ⱑ

However this doesn't appear to parse well:

$ bpftrace -e 'uprobe:/tmp/unicode_test:main.ⱑ { printf("nope\n"); }'
1.61: invalid character '�'
1.61-62: invalid character '�'
1.61-63: invalid character '�'
Attaching 1 probe...
Could not resolve symbol: /home/pivotal/Desktop/unicode_test/unicode_test:main.

Incidentally, I think developing a grammar for escapes might not be the only option we should consider. Another solution would be to just have a literal syntax. Something like this:

uprobe:/tmp/main:"ⱑ.*([{}]):" { printf("doWork\n"); }

This would allow for whatever bytes you want with 0 ambiguity. You would only have to escape the " which wouldn't be hard. It would support unicode, periods, parens, colon, whatever is a valid symbol byte.

Wildcards could just exist outside the literal:

uprobe:/tmp/main:*"literal" { printf("doWork\n"); }

Or they could exist inside the literal with special meaning and just have an escape for the byte value.

Outside of golang support, it should be noted that filenames can have unicode characters as well and that also appears to currently not work.

What do you think?

@psanford
Copy link
Contributor Author

Thats a good point about unicode that I hadn't considered. Quoting does seem like a better option than trying to do escaping.

@ajor
Copy link
Member

ajor commented Oct 16, 2018

Quoting sounds good. Could probably just reuse the string type to do this

Allow quoted string attach points so that you can match more than
[a-Z0-9].

This specific issue that this fixes is attaching to go
functions. These symbols are in the format '<package>.<function>' and
can contain unicode letters and digits.
When specifying a string literal attach point do not try to expand
wildcards if they are present.

This specifically is required for Go methods on pointer receivers,
which have a '*' in their symbol name.
@psanford
Copy link
Contributor Author

I pushed the change to support quoted strings. This fixes both the unicode issue and the pointer receiver issue.

In addition to updating the parser.yy I updated the AttachPoint class to have an explicit bool need_expansion that is set to false when we are in the string literal case. This short circuits the wildcard logic in add_probe so that we look for the literal '*' instead of trying to do an expansion.

I left the other attach_point form the same as before (need_expansion=true).

@brendangregg
Copy link
Contributor

This looks good to me. Note that there's follow-up work to get uretprobes to work with go sanely; that's been discussed before ( iovisor/bcc#1320 ).

@ajor ajor merged commit d123315 into bpftrace:master Oct 16, 2018
@ajor
Copy link
Member

ajor commented Oct 16, 2018

Thanks!

@jasonkeene
Copy link
Contributor

@psanford Thanks a lot for doing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants