Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: consider a -utf8 flag to talk UTF8 offsets #38274

Open
myitcv opened this issue Apr 6, 2020 · 4 comments
Open

x/tools/gopls: consider a -utf8 flag to talk UTF8 offsets #38274

myitcv opened this issue Apr 6, 2020 · 4 comments

Comments

@myitcv
Copy link
Member

@myitcv myitcv commented Apr 6, 2020

What version of Go are you using (go version)?

$ go version
go version devel +801cd7c84d Thu Apr 2 09:00:44 2020 +0000 linux/amd64
$ go list -m golang.org/x/tools
golang.org/x/tools v0.0.0-20200402175326-9fc00b0a7ff6
$ go list -m golang.org/x/tools/gopls
golang.org/x/tools/gopls v0.1.8-0.20200402175326-9fc00b0a7ff6

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/myitcv/gostuff/src/github.com/myitcv/govim/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build775712682=/tmp/go-build -gno-record-gcc-switches"

What did you do?

For editors like Vim that natively speak the encoding of the files they are editing (e.g. UTF8 for Go files), there is an overhead in translating to/from UT8/UTF16. This same cost applies on the gopls side.

Whilst exploring the addition of the workspace symbol method to govim this became particularly apparent. In the approach laid out in that linked issue, a child instance of govim can ask questions of a parent (the instance being used within an editor session). For example, the child can ask the parent to return the results of a call to the gopls.Symbol method. However, because the []SymbolInformation results from gopls are in terms of UTF16 code points, all the responses have to be re-evaluated against the contents of buffers in Vim or files on disk (if those files are not open). This places yet another unnecessary processing cost on the client.

If instead gopls had a -utf8 flag that instead talked UTF8 offsets then all of this overhead, on both sides, would be entirely obviated.

This being a flag it would be entirely opt-in and so have no effect on existing integrations like VSCode.


cc @stamblerre @findleyr

FYI @leitzler @bhcleek

@ianthehat
Copy link

@ianthehat ianthehat commented Apr 7, 2020

While I am sympathetic, I don't think this is the right answer.
At the moment the utf-16 is specified at the protocol level, so switching to utf-8 makes it non compliant, the benefits would have to be truly huge to be worth doing that.
Doing so with a command line flag when you have client server modes would almost inevitably result in two clients operating against the same server in competing modes.
We could make it negotiate per connection (although if we add that we should probably try to add it to the initialization dance and get it added to the spec) but that also means we cannot share cached results between two clients that disagree about what a position means, which would be kind of horrible, or we have to pick one format, and then do the dance at the protocol layer to convert all the positions (in which case we are not any better off than leaving it utf-16 and having the client do it instead).

@myitcv
Copy link
Member Author

@myitcv myitcv commented Apr 7, 2020

Thanks @ianthehat. I'm very aware this proposal is not without its downsides! So appreciate the discussion.

At the moment the utf-16 is specified at the protocol level, so switching to utf-8 makes it non compliant, the benefits would have to be truly huge to be worth doing that.

Agreed. The most recent motivation has been the work to add fuzzy-finder driver symbol searching to govim. There we use the Symbol method as an input to the Vim fuzzy finder/selector. Whilst the gopls results are incredibly fast because they are cached in memory, the client needs to read file contents from disk for every match not open in the editor. Whilst not a huge cost on an individual basis, this starts to add up if you have hundreds of results per keystroke.

This same overhead applies to diagnostics returned by LSP (although the numbers there are generally far less).

It also makes it much harder to write a "plain old LSP client" for things like scripting (cc @findleyr).

Doing so with a command line flag when you have client server modes would almost inevitably result in two clients operating against the same server in competing modes.

Under what conditions do you see this as being "inevitable"? Do many people have two editors running against the same LSP?

Would it not be the case that for those people who are not using -remote right now there would be no impact because they are not sharing the same remote instance?

For those people who are using remote, you could also automatically add the result of the -utf8 flag to the key that determines which remote instance to select. Worst case you end up with two remote instances in auto mode.

Whilst it would always be possible for people to find edge cases, it feels like it would be entirely possible to prevent foot-gun incidents entirely if people use the forwarder to connect to gopls, remote or otherwise (that design has, incidentally, been worth its weight in gold).

although if we add that we should probably try to add it to the initialization dance and get it added to the spec

This is a good point. I struggle imagine that I'm the first person ever to bring this up.

@mattn
Copy link
Member

@mattn mattn commented Apr 7, 2020

@ianthehat
Copy link

@ianthehat ianthehat commented Apr 7, 2020

@mattn thanks, I had not seen that one, very relevant

@myitcv I expect in the future that everyone will be using -remote, it should be the default, and as for picking a different client, we already use far to much memory we don't want to double it.
The common case I would expect is something like you are using vim, it runs up the shared server, but sets it to utf-8 mode, then you run a command line tool, which tries to use the same server because then it gets to run fast on the results already sitting there thanks to vim being open on the same project.... but the tool was compiled only knowing about utf-16 because that is the protocol standard.
I just think that is far too much of a foot gut waiting to happen, if we do it then it absolutely has to go into the protocol negotiation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.