Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider uploading RA LSIF data to sourcegraph, from bors builds. #101446

Open
eddyb opened this issue Sep 5, 2022 · 13 comments
Open

Consider uploading RA LSIF data to sourcegraph, from bors builds. #101446

eddyb opened this issue Sep 5, 2022 · 13 comments
Labels
T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Sep 5, 2022

Sourcegraph supports what they call "Precise code navigation" using data from tools such as rust-analyzer, and there's even convenience tools like https://github.com/sourcegraph/lsif-rust-action for uploading such data from CI.

I don't personally use sourcegraph yet, and have been relying a lot on the GitHub CodeSearch (still in private beta, I believe), but having a publicly available service with better source code understanding seems really good and I'll try to remember it for the future.

I was looking around and found that at least one of the rust-lang repos happens to have it:
https://sourcegraph.com/github.com/rust-lang/rustc-demangle@2811a1ad6f7c8bead2ef3671e4fdc10de1553e96/-/blob/src/v0.rs?L65

You can see how well hovering and clicking around works, it's a lot of fun frankly.


For rust-lang/rust it would likely be harder than that sourcegraph/lsif-rust-action helper (at the very least custom rust-analyzer configuration might be needed), but it might still be doable enough that we could maybe try it and see what happens (as long as no "paid plan" is required, I don't know the full details on the sourcegraph side).

cc @bjorn3 (who mentioned the LSIF support, which got me looking at sourcegraph)
cc @rust-lang/infra

@eddyb eddyb added the T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. label Sep 5, 2022
@jyn514
Copy link
Member

jyn514 commented Sep 5, 2022

cc @rust-lang/rustdoc this would be a good alternative to rustdoc's go-to-definition feature

@GuillaumeGomez
Copy link
Member

If it can be integrated into rustdoc, why not.

@notriddle
Copy link
Contributor

I think it’s supposed to replace rustdoc’s entire source code view, so we would basically want to add a way to disable rustdoc’s own source code pages, while having the “source” links point to Sourcegraph instead.

rust-lang/docs.rs#31

@GuillaumeGomez
Copy link
Member

I guess we would just keep the source code rendering for the code examples in the docs and generate sourcegraph instead of the source code pages indeed. Potential issues we need to check at this point (that I can think of) would be:

  • Maintenance (what happens if something breaks or if a new features is needed).
  • Integration with rustdoc
    • In term of UI/UX (This is the biggest potential issue in my opinion, it'd be weird to have a completely different output for the source code pages and no support for themes or not the search either).
    • In term of code integration.
  • How big it is (if it takes too much time to render a page, it's maybe not a good fit).
  • How well it works on mobile devices.
  • How much longer the rustdoc run would be (this one is minor but still worth mentioning).

@bjorn3
Copy link
Member

bjorn3 commented Sep 5, 2022

I would prefer if the rustdoc pages shipped in the rust-doc component don't have any external dependencies, like sourcegraph. The primary reason to use the rust-doc component is offline usage I think. Using sourcegraph is incompatible with this.

Also we already have goto definition inside rustdoc source pages behind an unstable feature. I would prefer extending this over replacing it with sourcegraph.

@GuillaumeGomez
Copy link
Member

Oh, I thought we could generate source graph pages to be used locally. Did I misunderstood maybe?

@bjorn3
Copy link
Member

bjorn3 commented Sep 5, 2022

AFAIK sourcegraph is a hosted service to which you can upload analysis data. You can also self-host it, but it needs a full database (both postgresql and redis) and a webserver. See https://docs.sourcegraph.com/dev/setup/quickstart

@GuillaumeGomez
Copy link
Member

Oh ok. So definitely not a good lead for the rustdoc jump-to-def feature indeed. However it could be nice on docs.rs.

@jsha
Copy link
Contributor

jsha commented Sep 5, 2022

This is really cool! I think one of the big questions for jump-to-definition support in rustdoc is "why can't why use rust-analyzer for some of this?" Part of the answer has been that rust-analyzer needs to run live, and we want rustdoc output to work on a "basic" web server that only knows how to serve files (or on file:/// URLs).

SourceGraph uses Language Server Index Format (LSIF) to extract info from a language server and make it available in a static format. And there are rust-analyzer bindings for LSIF already.

Indeed as @bjorn3 says, if we wanted to run SourceGraph itself, that requires a live backend. But I believe the output of LSIF is static, so we could generate LSIF data at doc build time, and either write our own JS to process it for navigation or see if SourceGraph's JS is appropriately licensed.

One of the interesting design tidbits here: doing truly precise code navigation essentially requires compiling the code (including version resolution). GitHub's code navigation and Google Code Search don't do this so they can't be quite correct; the hosted version of SourceGraph has a clever workaround: since you're probably building each commit of your code in CI anyhow, it asks you to build the relevant indexes in CI while you're building your code, and push it to SourceGraph!

At any rate, @GuillaumeGomez, SourceGraph at least deserves some discussion in the RFC.

@GuillaumeGomez
Copy link
Member

Yes I'll mention it too!

@matklad
Copy link
Member

matklad commented Sep 5, 2022

cc @tjdevries

@eddyb
Copy link
Member Author

eddyb commented Sep 10, 2022

To be clear, I believe rustdoc is mostly off-topic for this issue.

I made this issue about better supporting Sourcegraph, as a 3rd party service.

That is, if Rust users are already using Sourcegraph, or considering using it, having the LSIF data automatically provided to it would greatly increase the usefulness of that service.

I don't see a strong connection to rustdoc, other than potentially attempting to provide an opt-in (in the browser-side settings menu) that makes the source pages redirect to equivalent Sourcegraph URLs instead.


However, I'm not sure I would want to strongly push for that, and there may be other ways to handle the situation, such as browser extensions for custom URL rewrite rules.

Rewrite rules should be pretty easy for docs.rs, e.g. regex::Regex::replace in the latest version:
(specifying an exact version instead of latest would be better - it's 1.6.0 right now so if you see this in the future you should be able to replace latest with 1.6.0, but I wanted to show that the URLs map neatly anyway)

https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace
docs.rs src https://docs.rs/regex/latest/src/regex/re_unicode.rs.html#506-512
Sourcegraph https://sourcegraph.com/crates/regex@latest/-/blob/src/re_unicode.rs?L506-512

And in general (replace all $FOO with appropriate pattern syntax, regex or otherwise):

           "https://docs.rs/$CRATE/$VERSION/src/$CRATE/$PATH.rs.html#$LINES"
                                      ⇓⇓⇓
"https://sourcegraph.com/crates/$CRATE@$VERSION/-/blob/src/$PATH.rs?L$LINES"

And yes regex does seem to have LSIF data, you can see some details here: https://sourcegraph.com/crates/regex/-/code-graph - but I can't easily tell from that whether Sourcegraph themselves are auto-indexing some (or all?) of crates.io ("created by an auto-indexing job" seems to suggest that, though).

Perhaps all we need for rust-lang/rust is adding some RA configuration to Sourcegraph for auto-indexing?

EDIT: looks like you only get that level of control when you run your own instance: https://docs.sourcegraph.com/code_navigation/how-to/enable_auto_indexing (though maybe there are separate docs for auto-indexing the public sourcegraph.com instance?)

@tjdevries
Copy link
Contributor

Thanks for the ping @matklad :) this is a super interesting thread. I can try to write up my thoughts sometime in the next few weeks, but I'm mostly away from my computer at the moment since my wife and I just had a baby girl 😍 . I'll tag in @olafurpg in case he wants to chime in at all -- otherwise I'll respond later.

Super excited about this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants