
added bazel workspace lsp support #51

Open · wants to merge 5 commits into main

Conversation

DanielEliad

Hi :)
I found this project while looking for an LSP server for working on Bazel projects, and I added some load-path resolution features so it can handle Bazel workspaces and remote repositories.
I added:

  1. relative path resolution, e.g. ":file_name.bzl", "package1/package2:file_name2.bzl"
  2. workspace-absolute resolution, e.g. "//package1/package2:file_name.bzl", "//:file_name2.bzl"
  3. remote repository resolution, e.g. "@bazel_skylib//:bzl_library.bzl"
  4. BUILD target resolution: load(SOME_PATH, TARGET_NAME) redirects to a BUILD/BUILD.bazel file in that directory, and to the target named TARGET_NAME
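For reference, the three load-path shapes in the list above can be told apart by their prefix alone. A minimal sketch of such a classifier (names are illustrative, not code from this PR):

```rust
/// The load-path shapes described above (hypothetical names).
#[derive(Debug, PartialEq)]
enum LoadKind {
    /// ":file.bzl" or "pkg:file.bzl", relative to the current package
    Relative,
    /// "//pkg:file.bzl", absolute within the current workspace
    WorkspaceAbsolute,
    /// "@repo//pkg:file.bzl", inside a remote repository
    RemoteRepository,
}

fn classify_load(path: &str) -> LoadKind {
    if path.starts_with('@') {
        LoadKind::RemoteRepository
    } else if path.starts_with("//") {
        LoadKind::WorkspaceAbsolute
    } else {
        LoadKind::Relative
    }
}
```

The actual PR then resolves each kind against the workspace root or the external repository directory.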

This PR also supports both bzlmod and regular workspaces, as well as local_path_override and --override_repository.

Thank you for taking the time to look at it :)

@facebook-github-bot
Contributor

Hi @DanielEliad!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@nataliejameson

Hey @DanielEliad, thanks so much for the PR. We're still actively developing this, so I'm surprised to see people finding it! I haven't dug through all the way yet (it's after quitting time on Friday here ;) ), but I do want to say at the outset: we're not really looking to put Bazel-specific logic inside of this language server directly, so we wouldn't be able to use this as-is. We have multiple projects that use this (build systems, some other tooling) that have different needs around what they can interpret, how path resolution is done, etc., and the starlark part of this needs to remain reasonably generic. (Unless it's changed, I think the spec also still doesn't really define what a "load" is, so we want to keep that API boundary too, so projects can do their own thing.)

I would say, though, it should be reasonably straightforward to implement what you mentioned as an impl of LspContext that's something more standalone. That's what I've done, personally, for our internal projects, and it's worked well. Let me know if that could work for you, and thanks for giving things a look!

@DanielEliad
Author

Thank you for taking a look at it :)
If you want to keep this repository generic, I could open another repo and depend on this one.
I think it would be cool to create a new Bazel LSP crate in this repo so that all tests run together. What do you think?

In any case, thank you for working on this repo, it's awesome 👍

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 13, 2022
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!


@ndmitchell
Contributor

Thanks @DanielEliad for the contribution! Definitely a super cool thing, CC @aherrmann who was considering doing something similar, and should definitely check this out.

Looking at what this patch does:

  1. It changes so that if you fail to load the AST at the destination, you jump to the whole file rather than producing an error. That seems a very good change to have in isolation.
  2. Adds variants to the enum ResolveLoadError. I don't think Bazel-specific errors should end up in that enum. In general with such error enums, we've had much better success not exporting them at all: make the enum private, give it only the variants that are useful in the module where it is defined, and define a separate enum for errors in bin rather than reusing the library one. That seems the way to go here, and means there is no Bazel stuff in the library.

Then for the Bazel specific logic, there are three options for where it goes:

  1. In the generic Starlark-Rust binary. Perhaps with a --bazel flag argument in addition to --lsp which opts you into the Bazel LSP.
  2. In a separate binary in this repo, say starlark-rust-bazel, which has the Bazel specific logic in it only.
  3. In a separate repo that depends on this library.

I'm not too fussed, but my inclination would be to start at option 1 (it's not huge), move to option 2 when it gets bigger, and if it ever takes off properly, move to option 3. While I wouldn't be keen to have Bazel logic in the starlark-rust library itself, having it in the binary we ship alongside is less of a problem, since it's more a lump of somewhat useful things and doesn't have to be precisely standard or widely reusable. At the moment we are fairly rapidly iterating on the external API for the LSP server, so jumping straight to option 3 has costs in terms of breaking stuff fairly often, which would be a bit sad.

@DanielEliad
Author

Hey, sounds great!
If the Bazel LSP support could live in this repo, I think that would be best, because this project is actively worked on, so depending on it from an external repo would be tough.

I think the best thing to do is combine options 2 and 3.
I'll move all the generic Bazel logic into a different crate (lib) in this repo, and if the --bazel flag is supplied I'll run some more Bazel-specific resolution logic from that crate. This way I'll also remove most of the Bazel logic from the main LSP and fix the ResolveLoadError enum.

And if you ever want a different binary for the Bazel LSP, we could create it and either copy some of the LSP logic or put it in a common crate that both binaries depend on.
For now, I'll start by separating most of the Bazel logic into a different crate and adding the --bazel flag to the main binary.

Thank you for your response! :)

@DanielEliad DanielEliad force-pushed the feature/bazel_load_statement_support branch from bf88954 to bc34622 Compare August 21, 2022 17:45
@ndmitchell
Contributor

I might suggest holding off on doing too much restructuring. There are a few of us who maintain this crate, and there's no guarantee we'll agree. We should be talking tomorrow so can give you an answer then.

@aherrmann

Thanks for the ping @ndmitchell, this looks very interesting! Also pinging @k1nkreet, who was looking into this.

@aherrmann left a comment

BUILD target resolution load(SOME_PATH, TARGET_NAME) redirects to a BUILD/BUILD.bazel file in that directory and to the target with the name TARGET_NAME

I'm not sure I understand that point. In Bazel, the load statement is used to load symbols defined in .bzl files, e.g. load("//some:file.bzl", "some_symbol"). How does this relate to targets defined in BUILD files?

Comment on lines 263 to 266
let malformed_err = ResolveLoadError::PathMalformed(path_buf.clone());
let mut split_parts = path.trim_start_matches("//").split(':');
let package = split_parts.next().ok_or(malformed_err.clone())?;
let target = split_parts.next().ok_or(malformed_err.clone())?;


I'm not familiar with this code-base, yet. So, take my comments with a grain of salt. But, I see a lot of label parsing logic repeated throughout. Perhaps it would be better to factor it out into one function producing a structured label representation?
Bazel supports a number of short-hands, e.g. @foo is short for @foo//:foo, or @foo//bar is short for @foo//bar:bar, or baz is short for :baz. I think it'll be easier to get these right if the label parsing logic is located in one place and not spread out.

EDIT: Just saw that the next commit does just that.
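Such a structured label parser could be sketched roughly as follows (all names are hypothetical, not the PR's actual code; an empty package string stands for the root package). It normalizes the shorthands mentioned above: @foo means @foo//:foo, @foo//bar means @foo//bar:bar, //foo means //foo:foo, and baz means :baz.

```rust
/// Structured representation of a Bazel label (illustrative sketch).
#[derive(Debug, PartialEq)]
struct Label {
    repository: Option<String>, // None = the current repository
    package: Option<String>,    // None = package-relative label; "" = root package
    target: String,
}

fn parse_label(label: &str) -> Label {
    // Split off the repository part, if any.
    let (repository, rest) = if let Some(stripped) = label.strip_prefix('@') {
        match stripped.split_once("//") {
            Some((repo, rest)) => (Some(repo.to_owned()), rest.to_owned()),
            None => {
                // "@foo" is short for "@foo//:foo".
                let repo = stripped.to_owned();
                let rest = format!(":{}", repo);
                (Some(repo), rest)
            }
        }
    } else if let Some(rest) = label.strip_prefix("//") {
        (None, rest.to_owned())
    } else {
        // Package-relative label: "baz" is short for ":baz".
        return Label {
            repository: None,
            package: None,
            target: label.strip_prefix(':').unwrap_or(label).to_owned(),
        };
    };

    match rest.split_once(':') {
        Some((pkg, target)) => Label {
            repository,
            package: Some(pkg.to_owned()),
            target: target.to_owned(),
        },
        None => {
            // "//foo/bar" is short for "//foo/bar:bar".
            let target = rest.rsplit('/').next().unwrap_or("").to_owned();
            Label { repository, package: Some(rest), target }
        }
    }
}
```

Having one normalizing parser like this is what keeps the shorthand rules from being re-implemented (and diverging) at every call site.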

Ok(info
    .output_base
    .join("external")
    .join(repository)


Note, this logic will probably no longer work once bzlmod becomes the norm (Bazel >=6.0) and repositories are stored under their canonical repository name.

Comment on lines 370 to 373
if build.exists() {
    Some(build)
} else if build_bazel.exists() {
    Some(build_bazel)


Bazel will prefer BUILD.bazel over BUILD if both exist. So, IIUC the order should be swapped.
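The corrected lookup order would simply check BUILD.bazel first. A minimal sketch (hypothetical helper, not the PR's code):

```rust
use std::path::{Path, PathBuf};

// Bazel prefers BUILD.bazel over BUILD when both exist in a package,
// so BUILD.bazel must be checked first. (Illustrative sketch only.)
fn find_build_file(package_dir: &Path) -> Option<PathBuf> {
    let build_bazel = package_dir.join("BUILD.bazel");
    let build = package_dir.join("BUILD");
    if build_bazel.exists() {
        Some(build_bazel)
    } else if build.exists() {
        Some(build)
    } else {
        None
    }
}
```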

Comment on lines 31 to 32
let target = split_parts.next()?;
Some((package, target))


Note, //foo can be used as a short-hand for //foo:foo,
and relative labels can skip the :, e.g. foo instead of :foo.


fn split_repository_package_target(label: &str) -> Option<(&str, &str, &str)> {
    let mut split_parts = label.split("//");
    let repository = split_parts.next()?;


And @foo as a short-hand for @foo//:foo.

@laurentlb laurentlb mentioned this pull request Aug 22, 2022
@DanielEliad
Author

Hey :)
Thank you for the CR!
I fixed everything but the bzlmod support. I'm still looking into how I could find the canonical repository name under bzlmod without re-implementing what Bazel is doing. I tried getting the information from bazel query but couldn't get the repository mapping anywhere; maybe @laurentlb could help us out with a way to find these mappings?
For now I noticed that local_path_override still uses the same name, but Bazel modules and extension repositories have both .ext.repo and repo directories under external/.

For now I could check all permutations of a name, e.g. rules_pkg -> rules_pkg.0.5.1 if there is only one version of the module, and that should solve most of the problems.

This is not completely hermetic, but there seem to be a lot of issues in the Bazel repository asking for this information to be exported somehow.
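The permutation check described above might look roughly like this (a hypothetical sketch, not the PR's code): prefer an exact directory match under external/, and otherwise accept a name.<version> directory only when exactly one exists.

```rust
use std::path::{Path, PathBuf};

// Hypothetical lookup for a repository directory under output_base/external:
// an exact match first (e.g. a local_path_override keeps the plain name),
// otherwise "name.<version>" (e.g. rules_pkg.0.5.1), but only when the
// match is unambiguous.
fn find_repository_dir(external: &Path, name: &str) -> Option<PathBuf> {
    let exact = external.join(name);
    if exact.is_dir() {
        return Some(exact);
    }
    let prefix = format!("{}.", name);
    let mut candidates: Vec<PathBuf> = std::fs::read_dir(external)
        .ok()?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| {
            path.is_dir()
                && path
                    .file_name()
                    .and_then(|n| n.to_str())
                    .map_or(false, |n| n.starts_with(&prefix))
        })
        .collect();
    // Several versions present: give up rather than guess.
    if candidates.len() == 1 { candidates.pop() } else { None }
}
```

As noted, this heuristic is not hermetic; it only papers over the missing repository-mapping information.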

@laurentlb

could help us out with a way to find these mappings?

@Wyverald can help maybe?

@nataliejameson

Hey @DanielEliad , sorry, I was out for a bit and am just catching up. So, I do want to detangle things a bit, and also I guess get my opinion in writing here (@stepancheg might want to too :) )

  1. I'm going to break the two less-related changes out and run them through our internal system today to get them landed. They're good features (thanks again!), but a little tangential to the rest of the PR. Those two things are just returning the file on ast load failure, and making the enum private in the interface.

  2. As for whether we're cool with bazel things in the repo.

My first inclination was "No. Nope nope nope, generic all the way". However, making someone depend on things externally also means we can't change the APIs as willy-nilly as we can now. And I'm lazy, and don't want to have to deal with that. So, I think I'd be fine keeping this in the repo, but only for the starlark bin; Bazel logic should never be in the library portion. I'd also rather it were kept in a contrib submodule to make it clear that this is not something that we (bluntly: the folks working on this at Meta) are actively working on or supporting, other than making sure that if we make API changes, the tests pass.

The only other concern I'd have, is if there's going to be a lot of iteration on the bazel api side of things. If there's potentially going to be a lot of churn there, I'd honestly rather it were kept in another repo (and #52 seems to suggest that there would at least be some codegen tools, etc to create the bazel builtins interface). Not that we don't love you folks, but a higher volume of pull requests for a piece of the project that we're not even really supporting does become a bit of a support load after a while.

But, this is just one person's thoughts, I'll let the rest of the team chime in :)

@ndmitchell
Contributor

My initial thought was that in the repo is fine. But as I see the iteration/discussion between three different people on what this Bazel stuff should do (which I love to see!), I worry that we are putting non-Bazel people in the loop as the ultimate arbiters of what gets committed, and I worry we're going to suck at that. A separate repo would let the Bazel-interested parties have more freedom to do what makes sense there, without us slowing you down.

Those are my thoughts, rather than a final answer though!

@stepancheg
Contributor

One possibility is to extend the starlark LSP with starlark itself (or with a command).

I.e. starlark-rust --lsp accepts an additional optional command-line flag, --lsp-resolver resolver.star.

resolver.star provides callbacks to resolve load paths to filesystem paths, provide globals, etc.

def resolve_file_externals(source_defs):
    # source_defs is
    # {
    #   "path": "/Users/me/project/foo/bar.bzl",
    #   "loads": [":x.bzl", "qux//quux.bzl"],
    # }
    # and should return
    return {
        "globals": {
            "select": { ... },
            ...
        },
        "loads": {
            ":x.bzl": "/Users/me/project/foo/x.bzl",
            "qux//quux.bzl": "/Users/me/qux/quux.bzl",
        },
    }

The LSP binary will call resolve_file_externals.

This script:

  • can be generated by a wrapper script before LSP starts
  • or a function like execute_command_capture_json can be provided to the script to resolve things dynamically

Or maybe instead of starlark, --lsp-resolver should take a path to a command which takes JSON input and outputs another JSON, without any starlark.

This way there's no need to write Rust code to create your custom LSP setup for Bazel (or whoever else needs it).
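The command-based variant could be wired up with nothing more than std::process: the LSP writes a JSON request to the resolver's stdin and reads a JSON response from its stdout. A sketch, with the JSON schema and parsing deliberately left open (everything here is hypothetical):

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Sketch: run the user-supplied resolver command, feed it a JSON request
// on stdin, and return the raw JSON response from stdout. The actual
// request/response schema would be hashed out separately.
fn run_resolver(resolver_cmd: &str, request_json: &str) -> std::io::Result<String> {
    let mut child = Command::new(resolver_cmd)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(request_json.as_bytes())?;
    // The taken stdin handle is dropped here, closing the pipe so the
    // resolver sees EOF and can produce its response.
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```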

Comment on lines +511 to +531
});
match location_result {
    // if the location result was successful,
    // only return a location link if a location was found
    Ok(location) => location.and_then(|target_range| {
        Some(LocationLink {
            origin_selection_range: Some(source.into()),
            target_uri: url.clone().try_into().ok()?,
            target_range,
            target_selection_range: target_range,
        })
    }),
    // if the location result was an error,
    // try to at least go to the file
    _ => Some(LocationLink {
        origin_selection_range: Some(source.into()),
        target_uri: url.try_into()?,
        target_range: Range::default(),
        target_selection_range: Range::default(),
    }),
}


I'm actually a little curious how you were hitting this issue. The first unwrap_or_default() gets the default range in the case of an error, and the second one gets the default range in the case of a None. I actually had a test written up to make sure we caught the problem, and it doesn't actually repro.


Because right now, for loads, if it can't find the symbol, it'll jump to where it is in the load statement, and you can get jump-to-file by doing a goto-definition on the file portion of the load. I could kind of argue either way on that, tbh. Like, if the file doesn't parse, or the symbol doesn't exist, you want to know that it's supposed to be loaded, and popping a "random" file up might be confusing. But I could also see an argument in the other direction: that you at least want to get to the file that it's in.

I think I'll probably go with adding a little bit of user-facing error reporting, and then making this jump to the file in the load statement (which I think is the only place this should be an actual issue)

Author

Because of how the string-literal definition location is built, I first have to redirect to the URL of the BUILD file, and only then do I get the AST that I can give to the location_finder to find the exact range in the file.
For example, looking for target :my_target will redirect to the URL of the closest BUILD file, then calculate the AST and call location_finder to get the location of my_target in that file. If it doesn't exist there, location_finder returns Ok(None).
Because resolve_string_literal doesn't have access to the AST and can't calculate it, I have to "assume" a target with that name exists, parse the file, and then decide what the range is.
Because of Bazel label ambiguity, we can try goto-definition on "manual" in tags = ["manual"] and get sent to the closest BUILD file.
So when the location_finder can't find the range, I read that as the resolve_string_literal guess being wrong.

If the LspContext had a way of calculating the AST, the logic would be a lot simpler.


Yeah, we don't have the context do the calculation because it can't be assumed to have the most up to date file contents. (The server maintains contents of files that are open and edited, but not necessarily saved).

I'm still a tad confused, tbh, about this behavior. Is the point with the "manual" bit that you wouldn't want the BUILD file to be visited, or... ? Because right now, it should open whatever BUILD file you return from the first pass of resolve_string_literal, whether Some or None is returned from the location finder.

Author

Yeah, I would like it to say "can't find definition" instead of taking me to the nearest BUILD file.
The location_finder returns Ok(None) if it can't find the target (in this case "manual") in the BUILD file, so I don't redirect to the BUILD file.


Yeah, we could probably make that an enum and make it more configurable. Because our projects do want that behavior :) Will try to knock something up in the morning.

@nataliejameson

@stepancheg I'd be down with something like that (further details to be hashed out in an issue if we go that way :) ). It'd also, I think, work with @laurentlb's concerns in #52. Agreed that a simpler interface for users who have non-starlark-rust interpreters might be desirable :)

@ndmitchell
Contributor

@stepancheg nice idea! I imagine that most users of the script will find Starlark too limiting (e.g. they'll want to read and parse a proto file), so pregenerating it as JSON might be simpler for the user, and probably simpler for us.

@nataliejameson

@stepancheg nice idea! I imagine that most users of the script will find Starlark too limiting (e.g. they'll want to read and parse a proto file), so pregenerating it as JSON might be simpler for the user, and probably simpler for us.

To clarify, though: path resolution inherently tends to require some programmatic logic, so you'd still have to have a starlark file. It just might get access to the pregenerated JSON from whatever script you invoke (which might, e.g., parse through protobufs) :) But for "builtin" definitions, yeah, JSON would be nice, since we already have a Doc structure that we use throughout the LSP and for documentation generation anyway.

@DanielEliad
Author

Thank you all for taking the time to look at this!
The external resolver solution seems awesome; it will definitely make everything simpler and will allow me to write the Bazel logic away from this repo.
To be able to generate that JSON, I would need to parse more files, and I would need every single string literal in the file.
For example, a Bazel BUILD file might look like this:

cc_binary(
    name = "my_binary",
    srcs = ["a.c"],
    deps = ["//package1:my_library"],
)

I would need to generate a definition location for "//package1:my_library", but for that I would need the AST of the corresponding BUILD file in package1.
So with an external script, without the AST, it would be difficult to generate a JSON that can goto-definition on every string literal.
Also, in your example source_defs would need to contain every single string literal, because the logic for determining whether it's a label or not would live in the external script.

Maybe the quickest solution here would be to depend on this crate from a different repo and implement LspContext, but without access to the ASTs of the files in the project it wouldn't be possible to provide a good language-server experience (which is why I had to put Bazel-specific logic in the LSP crate in this PR).
If we could pass the ability to calculate ASTs to the LspContext, then I would be able to write everything in a different crate.
I could change the purpose of this PR to making the LspContext trait more generic, with access to calculating ASTs, then publish a new version to crates.io and work on the Bazel LSP in a separate repo.

What do you think?

@ndmitchell
Contributor

Currently the Starlark AST is (deliberately!) not exported. The reason was that certain aspects of the AST (e.g. where we box etc) are driven by somewhat ad-hoc concerns, and thus it isn't a good stable API. Would there be a subset of operations/queries on the AST that would be sufficient?

I kind of suspect that exporting the entire AST sooner or later is going to be necessary, so if this is the change that makes it necessary, that's not a problem either.

@stepancheg
Contributor

I would need to generate a definition location for "//package1:my_library" but for that I would need the ast for the corresponding BUILD file in package1

I forgot about resolving target names.

Then the resolver script should accept a list of string literals. It becomes complicated.

Maybe the quickest solution here would be to depend on this crate from a different repo

That was the idea.

I could change the purpose of this PR to making the LspContext trait more generic and with access to calculating ast

Yes, a simpler API with as few dependencies on starlark internals as possible would be better.

@DanielEliad
Author

So in terms of the API I would need from the AST, I think the minimum for now is:
find_function_call_with_name, given a file_name to calculate the AST for and the symbol name.

As the trait LspContext stands today, the implementation depends too much on the Context struct being able to parse starlark files into ASTs and then do some analysis on them.

So what I'll do is convert this PR into making LspContext more generic, depending less on the AST, making it as independent as I can from Context, and exposing as little as I can of the AST to the LspContext.

I'll keep working on it and get back to you with the refactor soon :)
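The minimal surface described above might look something like this (every name here is hypothetical; the point is that LspContext implementations get a targeted query, not the AST itself):

```rust
// Hypothetical minimal API: the library keeps its AST private and exposes
// only the queries an LspContext implementation needs.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SourceRange {
    start_line: usize,
    start_col: usize,
}

trait AstQueries {
    /// Locate the top-level call whose `name` attribute equals `name`,
    /// e.g. the rule named `my_target` in a BUILD file.
    fn find_function_call_with_name(&self, name: &str) -> Option<SourceRange>;
}

// Toy text-scanning implementation, standing in for the real parser.
struct TextDocument(String);

impl AstQueries for TextDocument {
    fn find_function_call_with_name(&self, name: &str) -> Option<SourceRange> {
        let needle = format!("name = \"{}\"", name);
        for (line_no, line) in self.0.lines().enumerate() {
            if let Some(col) = line.find(&needle) {
                return Some(SourceRange { start_line: line_no, start_col: col });
            }
        }
        None
    }
}
```

Keeping the trait this narrow would let the AST's internal representation keep changing without breaking external LspContext implementations.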

@ndmitchell
Contributor

So getting back to this, and in conjunction with #68, there are three broad pieces:

  • Add facilities to Starlark analysis, e.g. observing what the local variables at a location are, and finding out method vs variable in the exported symbols. All seems sensible, and it's just moving code you have in other places.
  • Extend the LspContext to be able to render load statements, and, instead of goto-definition giving the precise location, have it give the file plus a further hint that gets resolved after the file is loaded (I think?). It might be best to propose these APIs and docs in comments if you want, to iterate before turning them into code.
  • Split out a bazel-lsp project that is tightly coupled to starlark-rust, but separate and standalone.

Let me know if anything seems missing.

@ndmitchell
Contributor

@MaartenStaa - does your PR's cover this? Anything remaining here we should seek to be merging?


7 participants