Per-root salsa inputs for files and packages#1221
Conversation
72b0681 to
2a650df
Compare
DavisVaughan
left a comment
There was a problem hiding this comment.
A few questions about things that don't feeeel quite right yet in file.rs and inputs.rs
|
|
||
| /// Filesystem path corresponding to this URL. | ||
| /// | ||
| /// Returns `None` for non-`file:` URLs (untitled buffers, custom |
There was a problem hiding this comment.
Should we rely on that, though? Or should we try and detect it early?
Like, I file like we should only call this on file:/ schemed URLs
If we start calling it on everything and rely on this behavior, we can't tell the difference between:
- An untitled buffer, which is fine
- A failure to convert a real
file:/url back to aPathBuf, which probably means something is wrong
I'd also advocate for keeping the Result output type, even with the odd () Error, because I do think the point of this is trying to signal an error path
There was a problem hiding this comment.
Good point! Now an anyhow::Result() and we either return early with None or log_err() at call sites.
| use crate::lsp::inputs::source_root::SourceRoot; | ||
|
|
There was a problem hiding this comment.
I do not have extremely strong feelings about this move, except to reflect that:
- This is not how ty does it
- It feels like it is possible that
OakDatabasecould one day have a struct field that is LSP specific, and in that case the concrete impl should live here
Like, I would actually call this LspDatabase which just so happens to fulfill the oak_db::Db contract
There was a problem hiding this comment.
I think LspDatabase signals the wrong layering. LSP specific fields should live elsewhere.
- ty's concrete database in
ty_projectsits in-between the LSP and IDE layers. - In R-A, the concrete database in
ide-dbsits below the IDE layer.
Both of these are good, as long as the concrete DB is isolated from the LSP layer. This allows a separate frontend mechanism to hook onto the IDE and storage layers without reimplementing the latter.
Initially I had the DB in oak_storage. I moved it back as a module because it's so simple right now, but we can extract it later on when it becomes more complex.
| parking_lot = "0.12.5" | ||
| paste = "1.0.15" | ||
| quote = "1.0.45" | ||
| query-group-macro = { package = "ra_ap_query-group-macro", version = "0.0.331" } |
There was a problem hiding this comment.
FWIW the description of this crate is
A macro that mimics the old Salsa-style #[query_group] macro
i.e. its an old salsa thing that they no longer do?
do we really want to opt in to an old style feature?
There was a problem hiding this comment.
Deep wiki tells me that query_group has been indeed removed from Salsa and has "largely been replaced" by both tracked functions and #[salsa::jar]
There was a problem hiding this comment.
I used it to solve compilation issues with method-call syntax on both abstract and concrete dbs. But trying again, I could get an alternative that doesn't quite look as bad as what I had before. Dependency removed.
| pub url: UrlId, | ||
| #[returns(ref)] | ||
| pub contents: String, | ||
| pub package: Option<Package>, |
There was a problem hiding this comment.
Something about this doesn't feel quite right / feels impure, but I can't quite express a better alternative
There was a problem hiding this comment.
I think what irks me about this is that I expected it to be Root here, not Package?
And every File should have a backpointer to its Root? Unconditionally!
This is somewhat confirmed by the fact that package doesn't even seem to be used directly. You just use it to get at root
/// The root containing this file, if any.
///
/// If the file has a registered [`Package`], dispatches through
/// `Package.root`. Otherwise falls back to a URL-prefix lookup
/// against [`WorkspaceRoots`] (orphan files live under a workspace
/// root or nowhere; library files always have a package).
///
/// Callers that need to distinguish workspace from library roots
/// inspect `root.kind(db)`.
#[salsa::tracked]
pub fn root(self, db: &dyn Db) -> Option<Root> {
if let Some(pkg) = self.package(db) {
return Some(pkg.root(db));
}
root_by_url(db, self.url(db))
}
There was a problem hiding this comment.
I had root backpointer on File in a previous iteration but I removed it because it doesn't feel that useful. A backpointer to package on the other hand is much more useful, e.g. resolving a file's imports.
I don't think one backpointer is more correct over the other. It's more a question of what's more useful.
|
|
||
| /// Implementation of [`Db::file_by_url`]. Walks the per-root indices. | ||
| /// | ||
| /// Not itself salsa-tracked (its `&UrlId` argument isn't a salsa |
There was a problem hiding this comment.
Should we be working towards some kind of Url struct that is salsa tracked? Or is this just the LSP border?
There was a problem hiding this comment.
I wondered about that too but I think this would have double usage with File. So I thought we'd try with only File for now.
| pub struct Root { | ||
| #[returns(ref)] | ||
| pub path: UrlId, | ||
| pub kind: RootKind, | ||
| /// Top-level R scripts directly under this root. Each entry is a | ||
| /// `File` with `package(db) == None`. Always empty for `Library` | ||
| /// roots. | ||
| #[returns(ref)] | ||
| pub scripts: Vec<File>, | ||
| /// Packages discovered under this root (workspace packages for | ||
| /// `Workspace`, installed packages for `Library`). | ||
| #[returns(ref)] | ||
| pub packages: Vec<Package>, | ||
| } | ||
|
|
||
| #[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)] | ||
| pub enum RootKind { | ||
| Workspace, | ||
| Library, | ||
| } |
There was a problem hiding this comment.
I am somewhat confident I would prefer
struct WorkspaceRoot {
#[returns(ref)]
pub path: UrlId,
/// Top-level R scripts directly under this root. Each entry is a
/// `File` with `package(db) == None`.
#[returns(ref)]
pub scripts: Vec<File>,
/// Packages discovered under this root
#[returns(ref)]
pub packages: Vec<Package>,
}
struct LibraryRoot {
#[returns(ref)]
pub path: UrlId,
/// Packages discovered under this root
#[returns(ref)]
pub packages: Vec<Package>,
}Possibly even with WorkspacePackage since it has different fields from Package
I'm not entirely sure how important the abstraction over Root is going to be, but the specificity of WorkspaceRoot vs LibraryRoot feels quite nice to me, especially because it reduces the number of fields down to exactly what is required by each one.
At the very least, maybe its what I have described above wrapped in
enum Root {
Workspace(WorkspaceRoot),
Library(LibraryRoot)
}There was a problem hiding this comment.
I had it set up exactly like this in an earlier iteration, then went back on this because packages also contain scripts (e.g. build scripts). But now that I think of it, Package itself could gain a scripts field, rather than relying on the flat root's field? The unit of containment makes sense since scripts in different packages can't source each other.
| //! | ||
| //! For integration-style tests that need orphan placement, multi-root | ||
| //! lookups, or `set_file` / `set_package` upsert semantics, use | ||
| //! `oak_storage::OakDatabase` in `oak_storage/tests/`. |
There was a problem hiding this comment.
Outdated? Or upcoming? oak_storage is not a thing
There was a problem hiding this comment.
Both! It's both a crate that used to exist and that could exist again in the future.
4a20f93 to
082a9ee
Compare
To avoid dependency on Rust 1.95
f536951 to
b8fa07f
Compare
Branched from #1220
Progress towards #1212
Follow-up to #1214
Alternative to #1183 (closes #1183)
#1214 introduced a Source Graph data structure inspired by Rust-Analyzer's Crate Graph, that bundled workspace packages, installed packages, and standalone scripts into one
#[salsa::input].While working with it, I realised this data structure missed the mark a bit:
It's not really graph-shaped. The
SourceNodeenum is used for exactly one level, it's not a recursive data structure.Rust-Analyzer doesn't actually have a concrete crate graph data structure. The crate graph is an implicit concept in the
CrateGraphBuilder, which is in charge of setting and updating crate data, which are the main#[salsa::input]in RA (arranged in a flat arrayall_crates).The
SourceGraphstructure has been replaced with a model closer to how R looks up packages:The top-level
#[salsa::input]are three sets of roots:WorkspaceRoots, the workspace folders from the LSPLibraryRoots, the folders of installed packages corresponding to.libPaths()OrphanRootfor opened files that don't belong to any workspace roots.Each root has its own
scripts: Vec<File>andpackages: Vec<Package>, and eachPackageholds its ownfiles: Vec<File>Filegains apackage: Option<Package>back-pointer and a derivedFile::rootquery to link it back to its root.file_by_url()andpackage_by_name()become tracked methods onDbthat walk root-specific index maps, short-circuiting once the relevant root is found. Each index map is itself a#[salsa::tracked]query keyed by aRoot, so a write to one root's contents only invalidates that specific root's map.That last point in particular seems more elegant than the file maps in ty, which are external to Salsa and need independent updating. Since they are external, Salsa can't see changes to the maps, and invalidation needs to be driven manually with revision-bump notifications. All queries depending on the maps must also depend on the relevant revision. This increases the risk of stale queries that don't get invalidated when the file/root structure has changed. The advantage of this PR's approach is that it's fully Salsa-based and correctness is structural.
The trade-off with our new approach is that the maps are coarsely regenerated when a root structure changes (the set of scripts or packages changes). That said:
Only the indexes for the updated root need to be rebuilt, and only when the structure changes (new or deleted scripts, packages, or package files). Editing contents doesn't invalidate indexes.
Once rebuilt, invalidation follows the natural precedence between roots: installing a package in a library root that comes later in
.libPaths()doesn't invalidate queries already resolved from a higher-precedence root. That's because downstream deps offile_by_urlandpackage_by_namedon't depend on short-circuited roots and are not invalidated by structural changes in these roots.Queries depending on
package_by_name()do not get invalidated when the set of scripts changes, or when files are added/removed in a package. Worth noting the reverse is not true: adding or removing a package does invalidate file-based queries (e.g.file_by_url()) since packages hold a list of files.Other changes:
Taking advantage of the new infrastructure,
SalsaImportsResolver::resolve_sourcenow anchors relativesource()paths against the workspace root when the calling file is under one, falling back to the calling file's parent directory otherwise.The concrete
OakDatabasemoves from the LSP layer to oak_db, instorage.rs. The reasons that originally motivated this change have disappeared from this PR in a previous iteration, but I've kept the move since I think the LSP layer is not the best fit for that concrete database. Other consumers than the LSP might want to construct it. And in the future, it might become rather complex, e.g. aggregate multiple databases when we introduce a type DB. This instantiation and aggregation work shouldn't live in the LSP.Not part of this PR but planned:
"Firewall" queries like
File::exports()that shield downstream consumers from content edits that don't change the view projected by the intermediate query.An
oak_scancrate that hooks up to LSP document and file system events to populateWorkspaceRootsandOrphanRoot. TheLibraryRootspopulation comes later as well in a two-step process:.libPaths()scan first, thenoak_sourcesgeneration.