Skip to content

Per-root salsa inputs for files and packages#1221

Merged
lionel- merged 16 commits into
mainfrom
oak/salsa-8-files-interning
May 21, 2026
Merged

Per-root salsa inputs for files and packages#1221
lionel- merged 16 commits into
mainfrom
oak/salsa-8-files-interning

Conversation

@lionel-
Copy link
Copy Markdown
Contributor

@lionel- lionel- commented May 19, 2026

Branched from #1220
Progress towards #1212
Follow-up to #1214
Alternative to #1183 (closes #1183)

#1214 introduced a Source Graph data structure inspired by Rust-Analyzer's Crate Graph, that bundled workspace packages, installed packages, and standalone scripts into one #[salsa::input].

While working with it, I realised this data structure missed the mark a bit:

  • It's not really graph-shaped. The SourceNode enum is used for exactly one level, it's not a recursive data structure.

  • Rust-Analyzer doesn't actually have a concrete crate graph data structure. The crate graph is an implicit concept in the CrateGraphBuilder, which is in charge of setting and updating crate data, which are the main #[salsa::input] in RA (arranged in a flat array all_crates).

The SourceGraph structure has been replaced with a model closer to how R looks up packages:

  • The top-level #[salsa::input] are three sets of roots:

    • WorkspaceRoots, the workspace folders from the LSP
    • LibraryRoots, the folders of installed packages corresponding to .libPaths()
    • OrphanRoot for opened files that don't belong to any workspace roots.
  • Each root has its own scripts: Vec<File> and packages: Vec<Package>, and each Package holds its own files: Vec<File>

  • File gains a package: Option<Package> back-pointer and a derived File::root query to link it back to its root.

  • file_by_url() and package_by_name() become tracked methods on Db that walk root-specific index maps, short-circuiting once the relevant root is found. Each index map is itself a #[salsa::tracked] query keyed by a Root, so a write to one root's contents only invalidates that specific root's map.

That last point in particular seems more elegant than the file maps in ty, which are external to Salsa and need independent updating. Since they are external, Salsa can't see changes to the maps, and invalidation needs to be driven manually with revision-bump notifications. All queries depending on the maps must also depend on the relevant revision. This increases the risk of stale queries that don't get invalidated when the file/root structure has changed. The advantage of this PR's approach is that it's fully Salsa-based and correctness is structural.

The trade-off with our new approach is that the maps are coarsely regenerated when a root structure changes (the set of scripts or packages changes). That said:

  • Only the indexes for the updated root need to be rebuilt, and only when the structure changes (new or deleted scripts, packages, or package files). Editing contents doesn't invalidate indexes.

  • Once rebuilt, invalidation follows the natural precedence between roots: installing a package in a library root that comes later in .libPaths() doesn't invalidate queries already resolved from a higher-precedence root. That's because downstream deps of file_by_url and package_by_name don't depend on short-circuited roots and are not invalidated by structural changes in these roots.

  • Queries depending on package_by_name() do not get invalidated when the set of scripts changes, or when files are added/removed in a package. Worth noting the reverse is not true: adding or removing a package does invalidate file-based queries (e.g. file_by_url()) since packages hold a list of files.

Other changes:

  • Taking advantage of the new infrastructure, SalsaImportsResolver::resolve_source now anchors relative source() paths against the workspace root when the calling file is under one, falling back to the calling file's parent directory otherwise.

  • The concrete OakDatabase moves from the LSP layer to oak_db, in storage.rs. The reasons that originally motivated this change have disappeared from this PR in a previous iteration, but I've kept the move since I think the LSP layer is not the best fit for that concrete database. Other consumers than the LSP might want to construct it. And in the future, it might become rather complex, e.g. aggregate multiple databases when we introduce a type DB. This instantiation and aggregation work shouldn't live in the LSP.

Not part of this PR but planned:

  • "Firewall" queries like File::exports() that shield downstream consumers from content edits that don't change the view projected by the intermediate query.

  • An oak_scan crate that hooks up to LSP document and file system events to populate WorkspaceRoots and OrphanRoot. The LibraryRoots population comes later as well in a two-step process: .libPaths() scan first, then oak_sources generation.

@lionel- lionel- force-pushed the oak/salsa-8-files-interning branch from 72b0681 to 2a650df Compare May 19, 2026 06:35
Copy link
Copy Markdown
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions about things that don't feeeel quite right yet in file.rs and inputs.rs

Comment thread crates/aether_url/src/lib.rs Outdated

/// Filesystem path corresponding to this URL.
///
/// Returns `None` for non-`file:` URLs (untitled buffers, custom
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rely on that, though? Or should we try and detect it early?

Like, I file like we should only call this on file:/ schemed URLs

If we start calling it on everything and rely on this behavior, we can't tell the difference between:

  • An untitled buffer, which is fine
  • A failure to convert a real file:/ url back to a PathBuf, which probably means something is wrong

I'd also advocate for keeping the Result output type, even with the odd () Error, because I do think the point of this is trying to signal an error path

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Now an anyhow::Result() and we either return early with None or log_err() at call sites.

Comment on lines 23 to 24
use crate::lsp::inputs::source_root::SourceRoot;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have extremely strong feelings about this move, except to reflect that:

  • This is not how ty does it
  • It feels like it is possible that OakDatabase could one day have a struct field that is LSP specific, and in that case the concrete impl should live here

Like, I would actually call this LspDatabase which just so happens to fulfill the oak_db::Db contract

Copy link
Copy Markdown
Contributor Author

@lionel- lionel- May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think LspDatabase signals the wrong layering. LSP specific fields should live elsewhere.

  • ty's concrete database in ty_project sits in-between the LSP and IDE layers.
  • In R-A, the concrete database in ide-db sits below the IDE layer.

Both of these are good, as long as the concrete DB is isolated from the LSP layer. This allows a separate frontend mechanism to hook onto the IDE and storage layers without reimplementing the latter.

Initially I had the DB in oak_storage. I moved it back as a module because it's so simple right now, but we can extract it later on when it becomes more complex.

Comment thread Cargo.toml Outdated
parking_lot = "0.12.5"
paste = "1.0.15"
quote = "1.0.45"
query-group-macro = { package = "ra_ap_query-group-macro", version = "0.0.331" }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the description of this crate is

A macro that mimics the old Salsa-style #[query_group] macro

i.e. its an old salsa thing that they no longer do?

do we really want to opt in to an old style feature?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep wiki tells me that query_group has been indeed removed from Salsa and has "largely been replaced" by both tracked functions and #[salsa::jar]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used it to solve compilation issues with method-call syntax on both abstract and concrete dbs. But trying again, I could get an alternative that doesn't quite look as bad as what I had before. Dependency removed.

Comment thread crates/oak_db/src/file.rs
pub url: UrlId,
#[returns(ref)]
pub contents: String,
pub package: Option<Package>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about this doesn't feel quite right / feels impure, but I can't quite express a better alternative

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what irks me about this is that I expected it to be Root here, not Package?

And every File should have a backpointer to its Root? Unconditionally!

This is somewhat confirmed by the fact that package doesn't even seem to be used directly. You just use it to get at root

    /// The root containing this file, if any.
    ///
    /// If the file has a registered [`Package`], dispatches through
    /// `Package.root`. Otherwise falls back to a URL-prefix lookup
    /// against [`WorkspaceRoots`] (orphan files live under a workspace
    /// root or nowhere; library files always have a package).
    ///
    /// Callers that need to distinguish workspace from library roots
    /// inspect `root.kind(db)`.
    #[salsa::tracked]
    pub fn root(self, db: &dyn Db) -> Option<Root> {
        if let Some(pkg) = self.package(db) {
            return Some(pkg.root(db));
        }
        root_by_url(db, self.url(db))
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had root backpointer on File in a previous iteration but I removed it because it doesn't feel that useful. A backpointer to package on the other hand is much more useful, e.g. resolving a file's imports.

I don't think one backpointer is more correct over the other. It's more a question of what's more useful.

Comment thread crates/oak_db/src/db.rs

/// Implementation of [`Db::file_by_url`]. Walks the per-root indices.
///
/// Not itself salsa-tracked (its `&UrlId` argument isn't a salsa
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be working towards some kind of Url struct that is salsa tracked? Or is this just the LSP border?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that too but I think this would have double usage with File. So I thought we'd try with only File for now.

Comment on lines +21 to +40
pub struct Root {
#[returns(ref)]
pub path: UrlId,
pub kind: RootKind,
/// Top-level R scripts directly under this root. Each entry is a
/// `File` with `package(db) == None`. Always empty for `Library`
/// roots.
#[returns(ref)]
pub scripts: Vec<File>,
/// Packages discovered under this root (workspace packages for
/// `Workspace`, installed packages for `Library`).
#[returns(ref)]
pub packages: Vec<Package>,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub enum RootKind {
Workspace,
Library,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat confident I would prefer

struct WorkspaceRoot {
    #[returns(ref)]
    pub path: UrlId,
    /// Top-level R scripts directly under this root. Each entry is a
    /// `File` with `package(db) == None`.
    #[returns(ref)]
    pub scripts: Vec<File>,
    /// Packages discovered under this root
    #[returns(ref)]
    pub packages: Vec<Package>,
}

struct LibraryRoot {
    #[returns(ref)]
    pub path: UrlId,
    /// Packages discovered under this root
    #[returns(ref)]
    pub packages: Vec<Package>,
}

Possibly even with WorkspacePackage since it has different fields from Package

I'm not entirely sure how important the abstraction over Root is going to be, but the specificity of WorkspaceRoot vs LibraryRoot feels quite nice to me, especially because it reduces the number of fields down to exactly what is required by each one.

At the very least, maybe its what I have described above wrapped in

enum Root {
    Workspace(WorkspaceRoot),
    Library(LibraryRoot)
}

Copy link
Copy Markdown
Contributor Author

@lionel- lionel- May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it set up exactly like this in an earlier iteration, then went back on this because packages also contain scripts (e.g. build scripts). But now that I think of it, Package itself could gain a scripts field, rather than relying on the flat root's field? The unit of containment makes sense since scripts in different packages can't source each other.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logged at #1212 (comment)

Comment thread crates/oak_db/src/tests/test_db.rs Outdated
//!
//! For integration-style tests that need orphan placement, multi-root
//! lookups, or `set_file` / `set_package` upsert semantics, use
//! `oak_storage::OakDatabase` in `oak_storage/tests/`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outdated? Or upcoming? oak_storage is not a thing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both! It's both a crate that used to exist and that could exist again in the future.

@lionel- lionel- force-pushed the oak/salsa-6-url-id branch from 4a20f93 to 082a9ee Compare May 21, 2026 12:37
Base automatically changed from oak/salsa-6-url-id to main May 21, 2026 12:50
@lionel- lionel- force-pushed the oak/salsa-8-files-interning branch from f536951 to b8fa07f Compare May 21, 2026 12:57
@lionel- lionel- merged commit 131ab7c into main May 21, 2026
17 checks passed
@lionel- lionel- deleted the oak/salsa-8-files-interning branch May 21, 2026 13:10
@github-actions github-actions Bot locked and limited conversation to collaborators May 21, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Moving towards a source graph

2 participants