Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ reqwest-retry = "0.9.1"
rust-embed = "8.11.0"
rustc-hash = "2.1.2"
salsa = { version = "0.26.2", features = ["compact_str"] }
salsa-macros = "0.26.2"
scraper = "0.26.0"
semver = "1.0.28"
serde = { version = "1.0.228", features = ["derive"] }
Expand Down
20 changes: 20 additions & 0 deletions crates/aether_url/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,26 @@ impl UrlId {
pub fn as_url(&self) -> &Url {
&self.0
}

/// Whether this URL points at a filesystem file (`file:` scheme).
/// Returns `false` for virtual documents like `untitled:` (unsaved
/// buffers) and `ark:` (synthesized sources from R). Callers that
/// might receive virtual URLs should gate on this before reaching for
/// [`Self::to_file_path`].
pub fn is_file(&self) -> bool {
self.0.scheme() == "file"
}

/// Filesystem path corresponding to this URL.
///
/// Errors for non-`file:` URLs (untitled buffers, custom schemes) and
/// for `file:` URLs whose path can't be reconstructed (rare). Callers
/// that handle virtual documents should check [`Self::is_file`] first.
pub fn to_file_path(&self) -> anyhow::Result<std::path::PathBuf> {
self.0
.to_file_path()
.map_err(|()| anyhow::anyhow!("URL has no filesystem path: {}", self.0))
}
}

impl fmt::Display for UrlId {
Expand Down
27 changes: 0 additions & 27 deletions crates/ark/src/lsp/state.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3,14 +3,11 @@ use std::collections::HashMap;
use std::collections::HashSet;
use std::path::Path;
use std::path::PathBuf;
use std::sync::Arc;
use std::sync::OnceLock;

use anyhow::anyhow;
use oak_core::file::list_r_files;
use oak_db::semantic_index_with_source_resolver;
use oak_db::LegacyDb;
use oak_db::SourceGraph;
use oak_ide::ExternalScope;
use oak_semantic::library::Library;
use oak_semantic::scope_layer::default_search_path;
Expand All @@ -25,30 +22,6 @@ use crate::lsp::config::LspConfig;
use crate::lsp::document::Document;
use crate::lsp::inputs::source_root::SourceRoot;

Comment on lines 23 to 24
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have extremely strong feelings about this move, except to reflect that:

  • This is not how ty does it
  • It feels like it is possible that OakDatabase could one day have a struct field that is LSP specific, and in that case the concrete impl should live here

Like, I would actually call this LspDatabase which just so happens to fulfill the oak_db::Db contract

Copy link
Copy Markdown
Contributor Author

@lionel- lionel- May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think LspDatabase signals the wrong layering. LSP specific fields should live elsewhere.

  • ty's concrete database in ty_project sits in-between the LSP and IDE layers.
  • In R-A, the concrete database in ide-db sits below the IDE layer.

Both of these are good, as long as the concrete DB is isolated from the LSP layer. This allows a separate frontend mechanism to hook onto the IDE and storage layers without reimplementing the latter.

Initially I had the DB in oak_storage. I moved it back as a module because it's so simple right now, but we can extract it later on when it becomes more complex.

/// Concrete Salsa database that owns the storage for all LSP inputs.
#[salsa::db]
#[derive(Clone, Default)]
pub struct OakDatabase {
storage: salsa::Storage<Self>,
source_graph: Arc<OnceLock<SourceGraph>>,
}

impl OakDatabase {
pub fn new() -> Self {
Self::default()
}
}

#[salsa::db]
impl salsa::Database for OakDatabase {}

#[salsa::db]
impl oak_db::Db for OakDatabase {
fn source_graph(&self) -> SourceGraph {
*self.source_graph.get_or_init(|| SourceGraph::empty(self))
}
}

impl LegacyDb for WorldState {
fn semantic_index(&self, file: &Url) -> Option<SemanticIndex> {
let doc = self.workspace_document(file)?;
Expand Down
3 changes: 3 additions & 0 deletions crates/oak_db/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,10 @@ compact_str.workspace = true
log.workspace = true
oak_package_metadata.workspace = true
oak_semantic = { workspace = true, features = ["salsa"] }
rustc-hash.workspace = true
salsa.workspace = true
salsa-macros.workspace = true
stdext.workspace = true
url.workspace = true

[dev-dependencies]
Expand Down
137 changes: 127 additions & 10 deletions crates/oak_db/src/db.rs
Original file line number Diff line number Diff line change
@@ -1,14 +1,131 @@
use crate::SourceGraph;
use aether_url::UrlId;
use rustc_hash::FxHashMap;

/// Salsa Database trait.
use crate::File;
use crate::LibraryRoots;
use crate::OrphanRoot;
use crate::Package;
use crate::Root;
use crate::WorkspaceRoots;

/// Concrete-input surface of the salsa database. Each impl
/// ([`crate::OakDatabase`], the test db) supplies the three singleton input
/// handles.
///
/// Kept separate from [`Db`] (the query trait) so input accessors and derived
/// queries live on different traits. Mirrors rust-analyzer's `SourceDatabase`
/// / `DefDatabase` split: input plumbing on the base trait, derived queries
/// on the query trait.
#[salsa::db]
pub trait DbInputs: salsa::Database {
/// Workspace folders opened by the editor.
fn workspace_roots(&self) -> WorkspaceRoots;

/// R library roots (entries in `.libPaths()`).
fn library_roots(&self) -> LibraryRoots;

/// Files not yet anchored to any workspace or library root.
fn orphan_root(&self) -> OrphanRoot;
}

/// Salsa database trait used throughout `oak_db`. Tracked queries take `&dyn
/// Db`, so query code never names the concrete db type.
///
/// Methods aren't memoized at this level: they delegate to free helpers
/// (`file_by_url_query` etc.) that walk per-root indices which *are* memoized,
/// so salsa records dep edges through those.
///
/// Queries take a `dyn Db` rather than the concrete database owned by
/// the LSP layer.
/// Each concrete db type provides its own forwarding `impl Db`, which is
/// what lets `db.file_by_url(url)` work on both `&dyn Db` (via the trait
/// method) and concrete db references (via the type's impl).
#[salsa::db]
pub trait Db: salsa::Database {
/// The workspace's source graph. Each concrete database holds one
/// salsa input handle for the lifetime of the database. The
/// recommended implementation lazily allocates it on first access
/// via `Arc<OnceLock<SourceGraph>>`.
fn source_graph(&self) -> SourceGraph;
pub trait Db: DbInputs {
/// Look up the `File` interned at `url`, if any.
///
/// Walks the per-root URL indices in workspace-then-library order,
/// then falls back to the orphan bucket. The walk short-circuits
/// on the first hit, so callers depend only on the index maps
/// actually visited.
fn file_by_url(&self, url: &UrlId) -> Option<File>;

/// Look up the `Package` named `name`, applying the following precedence:
/// - Workspace packages shadow installed ones
/// - Installed packages in an earlier root shadow later ones
/// (mirroring `.libPaths()`).
fn package_by_name(&self, name: &str) -> Option<Package>;
}

/// Implementation of [`Db::file_by_url`]. Walks the per-root indices.
///
/// Not itself salsa-tracked (its `&UrlId` argument isn't a salsa
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be working towards some kind of Url struct that is salsa tracked? Or is this just the LSP border?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that too but I think this would have double usage with File. So I thought we'd try with only File for now.

/// entity), but every step is: each [`root_url_index`] call returns a
/// cached map, so adding a file to one root invalidates only that
/// root's index.
pub fn file_by_url_query(db: &dyn Db, url: &UrlId) -> Option<File> {
for root in db.workspace_roots().roots(db) {
if let Some(&file) = root_url_index(db, *root).get(url) {
return Some(file);
}
}
for root in db.library_roots().roots(db) {
if let Some(&file) = root_url_index(db, *root).get(url) {
return Some(file);
}
}
orphan_url_index(db).get(url).copied()
}

/// Implementation of [`Db::package_by_name`]. Same shape as
/// [`file_by_url_query`].
pub fn package_by_name_query(db: &dyn Db, name: &str) -> Option<Package> {
for root in db.workspace_roots().roots(db) {
if let Some(&pkg) = root_package_index(db, *root).get(name) {
return Some(pkg);
}
}
for root in db.library_roots().roots(db) {
if let Some(&pkg) = root_package_index(db, *root).get(name) {
return Some(pkg);
}
}
None
}

/// Per-root URL -> File index. Salsa caches one map per `Root`;
/// reads only `root.scripts`, `root.packages`, and each
/// `pkg.files` reachable from this root. Adding or removing a file
/// in *this* root invalidates this entry; other roots stay cached.
#[salsa::tracked(returns(ref))]
fn root_url_index(db: &dyn Db, root: Root) -> FxHashMap<UrlId, File> {
let mut map = FxHashMap::default();
for &file in root.scripts(db) {
map.insert(file.url(db).clone(), file);
}
for &pkg in root.packages(db) {
for &file in pkg.files(db) {
map.insert(file.url(db).clone(), file);
}
}
map
}

/// Orphan URL -> File index. Reads only `orphan_root().files`.
#[salsa::tracked(returns(ref))]
fn orphan_url_index(db: &dyn Db) -> FxHashMap<UrlId, File> {
let mut map = FxHashMap::default();
for &file in db.orphan_root().files(db) {
map.insert(file.url(db).clone(), file);
}
map
}

/// Per-root name -> Package index. Same granularity as
/// [`root_url_index`].
#[salsa::tracked(returns(ref))]
fn root_package_index(db: &dyn Db, root: Root) -> FxHashMap<String, Package> {
let mut map = FxHashMap::default();
for &pkg in root.packages(db) {
map.insert(pkg.name(db).clone(), pkg);
}
map
}
65 changes: 65 additions & 0 deletions crates/oak_db/src/file.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,13 @@ use oak_semantic::semantic_index::ScopeId;
use oak_semantic::semantic_index::SemanticIndex;
use oak_semantic::semantic_index::SymbolTable;
use oak_semantic::use_def_map::UseDefMap;
use stdext::result::ResultExt;

use crate::imports::SalsaImportsResolver;
use crate::parse::OakParse;
use crate::Db;
use crate::Package;
use crate::Root;

/// A source file tracked by Salsa.
///
Expand All @@ -18,12 +21,35 @@ use crate::Db;
///
/// The `url` field is a [`UrlId`], so the type system enforces "everything
/// inside Salsa is a canonical URL".
///
/// `package` is a back-pointer to the [`Package`] this file belongs to, or
/// `None` for standalone scripts. Inverse of `Package.files`, so queries
/// answering "what package owns this file?" don't walk the forward edge.
/// Files with `package == None` are either standalone scripts under a
/// workspace root or orphan files not registered anywhere.
///
/// # Placement invariant
///
/// `File.package` and the file's physical location in a `Vec<File>` are
/// expected to agree. A file with `package == Some(pkg)` should live in
/// `pkg.files`. A file with `package == None` should live in either some
/// `root.scripts` or `orphan_root().files`. The salsa setters (`set_url`,
/// `set_contents`, `set_package`) are `pub` because field visibility couples to
/// setter visibility in salsa but calling `set_package` directly leaves the
/// file in its old bucket and silently breaks this invariant.
///
/// The scanner crate (`oak_scan`) wraps these setters in helpers that
/// maintain placement (move the file between `pkg.files`,
/// `root.scripts`, and `orphan_root().files` as `package` changes).
/// Callers that go around the helpers and use the salsa setters
/// directly must maintain placement themselves.
#[salsa::input(debug)]
pub struct File {
#[returns(ref)]
pub url: UrlId,
#[returns(ref)]
pub contents: String,
pub package: Option<Package>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about this doesn't feel quite right / feels impure, but I can't quite express a better alternative

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what irks me about this is that I expected it to be Root here, not Package?

And every File should have a backpointer to its Root? Unconditionally!

This is somewhat confirmed by the fact that package doesn't even seem to be used directly. You just use it to get at root

    /// The root containing this file, if any.
    ///
    /// If the file has a registered [`Package`], dispatches through
    /// `Package.root`. Otherwise falls back to a URL-prefix lookup
    /// against [`WorkspaceRoots`] (orphan files live under a workspace
    /// root or nowhere; library files always have a package).
    ///
    /// Callers that need to distinguish workspace from library roots
    /// inspect `root.kind(db)`.
    #[salsa::tracked]
    pub fn root(self, db: &dyn Db) -> Option<Root> {
        if let Some(pkg) = self.package(db) {
            return Some(pkg.root(db));
        }
        root_by_url(db, self.url(db))
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had root backpointer on File in a previous iteration but I removed it because it doesn't feel that useful. A backpointer to package on the other hand is much more useful, e.g. resolving a file's imports.

I don't think one backpointer is more correct over the other. It's more a question of what's more useful.

}

#[salsa::tracked]
Expand Down Expand Up @@ -85,6 +111,45 @@ impl File {
pub fn use_def_map(self, db: &dyn Db, scope: ScopeId) -> Arc<UseDefMap> {
Arc::clone(self.semantic_index(db).use_def_map(scope))
}

/// The root containing this file, if any.
///
/// If the file has a registered [`Package`], dispatches through
/// `Package.root`. Otherwise falls back to a URL-prefix lookup
/// against [`WorkspaceRoots`] (orphan files live under a workspace
/// root or nowhere; library files always have a package).
///
/// Callers that need to distinguish workspace from library roots
/// inspect `root.kind(db)`.
#[salsa::tracked]
pub fn root(self, db: &dyn Db) -> Option<Root> {
if let Some(pkg) = self.package(db) {
return Some(pkg.root(db));
}
root_by_url(db, self.url(db))
}
}

/// Find the workspace `Root` whose path is the longest-prefix ancestor
/// of `url`. Returns `None` for non-`file:` URLs and for URLs outside
/// every workspace folder. Private helper: the only caller is
/// [`File::root`] (for files without a registered package).
fn root_by_url(db: &dyn Db, url: &UrlId) -> Option<Root> {
// Virtual documents (e.g. untitled scheme) don't have roots
if !url.is_file() {
return None;
}

let path = url.to_file_path().log_err()?;
db.workspace_roots()
.roots(db)
.iter()
.filter_map(|root| {
let root_path = root.path(db).to_file_path().log_err()?;
path.starts_with(&root_path).then_some((root_path, *root))
})
.max_by_key(|(p, _)| p.components().count())
.map(|(_, r)| r)
}

fn build_semantic_index(file: File, db: &dyn Db) -> SemanticIndex {
Expand Down
Loading
Loading