Skip to content

Integrate PackageCache with goto definition#1172

Open
DavisVaughan wants to merge 6 commits intooak/base-package-cachefrom
oak/package-cache-in-library
Open

Integrate PackageCache with goto definition#1172
DavisVaughan wants to merge 6 commits intooak/base-package-cachefrom
oak/package-cache-in-library

Conversation

@DavisVaughan
Copy link
Copy Markdown
Contributor

@DavisVaughan DavisVaughan commented Apr 27, 2026

Branched from #1166

Each commit is self contained, with the final one being the brunt of the work. I can make the first few commits separate PRs if you prefer.


Some initial setup changes:

  • Threaded r_home: PathBuf through to both the LSP and Console by calling r_home_setup() way earlier
  • oak_index_vec is now its own crate, spun out from oak_index. It just holds IndexVec, a useful concept outside of the SemanticIndex (this should probably just be oak_index after we rename the existing oak_index to oak_semantic)
  • oak_layers is now its own crate, spun out from oak_index. This allows oak_index to be solely focused on the SemanticIndex, since the "layers" concepts need access to the heavier PackageCache dependency

Then 4b8b21b is the major feature commit.

New oak_package_definitions crate, which holds a struct LibraryDefinitions. This is like Library, but instead of holding metadata about a package, it holds the top level definitions exposed by the package, with file + range fields for jumping to a package definition. We make PackageDefinitions for a package by doing the following for each package file: reading it, parsing it, creating a semantic index, categorizing the file_exports(). Then, for now, we just throw away the semantic index for that file.

The WorldState now holds a LibraryDefinitions alongside the preexisting Library. I'm not sure if this is right setup longterm, or if these should be one object, but for now it allows Library to be lightweight.

goto_definition() then gets LibraryDefinitions and can use them to determine definition locations for external packages.


Some questions / challenges along the way:

  • Library should probably be renamed LibraryMetadata since that's all it is really about, and we are likely going to have things like LibraryDefinitions and eventually LibraryHelp that all use different caches / dependencies

  • Currently the PackageCache caches DESCRIPTION, NAMESPACE, and INDEX alongside the R/ sources, which felt right at the time but now I am not so sure! I am starting to think that instead PackageCache should only have the R/ sources, and we should treat the .libPaths() as our source of truth for those other files, accessed via Library, rather than having them in two places.

    • I think originally my thought was that we could recreate a "source package like" structure in the cache (with R/, DESCRIPTION, and NAMESPACE all present), but I no longer think we need to do that, and should instead just keep 1 source of truth for the metadata files.
  • I had originally wanted LibraryDefinitions to really be something like LibrarySemanticIndexes where it would hold a package worth of SemanticIndex objects for reuse across LSP features. But this is not easy! SemanticIndex is not Send + Sync due to DefinitionKind holding RSyntaxNode, so that means we can't put a LibrarySemanticIndexes in the WorldState, since it needs to be snapshotted and shuffled between threads. I think we are going to need to resolve this eventually for Salsa purposes, but for now I just cache the PackageDefinition objects extracted from the semantic index instead, which are send+sync.

@DavisVaughan DavisVaughan changed the base branch from main to oak/base-package-cache April 27, 2026 21:19
@DavisVaughan DavisVaughan force-pushed the oak/package-cache-in-library branch from 2b60581 to f01caad Compare April 28, 2026 13:04
- Stored in `WorldState`
- Used in `goto_definition()`
- Split out `oak_layers/` from `oak_index/`  so that `oak_index` is purely about the `SemanticIndex`, because `oak_layers` needs the heavier package cache utilities
@DavisVaughan DavisVaughan force-pushed the oak/package-cache-in-library branch from cae49f2 to 4b8b21b Compare April 28, 2026 15:50
@DavisVaughan DavisVaughan marked this pull request as ready for review April 28, 2026 15:52
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking this whole file should move out of harp into something like aether_r_process eventually

Comment on lines +69 to +71
/// Map of package name to package definitions for installed libraries.
/// Only usable when a package cache is available.
pub(crate) library_definitions: Option<LibraryDefinitions>,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorldState now gets library_definitions if we have a cache available

Comment thread crates/ark/src/console.rs
}

pub(crate) struct Console {
r_home: PathBuf,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused, but I think will be nice for the DAP + PackageCache ideas

Comment on lines +10 to +16
/// The result of resolving a name against the external scope chain.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExternalDefinition {
file: Url,
name: String,
range: TextRange,
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be an enum that knew if the definition came from a project file vs a package, but now that I'm actually filling out the package side of things, I think they really boil down to the same structure, so I've flattened it

Comment on lines +87 to +111
pub fn resolve_in_package(
name: &str,
package: &str,
visibility: PackageDefinitionVisibility,
library: &Library,
library_definitions: Option<&LibraryDefinitions>,
) -> Option<ExternalDefinition> {
let library_definitions = library_definitions?;
let package = library.get(package)?;
let package_definitions = library_definitions.get(&package)?;
let definition = package_definitions.get(name)?;

if visibility != definition.visibility() {
return None;
}

let file = package_definitions.file(definition.file_id());
let file = Url::from_file_path(file).log_err()?;

Some(ExternalDefinition {
file,
name: name.to_string(),
range: definition.range(),
})
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real crux of goto-definition for a package export

pub struct Namespace {
/// Names of objects exported with `export()`
pub exports: Vec<String>,
pub exports: SortedVec<String>,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we build PackageDefinitions, we categorize top level definitions as Internal vs Exported based on whether or not they exist as exports in the Namespace (this helps with :: vs ::: goto definition resolution).

Because we search exports for every top level definition in the package, it seemed useful to make this a sorted vec we can binary search in

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left imports and package_imports alone for the moment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core file for "given a R/ directory, collect all top level definitions"

Comment on lines +13 to +15
/// Cache used for looking up package sources. `dyn` to allow easy swapping to
/// `TestPackageCache` in test files.
cache: Arc<dyn PackageCache>,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have both struct PackageCache and struct TestPackageCache that can be slotted in here, where the latter is very nice for testing in goto-definition.rs and externals.rs, since PackageCache itself requires a live R session and internet access.

We do this with a small trait PackageCache that requires a simple get() method.

It's dyn rather than using a generic like LibraryDefinitions<C: PackageCache> because that "leaks" the generic cache details way up the chain of callers (all the way up through goto-definition, it got gross).

I had extensive conversations with Claude about this design and we both decided this was a nice way to handle this kind of thing, and that Zed also does this kind of thing as well (with a "fake" Filesystem for similar reasons).

The vtable hit required by using dyn should be negligible vs the overhead of the cache itself

@DavisVaughan DavisVaughan changed the title WIP: Integrate PackageCache with goto definition Integrate PackageCache with goto definition Apr 28, 2026
@DavisVaughan DavisVaughan requested a review from lionel- April 28, 2026 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant