Add SourceGraph to Oak database #1214

Merged: lionel- merged 9 commits into main from oak/salsa-2-source-graph on May 15, 2026

@lionel- commented May 13, 2026

Branched from #1213

Progress towards #1212
Progress towards #1183

Just simple data structures for now.

The source graph contains all workspace scripts, workspace packages, and installed packages. Edges are encoded differently for Script and Package nodes. For scripts, they are populated from the library() and :: occurrences stored in the file's semantic index. For packages, they are defined by the package metadata held in the namespace field.
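To make the two edge encodings concrete, here is a minimal sketch in plain Rust, without Salsa. Everything in it is an assumption for illustration: the `Node` enum, its field names, and the text-based `scan_script` helper (a crude stand-in for the semantic index) are hypothetical and are not the types added in this PR.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical node kinds; the real definitions in `oak_db` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Node {
    /// Edges come from `library()` calls and `pkg::name` occurrences.
    Script { path: PathBuf, used_packages: BTreeSet<String> },
    /// Edges come from package metadata (here: namespace imports).
    Package { name: String, namespace_imports: BTreeSet<String> },
}

impl Node {
    /// Names of the packages this node points to.
    fn edges(&self) -> &BTreeSet<String> {
        match self {
            Node::Script { used_packages, .. } => used_packages,
            Node::Package { namespace_imports, .. } => namespace_imports,
        }
    }
}

/// Toy stand-in for the semantic index: collects package names from
/// `library(pkg)` lines and `pkg::name` uses by scanning the text.
fn scan_script(source: &str) -> BTreeSet<String> {
    let mut found = BTreeSet::new();
    for line in source.lines() {
        // `library(pkg)` attaches a package.
        if let Some(rest) = line.trim().strip_prefix("library(") {
            if let Some(name) = rest.split(')').next() {
                let name = name.trim();
                if !name.is_empty() {
                    found.insert(name.to_string());
                }
            }
        }
        // `pkg::name` references a package without attaching it.
        if let Some((before, _)) = line.split_once("::") {
            // Keep only the identifier immediately before the `::`.
            let name = before
                .rsplit(|c: char| !(c.is_alphanumeric() || c == '.'))
                .next()
                .unwrap_or("");
            if !name.is_empty() {
                found.insert(name.to_string());
            }
        }
    }
    found
}
```

Both node kinds expose their outgoing edges through the same `edges()` accessor, even though the edges are populated from different sources.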

Comment thread crates/oak_db/src/source_graph.rs Outdated
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum PackageOrigin {
    Workspace { root: PathBuf },
    Installed { version: String, libpath: PathBuf },

Contributor

Is the idea that version is required to make a unique identifier in case the package is reinstalled?

Contributor Author

Hmm, I don't think so; we'll need to depend on all package metadata and R files.

Let me delete both version and libpath here; we don't use either of them in future PRs.

Comment thread crates/ark/src/lsp/state.rs Outdated
  pub fn new() -> Self {
-     Self::default()
+     let db = Self::default();
+     oak_db::SourceGraph::new(&db, vec![], vec![], vec![]);

Contributor

For an example of this odd pattern, see ty's Program::from_settings() in ProjectDatabase::new(), where Program is a singleton as well.
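As a rough illustration of why the constructor is called only for its side effect, here is a sketch of the singleton pattern in plain Rust. It assumes a hypothetical `Database` with a single `OnceLock` slot. Salsa's real singleton inputs are implemented differently, so this only mirrors the shape of the `new()` / `get()` calls above.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the singleton input.
#[derive(Debug, Default, PartialEq)]
struct SourceGraph {
    scripts: Vec<String>,
    workspace_packages: Vec<String>,
    installed_packages: Vec<String>,
}

/// Hypothetical database with exactly one slot for the graph.
#[derive(Default)]
struct Database {
    source_graph: OnceLock<SourceGraph>,
}

impl SourceGraph {
    /// Registers the singleton in `db`. Panics if one is already registered.
    fn new(
        db: &Database,
        scripts: Vec<String>,
        workspace_packages: Vec<String>,
        installed_packages: Vec<String>,
    ) -> &SourceGraph {
        let graph = SourceGraph { scripts, workspace_packages, installed_packages };
        if db.source_graph.set(graph).is_err() {
            panic!("SourceGraph singleton already initialized");
        }
        SourceGraph::get(db)
    }

    /// Fetches the singleton. Panics if `new()` was never called on `db`.
    fn get(db: &Database) -> &SourceGraph {
        db.source_graph.get().expect("SourceGraph singleton not initialized")
    }
}

impl Database {
    fn new() -> Self {
        let db = Self::default();
        // The return value is discarded: the call matters because it
        // registers the (empty) graph so later `get()` calls succeed.
        SourceGraph::new(&db, vec![], vec![], vec![]);
        db
    }
}
```

Initializing the slot inside `Database::new()` means no caller can observe a database where `SourceGraph::get()` would panic.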

Comment thread crates/ark/src/lsp/state.rs Outdated
    let db = Self::default();
    oak_db::SourceGraph::new(&db, vec![], vec![], vec![]);
    db
}

Contributor

I see that ty sets .durability(Durability::HIGH) on their Program singleton, but that's presumably because it rarely changes.

Our SourceGraph, on the other hand, is likely to change quite a bit, I think.

Contributor Author

As a whole, yes. Down the line we'll want to set high durability on installed packages, and low durability on scripts and workspace packages.
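For readers unfamiliar with durability, the following sketch shows the general idea behind splitting inputs into high and low durability. It is a simplified model under stated assumptions, not Salsa's implementation: the `Revisions` type, the two-level `Durability` enum, and `can_skip_validation` are hypothetical names.

```rust
/// Hypothetical two-level durability, loosely modeled on the concept in Salsa.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Durability {
    Low = 0,  // e.g. scripts and workspace packages
    High = 1, // e.g. installed packages
}

/// Tracks, per durability level, the last revision in which an input of
/// that level (or a higher one) changed.
#[derive(Debug, Default)]
struct Revisions {
    current: u64,
    last_changed: [u64; 2],
}

impl Revisions {
    /// Records a change to an input with the given durability.
    fn bump(&mut self, durability: Durability) {
        self.current += 1;
        // A change at level D affects queries at level D and below.
        for level in 0..=durability as usize {
            self.last_changed[level] = self.current;
        }
    }

    /// A memoized query that only read inputs of at least `durability`, and
    /// was last verified at `verified_at`, can be reused without walking its
    /// dependencies if nothing at that level has changed since.
    fn can_skip_validation(&self, durability: Durability, verified_at: u64) -> bool {
        self.last_changed[durability as usize] <= verified_at
    }
}
```

Under this model, editing a script bumps only the low level, so queries that depend solely on installed packages stay valid without being rechecked.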

Comment thread crates/oak_db/src/db.rs Outdated
Comment on lines +9 to +11
    fn source_graph(&self) -> SourceGraph {
        SourceGraph::get(self)
    }

Contributor

This might be splitting hairs a bit, but, just to confirm, you want to inflict a SourceGraph on all salsa Databases we might ever want to create?

Like, IIUC, ty has an inheritance-like approach where they have this base db:

/// Most basic database that gives access to files, the host system, source code, and parsed AST.
pub trait Db: salsa::Database {
    fn vendored(&self) -> &VendoredFileSystem;
    fn system(&self) -> &dyn System;
    fn files(&self) -> &Files;
    fn python_version(&self) -> PythonVersion;
}

They reference this as SourceDb throughout the project, i.e. all databases should be able to provide file sources and ASTs.

Then further up they have something they refer to as a SemanticDb, which sits at the top of the chain SemanticDb -> ModuleResolverDb -> SourceDb, and is documented as

/// Database giving access to semantic information about a Python program.

It looks like this, with these extra methods:

// SemanticDb
pub trait Db: ModuleResolverDb {
    /// Returns `true` if the file should be checked.
    fn should_check_file(&self, file: File) -> bool;

    /// Resolves the rule selection for a given file.
    fn rule_selection(&self, file: File) -> &RuleSelection;

    fn lint_registry(&self) -> &LintRegistry;

    fn analysis_settings(&self, file: File) -> &AnalysisSettings;

    /// Whether ty is running with logging verbosity INFO or higher (`-v` or more).
    fn verbose(&self) -> bool;
}

// ModuleResolverDb
pub trait Db: SourceDb {
    /// Returns the search paths for module resolution.
    fn search_paths(&self) -> &SearchPaths;
}

And, for example, this search_paths() method is where they use their singleton Program::get():

#[salsa::db]
impl ty_module_resolver::Db for ProjectDatabase {
    fn search_paths(&self) -> &SearchPaths {
        Program::get(self).search_paths(self)
    }
}

So I just wondered if we should at least have this base Db with no trait methods for now (it will probably have files() eventually) and then a SemanticDb that has source_graph()
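The layering proposed here could be sketched as follows, in plain Rust without Salsa. The trait split matches the suggestion (an empty base `Db`, and a `SemanticDb` with `source_graph()`), but the concrete types `OakDatabase` and `ParseOnlyDatabase`, the `package_count` helper, and the by-reference return type are assumptions made to keep the example self-contained.

```rust
/// Hypothetical placeholder for the graph.
#[derive(Debug, Default, Clone, PartialEq)]
struct SourceGraph {
    packages: Vec<String>,
}

/// Base database: no methods for now (`files()` might land here later).
trait Db {}

/// Semantic layer: code that needs cross-file resolution asks for this.
trait SemanticDb: Db {
    fn source_graph(&self) -> &SourceGraph;
}

/// Full database implementing both layers.
#[derive(Default)]
struct OakDatabase {
    graph: SourceGraph,
}

impl Db for OakDatabase {}

impl SemanticDb for OakDatabase {
    fn source_graph(&self) -> &SourceGraph {
        &self.graph
    }
}

/// A bare database that only implements the base layer.
#[derive(Default)]
struct ParseOnlyDatabase;

impl Db for ParseOnlyDatabase {}

/// Only callable with a database that provides the semantic layer.
fn package_count(db: &dyn SemanticDb) -> usize {
    db.source_graph().packages.len()
}
```

With this split, passing a `ParseOnlyDatabase` to `package_count` is a compile error, which is the trade-off being discussed: more abstraction up front in exchange for databases that do not have to carry a graph.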

Contributor Author

@lionel- May 14, 2026

I'm not sure I see the need for thinking about abstraction now?

I also don't think we "inflict" anything with the source graph. Are you concerned about more verbose tests? The other side of the coin is that it makes them more realistic.

For context, I can see the semantic index being useful on its own, independent of oak_db. So we should be careful to keep it that way (e.g. the upcoming ImportResolver is generic). But as soon as cross-file resolution is needed, we need the full DB and its graph.
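One way to read "generic" here is that the indexing code takes its resolver as a type parameter instead of a concrete database. The sketch below is purely hypothetical: the PR does not show the `ImportResolver` API, so the trait signature, `resolve_uses`, and `MapResolver` are invented for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical trait: answers where a package's exported symbol is defined.
trait ImportResolver {
    fn resolve(&self, package: &str, symbol: &str) -> Option<String>;
}

/// Toy indexing step, generic over the resolver so it does not depend on
/// any database type.
fn resolve_uses<R: ImportResolver>(
    resolver: &R,
    uses: &[(&str, &str)],
) -> Vec<Option<String>> {
    uses.iter()
        .map(|(package, symbol)| resolver.resolve(package, symbol))
        .collect()
}

/// Standalone resolver backed by a plain map: enough for single-file use
/// or tests, with no database involved.
struct MapResolver(HashMap<(String, String), String>);

impl ImportResolver for MapResolver {
    fn resolve(&self, package: &str, symbol: &str) -> Option<String> {
        self.0
            .get(&(package.to_string(), symbol.to_string()))
            .cloned()
    }
}
```

A database-backed resolver would implement the same trait by consulting the source graph, which is where the full DB becomes necessary.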

Comment thread crates/oak_db/src/name.rs Outdated
#[salsa::interned]
pub struct Name<'db> {
    #[returns(ref)]
    pub text: String,

Contributor

Might be interesting to consider CompactString, which ty uses, at some point.

Contributor Author

Good idea, that's a Salsa feature. I've just switched.

@lionel- force-pushed the oak/salsa-2-file branch from ad8bcd2 to 800d17c on May 14, 2026 08:48
@lionel- force-pushed the oak/salsa-2-source-graph branch from f9880bc to e797f3e on May 14, 2026 08:50
@lionel- force-pushed the oak/salsa-2-file branch from 800d17c to 95ae7c8 on May 14, 2026 12:02
@lionel- force-pushed the oak/salsa-2-source-graph branch from e797f3e to eedc2cf on May 14, 2026 13:21
@lionel- force-pushed the oak/salsa-2-file branch from 9e7b239 to 3ad7300 on May 15, 2026 10:42
Base automatically changed from oak/salsa-2-file to main on May 15, 2026 11:54
@lionel- force-pushed the oak/salsa-2-source-graph branch from eedc2cf to 78e798c on May 15, 2026 12:01
@lionel- merged commit 33d894a into main on May 15, 2026
17 checks passed
@lionel- deleted the oak/salsa-2-source-graph branch on May 15, 2026 13:13
@github-actions bot locked and limited conversation to collaborators on May 15, 2026
2 participants